CS 584 - Stream Database Systems
Homework 6

Due: See class webpage

## 1. Student's programs

• Your program implements Arasu's ε-approximation algorithm, which estimates the number of occurrences of each item within a fixed-size window of a stream.

• Programming required in this assignment:

• Command line to invoke program:

```
    Arasu InputFile W ε ChkPt
or:
    java Arasu InputFile W ε ChkPt
```

• Make sure the capitalization of the command "Arasu" is exactly as given above.

• Meaning of the input arguments:

• InputFile = input data file.

The data file must conform to the following format:

```
N    (number of data points)
v1   (data point 1)
v2   (data point 2)
v3   (data point 3)
......
vN   (data point N)
```
• W = window size (W is a fixed constant)

• ε = the precision parameter in Arasu's algorithm

• ChkPt = checkpoint: when the checkpoint is reached, your algorithm stops and prints the output specified in the output section
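
The argument list and file format above could be read in along the following lines. This is only a sketch under stated assumptions: the class name `ArasuInput` and its methods are hypothetical, data points are kept as whitespace-free strings, and ChkPt is taken to be a 1-based position in the stream. None of this is prescribed by the assignment.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the assignment does not prescribe any class layout.
final class ArasuInput {
    final List<String> items; // v1..vN, kept as strings (assumption)
    final int w;              // window size W
    final double eps;         // precision parameter ε
    final int chkPt;          // checkpoint, assumed to be a 1-based stream position

    ArasuInput(List<String> items, int w, double eps, int chkPt) {
        this.items = items;
        this.w = w;
        this.eps = eps;
        this.chkPt = chkPt;
    }

    // Parses the file contents: the first token is N, followed by N data points.
    static List<String> parseItems(String text) {
        String[] tok = text.trim().split("\\s+");
        int n = Integer.parseInt(tok[0]);
        if (n < 0 || tok.length - 1 < n) {
            throw new IllegalArgumentException(
                    "expected " + n + " data points, found " + (tok.length - 1));
        }
        List<String> items = new ArrayList<>(n);
        for (int i = 1; i <= n; i++) {
            items.add(tok[i]);
        }
        return items;
    }

    // args = { InputFile, W, ε, ChkPt }, in the order given on the command line.
    static ArasuInput fromArgs(String[] args) throws IOException {
        if (args.length != 4) {
            throw new IllegalArgumentException("usage: Arasu InputFile W eps ChkPt");
        }
        String text = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        return new ArasuInput(parseItems(text), Integer.parseInt(args[1]),
                Double.parseDouble(args[2]), Integer.parseInt(args[3]));
    }
}
```

A `main` method in your `Arasu` class could call `ArasuInput.fromArgs(args)` once and then feed `items` to the algorithm one element at a time.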

• Output that you need to generate:

• The algorithm will run silently until the given checkpoint is reached:
```
Input stream:

.....  a   b   c   d   e   f   g   h   i   j   k   l  ....
                                               ^
               |<----------------------------->|
               |           Window W            |
                                               |
                                           Checkpoint
```
• The current window consists of the W most recent elements, counting back from (and including) the element at the checkpoint
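
For verification, the window defined above can be cut out of the raw stream directly. The sketch below assumes ChkPt is the 1-based position of the last element in the window and that items are strings; `WindowSlice` and its methods are hypothetical names.

```java
import java.util.List;

// Hypothetical helper for extracting and printing the current window.
final class WindowSlice {
    // Returns the (at most) W items ending at, and including, position chkPt.
    // chkPt is assumed to be 1-based.
    static List<String> currentWindow(List<String> stream, int w, int chkPt) {
        if (chkPt < 1 || chkPt > stream.size()) {
            throw new IllegalArgumentException("checkpoint outside the stream");
        }
        int from = Math.max(0, chkPt - w); // clamp if fewer than W items have arrived
        return stream.subList(from, chkPt);
    }

    // Formats the window on one line, items separated by three spaces.
    static String format(List<String> window) {
        return String.join("   ", window);
    }
}
```

With the stream `a b c ... l`, W = 9 and a checkpoint at `k` (position 11), `currentWindow` returns the items `c` through `k`.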

• The output of the algorithm is:

1. The items in the current window (for verification purposes).

Output format:

```
c   d   e   f   g   h   i   j   k
```

2. Print the values of W, ε, W', ε' and L used in your algorithm (remember that W' and ε' are derived from W and ε: W and ε are unrestricted, but W' and ε' must be powers of 2)

Format:

```
W = ...   eps = ...   W' = ...    eps' = ...     L = ...
```
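
One possible way to obtain power-of-2 values is sketched below. The rounding directions are assumptions, not taken from this handout: W' is taken as the smallest power of 2 that is at least W, and ε' as the largest power of 2 that is at most ε (capped at 1). Check the lecture notes for the actual derivation, and for how L is computed, which this sketch leaves out. The class `PowerOfTwoParams` is a hypothetical name.

```java
// Hypothetical helper; the rounding directions are assumptions (see lecture notes).
final class PowerOfTwoParams {
    // Smallest power of 2 that is >= w.
    static int roundUpToPowerOfTwo(int w) {
        if (w < 1 || w > (1 << 30)) {
            throw new IllegalArgumentException("W must be in [1, 2^30]");
        }
        int p = 1;
        while (p < w) {
            p <<= 1;
        }
        return p;
    }

    // Largest power of 2 that is <= eps; values of eps >= 1 are capped at 1.
    static double roundDownToPowerOfTwo(double eps) {
        if (!(eps > 0)) {
            throw new IllegalArgumentException("eps must be positive");
        }
        double p = 1.0;
        while (p > eps) {
            p /= 2; // halving is exact in binary floating point
        }
        return p;
    }
}
```

For example, W = 9 and ε = 0.1 would give W' = 16 and ε' = 0.0625 under these assumptions.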

3. Print the state information of every ACTIVE block in each level that was used in computing the sketch for the current window.

(Some levels, especially level 0, contain many more ACTIVE blocks; do NOT print these.)

At each level, at most 2 ACTIVE blocks are used to compute the sketch for the current window. So if your printout shows 3 or more ACTIVE blocks in some level, you have a bug.

Format:

```
Level L:   [(e1,f1,Δ1), (e2,f2,Δ2), ...]
Level L-1: [(e1,f1,Δ1), (e2,f2,Δ2), ...]
           [(e1,f1,Δ1), (e2,f2,Δ2), ...]
```

4. The sketch for the current window constructed using the ACTIVE blocks printed in step 3 (see the lecture notes on how the summary is computed).

Format:

```
Summary: [(e1,f1), (e2,f2), ...]
```

5. Finally, for each item in the current window, print the actual number of occurrences and its approximate count from the sketch.

Output format:

```
(e1, f1, real1)
(e2, f2, real2)
(e3, f3, real3)
...
```
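
The exact counts needed for step 5 can be computed by brute force over the current window and then printed next to the sketch's estimates. The following is a minimal sketch, assuming items are strings and the summary is available as a map from item to approximate count; `CountReport` is a hypothetical name, and reporting f = 0 for items missing from the summary is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for the final comparison between real and approximate counts.
final class CountReport {
    // Exact number of occurrences of each distinct item, in order of first appearance.
    static Map<String, Integer> exactCounts(List<String> window) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String item : window) {
            counts.merge(item, 1, Integer::sum);
        }
        return counts;
    }

    // One "(e, f, real)" line per distinct item in the window.
    // Items absent from the sketch are reported with f = 0 (assumption).
    static String report(List<String> window, Map<String, Integer> sketch) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : exactCounts(window).entrySet()) {
            int approx = sketch.getOrDefault(e.getKey(), 0);
            sb.append(String.format("(%s, %d, %d)", e.getKey(), approx, e.getValue()));
            sb.append('\n');
        }
        return sb.toString();
    }
}
```

Comparing each `f` against `real` is also a quick check that the error stays within the bound promised by the algorithm for your ε'.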

## 2. Help Material

I will put some data files here for you to test - if I can find the time :-). Preparing to teach this class takes up more time than I had expected...

## 3. Turn in

• Turn in a Makefile using the command:
```
/home/cs584000/turnin Makefile hw6
```
(If you use Java, you still need a Makefile. Let me know if you don't know how to create a Makefile)

• Turn in each header and program file using this command:
```
/home/cs584000/turnin Filename hw6-?
```
where "?" is a number from 0 to 9.