CS 584 - Stream Database Systems
Homework 1: Sketching

Due: See class webpage

1. Assignment

• Implement the two versions of the probabilistic sketching algorithms discussed in this webpage: click here

2. Student programs

• Program 1: FixStat

• The program must accept the following parameters:

• FixStat Seed m k Lambda Epsilon < InputFile

• Meaning of the parameters:

 • InputFile = input file (generated using the help programs)
 • Seed = the seed for the random number generator used in YOUR program (your program will need to generate random numbers; use this seed to initialize the generator). If you use C, you would call: srandom(Seed)
 • m = number of values in the InputFile
 • k = the exponent in the value computed in Fk
 • Lambda = see lecture notes
 • Epsilon = see lecture notes

• NOTE:

The FixStat program will need to know the distribution of the input data (the value n in the class notes) to compute the parameter s1

This parameter is the first value in the input file
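As an illustration of the FixStat command line described above, the following is a minimal sketch of argument parsing in C. The struct `fixstat_args` and the function `parse_fixstat_args` are hypothetical names introduced here, not part of the assignment, and the sketch assumes Seed, m, and k are integers while Lambda and Epsilon are floating-point values.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical container for the FixStat command-line parameters. */
struct fixstat_args {
    unsigned int seed;   /* Seed for the random number generator */
    long m;              /* number of values in the input file */
    int k;               /* exponent in Fk */
    double lambda;       /* see lecture notes */
    double epsilon;      /* see lecture notes */
};

/*
 * Parse "FixStat Seed m k Lambda Epsilon".
 * Returns 0 on success, -1 if the argument count is wrong.
 * Assumption: integer Seed, m, k; floating-point Lambda, Epsilon.
 */
int parse_fixstat_args(int argc, char **argv, struct fixstat_args *out)
{
    if (argc != 6) {
        fprintf(stderr, "Usage: %s Seed m k Lambda Epsilon < InputFile\n",
                argc > 0 ? argv[0] : "FixStat");
        return -1;
    }
    out->seed = (unsigned int)strtoul(argv[1], NULL, 10);
    out->m = strtol(argv[2], NULL, 10);
    out->k = (int)strtol(argv[3], NULL, 10);
    out->lambda = strtod(argv[4], NULL);
    out->epsilon = strtod(argv[5], NULL);
    return 0;
}
```

In `main`, a successful parse would typically be followed by `srandom(args.seed)`. StreamStat could use the same pattern with the m field dropped and an argument count of 5.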

• Outputs of the program:

 • s1 used (see class notes)
 • s2 used (see class notes)
 • The computed estimate Y using the first sketching algorithm (i.e., when m is known)
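For orientation only, here is a sketch of a single basic estimator for the known-m case. It assumes the class notes follow the standard Alon-Matias-Szegedy construction, which this handout does not confirm, so check the formula against the notes before relying on it. The function names `ipow` and `basic_estimate` are hypothetical, and the sketch assumes k >= 1 and that the stream is held in an array.

```c
#include <stddef.h>

/* Integer power base^k for small non-negative k. */
static double ipow(double base, int k)
{
    double result = 1.0;
    for (int i = 0; i < k; i++)
        result *= base;
    return result;
}

/*
 * One basic estimator X for Fk when the stream length m is known.
 * Assumed form (standard AMS, not confirmed by the handout):
 *   X = m * (r^k - (r-1)^k)
 * where p is a position in [0, m) and r is the number of occurrences
 * of data[p] in data[p..m-1].
 */
double basic_estimate(const long *data, size_t m, size_t p, int k)
{
    long r = 0;
    for (size_t i = p; i < m; i++)
        if (data[i] == data[p])
            r++;
    return (double)m * (ipow((double)r, k) - ipow((double)(r - 1), k));
}
```

In the standard scheme, p is drawn uniformly at random from the m positions (for example with `random()`), Y is the median of s2 averages, and each average is taken over s1 independent basic estimators. For the stream 1, 2, 1, 1 with k = 2, averaging X over all four positions gives exactly F2 = 10, which is a convenient sanity check.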

• Program 2: StreamStat

• The program must accept the following parameters:

• StreamStat Seed k Lambda Epsilon < InputFile

• Meaning of the parameters:

 • InputFile = input file (generated using the help programs)
 • Seed = the seed for the random number generator used in YOUR program (remember that your program will need to generate random numbers; use this seed to initialize the generator)
 • k = the exponent in the value computed in Fk
 • Lambda = see lecture notes
 • Epsilon = see lecture notes

• NOTE: notice that the parameter m is missing in StreamStat !!!

• NOTE:

The StreamStat program will also need to know the distribution of the input data (the value n in the class notes) to compute the parameter s1

This parameter is the first value in the input file

• Outputs of the program:

 • s1 used (see class notes)
 • s2 used (see class notes)
 • The computed estimate Y using the second sketching algorithm (i.e., when m is not known)

• Note: Try different values of k !!!

3. Help Files

NOTE: the first value in the input data file is not a stream data point, but the parameter n

So the format of the data file is:

```
n (needed to compute s1)
data1
data2
data3
data4
...
```
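The file layout above can be read with a short loop. The following is a minimal sketch, assuming the values are whitespace-separated integers that fit in a `long`; the function name `read_stream` is hypothetical and not part of the provided help files.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Read the data file layout: first n, then the stream values.
 * On success returns a malloc'd array of the stream values (caller frees),
 * stores n in *n_out and the number of stream values in *count_out.
 * Returns NULL if n is missing or memory runs out.
 * Assumption: all values are whitespace-separated integers.
 */
long *read_stream(FILE *in, long *n_out, size_t *count_out)
{
    size_t cap = 1024, count = 0;
    long value;
    long *data;

    if (fscanf(in, "%ld", n_out) != 1)
        return NULL;
    data = malloc(cap * sizeof *data);
    if (data == NULL)
        return NULL;
    while (fscanf(in, "%ld", &value) == 1) {
        if (count == cap) {
            /* Grow the buffer geometrically when it fills up. */
            long *bigger = realloc(data, 2 * cap * sizeof *data);
            if (bigger == NULL) {
                free(data);
                return NULL;
            }
            data = bigger;
            cap *= 2;
        }
        data[count++] = value;
    }
    *count_out = count;
    return data;
}
```

Calling `read_stream(stdin, &n, &count)` matches the `< InputFile` redirection used by both programs. Buffering the whole stream is convenient for testing, but StreamStat is presumably meant to process values one at a time, so there the same `fscanf` loop would feed each value directly to the sketch instead of storing it.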

• Compile with: cc -o FindFk FindFk.c
• Usage:     FindFk k < dataFile
• This program computes the exact answer Fk
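To make the quantity concrete: Fk is the k-th frequency moment, the sum over distinct values of (number of occurrences)^k. The sketch below computes it by sorting; it is an illustration under the assumption of integer data and is not the provided FindFk.c, whose implementation is not shown here. The names `cmp_long` and `exact_fk` are hypothetical.

```c
#include <stdlib.h>

/* qsort comparator for long values. */
static int cmp_long(const void *a, const void *b)
{
    long x = *(const long *)a, y = *(const long *)b;
    return (x > y) - (x < y);
}

/*
 * Exact k-th frequency moment: sum over distinct values of (frequency)^k.
 * Sorts data in place, then raises the length of each run of equal
 * values to the k-th power and adds it to the total.
 */
double exact_fk(long *data, size_t m, int k)
{
    double total = 0.0;
    size_t i = 0;

    qsort(data, m, sizeof *data, cmp_long);
    while (i < m) {
        size_t j = i;
        double term = 1.0;
        while (j < m && data[j] == data[i])
            j++;                      /* j - i is the frequency of data[i] */
        for (int e = 0; e < k; e++)
            term *= (double)(j - i);
        total += term;
        i = j;
    }
    return total;
}
```

For the stream 1, 2, 1, 1 the frequencies are 3 and 1, so F1 = 4, F2 = 10, and F3 = 28. Comparing such exact values with the estimate Y is a simple way to judge how close a sketch gets for a given Lambda and Epsilon.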

4. Turn in

• Turn in a Makefile using the command:
```
    /home/cs584001/turnin Makefile hw1
```
(If you use Java, you still need a Makefile. Let me know if you don't know how to create a Makefile)

• Turn in each program file using this command:
```
    /home/cs584001/turnin Filename hw1-?
```
where "?" is a number from 0 to 9.

Example: if your program files are named FixStat.c and StreamStat.c, then you turn the 2 program files in as:

    /home/cs584001/turnin FixStat.c hw1-1
    /home/cs584001/turnin StreamStat.c hw1-2