CS 584 - Stream Database Systems Homework 2: Counting Sample

• Assignment

• Implement the counting sample algorithm discussed in this webpage: click here

• Student program

• The program must accept the following parameters:

• CountingSample Seed N a < inputFile

• Meaning of the parameters:

 Seed = the seed for the random number generator used in YOUR program (your program needs to generate random numbers, so seed the generator with this value). If you use C, you would call: srandom(Seed)

 N = maximum size of the counting sample

 a = factor by which the threshold T is increased (T' = a × T); the selection probability 1/T shrinks accordingly
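The parameter handling described above could be sketched in C roughly as follows. The struct `cs_params` and the function `parse_params` are hypothetical names chosen for this illustration, and rejecting a ≤ 1 is an assumption (the threshold would never grow otherwise), not a stated requirement.

```c
#define _XOPEN_SOURCE 600   /* exposes srandom() under strict -std=c99 */
#include <stdlib.h>

/* Hypothetical container for the three command-line parameters. */
struct cs_params {
    unsigned int seed;  /* Seed: passed to srandom()                */
    long max_size;      /* N: maximum size of the counting sample   */
    double a;           /* a: threshold growth factor, T' = a * T   */
};

/* Parse "CountingSample Seed N a".
 * Returns 0 on success, -1 if an argument is missing or malformed. */
int parse_params(int argc, char **argv, struct cs_params *p)
{
    char *end;

    if (argc != 4)
        return -1;

    p->seed = (unsigned int)strtoul(argv[1], &end, 10);
    if (end == argv[1] || *end != '\0')
        return -1;

    p->max_size = strtol(argv[2], &end, 10);
    if (end == argv[2] || *end != '\0' || p->max_size <= 0)
        return -1;

    p->a = strtod(argv[3], &end);
    /* Assumption: a must exceed 1 so that T actually increases. */
    if (end == argv[3] || *end != '\0' || p->a <= 1.0)
        return -1;

    srandom(p->seed);   /* seed the generator once, before any random() call */
    return 0;
}
```

In `main`, you would call `parse_params(argc, argv, &p)` first and print a usage message if it returns -1. The input data still arrives on standard input, so it is not part of `argv`.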

• Outputs of the program:

• The final value of the selection probability (the value (1/T) in the algorithm)

• The content of the sample

The output format is a sequence of lines like this:

 value         count

• So the output file looks like this:

```
1/T
value1 count1
value2 count2
value3 count3
value4 count4
...
```
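A minimal sketch of writing this output in C is shown here. The struct `cs_entry` and the function `print_sample` are hypothetical names, the values are assumed to be integers, and the `%g` format for 1/T is only one reasonable choice since the assignment does not fix a precision.

```c
#include <stdio.h>

/* Hypothetical sample entry: one distinct value and its count. */
struct cs_entry {
    long value;
    long count;
};

/* Write 1/T on the first line, then one "value count" pair per line.
 * Returns the number of lines written. */
int print_sample(FILE *out, double T, const struct cs_entry *s, int n)
{
    int lines = 0;

    fprintf(out, "%g\n", 1.0 / T);   /* final selection probability */
    lines++;

    for (int i = 0; i < n; i++) {
        fprintf(out, "%ld %ld\n", s[i].value, s[i].count);
        lines++;
    }
    return lines;
}
```

Calling `print_sample(stdout, T, sample, size)` at the end of the run produces a file that can be piped into a reader expecting the format above.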

• Note on the class notes

• In the class notes webpage on the Counting Sample algorithm, I used:

```
// ---------------------------------------------------
// Reduce counting sample when size exceeds threshold
// ---------------------------------------------------
if ( size(S) > threshold )
{
   ...
}
```

This does not guarantee that the size is actually reduced: the random coin flips may leave every value in the sample. I used the if construct only to explain the concept.

The correct code is:

```
// ---------------------------------------------------
// Reduce counting sample when size exceeds threshold
// ---------------------------------------------------
while ( size(S) > threshold )
{
   ...
}
```
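To make the difference concrete, here is a rough C sketch of the `while` version. It is not the class-note code. It assumes that size(S) is the number of distinct values in the sample, that `threshold` is the parameter N, and that the thinning rule is the usual one for counting samples: for each value, flip a first coin with heads probability T/T', then coins with heads probability 1/T', decrementing the count on every tails until a heads comes up or the count reaches zero. The names `cs_entry`, `coin`, `thin_once` and `reduce_sample` are hypothetical. Check the rule against the class notes before relying on it.

```c
#define _XOPEN_SOURCE 600   /* exposes random() under strict -std=c99 */
#include <stdlib.h>

/* Hypothetical sample entry: one distinct value and its count. */
struct cs_entry {
    long value;
    long count;
};

/* Uniform random number in [0, 1); random() returns 0 .. 2^31 - 1. */
static double coin(void)
{
    return random() / 2147483648.0;
}

/* One thinning pass for a threshold raise from T to T_new.
 * Entries whose count drops to zero are removed in place.
 * Returns the new number of entries (may equal n if every coin is heads). */
static int thin_once(struct cs_entry *s, int n, double T, double T_new)
{
    int kept = 0;

    for (int i = 0; i < n; i++) {
        double p_heads = T / T_new;        /* first flip */
        while (s[i].count > 0 && coin() >= p_heads) {
            s[i].count--;                  /* tails: drop one occurrence */
            p_heads = 1.0 / T_new;         /* all later flips */
        }
        if (s[i].count > 0)
            s[kept++] = s[i];              /* entry survives */
    }
    return kept;
}

/* Keep raising T by the factor a until the sample fits.
 * A single pass (the "if" version) may remove nothing, hence the loop. */
int reduce_sample(struct cs_entry *s, int n, int threshold, double a, double *T)
{
    while (n > threshold) {
        double T_new = a * (*T);
        n = thin_once(s, n, *T, T_new);
        *T = T_new;
    }
    return n;
}
```

With a > 1 the survival probability drops on every pass, so the loop ends with probability 1. With the `if` version, `thin_once` would run only once and the sample could still be larger than `threshold` afterwards.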

• Help files

• data1: test input data 1 (50000 values from uniform 1..10000) - click here
• data2: test input data 2 (50000 values from skewed distribution 80% [1..1000], 20% [1001..10000]) - click here

The format of the data file is:

```
datapoint1
datapoint2
datapoint3
datapoint4
...
```
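Reading such a data file in C could look like the sketch below. It assumes the data points are integers separated by whitespace (newlines or blanks both work with `fscanf`). The function names `next_value` and `sum_stream` are hypothetical, and `sum_stream` is only an example consumer. Your counting sample would process each value instead of summing it.

```c
#include <stdio.h>

/* Read the next data point from `in`.
 * Returns 1 on success, 0 at end of file or on malformed input. */
int next_value(FILE *in, long *value)
{
    return fscanf(in, "%ld", value) == 1;
}

/* Example consumer: count the data points and accumulate their sum,
 * so that the plain average is sum / n. Returns n. */
long sum_stream(FILE *in, double *sum)
{
    long v, n = 0;

    *sum = 0.0;
    while (next_value(in, &v)) {
        *sum += (double)v;
        n++;
    }
    return n;
}
```

Since the program is started as `CountingSample Seed N a < inputFile`, you would pass `stdin` as the `in` argument.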

• Compile with: cc -o avg avg.c
• Usage:     avg < datafile

• This program computes the average of the INPUT data

• Compile with: cc -o avg2 avg2.c
• Usage:     avg2 < outputFile

• This program computes the adjusted average of the Counting Sample output file

• Turn in

• Turn in a Makefile using the command:
```
    /home/cs584001/turnin Makefile hw2
```
(If you use Java, you still need a Makefile. Let me know if you don't know how to create a Makefile)

• Turn in each program file using this command:
```
    /home/cs584001/turnin Filename hw2-?
```
where "?" is a number from 0 to 9.