Assignment 3: Output-Sensitive Skyline Computation Algorithm (Programming)

Your task for this assignment is to implement and evaluate the output-sensitive skyline computation algorithm [1]. To compute the skyline points of a two-dimensional dataset, the naïve approach compares all pairs of points, which has time complexity O(n^2). A "smart" approach with time complexity O(n log n) (the scanning algorithm for the two-dimensional case) first sorts the points in ascending order on one dimension, then scans them starting from the second point, comparing each point with the previous remaining point and removing it if it has a larger value in the other dimension; the points that remain at the end are the skyline points. The output-sensitive skyline computation algorithm [1] has time complexity O(n log k), where k is the number of skyline points, which is generally far smaller than n.
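The scanning algorithm described above can be sketched in Python as follows. This is a minimal illustration, not a reference solution: it assumes that smaller values are better in both dimensions and that points are (x, y) tuples, and the function name skyline_scan is hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def skyline_scan(points: List[Point]) -> List[Point]:
    """O(n log n) scan for the 2-D skyline.

    Assumption: smaller is better in both dimensions.
    """
    skyline: List[Point] = []
    best_y = float("inf")  # smallest y among the points kept so far
    # set() drops duplicate points; sorting orders by x, then by y on ties.
    for x, y in sorted(set(points)):
        # Every kept point has x <= this x, so the current point survives
        # only if its y is strictly smaller than every y kept so far.
        if y < best_y:
            skyline.append((x, y))
            best_y = y
    return skyline
```

Sorting dominates the running time; the scan itself is a single linear pass.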

1.      Implement both the "smart" algorithm with time complexity O(n log n) described above and the faster output-sensitive skyline computation algorithm [1]. Compare the execution time of the two algorithms on the test datasets.

• You can use any programming language that you are familiar with.
• The program should be executable with 2 parameters: the name of the input dataset file and the name of the output file.
• The program should output a file that contains all the skyline points. The output file should have the following format: the first line contains the total number of skyline points that your program returns, and each subsequent line contains a single skyline point. Duplicate skyline points should appear only once in your output file and should be counted only once in the total number of skyline points. The last line contains the execution time of your program.
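The input and output requirements above could be handled along these lines. This is a sketch under stated assumptions: the input file is assumed to hold one point per line as two whitespace-separated numbers, points are written as "x y", the time is reported in seconds, and only the skyline computation itself is timed. The helper names read_points, format_output, and run are hypothetical.

```python
import time
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def read_points(path: str) -> List[Point]:
    """Read one point per line (assumed format: two whitespace-separated numbers)."""
    points: List[Point] = []
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) >= 2:  # skip blank or incomplete lines
                points.append((float(fields[0]), float(fields[1])))
    return points


def format_output(skyline: List[Point], elapsed_seconds: float) -> str:
    """First line: point count; then one point per line; last line: time."""
    lines = [str(len(skyline))]
    lines.extend(f"{x} {y}" for x, y in skyline)
    lines.append(f"{elapsed_seconds:.6f}")
    return "\n".join(lines) + "\n"


def run(input_path: str, output_path: str,
        compute_skyline: Callable[[List[Point]], List[Point]]) -> None:
    """Load points, time the skyline computation, and write the result file."""
    points = read_points(input_path)
    start = time.perf_counter()
    skyline = compute_skyline(points)  # expected to return de-duplicated points
    elapsed = time.perf_counter() - start
    with open(output_path, "w") as f:
        f.write(format_output(skyline, elapsed))
```

A script entry point would then pass the two command-line parameters and a skyline function of your own to run, for example run(sys.argv[1], sys.argv[2], my_skyline_function).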

2.      Test your implementations on three test datasets with different distribution patterns: a correlated dataset (CORR.dat, CORR.dat.gz), an independent dataset (INDE.dat, INDE.dat.gz), and an anti-correlated dataset (ANTI.dat, ANTI.dat.gz). Measure the execution time of both algorithms and compare them on all three test datasets. Each test dataset is a synthetic dataset that contains 2^20 two-dimensional points. Points in the correlated dataset are positively correlated, which results in fewer skyline points. Points in the anti-correlated dataset are negatively correlated, which results in more skyline points. Points in the independent dataset are distributed independently. You can also try your programs with various other two-dimensional datasets.
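For trying additional datasets, small synthetic inputs with the same three distribution patterns can be generated as sketched below. This is only an illustration and is not how the provided test files were produced: the function name make_points, the unit-interval range, and the Gaussian noise model are all assumptions.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]


def make_points(kind: str, n: int, seed: int = 0, noise: float = 0.05) -> List[Point]:
    """Generate n synthetic 2-D points (illustrative distributions only)."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    points: List[Point] = []
    for _ in range(n):
        x = rng.random()
        if kind == "independent":
            y = rng.random()                      # y unrelated to x
        elif kind == "correlated":
            y = x + rng.gauss(0.0, noise)         # y rises with x
        elif kind == "anti-correlated":
            y = 1.0 - x + rng.gauss(0.0, noise)   # y falls as x rises
        else:
            raise ValueError(f"unknown kind: {kind}")
        points.append((x, y))
    return points
```

Using a fixed seed keeps timing comparisons between the two algorithms on identical input.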

3.      Write a brief report in PDF presenting your results on the test datasets and any other datasets you tried. Explain and discuss any data structure or algorithmic optimizations you used in your implementation, as well as any new algorithm you are proposing or implementing. Discuss the experiences and lessons you have learned from the implementation.

4.      You can work as a team of up to two.  If you work on your own, you get 5 bonus points.  If you work as a team of two, please explain the contribution of each team member in your report.

5.      Your submission should be a zip or tar file that contains the PDF report as well as the program deliverables, including your source files, the executable, a readme file explaining how to compile and run your program, and the output file for each test dataset.

Competition

We will run a competition on the output-sensitive skyline computation algorithm that you implemented, using a few test datasets, and select the top winner whose implementation offers the best performance with correct and complete results. For fairness, no multithreading or parallelization should be used for the competition.

Reference:

[1] Liu, Jinfei, Li Xiong, and Xiaofeng Xu. "Faster output-sensitive skyline computation algorithm." Information Processing Letters 114.12 (2014): 710-713.