Assignment 3: Output-Sensitive Skyline Computation Algorithm (Programming)
Your task for this assignment is to implement and evaluate the
output-sensitive skyline computation algorithm [1]. To calculate the skyline
points of a two-dimensional dataset, the naïve way is to compare all points
pairwise, which has time complexity O(n^2).
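The pairwise approach can be sketched as follows. This is a minimal illustration rather than a reference solution: it assumes smaller values are better in both dimensions (consistent with the ascending scan described in this handout), and the names `dominates` and `naive_skyline` are hypothetical. It is mainly useful as a correctness check on small inputs.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def dominates(p: Point, q: Point) -> bool:
    """True if p is no worse than q in both dimensions and strictly
    better in at least one (assumption: smaller is better)."""
    return p[0] <= q[0] and p[1] <= q[1] and (p[0] < q[0] or p[1] < q[1])


def naive_skyline(points: List[Point]) -> List[Point]:
    """Keep every point that no other point dominates: O(n^2) comparisons."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

Under this definition, identical points do not dominate each other, so exact duplicates of a skyline point are all kept.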
A "smart" way with time complexity O(n log n) (the scanning algorithm for the
2-dimensional case) is to sort the points by one dimension in ascending order,
then scan from the second point onward, comparing each point with the previous
remaining point and removing it if it has a larger value in the other
dimension. The points that remain are the skyline points. The output-sensitive
skyline computation algorithm [1] has time complexity O(n log k), where k is
the number of skyline points, which is generally far smaller than n.
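A minimal sketch of the scanning algorithm is shown below, assuming smaller values are better in both dimensions. The function name `scan_skyline` is hypothetical, and the tie handling is an assumption, since the handout does not specify it: a point that ties the best second coordinate seen so far is treated as dominated, so exact duplicates collapse to one point.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def scan_skyline(points: List[Point]) -> List[Point]:
    """Sort by the first dimension, then keep a point only if its second
    coordinate is strictly smaller than every one kept so far."""
    skyline: List[Point] = []
    best_y = float("inf")  # second coordinate of the last kept point
    # Tuples sort by x first, then y, so among equal x the smallest y comes first.
    for x, y in sorted(points):
        if y < best_y:
            skyline.append((x, y))
            best_y = y
    return skyline
```

The sort costs O(n log n) and the scan is a single O(n) pass, so sorting dominates the running time.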
1. Implement both the "smart" algorithm with time complexity O(n log n)
described above and the faster output-sensitive skyline computation
algorithm [1]. Compare the execution times of the two algorithms on the test
datasets.
2. Test your implementations on three test datasets with different
distribution patterns: a correlated dataset (CORR.dat, CORR.dat.gz), an
independent dataset (INDE.dat, INDE.dat.gz), and an anti-correlated dataset
(ANTI.dat, ANTI.dat.gz). Measure the execution time of both algorithms and
compare them on all three test datasets. Each test dataset is a synthetic
dataset that contains 2^20 two-dimensional points. Points in the correlated
dataset are positively correlated, which results in fewer skyline points.
Points in the anti-correlated dataset are negatively correlated, which
results in more skyline points. Points in the independent dataset are
distributed independently. You can also try your programs on other
two-dimensional datasets.
3. Write a brief report in PDF presenting your results on the test datasets
and any other datasets you tried. Explain and discuss any data structures or
algorithmic optimizations you used in your implementation, or any new
algorithm you are proposing or implementing. Discuss the experience and
lessons you gained from the implementation.
4. You can work as a team of up to two. If you work on your own, you get 5
bonus points. If you work as a team of two, please explain the contribution
of each team member in your report.
5. Your submission should be a zip or tar file that contains the PDF report
and the program deliverables: your source files, the executable, a readme
file explaining how to compile and run your program, and the output file for
each test dataset.
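For the execution-time comparison requested in items 1 and 2, a timing harness might look like the sketch below. Everything in it is illustrative: `time_algorithm` and `scan_stand_in` are hypothetical names, the stand-in is only there to give the harness something to time, and the random points take the place of the real test files, whose format is not described in this handout.

```python
import random
import time
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def time_algorithm(algorithm: Callable[[List[Point]], List[Point]],
                   points: List[Point],
                   repeats: int = 3) -> Tuple[float, List[Point]]:
    """Run `algorithm` `repeats` times; return (fastest wall-clock seconds, last result)."""
    best = float("inf")
    result: List[Point] = []
    for _ in range(repeats):
        data = list(points)  # fresh copy, in case the algorithm sorts in place
        start = time.perf_counter()
        result = algorithm(data)
        best = min(best, time.perf_counter() - start)
    return best, result


def scan_stand_in(points: List[Point]) -> List[Point]:
    """Compact sort-and-scan skyline (assumption: smaller is better)."""
    skyline: List[Point] = []
    for x, y in sorted(points):
        if not skyline or y < skyline[-1][1]:
            skyline.append((x, y))
    return skyline


rng = random.Random(0)  # fixed seed so runs are repeatable
sample = [(rng.random(), rng.random()) for _ in range(2 ** 12)]
seconds, skyline = time_algorithm(scan_stand_in, sample)
print(f"scan: {len(skyline)} skyline points in {seconds:.6f} s")
```

Taking the fastest of several runs reduces noise from other processes. Keeping file loading outside the timed region means the measurement covers the algorithm alone.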
Competition
We will run a competition on the output-sensitive skyline computation
algorithm that you implemented, using a few test datasets, and select the top
winner that offers the best performance with correct and complete results.
For fairness, no multithreading or parallelization should be used for the
competition.
Reference:
[1] Liu, Jinfei, Li Xiong, and Xiaofeng Xu. "Faster output-sensitive skyline
computation algorithm." Information Processing Letters 114.12 (2014):
710-713.