Assignment 1. Differentially private histogram (programming)

Out: 10/04/2016
Due: 10/18/2016, 11:59pm

Overview

The main goal of this task is to understand the concept of differential privacy by implementing and evaluating a differentially private histogram. A differentially private histogram can be implemented using the standard Laplace mechanism, which adds Laplace noise to each true bin count to produce noisy bin counts. Synthetic records can then be generated from the differentially private histogram (you can round all negative bin counts to zero) and used to answer random range queries. Query accuracy is primarily measured by the relative error between the true answer computed on the original data and the answer computed on the differentially private data. (You can find more details in "Differentially Private Histogram and Synthetic Data Publication".)
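As a rough illustration of the pipeline described above, here is a minimal Python sketch. It is not the required solution. The function names (laplace_noise, dp_histogram, synthetic_records) are hypothetical, the sensitivity of 1 assumes that neighboring datasets differ by adding or removing one record, and the histogram is one-dimensional for brevity. Laplace noise is sampled as the difference of two exponential variates so that only the standard library is needed.

```python
import random

def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two independent Exponential(rate=1/scale)
    variates follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def dp_histogram(counts, epsilon, sensitivity=1.0, seed=None):
    """Return noisy bin counts, with negative values set to zero.

    Assumption: sensitivity=1, i.e. adding or removing one record
    changes exactly one bin count by one.
    """
    rng = random.Random(seed)
    scale = sensitivity / epsilon  # Laplace scale b = sensitivity / epsilon
    noisy = [c + laplace_noise(scale, rng) for c in counts]
    return [max(0.0, v) for v in noisy]

def synthetic_records(noisy_counts):
    """Emit one synthetic record (here just the bin index) per rounded count."""
    records = []
    for bin_index, count in enumerate(noisy_counts):
        records.extend([bin_index] * int(round(count)))
    return records

# Example with made-up bin counts (not taken from the Adult dataset).
true_counts = [120, 45, 0, 3, 310]
noisy_counts = dp_histogram(true_counts, epsilon=0.5, seed=42)
records = synthetic_records(noisy_counts)
```

A smaller epsilon gives a larger noise scale and therefore stronger privacy but less accurate counts. Setting negative counts to zero is post-processing of the noisy output, so it does not weaken the privacy guarantee.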

Your task for this assignment is to implement a differentially private histogram using the Laplace mechanism and evaluate it using random range queries.
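The evaluation step could be sketched along the following lines. This is only one possible reading: the function names are hypothetical, queries are inclusive ranges over the bins of a one-dimensional histogram, and the sanity bound in the denominator (used here to avoid division by zero when the true answer is 0) is an assumption rather than something this assignment specifies.

```python
import random

def range_count(hist, lo, hi):
    """Answer a range query: total count in bins lo..hi (inclusive)."""
    return sum(hist[lo:hi + 1])

def relative_error(true_answer, noisy_answer, sanity_bound=1.0):
    """|true - noisy| / max(true, sanity_bound).

    Assumption: sanity_bound guards against a zero true answer.
    """
    return abs(true_answer - noisy_answer) / max(true_answer, sanity_bound)

def average_relative_error(true_hist, noisy_hist, num_queries=100, seed=None):
    """Average the relative error over randomly chosen bin ranges."""
    rng = random.Random(seed)
    n = len(true_hist)
    total = 0.0
    for _ in range(num_queries):
        lo = rng.randrange(n)      # random lower bin index
        hi = rng.randrange(lo, n)  # random upper bin index, hi >= lo
        total += relative_error(range_count(true_hist, lo, hi),
                                range_count(noisy_hist, lo, hi))
    return total / num_queries

# Example with made-up histograms.
true_hist = [120, 45, 0, 3, 310]
noisy_hist = [118.2, 47.9, 0.0, 4.1, 311.5]
error = average_relative_error(true_hist, noisy_hist, num_queries=50, seed=7)
```

For the three-attribute dataset, the same idea extends to a multi-dimensional histogram, where each query selects a range along every attribute.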

Input and output

Your program should take the following parameters:

Requirement

Your program needs to do the following:

Test dataset

You can test your implementation using the provided Adult dataset with three attributes (age, gender, race), which is extracted from the original Adult dataset from the 1994 Census database at the UCI data repository.

Other requirements

You can use any programming language that you are familiar with. Document your code using comments.

Deliverables

Competition

We will run a competition using a few test datasets and award prizes to the two winners whose implementations offer the best accuracy. :)