You will propose your own final project and complete it in a team of up to three students. You are encouraged to discuss your project ideas, plans, and progress with the instructor throughout the project period.

Project Options

  • Data mining challenges (below are some open and past challenges; you are also welcome to find your own)
    Kaggle competitions
    Drivendata competitions
    KDD Cup
  • Comparative study and/or implementation/evaluation/extension of current mining methods.
  • Propose your own data mining problems and applications



  • Project proposal (10 points)
  • In-class project presentation (10 points)
  • Project report (20 points)
  • Project deliverables (60 points)


Grading Guidelines

  • Significance and relevance: is the problem significant and well motivated? Is it relevant to the class?
  • Technical quality and depth: is the technical/experimental approach sound and solid? Is there sufficient work involved, e.g., data collection, data preprocessing, algorithm implementation, parameter tuning, feature engineering, or experimental comparison? Note that not all of these are required or applicable to every project.
  • Novelty and difficulty: is the problem/work novel and does it require some research?
  • Results: are there substantial results and interesting findings?  
  • Presentation/report quality: is the presentation/report clear and coherent?

Important Dates

  • Project proposal due: 3/23, 11:59pm
  • In-class project presentation: 4/23, 4/25, or 4/30
  • Project report and deliverables due: 5/7, 11:59pm


1. The project proposal should be roughly 2 pages (in PDF) and contain the following content (when applicable):

  • Motivation and objective: motivate the problem that you are investigating and summarize your goals.
  • Related work/methods: review existing methods which may be applicable for your project.
  • Proposed work: outline what you will do in the project.
  • Evaluation: describe the datasets you will be using and your evaluation metrics and plans.
  • Plan of action: outline a weekly schedule of how progress will be made on the project and, if you work as a team, how the workload will be distributed among the team members.

2. The project presentation should highlight the problem you are solving, your approach and methodology, results in progress, and a demo if available. Depending on the number of projects in the class, we will assign each project a presentation slot of 5-15 minutes.

3. The project report should be roughly 5 pages (in PDF) and contain the following content (when applicable):

  • Motivation and contributions: motivate the problem that you are investigating and summarize your goals and contributions
  • Related work/methods: review existing work/methods that may be applicable for your project
  • Approach and methodology: describe your approach/solution including any algorithmic developments and/or prototype implementation and/or experimental methodology
  • Evaluation and results: describe your datasets, evaluation metrics, and experiment parameters; present and discuss your results.
  • Conclusion and future work: a) discuss what you have learned through the project and which concepts and techniques from class you used in the project; and b) discuss potential extensions and future work.

4. Project deliverables should be submitted as a tar or zip file and should contain your source code, the executable, a readme file explaining how to compile/run your program, and any dataset (or link to the dataset) you have used in your project, if applicable.
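
As an illustration only, the deliverables archive could be built with a few lines of Python. This is a minimal sketch, not a required procedure: the function name `package_deliverables` and the example paths are hypothetical, and it assumes all deliverables (source code, readme, datasets or links to them) live under one project directory.

```python
import zipfile
from pathlib import Path


def package_deliverables(project_dir, archive_path):
    """Zip every file under project_dir and return the stored names.

    Hypothetical helper for illustration; paths inside the archive are
    stored relative to project_dir so the archive unpacks cleanly.
    """
    root = Path(project_dir).resolve()
    out = Path(archive_path).resolve()
    names = []
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*")):
            # Skip directories, and skip the archive itself in case it
            # is being written inside the project directory.
            if path.is_file() and path.resolve() != out:
                arcname = path.relative_to(root).as_posix()
                zf.write(path, arcname)
                names.append(arcname)
    return names
```

For example, `package_deliverables("my_project", "my_project.zip")` (both names are placeholders) would bundle the whole directory. Listing the returned names is a quick way to confirm that the readme and source files are actually included before submitting.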