Assignment 2: Decision Tree and Naive Bayesian Classifier


Out: 2/19/2018
Due: 3/5/2018, 11:59pm


Your task for this assignment is to implement and evaluate the C4.5 decision tree classifier and Naive Bayesian classifier.


1.     Implement the C4.5 classifier and Naive Bayesian classifier.

       You can use any programming language that you are familiar with.

       The program should be executable with 3 command-line parameters: the name of the training dataset file, the name of the test dataset file, and the name of the output file.

       The program should write an output file containing the predicted class label for every record in the test dataset, along with the classification accuracy, computed as the percentage of test records that are classified correctly.

       You are only required to handle categorical attributes (numerical attributes are not required).
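The command-line contract described above can be sketched as follows. This is only an illustration of the input/output plumbing, not of either classifier: the comma separator, the one-label-per-line output layout, the "Accuracy:" line, and the names read_records, accuracy, majority_class, and run are all assumptions, and the majority-class predictor is a stand-in to be replaced by your C4.5 or Naive Bayesian predictions.

```python
import sys
from collections import Counter


def read_records(path, sep=","):
    """Read one record per line; the first field is the class label.

    The separator is an assumption; adjust it to match the provided files.
    """
    records = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                fields = line.split(sep)
                records.append((fields[0], fields[1:]))
    return records


def accuracy(true_labels, predicted_labels):
    """Percentage of records whose predicted label equals the true label."""
    if not true_labels:
        return 0.0
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * correct / len(true_labels)


def majority_class(train):
    """Placeholder model: the most frequent label in the training records."""
    return Counter(label for label, _ in train).most_common(1)[0][0]


def run(train_path, test_path, out_path):
    """Train, predict, and write the labels plus accuracy to out_path."""
    train = read_records(train_path)
    test = read_records(test_path)
    guess = majority_class(train)  # stand-in: replace with real predictions
    predicted = [guess for _ in test]
    acc = accuracy([label for label, _ in test], predicted)
    with open(out_path, "w") as out:
        for label in predicted:
            out.write(label + "\n")
        out.write("Accuracy: %.2f%%\n" % acc)
    return acc


if __name__ == "__main__":
    if len(sys.argv) == 4:
        run(sys.argv[1], sys.argv[2], sys.argv[3])
    else:
        print("usage: <program> <train_file> <test_file> <output_file>",
              file=sys.stderr)
```

Keeping file handling and accuracy computation separate from the model means the same driver can serve both classifiers; only the prediction step changes.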

2.     Evaluate your implementations using the provided training set and test set (, mushroom.test). The provided datasets were created from the original mushroom dataset in the UCI repository, with the one attribute that has missing values removed. The training dataset contains 7423 records and the test dataset contains 701 records. The first attribute is the class of each record, and the remaining 21 attributes are categorical. You can also test your programs on other datasets using cross-validation.
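For the optional cross-validation experiments, one possible setup is sketched below. It assumes records are held as (label, attributes) pairs; k_fold_indices, cross_validate, and the train_and_predict callback are hypothetical names, and the choice of 10 folds with a fixed shuffle seed is only a common default, not a requirement of the assignment.

```python
import random


def k_fold_indices(n_records, k=10, seed=0):
    """Shuffle record indices and deal them into k nearly equal folds."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(records, train_and_predict, k=10, seed=0):
    """Return the mean per-fold accuracy, in percent.

    records: list of (label, attributes) pairs.
    train_and_predict: hypothetical callback that takes the training
        records and a list of test attribute lists, and returns one
        predicted label per test record.
    """
    scores = []
    for held_out in k_fold_indices(len(records), k, seed):
        if not held_out:
            continue  # more folds than records: skip empty folds
        held = set(held_out)
        train = [r for i, r in enumerate(records) if i not in held]
        test = [records[i] for i in held_out]
        predicted = train_and_predict(train, [attrs for _, attrs in test])
        correct = sum(p == label for p, (label, _) in zip(predicted, test))
        scores.append(100.0 * correct / len(test))
    return sum(scores) / len(scores)
```

Each record is held out exactly once, so every fold's accuracy is measured on data the model did not see during training.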

3.     Write a brief report in PDF presenting your results on the provided dataset and on any other datasets you tried. Discuss the experiences and lessons you have learned from the implementation and experimentation.

4.     You may work in a team of up to two people. If you work on your own, you get 5 bonus points. If you work as a team of two, please explain the contribution of each team member in your report. The implementation will be graded on correctness, not on performance.

5.     Submission. You (or your partner) will upload two items to Canvas: your PDF report and a zip or tar file.

This zip/tar file must contain:

your source files (include your name(s) in commented form at the top of all source files),

the executable,

a README file explaining how to compile/run your program,

the output file for the test dataset.

Name your programs (, etc.) and (, etc.) and name your folder PersonOnePersonTwoHW2.