Description
 In assignment 4 you shall implement the Naïve Bayes machine learning algorithm and use it on some datasets
 It can be implemented in any programming language you like
 You can work alone or in a group of two students
 You shall present your application and code at an oral examination
 You are not required to build a REST web service for this assignment
Submission instructions
See the Deadlines and Submissions page.
Requirements
Grade  Requirements 

E 

CD 

AB 

Note! The purpose of this assignment is for you to learn how to implement Naïve Bayes, encode label strings as integers, calculate accuracy scores, perform cross-validation, and generate a confusion matrix. This functionality is often available in machine learning libraries such as Weka or scikit-learn, which you are not allowed to use. You are allowed to use library functions for loading and shuffling the data, and for all necessary mathematical operations, data structures, etc.
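As an illustration of encoding label strings as integers, here is a minimal Python sketch. The function name encode_labels and the example label strings are assumptions made for this illustration; the assignment does not prescribe them.

```python
def encode_labels(labels):
    """Map each distinct label string to an integer, in order of first appearance."""
    mapping = {}   # label string -> integer code
    encoded = []
    for label in labels:
        if label not in mapping:
            mapping[label] = len(mapping)
        encoded.append(mapping[label])
    return encoded, mapping


# Example label strings (made up for this illustration).
y, mapping = encode_labels(["setosa", "versicolor", "setosa", "virginica"])
print(y)  # [0, 1, 0, 2]
```

Returning the mapping alongside the encoded list makes it possible to translate integer predictions back to the original label strings later.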
Code structure requirements
NaiveBayes class  

void fit ( X:float[][], y:int[] )  Trains the model on input examples X and labels y 
int[] predict ( X:float[][] )  Classifies examples X and returns a list of predictions 
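A minimal Python sketch of a class with these two methods could look as follows. It assumes the Gaussian variant of Naïve Bayes (one normal distribution per class and input variable), which seems natural for float inputs but is not stated in the assignment. The helper _log_posterior, the use of the population variance, and the small constant added to each variance are also assumptions of this sketch.

```python
import math


class NaiveBayes:
    """Gaussian Naive Bayes sketch: one normal distribution per class and feature."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.priors, self.means, self.variances = {}, {}, {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            n = len(rows)
            self.priors[c] = n / len(X)
            columns = list(zip(*rows))  # one tuple of values per feature
            self.means[c] = [sum(col) / n for col in columns]
            # Population variance; the small constant (an assumption of this
            # sketch) keeps the variance strictly positive.
            self.variances[c] = [
                sum((v - m) ** 2 for v in col) / n + 1e-9
                for col, m in zip(columns, self.means[c])
            ]

    def _log_posterior(self, x, c):
        # log prior + sum of log Gaussian densities (unnormalised log posterior)
        total = math.log(self.priors[c])
        for v, m, var in zip(x, self.means[c], self.variances[c]):
            total += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return total

    def predict(self, X):
        # Pick the class with the highest log posterior for each example.
        return [max(self.classes, key=lambda c: self._log_posterior(x, c)) for x in X]


model = NaiveBayes()
model.fit([[1.0, 2.1], [1.2, 1.9], [5.0, 6.2], [5.3, 5.8]], [0, 0, 1, 1])
print(model.predict([[1.1, 2.0], [5.1, 6.0]]))  # [0, 1]
```

Working with log probabilities rather than multiplying densities avoids numerical underflow when there are many input variables.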
Other methods  

float accuracy_score ( preds:int[], y:int[] )  Calculates accuracy score for a list of predictions 
int[][] confusion_matrix ( preds:int[], y:int[] )  Generates a confusion matrix and returns it as an integer matrix 
int[] crossval_predict ( X:float[][], y:int[], folds:int )  Runs n-fold cross-validation and returns a list of predictions 
Input data (a float matrix with input variables as columns and examples as rows) is usually denoted X, and the categories/labels (a list of integers) are usually denoted y. Predictions (a list of integers) shall be compared with the actual labels (y) when calculating the accuracy score (the percentage of correct predictions) and generating the confusion matrix.
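The comparison of predictions with actual labels can be sketched in Python as below. Two choices here are assumptions rather than requirements of the assignment: the accuracy is returned as a percentage between 0 and 100, and the confusion matrix uses rows for actual labels and columns for predicted labels, with labels assumed to be encoded as 0, 1, 2, and so on.

```python
def accuracy_score(preds, y):
    """Return the percentage of predictions that match the actual labels."""
    correct = sum(1 for p, t in zip(preds, y) if p == t)
    return 100.0 * correct / len(y)


def confusion_matrix(preds, y):
    """Return a square matrix where matrix[actual][predicted] counts examples."""
    n = max(max(preds), max(y)) + 1  # assumes labels are 0 .. n-1
    matrix = [[0] * n for _ in range(n)]
    for p, t in zip(preds, y):
        matrix[t][p] += 1
    return matrix


preds = [0, 1, 1, 2]
y = [0, 1, 2, 2]
print(accuracy_score(preds, y))    # 75.0
print(confusion_matrix(preds, y))  # [[1, 0, 0], [0, 1, 0], [0, 1, 1]]
```

The diagonal of the matrix holds the correct predictions, so its sum divided by the number of examples gives the same accuracy.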
Test cases
You can verify your results against those in Web ML Experimenter. The Iris dataset is built into Web ML (click the Try Iris dataset button), and the Banknote authentication dataset can be uploaded from the csv file. Note that the cross-validation results can differ slightly due to differences in how the data is split into folds, but the accuracy you get should be close to the accuracy in Web ML.
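To show where such fold differences come from, here is a hedged Python sketch of one possible way to split the data. It assigns example i to fold i % folds, which is only one of several reasonable schemes and is not necessarily the one Web ML uses. The extra model_factory parameter and the MajorityClassifier stand-in are hypothetical additions that keep the example self-contained; in the assignment the function would create a NaiveBayes model instead.

```python
def crossval_predict(X, y, folds, model_factory):
    """Return one out-of-fold prediction per example, in the original order."""
    n = len(X)
    preds = [None] * n
    for k in range(folds):
        # Fold k holds out every example whose index i satisfies i % folds == k.
        test_idx = [i for i in range(n) if i % folds == k]
        train_idx = [i for i in range(n) if i % folds != k]
        model = model_factory()  # fresh model for every fold
        model.fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        fold_preds = model.predict([X[i] for i in test_idx])
        for i, p in zip(test_idx, fold_preds):
            preds[i] = p
    return preds


class MajorityClassifier:
    """Hypothetical stand-in model: always predicts the most common training label."""

    def fit(self, X, y):
        self.label = max(set(y), key=y.count)

    def predict(self, X):
        return [self.label for _ in X]


X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0, 0, 0, 0, 0, 1]
preds = crossval_predict(X, y, 3, MajorityClassifier)
print(preds)
```

Every example is predicted exactly once, by a model that never saw it during training, so the returned list can be passed directly to the accuracy and confusion matrix calculations. Shuffling X and y together before the split changes which examples share a fold, and therefore can change the result slightly.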