P2 – Clustering
This is one of the pre-defined project ideas you can choose for your project.
Clustering Wikipedia articles
Modify your clustering system from Assignment 2 to use Wikipedia articles (90 articles about Programming, 90 about Games). The dataset can be downloaded from the Datasets page.
To use the dataset for clustering, you need to select a set of words and calculate the frequency of each of these words in every Wikipedia article. Using all words from the articles is not recommended, since the similarity calculations would then take a long time. You can, for example, use the following words:
language, programming, computer, software, hardware, data, player, online, system, development, machine, console, developer, design, history, technology, standard, information, article, example
The article Arcade_game would then have the following frequencies (one value per word, in the order listed above):
0;4;14;1;58;1;11;7;12;4;9;17;0;5;33;8;1;2;7;1
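The frequency calculation described above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function names `word_frequencies` and `to_line` are hypothetical, the article is assumed to be available as a plain text string, and words are matched exactly and case-insensitively. The counting rules used to produce the Arcade_game example are not stated here, so this sketch is not guaranteed to reproduce those numbers.

```python
import re
from collections import Counter

# The example word list from the text above.
WORDS = [
    "language", "programming", "computer", "software", "hardware", "data",
    "player", "online", "system", "development", "machine", "console",
    "developer", "design", "history", "technology", "standard",
    "information", "article", "example",
]


def word_frequencies(text, words=WORDS):
    """Return one count per selected word, in the same order as `words`.

    Assumption: a word is a run of the letters a-z after lowercasing,
    and only exact matches count ("players" does not count as "player").
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)  # missing words count as 0
    return [counts[w] for w in words]


def to_line(freqs):
    """Format a frequency vector as a semicolon-separated line."""
    return ";".join(str(f) for f in freqs)


if __name__ == "__main__":
    sample = "The player used the console. Console hardware and software; the PLAYER won."
    print(to_line(word_frequencies(sample)))
```

Each article then becomes a fixed-length vector of 20 counts, which can be passed to the similarity measure already used in your Assignment 2 clustering system.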
Grading
| Grade | Requirements |
|---|---|
| E | |
| C-D | |
| A-B | |