A2 - Clustering
Description
- In this assignment, you shall implement clustering on the blogs dataset containing 99 blogs.
- You can use any programming language you like.
- You can work alone or in a group of two students.
- You shall present your application and code at an oral examination.
Submission instructions
See the Deadlines and Submissions page.
Requirements
Grade | Requirements |
---|---|
E |
|
C-D |
|
A-B |
|
Test cases
K-means
K-means is not deterministic (the results differ between runs), but you usually find related blogs such as the Google and search engine blogs in one cluster as shown here:
Note that you can make some performance improvements to speed up cluster generation. As a comparison, my implementation in Python takes around 3 seconds to generate the clusters and build a JSON response.
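To make the non-determinism concrete, here is a minimal K-means sketch. It assumes each blog is a list of word counts and that distance is 1 minus the Pearson correlation; both are assumptions for illustration, and the names `pearson_distance`, `kmeans`, `max_iter` and `seed` are hypothetical. The random starting centroids are why results differ between runs.

```python
import math
import random


def pearson_distance(a, b):
    """Return 1 - Pearson correlation (0.0 means perfectly correlated)."""
    n = len(a)
    sum_a, sum_b = sum(a), sum(b)
    sum_ab = sum(x * y for x, y in zip(a, b))
    # Clamp at 0 to guard against tiny negative values from rounding.
    var_a = max(sum(x * x for x in a) - sum_a ** 2 / n, 0.0)
    var_b = max(sum(y * y for y in b) - sum_b ** 2 / n, 0.0)
    den = math.sqrt(var_a * var_b)
    if den == 0:
        return 1.0  # assumption: treat a constant vector as uncorrelated
    return 1.0 - (sum_ab - sum_a * sum_b / n) / den


def kmeans(rows, k, max_iter=20, seed=None):
    """Return a list giving the cluster index (0..k-1) of each row."""
    rng = random.Random(seed)  # seed=None gives a different start each run
    n_cols = len(rows[0])
    lows = [min(r[j] for r in rows) for j in range(n_cols)]
    highs = [max(r[j] for r in rows) for j in range(n_cols)]
    # Random centroids inside the observed range of every column.
    centroids = [
        [rng.uniform(lows[j], highs[j]) for j in range(n_cols)]
        for _ in range(k)
    ]
    assignment = None
    for _ in range(max_iter):
        # Assign every row to its closest centroid.
        new_assignment = [
            min(range(k), key=lambda c: pearson_distance(row, centroids[c]))
            for row in rows
        ]
        if new_assignment == assignment:
            break  # no row changed cluster, so the result is stable
        assignment = new_assignment
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [rows[i] for i, a in enumerate(assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment
```

Passing a fixed `seed` makes a run repeatable, which helps when debugging. Stopping as soon as no assignment changes, instead of always running `max_iter` iterations, is one simple speed-up.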
Hierarchical
Hierarchical clustering always gives the same result. The tree is too large to show here, but if you get the branch shown below, it most likely works correctly:
Note that there are many performance improvements you can make to speed up tree generation. As a comparison, my implementation in Python takes around 10 seconds to generate the tree and build a JSON response.
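The following is a minimal sketch of agglomerative hierarchical clustering, not a reference solution. It assumes each blog is a list of word counts, that distance is 1 minus the Pearson correlation, and that a merged cluster's vector is the average of its two children; the names `Cluster`, `hierarchical` and `leaves` are hypothetical. There is no random step, which is why the tree is the same on every run.

```python
import math


def pearson_distance(a, b):
    """Return 1 - Pearson correlation (0.0 means perfectly correlated)."""
    n = len(a)
    sum_a, sum_b = sum(a), sum(b)
    sum_ab = sum(x * y for x, y in zip(a, b))
    var_a = max(sum(x * x for x in a) - sum_a ** 2 / n, 0.0)
    var_b = max(sum(y * y for y in b) - sum_b ** 2 / n, 0.0)
    den = math.sqrt(var_a * var_b)
    if den == 0:
        return 1.0  # assumption: treat a constant vector as uncorrelated
    return 1.0 - (sum_ab - sum_a * sum_b / n) / den


class Cluster:
    """A tree node: a leaf holds a blog label, a branch holds two children."""

    def __init__(self, vector, label=None, left=None, right=None, distance=0.0):
        self.vector = vector
        self.label = label
        self.left = left
        self.right = right
        self.distance = distance


def hierarchical(rows, labels):
    """Merge the two closest clusters repeatedly; return the root node."""
    clusters = [Cluster(list(r), label=l) for r, l in zip(rows, labels)]
    while len(clusters) > 1:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = pearson_distance(clusters[i].vector, clusters[j].vector)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        a, b = clusters[i], clusters[j]
        merged = Cluster(
            [(x + y) / 2 for x, y in zip(a.vector, b.vector)],
            left=a, right=b, distance=d,
        )
        # Delete the higher index first so the lower index stays valid.
        del clusters[j]
        del clusters[i]
        clusters.append(merged)
    return clusters[0]


def leaves(node):
    """Return the labels of all blogs under a node, left to right."""
    if node.left is None and node.right is None:
        return [node.label]
    return leaves(node.left) + leaves(node.right)
```

This version recomputes every pairwise distance on each pass, which is the simplest form and also the slowest. Caching the distances between clusters that have not changed is one common way to speed up tree generation.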