A3 - Search Engine

Description

  • In this assignment, you shall implement a basic search engine for Wikipedia articles.
  • You can use any programming language you like.
  • You can work alone or in a group of two students.
  • You shall present your application and code at an oral examination.

Submission instructions

See the Deadlines and Submissions page.

Requirements

Grade | Requirements
E
  • Implement a basic search engine that indexes all pages in the Wikipedia dataset (see the Datasets page).
  • Search queries shall only contain single words.
  • Results shall be ranked using the word frequency metric.
  • The user shall enter search queries in a web client, which displays the search results returned from the server.
  • Display the top five search results with page and rank score.
  • Implement the system using a REST web service where:
    1. The client sends a request to the server.
    2. The server responds with JSON data.
    3. The JSON data is decoded and presented in a client GUI.
C-D
  • It shall be possible to use search queries of more than one word.
  • Results shall be ranked using:
  • Display the top five search results with page and rank score.
A-B
  • Implement the PageRank algorithm and use it to rank the search results.
  • Run the algorithm for 20 iterations.
  • Results shall be ranked using:
  • Display the top five search results with page and rank score.
Note that the dataset has been updated, so the results shown in the last part of the recording are inaccurate. The slides PDF has been updated to match the new dataset.
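
As a rough illustration of the grade E requirements, the sketch below ranks pages for a single-word query by word frequency and encodes the top five results as JSON. It is only one possible approach: the `pages` dict, the function names and the JSON field names are hypothetical, raw occurrence counts are used as the score, and any normalization described in the course slides is not shown.

```python
import json
from collections import Counter


def build_index(pages):
    """Map each page name to a Counter of its word occurrences.

    `pages` is a hypothetical dict of page name -> page text; the real
    dataset layout is described on the Datasets page.
    """
    return {name: Counter(text.lower().split()) for name, text in pages.items()}


def search(index, word, top_n=5):
    """Rank pages by how often `word` occurs in them (raw counts)."""
    word = word.lower()
    scored = [
        {"page": name, "score": counts[word]}
        for name, counts in index.items()
        if counts[word] > 0
    ]
    # Highest count first; ties are broken by page name for stable output.
    scored.sort(key=lambda r: (-r["score"], r["page"]))
    return scored[:top_n]


def search_json(index, word):
    """Encode the results as a JSON body that a server could return."""
    return json.dumps({"query": word, "results": search(index, word)})
```

A REST endpoint could call `search_json` and return its string as the response body, leaving the web client to decode it and render the page names and scores.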

Test cases

Here are some test cases you can use to verify that your search engine works correctly.

Grade E

resources/A3-E.png

Grade C-D

resources/A3-CD.png

Grade A-B

resources/A3-AB.png

Note that updating PageRanks can be very slow if implemented with inefficient data structures. For comparison, my Python implementation takes around four seconds to update PageRanks. It is okay if your PageRank scores differ slightly (within ±0.02) from those in the test cases.
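
To illustrate the point about data structures, here is a minimal sketch of a 20-iteration PageRank update that precomputes inbound links once, so each iteration does dictionary lookups rather than rescanning every page's outgoing links. The `links` dict and the function name are hypothetical, and the damping-factor formulation with 0.85 is the conventional one, assumed here; the course slides may define the update or a final normalization differently.

```python
def page_rank(links, iterations=20, damping=0.85):
    """Iteratively compute PageRank scores.

    `links` is a hypothetical dict mapping each page to the set of pages
    it links to. The damping factor 0.85 is an assumption.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    # Precompute inbound links and outgoing-link counts once.
    inbound = {page: [] for page in pages}
    for source, targets in links.items():
        for target in targets:
            inbound[target].append(source)
    out_count = {page: len(links.get(page, ())) for page in pages}

    ranks = {page: 1.0 for page in pages}
    for _ in range(iterations):
        new_ranks = {}
        for page in pages:
            # Each inbound page shares its rank equally among its links.
            incoming = sum(ranks[src] / out_count[src] for src in inbound[page])
            new_ranks[page] = (1 - damping) + damping * incoming
        ranks = new_ranks
    return ranks
```

With the inbound map built up front, one iteration costs time proportional to the total number of links, which is what keeps 20 iterations fast even on a large set of pages.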