A3 – Search Engine

Description

  • In assignment 3 you shall implement a basic search engine for Wikipedia articles
  • You can use any programming language you like
  • You can work alone or in a group of two students
  • You shall present your application and code at an oral examination

 

Requirements

Grade Requirements
E
  • Implement a basic search engine that indexes all pages in the Wikipedia dataset (see Datasets page)
  • Search queries shall only contain single words
  • Results shall be ranked using the word frequency metric
  • The user shall enter search queries in a web client, which displays the search results returned from the server
  • Display the top 5 search results with page and rank score
  • Implement the system using a REST web service where:
     1) the client sends a request to the server
     2) the server responds with JSON data
     3) the JSON data is decoded and presented in a client GUI
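
The grade E requirements can be illustrated with a minimal Python sketch. It assumes each page is already tokenized into a list of lowercase words and uses the raw occurrence count as the word frequency score. The names `word_frequency`, `search` and `to_json`, the JSON layout and the toy pages are hypothetical and only serve as illustration; a real solution reads the Wikipedia dataset and returns the JSON from a REST endpoint.

```python
import json

def word_frequency(page_words, query_word):
    """Count how many times query_word occurs in a page's word list."""
    return sum(1 for w in page_words if w == query_word)

def search(pages, query_word, top_n=5):
    """Rank pages by raw word frequency and return the top_n hits."""
    query_word = query_word.lower()
    results = []
    for name, words in pages.items():
        score = word_frequency(words, query_word)
        if score > 0:  # skip pages that do not contain the word
            results.append({"page": name, "score": score})
    results.sort(key=lambda r: r["score"], reverse=True)
    return results[:top_n]

def to_json(results):
    """Encode the results as a JSON body a server could return (assumed layout)."""
    return json.dumps({"results": results})

# Hypothetical toy data; real pages come from the Wikipedia dataset.
pages = {
    "Java": ["java", "is", "a", "language", "java"],
    "Python": ["python", "is", "a", "language"],
    "Coffee": ["java", "coffee"],
}
hits = search(pages, "java")
body = to_json(hits)
```

The client would decode `body` with its own JSON parser and show each page together with its score.
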
C-D
  • It shall be possible to use search queries of more than one word
  • Results shall be ranked using:
    score = word_frequency + 0.8 * document_location
  • Display the top 5 search results with page and rank score
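
One possible way to combine the two metrics for multi-word queries is sketched below in Python. The assignment text does not define how the metrics are computed or scaled, so the sketch makes assumptions. Document location is taken as the sum of the first-occurrence indices of the query words (lower is better). A missing word adds a large penalty. Both metrics are normalized so that the best page gets 1.0 before they are combined. All function names and the penalty value are hypothetical.

```python
def frequency_score(words, query):
    """Total number of occurrences of all query words in the page."""
    return sum(words.count(q) for q in query)

def location_score(words, query, missing=100000):
    """Sum of first-occurrence indices of the query words (lower is better).
    An absent word adds a large penalty (hypothetical value)."""
    total = 0
    for q in query:
        total += words.index(q) if q in words else missing
    return total

def normalize(scores, small_is_better=False):
    """Scale scores so that the best page gets 1.0 (assumed normalization)."""
    if small_is_better:
        best = max(min(scores), 0.00001)
        return [best / max(s, 0.00001) for s in scores]
    best = max(max(scores), 0.00001)
    return [s / best for s in scores]

def search(pages, query_string, top_n=5):
    """Rank pages with score = word_frequency + 0.8 * document_location."""
    query = query_string.lower().split()
    names = list(pages)
    freq = [frequency_score(pages[n], query) for n in names]
    loc = [location_score(pages[n], query) for n in names]
    freq_n = normalize(freq)
    loc_n = normalize(loc, small_is_better=True)
    ranked = [
        (name, f + 0.8 * l)
        for name, raw, f, l in zip(names, freq, freq_n, loc_n)
        if raw > 0  # keep only pages containing at least one query word
    ]
    ranked.sort(key=lambda r: r[1], reverse=True)
    return ranked[:top_n]

# Hypothetical toy data.
pages = {
    "A": ["java", "programming", "language"],
    "B": ["the", "java", "island", "java"],
    "C": ["coffee"],
}
ranked = search(pages, "java programming")
```

In this toy example page A ranks first because it contains both query words near the start of the page.
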
A-B
  • Implement the PageRank algorithm and use it to rank the search results
  • Run the algorithm for 20 iterations
  • Results shall be ranked using:
    score = word_frequency + 0.8 * document_location + 0.5 * pagerank
  • Display the top 5 search results with page and rank score
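
The PageRank step could look roughly like the Python sketch below. It assumes the common iterative formulation PR(p) = 0.15 + 0.85 * sum(PR(q) / L(q)) over all pages q linking to p, where L(q) is the number of outgoing links of q; the assignment text does not specify the damping factor or the normalization, so both are assumptions. The function names and the toy link graph are hypothetical. Inbound links are collected once up front, so each iteration only does dictionary lookups.

```python
def pagerank(links, iterations=20, damping=0.85):
    """links maps each page to the set of pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    # Precompute inbound links once so each iteration is a cheap lookup.
    inbound = {p: set() for p in pages}
    for source, targets in links.items():
        for target in targets:
            inbound[target].add(source)
    out_count = {p: len(links.get(p, ())) for p in pages}
    ranks = {p: 1.0 for p in pages}
    for _ in range(iterations):
        new_ranks = {}
        for p in pages:
            total = sum(ranks[q] / out_count[q] for q in inbound[p])
            new_ranks[p] = (1 - damping) + damping * total
        ranks = new_ranks
    return ranks

def normalize_ranks(ranks):
    """Scale ranks so the highest-ranked page gets 1.0 (assumed normalization)."""
    top = max(ranks.values())
    return {p: r / top for p, r in ranks.items()}

def combined_score(word_frequency, document_location, pagerank_value):
    """Combine the three (already normalized) metrics as in the A-B formula."""
    return word_frequency + 0.8 * document_location + 0.5 * pagerank_value

# Hypothetical toy link graph: A links to B and C, B links to C, C links to A.
links = {"A": {"B", "C"}, "B": {"C"}, "C": {"A"}}
ranks = pagerank(links)
scaled = normalize_ranks(ranks)
```

Storing the inbound links in a dictionary of sets avoids scanning every page's link list for every page in every iteration, which is the kind of inefficiency that makes the update slow.
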

Note that the dataset has been updated, so the results shown in the last part of the recording are no longer accurate. The slides PDF has been updated to match the new dataset.

Test cases

Here are some test cases you can use to verify that your search engine works correctly.

Grade E:



Grade C-D:


Grade A-B:

Note that updating PageRanks can be very slow if implemented with inefficient data structures. For comparison, my Python implementation takes around 4 seconds to update PageRanks.
