P4 – Web scraping
This is one of the pre-defined project ideas you can choose for your project.
Web scraping
In this project, you shall use a web scraping library to download articles that can be used in your search engine from Assignment 3.
If you use Python, the BeautifulSoup library is very powerful and easy to use. A quick start guide can be found here. For Java, you can check out HtmlUnit. A quick start guide can be found here.
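As a rough illustration of what BeautifulSoup does, the sketch below parses an HTML string and pulls out the title and paragraph text. The sample HTML and the function name `extract_article` are made up for this example, and treating the `<p>` elements as the article body is an assumption that will not hold for every site.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page; a real scraper would download the HTML instead.
SAMPLE_HTML = """
<html>
  <head><title>Example article</title></head>
  <body>
    <h1>Example article</h1>
    <p>First paragraph with a <a href="/wiki/Other_page">link</a>.</p>
    <p>Second paragraph.</p>
  </body>
</html>
"""


def extract_article(html):
    """Return the page title and the text of all <p> elements."""
    # "html.parser" is Python's built-in parser, so no extra install is needed.
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.get_text(strip=True) if soup.title else ""
    # Assumption: the article body is the text of the <p> elements.
    paragraphs = [p.get_text().strip() for p in soup.find_all("p")]
    return title, "\n".join(paragraphs)


title, text = extract_article(SAMPLE_HTML)
print(title)
print(text)
```

The returned text can then be saved to a file in whatever format your search engine from Assignment 3 expects.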
When scraping a site such as Wikipedia, you usually start on one page and follow all outgoing links.
You can download pages from Wikipedia or any other site.
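The idea of starting on one page and following outgoing links could be sketched in Python as a breadth-first crawl like the one below. This is only a minimal sketch under several assumptions: the function names (`fetch`, `extract_links`, `crawl`), the `max_pages` limit, the User-Agent string, and the choice to stay on the start page's host are all illustrative and not part of the assignment.

```python
from collections import deque
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup

# Hypothetical identifier; replace it with something describing your project.
USER_AGENT = "course-project-scraper/0.1"


def fetch(url):
    """Download a page and return its HTML as text."""
    request = Request(url, headers={"User-Agent": USER_AGENT})
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_links(html, base_url):
    """Return absolute, fragment-free URLs for all links on the page."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for a in soup.find_all("a", href=True):
        # Resolve relative links and drop "#section" fragments.
        absolute, _fragment = urldefrag(urljoin(base_url, a["href"]))
        links.append(absolute)
    return links


def crawl(start_url, max_pages=10, fetch_page=fetch):
    """Breadth-first crawl from start_url, staying on the same host."""
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}  # URLs already queued, so no page is visited twice
    pages = {}          # maps URL -> downloaded HTML
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except OSError:
            continue  # skip pages that fail to download
        pages[url] = html
        for link in extract_links(html, url):
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

A call such as `crawl("https://en.wikipedia.org/wiki/Web_scraping", max_pages=20)` would return a dictionary of downloaded pages. For a real crawl you would probably also want to pause between requests and check the site's robots.txt, which this sketch leaves out.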
Grading
Grade | Requirements |
---|---|
E | |
C-D | |
A-B | |