Because information of interest is scattered across the World Wide Web, fully automatic extraction processes are needed to fetch relevant data. Nowadays, five billion pages are available on the Internet, and almost two million new pages are added daily. This thesis aims to define a comprehensive approach to extracting news articles specifically, from the initial classification of relevant pages to the retrieval of the articles themselves. We developed News Ripper, a "wrapper" that performs this Web mining task by clustering similar news pages and then comparing their layouts to bring the articles to light.
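
The idea of comparing the layouts of similar pages can be illustrated with a minimal sketch. It assumes that two pages from the same cluster share a template, so lines common to both are treated as boilerplate and lines found only in one page are treated as its article. The function name `extract_unique_text`, the sample pages, and the line-based comparison with Python's `difflib` are illustrative assumptions, not the method News Ripper actually implements.

```python
from difflib import SequenceMatcher


def extract_unique_text(page: list[str], sibling: list[str]) -> list[str]:
    """Return the lines of `page` that do not align with `sibling`.

    Hypothetical helper: both arguments are pages already reduced to
    lists of text lines, and are assumed to share the same template.
    """
    matcher = SequenceMatcher(a=sibling, b=page, autojunk=False)
    unique = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "replace" and "insert" mark lines present in `page` only.
        if tag in ("replace", "insert"):
            unique.extend(page[j1:j2])
    return unique


# Two made-up pages sharing a header, a menu, and a footer.
page_a = [
    "Daily News",
    "Home | World | Sport",
    "Storm hits coast",
    "Winds reached 120 km/h.",
    "Copyright Daily News",
]
page_b = [
    "Daily News",
    "Home | World | Sport",
    "Election results in",
    "Turnout was high.",
    "Copyright Daily News",
]

article_a = extract_unique_text(page_a, page_b)
print(article_a)  # ['Storm hits coast', 'Winds reached 120 km/h.']
```

A real wrapper would more plausibly compare DOM trees than raw text lines; the line-based version is used here only to keep the sketch short.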
| Date of Award | 2005 |
|---|---|
| Original language | English |
| Supervisor | Monique Fraiture (Supervisor) |
Automatically extracting news articles from the Internet
Jasselette, A. (Author), Vanderwhale, M. (Author). 2005
Student thesis: Master types › Master in Computer science