As information of interest is scattered around the World Wide Web, the need for fully automatic extraction processes to fetch relevant data cannot be ignored. Five billion pages are now available on the Internet, and almost two million new pages are added daily. This thesis aims to define a comprehensive solution for extracting news articles specifically, from the early classification of significant pages to the article retrieval proper. We developed News Ripper, a "wrapper" that achieves this Web mining task by clustering similar news pages and then comparing their layouts to bring the articles to light.
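The layout-comparison idea mentioned in the abstract can be illustrated with a minimal sketch. It is not News Ripper's actual algorithm, which this record does not describe; it only assumes that two pages from the same cluster share a template, so that lines present in one page but not in its sibling are likely to be article content. The function name `extract_unique_lines` and the two sample pages are hypothetical.

```python
import difflib


def extract_unique_lines(page: str, sibling: str) -> list[str]:
    """Return the lines of `page` that are not matched, in order, in `sibling`.

    Assumption: both pages come from the same site template, so shared
    lines are layout (navigation, footer) and unmatched lines are content.
    """
    a = [ln.strip() for ln in page.splitlines() if ln.strip()]
    b = [ln.strip() for ln in sibling.splitlines() if ln.strip()]
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    unique = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # "replace" and "delete" mark spans of `page` with no match in `sibling`.
        if tag in ("replace", "delete"):
            unique.extend(a[i1:i2])
    return unique


# Two hypothetical news pages that share one layout.
page_a = """<html>
<div class="nav">Home | World | Sport</div>
<h1>Storm hits coast</h1>
<p>Heavy rain fell overnight.</p>
<div class="footer">Copyright Example News</div>
</html>"""

page_b = """<html>
<div class="nav">Home | World | Sport</div>
<h1>Team wins final</h1>
<p>The match ended 2-1.</p>
<div class="footer">Copyright Example News</div>
</html>"""

article_lines = extract_unique_lines(page_a, page_b)
print(article_lines)
```

A real wrapper would likely compare parsed markup trees rather than raw lines and would need a prior clustering step to choose suitable sibling pages; the line-level diff here only shows the principle.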
Date of award | 2005 |
---|---|
Original language | English |
Supervisor | Monique Fraiture (Promoter) |
Automatically extracting news articles from the Internet
Jasselette, A. (Author), Vanderwhale, M. (Author). 2005
Student thesis: Master types › Master in Computer Science