Outlier detection is a broad field with many real-world applications. The definition of an outlier depends on the family of techniques used to detect it: distance-based, density-based, and so on. Five outlier detection techniques are presented, including detection with K Nearest Neighbors (a distance-based method), the Local Outlier Factor (a density-based method), the Isolation Forest (an ensemble-based method), and detection with DBSCAN (a clustering-based method). Statistical approaches such as the trimmed mean and the box plot are also mentioned. The techniques are tested while varying one parameter and are evaluated with two evaluation methods, precision and the Receiver Operating Characteristic (ROC) curve together with the Area Under the ROC Curve (AUC), on two datasets: one synthetic and one real-world. The results show that DBSCAN and the Isolation Forest performed better than the other techniques on both datasets.
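The evaluation protocol described in the abstract (score the points, vary one parameter, then measure precision and AUC) can be illustrated with a minimal sketch. This is not the thesis's implementation: the data, the function names `knn_scores`, `precision_at_n` and `roc_auc`, the choice of the k-th nearest neighbor distance as the outlier score, and the definition of precision as the share of true outliers among the n highest-scoring points are all assumptions made for illustration.

```python
import math
import random


def knn_scores(points, k):
    """Outlier score of each point: distance to its k-th nearest neighbor."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def precision_at_n(scores, labels, n):
    """Fraction of true outliers among the n highest-scoring points."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n]
    return sum(labels[i] for i in top) / n


def roc_auc(scores, labels):
    """AUC as the probability that a random outlier outscores a random inlier.

    Ties count as half a win (rank-based formulation of the AUC).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if sp > sn else 0.5 if sp == sn else 0.0
        for sp in pos
        for sn in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical synthetic data: one Gaussian cluster plus five isolated points.
rng = random.Random(0)
inliers = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)]
outliers = [(20, 20), (-20, 20), (20, -20), (-20, -20), (0, 30)]
points = inliers + outliers
labels = [0] * len(inliers) + [1] * len(outliers)

# Vary the parameter k and report both evaluation measures.
for k in (3, 5, 10):
    scores = knn_scores(points, k)
    print(k, precision_at_n(scores, labels, len(outliers)), roc_auc(scores, labels))
```

The brute-force neighbor search is quadratic in the number of points, which is acceptable for a small illustration but would normally be replaced by a spatial index or a library implementation on real datasets.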
| Field | Value |
|---|---|
| Date of award | 2021 |
| Original language | English |
| Awarding institution | |
| Supervisor | Benoît Frénay (Promoter) |
Outlier Detection and Evaluation Methods for Classification
Davidt, M. (Author). 2021
Student thesis: Master types › Master in Business Engineering, professional focus in Data Science