Uncertainty and label noise in machine learning

Michel Verleysen, Benoît Frénay

Research output: Thesis › Doctoral Thesis (External)

Abstract

This thesis addresses three challenges of machine learning: high-dimensional data, label noise and limited computational resources. Learning is usually hard in high-dimensional spaces, because of the curse of dimensionality and related phenomena such as the concentration of distances. One can either handle such data with specific tools or try to reduce their dimensionality, e.g. through feature selection.

The first contribution of this thesis is a study of the adequacy of mutual information for selecting relevant subsets of features. For both classification and regression problems, mutual information is shown to be a sensible feature selection criterion in most cases. Counterexamples are discussed where mutual information fails to select optimal features with respect to common error criteria for classification and regression, but the probability and impact of such failures are also shown to be limited.

The second contribution is a survey of the label noise literature. Label noise is an important problem in classification, with consequences that are varied and complex. For example, this thesis shows that label noise affects the segmentation of electrocardiogram signals and the results of feature selection. In each case, a new algorithm is proposed to deal with label noise, using a probabilistic model introduced by Lawrence and Schölkopf. A more generic framework is then proposed to deal with instances that have an excessively large influence on learning; this framework is used to robustify several probabilistic learning algorithms.

The last contribution is the study of large extreme learning machines. Extreme learning is a recent trend in machine learning that allows non-linear models to be learned much faster than with other state-of-the-art methods. Extreme learning machines are single-hidden-layer feedforward neural networks whose hidden layer is randomly initialised and not optimised during learning; only the output weights have to be fitted, which explains why learning is much faster. This thesis shows that when the number of hidden neurons is large, overfitting can be avoided through regularisation. In this case, a new kernel can be defined using extreme learning, which is shown to give good results for both classification and regression problems. This kernel offers a compromise between prediction accuracy and computational cost that can be useful in contexts where computation time is precious.
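
As a rough illustration of mutual-information-based feature selection, the following Python sketch ranks features by their estimated mutual information with the class label and keeps the top ones. The dataset and the budget of five features are assumed for the example; this marginal ranking is a simplification of the subset-level criteria analysed in the thesis.

# Minimal sketch: filter-style feature selection that ranks features by their
# estimated mutual information with the class label. Dataset and feature
# budget are illustrative assumptions, not choices made in the thesis.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = load_breast_cancer(return_X_y=True)

selector = SelectKBest(score_func=mutual_info_classif, k=5)  # keep 5 features (assumed budget)
X_reduced = selector.fit_transform(X, y)

print("MI scores:", np.round(selector.scores_, 3))
print("Kept feature indices:", selector.get_support(indices=True))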
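The probabilistic view of label noise can be pictured as a noisy channel: the observed label is the true label passed through known (here, assumed) flip probabilities. The snippet below applies Bayes' rule to recover the posterior over the true label from a classifier's clean-label estimate and an observed, possibly corrupted label; it is a minimal sketch of the modelling idea, with made-up flip rates, not the algorithms proposed in the thesis.

# Minimal sketch of the flip-probability view of label noise. Flip rates and
# posterior values are illustrative assumptions.
rho_01 = 0.2   # p(observed = 1 | true = 0), assumed noise rate
rho_10 = 0.1   # p(observed = 0 | true = 1), assumed noise rate

def true_label_posterior(p_clean, observed):
    """Posterior p(true = 1 | observed label, x) under the flip model."""
    # Likelihood of the observed label under each possible true label.
    lik_true1 = (1.0 - rho_10) if observed == 1 else rho_10
    lik_true0 = rho_01 if observed == 1 else (1.0 - rho_01)
    num = lik_true1 * p_clean
    den = num + lik_true0 * (1.0 - p_clean)
    return num / den

# A confident "class 1" prediction whose observed label says 0 keeps a high posterior:
print(true_label_posterior(p_clean=0.9, observed=0))
# An uncertain prediction with an observed label of 1:
print(true_label_posterior(p_clean=0.5, observed=1))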
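A regularised extreme learning machine can be summarised in a few lines: draw a random hidden layer, keep it fixed, and fit only the output weights in closed form by ridge regression, beta = (H'H + lambda*I)^{-1} H'y. The sketch below uses toy data and assumed sizes and regularisation strength; it illustrates the mechanism, not the kernel construction studied in the thesis.

# Minimal sketch of a regularised extreme learning machine for regression.
# Hidden-layer size and regularisation strength are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data.
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)

n_hidden = 500          # deliberately large hidden layer
lam = 1e-2              # ridge regularisation strength (assumed value)

# Random, fixed hidden layer: weights and biases are never optimised.
W = rng.standard_normal((X.shape[1], n_hidden))
b = rng.standard_normal(n_hidden)
H = np.tanh(X @ W + b)                      # hidden-layer activations

# Only the output weights are learned, via regularised least squares.
beta = np.linalg.solve(H.T @ H + lam * np.eye(n_hidden), H.T @ y)

# Predictions on a small test grid.
X_test = np.linspace(-3, 3, 5).reshape(-1, 1)
print(np.round(np.tanh(X_test @ W + b) @ beta, 3))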
Original language: English
Awarding Institution
  • SST/ICTM/ELEN - Pôle en ingénierie électrique
Publication status: Published - 2013
Externally published: Yes

Keywords

  • Feature selection
  • Label noise
  • Outliers
  • Mutual information
  • Extreme learning
  • Robust inference
