La régression linéaire pour des données intervalles (Linear regression for interval-valued data)

  • Alice Vassart

    Student thesis: Master in Mathematics

    Abstract

    The advent of computers has made extremely large databases possible. We may then be interested in analysing classes of individuals, called concepts, rather than single individuals. Moreover, the observations in a large data set can be studied more easily after aggregating them into a smaller data set. The resulting observations are no longer single-valued but interval-valued, multi-valued, histograms or diagrams; such observations are called symbolic data. In this thesis, we extend classical linear regression to symbolic data, and more especially to interval-valued data. In the first part, the different types of symbolic data are introduced, and we then study descriptive statistics for such data; these are later used to fit a symbolic linear regression model. In the second part, we recall classical linear regression. The third part concerns fitting a linear regression model to interval-valued data. We develop several methods: the center method, the lower bound and upper bound methods, the center and range method, and two further methods that apply only to simple linear regression. We illustrate and compare these methods by applying them to artificial and real data sets. Among these methods, the center and range method and the center method seem to be the most efficient. In the fourth part, we extend the center method to histogram-valued variables, and we also propose a linear regression method for the case of diagram-valued explanatory variables. In the fifth and last part, two applications are studied with the help of SREG, the symbolic linear regression module of the SODAS 2 software. We compare classical linear regression with linear regression at the level of concepts defined from the dependent variables. We note that symbolic linear regression gives interesting results in comparison with linear regression on the first-order individuals. In particular, Fisher and Student tests are less efficient in the presence of a large number of individuals in the regression.
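    The abstract names the center method and the center and range method without defining them. As a rough illustration for the simple case (one explanatory variable), the sketch below follows the formulations commonly found in the symbolic data literature, which may differ in detail from the thesis: the center method fits ordinary least squares to the interval midpoints and applies the fitted line to the lower and upper bounds of x, while the center and range method fits a second, separate least-squares line to the half-ranges and predicts the interval as center plus or minus half-range. The function names (`ols_simple`, `center_method`, `center_range_method`), the zero clamp on the predicted half-range, and the toy data are all assumptions of this example.

```python
from typing import Callable, List, Tuple

Interval = Tuple[float, float]  # (lower bound, upper bound)


def ols_simple(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx  # assumes x is not constant
    return my - b1 * mx, b1


def center_method(xs: List[Interval], ys: List[Interval]):
    """Fit OLS on midpoints, then apply the line to both bounds of x."""
    cx = [(a + b) / 2 for a, b in xs]
    cy = [(a + b) / 2 for a, b in ys]
    b0, b1 = ols_simple(cx, cy)

    def predict(x: Interval) -> Interval:
        lo, hi = b0 + b1 * x[0], b0 + b1 * x[1]
        # With a negative slope the bounds swap, so reorder them.
        return (min(lo, hi), max(lo, hi))

    return (b0, b1), predict


def center_range_method(xs: List[Interval], ys: List[Interval]):
    """Fit one OLS on midpoints and a separate OLS on half-ranges."""
    cx = [(a + b) / 2 for a, b in xs]
    cy = [(a + b) / 2 for a, b in ys]
    rx = [(b - a) / 2 for a, b in xs]
    ry = [(b - a) / 2 for a, b in ys]
    c0, c1 = ols_simple(cx, cy)
    r0, r1 = ols_simple(rx, ry)

    def predict(x: Interval) -> Interval:
        c = c0 + c1 * (x[0] + x[1]) / 2
        # Clamp a negative predicted half-range to zero (a simplification
        # made in this sketch, not part of the basic method).
        r = max(r0 + r1 * (x[1] - x[0]) / 2, 0.0)
        return (c - r, c + r)

    return ((c0, c1), (r0, r1)), predict


# Toy data: y midpoints = 1 + 2 * x midpoints, y half-ranges = 0.5 + x half-ranges.
xs = [(0.0, 2.0), (1.0, 5.0), (3.0, 7.0), (4.0, 10.0)]
ys = [(1.5, 4.5), (4.5, 9.5), (8.5, 13.5), (11.5, 18.5)]

_, center_pred = center_method(xs, ys)
_, cr_pred = center_range_method(xs, ys)
print(center_pred((0.0, 2.0)))  # (1.0, 5.0)
print(cr_pred((0.0, 2.0)))      # (1.5, 4.5)
```

    On this toy data the center method recovers the midpoint line exactly, but because it reuses the same slope for the bounds it predicts [1, 5] for the first observation, which is wider than the observed [1.5, 4.5]. Modelling the half-ranges separately, as the center and range method does, recovers the observed interval.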
    Date of Award: 2006
    Original language: French
    Supervisor: Andre Hardy (Supervisor), Marcel Remon (Jury) & Jean Paul Rasson (Jury)
