Exploiting Physics to Constrain Convolutional Neural Networks: From using Bessel functions for equivariance to quaternions for 3D image processing

Student thesis: Doctor of Sciences

Abstract

Machine Learning (ML) is becoming increasingly important in our society. It brings innovative solutions to tasks that would have been perceived as unachievable a few decades ago. One can, for example, think of the recent and impressive improvements in natural language processing, even though the success of ML is more general and extends to many other domains.

Thanks to its ability to deal with complex data, Deep Learning (DL), a subfield of ML, receives a lot of the spotlight. However, using DL techniques generally comes with an extensive use of computational resources as well as a lack of interpretability of the decision process. This work was carried out in this context and attempts to provide new, specific tools to improve the sustainability of DL. More precisely, attention is focused on how to enforce specific constraints in Convolutional Neural Networks (CNNs), one of the most powerful tools available today for computer vision tasks. This work is also deeply rooted in a multidisciplinary approach between ML and the physical sciences, as mathematical developments from physics prove to be an important source of inspiration.

The first contribution of this work is a new way to deal with rotation invariance in computer vision, with strong theoretical guarantees. For some applications, the orientation of the whole image, or of some objects contained in it, is arbitrary and does not carry any information. Yet, vanilla CNNs are not able to identify rotated versions of the same object as equivalent. We therefore propose a new variant of CNNs, called Bessel-CNNs (B-CNNs), a constraint-based method that mainly enforces rotation invariance. When this invariance is meaningful, B-CNNs lead to better efficiency (fewer data and parameters are generally required for similar performances) and can also simply lead to better performances.
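As an illustration of the kind of invariance B-CNNs build on, the sketch below numerically checks a standard property of Fourier-Bessel coefficients: rotating a pattern only shifts the phase of each coefficient, so the moduli are unchanged. This is a toy demonstration under assumed conventions (the polar grid, the test pattern and the helper `bessel_coeff` are hypothetical), not the actual B-CNN implementation.

```python
import numpy as np
from scipy.special import jv  # Bessel function of the first kind J_m

# Toy polar grid over the unit disk (grid sizes and test pattern are
# illustrative assumptions, not taken from the thesis).
n_r, n_t = 200, 256
r = np.linspace(0.0, 1.0, n_r)
theta = np.linspace(0.0, 2.0 * np.pi, n_t, endpoint=False)
R, T = np.meshgrid(r, theta, indexing="ij")

def pattern(R, T):
    # Arbitrary non-symmetric pattern written directly in polar coordinates,
    # so a rotation is just an angular shift with no interpolation error.
    return np.exp(-8.0 * (R - 0.5) ** 2) * (1.0 + 0.7 * np.cos(3.0 * T) + 0.3 * np.sin(T))

def bessel_coeff(f, m, k):
    # Numerical quadrature of c_{m,k} = \int f(r, t) J_m(k r) e^{-i m t} r dr dt.
    basis = jv(m, k * R) * np.exp(-1j * m * T)
    return np.sum(f * basis * R) * (r[1] - r[0]) * (theta[1] - theta[0])

alpha = 0.9                    # rotation angle (radians)
f0 = pattern(R, T)             # original pattern
f_rot = pattern(R, T - alpha)  # same pattern rotated by alpha

for m, k in [(0, 5.0), (1, 5.0), (3, 8.0)]:
    c0, c1 = bessel_coeff(f0, m, k), bessel_coeff(f_rot, m, k)
    # The moduli coincide; only the phase changes (by a factor e^{-i m alpha}).
    print(f"m={m}, k={k}: |c|={abs(c0):.6f} vs |c_rot|={abs(c1):.6f}")
```

Because the moduli of such coefficients are unaffected by rotations, a network that operates on them inherits rotation invariance by construction rather than having to learn it from augmented data.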

The second main contribution is a new approach to use 2-dimensional convolutions with 3-dimensional images. Indeed, 3D CNNs are much more computationally expensive than their 2D counterparts, and approaches that build 2D representations of 3D images can lead to a significant speed-up. To do so, we leverage quaternion algebra to store several complementary 2D views of the data. Building on this representation, we are able to drastically reduce the computational cost while minimizing the loss of information.
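To make the idea of quaternion-packed 2D views concrete, here is a minimal sketch that stores three orthogonal mean projections of a cubic volume in the imaginary channels of a quaternion-valued image, which a 2D (quaternion) CNN could then process. The function name `quaternion_views`, the choice of mean projections and the channel layout are assumptions for illustration; the construction actually used in the thesis may differ.

```python
import numpy as np

def quaternion_views(volume):
    """Pack three orthogonal mean projections of a cubic 3D volume into the
    imaginary channels of a quaternion-valued 2D image.

    The packing scheme is an illustrative assumption; the thesis may use a
    different set of complementary views or a different channel layout.
    """
    n = volume.shape[0]
    assert volume.shape == (n, n, n), "this sketch assumes a cubic volume"
    q = np.zeros((n, n, 4), dtype=volume.dtype)   # channels: (real, i, j, k)
    q[..., 1] = volume.mean(axis=0)               # view along the depth axis
    q[..., 2] = volume.mean(axis=1)               # view along the height axis
    q[..., 3] = volume.mean(axis=2)               # view along the width axis
    return q

# A 2D (quaternion-valued or 4-channel) CNN can then process `q_img` instead of
# the full 3D volume, replacing costly 3D convolutions with 2D ones.
vol = np.random.rand(64, 64, 64).astype(np.float32)
q_img = quaternion_views(vol)
print(q_img.shape)  # (64, 64, 4)
```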
Date of Award: 13 Sept 2024
Original language: English
Awarding Institution:
  • University of Namur
Sponsors: FNRS-FRIA
Supervisors: Benoît Frénay (Supervisor), Alexandre Mayer (Co-Supervisor), Wim Vanhoof (President), Gilles Perrouin (Jury), John Aldo Lee (Jury) & Paul Temple (Jury)

Keywords

  • machine learning
  • computer vision
  • convolutional neural networks
  • Bessel functions
  • equivariance
  • quaternions
