Fault injection is a software engineering technique used for purposes such as test assessment, fault-tolerance assessment, analyser evaluation, and data corpus augmentation. It consists of introducing changes into the source code to alter its expected behaviour, thereby generating multiple faulty versions of a program.
The main challenge for such techniques is deciding where (source-code locations) and what (source-code transformations) to inject in order to introduce specific faults or flaws into the program. Typically, these changes are made by applying mutation operators (code transformation patterns), which are usually based on the language syntax.
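To make the "where" and "what" concrete, here is a minimal sketch of a syntax-based mutation operator. It is not taken from the thesis: the target language (Python), the operator `RelaxBoundCheck`, the helper `inject_fault`, and the sample function `read` are all assumptions chosen for illustration. The operator rewrites every `<` comparison into `<=` (the "what") and records each line and column it touched (the "where").

```python
import ast


class RelaxBoundCheck(ast.NodeTransformer):
    """Hypothetical mutation operator: rewrite `<` into `<=` in comparisons."""

    def __init__(self):
        # (line, column) of every comparison that was mutated: the "where".
        self.locations = []

    def visit_Compare(self, node):
        self.generic_visit(node)  # mutate nested comparisons first
        new_ops = []
        for op in node.ops:
            if isinstance(op, ast.Lt):
                self.locations.append((node.lineno, node.col_offset))
                new_ops.append(ast.LtE())  # the "what": `<` becomes `<=`
            else:
                new_ops.append(op)
        node.ops = new_ops
        return node


def inject_fault(source):
    """Return the mutated source and the list of mutated locations."""
    tree = ast.parse(source)
    operator = RelaxBoundCheck()
    mutated = ast.fix_missing_locations(operator.visit(tree))
    return ast.unparse(mutated), operator.locations  # ast.unparse: Python 3.9+


# Illustrative input: a bounds check that the operator turns into an off-by-one.
ORIGINAL = (
    "def read(buf, i):\n"
    "    if i < len(buf):\n"
    "        return buf[i]\n"
    "    return None\n"
)

mutated_source, locations = inject_fault(ORIGINAL)
print(mutated_source)
print(locations)
```

An operator like this applies blindly wherever its syntactic pattern matches, which illustrates why choosing locations and transformations that yield a specific flaw is the hard part.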
We therefore propose an approach based on Natural Language Processing (NLP) techniques, and more precisely on Machine Translation (MT), that automatically captures the vulnerability intent of the code and, drawing on a dataset of known vulnerabilities, modifies the code to inject these flaws.
This work proposes a complete data-processing workflow that starts from a corpus of vulnerable and benign code and produces a dataset that can be exploited by machine learning techniques. We also propose several ways to augment a dataset by making syntactic changes that do not alter the semantics. Finally, our results show that our approach can insert up to 99% of the vulnerabilities in the same dataset.
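One common way to change syntax without changing semantics is consistent identifier renaming. The sketch below illustrates that idea only; the abstract does not state which transformations the thesis uses, and the names `RenameLocals`, `collect_local_names`, `augment`, and the `v` prefix are hypothetical. It assumes the input is a self-contained Python function that is not called with keyword arguments and does not already use names such as `v0`.

```python
import ast


class RenameLocals(ast.NodeTransformer):
    """Rename parameters and local variables according to a fixed mapping."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        node.id = self.mapping.get(node.id, node.id)
        return node

    def visit_arg(self, node):
        self.generic_visit(node)  # also visit the annotation, if any
        node.arg = self.mapping.get(node.arg, node.arg)
        return node


def collect_local_names(tree):
    """Collect parameter names and assigned names, in traversal order."""
    names = []
    for node in ast.walk(tree):
        if isinstance(node, ast.arg):
            candidate = node.arg
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            candidate = node.id
        else:
            continue
        if candidate not in names:
            names.append(candidate)
    return names


def augment(source, prefix="v"):
    """Return a syntactic variant of `source` with renamed local identifiers."""
    tree = ast.parse(source)
    mapping = {
        name: f"{prefix}{index}"
        for index, name in enumerate(collect_local_names(tree))
    }
    return ast.unparse(RenameLocals(mapping).visit(tree))  # Python 3.9+


# Illustrative input: function names and builtins are left untouched.
SAMPLE = (
    "def total(values):\n"
    "    acc = 0\n"
    "    for item in values:\n"
    "        acc += item\n"
    "    return acc\n"
)

variant = augment(SAMPLE)
print(variant)
```

Each variant is a new training sample with different surface tokens but the same behaviour, which is the property a dataset augmentation step of this kind relies on.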
Date of award | 20 June 2022 |
---|---|
Original language | English |
Awarding institution | |
Supervisor | Gilles Perrouin (Promoter) |
Automatic vulnerability injection using Natural Language Processing
PETIT, B. (Author). 20 June 2022
Student thesis: Master types › Master in Computer Science, Professional Focus in Data Science