Towards LLM-Generated Code Tours for Onboarding

Martin Balfroid; Benoit Vanderose; Xavier Devroey

doi:10.1145/3643787.3648033

Research output: Contribution in Book/Catalog/Report/Conference proceeding › Conference contribution

Abstract

Onboarding new developers is a challenge for any software project. Addressing this challenge relies on human resources (e.g., having a senior developer write documentation or mentor the new developer). One promising solution is using annotated code tours. While this approach partially lifts the need for mentorship, it still requires a senior developer to write this interactive form of documentation. This paper argues that a Large Language Model (LLM) might help with this documentation process. Our approach is to record the stack trace between a failed test and a faulty method. We then extract code snippets from the methods in this stack trace using CodeQL, a static analysis tool, and have them explained by gpt-3.5-turbo-1106, the LLM behind ChatGPT. Finally, we evaluate the quality of a sample of these generated tours using a checklist. We show that the automatic generation of code tours is feasible but has limitations like redundant and low-level explanations.
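
The pipeline described in the abstract (stack trace, then snippet extraction, then explanation) can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the paper: the `Frame` record, the `build_tour` helper, and the `placeholder_explain` stub that stands in for the LLM call. The snippets are assumed to be already extracted (the paper uses CodeQL for that step), and the output layout (a title plus steps with `file`, `line`, and `description`) is an assumption loosely modeled on common code-tour file formats.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Frame:
    """One stack-trace entry between the failing test and the faulty method."""
    file: str
    line: int
    method: str
    snippet: str  # method source; assumed to be extracted beforehand


def build_tour(title: str, frames: list[Frame],
               explain: Callable[[Frame], str]) -> dict:
    """Create one tour step per frame, keeping the order of the stack trace."""
    steps = [
        {"file": f.file, "line": f.line, "description": explain(f)}
        for f in frames
    ]
    return {"title": title, "steps": steps}


def placeholder_explain(frame: Frame) -> str:
    """Stand-in for the LLM; a real pipeline would send the snippet to a model."""
    n_lines = len(frame.snippet.splitlines())
    return f"`{frame.method}` ({n_lines} lines): explanation goes here."


# Hypothetical two-frame trace: a failing test and the method it calls.
frames = [
    Frame("tests/test_cart.py", 12, "test_total",
          "def test_total():\n    assert total([1, 2]) == 3"),
    Frame("shop/cart.py", 4, "total",
          "def total(prices):\n    return sum(prices) - 1"),
]
tour = build_tour("Why test_total fails", frames, placeholder_explain)
print(json.dumps(tour, indent=2))
```

Passing the explanation step in as a function keeps the tour assembly independent of any particular model or API, so the stub can be swapped for a real LLM client without touching `build_tour`.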

Original language	English
Title of host publication	2024 ACM/IEEE International Workshop on NL-based Software Engineering (NLBSE ’24)
Place of Publication	Lisbon, Portugal
Publisher	ACM Press
DOIs	https://doi.org/10.1145/3643787.3648033
Publication status	Published - Apr 2024
Event	3rd Intl. Workshop on NL-based Software Engineering, Lisbon, Portugal; Duration: 20 Apr 2024 → 20 Apr 2024; Conference number: 3; https://nlbse2024.github.io

Conference

Conference	3rd Intl. Workshop on NL-based Software Engineering
Abbreviated title	NLBSE '24
Country/Territory	Portugal
City	Lisbon
Period	20/04/24 → 20/04/24
Internet address	https://nlbse2024.github.io

Access to Document

10.1145/3643787.3648033

balfroid-2024
© 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM. This is the author’s version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in 2024 ACM/IEEE International Workshop on NL-based Software Engineering (NLBSE ’24), April 20, 2024, Lisbon, Portugal, https://doi.org/10.1145/3643787.3648033.

Cite this

@inproceedings{1802f8b7dd684f1db6975cbb8d247403,

title = "Towards LLM-Generated Code Tours for Onboarding",

abstract = "Onboarding new developers is a challenge for any software project. Addressing this challenge relies on human resources (e.g., having a senior developer write documentation or mentor the new developer). One promising solution is using annotated code tours. While this approach partially lifts the need for mentorship, it still requires a senior developer to write this interactive form of documentation. This paper argues that a Large Language Model (LLM) might help with this documentation process. Our approach is to record the stack trace between a failed test and a faulty method. We then extract code snippets from the methods in this stack trace using CodeQL, a static analysis tool and have them explained by gpt-3.5-turbo-1106, the LLM behind ChatGPT. Finally, we evaluate the quality of a sample of these generated tours using a checklist. We show that the automatic generation of code tours is feasible but has limitations like redundant and low-level explanations.",

author = "Martin Balfroid and Benoit Vanderose and Xavier Devroey",

year = "2024",

month = apr,

doi = "10.1145/3643787.3648033",

language = "English",

booktitle = "2024 ACM/IEEE International Workshop on NL-based Software Engineering (NLBSE {\textquoteright}24)",

publisher = "ACM Press",

address = "United States",

note = "3rd Intl. Workshop on NL-based Software Engineering, NLBSE '24 ; Conference date: 20-04-2024 Through 20-04-2024",

url = "https://nlbse2024.github.io",

}

TY - GEN

T1 - Towards LLM-Generated Code Tours for Onboarding

AU - Balfroid, Martin

AU - Vanderose, Benoit

AU - Devroey, Xavier

N1 - Conference code: 3

PY - 2024/4

Y1 - 2024/4

N2 - Onboarding new developers is a challenge for any software project. Addressing this challenge relies on human resources (e.g., having a senior developer write documentation or mentor the new developer). One promising solution is using annotated code tours. While this approach partially lifts the need for mentorship, it still requires a senior developer to write this interactive form of documentation. This paper argues that a Large Language Model (LLM) might help with this documentation process. Our approach is to record the stack trace between a failed test and a faulty method. We then extract code snippets from the methods in this stack trace using CodeQL, a static analysis tool and have them explained by gpt-3.5-turbo-1106, the LLM behind ChatGPT. Finally, we evaluate the quality of a sample of these generated tours using a checklist. We show that the automatic generation of code tours is feasible but has limitations like redundant and low-level explanations.

U2 - 10.1145/3643787.3648033

DO - 10.1145/3643787.3648033

M3 - Conference contribution

BT - 2024 ACM/IEEE International Workshop on NL-based Software Engineering (NLBSE ’24)

PB - ACM Press

CY - Lisbon, Portugal

T2 - 3rd Intl. Workshop on NL-based Software Engineering

Y2 - 20 April 2024 through 20 April 2024

ER -
