TY - CONF
T1 - An Empirical Study of (Multi-) Database Models in Open-Source Projects
AU - Benats, Pol
AU - Gobert, Maxime
AU - Meurice, Loup
AU - Nagy, Csaba
AU - Cleve, Anthony
N1 - Funding Information: This research is supported by the F.R.S.-FNRS and FWO EOS project 30446992 SECO-ASSIST and the SNF-FNRS project INSTINCT.
N1 - Publisher Copyright: © 2021, Springer Nature Switzerland AG.
PY - 2021
Y1 - 2021
N2 - Managing data-intensive systems has long been recognized as an expensive and error-prone process. This is mainly due to the often implicit consistency relationships that hold between applications and their database. As new technologies emerged for specialized purposes (e.g., graph databases, document stores), the joint use of database models has also become popular. There are undeniable benefits of such multi-database models where developers combine various technologies. However, the side effects on design, querying, and maintenance are not well-known yet. In this paper, we study multi-database models in software systems by mining major open-source repositories. We consider four years of history, from 2017 to 2020, of a total number of 40,609 projects with databases. Our results confirm the emergence of hybrid data-intensive systems as we found (multi-) database models (e.g., relational and non-relational) used together in 16% of all database-dependent projects. One percent of the systems added, deleted, or changed a database during the four years. The majority (62%) of these systems had a single database before becoming hybrid, and another significant part (19%) became “mono-database” after initially using multiple databases. We examine the evolution of these systems to understand the rationale of the design choices of the developers. Our study aims to guide future research towards new challenges posed by those emerging data management architectures.
KW - Data models
KW - Open-source projects
KW - Empirical study
UR - http://www.scopus.com/inward/record.url?scp=85118179973&partnerID=8YFLogxK
U2 - https://doi.org/10.1007/978-3-030-89022-3_8
DO - 10.1007/978-3-030-89022-3_8
M3 - Conference contribution
SN - 978-3-030-89021-6
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 87
EP - 101
BT - Conceptual Modeling - 40th International Conference, ER 2021, Proceedings
A2 - Ghose, Aditya
A2 - Horkoff, Jennifer
A2 - Silva Souza, Vítor E.
A2 - Parsons, Jeffrey
A2 - Evermann, Joerg
PB - Springer
ER -