An Empirical Study of (Multi-) Database Models in Open-Source Projects

Pol Benats; Maxime Gobert; Loup Meurice; Csaba Nagy; Anthony Cleve

doi:https://doi.org/10.1007/978-3-030-89022-3_8

Research output: Contribution in Book/Catalog/Report/Conference proceeding › Conference contribution

Abstract

Managing data-intensive systems has long been recognized as an expensive and error-prone process. This is mainly due to the often implicit consistency relationships that hold between applications and their database. As new technologies emerged for specialized purposes (e.g., graph databases, document stores), the joint use of database models has also become popular. There are undeniable benefits of such multi-database models where developers combine various technologies. However, the side effects on design, querying, and maintenance are not well-known yet. In this paper, we study multi-database models in software systems by mining major open-source repositories. We consider four years of history, from 2017 to 2020, of a total number of 40,609 projects with databases. Our results confirm the emergence of hybrid data-intensive systems as we found (multi-) database models (e.g., relational and non-relational) used together in 16% of all database-dependent projects. One percent of the systems added, deleted, or changed a database during the four years. The majority (62%) of these systems had a single database before becoming hybrid, and another significant part (19%) became “mono-database” after initially using multiple databases. We examine the evolution of these systems to understand the rationale of the design choices of the developers. Our study aims to guide future research towards new challenges posed by those emerging data management architectures.

Original language	English
Title of host publication	Conceptual Modeling - 40th International Conference, ER 2021, Proceedings
Subtitle of host publication	40th International Conference, ER 2021, Virtual Event, October 18–21, 2021, Proceedings
Editors	Aditya Ghose, Jennifer Horkoff, Vítor E. Silva Souza, Jeffrey Parsons, Joerg Evermann
Publisher	Springer
Pages	87-101
Number of pages	15
ISBN (Electronic)	978-3-030-89022-3
ISBN (Print)	978-3-030-89021-6
DOIs	https://doi.org/10.1007/978-3-030-89022-3_8
Publication status	Published - 2021

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	13011 LNCS
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Keywords

Data models
Open-source projects
Empirical study

Access to Document

https://doi.org/10.1007/978-3-030-89022-3_8

Cite this

Benats, P., Gobert, M., Meurice, L., Nagy, C., & Cleve, A. (2021). An Empirical Study of (Multi-) Database Models in Open-Source Projects. In A. Ghose, J. Horkoff, V. E. Silva Souza, J. Parsons, & J. Evermann (Eds.), Conceptual Modeling - 40th International Conference, ER 2021, Proceedings: 40th International Conference, ER 2021, Virtual Event, October 18–21, 2021, Proceedings (pp. 87-101). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13011 LNCS). Springer. https://doi.org/10.1007/978-3-030-89022-3_8

Benats, Pol ; Gobert, Maxime ; Meurice, Loup et al. / An Empirical Study of (Multi-) Database Models in Open-Source Projects. Conceptual Modeling - 40th International Conference, ER 2021, Proceedings: 40th International Conference, ER 2021, Virtual Event, October 18–21, 2021, Proceedings. editor / Aditya Ghose ; Jennifer Horkoff ; Vítor E. Silva Souza ; Jeffrey Parsons ; Joerg Evermann. Springer, 2021. pp. 87-101 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{ed0861680efe46ec994a1eca886f188d,

title = "An Empirical Study of (Multi-) Database Models in Open-Source Projects",

abstract = "Managing data-intensive systems has long been recognized as an expensive and error-prone process. This is mainly due to the often implicit consistency relationships that hold between applications and their database. As new technologies emerged for specialized purposes (e.g., graph databases, document stores), the joint use of database models has also become popular. There are undeniable benefits of such multi-database models where developers combine various technologies. However, the side effects on design, querying, and maintenance are not well-known yet. In this paper, we study multi-database models in software systems by mining major open-source repositories. We consider four years of history, from 2017 to 2020, of a total number of 40,609 projects with databases. Our results confirm the emergence of hybrid data-intensive systems as we found (multi-) database models (e.g., relational and non-relational) used together in 16% of all database-dependent projects. One percent of the systems added, deleted, or changed a database during the four years. The majority (62%) of these systems had a single database before becoming hybrid, and another significant part (19%) became “mono-database” after initially using multiple databases. We examine the evolution of these systems to understand the rationale of the design choices of the developers. Our study aims to guide future research towards new challenges posed by those emerging data management architectures.",

keywords = "Data models, Open-source projects, Empirical study",

author = "Pol Benats and Maxime Gobert and Loup Meurice and Csaba Nagy and Anthony Cleve",

note = "Funding Information: Acknowledgments. This research is supported by the F.R.S.-FNRS and FWO EOS project 30446992 SECO-ASSIST and the SNF-FNRS project INSTINCT. Publisher Copyright: {\textcopyright} 2021, Springer Nature Switzerland AG.",

year = "2021",

doi = "10.1007/978-3-030-89022-3_8",

language = "English",

isbn = "978-3-030-89021-6",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

volume = "13011",

publisher = "Springer",

pages = "87--101",

editor = "Aditya Ghose and Jennifer Horkoff and {Silva Souza}, {V{\'i}tor E.} and Jeffrey Parsons and Joerg Evermann",

booktitle = "Conceptual Modeling - 40th International Conference, ER 2021, Proceedings",

}

Benats, P, Gobert, M, Meurice, L, Nagy, C & Cleve, A 2021, An Empirical Study of (Multi-) Database Models in Open-Source Projects. in A Ghose, J Horkoff, VE Silva Souza, J Parsons & J Evermann (eds), Conceptual Modeling - 40th International Conference, ER 2021, Proceedings: 40th International Conference, ER 2021, Virtual Event, October 18–21, 2021, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 13011 LNCS, Springer, pp. 87-101. https://doi.org/10.1007/978-3-030-89022-3_8

An Empirical Study of (Multi-) Database Models in Open-Source Projects. / Benats, Pol ; Gobert, Maxime ; Meurice, Loup et al.
Conceptual Modeling - 40th International Conference, ER 2021, Proceedings: 40th International Conference, ER 2021, Virtual Event, October 18–21, 2021, Proceedings. ed. / Aditya Ghose; Jennifer Horkoff; Vítor E. Silva Souza; Jeffrey Parsons; Joerg Evermann. Springer, 2021. p. 87-101 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13011 LNCS).

TY - GEN

T1 - An Empirical Study of (Multi-) Database Models in Open-Source Projects

AU - Benats, Pol

AU - Gobert, Maxime

AU - Meurice, Loup

AU - Nagy, Csaba

AU - Cleve, Anthony

N1 - Funding Information: Acknowledgments. This research is supported by the F.R.S.-FNRS and FWO EOS project 30446992 SECO-ASSIST and the SNF-FNRS project INSTINCT. Publisher Copyright: © 2021, Springer Nature Switzerland AG.

PY - 2021

Y1 - 2021

N2 - Managing data-intensive systems has long been recognized as an expensive and error-prone process. This is mainly due to the often implicit consistency relationships that hold between applications and their database. As new technologies emerged for specialized purposes (e.g., graph databases, document stores), the joint use of database models has also become popular. There are undeniable benefits of such multi-database models where developers combine various technologies. However, the side effects on design, querying, and maintenance are not well-known yet. In this paper, we study multi-database models in software systems by mining major open-source repositories. We consider four years of history, from 2017 to 2020, of a total number of 40,609 projects with databases. Our results confirm the emergence of hybrid data-intensive systems as we found (multi-) database models (e.g., relational and non-relational) used together in 16% of all database-dependent projects. One percent of the systems added, deleted, or changed a database during the four years. The majority (62%) of these systems had a single database before becoming hybrid, and another significant part (19%) became “mono-database” after initially using multiple databases. We examine the evolution of these systems to understand the rationale of the design choices of the developers. Our study aims to guide future research towards new challenges posed by those emerging data management architectures.

AB - Managing data-intensive systems has long been recognized as an expensive and error-prone process. This is mainly due to the often implicit consistency relationships that hold between applications and their database. As new technologies emerged for specialized purposes (e.g., graph databases, document stores), the joint use of database models has also become popular. There are undeniable benefits of such multi-database models where developers combine various technologies. However, the side effects on design, querying, and maintenance are not well-known yet. In this paper, we study multi-database models in software systems by mining major open-source repositories. We consider four years of history, from 2017 to 2020, of a total number of 40,609 projects with databases. Our results confirm the emergence of hybrid data-intensive systems as we found (multi-) database models (e.g., relational and non-relational) used together in 16% of all database-dependent projects. One percent of the systems added, deleted, or changed a database during the four years. The majority (62%) of these systems had a single database before becoming hybrid, and another significant part (19%) became “mono-database” after initially using multiple databases. We examine the evolution of these systems to understand the rationale of the design choices of the developers. Our study aims to guide future research towards new challenges posed by those emerging data management architectures.

KW - Data models

KW - Open-source projects

KW - Empirical study

UR - http://www.scopus.com/inward/record.url?scp=85118179973&partnerID=8YFLogxK

U2 - https://doi.org/10.1007/978-3-030-89022-3_8

DO - 10.1007/978-3-030-89022-3_8

M3 - Conference contribution

SN - 978-3-030-89021-6

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 87

EP - 101

BT - Conceptual Modeling - 40th International Conference, ER 2021, Proceedings

A2 - Ghose, Aditya

A2 - Horkoff, Jennifer

A2 - Silva Souza, Vítor E.

A2 - Parsons, Jeffrey

A2 - Evermann, Joerg

PB - Springer

ER -

Benats P, Gobert M, Meurice L, Nagy C, Cleve A. An Empirical Study of (Multi-) Database Models in Open-Source Projects. In Ghose A, Horkoff J, Silva Souza VE, Parsons J, Evermann J, editors, Conceptual Modeling - 40th International Conference, ER 2021, Proceedings: 40th International Conference, ER 2021, Virtual Event, October 18–21, 2021, Proceedings. Springer. 2021. p. 87-101. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: https://doi.org/10.1007/978-3-030-89022-3_8
