Abstract
In recent years, advances in the understanding of the 3D structure of the
genome have revealed that the way genes are packed into 3D structural units depends on which genes are active or inactive. Among these structures, chromatin
loops have an important function in the regulation of gene expression. Chromatin loop prediction is useful for studying DNA damage that occurs during cell metabolism or is caused by external harmful agents; incorrect replication of DNA may result in malignant growth of cellular tissues.
In this study, several Machine Learning models are built to predict
chromatin loops in two human cancerous cell lines, K562 and GM12878.
The data for these cell lines contain 23 factors, mostly proteins. Some of the
models trained on the GM12878 dataset are also tested on the K562 data.
The best scores are obtained by the XGBoost models; the GM12878 XGBoost
model is also capable of predicting K562 loops with acceptable accuracy.
The factors are analyzed to verify that the important features in determining
the loops are indeed localized around the anchors.
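
The cross-cell-line evaluation described above (train on one cell line, test on the other) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic, the three features are invented stand-ins for the 23 factors, and a single decision stump replaces the XGBoost models used in the study. The function names `make_dataset`, `fit_stump`, and `accuracy` are hypothetical.

```python
import random

def make_dataset(n, seed, shift=0.0):
    """Synthetic stand-in data: 3 invented factor signals per row; label 1 = loop."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        # Feature 0 carries the signal; features 1 and 2 are pure noise.
        signal = rng.gauss(2.0 * label + shift, 0.5)
        rows.append([signal, rng.gauss(0, 1), rng.gauss(0, 1)])
        labels.append(label)
    return rows, labels

def accuracy(stump, rows, labels):
    """Fraction of rows where the rule 'feature j > threshold t' matches the label."""
    j, t = stump
    hits = sum((r[j] > t) == bool(y) for r, y in zip(rows, labels))
    return hits / len(rows)

def fit_stump(rows, labels):
    """Pick the (feature, threshold) pair with the highest training accuracy."""
    best_acc, best_stump = -1.0, (0, 0.0)
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            acc = accuracy((j, t), rows, labels)
            if acc > best_acc:
                best_acc, best_stump = acc, (j, t)
    return best_stump

# The training set plays the role of GM12878, the shifted test set that of K562.
train_X, train_y = make_dataset(200, seed=0)
test_X, test_y = make_dataset(200, seed=1, shift=0.2)

stump = fit_stump(train_X, train_y)
cross_acc = accuracy(stump, test_X, test_y)
print(f"selected feature={stump[0]}, cross-cell-line accuracy={cross_acc:.2f}")
```

The small `shift` applied to the test set mimics a distribution difference between cell lines; the point of the sketch is only that the model is fitted on one dataset and scored, without refitting, on the other.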
Date of Award: Jun 2019
Supervisor: Wim VANHOOF
Keywords:
- machine learning
- topologically associated domains