TY - JOUR
T1 - Predicting User Preferences of Dimensionality Reduction Embedding Quality
AU - Morariu, Cristina
AU - Bibal, Adrien
AU - Cutura, Rene
AU - Frenay, Benoit
AU - Sedlmair, Michael
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2023/1/1
Y1 - 2023/1/1
N2 - A plethora of dimensionality reduction techniques have emerged over the past decades, leaving researchers and analysts with a wide variety of choices for reducing their data, all the more so given that some techniques come with additional hyper-parametrization (e.g., t-SNE, UMAP). Recent studies show that people often use dimensionality reduction as a black box, regardless of the specific properties the method itself preserves. Hence, 2D embeddings are usually evaluated and compared qualitatively, by setting them side by side and letting human judgment decide which one is best. In this work, we propose a quantitative way of evaluating embeddings that nonetheless places human perception at the center. We run a comparative study in which we ask people to select 'good' and 'misleading' views among scatterplots of low-dimensional embeddings of image datasets, simulating the way people usually select embeddings. We use the study data as labels and a set of quality metrics as input features for a supervised machine learning model whose purpose is to discover and quantify what exactly people look for when deciding between embeddings. Using the model as a proxy for human judgments, we rank embeddings on new datasets, explain why they are relevant, and quantify the degree of subjectivity when people select preferred embeddings.
AB - A plethora of dimensionality reduction techniques have emerged over the past decades, leaving researchers and analysts with a wide variety of choices for reducing their data, all the more so given that some techniques come with additional hyper-parametrization (e.g., t-SNE, UMAP). Recent studies show that people often use dimensionality reduction as a black box, regardless of the specific properties the method itself preserves. Hence, 2D embeddings are usually evaluated and compared qualitatively, by setting them side by side and letting human judgment decide which one is best. In this work, we propose a quantitative way of evaluating embeddings that nonetheless places human perception at the center. We run a comparative study in which we ask people to select 'good' and 'misleading' views among scatterplots of low-dimensional embeddings of image datasets, simulating the way people usually select embeddings. We use the study data as labels and a set of quality metrics as input features for a supervised machine learning model whose purpose is to discover and quantify what exactly people look for when deciding between embeddings. Using the model as a proxy for human judgments, we rank embeddings on new datasets, explain why they are relevant, and quantify the degree of subjectivity when people select preferred embeddings.
KW - Dimensionality reduction
KW - Human-centered computing
KW - Manifold learning
UR - http://www.scopus.com/inward/record.url?scp=85139500016&partnerID=8YFLogxK
U2 - 10.1109/TVCG.2022.3209449
DO - 10.1109/TVCG.2022.3209449
M3 - Article
C2 - 36166539
AN - SCOPUS:85139500016
SN - 1077-2626
VL - 29
SP - 745
EP - 755
JO - IEEE Transactions on Visualization and Computer Graphics
JF - IEEE Transactions on Visualization and Computer Graphics
IS - 1
ER -