Single-cell data clustering on low-dimensional features

This supplementary material accompanies the 2022 study by Park and Hauschild on the importance of data transformation in single-cell RNA sequencing (scRNA-seq) data integration [PH22]. Here we focus on scRNA-seq data integration through low-dimensional embedding techniques such as Principal Component Analysis (PCA), t-distributed Stochastic Neighbor Embedding (t-SNE) [VdMH08], and Uniform Manifold Approximation and Projection (UMAP) [MHM18].

Dimensionality reduction techniques simplify complex datasets by reducing noise while preserving essential information. They help reveal hidden patterns and structures and make data analysis more manageable. They must be used with care, however, because misuse can lead to misinterpretation of results in practical applications.

Note that using low-dimensional (2D) embeddings for single-cell analysis is not a conventional approach, and it remains controversial because of concerns about data distortion. For a detailed discussion of this issue, see Chari and Pachter [CP23].

The primary objective of this work is to underscore the significance of data transformation for low-dimensional embedding in single-cell data analysis. This observation aligns with the findings reported by Draganov et al. [DJorgensenN+23]. Moreover, our study reveals unanticipated potential applications to batch integration in single-cell clustering analysis.
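To make "data transformation" concrete, the following is a minimal sketch that embeds the same count matrix in 2D with and without a transformation. It assumes library-size normalization followed by log(1 + x), a common choice that is not necessarily the transformation used in [PH22]. The toy data, the `target_sum` value, and the helper names `log_normalize` and `pca_embed` are hypothetical and exist only for illustration.

```python
import numpy as np


def log_normalize(counts, target_sum=1e4):
    """Scale each cell (row) to target_sum total counts, then apply log(1 + x)."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # avoid division by zero for empty cells
    return np.log1p(counts / totals * target_sum)


def pca_embed(x, n_components=2):
    """Project the rows of x onto the top principal components via SVD."""
    centered = x - x.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


rng = np.random.default_rng(0)

# Hypothetical toy data: 200 cells x 50 genes. The two "batches" share one
# expression profile and differ only in sequencing depth (1x vs 10x).
base = rng.gamma(shape=2.0, scale=1.0, size=(1, 50))
depth = np.repeat([1.0, 10.0], 100)[:, None]
counts = rng.poisson(base * depth)

raw_2d = pca_embed(counts)                   # embedding of untransformed counts
norm_2d = pca_embed(log_normalize(counts))   # embedding after transformation
```

Plotting `raw_2d` and `norm_2d` side by side, colored by batch, is one way to inspect how much the embedding depends on the transformation applied beforehand.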

Within this supplementary material, we provide visualizations and analyses of the results. Specifically, we demonstrate Louvain clustering applied to high-dimensional raw data and DBSCAN clustering applied to low-dimensional data. For comparison, we also perform a conventional scRNA-seq clustering analysis using the Scanpy library.
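The low-dimensional branch of this comparison can be sketched as follows, assuming scikit-learn is available. PCA stands in for the 2D embedding step to keep the example deterministic; t-SNE or UMAP would take its place in practice. The synthetic data, the rescaling step, and the `eps` and `min_samples` values are illustrative assumptions, not the settings used in this study.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for a transformed expression matrix:
# 300 cells x 30 features drawn from 3 well-separated groups.
x, true_labels = make_blobs(
    n_samples=300, n_features=30, centers=3, cluster_std=1.0, random_state=0
)

# Reduce to a 2D embedding (PCA here; t-SNE or UMAP in practice).
embedding = PCA(n_components=2, random_state=0).fit_transform(x)

# DBSCAN's eps is scale-dependent, so the embedding is rescaled to unit
# variance per axis before clustering (an assumption of this sketch).
embedding = StandardScaler().fit_transform(embedding)

# Density-based clustering on the 2D coordinates; label -1 marks noise points.
labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(embedding)
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
```

Because DBSCAN operates on distances in the embedded space, its result inherits any distortion introduced by the embedding, which is why the choice of transformation before embedding matters.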

In summary, this supplementary material contributes to our understanding of the role of data transformation, in particular normalization, in low-dimensional embedding for scRNA-seq data analysis.

Dataset description

Human Pancreas: GSE84133, GSE85241, E-MTAB-5061, GSE81608, and GSE8313

Mouse Pancreas: GSE84133

Tabula Muris [SKN+18]: https://tabula-muris.ds.czbiohub.org/

Mouse Cell Atlas [HWZ+18]: https://bis.zju.edu.cn/MCA/

References

[CP23]

Tara Chari and Lior Pachter. The specious art of single-cell genomics. PLOS Computational Biology, 19(8):e1011288, 2023.

[DJorgensenN+23]

Andrew Draganov, Jakob Rødsgaard Jørgensen, Katrine Scheel Nellemann, Davide Mottin, Ira Assent, Tyrus Berry, and Cigdem Aslay. ActUp: analyzing and consolidating tSNE and UMAP. arXiv preprint arXiv:2305.07320, 2023.

[HWZ+18]

Xiaoping Han, Renying Wang, Yincong Zhou, Lijiang Fei, Huiyu Sun, Shujing Lai, Assieh Saadatpour, Ziming Zhou, Haide Chen, Fang Ye, and others. Mapping the mouse cell atlas by Microwell-seq. Cell, 172(5):1091–1107, 2018.

[MHM18]

Leland McInnes, John Healy, and James Melville. UMAP: uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426, 2018.

[PH22]

Youngjun Park and Anne-Christin Hauschild. On the importance of data transformation for data integration in single-cell RNA sequencing analysis. bioRxiv, pages 2022–07, 2022.

[SKN+18]

Nicholas Schaum, Jim Karkanias, Norma F Neff, Andrew P May, Stephen R Quake, Tony Wyss-Coray, Spyros Darmanis, Joshua Batson, Olga Botvinnik, Michelle B Chen, and others. Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris: the Tabula Muris Consortium. Nature, 562(7727):367, 2018.

[VdMH08]

Laurens Van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 2008.