Integration of scRNA-seq Data: Can Target Cell Subpopulations Be Extracted First and Then Integrated? #9586
-
Hi @kestlermai, I'm not a member of the dev team, but hopefully I can be helpful. First, as an FYI, I will transfer this issue to Discussions, where it is better suited to get feedback on this type of question (vs. errors, bugs, etc. in Issues). In my personal opinion, yes, this is totally fine as long as you can properly and accurately identify all cells of the specific cell type prior to subsetting. There is no need to analyze cells that are irrelevant to your analysis goal or question (e.g., when you have a hypothesis about a specific cell type, as it appears you do). In fact, integration as a whole should often be a follow-on to analysis conducted without integration first. You may not need integration at all, or may not need it for the specific biological question you are interested in. Using any integrative analysis technique (Seurat, Harmony, LIGER, etc.) when it is not required will likely only serve to obscure your data and potentially mask real biological differences.

Best,
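A minimal numpy sketch of the "subset first, then merge" workflow described above. It assumes each batch is already a cells × genes matrix of raw counts with a per-cell type annotation, and that all batches share the same gene order. The function name, batch names, labels, and values are hypothetical toy data; in a real analysis you would do this with your toolkit's own subsetting and merging functions.

```python
import numpy as np

def subset_and_merge(batches, target_type):
    """Keep only cells annotated as `target_type` in each batch, then stack the
    batches into one raw-count matrix plus a per-cell batch label.

    `batches` maps a batch name to (counts, cell_types): counts is a
    cells x genes array of RAW counts, cell_types is a per-cell label array.
    Assumes every batch uses the same gene order (columns).
    """
    kept_counts, kept_batch = [], []
    for name, (counts, cell_types) in batches.items():
        mask = np.asarray(cell_types) == target_type  # boolean mask over cells
        kept_counts.append(counts[mask])
        kept_batch.extend([name] * int(mask.sum()))   # remember each cell's batch
    merged = np.vstack(kept_counts)
    return merged, np.array(kept_batch)

# Toy data: two hypothetical batches measuring the same 3 genes.
batches = {
    "batch1": (np.array([[5, 0, 2], [1, 3, 0], [4, 1, 1]]),
               np.array(["T cell", "B cell", "T cell"])),
    "batch2": (np.array([[0, 2, 7], [6, 0, 1]]),
               np.array(["B cell", "T cell"])),
}
merged, batch_labels = subset_and_merge(batches, "T cell")
print(merged.shape)   # (3, 3)
print(batch_labels)   # ['batch1' 'batch1' 'batch2']
```

Keeping the batch label for every retained cell is what later lets an integration method (or a simple check for batch effects) work on the subset alone.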
-
Hi kestlermai, Disclaimer: I'm not a bioinformatician, but I've been diving into it myself, so anyone with more experience please feel free to correct me. As your dataset gets larger, I think the main obstacle you will run into is the computing power required to run all the comparisons. Beyond that, there are certain parameters you will probably want to tweak as your dataset grows (like the resolution parameter for clustering). A million cells is a lot! As for where to start, I don't think the two options you describe are mutually exclusive. What I would do is extract the specific cell populations I'm interested in, and then go back to the raw counts for (re)normalization and integration. I would not use already scaled/transformed values as input to the integration pipeline.
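To illustrate why going back to raw counts after subsetting matters, here is a small numpy sketch on hypothetical toy data. It assumes a LogNormalize-style step (library-size normalization to 10,000 counts per cell, then log1p) followed by a plain per-gene z-score. Real scaling steps (for example Seurat's `ScaleData`) add options such as clipping that are omitted here. Per-cell normalization gives the same values whether it is computed before or after subsetting, but per-gene scaling depends on which cells are present, so values scaled on the full dataset are not what you get by rescaling the subset.

```python
import numpy as np

def log_normalize(counts, scale_factor=10_000):
    """Normalize each cell (row) to `scale_factor` total counts, then log1p.
    Computed per cell, so it does not depend on which other cells are present."""
    counts = np.asarray(counts, dtype=float)
    lib_size = counts.sum(axis=1, keepdims=True)
    return np.log1p(counts / lib_size * scale_factor)

def scale(x):
    """Z-score each gene (column) across the cells present in `x`."""
    mean = x.mean(axis=0)
    sd = x.std(axis=0)
    sd[sd == 0] = 1.0  # avoid division by zero for constant genes
    return (x - mean) / sd

# Hypothetical raw counts: 4 cells x 3 genes; cells 0 and 2 are the target population.
raw = np.array([[10, 0, 5], [2, 8, 0], [7, 1, 3], [0, 9, 4]])
mask = np.array([True, False, True, False])

norm_full = log_normalize(raw)        # normalized on all cells
norm_sub = log_normalize(raw[mask])   # re-normalized from raw counts of the subset

# Per-cell normalization is unaffected by subsetting ...
print(np.allclose(norm_full[mask], norm_sub))                # True
# ... but per-gene scaling changes once the other cells are removed.
print(np.allclose(scale(norm_full)[mask], scale(norm_sub)))  # False
```

The same reasoning applies to selecting highly variable genes: which genes look variable depends on the cells included, so it is worth recomputing on the extracted subpopulation.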
-
I'm working on scRNA-seq data and am trying to understand the best approach for data integration, especially when the cell count reaches the millions. Is it necessary to start the integration process from the original count matrix, or can I first extract specific cell subpopulations from each dataset and then merge the batches? What are the recommendations for these approaches, and could using pre-processed cell subpopulations affect the quality of the integrated results?