Unable to link peaks to genes #1855
Comments
Have you encountered an issue where peaks and their linked genes are located on different chromosomes? @zgb963 |
Hello @timoast, I was able to fix the warning when running the `RegionStats` function by changing the seqlevel/chromosome naming convention to UCSC style, to match the UCSC style of the annotation stored in the sample Seurat object.
However, I'm still unable to run the `LinkPeaks` function without getting the same error.
|
Hi, could you install from the develop branch and see if you still have this issue? |
Hi @timoast, thanks for your message. I uninstalled the current release, installed the development version of Signac, and re-ran `LinkPeaks`, but unfortunately it still didn't work. Please see below. The data in all of the sample objects are numeric, so I don't understand why I keep getting this error when linking peaks.
First compute the GC content for each peak (worked without issues or warnings)
Link peaks to genes (OSTN) (got same error)
Link peaks to genes (GAPDH)
Link peaks to genes (FGR)
Link peaks to genes (omit genes.use)
Is data in SCT assay numeric? Yes
Is data in ATAC assay numeric? Yes
Check if genes tested with LinkPeaks are in SCT assay/ATAC assay
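The two kinds of checks listed above (is the data numeric, and are the tested genes present) can be sketched in base R. This is only a minimal illustration on a toy matrix, not the actual Seurat object: `expr` and `genes_to_test` are hypothetical names, and the values are made up.

```r
# Toy stand-in for an assay data matrix (genes x cells); values are illustrative only.
expr <- matrix(
  c(0, 1.2, 0.4, 0, 2.1, 0.7),
  nrow = 3,
  dimnames = list(c("OSTN", "GAPDH", "FGR"), c("cell1", "cell2"))
)

# Check 1: is the matrix numeric?
data_is_numeric <- is.numeric(expr)

# Check 2: which of the genes to test are present as row names?
genes_to_test <- c("OSTN", "GAPDH", "FGR", "NOT_A_GENE")
gene_present <- setNames(genes_to_test %in% rownames(expr), genes_to_test)

print(data_is_numeric)
print(gene_present)
```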
|
I think the issue is that the GC content metadata for some peaks seems to contain non-numeric data (likely NA). Can you show the output of these lines:
md <- liftoff_1_MI5_V1_SO_filtered[['ATAC']][[]]
head(md)
md[is.na(md$GC.percent), ] |
It looks like you have NA for all the peak sequence metadata. Maybe this was stored incorrectly when you ran |
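To quantify how many peaks are affected, a summary along these lines may help. It is a rough sketch on a toy data frame rather than the real assay metadata: only the `GC.percent` column name comes from this thread, and the peak names and values are invented for illustration.

```r
# Toy stand-in for per-peak metadata; the peak names and values are made up.
md <- data.frame(
  GC.percent = c(41.5, NA, 55.0, NA),
  row.names = c("peak1", "peak2", "peak3", "peak4")
)

# Count peaks with missing GC content and the fraction affected.
n_missing <- sum(is.na(md$GC.percent))
frac_missing <- mean(is.na(md$GC.percent))

# Rows with missing GC content (same pattern as the md[is.na(...), ] call above;
# drop = FALSE keeps the result a data frame even with a single column).
missing_rows <- md[is.na(md$GC.percent), , drop = FALSE]

print(n_missing)
print(frac_missing)
print(rownames(missing_rows))
```

If the fraction is 1, every peak is missing its sequence metadata, which points at the step that computes it rather than at a few problematic peaks.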
Hi @timoast, thanks for your message. Below I've put everything I've run, in the order I ran it, from the beginning of the tutorial up until linking peaks to genes. Everything else runs great up until that point.
STEP 1 read in count matrices & atac fragments path
STEP 2 create seurat object
STEP 3 add macfas6 gencodev47 liftoff gtf so TSS.enrichment score can be calculated
IMPORTANT: chromosomes in the liftoff gtf aren't listed as 'chr1', 'chr2', 'chr3' in UCSC style or '1', '2', '3' in NCBI style. They are listed as GenBank sequence accession numbers: 'CM021939.1' for chrom 1, 'CM021940.1' for chrom 2, 'CM021941.1' for chrom 3, etc. (chromosomes 1-20, chromosome X, then the rest are 'Un' chromosomes)
STEP 4 calculate nucleosome signal
STEP 5 Make QC violin plots
STEP 6 make density scatter plot
STEP 7 subset/filter based on tutorial cutoffs
STEP 8 normalize the gene expression data using SCTransform, and reduce the dimensionality using PCA.
STEP 9 process the DNA accessibility assay the same way we would process a scATAC-seq dataset, by performing latent semantic indexing (LSI)
STEP 10 make snRNAseq, snATACseq, & snRNAseq + ATACseq UMAPs
STEP 11 Linking peaks to genes (where error occurs)
STEP 12 check sequence level style
STEP 13 check GC content metadata (still see NA's)
STEP 14 Print session info
|
Hi @timoast, I tried changing the chromosome names in BSgenome.Mfascicularis.NCBI.6.0 (which uses the NCBI naming style "1", "2", "3", etc.) to match the ones in the GTF (chromosomes listed as GenBank accession numbers: 'CM021939.1' for chrom 1, 'CM021940.1' for chrom 2, 'CM021941.1' for chrom 3, etc.) that I added to the Seurat object genome parameter for
So I read in my indexed fasta file, and the chromosome names match exactly with the GTF in STEP 3 above. I still got an error for linking peaks to genes
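A renaming of this kind can be expressed as a named lookup vector. The sketch below is only an illustration in base R: `accession_to_chrom` and `gtf_seqnames` are hypothetical names, and the lookup contains just the three accession pairs quoted in this thread, so a real table would need every chromosome and 'Un' sequence.

```r
# Lookup from GenBank accession to chromosome number; only the three pairs
# quoted in this thread are included.
accession_to_chrom <- c(
  "CM021939.1" = "1",
  "CM021940.1" = "2",
  "CM021941.1" = "3"
)

# Hypothetical sequence names as they might appear in a GTF.
gtf_seqnames <- c("CM021940.1", "CM021939.1", "CM021941.1", "CM021939.1")

# Translate to NCBI-style names, then to UCSC style by prefixing "chr".
ncbi_names <- unname(accession_to_chrom[gtf_seqnames])
ucsc_names <- paste0("chr", ncbi_names)

# Names with no entry in the lookup come back as NA, so flag them explicitly.
unmapped <- gtf_seqnames[is.na(ncbi_names)]

print(ucsc_names)
print(unmapped)
```

Checking `unmapped` before going further is a cheap way to catch sequence names that silently fail to match between the GTF, the peaks, and the genome.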
|
Hello,
I've been following the "Joint RNA and ATAC analysis: 10x multiomic" tutorial on the Stuart Lab website to process output from the 10x Genomics Cell Ranger ARC pipeline. I've been trying to run the steps in the 'Link peaks to genes' section for one of my sample Seurat objects and for one or two genes, but I keep getting errors and I'm not sure how to fix them in order to make coverage plots. Any insight on how to fix this would be appreciated, thanks.
I then tried to make a coverage plot with just one gene, but I also got an error.
When I set genes.use to NULL to determine genes from expression assay, I get a similar error
Below is my R session info