Error: Non-concatenating --pmerge[-list] is under development. #232

Open
jacorvar opened this issue Jan 23, 2023 · 9 comments

Comments

@jacorvar

Hi,

Running plink2 (v2.00a4LM AVX2 Intel) fails with an error when merging multiple datasets.

$ ../plink2 --debug --memory 8000 --threads 6 --pmerge-list input_sources.txt --out merged
PLINK v2.00a4LM AVX2 Intel (9 Jan 2023)        www.cog-genomics.org/plink/2.0/
(C) 2005-2023 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to merged.log.
Options in effect:
  --debug
  --memory 8000
  --out merged
  --pmerge-list input_sources.txt
  --threads 6

Start time: Mon Jan 23 17:06:52 2023
385417 MiB RAM detected; reserving 8000 MiB for main workspace.
Using up to 6 compute threads.
--pmerge-list: 2 filesets specified.
--pmerge-list: 2 samples present.
--pmerge-list: Merged .psam written to merged.psam .
--pmerge-list: 2 .pvar files scanned, headers merged.
Error: Non-concatenating --pmerge[-list] is under development.

Contents of input_sources.txt:

$ cat input_sources.txt 
test3
test4

test3 and test4 have been generated from VCF files:

$ plink2 --vcf ../3.vcf.gz --out test3 --make-pgen
$ plink2 --vcf ../4.vcf.gz --out test4 --make-pgen

I'm a newbie with plink and suspect I'm doing something wrong, but after some digging I haven't found any clue.

System specs: CentOS 7.9, Intel(R) Xeon(R) Silver 4210R

@chrchang
Owner

The error message means exactly what it says: this feature isn't implemented in plink2 yet. ("Concatenating" merge refers to the "bcftools concat" use case, though plink2's behavior differs a bit from bcftools's here.) Use e.g. bcftools or plink 1.9 to merge for now.

@myz540

myz540 commented May 9, 2023

> The error message means exactly what it says: this feature isn't implemented in plink2 yet. ("Concatenating" merge refers to the "bcftools concat" use case, though plink2's behavior differs a bit from bcftools's here.) Use e.g. bcftools or plink 1.9 to merge for now.

Are you sure? As of March 13th, we were able to use plink2 to concatenate datasets.

Here is a log of a working example:

PLINK v2.00a3.7LM AVX2 Intel (24 Oct 2022)
Options in effect:
  --out ukb24068_c5_merged_sample_filtered
  --pfile ukb24068_c5_b1_merged_sample_filtered
  --pmerge-list chr5_list

Hostname: 80b217465abd
Working directory: /home/ubuntu/exome_pgen
Start time: Mon Mar 13 15:49:53 2023

Random number seed: 1678722593
63628 MiB RAM detected; reserving 31814 MiB for main workspace.
Using up to 16 threads (change this with --threads).
--pmerge-list: 19 filesets specified (including main fileset).
--pmerge-list: 422625 samples present.
--pmerge-list: Merged .psam written to ukb24068_c5_merged_sample_filtered.psam
.
--pmerge-list: 19 .pvar files scanned, headers merged.
Concatenation job detected.
Concatenating... 747813/747813 variants complete.
Results written to ukb24068_c5_merged_sample_filtered.pgen +
ukb24068_c5_merged_sample_filtered.pvar .

End time: Mon Mar 13 15:51:11 2023

However, we see the same error for 2 of our chromosomes, and we are not sure why yet. The same code is run in a loop; the .pvar and .psam files are written, but the .pgen file is not produced. Any ideas?

PLINK v2.00a3.7LM AVX2 Intel (24 Oct 2022)
Options in effect:
  --out ukb24068_c8_merged_sample_filtered
  --pfile ukb24068_c8_b1_merged_sample_filtered
  --pmerge-list chr8_list

Hostname: 80b217465abd
Working directory: /home/ubuntu/exome_pgen
Start time: Mon Mar 13 15:54:05 2023

Random number seed: 1678722845
63628 MiB RAM detected; reserving 31814 MiB for main workspace.
Using up to 16 threads (change this with --threads).
--pmerge-list: 15 filesets specified (including main fileset).
--pmerge-list: 422625 samples present.
--pmerge-list: Merged .psam written to ukb24068_c8_merged_sample_filtered.psam
.
--pmerge-list: 15 .pvar files scanned, headers merged.
Error: Non-concatenating --pmerge-list is under development.

End time: Mon Mar 13 15:54:10 2023

@gulumk for visibility

@chrchang
Owner

chrchang commented May 9, 2023

When two variants share a position, --pmerge-list uses the --sort-vars setting (https://www.cog-genomics.org/plink/2.0/data#sort_vars ) to determine their output order. In particular, if the end of one .pvar and the beginning of the next have variants at the same position, and their IDs are in the wrong order, --pmerge-list can no longer "concatenate".

I will update the online documentation today to spell this out.
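
As a rough illustration of the rule described above, the sketch below checks whether the junction between two consecutive .pvar files still allows concatenation. It is not plink2's code: `Variant`, `natural_key`, and `boundary_ok` are hypothetical names, chromosome ordering is not checked, and the ID comparison assumes a simple natural sort that may differ in detail from what --sort-vars actually does.

```python
import re
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    vid: str


def natural_key(s: str):
    # Split into digit and non-digit runs so that "rs9" sorts before "rs10".
    # Assumption: this only approximates plink2's natural-sort comparison.
    return [(0, int(t)) if t.isdigit() else (1, t)
            for t in re.findall(r"\d+|\D+", s)]


def boundary_ok(last: Variant, first: Variant) -> bool:
    """Return True if `first` may directly follow `last` without re-sorting."""
    if last.chrom != first.chrom:
        return True  # chromosome order is out of scope for this sketch
    if last.pos != first.pos:
        return last.pos < first.pos
    # Same position: the IDs decide the order (ties are accepted here,
    # which is an assumption of this sketch).
    return natural_key(last.vid) <= natural_key(first.vid)


print(boundary_ok(Variant("8", 1000, "rs9"), Variant("8", 1000, "rs10")))  # True
print(boundary_ok(Variant("8", 1000, "rs10"), Variant("8", 1000, "rs9")))  # False
```

In the second call the two variants share a position but the IDs are in the wrong order, which is the situation in which --pmerge-list can no longer treat the job as a concatenation.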

@myz540

myz540 commented May 9, 2023

> When two variants share a position, --pmerge-list uses the --sort-vars setting (https://www.cog-genomics.org/plink/2.0/data#sort_vars ) to determine their output order. In particular, if the end of one .pvar and the beginning of the next have variants at the same position, and their IDs are in the wrong order, --pmerge-list can no longer "concatenate".
>
> I will update the online documentation today to spell this out.

I see, thank you for the quick reply. Would you say that inspecting the heads and tails of the pvar files is a good place to start? Is this issue strictly due to the pvar file or could issues in the pgen file throw this error as well?

@chrchang
Owner

chrchang commented May 9, 2023

  1. Yes; if you don't want to resort to exporting to BCF and using "bcftools concat", one option is temporarily editing the offending leading/trailing variant IDs so that they no longer violate --sort-vars order.
  2. No, pgen file contents can't cause this.
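
For the "inspect the heads and tails" step, a minimal sketch might look like the following. It assumes uncompressed, tab-delimited .pvar files that contain a `#CHROM` header line with `POS` and `ID` columns; `head_and_tail` and `junctions` are hypothetical helper names, not part of plink2.

```python
def head_and_tail(pvar_path):
    """Return the (first, last) variant records as (chrom, pos, id) tuples."""
    cols = None
    first = last = None
    with open(pvar_path) as fh:
        for line in fh:
            # Skip "##" meta lines and blank lines.
            if line.startswith("##") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            if line.startswith("#CHROM"):
                # Map column names to indexes so column order does not matter.
                cols = {name: i for i, name in enumerate(fields)}
                continue
            if cols is None:
                raise ValueError(f"{pvar_path}: no #CHROM header line found")
            rec = (fields[cols["#CHROM"]], int(fields[cols["POS"]]),
                   fields[cols["ID"]])
            if first is None:
                first = rec
            last = rec
    return first, last


def junctions(prefixes):
    """Pair the tail of each fileset with the head of the next one."""
    ends = [head_and_tail(f"{p}.pvar") for p in prefixes]
    return [(prefixes[i], ends[i][1], prefixes[i + 1], ends[i + 1][0])
            for i in range(len(prefixes) - 1)]
```

Calling `junctions(...)` with the fileset prefixes in the order you expect them to be concatenated returns one tuple per junction, so pairs that share a position can be examined by eye before deciding whether to edit IDs or fall back to bcftools.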

@myz540

myz540 commented May 9, 2023

Thanks @chrchang, we were able to resolve our issue.

AMCalejandro added a commit to michael-ta/longitudinal-GWAS-pipeline that referenced this issue Aug 3, 2023
We have seen plink2 fail to concatenate multiple chunks produced by a liftover operation. We found the issue to be caused by lifting over in splits within chromosomes: some hg19->hg38 SNP positions fall very far from their neighbouring SNPs. This breaks the increasing position order within a chromosome, so some variants really belong to other splits. To fix this, we perform the merge with plink 1.9's --merge-list instead. For more about the issue, see chrchang/plink-ng#232

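
The ordering problem described in the commit message above can be spotted with a check along these lines. This is only a sketch: `position_drops` is a hypothetical helper, and it expects `(chrom, pos)` pairs already parsed from a lifted-over chunk, in file order.

```python
def position_drops(records):
    """List (index, chrom, prev_pos, pos) wherever the position decreases
    while staying on the same chromosome."""
    drops = []
    prev_chrom, prev_pos = None, None
    for i, (chrom, pos) in enumerate(records):
        if chrom == prev_chrom and pos < prev_pos:
            drops.append((i, chrom, prev_pos, pos))
        prev_chrom, prev_pos = chrom, pos
    return drops


# Made-up positions: the third variant landed far below its neighbours.
chunk = [("5", 100), ("5", 250), ("5", 90), ("5", 300)]
print(position_drops(chunk))  # [(2, '5', 250, 90)]
```

An empty result means the chunk is internally sorted by position; any reported drop marks a variant that would need re-sorting before the chunks could be concatenated.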
@123huynguyen

123huynguyen commented Sep 20, 2023

@myz540 Hi Mike, would you mind sharing the code you used to address this issue? I have run into the same problem and would really appreciate your help.

@vicentepese

Are there any updates on this?

@myz540

myz540 commented Aug 14, 2024

> @myz540 Hi Mike, would you mind sharing the code you used to address this issue? I have run into the same problem and would really appreciate your help.

Hey @123huynguyen, I would love to help, but this was at an old job, so I no longer have access to the code base or the context needed to give you a solution. I believe the issue was in the sorting: when we inspected the head and tail of the .pvar files, we saw that the chunks weren't sorted correctly. I can't be 100% sure that was the issue given how long it's been, but I hope this helps.
