Logging bug in bmerge #140

Open
andrew-slater opened this issue Apr 30, 2020 · 2 comments

@andrew-slater

See the two small datasets attached. I'm running --merge-mode 6 and would thus expect the order of the datasets not to matter, but I get different results, as highlighted in bold.

PLINK v1.90b6.17 64-bit (28 Apr 2020) www.cog-genomics.org/plink/1.9/
(C) 2005-2020 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to PlusFirst.log.
Options in effect:
--bfile Plus
--bmerge Normed.bed Normed.bim Normed.fam
--merge-mode 6
--out PlusFirst
16340 MB RAM detected; reserving 8170 MB for main workspace.
4 people loaded from Plus.fam.
4 people to be merged from Normed.fam.
Of these, 0 are new, while 4 are present in the base dataset.
Warning: Multiple positions seen for variant 'MNV'.
Warning: Multiple chromosomes seen for variant 'PAR-X'.
30 markers loaded from Plus.bim.
29 markers to be merged from Normed.bim.
Of these, 0 are new, while 29 are present in the base dataset.
Warning: Variants '0monomorphic' and '00missing' have the same position.
Warning: Variants '2:103037578' and '0monomorphic' have the same position.
Warning: Variants '2:103037578:G:T' and '2:103037578' have the same position.
24 more same-position warnings: see log file.
Performing 1-pass diff (mode 6), writing results to PlusFirst.diff .
116 overlapping calls, 84 nonmissing in both filesets.
76 concordant, for a concordance rate of 0.904762.

PLINK v1.90b6.17 64-bit (28 Apr 2020) www.cog-genomics.org/plink/1.9/
(C) 2005-2020 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to NormedFirst.log.
Options in effect:
--bfile Normed
--bmerge Plus.bed Plus.bim Plus.fam
--merge-mode 6
--out NormedFirst
16340 MB RAM detected; reserving 8170 MB for main workspace.
4 people loaded from Normed.fam.
4 people to be merged from Plus.fam.
Of these, 0 are new, while 4 are present in the base dataset.
Warning: Multiple positions seen for variant 'MNV'.
Warning: Multiple chromosomes seen for variant 'PAR-X'.
27 markers loaded from Normed.bim.
30 markers to be merged from Plus.bim.
Of these, 3 are new, while 27 are present in the base dataset.
Warning: Variants '0monomorphic' and '00missing' have the same position.
Warning: Variants '2:103037578' and '0monomorphic' have the same position.
Warning: Variants '2:103037578:G:T' and '2:103037578' have the same position.
23 more same-position warnings: see log file.
Performing 1-pass diff (mode 6), writing results to NormedFirst.diff .
108 overlapping calls, 78 nonmissing in both filesets.
70 concordant, for a concordance rate of 0.897436.

@chrchang
Owner

chrchang commented May 1, 2020

Thanks for reporting this.

--bmerge is not symmetric: when there are duplicates, the first appearance of a variant has priority, and I can't change this behavior without breaking backward compatibility. That said, if you use PLINK 1.07 to execute the second command, it actually errors out because of the totally duplicate .bim entries in Normed.bim, instead of just reporting a different result. I will think about making PLINK 1.9 error out as well in the second case.
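
For anyone who wants to check a .bim file for repeated variant IDs before merging, here is a minimal Python sketch. It is not part of PLINK and does not reproduce PLINK's own duplicate handling; it only assumes the standard whitespace-separated .bim layout, where the variant ID is the second column. The function name `find_duplicate_variant_ids` and the sample rows are hypothetical, made up for illustration.

```python
from collections import defaultdict


def find_duplicate_variant_ids(bim_lines):
    """Return {variant_id: [1-based line numbers]} for IDs that appear more than once.

    Assumes standard .bim rows: chromosome, variant ID, cM position,
    bp position, allele 1, allele 2 (whitespace-separated).
    """
    seen = defaultdict(list)
    for line_no, line in enumerate(bim_lines, start=1):
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        seen[fields[1]].append(line_no)
    # Keep only IDs that were seen on more than one line.
    return {vid: nums for vid, nums in seen.items() if len(nums) > 1}


# Made-up rows for illustration only; these are NOT the contents of Normed.bim.
example = [
    "1 rs1 0 1000 A G",
    "1 rs2 0 2000 C T",
    "1 rs1 0 1000 A G",
]
print(find_duplicate_variant_ids(example))  # {'rs1': [1, 3]}
```

To run it on a real file, pass an open file handle, for example `with open("Normed.bim") as f: print(find_duplicate_variant_ids(f))`. An empty result suggests the file has no repeated IDs, in which case any order-dependence would need another explanation.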

@andrew-slater
Author

Thanks for looking into this. I thought my data might have some weird edge case but I hadn't noticed the duplicate identifiers in Normed.bim.
