Bad assumption about file name and chromosome name #140

Closed
mbhall88 opened this issue Nov 15, 2021 · 1 comment

@mbhall88
Member

reference_set_name = ".".join(os.path.basename(
    args.reference_set).split(".")[:-1])
try:
    reference_set = ReferenceSet.objects.get(name=reference_set_name)
except DoesNotExist:
    reference_set = ReferenceSet.create_and_save(name=reference_set_name)
# Hack

⚠️ any time you see # Hack you know there are good times ahead.

When we add variants to the MongoDB database with mykrobe variants add, these lines assume that the file name prefix of the reference is the same as the name of the chromosome inside that file.

For example, I have a file called h37rv.fa, so reference_set_name gets set to h37rv. However, the chromosome name in that file is NC_000962.3, so later on the command fails with

KeyError: 'Reference NC_000962.3 cannot be found in reference set 6191efc47f6ea7585aa56abd (h37rv). Please add it to the database.'
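To make the mismatch concrete, here is a minimal sketch that applies the same expression as the quoted snippet to a couple of paths. The function name reference_set_name_from_path and the example paths are made up for illustration; only the expression itself comes from the code above.

```python
import os


def reference_set_name_from_path(path):
    # Same expression as in the quoted snippet:
    # take the basename and drop everything after the last ".".
    return ".".join(os.path.basename(path).split(".")[:-1])


# The name comes purely from the file name, never from the FASTA header.
print(reference_set_name_from_path("/data/h37rv.fa"))     # h37rv
# A compressed file only loses its final extension.
print(reference_set_name_from_path("/data/h37rv.fa.gz"))  # h37rv.fa
```

Neither result is NC_000962.3, which is the name the later lookup asks for.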

The simplest fix would be to extract the chromosome name from the reference file itself, but that assumes the file contains only one chromosome. I suspect this is fine, but just wanted to run it by you @martinghunt and @iqbal-lab.
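A rough sketch of what that extraction could look like, assuming an uncompressed, plain-text FASTA file. The helper single_chromosome_name is hypothetical and not part of mykrobe. It takes the header text up to the first whitespace as the name, and fails loudly if the single-chromosome assumption does not hold.

```python
def single_chromosome_name(fasta_path):
    """Return the name of the only sequence in a FASTA file.

    Hypothetical helper, not part of mykrobe. The name is the header
    text after '>' up to the first whitespace.
    """
    names = []
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                # An empty header line yields an empty name.
                names.append(fields[0] if fields else "")
    if len(names) != 1:
        # Make the one-chromosome assumption explicit instead of silent.
        raise ValueError(
            f"Expected exactly one sequence in {fasta_path}, found {len(names)}"
        )
    return names[0]
```

With something like this, reference_set_name could be taken from the header (NC_000962.3 in my case) rather than from the file name, and a multi-chromosome reference would raise an error up front instead of failing later.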

I can add this fix to #138

@martinghunt
Member

I think this is ok. AFAIK many places in the code assume one chromosome.
