Bad assumption about file name and chromosome name #140

mbhall88 · 2021-11-15T05:47:27Z

mykrobe/src/mykrobe/cmds/variants/add.py

Lines 56 to 62 in 6a7e7f6

    reference_set_name = ".".join(os.path.basename(
        args.reference_set).split(".")[:-1])
    try:
        reference_set = ReferenceSet.objects.get(name=reference_set_name)
    except DoesNotExist:
        reference_set = ReferenceSet.create_and_save(name=reference_set_name)
        # Hack

⚠️ Any time you see `# Hack`, you know there are good times ahead.

When we add variants to the MongoDB database with `mykrobe variants add`, these lines assume that the reference file's name (minus its extension) is the same as the name of the chromosome in that file.

For example, I have a file called `h37rv.fa`, so `reference_set_name` is set to `h37rv`. However, the chromosome in that file is named `NC_000962.3`, so later on this command fails with:

KeyError: 'Reference NC_000962.3 cannot be found in reference set 6191efc47f6ea7585aa56abd (h37rv). Please add it to the database.'
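To make the mismatch concrete, here is the file-name expression from `add.py` pulled out into a standalone function. The function name `reference_set_name_from_path` is hypothetical; only the expression itself comes from the lines quoted above. It shows that the reference set name depends purely on the file name and never on the file contents:

```python
import os


def reference_set_name_from_path(path: str) -> str:
    # Same expression as add.py: take the file's base name, then drop the
    # last dot-separated suffix (assumed to be the extension).
    return ".".join(os.path.basename(path).split(".")[:-1])


# The chromosome name inside the file (e.g. NC_000962.3) is never consulted.
for path in ["refs/h37rv.fa", "refs/h37rv.fa.gz", "refs/NC_000962.3.fa"]:
    print(path, "->", reference_set_name_from_path(path))
```

Only the last file name yields a reference set name that matches the chromosome. Note also that a compressed file keeps part of its extension (`h37rv.fa`), and a file with no extension at all yields an empty name.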

The simple fix would be to extract the chromosome name from the reference file itself, but that assumes the file contains only one chromosome. I suspect this is fine, but I just wanted to run it by you, @martinghunt and @iqbal-lab.
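A minimal sketch of that fix, assuming the reference is a plain (uncompressed) FASTA file and that the chromosome name is the first whitespace-separated word of the header line. `chromosome_name_from_fasta` is a hypothetical helper, not an existing mykrobe function. Rather than silently picking one name, it raises if the single-chromosome assumption does not hold:

```python
import os
import tempfile


def chromosome_name_from_fasta(path: str) -> str:
    """Return the name of the single sequence in a FASTA file.

    The name is taken as the first whitespace-separated token after '>'.
    Raises ValueError if the file has zero sequences or more than one.
    """
    names = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                header = line[1:].strip()
                # An empty header ('>' alone) is recorded as an empty name.
                names.append(header.split()[0] if header else "")
    if len(names) != 1:
        raise ValueError(
            f"Expected exactly one sequence in {path}, found {len(names)}: {names}"
        )
    return names[0]


if __name__ == "__main__":
    # Illustrative single-chromosome reference; the sequence is made up.
    with tempfile.NamedTemporaryFile("w", suffix=".fa", delete=False) as fh:
        fh.write(">NC_000962.3 example header\nACGTACGT\n")
    try:
        print(chromosome_name_from_fasta(fh.name))  # prints NC_000962.3
    finally:
        os.remove(fh.name)
```

Raising on multiple records keeps the one-chromosome assumption explicit, so a multi-chromosome reference would fail with a clear message when variants are added, instead of with a confusing lookup error later.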

I can add this fix to #138.


martinghunt · 2021-11-15T09:28:37Z

I think this is ok. AFAIK many places in the code assume one chromosome.

mbhall88 mentioned this issue Nov 16, 2021

add stop codon to table and other small fixes #138 (merged)

mbhall88 closed this as completed Dec 1, 2021
