Error: malformed BED entry at line xxx. Start Coordinate detected that is < 0. #48

Open
minw2828 opened this issue Sep 11, 2024 · 4 comments

Comments

@minw2828

Hello,

Thank you for developing the tool.

I can see my issue is similar to #20, but I don't have patch sequences in my reference genome.

Could you advise what other reason might have caused this error please?

My error message:

raise BEDToolsError(subprocess.list2cmdline(cmds), stderr)
pybedtools.helpers.BEDToolsError:
Command was:
bedtools sort -i straglr_tmp/tmpplaw7wyh.bed
Error message was:
Error: malformed BED entry at line 84281. Start Coordinate detected that is < 0. Exiting.

My reference genome only has chromosomes 1 to 22, X, Y and M.

Many thanks,
Min

@readmanchiu
Collaborator

First, I'd check whether the alignment BAM and the genome FASTA you provided to Straglr use the same chromosome naming convention (for example, both without the "chr" prefix).
Can you show me the full command, and tell me which version you were using?
Also, if you set --tmpdir to a specific directory and run with --debug, we can locate the "malformed" line in the BED file based on the error message.
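
Once the temporary BED file is available, a short script can list every record with a negative start, not only the first one bedtools reports. This is a minimal sketch, assuming a tab-separated BED file; `find_negative_starts` is a hypothetical helper name and is not part of Straglr or pybedtools.

```python
def find_negative_starts(bed_lines):
    """Return (line_number, text) pairs for BED records whose start is < 0.

    Line numbers are 1-based and count every line in the input, so they
    can be compared with the line number quoted in the bedtools error.
    """
    hits = []
    for line_no, line in enumerate(bed_lines, start=1):
        text = line.rstrip("\n")
        # Skip comment and header lines that carry no coordinates.
        if text.startswith(("#", "track", "browser")):
            continue
        fields = text.split("\t")
        if len(fields) < 3:
            continue
        try:
            start = int(fields[1])
        except ValueError:
            continue  # non-numeric start: a different kind of malformed line
        if start < 0:
            hits.append((line_no, text))
    return hits


# Small in-memory example; the second record has a negative start.
sample = ["1\t100\t200\n", "1\t-5\t40\n", "2\t0\t10\n"]
for line_no, text in find_negative_starts(sample):
    print(line_no, text)
```

For a real run, pass an open file handle instead of the sample list, for example `find_negative_starts(open("straglr_tmp/tmpplaw7wyh.bed"))` with the file name taken from the error message.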

@minw2828
Author

minw2828 commented Sep 15, 2024

Hello @readmanchiu,

Thank you for your quick response.

I split the genome into different chunks that were named ~{region_bed}, so straglr could process them concurrently.

The command that I ran was:

  python /usr/local/bin/straglr.py \
    --regions ~{region_bed} \
    --min_ins_size 3 \
    --nprocs ~{threads} \
    --tmpdir ~{region_name + "_" + pname + "_straglr_tmp"} \
    ~{bam} ~{ref_fasta} ~{region_name + "_" + pname + "_straglr"}

The same command was run on five individuals. Straglr completed successfully for two of them, but the remaining three hit the same error:

The first individual:

Error: malformed BED entry at line 59197. Start Coordinate detected that is < 0. Exiting.

The second individual:

Error: malformed BED entry at line 9899. Start Coordinate detected that is < 0. Exiting.

The third individual:

Error: malformed BED entry at line 58727. Start Coordinate detected that is < 0. Exiting.

Hence, the error was not caused by different chromosome name conventions.

I am thinking of two possible causes:

  1. Insufficient memory, which often produces odd errors.
  2. The repeats being genotyped might have a start or end coordinate that falls outside the bounds of the chromosome.

Would reason 2 be possible?
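
One way to test reason 2 directly is to compare each BED record with the chromosome lengths in the reference's `.fai` index (the file `samtools faidx` writes, with the sequence name in column 1 and its length in column 2). The sketch below is only an illustration under that assumption; `load_chrom_lengths` and `out_of_bounds` are hypothetical helper names, not Straglr functions. It also flags chromosome names missing from the index, which covers the naming-convention question.

```python
def load_chrom_lengths(fai_lines):
    """Map sequence name -> length from the lines of a FASTA .fai index."""
    lengths = {}
    for line in fai_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            lengths[fields[0]] = int(fields[1])
    return lengths


def out_of_bounds(bed_lines, lengths):
    """Return (line_number, reason) for BED records that do not fit the reference."""
    problems = []
    for line_no, line in enumerate(bed_lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if chrom not in lengths:
            problems.append((line_no, "unknown chromosome"))
        elif start < 0 or end > lengths[chrom] or start > end:
            problems.append((line_no, "coordinates outside chromosome"))
    return problems


# Toy example: chromosome "1" is 1000 bp long.
lengths = load_chrom_lengths(["1\t1000\t52\t60\t61\n"])
regions = ["1\t0\t1000\n", "1\t990\t1010\n", "chr1\t5\t10\n"]
print(out_of_bounds(regions, lengths))
```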

I am keen to hear your thoughts on this.

Many thanks,
Min

@ljohansson

I was wondering what the respective lines of the different BED files are?

@readmanchiu
Collaborator

A --min_ins_size of 3 is a bit too low. Just a reminder that the unit for --min_ins_size is bp, not copy number. I think some insertions are being picked up near the ends of chromosomes, so negative coordinates are generated when flank sizes are taken into account.
I usually use 100 for --min_ins_size, as ONT reads can be quite noisy.
I also usually skip centromeres and long repeats/segdups (which can be curated from UCSC annotation tracks) in genome scans by passing their coordinates to --exclude.
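
To illustrate how a flank can push a start coordinate below zero, here is a small arithmetic sketch. It is not Straglr's actual code: the flank size, the `flanked_interval` name, and the idea that the flank is simply subtracted from the insertion position are all assumptions made for the example.

```python
def flanked_interval(pos, flank, chrom_length):
    """Return the raw and the clamped interval around an insertion position.

    The raw interval is (pos - flank, pos + flank); the clamped interval
    is trimmed to the valid range [0, chrom_length].
    """
    raw = (pos - flank, pos + flank)
    clamped = (max(0, raw[0]), min(chrom_length, raw[1]))
    return raw, clamped


# An insertion 50 bp from the chromosome start with a hypothetical 100 bp flank:
raw, clamped = flanked_interval(pos=50, flank=100, chrom_length=1000)
print(raw)      # (-50, 150): the start is negative, which bedtools sort rejects
print(clamped)  # (0, 150)
```

A noisy read can yield many small spurious insertions, so a very low insertion threshold makes it more likely that one of them lands this close to a chromosome end.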
