Chromosomes not contiguous #55

Open
adbeggs opened this issue Jan 20, 2025 · 1 comment

@adbeggs

adbeggs commented Jan 20, 2025

Very sorry - more errors on build:

Downloading bgdata datasets
bgdata get datasets/genomereference/hg19
2025-01-20 09:47:54 bgdata.manager INFO -- Tag "master" for "datasets/genomereference/hg19" resolved as 20190201
2025-01-20 09:47:54 bgdata INFO -- Dataset downloaded
../datasets//bgdata/datasets/genomereference/hg19-20190201
bgdata get datasets/genomereference/hg38
2025-01-20 09:47:54 bgdata.manager INFO -- Tag "master" for "datasets/genomereference/hg38" resolved as 20161209
2025-01-20 09:47:54 bgdata INFO -- Dataset downloaded
../datasets//bgdata/datasets/genomereference/hg38-20161209
bgdata get intogen/coverage/hg19
2025-01-20 09:47:55 bgdata.manager INFO -- Tag "master" for "intogen/coverage/hg19" resolved as 20191209
2025-01-20 09:47:55 bgdata INFO -- Dataset downloaded
../datasets//bgdata/intogen/coverage/hg19-20191209/hg19_100bp.coverage.regions.gz
bgdata get intogen/coverage/hg38
2025-01-20 09:47:56 bgdata.manager INFO -- Tag "master" for "intogen/coverage/hg38" resolved as 20191209
2025-01-20 09:47:56 bgdata INFO -- Dataset downloaded
../datasets//bgdata/intogen/coverage/hg38-20191209/hg38_100bp.coverage.regions.gz
bgdata get intogen/dndscv/pan
2025-01-20 09:47:56 bgdata.manager INFO -- Tag "master" for "intogen/dndscv/pan" resolved as 20200818
2025-01-20 09:47:56 bgdata INFO -- Dataset downloaded
../datasets//bgdata/intogen/dndscv/pan-20200818/PCATLAS_WXS_PAN.out.gz
Creating phylop datasets
datasets/boostdm/hg38.download.sh ../datasets//boostdm
+ DEST=../datasets//boostdm
+ path_output=../datasets//boostdm/hg38.phyloP100way.bw
+ rsync -avz --progress rsync://hgdownload.cse.ucsc.edu/goldenPath/hg38/phyloP100way/hg38.phyloP100way.bw ../datasets//boostdm/hg38.phyloP100way.bw
receiving incremental file list

sent 24 bytes  received 67 bytes  20.22 bytes/sec
total size is 9,870,053,206  speedup is 108,462,123.14
tabix -f -s 1 -b 2 -e 2 ../datasets//vep/vep.tsv.gz
[E::hts_idx_push] Chromosome blocks not continuous
tbx_index_build failed: ../datasets//vep/vep.tsv.gz
make: *** [datasets/vep/vep.mk:25: ../datasets//vep/vep.tsv.gz.tbi] Error 1
@FedericaBrando
Member

Hi @adbeggs.

Could you remove ../datasets//vep/vep.tsv.gz and rerun the step?

To build vep.tsv.gz, we run a multiprocess step that first splits the run and then combines the outputs into one file. If one of the split processes fails, the final file will end up with missing regions. Could you also provide the output of:

$ ls -lha vep.tsv.gz

and

$ zcat vep.tsv.gz | wc

The latter might take some time to run. Thanks
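
To confirm whether the combined file really contains interleaved chromosome blocks (which is what the tabix error points to), a small check along these lines might help. This is only a sketch under assumptions: the function names are hypothetical, the chromosome is assumed to be in the first tab-separated column (matching the -s 1 in the tabix call), and lines starting with # are assumed to be headers.

```python
import gzip
from typing import Iterable, List


def noncontiguous_chromosomes(lines: Iterable[str]) -> List[str]:
    """Return chromosomes whose rows appear in more than one separate block.

    Assumes the chromosome name is the first tab-separated field and that
    lines starting with '#' are header/comment lines (both are assumptions).
    """
    seen = set()        # chromosomes for which a block has already started
    previous = None     # chromosome of the previous data row
    offenders: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        chrom = line.split("\t", 1)[0]
        if chrom != previous:
            # A new block starts here; if this chromosome was seen before,
            # its rows are split across non-adjacent blocks.
            if chrom in seen and chrom not in offenders:
                offenders.append(chrom)
            seen.add(chrom)
            previous = chrom
    return offenders


def check_file(path: str) -> List[str]:
    """Run the check on a gzip/bgzip-compressed TSV (hypothetical helper)."""
    with gzip.open(path, "rt") as handle:
        return noncontiguous_chromosomes(handle)
```

Calling check_file("../datasets//vep/vep.tsv.gz") would return the chromosomes that show up in more than one block. An empty list means each chromosome forms a single block, although it does not rule out missing regions from a failed split process.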
