How to run colbert indexing in multi-GPUs #54

Open
cramraj8 opened this issue Oct 21, 2022 · 2 comments
Labels
good first issue Good for newcomers

Comments

@cramraj8

I tried to run ColBERT indexing on trec-deep-learning-passages in an environment with 4 GPUs. But when I call the indexing API as shown below, it only uses 1 GPU.

```python
from pyterrier_colbert.indexing import ColBERTIndexer
indexer = ColBERTIndexer(checkpoint, "/path/to/index", "index_name", ids=True)
indexer.index(dataset.get_corpus_iter())
```

How can I leverage all 4 GPUs with the PyTerrier API?

@cmacdonald
Collaborator

Good question, @cramraj8!
(cc @seanmacavaney)

Our setup is driven by Docker environments, mostly with access to a single GPU. I know the underlying ColBERT codebase can do distributed indexing, but we haven't integrated that.

Can you print the value of `DEVICE` (after `from colbert.parameters import DEVICE`) and tell us which device `indexer.colbert` is on?
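One way to gather that information is sketched below. The helper name `first_param_device` is hypothetical, and the sketch assumes `indexer.colbert` behaves like a `torch.nn.Module` (that is, it exposes `.parameters()`), which this thread does not confirm.

```python
def first_param_device(module):
    """Return the device of the module's first parameter, or None if it has none.

    Works with any object exposing a torch.nn.Module-style .parameters() method.
    """
    first = next(iter(module.parameters()), None)
    return None if first is None else first.device


# Usage against the objects named in this thread (not executed here, since it
# needs torch, colbert and an already constructed `indexer`):
#
#   import torch
#   from colbert.parameters import DEVICE
#   print("visible GPUs:", torch.cuda.device_count())
#   print("colbert.parameters.DEVICE:", DEVICE)
#   print("indexer.colbert device:", first_param_device(indexer.colbert))
```

If only one GPU is reported as visible, the limit is likely in the environment (for example `CUDA_VISIBLE_DEVICES`) rather than in the indexer.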

The line that actually performs the model encoding is https://github.com/cmacdonald/ColBERT/blob/v0.2/colbert/modeling/inference.py#L30

I wonder if you can wrap `indexer.colbert` in `torch.nn.DataParallel`, as per https://www.run.ai/guides/multi-gpu/pytorch-multi-gpu-4-techniques-explained#Technique-1 and https://stackoverflow.com/a/64825728. If you do, you can probably increase the value of `indexer.args.bsize` too.
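A minimal sketch of that idea on a toy model follows. It is not pyterrier_colbert code. `torch.nn.DataParallel` only splits calls to `forward()` across GPUs, so if the encoder is invoked through another method (assumed here to be a `.doc()` method, which should be checked against the linked line), a small adapter is needed. `ToyEncoder`, `DocForward` and `make_parallel_doc_encoder` are hypothetical names used only for illustration.

```python
import torch
import torch.nn as nn


class ToyEncoder(nn.Module):
    """Hypothetical stand-in for a ColBERT-style model exposing a .doc() method."""

    def __init__(self, vocab_size=100, dim=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)

    def doc(self, input_ids, attention_mask):
        # Per-token embeddings, with padded positions zeroed out.
        return self.embed(input_ids) * attention_mask.unsqueeze(-1).float()


class DocForward(nn.Module):
    """Adapter that routes forward() to .doc() so DataParallel can split the batch."""

    def __init__(self, encoder):
        super().__init__()
        self.encoder = encoder

    def forward(self, input_ids, attention_mask):
        return self.encoder.doc(input_ids, attention_mask)


def make_parallel_doc_encoder(encoder):
    """Wrap the adapter in DataParallel only when more than one GPU is visible."""
    wrapped = DocForward(encoder)
    if torch.cuda.device_count() > 1:
        wrapped = nn.DataParallel(wrapped.cuda())
    return wrapped


encoder = ToyEncoder()
parallel = make_parallel_doc_encoder(encoder)

input_ids = torch.randint(0, 100, (16, 12))             # batch of 16 sequences
attention_mask = torch.ones(16, 12, dtype=torch.long)   # no padding in this toy batch
with torch.no_grad():
    out = parallel(input_ids, attention_mask)
print(out.shape)  # torch.Size([16, 12, 8])
```

`DataParallel` splits each batch along its first dimension, one chunk per GPU, which is why a larger batch size tends to pay off once the wrapper is in place. On a machine with zero or one GPU the sketch falls back to the unwrapped adapter and still runs.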

Please let us know how you get on.

Craig

@cmacdonald added the "good first issue" label on Oct 21, 2022
@cmacdonald
Collaborator

Any update, @cramraj8?


2 participants