[WIP] FSDP example #77

Draft, wants to merge 2 commits into main


mreso
Contributor

@mreso mreso commented Jan 22, 2025

Start the lighthouse:

RUST_BACKTRACE=1 torchft_lighthouse --min_replicas 1 --quorum_tick_ms 100 --join_timeout_ms 1000

Start worker 0:

REPLICA_GROUP_ID=0 CUDA_VISIBLE_DEVICES=2,3 TORCHFT_MANAGER_PORT=29512 TORCHFT_LIGHTHOUSE=http://localhost:29510 torchrun --nnodes 1 --nproc-per-node 2 train_fsdp.py

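Start worker 1 (note the different GPUs, manager port, and rendezvous endpoint, so it does not collide with worker 0 on the same host):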

REPLICA_GROUP_ID=1 CUDA_VISIBLE_DEVICES=6,7 TORCHFT_MANAGER_PORT=29513 TORCHFT_LIGHTHOUSE=http://localhost:29510/ torchrun --nnodes 1 --nproc-per-node 2 --rdzv-endpoint=localhost:29400 train_fsdp.py
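
The two worker commands differ only in replica group ID, GPU list, manager port, and the optional rendezvous endpoint. A minimal dry-run sketch of that pattern follows. `worker_cmd` and `LIGHTHOUSE_URL` are hypothetical names introduced here for illustration, not part of this PR, and the lighthouse URL is written without the trailing slash for both workers on the assumption that the slash is not significant. The function only prints the command; it does not start anything.

```shell
#!/bin/sh
# Hypothetical helper (not part of this PR): prints a worker launch command.
LIGHTHOUSE_URL="http://localhost:29510"

worker_cmd() {
  # $1 = replica group id, $2 = GPU list, $3 = manager port,
  # $4 = optional torchrun rendezvous port
  cmd="REPLICA_GROUP_ID=$1 CUDA_VISIBLE_DEVICES=$2"
  cmd="$cmd TORCHFT_MANAGER_PORT=$3 TORCHFT_LIGHTHOUSE=$LIGHTHOUSE_URL"
  cmd="$cmd torchrun --nnodes 1 --nproc-per-node 2"
  # Add an explicit rendezvous endpoint only when a port was given.
  if [ -n "${4:-}" ]; then
    cmd="$cmd --rdzv-endpoint=localhost:$4"
  fi
  echo "$cmd train_fsdp.py"
}

worker_cmd 0 2,3 29512        # worker 0, default rendezvous endpoint
worker_cmd 1 6,7 29513 29400  # worker 1, explicit rendezvous endpoint
```

Each printed line can be pasted into its own terminal once the lighthouse is running.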

@facebook-github-bot added the CLA Signed label Jan 22, 2025