You can define a training function and run it across workers with the `Trainer`:
```python
from ray.train import Trainer


def train_func(config):
    results = []
    for i in range(config["num_epochs"]):
        results.append(i)
    return results


def main(backend):
    trainer = Trainer(backend=backend, num_workers=2)
    trainer.start()
    print(trainer.run(train_func, config={"num_epochs": 2}))
    # [[0, 1], [0, 1]]
    print(trainer.run(train_func, config={"num_epochs": 5}))
    # [[0, 1, 2, 3, 4], [0, 1, 2, 3, 4]]
    trainer.shutdown()
```
Train with TensorFlow:

```bash
poetry run python ray/05-train/simple_example.py -b tensorflow
```

Train with PyTorch:

```bash
poetry run python ray/05-train/simple_example.py -b pytorch
```
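The `-b` flag presumably just selects the backend string passed to `main()` above. A minimal sketch of that wiring follows; the flag name comes from the commands above, but the use of `argparse` and the mapping of `"pytorch"` to Ray's `"torch"` backend name are assumptions about the script, not taken from it:

```python
import argparse

# Hypothetical entry point for simple_example.py; reuses main() from the
# snippet above. The "pytorch" -> "torch" mapping is an assumption about
# how the script translates the -b flag into a Trainer backend name.
if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("-b", "--backend", choices=["tensorflow", "pytorch"],
                        default="tensorflow")
    args = parser.parse_args()
    backend = "torch" if args.backend == "pytorch" else args.backend
    main(backend)
```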
`MultiWorkerMirroredStrategy`: all workers train synchronously on different slices of the input data, aggregating gradients at each step.
The script `tensorflow_example.py` is based on the "Multi-worker training with Keras" TensorFlow tutorial.
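For orientation, here is a minimal sketch of how a `MultiWorkerMirroredStrategy` training function can be run with Ray Train. This is not the actual `tensorflow_example.py`; the model, data, and hyperparameters are placeholder assumptions:

```python
import numpy as np
import tensorflow as tf
from ray.train import Trainer


def train_func(config):
    # Ray Train sets TF_CONFIG on every worker, so the strategy can
    # discover its peers and keep gradient updates in sync.
    strategy = tf.distribute.MultiWorkerMirroredStrategy()
    with strategy.scope():
        model = tf.keras.Sequential([
            tf.keras.layers.Dense(64, activation="relu", input_shape=(10,)),
            tf.keras.layers.Dense(1),
        ])
        model.compile(optimizer="adam", loss="mse")

    # Placeholder data; in a real job each worker reads its own shard.
    x = np.random.random((128, 10)).astype("float32")
    y = np.random.random((128, 1)).astype("float32")
    history = model.fit(x, y, epochs=config["num_epochs"], verbose=0)
    return history.history


trainer = Trainer(backend="tensorflow", num_workers=2)
trainer.start()
results = trainer.run(train_func, config={"num_epochs": 2})
trainer.shutdown()
```

Gradient aggregation happens inside `model.fit` via the strategy, which is what keeps the workers' weights identical after each step.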
Smoke test (with an in-process cluster):

```bash
python tensorflow_example.py --smoke-test
```
Run (you need to specify the cluster address with `--address`):

```bash
python tensorflow_example.py
```
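The `--address` and `--smoke-test` flags presumably control how the script connects to Ray. A minimal sketch of that logic, assuming the flag names above but nothing else about the script:

```python
import argparse
import ray

parser = argparse.ArgumentParser()
parser.add_argument("--address", type=str, default=None,
                    help="Address of a running Ray cluster, e.g. 'auto'")
parser.add_argument("--smoke-test", action="store_true",
                    help="Run on a small local cluster instead of connecting to one")
args = parser.parse_args()

if args.smoke_test:
    ray.init(num_cpus=2)            # start a small in-process cluster
else:
    ray.init(address=args.address)  # connect to the existing cluster
```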
- Create a Ray cluster. When we create the cluster, we need to add `pip install tensorflow` to `setup_commands` in the cluster config (see the YAML excerpt after this list).
- Submit the job:

  ```bash
  cd ../03-cluster
  ray submit aws-config.docker.yaml ../05-train/tensorflow_example.py
  ```
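An illustrative excerpt of what that addition might look like, assuming the standard Ray cluster YAML layout; only the `setup_commands` key is shown, the rest of `aws-config.docker.yaml` stays as-is:

```yaml
# aws-config.docker.yaml (excerpt): commands run on each node during setup.
setup_commands:
    - pip install tensorflow
```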