
Commit
v1.4, rm unused code and codepaths
PiperOrigin-RevId: 179822701
Ryan Sepassi committed Dec 21, 2017
1 parent 4354f3b commit bac1321
Showing 27 changed files with 723 additions and 1,898 deletions.
4 changes: 2 additions & 2 deletions .travis.yml
@@ -14,9 +14,9 @@ env:
- T2T_DATA_DIR=/tmp/t2t-data
- T2T_TRAIN_DIR=/tmp/t2t-train
script:
- pytest --ignore=tensor2tensor/utils/registry_test.py --ignore=tensor2tensor/utils/trainer_utils_test.py --ignore=tensor2tensor/problems_test.py --ignore=tensor2tensor/tpu/tpu_trainer_lib_test.py
- pytest --ignore=tensor2tensor/utils/registry_test.py --ignore=tensor2tensor/problems_test.py --ignore=tensor2tensor/tpu/tpu_trainer_lib_test.py
- pytest tensor2tensor/utils/registry_test.py
- pytest tensor2tensor/utils/trainer_utils_test.py
- pytest tensor2tensor/tpu/tpu_trainer_lib_test.py
- t2t-datagen 2>&1 | grep translate && echo passed
- python -c "from tensor2tensor.models import transformer; print(transformer.Transformer.__name__)"
- t2t-trainer --registry_help
48 changes: 28 additions & 20 deletions docs/cloud_tpu.md
@@ -3,15 +3,19 @@
Tensor2Tensor supports running on Google Cloud Platform's TPUs, chips specialized
for ML training.

Not all models are supported but we've tested so far with Transformer (sequence
model) as well as Xception (image model).
Models and hparams that are known to work on TPU:
* `transformer` with `transformer_tpu`
* `xception` with `xception_base`
* `resnet50` with `resnet_base`
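
These names are registry keys; each pair maps onto the `--model` and `--hparams_set` flags used in the tutorial command below. A quick way to see everything registered (this command also appears in the repository's CI config) is:
```
# Lists all registered models, hparams sets, and problems
t2t-trainer --registry_help
```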

To run on TPUs, you need to be part of the alpha program; if you're not, these
commands won't work for you currently, but access will expand soon, so get
excited for your future ML supercomputers in the cloud.

## Tutorial: Transformer En-De translation on TPU

Update `gcloud`: `gcloud components update`

Set your default zone to a TPU-enabled zone. TPU machines are only available in
certain zones for now.
```
@@ -40,29 +44,32 @@ gcloud alpha compute tpus create \
To see all TPU instances running: `gcloud alpha compute tpus list`. The
`TPU_IP` should be unique amongst the list and follow the format `10.240.i.2`.
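
As an illustration only (the address below is a placeholder, not output from this commit), the IP from that list can be captured in a shell variable for the later steps:
```
# Placeholder address; substitute the IP reported by `gcloud alpha compute tpus list`
TPU_IP=10.240.1.2
```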

Generate data to GCS
If you already have the data locally, use `gsutil cp` to cp to GCS.
SSH in with port forwarding for TensorBoard
```
DATA_DIR=gs://my-bucket/t2t/data/
t2t-datagen --problem=translate_ende_wmt8k --data_dir=$DATA_DIR
gcloud compute ssh $USER-vm -- -L 6006:localhost:6006
```

SSH in with port forwarding for TensorBoard
Now that you're on the cloud instance, install T2T:
```
gcloud compute ssh $USER-vm -L 6006:localhost:6006
pip install tensor2tensor --user
# If your python bin dir isn't already in your path
export PATH=$HOME/.local/bin:$PATH
```

Now that you're on the cloud instance, install T2T:
Generate data to GCS
If you already have the data, use `gsutil cp` to copy to GCS.
```
pip install tensor2tensor
GCS_BUCKET=gs://my-bucket
DATA_DIR=$GCS_BUCKET/t2t/data/
t2t-datagen --problem=translate_ende_wmt8k --data_dir=$DATA_DIR
```
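
If the data was already generated locally, copying it to the bucket with `gsutil` might look like the sketch below (the local path is an assumed example):
```
# Assumed local path; adjust to wherever the data was generated
gsutil -m cp -r /tmp/t2t-data/* $GCS_BUCKET/t2t/data/
```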

Set up some vars used below. `TPU_IP` and `DATA_DIR` should be the same as what
was used above. Note that the `DATA_DIR` and `OUT_DIR` must be GCS buckets.
```
TPU_IP=<IP of TPU machine>
DATA_DIR=gs://my-bucket/t2t/data/
OUT_DIR=gs://my-bucket/t2t/training/
DATA_DIR=$GCS_BUCKET/t2t/data/
OUT_DIR=$GCS_BUCKET/t2t/training/
TPU_MASTER=grpc://$TPU_IP:8470
```
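
An optional sanity check (assuming `gsutil` is available on the VM) that the generated data actually landed in the bucket:
```
# Should list the files produced by t2t-datagen above
gsutil ls $DATA_DIR
```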

@@ -73,25 +80,26 @@ tensorboard --logdir=$OUT_DIR > /tmp/tensorboard_logs.txt 2>&1 &

Train and evaluate.
```
t2t-tpu-trainer \
--master=$TPU_MASTER \
--data_dir=$DATA_DIR \
--output_dir=$OUT_DIR \
--problems=translate_ende_wmt8k \
t2t-trainer \
--model=transformer \
--hparams_set=transformer_tiny_tpu \
--hparams_set=transformer_tpu \
--problems=translate_ende_wmt8k \
--train_steps=10 \
--eval_steps=10 \
--local_eval_frequency=10 \
--iterations_per_loop=10
--iterations_per_loop=10 \
--master=$TPU_MASTER \
--use_tpu=True \
--data_dir=$DATA_DIR \
--output_dir=$OUT_DIR
```

The above command will train for 10 steps, then evaluate for 10 steps. You can
(and should) increase the number of total training steps with the
`--train_steps` flag. Evaluation will happen every `--local_eval_frequency`
steps, each time for `--eval_steps`. When you increase the number of training
steps, also increase `--iterations_per_loop`, which controls how frequently the
TPU machine returns control to the Python code (1000 seems like a fine number).
TPU machine returns control to the host machine (1000 seems like a fine number).
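
For a longer run, the same command with larger values might look like the sketch below (the step counts are illustrative placeholders, not values from this commit):
```
t2t-trainer \
  --model=transformer \
  --hparams_set=transformer_tpu \
  --problems=translate_ende_wmt8k \
  --train_steps=250000 \
  --eval_steps=10 \
  --local_eval_frequency=1000 \
  --iterations_per_loop=1000 \
  --master=$TPU_MASTER \
  --use_tpu=True \
  --data_dir=$DATA_DIR \
  --output_dir=$OUT_DIR
```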

Back on your local machine, open your browser and navigate to `localhost:6006`
for TensorBoard.
197 changes: 0 additions & 197 deletions docs/example_life.md

This file was deleted.

2 changes: 1 addition & 1 deletion docs/index.md
@@ -24,6 +24,6 @@ documentation, from basic tutorials to full code documentation.

## Deep Dive

* [Life of an Example](example_life.md): how all parts of T2T are connected and
* [System Overview](overview.md): how all parts of T2T are connected and
work together
* [Distributed Training](distributed_training.md)
