Name	Name	Last commit message	Last commit date
parent directory ..
README.md	README.md
batch_request.py	batch_request.py
local_model.py	local_model.py
model_composition.drawio.svg	model_composition.drawio.svg
model_composition.py	model_composition.py
model_on_ray_serve.py	model_on_ray_serve.py
quickstart.py	quickstart.py
quickstart_extended.py	quickstart_extended.py
router_client.py	router_client.py

Ray Serve

1. Quickstart

Install ray[serve]
```
pip install "ray[serve]"
```

Prepare quickstart.py.

from time import sleep

import ray
from ray import serve

ray.init()  # 1. init cluster

serve.start()  # 2. serve start


@serve.deployment  # 3. define deployment
class Counter:
    def __init__(self):
        self.count = 0

    def __call__(self, request):
        self.count += 1
        return {"count": self.count}


Counter.deploy()  # 4. deploy
sleep(100)

Run

python quickstart.py

2022-05-22 06:46:24,611 INFO services.py:1456 -- View the Ray dashboard at http://127.0.0.1:8265
(ServeController pid=27612) 2022-05-22 06:46:29,494     INFO checkpoint_path.py:15 -- Using RayInternalKVStore for controller checkpoint and recovery.
(ServeController pid=27612) 2022-05-22 06:46:29,609     INFO http_state.py:106 -- Starting HTTP proxy with name 'SERVE_CONTROLLER_ACTOR:DylMax:SERVE_PROXY_ACTOR-node:127.0.0.1-0' on node 'node:127.0.0.1-0' listening on '127.0.0.1:8000'
2022-05-22 06:46:30,915 INFO api.py:794 -- Started Serve instance in namespace '9c985508-1c83-47f5-a078-ca1faa3ed450'.
(HTTPProxyActor pid=27613) INFO:     Started server process [27613]
2022-05-22 06:46:30,934 INFO api.py:615 -- Updating deployment 'Counter'. component=serve deployment=Counter
(ServeController pid=27612) 2022-05-22 06:46:31,015     INFO deployment_state.py:1210 -- Adding 1 replicas to deployment 'Counter'. component=serve deployment=Counter
2022-05-22 06:46:35,004 INFO api.py:630 -- Deployment 'Counter' is ready at `http://127.0.0.1:8000/Counter`. component=serve deployment=Counter

Check on http://127.0.0.1:8000/Counter.

Get the counter:
```
curl -X GET localhost:8000/Counter/
{"count": 0}
```

You can also try an extended quickstart with quickstart_extended.py

2. End-to-End Tutorial

Summarize a long English text with transformers library.

Install libraries. Use https://huggingface.co/docs/transformers/index.
```
pip install transformers
pip install 'transformers[torch]'
```

Run locally.

python local_model.py

Downloading: 100%|█████████████████████████████████████████████████████████| 231M/231M [00:43<00:00, 5.61MB/s]
Downloading: 100%|██████████████████████████████████████████████████████████| 773k/773k [00:02<00:00, 298kB/s]
Downloading: 100%|████████████████████████████████████████████████████████| 1.32M/1.32M [00:07<00:00, 184kB/s]
/Users/nakamasato/.pyenv/versions/3.9.0/lib/python3.9/site-packages/transformers/models/t5/tokenization_t5_fast.py:155: FutureWarning: This tokenizer was incorrectly instantiated with a model max length of 512 which will be corrected in Transformers v5.
For now, this behavior is kept to avoid breaking backwards compatibility when padding/encoding with `truncation is True`.
- Be aware that you SHOULD NOT rely on t5-small automatically truncating your input to 512 when padding/encoding.
- If you want to encode/pad to sequences longer than 512 you can either instantiate this tokenizer with `model_max_length` or pass `max_length` when encoding/padding.
- To avoid this warning, please instantiate this tokenizer with `model_max_length` set to your preferred value.
  warnings.warn(
two astronauts steered their fragile lunar module safely and smoothly to the historic landing . the first men to reach the moon -- Armstrong and his co-pilot, col. Edwin E. Aldrin Jr. of the air force -- brought their ship to rest on a level, rock-strewn plain .

Deploy this model using Ray Serve

Create local ray cluster.
```
ray start --head
```
Run.
```
python model_on_ray_serve.py
```

Send request (either via HTTP or Python).

python router_client.py
----- HTTP START -----
two astronauts steered their fragile lunar module safely and smoothly to the historic landing . the first men to reach the moon -- Armstrong and his co-pilot, col. Edwin E. Aldrin Jr. of the air force -- brought their ship to rest on a level, rock-strewn plain .
----- HTTP END -----
----- ServeHandle START -----
two astronauts steered their fragile lunar module safely and smoothly to the historic landing . the first men to reach the moon -- Armstrong and his co-pilot, col. Edwin E. Aldrin Jr. of the air force -- brought their ship to rest on a level, rock-strewn plain .
----- ServeHandle END -----

Stop the ray cluster
```
ray stop
```

Notes:

You can create deployment either with a function or a class using @serve.deployment decorator.
If you want to support both HTTP request and ServeHandle request, it's recommended to use a class for the deployment to separate a internal function that can be called via ServeHandle. (ServeHandle for deployment created with a function with query_params?) Example: summarize function in Router class.
```
print(ray.get(handle.summarize.remote(article_text)))
```

3. Model Composition

Run

python model_composition.py

python model_composition.py
2022-05-26 09:54:33,004 INFO services.py:1456 -- View the Ray dashboard at http://127.0.0.1:8265
(ServeController pid=33393) 2022-05-26 09:54:40,580     INFO checkpoint_path.py:15 -- Using RayInternalKVStore for controller checkpoint and recovery.
(ServeController pid=33393) 2022-05-26 09:54:40,688     INFO http_state.py:106 -- Starting HTTP proxy with name 'SERVE_CONTROLLER_ACTOR:iLqOel:SERVE_PROXY_ACTOR-node:127.0.0.1-0' on node 'node:127.0.0.1-0' listening on '127.0.0.1:8000'
2022-05-26 09:54:42,210 INFO api.py:794 -- Started Serve instance in namespace 'ff4601f8-69e6-4998-a739-04c85414beaa'.
(HTTPProxyActor pid=33400) INFO:     Started server process [33400]
2022-05-26 09:54:42,244 INFO api.py:615 -- Updating deployment 'model_one'. component=serve deployment=model_one
(ServeController pid=33393) 2022-05-26 09:54:42,276     INFO deployment_state.py:1210 -- Adding 1 replicas to deployment 'model_one'. component=serve deployment=model_one
2022-05-26 09:54:44,353 INFO api.py:630 -- Deployment 'model_one' is ready at `http://127.0.0.1:8000/model_one`. component=serve deployment=model_one
2022-05-26 09:54:44,361 INFO api.py:615 -- Updating deployment 'model_two'. component=serve deployment=model_two
(ServeController pid=33393) 2022-05-26 09:54:44,449     INFO deployment_state.py:1210 -- Adding 1 replicas to deployment 'model_two'. component=serve deployment=model_two
2022-05-26 09:54:46,387 INFO api.py:630 -- Deployment 'model_two' is ready at `http://127.0.0.1:8000/model_two`. component=serve deployment=model_two
2022-05-26 09:54:46,398 INFO api.py:615 -- Updating deployment 'ComposedModel'. component=serve deployment=ComposedModel
(ServeController pid=33393) 2022-05-26 09:54:46,423     INFO deployment_state.py:1210 -- Adding 1 replicas to deployment 'ComposedModel'. component=serve deployment=ComposedModel
2022-05-26 09:54:49,431 INFO api.py:630 -- Deployment 'ComposedModel' is ready at `http://127.0.0.1:8000/composed`. component=serve deployment=ComposedModel
(model_one pid=33403) Model 1 called with data:b'Hey!'
(model_two pid=33404) Model 2 called with data:b'Hey!'
{'model_used: 1 & 2;  score': 0.5288903723877777}
{'model_used: 1 ; score': 0.45322933175804514}
{'model_used: 1 ; score': 0.16603929376288173}
{'model_used: 1 & 2;  score': 0.9928666980845869}
(model_one pid=33403) Model 1 called with data:b'Hey!'
(model_one pid=33403) Model 1 called with data:b'Hey!'
(model_one pid=33403) Model 1 called with data:b'Hey!'
(model_one pid=33403) Model 1 called with data:b'Hey!'
(model_two pid=33404) Model 2 called with data:b'Hey!'
(model_two pid=33404) Model 2 called with data:b'Hey!'
{'model_used: 1 & 2;  score': 0.83580165395007}
{'model_used: 1 & 2;  score': 0.8273714159873894}
{'model_used: 1 ; score': 0.4718116502142262}
{'model_used: 1 & 2;  score': 0.8397308071154511}

4. Batch Requesting

Use @serve.batch decorator with async.

@serve.batch
async def my_batch_handler(self, requests: List):
    pass

Run

python batch_request.py

2022-05-28 08:55:45,668 INFO services.py:1456 -- View the Ray dashboard at http://127.0.0.1:8265
(ServeController pid=14405) 2022-05-28 08:55:51,849     INFO checkpoint_path.py:15 -- Using RayInternalKVStore for controller checkpoint and recovery.
(ServeController pid=14405) 2022-05-28 08:55:51,965     INFO http_state.py:106 -- Starting HTTP proxy with name 'SERVE_CONTROLLER_ACTOR:dfCeLG:SERVE_PROXY_ACTOR-node:127.0.0.1-0' on node 'node:127.0.0.1-0' listening on '127.0.0.1:8000'
2022-05-28 08:55:53,217 INFO api.py:794 -- Started Serve instance in namespace 'bbc1baad-f997-4685-954f-46c3a45ccc30'.
2022-05-28 08:55:53,232 INFO api.py:615 -- Updating deployment 'BatchAdder'. component=serve deployment=BatchAdder
(HTTPProxyActor pid=14420) INFO:     Started server process [14420]
(ServeController pid=14405) 2022-05-28 08:55:53,312     INFO deployment_state.py:1210 -- Adding 1 replicas to deployment 'BatchAdder'. component=serve deployment=BatchAdder
2022-05-28 08:55:55,257 INFO api.py:630 -- Deployment 'BatchAdder' is ready at `http://127.0.0.1:8000/adder`. component=serve deployment=BatchAdder
(BatchAdder pid=14423) Our input array has shape: (1,)
(BatchAdder pid=14423) Our input array has shape: (3,)
(BatchAdder pid=14423) Our input array has shape: (1,)
(BatchAdder pid=14423) Our input array has shape: (3,)
(BatchAdder pid=14423) Our input array has shape: (1,)
Result returned: [1, 2, 3, 4, 5, 6, 7, 8, 9]
Input batch is [0, 1, 2, 3, 4, 5, 6, 7, 8]
(BatchAdder pid=14423) Our input array has shape: (2,)
(BatchAdder pid=14423) Our input array has shape: (1,)
(BatchAdder pid=14423) Our input array has shape: (4,)
(BatchAdder pid=14423) Our input array has shape: (2,)
Result batch is [1, 2, 3, 4, 5, 6, 7, 8, 9]
(ServeController pid=14405) 2022-05-28 08:55:57,689     INFO deployment_state.py:1236 -- Removing 1 replicas from deployment 'BatchAdder'. component=serve deployment=BatchAdder

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

04-serve

04-serve

README.md

Ray Serve

1. Quickstart

2. End-to-End Tutorial

3. Model Composition

4. Batch Requesting

References

Files

04-serve

Directory actions

More options

Directory actions

More options

Latest commit

History

04-serve

Folders and files

parent directory

README.md

Ray Serve

1. Quickstart

2. End-to-End Tutorial

3. Model Composition

4. Batch Requesting

References