- Install `ray[serve]`:

```bash
pip install "ray[serve]"
```
- Prepare `quickstart.py`:

```python
from time import sleep

import ray
from ray import serve

ray.init()     # 1. Init cluster
serve.start()  # 2. Start Serve


@serve.deployment  # 3. Define deployment
class Counter:
    def __init__(self):
        self.count = 0

    def __call__(self, request):
        self.count += 1
        return {"count": self.count}


Counter.deploy()  # 4. Deploy
sleep(100)  # Keep the script alive so the deployment stays up
```
- Run:

```bash
python quickstart.py
```

```
2022-05-22 06:46:24,611 INFO services.py:1456 -- View the Ray dashboard at http://127.0.0.1:8265
(ServeController pid=27612) 2022-05-22 06:46:29,494 INFO checkpoint_path.py:15 -- Using RayInternalKVStore for controller checkpoint and recovery.
(ServeController pid=27612) 2022-05-22 06:46:29,609 INFO http_state.py:106 -- Starting HTTP proxy with name 'SERVE_CONTROLLER_ACTOR:DylMax:SERVE_PROXY_ACTOR-node:127.0.0.1-0' on node 'node:127.0.0.1-0' listening on '127.0.0.1:8000'
2022-05-22 06:46:30,915 INFO api.py:794 -- Started Serve instance in namespace '9c985508-1c83-47f5-a078-ca1faa3ed450'.
(HTTPProxyActor pid=27613) INFO: Started server process [27613]
2022-05-22 06:46:30,934 INFO api.py:615 -- Updating deployment 'Counter'. component=serve deployment=Counter
(ServeController pid=27612) 2022-05-22 06:46:31,015 INFO deployment_state.py:1210 -- Adding 1 replicas to deployment 'Counter'. component=serve deployment=Counter
2022-05-22 06:46:35,004 INFO api.py:630 -- Deployment 'Counter' is ready at `http://127.0.0.1:8000/Counter`. component=serve deployment=Counter
```
- Check http://127.0.0.1:8000/Counter.

Get the counter:

```bash
curl -X GET localhost:8000/Counter/
```

```
{"count": 0}
```
You can also try an extended quickstart with `quickstart_extended.py`.
Summarize a long English text with the `transformers` library.
- Install the libraries (see https://huggingface.co/docs/transformers/index):

```bash
pip install transformers
pip install 'transformers[torch]'
```
- Run locally:

```bash
python local_model.py
```

```
Downloading: 100%|█████████████████████████████████████████████████████████| 231M/231M [00:43<00:00, 5.61MB/s]
Downloading: 100%|██████████████████████████████████████████████████████████| 773k/773k [00:02<00:00, 298kB/s]
Downloading: 100%|████████████████████████████████████████████████████████| 1.32M/1.32M [00:07<00:00, 184kB/s]
/Users/nakamasato/.pyenv/versions/3.9.0/lib/python3.9/site-packages/transformers/models/t5/tokenization_t5_fast.py:155: FutureWarning: This tokenizer was incorrectly instantiated with a model max length of 512 which will be corrected in Transformers v5. For now, this behavior is kept to avoid breaking backwards compatibility when padding/encoding with `truncation is True`.
- Be aware that you SHOULD NOT rely on t5-small automatically truncating your input to 512 when padding/encoding.
- If you want to encode/pad to sequences longer than 512 you can either instantiate this tokenizer with `model_max_length` or pass `max_length` when encoding/padding.
- To avoid this warning, please instantiate this tokenizer with `model_max_length` set to your preferred value.
  warnings.warn(
two astronauts steered their fragile lunar module safely and smoothly to the historic landing . the first men to reach the moon -- Armstrong and his co-pilot, col. Edwin E. Aldrin Jr. of the air force -- brought their ship to rest on a level, rock-strewn plain .
```
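`local_model.py` itself is not listed here. A minimal sketch of what it might contain, assuming the `transformers` summarization pipeline with the `t5-small` checkpoint (consistent with the tokenizer warning above); the article text is a truncated placeholder:

```python
# local_model.py (sketch): summarize a long English text locally.
# Assumptions: the t5-small checkpoint and the summarization pipeline;
# article_text stands in for the full text used in the repo.
from transformers import pipeline

summarizer = pipeline("summarization", model="t5-small")

article_text = (
    "HOUSTON -- Men have landed and walked on the moon. Two Americans, "
    "astronauts of Apollo 11, steered their fragile four-legged lunar module "
    "safely and smoothly to the historic landing ..."  # truncated placeholder
)

# do_sample=False keeps the output deterministic; the length limits are assumptions
result = summarizer(article_text, max_length=130, min_length=30, do_sample=False)
print(result[0]["summary_text"])
```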
Deploy this model using Ray Serve:
- Create a local Ray cluster:

```bash
ray start --head
```
- Run:

```bash
python model_on_ray_serve.py
```
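`model_on_ray_serve.py` is not listed either; a minimal sketch, assuming a `Router` deployment class with a `summarize` method (the names that appear in the Notes and client below):

```python
# model_on_ray_serve.py (sketch): wrap the summarizer in a Serve deployment.
# Assumptions: class name Router and method summarize (from the Notes below);
# t5-small as in local_model.py.
import ray
from ray import serve
from transformers import pipeline

ray.init(address="auto")    # connect to the cluster from `ray start --head`
serve.start(detached=True)  # keep Serve running after this script exits


@serve.deployment
class Router:
    def __init__(self):
        self.summarizer = pipeline("summarization", model="t5-small")

    def summarize(self, text: str) -> str:
        # Internal method, callable directly through a ServeHandle
        return self.summarizer(text)[0]["summary_text"]

    async def __call__(self, request):
        # HTTP entrypoint: the article text is sent as the request body
        return self.summarize((await request.body()).decode())


Router.deploy()
```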
- Send a request (either via HTTP or Python):

```bash
python router_client.py
```

```
----- HTTP START -----
two astronauts steered their fragile lunar module safely and smoothly to the historic landing . the first men to reach the moon -- Armstrong and his co-pilot, col. Edwin E. Aldrin Jr. of the air force -- brought their ship to rest on a level, rock-strewn plain .
----- HTTP END -----
----- ServeHandle START -----
two astronauts steered their fragile lunar module safely and smoothly to the historic landing . the first men to reach the moon -- Armstrong and his co-pilot, col. Edwin E. Aldrin Jr. of the air force -- brought their ship to rest on a level, rock-strewn plain .
----- ServeHandle END -----
```
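A sketch of what `router_client.py` might look like, matching the output above (the `/Router` route and the `serve` namespace are assumptions based on Serve's defaults):

```python
# router_client.py (sketch): query the deployment via HTTP and via ServeHandle.
# Assumptions: the Router deployment from the sketch above, served at /Router
# (Serve's default route is the deployment name), detached namespace "serve".
import ray
import requests
from ray import serve

article_text = "..."  # the same long English text as in local_model.py

# 1. HTTP request
print("----- HTTP START -----")
resp = requests.post("http://127.0.0.1:8000/Router", data=article_text)
print(resp.text)
print("----- HTTP END -----")

# 2. ServeHandle request
ray.init(address="auto", namespace="serve")
handle = serve.get_deployment("Router").get_handle()
print("----- ServeHandle START -----")
print(ray.get(handle.summarize.remote(article_text)))
print("----- ServeHandle END -----")
```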
- Stop the Ray cluster:

```bash
ray stop
```
Notes:

- You can create a deployment from either a function or a class with the `@serve.deployment` decorator.
- If you want to support both HTTP requests and ServeHandle requests, it's recommended to use a class for the deployment, so that an internal method can be separated out and called via ServeHandle. (Open question: can a ServeHandle call a deployment created from a function, passing query_params?)
Example: the `summarize` method in the `Router` class:

```python
print(ray.get(handle.summarize.remote(article_text)))
```
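To make the function-vs-class distinction concrete, a minimal sketch (the `hello` function deployment is hypothetical; `Router.summarize` mirrors the example above):

```python
from ray import serve


@serve.deployment
def hello(request):
    # Function deployment: it is the deployment's single entrypoint,
    # so a handle can only invoke it as handle.remote(...)
    return "Hello!"


@serve.deployment
class Router:
    def summarize(self, text: str) -> str:
        # Named method: reachable via handle.summarize.remote(...)
        return text[:50]  # placeholder logic

    async def __call__(self, request):
        # HTTP entrypoint for the same logic
        return self.summarize((await request.body()).decode())
```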
Run the model composition example:

```bash
python model_composition.py
```

```
2022-05-26 09:54:33,004 INFO services.py:1456 -- View the Ray dashboard at http://127.0.0.1:8265
(ServeController pid=33393) 2022-05-26 09:54:40,580 INFO checkpoint_path.py:15 -- Using RayInternalKVStore for controller checkpoint and recovery.
(ServeController pid=33393) 2022-05-26 09:54:40,688 INFO http_state.py:106 -- Starting HTTP proxy with name 'SERVE_CONTROLLER_ACTOR:iLqOel:SERVE_PROXY_ACTOR-node:127.0.0.1-0' on node 'node:127.0.0.1-0' listening on '127.0.0.1:8000'
2022-05-26 09:54:42,210 INFO api.py:794 -- Started Serve instance in namespace 'ff4601f8-69e6-4998-a739-04c85414beaa'.
(HTTPProxyActor pid=33400) INFO: Started server process [33400]
2022-05-26 09:54:42,244 INFO api.py:615 -- Updating deployment 'model_one'. component=serve deployment=model_one
(ServeController pid=33393) 2022-05-26 09:54:42,276 INFO deployment_state.py:1210 -- Adding 1 replicas to deployment 'model_one'. component=serve deployment=model_one
2022-05-26 09:54:44,353 INFO api.py:630 -- Deployment 'model_one' is ready at `http://127.0.0.1:8000/model_one`. component=serve deployment=model_one
2022-05-26 09:54:44,361 INFO api.py:615 -- Updating deployment 'model_two'. component=serve deployment=model_two
(ServeController pid=33393) 2022-05-26 09:54:44,449 INFO deployment_state.py:1210 -- Adding 1 replicas to deployment 'model_two'. component=serve deployment=model_two
2022-05-26 09:54:46,387 INFO api.py:630 -- Deployment 'model_two' is ready at `http://127.0.0.1:8000/model_two`. component=serve deployment=model_two
2022-05-26 09:54:46,398 INFO api.py:615 -- Updating deployment 'ComposedModel'. component=serve deployment=ComposedModel
(ServeController pid=33393) 2022-05-26 09:54:46,423 INFO deployment_state.py:1210 -- Adding 1 replicas to deployment 'ComposedModel'. component=serve deployment=ComposedModel
2022-05-26 09:54:49,431 INFO api.py:630 -- Deployment 'ComposedModel' is ready at `http://127.0.0.1:8000/composed`. component=serve deployment=ComposedModel
(model_one pid=33403) Model 1 called with data:b'Hey!'
(model_two pid=33404) Model 2 called with data:b'Hey!'
{'model_used: 1 & 2; score': 0.5288903723877777}
{'model_used: 1 ; score': 0.45322933175804514}
{'model_used: 1 ; score': 0.16603929376288173}
{'model_used: 1 & 2; score': 0.9928666980845869}
(model_one pid=33403) Model 1 called with data:b'Hey!'
(model_one pid=33403) Model 1 called with data:b'Hey!'
(model_one pid=33403) Model 1 called with data:b'Hey!'
(model_one pid=33403) Model 1 called with data:b'Hey!'
(model_two pid=33404) Model 2 called with data:b'Hey!'
(model_two pid=33404) Model 2 called with data:b'Hey!'
{'model_used: 1 & 2; score': 0.83580165395007}
{'model_used: 1 & 2; score': 0.8273714159873894}
{'model_used: 1 ; score': 0.4718116502142262}
{'model_used: 1 & 2; score': 0.8397308071154511}
```
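`model_composition.py` is not listed; a sketch reconstructed from the deployment names, routes, and log lines above (the 0.5 threshold and the random scores are assumptions consistent with the printed results):

```python
# model_composition.py (sketch): compose two models behind one endpoint.
# Assumptions: random scores and the 0.5 threshold; the deployment names
# and the /composed route are taken from the log output above.
import random

import ray
import requests
from ray import serve

ray.init()
serve.start()


@serve.deployment
def model_one(data):
    print(f"Model 1 called with data:{data}")
    return random.random()


@serve.deployment
def model_two(data):
    print(f"Model 2 called with data:{data}")
    return random.random()


@serve.deployment(route_prefix="/composed")
class ComposedModel:
    def __init__(self):
        # sync=False returns async handles usable inside the event loop
        self.model_one = model_one.get_handle(sync=False)
        self.model_two = model_two.get_handle(sync=False)

    async def __call__(self, request):
        data = await request.body()
        # First await yields the ObjectRef, second await yields its value
        score = await (await self.model_one.remote(data=data))
        if score < 0.5:
            return {"model_used: 1 ; score": score}
        # Only forward to model 2 when model 1's score is high enough
        score = await (await self.model_two.remote(data=data))
        return {"model_used: 1 & 2; score": score}


model_one.deploy()
model_two.deploy()
ComposedModel.deploy()

for _ in range(8):
    resp = requests.get("http://127.0.0.1:8000/composed", data="Hey!")
    print(resp.json())
```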
Use the `@serve.batch` decorator with an `async` method:

```python
from typing import List


@serve.batch
async def my_batch_handler(self, requests: List):
    pass
```
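`batch_request.py` is not listed; a sketch consistent with the log output below (`max_batch_size=4` is an assumption matching the largest batch shape printed):

```python
# batch_request.py (sketch): batch scalar requests into numpy arrays.
# Assumptions: deployment name BatchAdder at route /adder (from the log);
# max_batch_size=4 matches the largest batch shape printed below.
from typing import List

import numpy as np
import ray
import requests
from ray import serve

ray.init()
serve.start()


@serve.deployment(route_prefix="/adder")
class BatchAdder:
    @serve.batch(max_batch_size=4)
    async def handle_batch(self, numbers: List[int]):
        input_array = np.array(numbers)
        print("Our input array has shape:", input_array.shape)
        # Each call in the batch receives its own element of the result
        return (input_array + 1).tolist()

    async def __call__(self, request):
        return await self.handle_batch(int(request.query_params["number"]))


BatchAdder.deploy()

# HTTP: one request per number; Serve may batch concurrent requests server-side
results = [
    requests.get(f"http://127.0.0.1:8000/adder?number={i}").json()
    for i in range(9)
]
print("Result returned:", results)

# ServeHandle: call the batched method directly
handle = BatchAdder.get_handle()
input_batch = list(range(9))
print("Input batch is", input_batch)
print("Result batch is", ray.get([handle.handle_batch.remote(i) for i in input_batch]))
```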
Run:

```bash
python batch_request.py
```

```
2022-05-28 08:55:45,668 INFO services.py:1456 -- View the Ray dashboard at http://127.0.0.1:8265
(ServeController pid=14405) 2022-05-28 08:55:51,849 INFO checkpoint_path.py:15 -- Using RayInternalKVStore for controller checkpoint and recovery.
(ServeController pid=14405) 2022-05-28 08:55:51,965 INFO http_state.py:106 -- Starting HTTP proxy with name 'SERVE_CONTROLLER_ACTOR:dfCeLG:SERVE_PROXY_ACTOR-node:127.0.0.1-0' on node 'node:127.0.0.1-0' listening on '127.0.0.1:8000'
2022-05-28 08:55:53,217 INFO api.py:794 -- Started Serve instance in namespace 'bbc1baad-f997-4685-954f-46c3a45ccc30'.
2022-05-28 08:55:53,232 INFO api.py:615 -- Updating deployment 'BatchAdder'. component=serve deployment=BatchAdder
(HTTPProxyActor pid=14420) INFO: Started server process [14420]
(ServeController pid=14405) 2022-05-28 08:55:53,312 INFO deployment_state.py:1210 -- Adding 1 replicas to deployment 'BatchAdder'. component=serve deployment=BatchAdder
2022-05-28 08:55:55,257 INFO api.py:630 -- Deployment 'BatchAdder' is ready at `http://127.0.0.1:8000/adder`. component=serve deployment=BatchAdder
(BatchAdder pid=14423) Our input array has shape: (1,)
(BatchAdder pid=14423) Our input array has shape: (3,)
(BatchAdder pid=14423) Our input array has shape: (1,)
(BatchAdder pid=14423) Our input array has shape: (3,)
(BatchAdder pid=14423) Our input array has shape: (1,)
Result returned: [1, 2, 3, 4, 5, 6, 7, 8, 9]
Input batch is [0, 1, 2, 3, 4, 5, 6, 7, 8]
(BatchAdder pid=14423) Our input array has shape: (2,)
(BatchAdder pid=14423) Our input array has shape: (1,)
(BatchAdder pid=14423) Our input array has shape: (4,)
(BatchAdder pid=14423) Our input array has shape: (2,)
Result batch is [1, 2, 3, 4, 5, 6, 7, 8, 9]
(ServeController pid=14405) 2022-05-28 08:55:57,689 INFO deployment_state.py:1236 -- Removing 1 replicas from deployment 'BatchAdder'. component=serve deployment=BatchAdder
```