
[Bug]: Cooldown Not Working in LiteLLM #7779

Open
ZPerling opened this issue Jan 15, 2025 · 1 comment
Labels
bug Something isn't working

Comments

ZPerling commented Jan 15, 2025

What happened?

While using LiteLLM, I encountered an issue where the cooldown mechanism isn't functioning correctly. The debug logs show an error occurring when the router attempts to add a model to the cooldown list; the model was being placed in cooldown because of a timeout in the stream response.

Relevant log output

10:48:42 - LiteLLM Router:DEBUG: cooldown_handlers.py:182 - Attempting to add ef8a6269c22774c736474b5562f40781bd9b91f7432c2d804c38a7f205c36208 to cooldown list
10:48:42 - LiteLLM Proxy:DEBUG: proxy_server.py:2882 - An error occurred:

Debug this by setting `--debug`, e.g. `litellm --model gpt-3.5-turbo --debug`
10:48:42 - LiteLLM Router:DEBUG: cooldown_handlers.py:117 - percent fails for deployment = ef8a6269c22774c736474b5562f40781bd9b91f7432c2d804c38a7f205c36208, percent fails = 1.0, num successes = 0, num fails = 8
10:48:42 - LiteLLM Router:DEBUG: cooldown_handlers.py:342 - Unable to cast exception status to int. Defaulting to status=500.
10:48:42 - LiteLLM:DEBUG: utils.py:289 - Custom Logger Error - Traceback (most recent call last):
  File "/usr/lib/python3.13/site-packages/litellm/integrations/custom_logger.py", line 287, in log_event
    callback_func(
        kwargs,  # kwargs to func
        end_time,
    )
  File "/usr/lib/python3.13/site-packages/litellm/router.py", line 3523, in deployment_callback_on_failure
    raise e
  File "/usr/lib/python3.13/site-packages/litellm/router.py", line 3510, in deployment_callback_on_failure
    result = _set_cooldown_deployments(
        litellm_router_instance=self,
  File "/usr/lib/python3.13/site-packages/litellm/router_utils/cooldown_handlers.py", line 198, in _set_cooldown_deployments
    asyncio.create_task(
        router_cooldown_event_callback(
  File "/usr/lib/python3.13/asyncio/tasks.py", line 407, in create_task
    loop = events.get_running_loop()
RuntimeError: no running event loop
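
The traceback pins the failure down: asyncio.create_task() can only schedule a coroutine onto an already-running event loop, and here _set_cooldown_deployments is reached from a synchronous failure callback where no loop is running, so the cooldown event callback never gets scheduled. A minimal sketch of the failure mode and of a common defensive pattern (the function names below are illustrative stand-ins, not LiteLLM's actual code):

import asyncio

async def cooldown_event_callback() -> None:
    # Stand-in for litellm's router_cooldown_event_callback coroutine.
    print("deployment placed in cooldown")

def on_failure_broken() -> None:
    # Mirrors the failing call in _set_cooldown_deployments: create_task()
    # needs a *running* loop, so calling this from plain synchronous code
    # raises "RuntimeError: no running event loop".
    asyncio.create_task(cooldown_event_callback())

def on_failure_safe() -> None:
    # Defensive alternative: schedule on the running loop if one exists,
    # otherwise run the coroutine to completion in a fresh loop.
    try:
        loop = asyncio.get_running_loop()
    except RuntimeError:
        asyncio.run(cooldown_event_callback())
    else:
        loop.create_task(cooldown_event_callback())

if __name__ == "__main__":
    on_failure_safe()      # works: no loop running, falls back to asyncio.run()
    # on_failure_broken()  # would raise RuntimeError: no running event loop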

The model was added to the cooldown due to a timeout in the stream response:

10:48:43 - LiteLLM Proxy:ERROR: proxy_server.py:2872 - litellm.proxy.proxy_server.async_data_generator(): Exception occured -
Traceback (most recent call last):
  File "/usr/lib/python3.13/site-packages/httpx/_transports/default.py", line 69, in map_httpcore_exceptions
    yield
  File "/usr/lib/python3.13/site-packages/httpx/_transports/default.py", line 254, in __aiter__
    async for part in self._httpcore_stream:
        yield part
  File "/usr/lib/python3.13/site-packages/httpcore/_async/connection_pool.py", line 407, in __aiter__
    raise exc from None
  File "/usr/lib/python3.13/site-packages/httpcore/_async/connection_pool.py", line 403, in __aiter__
    async for part in self._stream:
        yield part
  File "/usr/lib/python3.13/site-packages/httpcore/_async/http11.py", line 342, in __aiter__
    raise exc
  File "/usr/lib/python3.13/site-packages/httpcore/_async/http11.py", line 334, in __aiter__
    async for chunk in self._connection._receive_response_body(**kwargs):
        yield chunk
  File "/usr/lib/python3.13/site-packages/httpcore/_async/http11.py", line 203, in _receive_response_body
    event = await self._receive_event(timeout=timeout)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.13/site-packages/httpcore/_async/http11.py", line 217, in _receive_event
    data = await self._network_stream.read(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        self.READ_NUM_BYTES, timeout=timeout
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/usr/lib/python3.13/site-packages/httpcore/_backends/anyio.py", line 32, in read
    with map_exceptions(exc_map):
         ~~~~~~~~~~~~~~^^^^^^^^^
  File "/usr/lib/python3.13/contextlib.py", line 162, in __exit__
    self.gen.throw(value)
    ~~~~~~~~~~~~~~^^^^^^^
  File "/usr/lib/python3.13/site-packages/httpcore/_exceptions.py", line 14, in map_exceptions
    raise to_exc(exc) from exc
httpcore.ReadTimeout
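
For completeness, the cooldown itself was triggered by this httpcore.ReadTimeout while reading the streamed response body. If the timeouts are too aggressive for your deployment, LiteLLM's Router exposes knobs for both the request timeouts and the cooldown thresholds; a hedged sketch (parameter names follow LiteLLM's Router docs, model and key values are placeholders):

from litellm import Router

router = Router(
    model_list=[
        {
            "model_name": "gpt-3.5-turbo",
            "litellm_params": {
                "model": "openai/gpt-3.5-turbo",
                "api_key": "sk-...",    # placeholder
                "timeout": 600,         # overall request timeout, in seconds
                "stream_timeout": 60,   # timeout applied to streamed responses
            },
        }
    ],
    allowed_fails=3,   # failures tolerated before a deployment is cooled down
    cooldown_time=60,  # seconds a deployment stays in cooldown
)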

Are you an ML Ops Team?

No

What LiteLLM version are you on?

v1.58.2

Twitter / LinkedIn details

No response

ZPerling added the bug label on Jan 15, 2025
ZPerling (Author) commented

I wanted to follow up on this issue to see if there has been any progress, or if there is any additional information I can provide to help with the investigation. This issue is disrupting our use of the cooldown functionality, and any updates or guidance would be greatly appreciated.
