RAM leak in version 4.0.0 #520

Open

AlexValue opened this issue Feb 3, 2025 · 0 comments
Labels
bug Something isn't working

Comments

AlexValue commented Feb 3, 2025

Describe the bug

After the latest update (to version 4.0.0), the amount of RAM consumed when running AI models has at least doubled. This causes the application to be killed because it consumes all available RAM. RAM also keeps leaking after answer generation has finished. I can no longer run even small 2B models, although previously I could run models as large as 14B.
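
In case it helps to reproduce and measure this, here is a minimal monitoring sketch (my own, not part of Alpaca): it assumes the bundled Ollama server is reachable on 127.0.0.1:11435 as in the debug log further down, reads VmRSS of the ollama processes from /proc, and samples it before, right after, and one minute after a single chat request. The model name is just a placeholder.

```python
# Hypothetical repro sketch: sample resident memory (RSS) of local "ollama"
# processes around one chat request. Assumes Linux (/proc) and the bundled
# Ollama server on 127.0.0.1:11435, as shown in the debug log below.
import json
import os
import time
import urllib.request

OLLAMA_CHAT = "http://127.0.0.1:11435/api/chat"  # port taken from the log
MODEL = "llama3.2:1b"                            # placeholder: any small model

def ollama_rss_mib():
    """Sum VmRSS (in MiB) of every process whose name contains 'ollama'."""
    total_kib = 0
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/comm") as f:
                if "ollama" not in f.read():
                    continue
            with open(f"/proc/{pid}/status") as f:
                for line in f:
                    if line.startswith("VmRSS:"):
                        total_kib += int(line.split()[1])
        except OSError:
            continue  # process exited while being inspected
    return total_kib / 1024

print(f"before request:     {ollama_rss_mib():.0f} MiB")

payload = json.dumps({
    "model": MODEL,
    "messages": [{"role": "user", "content": "Write two sentences about RAM."}],
    "stream": False,
}).encode()
req = urllib.request.Request(OLLAMA_CHAT, data=payload,
                             headers={"Content-Type": "application/json"})
urllib.request.urlopen(req, timeout=600).read()

print(f"right after answer: {ollama_rss_mib():.0f} MiB")
time.sleep(60)  # usage reportedly keeps growing after generation finishes
print(f"one minute later:   {ollama_rss_mib():.0f} MiB")
```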

Expected behavior

The application should use as little RAM as possible. The RAM leak needs to be found and fixed.

Screenshots

The first screenshot shows (in the upper right corner) how much RAM is available before the response is generated.

[screenshot]

The second screenshot shows the moment just before the application crashed: the model had managed to generate a response, but RAM continued to leak. Larger models don't even manage to finish generating an answer before the crash.

[screenshot]

Debugging information

Debugging information from 'About Alpaca' > 'Troubleshooting' > 'Debugging Information':

Couldn't find '/home/alex/.ollama/id_ed25519'. Generating new private key.
Your new public key is: 

ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIMMjY/wJfj+6XqcZjnvRnh4pvLIZh4+Aqd+PzIBzUc7Z

2025/02/03 18:57:59 routes.go:1187: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11435 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/alex/.var/app/com.jeffser.Alpaca/data/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-02-03T18:57:59.907+05:00 level=INFO source=images.go:432 msg="total blobs: 45"
time=2025-02-03T18:57:59.909+05:00 level=INFO source=images.go:439 msg="total unused blobs removed: 0"
[GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.

[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
 - using env:	export GIN_MODE=release
 - using code:	gin.SetMode(gin.ReleaseMode)

[GIN-debug] POST   /api/pull                 --> github.com/ollama/ollama/server.(*Server).PullHandler-fm (5 handlers)
[GIN-debug] POST   /api/generate             --> github.com/ollama/ollama/server.(*Server).GenerateHandler-fm (5 handlers)
[GIN-debug] POST   /api/chat                 --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (5 handlers)
[GIN-debug] POST   /api/embed                --> github.com/ollama/ollama/server.(*Server).EmbedHandler-fm (5 handlers)
[GIN-debug] POST   /api/embeddings           --> github.com/ollama/ollama/server.(*Server).EmbeddingsHandler-fm (5 handlers)
[GIN-debug] POST   /api/create               --> github.com/ollama/ollama/server.(*Server).CreateHandler-fm (5 handlers)
[GIN-debug] POST   /api/push                 --> github.com/ollama/ollama/server.(*Server).PushHandler-fm (5 handlers)
[GIN-debug] POST   /api/copy                 --> github.com/ollama/ollama/server.(*Server).CopyHandler-fm (5 handlers)
[GIN-debug] DELETE /api/delete               --> github.com/ollama/ollama/server.(*Server).DeleteHandler-fm (5 handlers)
[GIN-debug] POST   /api/show                 --> github.com/ollama/ollama/server.(*Server).ShowHandler-fm (5 handlers)
[GIN-debug] POST   /api/blobs/:digest        --> github.com/ollama/ollama/server.(*Server).CreateBlobHandler-fm (5 handlers)
[GIN-debug] HEAD   /api/blobs/:digest        --> github.com/ollama/ollama/server.(*Server).HeadBlobHandler-fm (5 handlers)
[GIN-debug] GET    /api/ps                   --> github.com/ollama/ollama/server.(*Server).PsHandler-fm (5 handlers)
[GIN-debug] POST   /v1/chat/completions      --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (6 handlers)
[GIN-debug] POST   /v1/completions           --> github.com/ollama/ollama/server.(*Server).GenerateHandler-fm (6 handlers)
[GIN-debug] POST   /v1/embeddings            --> github.com/ollama/ollama/server.(*Server).EmbedHandler-fm (6 handlers)
[GIN-debug] GET    /v1/models                --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (6 handlers)
[GIN-debug] GET    /v1/models/:model         --> github.com/ollama/ollama/server.(*Server).ShowHandler-fm (6 handlers)
[GIN-debug] GET    /                         --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func1 (5 handlers)
[GIN-debug] GET    /api/tags                 --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (5 handlers)
[GIN-debug] GET    /api/version              --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
[GIN-debug] HEAD   /                         --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func1 (5 handlers)
[GIN-debug] HEAD   /api/tags                 --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (5 handlers)
[GIN-debug] HEAD   /api/version              --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
time=2025-02-03T18:57:59.911+05:00 level=INFO source=routes.go:1238 msg="Listening on 127.0.0.1:11435 (version 0.5.7)"
time=2025-02-03T18:57:59.911+05:00 level=INFO source=routes.go:1267 msg="Dynamic LLM libraries" runners="[rocm_avx cpu cpu_avx cpu_avx2 cuda_v11_avx cuda_v12_avx]"
time=2025-02-03T18:57:59.911+05:00 level=INFO source=gpu.go:226 msg="looking for compatible GPUs"
time=2025-02-03T18:57:59.922+05:00 level=WARN source=amd_linux.go:61 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2025-02-03T18:57:59.929+05:00 level=WARN source=amd_linux.go:378 msg="amdgpu is not supported (supported types:[gfx1010 gfx1030 gfx1100 gfx1101 gfx1102 gfx900 gfx906 gfx908 gfx90a gfx940 gfx941 gfx942])" gpu_type=gfx902 gpu=0 library=/app/plugins/AMD/lib/ollama
time=2025-02-03T18:57:59.929+05:00 level=WARN source=amd_linux.go:385 msg="See https://github.com/ollama/ollama/blob/main/docs/gpu.md#overrides for HSA_OVERRIDE_GFX_VERSION usage"
time=2025-02-03T18:57:59.929+05:00 level=INFO source=amd_linux.go:404 msg="no compatible amdgpu devices detected"
time=2025-02-03T18:57:59.929+05:00 level=INFO source=gpu.go:392 msg="no compatible GPUs were discovered"
time=2025-02-03T18:57:59.929+05:00 level=INFO source=types.go:131 msg="inference compute" id=0 library=cpu variant=avx2 compute="" driver=0.0 name="" total="13.5 GiB" available="10.0 GiB"
[GIN] 2025/02/03 - 18:57:59 | 200 |    5.563985ms |       127.0.0.1 | GET      "/api/tags"
[GIN] 2025/02/03 - 18:58:00 | 200 |  115.017359ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2025/02/03 - 18:58:00 | 200 |  255.479812ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2025/02/03 - 18:58:00 | 200 |  260.558241ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2025/02/03 - 18:58:00 | 200 |  246.655899ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2025/02/03 - 18:58:00 | 200 |  264.307692ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2025/02/03 - 18:58:00 | 200 |  293.296434ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2025/02/03 - 18:58:00 | 200 |  291.016702ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2025/02/03 - 18:58:00 | 200 |   286.62801ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2025/02/03 - 18:58:00 | 200 |  358.653767ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2025/02/03 - 18:58:00 | 200 |  258.851819ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2025/02/03 - 18:58:00 | 200 |  312.296069ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2025/02/03 - 18:58:00 | 200 |  305.906667ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2025/02/03 - 18:58:00 | 200 |  358.504098ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2025/02/03 - 18:58:00 | 200 |  337.877769ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2025/02/03 - 18:58:00 | 200 |  287.382788ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2025/02/03 - 18:58:00 | 200 |  346.950766ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2025/02/03 - 18:58:00 | 200 |  415.590881ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2025/02/03 - 18:58:00 | 200 |  363.174196ms |       127.0.0.1 | POST     "/api/show"
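
As a side note, while this is being investigated, the Ollama API itself can show what is still resident and can be asked to unload a model immediately instead of waiting for the default OLLAMA_KEEP_ALIVE of 5 minutes visible in the log above. Below is a small sketch of that (again assuming the bundled server on 127.0.0.1:11435; the model name is a placeholder). It only shows whether memory is released on an explicit unload; it does not fix the leak.

```python
# Sketch: list models still loaded by the bundled Ollama server and ask it
# to unload one right away by sending keep_alive = 0. Assumes the server
# from the log above (127.0.0.1:11435); the model name is a placeholder.
import json
import urllib.request

BASE = "http://127.0.0.1:11435"

def call(path, payload=None):
    data = json.dumps(payload).encode() if payload is not None else None
    req = urllib.request.Request(BASE + path, data=data,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read() or b"{}")

# 1. See what is still resident after a chat has finished.
for m in call("/api/ps").get("models", []):
    print(m["name"], f'{m.get("size", 0) / 2**30:.1f} GiB resident')

# 2. Request an immediate unload instead of the default 5 minute keep-alive.
call("/api/generate", {"model": "llama3.2:1b", "keep_alive": 0})

print("after unload request:")
for m in call("/api/ps").get("models", []):
    print(m["name"])
```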

AlexValue added the bug label Feb 3, 2025