sam_fast model memory leak #2399

xuzhao9 · 2024-07-31T02:42:09Z

https://github.com/pytorch/benchmark/actions/runs/10168992530/job/28133362873

good: 2.5.0.dev20240729+cu124 (500aea8d5033fd3540c6ed325dd80e7e1420b0f3)
bad: 2.5.0.dev20240730+cu124 (05a8540041cea936a63355c2e38b7b3beb5ce168)
bisect userbenchmark: test_bench
arguments: -m sam_fast -t eval --memleak

Bisection workflow: https://github.com/pytorch/benchmark/actions/runs/10173197195
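For readers unfamiliar with what a check like `--memleak` is looking for, here is a minimal sketch of a generic leak test: run the workload several times and see whether retained memory keeps growing after a warmup. This is an illustration only and not TorchBench's actual implementation; `leaks_memory`, `leaky_step`, `clean_step`, and `retained_bytes` are hypothetical names, and the toy workload stands in for a model's eval step.

```python
import gc
from typing import Callable


def leaks_memory(
    run_once: Callable[[], None],
    measure_bytes: Callable[[], int],
    warmup: int = 1,
    rounds: int = 3,
    tolerance_bytes: int = 0,
) -> bool:
    """Return True if retained memory grows across repeated runs (hypothetical helper)."""
    # Warm up first so one-time allocations (caches, compiled code) are not counted.
    for _ in range(warmup):
        run_once()
    gc.collect()
    baseline = measure_bytes()

    # Repeat the workload; a steady-state workload should return to the baseline.
    for _ in range(rounds):
        run_once()
    gc.collect()
    return measure_bytes() - baseline > tolerance_bytes


# Toy stand-ins for a model step and a memory probe (assumptions for illustration).
_retained = []


def leaky_step() -> None:
    # Keeps a reference to every buffer, so memory grows on each call.
    _retained.append(bytearray(1024))


def clean_step() -> None:
    # Allocates a scratch buffer and releases it before returning.
    scratch = bytearray(1024)
    del scratch


def retained_bytes() -> int:
    return sum(len(buf) for buf in _retained)


leaky_result = leaks_memory(leaky_step, retained_bytes)
clean_result = leaks_memory(clean_step, retained_bytes)
print(leaky_result, clean_result)
```

On a CUDA device, `torch.cuda.memory_allocated` could plausibly be passed as `measure_bytes`, with `run_once` wrapping the model's eval call.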

xuzhao9 · 2024-07-31T12:55:36Z

{
  "target_repo": "pytorch",
  "start": "500aea8d5033fd3540c6ed325dd80e7e1420b0f3",
  "start_version": "2.5.0.dev20240730+cu124",
  "end": "05a8540041cea936a63355c2e38b7b3beb5ce168",
  "end_version": "2.5.0.dev20240730+cu124",
  "result": [
    {
      "commit1": "4c2bcf92cb",
      "commit1_time": "2024-07-29 19:19:54 +0000",
      "commit1_digest": {
        "name": "test_bench",
        "environ": {
          "pytorch_git_version": "4c2bcf92cbecd36b7881904bceb8dc50c9b9741d",
          "pytorch_version": "2.5.0a0+git4c2bcf9",
          "device": "NVIDIA A100-SXM4-40GB",
          "git_commit_hash": "4c2bcf92cbecd36b7881904bceb8dc50c9b9741d"
        },
        "metrics": {
          "model=sam_fast, test=eval, device=cuda, bs=None, extra_args=['--memleak'], metric=memleak": "False"
        }
      },
      "commit2": "f44446e851",
      "commit2_time": "2024-07-29 20:01:51 +0000",
      "commit2_digest": {
        "name": "test_bench",
        "environ": {
          "pytorch_git_version": "f44446e851294569fabb1b7f354618a33e06a75e",
          "pytorch_version": "2.5.0a0+gitf44446e",
          "device": "NVIDIA A100-SXM4-40GB",
          "git_commit_hash": "f44446e851294569fabb1b7f354618a33e06a75e"
        },
        "metrics": {
          "model=sam_fast, test=eval, device=cuda, bs=None, extra_args=['--memleak'], metric=memleak": "True"
        }
      }
    }
  ]
}
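The bisection report above can be read mechanically: the regression lies between the two commits where the `memleak` metric flips from "False" to "True". A minimal sketch of that check follows, assuming the report keeps the structure shown; the embedded JSON is trimmed to the fields the code uses, and `find_flips` is a hypothetical helper, not part of the bisection tooling.

```python
import json

# Trimmed copy of the report above; only the fields used below are kept.
BISECT_REPORT = r"""
{
  "result": [
    {
      "commit1": "4c2bcf92cb",
      "commit1_digest": {
        "metrics": {
          "model=sam_fast, test=eval, device=cuda, bs=None, extra_args=['--memleak'], metric=memleak": "False"
        }
      },
      "commit2": "f44446e851",
      "commit2_digest": {
        "metrics": {
          "model=sam_fast, test=eval, device=cuda, bs=None, extra_args=['--memleak'], metric=memleak": "True"
        }
      }
    }
  ]
}
"""


def find_flips(report: dict, metric_suffix: str = "metric=memleak") -> list:
    """Return (good_commit, bad_commit, metric_key) for each False -> True flip."""
    flips = []
    for entry in report["result"]:
        before = entry["commit1_digest"]["metrics"]
        after = entry["commit2_digest"]["metrics"]
        for key, old_value in before.items():
            if not key.endswith(metric_suffix):
                continue
            # Metric values are stored as the strings "True" / "False".
            if old_value == "False" and after.get(key) == "True":
                flips.append((entry["commit1"], entry["commit2"], key))
    return flips


flips = find_flips(json.loads(BISECT_REPORT))
for good, bad, key in flips:
    print(f"memleak introduced between {good} (good) and {bad} (bad)")
```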

xuzhao9 · 2024-07-31T12:58:47Z

sam_fast uses torch.compile: https://github.com/pytorch-labs/segment-anything-fast/blob/main/segment_anything_fast/build_sam.py#L70

Root cause: pytorch/pytorch@f44446e851

xuzhao9 · 2024-07-31T17:19:41Z

The PR author confirmed the root cause, but a fix will take about a week.

Summary: Until #2399 is resolved
Pull Request resolved: #2402
Reviewed By: xuzhao9
Differential Revision: D60557916
Pulled By: kit1980
fbshipit-source-id: 5729dd8842c0b817e5265c0a1e816af76d37c632

anijain2305 · 2024-08-06T00:17:48Z

cc @mlazos @bdhirsh: could this be related to tensor_dict holding on to references?
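To illustrate the kind of problem being asked about, here is a small sketch of how a dict that holds strong references keeps objects alive after the caller drops them, while a weak-valued mapping does not. This is a generic Python illustration under stated assumptions, not the Dynamo code in question; `FakeTensor` and `is_alive_after_drop` are hypothetical names.

```python
import gc
import weakref


class FakeTensor:
    """Hypothetical stand-in for a tensor; plain classes support weak references."""

    def __init__(self, name: str) -> None:
        self.name = name


def is_alive_after_drop(cache) -> bool:
    """Store an object in `cache`, drop the local reference, and report if it survives."""
    tensor = FakeTensor("activations")
    probe = weakref.ref(tensor)  # lets us observe the object without keeping it alive
    cache["t"] = tensor
    del tensor  # the caller no longer references the object
    gc.collect()
    return probe() is not None


# A regular dict holds a strong reference, so the object is retained.
strong_alive = is_alive_after_drop({})

# A WeakValueDictionary does not keep its values alive on its own.
weak_alive = is_alive_after_drop(weakref.WeakValueDictionary())

print(strong_alive, weak_alive)
```

If a long-lived mapping inside the compiler held tensors this way, memory would stay allocated across runs, which is the pattern a leak check would flag.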

xuzhao9 assigned kit1980 and unassigned kit1980 Jul 31, 2024

xuzhao9 mentioned this issue Jul 31, 2024 in pytorch/pytorch#131696, "[tracker] Open issue with inline_inbuilt_nn_modules" (Open)

kit1980 added a commit that referenced this issue Aug 1, 2024: "Skip memory leak for sam_fast" (1bc7b9a). Until #2399 is resolved

kit1980 mentioned this issue Aug 1, 2024 in #2402, "Skip memory leak for sam_fast" (Closed)

huydhn added this to PyTorch OSS Dev Infra Oct 1, 2024

ZainRizvi moved this to Prioritized in PyTorch OSS Dev Infra Oct 1, 2024

ZainRizvi moved this from Prioritized to Cold Storage in PyTorch OSS Dev Infra Oct 1, 2024
