sam_fast model memory leak #2399

Open
xuzhao9 opened this issue Jul 31, 2024 · 4 comments

xuzhao9 commented Jul 31, 2024

https://github.com/pytorch/benchmark/actions/runs/10168992530/job/28133362873

good: 2.5.0.dev20240729+cu124 (500aea8d5033fd3540c6ed325dd80e7e1420b0f3)
bad: 2.5.0.dev20240730+cu124 (05a8540041cea936a63355c2e38b7b3beb5ce168)
bisect userbenchmark: test_bench
arguments: -m sam_fast -t eval --memleak

Bisection workflow: https://github.com/pytorch/benchmark/actions/runs/10173197195

xuzhao9 commented Jul 31, 2024

{
  "target_repo": "pytorch",
  "start": "500aea8d5033fd3540c6ed325dd80e7e1420b0f3",
  "start_version": "2.5.0.dev20240730+cu124",
  "end": "05a8540041cea936a63355c2e38b7b3beb5ce168",
  "end_version": "2.5.0.dev20240730+cu124",
  "result": [
    {
      "commit1": "4c2bcf92cb",
      "commit1_time": "2024-07-29 19:19:54 +0000",
      "commit1_digest": {
        "name": "test_bench",
        "environ": {
          "pytorch_git_version": "4c2bcf92cbecd36b7881904bceb8dc50c9b9741d",
          "pytorch_version": "2.5.0a0+git4c2bcf9",
          "device": "NVIDIA A100-SXM4-40GB",
          "git_commit_hash": "4c2bcf92cbecd36b7881904bceb8dc50c9b9741d"
        },
        "metrics": {
          "model=sam_fast, test=eval, device=cuda, bs=None, extra_args=['--memleak'], metric=memleak": "False"
        }
      },
      "commit2": "f44446e851",
      "commit2_time": "2024-07-29 20:01:51 +0000",
      "commit2_digest": {
        "name": "test_bench",
        "environ": {
          "pytorch_git_version": "f44446e851294569fabb1b7f354618a33e06a75e",
          "pytorch_version": "2.5.0a0+gitf44446e",
          "device": "NVIDIA A100-SXM4-40GB",
          "git_commit_hash": "f44446e851294569fabb1b7f354618a33e06a75e"
        },
        "metrics": {
          "model=sam_fast, test=eval, device=cuda, bs=None, extra_args=['--memleak'], metric=memleak": "True"
        }
      }
    }
  ]
}
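
For anyone reading the report above programmatically, here is a minimal sketch of how the regressing commit pair could be picked out. It assumes only the JSON layout shown in this comment; the helper name `find_flips` is hypothetical and not part of the benchmark tooling, and the embedded report is a trimmed copy that keeps just the fields the helper reads.

```python
import json


def find_flips(report, metric_suffix="metric=memleak"):
    """Return (commit1, commit2, metric_key) for each metric that goes
    from "False" at commit1 to "True" at commit2.

    Hypothetical helper: it assumes the report layout pasted above.
    """
    flips = []
    for entry in report.get("result", []):
        before = entry["commit1_digest"]["metrics"]
        after = entry["commit2_digest"]["metrics"]
        for key, old_value in before.items():
            if not key.endswith(metric_suffix):
                continue
            # The report stores booleans as the strings "True" / "False".
            if old_value == "False" and after.get(key) == "True":
                flips.append((entry["commit1"], entry["commit2"], key))
    return flips


# Trimmed copy of the bisection report from this comment.
report = json.loads("""
{
  "target_repo": "pytorch",
  "result": [
    {
      "commit1": "4c2bcf92cb",
      "commit1_digest": {
        "metrics": {
          "model=sam_fast, test=eval, device=cuda, bs=None, extra_args=['--memleak'], metric=memleak": "False"
        }
      },
      "commit2": "f44446e851",
      "commit2_digest": {
        "metrics": {
          "model=sam_fast, test=eval, device=cuda, bs=None, extra_args=['--memleak'], metric=memleak": "True"
        }
      }
    }
  ]
}
""")

flips = find_flips(report)
for good, bad, key in flips:
    print(f"memleak flipped between {good} (good) and {bad} (bad)")
```

On this report the helper yields a single pair, 4c2bcf92cb to f44446e851, which matches the range the bisection narrowed down to.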

xuzhao9 assigned kit1980 and unassigned kit1980 Jul 31, 2024

xuzhao9 commented Jul 31, 2024

xuzhao9 commented Jul 31, 2024

The PR author has confirmed the leak, but a fix will take about a week.

kit1980 added a commit that referenced this issue Aug 1, 2024
facebook-github-bot pushed a commit that referenced this issue Aug 1, 2024
Summary:
Until #2399 is resolved

Pull Request resolved: #2402

Reviewed By: xuzhao9

Differential Revision: D60557916

Pulled By: kit1980

fbshipit-source-id: 5729dd8842c0b817e5265c0a1e816af76d37c632

anijain2305 commented

cc @mlazos @bdhirsh: could this be related to tensor_dict holding on to references?
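
To make the hypothesis concrete, here is a minimal pure-Python sketch of how a long-lived dict that keeps strong references stops objects from being freed between iterations. Everything in it is an assumption for illustration: `FakeTensor` stands in for a real tensor, and the module-level `tensor_dict` only borrows the name from the comment above; it is not the actual PyTorch data structure and this is not a confirmed root cause.

```python
import gc
import weakref


class FakeTensor:
    """Stand-in for a tensor (hypothetical); only its lifetime matters here."""

    def __init__(self, nbytes):
        self.nbytes = nbytes


# Hypothetical long-lived cache, named after the comment above.
tensor_dict = {}


def run_once(cache):
    """Create a temporary object, optionally record it in `cache`,
    and return a weak reference so the caller can observe its lifetime."""
    t = FakeTensor(1024)
    if cache is not None:
        cache[id(t)] = t  # strong reference outlives this call
    return weakref.ref(t)


ref_cached = run_once(tensor_dict)  # recorded in the dict
ref_plain = run_once(None)          # not recorded anywhere
gc.collect()

still_alive_while_cached = ref_cached() is not None
freed_when_not_cached = ref_plain() is None

# Dropping the dict entries releases the last strong reference.
tensor_dict.clear()
gc.collect()
freed_after_clear = ref_cached() is None
```

If something like this were the cause, a memory check that compares usage before and after an eval run would see the cached objects as leaked until the dict entries are dropped.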

ZainRizvi moved this to Prioritized in PyTorch OSS Dev Infra Oct 1, 2024
ZainRizvi moved this from Prioritized to Cold Storage in PyTorch OSS Dev Infra Oct 1, 2024