Memory spike issue with Next.js 15.1.4 on Azure #74855

Open
skerdi-zogaj opened this issue Jan 14, 2025 · 35 comments
Labels: linear: next (Confirmed issue that is tracked by the Next.js team)

Comments

@skerdi-zogaj commented Jan 14, 2025

Verify canary release

  • I verified that the issue exists in the latest Next.js canary release

Provide environment information

Operating System:
  Platform: darwin
  Arch: arm64
  Version: Darwin Kernel Version 24.2.0: Fri Dec  6 18:51:28 PST 2024; root:xnu-11215.61.5~2/RELEASE_ARM64_T8112
  Available memory (MB): 16384
  Available CPU cores: 8
Binaries:
  Node: 20.13.1
  npm: 10.8.1
  Yarn: 1.22.22
  pnpm: N/A
Relevant Packages:
  next: 15.1.4 // Latest available version is detected (15.1.4).
  eslint-config-next: 14.2.3
  react: 18.3.1
  react-dom: 18.3.1
  typescript: 5.4.5
Next.js Config:
  output: standalone

Which example does this report relate to?

This issue is not related to any specific example in the examples folder. The problem occurs in a general Next.js application deployed on Azure.

What browser are you using? (if relevant)

No response

How are you deploying your application? (if relevant)

No response

Describe the Bug

We are experiencing a significant memory spike and auto-scaling issues when using Next.js 15.1 in our Azure deployments. Memory usage increases unpredictably under typical traffic conditions, leading to higher resource utilization and triggering unnecessary auto-scaling.

When downgrading to Next.js 14.2, these issues are resolved, and memory usage returns to stable levels. This suggests a regression introduced in version 15.1.

Graphs comparing memory usage for versions 15.1 and 14.2 are attached below for reference.

Image

Image

Expected Behavior

Memory usage should remain stable and consistent under typical traffic conditions when using Next.js 15.1, similar to the behavior observed in Next.js 14.2.

To Reproduce

1. Deploy a Next.js 15.1 application on Azure with typical production traffic patterns.
2. Monitor the memory usage and auto-scaling behavior using Azure's monitoring tools.
3. Observe that memory usage increases significantly and unpredictably, causing auto-scaling to trigger even under normal load.
4. Downgrade the application to Next.js 14.2.
5. Re-monitor the application, noticing that memory usage stabilizes and auto-scaling behaves as expected.

Image

Image

@skerdi-zogaj added the examples label (Issue was opened via the examples template) Jan 14, 2025
@frankbo commented Jan 14, 2025

We are facing a similar issue in our Kubernetes pods. With Next.js 15.1.3 the memory consumption was/is fine and the pods run as expected, but with the update to 15.1.4 the pods consume more and more memory and die at some point. Normally our pods use 110 MB of memory and are fine with that. With the update to Next.js 15.1.4 they start at 115 MB, the memory climbs up to 300 MB, then they die and it starts over again. From the graphs it looks like a memory leak, or at least something that consumes more memory than necessary over time.
Let me know if you need further information.

@skerdi03 did you try version 15.1.3 as well? Do you face the same issues there?

@vitalyiegorov commented Jan 15, 2025

We are facing the same issue with the node:18-alpine image on NextJS 15.1.4

@cjcheshire

I don't want to just add a "me too"… node:20-alpine + Next.js 15.1.4. We have dropped back to Next.js 15.1.3 to see if it settles.

It's hard to create an example repo for this. We have 40+ page templates (1400 pages) that use unstable_cache to revalidate every 10 minutes. We also have a couple of APIs.

You can see when we deployed 15.1.4:

Image
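
For reference, the unstable_cache pattern described above looks roughly like this. This is only a sketch; the file name, CMS URL, and helper names are made up for illustration:

```js
// lib/get-page.js: illustrative sketch only (hypothetical names and URL)
import { unstable_cache } from 'next/cache'

// Cache the CMS fetch per slug and revalidate every 10 minutes,
// matching the revalidation interval described above.
export const getPage = (slug) =>
  unstable_cache(
    async () => {
      const res = await fetch(`https://cms.example.com/pages/${slug}`)
      return res.json()
    },
    ['page', slug],
    { revalidate: 600 }
  )()
```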

@skerdi-zogaj (Author)

> We are facing a similar issue in our Kubernetes pods. With Next.js 15.1.3 the memory consumption was/is fine and the pods run as expected, but with the update to 15.1.4 the pods consume more and more memory and die at some point. Normally our pods use 110 MB of memory and are fine with that. With the update to Next.js 15.1.4 they start at 115 MB, the memory climbs up to 300 MB, then they die and it starts over again. From the graphs it looks like a memory leak, or at least something that consumes more memory than necessary over time. Let me know if you need further information.
>
> @skerdi03 did you try version 15.1.3 as well? Do you face the same issues there?

Yes, we had the same issue with 15.1.3 as well.

@justinadkins

We've experienced memory issues since 15.1.x; 15.0.x is stable for us. Deployed on AWS via FlightControl. We're running Node 18 right now.

@samcx added the bug label and removed the examples and bug labels Jan 16, 2025
@lubieowoce (Member) commented Jan 17, 2025

We've identified one leak that became much more noticeable in 15.1.4: since that release, each revalidation (in `next start`) would leak a promise that would then stick around forever. The fix is here: #75041. We're going to ship a patch with this soon and update this thread when it's out.

github-actions bot added the linear: next label (Confirmed issue that is tracked by the Next.js team) Jan 20, 2025
@huozhi (Member) commented Jan 22, 2025

The fix mentioned above has landed in v15.1.6. Please upgrade and let us know if that fixes your issue 🙏 Thanks!

@sbehrends

For us, it looks like the latest version fixed the issue.

Here is memory usage before and after the update.
Image

@DonikaV commented Jan 22, 2025

As I can see on the memory usage graphs, the problem has been fixed in 15.1.6.

@huozhi closed this as completed Jan 22, 2025
@farzadsoltani

I have noticed that it's been climbing much more slowly than before, but I can definitely still see higher RAM usage compared to 15.0.x and a much bigger jump compared to 14.x.

On 14.x, we would average around 300-400 MB per pod. On 15.0.x the average would be around 400-500 MB. But I've been seeing 600 MB+ each time I check my pods. Something's not quite right.

@maxigs7 commented Jan 23, 2025

We tried 15.1.6 as well, but it is still happening; we even created a container to measure it, and memory usage keeps increasing. We finally downgraded to version 15.0.3, and everything stabilized. We are not using any server-side features, only the Pages Router, with no getServerSideProps.

@justinadkins

We are still seeing a pretty significant memory leak on 15.1.6. I was hopeful this was the fix. 15.0.x is stable for us. We deployed the version bump yesterday.

Image

@DonikaV commented Jan 24, 2025

@justinadkins
In our case the issue has been fixed since 15.1.6; maybe you have other problems?
Have you tried something from here?
https://nextjs.org/docs/app/building-your-application/optimizing/memory-usage

Image

@justinadkins

@DonikaV I have taken a look at that documentation and done some troubleshooting. I don't have a memory leak in 15.0.4 but I have one in 15.1.6. There is either something about my codebase that becomes leaky in 15.1.x or something included in the Next bump that is now leaking in certain scenarios.

@TJC commented Jan 25, 2025

We're also seeing quite a bad memory leak on 15.1.6. It's not as severe as 15.1.4, but it's still problematic enough that we had to revert the upgrade.

Let me know if it would be helpful for us to measure requests vs. memory usage, to give you an idea of the scale.

@huozhi reopened this Jan 27, 2025
@frankbo commented Jan 27, 2025

For us the memory consumption is back at 110 MB. We had one weird peak up to 300 MB, but I think that had nothing to do with Next.js. Since the update to 15.1.6 the pods run fine again. If you need further information about how we are using Next.js, let me know.

@aakashbapna commented Jan 28, 2025

We recently upgraded from Next 14.2.x to 15.1.x and encountered similar memory leak issues with 15.1.x. Upgrading to 15.1.6 didn't help reduce memory usage; our pods' memory kept growing and they would eventually get killed with OOM.

Things stabilized when we downgraded to Next 15.0.4; we saw dramatically less memory usage, hovering around 120 MB per pod like before (Next 14.2.x).

We extensively use middleware and are fully on app router.

Next 15.1.6 vs Next 15.0.4:

Image

@dennieriechelman commented Jan 28, 2025

We are experiencing exactly the same on two of our ecommerce sites with medium/low traffic (around 2M requests per day). Please see the screenshots for both apps. We were running 15.0.3 and deployed 15.1.6.

Some more info:

  • Running in latest k8s on a node:20.15.0-alpine container.
  • We do some work in the middleware (mostly deleting cookies for some requests and updating the token if it has expired)

I have tried to do some heap dump analysis but did not get far.
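
In case it helps anyone else attempting heap dump analysis: one option, assuming you can exec into the running container, is a small preload script that uses Node's built-in v8 module to write a snapshot on demand. This is just a sketch, not part of our setup:

```js
// heap-snapshot.js: preload with `node -r ./heap-snapshot.js server.js`
// (server.js being the standalone Next.js entrypoint in this example).
// Writes a heap snapshot whenever the process receives SIGUSR2,
// e.g. via `kill -USR2 <pid>` from inside the container.
const v8 = require('v8')

process.on('SIGUSR2', () => {
  const file = v8.writeHeapSnapshot() // writes a Heap.*.heapsnapshot file in the CWD
  console.log(`heap snapshot written to ${file}`)
})
```

Recent Node versions can also do this without any script via the `--heapsnapshot-signal=SIGUSR2` flag, as mentioned further down in this thread.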

If there is any more info that could help you please let me know. I would really like to be able to update our app.

Image

Image

@justinadkins

As stated above, we are still affected by memory leaks on 15.1.6. Our project is using:

  • Middleware, primarily for auth; we use Clerk (using the clerkMiddleware function)
  • App router
  • Sentry for exception observability

@wslp12 commented Feb 3, 2025

We see the memory leak on 15.1.6 due to the same issue.
What is peculiar is that CPU usage is also very high (on the affected version).

The same phenomenon occurs even when the root layout only returns null.

Additional information:
node: 20.15.1
sentry/core: 8.42.0
sentry/nextjs: 8.42.0
sentry/utils: 8.42.0

@Jax-p commented Feb 4, 2025

> We see the memory leak on 15.1.6 due to the same issue. What is peculiar is that CPU usage is also very high (on the affected version).
>
> The same phenomenon occurs even when the root layout only returns null.
>
> Additional information:
> node: 20.15.1
> sentry/core: 8.42.0
> sentry/nextjs: 8.42.0
> sentry/utils: 8.42.0

Have you tried turning off Sentry? In our case 15.0.4 is runnable with Sentry/Next 8.5X.X, but every other version leaks memory. Without Sentry, even 15.1.6 looks much better.

@masterkain

A standalone app I deployed on my cluster usually uses around 100 MB; it's ballooning to 2 GB. I never used Sentry.

@mhanbl commented Feb 5, 2025

We have suffered a memory leak on AWS/ECS ever since 15.1.x; we just downgraded to 15.0.4 and that seems to have solved it.

Image

@Maclay74 commented Feb 5, 2025

Hey, we have the same problem on 15.1.6. I noticed that it happens once I add middleware to the application.
It's very easy to reproduce by creating a new application via create-next-app. Test it with and without middleware.
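
For anyone who wants to try the reproduction described above, it amounts to adding an effectively empty middleware file to a fresh create-next-app project, something along these lines (a sketch, not the exact code used here):

```js
// middleware.js: minimal sketch of the reproduction described above.
// With no `config.matcher`, the middleware runs for every request by default.
import { NextResponse } from 'next/server'

export function middleware() {
  // Do nothing; according to the reports above, merely having middleware
  // compiled into the app was enough to show the growing memory usage.
  return NextResponse.next()
}
```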

@peec commented Feb 5, 2025

We are experiencing the same issues, running standalone Next.js with middleware on our Kubernetes cluster.

When we downgraded to 15.0.4 the memory leak was gone.

@dennieriechelman

> Hey, we have the same problem on 15.1.6. I noticed that it happens once I add middleware to the application. It's very easy to reproduce by creating a new application via create-next-app. Test it with and without middleware.

Thanks @Maclay74. I also came to the same conclusion. What exactly did you add to the middleware?

@Maclay74 commented Feb 5, 2025

> > Hey, we have the same problem on 15.1.6. I noticed that it happens once I add middleware to the application. It's very easy to reproduce by creating a new application via create-next-app. Test it with and without middleware.
>
> Thanks @Maclay74. I also came to the same conclusion. What exactly did you add to the middleware?

Literally nothing, just an empty function. Apparently, the mere presence of middleware in the app is what causes the leak.

@u11d-aleksy-bohdziul

In our case, upgrading the NodeJS version from 20.15.1 to 23.6.1 seems to have fixed the issue.

We're running our Next.js server on ECS Fargate and noticed a memory issue during stress tests: memory usage suddenly climbed much higher and did not drop back down after the tests finished.
In our case this was caused by NodeJS not cleaning up old Timeout objects.

I was able to reproduce this issue in both Next.js 15.1.6 and 15.0.4, but in 15.0.4 the issue only appears after setting the flag experimental.after to true. That's because the flag experimental.after was removed in Next.js 15.1 and the feature tied to it is now always enabled.
The part of this feature causing our issue is this setTimeout call:

```js
setTimeout(() => {
  closeController.dispatchClose()
}, 0)
```

For some reason, the Timeout objects created by this setTimeout call were never cleaned up by NodeJS, and they also prevented quite a few other objects from being garbage collected.

Here’s a screenshot of the heap snapshot from our Next.js server, taken about 2 hours after the stress tests ended:

Image

Each of those nearly 20 000 Timeouts is created by the setTimeout invocation mentioned above. All of them are marked as destroyed, and are only retained by knownTimersById, which from my understanding shows that this is a purely NodeJS issue.

Image
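
For anyone bisecting this on 15.0.x, the experimental.after flag mentioned above can be toggled in next.config.js roughly like this (a sketch, not our actual config):

```js
// next.config.js: sketch for Next.js 15.0.x, where `after` was still behind a flag.
/** @type {import('next').NextConfig} */
const nextConfig = {
  output: 'standalone',
  experimental: {
    // On 15.0.x this opts in to the `after` feature; on 15.1+ the flag was
    // removed and the feature is always enabled, as described above.
    after: true,
  },
}

module.exports = nextConfig
```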

@lubieowoce (Member) commented Feb 6, 2025

@u11d-aleksy-bohdziul Thank you so much for tracking this down! I believe this might be caused by this nodejs bug:
nodejs/node#53335: Timeout leaks when converted into a primitive and not cleared
In middleware (and other edge functions) we're wrapping node's setTimeout so that it returns a number instead of a NodeJS.Timeout, so we'd trigger this. (code)

AFAICT the fix for that was released in:

  • node v20.16.0
  • node v22.4.0
  • node v23.0.0

For anyone still experiencing leaks on Next.js 15.1.6, please try updating your node to one of the above versions and see if that solves it.
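
For context, the wrapping described above boils down to something like the following simplified sketch (not the actual Next.js source linked above):

```js
// Simplified sketch of an edge-runtime style setTimeout polyfill that hands
// back a plain number instead of a NodeJS.Timeout object.
function webSetTimeout(callback, ms, ...args) {
  const timeout = setTimeout(callback, ms, ...args)
  // Coercing the Timeout to its numeric id is what, on affected Node versions,
  // keeps it registered in the internal knownTimersById map forever
  // (nodejs/node#53335).
  return timeout[Symbol.toPrimitive]()
}
```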

@wslp12 commented Feb 6, 2025

> AFAICT the fix for that was released in:
>
> • node v20.16.0
> • node v22.4.0
> • node v23.0.0
>
> For anyone still experiencing leaks on Next.js 15.1.6, please try updating your node to one of the above versions and see if that solves it.

Solved, thanks!

lubieowoce added a commit that referenced this issue Feb 6, 2025
Potential fix for a leak reported in #74855 on older node versions (see the comment above in #74855).

### Background

When running middleware (or other edge functions) in `next start`, we
wrap them in an edge runtime sandbox. This includes polyfills of
`setTimeout` and `setInterval` which return `number` instead of
`NodeJS.Timeout`.

Unfortunately, on some older node versions, converting a
`NodeJS.Timeout` to a number will cause that timeout to leak:
nodejs/node#53335
The leaked timeout will also hold onto the callback, thus also leaking
anything that was closed over (which can be a lot of things!)

### Solution

Ideally, users just upgrade to a Node version that includes the fix:
- [node v20.16.0](nodejs/node#53945)
- [node v22.4.0](nodejs/node#53583)
- node v23.0.0

But we're currently still supporting node 18, so we can't necessarily
rely on that. Luckily, as noted in the description of the nodejs issue,
calling `clearTimeout` seems to unleak the timeout, so we can just do
that after the callback runs!

### Unrelated stuff I did

While I was at it, I also fixed a (very niche) discrepancy from how
`setTimeout` and `setInterval` behave on the web. When running the
callback, node sets `this` to the Timeout instance:
```js
> void setTimeout(function () {console.log('this in setTimeout:', this) } )
undefined
> this in setTimeout: Timeout { ... }
```
but on the web, `this` is always set to `globalThis`. Our wrapper now
correctly does this.

### Testing

<details>
<summary>Collapsed because it's long</summary>

Verifying this is kinda tricky, so bear with me...

Here's a script that can verify whether calling `clearTimeout` fixes the
leak by using a FinalizationRegistry and triggering GC to observe
whether memory leaked or not.
`setTimeoutWithFix` is a simplified version of `webSetTimeoutPolyfill`
from the PR.

```js
// setTimeout-test.js

if (typeof gc !== 'function') {
  console.log('this test must be run with --expose-gc')
  process.exit(1)
}

function setTimeoutWithFix(callback, ms, ...args) {
  const wrappedCallback = function () {
    try {
      return callback.apply(this, args)
    } finally {
      clearTimeout(timeout)
    }
  }
  const timeout = setTimeout(wrappedCallback, ms)
  return timeout
}

const didFinalize = {}
const registry = new FinalizationRegistry((id) => {
  didFinalize[id] = true
})

{
  const id = 'node setTimeout'.padEnd(26, ' ')

  const timeout = setTimeout(() => {}, 0)

  registry.register(timeout, id)
  didFinalize[id] = false
}

{
  const id = 'node setTimeout as number'.padEnd(26, ' ')

  const timeout = setTimeout(() => {}, 0)
  timeout[Symbol.toPrimitive]()

  registry.register(timeout, id)
  didFinalize[id] = false
}

{
  const id = 'fixed setTimeout'.padEnd(26, ' ')

  const timeout = setTimeoutWithFix(() => {}, 0)

  registry.register(timeout, id)
  didFinalize[id] = false
}

{
  const id = 'fixed setTimeout as number'.padEnd(26, ' ')

  const timeout = setTimeoutWithFix(() => {}, 0)
  timeout[Symbol.toPrimitive]()

  registry.register(timeout, id)
  didFinalize[id] = false
}

// wait for the timeouts to run
void setTimeout(() => {
  gc() // trigger garbage collection
  void registry // ...but make sure we keep the registry alive

  // wait a task so that finalization callbacks can run
  setTimeout(() =>
    console.log('did the Timeout get released after GC?', didFinalize)
  )
}, 10)
```

To run it, install the required node versions:
```bash
for ver in v20.15.0 v20.16.0 v22.3.0 v22.4.0 v23.0.0; do ( nvm install "$ver" ); done
```

And run the test:
```bash
for ver in v20.15.0 v20.16.0 v22.3.0 v22.4.0 v23.0.0; do
  (
    echo '-------------------'
    nvm use "$ver" && node --expose-gc setTimeout-test.js
    echo
  );
done
```

The output on my machine is as follows. Note that the `node setTimeout
as number` case comes up as false on the older versions (because it
leaks and doesn't get finalized), but `fixed setTimeout as number`
(which calls `clearTimeout`) gets released fine, which is exactly what
we want.

```terminal
-------------------
Now using node v20.15.0 (npm v10.7.0)
did the Timeout get released after GC? {
  'node setTimeout           ': true,
  'node setTimeout as number ': false,
  'fixed setTimeout          ': true,
  'fixed setTimeout as number': true
}

-------------------
Now using node v20.16.0 (npm v10.8.1)
did the Timeout get released after GC? {
  'node setTimeout           ': true,
  'node setTimeout as number ': true,
  'fixed setTimeout          ': true,
  'fixed setTimeout as number': true
}

-------------------
Now using node v22.3.0 (npm v10.8.1)
did the Timeout get released after GC? {
  'node setTimeout           ': true,
  'node setTimeout as number ': false,
  'fixed setTimeout          ': true,
  'fixed setTimeout as number': true
}

-------------------
Now using node v22.4.0 (npm v10.8.1)
did the Timeout get released after GC? {
  'node setTimeout           ': true,
  'node setTimeout as number ': true,
  'fixed setTimeout          ': true,
  'fixed setTimeout as number': true
}

-------------------
Now using node v23.0.0 (npm v10.9.0)
did the Timeout get released after GC? {
  'node setTimeout           ': true,
  'node setTimeout as number ': true,
  'fixed setTimeout          ': true,
  'fixed setTimeout as number': true
}
```
</details>
@Maclay74 commented Feb 6, 2025

Hey @lubieowoce (me too): I tested 15.1.6 on Node 20.16.0 and Node 22, and 20.16.0 still leaks some memory in our application, whereas 22 is totally stable.

@justinadkins

The setTimeout Node issue ended up being the culprit for us! We were using a version of nixpacks which under the hood was using Node 22.3.0. After updating to the latest version, which uses 22.10.0, our issue was resolved. Thank you @u11d-aleksy-bohdziul for surfacing this 🙏

@dymoo commented Feb 9, 2025

> We have suffered a memory leak on AWS/ECS ever since 15.1.x; we just downgraded to 15.0.4 and that seems to have solved it.
>
> Image

We still see the leak on 22.11-bookworm-slim and 23.7-bookworm on ECS ARM Fargate with Next v15.1.6. I'm re-running with '--heapsnapshot-signal=SIGUSR2' to see what's up...

Image

@dennieriechelman

@u11d-aleksy-bohdziul and @lubieowoce thank you for finding this out and providing a solution.

This morning we updated to node:23.7.0-alpine with Next 15.1.6 and we are not seeing memory leaks anymore! 🚀

@maxigs7 commented Feb 10, 2025

@u11d-aleksy-bohdziul @lubieowoce awesome finding.

We updated it to node 22.12 and everything looks fine at the moment.

Thank you!
