Memory spike issue with Next.js 15.1.4 on Azure #74855
Comments
We are facing a similar issue in our Kubernetes pods. With Next.js 15.1.3 the memory consumption was/is fine and the pods run as expected, but after the update to 15.1.4 the pods consume more and more memory and die at some point. Normally our pods use 110 MB of memory and are fine with that. With the update to Next.js 15.1.4 they start at 115 MB, the memory climbs up to 300 MB, then they die and it starts over again. From the graphs it looks like a memory leak, or at least something that consumes more memory than necessary over time. @skerdi03 did you try version 15.1.3 as well, and do you face the same issues there?
We are facing the same issue with the
Yes, we had the same issue with 15.1.3 as well.
We've experienced memory issues since 15.1.x; 15.0.x is stable for us. Deployed on AWS via FlightControl. We're running Node 18 right now.
We've identified one leak that became much more noticeable in
The fix mentioned above has landed in v15.1.6, please upgrade and let us know if that fixes your issue 🙏 Thanks
As I can see from the graphs of memory usage, the problem has been fixed in 15.1.6.
I have noticed that it's been climbing much slower than previously, but I can definitely still see more RAM usage compared to 15.0.x and a much higher jump compared to 14.x. On 14.x, we would average around 300-400 MB per pod. On 15.0.x the average would be around 400-500 MB. But I've been seeing 600 MB+ each time I check my pods. Something's not quite right.
We tried 15.1.6 as well, but it is still happening; we created a container to measure it, and the memory usage keeps increasing. We finally downgraded to version 15.0.3, and everything stabilized. We are not using any server-side features, only the Pages Router, with no getServerSideProps.
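For anyone else trying to confirm the trend from inside a container, here is a minimal sketch of one way to get comparable numbers (not something used in this thread; the file name, interval, and log format are my own choices):

```js
// memory-log.js – hedged sketch, not from this thread.
// Preload it into the standalone server so it runs in the same process:
//   NODE_OPTIONS="--require ./memory-log.js" node server.js
// Logs the Node version once and the resident set size every 60 seconds.
console.log(`[mem] node ${process.version}`)

setInterval(() => {
  const { rss, heapUsed, external } = process.memoryUsage()
  const mb = (n) => Math.round(n / 1024 / 1024)
  console.log(`[mem] rss=${mb(rss)}MB heapUsed=${mb(heapUsed)}MB external=${mb(external)}MB`)
}, 60_000).unref() // unref so the interval never keeps the process alive on its own
```

Watching that log line in the pod's output makes it easy to see whether RSS keeps climbing or levels off after a deploy.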
@justinadkins [screenshot]
@DonikaV I have taken a look at that documentation and done some troubleshooting. I don't have a memory leak in
We're also seeing quite a bad memory leak on 15.1.6. It's not as severe as 15.1.4, but it's still problematic enough that we had to revert the upgrade. Let me know if it would be helpful for us to measure requests vs. memory usage, to give you an idea of scale.
For us the memory consumption is back at 110 MB. We had one weird peak up to 300 MB, but I think that had nothing to do with Next.js. Since the update to 15.1.6 the pods run fine again. If you need further information about how we are using Next.js, let me know.
We are experiencing exactly the same on two of our e-commerce sites with medium/low traffic (around 2M requests per day). Please see the screenshots for both apps. We were running 15.0.3 and deployed 15.1.6. Some more info:
I have tried to do some heap dump analysis but did not get far. If there is any more info that could help you, please let me know. I would really like to be able to update our app.
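For anyone else stuck at the heap-dump stage, one way to capture snapshots from a running container is a small preload like the following (a sketch with an assumed file name and signal choice, not something used in this thread); snapshots taken before and after load can then be diffed in Chrome DevTools:

```js
// heap-snapshot.js – hedged sketch, preload with:
//   NODE_OPTIONS="--require ./heap-snapshot.js" node server.js
// Writes a .heapsnapshot file into the working directory whenever the
// process receives SIGUSR2 (SIGUSR1 is reserved for the Node inspector).
const v8 = require('node:v8')

process.on('SIGUSR2', () => {
  const file = v8.writeHeapSnapshot() // blocking call; generates a default file name
  console.log(`[heap] snapshot written to ${file}`)
})
```

Trigger it with `kill -USR2 <pid>` inside the pod, copy the file out, and open it in DevTools' Memory tab.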
As stated above, we are still affected by memory leaks on 15.1.6. Our project is using:
Memory leakage also occurs on 15.1.6 due to the same issue. Logically, it is the same phenomenon even when the root layout only returns null. Additional information:
Have you tried turning off Sentry? In our case 15.0.4 is runnable with Sentry/Next 8.5x.x, but every other version leaks memory. Without Sentry, even 15.1.6 looks much better.
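For anyone who wants to rule Sentry in or out without removing it from the codebase, something along these lines might work as a temporary toggle. This is only a sketch: `DISABLE_SENTRY` is a made-up environment variable, and the exact `withSentryConfig` options depend on your @sentry/nextjs version:

```js
// next.config.js – hedged sketch of a temporary Sentry on/off switch
const { withSentryConfig } = require('@sentry/nextjs')

/** @type {import('next').NextConfig} */
const nextConfig = {
  output: 'standalone',
}

// DISABLE_SENTRY is a hypothetical env var for this experiment, not a Sentry feature.
module.exports =
  process.env.DISABLE_SENTRY === '1' ? nextConfig : withSentryConfig(nextConfig)
```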
A standalone app I deployed on my cluster usually uses around 100 MB; it's ballooning to 2 GB. Never used Sentry.
Hey, we have the same problem on 15.1.6. I noticed that it happens once I add middleware to the application.
We are experiencing the same issues, running standalone Next.js with middleware on our Kubernetes cluster. When we downgraded to 15.0.4 the memory leak was gone.
Thanks @Maclay74. I also came to the same conclusion. What exactly did you add to the middleware?
Literally nothing, just an empty function. Apparently the mere presence of middleware in the app is what causes the leaks.
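For context, the kind of minimal reproduction being described here is roughly the following; the file is essentially a no-op, and (as the later comments explain) its mere presence is enough to route requests through the edge-runtime sandbox in `next start`:

```js
// middleware.js – effectively empty middleware, as described above
import { NextResponse } from 'next/server'

export function middleware() {
  // do nothing; just let the request continue
  return NextResponse.next()
}
```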
In our case, upgrading the Node.js version from 20.15.1 to 23.6.1 seems to have fixed the issue. We're running our Next.js server on ECS Fargate and we noticed a memory issue during stress tests, where we suddenly had much higher memory usage which didn't drop back down after the tests finished. I was able to reproduce this issue in both Next.js 15.1.6 and 15.0.4, but in 15.0.4 the issue only appears after setting the flag.

> next.js/packages/next/src/server/web/adapter.ts, lines 299 to 301 in 58e1bd2
For some reason, the Timeout objects created by this setTimeout call were never cleaned up by Node.js, and they also prevented quite a few other objects from being garbage collected. Here's a screenshot of the heap snapshot from our Next.js server, taken about 2 hours after the stress tests ended: [screenshot] Each of those nearly 20,000 Timeouts is created by the setTimeout invocation mentioned above. All of them are marked as destroyed, and are only retained by: [screenshot]
@u11d-aleksy-bohdziul Thank you so much for tracking this down! I believe this might be caused by this nodejs bug: nodejs/node#53335. AFAICT the fix for that was released in:
- node v20.16.0
- node v22.4.0
- node v23.0.0
For anyone still experiencing leaks on Next.js 15.1.6, please try updating your node to one of the above versions and see if that solves it. |
Solved, thanks.
Potential fix for a leak reported in #74855 on older node versions (see [comment](#74855 (comment))).

### Background

When running middleware (or other edge functions) in `next start`, we wrap them in an edge runtime sandbox. This includes polyfills of `setTimeout` and `setInterval` which return `number` instead of `NodeJS.Timeout`. Unfortunately, on some older node versions, converting a `NodeJS.Timeout` to a number will cause that timeout to leak: nodejs/node#53335. The leaked timeout will also hold onto the callback, thus also leaking anything that was closed over (which can be a lot of things!)

### Solution

Ideally, users just upgrade to a Node version that includes the fix:
- [node v20.16.0](nodejs/node#53945)
- [node v22.4.0](nodejs/node#53583)
- node v23.0.0

But we're currently still supporting node 18, so we can't necessarily rely on that. Luckily, as noted in the description of the nodejs issue, calling `clearTimeout` seems to unleak the timeout, so we can just do that after the callback runs!

### Unrelated stuff I did

While I was at it, I also fixed a (very niche) discrepancy from how `setTimeout` and `setInterval` behave on the web. When running the callback, Node sets `this` to the Timeout instance:

```js
> void setTimeout(function () { console.log('this in setTimeout:', this) })
undefined
> this in setTimeout: Timeout { ... }
```

but on the web, `this` is always set to `globalThis`. Our wrapper now correctly does this.

### Testing

<details>
<summary>Collapsed because it's long</summary>

Verifying this is kinda tricky, so bear with me...

Here's a script that can verify whether calling `clearTimeout` fixes the leak by using a FinalizationRegistry and triggering GC to observe whether memory leaked or not. `setTimeoutWithFix` is a simplified version of `webSetTimeoutPolyfill` from the PR.
```js
// setTimeout-test.js

if (typeof gc !== 'function') {
  console.log('this test must be run with --expose-gc')
  process.exit(1)
}

function setTimeoutWithFix(callback, ms, ...args) {
  const wrappedCallback = function () {
    try {
      return callback.apply(this, args)
    } finally {
      clearTimeout(timeout)
    }
  }
  const timeout = setTimeout(wrappedCallback, ms)
  return timeout
}

const didFinalize = {}
const registry = new FinalizationRegistry((id) => {
  didFinalize[id] = true
})

{
  const id = 'node setTimeout'.padEnd(26, ' ')
  const timeout = setTimeout(() => {}, 0)
  registry.register(timeout, id)
  didFinalize[id] = false
}

{
  const id = 'node setTimeout as number'.padEnd(26, ' ')
  const timeout = setTimeout(() => {}, 0)
  timeout[Symbol.toPrimitive]()
  registry.register(timeout, id)
  didFinalize[id] = false
}

{
  const id = 'fixed setTimeout'.padEnd(26, ' ')
  const timeout = setTimeoutWithFix(() => {}, 0)
  registry.register(timeout, id)
  didFinalize[id] = false
}

{
  const id = 'fixed setTimeout as number'.padEnd(26, ' ')
  const timeout = setTimeoutWithFix(() => {}, 0)
  timeout[Symbol.toPrimitive]()
  registry.register(timeout, id)
  didFinalize[id] = false
}

// wait for the timeouts to run
void setTimeout(() => {
  gc() // trigger garbage collection
  void registry // ...but make sure we keep the registry alive

  // wait a task so that finalization callbacks can run
  setTimeout(() =>
    console.log('did the Timeout get released after GC?', didFinalize)
  )
}, 10)
```

To run it, install the required node versions:

```bash
for ver in v20.15.0 v20.16.0 v22.3.0 v22.4.0 v23.0.0; do (
  nvm install "$ver"
); done
```

And run the test:

```bash
for ver in v20.15.0 v20.16.0 v22.3.0 v22.4.0 v23.0.0; do (
  echo '-------------------'
  nvm use "$ver" && node --expose-gc setTimeout-test.js
  echo
); done
```

The output on my machine is as follows. Note that the `node setTimeout as number` case comes up as false on the older versions (because it leaks and doesn't get finalized), but `fixed setTimeout as number` (which calls `clearTimeout`) gets released fine, which is exactly what we want.

```terminal
-------------------
Now using node v20.15.0 (npm v10.7.0)
did the Timeout get released after GC? {
  'node setTimeout           ': true,
  'node setTimeout as number ': false,
  'fixed setTimeout          ': true,
  'fixed setTimeout as number': true
}

-------------------
Now using node v20.16.0 (npm v10.8.1)
did the Timeout get released after GC? {
  'node setTimeout           ': true,
  'node setTimeout as number ': true,
  'fixed setTimeout          ': true,
  'fixed setTimeout as number': true
}

-------------------
Now using node v22.3.0 (npm v10.8.1)
did the Timeout get released after GC? {
  'node setTimeout           ': true,
  'node setTimeout as number ': false,
  'fixed setTimeout          ': true,
  'fixed setTimeout as number': true
}

-------------------
Now using node v22.4.0 (npm v10.8.1)
did the Timeout get released after GC? {
  'node setTimeout           ': true,
  'node setTimeout as number ': true,
  'fixed setTimeout          ': true,
  'fixed setTimeout as number': true
}

-------------------
Now using node v23.0.0 (npm v10.9.0)
did the Timeout get released after GC? {
  'node setTimeout           ': true,
  'node setTimeout as number ': true,
  'fixed setTimeout          ': true,
  'fixed setTimeout as number': true
}
```

</details>
Hey @lubieowoce (me too). I tested 15.1.6 on Node 20.16.0 and 22, and 20.16.0 still leaks some memory in our application, whereas 22 is totally stable.
@u11d-aleksy-bohdziul and @lubieowoce thank you for finding this out and providing a solution. This morning we updated to node:23.7.0-alpine with Next 15.1.6 and we are not seeing memory leaks anymore! 🚀
@u11d-aleksy-bohdziul @lubieowoce awesome finding. We updated to Node 22.12 and everything looks fine at the moment. Thank you!
Verify canary release
Provide environment information
```
Operating System:
  Platform: darwin
  Arch: arm64
  Version: Darwin Kernel Version 24.2.0: Fri Dec 6 18:51:28 PST 2024; root:xnu-11215.61.5~2/RELEASE_ARM64_T8112
  Available memory (MB): 16384
  Available CPU cores: 8
Binaries:
  Node: 20.13.1
  npm: 10.8.1
  Yarn: 1.22.22
  pnpm: N/A
Relevant Packages:
  next: 15.1.4 // Latest available version is detected (15.1.4).
  eslint-config-next: 14.2.3
  react: 18.3.1
  react-dom: 18.3.1
  typescript: 5.4.5
Next.js Config:
  output: standalone
```
Which example does this report relate to?
This issue is not related to any specific example in the examples folder. The problem occurs in a general Next.js application deployed on Azure.
What browser are you using? (if relevant)
No response
How are you deploying your application? (if relevant)
No response
Describe the Bug
We are experiencing a significant memory spike and auto-scaling issues when using Next.js 15.1 in our Azure deployments. Memory usage increases unpredictably under typical traffic conditions, leading to higher resource utilization and triggering unnecessary auto-scaling.
When downgrading to Next.js 14.2, these issues are resolved, and memory usage returns to stable levels. This suggests a regression introduced in version 15.1.
Graphs comparing memory usage for versions 15.1 and 14.2 are attached below for reference.
Expected Behavior
Memory usage should remain stable and consistent under typical traffic conditions when using Next.js 15.1, similar to the behavior observed in Next.js 14.2.
To Reproduce
Deploy a Next.js 15.1 application on Azure with typical production traffic patterns.
Monitor the memory usage and auto-scaling behavior using Azure's monitoring tools.
Observe that memory usage increases significantly and unpredictably, causing auto-scaling to trigger even under normal load.
Downgrade the application to Next.js 14.2.
Re-monitor the application, noticing that memory usage stabilizes and auto-scaling behaves as expected.