Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GraphQL (graphene) transactions are slow with newrelic from v6.6.0.162 and above #605

Open
trimailov opened this issue Aug 16, 2022 · 8 comments

Comments

@trimailov
Copy link

trimailov commented Aug 16, 2022

Description

Upgrading anything above v6.6.0.162 will make our GraphQL endpoints slower by some margin.

From stable ~30ms to hovering between 50-100ms.

Also CPU and RAM utilization is also noticable - RAM use increased by ~20% from previous total and CPU use is additional ~5%. So for example if we used 50% RAM - we now use 60% and if we used 20% CPU - we now use 25% CPU.

I have tried to run these newrelic agent versions: v7.16.0.178, v7.14.0.177, v6.10.0.165, v6.6.0.162 - all impacted response times in same way.

I was upgrading from: v6.2.0.156 and settled at v6.4.4.161 which is latest version without graphql support (to my knowledge), but it is not impacting my applications performance.

Also - important to note that <= v6.4.4.161 reports graphql transactions as web transaction named as GraphQL endpoint view class (i.e. Django view class that serves GraphQL API), which is expected, though versions with GraphQL support (>v6.6.0.162) are reporting graphql transactions as graphene.types.resolver:dict_or_attr_resolver.

Expected Behavior

There shouldn't be noticable changes in our web servers response times and CPU/RAM utilization.

Troubleshooting or NR Diag results

Output from NR Diag. I think it's all OK, as .NET errors or Java warnings are not relevant for the project.

> ./nrdiag_x64 -c newrelic.ini

Check Results
-------------------------------------------------

Info     Base/Env/CollectEnvVars [Gathered Environment variables of current shell.]
Success  Base/Config/Collect
Success  Base/Config/Validate
Success  Base/Config/LicenseKey
Success  Base/Config/AppName
Success  Python/Config/Agent
Success  Base/Log/Copy
Info     Python/Env/Dependencies [Collected pip freeze. See pipFreeze.txt for more i...]
Info     Base/Containers/DetectDocker [Docker Daemon is Running]
Info     Base/Env/HostInfo [Collected host information]
Error    DotNetCore/Env/Versions
Success  Base/Config/ValidateLicenseKey
Info     Base/Config/RegionDetect [1 unique New Relic region(s) detected from config.]
Info     Python/Env/Version [3.9.2]
Success  Base/Collector/ConnectEU
Warning  Java/AppServer/WebSphere
Success  Base/Config/ValidateHSM
Success  Python/Requirements/Webframework

Steps to Reproduce

I will try to create minimal example application to be able to reproduce this, but it's a Django application with graphene-django used for implementing GraphQL API, with minimal djangorestframework endpoint to serve it.

Your Environment

Basic app package requirements (python 3.9.2):

graphene==2.1.8
graphene-django==2.13.0
graphql-core==2.3.2
graphql-relay==2.0.0
gunicorn==19.9.0
newrelic==7.16.0.178   # as said above behavior can be reproduce with anything from v6.6.0.162
Django==3.1.7  # I know it's old, but I was in process of upgrading packages, that's how I caught this problem
djangorestframework==3.12.1

Web app is served with gunicorn and started with newrelic-admin:

NEW_RELIC_ENVIRONMENT=$ENV NEW_RELIC_CONFIG_FILE=newrelic.ini newrelic-admin run-program gunicorn -c ./gunicorn.conf.py app.wsgi:application

To my understanding newrelic.ini is not configured beyond default, setting up API key and enabling environments for staging and production.

Additional context

Looking at newrelic's transaction details I notice that the mentioned graphene.types.resolver:dict_or_attr_resolver is called hundreds of times.

Screenshot 2022-08-16 at 19 41 22

I am not sure if this is newrelic's bug or (more likely) misimplementation of our graphene application that breaks newrelic.

@stale
Copy link

stale bot commented Oct 15, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Oct 15, 2022
@Ak-x
Copy link

Ak-x commented Nov 16, 2022

Hey @trimailov; is this still an issue? Have you been able to recreate a sharable environment to validate this?

@stale stale bot removed the stale label Nov 16, 2022
@trimailov
Copy link
Author

It's still an issue, though have no shareable environment at hand :(

@Ak-x
Copy link

Ak-x commented Dec 19, 2022

@trimailov without a sharable environment this will be difficult for our team to debug. I will close this ticket until we have more data, additional users are experiencing the issue, or we have an environment our team can work with.

@Ak-x Ak-x closed this as completed Dec 19, 2022
@tdg5
Copy link

tdg5 commented Apr 10, 2023

I believe I am having this issue as well. I also don't have a sharable environment I can offer at this point, but I added new relic to a graphene application today and found that all graphql requests are 3-6x slower. An example request goes from taking about 2s without newrelic to 8-10s with new relic.

It seems like it is a problem even with a lot of the new relic agent disabled. Currently I disable all these things in my newrelic.ini and I still suffer a significant performance regression:

application_logging.enabled = false
attributes.enabled = false
browser_monitoring.auto_instrument = false
code_level_metrics.enabled = false
cross_application_tracer.enabled = false
custom_insights_events.enabled = false
distributed_tracing.enabled = false
error_collector.enabled = false
event_loop_visibility.enabled = false
span_events.enabled = false
strip_exception_messages.enabled = false
thread_profiler.enabled = false
transaction_events.enabled = false
transaction_tracer.explain_enabled = false
transaction_tracer.enabled = false

Is there more I could try disabling? If I set monitor_mode = false the problem goes away, but I can't make it go away otherwise.

@tdg5
Copy link

tdg5 commented May 30, 2023

@Ak-x , may I request that this issue be reopened? Here's an example project where I've tried to reproduce the issue: https://github.com/tdg5/newrelic-python-agent-graphene-issue#results-with-new_relic_license_key-ouch

I was able to reliably achieve a slowdown >10x for the 99 percentile. I will try to simplify the example further as time allows.

Thanks to @Anton-Karpenko for putting together the basic application structure so I could try to reproduce the problem!

I spent a couple of hours looking for clues in one of the packages involved, but didn't find a smoking gun.

If I can offer any additional guidance or assistance, please let me know.

@rajanbharti
Copy link

I think, this is still an issue, we are using strawberry with fastapi, facing same problem. response time increased from ~50ms to ~300ms

@mehmetakyuz
Copy link

Ultimately the following settings in newrelic.ini fixed the performance regression for us. For a large response with many thousands of graphql objects and nesting we achieved a x3 improvement on latency.

[import-hook:graphql.execution.middleware]
enabled = false

[import-hook:graphql.execution.execute]
enabled = false

[import-hook:graphql.execution.executor]
enabled = false

[import-hook:graphql.graphql]
enabled = false

[import-hook:graphql]
enabled = false

[import-hook:graphene]
enabled = false

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants