Replay "message size too large" error #23703

Closed
2 of 6 tasks
pauldambra opened this issue Jul 14, 2024 · 19 comments
Labels
bug Something isn't working right feature/replay Features for Team Replay

Comments

@pauldambra
Member

pauldambra commented Jul 14, 2024

Bug Description

We see these for multiple reasons
We can't ingest them
When we can't ingest a recording snapshot the chance of an unplayable recording is high
We need to minimise these

Kafka checks the size of a message before compressing it, so a message needs to be under 10MB uncompressed to be ingestible
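
As a rough illustration of that check, here is a minimal sketch that measures a payload the way a pre-compression size check would see it. The function names are hypothetical, and treating 10MB as 10 * 1024 * 1024 bytes is an assumption; the real limit in a given deployment may be configured differently.

```javascript
// Assumption: "10MB" is read as 10 * 1024 * 1024 bytes.
const MAX_UNCOMPRESSED_BYTES = 10 * 1024 * 1024;

// Size in bytes of the JSON-serialized payload, before any compression.
// TextEncoder counts UTF-8 bytes, so multi-byte characters are counted correctly.
function uncompressedSizeBytes(payload) {
    return new TextEncoder().encode(JSON.stringify(payload)).length;
}

// True when the serialized payload is within the (assumed) limit.
function fitsUncompressed(payload, limit = MAX_UNCOMPRESSED_BYTES) {
    return uncompressedSizeBytes(payload) <= limit;
}
```

Note that string length is not a safe stand-in for byte size here, since non-ASCII characters take more than one byte in UTF-8.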

We're gathering samples of these messages so that we can identify improvements

| file size | lib version | reason | things to try | tried in |
| --- | --- | --- | --- | --- |
| 67MB | 1.146.1 | 426k attribute adds | split incrementals in the SDK | v1.148.0 |
| 36MB | 1.146.0 | massive data image urls | redact large image urls | |
| 34MB | 1.146.1 | 250k attribute adds | split incrementals in the SDK | v1.148.0 |
| 15MB | 1.145.0 | lots of inlined css | compress the text; send inlined css a different way | |
| 11MB | 1.86.1 | many small incremental snapshots | split the snapshots in capture; get customer to upgrade | |

| problems seen | count |
| --- | --- |
| super large rrweb data structure | 5 |
| large data urls | 8 |
| lots of inlined CSS | 3 |
| lots of small snapshots | 1 |

TODO

### only works for updated clients

  • redact large data urls in the SDK
  • split large arrays of rrweb data in the SDK
  • split large incremental snapshots in the SDK
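
The SDK-side items above can be sketched roughly as follows. This is not the posthog-js implementation; the helper names, the data URL pattern, and the thresholds are all assumptions chosen for illustration.

```javascript
// Hypothetical helpers; names and thresholds are illustrative only.
const byteLength = (value) => new TextEncoder().encode(JSON.stringify(value)).length;

// Replace any data URL longer than maxLength characters with a short placeholder.
// The pattern stops at quotes, whitespace, or a closing parenthesis.
function redactLargeDataUrls(text, maxLength = 1024) {
    return text.replace(/data:[^"'\s)]+/g, (url) =>
        url.length > maxLength ? "data:redacted" : url
    );
}

// Split an array into chunks whose serialized size stays at or under maxBytes.
// A single item larger than maxBytes still gets a chunk of its own.
function splitBySize(items, maxBytes) {
    const chunks = [];
    let current = [];
    let currentBytes = 2; // the surrounding "[]"
    for (const item of items) {
        const itemBytes = byteLength(item) + 1; // +1 for the separating comma
        if (current.length > 0 && currentBytes + itemBytes > maxBytes) {
            chunks.push(current);
            current = [];
            currentBytes = 2;
        }
        current.push(item);
        currentBytes += itemBytes;
    }
    if (current.length > 0) chunks.push(current);
    return chunks;
}
```

Each chunk could then be sent as its own message, so no single message carries the whole array.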

### works for all clients

  • increase acceptable message size in ingestion (20MB HTTP (gzip compressed), 64MB WarpStream (not gzip compressed))
  • compress in capture so the WarpStream/Kafka client accepts or rejects based on compressed size
  • compress (full) snapshots throughout the pipeline
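
To show why compressing before the size check matters, here is a minimal Node.js sketch comparing raw and gzip-compressed sizes of repetitive text such as inlined CSS. The function name is hypothetical and this is not how capture is implemented; it only illustrates the size difference.

```javascript
const zlib = require("node:zlib");

// Return the UTF-8 size of the text and the size after gzip compression.
function measureSizes(text) {
    return {
        rawBytes: Buffer.byteLength(text, "utf8"),
        gzipBytes: zlib.gzipSync(text).length,
    };
}

// Repetitive CSS-like text compresses very well.
const css = ".btn { color: red; }\n".repeat(1000);
const { rawBytes, gzipBytes } = measureSizes(css);
console.log(`raw: ${rawBytes} bytes, gzip: ${gzipBytes} bytes`);
```

A size check applied to `gzipBytes` rather than `rawBytes` would accept far more snapshots of this kind.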
@daibhin
Contributor

daibhin commented Jul 15, 2024

The new asset stuff from rrweb might help a lot with sending the inlined CSS in a different way: rrweb-io/rrweb#1475

@sudo-eugene

Does this issue have anything to do with maskAllInputs: true or maskTextSelector: "*"? I've tried masking everything I possibly can but I'm still getting the error with unplayable recording

@pauldambra
Member Author

Hey @sudo-eugene,

Normally that's not the issue here. The best bet is to report the problem using the in-app support flow, since that lets us look into your recordings specifically to see what's best. Ideally, if you can include example recordings, that'd be awesome

@sudo-eugene

@pauldambra this doesn't seem to be an issue in my PH cloud instance. It's only on my Self-Hosted instance that this happens. The issue report is, for reasons I understand, not available in the self-hosted version, hence my post here.

@pauldambra
Member Author

Ah, it's really hard to debug self-hosted since everyone's setup can vary so much. You can see in this issue the reasons we see this trigger, with large data urls and large amounts of CSS being the next largest unaddressed items.

@sudo-eugene

Yes, I suspect the CSS could be an issue on our side. Could this potentially be due to inline CSS, or CSS files, or both?

Is there a way to disable/exclude CSS that I know won't be of value in our recordings?

@daibhin
Contributor

daibhin commented Sep 4, 2024

@sudo-eugene we mostly recommend to people with very large CSS bundles that they skip inlining it. This can be done when initializing the SDK:

posthog.init("API_KEY", {
    session_recording: {
        inlineStylesheet: false,
    },
})

Just so you know, this will mean that the files will be fetched during playback. If they are no longer available or have changed since capture, your recordings might be unstyled

@lessless

lessless commented Sep 4, 2024

Just FYI: Sentry doesn't have any issues capturing the same sessions

@sudo-eugene

I've added all of these and still getting the error. I'll keep debugging to see if I can find anything, or get a base version working

        inlineStylesheet: false,
        maskAllInputs: true,
        maskTextSelector: "*"

@daibhin
Contributor

daibhin commented Sep 16, 2024

@sudo-eugene hmm sounds like it's not your CSS in that case. Would you mind opening a support ticket in-app or emailing me directly (david at posthog) so I can look into the specifics of your account

@flynet70

flynet70 commented Oct 8, 2024

Same problem
I added
inlineStylesheet: false,
maskAllInputs: true,
maskTextSelector: "*"

to self hosted posthog

But still see "This session recording had recording data that was too large and could not be captured. This will mean playback is not 100% accurate."

@pauldambra
Member Author

Hey @flynet70, are you running the latest posthog, including posthog-js v1.166.x?

We're consistently improving these ingestion routes (or trying to :))

@flynet70

flynet70 commented Oct 9, 2024

@pauldambra thanks for answer!

Yes. I upgraded via
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/posthog/posthog/HEAD/bin/upgrade-hobby)"

posthog-js
https://posthog.mydomain.com/static/recorder.js?v=1.166.0 loads

Session replay works on the main domain but I get 'too large' when I redirect to a subdomain (billing system)
Checked in Chromium and Firefox.

But when I switch to cloud (api_host:'https://eu.i.posthog.com') everything works well. Session replay works on the billing subdomain too. What is the difference?

@flynet70

flynet70 commented Oct 9, 2024

I tried comparing the JavaScript loaded: cloud vs my self-hosted instance

/static/array.js - identical
/static/recorder.js?v=1.166.0 - identical
/decide/?v=3&ip=1&_=1728447089625&ver=1.166.0&compression=base64 - has difference
left: cloud, right: my self-hosted instance

[Screenshot: side-by-side diff of the /decide responses]

@sudo-eugene

sudo-eugene commented Oct 9, 2024

I thought perhaps it was the Kafka message size limit, so I set this in my docker-compose.yml, but it didn't work

            KAFKA_CFG_MESSAGE_MAX_BYTES: 67108864         # 64 MB
            KAFKA_CFG_REPLICA_FETCH_MAX_BYTES: 67108864   # 64 MB
            KAFKA_CFG_MAX_PARTITION_FETCH_BYTES: 67108864 # 64 MB
            KAFKA_CFG_MAX_REQUEST_SIZE: 67108864          # 64 MB


@flynet70

@pauldambra
Do you have any idea?
Maybe we (with @sudo-eugene) need to adjust some environment variables?
Right now I use the defaults

@pauldambra
Member Author

I agree the next thing to try would be to increase the message size over Kafka. Assuming the Kafka deployment is updated to accept larger messages, you also need to set an environment variable to let the capture API know it's ok to ingest larger messages

SESSION_RECORDING_KAFKA_MAX_REQUEST_SIZE_BYTES in the web service
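
For anyone picking values: 67108864, the number used earlier in this thread, is 64 * 1024 * 1024 bytes. Below is a small sketch for computing such values and sanity-checking that the capture-side limit does not exceed what the broker accepts. The helper names are hypothetical, and the rule that the capture limit should be at or below the broker limit is an assumption about how the two settings interact, not something confirmed in this issue.

```javascript
// Convert mebibytes to bytes.
const mib = (n) => n * 1024 * 1024;

// Hypothetical check: returns a list of problems, empty when the values look consistent.
// Assumption: the capture limit should not be larger than the broker's message limit.
function checkLimits(captureMaxBytes, brokerMaxBytes) {
    const problems = [];
    if (!Number.isInteger(captureMaxBytes) || !Number.isInteger(brokerMaxBytes)) {
        problems.push("limits must be whole numbers of bytes");
    } else if (captureMaxBytes > brokerMaxBytes) {
        problems.push("capture limit is larger than the broker limit");
    }
    return problems;
}

console.log(mib(64)); // 67108864
console.log(checkLimits(mib(64), mib(64))); // []
```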

@sudo-eugene

@flynet70 @pauldambra I added the SESSION_RECORDING_KAFKA_MAX_REQUEST_SIZE_BYTES in addition to the KAFKA_CFG* and it seems like it's working!

Thanks @pauldambra for the pointer.

Here's the docker-compose.yml:

services:
    ...
    kafka:
        extends:
            file: docker-compose.base.yml
            service: kafka
        depends_on:
            - zookeeper
        environment:
            KAFKA_LOG_RETENTION_MS: 3600000
            KAFKA_LOG_RETENTION_CHECK_INTERVAL_MS: 300000
            KAFKA_LOG_RETENTION_HOURS: 1
            KAFKA_CFG_MESSAGE_MAX_BYTES: 67108864         # Added 64MB to avoid "Message too large" error
            KAFKA_CFG_REPLICA_FETCH_MAX_BYTES: 67108864   # Added 64MB to avoid "Message too large" error
            KAFKA_CFG_MAX_PARTITION_FETCH_BYTES: 67108864 # Added 64MB to avoid "Message too large" error
            KAFKA_CFG_MAX_REQUEST_SIZE: 67108864          # Added 64MB to avoid "Message too large" error
        volumes:
            - kafka-data:/bitnami/kafka

    ...
    web:
        extends:
            file: docker-compose.base.yml
            service: web
        command: /compose/start
        volumes:
            - ./compose:/compose
        image: posthog/posthog:latest
        environment:
            ...
            SESSION_RECORDING_KAFKA_MAX_REQUEST_SIZE_BYTES: 67108864 # Added 64MB to avoid "Message too large" error
        depends_on:
            - db
            - redis
            - clickhouse
            - kafka
            - objectstorage
    ...

@pauldambra
Member Author

awesome... i'm going to close this so folk coming by see the resolution

(although internally we're always working on improving this!)
