Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set network_userid to empty UUID in anonymous mode to prevent collector_payload_format_violation #127

Closed
wants to merge 1 commit into from

Conversation

paulboocock
Copy link
Contributor

When anonymous tracking is enabled in the JavaScript Tracker, as of version 2.1.2 we set the network_user to an empty string, which leads to collector_payload_format_violation errors downstream due to "" being an invalid UUID.

JavaScript Tracker (v2.17.3) example:

  <script type="text/javascript" async=1>
    ; (function (p, l, o, w, i, n, g) {
      if (!p[i]) {
        p.GlobalSnowplowNamespace = p.GlobalSnowplowNamespace || [];
        p.GlobalSnowplowNamespace.push(i); p[i] = function () {
          (p[i].q = p[i].q || []).push(arguments)
        }; p[i].q = p[i].q || []; n = l.createElement(o); g = l.getElementsByTagName(o)[0]; n.async = 1;
        n.src = w; g.parentNode.insertBefore(n, g)
      }
    }(window, document, "script", "https://cdn.jsdelivr.net/gh/snowplow/[email protected]/sp.js", "snowplow"));
  </script>

  <script>
    window.snowplow('newTracker', 'sp1', '{{COLLECTOR_URL}}', {
      appId: 'simple-test-1'
    });

    window.snowplow('enableAnonymousTracking', { withServerAnonymisation: true });
    window.snowplow('trackPageView');
  </script>

should have a network_userid of 00000000-0000-0000-0000-000000000000.

@paulboocock paulboocock requested a review from dilyand March 16, 2021 11:16
@paulboocock paulboocock self-assigned this Mar 16, 2021
@dilyand
Copy link
Contributor

dilyand commented Mar 18, 2021

Hey @paulboocock , this looks good. However I feel that it's unnecessarily late to swap the nuid with a dummy value when we're building the event. To me it seems more appropriate to do it when we're extracting the nuid from the request / cookie here: https://github.com/snowplow/stream-collector/blob/master/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorService.scala#L115.

This can be achieved with a change in the networkUserId function: https://github.com/snowplow/stream-collector/blob/master/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorService.scala#L460

def networkUserId(
    request: HttpRequest,
    requestCookie: Option[HttpCookie],
    spAnonymous: Option[String]
  ): Option[String] =
    spAnonymous match {
      case Some(_) => request.uri.query().get("nuid").orElse(requestCookie.map(_.value))
      case None    => Some("00000000-0000-0000-0000-000000000000")
    }

WDYT?

@paulboocock
Copy link
Contributor Author

That seems reasonable, keeps the nuid "calculatation" in the right function. The only reason I put it so far down the chain was so we didn't accidentally write a nuid when we shouldn't in the future but with some good unit tests this is likely not much of a risk.

@dilyand
Copy link
Contributor

dilyand commented Mar 18, 2021

Closing in favour of #124.

@dilyand dilyand closed this Mar 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants