-
Notifications
You must be signed in to change notification settings - Fork 23
Feature/sdk reporting #316
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
...es/module-mongodb-storage/src/migrations/db/migrations/1752661449910-connection-reporting.ts
Outdated
Show resolved
Hide resolved
.../module-mongodb-storage/src/storage/implementation/MongoTestReportStorageFactoryGenerator.ts
Outdated
Show resolved
Hide resolved
export type SubscribeEvents = { | ||
[EventsEngineEventType.SDK_CONNECT_EVENT]: ClientConnectionEventData; | ||
[EventsEngineEventType.SDK_DISCONNECT_EVENT]: ClientDisconnectionEventData; | ||
[EventsEngineEventType.SDK_DELETE_OLD]: DeleteOldConnectionData; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What emits this event?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this instance its the report module.
packages/types/src/reports.ts
Outdated
client_id?: string; | ||
user_id: string; | ||
user_agent?: string; | ||
jwt_exp?: Date; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can make this required - added comments elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a side effect of the Context object in most part
user_id = ${{ type: 'varchar', value: user_id }} | ||
AND client_id = ${{ type: 'varchar', value: client_id }} | ||
AND connected_at = ${{ type: 1184, value: connectIsoString }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While the probability is low, there could be multiple connections with the same (user_id, client_id, connected_at)
combination. I'd recommend using a unique id per connection instead. Two options:
- Use the existing
rid
(request id) we generate for the logger - that could perhaps form the primary key here. This would be nice in that we could correlate these connections with the logs, but it would require some refactoring to make that id available in the request context. - Return the id from the
reportClientConnection
event, then use that id when disconnecting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I originally used that, but when we moved to updating an existing document. It was no longer viable option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While the probability is low, there could be multiple connections with the same
(user_id, client_id, connected_at)
combination. I'd recommend using a unique id per connection instead. Two options:
- Use the existing
rid
(request id) we generate for the logger - that could perhaps form the primary key here. This would be nice in that we could correlate these connections with the logs, but it would require some refactoring to make that id available in the request context.- Return the id from the
reportClientConnection
event, then use that id when disconnecting.
Could you give me an example when this (user_id, client_id, connected_at) might occur?
My understanding is the client_id is basically linked to the client db and would only change if there was a reinstall of the application... So would that mean that there is an instance where an application could be installed by the same user on a different device connected to the same instance and have the same client_id generated but with a different sdk?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In most cases the client_id should be unique, but we don't have any guarantees on that. For example, the user can copy the database to another device (maybe not on Android/iOS, but on desktop it's easy), in which case the same client_id would be present on both. Older SDKs do not send client_id at all. And in some cases, the user_id is not unique either.
When combining with the connect_at timestamp it should be very rare that you get duplicates, but it is possible.
That said, since we're only "pre-aggregating" these connections per (user_id, client_id, day) anyway, I guess that doesn't really make a difference, and we can keep this as-is.
client_id, | ||
user_id, | ||
connected_at |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as for postgres - we should use connection ids here, rather than assume this combination is unique.
}); | ||
} | ||
|
||
describe('SDK reporting storage', async () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests should also specifically include a case where client_id
is undefined.
}); | ||
} | ||
|
||
describe('Connection report storage', async () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as for MongoDB, tests should also specifically include a case where client_id
is undefined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additionally, is it feasible for these tests to primarily use the public storage APIs, rather than raw SQL? That way the tests could be shared between the Postgres and MongoDB implementations, potentially only having small number of db-specific tests if you do still need the lower-level access in some places.
$gte: new Date(year, month, today), | ||
$lt: new Date(year, month, nextDay) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This depends on the timezone of the server - not sure if that is what we want?
But I'm also wondering - do we specifically want this behavior ("if the connection event is within the same day the stored initial connection is updated")? Why not store these as separate connection events, and then handle any other logic in the aggregations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dylan was concerned about storing the connections as logs in the ps service db, due to volume. Considering we are using per day, per week and per month, we only need day granularity. When a user refreshes the web sdk it actually causes a disconnect and a reconnect which would make multiple copies withing very a short period if they refresh a few times. So to reduce the redundant connections by the same user I did it this way. Which still gives us the minimal per day granularity.
The date objects get converted to UTC time by Mongo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, we can keep it as daily connections for now - it looks like we can change that later if needed without too much effort.
The date objects get converted to UTC time by Mongo.
If we do keep daily values, we need to be very explicit about time zones. As is, if I run the server in UTC+2, the filter here would use 2025-08-21T22:00:00.000Z
. If the reporting job happens to use a different timezone, it would not match up with the way it is persisted here.
Now in our hosted environment at least, it is likely that the service runs in the UTC timezone already, and this won't make a difference. But if we're relying on that, it is better to be explicit - use UTC timezones here (new Date(Date.UTC(year, month, today))
), and document that UTC-based timestamps should be used when querying the data.
Connection SDK reporting.
Stores sdk data for user connections on the instance. Open edition storage is disabled by default, so no impact on attached databases. Event emitter engine is extendable for other use cases as it is attached to the service context.
Additions:
Basic flow: