Feature/sdk reporting #316

Open · wants to merge 181 commits into main

Conversation

@EspressoTrip-v2 commented Aug 4, 2025

Connection SDK reporting.
Stores SDK data for user connections on the instance. Open edition storage is disabled by default, so there is no impact on attached databases. The event emitter engine is attached to the service context, so it is extendable for other use cases.

Additions:

  1. Event emitter engine
  2. Postgres report storage factory
  3. Mongo report storage factory

Basic flow:

  • User connects: user information, JWT, and meta are stored.
  • User refreshes or reconnects: if the connection event falls within the same day, the stored initial connection is updated; otherwise a new connection is stored.
  • User disconnects: the connection is updated as disconnected, keeping the previously stored connected time for long-running connections.

export type SubscribeEvents = {
  [EventsEngineEventType.SDK_CONNECT_EVENT]: ClientConnectionEventData;
  [EventsEngineEventType.SDK_DISCONNECT_EVENT]: ClientDisconnectionEventData;
  [EventsEngineEventType.SDK_DELETE_OLD]: DeleteOldConnectionData;
};

Contributor

What emits this event?

Author

In this instance it's the report module.
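
For context, a minimal sketch of how the report module might emit this event through the engine. The enum string values, the payload shape, and the emit signature are assumptions for illustration; only the type names come from the PR:

// Sketch only: enum values and engine shape are assumed.
enum EventsEngineEventType {
  SDK_CONNECT_EVENT = 'sdk-connect-event',
  SDK_DISCONNECT_EVENT = 'sdk-disconnect-event',
  SDK_DELETE_OLD = 'sdk-delete-old'
}

// Assumed payload, matching the fields shown elsewhere in this PR.
interface ClientConnectionEventData {
  client_id?: string;
  user_id: string;
  user_agent?: string;
  jwt_exp?: Date;
}

// Hypothetical handle for the events engine attached to the service context.
interface EventsEngine {
  emit(event: EventsEngineEventType, data: ClientConnectionEventData): void;
}

// The report module is the producer: it emits the connect event,
// and the storage factories subscribe and persist the data.
function reportClientConnection(engine: EventsEngine, data: ClientConnectionEventData): void {
  engine.emit(EventsEngineEventType.SDK_CONNECT_EVENT, data);
}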

client_id?: string;
user_id: string;
user_agent?: string;
jwt_exp?: Date;

Contributor

We can make this required - added comments elsewhere.

Author

This is for the most part a side effect of the Context object.

Comment on lines +219 to +221
user_id = ${{ type: 'varchar', value: user_id }}
AND client_id = ${{ type: 'varchar', value: client_id }}
AND connected_at = ${{ type: 1184, value: connectIsoString }}

Contributor

While the probability is low, there could be multiple connections with the same (user_id, client_id, connected_at) combination. I'd recommend using a unique id per connection instead. Two options:

  1. Use the existing rid (request id) we generate for the logger - that could perhaps form the primary key here. This would be nice in that we could correlate these connections with the logs, but it would require some refactoring to make that id available in the request context.
  2. Return the id from the reportClientConnection event, then use that id when disconnecting.
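
A rough sketch of option 2, with every name here hypothetical:

import { randomUUID } from 'node:crypto';

// Hypothetical storage interface for this sketch.
interface ConnectionStore {
  insertConnection(id: string, userId: string, clientId?: string): Promise<void>;
  markDisconnected(id: string): Promise<void>;
}

// reportClientConnection returns a unique id per connection...
async function reportClientConnection(store: ConnectionStore, userId: string, clientId?: string): Promise<string> {
  const connectionId = randomUUID();
  await store.insertConnection(connectionId, userId, clientId);
  return connectionId;
}

// ...and the disconnect path uses that id instead of assuming
// (user_id, client_id, connected_at) is unique.
async function reportClientDisconnection(store: ConnectionStore, connectionId: string): Promise<void> {
  await store.markDisconnected(connectionId);
}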

Author

I originally used that, but when we moved to updating an existing document it was no longer a viable option.

Author

@EspressoTrip-v2 commented Aug 22, 2025

> While the probability is low, there could be multiple connections with the same (user_id, client_id, connected_at) combination. I'd recommend using a unique id per connection instead. Two options:
>
>   1. Use the existing rid (request id) we generate for the logger - that could perhaps form the primary key here. This would be nice in that we could correlate these connections with the logs, but it would require some refactoring to make that id available in the request context.
>   2. Return the id from the reportClientConnection event, then use that id when disconnecting.

Could you give me an example of when this (user_id, client_id, connected_at) collision might occur?
My understanding is that the client_id is basically linked to the client db and would only change if the application were reinstalled... So does that mean there is a case where the same user could install the application on a different device connected to the same instance and end up with the same client_id generated, but with a different SDK?

Contributor

In most cases the client_id should be unique, but we don't have any guarantees on that. For example, the user can copy the database to another device (maybe not on Android/iOS, but on desktop it's easy), in which case the same client_id would be present on both. Older SDKs do not send client_id at all. And in some cases, the user_id is not unique either.

When combined with the connected_at timestamp it should be very rare that you get duplicates, but it is possible.

That said, since we're only "pre-aggregating" these connections per (user_id, client_id, day) anyway, I guess that doesn't really make a difference, and we can keep this as-is.

Comment on lines +68 to +70
client_id,
user_id,
connected_at

Contributor

Same as for postgres - we should use connection ids here, rather than assume this combination is unique.

});
}

describe('SDK reporting storage', async () => {

Contributor

Tests should also specifically include a case where client_id is undefined.
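
For example, something along these lines (the storage API names are assumed, as is a `storage` handle created in the surrounding describe block):

it('stores a connection when client_id is undefined', async () => {
  // client_id intentionally omitted: older SDKs do not send it.
  await storage.reportClientConnection({
    user_id: 'user-1',
    user_agent: 'test-agent'
  });

  const connections = await storage.getConnections({ user_id: 'user-1' });
  expect(connections).toHaveLength(1);
  expect(connections[0].client_id).toBeUndefined();
});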

});
}

describe('Connection report storage', async () => {

Contributor

Same as for MongoDB, tests should also specifically include a case where client_id is undefined.

Contributor

Additionally, is it feasible for these tests to primarily use the public storage APIs, rather than raw SQL? That way the tests could be shared between the Postgres and MongoDB implementations, potentially with only a small number of db-specific tests if you do still need the lower-level access in some places.
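
One possible shape for that, with the factory and storage API names all assumed:

// Shared suite: the same assertions run against any storage implementation.
interface ReportStorage {
  reportClientConnection(data: { user_id: string; client_id?: string }): Promise<void>;
  getConnections(filter: { user_id: string }): Promise<Array<{ client_id?: string }>>;
}

// Hypothetical per-backend factories.
declare const createPostgresReportStorage: () => Promise<ReportStorage>;
declare const createMongoReportStorage: () => Promise<ReportStorage>;

function describeReportStorage(name: string, createStorage: () => Promise<ReportStorage>) {
  describe(`Connection report storage (${name})`, () => {
    it('stores a connection when client_id is undefined', async () => {
      const storage = await createStorage();
      await storage.reportClientConnection({ user_id: 'user-1' });
      const rows = await storage.getConnections({ user_id: 'user-1' });
      expect(rows).toHaveLength(1);
    });
  });
}

// Each backend only supplies its own factory.
describeReportStorage('postgres', createPostgresReportStorage);
describeReportStorage('mongodb', createMongoReportStorage);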

Comment on lines +175 to +177
$gte: new Date(year, month, today),
$lt: new Date(year, month, nextDay)
}

Contributor

This depends on the timezone of the server - not sure if that is what we want?

But I'm also wondering - do we specifically want this behavior ("if the connection event is within the same day the stored initial connection is updated")? Why not store these as separate connection events, and then handle any other logic in the aggregations?

Author

Dylan was concerned about storing the connections as logs in the ps service db, due to volume. Considering we aggregate per day, per week, and per month, we only need day granularity. When a user refreshes the web SDK it actually causes a disconnect and a reconnect, which would create multiple copies within a very short period if they refresh a few times. So to reduce redundant connections by the same user I did it this way, which still gives us the minimal per-day granularity.

The date objects get converted to UTC time by Mongo.
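
To illustrate the same-day handling described above, a sketch only: the collection and document shape are assumed, and it keeps the PR's local-time day boundaries, which the next comment picks up on.

import { Collection } from 'mongodb';

interface ConnectionDoc {
  user_id: string;
  client_id?: string;
  connected_at: Date;
  disconnected_at: Date | null;
}

// If a connection for this (user_id, client_id) already exists within the
// current day, update it; otherwise insert a new document (upsert).
async function recordConnection(connections: Collection<ConnectionDoc>, userId: string, clientId: string | undefined, now: Date): Promise<void> {
  // NOTE: local-time day boundaries, as in the PR; see the timezone
  // discussion in the next comment.
  const dayStart = new Date(now.getFullYear(), now.getMonth(), now.getDate());
  const dayEnd = new Date(now.getFullYear(), now.getMonth(), now.getDate() + 1);

  await connections.updateOne(
    {
      user_id: userId,
      client_id: clientId,
      connected_at: { $gte: dayStart, $lt: dayEnd }
    },
    {
      $set: { disconnected_at: null },
      $setOnInsert: { connected_at: now }
    },
    { upsert: true }
  );
}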

Contributor

Ok, we can keep it as daily connections for now - it looks like we can change that later if needed without too much effort.

> The date objects get converted to UTC time by Mongo.

If we do keep daily values, we need to be very explicit about time zones. As is, if I run the server in UTC+2, the filter here would use 2025-08-21T22:00:00.000Z. If the reporting job happens to use a different timezone, it would not match up with the way it is persisted here.

Now in our hosted environment at least, it is likely that the service runs in the UTC timezone already, and this won't make a difference. But if we're relying on that, it is better to be explicit - use UTC timezones here (new Date(Date.UTC(year, month, today))), and document that UTC-based timestamps should be used when querying the data.
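
Concretely, the suggested fix might look like this:

// Compute the day window in UTC, independent of the server's timezone.
function utcDayRange(now: Date): { $gte: Date; $lt: Date } {
  const year = now.getUTCFullYear();
  const month = now.getUTCMonth();
  const day = now.getUTCDate();
  return {
    $gte: new Date(Date.UTC(year, month, day)),     // start of the UTC day
    $lt: new Date(Date.UTC(year, month, day + 1))   // start of the next UTC day
  };
}

// A server in UTC+2 at 2025-08-22T00:30 local time still filters on the
// UTC day 2025-08-21, matching how the timestamps are persisted.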
