Track how many times someone has taken previous editions of a survey #466

Open
SachaG opened this issue Nov 28, 2024 · 3 comments
Comments

@SachaG
Member

SachaG commented Nov 28, 2024

It would be nice to know what proportion of respondents have taken the survey once, twice, three times, etc. over previous years.

I could see different approaches:

1. track this in surveyform and submit it as part of every save

Pros: distributes the workload over every save
Cons: only works for future respondents

2. do it as part of the normalization process

Pros: will work for every response, even for past surveys if we renormalize them
Cons: will slow down normalization drastically (+1 db request per response processed)

3. do it as part of a surveyadmin script

Pros: since this is something that doesn't change, we can calculate it once per survey even if it takes a long time
Cons: easy to forget to run it
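
Whichever approach is used, the underlying computation is the same. A minimal sketch of it, assuming each response can be reduced to a user id and an edition id (`ResponseRef` and `participationDistribution` are hypothetical names for illustration, not existing code in the project):

```typescript
// Hypothetical shape: one record per response, reduced to the two ids we need.
interface ResponseRef {
  userId: string;
  editionId: string;
}

// Returns a map from "number of editions taken" to "proportion of respondents".
function participationDistribution(responses: ResponseRef[]): Map<number, number> {
  // Collect the distinct editions each user has responded to.
  const editionsByUser = new Map<string, Set<string>>();
  for (const { userId, editionId } of responses) {
    const editions = editionsByUser.get(userId) ?? new Set<string>();
    editions.add(editionId);
    editionsByUser.set(userId, editions);
  }

  // Count how many users took exactly 1, 2, 3... editions.
  const usersByCount = new Map<number, number>();
  editionsByUser.forEach((editions) => {
    usersByCount.set(editions.size, (usersByCount.get(editions.size) ?? 0) + 1);
  });

  // Convert the counts into proportions of all respondents.
  const totalUsers = editionsByUser.size;
  const proportions = new Map<number, number>();
  usersByCount.forEach((users, count) => {
    proportions.set(count, users / totalUsers);
  });
  return proportions;
}
```

Using a `Set` per user means duplicate responses to the same edition are only counted once.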

@SachaG
Member Author

SachaG commented Nov 28, 2024

I think a mix of 3 for past responses, and 1 going forward, is probably the best solution.

@eric-burel
Contributor

For 2., if you index surveys per (editionId, userId), your aggregation should actually be fast during the normalization process.
For 3., the advantage in the long run is that it would set a basis for running further computations; for example, we could call LLM APIs over each response. So you may want to focus on automating these scripts so they run just after normalization? The only limitation is that you'd want to avoid any computation that requires loading all responses in memory, because at some point the data might become too big, and stick to computations that can be done as a stream.
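
To illustrate the streaming idea, here is a rough sketch that processes one response at a time and delegates the per-user count to a lookup function. All names (`RawResponse`, `PastEditionsLookup`, `annotateResponses`) are made up for the example; in practice the lookup would presumably be an asynchronous database query backed by the index mentioned above, and the input would be a cursor rather than an array.

```typescript
// Hypothetical shape of a raw response; only the fields used here.
interface RawResponse {
  id: string;
  userId: string;
  editionId: string;
}

// Hypothetical lookup: how many previous editions did this user take?
// Assumed to be cheap, e.g. an indexed query rather than a full scan.
type PastEditionsLookup = (userId: string, editionId: string) => number;

// Lazily yields one result per response, so the full result set
// never has to be held in memory at once.
function* annotateResponses(
  responses: Iterable<RawResponse>,
  countPastEditions: PastEditionsLookup
): Generator<{ id: string; pastEditionsCount: number }> {
  for (const response of responses) {
    yield {
      id: response.id,
      pastEditionsCount: countPastEditions(response.userId, response.editionId),
    };
  }
}
```

Because the generator only pulls the next response when the consumer asks for it, memory use stays flat as long as the source itself is a stream.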

@SachaG
Member Author

SachaG commented Nov 28, 2024

I already added 3. It stores the data on the raw response, not the normalized response. So we only need to run it once after the survey ends, and even if we renormalize everything we don't need to run it again (since it's data that can't change anyway).
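
For readers following along, the idea can be sketched roughly like this. It is not the actual surveyadmin script: the field name `pastEditionsCount` and both functions are assumptions made for illustration.

```typescript
// Hypothetical raw response; pastEditionsCount is an assumed field name.
interface RawResponse {
  id: string;
  userId: string;
  editionId: string;
  pastEditionsCount?: number;
}

// Hypothetical normalized response, derived from the raw one.
interface NormalizedResponse {
  responseId: string;
  editionId: string;
  pastEditionsCount?: number;
}

// One-off script step: compute the count only for raw responses that
// do not have it yet, so re-running the script does no extra work.
function backfillPastEditions(
  rawResponses: RawResponse[],
  countPastEditions: (raw: RawResponse) => number
): number {
  let updated = 0;
  for (const raw of rawResponses) {
    if (raw.pastEditionsCount === undefined) {
      raw.pastEditionsCount = countPastEditions(raw);
      updated++;
    }
  }
  return updated;
}

// Normalization just copies the stored value; it never recomputes it.
function normalize(raw: RawResponse): NormalizedResponse {
  return {
    responseId: raw.id,
    editionId: raw.editionId,
    pastEditionsCount: raw.pastEditionsCount,
  };
}
```

Since the value lives on the raw response, renormalizing simply picks it up again through `normalize`.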
