Improving our understanding of our users with kedro-telemetry
#510
Comments
Status of personal data collection and consent in adjacent products presented:
|
Added Streamlit OSS (also does not collect personal data) thanks @Joseph-Perkins! |
Added dbt |
Pending: Add another column that shows whether the systems track individual users or not |
Done 👍🏽 |
kedro-telemetry
There are a couple of things in this issue. On one hand, we compiled a list of similar libraries as references for how other projects do telemetry, and we also asked for legal advice. That is already done: #510 (comment). On the other hand, there's the list of use cases @yetudada created in #510 (comment). Before getting to those, we want to simplify our data collection process (#375), for which we want to address #333 (done) and #507 (in progress). For now this issue is blocked; for clarity, I'm removing it from the current sprint and focusing on #507. Regardless, it's a good moment to make a release of kedro-telemetry. cc @merelcht |
Updated the first table of #510 (comment) with the current status, only 2 items remaining. |
Anything left to do here? @astrojuanlu |
The "What are the current challenges with its implementation?" still contains a couple of minor items, and also "What else could we learn from our users?" contains some valid points. I will have a look at this before EOY and give a summary of what should be the next steps, if any. |
Today I learned that Daft collects telemetry on every function call: https://github.com/Eventual-Inc/Daft/blob/fd662c1/docs/source/faq/telemetry.rst
It's achieved by decorating every public method and function. It then buffers the events, by default in groups of 100. |
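A rough sketch of the pattern described in the Daft comment above: wrap public functions in a decorator, record one event per call, and send the events in groups of 100. This is not Daft's code. `BufferedTelemetry`, `track`, `record`, `flush` and the `send` callback are hypothetical names chosen for illustration, and the event payload (just the function name) is an assumption.

```python
import functools
import threading
from typing import Any, Callable, Dict, List


class BufferedTelemetry:
    """Records one event per decorated call and sends events in batches.

    Hypothetical helper for illustration only; not the Daft or Kedro API.
    """

    def __init__(
        self,
        send: Callable[[List[Dict[str, Any]]], None],
        batch_size: int = 100,
    ) -> None:
        self._send = send              # callback that ships a batch somewhere
        self._batch_size = batch_size  # events per batch (100 mirrors the comment above)
        self._buffer: List[Dict[str, Any]] = []
        self._lock = threading.Lock()

    def track(self, func: Callable[..., Any]) -> Callable[..., Any]:
        """Decorator that records an event every time `func` is called."""

        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            self.record({"event": func.__name__})
            return func(*args, **kwargs)

        return wrapper

    def record(self, event: Dict[str, Any]) -> None:
        """Buffer an event and send the buffer once it reaches batch_size."""
        with self._lock:
            self._buffer.append(event)
            if len(self._buffer) < self._batch_size:
                return
            batch, self._buffer = self._buffer, []
        self._send(batch)

    def flush(self) -> None:
        """Send whatever is left in the buffer, e.g. at interpreter exit."""
        with self._lock:
            batch, self._buffer = self._buffer, []
        if batch:
            self._send(batch)


# Usage: collect batches in a list instead of sending them over the network.
sent_batches: List[List[Dict[str, Any]]] = []
telemetry = BufferedTelemetry(send=sent_batches.append, batch_size=100)


@telemetry.track
def read_csv(path: str) -> str:
    return f"dataframe from {path}"


for _ in range(250):
    read_csv("data.csv")
telemetry.flush()
```

Buffering keeps the per-call overhead to a list append, at the cost of losing any unsent events if the process exits without calling `flush`.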
Introduction
Analytics play a critical role in product management. As Marty Cagan highlights, analytics are essential for understanding user behaviour, measuring product progress, validating product ideas, informing decisions, and inspiring product work. In the context of Kedro, we have telemetry tools that help us qualitatively understand our users, namely:
- kedro-telemetry, which gives insight into the feature usage and user adoption of the CLI in Kedro Framework and the CLI and UI of Kedro-Viz
kedro-telemetry is the focus of this GitHub issue.
What principles should we adopt to govern the improvements of kedro-telemetry?
With all of these potential changes to kedro-telemetry, I thought it would be helpful to ground our work in certain principles that affect our users and our team. Therefore, I propose we adopt the following principles when improving kedro-telemetry:
- […] kedro-telemetry are reliable and accurate. Team members should have full confidence in the data they're using to make decisions.
- […] kedro-telemetry, including its activation process, ensuring informed consent and understanding.
- […] kedro-telemetry to provide insights that are directly applicable to product improvement strategies.
How was kedro-telemetry designed?
We have detailed some of the ways that kedro-telemetry was designed in a separate GitHub issue (#506).
What are the current challenges with its implementation?
There is room for improvement for the current implementation of kedro-telemetry. I've tried to capture all known issues here but let me know if I'm missing some and I'll update the details here.
- […] kedro-telemetry does not interrupt the CI/CD workflow, right now users have to check the documentation when kedro-telemetry will interrupt their workflow
- […] kedro-telemetry as a mandatory dependency meaning that users will have kedro-telemetry packaged in Kedro and it will no longer be part of the requirements of the starters
- […] kedro-telemetry works with Databricks
- […] kedro-telemetry to our users
- […] package_name and project_name and investigate why project_name is a blank field in our data
- […] kedro viz CLI command runs differs to the number of users of Kedro-Viz according to Heap Analytics
What else could we learn from our users?
I'll always be forward-looking on how we could continue to learn more about our users and even improve our existing metrics. I'd like to use a key to detail the status of the metric.
Status of metric:
- […] kedro-telemetry collects and hashes the computer's username upon user consent for user ID generation and counting.
- […] kedro-telemetry user data, i.e. if kedro-telemetry user data declines but PyPI downloads increase then that might be a sign.
- […] kedro-telemetry hashes package_name and project_name for project ID generation and counting.
- […] kedro-datasets […]
- […] kedro-telemetry is active, a hook counts this figure.
- […] kedro-telemetry is active, a hook counts this figure.
- […] kedro-telemetry is active, a hook counts this figure.
- […] kedro-telemetry versions
- […] kedro-telemetry is active, a hook reads this figure from their project.
- […] kedro-telemetry is active, a hook reads this figure from their project.
- […] kedro-telemetry is active, a hook reads this figure from their project.
- […] kedro-telemetry is active, a hook counts this figure.
What are other projects that we can be inspired by?
I'm just going to list them and not detail what they're about and what we could learn:
- […] telemetry-python by DVC)
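To make the hashing-based metrics under "What else could we learn from our users?" a bit more concrete, here is a minimal sketch of how a user ID and a project ID could be derived from the username, package_name and project_name. The choice of SHA-512, the helper names (`hash_identifier`, `user_id`, `project_id`) and the way the two project fields are joined are all assumptions for illustration, not a description of what kedro-telemetry actually does.

```python
import getpass
import hashlib
from typing import Optional


def hash_identifier(value: str) -> str:
    """Return a hex digest of `value`; SHA-512 is an assumed choice."""
    return hashlib.sha512(value.encode("utf-8")).hexdigest()


def user_id(username: Optional[str] = None) -> str:
    """Hash the computer's username (looked up if not supplied)."""
    name = username if username is not None else getpass.getuser()
    return hash_identifier(name)


def project_id(package_name: str, project_name: str) -> str:
    """Hash package_name and project_name together into one project ID.

    Joining the two fields with ":" is a hypothetical convention.
    """
    return hash_identifier(f"{package_name}:{project_name}")


# Usage with made-up values; only the digests would be sent, never the raw names.
example_user = user_id("alice")
example_project = project_id("spaceflights", "Spaceflights")
```

Because the same input always produces the same digest, counting distinct IDs approximates counting distinct users or projects without storing the raw names.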