feat(telemetry): Combine Telemetry hook to send heap event once #766

noklam · 2024-07-12T14:02:03Z

Description

Development notes

The tests aren't quite working yet: some pass when run individually but fail when run in a batch. The change is already quite big, so I'd like to get some review:

KedroTelemetryCLIHooks and KedroTelemetryProjectHooks are combined to avoid cross-hook communication and to make it possible to send the event only ONCE by storing some intermediate state (this handles both entry points: a kedro command or just session.run()).
Added an integration test, as I think it's important to make sure the event is only sent once. Mocking wouldn't work, so I need a real project here; to keep the effort low I just created a minimal kedro project. I don't think we should spend too much effort on it, as once we move kedro-telemetry back to kedro we can share the same starter project. It's a temporary structure, as I don't think adding it to the current test_plugin.py fits because the test is more high level.
I created a fake telemetry ID, so that if something isn't working we still capture the fake telemetry from ourselves.

[tool.kedro_telemetry]
project_id = "KEDRO_TELEMETRY_TEST"

As discussed, the almost-duplicate CLI command event is removed. See kedro-telemetry: Spike to reduce redundant telemetry events #730 (comment).
There is a more detailed internal conversation about this approach.

Checklist

Opened this PR as a 'Draft Pull Request' if it is work-in-progress
Updated the documentation to reflect the code changes
Added a description of this change in the relevant RELEASE.md file
Added tests to cover my changes

Signed-off-by: Nok Lam Chan <[email protected]>

kedro-telemetry/kedro_telemetry/plugin.py

kedro-telemetry/pyproject.toml

DimedS

Thank you for the PR, @noklam. Great job! I appreciate the approach to merge the two classes into one to facilitate data exchange, and the integration tests are really useful. I mostly agree with the new logic; it's excellent that we can now send only one event! I left a small comment and would like to propose removing the before_command_run() hook entirely. I believe it's not necessary, and all the logic can be placed inside the other two hooks with the same result. Is that correct?

DimedS · 2024-07-16T11:34:29Z

kedro-telemetry/kedro_telemetry/plugin.py

        )
+        self.event_properties = enriched_properties
+        if not self._sent:


I believe we don't need that check here. The logic should be that enriched_properties should be sent in any case. It is our responsibility to ensure the event is not sent again in the after_command_run() hook later.

Yeah, I think you are right. Only two hooks send the event, after_catalog_created and after_command_run, and this is the earlier one, so in reality there shouldn't be a case where this is already set to True.

ankatiyar · 2024-07-16T17:04:12Z

Generally, the approach makes sense but I think it would be good to wait to switch to after_command_run till @lrcouto's analysis on the Kedro side is complete (re #709) - i.e. if the hook is executed even if there's an exception during the command execution. Otherwise we miss out on data about commands run with failures.

DimedS · 2024-07-17T15:58:58Z

Generally, the approach makes sense but I think it would be good to wait to switch to after_command_run till @lrcouto's analysis on the Kedro side is complete (re #709) - i.e. if the hook is executed even if there's an exception during the command execution. Otherwise we miss out on data about commands run with failures.

My opinion is that it would be better to proceed with the current approach, but first completely remove the before_command_run() hook for clarity. The logic will be clear: if a step includes catalog creation, enhanced data will be sent in the after_catalog_created() hook; otherwise, a simple set of telemetry data will be sent at the last step.

I think it's acceptable to lose information about failed runs for now because we are not currently analysing it properly. I believe it makes sense to implement a special flow for failed runs later, such as collecting error types and analysing them for some insights.

noklam · 2024-07-17T16:31:03Z

@ankatiyar
The after_command_run change is slightly tricky.

There may be conflicts between the two requirements:

Send the event even if the command fails
Make sure the event is not sent at the beginning and is only sent after the metadata is enriched

Signed-off-by: Dmitry Sorokin <[email protected]>

noklam · 2024-07-19T13:05:00Z

I can't approve my own PR, but it looks good to me now :)

DimedS · 2024-07-19T13:15:35Z

As agreed, we have merged KedroTelemetryCLIHooks and KedroTelemetryProjectHooks into KedroTelemetryHook. I refactored the code to improve clarity and reduce repetition by consolidating relevant parts into the new class. Additionally, I updated the tests to reflect this merge.

The current logic is as follows:

1. We collect information about the CLI command and basic project details in the before_command_run hook.
2. If the after_catalog_created hook is triggered, we add information about nodes and pipelines and then send the event.
3. If step 2 does not occur, we send the event in the after_command_run hook.
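
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the actual plugin code: the class name SendOnceTelemetryHook, the send callable, the event name "cli_command", and the simplified hook signatures are all assumptions made for the example. Only the hook names and the _sent flag come from this PR.

```python
from typing import Any, Callable, Dict, List


class SendOnceTelemetryHook:
    """Hypothetical stand-in for the combined hook; not the real plugin code."""

    def __init__(self, send: Callable[[str, Dict[str, Any]], None]) -> None:
        self._send = send  # hypothetical transport (e.g. a wrapper around the Heap call)
        self._sent = False
        self._event_properties: Dict[str, Any] = {}

    def before_command_run(self, command_args: List[str]) -> None:
        # Step 1: collect CLI and basic project details; nothing is sent yet.
        self._sent = False
        self._event_properties = {"command": " ".join(command_args)}

    def after_catalog_created(self, project_stats: Dict[str, Any]) -> None:
        # Step 2: enrich with node/pipeline details, then send.
        # No `_sent` check here: this is the earliest hook that sends.
        self._event_properties.update(project_stats)
        self._send("cli_command", dict(self._event_properties))
        self._sent = True

    def after_command_run(self) -> None:
        # Step 3: send the basic event only if step 2 never happened.
        if not self._sent:
            self._send("cli_command", dict(self._event_properties))
            self._sent = True


# Usage: a command that creates a catalog, then one that does not.
events = []
hook = SendOnceTelemetryHook(lambda name, props: events.append((name, props)))

hook.before_command_run(["kedro", "run"])
hook.after_catalog_created({"number_of_nodes": 3, "number_of_pipelines": 1})
hook.after_command_run()  # no second event for `kedro run`

hook.before_command_run(["kedro", "info"])
hook.after_command_run()  # basic event only
```

Keeping the flag on a single hook instance is what lets after_command_run know whether the enriched event already went out, which is the reason for merging the two classes.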

I attempted to fix the integration tests but was unsuccessful. After discussing with @noklam, I have moved the integration tests to a separate issue and PR (#771).

Please re-review the PR.

Signed-off-by: Dmitry Sorokin <[email protected]>

ankatiyar

LGTM! I think this partially resolves #709 too then!

kedro-telemetry/kedro_telemetry/plugin.py

Signed-off-by: Dmitry Sorokin <[email protected]>

…o-org#766) --------- Signed-off-by: Nok Lam Chan <[email protected]> Signed-off-by: Dmitry Sorokin <[email protected]> Signed-off-by: Dmitry Sorokin <[email protected]> Signed-off-by: Merel Theisen <[email protected]>

noklam added 7 commits July 11, 2024 17:02

merge hooks

56a52ed

Signed-off-by: Nok Lam Chan <[email protected]>

combine hook and fix test

fafca5a

Signed-off-by: Nok Lam Chan <[email protected]>

add integration test

71af907

Signed-off-by: Nok Lam Chan <[email protected]>

remove redundant evenT

6686f92

Signed-off-by: Nok Lam Chan <[email protected]>

fix test and telemetry to send once only

7bab861

Signed-off-by: Nok Lam Chan <[email protected]>

fix bug

f46c796

Signed-off-by: Nok Lam Chan <[email protected]>

fix test, partially

1be74a6

Signed-off-by: Nok Lam Chan <[email protected]>

noklam requested review from DimedS, merelcht and ankatiyar July 12, 2024 14:04

merelcht reviewed Jul 15, 2024

DimedS reviewed Jul 16, 2024

DimedS and others added 3 commits July 18, 2024 10:57

Merge branch 'main' into noklam/kedro-telemetry-spike-to-730

18df86f

Signed-off-by: Dmitry Sorokin <[email protected]>

Modify telemetry and tests

e476d6a

Signed-off-by: Dmitry Sorokin <[email protected]>

Fix heap name and linting

1df6554

Signed-off-by: Dmitry Sorokin <[email protected]>

DimedS changed the title ~~Combine Telemetry hook to send heap event once~~ feat(telemetry): Combine Telemetry hook to send heap event once Jul 18, 2024

DimedS added 2 commits July 19, 2024 10:55

Fix integration tests

86b2eee

Signed-off-by: Dmitry Sorokin <[email protected]>

Fix integration tests

98401ed

Signed-off-by: Dmitry Sorokin <[email protected]>

This was referenced Jul 19, 2024

kedro-telemetry: Add Integration tests to ensure telemetry event sends only once per command #770

Closed

feat(telemetry): add integration tests #771

Merged

Remove integration tests

4624f3c

Signed-off-by: Dmitry Sorokin <[email protected]>

DimedS requested a review from merelcht July 19, 2024 13:15

DimedS approved these changes Jul 19, 2024

DimedS marked this pull request as ready for review July 19, 2024 13:36

Fix command_args to use masking

ecc5167

Signed-off-by: Dmitry Sorokin <[email protected]>

ankatiyar approved these changes Jul 19, 2024

ankatiyar mentioned this pull request Jul 19, 2024

kedro-telemetry: Improve performance by switching to after_command_run #709

Closed

Switch to private names

1cb1e0f

Signed-off-by: Dmitry Sorokin <[email protected]>

DimedS merged commit 142342d into main Jul 19, 2024
10 checks passed

DimedS deleted the noklam/kedro-telemetry-spike-to-730 branch July 19, 2024 16:05

noklam mentioned this pull request Jul 30, 2024

kedro-telemetry: Improve performance by switching to after_command_run kedro-org/kedro#4014

Merged

7 tasks

ankatiyar mentioned this pull request Aug 16, 2024

feat(telemetry): Switch to after_command_run hook #707

Closed

4 tasks
