Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added overview of OTEL #2964

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

gechetspr
Copy link
Collaborator

PR Description

Add a meaningful description here that will let us know what you want to fix with this PR or what functionality you want to add.

Steps before you submit a PR

  • Please add tests for the code you add if it's possible.
  • Please check out our contribution guide: https://docs.spryker.com/docs/dg/dev/code-contribution-guide.html
  • Add a contribution-license-agreement.txt file with the following content:
    I hereby agree to Spryker\'s Contribution License Agreement in https://github.com/spryker/spryker-docs/blob/HASH_OF_COMMIT_YOU_ARE_BASING_YOUR_BRANCH_FROM_MASTER_BRANCH/CONTRIBUTING.md.

This is a mandatory step to make sure you are aware of the license agreement and agree to it. HASH_OF_COMMIT_YOU_ARE_BASING_YOUR_BRANCH_FROM_MASTER_BRANCH is a hash of the commit you are basing your branch from the master branch. You can take it from commits list of master branch before you submit a PR.

Checklist

  • I agree with the Code Contribution License Agreement in CONTRIBUTING.md


```

Wire new console command into your install script. It allows it to be executed on every deploy and generated filed during it.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check this part and generated filed during it.

Comment on lines 177 to 182
`\Spryker\Zed\Opentelemetry\OpentelemetryConfig::getExcludedDirs` - controls what directories MUST NOT be instrumented. You may not want to see in your traces spans from some infra code that is not relevant for you. So you can just exclude the whole directory with a provided name. OOTB we excluded bunch of directories, so you can check it as example.
`\Spryker\Zed\Opentelemetry\OpentelemetryConfig::getExcludedSpans` - allows you to exclude specific spans that you noticed in your traces. You need to use specific span name for that.
`\Spryker\Zed\Opentelemetry\OpentelemetryConfig::getPathPatterns` - allows to add a path pattern that shows where console command should look for you classes to cover them with hooks. OOTB all Spryker directories are covered together with `Pyz` namespace on the project. Be advised that you should not cover code that is autogenerated.
`\Spryker\Zed\Opentelemetry\OpentelemetryConfig::getOutputDir` - specifies in what directory you want to put generated hooks. Default value is `src/Generated/OpenTelemetry/Hooks/`.
`\Spryker\Zed\Opentelemetry\OpentelemetryConfig::areOnlyPublicMethodsInstrumented` - OOTB we cover with hooks only public methods (except Controller classes), but you can change that with this configuration method.
`\Spryker\Zed\Opentelemetry\OpentelemetryConfig::getCriticalClassNamePatterns` - some of the spans can be marked as critical in order to give them more priority on sampling. You can define what classes you are interested in for that. OOTB we have Controllers and Facades here.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is core level classes, i would provide example how it can be changed.

Copy link

@eduard-melnytskyi eduard-melnytskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It collects traces and send it to your monitoring system after.

```

### Collector
Collector does what you think it is. It collects traces and send it to your backend after. Sending traces to collector is done after that request is sent, so it will make a client wait just for that.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe better to change this
It collects traces and send it to your backend after.
to this
It collects traces and send it to your monitoring system after.

@@ -0,0 +1,216 @@
---

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also add information about following points:

  1. Ho you can deactivate instrumentation via env variable.
  2. information about thresholds
  3. Information about possible configuration options
  4. Information about how to extend scope of instrumentation


Use sampling to not get a full trace every time. You can change a sampling probability with `OTEL_TRACES_SAMPLER_ARG` env variable.

Skip some traces. You may not want to get a full trace for all of your transactions. You can define a probability of detailed trace overview by setting a probability via `TEL_TRACE_PROBABILITY` env variable. Be advised that Trace still will be processed and root span will be there for you. Also requests that are changing something in your application (POST, DELETE, PUT, PATCH) considered as critical and will be processed anyway.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TEL_TRACE_PROBABILITY to OTEL_TRACE_PROBABILITY
Plus critical also filtered by probability so this must be reflected in documentation to avoid wrong expectations

---

Spryker allows to integrate different monitoring tools like NewRelic into the SCCOS. But usually those tools are vendor locked and allows to work only with limited amount of backends or have not so much customizability options.
In order to give more customisable options, Spryker introduces a OpenTelemetry integration that allows to make your monitoring system be custom tailored for you and in the same time uses open standards that are widely accepted in community.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
In order to give more customisable options, Spryker introduces a OpenTelemetry integration that allows to make your monitoring system be custom tailored for you and in the same time uses open standards that are widely accepted in community.
In order to give you more customisable options, Spryker introduces an OpenTelemetry integration that allows to make your monitoring system more custom tailored based on your requirements and in the same time uses open standards that are widely accepted in community.

### Things that you should know

#### Trace
Trace represents a single transaction. It has an unique ID and all spans are related to it. Trace have a name that is defined automatically or can be changed with `\Spryker\Service\Monitoring\MonitoringServiceInterface::setTransactionName` method if you wire a plugin for it.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Trace represents a single transaction. It has an unique ID and all spans are related to it. Trace have a name that is defined automatically or can be changed with `\Spryker\Service\Monitoring\MonitoringServiceInterface::setTransactionName` method if you wire a plugin for it.
Trace represents a single transaction. It has a unique ID and all spans are related to it. Trace has a name that is defined automatically or can be changed with `\Spryker\Service\Monitoring\MonitoringServiceInterface::setTransactionName` method if you wire a plugin for it.

Please also add explanation of the plugin mentioned. Today, I may put the call to setTransactionName anywhere in the code, and it will work with NewRelic.

You can say that Span in our case represents an execution of single method of the class.

#### Hook
OpenTelemetry provide a way to instrument your code without modifying it directly. Hook is a function that execute a function before and after method execution.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
OpenTelemetry provide a way to instrument your code without modifying it directly. Hook is a function that execute a function before and after method execution.
OpenTelemetry provides a way to instrument your code without modifying it directly. Hook is a function that is executed before and after method execution.

#### Hook
OpenTelemetry provide a way to instrument your code without modifying it directly. Hook is a function that execute a function before and after method execution.

Example of hook you can see below. Hooks should be registered before the class is executed.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Example of hook you can see below. Hooks should be registered before the class is executed.
Example of a hook you can see below. Hooks should be registered before the class is executed.

How to register a hook?
I suggest to remove the second sentence, unless customers have to do something to register hooks on the project.

```

### Collector
Collector does what you think it is. It collects traces and send it to your backend after. Sending traces to collector is done after that request is sent, so it will not make a client wait just for that.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Collector does what you think it is. It collects traces and send it to your backend after. Sending traces to collector is done after that request is sent, so it will not make a client wait just for that.
Collector collects traces and sends them to your monitoring platform afterwards. Sending traces to collector is done after that request is sent, so it will not impact response time.

### (Optional) Install Monitoring module
Opentelemetry integration doesn't require to use Monitoring service, but this is highly recommended as it allows you to add custom attributes, change your traces (transaction) name during the request execution and have exceptions added to the root span for visibility.
You can get a module [here](https://packagist.org/packages/spryker/monitoring).
After install you can wire a Monitoring plugin from a `spryker/opentelemtry` module to get all listed features.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
After install you can wire a Monitoring plugin from a `spryker/opentelemtry` module to get all listed features.
After installation you can wire a Monitoring plugin from a `spryker/opentelemtry` module to enable all listed above features.

}

```
After that you can call method from Monitoring service and they will be translated to OpenTelemetry action. Be advised that some of the methods are not covered due to the fact that those things have no direct implementation, like `\Spryker\Service\Opentelemetry\Plugin\OpentelemetryMonitoringExtensionPlugin::markStartTransaction()` as transaction will start anyway.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
After that you can call method from Monitoring service and they will be translated to OpenTelemetry action. Be advised that some of the methods are not covered due to the fact that those things have no direct implementation, like `\Spryker\Service\Opentelemetry\Plugin\OpentelemetryMonitoringExtensionPlugin::markStartTransaction()` as transaction will start anyway.
After that you can call methods from Monitoring service and they will be translated to OpenTelemetry action. Be advised that some of the methods are empty due to the fact that those things have no direct implementation in Opentelemetry, like `\Spryker\Service\Opentelemetry\Plugin\OpentelemetryMonitoringExtensionPlugin::markStartTransaction()` as transaction will start anyway.


### Wire a console command

Provided console command used to generate hooks automatically for classes that you want to be covered with spans.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Provided console command used to generate hooks automatically for classes that you want to be covered with spans.
Console command *OpentelemetryGeneratorConsole* is used to generate hooks automatically for classes that you want to be covered with spans.


```

Wire new console command into your install script. It allows it to be executed on every deploy and generated filed during it.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Wire new console command into your install script. It allows it to be executed on every deploy and generated filed during it.
You have to wire this console command into your install script, so it's executed on every deployment.
Please make sure that this command is located after all code modifications and generation.


Use sampling to not get a full trace every time. Please check configuration section for the reference.

Skip some traces. You may not want to get a full trace for all of your transactions. You can define a probability of detailed trace overview by setting a probability via `OTEL_TRACE_PROBABILITY` env variable. Be advised that Trace still will be processed and root span will be there for you. Also requests that are changing something in your application (POST, DELETE, PUT, PATCH) considered as critical and will be processed anyway.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I miss explanation of the NON-INSTRUMENTED classes, that are restricted in the main Spryker module.
For example, Container calls, - \Spryker\Zed\Opentelemetry\OpentelemetryConfig::getExcludedDirs
I already faced issues that some spans are missing.


Use sampling to not get a full trace every time. Please check configuration section for the reference.

Skip some traces. You may not want to get a full trace for all of your transactions. You can define a probability of detailed trace overview by setting a probability via `OTEL_TRACE_PROBABILITY` env variable. Be advised that Trace still will be processed and root span will be there for you. Also requests that are changing something in your application (POST, DELETE, PUT, PATCH) considered as critical and will be processed anyway.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please include the possible values for each setting, i.e.
OTEL_TRACE_PROBABILITY = integer 1...100

Like already mentioned, valued that used for sampling can be changed. In order to do so, you need to change a few env variables.

`OTEL_BSP_MIN_SPAN_DURATION_THRESHOLD` - this is value that is used in `Closing span sampling` for critical and regular spans. OOTB the value is `20`. This value will be used as nanosecond.
`OTEL_BSP_MIN_CRITICAL_SPAN_DURATION_THRESHOLD` - this is similar value to `OTEL_BSP_MIN_SPAN_DURATION_THRESHOLD` one, but used for non critical spans. Default value is `10`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see NON in the name of the variable. Maybe it's a mistake?

Suggested change
`OTEL_BSP_MIN_CRITICAL_SPAN_DURATION_THRESHOLD` - this is similar value to `OTEL_BSP_MIN_SPAN_DURATION_THRESHOLD` one, but used for non critical spans. Default value is `10`.
`OTEL_BSP_MIN_NON_CRITICAL_SPAN_DURATION_THRESHOLD` - this is similar value to `OTEL_BSP_MIN_SPAN_DURATION_THRESHOLD` one, but used for non critical spans. Default value is `10`.

For doing so we generate a random float number from 0 to 1. After that we compare it to a configuration value and of the generated number is less than configured one - span will be sampled and opened. But if the number is bigger - span will be omitted and empty span will be opened instead. Empty span will not be shown in your traces.

### Closing span sampling
We also may want to filter super fast spans that had no errors inside of them. After the sampled span is closing we are checking it execution time and status. If span is successful and faster than a configured value it will also be omitted and will not appear in a trace.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
We also may want to filter super fast spans that had no errors inside of them. After the sampled span is closing we are checking it execution time and status. If span is successful and faster than a configured value it will also be omitted and will not appear in a trace.
We also may want to filter super fast spans that had no errors inside of them. After the sampled span is closing we are checking its execution time and status. If span is successful and faster than a configured value (OTEL_BSP_MIN_SPAN_DURATION_THRESHOLD or OTEL_BSP_MIN_CRITICAL_SPAN_DURATION_THRESHOLD) it will also be omitted and will not appear in the trace.

```php
class OpentelemetryConfig extends AbstractBundleConfig
{
public function getCriticalClassNamePatterns(): array
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name suggests a Pattern, but the values below are simple words.

Can we use Regex? Or is there a special syntax?

}
```

`\Spryker\Zed\Opentelemetry\OpentelemetryConfig::areOnlyPublicMethodsInstrumented()` - OOTB we cover with hooks only public methods (except Controller classes), but you can change that with this configuration method.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
`\Spryker\Zed\Opentelemetry\OpentelemetryConfig::areOnlyPublicMethodsInstrumented()` - OOTB we cover with hooks only public methods (except Controller classes), but you can change that with this configuration method.
`\Spryker\Zed\Opentelemetry\OpentelemetryConfig::areOnlyPublicMethodsInstrumented()` - OOTB we cover with hooks public methods of all classes and all methods of Controller classes, but you can change that with this configuration method.

btw, I'm not sure that private methods could be hooked in.


### Enable PHP extension

Hooks processing require you to have a few PHP extensions in place. Spryker prepares our PHP images, so you need to install nothing, but just enable it in your deploy file
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Hooks processing require you to have a few PHP extensions in place. Spryker prepares our PHP images, so you need to install nothing, but just enable it in your deploy file
Hooks processing requires you to have a few PHP extensions in place. Spryker has prepared a new PHP image, so you need to install nothing, but just enable it in your deploy file


environment: docker.dev
image:
tag: spryker/php:8.3
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
tag: spryker/php:8.3
tag: spryker/php:8.3-alpine3.20-otel

- grpc
- protobuf
```
Be advised that `blackfire` extension can have a conflict with a `opentelemetry` one, so make sure that you are not using both in the same time.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about NewRelic?

### Sampling configuration
Like already mentioned, valued that used for sampling can be changed. In order to do so, you need to change a few env variables.

`OTEL_BSP_MIN_SPAN_DURATION_THRESHOLD` - this is value that is used in `Closing span sampling` for critical and regular spans. OOTB the value is `20`. This value will be used as nanosecond.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a natural language description of this parameter.

Suggested change
`OTEL_BSP_MIN_SPAN_DURATION_THRESHOLD` - this is value that is used in `Closing span sampling` for critical and regular spans. OOTB the value is `20`. This value will be used as nanosecond.
`OTEL_BSP_MIN_SPAN_DURATION_THRESHOLD` - this is value that is used in `Closing span sampling` for critical and regular spans. OOTB the value is `20`. This value will be used as nanosecond.


`OTEL_BSP_MIN_SPAN_DURATION_THRESHOLD` - this is value that is used in `Closing span sampling` for critical and regular spans. OOTB the value is `20`. This value will be used as nanosecond.
`OTEL_BSP_MIN_CRITICAL_SPAN_DURATION_THRESHOLD` - this is similar value to `OTEL_BSP_MIN_SPAN_DURATION_THRESHOLD` one, but used for non critical spans. Default value is `10`.
`OTEL_TRACES_SAMPLER_ARG` - this is a probability for `Opening span sampling` and is used for the regular spans. Default value is `0.3`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it work together with the duration check?

Is higher probability leading to more spans tracked?

`OTEL_BSP_MIN_SPAN_DURATION_THRESHOLD` - this is value that is used in `Closing span sampling` for critical and regular spans. OOTB the value is `20`. This value will be used as nanosecond.
`OTEL_BSP_MIN_CRITICAL_SPAN_DURATION_THRESHOLD` - this is similar value to `OTEL_BSP_MIN_SPAN_DURATION_THRESHOLD` one, but used for non critical spans. Default value is `10`.
`OTEL_TRACES_SAMPLER_ARG` - this is a probability for `Opening span sampling` and is used for the regular spans. Default value is `0.3`.
`OTEL_TRACES_CRITICAL_SAMPLER_ARG` - same as a previous one, but for critical spans. Default value is `0.5`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, Is higher probability leading to more spans tracked?

`OTEL_TRACES_SAMPLER_ARG` - this is a probability for `Opening span sampling` and is used for the regular spans. Default value is `0.3`.
`OTEL_TRACES_CRITICAL_SAMPLER_ARG` - same as a previous one, but for critical spans. Default value is `0.5`.
`OTEL_TRACES_NON_CRITICAL_SAMPLER_ARG` - same as a previous one, but for non critical spans. Default value is `0.1`.
`OTEL_TRACE_PROBABILITY` - is a probability for the `Tracing sampling`. Default value is `0.1`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was told that all traces are tracked. What does this sampling here mean?
How does it combine with the sampler probabilities?

`OTEL_TRACES_NON_CRITICAL_SAMPLER_ARG` - same as a previous one, but for non critical spans. Default value is `0.1`.
`OTEL_TRACE_PROBABILITY` - is a probability for the `Tracing sampling`. Default value is `0.1`.

Increasing those values will make your traces bigger, but also can slow down your application as more spans will be generated and send to the collector.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When can it slow down?

Can we monitor the degradation?


### Additional configuration
`OTEL_SERVICE_NAMESPACE` - defines a namespace that is shown in the resource attributes of trace. Default value is `spryker`.
`OTEL_EXPORTER_OTLP_ENDPOINT` - defines a collector endpoint. Should be configured for your project as it varies from setup to setup. Default value is `http://collector:4317`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a not clear path of this value being set by the project. Please check with Cloud team.

`OTEL_DEFAULT_SERVICE_NAME` - if no mapping is provided and no name is provided via `MonitoringService::setApplicationName()` this value would be used as a fallback value. Default value is `Default Service`.
`OTEL_BSP_SCHEDULE_DELAY` - this value is used in SpanProcessor in order to flush all processed spans into exported after some delay. Default value is `1000` milliseconds.
`OTEL_BSP_MAX_QUEUE_SIZE` - defines max amount of spans that should be processed. All spans on top will be dropped. Default value is `2048`
`OTEL_BSP_MAX_EXPORT_BATCH_SIZE` - define a spans batch size that should be sent to an exporter. Default value `512`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
`OTEL_BSP_MAX_EXPORT_BATCH_SIZE` - define a spans batch size that should be sent to an exporter. Default value `512`.
`OTEL_BSP_MAX_EXPORT_BATCH_SIZE` - define a span batch size that should be sent to an exporter. Default value `512`.

In other places it's called collector, here it's exporter - so what's the proper name?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we set 1? 1000? why not?

`OTEL_SERVICE_NAME_MAPPING` - a json value that can be provided to define a service name is `MonitoringService::setApplicationName()` is not useful for you. By default OOTB Spryker application are mapped. `\Spryker\Service\Opentelemetry\OpentelemetryInstrumentationConfig::getServiceNameMapping()` can be checked for example.
`OTEL_DEFAULT_SERVICE_NAME` - if no mapping is provided and no name is provided via `MonitoringService::setApplicationName()` this value would be used as a fallback value. Default value is `Default Service`.
`OTEL_BSP_SCHEDULE_DELAY` - this value is used in SpanProcessor in order to flush all processed spans into exported after some delay. Default value is `1000` milliseconds.
`OTEL_BSP_MAX_QUEUE_SIZE` - defines max amount of spans that should be processed. All spans on top will be dropped. Default value is `2048`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it per request? per hour?

`OTEL_EXPORTER_OTLP_ENDPOINT` - defines a collector endpoint. Should be configured for your project as it varies from setup to setup. Default value is `http://collector:4317`.
`OTEL_SERVICE_NAME_MAPPING` - a json value that can be provided to define a service name is `MonitoringService::setApplicationName()` is not useful for you. By default OOTB Spryker application are mapped. `\Spryker\Service\Opentelemetry\OpentelemetryInstrumentationConfig::getServiceNameMapping()` can be checked for example.
`OTEL_DEFAULT_SERVICE_NAME` - if no mapping is provided and no name is provided via `MonitoringService::setApplicationName()` this value would be used as a fallback value. Default value is `Default Service`.
`OTEL_BSP_SCHEDULE_DELAY` - this value is used in SpanProcessor in order to flush all processed spans into exported after some delay. Default value is `1000` milliseconds.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why delay is needed, there's a batch size defined. How does it work?

@@ -0,0 +1,345 @@
---
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to add data about New Relic + OTEL

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants