Metric support #333

Open
t11omas opened this issue Oct 31, 2024 · 11 comments
@t11omas

t11omas commented Oct 31, 2024

Are there any plans to introduce metrics, for example the duration of a consumer? I can write an interceptor to do this currently, but I don't think it gives an accurate value, as consumption logic has already been performed on the message by that point, i.e. it has already been deserialized.

So I think either a hook at the point where the message is first received is required, or the metrics need to be built into SMB.

@zarusz
Owner

zarusz commented Oct 31, 2024

hello,

So I was thinking about adding these at some point via OTEL. See #149. Happy to accept a PR if you want to help.

For now one could add a generic consumer interceptor that would measure the consumer processing time and log using your favorite telemetry platform.

Also, some of the underlying client libraries that each transport uses have their own metrics (but yes, it would be better if SMB abstracted it all under one unified model).

Let me know your thoughts.

@t11omas
Author

t11omas commented Oct 31, 2024

I have done that via the generic interceptor, but the issue is that by the time the message hits the interceptor, some processing has already occurred, i.e. it has already been deserialized, which means that time won't be included.

I will take a look at #149 :)

@t11omas
Author

t11omas commented Oct 31, 2024

Just copying this question over to this ticket:

Is there any way to see/monitor the count of messages on the in-memory bus? (This might be another example of a good metric to log.)

We have a process that reads a lot of data and places it onto an in-memory bus to be processed. It would be nice to be able to see/monitor that in-memory bus.

@zarusz
Owner

zarusz commented Oct 31, 2024

Yes, that's another good metric that should be logged.

For now you could either turn on more verbose logging and count the lifecycle log events that are indicative of counts (not great, but possible), or add a generic interceptor as explained above and increment a counter in your metrics platform (but you have that already).

For in-memory, messages are not serialized/deserialized by default, so that overhead does not apply (which is what you're looking for).

Either way, we need to capture metrics (new feature). I believe we could use OTEL as the abstraction.
Would OTEL work for you? What logging/monitoring platform do you use?

@t11omas
Author

t11omas commented Oct 31, 2024

We are using a Hybrid bus, so both in memory and external. I understand for the in memory bus the messages don't go via serialization, but they do for the external buses.

OTEL would work for us, but I am wondering, just from a design point of view, could it somehow be abstracted, like what you have done for the transports and serialization?

For example (don't get caught up on the names or signatures I use here, they are just there to illustrate the point):

public interface IConsumerObserver
{
    void ConsumeStarted();

    void ConsumeError();

    void ConsumeFinished();
}

public interface IInMemoryBusObserver
{
    void MessageAdded();

    void MessageRemoved();
}

That way you can add in OTELMetric or extend it to support whatever platform is required.
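
A minimal sketch of how such an observer could be backed by .NET's System.Diagnostics.Metrics, assuming the IConsumerObserver shape proposed above. MeterConsumerObserver, the meter name, and the instrument names are hypothetical and not part of SlimMessageBus:

```csharp
using System;
using System.Diagnostics.Metrics;

// Hypothetical abstraction using the names proposed above; not an SMB API.
public interface IConsumerObserver
{
    void ConsumeStarted();
    void ConsumeError();
    void ConsumeFinished();
}

// One possible plug-in that forwards the callbacks to .NET metrics instruments.
public sealed class MeterConsumerObserver : IConsumerObserver, IDisposable
{
    private readonly Meter _meter;
    private readonly Counter<long> _started;
    private readonly Counter<long> _errors;
    private readonly UpDownCounter<long> _active;

    public MeterConsumerObserver(string meterName = "Example.Messaging")
    {
        _meter = new Meter(meterName);
        _started = _meter.CreateCounter<long>("example.consume.started");
        _errors = _meter.CreateCounter<long>("example.consume.errors");
        // UpDownCounter is used because the in-flight count must be able to decrease.
        _active = _meter.CreateUpDownCounter<long>("example.consume.active");
    }

    public void ConsumeStarted()
    {
        _started.Add(1);
        _active.Add(1);
    }

    public void ConsumeError() => _errors.Add(1);

    // Assumed to be called once per message, whether or not it failed.
    public void ConsumeFinished() => _active.Add(-1);

    public void Dispose() => _meter.Dispose();
}
```

Any other backend could implement the same interface, so the host would only depend on the abstraction and an OTEL exporter would remain an optional add-on.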

@zarusz
Owner

zarusz commented Oct 31, 2024

Yes, that's what I was thinking too. OTEL isn't a clean abstraction in itself (we would have to drag libraries into SlimMessageBus.Host), so a clean interface would address my concern. However, OTEL has become a standard that many platforms use nowadays, and I believe users of SMB would expect to just connect it to Prometheus/Grafana/Azure App Insights and have it simply work.

> That way you can add in OTELMetric or extend it to support whatever platform is required

Yes, maybe the path would be SMB > SMB's abstraction (the interface suggested above) > OTEL plugin (optional).

I will think about it. Again, please voice your needs and opinions. Feedback is much appreciated.

@zarusz
Owner

zarusz commented Nov 3, 2024

Reading some more about metrics instrumentation, it seems that nowadays the best and most modern way to add metrics to .NET apps is via System.Diagnostics.DiagnosticSource. It is a good and performant abstraction designed with OTEL in mind, and tools like Prometheus and Grafana can collect the metrics.

I will give it a try.
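
A rough sketch of what instrumenting with the System.Diagnostics.Metrics API (shipped in the System.Diagnostics.DiagnosticSource package) can look like, with a MeterListener standing in for an exporter. The meter name, instrument name, and both class names are assumptions made for illustration only:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;
using System.Threading;

public static class BusMetricsSketch
{
    public const string MeterName = "Example.Bus"; // hypothetical meter name

    private static readonly Meter BusMeter = new Meter(MeterName);

    private static readonly Counter<long> Consumed =
        BusMeter.CreateCounter<long>("example.bus.consume", "{message}", "Number of messages consumed");

    // Records one consumed message, tagged with the path it arrived on.
    public static void RecordConsumed(string path) =>
        Consumed.Add(1, new KeyValuePair<string, object?>("path", path));
}

// Stand-in for an exporter: sums every long measurement published by the named meter.
public sealed class InProcessCollector : IDisposable
{
    private readonly MeterListener _listener = new MeterListener();
    private long _total;

    public long Total => Interlocked.Read(ref _total);

    public InProcessCollector(string meterName)
    {
        _listener.InstrumentPublished = (instrument, listener) =>
        {
            if (instrument.Meter.Name == meterName)
            {
                listener.EnableMeasurementEvents(instrument);
            }
        };
        _listener.SetMeasurementEventCallback<long>(
            (instrument, value, tags, state) => Interlocked.Add(ref _total, value));
        _listener.Start();
    }

    public void Dispose() => _listener.Dispose();
}
```

In a real application the collector would be replaced by an OTEL or Prometheus exporter subscribed to the same meter name; the instrumented code would not need to change.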

@zarusz zarusz self-assigned this Nov 3, 2024
@t11omas
Author

t11omas commented Nov 3, 2024

If it helps, this is what I was playing around with using the interceptor:

using System.Diagnostics;
using System.Diagnostics.Metrics;
using System.Text;
// The SlimMessageBus namespaces below are assumed; adjust to the packages you reference.
using SlimMessageBus;
using SlimMessageBus.Host.Interceptor;

public class MetricsConsumerInterceptor<TMessage> : IConsumerInterceptor<TMessage>
{
    readonly Histogram<double> consumeDuration;
    readonly Counter<long> consumeTotal;
    readonly Counter<long> consumeFaultTotal;
    // Counter<T> only accepts non-negative increments, so the in-progress
    // count (which is decremented in OnHandle) uses an UpDownCounter<T>.
    readonly UpDownCounter<long> consumerInProgress;
    readonly Counter<long> consumeRetryTotal;

    public MetricsConsumerInterceptor(IMeterFactory meterFactory)
    {
        var meter = meterFactory.Create("Cdms");
        consumeTotal = meter.CreateCounter<long>("messaging.cdms.consume", "ea", "Number of messages consumed");
        consumeFaultTotal = meter.CreateCounter<long>("messaging.cdms.consume.errors", "ea",
            "Number of message consume faults");
        consumerInProgress = meter.CreateUpDownCounter<long>("messaging.cdms.consume.active", "ea",
            "Number of consumers in progress");
        consumeDuration = meter.CreateHistogram<double>("messaging.cdms.consume.duration", "ms",
            "Elapsed time spent consuming a message, in millis");
        consumeRetryTotal =
            meter.CreateCounter<long>("messaging.cdms.consume.retries", "ea", "Number of message consume retries");
    }

    public async Task<object> OnHandle(TMessage message, Func<Task<object>> next, IConsumerContext context)
    {
        // Timing starts here, i.e. after SMB has already deserialized the message.
        var timer = Stopwatch.StartNew();
        var tagList = new TagList
        {
            { "messaging.cdms.service", Process.GetCurrentProcess().ProcessName },
            { "messaging.cdms.destination", context.Path },
            {
                "messaging.cdms.message_type",
                ObservabilityUtils.FormatTypeName(new StringBuilder(), typeof(TMessage))
            },
            { "messaging.cdms.consumer_type", context.Consumer.GetType().Name }
        };

        try
        {
            consumeTotal.Add(1, tagList);
            consumerInProgress.Add(1, tagList);
            if (context.Properties.TryGetValue(MessageBusHeaders.RetryCount, out var value))
            {
                tagList.Add("messaging.cdms.retry_attempt", (int)value);
                consumeRetryTotal.Add(1, tagList);
            }

            return await next();
        }
        catch (Exception exception)
        {
            tagList.Add("messaging.cdms.exception_type", exception.GetType().Name);
            consumeFaultTotal.Add(1, tagList);
            throw;
        }
        finally
        {
            consumerInProgress.Add(-1, tagList);
            // TotalMilliseconds keeps sub-millisecond precision for the histogram.
            consumeDuration.Record(timer.Elapsed.TotalMilliseconds, tagList);
        }
    }
}

The .NET abstraction does come with an OTEL exporter. We might need to use EMF instead though. I am just investigating whether we can export the .NET abstraction to EMF.

@t11omas
Author

t11omas commented Nov 3, 2024

Looking through the code, I wonder if an IReceivedInterceptor<in TMessage> : IInterceptor would be the simplest way to go, one that is invoked by the message processor (the TMessage would be the raw transport message, for example ServiceBusReceivedMessage).

Then metrics could be captured early on in the process. It would also mean that observability support (regardless of the protocol) would just be plugged in via interceptors.
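
To make the idea concrete, here is a hedged sketch of what such a receive-level interceptor might look like. The interface shape, the OnReceived method, and TimingReceivedInterceptor are all hypothetical; SlimMessageBus does not expose this hook, and the IInterceptor base from the proposal is left out to keep the sketch self-contained:

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical hook invoked by the message processor with the raw transport
// message, before deserialization and consumer lookup take place.
public interface IReceivedInterceptor<in TTransportMessage>
{
    Task<object?> OnReceived(TTransportMessage transportMessage, Func<Task<object?>> next);
}

// Measures the whole pipeline behind the hook and hands the elapsed time to a callback.
public sealed class TimingReceivedInterceptor<TTransportMessage> : IReceivedInterceptor<TTransportMessage>
{
    private readonly Action<TimeSpan> _report;

    public TimingReceivedInterceptor(Action<TimeSpan> report) =>
        _report = report ?? throw new ArgumentNullException(nameof(report));

    public async Task<object?> OnReceived(TTransportMessage transportMessage, Func<Task<object?>> next)
    {
        var stopwatch = Stopwatch.StartNew();
        try
        {
            // next() stands for everything downstream: deserialization,
            // interceptor pipeline creation, and the consumer itself.
            return await next();
        }
        finally
        {
            _report(stopwatch.Elapsed);
        }
    }
}
```

Because the stopwatch wraps everything downstream, the reported duration would include deserialization and pipeline setup, not only the consumer's own work.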

@zarusz
Owner

zarusz commented Nov 3, 2024

Thanks for sharing. The interceptor example you've provided was one of the ways I was looking to implement this; however, I am exploring whether adding this internally (outside of the interceptor pipeline) would make it more efficient.

Couple of questions, so let me know your thoughts here:

  1. I see you've used one metric, consume, and the path is just a tag on it. I was wondering if doing the metric per path/topic would be more desirable (each path/topic would get its own total count metric). Applying the path as a tag should give a way to see all the consumers' metrics as one or segregate them by topic. So perhaps that's the most flexible, but please comment here.

  2. What is the reason you'd like to see metrics captured at the native message (e.g. ServiceBusMessage)? The challenge here is that little is known about the message type at that stage, and the types vary per transport, making it troublesome to write one unified metric. On the other hand, the overhead for serialization (or other SMB ceremonies) should be negligible. Let me know.

  3. I am thinking about giving a way to configure the metric names and prefix, as well as to be able to augment the tags collected (to collect less or more).

  4. Do we want producer metrics too?

@t11omas
Author

t11omas commented Nov 3, 2024

Hey,

  1. I am not sure which would be better, but if we wanted to show the metrics at a service level, and we had 10 topics, then I am not sure how easy it would be to aggregate them later on (generating dashboards etc. isn't something I have a lot of experience in). If we started with metrics at a service level, with tags (filters), then I know tools like the Aspire dashboard make it really easy to filter down (see image below).
  2. I would be more comfortable if the "duration metric" timer actually started as soon as the "Consume" pipeline started, and not later, at the point where it first hits my code. It's not just serialization that is going on here; it's also generating interceptor pipelines, looking up consumers, etc. Sure, that might be negligible, but let's say all that takes 10ms, and then the actual consumer I implemented also takes 10ms; then I would be reporting a processing time of 10ms when actually it was 100% longer.
  3. Yes please. I think if you create a metric names options class, then a PostConfigure could be used to edit those values. Not sure about the tags though.
  4. Yes, I think what we have done for the consumer we will want for the producer, but I think that one might be a little simpler, and the current interceptors will suffice for that.

image
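
As an illustration of point 3, a configurable metric names class might look roughly like the sketch below. ConsumerMetricNames, its properties, and its defaults are invented for this example; in a dependency-injection setup the override would typically be registered with services.PostConfigure<ConsumerMetricNames>(...), which a plain delegate stands in for here:

```csharp
using System;

// Hypothetical options type; the names and defaults are illustrative, not SMB's.
public sealed class ConsumerMetricNames
{
    public string Prefix { get; set; } = "messaging.smb";
    public string ConsumeTotal { get; set; } = "consume";
    public string ConsumeErrors { get; set; } = "consume.errors";
    public string ConsumeDuration { get; set; } = "consume.duration";

    // Builds the full instrument name as "<prefix>.<name>", or just the name if no prefix is set.
    public string Qualify(string name) =>
        string.IsNullOrEmpty(Prefix) ? name : $"{Prefix}.{name}";
}

public static class ConsumerMetricNamesExample
{
    // Applies an optional override after the defaults, mimicking a post-configure step.
    public static ConsumerMetricNames Build(Action<ConsumerMetricNames>? postConfigure = null)
    {
        var options = new ConsumerMetricNames();
        postConfigure?.Invoke(options);
        return options;
    }
}
```

For example, `ConsumerMetricNamesExample.Build(o => o.Prefix = "messaging.cdms")` followed by `Qualify(names.ConsumeTotal)` yields `messaging.cdms.consume`, matching the naming used in the interceptor shared earlier in the thread.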
