-
Notifications
You must be signed in to change notification settings - Fork 50
[AQUA] Track md logs for error logging #1219
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@@ -728,6 +729,44 @@ def update( | |||
|
|||
return self._update_from_oci_model(response) | |||
|
|||
def tail_logs( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't we use -
def logs(self, log_type: str = None) -> ConsolidatedLog:
"""Gets the access or predict logs.
Parameters
----------
log_type: (str, optional). Defaults to None.
The log type. Can be "access", "predict" or None.
Returns
-------
ConsolidatedLog
The ConsolidatedLog object containing the logs.
"""
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. I see that show_logs() within model_deployment.py also covers the logic described below.
@@ -1300,24 +1304,52 @@ def get_deployment_status( | |||
max_wait_time=DEFAULT_WAIT_TIME, | |||
poll_interval=DEFAULT_POLL_INTERVAL, | |||
) | |||
except Exception: | |||
except Exception as e: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add more test cases to cover this logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggesting minor changes
@@ -728,6 +729,44 @@ def update( | |||
|
|||
return self._update_from_oci_model(response) | |||
|
|||
def tail_logs( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. I see that show_logs() within model_deployment.py also covers the logic described below.
|
||
if predict_logs and len(predict_logs) > 0: | ||
status += predict_logs[0]["message"] | ||
status = re.sub(r"[^a-zA-Z0-9]", " ", status) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: any reason why we're removing the non-alphanumeric characters here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
without removing the non-alphanumeric characters, I was getting bad request when calling the head_object endpoint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you try encoding the characters using urllib.parse instead and see if that helps?
status = urllib.parse.quote(status, safe='')
the missing characters is not a deal breaker, but can lead to not-so-readable output/message. If adding this also results in bad request, then we can go ahead with the change you've added.
status = access_logs[0]["message"] | ||
|
||
if predict_logs and len(predict_logs) > 0: | ||
status += predict_logs[0]["message"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking whether we need to append both predict and access logs here. For aqua, I think UI passes the same ocids for predict and access logs, so we'll be duplicating the content. Instead, would it make sense to check for access logs first, and only look to add predict logs if these are empty? Better, if we know that both logs are the same, then we can just look at access.
telemetry_kwargs = { | ||
"ocid": ocid, | ||
"model_name": model_name, | ||
"status": error_str + " " + status, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's split this in two fields: "status" can be error_str, and then add "log_message" can be the status content from the logs. This way we can identify the error messages coming from work requests and also the deployment logs separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changes look good.
I thought the oci sdk would handle the encoding of the object name automatically but might be good to just check if we can keep them by manually encoding it. If it still doesn't result in the proper output, then we can go ahead with the changes you have.
|
||
if predict_logs and len(predict_logs) > 0: | ||
status += predict_logs[0]["message"] | ||
status = re.sub(r"[^a-zA-Z0-9]", " ", status) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you try encoding the characters using urllib.parse instead and see if that helps?
status = urllib.parse.quote(status, safe='')
the missing characters is not a deal breaker, but can lead to not-so-readable output/message. If adding this also results in bad request, then we can go ahead with the change you've added.
Description
In this PR , we have added support to watch predict and access logs (if available) of Model Deployment and pass on error from the logs to telemetry.
Following changes were made as part of this requirement -
Jira
https://jira.oci.oraclecorp.com/browse/ODSC-73585
Sample error JSON from logs