Add sqlID column to failed_jobs.csv #1567
base: dev
Signed-off-by: Ahmed Hussein (amahussein) <[email protected]>

Fixes NVIDIA#1563

- adds column `sqlID` to failed_jobs.csv
- the column might be empty if the job has no sqlID attached to it
// Extracts the file format from a class object string, such as
// "com.nvidia.spark.rapids.GpuParquetFileFormat@9f5022c".
//
// This function is designed to handle cases where the RAPIDS plugin logs raw object names
// instead of a user-friendly file format name. For example, it extracts "Parquet" from
// "com.nvidia.spark.rapids.GpuParquetFileFormat@9f5022c".
// Refer: https://github.com/NVIDIA/spark-rapids-tools/issues/1561
//
// If the input string does not match the expected pattern, the function returns the original
// string as a fallback.
//
// @param formatStr The raw format string, typically containing the class name of the file
// format.
// @return A user-friendly file format name (e.g., "Parquet") or the original string if no
// match is found.
Changed the documentation format because the Scaladoc style is not allowed inside nested methods. It is only allowed in top-level resources.
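The extraction described in the comment above could be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the object name `FileFormatSketch`, the method name `extractFormat`, and the exact regular expression are assumptions made for the example.

```scala
object FileFormatSketch {
  // Assumed pattern: any package prefix, then "Gpu", the format name,
  // "FileFormat", and an "@" followed by a hexadecimal hash, e.g.
  // "com.nvidia.spark.rapids.GpuParquetFileFormat@9f5022c".
  private val GpuFormatPattern =
    """.*\.Gpu([A-Za-z0-9]+)FileFormat@[0-9a-fA-F]+""".r

  // Returns the captured format name when the whole string matches the
  // pattern; otherwise falls back to the original string unchanged.
  def extractFormat(formatStr: String): String = formatStr match {
    case GpuFormatPattern(name) => name
    case _ => formatStr
  }
}
```

With this sketch, `FileFormatSketch.extractFormat("com.nvidia.spark.rapids.GpuParquetFileFormat@9f5022c")` yields `"Parquet"`, while a plain value such as `"Parquet"` or `"JSON"` is returned as-is.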
case class FailedJobsProfileResults(
    appIndex: Int,
    jobId: Int,
    sqlID: Option[Long], // sqlID is optional because Jobs might not have a SQL (i.e., RDDs)
    jobResult: String,
    endReason: String) extends ProfileResult {
  override val outputHeaders = Seq("appIndex", "jobID", "sqlID", "jobResult", "failureReason")
Added the `sqlID` column. The rest of the changes are code formatting to put each field on its own line.
Seq(appIndex.toString,
  jobId.toString,
  sqlID.map(_.toString).getOrElse(null),
Added the `sqlID` column. If `sqlID` is not defined, null is written instead.
The rest of the changes are code formatting to put each field on its own line.
Seq(appIndex.toString,
  jobId.toString,
  sqlID.map(_.toString).getOrElse(null),
  StringUtils.reformatCSVString(jobResult),
Added the `sqlID` column. If `sqlID` is not defined, null is written instead.
The rest of the changes are code formatting to put each field on its own line.
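To make the optional column concrete, here is a small self-contained sketch of how an `Option[Long]` field can turn into an empty CSV cell. It is a simplification, not the class from this PR: `FailedJobRow` and `toCsvLine` are hypothetical names, the real class extends `ProfileResult` and passes strings through `StringUtils.reformatCSVString`, and rendering null as an empty cell is an assumption about the writer.

```scala
object FailedJobsSketch {
  // Hypothetical stand-in for FailedJobsProfileResults, without the
  // ProfileResult parent or the CSV string escaping used in the tool.
  final case class FailedJobRow(
      appIndex: Int,
      jobId: Int,
      sqlID: Option[Long], // None for jobs that have no SQL attached
      jobResult: String,
      endReason: String) {
    val outputHeaders: Seq[String] =
      Seq("appIndex", "jobID", "sqlID", "jobResult", "failureReason")

    // orNull is equivalent to getOrElse(null) for reference types.
    def convertToSeq: Seq[String] =
      Seq(appIndex.toString,
        jobId.toString,
        sqlID.map(_.toString).orNull,
        jobResult,
        endReason)
  }

  // Assumption: a null field is written as an empty cell.
  def toCsvLine(fields: Seq[String]): String =
    fields.map(f => Option(f).getOrElse("")).mkString(",")
}
```

Under these assumptions, a row built with `sqlID = None` produces a line with two consecutive commas where the `sqlID` value would be, which matches the note in the PR description that the column might be empty.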
@@ -56,7 +56,7 @@ class HealthCheckSuite extends FunSuite {
     assert(apps.size == 1)

     val healthCheck = new HealthCheck(apps)
-    for (app <- apps) {
+    for (_ <- apps) {
(unrelated) get rid of unused definition
@@ -142,7 +142,7 @@ class HealthCheckSuite extends FunSuite {
     assert(apps.size == 1)

     val healthCheck = new HealthCheck(apps)
-    for (app <- apps) {
+    for (_ <- apps) {
(unrelated) get rid of unused definition
CC: @leewyang
Signed-off-by: Ahmed Hussein (amahussein) [email protected]
Fixes #1563
Adds column `sqlID` to failed_jobs.csv.

This pull request includes several changes to improve the handling of job profiling and file format extraction in the RAPIDS plugin for Apache Spark. The most important changes include modifying the `FailedJobsProfileResults` case class to include an optional SQL ID, updating related views and tests, and simplifying the code in the `HealthCheckSuite` class.

Sample output file
Improvements to job profiling:
- `core/src/main/scala/com/nvidia/spark/rapids/tool/profiling/ProfileClassWarehouse.scala`: Modified the `FailedJobsProfileResults` case class to include an optional `sqlID` field and updated the `outputHeaders` and `convertToSeq` methods accordingly.
- `core/src/main/scala/com/nvidia/spark/rapids/tool/views/JobView.scala`: Updated the `AppFailedJobsViewTrait` to handle the new `sqlID` field in `FailedJobsProfileResults` and modified the `sortView` method to include `sqlID` in the sorting criteria.
- `core/src/test/resources/ProfilingExpectations/jobs_failure_eventlog_expectation.csv`: Updated the test expectations to include the new `sqlID` field in the CSV header and data.