Spark 3/ local / Uncompressed #716
Hi AbdelrahmanMosly, when I run Dr. Elephant with HDFS it works fine and the event log data shows up in the UI.
The directory exists and the Spark event logs are present in it.
FetcherConf.xml
Permissions are provided, and I have configured the Spark and Hadoop configuration files correctly. Please help me with this. As I have already mentioned, Dr. Elephant works fine with HDFS; I want it to work with the local FS. The error below is from the dr_elephant.log file:
06-24-2024 20:28:10 INFO [Thread-8] com.linkedin.drelephant.analysis.AnalyticJobGeneratorHadoop2 : Event log directory file:///Users/shaikbasha/spark-events
First, I recommend you check this commit: eb6092b. Based on the error message
Thank you, @AbdelrahmanMosly. But then I am getting this error in the Can you please help me resolve this error?
@Javid-Shaik If you check my PR, you'll see that to make Dr. Elephant compatible with Spark 3.x, I had to modify the listeners. Spark 3.x introduced new listeners and removed some of the existing ones, which required adjustments in the event log parsing logic. Additionally, you can check those commits
@AbdelrahmanMosly
@AbdelrahmanMosly And the UI looks like the pictures above and shows the wrong data. Please help me with this.
Make sure the metrics you need are present in the Spark event log. I am confused about which Spark version you use, as I see from this error. If your whole problem was just to read from the local file system, you only need to change the configs. There is no need to change the code.
@AbdelrahmanMosly
There are discrepancies with event logs due to differences in event types between Spark versions.
Current Status:
Next Steps:
Congratulations on getting the basic UI and some events parsed! The next step involves customizing the event parsing to ensure all necessary data is captured from Spark 3.5.1 logs.
@AbdelrahmanMosly These are the newly added events in Spark 3.5.1:
Can you please give me a head start on updating Dr. Elephant's event parsing logic to handle the new or renamed events in Spark 3.5.1?
@Javid-Shaik In the worst-case scenario, you can parse the JSON of the event logs directly.
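As a rough illustration of parsing the event log JSON directly: Spark writes one JSON object per line, each with an "Event" field naming the event type. The sketch below is plain Python, not Dr. Elephant code; the function name `count_event_types` and the sample lines are made up for illustration. It tallies event types so that events missing from the older parser can be spotted.

```python
import json
from collections import Counter
from typing import Iterable


def count_event_types(lines: Iterable[str]) -> Counter:
    """Tally the "Event" field of every JSON line in a Spark event log."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            counts["<unparseable>"] += 1
            continue
        if not isinstance(record, dict):
            counts["<unparseable>"] += 1  # valid JSON, but not an event object
            continue
        counts[record.get("Event", "<missing Event field>")] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines; a real log would be read from the
    # event log directory, e.g. with open(path, encoding="utf-8").
    sample = [
        '{"Event":"SparkListenerLogStart","Spark Version":"3.5.1"}',
        '{"Event":"SparkListenerResourceProfileAdded","Resource Profile Id":0}',
        '{"Event":"SparkListenerApplicationEnd","Timestamp":1}',
    ]
    for name, n in count_event_types(sample).most_common():
        print(f"{n:6d}  {name}")
```

Running this over one log from each Spark version and comparing the two tallies is a quick way to list which event types differ.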
@AbdelrahmanMosly
I have observed that the events As you suggested, I looked at the code in SparkDataCollection.scala, but I am not sure what to do or where to modify the code. Could you please assist me in identifying the relevant sections of the code and provide recommendations on how to adjust the parsing logic to handle these discrepancies between the Spark versions? For reference, you can see the event below.
Hi @AbdelrahmanMosly
07-10-2024 11:37:43 ERROR [dr-el-executor-thread-0] com.linkedin.drelephant.ElephantRunner : Failed to analyze SPARK spark-05981aeb46fb4816b20a62ae2fdf6041
@Javid-Shaik I don't remember encountering this error, but you can simply check for any duplicates, as this error indicates.
@AbdelrahmanMosly Also, can you please tell me how to get the Job Execution URL, Flow Execution URL, Job Definition URL, etc.? And is it possible to analyze streaming jobs? Currently Dr. Elephant analyzes batch jobs, i.e. the event logs of already completed applications. If it is possible, please give me a lead on how to analyze streaming jobs.
@Javid-Shaik Regarding the URLs, you need to ensure that your configuration includes the scheduler URLs to integrate properly. Here's an example of the configuration you should add:
These configurations are necessary because Spark event logs alone are not sufficient for this task.
@AbdelrahmanMosly Does this mean that the Spark jobs need to be submitted via a scheduler? Also, please tell me whether it is possible to analyze streaming jobs.
I haven't personally gone down this path as it wasn’t required in my case, so I don't have direct experience with these methods. However, these suggestions should help you get started.
@AbdelrahmanMosly
@Javid-Shaik
PR #357: Uncompressed File Support for Dr Elephant
Using Local Event Logs
Initially, Dr Elephant utilized the YARN Resource Manager to check submitted jobs. However, we made modifications to read from local Spark event logs instead.
If the environment variable USE_YARN is set to true, Dr Elephant will still be able to use the YARN Resource Manager. In this case, it will read and check the logs from the history server of Hadoop (YARN Resource Manager).
Using Uncompressed Files
Dr Elephant originally processed compressed files using a compression codec. We enhanced it to also support reading uncompressed files.
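To illustrate the compressed versus uncompressed distinction, here is a minimal Python sketch, not the PR's actual code. It assumes the usual Spark naming convention, where a compressed event log carries the codec's short name as a file extension (optionally followed by .inprogress) and an uncompressed log has no such extension. The set of codec names and both function names are assumptions made for this example.

```python
from pathlib import PurePosixPath
from typing import Iterator, Optional

# Assumption: codec short names commonly used for Spark event logs.
KNOWN_CODECS = frozenset({"lz4", "lzf", "snappy", "zstd"})
IN_PROGRESS_SUFFIX = ".inprogress"


def codec_of(log_path: str) -> Optional[str]:
    """Return the codec short name implied by the file name, or None if uncompressed."""
    name = PurePosixPath(log_path).name
    if name.endswith(IN_PROGRESS_SUFFIX):
        name = name[: -len(IN_PROGRESS_SUFFIX)]
    _, dot, ext = name.rpartition(".")
    if dot and ext in KNOWN_CODECS:
        return ext
    return None


def read_event_lines(log_path: str) -> Iterator[str]:
    """Yield the lines of an uncompressed event log; refuse compressed ones."""
    codec = codec_of(log_path)
    if codec is not None:
        # Decompression needs a codec library that is outside this sketch.
        raise NotImplementedError(f"compressed log ({codec}) not handled here")
    with open(log_path, encoding="utf-8") as handle:
        yield from handle
```

The point of the branch is only that a log without a codec extension can be read as plain text, while a compressed one has to go through the matching decompressor first.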
Spark and Hadoop Versions
Dr Elephant is designed to run on Spark 1.4.0 and Hadoop 2.3.0. However, issues arose when attempting to read event logs generated by Spark 3, because a new listener was introduced that the ReplayListenerBus of Spark version 1.4.0 could not identify. To address this, we implemented a workaround that ignores the listener named SparkListenerResourceProfileAdded.
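The idea behind the workaround can be sketched as a pre-filter over the event log lines. This is a Python illustration only, not how the PR implements it inside Dr Elephant. It assumes each log line is a JSON object whose "Event" field holds the event name; the function name `drop_skipped_events` and the sample lines in the usage note are hypothetical.

```python
import json
from typing import AbstractSet, Iterable, Iterator

# Event names the older replay logic cannot identify (from the description above).
SKIPPED_EVENTS = frozenset({"SparkListenerResourceProfileAdded"})


def drop_skipped_events(
    lines: Iterable[str], skipped: AbstractSet[str] = SKIPPED_EVENTS
) -> Iterator[str]:
    """Yield every log line except those whose "Event" name is in `skipped`."""
    for line in lines:
        try:
            event = json.loads(line).get("Event")
        except (json.JSONDecodeError, AttributeError):
            # Malformed or non-object lines pass through unchanged so the
            # downstream parser can report them itself.
            event = None
        if isinstance(event, str) and event in skipped:
            continue
        yield line
```

For example, passing in the two lines `{"Event":"SparkListenerLogStart"}` and `{"Event":"SparkListenerResourceProfileAdded"}` yields only the first. Further names can be added to the set if other Spark 3 events turn out to be unrecognized.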
Fetchers Configuration
We identified the Spark event logs directory and disabled the Tez Fetcher in the FetcherConf.xml configuration.