WEATGenerator: log level set programmatically not passed to java.util.logging #7

Open
sebastian-nagel opened this issue Oct 8, 2023 · 0 comments

@sebastian-nagel

WEATGenerator sets the log level programmatically (WEATGenerator, line 87) for classes in ia-web-commons that use java.util.logging. When the job runs on a cluster with a recent Hadoop version (here: Hadoop 3.3.5), the log level is not properly passed to the java.util.logging loggers.
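One pitfall worth ruling out when a level set in code seems to be ignored: java.util.logging keeps only weak references to named loggers, so a logger configured via `Logger.getLogger(...).setLevel(...)` and then dropped may be garbage-collected, and a later `getLogger` call returns a fresh logger without that level. Whether this is what happens on the Hadoop 3.3.5 cluster is not established here. The minimal sketch below assumes the ia-web-commons classes log under the `org.archive` package hierarchy; `JulLevelSketch` and `setJulLevel` are hypothetical names.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class JulLevelSketch {
    // Assumption: ia-web-commons loggers are named after classes under org.archive.
    // The static field keeps a strong reference, so the logger (and the level
    // set on it in code) cannot be garbage-collected and silently re-created.
    private static final Logger ARCHIVE_LOGGER = Logger.getLogger("org.archive");

    /** Hypothetical helper: sets the level for the whole org.archive subtree. */
    static void setJulLevel(Level level) {
        // Child loggers such as org.archive.resource.* inherit this level unless
        // they have their own level. When lowering the level, the handlers'
        // levels (ConsoleHandler defaults to INFO) would also need adjusting.
        ARCHIVE_LOGGER.setLevel(level);
    }

    public static void main(String[] args) {
        setJulLevel(Level.WARNING);
        Logger child = Logger.getLogger("org.archive.resource.Example");
        System.out.println(child.isLoggable(Level.INFO));    // false
        System.out.println(child.isLoggable(Level.WARNING)); // true
    }
}
```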

Notes:

  • WEATGenerator uses org.apache.commons.logging, while other classes in ia-hadoop-tools use java.util.logging
  • log messages from java.util.logging are written to stderr, while the log messages from WEATGenerator and the Hadoop classes are written to "syslog", which is the expected place. Also, the java.util.logging messages do not follow the format defined by the Hadoop logging configuration.
  • recently, Hadoop switched the backend for its slf4j logging classes from log4j to reload4j (HADOOP-18088).
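The notes describe two separate problems: where java.util.logging output goes (stderr instead of syslog) and what it looks like. Routing JUL records into syslog would presumably need a bridge such as SLF4J's `jul-to-slf4j` module (`SLF4JBridgeHandler`), which is not shown here. The sketch below only addresses the format: a hypothetical `Log4jStyleFormatter` approximating a log4j `%d{ISO8601} %p %c: %m%n` layout. That pattern and the level mapping (chosen to resemble the one used by jul-to-slf4j) are assumptions; the cluster's container log4j configuration is the authority.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.logging.ConsoleHandler;
import java.util.logging.Formatter;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

/** Hypothetical formatter mimicking an assumed log4j layout. */
class Log4jStyleFormatter extends Formatter {
    // log4j's ISO8601 date layout, e.g. 2023-10-08 12:34:56,789
    private static final DateTimeFormatter TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS").withZone(ZoneId.systemDefault());

    @Override
    public String format(LogRecord r) {
        StringBuilder sb = new StringBuilder()
                .append(TS.format(Instant.ofEpochMilli(r.getMillis())))
                .append(' ').append(toLog4jLevel(r.getLevel()))
                .append(' ').append(r.getLoggerName())
                .append(": ").append(formatMessage(r)) // fills {0}-style parameters
                .append(System.lineSeparator());
        if (r.getThrown() != null) {
            StringWriter sw = new StringWriter();
            r.getThrown().printStackTrace(new PrintWriter(sw));
            sb.append(sw.toString());
        }
        return sb.toString();
    }

    /** Maps JUL levels onto log4j level names (assumed mapping). */
    static String toLog4jLevel(Level level) {
        int v = level.intValue();
        if (v <= Level.FINEST.intValue()) return "TRACE";
        if (v <= Level.FINE.intValue()) return "DEBUG"; // FINER, FINE
        if (v <= Level.INFO.intValue()) return "INFO";  // CONFIG, INFO
        if (v <= Level.WARNING.intValue()) return "WARN";
        return "ERROR";
    }

    /** Replaces the root logger's handlers with one console handler using this format. */
    static void installOnRoot(Level level) {
        Logger root = Logger.getLogger("");
        for (Handler h : root.getHandlers()) {
            root.removeHandler(h);
        }
        ConsoleHandler console = new ConsoleHandler(); // still writes to System.err
        console.setFormatter(new Log4jStyleFormatter());
        console.setLevel(level);
        root.addHandler(console);
    }
}
```

Note that `installOnRoot` changes only the layout; the records still end up in the container's stderr file rather than in syslog.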

As a consequence of the log levels not being passed, the log files are larger by a factor of almost 1000. A related PR (commoncrawl/ia-web-commons#33) changes the log level of all log messages not related to potential errors from INFO to FINE. After the fix was deployed, the log volume is back to an acceptable size:

$> hadoop fs -du -h /var/log/hadoop-yarn/apps/ubuntu/bucket-logs-tfile/
41.7 G  125.1 G  /var/log/hadoop-yarn/apps/ubuntu/bucket-logs-tfile/0181
40.6 G  121.9 G  /var/log/hadoop-yarn/apps/ubuntu/bucket-logs-tfile/0182
37.6 G  112.9 G  /var/log/hadoop-yarn/apps/ubuntu/bucket-logs-tfile/0183
38.6 G  115.7 G  /var/log/hadoop-yarn/apps/ubuntu/bucket-logs-tfile/0184
39.2 G  117.6 G  /var/log/hadoop-yarn/apps/ubuntu/bucket-logs-tfile/0185
39.1 G  117.3 G  /var/log/hadoop-yarn/apps/ubuntu/bucket-logs-tfile/0186
# <<<< fix applied, see commoncrawl/ia-web-commons#33
46.9 M  140.7 M  /var/log/hadoop-yarn/apps/ubuntu/bucket-logs-tfile/0187
46.9 M  140.8 M  /var/log/hadoop-yarn/apps/ubuntu/bucket-logs-tfile/0188
46.9 M  140.7 M  /var/log/hadoop-yarn/apps/ubuntu/bucket-logs-tfile/0189
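Presumably the demotion in commoncrawl/ia-web-commons#33 works even though the programmatic level is lost because JUL then falls back to its default configuration, where the root level is INFO, and records at FINE are dropped before they reach any handler. A minimal sketch of that effect, using a hypothetical counting handler and a hypothetical logger name `demo.demotion`:

```java
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

class DemotionSketch {
    /** Handler that only counts the records it is asked to publish. */
    static final class CountingHandler extends Handler {
        int count;

        @Override
        public void publish(LogRecord r) {
            if (isLoggable(r)) {
                count++;
            }
        }

        @Override
        public void flush() {}

        @Override
        public void close() {}
    }

    /** Logs `messages` records at `messageLevel` and returns how many reach the handler. */
    static int countPublished(Level loggerLevel, Level messageLevel, int messages) {
        Logger logger = Logger.getLogger("demo.demotion"); // hypothetical name
        logger.setUseParentHandlers(false); // keep the demo off the console
        CountingHandler counter = new CountingHandler();
        counter.setLevel(Level.ALL);
        logger.addHandler(counter);
        logger.setLevel(loggerLevel);
        try {
            for (int i = 0; i < messages; i++) {
                logger.log(messageLevel, "record {0}", i);
            }
        } finally {
            logger.removeHandler(counter);
        }
        return counter.count;
    }

    public static void main(String[] args) {
        System.out.println(countPublished(Level.INFO, Level.INFO, 1000)); // 1000
        System.out.println(countPublished(Level.INFO, Level.FINE, 1000)); // 0
    }
}
```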

Some investigation is needed to determine whether there is a reliable way to define the log level and format of java.util.logging-based loggers when the code runs as part of a Hadoop job.
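One candidate, not yet tested on the cluster: configure JUL declaratively rather than per logger in code. `LogManager.readConfiguration(InputStream)` accepts the same syntax as a `logging.properties` file and applies `<name>.level` entries to existing loggers and to loggers created later. In this sketch, the class name and the property values are illustrative assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.logging.LogManager;

class JulConfigSketch {
    // logging.properties syntax; the values are assumptions for illustration.
    static final String CONFIG = String.join("\n",
            "handlers = java.util.logging.ConsoleHandler",
            ".level = INFO",
            "org.archive.level = WARNING",
            "java.util.logging.ConsoleHandler.level = ALL");

    /** Resets JUL and applies CONFIG; levels also apply to loggers created later. */
    static void apply() {
        // Properties files are read as ISO-8859-1.
        try (InputStream in = new ByteArrayInputStream(CONFIG.getBytes(StandardCharsets.ISO_8859_1))) {
            LogManager.getLogManager().readConfiguration(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

On a cluster, the same properties could presumably be shipped as a file instead (e.g. with the generic `-files logging.properties` option) and activated by adding `-Djava.util.logging.config.file=logging.properties` to `mapreduce.map.java.opts`, keeping any options such as the heap size already set there. The `java.util.logging.SimpleFormatter.format` property would then control the layout of the default formatter.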
