WEATGenerator sets the log level programmatically (WEATGenerator, line 87) for classes in ia-web-commons that use java.util.logging. When running the job on a cluster with a recent Hadoop version (here: Hadoop 3.3.5), the log level is not passed properly to the java.util.logging loggers.
Notes:
WEATGenerator uses org.apache.commons.logging while other classes in ia-hadoop-tools use java.util.logging
Log messages from java.util.logging are written to stderr, while the log messages from WEATGenerator and the Hadoop classes are written to "syslog", which is the expected place. Also, the format of the java.util.logging messages does not follow the format defined by the Hadoop logging configuration.
Recently, Hadoop switched the backend for its slf4j logging classes from log4j to reload4j (HADOOP-18088).
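The stderr destination and the differing format are what java.util.logging does by default: the root logger gets a ConsoleHandler, which writes to System.err using SimpleFormatter. A minimal sketch of replacing that default with a one-line layout follows. The class names `JulFormat` and `OneLineFormatter` and the layout itself are hypothetical and only approximate a typical log4j-style line, not the pattern from the Hadoop configuration. The output still goes to stderr, so this addresses the format only, not the destination.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.logging.ConsoleHandler;
import java.util.logging.Formatter;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

public class JulFormat {

    /** Hypothetical layout: "<timestamp> <LEVEL> <logger name>: <message>". */
    public static class OneLineFormatter extends Formatter {
        @Override
        public String format(LogRecord record) {
            // SimpleDateFormat is not thread-safe, so create one per call.
            String timestamp = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss,SSS")
                    .format(new Date(record.getMillis()));
            return timestamp + " " + record.getLevel().getName() + " "
                    + record.getLoggerName() + ": " + formatMessage(record)
                    + System.lineSeparator();
        }
    }

    /** Replaces the root logger's handlers with one ConsoleHandler (stderr). */
    public static void install(Level level) {
        Logger root = Logger.getLogger("");
        for (Handler handler : root.getHandlers()) {
            root.removeHandler(handler);
        }
        ConsoleHandler console = new ConsoleHandler();
        console.setLevel(level);
        console.setFormatter(new OneLineFormatter());
        root.addHandler(console);
        root.setLevel(level);
    }
}
```

Calling `JulFormat.install(Level.WARNING)` would have to happen in the JVM that emits the messages, for example early in a task, which is an assumption about where such a hook fits in ia-hadoop-tools.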
As a consequence of the log levels not being passed, the log files are larger by a factor of almost 1000. A related PR (commoncrawl/ia-web-commons#33) changes the log level of all log messages not related to potential errors from INFO to FINE. After the fix was deployed, the log file volume is back to an acceptable size:
$> hadoop fs -du -h /var/log/hadoop-yarn/apps/ubuntu/bucket-logs-tfile/
41.7 G 125.1 G /var/log/hadoop-yarn/apps/ubuntu/bucket-logs-tfile/0181
40.6 G 121.9 G /var/log/hadoop-yarn/apps/ubuntu/bucket-logs-tfile/0182
37.6 G 112.9 G /var/log/hadoop-yarn/apps/ubuntu/bucket-logs-tfile/0183
38.6 G 115.7 G /var/log/hadoop-yarn/apps/ubuntu/bucket-logs-tfile/0184
39.2 G 117.6 G /var/log/hadoop-yarn/apps/ubuntu/bucket-logs-tfile/0185
39.1 G 117.3 G /var/log/hadoop-yarn/apps/ubuntu/bucket-logs-tfile/0186
# <<<< fix applied, see commoncrawl/ia-web-commons#33
46.9 M 140.7 M /var/log/hadoop-yarn/apps/ubuntu/bucket-logs-tfile/0187
46.9 M 140.8 M /var/log/hadoop-yarn/apps/ubuntu/bucket-logs-tfile/0188
46.9 M 140.7 M /var/log/hadoop-yarn/apps/ubuntu/bucket-logs-tfile/0189
Some investigation is needed to determine whether there is a reliable way to define the log level and format for loggers based on java.util.logging when the code runs as part of a Hadoop job.
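Two properties of java.util.logging may be worth ruling out during that investigation; neither is confirmed as the cause here. First, the LogManager holds loggers only through weak references, so a level set on a logger that nothing else references can be lost once that logger is garbage collected. Second, a level set programmatically applies only to the JVM in which the call runs. The sketch below keeps strong references to the configured loggers. The class name `JulLevels` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

public class JulLevels {

    // Strong references: without them a configured logger may be garbage
    // collected and later recreated with its level unset.
    private static final List<Logger> PINNED = new ArrayList<>();

    /** Sets the level of the named logger and pins the logger in memory. */
    public static synchronized Logger setLevel(String loggerName, Level level) {
        Logger logger = Logger.getLogger(loggerName);
        logger.setLevel(level);
        PINNED.add(logger);
        return logger;
    }
}
```

A call such as `JulLevels.setLevel("org.example.parser", Level.WARNING)` (the logger name is a placeholder) would need to run inside the task JVM, not only in the job client, to affect the messages a task writes.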