This repository has been archived by the owner on Feb 1, 2024. It is now read-only.

Error: Container is running beyond physical memory limits #19

Closed
clifff opened this issue Sep 30, 2019 · 11 comments

Comments


clifff commented Sep 30, 2019

I'm trying to run some analysis on a collection of S3 Access Logs, and set up a Glue job using the steps in the README to do so. The set of logs is about 14 GB over 12.8 million files. Whenever I kick off the job, it runs for about 13 minutes and then fails with a Command failed with exit code 1 message. Looking at the logs, I see this line that seems important:

Diagnostics: Container [pid=11027,containerID=container_1569865532923_0001_01_000001] is running beyond physical memory limits. Current usage: 5.5 GB of 5.5 GB physical memory used; 7.7 GB of 27.5 GB virtual memory used. Killing container.

This is corroborated by CloudWatch metrics, which show the driver memory usage steadily climbing and the executor staying low.

Based on the athena_glue_service_logs blog post here, it seems like my volume of data is well within the expected limits. I retried the job after adding the --conf parameter set to spark.yarn.executor.memoryOverhead=1G, but it failed in the same way.

Any advice for getting this to work is appreciated - otherwise I'll follow the Glue documentation's suggestion of writing a script to do the conversion using DynamicFrames.

Contributor

dacort commented Sep 30, 2019

Hi @clifff - there are a couple of things you can try here.

  1. Change the worker type in the Glue job to one with larger memory (G.1X - see screenshot below).
  2. Try increasing the spark.yarn.executor.memoryOverhead even more, but there is only so far you can go with that.
  3. You can also try increasing the driver memory, since you mention that's increasing. Set a parameter with the key --conf and value spark.driver.memory=10g.

I would recommend trying the first option as that will inherently give you more memory to work with.

[Screenshot: worker type setting in the Glue job configuration]
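
For anyone starting runs from a script rather than the console, options 1 and 3 can be sketched with boto3 roughly as follows. This is a minimal sketch, not a tested recipe: the job name, worker count, and helper names are hypothetical, and the `--conf` value simply mirrors the suggestion above.

```python
def build_job_run_request(job_name, worker_type="G.1X",
                          number_of_workers=10, driver_memory="10g"):
    """Build keyword arguments for the Glue StartJobRun call.

    The defaults are illustrative assumptions, not recommended values.
    """
    return {
        "JobName": job_name,
        # Option 1: a worker type with more memory per worker.
        "WorkerType": worker_type,
        "NumberOfWorkers": number_of_workers,
        # Option 3: key "--conf", value "spark.driver.memory=<size>".
        "Arguments": {"--conf": f"spark.driver.memory={driver_memory}"},
    }


def start_run(job_name):
    """Start the job run; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the helper above works without it

    glue = boto3.client("glue")
    response = glue.start_job_run(**build_job_run_request(job_name))
    return response["JobRunId"]
```

Calling `start_run("my-s3-access-logs-job")` (a placeholder job name) would start one run with those overrides and leave the saved job definition unchanged.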

Author

clifff commented Sep 30, 2019

Thanks for the tip @dacort! Didn't realize worker type was configurable like that. I upped to G.1X and let the job run again - churned for about 100 minutes before crashing again. Found this in the logs:

Log Contents:
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): 169.254.169.254
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): 169.254.169.254
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): glue.us-east-1.amazonaws.com
INFO:athena_glue_service_logs.job:Recurring run, only looking for recent partitions on raw catalog.
#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="kill -9 %p"
# Executing /bin/sh -c "kill -9 10754"...
os::fork_and_exec failed: Cannot allocate memory (12)
End of LogType:stdout

Which matches what CloudWatch is showing:

[Screenshot: CloudWatch memory usage metrics, 2019-09-30]

It seems promising it didn't hit a memory usage of 1 and immediately crash, but that does make me think it's not necessary to configure the driver to a specific amount. Went ahead and raised spark.yarn.executor.memoryOverhead to 2G and will try that out.

Author

clifff commented Oct 1, 2019

Confirmed, it timed out in about the same amount of time with the 2G setting.

Contributor

dacort commented Oct 1, 2019

OK, thanks for trying that @clifff - looks like building up the list of those 13M files is taking up quite the resources. Give me a few days to see if I can reproduce this in my own environment to see what options there might be. There's definitely still some more testing for these scripts at that scale.
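
For a rough sense of scale, the figures in the original report (about 14 GB across 12.8 million files) work out to roughly 1 KB per object, which is consistent with the file listing, rather than the data volume, being the expensive part. A back-of-the-envelope check, assuming decimal gigabytes:

```python
# Figures taken from the original report; decimal GB is an assumption.
total_bytes = 14 * 10**9
file_count = 12_800_000

# Average object size in bytes.
average_bytes = total_bytes / file_count
print(f"average object size: {average_bytes:.0f} bytes")
```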

Author

clifff commented Oct 1, 2019

Sounds good - thanks for looking into this @dacort! Happy to tweak settings/code and retry whenever.

Author

clifff commented Oct 17, 2019

@dacort - sorry to bump, but any update on this? Totally understand if not - I may take a go at loading these up on an EC2 instance with lots of RAM and attempting to dig at what I want w/ unix tools.

Contributor

dacort commented Oct 21, 2019

Hey @clifff - Unfortunately haven't been able to take a look much deeper. How high did you bump spark.driver.memory?

A couple other options:

  • Try increasing executor memory as well (don't think this is the issue tho): --conf spark.executor.memory=1g and keep increasing
  • Take a look at this doc about grouping input files. You'll have to modify converter.py and I'm not sure what the options are to read from the catalog, but you can test reading the options with an explicit S3 path to at least see if it works.

There is some more detail on debugging OOM issues here as well: https://docs.aws.amazon.com/glue/latest/dg/monitor-profile-debug-oom-abnormalities.html#monitor-profile-debug-oom-driver

edit

I think you can specify the file grouping as an additional_options parameter to the from_catalog function. For example:

additional_options={"groupFiles": "inPartition"}
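
Put together, the call might look something like the sketch below. The helper name, the `group_size_bytes` parameter, and its default are assumptions for illustration; `glue_context` is expected to be an awsglue `GlueContext`, which is only available inside a Glue job, and this is not the code currently in converter.py.

```python
def read_grouped_from_catalog(glue_context, database, table_name,
                              group_size_bytes=None):
    """Read a catalog table as a DynamicFrame with small-file grouping.

    Hypothetical helper: grouping files within each partition is meant to
    reduce the per-file bookkeeping held in memory.
    """
    options = {"groupFiles": "inPartition"}
    if group_size_bytes is not None:
        # groupSize is a target group size in bytes, passed as a string.
        options["groupSize"] = str(group_size_bytes)

    return glue_context.create_dynamic_frame.from_catalog(
        database=database,
        table_name=table_name,
        additional_options=options,
    )
```

Whether grouping actually takes effect for a given table depends on the input format, so it is worth testing on a small prefix first.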

Author

clifff commented Oct 21, 2019

No worries! I actually was successful loading the logs onto an EC2 instance. Turns out the bucket inventory size was way off and it was more like 60 GB of logs... but the good news is I was able to filter it down to ~100 MB of relevant lines using ripgrep, and got the info I needed from there.

Will go ahead and close this for now, but feel free to re-open if you want to track the issue further.

@clifff clifff closed this as completed Oct 21, 2019
Contributor

dacort commented Oct 21, 2019

👍 Sounds good, thanks!

Contributor

dacort commented Oct 21, 2019

I didn't realize you were just trying to do a one-time query. For future reference, this library creates two tables - one for the "raw" unconverted data and another for the "optimized" parquet data. This appears to have been failing during the conversion process, but you still could have queried the raw data. But ripgrep for the win! One of my favorite tools.


RickardCardell commented May 25, 2020

Hi
I've got a similar issue: I couldn't get the job to run on a dataset that isn't that large, 25 GB of S3 access logs per day for 30 days.

I've tried with:

  • 150 standard DPUs
  • 100 G.1X
  • 50 G.2X

all with many combinations of memory settings to no avail.

I instead went to the code and skipped the repartition stage: https://github.com/awslabs/athena-glue-service-logs/blob/master/athena_glue_service_logs/converter.py#L66
I also had to add spark.hadoop.fs.s3.maxRetries=20, since the job now makes quite a lot of S3 calls, which caused throttling.

The job succeeded with 100 'standard' workers after only 4 hours.
The drawback, of course, is that more objects were created: between 50 and 140 per day-partition.

But for me at least, it is better to have the jobs succeed than to have no log data at all. Also, for our use case, the Athena query performance will be good enough.

Q: Would it make sense to make the repartitioning configurable?
Another option is to have a (separate?) step that reduces the number of objects but more efficiently.
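
To make the first idea concrete, a configurable repartition step could look roughly like this. It is a minimal sketch with a hypothetical helper name, not the code currently in converter.py; `data_frame` is assumed to be a Spark DataFrame (anything with a `repartition(n)` method works).

```python
def maybe_repartition(data_frame, num_partitions=None):
    """Repartition only when a positive partition count is configured.

    With num_partitions left as None (or 0), the frame is returned
    untouched, which skips the shuffle at the cost of more output objects.
    """
    if num_partitions and num_partitions > 0:
        return data_frame.repartition(num_partitions)
    return data_frame
```

The partition count could then come from a job argument, so small datasets keep compact output while large ones can opt out of the shuffle.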

EDIT: added this as a separate issue instead: #21
