Hi, by default MMS prints memory utilization to the log, which is great. The problem I have is that after each request to MMS, memory utilization increases a little; after several requests it reaches 100% and the worker dies.
I don't think this is the expected behavior, right?
I tried calling gc.collect() in the _handle function, but it doesn't help (there is no GPU available on this machine).
I wonder if anyone can help me out here.
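For reference, this is roughly where the gc.collect() call goes. The handler below is only a minimal sketch of the usual MMS custom-service shape; the model and the per-request logic are placeholders, not my actual code:

```python
import gc

class ExampleHandler:
    # Minimal stand-in for a custom MMS handler; everything here is illustrative.
    def __init__(self):
        self.model = None
        self.initialized = False

    def initialize(self, context):
        # One-time, heavy allocations belong here: they should show up as a
        # single jump in MemoryUtilization when the worker starts, not as
        # growth on every request.
        self.model = object()  # stand-in for real model loading
        self.initialized = True

    def _handle(self, data):
        # Per-request work; anything referenced only in this scope should be
        # freed automatically when the function returns.
        result = [len(str(item)) for item in data]
        # gc.collect() only reclaims objects that are already unreachable.
        # It cannot free data still referenced from globals, caches, or lists
        # that keep growing across requests, which is the usual cause of
        # steady per-request memory growth.
        gc.collect()
        return result

_service = ExampleHandler()

def handle(data, context):
    # MMS calls a module-level handle(data, context) and keeps the worker
    # process (and therefore this module) alive between requests.
    if not _service.initialized:
        _service.initialize(context)
    if data is None:
        return None
    return _service._handle(data)
```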
Here is an example.

Right after the server starts, the log shows:
2021-10-31 18:22:25,881 [INFO ] pool-2-thread-1 MMS_METRICS - MemoryUtilization.Percent:5.1|#Level:Host|#hostname:cebbb237ccfc,timestamp:1635704545

After the first request:
mms_1 | 2021-10-31 18:24:25,742 [INFO ] pool-2-thread-1 MMS_METRICS - MemoryUtilization.Percent:26.2|#Level:Host|#hostname:cebbb237ccfc,timestamp:1635704665

After the second request:
mms_1 | 2021-10-31 18:26:25,601 [INFO ] pool-2-thread-1 MMS_METRICS - MemoryUtilization.Percent:39.7|#Level:Host|#hostname:cebbb237ccfc,timestamp:1635704785

After the third request:
mms_1 | 2021-10-31 18:30:25,323 [INFO ] pool-2-thread-1 MMS_METRICS - MemoryUtilization.Percent:58.5|#Level:Host|#hostname:cebbb237ccfc,timestamp:1635705025

After the fourth request:
mms_1 | 2021-10-31 18:32:25,187 [INFO ] pool-2-thread-1 MMS_METRICS - MemoryUtilization.Percent:81.6|#Level:Host|#hostname:cebbb237ccfc,timestamp:1635705145
After the fifth request, the OOM appears and the worker dies:
mms_1 | 2021-10-31 18:35:41,402 [INFO ] epollEventLoopGroup-4-7 com.amazonaws.ml.mms.wlm.WorkerThread - 9000-96795301 Worker disconnected. WORKER_MODEL_LOADED
mms_1 | 2021-10-31 18:35:41,528 [DEBUG] W-9000-video_segmentation_v1 com.amazonaws.ml.mms.wlm.WorkerThread - Backend worker monitoring thread interrupted or backend worker process died.
mms_1 | java.lang.InterruptedException
mms_1 | at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
mms_1 | at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
mms_1 | at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
mms_1 | at com.amazonaws.ml.mms.wlm.WorkerThread.runWorker(WorkerThread.java:148)
mms_1 | at com.amazonaws.ml.mms.wlm.WorkerThread.run(WorkerThread.java:211)
mms_1 | at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
mms_1 | at java.util.concurrent.FutureTask.run(FutureTask.java:266)
mms_1 | at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
mms_1 | at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
mms_1 | at java.lang.Thread.run(Thread.java:748)
n0thing233 changed the title from "worker died and restart, memory issue" to "memory utilization increment after every request, worker died, memory issue" on Oct 31, 2021.
Commenting to follow - at first I suspected this was related to #942, but I tested with that PR and saw no change in behavior compared to the current released version (1.1.4). @n0thing233 - are you doing any large memory allocation from inside the predict function, or is it all in the model load?
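If it helps narrow that down, the standard-library tracemalloc can show which Python allocations grow from one request to the next. A rough sketch (MMS-agnostic, all names here are illustrative), which you could call at the end of your _handle:

```python
import tracemalloc

tracemalloc.start()
_previous_snapshot = None

def log_memory_growth(top_n=5):
    """Print the allocation sites that grew the most since the last call.

    Call this once per request; the first call only records a baseline.
    """
    global _previous_snapshot
    snapshot = tracemalloc.take_snapshot()
    if _previous_snapshot is not None:
        for stat in snapshot.compare_to(_previous_snapshot, "lineno")[:top_n]:
            print(stat)
    _previous_snapshot = snapshot
```

If the top entries point at the predict path, something there is keeping references alive between requests; if nothing grows on the Python side, the growth is likely in native allocations made by the framework.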