
Invalid signal error seen when running large data set #725

Open
NACHC-CAD opened this issue Dec 31, 2021 · 0 comments


NACHC-CAD commented Dec 31, 2021

I'm running anon-link-entity-service with 6 hospitals, each contributing 1,000,000 patients. During the run there is a long phase that uses about 60% of the available CPU, then a long period (hours) during which only about 10% of the CPU is in use, and then the error shown in the attached logs.

Below is the section of the log where the error occurs. The attached files contain more of the log. I have the full logs, but they are very large (about 1 GB).

full-log-error-section.txt
run-log.txt
run-log-error-focus.txt

pprl-error

backend_1 | [debug ] Connecting to redis [entityservice.cache.connection] pid=8efc27085d66234af616468b4251613028f05fa792c02df9 port=26379 request=64fe99e6 rid=eee1624a7a5a42e277f9e16dee29f30ca1f90fe440d494b1 server=redis
backend_1 | [info ] LOG_FILE: Connecting to redis [entityservice.cache.connection] pid=8efc27085d66234af616468b4251613028f05fa792c02df9 request=64fe99e6 rid=eee1624a7a5a42e277f9e16dee29f30ca1f90fe440d494b1
backend_1 | [debug ] total comparisons: 10000000000000 [entityservice.views.run.status] pid=8efc27085d66234af616468b4251613028f05fa792c02df9 request=64fe99e6 rid=eee1624a7a5a42e277f9e16dee29f30ca1f90fe440d494b1
nginx_1 | [200] - 172.18.0.1 - "GET /api/v1/projects/8efc27085d66234af616468b4251613028f05fa792c02df9/runs/eee1624a7a5a42e277f9e16dee29f30ca1f90fe440d494b1/status HTTP/1.1" 335 920 374 0.014 "-" "python-requests/2.26.0" "-"
worker_a13_1 | [2021-12-30 20:44:28,832: DEBUG/ForkPoolWorker-2] [debug ] setting up tracing on task [entityservice.tasks] task_name=aggregate_comparisons
worker_a13_1 | [2021-12-30 20:44:28,853: DEBUG/ForkPoolWorker-2] [debug ] Aggregating result chunks from 33060 files, total size: 958067760 [entityservice.tasks] pid=8efc27085d66234af616468b4251613028f05fa792c02df9 run_id=eee1624a7a5a42e277f9e16dee29f30ca1f90fe440d494b1 task_name=aggregate_comparisons
worker_a13_1 | [2021-12-30 20:44:28,949: WARNING/ForkPoolWorker-2] [warning ] Task 33ed3bfa-0953-429b-a530-c2818051fc31 is retrying after a 'S3Error' exception [entityservice.tasks] pid=8efc27085d66234af616468b4251613028f05fa792c02df9 run_id=eee1624a7a5a42e277f9e16dee29f30ca1f90fe440d494b1 task_name=aggregate_comparisons
worker_a13_1 | [2021-12-30 20:44:28,952: WARNING/ForkPoolWorker-2] /usr/lib/python3.9/signal.py:60: RuntimeWarning: invalid signal number 32, please use valid_signals()
worker_a13_1 | sigs_set = _signal.pthread_sigmask(how, mask)
worker_a13_1 |
worker_a13_1 | [2021-12-30 20:44:28,952: WARNING/ForkPoolWorker-2] /usr/lib/python3.9/signal.py:60: RuntimeWarning: invalid signal number 33, please use valid_signals()
worker_a13_1 | sigs_set = _signal.pthread_sigmask(how, mask)
worker_a13_1 |
worker_a13_1 | [2021-12-30 20:44:28,952: WARNING/ForkPoolWorker-2] /usr/lib/python3.9/signal.py:60: RuntimeWarning: invalid signal number 34, please use valid_signals()
worker_a13_1 | sigs_set = _signal.pthread_sigmask(how, mask)
worker_a13_1 |
worker_a13_1 | [2021-12-30 20:44:28,995: INFO/MainProcess] [info ] An error occurred while processing task [entityservice.tasks] run_id=eee1624a7a5a42e277f9e16dee29f30ca1f90fe440d494b1 task_id=<Context: {'lang': 'py', 'task': 'entityservice.tasks.comparing.aggregate_comparisons', 'id': '33ed3bfa-0953-429b-a530-c2818051fc31', 'shadow': None, 'eta': None, 'expires': None, 'group': None, 'group_index': None, 'retries': 0, 'timelimit': [None, None], 'root_id': 'a95304d4-eedc-4712-b479-315c1a3b3714', 'parent_id': '2434d2de-235b-45a8-8bc0-488c92b5438a', 'argsrepr': "([[18129, 435100, 'similarity-scores/771dc665dafe5bf0cbd98d2e.bin'], [871, 20908, 'similarity-scores/31cae8ef35597b136be6e185.bin'], [811, 19468, 'similarity-scores/fa76d3d05a26b3817834267f.bin'], [860, 20644, 'similarity-scores/f1f37c32c300f7bd365055c9.bin'], [879, 21100, 'similarity-scores/81719e644f12b1a400797d82.bin'], [860, 20644, 'similarity-scores/69fb24c851caebe8ed81ae03.bin'], [896, 21508, 'similarity-scores/26e9cbdcf0abf33b8a20c0fb.bin'], [929, 22300, 'similarity-scores/f74670768d2f987b4fb1b01b.bin'], [908, 21796, 'similarity-scores/3fca1b757b3f154bf1424045.bin'], [883, 21196, 'similarity-scores/06e7f5024859f3403eba7796.bin'], [919, 22060, 'similarity-scores/2ffa682d3afe6b54fff78835.bin'], [920, 22084, 'similarity-scores/498ef5687d1dcab93127d8eb.bin'], [837, 20092, 'similarity-scores/7befc3448aaf0822a0224496.bin'], [887, 21292, 'similarity-scores/615b4d08d37990b461ac70e9.bin'], [836, 20068, 'similarity-scores/e91720668cf63cc0769491b9.bin'], [924, 22180, 'similarity-scores/3ff5c3d9704a5a97163e2cd5.bin...', ...],)", 'kwargsrepr': "{'project_id': '8efc27085d66234af616468b4251613028f05fa792c02df9', 'run_id': 'eee1624a7a5a42e277f9e16dee29f30ca1f90fe440d494b1', 'parent_span': {'uber-trace-id': 'e38e6235a8b07c59:5a1d8351c03ad888:1e07d0b84a4e637d:1'}}", 'origin': 'gen8@6dc4aae3ab70', 'ignore_result': True, 'redelivered': True, 'reply_to': 'fa078a18-bf93-3164-8de4-0665067672f7', 'correlation_id': '33ed3bfa-0953-429b-a530-c2818051fc31', 'hostname': 'celery@bec4c46dba22', 'delivery_info': {'exchange': '', 'routing_key': 'highmemory', 'priority': 0, 'redelivered': None}, 'args': [[[18129, 435100, 'similarity-scores/771dc665dafe5bf0cbd98d2e.bin'], [871, 20908, 'similarity-scores/31cae8ef35597b136be6e185.bin'], [811, 19468, 'similarity-scores/fa76d3d05a26b3817834267f.bin'], [860, 20644, 'similarity-scores/f1f37c32c300f7bd365055c9.bin'], [879, 21100, 'similarity-scores/81719e644f12b1a400797d82.bin'], [860, 20644, 'similarity-scores/69fb24c851caebe8ed81ae03.bin'], [896, 21508, 'similarity-scores/26e9cbdcf0abf33b8a20c0fb.bin'], [929, 22300, 'similarity-scores/f74670768d2f987b4fb1b01b.bin'], [908, 21796, 'similarity-scores/3fca1b757b3f154bf1424045.bin'], [883, 21196, 'similarity-scores/06e7f5024859f3403eba7796.bin'], [919, 22060, 'similarity-scores/2ffa682d3afe6b54fff78835.bin'], [920, 22084, 'similarity-scores/498ef5687d1dcab93127d8eb.bin'], [837, 20092, 'similarity-scores/7befc3448aaf0822a0224496.bin'], [887, 21292, 'similarity-scores/615b4d08d37990b461ac70e9.bin'], [836, 20068, 'similarity-scores/e91720668cf63cc0769491b9.bin'], [924, 22180, 'similarity-scores/3ff5c3d9704a5a97163e2cd5.bin'], [878, 21076, 'similarity-scores/4c7ead830a841a3366663e76.bin'], [858, 20596, 'similarity-scores/2175593b9f3fbac45767a875.bin'], [853, 20476, 'similarity-scores/2546fe61601f8ec3fa087fdb.bin'], [889, 21340, 'similarity-scores/5a86e1059f0ee9ca9033ad56.bin'], [934, 22420, 'similarity-scores/eb2d93507c788eaa15a39f8e.bin'], [920, 22084, 'similarity-scores/6ebe9e0973cbad97c975d78f.bin'], [893, 21436, 'similarity-scores/e0ce6c8af04fead1a414bfc2.bin'], [882, 21172, 'similarity-scores/b21a1505cfcb6952a209163c.bin'], [914, 21940, 'similarity-scores/75c013d872e16288db3ff2d3.bin'], [910, 21844, 'similarity-scores/6e15c5aa835fb218d6bf588c.bin'], [928, 22276, 'similarity-scores/4855fd93634871ad004dad20.bin'], [867, 20812, 'similarity-scores/5fbc3a270cfe4b173d886320.bin'], [841, 20188, 'similarity-scores/d6bd858b6b5519a9392e0e17.bin'], [887, 21292, 'similarity-scores/d76e34f22dd656b373041df6.bin'], [948, 22756, 'similarity-scores/c3ac231c58efeaaf70057310.bin'], [908, 21796, 'similarity-scores/4227445c6b7c
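For context on the "invalid signal number" lines: in the log they are `RuntimeWarning`s logged at WARNING level, right after the task retry caused by an `S3Error`. CPython's `signal` module emits this warning when a mask passed to `signal.pthread_sigmask()` contains numbers the C library reserves for internal use, for example when the mask is built with `range(1, signal.NSIG)`. The message itself recommends `signal.valid_signals()` instead. The following is a minimal, POSIX-only sketch that reproduces the difference between the two idioms. It assumes a Linux host similar to the worker container. The helper `count_mask_warnings` is hypothetical and is not part of anon-link-entity-service or Celery, and the sketch does not show which component in the worker makes the call.

```python
import signal
import warnings


def count_mask_warnings(mask):
    """Hypothetical helper: block `mask` in the calling thread, restore the
    previous mask, and return the RuntimeWarnings raised while CPython
    converted `mask` into a C sigset."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, no de-duplication
        previous = signal.pthread_sigmask(signal.SIG_BLOCK, mask)
    # Put the original mask back so the demo leaves no lasting side effects.
    signal.pthread_sigmask(signal.SIG_SETMASK, previous)
    return [w for w in caught if issubclass(w.category, RuntimeWarning)]


# Old idiom: every integer below NSIG, including numbers the libc reserves.
naive = count_mask_warnings(range(1, signal.NSIG))
# Idiom recommended by the warning text itself.
safe = count_mask_warnings(signal.valid_signals())

for w in naive:
    print(w.message)
print(f"range(1, NSIG): {len(naive)} warning(s); valid_signals(): {len(safe)} warning(s)")
```

The previous mask is restored immediately, so running the sketch does not leave any signals blocked. Which signal numbers trigger the warning depends on the C library in the container.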
