Segmentation fault (core dumped) #5

Open
littlejerry411 opened this issue Nov 3, 2020 · 6 comments

Comments

@littlejerry411

After configuring the environment according to the README, I ran the command "python3 test.py model_best_ndcg.pth --config configs/tern.yaml --size 1k" and got the error "Segmentation fault (core dumped)". I then tried debugging test.py and found that something goes wrong after the line "loaded_config = checkpoint['config']": I could not print checkpoint['config'] without hitting the same "Segmentation fault (core dumped)" error. I would appreciate it if you could tell me how to solve it.

@Erf1369

Erf1369 commented Nov 8, 2020

@littlejerry411 Could you please share the solution if you found it?

@mesnico
Owner

mesnico commented Nov 9, 2020

Unfortunately, I am not able to reproduce the problem. On my system, the code runs without errors.
If you haven't yet, you can try running it under gdb:
gdb --args python3 test.py model_best_ndcg.pth --config configs/tern.yaml --size 1k
At the gdb prompt, issue the command run. Once the program crashes with a segmentation fault, issue the command backtrace. You can then share the trace here.
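
As a complement to gdb, Python's standard faulthandler module can record the Python-level traceback when the interpreter receives a fatal signal, which may show which line of test.py was executing. The following is a minimal sketch, not part of the repository; the helper name enable_crash_traceback and the log file name are assumptions made for illustration.

```python
import faulthandler


def enable_crash_traceback(log_path):
    """Write Python tracebacks of all threads to log_path on a fatal signal.

    Hypothetical helper: call it at the very top of the script being debugged.
    """
    # The file must stay open for as long as faulthandler is enabled,
    # so the handle is returned to the caller instead of being closed here.
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


# Example call (assumed file name):
# crash_log = enable_crash_traceback("segfault_traceback.log")
```

The same handler can be switched on without editing any code by starting the interpreter with "python3 -X faulthandler test.py ...", in which case the traceback goes to stderr.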

@Erf1369

Erf1369 commented Nov 9, 2020

@mesnico Thanks for your nice work, Nicola. I have a similar problem. When I run "python3 test.py model_best_ndcg.pth --config configs/tern.yaml --size 1k", I get an error at "checkpoint = torch.load(model_checkpoint)". The error is "Segmentation fault". I would appreciate it if you could run your model on a system other than your own and help us reproduce your nice experiments. Thanks!

@Erf1369

Erf1369 commented Nov 9, 2020

@mesnico I tried running with gdb and here is the result:

(gdb) run
Starting program: /export/home1/NoCsBack/hci/Erf/anaconda3/envs/tern/bin/python3 test.py model_best_ndcg.pth --config configs/tern.yaml --size 1k
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[Detaching after fork from child process 22038]
[New Thread 0x7fb8e5d4c700 (LWP 22040)]
[New Thread 0x7fb8e554b700 (LWP 22041)]
[New Thread 0x7fb8e0d4a700 (LWP 22042)]
[New Thread 0x7fb8de549700 (LWP 22043)]
[New Thread 0x7fb8dbd48700 (LWP 22044)]
[New Thread 0x7fb8d9547700 (LWP 22045)]
[New Thread 0x7fb8d6d46700 (LWP 22046)]
[New Thread 0x7fb8d4545700 (LWP 22047)]
[New Thread 0x7fb8d1d44700 (LWP 22048)]
[New Thread 0x7fb8cf543700 (LWP 22049)]
[New Thread 0x7fb8ccd42700 (LWP 22050)]
[New Thread 0x7fb8ca541700 (LWP 22051)]
[New Thread 0x7fb8c7d40700 (LWP 22052)]
[New Thread 0x7fb8c553f700 (LWP 22053)]
[New Thread 0x7fb8c4d3e700 (LWP 22054)]
[New Thread 0x7fb8c053d700 (LWP 22055)]
[New Thread 0x7fb8bdd3c700 (LWP 22056)]
[New Thread 0x7fb8bb53b700 (LWP 22057)]
[New Thread 0x7fb8b8d3a700 (LWP 22058)]
[New Thread 0x7fb8b6539700 (LWP 22059)]
[New Thread 0x7fb8b3d38700 (LWP 22060)]
[New Thread 0x7fb8b1537700 (LWP 22061)]
[New Thread 0x7fb8aed36700 (LWP 22062)]
[New Thread 0x7fb8ac535700 (LWP 22063)]
[New Thread 0x7fb8a9d34700 (LWP 22064)]
[New Thread 0x7fb8a7533700 (LWP 22065)]
[New Thread 0x7fb8a4d32700 (LWP 22066)]
[New Thread 0x7fb8a2531700 (LWP 22067)]
[New Thread 0x7fb89fd30700 (LWP 22068)]
[New Thread 0x7fb89d52f700 (LWP 22069)]
[New Thread 0x7fb89ad2e700 (LWP 22070)]
[New Thread 0x7fb89852d700 (LWP 22071)]
[New Thread 0x7fb895d2c700 (LWP 22072)]
[New Thread 0x7fb89352b700 (LWP 22073)]
[New Thread 0x7fb890d2a700 (LWP 22074)]
[New Thread 0x7fb88e529700 (LWP 22075)]
[New Thread 0x7fb88bd28700 (LWP 22076)]
[New Thread 0x7fb889527700 (LWP 22077)]
[New Thread 0x7fb886d26700 (LWP 22078)]
[New Thread 0x7fb884525700 (LWP 22079)]
[New Thread 0x7fb881d24700 (LWP 22080)]
[New Thread 0x7fb87f523700 (LWP 22081)]
[New Thread 0x7fb87cd22700 (LWP 22082)]
[New Thread 0x7fb87a521700 (LWP 22083)]
[New Thread 0x7fb877d20700 (LWP 22084)]
[New Thread 0x7fb87551f700 (LWP 22085)]
[New Thread 0x7fb874d1e700 (LWP 22086)]
[Thread 0x7fb8a7533700 (LWP 22065) exited]
[Thread 0x7fb89352b700 (LWP 22073) exited]
[Thread 0x7fb89ad2e700 (LWP 22070) exited]
[Thread 0x7fb89d52f700 (LWP 22069) exited]
[Thread 0x7fb89fd30700 (LWP 22068) exited]
[Thread 0x7fb87a521700 (LWP 22083) exited]
[Thread 0x7fb874d1e700 (LWP 22086) exited]
[Thread 0x7fb87551f700 (LWP 22085) exited]
[Thread 0x7fb877d20700 (LWP 22084) exited]
[Thread 0x7fb87cd22700 (LWP 22082) exited]
[Thread 0x7fb87f523700 (LWP 22081) exited]
[Thread 0x7fb881d24700 (LWP 22080) exited]
[Thread 0x7fb884525700 (LWP 22079) exited]
[Thread 0x7fb886d26700 (LWP 22078) exited]
[Thread 0x7fb889527700 (LWP 22077) exited]
[Thread 0x7fb88bd28700 (LWP 22076) exited]
[Thread 0x7fb88e529700 (LWP 22075) exited]
[Thread 0x7fb890d2a700 (LWP 22074) exited]
[Thread 0x7fb895d2c700 (LWP 22072) exited]
[Thread 0x7fb89852d700 (LWP 22071) exited]
[Thread 0x7fb8a2531700 (LWP 22067) exited]
[Thread 0x7fb8a4d32700 (LWP 22066) exited]
[Thread 0x7fb8a9d34700 (LWP 22064) exited]
[Thread 0x7fb8ac535700 (LWP 22063) exited]
[Thread 0x7fb8aed36700 (LWP 22062) exited]
[Thread 0x7fb8b1537700 (LWP 22061) exited]
[Thread 0x7fb8b3d38700 (LWP 22060) exited]
[Thread 0x7fb8b6539700 (LWP 22059) exited]
[Thread 0x7fb8b8d3a700 (LWP 22058) exited]
[Thread 0x7fb8bb53b700 (LWP 22057) exited]
[Thread 0x7fb8bdd3c700 (LWP 22056) exited]
[Thread 0x7fb8c053d700 (LWP 22055) exited]
[Thread 0x7fb8c4d3e700 (LWP 22054) exited]
[Thread 0x7fb8c553f700 (LWP 22053) exited]
[Thread 0x7fb8c7d40700 (LWP 22052) exited]
[Thread 0x7fb8ca541700 (LWP 22051) exited]
[Thread 0x7fb8ccd42700 (LWP 22050) exited]
[Thread 0x7fb8cf543700 (LWP 22049) exited]
[Thread 0x7fb8d1d44700 (LWP 22048) exited]
[Thread 0x7fb8d4545700 (LWP 22047) exited]
[Thread 0x7fb8d6d46700 (LWP 22046) exited]
[Thread 0x7fb8d9547700 (LWP 22045) exited]
[Thread 0x7fb8dbd48700 (LWP 22044) exited]
[Thread 0x7fb8de549700 (LWP 22043) exited]
[Thread 0x7fb8e0d4a700 (LWP 22042) exited]
[Thread 0x7fb8e554b700 (LWP 22041) exited]
[Thread 0x7fb8e5d4c700 (LWP 22040) exited]
[Detaching after fork from child process 22087]
[New Thread 0x7fb874d1e700 (LWP 22098)]

Thread 1 "python3" received signal SIGSEGV, Segmentation fault.
0x00007fb8000055ae in ?? ()
(gdb) backtrace
#0  0x00007fb8000055ae in ?? ()
#1  0x00007fb8fe29d8dd in c10::detail::LogAPIUsageFakeReturn(std::string const&) ()
   from /export/home1/NoCsBack/hci/Erf/anaconda3/envs/tern/lib/python3.6/site-packages/torch/lib/libc10.so
#2  0x00007fb903ccefc9 in at::cuda::detail::CUDAHooks::initCUDA() const ()
   from /export/home1/NoCsBack/hci/Erf/anaconda3/envs/tern/lib/python3.6/site-packages/torch/lib/libtorch.so
#3  0x00007fb92e42edac in std::call_once<at::Context::lazyInitCUDA()::{lambda()#1}>(std::once_flag&, at::Context::lazyInitCUDA()::{lambda()#1}&&)::{lambda()#2}::_FUN()
    () from /export/home1/NoCsBack/hci/Erf/anaconda3/envs/tern/lib/python3.6/site-packages/torch/lib/libtorch_python.so
#4  0x00007fb93d7cf827 in __pthread_once_slow () from /lib/x86_64-linux-gnu/libpthread.so.0
#5  0x00007fb92e5109db in THCPModule_initExtension(_object*, _object*) ()
   from /export/home1/NoCsBack/hci/Erf/anaconda3/envs/tern/lib/python3.6/site-packages/torch/lib/libtorch_python.so
#6  0x000055ae02c44c9a in _PyCFunction_FastCallDict (func_obj=<built-in method _cuda_init of module object at remote 0x7fb92ed3ecc8>, args=0x55ae063b4e88, nargs=0, 
    kwargs=0x0) at /tmp/build/80754af9/python_1564510748219/work/Objects/methodobject.c:192
#7  0x000055ae02cccabc in call_function (pp_stack=0x7ffe671c4ae8, oparg=<optimized out>, kwnames=0x0)
    at /tmp/build/80754af9/python_1564510748219/work/Python/ceval.c:4851
#8  0x000055ae02cef75a in _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at /tmp/build/80754af9/python_1564510748219/work/Python/ceval.c:3335
#9  0x000055ae02cc72db in _PyFunction_FastCall (globals=<optimized out>, nargs=0, args=<optimized out>, co=<optimized out>)
    at /tmp/build/80754af9/python_1564510748219/work/Python/ceval.c:4933
#10 _PyFunction_FastCallDict (func=<optimized out>, args=0x0, nargs=0, kwargs=<optimized out>) at /tmp/build/80754af9/python_1564510748219/work/Python/ceval.c:5035
#11 0x000055ae02c4501f in _PyObject_FastCallDict (func=<function at remote 0x7fb8eb471c80>, args=0x0, nargs=<optimized out>, kwargs=0x0)
    at /tmp/build/80754af9/python_1564510748219/work/Objects/abstract.c:2310
#12 0x000055ae02d25039 in callmethod (is_size_t=0, va=0x7ffe671c4c70, format=0x7fb92e695064 "", func=<function at remote 0x7fb8eb471c80>)
    at /tmp/build/80754af9/python_1564510748219/work/Objects/abstract.c:2583
#13 PyObject_CallMethod (o=<optimized out>, name=<optimized out>, format=0x7fb92e695064 "") at /tmp/build/80754af9/python_1564510748219/work/Objects/abstract.c:2617
#14 0x00007fb92e400aa3 in torch::utils::cuda_lazy_init() ()
   from /export/home1/NoCsBack/hci/Erf/anaconda3/envs/tern/lib/python3.6/site-packages/torch/lib/libtorch_python.so
#15 0x00007fb92e50b42f in THCPModule_getDevice_wrap(_object*, _object*) ()
   from /export/home1/NoCsBack/hci/Erf/anaconda3/envs/tern/lib/python3.6/site-packages/torch/lib/libtorch_python.so
#16 0x000055ae02c44c9a in _PyCFunction_FastCallDict (func_obj=<built-in method _cuda_getDevice of module object at remote 0x7fb92ed3ecc8>, args=0x7fb8c79d41d0, 
    nargs=0, kwargs=0x0) at /tmp/build/80754af9/python_1564510748219/work/Objects/methodobject.c:192
#17 0x000055ae02cccabc in call_function (pp_stack=0x7ffe671c4f18, oparg=<optimized out>, kwnames=0x0)
    at /tmp/build/80754af9/python_1564510748219/work/Python/ceval.c:4851
#18 0x000055ae02cef75a in _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at /tmp/build/80754af9/python_1564510748219/work/Python/ceval.c:3335
#19 0x000055ae02cc72db in _PyFunction_FastCall (globals=<optimized out>, nargs=1, args=<optimized out>, co=<optimized out>)
    at /tmp/build/80754af9/python_1564510748219/work/Python/ceval.c:4933
#20 _PyFunction_FastCallDict (func=<optimized out>, args=0x7ffe671c50a0, nargs=1, kwargs=<optimized out>)
    at /tmp/build/80754af9/python_1564510748219/work/Python/ceval.c:5035
#21 0x000055ae02c4501f in _PyObject_FastCallDict (func=<function at remote 0x7fb8eb4820d0>, args=0x7ffe671c50a0, nargs=<optimized out>, kwargs=0x0)
    at /tmp/build/80754af9/python_1564510748219/work/Objects/abstract.c:2310
#22 0x000055ae02c49aa3 in _PyObject_Call_Prepend (func=<function at remote 0x7fb8eb4820d0>, obj=<optimized out>, args=(), kwargs=0x0)
    at /tmp/build/80754af9/python_1564510748219/work/Objects/abstract.c:2373
#23 0x000055ae02c44e3b in _PyObject_FastCallDict (func=<method at remote 0x7fb93db8c9c8>, args=0x7ffe671c51c0, nargs=<optimized out>, kwargs=0x0)
    at /tmp/build/80754af9/python_1564510748219/work/Objects/abstract.c:2331
#24 0x000055ae02c635fd in PyObject_CallFunctionObjArgs (callable=<method at remote 0x7fb93db8c9c8>)
    at /tmp/build/80754af9/python_1564510748219/work/Objects/abstract.c:2827
#25 0x000055ae02cf0fd9 in _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at /tmp/build/80754af9/python_1564510748219/work/Python/ceval.c:3208
#26 0x000055ae02cc6c5b in _PyFunction_FastCall (globals=<optimized out>, nargs=2, args=<optimized out>, co=<optimized out>)
    at /tmp/build/80754af9/python_1564510748219/work/Python/ceval.c:4933
#27 fast_function (func=<optimized out>, stack=0x7fb8c79bf798, nargs=2, kwnames=<optimized out>) at /tmp/build/80754af9/python_1564510748219/work/Python/ceval.c:4968
#28 0x000055ae02cccb95 in call_function (pp_stack=0x7ffe671c54f8, oparg=<optimized out>, kwnames=0x0)
    at /tmp/build/80754af9/python_1564510748219/work/Python/ceval.c:4872
#29 0x000055ae02cef75a in _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at /tmp/build/80754af9/python_1564510748219/work/Python/ceval.c:3335
#30 0x000055ae02cc6c5b in _PyFunction_FastCall (globals=<optimized out>, nargs=2, args=<optimized out>, co=<optimized out>)
    at /tmp/build/80754af9/python_1564510748219/work/Python/ceval.c:4933
#31 fast_function (func=<optimized out>, stack=0x55ae063a55f0, nargs=2, kwnames=<optimized out>) at /tmp/build/80754af9/python_1564510748219/work/Python/ceval.c:4968
#32 0x000055ae02cccb95 in call_function (pp_stack=0x7ffe671c56a8, oparg=<optimized out>, kwnames=0x0)
    at /tmp/build/80754af9/python_1564510748219/work/Python/ceval.c:4872
#33 0x000055ae02cef75a in _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at /tmp/build/80754af9/python_1564510748219/work/Python/ceval.c:3335
#34 0x000055ae02cc629e in _PyEval_EvalCodeWithName (_co=<code at remote 0x7fb8eb5394b0>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, 
    argcount=<optimized out>, kwnames=0x0, kwargs=0x0, kwcount=<optimized out>, kwstep=2, defs=0x0, defcount=0, kwdefs=0x0, 
    closure=(<cell at remote 0x7fb8c7a0baf8>, <cell at remote 0x7fb8c7a0b918>, <cell at remote 0x7fb8c7a0ba38>, <cell at remote 0x7fb8c7a0bac8>), 
    name=<optimized out>, qualname='_load.<locals>.persistent_load') at /tmp/build/80754af9/python_1564510748219/work/Python/ceval.c:4166
#35 0x000055ae02cc737e in _PyFunction_FastCallDict (func=<optimized out>, args=0x7ffe671c5990, nargs=1, kwargs=<optimized out>)
    at /tmp/build/80754af9/python_1564510748219/work/Python/ceval.c:5084
#36 0x000055ae02c4501f in _PyObject_FastCallDict (func=<function at remote 0x7fb8c79b4ea0>, args=0x7ffe671c5990, nargs=<optimized out>, kwargs=0x0)
    at /tmp/build/80754af9/python_1564510748219/work/Objects/abstract.c:2310
#37 0x000055ae02c635fd in PyObject_CallFunctionObjArgs (callable=<function at remote 0x7fb8c79b4ea0>)
    at /tmp/build/80754af9/python_1564510748219/work/Objects/abstract.c:2827
#38 0x00007fb93b3887bd in load_binpersid (self=0x7fb8c7a055c0) at /usr/local/src/conda/python-3.6.9/Modules/_pickle.c:5566
#39 0x00007fb93b381c14 in load (self=0x7fb8c7a055c0) at /usr/local/src/conda/python-3.6.9/Modules/_pickle.c:6376
#40 0x000055ae02c44c9a in _PyCFunction_FastCallDict (func_obj=<built-in method load of _pickle.Unpickler object at remote 0x7fb8c7a055c0>, args=0x55ae063a3a58, 
    nargs=0, kwargs=0x0) at /tmp/build/80754af9/python_1564510748219/work/Objects/methodobject.c:192
#41 0x000055ae02cccabc in call_function (pp_stack=0x7ffe671c5c88, oparg=<optimized out>, kwnames=0x0)
    at /tmp/build/80754af9/python_1564510748219/work/Python/ceval.c:4851
#42 0x000055ae02cef75a in _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at /tmp/build/80754af9/python_1564510748219/work/Python/ceval.c:3335
#43 0x000055ae02cc7ff6 in _PyEval_EvalCodeWithName (qualname=0x0, name=<optimized out>, closure=0x0, kwdefs=0x0, defcount=0, defs=0x0, kwstep=2, 
    kwcount=<optimized out>, kwargs=0x7fb8c79ba168, kwnames=0x7fb8c79ba160, argcount=<optimized out>, args=<optimized out>, locals=<optimized out>, 
    globals=<optimized out>, _co=<code at remote 0x7fb8eb539540>) at /tmp/build/80754af9/python_1564510748219/work/Python/ceval.c:4166
#44 PyEval_EvalCodeEx (_co=<code at remote 0x7fb8eb539540>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, 
    kws=0x7fb8c79ba160, kwcount=1, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0) at /tmp/build/80754af9/python_1564510748219/work/Python/ceval.c:4187
#45 0x000055ae02cc88e6 in function_call (func=func@entry=<function at remote 0x7fb8eb4eb1e0>, 
    arg=(<_io.BufferedReader at remote 0x7fb8c7a45d58>, None, <module at remote 0x7fb93b4290e8>), kw={'encoding': 'utf-8'})
    at /tmp/build/80754af9/python_1564510748219/work/Objects/funcobject.c:604
#46 0x000055ae02c44a5e in PyObject_Call (func=<function at remote 0x7fb8eb4eb1e0>, args=<optimized out>, kwargs=<optimized out>)
    at /tmp/build/80754af9/python_1564510748219/work/Objects/abstract.c:2261
#47 0x000055ae02cf0e37 in do_call_core (kwdict={'encoding': 'utf-8'}, 
    callargs=(<_io.BufferedReader at remote 0x7fb8c7a45d58>, None, <module at remote 0x7fb93b4290e8>), func=<function at remote 0x7fb8eb4eb1e0>)
    at /tmp/build/80754af9/python_1564510748219/work/Python/ceval.c:5120
#48 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at /tmp/build/80754af9/python_1564510748219/work/Python/ceval.c:3404
#49 0x000055ae02cc5e66 in _PyEval_EvalCodeWithName (_co=<code at remote 0x7fb92ecf8f60>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, 
    argcount=<optimized out>, kwnames=0x0, kwargs=0x7fb8c7a47bd8, kwcount=<optimized out>, kwstep=1, defs=0x7fb8eb4e12e0, defcount=2, kwdefs=0x0, closure=0x0, 
    name=<optimized out>, qualname='load') at /tmp/build/80754af9/python_1564510748219/work/Python/ceval.c:4166
#50 0x000055ae02cc6ed6 in fast_function (func=<optimized out>, stack=0x7fb8c7a47bd0, nargs=1, kwnames=<optimized out>)
    at /tmp/build/80754af9/python_1564510748219/work/Python/ceval.c:4992
#51 0x000055ae02cccb95 in call_function (pp_stack=0x7ffe671c6218, oparg=<optimized out>, kwnames=0x0)
    at /tmp/build/80754af9/python_1564510748219/work/Python/ceval.c:4872
#52 0x000055ae02cef75a in _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at /tmp/build/80754af9/python_1564510748219/work/Python/ceval.c:3335
#53 0x000055ae02cc6c5b in _PyFunction_FastCall (globals=<optimized out>, nargs=2, args=<optimized out>, co=<optimized out>)
    at /tmp/build/80754af9/python_1564510748219/work/Python/ceval.c:4933
#54 fast_function (func=<optimized out>, stack=0x55ae0362c528, nargs=2, kwnames=<optimized out>) at /tmp/build/80754af9/python_1564510748219/work/Python/ceval.c:4968
#55 0x000055ae02cccb95 in call_function (pp_stack=0x7ffe671c63c8, oparg=<optimized out>, kwnames=0x0)
    at /tmp/build/80754af9/python_1564510748219/work/Python/ceval.c:4872
#56 0x000055ae02cef75a in _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at /tmp/build/80754af9/python_1564510748219/work/Python/ceval.c:3335
#57 0x000055ae02cc79b9 in _PyEval_EvalCodeWithName (qualname=0x0, name=<optimized out>, closure=0x0, kwdefs=0x0, defcount=0, defs=0x0, kwstep=2, 
    kwcount=<optimized out>, kwargs=0x0, kwnames=0x0, argcount=<optimized out>, args=<optimized out>, locals=<optimized out>, globals=<optimized out>, 
    _co=<code at remote 0x7fb93c693f60>) at /tmp/build/80754af9/python_1564510748219/work/Python/ceval.c:4166
#58 PyEval_EvalCodeEx (_co=<code at remote 0x7fb93c693f60>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kws=0x0, 
    kwcount=0, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0) at /tmp/build/80754af9/python_1564510748219/work/Python/ceval.c:4187
#59 0x000055ae02cc875c in PyEval_EvalCode (co=co@entry=<code at remote 0x7fb93c693f60>, 
    globals=globals@entry={'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <SourceFileLoader(name='__main__', path='test.py') at remote 0x7fb93c63ee10>, '__spec__': None, '__annotations__': {}, '__builtins__': <module at remote 0x7fb93dbd3638>, '__file__': 'test.py', '__cached__': None, 'argparse': <module at remote 0x7fb93c6452c8>, 'evaluation': <module at remote 0x7fb93c645f98>, 'yaml': <module at remote 0x7fb8cc717728>, 'torch': <module at remote 0x7fb93a553a48>, 'main': <function at remote 0x7fb93db3ae18>, 'parser': <ArgumentParser(description=None, argument_default=None, prefix_chars='-', conflict_handler='error', _registries={'action': {None: <type at remote 0x55ae036b8e68>, 'store': <type at remote 0x55ae036b8e68>, 'store_const': <type at remote 0x55ae036b9218>, 'store_true': <type at remote 0x55ae036b95c8>, 'store_false': <type at remote 0x55ae036b9978>, 'append': <type at remote 0x55ae036b9d28>, 'append_const': <type at remote 0x55ae036ba0d8>, 'count': <type at remote 0x55ae036ba488>, 'help': <type...(truncated), 
    locals=locals@entry={'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <SourceFileLoader(name='__main__', path='test.py') at remote 0x7fb93c63ee10>, '__spec__': None, '__annotations__': {}, '__builtins__': <module at remote 0x7fb93dbd3638>, '__file__': 'test.py', '__cached__': None, 'argparse': <module at remote 0x7fb93c6452c8>, 'evaluation': <module at remote 0x7fb93c645f98>, 'yaml': <module at remote 0x7fb8cc717728>, 'torch': <module at remote 0x7fb93a553a48>, 'main': <function at remote 0x7fb93db3ae18>, 'parser': <ArgumentParser(description=None, argument_default=None, prefix_chars='-', conflict_handler='error', _registries={'action': {None: <type at remote 0x55ae036b8e68>, 'store': <type at remote 0x55ae036b8e68>, 'store_const': <type at remote 0x55ae036b9218>, 'store_true': <type at remote 0x55ae036b95c8>, 'store_false': <type at remote 0x55ae036b9978>, 'append': <type at remote 0x55ae036b9d28>, 'append_const': <type at remote 0x55ae036ba0d8>, 'count': <type at remote 0x55ae036ba488>, 'help': <type...(truncated)) at /tmp/build/80754af9/python_1564510748219/work/Python/ceval.c:731
#60 0x000055ae02d48744 in run_mod (mod=<optimized out>, filename=<optimized out>, 
    globals={'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <SourceFileLoader(name='__main__', path='test.py') at remote 0x7fb93c63ee10>, '__spec__': None, '__annotations__': {}, '__builtins__': <module at remote 0x7fb93dbd3638>, '__file__': 'test.py', '__cached__': None, 'argparse': <module at remote 0x7fb93c6452c8>, 'evaluation': <module at remote 0x7fb93c645f98>, 'yaml': <module at remote 0x7fb8cc717728>, 'torch': <module at remote 0x7fb93a553a48>, 'main': <function at remote 0x7fb93db3ae18>, 'parser': <ArgumentParser(description=None, argument_default=None, prefix_chars='-', conflict_handler='error', _registries={'action': {None: <type at remote 0x55ae036b8e68>, 'store': <type at remote 0x55ae036b8e68>, 'store_const': <type at remote 0x55ae036b9218>, 'store_true': <type at remote 0x55ae036b95c8>, 'store_false': <type at remote 0x55ae036b9978>, 'append': <type at remote 0x55ae036b9d28>, 'append_const': <type at remote 0x55ae036ba0d8>, 'count': <type at remote 0x55ae036ba488>, 'help': <type...(truncated), 
    locals={'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <SourceFileLoader(name='__main__', path='test.py') at remote 0x7fb93c63ee10>, '__spec__': None, '__annotations__': {}, '__builtins__': <module at remote 0x7fb93dbd3638>, '__file__': 'test.py', '__cached__': None, 'argparse': <module at remote 0x7fb93c6452c8>, 'evaluation': <module at remote 0x7fb93c645f98>, 'yaml': <module at remote 0x7fb8cc717728>, 'torch': <module at remote 0x7fb93a553a48>, 'main': <function at remote 0x7fb93db3ae18>, 'parser': <ArgumentParser(description=None, argument_default=None, prefix_chars='-', conflict_handler='error', _registries={'action': {None: <type at remote 0x55ae036b8e68>, 'store': <type at remote 0x55ae036b8e68>, 'store_const': <type at remote 0x55ae036b9218>, 'store_true': <type at remote 0x55ae036b95c8>, 'store_false': <type at remote 0x55ae036b9978>, 'append': <type at remote 0x55ae036b9d28>, 'append_const': <type at remote 0x55ae036ba0d8>, 'count': <type at remote 0x55ae036ba488>, 'help': <type...(truncated), flags=<optimized out>, arena=<optimized out>) at /tmp/build/80754af9/python_1564510748219/work/Python/pythonrun.c:1025
#61 0x000055ae02d48b41 in PyRun_FileExFlags (fp=0x55ae03661810, filename_str=<optimized out>, start=<optimized out>, globals={'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <SourceFileLoader(name='__main__', path='test.py') at remote 0x7fb93c63ee10>, '__spec__': None, '__annotations__': {}, '__builtins__': <module at remote 0x7fb93dbd3638>, '__file__': 'test.py', '__cached__': None, 'argparse': <module at remote 0x7fb93c6452c8>, 'evaluation': <module at remote 0x7fb93c645f98>, 'yaml': <module at remote 0x7fb8cc717728>, 'torch': <module at remote 0x7fb93a553a48>, 'main': <function at remote 0x7fb93db3ae18>, 'parser': <ArgumentParser(description=None, argument_default=None, prefix_chars='-', conflict_handler='error', _registries={'action': {None: <type at remote 0x55ae036b8e68>, 'store': <type at remote 0x55ae036b8e68>, 'store_const': <type at remote 0x55ae036b9218>, 'store_true': <type at remote 0x55ae036b95c8>, 'store_false': <type at remote 0x55ae036b9978>, 'append': <type at remote 0x55ae036b9d28>, 'append_const': <type at remote 0x55ae036ba0d8>, 'count': <type at remote 0x55ae036ba488>, 'help': <type...(truncated), locals={'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <SourceFileLoader(name='__main__', path='test.py') at remote 0x7fb93c63ee10>, '__spec__': None, '__annotations__': {}, '__builtins__': <module at remote 0x7fb93dbd3638>, '__file__': 'test.py', '__cached__': None, 'argparse': <module at remote 0x7fb93c6452c8>, 'evaluation': <module at remote 0x7fb93c645f98>, 'yaml': <module at remote 0x7fb8cc717728>, 'torch': <module at remote 0x7fb93a553a48>, 'main': <function at remote 0x7fb93db3ae18>, 'parser': <ArgumentParser(description=None, argument_default=None, prefix_chars='-', conflict_handler='error', _registries={'action': {None: <type at remote 0x55ae036b8e68>, 'store': <type at remote 0x55ae036b8e68>, 'store_const': <type at remote 0x55ae036b9218>, 'store_true': <type at remote 
0x55ae036b95c8>, 'store_false': <type at remote 0x55ae036b9978>, 'append': <type at remote 0x55ae036b9d28>, 'append_const': <type at remote 0x55ae036ba0d8>, 'count': <type at remote 0x55ae036ba488>, 'help': <type...(truncated), closeit=1, flags=0x7ffe671c666c) at /tmp/build/80754af9/python_1564510748219/work/Python/pythonrun.c:978
#62 0x000055ae02d48d43 in PyRun_SimpleFileExFlags (fp=0x55ae03661810, filename=<optimized out>, closeit=1, flags=0x7ffe671c666c) at /tmp/build/80754af9/python_1564510748219/work/Python/pythonrun.c:419
#63 0x000055ae02d4c833 in run_file (p_cf=0x7ffe671c666c, filename=0x55ae035d1770 L"test.py", fp=0x55ae03661810) at /tmp/build/80754af9/python_1564510748219/work/Modules/main.c:340
#64 Py_Main (argc=<optimized out>, argv=<optimized out>) at /tmp/build/80754af9/python_1564510748219/work/Modules/main.c:811
#65 0x000055ae02c1688e in main (argc=7, argv=0x7ffe671c6878) at /tmp/build/80754af9/python_1564510748219/work/Programs/python.c:69

@mesnico
Owner

mesnico commented Nov 9, 2020

The beginning of the stack trace seems related to this one. It may be related to CUDA initialization for some variables. I will try to work on this in the next few days. In the meantime, you can try replacing the following line in test.py
checkpoint = torch.load(model_checkpoint)
with
checkpoint = torch.load(model_checkpoint, map_location=torch.device('cpu'))
to see if something changes.
Thank you for your interest in this work!
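
A related experiment is to hide the GPUs from the process entirely before PyTorch is imported, so that the checkpoint load is attempted with no CUDA devices visible. This is only a sketch under assumptions: hide_gpus and load_checkpoint_on_cpu are hypothetical helper names, and whether this avoids the crash reported in this thread is untested. It relies on CUDA_VISIBLE_DEVICES being read when CUDA is first initialized.

```python
import os


def hide_gpus():
    """Hide all CUDA devices from libraries that initialize CUDA afterwards.

    Returns the previous value of CUDA_VISIBLE_DEVICES (or None) so that the
    caller can restore it later.
    """
    previous = os.environ.get("CUDA_VISIBLE_DEVICES")
    os.environ["CUDA_VISIBLE_DEVICES"] = ""
    return previous


def load_checkpoint_on_cpu(path):
    """Load a checkpoint with GPUs hidden and all tensors mapped to the CPU."""
    hide_gpus()  # set the variable before torch gets a chance to touch CUDA
    import torch  # imported late on purpose, after the environment is set

    return torch.load(path, map_location=torch.device("cpu"))
```

The environment variable can also be set from the shell, for example "CUDA_VISIBLE_DEVICES= python3 test.py ...", which leaves test.py unchanged.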

@Erf1369

Erf1369 commented Nov 9, 2020

@mesnico Thanks for your response. I tried

checkpoint = torch.load(model_checkpoint, map_location=torch.device('cpu'))

but the same error happened again.
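
Since the backtrace above ends inside CUDA initialization (initCUDA, reached through _cuda_init), one way to narrow the problem down is to run small snippets in separate interpreter processes and check which one is killed by a signal. The sketch below is an illustration under assumptions, not something from the repository: the helper names and the probe list are hypothetical, and the signal decoding applies to POSIX systems only.

```python
import signal
import subprocess
import sys


def run_isolated(snippet, timeout=300):
    """Run snippet in a fresh interpreter and return (returncode, stderr)."""
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", snippet],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # spelled this way so it also runs on Python 3.6
        timeout=timeout,
    )
    return proc.returncode, proc.stderr


def describe(returncode):
    """Translate a POSIX return code into a short human-readable status."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a negative value means the child was killed by that signal.
        return "killed by " + signal.Signals(-returncode).name
    return "exited with code {}".format(returncode)


# Hypothetical probes, ordered from least to most CUDA involvement.
PROBES = [
    ("import torch", "import torch"),
    ("query CUDA", "import torch; print(torch.cuda.is_available())"),
    ("initialize CUDA", "import torch; torch.cuda.init()"),
]


def report(probes=PROBES):
    """Run each probe in its own process and print how it ended."""
    for label, snippet in probes:
        returncode, stderr = run_isolated(snippet)
        print("{:<16} {}".format(label, describe(returncode)))
        if returncode != 0:
            print(stderr.strip())
```

Calling report() in the same conda environment prints one status line per probe. If the CUDA probes are killed by SIGSEGV while the plain import succeeds, that would point at the PyTorch/CUDA/driver installation rather than at test.py or the checkpoint file.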
