cudnn error on windows #188

Open
CorcovadoMing opened this issue Dec 21, 2016 · 5 comments

Comments

@CorcovadoMing

I've installed Torch on Windows with CUDA and cuDNN.
I can run cunn without errors, but when I convert my model to cudnn, the following error appears:

In 1 module of nn.Sequential:
...\.\install\luarocks\systree/share/lua/5.1/cudnn\find.lua:379: bad argument #7 to 'call' (cannot convert 'int *' to 'uint64_t *')
stack traceback:
        [C]: in function 'call'
        ...\.\install\luarocks\systree/share/lua/5.1/cudnn\find.lua:379: in function 'callCudnn'
        ...\.\install\luarocks\systree/share/lua/5.1/cudnn\find.lua:472: in function 'forwardAlgorithm'
        ...rocks\systree/share/lua/5.1/cudnn\SpatialConvolution.lua:190: in function <...rocks\systree/share/lua/5.1/cudnn\SpatialConvolution.lua:186>
        [C]: in function 'xpcall'
        ...\install\luarocks\systree/share/lua/5.1/nn\Container.lua:63: in function 'rethrowErrors'
        ...install\luarocks\systree/share/lua/5.1/nn\Sequential.lua:44: in function 'forward'
        Main.lua:221: in function 'Train'
        Main.lua:343: in main chunk
        [C]: in function 'dofile'
        ...l\luarocks\systree\lib\luarocks\rocks\trepl\scm-1\bin\th:145: in main chunk
        [C]: at 0x7ff726b71eb0

Any suggestions?

@BTNC
Contributor

BTNC commented Dec 22, 2016

The cudnn package is not fully patched to work on Windows. The main problem is that cudnn uses LongTensor as storage for 64-bit integers in a few places, while the underlying long type is only 32 bits on Windows, the same as int. For now, you have to use cunn instead of cudnn if you hit errors with cudnn, or you can try replacing those LongTensors with real 64-bit integers.
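
The size mismatch described above can be checked directly from LuaJIT. The snippet below is only a minimal sketch, assuming a LuaJIT interpreter with the standard `ffi` library (which is what the `th` launcher runs on); the variable names are made up for illustration and are not part of cudnn.

```lua
-- Sketch: compare the width of C `long` with `size_t` on the current platform.
local ffi = require("ffi")

local long_bytes   = ffi.sizeof("long")    -- 4 on 64-bit Windows, 8 on 64-bit Linux
local size_t_bytes = ffi.sizeof("size_t")  -- pointer-sized, so 8 on any 64-bit platform

print(string.format("long: %d bytes, size_t: %d bytes", long_bytes, size_t_bytes))

if long_bytes < size_t_bytes then
    print("long is narrower than size_t; a long buffer cannot stand in for a size_t out-parameter")
end
```

If the two sizes differ, any call that expects a `size_t *` or `uint64_t *` needs a buffer that is actually 64 bits wide, which is what the error message in the first post is complaining about.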

@CorcovadoMing
Author

@BTNC I can use cunn without errors. However, it is slow: an epoch of vgg16 still takes 8 min, compared with 30 min per epoch on CPU. I assumed the poor performance was due to not using cudnn, or do you think another issue is affecting performance? (It only takes 30 sec per epoch on Linux.)

Could you give me some pointers on how I could help port cudnn to Windows?

@elikosan

I am facing the same problem. When do you think a patch will be available?
Thanks!

@elikosan

Actually, I tried @BTNC's suggestion and replaced a LongTensor with a 64-bit buffer allocated through ffi:
--local bufSize = torch.LongTensor(1)
local bufSize = ffi.new("size_t[1]")
It seems to work fine.

@wakanawakana

I tried this code, and cudnn works:

                    local bufSize = torch.LongTensor(1)
                    -- 64-bit buffer that receives the size from cudnn
                    local uint64buf = ffi.new("size_t[1]")
                    ret = cudnn.call(getWSAlgos[findAPI_idx],
                                     cudnn.getHandle(),
                                     params[1], params[3], layer.convDesc[0], params[6],
                                     retAlgo, ffi.cast('uintptr_t*', uint64buf)) -- was: bufSize:data()
                    -- copy the returned size back into bufSize as a plain Lua number
                    bufSize[1] = tonumber(uint64buf[0])

4 participants