
Install on Ubuntu 14.04 uses the default OpenBlas library -> Low Performance #121

Open
fredowski opened this issue Jun 5, 2016 · 8 comments

Comments

@fredowski
Contributor

Hi,

I installed Torch via torch/distro on an Amazon EC2 g2.2xlarge machine with Ubuntu 14.04. Instead of the default LuaJIT, I used lua51 because my test example crashed on macOS due to memory problems with LuaJIT. Then I checked the performance for this CIFAR-10 net: https://github.com/szagoruyko/cifar.torch

With the default Ubuntu installation I get for the test on the CPU (not GPU)

th train.lua --type=float

a step time of 31 s. The step time on my MacBook Pro i5 is approx. 8 s, so this is really bad.

With

OMP_NUM_THREADS=4 th train.lua --type=float

the step time is reduced to approx. 10 s. So there is a lot of inter-thread communication overhead that reduces overall performance.

Then I installed a local OpenBLAS library via git, as described in install.sh for other systems, and linked Torch against this version. The resulting step time is 3.7 s using 8 cores (4 cores gives roughly the same).

With

th train.lua --type=cuda

which uses the GPU, the resulting step time is 1679 ms.

For comparison, the reported step time for a GTX 980 was 700 ms: http://torch.ch/blog/2015/07/30/cifar.html

So I think it is really important to compile and install the OpenBLAS library with

make  NO_AFFINITY=1 USE_OPENMP=1

but the current install.sh and install-deps use the default system OpenBLAS version for Ubuntu 14.04.

My proposal is to compile OpenBLAS for Ubuntu 14.04 as well. This change improves the CPU performance from 31 s per step to 3.5 s per step. Was there any reason not to do this, given that it is already done for other distros?
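For reference, the steps above can be sketched as a local OpenBLAS build with OpenMP enabled, roughly mirroring what install.sh does for other distros. The /opt/OpenBLAS prefix and the CMAKE_LIBRARY_PATH export are assumptions on my side; adjust them to your setup.

```shell
# Build OpenBLAS from source with OpenMP (assumed prefix: /opt/OpenBLAS)
git clone https://github.com/xianyi/OpenBLAS.git
cd OpenBLAS
make NO_AFFINITY=1 USE_OPENMP=1
sudo make install PREFIX=/opt/OpenBLAS

# Let Torch's CMake find the local build before (re)installing Torch
export CMAKE_LIBRARY_PATH=/opt/OpenBLAS/include:/opt/OpenBLAS/lib:$CMAKE_LIBRARY_PATH
```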

Friedrich

@cdluminate

@lukeyeager
Contributor

See relevant discussion here:
torch/ezinstall#83 (comment)

@fredowski
Contributor Author

Hi CDLuminate,

your proposal might be an idea to build from source via the package mechanism. But I think that OpenBLAS is currently built without OpenMP on Debian/Ubuntu: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=684344
I guess the difference in performance comes not from updates in the source code, but rather from:

a) Compiling it with OpenMP, which fits how Torch is threaded (needs a different compile switch).
b) Maybe AVX(2) instructions being enabled for the current hardware. I have not checked whether the Debian/Ubuntu build process allows or prohibits this in order to keep the build architecture-independent.

Friedrich

@nagadomi
Contributor

nagadomi commented Aug 6, 2016

It seems that the Debian package of OpenBLAS is compiled with USE_OPENMP=0 and USE_THREAD=1.
So libopenblas runs with pthreads while Torch runs with OpenMP, which causes the following two issues:

  1. torch.setnumthreads does not work (it cannot control OpenBLAS's pthread pool).
  2. A performance issue (4~10x slower): Torch's OpenMP threads and OpenBLAS's pthreads are nested, oversubscribing the cores.
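One way to check which threading model a given libopenblas was built with is to look at its linked libraries: an OpenMP build pulls in libgomp, while a pure pthread build does not. This is a heuristic of mine, not an official interface, and the library path below is an assumption (on multiarch systems it may live under /usr/lib/x86_64-linux-gnu/).

```shell
# If this prints a libgomp line, the build uses OpenMP;
# if only libpthread appears, it is a pthread build.
ldd /usr/lib/libopenblas.so.0 | grep -E 'gomp|pthread'
```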

@cdluminate

@nagadomi
Contributor

nagadomi commented Aug 6, 2016

In Ubuntu 16.04, libopenblas-base still uses pthreads. So I suggest that Torch's default installer (distro) should not use libopenblas-base.
Alternatively, we need a libopenblas-openmp package.

@soumith
Member

soumith commented Aug 24, 2016

@nagadomi if you send a pull request to install-deps.sh with whatever your preference is, I will merge.

@cdluminate

cdluminate commented Dec 5, 2016

Now I totally agree with recompiling OpenBLAS locally (for Debian and Ubuntu) for the sake of performance. I gave a similar hint in the Debian caffe packaging guide draft: BVLC/caffe#2601
