
Re: Theano for Debian Science



On 25.02.2016 20:46, Daniel Stender wrote:
> I've packaged Theano [1] now for Debian Science [2, 3].
> 
> Description: CPU/GPU math expression compiler for Python
>  Theano is a Python library that allows one to define and evaluate mathematical
>  expressions involving multi-dimensional arrays efficiently. It provides a
>  high-level NumPy-like expression language for the functional description of
>  calculations, rearranges expressions for speed and stability, and generates
>  native machine instructions for fast calculation. Optionally, highly
>  accelerated computations can be carried out on graphics processors.
> 
> It's going into experimental first of all; there were some hangs here on the GPU which
> might not be reproducible, and it needs some more work.
> 
> I'll send more information on how to test drive it etc. when the package is in.
> 
> I would suggest adding it to the numerical computation task, where NumPy is.
> 
> Thanks,
> Daniel Stender
> 
> [1] http://www.deeplearning.net/software/theano/
> 
> [2] https://bugs.debian.org/576540 (ITP: theano -- Python library to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays)
> 
> [3] https://anonscm.debian.org/cgit/debian-science/packages/theano.git

[thx Anton, I'm already a group member, but warm welcomes like this are always appreciated]

Theano is in experimental [1] now, and I want to give some information on the state of this package
and some hints on how to test drive the library.

I've put it into experimental first (~exp1) because there were some issues which prevented it
from being ready for Unstable (see below).

One of the main advantages of Theano is that it can run the computations on an NVIDIA graphics card via
the CUDA API. For that, nvidia-cuda-dev (non-free) must be installed. However, that's purely optional and
can be skipped if you don't want to run non-free software. Alternatively, Theano has a backend for the
libgpuarray library developed by the same people (not packaged yet), which employs free OpenCL. My plan
is to package that, too.
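
Theano reads its settings from the THEANO_FLAGS environment variable (used in the examples below) or
from ~/.theanorc. A minimal sketch for defaulting to the GPU (option names as in the configuration
documentation referenced below):

[global]
device = gpu
floatX = float32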

For CPU processing, Theano employs OpenBLAS. Both the CPU and GPU extensions are built at run time.
While using it, the cache directory ~/.theano fills up. There is a helper script `theano-cache` in
/usr/share/python{,3}-theano to flush it, but it isn't running well yet (better sweep the cache manually, if needed).
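
Until that is fixed, a manual sweep amounts to something like this (compiledir naming as in the output below):
$ rm -rf ~/.theano/compiledir_*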

The documentation is in the theano-doc package and can be found at:
file:///usr/share/doc/theano-doc/html/index.html
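
Once the package is installed, a very first smoke test could look like this (a minimal sketch along the
lines of the tutorial in that documentation):
$ python
>>> import theano
>>> import theano.tensor as T
>>> a = T.dscalar('a')                    # symbolic float64 scalars
>>> b = T.dscalar('b')
>>> f = theano.function([a, b], a + b)    # compile the expression graph
>>> f(2, 3)
array(5.0)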

You can then check whether Theano runs on OpenBLAS as it is supposed to:
$ THEANO_FLAGS=device=cpu python `python -c "import os, theano; print os.path.dirname(theano.__file__)"`/misc/check_blas.py
<cut>
Some Theano flags:
    blas.ldflags= -L/usr/lib -lblas
    compiledir= /home/aham/.theano/compiledir_Linux-4.3--amd64-x86_64-with-debian-stretch-sid--2.7.11+-64
    floatX= float64
    device= cpu
Some OS information:
    sys.platform= linux2
    sys.version= 2.7.11+ (default, Feb 22 2016, 16:38:42) 
[GCC 5.3.1 20160220]
    sys.prefix= /usr
{...}
Numpy dot module: numpy.core.multiarray
Numpy location: /usr/lib/python2.7/dist-packages/numpy/__init__.pyc
Numpy version: 1.10.4

We executed 10 calls to gemm with a and b matrices of shapes (2000, 2000) and (2000, 2000).
Total execution time: 3.61s on CPU (with direct Theano binding to blas).
Try to run this script a few times. Experience shows that the first time is not as fast as followings calls. The difference is not big, but consistent.
</cut>
$ THEANO_FLAGS=device=cpu python3 `python3 -c "import os, theano; print(os.path.dirname(theano.__file__))"`/misc/check_blas.py
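
To cross-check which BLAS NumPy itself is linked against, numpy.show_config() helps:
$ python -c "import numpy; numpy.show_config()"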

Info on the Theano configuration and environment variables can be found here:
file:///usr/share/doc/theano-doc/html/library/config.html
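
The effective values can also be inspected at run time, e.g.:
$ python3 -c "import theano; print(theano.config.device, theano.config.floatX)"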

Theano ships an exhaustive test suite, which can be run like this:
$ python -c "import theano; theano.test()"
This takes a while (approx. 30 min to 1 h, depending on the hardware). It appears the tests auto-detect
whether the GPU device is accessible or not.
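
The THEANO_FLAGS mechanism from above applies here as well, so a particular device can be forced, e.g.:
$ THEANO_FLAGS=floatX=float32,device=gpu python -c "import theano; theano.test()"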

CUDA is going to be updated to 7.5, as Lumin has written; currently there is an issue with 7.0 concerning
the deep learning frameworks which run on that API [2]. I've experienced some hangs running Theano on the
GPU, and it most likely has something to do with that.

Anyway, for running on the GPU you need the NVIDIA driver installed and your graphics adapter unoccupied
by any graphical display tasks; that means running your X server on the CPU, or on another GPU device.
Please note that if you're using Bumblebee for an Optimus graphics card on a notebook, Theano has to
be run using the `optirun` command (Bumblebee switches off the GPU if it isn't used).
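
On such a notebook, the GPU run from below would then look like this:
$ THEANO_FLAGS=floatX=float32,device=gpu optirun python ./test.py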

I've had the error message "modprobe: FATAL: Module nvidia-uvm not found.", which has been solved
with "alias nvidia-uvm nvidia-current-uvm" in /etc/modprobe.d/nvidia.conf. But that was last year; maybe
this isn't a problem anymore (the NVIDIA driver got updated in the meantime).
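
For reference, that workaround boiled down to something like:
$ echo "alias nvidia-uvm nvidia-current-uvm" | sudo tee -a /etc/modprobe.d/nvidia.conf
$ sudo modprobe nvidia-uvm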

With a running NVIDIA driver, the attached test script can be run like this:
<cut>
$ THEANO_FLAGS=floatX=float32,device=cpu python ./test.py
[Elemwise{exp,no_inplace}(<TensorType(float32, vector)>)]
Looping 1000 times took 4.469798 seconds
Result is [ 1.23178029  1.61879337  1.52278066 ...,  2.20771813  2.29967761
  1.62323284]
Used the cpu
$ THEANO_FLAGS=floatX=float32,device=gpu python ./test.py
Using gpu device 0: GeForce 940M
[GpuElemwise{exp,no_inplace}(<CudaNdarrayType(float32, vector)>), HostFromGpu(GpuElemwise{exp,no_inplace}.0)]
Looping 1000 times took 1.049460 seconds
Result is [ 1.23178029  1.61879349  1.52278066 ...,  2.20771813  2.29967761
  1.62323296]
Used the gpu
</cut>

I'll have the package in Unstable soon. Please enjoy, and happy number crunching! Any feedback is very
much welcome.

Thanks,
DS

[1] https://tracker.debian.org/pkg/theano

[2] https://bugs.debian.org/576540 (ITP: theano)

-- 
4096R/DF5182C8
http://www.danielstender.com/blog/
# test.py (attached): checks whether Theano computes on the CPU or on the GPU
from theano import function, config, shared
import theano.tensor as T
import numpy
import time

vlen = 10 * 30 * 768  # 10 x #cores x #threads per core
iters = 1000

# shared variable holding the input vector, allocated on the compute device
rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
# compile the symbolic expression exp(x) into a callable function
f = function([], T.exp(x))
print(f.maker.fgraph.toposort())
t0 = time.time()
for i in range(iters):  # range instead of xrange, so it also runs on Python 3
    r = f()
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
print("Result is %s" % (r,))
# if the compiled graph contains only CPU ops (Elemwise), the GPU wasn't used
if numpy.any([isinstance(node.op, T.Elemwise) for node in f.maker.fgraph.toposort()]):
    print('Used the cpu')
else:
    print('Used the gpu')
exit(0)
