
Bug#723587: release.debian.org: Non-free file in PyOpenCL - new version upload to stable and oldstable



Package: release.debian.org
Severity: important

Hello.
I received a bug report that my package python-pyopencl-doc contains
a non-free file in its examples (#722014). The license reads:
"Copyright NVIDIA. All rights reserved.
NVIDIA Corporation and its licensors retain all intellectual property
and
proprietary rights in and to this software and related documentation.
Any use, reproduction, disclosure, or distribution of this software
and related documentation without an express license agreement from
NVIDIA Corporation is strictly prohibited."

I have checked, and the non-free file examples/matrix-multiply.py
exists in all versions of PyOpenCL: in oldstable, stable, testing,
and unstable. Upstream has been notified and has removed the file
from its archive; I will soon upload a fixed version to unstable.
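For reference, the repack is the same step the debian/rules diffs below add: drop the non-free file from the unpacked upstream tree and re-create the orig tarball with a .dfsg version. A minimal sketch (run here on a throwaway tree; the directory and tarball names are illustrative, not the exact upstream paths):

```shell
set -e
MODULE=pyopencl
VERSION=0.92.dfsg

# Stand-in for the unpacked upstream source tree:
mkdir -p ${MODULE}-${VERSION}/examples
touch ${MODULE}-${VERSION}/examples/matrix-multiply.py

# Drop the non-free example, then repack the DFSG-free orig tarball:
rm -f ${MODULE}-${VERSION}/examples/matrix-multiply.py
tar czf ${MODULE}_${VERSION}.orig.tar.gz ${MODULE}-${VERSION}
```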

After asking on debian-mentors, I was advised that I should also
upload DFSG-free versions of the packages to stable and oldstable:
http://lists.debian.org/debian-mentors/2013/09/msg00104.html
I am attaching debdiffs of the proposed changes; please review them.
I am a DM, not a DD, so I will need a sponsor for these uploads.

Best regards.

-- System Information:
Debian Release: jessie/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (500, 'testing'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 3.10-2-amd64 (SMP w/2 CPU cores)
Locale: LANG=pl_PL.UTF-8, LC_CTYPE=pl_PL.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

-- 
Tomasz Rybak  GPG/PGP key ID: 2AD5 9860
Fingerprint A481 824E 7DD3 9C0E C40A  488E C654 FB33 2AD5 9860
http://member.acm.org/~tomaszrybak

diff -Nru pyopencl-0.92/debian/changelog pyopencl-0.92.dfsg/debian/changelog
--- pyopencl-0.92/debian/changelog	2010-11-11 23:10:57.000000000 +0100
+++ pyopencl-0.92.dfsg/debian/changelog	2013-09-17 17:12:20.000000000 +0200
@@ -1,3 +1,9 @@
+pyopencl (0.92.dfsg-1) oldstable; urgency=low
+
+  * Remove non-free file from examples (#722014).
+
+ -- Tomasz Rybak <tomasz.rybak@post.pl>  Tue, 17 Sep 2013 17:03:46 +0200
+
 pyopencl (0.92-1) unstable; urgency=high
 
   * New upstream release
diff -Nru pyopencl-0.92/debian/patches/python-versions.patch pyopencl-0.92.dfsg/debian/patches/python-versions.patch
--- pyopencl-0.92/debian/patches/python-versions.patch	2010-11-11 13:30:27.000000000 +0100
+++ pyopencl-0.92.dfsg/debian/patches/python-versions.patch	2013-09-17 17:12:20.000000000 +0200
@@ -4,11 +4,11 @@
  Python version and include it into name of linked library.
 Forwarded: not-needed
 Author: Tomasz Rybak <bogomips@post.pl>
-Last-Update: 2010-11-11
-Index: pyopencl-0.92/setup.py
+Last-Update: 2010-06-19
+Index: pyopencl-0.92~beta+git20100618/setup.py
 ===================================================================
---- pyopencl-0.92.orig/setup.py	2010-11-11 13:22:42.000000000 +0100
-+++ pyopencl-0.92/setup.py	2010-11-11 13:22:55.000000000 +0100
+--- pyopencl-0.92~beta+git20100618.orig/setup.py	2010-06-19 20:30:40.000000000 +0200
++++ pyopencl-0.92~beta+git20100618/setup.py	2010-06-19 20:31:07.000000000 +0200
 @@ -42,6 +42,7 @@
  
  def main():
@@ -17,7 +17,7 @@
      from aksetup_helper import (hack_distutils, get_config, setup,
              NumpyExtension, set_up_shipped_boost_if_requested)
  
-@@ -51,6 +52,7 @@
+@@ -53,6 +54,7 @@
      EXTRA_OBJECTS, EXTRA_DEFINES = set_up_shipped_boost_if_requested(conf)
  
      LIBRARY_DIRS = conf["BOOST_LIB_DIR"]
diff -Nru pyopencl-0.92/debian/patches/replace-setuptools.patch pyopencl-0.92.dfsg/debian/patches/replace-setuptools.patch
--- pyopencl-0.92/debian/patches/replace-setuptools.patch	2010-11-11 13:30:27.000000000 +0100
+++ pyopencl-0.92.dfsg/debian/patches/replace-setuptools.patch	2013-09-17 17:12:20.000000000 +0200
@@ -3,11 +3,11 @@
  contains all necessary files and we avoid network traffic from build machine.
 Forwarded: not-needed
 Author: Tomasz Rybak <bogomips@post.pl>
-Last-Update: 2010-11-11
-Index: pyopencl-0.92/MANIFEST.in
+Last-Update: 2010-06-02
+Index: pyopencl-0.92~beta+git20100806/MANIFEST.in
 ===================================================================
---- pyopencl-0.92.orig/MANIFEST.in	2010-11-11 13:24:53.000000000 +0100
-+++ pyopencl-0.92/MANIFEST.in	2010-11-11 13:25:37.000000000 +0100
+--- pyopencl-0.92~beta+git20100806.orig/MANIFEST.in	2010-08-07 13:49:58.000000000 +0200
++++ pyopencl-0.92~beta+git20100806/MANIFEST.in	2010-08-07 15:13:27.000000000 +0200
 @@ -7,7 +7,6 @@
  include doc/*.py
  include doc/source/conf.py
@@ -16,10 +16,10 @@
  include configure.py
  include Makefile.in
  include aksetup_helper.py
-Index: pyopencl-0.92/aksetup_helper.py
+Index: pyopencl-0.92~beta+git20100806/aksetup_helper.py
 ===================================================================
---- pyopencl-0.92.orig/aksetup_helper.py	2010-11-11 13:25:02.000000000 +0100
-+++ pyopencl-0.92/aksetup_helper.py	2010-11-11 13:25:37.000000000 +0100
+--- pyopencl-0.92~beta+git20100806.orig/aksetup_helper.py	2010-08-07 13:49:58.000000000 +0200
++++ pyopencl-0.92~beta+git20100806/aksetup_helper.py	2010-08-07 15:13:52.000000000 +0200
 @@ -1,35 +1,6 @@
 -# dealings with ez_setup ------------------------------------------------------
 -import distribute_setup
@@ -90,8 +90,7 @@
      return schema.read_config()
  
  
-@@ -516,34 +460,7 @@
-     import sys
+@@ -517,34 +461,7 @@
  
      if conf["USE_SHIPPED_BOOST"]:
 -        if not exists("bpl-subset/bpl_subset/boost/version.hpp"):
@@ -112,7 +111,8 @@
 -            print("'USE_SHIPPED_BOOST' off.")
 -            print("------------------------------------------------------------------------")
 -            conf["USE_SHIPPED_BOOST"] = False
--
++       conf["USE_SHIPPED_BOOST"] = False
+ 
 -            delay = 10
 -
 -            from time import sleep
@@ -122,7 +122,7 @@
 -                sys.stdout.flush()
 -                delay -= 1
 -                sleep(1)
-+       conf["USE_SHIPPED_BOOST"] = False
- 
+-
      if conf["USE_SHIPPED_BOOST"]:
          conf["BOOST_INC_DIR"] = ["bpl-subset/bpl_subset"]
+         conf["BOOST_LIB_DIR"] = []
diff -Nru pyopencl-0.92/debian/rules pyopencl-0.92.dfsg/debian/rules
--- pyopencl-0.92/debian/rules	2010-11-11 13:30:27.000000000 +0100
+++ pyopencl-0.92.dfsg/debian/rules	2013-09-17 17:12:20.000000000 +0200
@@ -36,6 +36,7 @@
 	git clone $(GIT_URL) $(MODULE_NAME)-$(DEB_UPSTREAM_VERSION)
 	cd $(MODULE_NAME)-$(DEB_UPSTREAM_VERSION) && git checkout $(GIT_REVISION)
 	rm -rf $(MODULE_NAME)-$(DEB_UPSTREAM_VERSION)/.git $(MODULE_NAME)-$(DEB_UPSTREAM_VERSION)/.gitignore $(MODULE_NAME)-$(DEB_UPSTREAM_VERSION)/.gitmodules
+	rm -rf $(MODULE_NAME)-$(DEB_UPSTREAM_VERSION)/examples/matrix-multiply.py
 	tar czf $(MODULE_NAME)_$(DEB_UPSTREAM_VERSION).orig.tar.gz $(MODULE_NAME)-$(DEB_UPSTREAM_VERSION)
 	rm -rf $(MODULE_NAME)-$(DEB_UPSTREAM_VERSION)
 
diff -Nru pyopencl-0.92/examples/matrix-multiply.py pyopencl-0.92.dfsg/examples/matrix-multiply.py
--- pyopencl-0.92/examples/matrix-multiply.py	2010-10-21 19:10:19.000000000 +0200
+++ pyopencl-0.92.dfsg/examples/matrix-multiply.py	1970-01-01 01:00:00.000000000 +0100
@@ -1,241 +0,0 @@
-# example provided by Eilif Muller
-
-from __future__ import division
-
-KERNEL_CODE = """
-
-// Thread block size
-#define BLOCK_SIZE %(block_size)d
-
-// Matrix dimensions
-// (chosen as multiples of the thread block size for simplicity)
-#define WA %(w_a)d // Matrix A width
-#define HA %(h_a)d // Matrix A height
-#define WB %(w_b)d // Matrix B width
-#define HB WA  // Matrix B height
-#define WC WB  // Matrix C width
-#define HC HA  // Matrix C height
-
-
-/*
- * Copyright 1993-2009 NVIDIA Corporation.  All rights reserved.
- *
- * NVIDIA Corporation and its licensors retain all intellectual property and
- * proprietary rights in and to this software and related documentation.
- * Any use, reproduction, disclosure, or distribution of this software
- * and related documentation without an express license agreement from
- * NVIDIA Corporation is strictly prohibited.
- *
- * Please refer to the applicable NVIDIA end user license agreement (EULA)
- * associated with this source code for terms and conditions that govern
- * your use of this NVIDIA software.
- *
- */
-
-/* Matrix multiplication: C = A * B.
- * Device code.
- */
-
-#define AS(j, i) As[i + j * BLOCK_SIZE]
-#define BS(j, i) Bs[i + j * BLOCK_SIZE]
-
-////////////////////////////////////////////////////////////////////////////////
-//! Matrix multiplication on the device: C = A * B
-//! WA is A's width and WB is B's width
-////////////////////////////////////////////////////////////////////////////////
-__kernel __attribute__((reqd_work_group_size(BLOCK_SIZE,BLOCK_SIZE,1))) 
-void
-matrixMul( __global float* C, __global float* A, __global float* B)
-{
-    __local float As[BLOCK_SIZE*BLOCK_SIZE];
-    __local float Bs[BLOCK_SIZE*BLOCK_SIZE];
-
-    // Block index
-    int bx = get_group_id(0);
-    int by = get_group_id(1);
-
-    // Thread index
-    int tx = get_local_id(0);
-    int ty = get_local_id(1);
-
-    // Index of the first sub-matrix of A processed by the block
-    int aBegin = WA * BLOCK_SIZE * by;
-
-    // Index of the last sub-matrix of A processed by the block
-    int aEnd   = aBegin + WA - 1;
-
-    // Step size used to iterate through the sub-matrices of A
-    int aStep  = BLOCK_SIZE;
-
-    // Index of the first sub-matrix of B processed by the block
-    int bBegin = BLOCK_SIZE * bx;
-
-    // Step size used to iterate through the sub-matrices of B
-    int bStep  = BLOCK_SIZE * WB;
-
-    // Csub is used to store the element of the block sub-matrix
-    // that is computed by the thread
-    float Csub = 0.0f;
-
-    // Loop over all the sub-matrices of A and B
-    // required to compute the block sub-matrix
-    for (int a = aBegin, b = bBegin;
-             a <= aEnd;
-             a += aStep, b += bStep) {
-
-        // Load the matrices from device memory
-        // to shared memory; each thread loads
-        // one element of each matrix
-        AS(ty, tx) = A[a + WA * ty + tx];
-        BS(ty, tx) = B[b + WB * ty + tx];
-
-        // Synchronize to make sure the matrices are loaded
-        barrier(CLK_LOCAL_MEM_FENCE);
-
-        // Multiply the two matrices together;
-        // each thread computes one element
-        // of the block sub-matrix
-        for (int k = 0; k < BLOCK_SIZE; ++k)
-            Csub += AS(ty, k) * BS(k, tx);
-
-        // Synchronize to make sure that the preceding
-        // computation is done before loading two new
-        // sub-matrices of A and B in the next iteration
-        barrier(CLK_LOCAL_MEM_FENCE);
-    }
-
-    // Write the block sub-matrix to device memory;
-    // each thread writes one element
-    C[get_global_id(1) * get_global_size(0) + get_global_id(0)] = Csub;
-
-}
-
-"""
-
-import pyopencl as cl
-from time import time
-import numpy
-
-block_size = 16
-
-ctx = cl.create_some_context()
-
-for dev in ctx.devices:
-    assert dev.local_mem_size > 0
-
-queue = cl.CommandQueue(ctx,
-        properties=cl.command_queue_properties.PROFILING_ENABLE)
-
-#queue = cl.CommandQueue(ctx)
-
-if False:
-    a_height = 4096
-    #a_height = 1024
-    a_width = 2048
-    #a_width = 256
-    #b_height == a_width
-    b_width = a_height
-
-elif False:
-    # like PyCUDA
-    a_height = 2516
-    a_width = 1472
-    b_height = a_width
-    b_width = 2144
-
-else:
-    # CL SDK
-    a_width = 50*block_size
-    a_height = 100*block_size
-    b_width = 50*block_size
-    b_height = a_width
-
-c_width = b_width
-c_height = a_height
-
-h_a = numpy.random.rand(a_height, a_width).astype(numpy.float32)
-h_b = numpy.random.rand(b_height, b_width).astype(numpy.float32)
-h_c = numpy.empty((c_height, c_width)).astype(numpy.float32)
-
-
-kernel_params = {"block_size": block_size,
-        "w_a":a_width, "h_a":a_height, "w_b":b_width}
-
-if "NVIDIA" in queue.device.vendor:
-    options = "-cl-mad-enable -cl-fast-relaxed-math"
-else:
-    options = None
-prg = cl.Program(ctx, KERNEL_CODE % kernel_params,
-        ).build(options=options)
-kernel = prg.matrixMul
-#print prg.binaries[0]
-
-assert a_width % block_size == 0
-assert a_height % block_size == 0
-assert b_width % block_size == 0
-
-# transfer host -> device -----------------------------------------------------
-mf = cl.mem_flags
-
-t1 = time()
-
-d_a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=h_a)
-d_b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=h_b)
-d_c_buf = cl.Buffer(ctx, mf.WRITE_ONLY, size=h_c.nbytes)
-
-push_time = time()-t1
-
-# warmup ----------------------------------------------------------------------
-for i in range(5):
-    event = kernel(queue, h_c.shape, (block_size, block_size), 
-            d_c_buf, d_a_buf, d_b_buf)
-    event.wait()
-
-queue.finish()
-
-# actual benchmark ------------------------------------------------------------
-t1 = time()
-
-count = 20
-for i in range(count):
-    event = kernel(queue, h_c.shape, (block_size, block_size),
-            d_c_buf, d_a_buf, d_b_buf)
-
-event.wait()
-
-gpu_time = (time()-t1)/count
-
-# transfer device -> host -----------------------------------------------------
-t1 = time()
-cl.enqueue_read_buffer(queue, d_c_buf, h_c).wait()
-pull_time = time()-t1
-
-# timing output ---------------------------------------------------------------
-gpu_total_time = gpu_time+push_time+pull_time
-
-print "GPU push+compute+pull total [s]:", gpu_total_time
-print "GPU push [s]:", push_time
-print "GPU pull [s]:", pull_time
-print "GPU compute (host-timed) [s]:", gpu_time
-print "GPU compute (event-timed) [s]: ", (event.profile.end-event.profile.start)*1e-9
-
-gflop = h_c.size * (a_width * 2.) / (1000**3.)
-gflops = gflop / gpu_time
-
-print
-print "GFlops/s:", gflops
-
-# cpu comparison --------------------------------------------------------------
-t1 = time()
-h_c_cpu = numpy.dot(h_a,h_b)
-cpu_time = time()-t1
-
-print
-print "GPU==CPU:",numpy.allclose(h_c, h_c_cpu)
-print
-print "CPU time (s)", cpu_time
-print
-
-print "GPU speedup (with transfer): ", cpu_time/gpu_total_time
-print "GPU speedup (without transfer): ", cpu_time/gpu_time
-
diff -Nru pyopencl-2012.1/debian/changelog pyopencl-2012.1.dfsg/debian/changelog
--- pyopencl-2012.1/debian/changelog	2012-06-21 22:22:52.000000000 +0200
+++ pyopencl-2012.1.dfsg/debian/changelog	2013-09-17 17:29:52.000000000 +0200
@@ -1,3 +1,9 @@
+pyopencl (2012.1.dfsg-1) stable; urgency=low
+
+  * Remove non-free file from examples (#722014).
+
+ -- Tomasz Rybak <tomasz.rybak@post.pl>  Tue, 17 Sep 2013 17:25:16 +0200
+
 pyopencl (2012.1-1) unstable; urgency=low
 
   * New upstream release.
diff -Nru pyopencl-2012.1/debian/rules pyopencl-2012.1.dfsg/debian/rules
--- pyopencl-2012.1/debian/rules	2012-06-21 22:22:52.000000000 +0200
+++ pyopencl-2012.1.dfsg/debian/rules	2013-09-17 17:29:52.000000000 +0200
@@ -48,6 +48,7 @@
 	rm -rf $(MODULE_NAME)-$(DEB_UPSTREAM_VERSION)/.git $(MODULE_NAME)-$(DEB_UPSTREAM_VERSION)/.gitignore $(MODULE_NAME)-$(DEB_UPSTREAM_VERSION)/.gitmodules
 	rm -rf $(MODULE_NAME)-$(DEB_UPSTREAM_VERSION)/doc/source/.gitignore
 	rm -rf $(MODULE_NAME)-$(DEB_UPSTREAM_VERSION)/$(GIT_SUBMODULES)/.git $(MODULE_NAME)-$(DEB_UPSTREAM_VERSION)/$(GIT_SUBMODULES)/.gitignore
+	rm -rf $(MODULE_NAME)-$(DEB_UPSTREAM_VERSION)/examples/matrix-multiply.py
 	tar czf $(MODULE_NAME)_$(DEB_UPSTREAM_VERSION).orig.tar.gz $(MODULE_NAME)-$(DEB_UPSTREAM_VERSION)
 	rm -rf $(MODULE_NAME)-$(DEB_UPSTREAM_VERSION)
 
diff -Nru pyopencl-2012.1/examples/matrix-multiply.py pyopencl-2012.1.dfsg/examples/matrix-multiply.py
--- pyopencl-2012.1/examples/matrix-multiply.py	2012-06-21 22:22:19.000000000 +0200
+++ pyopencl-2012.1.dfsg/examples/matrix-multiply.py	1970-01-01 01:00:00.000000000 +0100
@@ -1,241 +0,0 @@
-# example provided by Eilif Muller
-
-from __future__ import division
-
-KERNEL_CODE = """
-
-// Thread block size
-#define BLOCK_SIZE %(block_size)d
-
-// Matrix dimensions
-// (chosen as multiples of the thread block size for simplicity)
-#define WA %(w_a)d // Matrix A width
-#define HA %(h_a)d // Matrix A height
-#define WB %(w_b)d // Matrix B width
-#define HB WA  // Matrix B height
-#define WC WB  // Matrix C width
-#define HC HA  // Matrix C height
-
-
-/*
- * Copyright 1993-2009 NVIDIA Corporation.  All rights reserved.
- *
- * NVIDIA Corporation and its licensors retain all intellectual property and
- * proprietary rights in and to this software and related documentation.
- * Any use, reproduction, disclosure, or distribution of this software
- * and related documentation without an express license agreement from
- * NVIDIA Corporation is strictly prohibited.
- *
- * Please refer to the applicable NVIDIA end user license agreement (EULA)
- * associated with this source code for terms and conditions that govern
- * your use of this NVIDIA software.
- *
- */
-
-/* Matrix multiplication: C = A * B.
- * Device code.
- */
-
-#define AS(j, i) As[i + j * BLOCK_SIZE]
-#define BS(j, i) Bs[i + j * BLOCK_SIZE]
-
-////////////////////////////////////////////////////////////////////////////////
-//! Matrix multiplication on the device: C = A * B
-//! WA is A's width and WB is B's width
-////////////////////////////////////////////////////////////////////////////////
-__kernel __attribute__((reqd_work_group_size(BLOCK_SIZE,BLOCK_SIZE,1))) 
-void
-matrixMul( __global float* C, __global float* A, __global float* B)
-{
-    __local float As[BLOCK_SIZE*BLOCK_SIZE];
-    __local float Bs[BLOCK_SIZE*BLOCK_SIZE];
-
-    // Block index
-    int bx = get_group_id(0);
-    int by = get_group_id(1);
-
-    // Thread index
-    int tx = get_local_id(0);
-    int ty = get_local_id(1);
-
-    // Index of the first sub-matrix of A processed by the block
-    int aBegin = WA * BLOCK_SIZE * by;
-
-    // Index of the last sub-matrix of A processed by the block
-    int aEnd   = aBegin + WA - 1;
-
-    // Step size used to iterate through the sub-matrices of A
-    int aStep  = BLOCK_SIZE;
-
-    // Index of the first sub-matrix of B processed by the block
-    int bBegin = BLOCK_SIZE * bx;
-
-    // Step size used to iterate through the sub-matrices of B
-    int bStep  = BLOCK_SIZE * WB;
-
-    // Csub is used to store the element of the block sub-matrix
-    // that is computed by the thread
-    float Csub = 0.0f;
-
-    // Loop over all the sub-matrices of A and B
-    // required to compute the block sub-matrix
-    for (int a = aBegin, b = bBegin;
-             a <= aEnd;
-             a += aStep, b += bStep) {
-
-        // Load the matrices from device memory
-        // to shared memory; each thread loads
-        // one element of each matrix
-        AS(ty, tx) = A[a + WA * ty + tx];
-        BS(ty, tx) = B[b + WB * ty + tx];
-
-        // Synchronize to make sure the matrices are loaded
-        barrier(CLK_LOCAL_MEM_FENCE);
-
-        // Multiply the two matrices together;
-        // each thread computes one element
-        // of the block sub-matrix
-        for (int k = 0; k < BLOCK_SIZE; ++k)
-            Csub += AS(ty, k) * BS(k, tx);
-
-        // Synchronize to make sure that the preceding
-        // computation is done before loading two new
-        // sub-matrices of A and B in the next iteration
-        barrier(CLK_LOCAL_MEM_FENCE);
-    }
-
-    // Write the block sub-matrix to device memory;
-    // each thread writes one element
-    C[get_global_id(1) * get_global_size(0) + get_global_id(0)] = Csub;
-
-}
-
-"""
-
-import pyopencl as cl
-from time import time
-import numpy
-
-block_size = 16
-
-ctx = cl.create_some_context()
-
-for dev in ctx.devices:
-    assert dev.local_mem_size > 0
-
-queue = cl.CommandQueue(ctx,
-        properties=cl.command_queue_properties.PROFILING_ENABLE)
-
-#queue = cl.CommandQueue(ctx)
-
-if False:
-    a_height = 4096
-    #a_height = 1024
-    a_width = 2048
-    #a_width = 256
-    #b_height == a_width
-    b_width = a_height
-
-elif False:
-    # like PyCUDA
-    a_height = 2516
-    a_width = 1472
-    b_height = a_width
-    b_width = 2144
-
-else:
-    # CL SDK
-    a_width = 50*block_size
-    a_height = 100*block_size
-    b_width = 50*block_size
-    b_height = a_width
-
-c_width = b_width
-c_height = a_height
-
-h_a = numpy.random.rand(a_height, a_width).astype(numpy.float32)
-h_b = numpy.random.rand(b_height, b_width).astype(numpy.float32)
-h_c = numpy.empty((c_height, c_width)).astype(numpy.float32)
-
-
-kernel_params = {"block_size": block_size,
-        "w_a":a_width, "h_a":a_height, "w_b":b_width}
-
-if "NVIDIA" in queue.device.vendor:
-    options = "-cl-mad-enable -cl-fast-relaxed-math"
-else:
-    options = ""
-prg = cl.Program(ctx, KERNEL_CODE % kernel_params,
-        ).build(options=options)
-kernel = prg.matrixMul
-#print prg.binaries[0]
-
-assert a_width % block_size == 0
-assert a_height % block_size == 0
-assert b_width % block_size == 0
-
-# transfer host -> device -----------------------------------------------------
-mf = cl.mem_flags
-
-t1 = time()
-
-d_a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=h_a)
-d_b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=h_b)
-d_c_buf = cl.Buffer(ctx, mf.WRITE_ONLY, size=h_c.nbytes)
-
-push_time = time()-t1
-
-# warmup ----------------------------------------------------------------------
-for i in range(5):
-    event = kernel(queue, h_c.shape[::-1], (block_size, block_size), 
-            d_c_buf, d_a_buf, d_b_buf)
-    event.wait()
-
-queue.finish()
-
-# actual benchmark ------------------------------------------------------------
-t1 = time()
-
-count = 20
-for i in range(count):
-    event = kernel(queue, h_c.shape[::-1], (block_size, block_size),
-            d_c_buf, d_a_buf, d_b_buf)
-
-event.wait()
-
-gpu_time = (time()-t1)/count
-
-# transfer device -> host -----------------------------------------------------
-t1 = time()
-cl.enqueue_copy(queue, h_c, d_c_buf)
-pull_time = time()-t1
-
-# timing output ---------------------------------------------------------------
-gpu_total_time = gpu_time+push_time+pull_time
-
-print "GPU push+compute+pull total [s]:", gpu_total_time
-print "GPU push [s]:", push_time
-print "GPU pull [s]:", pull_time
-print "GPU compute (host-timed) [s]:", gpu_time
-print "GPU compute (event-timed) [s]: ", (event.profile.end-event.profile.start)*1e-9
-
-gflop = h_c.size * (a_width * 2.) / (1000**3.)
-gflops = gflop / gpu_time
-
-print
-print "GFlops/s:", gflops
-
-# cpu comparison --------------------------------------------------------------
-t1 = time()
-h_c_cpu = numpy.dot(h_a,h_b)
-cpu_time = time()-t1
-
-print
-print "GPU==CPU:",numpy.allclose(h_c, h_c_cpu)
-print
-print "CPU time (s)", cpu_time
-print
-
-print "GPU speedup (with transfer): ", cpu_time/gpu_total_time
-print "GPU speedup (without transfer): ", cpu_time/gpu_time
-

Attachment: signature.asc
Description: This is a digitally signed message part

