Work units failing in 64-bit Linux

Author	Message
Greg Joined: 20 Nov 08 Posts: 3 Credit: 2,670,125 RAC: 0	Message 12638 - Posted: 22 Sep 2009, 23:38:44 UTC Last modified: 22 Sep 2009, 23:41:30 UTC

Work units are failing on 64-bit Linux. The machine is a research unit that has proven itself to be very stable but is idle at the moment, so I thought I'd try to take advantage of the idle time.

See, for example, http://www.gpugrid.net/result.php?resultid=1291252

The driver version is cudadriver_2.3_linux_64_190.16

Info about the setup:

CUDA Device Query (Runtime API) version (CUDART static linking)
There are 4 devices supporting CUDA

Device 0: "Tesla C1060"
  CUDA Driver Version: 2.30
  CUDA Runtime Version: 2.30
  CUDA Capability Major revision number: 1
  CUDA Capability Minor revision number: 3
  Total amount of global memory: 4294705152 bytes
  Number of multiprocessors: 30
  Number of cores: 240
  Total amount of constant memory: 65536 bytes
  Total amount of shared memory per block: 16384 bytes
  Total number of registers available per block: 16384
  Warp size: 32
  Maximum number of threads per block: 512
  Maximum sizes of each dimension of a block: 512 x 512 x 64
  Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
  Maximum memory pitch: 262144 bytes
  Texture alignment: 256 bytes
  Clock rate: 1.30 GHz
  Concurrent copy and execution: Yes
  Run time limit on kernels: No
  Integrated: No
  Support host page-locked memory mapping: Yes
  Compute mode: Default (multiple host threads can use this device simultaneously)

Device 1: "Tesla C1060"
  CUDA Driver Version: 2.30
  CUDA Runtime Version: 2.30
  CUDA Capability Major revision number: 1
  CUDA Capability Minor revision number: 3
  Total amount of global memory: 4294705152 bytes
  Number of multiprocessors: 30
  Number of cores: 240
  Total amount of constant memory: 65536 bytes
  Total amount of shared memory per block: 16384 bytes
  Total number of registers available per block: 16384
  Warp size: 32
  Maximum number of threads per block: 512
  Maximum sizes of each dimension of a block: 512 x 512 x 64
  Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
  Maximum memory pitch: 262144 bytes
  Texture alignment: 256 bytes
  Clock rate: 1.30 GHz
  Concurrent copy and execution: Yes
  Run time limit on kernels: No
  Integrated: No
  Support host page-locked memory mapping: Yes
  Compute mode: Default (multiple host threads can use this device simultaneously)

Device 2: "Tesla C1060"
  CUDA Driver Version: 2.30
  CUDA Runtime Version: 2.30
  CUDA Capability Major revision number: 1
  CUDA Capability Minor revision number: 3
  Total amount of global memory: 4294705152 bytes
  Number of multiprocessors: 30
  Number of cores: 240
  Total amount of constant memory: 65536 bytes
  Total amount of shared memory per block: 16384 bytes
  Total number of registers available per block: 16384
  Warp size: 32
  Maximum number of threads per block: 512
  Maximum sizes of each dimension of a block: 512 x 512 x 64
  Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
  Maximum memory pitch: 262144 bytes
  Texture alignment: 256 bytes
  Clock rate: 1.30 GHz
  Concurrent copy and execution: Yes
  Run time limit on kernels: No
  Integrated: No
  Support host page-locked memory mapping: Yes
  Compute mode: Default (multiple host threads can use this device simultaneously)

Device 3: "Tesla C1060"
  CUDA Driver Version: 2.30
  CUDA Runtime Version: 2.30
  CUDA Capability Major revision number: 1
  CUDA Capability Minor revision number: 3
  Total amount of global memory: 4294705152 bytes
  Number of multiprocessors: 30
  Number of cores: 240
  Total amount of constant memory: 65536 bytes
  Total amount of shared memory per block: 16384 bytes
  Total number of registers available per block: 16384
  Warp size: 32
  Maximum number of threads per block: 512
  Maximum sizes of each dimension of a block: 512 x 512 x 64
  Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
  Maximum memory pitch: 262144 bytes
  Texture alignment: 256 bytes
  Clock rate: 1.30 GHz
  Concurrent copy and execution: Yes
  Run time limit on kernels: No
  Integrated: No
  Support host page-locked memory mapping: Yes
  Compute mode: Default (multiple host threads can use this device simultaneously)

Test PASSED

GDF Volunteer moderator Project administrator Project developer Project tester Volunteer developer Volunteer tester Project scientist Joined: 14 Mar 07 Posts: 1958 Credit: 629,356 RAC: 0	Message 12668 - Posted: 23 Sep 2009, 7:33:57 UTC - in response to Message 12638.

Do the work units always fail, or just sometimes? The type of error you are getting is usually caused by GPUs running too hot. Since your GPUs are Teslas, I am quite surprised.

gdf

Greg Joined: 20 Nov 08 Posts: 3 Credit: 2,670,125 RAC: 0	Message 12696 - Posted: 24 Sep 2009, 0:55:04 UTC - in response to Message 12668. Last modified: 24 Sep 2009, 1:01:40 UTC

The first eight work units failed, so I suspended it at that point. It's definitely not overheating. It's housed in a 1U rack unit in a frigid, air-conditioned room.

I wonder whether it has something to do with this line:

# Total amount of global memory: -262144 bytes
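
A note on the negative figure quoted above: the device query in the first post reports 4294705152 bytes of global memory per card, and that value reinterpreted as a signed 32-bit integer is exactly -262144. The short Python sketch below only checks that arithmetic. The helper name `as_int32` is hypothetical, and the idea that the application stores or prints the size through a 32-bit signed integer is an assumption that the thread does not confirm. A display wraparound of this kind would not, on its own, explain the failed work units.

```python
def as_int32(value):
    """Reinterpret the low 32 bits of value as a signed two's-complement integer."""
    low = value & 0xFFFFFFFF  # keep only the low 32 bits
    # Values with the top bit set represent negative numbers in two's complement.
    return low - (1 << 32) if low >= (1 << 31) else low


# Global memory size taken from the device query output in the first post.
reported_bytes = 4294705152

# Assumption: the size passes through a signed 32-bit field somewhere.
print(as_int32(reported_bytes))  # prints -262144
```

The two numbers differ by exactly 2^32 (4294967296), which is what a 32-bit wraparound would produce.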

Greg Joined: 20 Nov 08 Posts: 3 Credit: 2,670,125 RAC: 0	Message 12697 - Posted: 24 Sep 2009, 1:24:13 UTC - in response to Message 12696.

Found the problem. The Einstein@Home beta app was the culprit. I had left something around that GPUGrid didn't like. Unloading and reloading the kernel module fixed the problem.
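
For readers who want to try the same fix, here is a minimal Python sketch of unloading and reloading a kernel module with `modprobe`. The post does not name the module, so the default `nvidia` is an assumption, and the function names `reload_module_commands` and `reload_module` are hypothetical. The sketch defaults to a dry run that only prints the commands. Actually running them requires root privileges, and the unload step will fail while any process (for example the BOINC client) still has the GPU open.

```python
import subprocess


def reload_module_commands(module="nvidia"):
    """Return the two commands that unload and then reload a kernel module.

    The default module name "nvidia" is an assumption; the thread does not
    say which module was reloaded.
    """
    return [["modprobe", "-r", module], ["modprobe", module]]


def reload_module(module="nvidia", dry_run=True):
    """Print the commands (dry run) or execute them in order."""
    for cmd in reload_module_commands(module):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first failing command,
            # e.g. when the module is still in use.
            subprocess.run(cmd, check=True)


# Dry run: only shows what would be executed.
reload_module()
```

Keeping the dry run as the default makes it harder to unload a driver by accident on a machine that is doing other work.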