Message boards :
Graphics cards (GPUs) :
Work units failing in 64-bit Linux
Message board moderation
| Author | Message |
|---|---|
|
Send message Joined: 20 Nov 08 Posts: 3 Credit: 2,670,125 RAC: 0 Level ![]() Scientific publications ![]() ![]() ![]()
|
Work units are failing on 64-bit linux. This is a research unit that has proven itself to be very stable, but is idle at the moment. Thought I'd try to take advantage of the idle time. See, for example, http://www.gpugrid.net/result.php?resultid=1291252 The driver version is cudadriver_2.3_linux_64_190.16 Info about the setup: CUDA Device Query (Runtime API) version (CUDART static linking) There are 4 devices supporting CUDA Device 0: "Tesla C1060" CUDA Driver Version: 2.30 CUDA Runtime Version: 2.30 CUDA Capability Major revision number: 1 CUDA Capability Minor revision number: 3 Total amount of global memory: 4294705152 bytes Number of multiprocessors: 30 Number of cores: 240 Total amount of constant memory: 65536 bytes Total amount of shared memory per block: 16384 bytes Total number of registers available per block: 16384 Warp size: 32 Maximum number of threads per block: 512 Maximum sizes of each dimension of a block: 512 x 512 x 64 Maximum sizes of each dimension of a grid: 65535 x 65535 x 1 Maximum memory pitch: 262144 bytes Texture alignment: 256 bytes Clock rate: 1.30 GHz Concurrent copy and execution: Yes Run time limit on kernels: No Integrated: No Support host page-locked memory mapping: Yes Compute mode: Default (multiple host threads can use this device simultaneously) Device 1: "Tesla C1060" CUDA Driver Version: 2.30 CUDA Runtime Version: 2.30 CUDA Capability Major revision number: 1 CUDA Capability Minor revision number: 3 Total amount of global memory: 4294705152 bytes Number of multiprocessors: 30 Number of cores: 240 Total amount of constant memory: 65536 bytes Total amount of shared memory per block: 16384 bytes Total number of registers available per block: 16384 Warp size: 32 Maximum number of threads per block: 512 Maximum sizes of each dimension of a block: 512 x 512 x 64 Maximum sizes of each dimension of a grid: 65535 x 65535 x 1 Maximum memory pitch: 262144 bytes Texture alignment: 256 bytes Clock rate: 1.30 GHz Concurrent copy and execution: Yes Run time limit on kernels: No Integrated: No Support host page-locked memory mapping: Yes Compute mode: Default (multiple host threads can use this device simultaneously) Device 2: "Tesla C1060" CUDA Driver Version: 2.30 CUDA Runtime Version: 2.30 CUDA Capability Major revision number: 1 CUDA Capability Minor revision number: 3 Total amount of global memory: 4294705152 bytes Number of multiprocessors: 30 Number of cores: 240 Total amount of constant memory: 65536 bytes Total amount of shared memory per block: 16384 bytes Total number of registers available per block: 16384 Warp size: 32 Maximum number of threads per block: 512 Maximum sizes of each dimension of a block: 512 x 512 x 64 Maximum sizes of each dimension of a grid: 65535 x 65535 x 1 Maximum memory pitch: 262144 bytes Texture alignment: 256 bytes Clock rate: 1.30 GHz Concurrent copy and execution: Yes Run time limit on kernels: No Integrated: No Support host page-locked memory mapping: Yes Compute mode: Default (multiple host threads can use this device simultaneously) Device 3: "Tesla C1060" CUDA Driver Version: 2.30 CUDA Runtime Version: 2.30 CUDA Capability Major revision number: 1 CUDA Capability Minor revision number: 3 Total amount of global memory: 4294705152 bytes Number of multiprocessors: 30 Number of cores: 240 Total amount of constant memory: 65536 bytes Total amount of shared memory per block: 16384 bytes Total number of registers available per block: 16384 Warp size: 32 Maximum number of threads per block: 512 Maximum sizes of each dimension of a block: 512 x 512 x 64 Maximum sizes of each dimension of a grid: 65535 x 65535 x 1 Maximum memory pitch: 262144 bytes Texture alignment: 256 bytes Clock rate: 1.30 GHz Concurrent copy and execution: Yes Run time limit on kernels: No Integrated: No Support host page-locked memory mapping: Yes Compute mode: Default (multiple host threads can use this device simultaneously) Test PASSED |
GDFSend message Joined: 14 Mar 07 Posts: 1958 Credit: 629,356 RAC: 0 Level ![]() Scientific publications ![]() ![]() ![]() ![]() ![]() |
Do you always fail or just sometime? The error type that you get is usually given by too hot GPUs. Being your GPUs a tesla I am quite surprised. gdf |
|
Send message Joined: 20 Nov 08 Posts: 3 Credit: 2,670,125 RAC: 0 Level ![]() Scientific publications ![]() ![]() ![]()
|
The first eight work units failed, so I suspended it at that point. It's definitely not overheating. It's housed in 1u rack unit in a frigid, air conditioned room. I wonder whether it has something to do with # Total amount of global memory: -262144 bytes |
|
Send message Joined: 20 Nov 08 Posts: 3 Credit: 2,670,125 RAC: 0 Level ![]() Scientific publications ![]() ![]() ![]()
|
Found the problem. The Einstein@Home beta app was the culprit. I had left something around that GPUGrid didn't like. Unloading and reloading the kernel module fixed the problem. |
©2025 Universitat Pompeu Fabra