Failed again
| Author | Message |
|---|---|
|
Joined: 27 Dec 08 Posts: 4 Credit: 6,055,773 RAC: 0
|
<core_client_version>6.6.38</core_client_version> <![CDATA[ <message> Unzulässige Funktion. (0x1) - exit code 1 (0x1) </message> <stderr_txt> # Using CUDA device 0 # There is 1 device supporting CUDA # Device 0: "GeForce 9800 GT" # Clock rate: 1.65 GHz # Total amount of global memory: 1073741824 bytes # Number of multiprocessors: 14 # Number of cores: 112 MDIO ERROR: cannot open file "restart.coor" # Using CUDA device 0 # There is 1 device supporting CUDA # Device 0: "GeForce 9800 GT" # Clock rate: 1.65 GHz # Total amount of global memory: 1073741824 bytes # Number of multiprocessors: 14 # Number of cores: 112 Cuda error: Kernel [pme_fill_charges_accumulate] failed in file 'fillcharges.cu' in line 73 : unknown error. </stderr_txt> ]]> |
|
Joined: 27 Dec 08 Posts: 4 Credit: 6,055,773 RAC: 0
|
Another failure: <core_client_version>6.6.38</core_client_version> <![CDATA[ <message> Unzulässige Funktion. (0x1) - exit code 1 (0x1) </message> <stderr_txt> # Using CUDA device 0 # There is 1 device supporting CUDA # Device 0: "GeForce 9800 GT" # Clock rate: 1.65 GHz # Total amount of global memory: 1073741824 bytes # Number of multiprocessors: 14 # Number of cores: 112 MDIO ERROR: cannot open file "restart.coor" # Using CUDA device 0 # There is 1 device supporting CUDA # Device 0: "GeForce 9800 GT" # Clock rate: 1.65 GHz # Total amount of global memory: 1073741824 bytes # Number of multiprocessors: 14 # Number of cores: 112 Cuda error: Kernel [pme_fill_charges_accumulate] failed in file 'fillcharges.cu' in line 73 : unknown error. </stderr_txt> ]]> |
Damaraland Joined: 7 Nov 09 Posts: 152 Credit: 16,181,924 RAC: 0
|
I keep having this error on every unit. What should I do? <core_client_version>6.10.18</core_client_version> <![CDATA[ <message> Función incorrecta. (0x1) - exit code 1 (0x1) </message> <stderr_txt> # Using CUDA device 0 # There are 2 devices supporting CUDA # Device 0: "GeForce GTX 260" # Clock rate: 1.41 GHz # Total amount of global memory: 939196416 bytes # Number of multiprocessors: 27 # Number of cores: 216 # Device 1: "GeForce 9800 GT" # Clock rate: 1.37 GHz # Total amount of global memory: 1073545216 bytes # Number of multiprocessors: 14 # Number of cores: 112 MDIO ERROR: cannot open file "restart.coor" </stderr_txt> ]]> <core_client_version>6.10.18</core_client_version> <![CDATA[ <message> Función incorrecta. (0x1) - exit code 1 (0x1) </message> <stderr_txt> # Using CUDA device 0 # There are 2 devices supporting CUDA # Device 0: "GeForce GTX 260" # Clock rate: 1.41 GHz # Total amount of global memory: 939196416 bytes # Number of multiprocessors: 27 # Number of cores: 216 # Device 1: "GeForce 9800 GT" # Clock rate: 1.37 GHz # Total amount of global memory: 1073545216 bytes # Number of multiprocessors: 14 # Number of cores: 112 MDIO ERROR: cannot open file "restart.coor" Cuda error: Kernel [shake_step_1] failed in file 'shake.cu' in line 79 : unspecified launch failure. </stderr_txt> ]]> |
Damaraland Joined: 7 Nov 09 Posts: 152 Credit: 16,181,924 RAC: 0
|
More and more... task 1629422, task 1629408, task 1629383, task 1629337, task 1629322. Tonight I'm suspending the project until further news. If the admins want me to report any further information, I'll gladly help. |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
You should do this: install and run GPU-Z. It is freeware and will allow you to see the temperatures of the GPUs. If they are over 70 degrees when running GPUGrid tasks, you may have a heating / ventilation problem. You can test this by leaving the case door off for a while, running more tasks and checking the temperatures. If heat definitely is a problem, either get a couple of extra system fans or manually turn up the fan speed on the card(s). I would highly recommend that you remove the 9800GT. These cards use a G92 core and do not handle most of today’s tasks too well, especially hERG tasks! This card is most likely causing ALL your failures. If you leave it in, the card will eventually produce so many failures that you will get no new work. Your GTX260 will be doing at least 3 times the work of the 9800GT. What are the GPU temperatures like with and without the 9800GT installed, both while running tasks and while idle? What is your PSU? |
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 351
|
More and more... When your card fails task after task in 10 seconds or less, it's likely that its internal state has become corrupted. Do a complete power cycle - power down the host and restart: that should allow most of those tasks (the GIANNI-BIND and the KASHIF-HIVPR, at least) to run properly - even on the 9800GT that SKGiven despises so much :-) |
Michael Goetz Joined: 2 Mar 09 Posts: 124 Credit: 124,873,744 RAC: 0
|
Install and run GPU-Z. It is freeware and will allow you to see the temperatures of the GPUs. If they are over 70 degrees when running GPUGrid tasks, you may have a heating / ventilation problem. That's good advice for the CPU. The GPU is a different beast altogether. Newer GPUs appear to be designed to run hotter. A *LOT* hotter. The latest generation of Nvidia GPUs (260, 280, 285, 295) is actually designed to run safely up to -- get this -- just over 100 degrees Celsius. In fact, when the fan is left on automatic, it won't even start to ramp up the fan speed until the temperature exceeds 70. Even then, it only slightly increases the fan speed and is clearly not trying to keep temps in the 70s. Normal operating temperature under a heavy load seems to be around 80 to 85 degrees -- and that's with the fan still below 60% on automatic control. So don't panic if you see GPU temps in the 70s or 80s. That's normal. On the other hand, if your *CPU* is running that hot, you're possibly just on the brink of having it fail, depending on the model. Want to find one of the largest known primes? Try PrimeGrid. Or help cure disease at WCG.
|
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
Install and run GPU-Z. It is freeware and will allow you to see the temperatures of the GPUs. If they are over 70 degrees when running GPUGrid tasks, you may have a heating / ventilation problem. OK, GPU-Z measures GPU temperatures. It is not CPU-Z! Although the cards can run hot, it is NOT good for them to stay at that temperature for extended periods of time, and it is certainly not normal! My GTX260-216sp sits at 66 degrees C when crunching GPUGrid, with the fan at 40%. That is my normal, and the card works 100% for ALL tasks. I would not like to hear the fan at 60% or run the GPU at 80 degrees C. Pull the 9800GT. You are testing to find the problem, so trying things is important! |
Damaraland Joined: 7 Nov 09 Posts: 152 Credit: 16,181,924 RAC: 0
|
I'm sure it wasn't the heat: because I suspected that, I ran it in a very cold room with the mainboard out of its case. I had the GT on the same board as the GTX card. I moved the GTX to a brand new board and everything has been fine since then... the wonderful world of computers... Now I'm having problems with the Linux drivers for my GTX, but that's another story that I'll handle with time and patience... |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
A pain, I’m sure. It ran for some time before it failed! I'm not sure a 9500 GT is up to the task any more. Well, certainly not that task. Was there a system restart or an update, or did you clear the cache, wipe free space...? Just on the off chance it is something other than Boinc/GPUGrid! I tried my 8800GTS again, but all tasks failed. 4 failed within 3 sec (no real problem) but the fifth failed after 13 h or so. It is running another task. If it fails I will take it back off GPUGrid, but I might get another GT240 instead. The one I have uses less electricity and has successfully finished all 11 tasks so far. |
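For anyone comparing several failed tasks like the ones pasted earlier in this thread, a small script can pull the useful bits out of each stderr dump: which cards were listed, which file could not be opened, and which CUDA kernel failed. This is only a rough sketch. The function name `summarize_stderr` and the regular expressions are made up for illustration and are based solely on the log excerpts quoted above, so other application versions may well print different text.

```python
import re

# Patterns inferred from the stderr excerpts quoted in this thread only;
# they are assumptions, not an official description of the log format.
DEVICE_RE = re.compile(r'# Device (\d+): "([^"]+)"')
MDIO_RE = re.compile(r'MDIO ERROR: cannot open file "([^"]+)"')
CUDA_ERR_RE = re.compile(
    r"Cuda error: Kernel \[(\w+)\] failed in file '([^']+)' "
    r"in line (\d+) : ([^.\n]+)"
)


def summarize_stderr(text):
    """Collect devices, unopened files and CUDA kernel failures from a log."""
    devices = {}
    for index, name in DEVICE_RE.findall(text):
        # The device banner is printed again after a restart, so the same
        # index may appear more than once; keep a single entry per index.
        devices[int(index)] = name
    failures = [
        {"kernel": kernel, "file": source, "line": int(line), "reason": reason.strip()}
        for kernel, source, line, reason in CUDA_ERR_RE.findall(text)
    ]
    return {
        "devices": devices,
        "missing_files": sorted(set(MDIO_RE.findall(text))),
        "cuda_failures": failures,
    }


# Shortened copy of one of the logs posted above.
sample = (
    '# Using CUDA device 0 # There are 2 devices supporting CUDA '
    '# Device 0: "GeForce GTX 260" # Device 1: "GeForce 9800 GT" '
    'MDIO ERROR: cannot open file "restart.coor" '
    "Cuda error: Kernel [shake_step_1] failed in file 'shake.cu' "
    "in line 79 : unspecified launch failure. </stderr_txt>"
)
summary = summarize_stderr(sample)
print(summary["devices"])
print(summary["cuda_failures"])
```

Running this over every failed task makes it easier to see whether the same kernel keeps failing or whether the failures move around, which is the kind of detail worth including when reporting back here.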
©2025 Universitat Pompeu Fabra