Gigabyte GTS 450 OC2 crashes
| Author | Message |
|---|---|
|
Joined: 22 Jul 11 Posts: 166 Credit: 138,629,987 RAC: 0
|
Can anyone help me? Every day my CUDA app crashes on the GTS 450 1 GB I have installed. I am searching for an updated, correct BIOS for this card. Can you help? Thanks. Edit: I lowered the chip frequency; I hope that will help. Guessing and waiting... |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
Several of the tasks that failed on your system also failed on other systems. This suggests the problem is not specific to your system. Example: a workunit flagged "Too many errors (may have bug)". |
|
Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 |
Some clients corrupt files on upload, and subsequent workunits fail as a consequence. Luckily they fail immediately. We are investigating. It might be related to new BOINC client versions. |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
Yes, looks like it; the 6.12.x BOINC clients are not doing so well. The picture is somewhat obscured by the odd CC1.1 card (to be ignored). Then there are the mixed GPU-series systems (GTX 200 + GTX 400 or GTX 500); the GTX 200 cards are failing some tasks. |
|
Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 |
If you see "ERROR: Unable to read bincoordfile", the problem is not in your card; it was caused by the host that computed the previous step. |
|
Joined: 24 Dec 08 Posts: 738 Credit: 200,909,904 RAC: 0 |
It seems his host running XP is also running BOINC 6.13.1, which is unstable to say the least. I would suggest he go back to 6.12.34 (the last official release) until the 6.13 series becomes stable. Even I haven't touched 6.13 yet, and normally I run the latest and greatest. |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
On this example task, all but one of the 6 failures were on BOINC 6.12.x. The exception was a Linux system with a GeForce 9800 GT (using the 6.14 app) and BOINC 6.10.58 installed. I have only had 2 GPUGrid failures on 6.13.1 (but I am definitely not recommending that anyone use it). One came after 5 sec: MDIO: read error for file "input.coor", byte number 0: expected to read number of atoms; ERROR: Unable to read bincoordfile. This task failed on 6.12.x and 6.13.1 BOINC versions, with the exception of a Linux system that fails tasks regularly. The other was just a "No heartbeat from core client for 30 sec - exiting" error, caused by another application. |
|
Joined: 16 Mar 11 Posts: 509 Credit: 179,005,236 RAC: 0
|
Quote: "If you see ..." Here's an example of that from my errored tasks list. The first host to receive it runs BOINC 6.10.18 and it failed with the bincoordfile error. The second iteration went to a BOINC 6.10.58 host and failed with the bincoordfile error. The third host runs 6.12.34 and failed with the bincoordfile error. The fourth host is 6.12.33, bincoordfile error. The fifth host is 6.10.58, bincoordfile error. The sixth host is 6.12.33, bincoordfile error. So it's the unreadable bincoordfile, not the BOINC version. I've been running 6.12.33 for a few months and have had very few errors; most of my errored tasks failed for all 6 iterations. This task from my errored tasks list went first to a Linux host on 6.10.58, then to my Linux host with 6.12.33, then to a Windows host with 6.12.33, which finished the task error free. No bincoordfile error on that WU, so I can't blame it on 6.12 versions; it seems Linux or app version 6.14 was to blame. |
Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0
|
Quote: "If you see ..." As far as I understand how this project works, the workunit you refer to is the same step issued to different hosts. All of them failed because the previous step of this thread was processed by a host with a new-version BOINC client (that host corrupted the result on upload), but that host is not on this list. Quote: "This task from my errored tasks list went first to a Linux host on 6.10.58, then to my Linux host with 6.12.33 then to a Windows host with 6.12.33 which finished the task error free. No bincoordfile error on that WU, can't blame it on 6.12 versions, seems Linux or app version 6.14 was to blame." This task failed on the first and second hosts for different reasons, neither of which is the "Unable to read bincoordfile" error. |
|
Joined: 16 Mar 11 Posts: 509 Credit: 179,005,236 RAC: 0
|
Quote: "As far as I understand how this project works, the workunit you make reference to is the same step issued to different hosts. All failed, because the previous step of this thread was processed by a new version BOINC client (this host corrupted the result on upload), but this host is not on this list." Ah yes, I see now. Even though my 6.12 host isn't crashing tasks, the results it uploads may be corrupt, which will cause the next step to crash on 6 hosts in a row. Well, it's easy enough to roll back to 6.10.58. Can the validator be tuned to reject the corrupted upload, thus causing a resend of the task? |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
Did this problem arise after the move to a 5-day return without the early (3-day) resend? Did the early rescheduling mechanism avoid this issue? |
|
Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 |
I did some statistics and possibly traced the problem to (unstable) BOINC 6.13.3 and 6.13.4. See this thread. |
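
The reasoning used in this thread (if every host fails the same step with "Unable to read bincoordfile", the input from the previous step is to blame, not the host's card or BOINC version) can be sketched roughly as below. This is only an illustration: the function names, the tuple layout and the "other error" placeholder text are made up for the example and are not part of BOINC or the GPUGrid server. The version lists are the ones quoted in the posts above.

```python
from collections import Counter

# Error text quoted in this thread.
BINCOORD_ERROR = "Unable to read bincoordfile"


def classify_workunit(results):
    """Classify one workunit from its per-host results.

    `results` is a list of (boinc_version, error_text, succeeded) tuples.
    This layout is hypothetical, chosen only for the example.
    """
    failures = [r for r in results if not r[2]]
    if not failures:
        return "ok"
    with_bincoord = [r for r in failures if BINCOORD_ERROR in r[1]]
    if len(with_bincoord) == len(failures):
        # Every failing host could not read the input file, so the
        # previous step's upload is the likely culprit.
        return "inherited"
    if not with_bincoord:
        # Hosts failed for their own reasons.
        return "host-specific"
    return "mixed"


def failures_by_version(results):
    """Count failed results per BOINC client version."""
    return Counter(version for version, _, ok in results if not ok)


# First workunit described above: six hosts, all with the bincoordfile error.
inherited_wu = [
    (v, "ERROR: " + BINCOORD_ERROR, False)
    for v in ["6.10.18", "6.10.58", "6.12.34", "6.12.33", "6.10.58", "6.12.33"]
]

# Second workunit: two failures for other reasons, then a success.
# "other error" is a placeholder; the real messages are not quoted in the thread.
host_wu = [
    ("6.10.58", "other error", False),
    ("6.12.33", "other error", False),
    ("6.12.33", "", True),
]

print(classify_workunit(inherited_wu), dict(failures_by_version(inherited_wu)))
print(classify_workunit(host_wu), dict(failures_by_version(host_wu)))
```

Note that the per-version tally only describes the hosts that received the bad input. As pointed out above, the host that corrupted the upload is not on that list, so the tally alone cannot clear or convict a BOINC version.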
©2025 Universitat Pompeu Fabra