Message boards : Graphics cards (GPUs) : Gigabyte GTS 450 OC2 crashes

Rantanplan
Message 22178 - Posted: 28 Sep 2011 | 10:20:09 UTC
Last modified: 28 Sep 2011 | 10:35:55 UTC

Can anyone help me? Every day my CUDA app crashes on the GTS 450 1 GB card I have installed.

I am searching for an updated, correct BIOS for this card.

Can you help?

Thanks.

Edit:

I have lowered the chip frequency; I hope that will help.

Guessing and waiting...

skgiven
Volunteer moderator
Volunteer tester
Message 22182 - Posted: 28 Sep 2011 | 17:54:04 UTC - in response to Message 22178.

Several of the tasks that failed on your system also failed on other systems.
This suggests the problem is not specific to your system.
Too many errors (may have bug) - Example

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Message 22183 - Posted: 28 Sep 2011 | 20:47:05 UTC - in response to Message 22182.

Some clients corrupt files on upload, and subsequent workunits fail as a consequence. Luckily they fail immediately. We are investigating. It might be related to new BOINC client versions.

skgiven
Volunteer moderator
Volunteer tester
Message 22184 - Posted: 28 Sep 2011 | 23:05:26 UTC - in response to Message 22183.

Yes, it looks like it; the 6.12.x BOINC clients are not doing so well. The picture is somewhat obscured by the odd CC1.1 card (to be ignored). Then there are the mixed GPU-series systems (GTX 200 + GTX 400 or GTX 500); the GTX 200 cards are failing some tasks.

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Message 22185 - Posted: 29 Sep 2011 | 8:56:36 UTC - in response to Message 22184.
Last modified: 29 Sep 2011 | 9:12:26 UTC

If you see
"ERROR: Unable to read bincoordfile"
the problem is not in your card; it was caused by the host that computed the previous step.

MarkJ
Volunteer moderator
Volunteer tester
Message 22186 - Posted: 29 Sep 2011 | 12:00:17 UTC

It seems his host running XP is also running BOINC 6.13.1, which is unstable to say the least. I would suggest he go back to 6.12.34 (the last official release) until the 6.13 series becomes stable. Even I haven't touched 6.13 yet, and normally I run the latest and greatest.

skgiven
Volunteer moderator
Volunteer tester
Message 22187 - Posted: 29 Sep 2011 | 15:34:44 UTC - in response to Message 22186.

On this example task, all but one of the 6 failures were on BOINC 6.12.x. The exception was a Linux system with a GeForce 9800 GT (using the 6.14 app) and BOINC 6.10.58 installed.

I have had only 2 GPUGrid failures on 6.13.1 (though I am definitely not recommending that anyone use it).
One occurred after 5 seconds:
MDIO: read error for file "input.coor", byte number 0: expected to read number of atoms
ERROR: Unable to read bincoordfile
This task failed on 6.12.x and 6.13.1 Boinc versions, with the exception of a Linux system that fails tasks regularly.

The other was just a "No heartbeat from core client for 30 sec - exiting" error, caused by another application.
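As a rough illustration of how the two error messages quoted in this post could be told apart automatically, here is a minimal Python sketch. The function name and the labels it returns are hypothetical and are not part of BOINC or the GPUGrid application; only the two message strings come from this thread.

```python
def classify_failure(stderr_text: str) -> str:
    """Return a coarse label for a failed task based on its stderr text.

    The labels are made up for this example.
    """
    if "Unable to read bincoordfile" in stderr_text:
        # Per the thread: the input was damaged by the host that ran the previous step.
        return "corrupt-input-from-previous-step"
    if "No heartbeat from core client" in stderr_text:
        # The science app lost contact with the BOINC client on this host.
        return "lost-contact-with-client"
    return "unknown"


sample = (
    'MDIO: read error for file "input.coor", byte number 0: '
    "expected to read number of atoms\n"
    "ERROR: Unable to read bincoordfile\n"
)
print(classify_failure(sample))
```

A check like this only sorts failures into buckets; it says nothing about which host produced the damaged file.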

Dagorath
Message 22188 - Posted: 30 Sep 2011 | 0:16:48 UTC - in response to Message 22185.

Toni wrote:
> If you see
> "ERROR: Unable to read bincoordfile"
> it's not a problem in your card, but it was caused by the host which computed the previous step.


Here's an example of that from my errored tasks list.

The first host to receive it runs BOINC 6.10.18, and it failed with the bincoordfile error.
The second iteration went to a BOINC 6.10.58 host and failed with the bincoordfile error.
The third host runs 6.12.34 and failed with the bincoordfile error.
The fourth host is 6.12.33, bincoordfile error.
The fifth host is 6.10.58, bincoordfile error.
The sixth host is 6.12.33, bincoordfile error.

So it's the unreadable bincoordfile, not the BOINC version. I've been running 6.12.33 for a few months and have had very few errors; most of my errored tasks failed for all 6 iterations.

This task from my errored tasks list went first to a Linux host on 6.10.58, then to my Linux host with 6.12.33, then to a Windows host with 6.12.33, which finished the task error-free. There was no bincoordfile error on that WU, so it can't be blamed on the 6.12 versions; it seems Linux or app version 6.14 was to blame.

Retvari Zoltan
Message 22189 - Posted: 30 Sep 2011 | 7:08:41 UTC - in response to Message 22188.

Toni wrote:
> If you see
> "ERROR: Unable to read bincoordfile"
> it's not a problem in your card, but it was caused by the host which computed the previous step.

Toni wrote:
> Some clients corrupt files on upload, and subsequent workunits fail as a consequence. Luckily they do so immediately. We are investigating. It might be related to new boinc client versions.

Dagorath wrote:
> Here's an example of that from my errored tasks list.
>
> The first host to receive it runs BOINC 6.10.18 and it failed with the bincoordfile error.
> The second iteration went to a BOINC 6.10.58 host and failed with bincoordfile error.
> Third host runs 6.12.34, failed with bincoordfile error.
> Fourth host is 6.12.33, bincoordfile error.
> Fifth host is 6.10.58, bincoordfile error.
> Sixth host is 6.12.33, bincoordfile error.
>
> So it's the unreadable bincoordfile, not the BOINC version. I've been running 6.12.33 for a few months and have had very few errors, most of my errored tasks failed for all 6 iterations.

As far as I understand how this project works, the workunit you refer to is the same step issued to different hosts. All of them failed because the previous step of this thread was processed by a new-version BOINC client (that host corrupted the result on upload), but that host is not on this list.

Dagorath wrote:
> This task from my errored tasks list went first to a Linux host on 6.10.58, then to my Linux host with 6.12.33 then to a Windows host with 6.12.33 which finished the task error free. No bincoordfile error on that WU, can't blame it on 6.12 versions, seems Linux or app version 6.14 was to blame.

This task failed on the first and the second host for different reasons, neither of which is the "Unable to read bincoordfile" error.
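The reasoning about the six-host workunit can be sketched in a few lines of Python: when every attempt at the same step fails with the bincoordfile error, on hosts running different BOINC versions, the shared input is the likelier culprit than any one of those hosts. The function name and the rule itself are hypothetical simplifications for illustration; the version numbers are the six listed in the quoted post.

```python
BINCOORD_ERROR = "Unable to read bincoordfile"

# (BOINC version, error text) for the six attempts listed in the quoted post.
attempts = [
    ("6.10.18", BINCOORD_ERROR),
    ("6.10.58", BINCOORD_ERROR),
    ("6.12.34", BINCOORD_ERROR),
    ("6.12.33", BINCOORD_ERROR),
    ("6.10.58", BINCOORD_ERROR),
    ("6.12.33", BINCOORD_ERROR),
]


def input_probably_corrupt(attempts):
    """Hypothetical heuristic: True when every attempt failed with the
    bincoordfile error and the failing hosts ran more than one BOINC version."""
    errors = {error for _, error in attempts}
    versions = {version for version, _ in attempts}
    return errors == {BINCOORD_ERROR} and len(versions) > 1


print(input_probably_corrupt(attempts))
```

The heuristic points at the input file, not at the host that wrote it; finding that host means looking up who returned the previous step.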

Dagorath
Message 22190 - Posted: 30 Sep 2011 | 7:45:59 UTC - in response to Message 22189.

Retvari Zoltan wrote:
> As far as I understand how this project works, the workunit you make reference to is the same step issued to different hosts. All failed, because the previous step of this thread was processed by a new version BOINC client (this host corrupted the result on upload), but this host is not on this list.


Ah yes, I see now. Even though my 6.12 host isn't crashing tasks, the results it uploads may be corrupt, which will cause the next step to crash on 6 hosts in a row. Well, it's easy enough to roll back to 6.10.58.

Can the validator be tuned to reject the corrupted upload thus causing a resend of the task?
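To make the question concrete, here is a minimal Python sketch of the kind of sanity check a validator could run on an uploaded coordinate file. It assumes a NAMD-style binary layout (a 4-byte little-endian integer atom count followed by three 8-byte floats per atom). That layout is an assumption, since the thread does not state the actual file format, and this is not the project's validator code. The function name is hypothetical.

```python
import os
import struct


def looks_like_valid_bincoor(path: str) -> bool:
    """Cheap structural check, ASSUMING the layout is:
    int32 atom count, then 3 * natoms float64 values (little-endian)."""
    size = os.path.getsize(path)
    if size < 4:
        # Too short to even hold the atom count (e.g. an empty upload).
        return False
    with open(path, "rb") as f:
        (natoms,) = struct.unpack("<i", f.read(4))
    # The file size must match the atom count exactly.
    return natoms > 0 and size == 4 + natoms * 3 * 8
```

A check like this would catch empty or truncated uploads, which fits the "byte number 0: expected to read number of atoms" message quoted earlier in the thread, but it would not catch a file of the right size with wrong contents.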

skgiven
Volunteer moderator
Volunteer tester
Message 22191 - Posted: 30 Sep 2011 | 14:41:49 UTC - in response to Message 22190.

Did this problem arise after the move to a 5-day return without the early (3-day) resend? Did the early rescheduling mechanism avoid this issue?

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Message 22195 - Posted: 30 Sep 2011 | 22:30:51 UTC - in response to Message 22191.
Last modified: 30 Sep 2011 | 22:31:34 UTC

I did some statistics and possibly traced the problem to (unstable) BOINC 6.13.3 and 6.13.4. See this thread.
