Recent problems for WUs on older GPUs

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 9836 - Posted: 16 May 2009, 10:42:06 UTC - in response to Message 9830.  

We have managed to replicate the problem on one of our machines.
This should lead to a solution soon.

Be patient.

Oh, now we have to be patient too???? :)

It's good news, GDF ... thanks for the note.

Toby Broom
Joined: 11 Dec 08
Posts: 26
Credit: 648,944,294
RAC: 479
Message 9843 - Posted: 16 May 2009, 11:54:10 UTC

I worked out the numbers for my computers; they all run the 182.50 drivers.

ID: 30829 (8800GT 256MB) - 11% failure rate
ID: 33373 (9800GX2 512MB) - 46% failure rate
ID: 26481 (9800GX2 & GTX260) - 29% failure rate
ID: 34636 (9800GX2 & 8800GT) - 18% failure rate

It seems strange that the 8800GT is the most reliable card, given the issues. Host 26481 did have a problem that I know was my fault, so its rate is a little higher than expected.
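
For anyone who wants to work out the same kind of per-host figure, here is a minimal Python sketch. It assumes the failure rate is simply the number of errored tasks divided by all returned tasks; the function name and the counts in the usage line are made up for illustration and are not taken from the hosts above.

```python
def failure_rate(failed: int, total: int) -> float:
    """Return the share of failed tasks as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= failed <= total:
        raise ValueError("failed must be between 0 and total")
    return 100.0 * failed / total


# Hypothetical counts, for illustration only (not real host data)
print(f"{failure_rate(3, 12):.0f}% failure rate")
```

Counting only tasks that actually errored, and leaving out aborted or still-running ones, is one possible choice; a different definition would give different percentages.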

uBronan
Joined: 1 Feb 09
Posts: 139
Credit: 575,023
RAC: 0
Message 9844 - Posted: 16 May 2009, 11:55:45 UTC
Last modified: 16 May 2009, 11:57:23 UTC

Hmm, I am not convinced it's just the drivers. I started under Win XP with the 182.50 driver and BOINC 6.6.28, but again I see the IBUCH unit hanging at 64.688% for more than an hour after 13 hours of calculation.
So I'm starting to believe this one is going to crash as well.

Matteo
Joined: 30 Mar 09
Posts: 1
Credit: 176,953
RAC: 0
Message 9849 - Posted: 16 May 2009, 12:25:17 UTC

My card is a 9800GTX with the 185.82 driver and BOINC 6.6.20.
I don't want to downgrade the drivers, so in the meantime I have suspended all WUs for GPUGRID.

I hope to see good news ASAP.

Sorry for my bad English...

Greetings, Matteo

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 9855 - Posted: 16 May 2009, 13:18:55 UTC - in response to Message 9844.  

Hmm, I am not convinced it's just the drivers

GDF said the problems appear with 185.xx and don't show up with some 180.xx, which apparently no one else is still using. This does not mean that 182.xx is fine, and I think the usual "KASHIF_HIVPR" and "IBUCH_KID" problems definitely affect 182.50.
It seems to be a problem with the driver, triggered by some new WUs.

MrS
Scanning for our furry friends since Jan 2002

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 9858 - Posted: 16 May 2009, 14:11:17 UTC - in response to Message 9855.  

Hmm, I am not convinced it's just the drivers

GDF said the problems appear with 185.xx and don't show up with some 180.xx, which apparently no one else is still using. This does not mean that 182.xx is fine, and I think the usual "KASHIF_HIVPR" and "IBUCH_KID" problems definitely affect 182.50.
It seems to be a problem with the driver, triggered by some new WUs.

Well, I have some of these named tasks running on my 9800GT and the GTX295s ... but they don't seem to want to run on the new GTX260 or my GTX280 ... As far as I know, at the moment I am running 182.50 everywhere ...

I suppose I could roll back to 180.xx to see whether I can get a task and whether it dies ... heck, nothing else seems to make a dent in this problem.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 9860 - Posted: 16 May 2009, 14:20:20 UTC - in response to Message 9858.  

Sorry, that was not a very specific post. Not all WUs with those names are affected, e.g. see here. "KASHIF_HIVPR_mon" and "KASHIF_HIVPR_dim" have been fine for me.

MrS
Scanning for our furry friends since Jan 2002

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 9861 - Posted: 16 May 2009, 14:32:34 UTC - in response to Message 9860.  

Sorry, that was not a very specific post. Not all WUs with those names are affected, e.g. see here. "KASHIF_HIVPR_mon" and "KASHIF_HIVPR_dim" have been fine for me.

MrS

Well, I just rolled the driver back to 180.4 and still got an "invalid function" error. The tasks die immediately. Getting very depressed ... can't tell if it is my new systems or bad tasks ...

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 9863 - Posted: 16 May 2009, 14:58:51 UTC
Last modified: 16 May 2009, 15:01:12 UTC

This task, p1480000-RAUL_pYEpYI1605-0-10-RND5295_0, started up and I have 5:10 or so on the clock ... so, unlike all the rest, I finally got one running. It is running on the new MB, but the old GPU.

So, this batch of tasks is so bad that most of them won't run on anything ... though my GTX 295s seem to be rolling on ...

{edit}

I was wrong ... it is on one of the new GTX 260 cards ...

Zydor
Joined: 8 Feb 09
Posts: 252
Credit: 1,309,451
RAC: 0
Message 9867 - Posted: 16 May 2009, 17:29:21 UTC - in response to Message 9863.  
Last modified: 16 May 2009, 17:30:11 UTC

GIANNI_FBs have come in for some flak lately, so I thought I would post a successful one for comparison. The stops/starts in there were me, due to non-BOINC related stuff.

http://www.gpugrid.net/result.php?resultid=677172

Regards
Zy

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 9878 - Posted: 16 May 2009, 21:07:46 UTC

Glad your new rig made it through one WU successfully! The others don't look too good, though. They error on most other hosts as well, but 3 have been finished by other GT200 cards. One of them uses 185.85, but I can't see the others.

MrS
Scanning for our furry friends since Jan 2002

Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Message 9883 - Posted: 16 May 2009, 21:37:49 UTC - in response to Message 9867.  
Last modified: 16 May 2009, 21:39:14 UTC

GIANNI_FBs have come in for some flak lately, so I thought I would post a successful one for comparison. The stops/starts in there were me, due to non-BOINC related stuff.

http://www.gpugrid.net/result.php?resultid=677172

Regards
Zy

And here's a 205-GIANNI_FB that failed on the same machine after running a LONG time:

http://www.gpugrid.net/result.php?resultid=677771

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 9891 - Posted: 17 May 2009, 0:52:05 UTC - in response to Message 9878.  

Glad your new rig made it through one WU successfully! The others don't look too good, though. They error on most other hosts as well, but 3 have been finished by other GT200 cards. One of them uses 185.85, but I can't see the others.

MrS

I think I had TWO problems. One was that OC got turned on by mistake, and the automatic OC mode probably tried to do too much. What it broke is not entirely clear to me. It may also have been the BIOS ... I flashed that with the latest version and turned off the OC mode at the same time, so it is hard to know which it was.

The second problem was, of course, the bad tasks, which would have failed with the other error messages if I had not had problem one on both rigs.

Now I am running into power limits (again) ...

I really gotta call that electrician to change my old 3 phase 230 V UPS socket into a 30A 115 V supply...

[AF>Amis des Lapins]Gilloox
Joined: 21 Mar 08
Posts: 7
Credit: 24,394,688
RAC: 0
Message 9896 - Posted: 17 May 2009, 2:34:55 UTC

Similar for me:

http://www.gpugrid.net/result.php?resultid=678214
http://www.gpugrid.net/result.php?resultid=664263

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 9897 - Posted: 17 May 2009, 2:44:10 UTC - in response to Message 9896.  

Similar for me:

http://www.gpugrid.net/result.php?resultid=678214
http://www.gpugrid.net/result.php?resultid=664263

I don't understand ... you don't like valid tasks?

[AF>Amis des Lapins]Gilloox
Joined: 21 Mar 08
Posts: 7
Credit: 24,394,688
RAC: 0
Message 9898 - Posted: 17 May 2009, 3:06:09 UTC - in response to Message 9897.  

Hello,

oops
http://www.gpugrid.net/workunit.php?wuid=466073
http://www.gpugrid.net/workunit.php?wuid=458046

These gave me 5500 points for 17/24 hours of crunching (260GTX 216 SPU, O/C stable).

Mark Henderson
Joined: 21 Dec 08
Posts: 51
Credit: 26,320,167
RAC: 0
Message 9899 - Posted: 17 May 2009, 4:24:17 UTC
Last modified: 17 May 2009, 5:03:20 UTC

I may be lucky, but I am having very few problems: 185.85 drivers, XP64, 2 EVGA 260s, BOINC 6.6.28. I had 1 compute error yesterday, but that was my fault for suspending right as it started and unsuspending a couple of seconds later, plus a couple of others that everyone else in the quorum errored out on.
I have heard of hanging WUs but have never had one of those either.
99 percent of the time it runs great.
I always take great care to run Driver Sweeper in safe mode after uninstalling the Nvidia drivers and before updating. I do not know if this matters that much, though.
I also never let the GPU temps get over 65°C, with a moderate OC.
There are also always 4 CPU units of either SETI Astropulse, Einstein or ABC running alongside at the same time.
I had a 9800GT in this computer for about a month that ran well too. I replaced it with a 260 this week.

uBronan
Joined: 1 Feb 09
Posts: 139
Credit: 575,023
RAC: 0
Message 9900 - Posted: 17 May 2009, 9:59:36 UTC
Last modified: 17 May 2009, 10:03:20 UTC

Now I have been able to save a few hanging units.
This seems to work for me: first make sure to disable "keep units in memory" under options.

I pause all other available units, then I pause the unit whose progress is not moving, then let it continue. I know it costs a lot of time, because it jumps back to some earlier point.

Until now I have had 4 units that stayed at a certain % and did not move for more than half an hour, so I started messing with them.

When I woke up this morning I saw a 92-kashif_hivpr_dim unit reporting that it had done 0.700% in 7 hours, so I paused it. Of course it jumped back to 0.426% when it started over, but it then did 1.5% in half an hour.

So the reason seems to be that the units get stuck in the calculations and finally error out if this takes too long.

But I can tell you it's a pain-in-the-ass problem: when they hang you hardly notice, and we don't have time to watch all day whether the units are progressing or not.
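
Since nobody can sit and watch the progress bar, the check described above could in principle be automated. The following is only a rough Python sketch of the idea: it assumes you already have (time, percent done) samples for a task from somewhere, and it does not talk to BOINC at all. The name `is_stalled`, the 30-minute window and the 0.01% threshold are illustrative assumptions, not anything the project provides.

```python
from typing import List, Tuple

# (timestamp in seconds, progress in percent) samples, oldest first
Sample = Tuple[float, float]


def is_stalled(samples: List[Sample], window_s: float = 1800.0,
               min_gain: float = 0.01) -> bool:
    """Return True if progress gained less than min_gain percent
    over at least the last window_s seconds of samples."""
    if not samples:
        return False
    latest_t, latest_p = samples[-1]
    # Samples that are at least window_s older than the latest one
    old = [(t, p) for t, p in samples if latest_t - t >= window_s]
    if not old:
        return False  # not enough history to judge
    _, ref_p = old[-1]  # the most recent sample that is old enough
    return latest_p - ref_p < min_gain


# Hypothetical samples: stuck at 64.688% for half an hour
print(is_stalled([(0, 64.688), (900, 64.688), (1800, 64.688)]))
# Hypothetical samples: moving again after a pause/resume
print(is_stalled([(0, 0.426), (1800, 1.926)]))
```

Note that a unit which has just jumped back to an earlier checkpoint also shows no gain, so a check like this would flag it until it has made up the lost ground.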

[boinc.at] Nowi
Joined: 4 Sep 08
Posts: 44
Credit: 3,685,033
RAC: 0
Message 9901 - Posted: 17 May 2009, 10:27:29 UTC

Now I have the fourth error WU in a row. :-(((

http://www.gpugrid.net/result.php?resultid=678849
http://www.gpugrid.net/result.php?resultid=679319
http://www.gpugrid.net/result.php?resultid=680211
http://www.gpugrid.net/result.php?resultid=680860

It wastes a lot of GPU-time for scientific knowledge!
It costs a lot of credits...
It costs a lot of fun...

Is GPUGRID going to be used only with newer cards?

Attention! Sarcasm!
Is there a hidden deal with NVIDIA to push cards with G200...?


My System
Q9550 @ 3.4
8800 GT @ stock
4 GB
Windows 7 RC 64 Bit
185.85

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 9903 - Posted: 17 May 2009, 12:22:29 UTC - in response to Message 9901.  

Nowi,

take a look further up in this thread.

[AF>EDLS>BIOMED],

take a look here. I edited the title to make it more clear that this problem also affects previous versions.

Mark,

I also get few errors, but if I look at my tasks I see that these are "friendly" WUs, almost none of the trouble makers. This makes it harder to blame it on config differences..

uBronan,

I think what you're doing is in the end similar to a BOINC restart. It's good to know that this helps, but it's still *irritating* that it seems to happen so often. Which BOINC version do you run? The thing is, I'm running 6.5.0, 185.66 and Vista 64, and from looking at my results I think I did not have a single hanging WU. Every day 2 successful returns, except when errors occurred or with the one "kashif_hivpr_dim" that I had. It registered a runtime of 89839s = 24:57h and gave 10096 credits. The interval between the previous result and this one is 24:55h, so I don't think it was hanging at all.

Of course, just because I ran one of them alright does not mean the problem doesn't exist. I just can't see the pattern.. is it the 6.6.x clients? It's not all of the WUs, it's not all of the 185 drivers, it's not all of the G9x GPUs. What's left?
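
As a quick check of the runtime figures in the reply to uBronan above, here is a small Python sketch that converts the reported seconds to hours and minutes and works out the credit rate. The two numbers come from the post; the helper name is just for illustration.

```python
def to_hours_minutes(seconds: int) -> str:
    """Format a duration in seconds as h:mm, dropping leftover seconds."""
    hours, rest = divmod(seconds, 3600)
    minutes = rest // 60
    return f"{hours}:{minutes:02d}"


runtime_s = 89839   # reported runtime from the post
credits = 10096     # credits granted, from the post

print(to_hours_minutes(runtime_s))            # 24:57
print(round(credits / (runtime_s / 3600)))    # credits per hour, about 405
```

A registered runtime that is close to the wall-clock gap between two returned results is the sign used above to argue the unit was not hanging.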

Paul,

I really gotta call that electrician to change my old 3 phase 230 V UPS socket into a 30A 115 V supply...

Do you think that's a good idea? I don't know your 230V, but at 115V power supplies lose efficiency compared to 230V. 30A @ 115V is about 3.5kW, quite massive :D
I know we can draw at least 2kW over the regular 230V, whereas I heard the US grid may deliver something around 1.5kW at 110V. Our 3 phase plugs are 380V and I think you can get 5 - 6 kW from them.. but you're not talking about these, right?
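
The single-phase figure above is just volts times amps. A minimal Python sketch, assuming a purely resistive load (power factor 1) and ignoring any derating a local electrical code may require for continuous loads:

```python
def power_kw(volts: float, amps: float) -> float:
    """Power of a single-phase circuit in kW, assuming P = V * I."""
    return volts * amps / 1000.0


print(power_kw(115, 30))   # 3.45
```

So a 30A circuit at 115V works out to 3.45kW on paper, which matches the rounded 3.5kW quoted above.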

MrS
Scanning for our furry friends since Jan 2002

©2025 Universitat Pompeu Fabra