Recent problems for WUs on older GPUs
| Author | Message |
|---|---|
Paul D. Buck · Joined: 9 Jun 08 · Posts: 1050 · Credit: 37,321,185 · RAC: 0
|
"We have managed to replicate the problem on one of our machines." Oh, now we have to be patient too???? :) It's good news, GDF ... thanks for the note. |
|
Joined: 11 Dec 08 · Posts: 26 · Credit: 648,944,294 · RAC: 479
|
I worked out the numbers on my computers; they all run 182.50 drivers. ID: 30829 (8800GT 256MB): 11% failure rate; ID: 33373 (9800GX2 512MB): 46% failure rate; ID: 26481 (9800GX2 & GTX260): 29% failure rate; ID: 34636 (9800GX2 & 8800GT): 18% failure rate. It seems strange that the 8800GT is the most reliable card, given the issues. Host 26481 did have an issue that I know was my fault, so its rate is a little higher than expected. |
|
Joined: 1 Feb 09 · Posts: 139 · Credit: 575,023 · RAC: 0
|
Hmm, I am not convinced it's just the drivers. I started under Win XP with the 182.50 driver and BOINC 6.6.28, but again I see the IBUCH unit hang at 64.688% for more than an hour after 13 hours of calculation. So I am starting to believe this one is going to crash as well. |
|
Joined: 30 Mar 09 · Posts: 1 · Credit: 176,953 · RAC: 0
|
My card is a 9800GTX with the 185.82 driver and BOINC 6.6.20. I don't want to downgrade drivers, so in the meantime I have suspended all WUs for GPUGRID. I hope to see good news asap. Sorry for my bad English... Greetings, Matteo |
|
Joined: 17 Aug 08 · Posts: 2705 · Credit: 1,311,122,549 · RAC: 0 |
"Hmm, I am not convinced it's just the drivers" GDF said the problems appear with 185.xx and don't show up with some 180.xx, which apparently no one else is still using. This does not mean that 182.xx is fine, and I think the usual "KASHIF_HIVPR" and "IBUCH_KID" problems definitely affect 182.50. It seems to be a problem with the driver, triggered by some new WUs. MrS Scanning for our furry friends since Jan 2002 |
Paul D. Buck · Joined: 9 Jun 08 · Posts: 1050 · Credit: 37,321,185 · RAC: 0
|
"Hmm, I am not convinced it's just the drivers" Well, I have some of these named tasks running on my 9800GT and the GTX295s ... but they don't seem to want to run on the new GTX260 or my GTX280 ... As far as I know, at the moment I am running 182.50 everywhere ... I suppose I could roll back to the 180.xx to see if I can get a task and if it dies ... heck, nothing else seems to be bothering this problem. |
|
Joined: 17 Aug 08 · Posts: 2705 · Credit: 1,311,122,549 · RAC: 0 |
Sorry, not a very specific post. Not all WUs with those names are affected, e.g. see here. "KASHIF_HIVPR_mon" and "KASHIF_HIVPR_dim" have been fine for me. MrS Scanning for our furry friends since Jan 2002 |
Paul D. Buck · Joined: 9 Jun 08 · Posts: 1050 · Credit: 37,321,185 · RAC: 0
|
"Sorry, not a very specific post. Not all WUs with those names are affected, e.g. see here. "KASHIF_HIVPR_mon" and "KASHIF_HIVPR_dim" have been fine for me." Well, I just rolled the driver back to 180.4 and still got an invalid function. The tasks die immediately. Getting very depressed ... can't tell if it is my new systems or bad tasks ... |
Paul D. Buck · Joined: 9 Jun 08 · Posts: 1050 · Credit: 37,321,185 · RAC: 0
|
This task, p1480000-RAUL_pYEpYI1605-0-10-RND5295_0, started up and I have 5:10 or so on the clock ... so, unlike all the rest, I finally got one running. It is running on the new MB, but the old GPU. So, this batch of tasks is so bad that most of them won't run on anything ... though my GTX 295s seem to be rolling on ... {edit} I was wrong ... it is on one of the new GTX 260 cards ... |
Zydor · Joined: 8 Feb 09 · Posts: 252 · Credit: 1,309,451 · RAC: 0
|
GIANNI_FBs have come in for some flak lately, so I thought I would post a successful one as a comparator. The stops/starts in there were me, due to non-BOINC related stuff. http://www.gpugrid.net/result.php?resultid=677172 Regards Zy |
|
Joined: 17 Aug 08 · Posts: 2705 · Credit: 1,311,122,549 · RAC: 0 |
Glad your new rig made it through one WU successfully! The others don't look too good, though. They error on most other hosts as well, but 3 have been finished by other GT200 cards. One of them uses 185.85, but I can't see the others. MrS Scanning for our furry friends since Jan 2002 |
Beyond · Joined: 23 Nov 08 · Posts: 1112 · Credit: 6,162,416,256 · RAC: 0
|
"GIANNI_FBs have come in for some flak lately, so I thought I would post a successful one as a comparator. The stops/starts in there were me, due to non-BOINC related stuff." And here's a 205-GIANNI_FB that failed on the same machine after running a LONG time: http://www.gpugrid.net/result.php?resultid=677771 |
Paul D. Buck · Joined: 9 Jun 08 · Posts: 1050 · Credit: 37,321,185 · RAC: 0
|
"Glad your new rig made it through one WU successfully! The others don't look too good, though. They error on most other hosts as well, but 3 have been finished by other GT200 cards. One of them uses 185.85, but I can't see the others." I think I had TWO problems. One was that OC got turned on by mistake, and the automatic OC mode probably tried to do too much. What it broke is not entirely clear to me. It may also have been the BIOS ... I flashed that with the latest version and turned off the OC mode at the same time, so it is hard to know which it was. The second problem was of course the bad tasks, which would have failed with the other error messages if I had not had problem one on both rigs. Now I am running into power limits (again) ... I really gotta call that electrician to change my old 3 phase 230 V UPS socket into a 30 A 115 V supply... |
[AF>Amis des Lapins]Gilloox · Joined: 21 Mar 08 · Posts: 7 · Credit: 24,394,688 · RAC: 0
|
Similar for me: http://www.gpugrid.net/result.php?resultid=678214 http://www.gpugrid.net/result.php?resultid=664263 |
Paul D. Buck · Joined: 9 Jun 08 · Posts: 1050 · Credit: 37,321,185 · RAC: 0
|
"Similar for me" I don't understand ... you don't like valid tasks? |
[AF>Amis des Lapins]Gilloox · Joined: 21 Mar 08 · Posts: 7 · Credit: 24,394,688 · RAC: 0
|
Hello, oops: http://www.gpugrid.net/workunit.php?wuid=466073 http://www.gpugrid.net/workunit.php?wuid=458046 give me 5500 points for 17/24 hours of crunching (260GTX 216 SPU, O/C stable) |
|
Joined: 21 Dec 08 · Posts: 51 · Credit: 26,320,167 · RAC: 0
|
I may be lucky, but I am having very few problems. 185.85 drivers, XP64, 2 EVGA 260s, BOINC 6.6.28. I had 1 compute error yesterday, but that was my fault for suspending right as it started and unsuspending a couple of seconds later, plus a couple of others that everyone else in the quorum errored out on. I have heard of hanging WUs but have never had one of those either; 99 percent of the time it runs great. I always take great care to run Driver Sweeper in safe mode after uninstalling Nvidia drivers and before updating. I do not know if this matters that much, though. I also never let the GPU temps get over 65C with a moderate OC. Also, 4 CPU units of either SETI Astropulse, Einstein or ABC are always running alongside at the same time. I had a 9800GT in this computer for about a month that ran well too. I replaced it with a 260 this week. |
|
Joined: 1 Feb 09 · Posts: 139 · Credit: 575,023 · RAC: 0
|
I have now been able to save a few hanging units. What seems to work for me: first make sure to disable "keep units in memory" under options. I pause all other available units, then I pause the unit whose progress is not moving, then resume it. I know it costs a lot of time, because it jumps back to some earlier point. Until now I had 4 units which stayed at a certain % and did not move for more than half an hour, so I started messing with them. When I woke up this morning I saw a 92-kashif_hivpr_dim unit reporting 0.700% done in 7 hours, so I paused it. Of course it jumped back to 0.426% when it started over, but it then did 1.5% in half an hour. So the reason seems to be that the units get stuck in the calculations and finally error out if this takes too long. But I can tell you it's a pain in the ass of a problem: when they hang you hardly notice, and we don't have time to watch all day whether the units progress or not. |
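Since hangs like the ones described above are easy to miss, here is a rough sketch of how a stall check could look. It assumes you already collect (time, progress %) samples for a task by some means of your own; that sampling is not shown and no BOINC API is assumed. The function name `is_stalled` and the 30-minute window (taken from the "half an hour" in the post) are hypothetical choices, not anything the project provides.

```python
from typing import Sequence, Tuple

# One sample = (time in seconds, reported progress in percent).
Sample = Tuple[float, float]


def is_stalled(samples: Sequence[Sample],
               window_s: float = 1800.0,
               min_gain: float = 0.001) -> bool:
    """Return True if progress rose by less than min_gain percent
    over the most recent window_s seconds.

    Samples must be in chronological order. The threshold values
    are assumptions for illustration only.
    """
    if not samples:
        return False
    latest_t, latest_p = samples[-1]
    # Without a full window of history we cannot judge yet.
    if latest_t - samples[0][0] < window_s:
        return False
    cutoff = latest_t - window_s
    # Use the most recent sample taken at or before the window start.
    baseline = samples[0][1]
    for t, p in samples:
        if t <= cutoff:
            baseline = p
        else:
            break
    # Note: a rollback to a checkpoint shows up as a negative gain and
    # is also flagged, so clear the samples after restarting a task.
    return (latest_p - baseline) < min_gain


# A unit sitting at 64.688% for an hour is flagged ...
hung = [(0, 64.688), (900, 64.688), (1800, 64.688), (3600, 64.688)]
# ... while one gaining 1.5% in half an hour is not.
moving = [(0, 0.426), (900, 1.2), (1800, 1.926)]
print(is_stalled(hung), is_stalled(moving))
```

A slow but moving unit (such as 0.700% in 7 hours) would not be caught with these defaults; raising `min_gain` trades that off against false alarms.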
|
Joined: 4 Sep 08 · Posts: 44 · Credit: 3,685,033 · RAC: 0
|
Now I have the fourth errored WU in a row. :-((( http://www.gpugrid.net/result.php?resultid=678849 http://www.gpugrid.net/result.php?resultid=679319 http://www.gpugrid.net/result.php?resultid=680211 http://www.gpugrid.net/result.php?resultid=680860 It wastes a lot of GPU time for scientific knowledge! It costs a lot of credits... It costs a lot of fun... Is GPUGRID going to be used only with newer cards? Attention! Sarcasm! Is there a hidden deal with NVIDIA to push cards with G200...? My system: Q9550 @ 3.4, 8800 GT @ stock, 4 GB, Windows 7 RC 64 Bit, 185.85 |
|
Joined: 17 Aug 08 · Posts: 2705 · Credit: 1,311,122,549 · RAC: 0 |
Nowi, take a look further up in this thread. [AF>EDLS>BIOMED], take a look here. I edited the title to make it clearer that this problem also affects previous versions. Mark, I also get few errors, but if I look at my tasks I see that these are "friendly" WUs, almost none of the troublemakers. This makes it harder to blame it on config differences. uBronan, I think what you're doing is in the end similar to a BOINC restart. It's good to know that this helps, but it's still *irritating* that it seems to happen so often. Which BOINC version do you run? The thing is, I'm running 6.5.0, 185.66 and Vista 64, and from looking at my results I think I did not have a single hanging WU. Every day 2 successful returns, except when errors occurred or with the one "kashif_hivpr_dim" that I had. It registered a runtime of 89839 s = 24:57 h and gave 10096 credits. The interval between the previous result and this one is 24:55 h, so I don't think it was hanging at all. Of course, just because I ran one of them alright does not mean the problem doesn't exist. I just can't see the pattern. Is it the 6.6.x clients? It's not all of the WUs, it's not all of the 185 drivers, it's not all of the G9x GPUs. What's left? Paul, "I really gotta call that electrician to change my old 3 phase 230 V UPS socket into a 30 A 115 V supply..." Do you think that's a good idea? I don't know your 230 V, but at 115 V the power supplies lose efficiency compared to 230 V. 30 A @ 115 V is 3.5 kW, quite massive :D I know we can draw at least 2 kW over the regular 230 V, whereas I heard the US net may deliver something around 1.5 kW at 110 V. Our 3 phase plugs are 380 V and I think you can get 5 - 6 kW from them, but you're not talking about these, right? MrS Scanning for our furry friends since Jan 2002 |
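For anyone wanting to double-check the two figures in the post above (the 89839 s runtime and the 30 A @ 115 V supply), a quick sketch follows. The helper names are made up for illustration, and the power figure is a plain volts times amps product that ignores power factor and any breaker derating.

```python
def seconds_to_hhmm(seconds: int) -> str:
    """Convert a runtime in seconds to h:mm, dropping leftover seconds."""
    minutes_total = seconds // 60
    return f"{minutes_total // 60}:{minutes_total % 60:02d}"


def power_kw(volts: float, amps: float) -> float:
    """Simple V * A product in kilowatts (no power factor applied)."""
    return volts * amps / 1000.0


print(seconds_to_hhmm(89839))  # runtime quoted in the post
print(power_kw(115, 30))       # proposed 30 A, 115 V supply
```

Both agree with the numbers quoted: 24:57 h, and 3.45 kW, which rounds to the 3.5 kW mentioned.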