WU failures discussion

Message boards : Number crunching : WU failures discussion

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester

Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Message 31974 - Posted: 12 Aug 2013, 20:55:49 UTC - in response to Message 31965.  

Thanks Stefan, sounds good so far!

About that new project: new, as in "GPU-Grid beta" or something? Or a new subproject / WU type like the short and long queues, maybe "risk production"? Credit-wise it might be nicer to have them all combined under the same banner. But we and you might not be able to set things up as specifically as we want if they're not separate projects.

MrS
Scanning for our furry friends since Jan 2002
ID: 31974 · Rating: 0
Stefan
Project administrator
Project developer
Project tester
Project scientist

Joined: 5 Mar 13
Posts: 348
Credit: 0
RAC: 0
Message 31984 - Posted: 13 Aug 2013, 10:00:01 UTC

Ok so now the Noelia WU's got cancelled totally.
Have a happy crash-free crunching month :D
ID: 31984 · Rating: 0
Bedrich Hajek

Joined: 28 Mar 09
Posts: 490
Credit: 11,731,645,728
RAC: 42
Level
Trp
Message 31985 - Posted: 13 Aug 2013, 11:20:17 UTC - in response to Message 31984.  

They were fun while they lasted. Though I managed to finish the last few successfully, including two overnight, with no errors since last Friday. It's nice to end on a high note. I hope the results are useful, and I hope these simulations resume once the bugs are fixed.


ID: 31985 · Rating: 0
Stefan
Project administrator
Project developer
Project tester
Project scientist

Joined: 5 Mar 13
Posts: 348
Credit: 0
RAC: 0
Message 31986 - Posted: 13 Aug 2013, 12:09:42 UTC - in response to Message 31985.  

Yes, they will probably come back in two or three weeks with some different parameters, which should decrease the error rate.
ID: 31986 · Rating: 0
pvh

Joined: 17 Mar 10
Posts: 23
Credit: 1,173,824,416
RAC: 0
Level
Met
Message 31987 - Posted: 13 Aug 2013, 12:41:23 UTC - in response to Message 31984.  

Ok so now the Noelia WU's got cancelled totally.


Excellent news! I will start up GPUGRID again when I get home...
ID: 31987 · Rating: 0
petebe

Joined: 19 Nov 12
Posts: 31
Credit: 1,549,545,867
RAC: 0
Level
His
Message 31988 - Posted: 13 Aug 2013, 12:52:42 UTC

Thanks very much for the update and follow-thru, Stefan!
ID: 31988 · Rating: 0
John C MacAlister

Joined: 17 Feb 13
Posts: 181
Credit: 144,871,276
RAC: 0
Level
Cys
Message 31990 - Posted: 13 Aug 2013, 16:33:05 UTC - in response to Message 31965.  

Thanks for the update, Stefan.

John
ID: 31990 · Rating: 0
John C MacAlister

Joined: 17 Feb 13
Posts: 181
Credit: 144,871,276
RAC: 0
Level
Cys
Message 31991 - Posted: 13 Aug 2013, 16:36:08 UTC - in response to Message 31936.  

Hi, Jim:

I will not be processing GPUGrid WUs for a while as I am concentrating on other areas of interest. I will keep an eye on the Forum and decide at a future date if I should contribute more.

Happy crunching!

John
ID: 31991 · Rating: 0
TJ

Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Message 31993 - Posted: 13 Aug 2013, 22:00:46 UTC - in response to Message 31984.  

Ok so now the Noelia WU's got cancelled totally.
Have a happy crash-free crunching month :D

I am not happy with this, as on my rigs the Noelia's did better than the Santi's, and that is still the case: I again had Santi errors, LR and SR, even with the latest beta drivers.
Greetings from TJ
ID: 31993 · Rating: 0
Stefan
Project administrator
Project developer
Project tester
Project scientist

Joined: 5 Mar 13
Posts: 348
Credit: 0
RAC: 0
Message 31999 - Posted: 14 Aug 2013, 8:26:26 UTC - in response to Message 31993.  
Last modified: 14 Aug 2013, 8:27:39 UTC

Well, we cannot please everyone unfortunately :D It's great to hear that they worked fine on your machine. But on the last days the Noelia WU's had an incredibly high failure rate, so even if they worked for you they were crashing for nearly 30-40% of the users. So the general good had to prevail here :)
ID: 31999 · Rating: 0
TJ

Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Message 32003 - Posted: 14 Aug 2013, 9:49:01 UTC - in response to Message 31999.  

Well, we cannot please everyone unfortunately :D It's great to hear that they worked fine on your machine. But on the last days the Noelia WU's had an incredibly high failure rate, so even if they worked for you they were crashing for nearly 30-40% of the users. So the general good had to prevail here :)

Aha, about a week ago we could read that the failure rate was acceptable according to the project. This, however, is another conclusion ;-)
Well, never mind. The Santi's keep failing on my rigs, and for a lot of wingmen too who got them afterwards. So I guess the complaints about them will now increase. No longer from me: I have set it to LR and will no longer complain about them.
And I do not have to hurry to build new rigs or upgrade old ones with 690s, 780s and Titans.
Perhaps you could crunch a few of those Santi's, Stefan, then you can see it for yourself.

Greetings from TJ
ID: 32003 · Rating: 0
Stefan
Project administrator
Project developer
Project tester
Project scientist

Joined: 5 Mar 13
Posts: 348
Credit: 0
RAC: 0
Message 32004 - Posted: 14 Aug 2013, 9:57:15 UTC - in response to Message 32003.  
Last modified: 14 Aug 2013, 10:00:52 UTC

Well a week ago it was acceptable. For some reason it started increasing and became quite unacceptable (hence I said "on the last days").

Santi's are at 2-7% error rate which might be an all-time historical low or something like that :P

As for crunching them, I think my (single) GTX 280 might cry.
ID: 32004 · Rating: 0
TJ

Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Message 32005 - Posted: 14 Aug 2013, 10:54:02 UTC - in response to Message 32004.  

Thanks for the clarification, Stefan.
Yes indeed, the 280 will get very warm :)
Greetings from TJ
ID: 32005 · Rating: 0
Beyond

Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level
Tyr
Message 32016 - Posted: 15 Aug 2013, 13:44:42 UTC - in response to Message 31965.  

Ok so now the Noelia WU's got cancelled totally.
Have a happy crash-free crunching month :D

Thanks for this. Since the Noelia WUs disappeared I've had no crashes or failures at all on my 8 GPUGrid machines.

2. Now for every batch we send out we decided we will make a thread in the News section with the exact batch name. If someone forgets to do that send a message quick and I will remind them :D These threads will also contain information about the specific batch.
3. There are plans to test features (maybe on a new project?) such as adding hardware requirements to WU's so that they only run on specified hardware and thus prevent unnecessary crashes.

Great news on both counts although I don't see why a new project would be necessary.
ID: 32016 · Rating: 0
Stefan
Project administrator
Project developer
Project tester
Project scientist

Joined: 5 Mar 13
Posts: 348
Credit: 0
RAC: 0
Message 32017 - Posted: 15 Aug 2013, 16:40:13 UTC - in response to Message 32016.  

Supposedly, from what I understood, some options (like WU deadlines? or hardware requirements?) are defined project-wide. So we could not test them publicly on our main project, as it could ruin everything. That's at least what I understood.
ID: 32017 · Rating: 0
John C MacAlister

Joined: 17 Feb 13
Posts: 181
Credit: 144,871,276
RAC: 0
Level
Cys
Message 32021 - Posted: 16 Aug 2013, 10:28:00 UTC - in response to Message 31987.  
Last modified: 16 Aug 2013, 10:54:32 UTC

Ok so now the Noelia WU's got cancelled totally.


Excellent news! I will start up GPUGRID again when I get home...



Hi, Stefan:

Thank you for this most welcome news! I will now run a couple of short run WUs and see what happens. I cannot turn my back on this important research: that's why I invested in two GTX 650Ti GPUs a few months ago. Until that time I had processed other WUs with ATI GPUs only.

Thanks, again.

John
ID: 32021 · Rating: 0
Beyond

Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level
Tyr
Message 32033 - Posted: 18 Aug 2013, 12:57:01 UTC - in response to Message 32004.  

Santi's are at 2-7% error rate which might be an all-time historical low or something like that :P

Haven't had a single WU error since the Noelia WUs left over a week ago. Looking back, the very few errors I had with the Nathan and Santi WUs seem to have occurred after defective Noelia WUs put the GPUs into a bad state. This is with 8 machines running a range of GPUs from the lowly GTX 460/768, the 560 1GB, the 650 Ti 1GB and the 670 2GB. If my hypothesis is correct, without the Noelia WUs around to mess up the GPUs, the error rate for other WU types should be falling.
ID: 32033 · Rating: 0
Bedrich Hajek

Joined: 28 Mar 09
Posts: 490
Credit: 11,731,645,728
RAC: 42
Level
Trp
Message 32037 - Posted: 18 Aug 2013, 14:19:20 UTC - in response to Message 32033.  

Santi's are at 2-7% error rate which might be an all-time historical low or something like that :P

Haven't had a single WU error since the Noelia WUs left over a week ago. Looking back, the very few errors I had with the Nathan and Santi WUs seem to have occurred after defective Noelia WUs put the GPUs into a bad state. This is with 8 machines running a range of GPUs from the lowly GTX 460/768, the 560 1GB, the 650 Ti 1GB and the 670 2GB. If my hypothesis is correct, without the Noelia WUs around to mess up the GPUs, the error rate for other WU types should be falling.


This reminds me: on my Windows XP computer, I observed on two occasions that when a Noelia unit crashed, the subsequent non-Noelia unit would not load into the GPU (the clock would run, but progress would stay at 0.00%). To fix this, I had to suspend the unit, reboot the computer, and then resume the unit. The non-Noelia unit would then run normally.

I almost forgot about this. Thanks for jogging my memory with your post.




ID: 32037 · Rating: 0
TJ

Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Message 32039 - Posted: 18 Aug 2013, 15:58:38 UTC

On my 660 the Santi's keep erroring, so Noelia's WUs have nothing to do with this!
So I have withdrawn the 660 and given it to Einstein and Albert.
Greetings from TJ
ID: 32039 · Rating: 0
Beyond

Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level
Tyr
Message 32064 - Posted: 19 Aug 2013, 12:25:23 UTC - in response to Message 32039.  

On my 660 the Santi's keep erroring, so Noelia's WUs have nothing to do with this!
So I have withdrawn the 660 and given it to Einstein and Albert.

Did you ever try to RMA it as suggested?
ID: 32064 · Rating: 0

©2025 Universitat Pompeu Fabra