*_pYEEI_* information and issues

Message boards : Graphics cards (GPUs) : *_pYEEI_* information and issues

Michael Goetz
Joined: 2 Mar 09
Posts: 124
Credit: 124,873,744
RAC: 0
Level
Cys
Message 14079 - Posted: 30 Dec 2009, 1:38:05 UTC - in response to Message 14078.  

I *just* managed to squeak by. I had six of these error out, dropping my daily quota to 9. The next WU was the 9th of the day; fortunately, it's a different series and is crunching normally. If it had been another error, I think this GPU would have been done for the day. (Unless it's still counting this as WUs per CPU core, in which case I had a lot of headroom.)


Want to find one of the largest known primes? Try PrimeGrid. Or help cure disease at WCG.

Stoneageman
Joined: 25 May 09
Posts: 224
Credit: 34,057,374,498
RAC: 0
Level
Trp
Message 14080 - Posted: 30 Dec 2009, 2:21:20 UTC
Last modified: 30 Dec 2009, 2:57:17 UTC

UPDATE: Four more have trashed another GPU.
Aborted a boatload of these critters, yet still they come. It's like they are breeding!

Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level
Tyr
Message 14081 - Posted: 30 Dec 2009, 9:15:13 UTC

Can you PLEASE PLEASE PLEASE make sure WU batches are OK before sending them out?

Siegfried Niklas
Joined: 23 Feb 09
Posts: 39
Credit: 144,654,294
RAC: 0
Level
Cys
Message 14082 - Posted: 30 Dec 2009, 9:54:46 UTC

GTX295 - Nine *_pYEEI_* WUs crashed in a row.

http://www.gpugrid.net/results.php?hostid=53295

"MDIO ERROR: syntax error in file "structure.psf", line number 1: failed to find PSF keyword
ERROR: mdioload.cu, line 172: Unable to read topology file"

No new work was sent for 7.5 hours. (I only recently got new work.)

Should I abort *_pYEEI_* on other GPUs (cache)?
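
For anyone wondering which cached tasks belong to this batch, here is a minimal Python sketch that picks out task names containing the `_pYEEI_` tag. The two IBUCH names are ones reported in this thread; the third name and the `select_pyeei` helper are hypothetical, made up for illustration. The script only lists matches; it does not abort anything.

```python
from fnmatch import fnmatchcase

# Pattern taken from the thread title; matching is case-sensitive.
PATTERN = "*_pYEEI_*"


def select_pyeei(task_names):
    """Return the task names that carry the _pYEEI_ tag (hypothetical helper)."""
    return [name for name in task_names if fnmatchcase(name, PATTERN)]


# Example queue: two names reported in this thread, plus one invented
# non-matching name to show that other series are left alone.
queued = [
    "184-IBUCH_reverse_pYEEI_2912-0-40-RND6748",
    "128-IBUCH_reverse_pYEEI_2912-0-40-RND3643",
    "hypothetical_other_series-0-40-RND0001",
]

for name in select_pyeei(queued):
    print(name)
```

Whether aborting the matches is a good idea is a separate question; the list just makes it easier to see how many of them are sitting in the cache.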

hzels
Joined: 4 Sep 08
Posts: 7
Credit: 52,864,406
RAC: 0
Level
Thr
Message 14083 - Posted: 30 Dec 2009, 11:11:14 UTC - in response to Message 14082.  

The last WUs are all going down the drain:

<stderr_txt>
# Using CUDA device 0
# There are 2 devices supporting CUDA
# Device 0: "GeForce GTX 280"
# Clock rate: 1.55 GHz
# Total amount of global memory: 1073741824 bytes
# Number of multiprocessors: 30
# Number of cores: 240
# Device 1: "GeForce GTX 260"
# Clock rate: 1.51 GHz
# Total amount of global memory: 939524096 bytes
# Number of multiprocessors: 27
# Number of cores: 216
MDIO ERROR: syntax error in file "structure.psf", line number 1: failed to find PSF keyword
ERROR: mdioload.cu, line 172: Unable to read topology file

called boinc_finish

</stderr_txt>

I'm switching over to Collatz for a few days.
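
To check whether a failed task hit this same problem, one option is to scan its stderr text for the two error lines shown above. This is only a rough Python sketch: the `find_topology_errors` helper is hypothetical, and the patterns are copied from the output in this post, so they assume the wording stays the same.

```python
import re

# Patterns built from the two error lines quoted in this post.
KNOWN_ERRORS = (
    re.compile(r"^MDIO ERROR: .*failed to find PSF keyword"),
    re.compile(r"^ERROR: mdioload\.cu, line \d+: Unable to read topology file"),
)


def find_topology_errors(stderr_text):
    """Return the stderr lines matching the known topology-file errors."""
    hits = []
    for line in stderr_text.splitlines():
        stripped = line.strip()
        if any(pattern.search(stripped) for pattern in KNOWN_ERRORS):
            hits.append(stripped)
    return hits


# Shortened copy of the stderr output from this post.
sample = """# Using CUDA device 0
# There are 2 devices supporting CUDA
MDIO ERROR: syntax error in file "structure.psf", line number 1: failed to find PSF keyword
ERROR: mdioload.cu, line 172: Unable to read topology file

called boinc_finish
"""

for hit in find_topology_errors(sample):
    print(hit)
```

A task whose stderr yields both lines most likely failed on the bad `structure.psf` input rather than on anything specific to the host.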

Michael Goetz
Joined: 2 Mar 09
Posts: 124
Credit: 124,873,744
RAC: 0
Level
Cys
Message 14085 - Posted: 30 Dec 2009, 16:17:39 UTC

I just had another one of these fail:

1057058


Want to find one of the largest known primes? Try PrimeGrid. Or help cure disease at WCG.

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 14086 - Posted: 30 Dec 2009, 16:29:08 UTC - in response to Message 14085.  
Last modified: 30 Dec 2009, 16:32:34 UTC

Had 2 fail in a few seconds on one system, 3 on another.
184-IBUCH_reverse_pYEEI_2912-0-40-RND6748 http://www.gpugrid.net/workunit.php?wuid=1056751
128-IBUCH_reverse_pYEEI_2912-0-40-RND3643 http://www.gpugrid.net/workunit.php?wuid=1056695
Also, could not get any tasks this morning between about 1am and noon, on the same system, but running a task now.

http://www.gpugrid.net/workunit.php?wuid=1056826
http://www.gpugrid.net/workunit.php?wuid=1056758
http://www.gpugrid.net/workunit.php?wuid=1056826

Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level
Tyr
Message 14092 - Posted: 31 Dec 2009, 0:36:13 UTC - in response to Message 14028.  

Please use this thread to post any problem regarding all workunits tagged as *_pYEEI_*.

Thanks,
ignasi

As you can see (I hope), massive problems have been reported, and many systems have been locked out of receiving new WUs (and are sitting idle) due to these faulty units. Don't you think it's about time to pull the rest? It looks like they're just being allowed to run until they fail so many times that the server cancels them. That's not showing any concern at all for the people who are doing your work.

I know they're not being canceled because I've received 22 of them so far today. Every one of those 22 has failed on several machines before being sent to me. That's just wrong.


skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 14094 - Posted: 31 Dec 2009, 14:09:24 UTC - in response to Message 14092.  

In a way the _pYEEI_ tasks are SPAM!

I had to take extreme action yesterday - shut down my system for a couple of hours ;)

ignasi
Joined: 10 Apr 08
Posts: 254
Credit: 16,836,000
RAC: 0
Level
Pro
Message 14107 - Posted: 3 Jan 2010, 17:50:40 UTC - in response to Message 14094.  

My most sincere apologies to everybody for all this.
I wanted to fill up the queue before going offline for some days but obviously it didn't work as expected.

The balance between keeping crunchers' support, not having an empty queue, and having a private life is always very sensitive to human error.

Sincerely,
ignasi

Michael Goetz
Joined: 2 Mar 09
Posts: 124
Credit: 124,873,744
RAC: 0
Level
Cys
Message 14109 - Posted: 3 Jan 2010, 18:28:46 UTC - in response to Message 14107.  

My most sincere apologies to everybody for all this.


No worries here; stuff happens. It's the nature of the "free" distributed computing that there are going to be minor problems along the way.

Happy new year!

Want to find one of the largest known primes? Try PrimeGrid. Or help cure disease at WCG.

Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level
Tyr
Message 14111 - Posted: 3 Jan 2010, 19:26:08 UTC - in response to Message 14109.  

My most sincere apologies to everybody for all this.

Happy new year!

Thanks for letting us know what happened. Communication is appreciated.

Happy new year everyone!

Stoneageman
Joined: 25 May 09
Posts: 224
Credit: 34,057,374,498
RAC: 0
Level
Trp
Message 14112 - Posted: 3 Jan 2010, 19:58:41 UTC - in response to Message 14107.  


The balance between keeping crunchers' support, not having an empty queue, and having a private life is always very sensitive to human error.

Sincerely,
ignasi


"A PRIVATE life".......... well ok. However, we expect you to sleep with the server :)

ignasi
Joined: 10 Apr 08
Posts: 254
Credit: 16,836,000
RAC: 0
Level
Pro
Message 14113 - Posted: 4 Jan 2010, 9:33:01 UTC - in response to Message 14112.  

"A PRIVATE life".......... well ok. However, we expect you to sleep with the server :)

I am afraid girlfriends are too jealous...

Snow Crash
Joined: 4 Apr 09
Posts: 450
Credit: 539,316,349
RAC: 0
Level
Lys
Message 14122 - Posted: 4 Jan 2010, 23:52:38 UTC - in response to Message 14113.  

"A PRIVATE life".......... well ok. However, we expect you to sleep with the server :)


I am afraid girlfriends are too jealous...


You have more than ONE!!! No wonder he can't get the WUs straight; he is sleep deprived :-)

Keep up the good work, we'll crunch the best we can!
Thanks - Steve

[AF>Libristes>Jip] Elgrande71
Joined: 16 Jul 08
Posts: 45
Credit: 78,618,001
RAC: 0
Level
Thr
Message 14348 - Posted: 26 Jan 2010, 14:27:46 UTC - in response to Message 14122.  

Three compute errors (1, 2, 3) on this host.

AndyMM
Joined: 27 Jan 09
Posts: 4
Credit: 582,988,184
RAC: 0
Level
Lys
Message 14396 - Posted: 27 Jan 2010, 10:10:20 UTC

Sorry, but I'm going to say goodbye. The last 3 days brought non-stop computation errors, made even worse by the fact that the cards just sat there doing nothing.

Switching all my GPUs to F@H. I do not accept having my money wasted with units processing for 17 hours then showing a computing error.

GPUGRID Role account
Joined: 15 Feb 07
Posts: 134
Credit: 1,349,535,983
RAC: 0
Level
Met
Message 14410 - Posted: 27 Jan 2010, 13:56:28 UTC - in response to Message 14396.  

Hi,

It's because you have been accepting beta work from us. If reliability of work is of paramount importance to you, don't track the beta application.

Matt

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 14524 - Posted: 28 Jan 2010, 0:30:51 UTC - in response to Message 14396.  

Switching all my GPUs to F@H.

Your cards do a lot more work here than they can at F@H.

If the problem is Beta related, you just need to turn the Betas off, as MJH said. It might also be that you need to restart the system: sometimes one failure can cause continuous failures (a runaway) that only a restart clears. I say this because the problem was limited to your GTX 295 and did not affect your GTX 275.
Many of your tasks seem to have been aborted by user, some immediately and one after running for a long time:
286-IBUCH_esrever_pYEEI_0301-10-40-RND7408 - Aborted by user after 43,189.28 seconds.

Turn off Betas, restart and see how you get on.

AndyMM
Joined: 27 Jan 09
Posts: 4
Credit: 582,988,184
RAC: 0
Level
Lys
Message 14789 - Posted: 29 Jan 2010, 14:08:33 UTC - in response to Message 14524.  

Thanks for the comments. I looked in my GPUGrid preferences and did not notice anything saying Beta.
I did see:
"Run test applications?
This helps us develop applications, but may cause jobs to fail on your computer"

Which was already set to no.

Please advise: how do I turn off receiving Beta work units?

Thanks

Andy

©2026 Universitat Pompeu Fabra