failing tasks lately

Erich56

Joined: 1 Jan 15
Posts: 1166
Credit: 12,260,898,501
RAC: 1
Message 52534 - Posted: 27 Aug 2019, 5:04:15 UTC

the faulty tasks seem to be back (erroring out after a few seconds):

http://www.gpugrid.net/result.php?resultid=21331546

:-(
Erich56

Message 52807 - Posted: 8 Oct 2019, 8:49:00 UTC

I had a task fail after a few seconds.

Stderr says: ERROR: file pme.cpp line 91: PME NX too small

Here is the URL: http://www.gpugrid.net/result.php?resultid=21429528

Does anyone have any idea what went wrong?
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 351
Message 52808 - Posted: 8 Oct 2019, 10:30:21 UTC - in response to Message 52807.  

At least it went wrong for everyone, not just for you. A bad workunit.

WU 16799014
Erich56

Message 52819 - Posted: 9 Oct 2019, 11:08:06 UTC

Here is another one, from this morning, with the error message:

ERROR: file mdioload.cpp line 81: Unable to read bincoordfile

http://www.gpugrid.net/result.php?resultid=21431713
Killersocke

Joined: 18 Oct 13
Posts: 53
Credit: 406,647,419
RAC: 0
Message 52820 - Posted: 9 Oct 2019, 11:47:45 UTC - in response to Message 52819.  
Last modified: 9 Oct 2019, 11:48:55 UTC

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 0
Message 52822 - Posted: 9 Oct 2019, 12:05:46 UTC - in response to Message 52820.  
Last modified: 9 Oct 2019, 12:06:57 UTC

Same here
http://www.gpugrid.net/result.php?resultid=21432948
http://www.gpugrid.net/result.php?resultid=21432946
http://www.gpugrid.net/result.php?resultid=21431340
http://www.gpugrid.net/result.php?resultid=21431266
http://www.gpugrid.net/result.php?resultid=21430771
...and several others, all CUDA 80
Until the new app (ACEMD3) is released, you should assign this host to a venue which receives work only from the ACEMD3 queue, as the other two queues have the old client, which is incompatible with the Turing cards.
Erich56

Message 52829 - Posted: 9 Oct 2019, 18:45:15 UTC

Obviously, the faulty tasks are back; here is the next one, from a minute ago:
http://www.gpugrid.net/result.php?resultid=21433016

This is even worse at a time when new tasks are very rare anyway :-(
Erich56

Message 52892 - Posted: 24 Oct 2019, 18:57:45 UTC

Erich56

Message 52893 - Posted: 24 Oct 2019, 19:21:45 UTC

Retvari Zoltan

Message 52894 - Posted: 24 Oct 2019, 22:24:57 UTC - in response to Message 52893.  

I think the license of the v9.22 app has expired this time.
Erich56

Message 52895 - Posted: 25 Oct 2019, 2:58:20 UTC - in response to Message 52894.  

"I think the license of the v9.22 app has expired this time."

That's what I now suspect, too :-(
BelgianEnthousiast

Joined: 7 Apr 15
Posts: 33
Credit: 1,201,157,375
RAC: 0
Message 52896 - Posted: 25 Oct 2019, 14:25:47 UTC

Any prediction of when a continuous supply of new WUs will become available again?
It has been nearly a full month of very intermittent, small batches of WUs.

Einstein is a happy project in the meantime :-)

Are all efforts being put into support of the new 20XX cards to the detriment
of the current 10XX cards? (Limited staff available, maybe, or lack of funding?)

Erich56

Message 52923 - Posted: 31 Oct 2019, 15:48:16 UTC

This is an increasingly annoying situation:

While there are no tasks available most of the time, some of the few that do get downloaded fail after 5 seconds:

http://www.gpugrid.net/result.php?resultid=21481323

ERROR: file mdioload.cpp line 81: Unable to read bincoordfile

:-( :-( :-(
Clive

Joined: 2 Jul 19
Posts: 21
Credit: 90,744,164
RAC: 0
Message 52928 - Posted: 4 Nov 2019, 4:41:35 UTC

Hi:

I see this is a well-used section of the forum.

I would like to contribute some useful results here with my Alienware laptop, but I have a high failure rate which I would like to resolve here. The GPU in my laptop is a GeForce 660M. The OS I am using is up-to-date Windows 10.

I would appreciate it if a tech person could narrow down the reason or reasons why I am experiencing such a high failure rate.

Clive Hunt
Canada
Erich56

Message 52929 - Posted: 4 Nov 2019, 5:24:31 UTC - in response to Message 52928.  

"I would like to contribute some useful results here with my Alienware laptop"

I am afraid that laptop GPUs are not made for this kind of load :-(
Erich56

Message 52930 - Posted: 4 Nov 2019, 5:24:49 UTC - in response to Message 52929.  

"I would like to contribute some useful results here with my Alienware laptop"

I am afraid that laptop GPUs are not made for this kind of heavy load :-(

Erich56

Message 52931 - Posted: 4 Nov 2019, 5:25:32 UTC - in response to Message 52930.  

"I would like to contribute some useful results here with my Alienware laptop"

I am afraid that laptop GPUs are not made for this kind of heavy load :-(
KAMasud

Joined: 27 Jul 11
Posts: 138
Credit: 539,953,398
RAC: 0
Message 52932 - Posted: 4 Nov 2019, 7:06:53 UTC

My Dell G7 15 laptop is happily crunching. That I have to give it a blast of air every day to get the dust out is another matter.
rod4x4

Joined: 4 Aug 14
Posts: 266
Credit: 2,219,935,054
RAC: 0
Message 52933 - Posted: 4 Nov 2019, 7:59:05 UTC - in response to Message 52928.  
Last modified: 4 Nov 2019, 8:02:22 UTC

"Hi:

I see this is a well-used section of the forum.

I would like to contribute some useful results here with my Alienware laptop, but I have a high failure rate which I would like to resolve here. The GPU in my laptop is a GeForce 660M. The OS I am using is up-to-date Windows 10.

I would appreciate it if a tech person could narrow down the reason or reasons why I am experiencing such a high failure rate.

Clive Hunt
Canada"


The issue is with the scheduler on the GPUgrid servers. It is sending CUDA65 tasks to your laptop, all of which will fail because of an expired license (a server-side problem).
Your laptop can process CUDA80 tasks, but you are at the mercy of the scheduler. It sends the correct tasks to most hosts, but the wrong ones to a handful of hosts.
This issue tends to affect Kepler GPUs (600 series), even though they are still supported.
Some relevant posts discussing this issue are here:
http://www.gpugrid.net/forum_thread.php?id=5000&nowrap=true#52924
http://www.gpugrid.net/forum_thread.php?id=5000&nowrap=true#52920

The project is in the middle of changing the application to a newer version (ACEMD3). Hopefully, these issues will be smoothed out when the new application is released.
Erich56

Message 52934 - Posted: 4 Nov 2019, 8:34:40 UTC - in response to Message 52933.  

"... when the new Application is released (ACEMD3)..."

I am curious WHEN this will be the case