failing tasks lately

Erich56
Message 52390 - Posted: 7 Aug 2019, 14:29:10 UTC - in response to Message 52386.  

any idea why all tasks downloaded within the last few hours fail immediately?

No idea, but it's the same for others.

yes, I had checked that before I wrote my posting above.

I wonder whether the GPUGRID team has realized this problem yet.
Killersocke
Message 52392 - Posted: 7 Aug 2019, 16:22:35 UTC - in response to Message 52174.  

Same here: all WUs fail with the same error code.

<core_client_version>7.14.2</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -44 (0xffffffd4)</message>
]]>
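In case the two numbers in that message look unrelated: they are the same value, printed once as a signed integer and once as an unsigned 32-bit hex word. A minimal Python sketch of the conversion follows. The helper names are made up for illustration and are not part of BOINC or acemd.

```python
def to_unsigned32(code: int) -> int:
    """Reinterpret a signed exit code as an unsigned 32-bit value."""
    return code & 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed (two's complement) integer."""
    value &= 0xFFFFFFFF
    # If the top bit is set, the value represents a negative number.
    return value - 0x1_0000_0000 if value & 0x8000_0000 else value


if __name__ == "__main__":
    print(hex(to_unsigned32(-44)))   # 0xffffffd4
    print(to_signed32(0xFFFFFFD4))   # -44
```

This only decodes the number; it says nothing about why the app exited with that code.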
Erich56
Message 52400 - Posted: 7 Aug 2019, 19:24:31 UTC

It seems that the licence of the app version for Windows 10 (and maybe for Windows 7/8, too) has expired.

Why do I think so? My Windows XP host downloaded a new task a few minutes ago, and it works well.
JStateson
Message 52405 - Posted: 7 Aug 2019, 19:52:18 UTC - in response to Message 52390.  

any idea why all tasks downloaded within the last few hours fail immediately?

No idea, but it's the same for others.

yes, I had checked that before I wrote my posting above.

I wonder whether the GPUGRID team has realized this problem yet.


Things left to themselves tend to go from bad to worse.
robertmiles
Message 52407 - Posted: 7 Aug 2019, 22:13:07 UTC
Last modified: 7 Aug 2019, 22:14:49 UTC

Several more tasks with computation errors, but nothing definite about just what kind of error.

At least they didn't use much CPU or GPU time.

http://www.gpugrid.net/result.php?resultid=21242466

http://www.gpugrid.net/result.php?resultid=21242065

http://www.gpugrid.net/result.php?resultid=21241863

http://www.gpugrid.net/result.php?resultid=21233480

And so on.

Could more diagnostics be added to v9.22 (cuda80) to show what caused this error, if you can't fix it instead? This appears for both short and long runs.
Moises Cardona
Message 52410 - Posted: 7 Aug 2019, 23:48:39 UTC

Same here...
Bedrich Hajek
Message 52411 - Posted: 8 Aug 2019, 0:24:44 UTC

I actually got one to finish successfully:

http://www.gpugrid.net/workunit.php?wuid=16709219


I changed the date to one before the license expired, right after the WU started crunching and before it crashed, and then changed it back. It's actually tricky to do, because BOINC acts strangely when the date is moved back. My two other attempts failed, so I had enough of this.

BTW, the video card that I used was a GTX 980 Ti, not the RTX 2080 Ti.






Erich56
Message 52415 - Posted: 8 Aug 2019, 5:39:06 UTC - in response to Message 52411.  

I actually got one to finish successfully:

http://www.gpugrid.net/workunit.php?wuid=16709219

I changed the date to one before the license expired, right after the WU started crunching and before it crashed, and then changed it back. It's actually tricky to do, because BOINC acts strangely when the date is moved back.

so it's clear that the license has expired.

Changing the date of the host can indeed be tricky, even more so if other BOINC projects are also running, since they can be totally confused by this. It happened to me the last time the license expired, and it all ended up in a total mess.

Let's hope that it won't take too long until there is a new acemd with a valid license.
mmonnin
Message 52416 - Posted: 8 Aug 2019, 11:58:10 UTC - in response to Message 52415.  

I actually got one to finish successfully:

http://www.gpugrid.net/workunit.php?wuid=16709219

I changed the date to one before the license expired, right after the WU started crunching and before it crashed, and then changed it back. It's actually tricky to do, because BOINC acts strangely when the date is moved back.

so it's clear that the license has expired.

Changing the date of the host can indeed be tricky, even more so if other BOINC projects are also running, since they can be totally confused by this. It happened to me the last time the license expired, and it all ended up in a total mess.

Let's hope that it won't take too long until there is a new acemd with a valid license.


I thought one of the reasons for the new app was to not need the license that keeps expiring. Plus Turing support in a BOINC wrapper to separate the science part from the BOINC part.
PappaLitto
Message 52418 - Posted: 8 Aug 2019, 12:25:59 UTC

They are not using the new app yet; the license expired because it's still the old app.
mmonnin
Message 52419 - Posted: 8 Aug 2019, 12:43:39 UTC - in response to Message 52418.  

They are not using the new app yet; the license expired because it's still the old app.


And?

I was replying to this part
"new acemd with a valid license."

The new app won't need a license from what I recall.
robertmiles
Message 52432 - Posted: 9 Aug 2019, 12:23:31 UTC
Last modified: 9 Aug 2019, 12:25:13 UTC

I've seen some mentions of tasks still completing properly on some rather old versions of Windows, such as Windows XP. Could some people with at least one computer with such a version give more details?

Perhaps the older versions don't include an expiration check, and therefore have to assume that it is not expired.
Erich56
Message 52433 - Posted: 9 Aug 2019, 12:59:21 UTC

the "older versions" also include an expiration check.

However, for XP a different acemd.exe is used (running with CUDA 6.5), the license for which seems to expire at a later date. I have no idea on what date exactly; it could be tomorrow, or in a week, or next month ...
GPUGRID
Message 52436 - Posted: 9 Aug 2019, 18:40:01 UTC - in response to Message 52433.  

I'm using Win XP 64 and am getting only errors as well.
Retvari Zoltan
Message 52437 - Posted: 9 Aug 2019, 19:32:25 UTC - in response to Message 52436.  

No, you are using Windows 7 x64.
KAMasud
Message 52472 - Posted: 12 Aug 2019, 11:20:32 UTC

Stderr output
<core_client_version>7.14.2</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -44 (0xffffffd4)</message>
]]>

name e18s22_e7s95p0f111-PABLO_V4_UCB_p27_sj403_no_salt_IDP-0-2-RND0646
application Long runs (8-12 hours on fastest card)
created 8 Aug 2019 | 21:02:41 UTC
minimum quorum 1
initial replication 1
max # of error/total/success tasks 7, 10, 6
errors Too many errors (may have bug)

100% failure rate for the last three days.
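
As I read the "max # of error/total/success tasks 7, 10, 6" line above, the workunit is flagged "Too many errors" once more failed results have come back than the first limit allows. A rough Python sketch of that bookkeeping follows. The class, field, and function names are invented for illustration, and the exact comparison the server uses is an assumption, not taken from the BOINC source.

```python
from dataclasses import dataclass


@dataclass
class WorkunitLimits:
    # Values copied from the workunit page quoted above.
    max_error: int = 7
    max_total: int = 10
    max_success: int = 6


def workunit_status(limits: WorkunitLimits, n_error: int, n_success: int) -> str:
    """Simplified model: compare result counts against the per-workunit limits."""
    n_total = n_error + n_success  # ignores results that are still in progress
    if n_error > limits.max_error:
        return "too many errors"
    if n_total > limits.max_total:
        return "too many total results"
    if n_success > limits.max_success:
        return "too many success results"
    return "ok"


if __name__ == "__main__":
    limits = WorkunitLimits()
    print(workunit_status(limits, n_error=7, n_success=0))  # ok
    print(workunit_status(limits, n_error=8, n_success=0))  # too many errors
```

If every host fails the task within seconds, as reported here, a workunit would burn through that error budget very quickly.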
marsinph
Message 52474 - Posted: 12 Aug 2019, 11:34:31 UTC

Hello everyone,
Please read the post in "news" about "expired licence".
The problem is not on our side, but on the server side.

The admins have known about it for two days already.
GPUGRID
Message 52500 - Posted: 13 Aug 2019, 16:09:35 UTC - in response to Message 52437.  
Last modified: 13 Aug 2019, 16:11:10 UTC

No, you are using Windows 7 x64.

You are right, my bad. But I was having errors with the new drivers. Then I rolled back to the 378.94 driver and it's running fine now.

http://www.gpugrid.net/show_host_detail.php?hostid=413063

http://www.gpugrid.net/workunit.php?wuid=16717273
mikey
Message 52503 - Posted: 13 Aug 2019, 19:49:08 UTC - in response to Message 52474.  
Last modified: 13 Aug 2019, 19:51:14 UTC

Hello everyone,
Please read the post in "news" about "expired licence".
The problem is not on our side, but on the server side.

The admins have known about it for two days already.


That's fixed now. But the errors continue: 2 seconds into a Pablo unit and poof, they error out. I turned off the long-run units, and it seems there aren't any short-run units for the GPUs to do.
rod4x4
Message 52504 - Posted: 13 Aug 2019, 23:42:12 UTC - in response to Message 52503.  
Last modified: 14 Aug 2019, 0:27:35 UTC

But the errors continue: 2 seconds into a Pablo unit and poof, they error out.

mikey, the tasks with errors were run on a Turing-based card (GTX 1660 Ti). These GPUs are not currently supported by the ACEMD2 app.
Admins are working on the ACEMD3 app, which will support Turing-based GPUs. Hopefully this will be released soon.
There are currently no short tasks in the queue.
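
To check quickly whether a card falls into that group: Turing cards report CUDA compute capability 7.5. Here is a small Python sketch that flags them. The lookup table is hand-written for the cards mentioned in this thread, and the function name is made up; it is not something the ACEMD app itself exposes.

```python
# Compute capability (major, minor) for the cards mentioned in this thread.
CARDS = {
    "GTX 980 Ti": (5, 2),    # Maxwell
    "GTX 1660 Ti": (7, 5),   # Turing
    "RTX 2080 Ti": (7, 5),   # Turing
}

TURING = (7, 5)


def is_turing(card_name: str) -> bool:
    """Return True if the named card is a Turing GPU (compute capability 7.5)."""
    return CARDS[card_name] == TURING


if __name__ == "__main__":
    for name in CARDS:
        note = "Turing, not supported by ACEMD2 per this thread" if is_turing(name) else "pre-Turing"
        print(f"{name}: {note}")
```

That matches what was reported earlier in the thread: the GTX 980 Ti could finish a task, while the GTX 1660 Ti errors out after a couple of seconds.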

©2025 Universitat Pompeu Fabra