new ADRIA_KIXcMyb_HIP_bandit workunits

ServicEnginIC
Joined: 24 Sep 10
Posts: 592
Credit: 11,972,186,510
RAC: 1,447
Message 56991 - Posted: 18 Jun 2021, 17:48:41 UTC

I guess something is in the works toward a definitive solution to the recent problems.
I had 5 ADRIA tasks left in progress.
They were all canceled by the server between 15:53 and 16:36 UTC on 18/06/2021.
149.56 processing hours wasted altogether.
I had applied the <max_nbytes> parameter workaround to every one of them, but that is something the server couldn't know...

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1116
Credit: 40,839,470,595
RAC: 6,423
Message 56992 - Posted: 18 Jun 2021, 18:16:11 UTC - in response to Message 56991.  

I applied the fix to all my personal systems also, and you're right, the project couldn't know who did or didn't apply the fix manually. I'm sure there were many more people who didn't apply the fix and would have errored out. While it's unfortunate that some processing time was wasted, I think they made the right call to just cancel them all.

Keith Myers
Joined: 13 Dec 17
Posts: 1419
Credit: 9,119,446,190
RAC: 891
Message 56993 - Posted: 18 Jun 2021, 20:16:21 UTC

Cancelled tasks here also. I had edited them to survive. Moot now, but at least some of them hadn't started yet, so no crunching time was lost there.

ServicEnginIC
Message 56997 - Posted: 21 Jun 2021, 22:29:49 UTC
Last modified: 21 Jun 2021, 22:32:42 UTC

New tasks have just started to flow.
I've replenished my buffer: 7 tasks, one for each of my working GPUs.
New New:
...
<name>e3s358_e1s419p0f996-ADRIA_New_KIXcMyb_HIP_AdaptiveBandit-0-2-RND5689_0_9</name>
<nbytes>0.000000</nbytes>
<max_nbytes>1024000000.000000</max_nbytes>
...
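
For anyone who wants to check the limit on their own tasks, here is a minimal Python sketch that reads the <max_nbytes> value out of a fragment like the one above. The enclosing <file> element and the sample file sizes are assumptions made for illustration; only the three inner tags come from the snippet in this post.

```python
import xml.etree.ElementTree as ET

# Fragment modeled on the snippet above. The enclosing <file> element is an
# assumption added so the fragment parses as a single XML document.
FRAGMENT = """
<file>
    <name>e3s358_e1s419p0f996-ADRIA_New_KIXcMyb_HIP_AdaptiveBandit-0-2-RND5689_0_9</name>
    <nbytes>0.000000</nbytes>
    <max_nbytes>1024000000.000000</max_nbytes>
</file>
"""


def size_limit_bytes(fragment: str) -> float:
    """Return the <max_nbytes> value of a single file entry as a float."""
    root = ET.fromstring(fragment)
    return float(root.findtext("max_nbytes"))


def fits_limit(size_bytes: float, fragment: str) -> bool:
    """True if a file of size_bytes stays within the entry's <max_nbytes>."""
    return size_bytes <= size_limit_bytes(fragment)


limit = size_limit_bytes(FRAGMENT)
print(f"limit: {limit / 1e6:.0f} MB")
# Hypothetical file sizes, chosen only to show both outcomes.
print(fits_limit(300e6, FRAGMENT))
print(fits_limit(2e9, FRAGMENT))
```

The same check could be pointed at a real file entry, as long as the entry is extracted as one well-formed element first.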

Ian&Steve C.
Message 56998 - Posted: 22 Jun 2021, 0:16:00 UTC - in response to Message 56997.  

New tasks have just started to flow.
I've replenished my buffer: 7 tasks, one for each of my working GPUs.
New New:
...
<name>e3s358_e1s419p0f996-ADRIA_New_KIXcMyb_HIP_AdaptiveBandit-0-2-RND5689_0_9</name>
<nbytes>0.000000</nbytes>
<max_nbytes>1024000000.000000</max_nbytes>
...


can confirm. hopefully no more issues with these ones :)

thanks admins!

Ian&Steve C.
Message 57015 - Posted: 24 Jun 2021, 14:06:23 UTC

seeing nearly all of these tasks instant failing the past 2 days. not sure what's up. there is no verbose error message associated with them. just...fail.

http://www.gpugrid.net/result.php?resultid=32628049
<message>
process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
05:41:09 (894336): wrapper (7.7.26016): starting
05:41:09 (894336): wrapper (7.7.26016): starting
05:41:09 (894336): wrapper: running acemd3 (--boinc input --device 3)
05:41:10 (894336): acemd3 exited; CPU time 0.002381
05:41:10 (894336): app exit status: 0x1
05:41:10 (894336): called boinc_finish(195)


not sure what's happening with these, but I've checked several other computers and it looks like everyone is having the same problem.
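
The three forms of the exit code in the message above (195, 0xc3, -61) are the same byte read three ways, and 195 is the value BOINC names EXIT_CHILD_FAILED, i.e. the wrapper reporting that the child process (acemd3 here) failed. A small Python sketch of that arithmetic, plus a rough parser for the wrapper lines quoted above; the regular expressions are my own guess at the line format, based only on this one log:

```python
import re

# Name for exit code 195 as it appears in BOINC's error code list.
KNOWN_EXIT_CODES = {195: "EXIT_CHILD_FAILED"}


def exit_code_forms(code: int) -> tuple:
    """Return (hex string, signed 8-bit value) for an exit code in 0..255."""
    signed = code - 256 if code > 127 else code
    return hex(code), signed


# Lines copied from the stderr excerpt above.
STDERR = """\
05:41:09 (894336): wrapper: running acemd3 (--boinc input --device 3)
05:41:10 (894336): acemd3 exited; CPU time 0.002381
05:41:10 (894336): app exit status: 0x1
05:41:10 (894336): called boinc_finish(195)
"""


def summarize(stderr: str) -> dict:
    """Pull the app exit status, CPU time and boinc_finish code from the log."""
    app = re.search(r"app exit status: (0x[0-9a-fA-F]+)", stderr)
    fin = re.search(r"called boinc_finish\((\d+)\)", stderr)
    cpu = re.search(r"CPU time ([0-9.]+)", stderr)
    return {
        "app_status": int(app.group(1), 16),
        "finish_code": int(fin.group(1)),
        "cpu_seconds": float(cpu.group(1)),
    }


info = summarize(STDERR)
print(exit_code_forms(info["finish_code"]))
print(KNOWN_EXIT_CODES.get(info["finish_code"], "unknown"))
print(info)
```

Read this way, the log only says that acemd3 itself exited with status 1 after about 2 ms of CPU time; the wrapper's 195 just relays that, which is why there is no useful error text.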


ServicEnginIC
Message 57016 - Posted: 24 Jun 2021, 16:28:15 UTC - in response to Message 57015.  

seeing nearly all of these tasks instant failing the past 2 days...

Same behavior for mine, and for their resends to other hosts.

mmonnin
Joined: 2 Jul 16
Posts: 338
Credit: 7,987,341,558
RAC: 259
Message 57017 - Posted: 25 Jun 2021, 0:02:47 UTC

I thought it was just me, as I got one and the resend hadn't errored out yet. Good to know.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 428
Message 57021 - Posted: 28 Jun 2021, 9:50:51 UTC

They're trying again. I got e3s197_e1s419p0f951-ADRIA_New_KIXcMyb_HIP_AdaptiveBandit-1-2-RND9332_0 - newly created this morning - but sadly it failed after 3 seconds.

stderr says 'EXIT_CHILD_FAILED', but no hint of what the actual failure was. We couldn't have a licensing problem again, could we? I've preserved the job specification, but it's unlikely to reveal anything - I'll check it.

Ian&Steve C.
Message 57024 - Posted: 28 Jun 2021, 13:33:52 UTC - in response to Message 57021.  

i usually see a few _0's roll through every day. all instant fail for the past several days. I sent a message to Toni, but didn't see any resolution yet.

Richard Haselgrove
Message 57026 - Posted: 28 Jun 2021, 14:00:26 UTC - in response to Message 57024.  

I couldn't get anything out of the captured work spec - the wrapper rather gets in the way of decoding it. Next time, I'll try capturing the files as well, and running them in a terminal - that worked for WCG.
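
Since these tasks fail within seconds, capturing the files means copying them before the client cleans up. A minimal sketch of a timestamped snapshot in Python, assuming the task's files sit in a directory you can read; the directory names in the usage note are hypothetical, and symbolic links are followed so the copy holds real file contents:

```python
import shutil
import time
from pathlib import Path


def snapshot_dir(src: Path, dest_root: Path) -> Path:
    """Copy src into a new timestamped folder under dest_root and return it."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = dest_root / f"{src.name}-{stamp}"
    # symlinks=False copies the files that links point to, not the links.
    shutil.copytree(src, dest, symlinks=False)
    return dest
```

Usage would be something like snapshot_dir(Path("slots/0"), Path("captures")), with both paths adjusted to the local installation. Two snapshots of the same directory within the same second would collide, so this is only meant for one-off captures.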

ServicEnginIC
Message 57027 - Posted: 28 Jun 2021, 16:43:36 UTC

All 7 tasks received at my Linux hosts today, all of them starting with "e4s...", failed after a few seconds.
Every resend has continued to fail at various other Linux hosts... until being caught by some Windows hosts with non-Ampere GPUs.
Maybe there is some kind of issue with the Linux application / license (?)

Ian&Steve C.
Message 57028 - Posted: 28 Jun 2021, 16:57:07 UTC

I don’t think it’s a licensing issue. The apps were recreated with an updated license last October, about 9 months ago. Both Windows and Linux were created at the same time, presumably with the same licensing period.

Ian&Steve C.
Message 57029 - Posted: 28 Jun 2021, 18:52:35 UTC - in response to Message 57028.  

but you're right that it does appear to be a problem with Linux computers. maybe some parameter isn't set right for the Linux app? i see some successful runs on Windows machines.

ServicEnginIC
Message 57030 - Posted: 28 Jun 2021, 22:20:38 UTC

Maybe there is some kind of issue with the Linux application / license (?)

To test my assumption, I booted into the Windows 10 partition of this Linux / Windows dual-boot system.
Every ADRIA_New_KIXcMyb_HIP_AdaptiveBandit task received at the Linux host since June 22nd had failed after three to four seconds of execution.
I've received this e3s203_e1s419p0f906-ADRIA_New_KIXcMyb_HIP_AdaptiveBandit-1-2-RND8149_1 task at the Windows host, and it has been running for more than an hour now.
The computer is the same; the only difference is the operating system booted.
I think I'll let this task run to completion, to check whether it succeeds.
About 44 more hours left, estimated for this GTX 1650 SUPER GPU...

ServicEnginIC
Message 57039 - Posted: 30 Jun 2021, 19:56:53 UTC - in response to Message 57030.  

I've received this e3s203_e1s419p0f906-ADRIA_New_KIXcMyb_HIP_AdaptiveBandit-1-2-RND8149_1 task at the Windows host, and it has been running for more than an hour now.
The computer is the same; the only difference is the operating system booted.
I think I'll let this task run to completion, to check whether it succeeds.
About 44 more hours left, estimated for this GTX 1650 SUPER GPU...

Finally, the task mentioned above finished successfully today at my Windows 10 host, after 157,184.28 seconds of total processing time.
The task also survived an unwanted system reboot, caused by pending Windows updates after the machine had spent a long time on its Linux side.
It even got the mid-level bonus for returning the result in less than 48 hours.
I still think that some action should be taken on the server side to correct a problem affecting tasks generated for the Linux environment.
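
As a quick sanity check on those figures, the reported processing time works out to a little under 44 hours, in line with the earlier estimate. A short Python sketch of the conversion; note that processing time is only a lower bound on turnaround, so staying under the 48-hour window mentioned above is necessary for the bonus but does not prove it on its own:

```python
def seconds_to_hours(seconds: float) -> float:
    """Convert a duration in seconds to hours."""
    return seconds / 3600.0


def within_window(seconds: float, window_hours: float = 48.0) -> bool:
    """True if the duration is shorter than the window, given in hours."""
    return seconds_to_hours(seconds) < window_hours


processing_s = 157184.28  # total processing time reported above
hours = seconds_to_hours(processing_s)
print(f"{hours:.2f} h")
print(within_window(processing_s))
```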

Ian&Steve C.
Message 57040 - Posted: 1 Jul 2021, 17:36:26 UTC - in response to Message 57039.  

I sent another message to Toni about the issue of Linux tasks. Hopefully a quick resolution.

Keith Myers
Message 57044 - Posted: 1 Jul 2021, 19:26:41 UTC - in response to Message 57040.  

I sent another message to Toni about the issue of Linux tasks. Hopefully a quick resolution.

I did too.
