new ADRIA_KIXcMyb_HIP_bandit workunits
| Author | Message |
|---|---|
ServicEnginIC Joined: 24 Sep 10 Posts: 592 Credit: 11,972,186,510 RAC: 1,447
|
I guess something is under way toward a definitive solution to the recent problems. I had 5 ADRIA tasks left in progress. They were all canceled by the server between 15:53 and 16:36 UTC on 18/06/2021: 149.56 wasted processing hours altogether. I had applied the <max_nbytes> parameter workaround to every one of them, but that is something the server couldn't know... |
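The post does not spell out the <max_nbytes> workaround, so the following is only a minimal sketch under one assumption: that it means raising the upload size limit stored in `<max_nbytes>` elements of the BOINC client state file, edited while the client is stopped. The function name `raise_max_nbytes`, the new limit, and the sample XML fragment are all hypothetical and chosen for illustration.

```python
import re

# Matches a <max_nbytes> element and captures its numeric value.
MAX_NBYTES_RE = re.compile(r"(<max_nbytes>)\s*([0-9.]+)\s*(</max_nbytes>)")


def raise_max_nbytes(state_xml: str, new_limit: float) -> tuple[str, int]:
    """Return (edited_text, changed_count).

    Every <max_nbytes> value smaller than new_limit is raised to new_limit;
    values that are already large enough are left untouched.
    """
    changed = 0

    def _swap(match: re.Match) -> str:
        nonlocal changed
        if float(match.group(2)) >= new_limit:
            return match.group(0)
        changed += 1
        return f"{match.group(1)}{new_limit:.6f}{match.group(3)}"

    return MAX_NBYTES_RE.sub(_swap, state_xml), changed


# Hypothetical fragment; real entries and file names will differ.
sample = """<file>
    <name>hypothetical_output_file</name>
    <max_nbytes>256000000.000000</max_nbytes>
</file>"""

edited, count = raise_max_nbytes(sample, 1_000_000_000)
print(count)
print(edited)
```

Working on a copy of the file and keeping a backup is the safer design here, since a manual edit like this is invisible to the server, which is exactly the point made in the post.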
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 6,423
|
I applied the fix to all my personal systems too, and you're right, the project couldn't know who did or didn't apply the fix manually. I'm sure there were many more people who didn't apply it and would have errored out. While it's unfortunate that some processing time was wasted, I think they made the right call to just cancel them all.
|
|
Joined: 13 Dec 17 Posts: 1419 Credit: 9,119,446,190 RAC: 891
|
Cancelled tasks here also. I had edited them to survive. Moot now but at least some of them hadn't started yet so no crunching time lost. |
ServicEnginIC Joined: 24 Sep 10 Posts: 592 Credit: 11,972,186,510 RAC: 1,447
|
New tasks have just started to flow. I've replenished my buffer: 7 tasks, one for each of my working GPUs. New New: ... |
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 6,423
|
"New tasks have just started to flow." Can confirm. Hopefully no more issues with these ones :) Thanks, admins!
|
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 6,423
|
Seeing nearly all of these tasks fail instantly over the past 2 days. Not sure what's up. There is no verbose error message associated with them. They just... fail. http://www.gpugrid.net/result.php?resultid=32628049 <message> Not sure what's happening with these, but I've checked several other computers and it looks like everyone is having the same problem.
|
ServicEnginIC Joined: 24 Sep 10 Posts: 592 Credit: 11,972,186,510 RAC: 1,447
|
"seeing nearly all of these tasks instant failing the past 2 days..." Same behavior for mine, and for their resends to other hosts. |
|
Joined: 2 Jul 16 Posts: 338 Credit: 7,987,341,558 RAC: 259
|
I thought it was just me, as I got one and the resend hadn't errored out yet. Good to know. |
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 428
|
They're trying again. I got e3s197_e1s419p0f951-ADRIA_New_KIXcMyb_HIP_AdaptiveBandit-1-2-RND9332_0 - newly created this morning - but sadly it failed after 3 seconds. stderr says 'EXIT_CHILD_FAILED', but no hint of what the actual failure was. We couldn't have a licensing problem again, could we? I've preserved the job specification, but it's unlikely to reveal anything - I'll check it. |
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 6,423
|
I usually see a few _0's roll through every day. All have failed instantly for the past several days. I sent a message to Toni, but haven't seen any resolution yet.
|
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 428
|
I couldn't get anything out of the captured work spec - the wrapper rather gets in the way of decoding it. Next time, I'll try capturing the files as well, and running them in terminal - that worked for WCG. |
ServicEnginIC Joined: 24 Sep 10 Posts: 592 Credit: 11,972,186,510 RAC: 1,447
|
Every one of the 7 tasks received at my Linux hosts today, all of them starting with "e4s...", failed after a few seconds. All resends continued to fail at various other Linux hosts... until they were caught by some Windows hosts with non-Ampere GPUs. Maybe there is some kind of issue regarding the Linux application / license (?) |
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 6,423
|
I don’t think it’s a licensing issue. The apps were recreated with an updated license last October, about 9 months ago. Both Windows and Linux were created at the same time, presumably with the same licensing period.
|
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 6,423
|
But you're right that it does appear to be a problem with Linux computers. Maybe some parameter isn't set right for the Linux app? I see some successful runs on Windows machines.
|
ServicEnginIC Joined: 24 Sep 10 Posts: 592 Credit: 11,972,186,510 RAC: 1,447
|
"Maybe there is some kind of issue regarding the Linux application / license (?)" To verify my assumption, I booted into the Windows 10 partition of this Linux / Windows dual-boot system. All of the ADRIA_New_KIXcMyb_HIP_AdaptiveBandit tasks received at the Linux host since June 22nd had failed after three to four seconds of execution. I've received this e3s203_e1s419p0f906-ADRIA_New_KIXcMyb_HIP_AdaptiveBandit-1-2-RND8149_1 task at the Windows host, and it has been running for more than one hour by now. The computer is the same; the only difference is the operating system booted. I think I'll let this task run to completion, to check whether it succeeds. About 44 more hours left, estimated for this GTX 1650 SUPER GPU... |
ServicEnginIC Joined: 24 Sep 10 Posts: 592 Credit: 11,972,186,510 RAC: 1,447
|
"I've received this e3s203_e1s419p0f906-ADRIA_New_KIXcMyb_HIP_AdaptiveBandit-1-2-RND8149_1 task at the Windows host, and it has been running for more than one hour by now." Finally, the mentioned task finished successfully today at my Windows 10 host, after 157,184.28 seconds of total processing time. The task also survived an unwanted system reboot, caused by Windows updates that had been pending after a long time working on the Linux side. It even got the mid bonus for a result returned in less than 48 hours. I continue to think that some action should be taken on the server side to correct a problem affecting tasks generated for the Linux environment. |
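As a quick sanity check on the processing time quoted in this post, here is a minimal sketch of the seconds-to-hours conversion. It assumes the figure reads as 157,184.28 seconds (the post uses European-style separators), and the helper name `seconds_to_hours` is hypothetical. Note that it only converts processing time; the bonus itself depends on when the result was returned, which the post does not give in detail.

```python
def seconds_to_hours(seconds: float) -> float:
    """Convert a duration in seconds to hours."""
    return seconds / 3600.0


# Processing time reported in the post, read as 157,184.28 seconds.
elapsed_hours = seconds_to_hours(157_184.28)

print(f"{elapsed_hours:.2f} hours")  # prints "43.66 hours"
print(elapsed_hours < 48.0)          # True: below the 48-hour mark quoted in the post
```

The result is roughly in line with the earlier estimate of about 44 hours remaining after the first hour of running.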
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 6,423
|
I sent another message to Toni about the issue of Linux tasks. Hopefully a quick resolution.
|
|
Joined: 13 Dec 17 Posts: 1419 Credit: 9,119,446,190 RAC: 891
|
"I sent another message to Toni about the issue of Linux tasks. Hopefully a quick resolution." I did too. |
©2025 Universitat Pompeu Fabra