Unsent tasks decreasing much more slowly

Message boards : Number crunching : Unsent tasks decreasing much more slowly

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 0
Level
Trp
Message 54652 - Posted: 11 May 2020, 9:41:47 UTC - in response to Message 54650.  

What is the ghost recovery procedure on this project?
I've tried the way it works for SETI, but it didn't work here.
Luckily GPUGrid has a much shorter deadline than SETI, so it's not a big problem.
ID: 54652 · Rating: 0
Retvari Zoltan
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 0
Level
Trp
Message 54653 - Posted: 11 May 2020, 9:46:56 UTC - in response to Message 54642.  

the present supply (171,016) will last for about 4 days from now.
The rate of decline seems to stabilize around 30/minute, so the supply will last for about 3 days from now.
(exactly 2 days 22 hours 39 minutes and 42.9 seconds)
ID: 54653 · Rating: 0
ServicEnginIC
Joined: 24 Sep 10
Posts: 593
Credit: 12,146,936,510
RAC: 4,406,248
Level
Trp
Message 54654 - Posted: 11 May 2020, 9:52:57 UTC - in response to Message 54651.  
Last modified: 11 May 2020, 9:54:26 UTC

Ghost tasks are on GPUGRID's server side.
Once the 5-day deadline has passed, the server automatically clears the ghost tasks from the original host and resends them to another one.

[Clarification]

A "ghost task" is one that the server counts as sent to a host but that, for whatever reason, the host never actually received.
It doesn't interfere on the host side: BOINC Manager never sees these ghost tasks, so it keeps asking for new tasks until the task buffer is full or the maximum of "2 tasks per GPU" is reached.
On the server side, ghost tasks are wrongly counted as "In process" tasks, although nobody is actually processing them.
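
For anyone who finds it easier to see in code, here is a rough Python sketch of the bookkeeping described above. It is only a toy model, not BOINC's or GPUGRID's actual server code: the `SentTask` class, its field names and the helper functions are all made up for illustration, and `received_by_host` is something the real server cannot know (which is exactly why ghosts get miscounted). Only the 5-day deadline comes from this post.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

DEADLINE = timedelta(days=5)  # the 5-day deadline mentioned above


@dataclass
class SentTask:
    """Hypothetical server-side record of one task sent to a host."""
    name: str
    sent_at: datetime
    received_by_host: bool  # False for a ghost; a modelling aid, not a real server field
    reported: bool = False  # True once the host has returned a result


def in_process_count(tasks):
    """What the server believes is in process: everything sent and not yet reported."""
    return sum(1 for t in tasks if not t.reported)


def ghosts(tasks):
    """Tasks counted as in process that no host is actually working on."""
    return [t for t in tasks if not t.reported and not t.received_by_host]


def due_for_resend(tasks, now):
    """Unreported tasks whose deadline has passed; every ghost ends up here eventually."""
    return [t for t in tasks if not t.reported and now - t.sent_at > DEADLINE]


sent = datetime(2020, 5, 11, 9, 0)
tasks = [
    SentTask("wu_a", sent, received_by_host=True),
    SentTask("wu_b", sent, received_by_host=False),  # the ghost
]
print(in_process_count(tasks))                # the server's view counts both tasks
print([t.name for t in ghosts(tasks)])        # only wu_b is a ghost
print([t.name for t in due_for_resend(tasks, sent + timedelta(days=6))])
```

In this model the ghost inflates the "in process" figure until the deadline passes, at which point it shows up in the resend list like any other timed-out task.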
ID: 54654 · Rating: 0
ServicEnginIC
Joined: 24 Sep 10
Posts: 593
Credit: 12,146,936,510
RAC: 4,406,248
Level
Trp
Message 54656 - Posted: 11 May 2020, 9:59:54 UTC - in response to Message 54653.  

The rate of decline seems to stabilize around 30/minute, so the supply will last for about 3 days from now.
(exactly 2 days 22 hours 39 minutes and 42.9 seconds)

What is coming next is a mystery...
ID: 54656 · Rating: 0
Keith Myers
Joined: 13 Dec 17
Posts: 1423
Credit: 9,188,446,190
RAC: 1,336,521
Level
Tyr
Message 54660 - Posted: 11 May 2020, 16:13:33 UTC - in response to Message 54652.  

What is the ghost recovery procedure on this project?
I've tried the way it works for SETI, but it didn't work here.
Luckily GPUGrid has a much shorter deadline than SETI, so it's not a big problem.

Thanks Zoltan, I tried my Seti ghost recovery protocol and it didn't work either.
I managed to pick up 10 ghosts and wanted to clear them.
Good thing the deadline here is so short compared to Seti.
ID: 54660 · Rating: 0
Pop Piasa
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Level
Gln
Message 54662 - Posted: 11 May 2020, 17:20:09 UTC

These ghost tasks seem to occur after the server runs out of disk space. Are they somehow related to that? 🤔
_____________________________

An unrelated item: Anybody else getting this error?

(unknown error) - exit code 195 (0xc3)</message>
<stderr_txt>
01:29:38 (6776): wrapper (7.9.26016): starting
01:29:38 (6776): wrapper: running acemd3.exe (--boinc input --device 0)
EXCEPTIONAL CONDITION: src\mdio\bincoord.c, line 193: "nelems != 1"
01:29:40 (6776): acemd3.exe exited; CPU time 0.015625
01:29:40 (6776): app exit status:


When you track these WUs, the error apparently signals that the WU itself is bad. After getting 6 of them, I'm curious what the bug might be. Bad code?
ID: 54662 · Rating: 0
Ian&Steve C.
Joined: 21 Feb 20
Posts: 1116
Credit: 40,876,970,595
RAC: 347,555
Level
Trp
Message 54663 - Posted: 11 May 2020, 17:21:14 UTC - in response to Message 54662.  

Yes, I saw a bunch of bad WUs. Checking the resends, they are all erroring out on different hosts as well.
ID: 54663 · Rating: 0
Keith Myers
Joined: 13 Dec 17
Posts: 1423
Credit: 9,188,446,190
RAC: 1,336,521
Level
Tyr
Message 54664 - Posted: 11 May 2020, 18:50:58 UTC

Looks like a lot of tasks lost their file references on the storage. Can't pull the correct data for the tasks.

<core_client_version>7.17.0</core_client_version>
<![CDATA[
<message>
ERROR: /home/user/conda/conda-bld/acemd3_1570536635323/work/src/mdsim/trajectory.cpp line 135: Simulation box has to be rectangular!
07:01:16 (1119448): acemd3 exited; CPU time 0.557061
07:01:16 (1119448): app exit status: 0x9e
07:01:16 (1119448): called boinc_finish(195)

ID: 54664 · Rating: 0
Richard Haselgrove
Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 2
Level
Trp
Message 54665 - Posted: 11 May 2020, 19:09:47 UTC - in response to Message 54664.  

I'm interpreting that message as "file is present, but contains bad contents".

On another aspect of the 'error task' problem: I'm using a very ancient predecessor of BoincTasks. It (and, I think, BoincTasks itself) retains the concept of "CPU efficiency", which was withdrawn from BOINC Manager several years ago.

What I'm seeing for Windows tasks is that the ACEMD worker app crashes seconds after launch, but the Wrapper app doesn't notice for some time, so BOINC sees the task as a whole as continuing to run. This shows up as a CPU efficiency of 0.0000 (helpfully colour coded) instead of the usual 96% - 97%, because no CPU time is being measured for the task as a whole.

That low efficiency warning prompts me to look at the workunit on the website, and see if there are any previous failures (the replication number is a good hint, as well). If it's a bad workunit, I can abort and move on with less wasted time overall.

It's a technique which some users might find helpful.
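
For those without such a tool, the same check can be approximated by hand. The Python sketch below assumes "CPU efficiency" simply means CPU time divided by elapsed (wall-clock) time, which matches how it is used above; the function names, the 120-second grace period and the 5% threshold are my own illustrative choices, not anything taken from BOINC or BoincTasks.

```python
def cpu_efficiency(cpu_seconds, elapsed_seconds):
    """CPU time as a fraction of wall-clock time (assumed definition)."""
    if elapsed_seconds <= 0:
        return 0.0
    return cpu_seconds / elapsed_seconds


def looks_stalled(cpu_seconds, elapsed_seconds, min_elapsed=120.0, threshold=0.05):
    """Flag a task that has been 'running' for a while but has used almost no CPU.

    min_elapsed and threshold are hypothetical values; tune them to taste.
    """
    if elapsed_seconds < min_elapsed:
        return False  # too early to judge a freshly started task
    return cpu_efficiency(cpu_seconds, elapsed_seconds) < threshold


# A worker that died after 0.015625 s of CPU time, still "running" 10 minutes later:
print(looks_stalled(0.015625, 600.0))
# A healthy task at roughly 96-97% efficiency:
print(looks_stalled(580.0, 600.0))
```

A flagged task is only a hint to go and look at the workunit's history on the website before deciding whether to abort it.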
ID: 54665 · Rating: 0
Aurum
Joined: 12 Jul 17
Posts: 404
Credit: 17,408,899,587
RAC: 0
Level
Trp
Message 54666 - Posted: 11 May 2020, 19:19:39 UTC - in response to Message 54656.  

The rate of decline seems to stabilize around 30/minute, so the supply will last for about 3 days from now.
(exactly 2 days 22 hours 39 minutes and 42.9 seconds)

What is coming next is a mystery...

And what we're finishing now is a complete and utter mystery as well.
ID: 54666 · Rating: 0
Pop Piasa
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Level
Gln
Message 54674 - Posted: 12 May 2020, 18:32:49 UTC - in response to Message 54666.  
Last modified: 12 May 2020, 18:34:20 UTC

And what we're finishing now is a complete and utter mystery as well


I've only been able to glean that it is a vigorous attempt at mapping the simulation environment which is meant to improve (or simplify?) future modeling methods.

If one of the admins would want to comment, we're all ears...
👂👂👂👂👂🦻👂😉
ID: 54674 · Rating: 0
ServicEnginIC
Joined: 24 Sep 10
Posts: 593
Credit: 12,146,936,510
RAC: 4,406,248
Level
Trp
Message 54675 - Posted: 12 May 2020, 19:05:46 UTC

New version of ACEMD: 73,631 Unsent tasks left

âŗī¸
ID: 54675 · Rating: 0
Retvari Zoltan
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 0
Level
Trp
Message 54686 - Posted: 14 May 2020, 9:49:10 UTC - in response to Message 54653.  

the present supply (171,016) will last for about 4 days from now.
The rate of decline seems to stabilize around 30/minute, so the supply will last for about 3 days from now.
(exactly 2 days 22 hours 39 minutes and 42.9 seconds)
3 days have passed; there are 11,806 workunits left, and this supply will last for another 6~7 hours.
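
The arithmetic behind these estimates is easy to reproduce. Here is a small Python sketch, assuming the decline stays at a constant 30 workunits per minute (the rate quoted above); the function names are just for illustration, and the "implied" count is only what the exact-looking estimate would correspond to if the rate were exactly 30 per minute.

```python
from datetime import timedelta


def time_remaining(unsent, per_minute):
    """How long a supply of `unsent` workunits lasts at a constant drain rate."""
    return timedelta(minutes=unsent / per_minute)


def implied_unsent(duration, per_minute):
    """The unsent count that a given time estimate corresponds to at that rate."""
    return duration.total_seconds() / 60 * per_minute


first = time_remaining(171_016, 30)   # the original "about 4 days" estimate
later = time_remaining(11_806, 30)    # the "another 6~7 hours" estimate
implied = implied_unsent(
    timedelta(days=2, hours=22, minutes=39, seconds=42.9), 30
)

print(first)                          # just under 4 days
print(later.total_seconds() / 3600)   # between 6 and 7 hours
print(round(implied))                 # roughly 127 thousand workunits
```

Both estimates in the thread are consistent with the same 30 per minute rate.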
ID: 54686 · Rating: 0
Aurum
Joined: 12 Jul 17
Posts: 404
Credit: 17,408,899,587
RAC: 0
Level
Trp
Message 54690 - Posted: 14 May 2020, 18:28:49 UTC

They're all gone, so what now?
ID: 54690 · Rating: 0
Ben
Joined: 28 Dec 14
Posts: 9
Credit: 149,574,556
RAC: 0
Level
Cys
Message 54691 - Posted: 14 May 2020, 18:47:42 UTC - in response to Message 54690.  
Last modified: 14 May 2020, 18:50:50 UTC

Our poor GPUs are starting to get hangry!! :)

And I was pushing so hard for the magic 100m milestone. :(
ID: 54691 · Rating: 0
ServicEnginIC
Joined: 24 Sep 10
Posts: 593
Credit: 12,146,936,510
RAC: 4,406,248
Level
Trp
Message 54693 - Posted: 14 May 2020, 20:07:15 UTC - in response to Message 54691.  

They're all gone, so what now?

I liked this expression:

...is a complete and utter mystery...

Familiar?
(I made a note of it for a moment like this)

Now that unsent tasks have reached zero and stuck there, the title of this thread makes full sense again: Unsent tasks decreasing much more slowly
(Unless negative values are permitted, who knows?)
ID: 54693 · Rating: 0
Retvari Zoltan
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 0
Level
Trp
Message 54696 - Posted: 14 May 2020, 21:19:32 UTC - in response to Message 54690.  
Last modified: 14 May 2020, 21:23:00 UTC

They're all gone, so what now?
It will take at least 5-10 days (or more) until all the workunits out in the field are finished (or timed out, and finished on another host).
I don't expect that another batch will be queued until then.
Exam period is coming, then the summer break is coming, so perhaps there won't be much work queued soon.
Unless Toni prepared some COVID-19 related work. Or perhaps we could help out the Acellera drug design people doing their job.
ID: 54696 · Rating: 0
Erich56
Joined: 1 Jan 15
Posts: 1168
Credit: 12,311,898,501
RAC: 271,810
Level
Trp
Message 54698 - Posted: 15 May 2020, 5:38:07 UTC

The difference between the tasks of the current series and all earlier ones is this:
before, tasks could still be downloaded once in a while as long as there were enough tasks "in process"; here that seems not to be the case.
Once the "unsent" queue is dry, no more tasks can be downloaded.
ID: 54698 · Rating: 0
Keith Myers
Joined: 13 Dec 17
Posts: 1423
Credit: 9,188,446,190
RAC: 1,336,521
Level
Tyr
Message 54699 - Posted: 15 May 2020, 6:56:25 UTC

I picked up 4 resends after the RTS buffer had hit zero today.
ID: 54699 · Rating: 0
Richard Haselgrove
Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 2
Level
Trp
Message 54700 - Posted: 15 May 2020, 7:16:57 UTC - in response to Message 54699.  

I picked up 4 resends after the RTS buffer had hit zero today.

Were they from the 'instant crashing' batch? I've had a few of those recently, though I haven't checked to see if I got any while I was asleep.
ID: 54700 · Rating: 0

©2026 Universitat Pompeu Fabra