Unsent tasks decreasing much more slowly
| Author | Message |
|---|---|
Retvari Zoltan · Joined: 20 Jan 09 · Posts: 2380 · Credit: 16,897,957,044 · RAC: 0
|
What is the ghost recovery procedure on this project? I've tried the way it works for SETI, but it didn't work here. Luckily GPUGrid has a much shorter deadline than SETI, so it's not a big problem. |
Retvari Zoltan · Joined: 20 Jan 09 · Posts: 2380 · Credit: 16,897,957,044 · RAC: 0
|
> the present supply (171,016) will last for about 4 days from now.

The rate of decline seems to stabilize around 30/minute, so the supply will last for about 3 days from now. (exactly 2 days 22 hours 39 minutes and 42.9 seconds) |
ServicEnginIC · Joined: 24 Sep 10 · Posts: 593 · Credit: 12,146,936,510 · RAC: 4,406,248
|
Ghost tasks exist on GPUGRID's server side. [Clarification] A "ghost task" is a task that the server counts as sent to a host but that, for whatever reason, the host never actually received. It doesn't interfere on the host side: BOINC Manager will not see these ghost tasks, and it will keep asking for new tasks until the task buffer is full or the maximum of "2 tasks per GPU" is reached. On the server side, ghost tasks are wrongly counted as "In process" tasks when they really are not. |
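To make the definition concrete, here is a minimal sketch of how ghosts could be spotted by comparing two lists. It assumes you have copied the task names the server shows as in progress for a host and the task names the BOINC client on that host actually holds. The function name and the task names are hypothetical, and nothing here talks to the project server.

```python
def find_ghost_tasks(server_in_progress, host_task_names):
    """Return task names the server counts as in progress for a host
    but that the host itself does not hold (the "ghosts")."""
    return sorted(set(server_in_progress) - set(host_task_names))


# Hypothetical task names, for illustration only.
server_view = ["task_a", "task_b", "task_c", "task_d"]
host_view = ["task_a", "task_c"]

ghosts = find_ghost_tasks(server_view, host_view)
print(ghosts)
```

Anything left over in the result is a task the server is still waiting for, even though the host never received it.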
ServicEnginIC · Joined: 24 Sep 10 · Posts: 593 · Credit: 12,146,936,510 · RAC: 4,406,248
|
> The rate of decline seems to stabilize around 30/minute, so the supply will last for about 3 days from now.

What is coming next is a mystery... |
|
Joined: 13 Dec 17 · Posts: 1423 · Credit: 9,188,446,190 · RAC: 1,336,521
|
> What is the ghost recovery procedure on this project? I've tried the way it works for SETI, but it didn't work here.

Thanks Zoltan, I tried my Seti ghost recovery protocol and it didn't work either. I managed to pick up 10 ghosts and wanted to clear them. Good thing the deadline here is so short compared to Seti. |
|
Joined: 8 Aug 19 · Posts: 252 · Credit: 458,054,251 · RAC: 0
|
These ghost tasks seem to occur after the server runs out of disk space. Are they somehow related to that?

_____________________________

An unrelated item: Anybody else getting this error?

(unknown error) - exit code 195 (0xc3)</message>

It apparently signals that the WU is bad when you track them. After getting 6 of them I'm curious what the bug might be. Bad code? |
|
Joined: 21 Feb 20 · Posts: 1116 · Credit: 40,876,970,595 · RAC: 347,555
|
Yes, I saw a bunch of bad WUs. Checking the resends, they are all erroring out on different hosts as well.
|
|
Joined: 13 Dec 17 · Posts: 1423 · Credit: 9,188,446,190 · RAC: 1,336,521
|
Looks like a lot of tasks lost their file references on the storage. Can't pull the correct data for the tasks.

<core_client_version>7.17.0</core_client_version>
<![CDATA[
<message>
ERROR: /home/user/conda/conda-bld/acemd3_1570536635323/work/src/mdsim/trajectory.cpp line 135: Simulation box has to be rectangular!
07:01:16 (1119448): acemd3 exited; CPU time 0.557061
07:01:16 (1119448): app exit status: 0x9e
07:01:16 (1119448): called boinc_finish(195) |
|
Joined: 11 Jul 09 · Posts: 1639 · Credit: 10,159,968,649 · RAC: 2
|
I'm interpreting that message as "file is present, but contains bad contents".

On another aspect of the 'error task' problem: I'm using a very ancient predecessor of BoincTasks. It (and, I think, BoincTasks itself) retains the concept of "CPU efficiency", which was withdrawn from BOINC Manager several years ago.

What I'm seeing for Windows tasks is that the ACEMD worker app crashes seconds after launch, but the Wrapper app doesn't notice for some time, so BOINC sees the task as a whole as still running. This shows up as a CPU efficiency of 0.0000 (helpfully colour coded) instead of the usual 96% - 97%, because no CPU time is being measured for the task as a whole.

That low efficiency warning prompts me to look at the workunit on the website and see if there are any previous failures (the replication number is a good hint as well). If it's a bad workunit, I can abort and move on with less wasted time overall. It's a technique which some users might find helpful. |
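The check described above can be sketched in a few lines. This assumes "CPU efficiency" means CPU time divided by elapsed wall-clock time. The function names, the 120-second grace period, and the 5% threshold are all made up for illustration and are not taken from BoincTasks or BOINC.

```python
def cpu_efficiency(cpu_seconds, elapsed_seconds):
    """CPU time as a fraction of elapsed wall-clock time (assumed definition)."""
    if elapsed_seconds <= 0:
        return 0.0
    return cpu_seconds / elapsed_seconds


def looks_stalled(cpu_seconds, elapsed_seconds, min_elapsed=120.0, threshold=0.05):
    """Flag a task whose worker has probably died while the wrapper keeps running.

    min_elapsed and threshold are arbitrary illustration values: very young
    tasks are ignored, and anything under 5% efficiency is treated as suspect.
    """
    if elapsed_seconds < min_elapsed:
        return False
    return cpu_efficiency(cpu_seconds, elapsed_seconds) < threshold


# A healthy task (about 97% efficiency) versus one that crashed at launch.
print(looks_stalled(580.0, 600.0))
print(looks_stalled(0.5, 600.0))
```

A flagged task is only a hint: the next step is still to check the workunit page for earlier failures before aborting.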
|
Joined: 12 Jul 17 · Posts: 404 · Credit: 17,408,899,587 · RAC: 0
|
> The rate of decline seems to stabilize around 30/minute, so the supply will last for about 3 days from now.

And what we're finishing now is a complete and utter mystery as well. |
|
Joined: 8 Aug 19 · Posts: 252 · Credit: 458,054,251 · RAC: 0
|
> And what we're finishing now is a complete and utter mystery as well

I've only been able to glean that it is a vigorous attempt at mapping the simulation environment which is meant to improve (or simplify?) future modeling methods. If one of the admins would want to comment, we're all ears... |
ServicEnginIC · Joined: 24 Sep 10 · Posts: 593 · Credit: 12,146,936,510 · RAC: 4,406,248
|
New version of ACEMD: 73,631 unsent tasks left ⏳ |
Retvari Zoltan · Joined: 20 Jan 09 · Posts: 2380 · Credit: 16,897,957,044 · RAC: 0
|
3 days have passed and there are 11,806 workunits left; this supply will last for another 6~7 hours.

> the present supply (171,016) will last for about 4 days from now.
>
> The rate of decline seems to stabilize around 30/minute, so the supply will last for about 3 days from now. |
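The estimate above is just the unsent count divided by the rate of decline. A minimal sketch, assuming the rate stays constant at the 30 tasks per minute quoted in the thread (the function name is made up for illustration):

```python
from datetime import timedelta


def time_until_empty(unsent_tasks, tasks_per_minute):
    """Estimate how long the unsent queue lasts at a constant rate of decline."""
    if tasks_per_minute <= 0:
        raise ValueError("tasks_per_minute must be positive")
    return timedelta(minutes=unsent_tasks / tasks_per_minute)


# Figures from the post: 11,806 workunits left, declining at about 30/minute.
remaining = time_until_empty(11_806, 30)
hours_left = remaining.total_seconds() / 3600
print(f"{hours_left:.1f} hours left")
```

With those figures the result falls inside the 6~7 hour window given in the post. A changing rate (for example, hosts dropping out as the queue empties) would make the real figure drift from this estimate.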
|
Joined: 12 Jul 17 · Posts: 404 · Credit: 17,408,899,587 · RAC: 0
|
They're all gone, so what now? |
|
Joined: 28 Dec 14 · Posts: 9 · Credit: 149,574,556 · RAC: 0
|
Our poor GPUs start getting hangry!! :) And I was pushing so hard for the magic 100m milestone. :( |
ServicEnginIC · Joined: 24 Sep 10 · Posts: 593 · Credit: 12,146,936,510 · RAC: 4,406,248
|
> They're all gone, so what now?

I liked this expression: "...is a complete and utter mystery..." Familiar? (I took note of it for a moment like this.) Now that unsent tasks have reached zero and stuck there, the topic of this thread makes full sense again: "Unsent tasks decreasing much more slowly" (unless negative values are permitted, who knows?) |
Retvari Zoltan · Joined: 20 Jan 09 · Posts: 2380 · Credit: 16,897,957,044 · RAC: 0
|
> They're all gone, so what now?

It will take at least 5-10 days (or more) until all the workunits out in the field are finished (or timed out, and finished on another host). I don't expect that another batch will be queued until then. Exam period is coming, then the summer break, so perhaps there won't be much work queued soon. Unless Toni prepared some COVID-19 related work. Or perhaps we could help out the Acellera drug design people doing their job. |
|
Joined: 1 Jan 15 · Posts: 1168 · Credit: 12,311,898,501 · RAC: 271,810
|
The difference between the tasks of the current series and all those before is this: previously, tasks could still be downloaded once in a while as long as there were enough tasks "in process". Here this seems not to be the case: once the "unsent" queue is dry, no more tasks can be downloaded. |
|
Joined: 13 Dec 17 · Posts: 1423 · Credit: 9,188,446,190 · RAC: 1,336,521
|
I picked up 4 resends after the RTS buffer had hit zero today. |
|
Joined: 11 Jul 09 · Posts: 1639 · Credit: 10,159,968,649 · RAC: 2
|
> I picked up 4 resends after the RTS buffer had hit zero today.

Were they from the 'instant crashing' batch? I've had a few of those recently, though I haven't checked to see if I got any while I was asleep. |
©2026 Universitat Pompeu Fabra