ATM
Joined: 13 Dec 17 · Posts: 1416 · Credit: 9,119,446,190 · RAC: 678,713
This task, PTP1B_23471_23468_2_2A-QUICO_TEST_ATM-0-1-RND8957_1, is currently doing the same on this host. It's been at 100% complete for at least an hour now. I know to just leave these alone; they eventually finish and report as validated.
Joined: 28 Mar 09 · Posts: 490 · Credit: 11,731,645,728 · RAC: 52,725
This task reached "100% complete" in about 7 hours, then ran for more than 7 additional hours before actually finishing: https://www.gpugrid.net/workunit.php?wuid=27442023

Anybody got that beat??????
Joined: 11 Jul 09 · Posts: 1639 · Credit: 10,159,968,649 · RAC: 326,008
> Anybody got that beat??????

The task I reported in Message 60213 (14:55 yesterday) is still running. It was approaching 100% when I went to bed last night, and it's still there this morning. I'll go and check it out after coffee (I can't see the sample numbers remotely). As soon as I wrote that, it uploaded and reported! Ah well, my other Linux machine has one in the same state.
Joined: 27 Jul 11 · Posts: 138 · Credit: 539,953,398 · RAC: 0
> None of my WUs from yesterday completed. Please issue a server abort and eliminate all these defective WUs before releasing a new set. Otherwise the defective ones will keep wasting 8 computers' time for days to come.

Just woke up. The task had finished, and I sent it home: task 27441741
Joined: 11 Jul 09 · Posts: 1639 · Credit: 10,159,968,649 · RAC: 326,008
OK, it's the same story as yesterday. This task: PTP1B_23486_23479_4_2A-QUICO_TEST_ATM-0-1-RND5081_2 was downloaded at 15:26:54 UTC yesterday and started running at about 16:30 UTC. As before, the run.log shows an earlier run with MAX_SAMPLES: 114, with timings that don't match my machine. The 16:30 run has MAX_SAMPLES: 341 and starts at sample 115. The machine downloaded a new task at 3:50:47 UTC. That normally happens at around 85-90% progress, with about an hour left to run, but the current task is still only at sample 308, so maybe three hours to go. And it's another PTP1B_new_ resend, so we may have to go round the cycle again.
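The "maybe three hours to go" figure is just samples remaining times time per sample. A minimal sketch of that estimate, assuming a constant time per sample and treating the current sample as the last one completed; the 5.5-minute rate is an illustrative assumption, not a number read from run.log, and the function name is hypothetical:

```python
def remaining_hours(last_sample: int, max_samples: int, minutes_per_sample: float) -> float:
    """Rough time left: samples still to run times an assumed constant time per sample."""
    samples_left = max(max_samples - last_sample, 0)  # never negative once past the end
    return samples_left * minutes_per_sample / 60.0


# Figures from the post: sample 308 of MAX_SAMPLES 341.
# 5.5 minutes per sample is an assumed rate, not a measured one.
print(f"{remaining_hours(308, 341, 5.5):.1f} hours")  # 3.0 hours
```

In practice the per-sample time would come from the timestamps in your own run.log, since it varies by GPU.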
Joined: 28 Feb 23 · Posts: 35 · Credit: 0 · RAC: 0
> OK, it's the same story as yesterday. This task:

I believe it's what I imagined. When I was dividing runs manually, I split some of them into 2 or 3 steps: 114 - 228 - 341 samples. If the job ID contains 2A/3A, the task is most probably starting from a previous checkpoint, and the progress report gets confused by that. I'll pass this on to Raimondas to see if he can take a look. Our first priority is to have these job divisions done automatically, as ACEMD does; that way we can avoid these really long jobs for everyone. Doing this manually makes it really hard to track all the jobs and the resends, so I hope things go more smoothly over the next few days.
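The 114 - 228 - 341 split is what you get by giving each of three segments ceil(341 / 3) = 114 samples and cutting the last one off at the total. A minimal sketch under that assumption; the rule and the function name are illustrative, not the project's actual splitting script:

```python
import math


def segment_boundaries(total_samples: int, n_segments: int) -> list[int]:
    """Cumulative sample count at the end of each segment.

    Assumed rule: each segment gets ceil(total / n) samples,
    and the last segment is cut off at the total.
    """
    per_segment = math.ceil(total_samples / n_segments)
    return [min(per_segment * k, total_samples) for k in range(1, n_segments + 1)]


bounds = segment_boundaries(341, 3)
print(bounds)  # [114, 228, 341]
# A task restarting from the first checkpoint begins at the next sample.
print(bounds[0] + 1)  # 115
```

Under this rule, a task resuming from the 114-sample checkpoint starts at sample 115, which matches the run.log figures reported in the previous post.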
Joined: 11 Jul 09 · Posts: 1639 · Credit: 10,159,968,649 · RAC: 326,008
Thanks. Now I know what I'm looking for (and when), I was able to watch the next transition. Task PTP1B_new_20669_2qbr_23472_T1_2A-QUICO_TEST_ATM-0-1-RND5753_3 started with a couple of 0.1% initial steps (as usual), but then jumped to 50.983%. It then moved on by 0.441% every five minutes or so. The run.log shows the same figures as before: a pre-existing run of 114 samples, then the real work starts with sample 115 and should proceed to a MAX_SAMPLES of 341. The progress jumps match the completion of samples 115-120. The percentage intervals match the formula in Emilio Gallicchio's post 60160 (115/(341-114)), but I can't see where the initial big value of 50.983 comes from. Also, I don't follow the logic of the resend explanation. Mine is replication _3, so there have been 3 previous attempts, but none of them got beyond the program setup stages: all failed in less than 100 seconds. So who did the first 114 samples?
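The replication number can be read straight off the task name. A sketch, relying on the usual BOINC convention that a task's name is its workunit's name plus "_<n>", with n counting the copies issued from 0; the "2A" segment marker inside the workunit name is project-specific and is not parsed here. The function name is hypothetical:

```python
import re

# Assumed BOINC convention: task name = workunit name + "_<n>", n counting copies from 0.
TASK_NAME = re.compile(r"^(?P<wu>.+)_(?P<replica>\d+)$")


def split_task_name(task_name: str) -> tuple[str, int]:
    """Return (workunit name, replication index) for a task name."""
    match = TASK_NAME.match(task_name)
    if match is None:
        raise ValueError(f"unexpected task name: {task_name!r}")
    return match.group("wu"), int(match.group("replica"))


wu, replica = split_task_name("PTP1B_new_20669_2qbr_23472_T1_2A-QUICO_TEST_ATM-0-1-RND5753_3")
print(wu)       # PTP1B_new_20669_2qbr_23472_T1_2A-QUICO_TEST_ATM-0-1-RND5753
print(replica)  # 3, so copies _0, _1 and _2 were issued earlier
```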
Joined: 21 Feb 20 · Posts: 1114 · Credit: 40,838,348,595 · RAC: 4,765,598
> The percentage intervals match the formula in Emilio Gallicchio's post 60160 (115/(341-114)), but I can't see where the initial big value of 50.983 comes from.

115/(341-114) = 0.5066 = 50.66%. Strikingly close. Maybe some form of rounding in the "BOINC logic", but it's pretty clear that the 50% value comes from this calculation.
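The arithmetic can be checked directly. A sketch, assuming (as a reconstruction from the numbers in this thread, not the ATM app's actual code) that the reported progress is the absolute sample number divided by the number of samples this task has to run, rather than the samples this task has completed divided by that count. Both function names are hypothetical:

```python
def reported_fraction(sample: int, first_sample: int, max_samples: int) -> float:
    """Hypothetical reconstruction: absolute sample index / samples in this task."""
    samples_in_task = max_samples - (first_sample - 1)  # 341 - 114 = 227
    return sample / samples_in_task


def expected_fraction(sample: int, first_sample: int, max_samples: int) -> float:
    """Share of this task's own samples that are done."""
    samples_in_task = max_samples - (first_sample - 1)
    return (sample - first_sample + 1) / samples_in_task


# Task resumed from a 114-sample checkpoint and runs samples 115..341.
first, last = 115, 341
print(f"{reported_fraction(115, first, last):.3%}")  # 50.661%
step = reported_fraction(116, first, last) - reported_fraction(115, first, last)
print(f"{step:.3%}")  # 0.441%
print(f"{expected_fraction(115, first, last):.3%}")  # 0.441%
```

Under this assumption each sample adds 1/227, about 0.441%, which matches the observed steps, while the first jump lands near 50.66% instead of near zero. The remaining gap to the observed 50.983% is not explained by this sketch.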
Joined: 11 Jul 09 · Posts: 1639 · Credit: 10,159,968,649 · RAC: 326,008
I thought I'd checked that and got a different answer, but my mouse must have slipped on the calculator buttons. The difference is probably the 0.2% from the program setup stages; it'll do. Thanks.
Joined: 11 Jul 09 · Posts: 1639 · Credit: 10,159,968,649 · RAC: 326,008
After all that, it failed after 3 hours 20 minutes with a 'ValueError: Energy is NaN' error. Never mind, I tried.
Joined: 18 Jul 13 · Posts: 79 · Credit: 210,528,292 · RAC: 163
Running C:/Windows/system32/cmd.exe creates a c:\users\frolo\.exe\ folder. On subsequent runs it gives the error "A subdirectory or file .exe already exists."

C:/Windows/system32/cmd.exe /c call test.bat outputs: "The syntax of the command is incorrect."

C:\Windows\system32\cmd.exe /c call test.bat outputs: "'test.bat' is not recognized as an internal or external command, operable program or batch file."
Joined: 28 Feb 23 · Posts: 35 · Credit: 0 · RAC: 0
> Thanks. Now I know what I'm looking for (and when), I was able to watch the next transition.

The first 114 samples should have been calculated by: T_PTP1B_new_20669_2qbr_23472_1A_3-QUICO_TEST_ATM-0-1-RND2542_0.tar.bz2

I've been doing all the division and resends manually, and we've been simplifying the naming convention for my sake. Now we are testing a multiple_steps protocol, just like in ACEMD, which should ease things and, I hope, interfere less with the progress reporter.
Joined: 11 Jul 09 · Posts: 1639 · Credit: 10,159,968,649 · RAC: 326,008
Thanks. Be aware that out here in client-land we can only locate jobs by WU or task ID numbers; it's extremely difficult to find a task by name unless we can follow an ID chain. Newer versions of the BOINC website tools do provide a rudimentary 'search by name' facility, but it requires the full task name: no wildcards or partial matches. And I know your colleagues on this project are very wary about updating the server code. We'll just have to live with it.
Joined: 28 Feb 23 · Posts: 35 · Credit: 0 · RAC: 0
Yeah, I'm sorry about that. I'm trying to learn as I go. I'll be sending (and have already sent) some runs through the ATMbeta app. We tested the multiple_steps code and it seems to work fine. If everything runs smoothly, every task should be a 70-sample run (~13 ns), which should be much shorter for everyone and avoid the drag of the 24h+ runs.
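With fixed-size steps, each task runs only one contiguous block of samples. A minimal sketch, assuming 70 samples per step and, purely for illustration, the 341-sample total of the earlier runs (the total for the ATMbeta runs is not stated here); the function name is hypothetical:

```python
import math


def step_ranges(total_samples: int, samples_per_step: int) -> list[tuple[int, int]]:
    """Split a run into consecutive (first_sample, last_sample) steps, 1-based."""
    n_steps = math.ceil(total_samples / samples_per_step)
    return [
        (k * samples_per_step + 1, min((k + 1) * samples_per_step, total_samples))
        for k in range(n_steps)
    ]


ranges = step_ranges(341, 70)
print(len(ranges))  # 5
print(ranges)  # [(1, 70), (71, 140), (141, 210), (211, 280), (281, 341)]
# About 13 ns per 70-sample step is roughly 0.19 ns of simulation per sample.
print(f"{13 / 70:.2f} ns/sample")  # 0.19 ns/sample
```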
Joined: 11 Jul 09 · Posts: 1639 · Credit: 10,159,968,649 · RAC: 326,008
Two downloaded; the first has reached 6% with no problems.
Joined: 27 Jul 11 · Posts: 138 · Credit: 539,953,398 · RAC: 0
> Yeah, I'm sorry about that. I'm trying to learn as I go.

It's the unstable tasks, restart problems and suspend problems. Quite a few of us have done year-plus runs on Climate; 24-hour runs are no problem.
Joined: 1 Jan 15 · Posts: 1166 · Credit: 12,260,898,501 · RAC: 960
deleted
Joined: 13 Dec 17 · Posts: 1416 · Credit: 9,119,446,190 · RAC: 678,713
I believe I just finished one of these ATMbeta tasks: https://www.gpugrid.net/result.php?resultid=33393179 It never checkpointed, but it did show correct estimates of the time to finish, and the progress was correct and incremented properly.
Joined: 12 Jul 17 · Posts: 404 · Credit: 17,408,899,587 · RAC: 2
> I believe I just finished one of these ATMbeta tasks.

Same for me with Linux. Since there's no checkpointing, I didn't bother to test suspending. I think all the Windows WUs failed.
Joined: 11 Jul 09 · Posts: 1639 · Credit: 10,159,968,649 · RAC: 326,008
My current two ATM betas both have MAX_SAMPLES: +70, but one started at 71 and the other at 141. Both are displaying 100% progress. I watched one jump to 100% after roughly enough time to load the program and complete one sample; I would expect the other to finish within half an hour (it's on sample 205). Edit: yes, it did. I see you've put step information in the task names. These were:

PTP1B_20669_2qbr_23466_2-QUICO_ATM_OFF_STEPS-1-5-RND8189_0

PTP1B_23467_23475_4-QUICO_ATM_OFF_STEPS-2-5-RND5806_0
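To group or sort tasks by step, the step field can be pulled out of names like the two above. A sketch; reading "STEPS-<i>-<n>" as step index i of n steps, with a 0-based index and 70 samples per step, is an assumption, not a documented format. Under that reading, indices 1 and 2 would start at samples 71 and 141, which fits the two starting samples reported in this post. The function names are hypothetical:

```python
import re
from typing import Optional

# Assumed reading of "..._STEPS-<i>-<n>-RND...": step index i out of n steps.
STEP_FIELD = re.compile(r"_STEPS-(\d+)-(\d+)-RND")


def step_info(task_name: str) -> Optional[tuple[int, int]]:
    """Return (i, n) from the STEPS field, or None if the name has none."""
    match = STEP_FIELD.search(task_name)
    return (int(match.group(1)), int(match.group(2))) if match else None


def first_sample(step_index: int, samples_per_step: int = 70) -> int:
    """First sample of a step, assuming a 0-based index and fixed-size steps."""
    return step_index * samples_per_step + 1


for name in (
    "PTP1B_20669_2qbr_23466_2-QUICO_ATM_OFF_STEPS-1-5-RND8189_0",
    "PTP1B_23467_23475_4-QUICO_ATM_OFF_STEPS-2-5-RND5806_0",
):
    i, n = step_info(name)
    print(f"index {i} of {n} steps -> first sample {first_sample(i)}")
```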
©2025 Universitat Pompeu Fabra