ATM
Author | Message |
---|---|
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 869 |
I, too, had such an error after the task had run for 7.885 seconds: this time, the task errored out after 16.400 seconds :-( https://www.gpugrid.net/result.php?resultid=33442242 |
Joined: 12 Jul 17 Posts: 404 Credit: 17,408,899,587 RAC: 2 |
It feels like there are at least four categories of ATMbeta WUs running simultaneously. None have checkpointing. Top priority should be to make checkpointing work. The shotgun approach squanders a lot of compute time. |
Joined: 27 Jul 11 Posts: 138 Credit: 539,953,398 RAC: 0 |
My nation, like many others, has gone into a default situation. The most expensive item is the supply of electricity, and they are frequently switching off the grid without informing us. David H says the WUs are checkpointing. If they are checkpointing, then why are they not recovering? Well, recovering or not, I cannot do a thing about the electric grid. So, best of luck to the WUs and, as it is Ramadan, I have nothing left in the upper chamber to argue. Over and out. |
Joined: 12 Jul 17 Posts: 404 Credit: 17,408,899,587 RAC: 2 |
Still no checkpointing. Suspend then unsuspend = crash. Many WUs failing due to subprocess. |
Joined: 27 Jul 11 Posts: 138 Credit: 539,953,398 RAC: 0 |
If there is a storm and the electricity goes, the WU crashes. I know that BOINCers do not restart for months on end, but I have to restart, and the WU crashes. If the GPU updates or the system updates, the WU crashes. If the cat plays with the keyboard, the WU crashes. I do not want catty remarks, but I will keep crashing them from now on. Who cares. |
Joined: 26 Dec 13 Posts: 86 Credit: 1,292,358,731 RAC: 0 |
Who cares. No, it's not about who cares. This is about which of the project employees has the knowledge and resources to implement the necessary functionality, and which of them has the time for it. And as you should understand, they don't make these decisions on their own; it's not a hobby. The necessary specialists may currently be assigned to other, higher-priority projects for the institute, and neither we nor the employees themselves can influence this. Deal with it. No number of tearful posts about the problem will change anything, no matter how much someone would like it to. Unless, of course, the goal is once again just to let off steam somewhere out of indignation. |
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 295,172 |
Task TYK2_m44_m55_5_FIX-QUICO_ATM_Sage_xTB-0-5-RND2847_0 (today): FileNotFoundError: [Errno 2] No such file or directory: 'TYK2_m44_m55_0.xml' Later - CDK2_miu_m26_4-QUICO_ATM_Sage_xTB-0-5-RND8419_0 running OK. |
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 295,172 |
And a similar batch configuration error with today's BACE run, like BACE_m24_m7e_5-QUICO_ATM_Sage_xTB-0-5-RND7993_0 08:05:32 (386384): wrapper: running bin/bash (run.sh) (five so far) Edit - now wasted 20 of the things, and switched to Python to avoid quota errors. I should have dropped in to give you a hand when passing through Barcelona at the weekend! |
Joined: 27 Jul 11 Posts: 138 Credit: 539,953,398 RAC: 0 |
I cannot share resources between ATMbeta and other projects: when an ATMbeta task is stopped so that other projects can run, it ends up with an error. |
Joined: 26 Dec 13 Posts: 86 Credit: 1,292,358,731 RAC: 0 |
And a similar batch configuration error with today's BACE run, like Same for Win apps: https://www.gpugrid.net/result.php?resultid=33475629 https://www.gpugrid.net/results.php?userid=101590 Sad : / |
Joined: 21 Feb 20 Posts: 1114 Credit: 40,838,348,595 RAC: 4,765,598 |
I cannot resource share ATMBeta with other projects because it is stopped to run other projects. Ends up with an error. Set all other GPU projects to a resource share of 0; then they won't run at all when you have ATM work. |
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 869 |
Many of the recent ATMs errored out after not even a minute; stderr says: wrapper: running C:/Windows/system32/cmd.exe (/c call run.bat) Der Befehl "run.bat" ist entweder falsch geschrieben oder konnte nicht gefunden werden. In English: the command "run.bat" is either misspelled or could not be found. What's up? |
Joined: 13 Dec 17 Posts: 1416 Credit: 9,119,446,190 RAC: 614,515 |
The equivalent error occurs on Linux for a great many tasks: bin/bash: run.sh: No such file or directory BACE_m7g_m7c_3-QUICO_ATM_Sage_xTB-0-5-RND8127_3 |
Joined: 27 Jul 11 Posts: 138 Credit: 539,953,398 RAC: 0 |
Got a collection of twenty-one errored tasks. Suspended work fetch on that computer. The other is busy with Abous WU. |
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 295,172 |
Now these are doing it as well: MCL1_m28_m47_1_FIX-QUICO_ATM_Sage_xTB-0-5-RND0954_0 18:09:56 (394275): wrapper: running bin/bash (run.sh) The experimenters and/or staff have got to get a grip on this - you are wasting everybody's time and electricity. BOINC is very unforgiving: you have to get it 100% exact, all at the same time, every time. It's worth you taking a pause after each new batch is prepared, and then going back and proof-reading the configuration. Five minutes spent checking would probably have meant getting some real research results over the weekend: now, nothing will probably work until Monday (and I'm not holding my breath then, either). |
Joined: 13 Apr 15 Posts: 11 Credit: 3,003,712,606 RAC: 2,164,606 |
Now these are doing it as well: MCL1_m28_m47_1_FIX-QUICO_ATM_Sage_xTB-0-5-RND0954_0 Exactly! When you have more Tasks that Error (277) than Valid (240) ... that is pretty damn sad! |
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 869 |
The experimenters and/or staff have got to get a grip on this - you are wasting everybody's time and electricity. + 1 |
Joined: 27 Jul 11 Posts: 138 Credit: 539,953,398 RAC: 0 |
Got a collection of twenty-one errored tasks. Suspended work fetch on that computer. The other is busy with Abous WU. ___________ Abous, WU finished and I got one ATMBeta. It lasted all of one minute and three seconds. Suspended work fetch on this computer also. Validated two ATMBeta, error twenty-two. |
Joined: 27 Jul 11 Posts: 138 Credit: 539,953,398 RAC: 0 |
Maybe someone can answer a question I have. After running ATMBeta, Einstein starts but reports that the GPU is missing. How does this happen? |
Joined: 21 Feb 20 Posts: 1114 Credit: 40,838,348,595 RAC: 4,765,598 |
ATMbeta likely has nothing to do with it. But ATMbeta uses CUDA, while Einstein uses OpenCL. Does BOINC still report OpenCL support in the startup log? You might need to reinstall your drivers. |
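
Several posts in this thread report tasks failing within a minute because a file named in the batch configuration (`run.sh`, `run.bat`, or an input `.xml`) was not present. The kind of five-minute proof-read suggested above could look roughly like the sketch below. It is only an illustration: the function name `missing_batch_files` and the idea of a local per-task directory are assumptions, not the project's actual batch tooling.

```python
from pathlib import Path


def missing_batch_files(task_dir, required_names):
    """Return the names in required_names that are not regular files in task_dir.

    Hypothetical helper for illustration; it is not part of BOINC or GPUGRID.
    """
    root = Path(task_dir)
    return [name for name in required_names if not (root / name).is_file()]


if __name__ == "__main__":
    import tempfile

    # Assumed layout: one directory per task holding the scripts and inputs
    # that the wrapper will be asked to run.
    with tempfile.TemporaryDirectory() as tmp:
        # Only the input XML exists; both launcher scripts are absent.
        (Path(tmp) / "TYK2_m44_m55_0.xml").write_text("<placeholder/>")
        required = ["run.sh", "run.bat", "TYK2_m44_m55_0.xml"]
        print("missing:", missing_batch_files(tmp, required))
```

Running a check like this before releasing a batch would flag a missing launcher script or input file before any volunteer host downloads the task.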
©2025 Universitat Pompeu Fabra