Message boards : News : ATM
| Author | Message |
|---|---|
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 4,772
|
Sorry for missing out for a while. We were testing ATM in a setup not available for GPUGRID. But we're back to crunching :)

Several long-standing and well-discussed issues are still unresolved with these tasks. In decreasing priority:

1. Task checkpointing still does not work properly. It may be writing to the checkpoint file, but it never resumes from the checkpoint. Pausing or suspending work units for any reason causes them to error out when they attempt to resume. This is an issue for anyone who runs multiple projects (BOINC will occasionally pause in-progress units to crunch other projects) or needs to shut down their computer for updates or whatever.
2. Runtime progress reporting ONLY works for the first batch of "0-5" labelled tasks. Anything from "1-5" through "4-5" does not work properly: they jump immediately to 100% and stay there until complete. This makes it hard to know how long they will run.
3. The estimated flops setting on these tasks is probably way too high, leading to crazy high runtime estimates. This could cause indirect issues with the BOINC client either not fetching work properly or not managing other projects properly.
4. Many batches are occasionally being sent out malformed, leading to errors. Most seem to be due to incorrect formatting or naming. Stuff like this: "+ tar cjvf restart.tar.bz2 'r*/*.xml' tar: r*/*.xml: Cannot stat: No such file or directory"

These are things I've seen constant complaints about every time these tasks come back. I would highly recommend that you guys attach a computer to the project like a normal user so that you can experience them first hand and properly troubleshoot them.
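The tar message quoted in item 4 suggests the pattern `r*/*.xml` reached tar unexpanded, which happens when the pattern is quoted or when nothing matches it. A minimal sketch of guarding against that case, assuming the archiving step could be done from Python; the function name `archive_restart_files` and the directory layout are hypothetical and not taken from the ATM application:

```python
import glob
import os
import tarfile


def archive_restart_files(workdir, archive_name="restart.tar.bz2", pattern="r*/*.xml"):
    """Archive files matching pattern under workdir; return the paths archived.

    Returns an empty list (and writes no archive) when nothing matches,
    instead of failing the way tar does on an unexpanded pattern.
    """
    matches = sorted(glob.glob(os.path.join(workdir, pattern)))
    if not matches:
        return []
    archive_path = os.path.join(workdir, archive_name)
    with tarfile.open(archive_path, "w:bz2") as tar:
        for path in matches:
            # Store paths relative to workdir, like running tar inside it.
            tar.add(path, arcname=os.path.relpath(path, workdir))
    return [os.path.relpath(p, workdir) for p in matches]
```

Whether a missing set of XML files should be skipped quietly or reported as a malformed batch is a project decision; the sketch only shows where the check would sit.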
|
|
Joined: 13 Apr 15 Posts: 11 Credit: 3,003,712,606 RAC: 2,331
|
Yes, the constant WUs throwing errors! Luckily most at the beginning, but some run for quite some time before erroring out. More Errors than Valid is a huge waste of resources and time for everyone. Ian&Steve says it well! |
|
Joined: 27 Jul 11 Posts: 138 Credit: 539,953,398 RAC: 0
|
Sorry for missing out for a while. We were testing ATM in a setup not available for GPUGRID. But we're back to crunching :)
________
Could you please make these tasks able to suspend? It is monsoon season in my part of the world, and every time it rains there is a power outage. Even though in Preferences I have set it to keep WUs in memory while on batteries, every time the power goes the WU ends up with an error. Now 100% of the errored WUs at my end are due to this reason. |
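What is being asked for here is the usual write-then-resume pattern. A minimal sketch, assuming a toy step loop and a JSON checkpoint file; `run_with_checkpoint` and its parameters are hypothetical and not part of the ATM application, and a real simulation would save its full state rather than a counter:

```python
import json
import os


def run_with_checkpoint(total_steps, checkpoint_path, stop_after=None):
    """Run a toy step loop that can be interrupted and resumed.

    stop_after simulates a suspend or power cut after that many steps
    in this call. Returns the last completed step number.
    """
    step = 0
    if os.path.exists(checkpoint_path):
        # Resume from saved state instead of starting over or erroring out.
        with open(checkpoint_path) as fh:
            step = json.load(fh)["step"]

    done_this_call = 0
    while step < total_steps:
        if stop_after is not None and done_this_call >= stop_after:
            break
        step += 1  # placeholder for one unit of real work
        done_this_call += 1
        # Write to a temp file and rename, so an interruption mid-write
        # never leaves a half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as fh:
            json.dump({"step": step}, fh)
        os.replace(tmp_path, checkpoint_path)
    return step
```

Calling the function once with `stop_after` set and then again without it picks up from the saved step, which is the behaviour a power cut or a BOINC suspend would need.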
|
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 1
|
Yes, the constant WUs throwing errors! Luckily most at the beginning, but some run for quite some time before erroring out. More Errors than Valid is a huge waste of resources and time for everyone.

Mentioning the "waste of resources": "ValueError: Energy is NaN." has happened again quite a lot in the recent past, mostly after between 1.5 and 2 hours of runtime. Given that electricity cost has tripled here since last year, such waste has become quite expensive :-( |
|
Joined: 13 Apr 15 Posts: 11 Credit: 3,003,712,606 RAC: 2,331
|
Another new batch...same old Errors. Does Krembil have anything to do with this project? lololol |
|
Joined: 27 Jul 11 Posts: 138 Credit: 539,953,398 RAC: 0
|
Valid 17, error 26. I know it makes no difference; plenty of computers are standing by and it will get done. |
|
Joined: 11 May 10 Posts: 68 Credit: 12,293,501,875 RAC: 3,114
|
AFAIK it should run anywhere, maybe the issue is more driver related? We recently tested on 40 series GPUs locally and it ran fine, since I saw some comments in the thread.

My driver for the RTX 4080 under Win11 is 536.23. All units error out after about 40 seconds. I do not see this on a 2070S nor on a 3070 Laptop. |
|
Joined: 13 Dec 17 Posts: 1419 Credit: 9,119,446,190 RAC: 662
|
AFAIK it should run anywhere, maybe the issue is more driver related? We recently tested on 40 series GPUs locally and it ran fine, since I saw some comments in the thread.

It would be helpful if you unhid your computers so we could examine the output files to get a clue on why the tasks are failing on your 40 series card. |
|
Joined: 11 May 10 Posts: 68 Credit: 12,293,501,875 RAC: 3,114
|
It would be helpful if you unhid your computers so we could examine the output files to get a clue on why the tasks are failing on your 40 series card.

Done. I just ran 2 fresh WUs that errored out as usual. Thank you very much. |
|
Joined: 13 Dec 17 Posts: 1419 Credit: 9,119,446,190 RAC: 662
|
Wasn't helpful. You don't have any result output at all. The tasks never even get to start the setup process. They just exit immediately. Quico needs to reexamine his statement that the 40 series cards are working OK on the ATMbeta tasks.

[Edit] I would reset the project to start with, in the hope that the task and app packages get downloaded again. Maybe the necessary Python environment never got set up correctly initially. |
|
Joined: 11 May 10 Posts: 68 Credit: 12,293,501,875 RAC: 3,114
|
Project reset and tried two new WUs. Same result - error after a few seconds. |
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 4,772
|
Quico needs to reexamine his statement that the 40 series cards are working OK on the ATMbeta tasks.

They do. Look at the leaderboard. Many 40-series hosts are returning valid work from both Linux and Windows.
|
|
Joined: 13 Dec 17 Posts: 1419 Credit: 9,119,446,190 RAC: 662
|
OK, so 40 series works fine for both Windows and Linux. So what would you recommend for this volunteer to do for troubleshooting when tasks don't report any useful information? The most logical step of project reset was not fruitful. |
|
Joined: 13 Dec 17 Posts: 1419 Credit: 9,119,446,190 RAC: 662
|
Does the 4080 run other projects' GPU tasks without errors? |
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 4,772
|
Maybe a problem with BOINC itself. Might try a different BOINC version.
|
|
Joined: 11 May 10 Posts: 68 Credit: 12,293,501,875 RAC: 3,114
|
Does the 4080 run other projects' GPU tasks without errors?

Yes, it does without errors: PrimeGrid, SRBase, Einstein and WCG OPNG. BOINC was updated to 7.22.2 within this Beta phase - same result. |
|
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 1
|
Another new batch...same old Errors.

Forget Krembil - it's down most of the time. Too bad what happened to WCG :-( |
|
Joined: 27 Jul 11 Posts: 138 Credit: 539,953,398 RAC: 0
|
It does not make any difference to Quico. The task will be completed on one or another computer and his science is done. It is our very expensive energy that is wasted but, as Quico himself said, the science gets done, so who cares about wasted energy? |
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 4,772
|
You also have the option to crunch something else if your time is wasted here.
|
|
Joined: 27 Jul 11 Posts: 138 Credit: 539,953,398 RAC: 0
|
You also have the option to crunch something else if your time is wasted here.

I wish you would put a dirty sock where required. In Asia, the transmission of power is through overhead lines. They run red hot and expand in our heat. Many people used to die due to electrocution, so they switch off the grid. If the WUs cannot handle a suspension then there is no need for catty, useless remarks. You also have the option of not running off with your writing skills. |
©2025 Universitat Pompeu Fabra