acemdshort application 8.15 - discussion
Author | Message |
---|---|
Joined: 12 Nov 07 · Posts: 696 · Credit: 27,266,655 · RAC: 0 |
This is a side effect of having more code blocked out in critical sections. As the article you found indicates, prompt termination on suspend requires the monitoring thread to wake up while the app thread is outside a critical region. The only way this is going to get fixed is to change the dumb way the BOINC client lib bludgeons the app process to death, and to give the app an opportunity to close down gracefully. This will take a bit of work, but it's high on the to-do list. MJH |
Joined: 11 Oct 08 · Posts: 1127 · Credit: 1,901,927,545 · RAC: 0 |
Thanks. Regarding http://boinc.berkeley.edu/trac/changeset/b98bc309cceccf95b9fac578c47cbea06a8b8150/boinc-v2 ... It looked like a simple-to-moderate code change that just changes the way suspension works with the critical sections. It looks very applicable toward making our suspend requests run smoother, and I hope it isn't hard to implement. (I don't know much about where the API code comes into play, but if it's just "a piece that's included when building apps", then maybe it'll be pretty easy for you to "hook it in".) |
Joined: 12 Nov 07 · Posts: 696 · Credit: 27,266,655 · RAC: 0 |
Jacob, That fix would already have been included in 8.12 when I updated to the latest boinc library revision. Matt |
Joined: 11 Oct 08 · Posts: 1127 · Credit: 1,901,927,545 · RAC: 0 |
Well, for my situation there, it was an 8.11 that caused the problem. :) I'll keep testing, and hopefully it works even better in the already-released 8.12 and 8.13. Thanks for making progress - I really do appreciate it!! Edit: 8.13 is suspending/resuming VERY nicely. I can't wait to have 8.13 running on a NOELIA_KLEBE task (to test it!) |
Joined: 26 Jun 09 · Posts: 815 · Credit: 1,470,385,294 · RAC: 0 |
I have tested it on a NOELIA_INS task in order to get a beta. Suspending and restarting worked. (I did not get a beta WU, though, as it knew that a task was suspended :( ) Greetings from TJ |
Joined: 11 Jul 09 · Posts: 1639 · Credit: 10,159,968,649 · RAC: 295,172 |
"The only way this is going to get fixed is to change the dumb way the BOINC client lib bludgeons the app process to death, and to give the app an opportunity to close down gracefully." You have allies in the BOINC community. Eric Korpela of SETI@home wrote (on 13 Nov 2008 - unfortunately in a private forum I can't link): "Yes, the terminate with no mercy policy sucks and we should find if there is a way around it, or at least a way to allow I/O to finish." About time we got round to fixing that... |
Joined: 26 Feb 12 · Posts: 184 · Credit: 222,376,233 · RAC: 0 |
The first MJHarvey_Crash beta just finished: http://www.gpugrid.net/result.php?resultid=7251789 The only betas I can't get to finish are the Noelia_Klebe ones: http://www.gpugrid.net/result.php?resultid=7248385 Could the Noelia_Klebes be troublesome because my card is a PE and ramps up to 1200 MHz on the core when crunching? I just wondered, because it runs at 1200 MHz for all the other tasks too. I've also noticed that there is a huge discrepancy between GPU and CPU time on these failed tasks, whereas all the others show GPU and CPU times that are very close. |
Joined: 23 Apr 09 · Posts: 3968 · Credit: 1,995,359,260 · RAC: 0 |
Your card mostly ran at 58°C, so it wasn't overly taxed by the Noelia_Klebe WU. My GTX660Ti also clocks up to ~1200MHz. That said, I also get the odd error from it and other similar cards. The Noelia_Klebe WUs don't use a full CPU core/thread in the same way most other WUs do. This has been the case since they were first released. FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help |
Joined: 12 Nov 07 · Posts: 696 · Credit: 27,266,655 · RAC: 0 |
acemdshort is now updated to 8.14. This version has improved stability during suspend/resume. MJH |
Joined: 26 Feb 12 · Posts: 184 · Credit: 222,376,233 · RAC: 0 |
"My GTX660Ti also clocks up to ~1200MHz. That said I also get the odd error from it and other similar cards." I wish it was just an odd now-and-then error. I haven't had one NOELIA_KLEBE beta complete and validate yet. They all end with the time exceeded error after running for an hour or so. |
Joined: 26 Jun 09 · Posts: 815 · Credit: 1,470,385,294 · RAC: 0 |
I had one CRASH test overnight that took 22,493.99 seconds to complete. Checking the system showed that it had been running at half the GPU's core clock, so that is the explanation. However, there was no reason given in the stderr report; the core clock was reported there as it should be, 1058MHz. I rebooted the system and all is normal again. I have seen reduced clock speeds before, but that was after an error or ACEMD crash; it is new that it happened without any errors. Greetings from TJ |
Joined: 13 Nov 10 · Posts: 328 · Credit: 72,619,453 · RAC: 0 |
Hello: 8.14 tasks are running with a low GPU load (<60%) on my GTX 770, and the load is also very unstable, varying by more than ±10%. The CPU runs smoothly, but the result is that tasks take twice as long as necessary; a short task takes about four hours...? |
Joined: 23 Apr 09 · Posts: 3968 · Credit: 1,995,359,260 · RAC: 0 |
TJ, I guess you are referring to this WU: 194-MJHARVEY_CRASH1-1-25-RND6694_0 (4759387; sent 7 Sep 2013 14:08:27 UTC; reported 8 Sep 2013 0:59:20 UTC; completed and validated; run time 22,493.99 s; CPU time 22,461.82 s; credit 18,750.00; ACEMD beta version v8.14 (cuda55)). When a WU doesn't use the GPU enough, it can cause the GPU to downclock. The temps were only 52°C, while your other runs on similar WUs had temps rising to 67°C. A 15°C drop sounds about right for a downclock. Perhaps a mechanism to report changes in core clock, as well as temp, would be useful (if it's not too late)! |
Joined: 13 Nov 10 · Posts: 328 · Credit: 72,619,453 · RAC: 0 |
Hello: Yes, that task has completed, but the one now running, SANTI_MARwt2-4-25-uan RND4912_0, has a GPU load of <65%. The GTX770 is running at a GPU clock of 1254 MHz without problems. Temperature 55 °C, FB usage 20%, bus usage 7% (both fluctuating by ±2%), memory usage 519 MB on the GPU. |
Joined: 13 Nov 10 · Posts: 328 · Credit: 72,619,453 · RAC: 0 |
"Hello: Yes, that task has completed, but the one now running, SANTI_MARwt2-4-25-uan RND4912_0, has a GPU load of <65%." Hello: Regarding the low GPU usage: if that is simply how these tasks work, the solution would be to run two tasks on the GPU at once to reach maximum load. It would be useful for those responsible to confirm this, so that we can decide how to handle these tasks. NOTE: I tried running two 8.14 tasks on the same GTX770, and the total load went from 55% to 70% (±5%), with 777 MB of memory in use, FB at 22%, bus at 8% and the GPU at 1254 MHz. |
Joined: 26 Jun 09 · Posts: 815 · Credit: 1,470,385,294 · RAC: 0 |
"TJ, I guess you are referring to this WU," Yes, skgiven, that is the one. Later this morning I had one error, but that did not downclock the core. However, since these CRASH tests are Santi's SR tasks, on which I used to get a lot of errors, my error rate has dropped significantly. Greetings from TJ |
Joined: 13 Nov 10 · Posts: 328 · Credit: 72,619,453 · RAC: 0 |
"Hello: Yes, that task has completed, but the one now running, SANTI_MARwt2-4-25-uan RND4912_0, has a GPU load of <65%." Hello: Sorry... my problems with low GPU load on 8.14 resulted from a corrupted driver. After reinstalling it, the problem is solved and GPU load is back to normal, around 85%. |
Joined: 26 Jun 09 · Posts: 815 · Credit: 1,470,385,294 · RAC: 0 |
My Asus 770 runs at a steady 91-92% GPU load with a core clock of 1097MHz, although I have it set to 1060MHz. This is for a Nathan WU, so obviously not the 8.14 app yet. Greetings from TJ |
Joined: 12 Nov 07 · Posts: 696 · Credit: 27,266,655 · RAC: 0 |
The server should now once again be dishing out Short tasks to Linux clients. MJH |
Joined: 12 Nov 07 · Posts: 696 · Credit: 27,266,655 · RAC: 0 |
I've promoted the 8.15 beta application to the short queue. This version has a workaround to catch tasks that repeatedly fail, necessitating a machine reset. Matt |
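
MJH's explanation earlier in the thread (a suspend is only acted on when the monitoring thread wakes up while the app thread is outside a critical region) can be illustrated with a toy model. The sketch below is not the BOINC library's code: `ToyApp`, `monitor_poll` and `ticks_until_suspended` are hypothetical names, and the tick-based timing is an assumption made purely to show why spending more time inside critical sections can delay a suspend.

```python
from dataclasses import dataclass


@dataclass
class ToyApp:
    """Toy stand-in for a science app (hypothetical, not the BOINC API)."""
    critical_depth: int = 0          # > 0 means "inside a critical section"
    suspend_requested: bool = False
    suspended: bool = False

    def begin_critical_section(self) -> None:
        self.critical_depth += 1

    def end_critical_section(self) -> None:
        if self.critical_depth == 0:
            raise RuntimeError("end_critical_section without matching begin")
        self.critical_depth -= 1

    def monitor_poll(self) -> bool:
        """One wake-up of the monitoring thread.

        A pending request is honoured only if the app is outside every
        critical section at this instant; otherwise it stays pending.
        """
        if self.suspend_requested and self.critical_depth == 0:
            self.suspended = True
            self.suspend_requested = False
        return self.suspended


def ticks_until_suspended(critical_ticks: int, open_ticks: int,
                          poll_every: int, max_ticks: int = 10_000) -> int:
    """Return the tick at which a suspend requested at tick 0 takes effect.

    Each work cycle spends `critical_ticks` ticks inside a critical section
    followed by `open_ticks` ticks outside it. The monitor wakes up every
    `poll_every` ticks. Returns -1 if nothing happens within `max_ticks`.
    """
    if critical_ticks < 0 or open_ticks < 0 or critical_ticks + open_ticks == 0:
        raise ValueError("cycle must contain at least one tick")
    if poll_every <= 0:
        raise ValueError("poll_every must be positive")

    app = ToyApp(suspend_requested=True)
    cycle = critical_ticks + open_ticks
    for tick in range(max_ticks):
        inside = (tick % cycle) < critical_ticks
        if inside and app.critical_depth == 0:
            app.begin_critical_section()
        elif not inside and app.critical_depth > 0:
            app.end_critical_section()
        # The monitor only gets a look-in on its own wake-up ticks.
        if tick % poll_every == 0 and app.monitor_poll():
            return tick
    return -1
```

In this model, `ticks_until_suspended(1, 9, 7)` returns 7, while `ticks_until_suspended(9, 1, 7)` returns 49: with most of each cycle spent inside a critical section, the monitor's wake-ups rarely coincide with a moment when it is allowed to act. An app that never leaves its critical section, as in `ticks_until_suspended(5, 0, 1, max_ticks=100)`, is never suspended at all and the function returns -1.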