acemdshort application 8.15

dskagcommunity · Joined: 28 Apr 11 · Posts: 463 · Credit: 1,077,516,958 · RAC: 128,789
Message 32294 - Posted: 26 Aug 2013, 15:48:07 UTC
Wow even on Fermi the new app is ~120 secs faster ^^ So, seems good from this front.
---------- 24/7 Crunching since 2011 ----------- DSKAG Austria: http://www.dskag.at

5pot · Joined: 8 Mar 12 · Posts: 411 · Credit: 2,083,882,218 · RAC: 0
Message 32295 - Posted: 26 Aug 2013, 16:50:21 UTC
Shouldn't there be a tick box for CUDA 5.5 under short runs, or am I missing something?

MJH · Joined: 12 Nov 07 · Posts: 696 · Credit: 27,266,655 · RAC: 0
Message 32301 - Posted: 26 Aug 2013, 18:21:13 UTC - in response to Message 32295.
It's selected automatically, based on client driver version.
MJH

5pot · Joined: 8 Mar 12 · Posts: 411 · Credit: 2,083,882,218 · RAC: 0
Message 32303 - Posted: 26 Aug 2013, 18:55:57 UTC
Looking like 90 min for the short runs.

Retvari Zoltan · Joined: 20 Jan 09 · Posts: 2380 · Credit: 16,897,957,044 · RAC: 0
Message 32305 - Posted: 26 Aug 2013, 19:31:17 UTC
I think that renaming threads is not nice. Anyway, it's good news that you put the new app to the short queue. When do you plan to put it in the long queue too?

MJH · Joined: 12 Nov 07 · Posts: 696 · Credit: 27,266,655 · RAC: 0
Message 32306 - Posted: 26 Aug 2013, 19:55:46 UTC - in response to Message 32305.
> When do you plan to put it in the long queue too?
In a week or so.
MJH

dskagcommunity · Joined: 28 Apr 11 · Posts: 463 · Credit: 1,077,516,958 · RAC: 128,789
Message 32310 - Posted: 26 Aug 2013, 23:30:43 UTC
OK, it's good for cc1.3 cards too. Tried the GTX 285, which is normally retired from GPUGrid on my side, but it is good enough as a test ^^ 26,241.59 secs = 7.3 h
---------- 24/7 Crunching since 2011 ----------- DSKAG Austria: http://www.dskag.at

Zarck · Joined: 16 Aug 08 · Posts: 145 · Credit: 328,473,995 · RAC: 0
Message 32466 - Posted: 29 Aug 2013, 15:07:42 UTC - in response to Message 32310. Last modified: 29 Aug 2013, 15:08:05 UTC
Load on the graphics card at 0% and progress stuck at 0%; I abandoned the unit after 45 minutes.
http://www.gpugrid.net/result.php?resultid=7221739
@+

TJ · Joined: 26 Jun 09 · Posts: 815 · Credit: 1,470,385,294 · RAC: 0
Message 32469 - Posted: 29 Aug 2013, 15:17:36 UTC - in response to Message 32466.
You need app 8.01, and then the Noelias run as smoothly as ever.
Greetings from TJ

Jacob Klein · Joined: 11 Oct 08 · Posts: 1127 · Credit: 1,901,927,545 · RAC: 0
Message 32483 - Posted: 29 Aug 2013, 17:40:03 UTC - in response to Message 32469.
8.02 makes NOELIA tasks run even smootherer

Retvari Zoltan · Joined: 20 Jan 09 · Posts: 2380 · Credit: 16,897,957,044 · RAC: 0
Message 32559 - Posted: 31 Aug 2013, 0:00:54 UTC
There is an 8.04 app in the Beta queue. I've received it along with some TEST14 workunits.

MJH · Joined: 12 Nov 07 · Posts: 696 · Credit: 27,266,655 · RAC: 0
Message 32681 - Posted: 4 Sep 2013, 10:39:58 UTC
There is a new acemdshort application, version 8.11 (Windows only). Since this is (hopefully!) the last app revision, now's a good time to summarise the changes in the 800 series over the older app:
* SM3.5 support for Titan, GeForce 780, etc.
* CUDA 4.2 and CUDA 5.5 builds, automatically assigned based on client driver version. This represents the first step in deprecating CUDA 4.2 and moving exclusively to CUDA 5.5.
* Improved stability: fixed several bugs that caused significant rates of compute errors.
* Reduced driver crashes: reduced incidence of driver hangs on suspend. The problem is not yet eliminated totally.
* Improved reporting: GPU stats and temperatures now reported in the stderr. Error codes cleaned up to give better data on failure modes.
* Application bug-fixes: many fixes and enhancements for the science.
MJH

MJH · Joined: 12 Nov 07 · Posts: 696 · Credit: 27,266,655 · RAC: 0
Message 32700 - Posted: 4 Sep 2013, 15:29:41 UTC - in response to Message 32681. Last modified: 5 Sep 2013, 16:18:31 UTC
Here is a list of compute error codes for the 8xx series applications and their meanings. If you encounter a new one, or have a question or observation about the circumstances of an error, please PM me.
* 255: See -1
* 247: See -9
* 212: See -44
* 197: The WU took much longer to complete than the client was expecting and so it was terminated. Indicates a WU misconfiguration. If recurrent, try re-attaching to the project.
* 194: Unknown. ("finish file present too long")
* 193: Unknown. (Segfault on Linux)
* 159: See -97
* 98: See -9
* -1: Unknown
* -9: The GPU compute capability is not supported by the application; for example a pre cc 1.3 G80.
* -44: The computer's date is wrong.
* -80: Failed to recover after an access violation (Win32)
* -97: "Simulation has become unstable". This indicates that the scientific simulation that the application performs has gone wrong. If this happens as soon as the WU starts, there may be a problem with the WU. If it happens frequently or after the WU has made some progress, a hardware problem is strongly indicated. Check GPU temperatures (now reported in stderr) and for the presence of other GPU-using programs (eg games).
* -185: Application doesn't start, "Access denied" error in the stderr. Check that the client has downloaded the application correctly; if unsure, re-attach to the project. Could also be caused by antivirus preventing BOINC from starting new processes.
* -226: "Too many exits". The app repeatedly exited without indicating to BOINC that the WU was complete and was restarted, until a limit was reached. Cause unclear.
* -1073741515: (Windows only) The application failed to initialize properly. Indicates missing DLLs. Re-attach to the project to force the application and its support DLLs to be re-downloaded. Ensure that the VS2008 redistributables are installed: http://www.microsoft.com/en-us/download/details.aspx?id=29
* -1073741819: (Windows only) Access violation (Segmentation fault). The application made an illegal memory access. These seem mostly to come from inside the Nvidia driver, but the root cause(s) are unknown. If occurring repeatedly, reboot the machine.
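
For anyone who wants to look codes up from a script, here is a minimal sketch of the list above as a Python table. The helper names (`ALIASES`, `MEANINGS`, `low_byte`, `describe`) are made up for illustration and are not part of BOINC or the ACEMD app; the summaries are shortened from the list. One observation from the arithmetic, not something stated in the post: 255, 247, 212 and 159 are exactly the low 8 bits of -1, -9, -44 and -97, which would fit an exit status truncated to one byte. 98 does not fit that pattern and is copied from the list as-is.

```python
# Hypothetical lookup helper; the table is transcribed from the list above.

# Codes that the list marks as "See <other code>".
ALIASES = {255: -1, 247: -9, 212: -44, 159: -97, 98: -9}

# Shortened summaries of the meanings given in the list.
MEANINGS = {
    197: "WU ran much longer than the client expected; WU misconfiguration",
    194: 'unknown ("finish file present too long")',
    193: "unknown (segfault on Linux)",
    -1: "unknown",
    -9: "GPU compute capability not supported by the application",
    -44: "the computer's date is wrong",
    -80: "failed to recover after an access violation (Win32)",
    -97: "simulation has become unstable",
    -185: 'application does not start ("Access denied")',
    -226: "too many exits",
    -1073741515: "application failed to initialize; missing DLLs (Windows)",
    -1073741819: "access violation (Windows)",
}


def low_byte(code: int) -> int:
    """Return the code as an unsigned 8-bit value (two's complement)."""
    return code & 0xFF


def describe(code: int) -> str:
    """Resolve "See ..." aliases first, then look the code up."""
    canonical = ALIASES.get(code, code)
    return MEANINGS.get(canonical, "not in the list")


if __name__ == "__main__":
    for code in (255, 159, -97, 42):
        print(code, "->", describe(code))
    # Low bytes of -1, -9, -44, -97 for comparison with the alias codes.
    print([low_byte(c) for c in (-1, -9, -44, -97)])
```

Keeping the aliases in a separate table means a code that is not in the list falls through to "not in the list" instead of being guessed at.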

TJ · Joined: 26 Jun 09 · Posts: 815 · Credit: 1,470,385,294 · RAC: 0
Message 32719 - Posted: 5 Sep 2013, 9:49:00 UTC
The betas are gone and the Santis are starting to error again on my GTX660.
Greetings from TJ

skgiven · Volunteer moderator · Volunteer tester · Joined: 23 Apr 09 · Posts: 3968 · Credit: 1,995,359,260 · RAC: 0
Message 32720 - Posted: 5 Sep 2013, 10:17:26 UTC - in response to Message 32719. Last modified: 5 Sep 2013, 10:27:08 UTC

Short runs (2-3 hours on fastest card) v8.11 (cuda55)
Outcome: Computation error
Client state: Compute error
Exit status: -97 (0xffffffffffffff9f) Unknown error number

> * -97 "Simulation has become unstable". This indicates that the scientific simulation that the application performs has gone wrong. If this happens as soon as the WU starts, there may be a problem with the WU. If it happens frequently or after the WU has made some progress, a hardware problem is strongly indicated. Check GPU temperatures (now reported in stderr) and for the presence of other GPU-using programs (eg games) - MJH

Your temps look reasonable (mostly around 66°C). I suggest you restart the system, and if errors continue to occur look into what else might be causing this problem (games, video programs, antivirus scans, updates...). You might want to note the failure time and check your logs to see what was happening at that time or just before.

Both times you had the error, the stderr log ends in:
# GPU 0 Current Temp: 64 C
# The simulation has become unstable. Terminating to avoid lock-up (1)

The slight GPU temperature drop from 66°C to 64°C might indicate resource consumption by something else on your system just before the WU was ended, or the GPU temperature might just have dropped as the WU was ended?

MJH · Joined: 12 Nov 07 · Posts: 696 · Credit: 27,266,655 · RAC: 0
Message 32721 - Posted: 5 Sep 2013, 10:56:05 UTC - in response to Message 32720.
> * -97 "Simulation has become unstable".
The new app does a much better job at determining when a WU has gone bad and aborting. Previously this might have manifested itself as a non-specific crash/driver reset. In some circumstances it may be possible to attempt recovery from this failure. Expect a new beta trying an idea out later today.
MJH

TJ · Joined: 26 Jun 09 · Posts: 815 · Credit: 1,470,385,294 · RAC: 0
Message 32723 - Posted: 5 Sep 2013, 11:21:32 UTC - in response to Message 32720.
Thanks for the information, skgiven (and for moving my post). No games or anything else run on this machine, only 6 Rosetta WUs, with 2 cores free for GPUGRID. It has two Xeon processors, so these are real cores; no HT on these oldies. In the rare event that I am at the system and see a WU fail, I do notice a temperature drop of the GPU. This is normal of course, as it is no longer working hard. It does not drop much, as a new WU starts again.
The AV is F-Secure, and this is no longer a problem for BOINC. I was in very close contact with someone from their main office for a few months to get everything working after their update. I did a lot of testing for them, and they even ran Rosetta themselves for a month to get it working. So that is no issue. I have it free with my ISP subscription and have been using it for a little longer than 2 years now, and for the last 8 months it has never been an issue on any PC for any project.
The system was restarted just before the night, as a WU had downclocked my GPU.
I have set this system to beta to test the new Harveys that Matt will bring in shortly.
Greetings from TJ

Carlesa25 · Joined: 13 Nov 10 · Posts: 328 · Credit: 72,619,453 · RAC: 0
Message 32735 - Posted: 5 Sep 2013, 15:39:28 UTC. Last modified: 5 Sep 2013, 15:40:43 UTC
Hello: I have finished an 8.11 task successfully, but want to comment on some odd things.
Task: 9x2-SANTI_MAR4222-4-25-RND9976_0 - Runtime 8317.47 seconds, on my GTX 770 with an FX8350 CPU at 4.4GHz.

stderr output
<core_client_version>7.2.11</core_client_version>
<![CDATA[
<stderr_txt>
mp: 59 C
# 0 Current GPU Temp: 60 C
# 0 Current GPU Temp: 59 C
# 0 Current GPU Temp: 59 C
# 0 Current GPU Temp: 59
....... and more.. more GPU temperature records ...?? ending,
# GPU 0 Current Temp: 61 C
# Time per step (avg over 3000000 steps): 2.771 ms
# Approximate elapsed time for Entire WU: 8313.984 s
called boinc_finish
</stderr_txt>
]]>

The execution times look too long to me compared with other short tasks. I think that something does not work very well.
Greetings.
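
As a cross-check on the figures in the stderr output above: 3,000,000 steps at an average of 2.771 ms per step comes to about 8313 s (roughly 2.3 hours), which agrees with both the "Approximate elapsed time" line and the 8317.47 s runtime. The following is a minimal Python sketch of that check. The function name `summarise` and the regular expressions are illustrative assumptions based only on the lines quoted in this thread, not a documented ACEMD log format.

```python
import re

# Last lines of the stderr output quoted in the post above.
STDERR_TAIL = """\
# GPU 0 Current Temp: 61 C
# Time per step (avg over 3000000 steps): 2.771 ms
# Approximate elapsed time for Entire WU: 8313.984 s
"""


def summarise(stderr_text: str) -> dict:
    """Pull temperatures and timing figures out of stderr text."""
    temps = [int(t) for t in re.findall(r"Temp:\s*(\d+)\s*C", stderr_text)]
    step = re.search(r"avg over (\d+) steps\):\s*([\d.]+)\s*ms", stderr_text)
    total = re.search(r"elapsed time for Entire WU:\s*([\d.]+)\s*s", stderr_text)
    if step is None or total is None:
        raise ValueError("timing lines not found in stderr text")
    steps, ms_per_step = int(step.group(1)), float(step.group(2))
    return {
        "max_temp_c": max(temps) if temps else None,
        "steps": steps,
        # Estimate recomputed from the rounded per-step average.
        "estimated_s": steps * ms_per_step / 1000.0,
        "reported_s": float(total.group(1)),
    }


if __name__ == "__main__":
    info = summarise(STDERR_TAIL)
    print(info)
    print("hours:", round(info["reported_s"] / 3600.0, 2))
```

The recomputed estimate differs from the reported value by about one second because the per-step average in the log is rounded to three decimals.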

TJ · Joined: 26 Jun 09 · Posts: 815 · Credit: 1,470,385,294 · RAC: 0
Message 32737 - Posted: 5 Sep 2013, 17:27:37 UTC - in response to Message 32735.
> Execution times I see them too long if I compare with other short tasks. I think that something does not work very well. Greetings.
That is correct, Carlesa25, there are problems with it. There is a lengthy thread about this. It's this one: http://www.gpugrid.net/forum_thread.php?id=3450
If you start reading at the first post, you will quickly understand what is wrong.
Greetings from TJ

Jacob Klein · Joined: 11 Oct 08 · Posts: 1127 · Credit: 1,901,927,545 · RAC: 0
Message 32743 - Posted: 5 Sep 2013, 21:09:25 UTC - in response to Message 32700. Last modified: 5 Sep 2013, 21:14:38 UTC

I just had an 8.13 task result in: -1073741819 (0xffffffffc0000005). Full details below.

I think the cause was that an 8.11 NOELIA_KLEBEbeta (which I was suspending to test) TDR'd the drivers, and this new task segfaulted trying to get started. We still have some work to do on resolving the suspending of tasks. I've noticed that they continue running for up to 15 seconds even after being suspended. I thought I remembered Einstein having a problem that might be related, via http://einstein.phys.uwm.edu/forum_thread.php?id=10141 ... I'm trying to dig up the BOINC API fix (from early June 2013) that solved it.

EDIT: I found it. Check out this fix. Would it be applicable towards helping us (GPUGrid project and users) suspend more correctly?
http://boinc.berkeley.edu/trac/changeset/b98bc309cceccf95b9fac578c47cbea06a8b8150/boinc-v2
If so, can you implement it?

http://www.gpugrid.net/result.php?resultid=7250888
Name: 178-MJHARVEY_CRASH1-0-25-RND2676_0
Workunit: 4754442
Created: 5 Sep 2013 | 14:05:52 UTC
Sent: 5 Sep 2013 | 16:39:18 UTC
Received: 5 Sep 2013 | 21:02:06 UTC
Server state: Over
Outcome: Computation error
Client state: Compute error
Exit status: -1073741819 (0xffffffffc0000005) Unknown error number
Computer ID: 153764
Report deadline: 10 Sep 2013 | 16:39:18 UTC
Run time: 8.18
CPU time: 0.00
Validate state: Invalid
Credit: 0.00
Application version: ACEMD beta version v8.13 (cuda55)
Stderr output:
<core_client_version>7.2.11</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -1073741819 (0xc0000005)
</message>
]]>
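
The result page above shows the same exit status in three forms: -1073741819 as a signed number, 0xffffffffc0000005 on the result page, and 0xc0000005 in the stderr message. These are one value read at different widths. A minimal Python sketch of the conversion follows; the function names `to_unsigned` and `to_signed` are illustrative and not from any BOINC tool. 0xC0000005 is the Windows status code for an access violation, which matches the -1073741819 entry in the error list, and -1073741515 likewise corresponds to 0xC0000135 (DLL not found).

```python
def to_unsigned(code: int, bits: int = 32) -> int:
    """Reinterpret a signed exit code as an unsigned value of the given width."""
    return code & ((1 << bits) - 1)


def to_signed(value: int, bits: int = 32) -> int:
    """Reinterpret an unsigned value as a signed two's-complement number."""
    value &= (1 << bits) - 1
    # If the top bit is set, the value represents a negative number.
    return value - (1 << bits) if value >> (bits - 1) else value


if __name__ == "__main__":
    code = -1073741819
    print(hex(to_unsigned(code, 32)))  # form shown in the stderr message
    print(hex(to_unsigned(code, 64)))  # form shown on the result page
    print(to_signed(0xC0000135))       # the "missing DLLs" code from the list
```

The same conversion explains the "-97 (0xffffffffffffff9f)" line quoted earlier in the thread: it is -97 shown as a 64-bit unsigned value.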
