acemdshort application 8.15 - discussion
Author | Message |
---|---|
Joined: 28 Apr 11 Posts: 462 Credit: 958,266,958 RAC: 28,485 |
Wow, even on Fermi the new app is ~120 secs faster ^^ So it seems good on this front. DSKAG Austria Research Team: http://www.research.dskag.at |
Joined: 8 Mar 12 Posts: 411 Credit: 2,083,882,218 RAC: 0 |
Shouldn't there be a tick box for CUDA 5.5 under short runs, or am I missing something? |
Joined: 12 Nov 07 Posts: 696 Credit: 27,266,655 RAC: 0 |
It's selected automatically, based on client driver version. MJH |
Joined: 8 Mar 12 Posts: 411 Credit: 2,083,882,218 RAC: 0 |
Looking like 90 min for the short runs. |
Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 1 |
I think that renaming threads is not nice. Anyway, it's good news that you put the new app to the short queue. When do you plan to put it in the long queue too? |
Joined: 12 Nov 07 Posts: 696 Credit: 27,266,655 RAC: 0 |
In a week or so. MJH |
Joined: 28 Apr 11 Posts: 462 Credit: 958,266,958 RAC: 28,485 |
OK, it's good for cc 1.3 cards too. I tried the 285GTX, which I had normally retired from GPUGrid, but as a test it is good enough ^^ 26,241.59 secs = 7.3 h DSKAG Austria Research Team: http://www.research.dskag.at |
Joined: 16 Aug 08 Posts: 145 Credit: 328,473,995 RAC: 0 |
Load on the graphics card was at 0% and progress stayed at 0%; I aborted the unit after 45 minutes. http://www.gpugrid.net/result.php?resultid=7221739 @+ *_* |
Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0 |
You need app 8.01, and then the NOELIAs run as smoothly as ever. Greetings from TJ |
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0 |
8.02 makes NOELIA tasks run even smootherer |
Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 1 |
There is an 8.04 app in the Beta queue. I've received it along with some TEST14 workunits. |
Joined: 12 Nov 07 Posts: 696 Credit: 27,266,655 RAC: 0 |
There is a new acemdshort application, version 8.11 (Windows only). Since this is (hopefully!) the last app revision, now's a good time to summarise the changes in the 800 series over the older app:

* SM3.5 support for Titan, GeForce 780, etc.
* CUDA 4.2 and CUDA 5.5 builds, automatically assigned based on client driver version. This represents the first step in deprecating CUDA 4.2 and moving exclusively to CUDA 5.5.
* Improved stability: fixed several bugs that caused significant rates of compute errors.
* Reduced driver crashes: reduced incidence of driver hangs on suspend. The problem is not yet totally eliminated.
* Improved reporting: GPU stats and temperatures are now reported in the stderr. Error codes cleaned up to give better data on failure modes.
* Application bug-fixes: many fixes and enhancements for the science.

MJH |
Joined: 12 Nov 07 Posts: 696 Credit: 27,266,655 RAC: 0 |
Here is a list of compute error codes for the 8xx series applications and their meanings. If you encounter a new one, or have a question or observation about the circumstances of an error, please PM me.

* 255 See -1
* 247 See -9
* 212 See -44
* 197 The WU took much longer to complete than the client was expecting and so it was terminated. Indicates a WU misconfiguration. If recurrent, try re-attaching to the project.
* 194 Unknown. ("finish file present too long")
* 193 Unknown. (Segfault on Linux)
* 159 See -97
* 98 See -9
* -1 Unknown
* -9 The GPU compute capability is not supported by the application; for example a pre cc 1.3 G80.
* -44 The computer's date is wrong.
* -80 Failed to recover after an access violation (Win32)
* -97 "Simulation has become unstable". This indicates that the scientific simulation that the application performs has gone wrong. If this happens as soon as the WU starts, there may be a problem with the WU. If it happens frequently or after the WU has made some progress, a hardware problem is strongly indicated. Check GPU temperatures (now reported in stderr) and for the presence of other GPU-using programs (eg games).
* -185 Application doesn't start, "Access denied" error in the stderr. Check that the client has downloaded the application correctly - if unsure re-attach to the project. Could also be caused by antivirus preventing BOINC starting new processes.
* -226 "Too many exits" The app repeatedly exited without indicating to BOINC that the WU was complete and was restarted, until a limit was reached. Cause unclear.
* -1073741515 (Windows only) The application failed to initialize properly. Indicates missing DLLs. Re-attach to the project, to force the application and its support DLLs to be re-downloaded. Ensure that VS2008 redistributables are installed http://www.microsoft.com/en-us/download/details.aspx?id=29
* -1073741819 (Windows only) Access violation (Segmentation fault). The application made an illegal memory access. These seem mostly to come from inside the Nvidia driver, but the root cause(s) are unknown. If occurring repeatedly, reboot the machine. |
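Several "See ..." entries in the list above look like the same exit code read two ways: once as an unsigned 8-bit value and once as a signed one. The sketch below is only an illustration of that arithmetic, not anything taken from the application itself; the helper names `to_signed` and `to_unsigned` are made up for the example. It also shows that the two large Windows codes are the signed 32-bit forms of 0xC0000135 and 0xC0000005. The 98 / -9 pair in the list does not follow this pattern, so the sketch does not try to explain it.

```python
def to_signed(value: int, bits: int) -> int:
    """Reinterpret an unsigned `bits`-wide value as two's-complement signed.

    Hypothetical helper for illustration only.
    """
    value &= (1 << bits) - 1          # keep only the low `bits` bits
    if value >= 1 << (bits - 1):      # top bit set means negative
        return value - (1 << bits)
    return value


def to_unsigned(value: int, bits: int) -> int:
    """Reinterpret a signed value as an unsigned `bits`-wide value."""
    return value & ((1 << bits) - 1)


# Positive codes from the list, read back as signed 8-bit values.
for code in (255, 247, 212, 159):
    print(code, "->", to_signed(code, 8))

# Windows-only codes from the list, shown as 32-bit hex status values.
for code in (-1073741515, -1073741819):
    print(code, "->", hex(to_unsigned(code, 32)))
```

Read this way, 255, 247, 212 and 159 come out as -1, -9, -44 and -97, which matches the cross-references in the list.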
Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0 |
The betas are gone and the SANTIs have started to error again on my GTX660. Greetings from TJ |
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
Short runs (2-3 hours on fastest card) v8.11 (cuda55)
Outcome: Computation error
Client state: Compute error
Exit status: -97 (0xffffffffffffff9f) Unknown error number

* -97 "Simulation has become unstable". This indicates that the scientific simulation that the application performs has gone wrong. If this happens as soon as the WU starts, there may be a problem with the WU. If it happens frequently or after the WU has made some progress, a hardware problem is strongly indicated. Check GPU temperatures (now reported in stderr) and for the presence of other GPU-using programs (eg games) - MJH

Your temps look reasonable (mostly around 66°C). I suggest you restart the system, and if errors continue to occur, look into what else might be causing this problem (games, video programs, antivirus scans, updates...). You might want to note the failure time and check your logs to see what was happening at that time or just before.

Both times you had the error, the stderr log ends in: # GPU 0 Current Temp: 64 C # The simulation has become unstable. Terminating to avoid lock-up (1)

The slight GPU temperature drop from 66°C to 64°C might indicate resource consumption by something else on your system just before the WU was ended, or the GPU temperature might just have dropped as the WU was ended?

FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help |
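For anyone who wants to do this kind of check on their own results, here is a rough sketch of pulling the temperature readings and the instability message out of a saved stderr log. It assumes the line format quoted above ("# GPU 0 Current Temp: 64 C"); the function name `summarise_stderr` and the sample text are made up for illustration, and the two 66 C lines in the sample are invented filler rather than real log output.

```python
import re

# Line format assumed from the stderr excerpt quoted in this thread.
TEMP_RE = re.compile(r"# GPU (\d+) Current Temp: (\d+) C")
UNSTABLE_MSG = "The simulation has become unstable"


def summarise_stderr(text: str) -> dict:
    """Collect temperature samples and flag the instability message."""
    temps = [int(temp) for _gpu, temp in TEMP_RE.findall(text)]
    return {
        "samples": len(temps),
        "max_temp": max(temps) if temps else None,
        "last_temp": temps[-1] if temps else None,
        "unstable": UNSTABLE_MSG in text,
    }


# Illustrative sample only: the last two lines are the ones quoted in the
# post, the first two are placeholders standing in for earlier readings.
sample = """# GPU 0 Current Temp: 66 C
# GPU 0 Current Temp: 66 C
# GPU 0 Current Temp: 64 C
# The simulation has become unstable. Terminating to avoid lock-up (1)
"""

print(summarise_stderr(sample))
```

Comparing `max_temp` with `last_temp` gives a quick view of whether the card was running hot before the failure or had already started to cool down.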
Joined: 12 Nov 07 Posts: 696 Credit: 27,266,655 RAC: 0 |
The new app does a much better job of determining when a WU has gone bad and aborting. Previously this might have manifested itself as a non-specific crash/driver reset. In some circumstances it may be possible to attempt recovery from this failure. Expect a new beta trying out an idea later today. MJH |
Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0 |
Thanks for the information, skgiven (and for moving my post). No games or anything else on this machine, only 6 Rosetta WUs, with 2 cores free for GPUGRID. It has two Xeon processors, so they are real cores; no HT on these oldies. In the rare event that I am at the system and see a WU fail, I did notice a temperature drop of the GPU. This is normal, of course, as it is no longer working hard. It will not drop much, as a new WU starts again. AV is F-Secure, and this is no longer a problem for BOINC. I was in very close contact with someone from their main office for a few months to get everything working after their update. I did a lot of testing for them and they even ran Rosetta themselves for a month to get it working. So that is no issue. I have it free with my ISP subscription and have been using it for a little longer than 2 years now, and for the last 8 months it has never been an issue on any PC for any project. The system was restarted just before the night, as a WU had downclocked my GPU. I have set this system to beta to test the new Harvey's that Matt will bring in shortly. Greetings from TJ |
Joined: 13 Nov 10 Posts: 328 Credit: 72,619,453 RAC: 0 |
Hello: I have finished an 8.11 task successfully, but I want to comment on some odd things. Task: 9x2-SANTI_MAR4222-4-25-RND9976_0 - Runtime 8317.47 seconds, on my GTX 770 and FX8350 CPU at 4.4 GHz. stderr output <core_client_version>7.2.11</core_client_version> <![CDATA[ <stderr_txt> mp: 59 C # GPU 0 Current Temp: 60 C # GPU 0 Current Temp: 59 C # GPU 0 Current Temp: 59 C # GPU 0 Current Temp: 59 ....... and many more GPU temperature records ...?? ending, # GPU 0 Current Temp: 61 C # Time per step (avg over 3000000 steps): 2.771 ms # Approximate elapsed time for entire WU: 8313.984 s called boinc_finish </stderr_txt> ]]> The execution times seem too long to me compared with other short tasks. I think something is not working very well. Greetings. |
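As a quick sanity check on the figures in that log, the step count multiplied by the average time per step should land close to the reported elapsed time. The sketch below only redoes that arithmetic with the numbers quoted in the post; the variable names are made up for the example, and it says nothing about why the task was slow.

```python
# Figures copied from the stderr excerpt in the post above.
steps = 3_000_000
ms_per_step = 2.771            # rounded average printed by the app
reported_elapsed_s = 8313.984  # "Approximate elapsed time" from the log
reported_runtime_s = 8317.47   # run time given for the task

# Estimate from the rounded average: steps * ms per step, in seconds.
estimated_s = steps * ms_per_step / 1000.0

# Average per step implied by the reported elapsed time, in ms.
implied_ms_per_step = reported_elapsed_s / steps * 1000.0

print(f"estimated from rounded average: {estimated_s:.3f} s")
print(f"implied time per step:          {implied_ms_per_step:.6f} ms")
print(f"runtime minus elapsed:          {reported_runtime_s - reported_elapsed_s:.3f} s")
```

The estimate agrees with the logged elapsed time to within about a second, so the log is at least internally consistent: the run time is accounted for by simulation steps at roughly 2.77 ms each.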
Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0 |
"Execution times I see them too long if I compare with other short tasks. I think that something does not work very well. Greetings." That is correct, Carlesa25, there are problems with these tasks. There is a lengthy thread about this. It's this one: http://www.gpugrid.net/forum_thread.php?id=3450 If you start reading at the first post, you will quickly understand what is wrong. Greetings from TJ |
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0 |
I just had an 8.13 task result in: -1073741819 (0xffffffffc0000005). Full details below. I think the cause was that an 8.11 NOELIA_KLEBEbeta (which I was suspending to test) TDR'd the drivers, and this new task segfaulted trying to get started.

We still have some work to do resolving the suspending of tasks. I've noticed that they continue running for up to 15 seconds even after being suspended. I think I remember Einstein having a problem that might be related, via http://einstein.phys.uwm.edu/forum_thread.php?id=10141 ... I'm trying to dig up the BOINC API fix (from early June 2013) that solved it.

EDIT: I found it. Check out this fix. Would it be applicable towards helping us (GPUGrid project and users) suspend more correctly? http://boinc.berkeley.edu/trac/changeset/b98bc309cceccf95b9fac578c47cbea06a8b8150/boinc-v2 If so, can you implement it?

http://www.gpugrid.net/result.php?resultid=7250888
Name: 178-MJHARVEY_CRASH1-0-25-RND2676_0
Workunit: 4754442
Created: 5 Sep 2013 | 14:05:52 UTC
Sent: 5 Sep 2013 | 16:39:18 UTC
Received: 5 Sep 2013 | 21:02:06 UTC
Server state: Over
Outcome: Computation error
Client state: Compute error
Exit status: -1073741819 (0xffffffffc0000005) Unknown error number
Computer ID: 153764
Report deadline: 10 Sep 2013 | 16:39:18 UTC
Run time: 8.18
CPU time: 0.00
Validate state: Invalid
Credit: 0.00
Application version: ACEMD beta version v8.13 (cuda55)
Stderr output: <core_client_version>7.2.11</core_client_version> <![CDATA[ <message> (unknown error) - exit code -1073741819 (0xc0000005) </message> ]]> |
©2025 Universitat Pompeu Fabra