ACEMD2 6.12 cuda and 6.13 cuda31 for windows and linux

Beyond Send message Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0 Level Scientific publications	Message 19493 - Posted: 16 Nov 2010, 4:02:51 UTC - in response to Message 19489. Last modified: 16 Nov 2010, 4:04:41 UTC Tested my GTX260 (at factory settings) on Kubuntu. While running a nice fast GIANNI task without setting swan_sync to zero it was very slow; after 12h it had not reached 50% complete (47.5%). So it would not have finished within 24h. I then freed up a CPU core, configured swan_sync=0, restarted and the task sped up considerably: It finished in about 15½h, suggesting the task would have finished in around 7h if I had used swan_sync from the start. Just under 12ms per step. Have you tried setting the nice level as suggested above by several people. It would be interesting to compare this to using SWAN_SYNC. SK, I tested the GIANNI_DHFR500 WUs with no SWAN_SYNC myself. My GT 240 card with a modest shader OC to 1500MHz ran these 2 GIANNI_DHFR500 WUs in 41772.922 and 41906.609 seconds. GPU usage was a steady 89%. This was without setting SWAN_SYNC, just boosting the GPUGRID app's priority to high. All 4 CPU projects ran at normal speeds as did AQUA. No cores wasted. http://www.gpugrid.net/result.php?resultid=3293056 http://www.gpugrid.net/result.php?resultid=3277797 You have one machine that also has finished 2 GIANNI_DHFR500 WUs, I assume using SWAN_SYNC. The times are considerably slower even though you are using a higher OC to 1600MHz. The times are 49102.764 and 46696.179 seconds: http://www.gpugrid.net/result.php?resultid=3287168 http://www.gpugrid.net/result.php?resultid=3283754 Two more GIANNI_DHFR500 WUs, 1 from each of my other GT 240 cards. Faster yet. Times of 37537.328 seconds for the 1st card (1500MHz, 97% GPU) and 35969.656 seconds for the 2nd card (1550MHz, 99% GPU). Again, no SWAN_SYNC, priority simply boosted to high via eFMer Priority 64. 
http://www.gpugrid.net/result.php?resultid=3294938 http://www.gpugrid.net/result.php?resultid=3294258 ID: 19493 · Rating: 0 · rate: / Reply Quote
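Editor's note: the run-time comparison in the post above can be checked with a few lines of arithmetic. The sketch below uses only the four run times quoted for the two GT 240 hosts; the labels, and the assumption that the second host used SWAN_SYNC, come from the post and are not verified.

```python
from statistics import mean

# Run times in seconds, copied from the post above.
priority_only = [41772.922, 41906.609]  # GT 240 at 1500MHz, high priority, no SWAN_SYNC
other_host = [49102.764, 46696.179]     # GT 240 at 1600MHz, assumed to use SWAN_SYNC


def percent_slower(baseline, candidate):
    """Return how much slower the candidate mean is than the baseline mean, in percent."""
    return (mean(candidate) / mean(baseline) - 1.0) * 100.0


gap = percent_slower(priority_only, other_host)
print(f"second host is {gap:.1f}% slower on average")
```

Two results per host is a very small sample, and the hosts differ in operating system and CPU, so the figure only indicates the size of the gap and says nothing about its cause.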

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 19495 - Posted: 16 Nov 2010, 10:20:23 UTC - in response to Message 19493. Last modified: 16 Nov 2010, 10:46:28 UTC Thanks Beyond, eFMer Priority will have another home for a while. These GIANNI_DHFR500 tasks are very fast; probably the fastest tasks we have seen to date, 17.5 to 25ms per step on GT240s. I hope these advancements make their way into the other WU’s. Pleased to see one turn up on my GTX470, just about to start. Although I had a few pauses/restarts during my GIANNI_DHFR500 GT240 runs (which usually adds a few minutes to run time), a big difference between our cards speeds is because I’m stuck using Vista; I think 11% was the ball park figure for 6.05, probably about the same for 6.12. Never the less, I’m sure the priority increase does help; the difference between our times is >11%. It would still need to be mapped out for Fermi’s and non-fermi’s and on XP, Vista, W7, Linux (nice values). Did you run any same type tasks without priority increases, to see what the differences are? This would really need to be compared on the same system (and perhaps with different priorities), and then confirmed on other systems and apps. The trouble with doing this on a GT240 is that it takes a few days, and you have to change the priority at the beginning of a task run (difficult for me with 4 cards all at different run stages and running different task types). While swan_sync seems to be important on Linux for all cards, on Windows it is only important for Fermis, not GT240’s running the 6.12 app (makes little or no difference). It might help with the 6.13 app with a GF200 series card, but I have not tested this. So I suspect priority may not be a replacement for swan_sync, rather an alternative/also usable option, or option for non-Fermi’s. 
I think one of my GIANNI_DHFR500 tasks (possibly the second one) used swan_sync, but there is little difference in CPU time. When swan_sync is used with Fermi it does use a full core/thread. Both of my GIANNI_DHFR500 tasks used slightly less CPU time than yours did. While this may be down to using a high priority it might just be a difference in CPU performance. So to test if just increasing priority is an alternative to swan_sync I think we would need to test this on a Fermi; with swan_sync=0 and default (low) priority vs swan_sync not in use and high priority. I remember looking at this in the past, and if my memory serves me right we can get away with “Set Above Normal” priority. I’m trying this now (switched on last night mid run) on my GT240’s. Just need to wait a day to get clean results. ID: 19495 · Rating: 0 · rate: / Reply Quote

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 19498 - Posted: 16 Nov 2010, 11:31:40 UTC - in response to Message 19495. The GIANNI_DHFR500 task errored out after 108sec on the GTX470: <core_client_version>6.10.58</core_client_version> <![CDATA[ <message> Incorrect function. (0x1) - exit code 1 (0x1) </message> http://www.gpugrid.net/result.php?resultid=3298906 XP x86, i7-920 I also see another one that errored 2 days ago: http://www.gpugrid.net/result.php?resultid=3292121 ID: 19498 · Rating: 0 · rate: / Reply Quote

Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0	Message 19499 - Posted: 16 Nov 2010, 11:59:41 UTC - in response to Message 19498.
"The GIANNI_DHFR500 task errored out after 108sec on the GTX470: <core_client_version>6.10.58</core_client_version> <![CDATA[ <message> Incorrect function. (0x1) - exit code 1 (0x1) </message>"
My GIANNI_DHFR500 tasks are erroring out the same way, after varying amounts of computing time, on a GTX480 overclocked to 800MHz: tasks 3282907, 3281784, 3281542, 3281228, 3279134, 3278165.
Other tasks with high GPU utilization (such as the KASHIF_HIVPR ones) run correctly on these overclocked cards.
When I clocked the card back to factory settings (while another GIANNI_DHFR500 task was running, so the tasks page still shows the higher frequency), that task finished correctly: task 3294885.
So these GIANNI_DHFR500 tasks seem to be more sensitive to overclocking on Fermis than other tasks.
ID: 19499

ftpd Joined: 6 Jun 08 Posts: 152 Credit: 328,250,382 RAC: 0	Message 19500 - Posted: 16 Nov 2010, 12:00:52 UTC Last modified: 16 Nov 2010, 12:06:15 UTC

197-GIANNI_DHFR500-2-99-RND7994_1
Workunit: 2076439
Created: 15 Nov 2010 17:28:35 UTC
Sent: 15 Nov 2010 19:41:38 UTC
Received: 16 Nov 2010 7:21:08 UTC
Server state: Over
Outcome: Success
Client state: None
Exit status: 0 (0x0)
Computer ID: 35174
Report deadline: 20 Nov 2010 19:41:38 UTC
Run time: 9372.934303
CPU time: 9349.922
stderr out:
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>
# Using device 0
# There is 1 device supporting CUDA
# Device 0: "GeForce GTX 480"
# Clock rate: 1.40 GHz
# Total amount of global memory: 1610153984 bytes
# Number of multiprocessors: 15
# Number of cores: 120
SWAN: Using synchronization method 0
MDIO ERROR: cannot open file "restart.coor"
# Time per step (avg over 2000000 steps): 4.685 ms
# Approximate elapsed time for entire WU: 9370.085 s
called boinc_finish
</stderr_txt>
]]>
Validate state: Valid
Claimed credit: 7491.18171296296
Granted credit: 11236.7725694444

@skgiven, this one is OK. Windows XP Pro, driver 260.99, BOINC Manager 6.10.58, swan_sync=0. Good luck!

436-GIANNI_DHFR500-2-99-RND8894_1
Workunit: 2076552
Created: 15 Nov 2010 17:03:33 UTC
Sent: 15 Nov 2010 17:15:30 UTC
Received: 16 Nov 2010 7:26:10 UTC
Server state: Over
Outcome: Success
Client state: None
Exit status: 0 (0x0)
Computer ID: 47762
Report deadline: 20 Nov 2010 17:15:30 UTC
Run time: 22050.75
CPU time: 1663.875
stderr out:
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>
# Using device 1
# There are 2 devices supporting CUDA
# Device 0: "GeForce GTX 295"
# Clock rate: 1.24 GHz
# Total amount of global memory: 939327488 bytes
# Number of multiprocessors: 30
# Number of cores: 240
# Device 1: "GeForce GTX 295"
# Clock rate: 1.24 GHz
# Total amount of global memory: 939196416 bytes
# Number of multiprocessors: 30
# Number of cores: 240
MDIO ERROR: cannot open file "restart.coor"
# Time per step (avg over 2000000 steps): 11.023 ms
# Approximate elapsed time for entire WU: 22045.188 s
called boinc_finish
</stderr_txt>
]]>
Validate state: Valid
Claimed credit: 7491.18171296296
Granted credit: 11236.7725694444
Application version: ACEMD2: GPU molecular dynamics v6.13 (cuda31)
--------------------------------------------------------------------------------
And another one with GTX295!
--------------------------------------------------------------------------------
Ton (ftpd) Netherlands
ID: 19500
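Editor's note: the two stderr dumps above each report a per-step time and a step count, and the "approximate elapsed time" follows from multiplying them. A minimal sketch of pulling those numbers out of a stderr text is shown below; the function name and the regular expression are illustrative and are not part of the ACEMD application.

```python
import re

# Matches lines such as:
#   # Time per step (avg over 2000000 steps): 4.685 ms
STEP_RE = re.compile(r"Time per step \(avg over (\d+) steps\):\s*([\d.]+) ms")


def estimate_elapsed_seconds(stderr_text):
    """Return steps * time-per-step in seconds, or None if the line is absent."""
    match = STEP_RE.search(stderr_text)
    if match is None:
        return None
    steps = int(match.group(1))
    ms_per_step = float(match.group(2))
    return steps * ms_per_step / 1000.0


gtx480 = "# Time per step (avg over 2000000 steps): 4.685 ms"
gtx295 = "# Time per step (avg over 2000000 steps): 11.023 ms"

for label, text in (("GTX 480", gtx480), ("GTX 295", gtx295)):
    print(label, estimate_elapsed_seconds(text), "s")
```

The results land within about a second of the 9370.085 s and 22045.188 s printed in the dumps; the small difference is expected because the logged per-step time is rounded to three decimals.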

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 19501 - Posted: 16 Nov 2010, 12:06:49 UTC - in response to Message 19499. I will put them back to stock, keep the fans high and see how they get on. I have 5 threads free on the CPU and it's at stock, with turbo off, and the system is well cooled. Your 4.034ms per step looks exceptional. That could bring home 120K credits per day, and at stock. ID: 19501 · Rating: 0 · rate: / Reply Quote
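Editor's note: the "120K credits per day" estimate above can be reproduced from figures already posted in this thread. The sketch assumes every work unit is a 2,000,000-step GIANNI_DHFR500 task granting the 11236.7725694444 credits shown in ftpd's results, and that the card runs such tasks back to back with no idle time; neither assumption is guaranteed in practice.

```python
SECONDS_PER_DAY = 86_400
STEPS_PER_WU = 2_000_000             # step count reported in the stderr output
GRANTED_CREDIT = 11236.7725694444    # granted credit per GIANNI_DHFR500 task


def credits_per_day(ms_per_step, steps=STEPS_PER_WU, credit=GRANTED_CREDIT):
    """Estimate daily credit for a card that runs identical tasks continuously."""
    wu_seconds = ms_per_step * steps / 1000.0
    return SECONDS_PER_DAY / wu_seconds * credit


estimate = credits_per_day(4.034)    # the per-step time quoted above
print(f"about {estimate:,.0f} credits per day")
```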

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 19502 - Posted: 16 Nov 2010, 13:24:24 UTC - in response to Message 19501. These tasks might be a bit sensitive, so I decided to give it a fair chance. Suspended all work, restarted system, reset cards back to stock, upped the fan to 80% and allowed 1 GIANNI_DHFR500 task to run by itself; nothing running on the other card and no CPU tasks running. So far so good - about 37% complete after 68min. GPU Utilization at 95% GPU Temp at 59 deg C GPU Fan at 4850rpm Entire System (XP X86) only using 648MB Virtual memory size 50.14MB Working set size 37.01MB CPU usage fluctuates between 13% and 18% very rapidly. Using one full core + a bit (probably just the system, Boinc, Task Manager, FF). To be honest I would be more than happy if GPUGrid required 2 or 3 threads of my i7-920 if they are needed to support two Fermi's and bring this sort of improvement. Each GTX470 could do 100K per day for a very worthwhile project. ID: 19502 · Rating: 0 · rate: / Reply Quote

Retvari Zoltan Send message Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0 Level Scientific publications	Message 19503 - Posted: 16 Nov 2010, 15:08:13 UTC - in response to Message 19502. Last modified: 16 Nov 2010, 15:22:56 UTC I have another successful GIANNI_DHFR500 task (3299428). This time the GPU ran at 800MHz, and I've raised the GPU's core voltage a bit more to 1.075V (it was 1.050V before). The interesting part is the average time per step at 800MHz (4.038 ms) is even a little bit higher than at 700MHz (4.034 ms) ID: 19503 · Rating: 0 · rate: / Reply Quote

Bobrr Send message Joined: 2 Jul 10 Posts: 7 Credit: 28,599,565 RAC: 0 Level Scientific publications	Message 19505 - Posted: 16 Nov 2010, 16:45:02 UTC - in response to Message 19487. The GTS450 is showing up again BOINC: 'Tue 16 Nov 2010 11:19:53 AM EST NVIDIA GPU 0: GeForce GTS 450 (driver version unknown, CUDA version 3020, compute capability 2.1, 1023MB, 421 GFLOPS peak)' 'driver version unknown' is a bit odd as it shows up in UBUNTU as 260.19.21 The card also shows up in GPUGRID under host ID: 84614. It is currently running a task: 'Tue 16 Nov 2010 11:36:24 AM EST GPUGRID Starting task r475s1f1_r130s2-TONI_MSM5-0-4-RND8696_0 using acemd2 version 613' I'll keep an eye on it. Thanks again for the assistance. ID: 19505 · Rating: 0 · rate: / Reply Quote

Microcruncher* Send message Joined: 12 Jun 09 Posts: 4 Credit: 185,737 RAC: 0 Level Scientific publications	Message 19506 - Posted: 16 Nov 2010, 16:51:08 UTC - in response to Message 19505. Last modified: 16 Nov 2010, 16:55:21 UTC 'driver version unknown' is a bit odd as it shows up in UBUNTU as 260.19.21 That's quite normal. Here (Ubuntu 64 Bit, GTX 460, 260.19.12) it looks similar: Di 16 Nov 2010 13:47:05 CET NVIDIA GPU 0: GeForce GTX 460 (driver version unknown, CUDA version 3020, compute capability 2.1, 1023MB, 650 GFLOPS peak) ID: 19506 · Rating: 0 · rate: / Reply Quote

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 19507 - Posted: 16 Nov 2010, 16:54:26 UTC - in response to Message 19503. Last modified: 16 Nov 2010, 17:03:25 UTC Retvari Zoltan, I have seen this behaviour a few times in the past, one fairly recently. When you increase the GPU frequency tasks run faster and faster until peaking, on a sort of bell curve apex, and then less and less fast until you reach the point where they run slower than at stock, just before failing most tasks. You have to find that sweet spot and this might change for different task types, so it is better to err on the side of caution. Basically data has to be resent within the card because it did not arrive correctly. May be a feature of the memory controller ;p Try 725MHz, 750MHz and 775MHz and see which setting finishes these tasks quicker. That GIANNI_DHFR500 task completed OK on my system. 5.546 ms per step on a stock GTX470 is good going. ID: 19507 · Rating: 0 · rate: / Reply Quote

Saenger Send message Joined: 20 Jul 08 Posts: 134 Credit: 23,657,183 RAC: 0 Level Scientific publications	Message 19508 - Posted: 16 Nov 2010, 17:21:43 UTC - in response to Message 19506. 'driver version unknown' is a bit odd as it shows up in UBUNTU as 260.19.21 That's quite normal. Here (Ubuntu 64 Bit, GTX 460, 260.19.12) it looks similar: Di 16 Nov 2010 13:47:05 CET NVIDIA GPU 0: GeForce GTX 460 (driver version unknown, CUDA version 3020, compute capability 2.1, 1023MB, 650 GFLOPS peak) Here it's Mo 15 Nov 2010 20:30:06 CET NVIDIA GPU 0: GeForce GT 240 (driver version unknown, CUDA version 3020, compute capability 1.2, 511MB, 257 GFLOPS peak) And my Nvidia settings say 260.19.12 under ubuntu10.4-64bit Gruesse vom Saenger For questions about Boinc look in the BOINC-Wiki ID: 19508 · Rating: 0 · rate: / Reply Quote

Beyond Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0	Message 19509 - Posted: 16 Nov 2010, 17:23:54 UTC - in response to Message 19499.
"My GIANNI_DHFR500 tasks are erroring out the same way, after varying amounts of computing time, on a GTX480 overclocked to 800MHz. Other tasks with high GPU utilization (such as the KASHIF_HIVPR ones) run correctly on these overclocked cards. When I clocked the card back to factory settings (while another GIANNI_DHFR500 task was running, so the tasks page still shows the higher frequency), that task finished correctly. So these GIANNI_DHFR500 tasks seem to be more sensitive to overclocking on Fermis than other tasks."
The GIANNI_DHFR500 WUs run at a higher GPU usage than most other WUs: 97-99% on some of my machines. This alone would tend to push a heavily overclocked card beyond its limits.
ID: 19509

Beyond Send message Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0 Level Scientific publications	Message 19510 - Posted: 16 Nov 2010, 17:47:46 UTC - in response to Message 19495. Thanks Beyond, eFMer Priority will have another home for a while. You're welcome. Just trying to find the most efficient solution so both GPU and CPU projects can be run optimally. These GIANNI_DHFR500 tasks are very fast; probably the fastest tasks we have seen to date, 17.5 to 25ms per step on GT240s. I hope these advancements make their way into the other WU’s. Pleased to see one turn up on my GTX470, just about to start. I finally got a GIANNI_DHFR500 WU on my GTX 260 too, so will be interesting to see how that goes. It would still need to be mapped out for Fermi’s and non-fermi’s and on XP, Vista, W7, Linux (nice values). Did you run any same type tasks without priority increases, to see what the differences are? Like you I started but at the default project priority the GPU is not fed properly and you will see either a low or markedly sawtooth shaped GPU usage graph along with slow WU progress. No point in testing that further as the default priority is not the correct setting at least on any of my XP64 machines. While swan_sync seems to be important on Linux for all cards, on Windows it is only important for Fermis, not GT240’s running the 6.12 app (makes little or no difference). It might help with the 6.13 app with a GF200 series card, but I have not tested this. So I suspect priority may not be a replacement for swan_sync, rather an alternative/also usable option, or option for non-Fermi’s. I think one of my GIANNI_DHFR500 tasks (possibly the second one) used swan_sync, but there is little difference in CPU time. When swan_sync is used with Fermi it does use a full core/thread. Linux users have also reported a huge WU speedup by adjusting their priority (nice) settings. Both of my GIANNI_DHFR500 tasks used slightly less CPU time than yours did. 
While this may be down to using a high priority it might just be a difference in CPU performance. ... I remember looking at this in the past, and if my memory serves me right we can get away with “Set Above Normal” priority. I’m trying this now (switched on last night mid run) on my GT240’s. Just need to wait a day to get clean results. For me 1,000-5,000 seconds of CPU time for a 40,000 second WU is very acceptable, much better than the 40,000 seconds of CPU used with SWAN_SYNC. Please let us know how the Above Normal priority works for you. ID: 19510 · Rating: 0 · rate: / Reply Quote
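Editor's note: for the Linux "nice" adjustment mentioned in the post above, a hedged sketch of the idea follows. It scans a /proc-style directory for processes whose command name contains a fragment and changes their niceness with os.setpriority. The fragment "acemd" is an assumption based on the "acemd2" application name that appears in BOINC's log lines in this thread; the real executable name on a given host may differ. Lowering a nice value (raising priority) typically needs root privileges, and the target value 0 in the usage comment is only an example.

```python
import os


def find_pids(name_fragment, proc_root="/proc"):
    """Return sorted PIDs whose command name contains name_fragment (Linux /proc layout)."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "comm")) as handle:
                if name_fragment in handle.read():
                    pids.append(int(entry))
        except OSError:
            continue  # process exited, or its entry is not readable
    return sorted(pids)


def renice(pids, niceness):
    """Set the nice value of each PID; raises PermissionError without sufficient rights."""
    for pid in pids:
        os.setpriority(os.PRIO_PROCESS, pid, niceness)


# Example usage (run manually, typically as root):
#     renice(find_pids("acemd"), 0)
```

The change is lost whenever a new task starts, so it would have to be reapplied for each work unit, which is roughly what a tool like eFMer Priority does on Windows.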

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 19511 - Posted: 16 Nov 2010, 17:48:32 UTC - in response to Message 19507. Last modified: 16 Nov 2010, 18:03:55 UTC BobR, good to see you back up and running. Thanks for listing that data. "driver version unknown" might just be a Boinc client/driver reporting issue. Perhaps Boinc expects a similar format to Windows; three numbers one dot and two more numbers (260.99) rather than nnn.nn.nn? That 421 GFLOPS peak is also wrong, it should be about 602 and Ralf's 650 GFlops peak for a GTX460 is also low; should be 907. I will also try eFMer Priority on my Fermi's tonight and report back, tomorrow hopefully. Would like to run it without swan_sync=0 and with swan_sync=0 to see if they are worth using together or just separately. If it makes no difference running them together but they equally speed up the projects then an increased priority could be added project side to the app and we would not have to use swan_sync at all, or perhaps just with Linux (depending on nice values). Alternatively both together might improve runtime even more, and then swan_sync could be kept as an option and above normal priority could be a default setting. We'll see... ID: 19511 · Rating: 0 · rate: / Reply Quote

Microcruncher* Send message Joined: 12 Jun 09 Posts: 4 Credit: 185,737 RAC: 0 Level Scientific publications	Message 19512 - Posted: 16 Nov 2010, 18:32:10 UTC - in response to Message 19511. Last modified: 16 Nov 2010, 18:34:46 UTC That 421 GFLOPS peak is also wrong, it should be about 602 and Ralf's 650 GFlops peak for a GTX460 is also low; should be 907. The GFLOPS display is pretty much useless. BOINC 6.10.58 for Windows reports 363 GFLOPS for the GTX 460 (independent of the shader clocks) while my old GTX 260 was rated at 477 GFLOPS at stock clock. ID: 19512 · Rating: 0 · rate: / Reply Quote

Richard Haselgrove Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 2	Message 19514 - Posted: 16 Nov 2010, 19:06:11 UTC - in response to Message 19512.
"That 421 GFLOPS peak is also wrong, it should be about 602 and Ralf's 650 GFlops peak for a GTX460 is also low; should be 907."
"The GFLOPS display is pretty much useless. BOINC 6.10.58 for Windows reports 363 GFLOPS for the GTX 460 (independent of the shader clocks) while my old GTX 260 was rated at 477 GFLOPS at stock clock."
The peak GFlops estimate in BOINC is going to stay wrong until we can get an answer to David Anderson's question:
"Is it the case that all compute capability 2.1 chips have 48 cores per processor? I can't get a clear answer from nvidia on this. -- David"
Has anyone else got a way of getting a clear answer from NVidia, not only for this generation of chips, but hopefully a software API that will work on future generations as well?
ID: 19514

Microcruncher* Joined: 12 Jun 09 Posts: 4 Credit: 185,737 RAC: 0	Message 19515 - Posted: 16 Nov 2010, 20:00:05 UTC - in response to Message 19514.
"Has anyone else got a way of getting a clear answer from NVidia, not only for this generation of chips, but hopefully a software API that will work on future generations as well?"
No. Of course an API would be nice, but it should be kept simple; otherwise the BOINC programmers end up writing and maintaining a sysinfo tool only to display one not very interesting number. What matters is that BOINC detects the GPUs. Displaying the "correct" theoretical performance should be far down on the priority list, especially for a card like my GTX 460, which behaves like a 336 SP card with one app and like a 224 SP card with another, because the two warp schedulers per multiprocessor can't make use of the extra 16 cores each MP has.
ID: 19515

skgiven Volunteer moderator Volunteer tester Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0	Message 19517 - Posted: 16 Nov 2010, 22:41:24 UTC - in response to Message 19515.
There’s little chance of NVidia divulging the architectural structure of future cards. As for CC 2.1, yes: the 48:1 ratio is synonymous with CC 2.1, which is why all low and mid range Fermi cards comply. This is set to continue for the immediate future; there will not be a GF114 version for several months, and those GTX560s will probably toe the line, or just not be CC 2.1.
Should another card turn up with a different ratio, I would expect NVidia to define a new compute capability (CC 3.0), but that is very unlikely for some time; the GTX 580 is still CC 2.0 (32:1 ratio) and the forthcoming GTX 570 will follow suit. So there might not even be a new CC until the move to 28nm, and that is a fair bit away; there is more chance of several new Boinc versions between now and then.
GPU Caps Viewer reports GPU Compute Capability.
ID: 19517
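Editor's note: the cores-per-multiprocessor ratio discussed above feeds directly into the peak GFLOPS figure. A common way to estimate single-precision peak is multiprocessors × cores per multiprocessor × shader clock × 2 operations per clock. The sketch below applies that formula; the multiprocessor counts (7 for a GTX 460, 4 for a GTS 450) and stock shader clocks (1.35 GHz and 1.566 GHz) are the editor's assumptions from the cards' published specifications, not values taken from this thread, and this is not BOINC's actual code.

```python
def peak_gflops(multiprocessors, cores_per_mp, shader_clock_ghz, ops_per_clock=2):
    """Theoretical single-precision peak, counting a fused multiply-add as 2 operations."""
    return multiprocessors * cores_per_mp * shader_clock_ghz * ops_per_clock


# GTX 460 (assumed: 7 multiprocessors, 1.35 GHz shader clock)
gtx460_cc21 = peak_gflops(7, 48, 1.35)   # 48 cores per multiprocessor (CC 2.1)
gtx460_cc20 = peak_gflops(7, 32, 1.35)   # what a 32-per-multiprocessor assumption gives

# GTS 450 (assumed: 4 multiprocessors, 1.566 GHz shader clock)
gts450_cc21 = peak_gflops(4, 48, 1.566)

print(f"GTX 460 with 48 cores/MP: {gtx460_cc21:.0f} GFLOPS")
print(f"GTX 460 with 32 cores/MP: {gtx460_cc20:.0f} GFLOPS")
print(f"GTS 450 with 48 cores/MP: {gts450_cc21:.0f} GFLOPS")
```

With 48 cores per multiprocessor the GTX 460 figure comes out at roughly the 907 quoted earlier in the thread, while a 32-core assumption gives a value about a third lower, which is consistent with the under-reporting being discussed.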

Beyond Send message Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0 Level Scientific publications	Message 19519 - Posted: 16 Nov 2010, 23:33:52 UTC - in response to Message 19510. These GIANNI_DHFR500 tasks are very fast; probably the fastest tasks we have seen to date, 17.5 to 25ms per step on GT240s. I hope these advancements make their way into the other WU’s. Pleased to see one turn up on my GTX470, just about to start. I finally got a GIANNI_DHFR500 WU on my GTX 260 too, so will be interesting to see how that goes. Finished the GIANNI_DHFR500 WU on my GTX 260 now: priority high, 1530 MHz, XP64. It ran in 16904.656 seconds with only 832 seconds of CPU time, 8.452 ms/step. All in all very happy with how the GIANNI_DHFR500 WUs are running with priority boost. ID: 19519 · Rating: 0 · rate: / Reply Quote
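Editor's note: the CPU cost being compared in this thread is easiest to see as CPU time divided by run time. The sketch below uses three results posted above; the labels simply restate what each poster reported about their setup.

```python
# (label, run time in seconds, CPU time in seconds), all taken from posts in this thread
runs = [
    ("GTX 480, swan_sync=0", 9372.934303, 9349.922),
    ("GTX 295, no SWAN line in stderr", 22050.75, 1663.875),
    ("GTX 260, priority high", 16904.656, 832.0),
]


def cpu_share(run_seconds, cpu_seconds):
    """Return CPU time as a percentage of wall-clock run time."""
    return 100.0 * cpu_seconds / run_seconds


for label, run_seconds, cpu_seconds in runs:
    print(f"{label}: {cpu_share(run_seconds, cpu_seconds):.1f}% of one core")
```

The swan_sync=0 run occupies essentially a full core, while the priority-boosted GTX 260 run uses a few percent, which is the trade-off the posters are weighing. The cards differ, so this compares CPU cost only and not GPU throughput.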