ACEMD2 CUDA3.1 is now out

Joined: 4 Apr 09 · Posts: 450 · Credit: 539,316,349 · RAC: 0

My first returned result on a full-size 3.1 WU is slower... very disappointing. No changes to the system setup: GTX 480 on WinXP. Both examples are from the same WU type, TONI_CAPBIND*.

Old version, average runtime was 6550 s (very little delta between runs):

    # Time per step (avg over 650000 steps): 10.065 ms
    # Approximate elapsed time for entire WU: 6541.937 s

New version, 1 result, runtime was 7768 seconds:

    # Time per step (avg over 650000 steps): 11.947 ms
    # Approximate elapsed time for entire WU: 7765.391 s

Thanks - Steve

---

Joined: 4 Apr 09 · Posts: 450 · Credit: 539,316,349 · RAC: 0

The next result on the same system is for a HERG* type WU. It's about the same as the old app version...

Old app, average runtime 6250 s (very little delta between WUs):

    # Time per step (avg over 625000 steps): 10.001 ms
    # Approximate elapsed time for entire WU: 6250.359 s

New app, 1 result:

    # Time per step (avg over 625000 steps): 9.815 ms
    # Approximate elapsed time for entire WU: 6134.672 s

Thanks - Steve

---

Joined: 6 Jun 08 · Posts: 152 · Credit: 328,250,382 · RAC: 0

Two CUDA 3.1 (6.09) WUs on Windows XP Pro with a GTX 480 take more than 3 hours per WU: over 18.000 ms per step, averaged over 625000 steps. So this is not good??!!

Ton (ftpd) Netherlands

---

nenym
Joined: 31 Mar 09 · Posts: 137 · Credit: 1,431,087,071 · RAC: 58,001

My first standard task finished successfully after more than a year (65 nm GTX 260, 64-bit Win XP). TONI_CAPBIND*:

    SWAN: Using synchronization method 0
    # Time per step (avg over 325000 steps): 27.879 ms
    # Approximate elapsed time for entire WU: 18121.656 s

Thanks GDF.

---

GDF
Joined: 14 Mar 07 · Posts: 1958 · Credit: 629,356 · RAC: 0

> For Linux, when?

Next week.

gdf

---

[AF>Libristes>Jip] Elgrande71
Joined: 16 Jul 08 · Posts: 45 · Credit: 78,618,001 · RAC: 0

Thank you. My GPU drivers are updated to the 256.35 Linux version.

---

Retvari Zoltan
Joined: 20 Jan 09 · Posts: 2380 · Credit: 16,897,957,044 · RAC: 0

I have completed four 6.09 tasks, and I'm a little disappointed. There is a performance gain, but not in the way I thought it would be. My GPU now runs cooler with the new client, but that's not the important factor for me. :)

These 6.09 WUs take longer because they use less of the GPU. According to GPU-Z v0.4.4 the GPU load is now around 60% to 74%. Could you optimize the code so it causes a higher GPU load (on the GTX 480)? It was over 80% with the v6.05 client.

| Version | Work unit | Time per step | Elapsed time |
|---|---|---|---|
| v6.09 | m2-IBUCH_201b_pYEEI_100304-40-80-RND1404_1 | 15.422 ms | 9639.047 s |
| v6.05 | p41-IBUCH_501_pYEEI_100301-39-40-RND6587_1 | 12.876 ms | 8047.344 s |
| v6.09 | h232f99r412-TONI_CAPBINDsp2-29-100-RND6253_0 | 16.624 ms | 10805.422 s |
| v6.05 | h232f99r558-TONI_CAPBINDsp2-22-100-RND1136_1 | 12.253 ms | 7964.406 s |

Also, one WU failed at approximately 80%.
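
As an aside, GPU load figures like the ones GPU-Z shows can also be read programmatically through NVIDIA's NVML library, which makes it easier to log utilisation over a whole WU. The sketch below is only an illustration and is not part of the GPUGRID application; the choice of device index 0 is an assumption.

```cpp
// Minimal NVML sketch: print the current GPU and memory-controller load for
// device 0, similar to the "GPU Load" figure shown by GPU-Z.
// Build (Linux): gcc nvml_load.c -o nvml_load -lnvidia-ml
#include <stdio.h>
#include <nvml.h>

int main(void)
{
    if (nvmlInit() != NVML_SUCCESS) {
        fprintf(stderr, "nvmlInit failed\n");
        return 1;
    }

    nvmlDevice_t dev;
    if (nvmlDeviceGetHandleByIndex(0, &dev) == NVML_SUCCESS) {   // device 0 assumed
        nvmlUtilization_t util;
        if (nvmlDeviceGetUtilizationRates(dev, &util) == NVML_SUCCESS)
            printf("GPU load: %u%%  memory controller load: %u%%\n",
                   util.gpu, util.memory);
    }

    nvmlShutdown();
    return 0;
}
```

NVML reports utilisation averaged over the driver's last sampling period, so polling roughly once per second alongside a running task is enough to build a load log comparable to the percentages quoted above.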

---

Joined: 4 Apr 09 · Posts: 450 · Credit: 539,316,349 · RAC: 0

While I am seeing higher GPU usage, I am also seeing higher CPU usage, which is causing more kernel thrashing, and I believe that is what is slowing this version of the app down. I have cut BOINC Manager down by another thread to see if this helps. I was leaving one core free, so now I have two free for GPUGrid on each machine; not good for my WCG crunching.

Overall I am seeing performance LOSSES of a few minutes on WinXP (480) and enormous LOSSES (at least an hour per WU) on Win7 (a 285 and a 295). The only WU type I am seeing an improvement on is TONI_HERG, by a couple of minutes on both systems. I am seriously considering rolling back.

Perhaps the betas proved the programming was stable, and perhaps I was overly optimistic in thinking the beta runtimes / ms per step were indicative of production performance. Sorry if I sound harsh; I think I had my hopes up too high and should crunch for aliens and esoteric math problems to regain my focus on GPUGrid, which I really do think is a fantastic project. Time for a break from posting.

Thanks - Steve

---

Joined: 22 May 10 · Posts: 20 · Credit: 85,355,427 · RAC: 0

Results seem to vary drastically on my WinXP machine.

I392-TONI_KIDln-12-100-RND6196_0

    # Device 0: "GeForce GTX 470"
    SWAN: Using synchronization method 0
    # Time per step (avg over 1300000 steps): 4.924 ms!!!
    # Approximate elapsed time for entire WU: 6401.594 s

h232f99r440-TONI_CAPBINDsp2-25-100-RND7492_0

    # Device 0: "GeForce GTX 470"
    SWAN: Using synchronization method 0
    # Time per step (avg over 650000 steps): 16.178 ms
    # Approximate elapsed time for entire WU: 10515.719 s

Not sure what the difference between these two is, but if the second can be optimized like the first, that would be seriously impressive.

---

GDF
Joined: 14 Mar 07 · Posts: 1958 · Credit: 629,356 · RAC: 0

The beta workunit has a simulation box of 64x64x64, and CUDA 3.1 optimizes FFTs whose sizes are a power of two. So it was a bit faster, and other machines (WinXP) reproduced it; as they were overclocked, they even improved on the reference speed we had found locally. After these tests on GPUGRID, it seems that the other workunits are no better than under CUDA 3.0. I will do some tests next week on other workunit types.

Nevertheless, CUDA 3.1 also works well on older cards and seems to fix several bugs, so it is a good release.

gdf
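
To make the point about box sizes concrete, here is a minimal, standalone cuFFT timing sketch (this is not ACEMD code). It times a 3D real-to-complex transform for a 64^3 box against a 100^3 box; the 100^3 size is only an assumed example of a non-power-of-two grid, and real WU grids will differ.

```cpp
// Minimal cuFFT sketch (not ACEMD code): time a 3D R2C transform for a
// power-of-two box (64^3, as in the beta WUs) versus an assumed
// non-power-of-two box (100^3). Build with: nvcc fft_box.cu -lcufft
#include <cstdio>
#include <cuda_runtime.h>
#include <cufft.h>

static float msPerTransform(int n)
{
    cufftReal    *d_in;
    cufftComplex *d_out;
    // Buffer contents don't matter for timing purposes.
    cudaMalloc((void **)&d_in,  sizeof(cufftReal)    * n * n * n);
    cudaMalloc((void **)&d_out, sizeof(cufftComplex) * n * n * (n / 2 + 1));

    cufftHandle plan;
    cufftPlan3d(&plan, n, n, n, CUFFT_R2C);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cufftExecR2C(plan, d_in, d_out);            // warm-up
    cudaEventRecord(start, 0);
    for (int i = 0; i < 100; ++i)               // average over 100 transforms
        cufftExecR2C(plan, d_in, d_out);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cufftDestroy(plan);
    cudaFree(d_in);
    cudaFree(d_out);
    return ms / 100.0f;                          // ms per transform
}

int main()
{
    printf("64^3  box: %.3f ms per FFT\n", msPerTransform(64));
    printf("100^3 box: %.3f ms per FFT\n", msPerTransform(100));
    return 0;
}
```

If the 64^3 plan comes out noticeably faster per transform than the 100^3 one, that is consistent with the explanation above of why the 64^3 beta WUs gained more from CUDA 3.1 than the production WUs did.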

---

Joined: 5 Jan 09 · Posts: 670 · Credit: 2,498,095,550 · RAC: 0

@GDF: Please read this and give me an opt-out of 3.1 if that's possible, because otherwise the result will probably be the loss of two or more crunchers for this project.

Radio Caroline, the world's most famous offshore pirate radio station. Great music since April 1964. Support Radio Caroline Team - Radio Caroline

---

Joined: 4 Apr 09 · Posts: 450 · Credit: 539,316,349 · RAC: 0

On a serious note: the majority of the degraded performance on my 2xx-series cards was because, the last time I switched hardware around, I forgot to delete the SWAN_SYNC environment variable, and apparently the new version of the app recognizes and uses it. That is why I had an even greater amount of kernel usage than normal. While some WUs are faster and some are slower, I hope the project gets the benefit of a more stable app.

Thanks - Steve
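
For readers wondering how a leftover environment variable can show up as extra kernel/CPU time: the usual trade-off behind this kind of switch is spinning (polling) versus blocking while the CPU waits for the GPU. The sketch below illustrates that trade-off with the CUDA runtime API; it is not the ACEMD source, and the way SWAN_SYNC is read here is purely an assumption for illustration.

```cpp
// Illustrative sketch only (not ACEMD code): how an environment variable such
// as SWAN_SYNC could select between a spin wait and a blocking wait. Spinning
// keeps one CPU core near 100% (much of it kernel time) but reacts to the GPU
// a little sooner; blocking lets the CPU sleep until the GPU finishes.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

__global__ void dummyKernel() { }   // stand-in for a real simulation kernel

int main()
{
    // Assumed semantics: variable present -> spin wait. The real app may differ.
    const bool spin = (getenv("SWAN_SYNC") != NULL);

    // Must be called before the CUDA context is created, i.e. before any
    // runtime call that touches the device.
    cudaSetDeviceFlags(spin ? cudaDeviceScheduleSpin
                            : cudaDeviceScheduleBlockingSync);

    dummyKernel<<<1, 1>>>();
    cudaDeviceSynchronize();        // CPU behaviour here depends on the flag

    printf("SWAN_SYNC %s -> %s wait while the GPU runs\n",
           spin ? "set" : "not set",
           spin ? "spin (one core busy)" : "blocking");
    return 0;
}
```

With the spin flag, the waiting thread shows up as a fully busy core, which matches the extra kernel usage described above; removing the variable and using a blocking wait trades a little latency for a mostly idle CPU.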

---

Joined: 24 Dec 08 · Posts: 738 · Credit: 200,909,904 · RAC: 0

> @GDF: Please read this and give me an opt-out of 3.1 if that's possible, because otherwise the result will probably be the loss of two or more crunchers for this project.

You could just go back to older drivers.

BOINC blog

---

Retvari Zoltan
Joined: 20 Jan 09 · Posts: 2380 · Credit: 16,897,957,044 · RAC: 0

I've just installed the v258.19 driver, and it reports itself like this in BOINC Manager:

    2010.07.03. 23:09:18 NVIDIA GPU 0: GeForce GTX 480 (driver version 25819, CUDA version 3020, compute capability 2.0, 1536MB, 1345 GFLOPS peak)

It means that soon(?) there will be CUDA 3.2.

---

skgiven
Joined: 23 Apr 09 · Posts: 3968 · Credit: 1,995,359,260 · RAC: 0

This CUDA 3010 WU looks promising!
http://www.gpugrid.net/result.php?resultid=2611239
It finished about 40% faster than the CUDA 3000 tasks. So although there are some that are about 50% slower, there are some faster WUs as well. GDF just needs to work out what the difference is and make sure we get the fast ones rather than the slow ones. Good luck with that.

Details of the fast one:

    2611239 1660106 3 Jul 2010 13:39:37 UTC 3 Jul 2010 18:51:08 UTC Completed and validated 6,601.92 6,598.16 4,727.93 7,091.89 ACEMD2: GPU molecular dynamics v6.09 (cuda31)
    Name: I531-TONI_KIDln-13-100-RND2019_0
    Workunit: 1660106
    Created: 3 Jul 2010 13:19:57 UTC
    Sent: 3 Jul 2010 13:39:37 UTC
    Received: 3 Jul 2010 18:51:08 UTC
    Server state: Over
    Outcome: Success
    Client state: Done
    Exit status: 0 (0x0)
    Computer ID: 71363
    Report deadline: 8 Jul 2010 13:39:37 UTC
    Run time: 6601.921875
    CPU time: 6598.156
    stderr out:
    <core_client_version>6.10.56</core_client_version>
    <![CDATA[
    <stderr_txt>
    # Using device 0
    # There is 1 device supporting CUDA
    # Device 0: "GeForce GTX 470"
    # Clock rate: 1.41 GHz
    # Total amount of global memory: 1341718528 bytes
    # Number of multiprocessors: 14
    # Number of cores: 112
    SWAN: Using synchronization method 0
    MDIO ERROR: cannot open file "restart.coor"
    # Time per step (avg over 1300000 steps): 5.077 ms
    # Approximate elapsed time for entire WU: 6599.953 s
    called boinc_finish
    </stderr_txt>
    ]]>
    Validate state: Valid
    Claimed credit: 4727.92939814815
    Granted credit: 7091.89409722222
    Application version: ACEMD2: GPU molecular dynamics v6.09 (cuda31)

An example of the older CUDA 3000 tasks:

    2603160 1654062 2 Jul 2010 1:29:51 UTC 2 Jul 2010 6:15:33 UTC Completed and validated 8,218.06 8,105.92 4,428.01 6,642.02 ACEMD2: GPU molecular dynamics v6.05 (cuda30)

An example of the newer slow WUs:

    2611961 1657969 3 Jul 2010 17:01:16 UTC 3 Jul 2010 22:26:14 UTC Completed and validated 12,871.38 12,867.41 4,535.61 6,803.41 ACEMD2: GPU molecular dynamics v6.09 (cuda31)

---

Joined: 22 May 10 · Posts: 20 · Credit: 85,355,427 · RAC: 0

I just want to suggest that everyone take a slow, deep breath... and remind everyone that we are using bleeding-edge drivers with bleeding-edge developer tools. Let's not prematurely flee like rats from a sinking ship just because every single GPU transistor is not working at 100% optimization and productivity.

Remember that the main underlying goal is to help crunch protein simulations for medical research. If some work units are slower now than before, so what? GPU crunching is still many times faster than crunching the same thing with CPU-based applications. If we keep our wits about us and work together, we can help improve the application's productivity by providing constructive feedback.

---

liveonc
Joined: 1 Jan 10 · Posts: 292 · Credit: 41,567,650 · RAC: 0

I second that! I guess I could run Prime95 or Linpack 24/7 if all I wanted to do was produce heat and waste electricity, especially in summer. If I just liked getting points, I'm sure I could do that on Facebook for free without needing more than a netbook. But I like GPUGRID. They do work, they have a friendly forum where stupid questions aren't answered with a verbal gutting or impalement, and they're quick to adopt all the new things that are still raw.

I can only hope that one day there will be other things than research into mental disorders. I'm more interested in cancer research, cures for STDs, and illnesses that normally don't get enough funding; this is none of the above, but if GPUGRID shows the way, maybe the rest of the pack will follow one day.

---

Retvari Zoltan
Joined: 20 Jan 09 · Posts: 2380 · Credit: 16,897,957,044 · RAC: 0

I got one of those fast work units:

    # Time per step (avg over 1300000 steps): 4.617 ms
    # Approximate elapsed time for entire WU: 6002.672 s
    Claimed credit: 4728
    Granted credit: 7092

This WU generated 94% CPU load, which makes me content. But I know the other WUs ran almost at this 'efficiency level' before, and I'd like to have that back, or more. :)

---

skgiven
Joined: 23 Apr 09 · Posts: 3968 · Credit: 1,995,359,260 · RAC: 0

These CUDA 3.1 I531-TONI_KIDln WUs are clearly much faster than the CUDA 3000 WUs, and the 3.1 TONI_HERG WUs are just the same as 3.0, but the TONI_CAPBIND and KASHIF_HIVPR ones are much slower. TONI_KID fast, and again.

Before last weekend's downtime my Fermi on XP had a steady RAC of 65K. It has now dropped to 55K and is showing no signs of increasing.

- If I just crunched TONI_CAPBIND it would drop to 45K!
- If I just crunched KASHIF_HIVPR it would drop to about 53K.
- If I could just crunch TONI_KIDln it would rise to 92K. Yes, these are twice as fast as TONI_CAPBIND under CUDA 3.1.

So, as you can see, there is a massive performance difference between tasks! Before you (the GPUGrid team) sit down to work out why this is, could you build the work units accordingly, so the Fermis just get TONI_KIDln work units? The slower tasks could just be compiled for 2.2, as they are faster that way. This would immediately speed up the research on Fermi cards (almost double it).

Thanks,

---

GDF
Joined: 14 Mar 07 · Posts: 1958 · Credit: 629,356 · RAC: 0

We can reproduce the problem so we can probably fix it.
http://www.gpugrid.net/forum_thread.php?id=2209&nowrap=true#17865

gdf