ACEMD2 CUDA3.1 is now out

Joined: 4 Apr 09 · Posts: 450 · Credit: 539,316,349 · RAC: 0

My first returned result on a full-size 3.1 WU is slower... very disappointing. No changes to the system setup: GTX 480 on WinXP. Both examples are from the same WU type, TONI_CAPBIND*.

Old version, average runtime was 6550 s (very little delta between runs):

    # Time per step (avg over 650000 steps): 10.065 ms
    # Approximate elapsed time for entire WU: 6541.937 s

New version, 1 result, runtime was 7768 seconds:

    # Time per step (avg over 650000 steps): 11.947 ms
    # Approximate elapsed time for entire WU: 7765.391 s

Thanks - Steve

---

Joined: 4 Apr 09 · Posts: 450 · Credit: 539,316,349 · RAC: 0

The next result on the same system is for a HERG* type WU. It's about the same as the old app version...

Old app, average runtime 6250 s (very little delta between WUs):

    # Time per step (avg over 625000 steps): 10.001 ms
    # Approximate elapsed time for entire WU: 6250.359 s

New app, 1 result:

    # Time per step (avg over 625000 steps): 9.815 ms
    # Approximate elapsed time for entire WU: 6134.672 s

Thanks - Steve

---

Joined: 6 Jun 08 · Posts: 152 · Credit: 328,250,382 · RAC: 0

Two CUDA 3.1 (6.09) WUs on Windows XP Pro with a GTX 480 take more than 3 hours per WU: over 18.000 ms per step, averaged over 625000 steps. So this is not good??!!

Ton (ftpd) Netherlands

---

nenym
Joined: 31 Mar 09 · Posts: 137 · Credit: 1,431,087,071 · RAC: 58,001

My first standard task finished successfully after more than a year (65 nm GTX 260, 64-bit Win XP). TONI_CAPBIND*:

    SWAN: Using synchronization method 0
    # Time per step (avg over 325000 steps): 27.879 ms
    # Approximate elapsed time for entire WU: 18121.656 s

Thanks GDF.

---

GDF
Joined: 14 Mar 07 · Posts: 1958 · Credit: 629,356 · RAC: 0

> For Linux, when?

Next week.

gdf

---

[AF>Libristes>Jip] Elgrande71
Joined: 16 Jul 08 · Posts: 45 · Credit: 78,618,001 · RAC: 0

Thank you. My GPU drivers are updated to the 256.35 Linux version.

---

Retvari Zoltan
Joined: 20 Jan 09 · Posts: 2380 · Credit: 16,897,957,044 · RAC: 0

I have completed four 6.09 tasks, and I'm a little disappointed. There is a performance gain, but not in the way I thought it would be. My GPU now runs cooler with the new client, but that's not the important factor for me. :)

These 6.09 WUs take longer because they use less of the GPU. According to GPU-Z v0.4.4 the GPU load is now around 60% to 74%. Could you optimize the code so it causes a higher GPU load (on the GTX 480)? It was over 80% with the v6.05 client.

| Version | Work unit | Time per step | Elapsed time |
|---|---|---|---|
| v6.09 | m2-IBUCH_201b_pYEEI_100304-40-80-RND1404_1 | 15.422 ms | 9639.047 s |
| v6.05 | p41-IBUCH_501_pYEEI_100301-39-40-RND6587_1 | 12.876 ms | 8047.344 s |
| v6.09 | h232f99r412-TONI_CAPBINDsp2-29-100-RND6253_0 | 16.624 ms | 10805.422 s |
| v6.05 | h232f99r558-TONI_CAPBINDsp2-22-100-RND1136_1 | 12.253 ms | 7964.406 s |

Also, one WU failed at approximately 80%.
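
As an aside, GPU load figures like the ones GPU-Z shows can also be read programmatically through NVIDIA's NVML library, which makes it easier to log utilisation over a whole WU. The sketch below is only an illustration and is not part of the GPUGRID application; the choice of device index 0 is an assumption.

```cpp
// Minimal NVML sketch: print the current GPU and memory-controller load for
// device 0, similar to the "GPU Load" figure shown by GPU-Z.
// Build (Linux): gcc nvml_load.c -o nvml_load -lnvidia-ml
#include <stdio.h>
#include <nvml.h>

int main(void)
{
    if (nvmlInit() != NVML_SUCCESS) {
        fprintf(stderr, "nvmlInit failed\n");
        return 1;
    }

    nvmlDevice_t dev;
    if (nvmlDeviceGetHandleByIndex(0, &dev) == NVML_SUCCESS) {   // device 0 assumed
        nvmlUtilization_t util;
        if (nvmlDeviceGetUtilizationRates(dev, &util) == NVML_SUCCESS)
            printf("GPU load: %u%%  memory controller load: %u%%\n",
                   util.gpu, util.memory);
    }

    nvmlShutdown();
    return 0;
}
```

NVML reports utilisation averaged over the driver's last sampling period, so polling roughly once per second alongside a running task is enough to build a load log comparable to the percentages quoted above.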

---

Joined: 4 Apr 09 · Posts: 450 · Credit: 539,316,349 · RAC: 0

While I am seeing higher GPU usage, I am also seeing higher CPU usage, which is causing more kernel thrashing, and I believe that is what is slowing this version of the app down. I have cut BOINC Manager down by another thread to see if this helps. I was leaving one core free, so now I have two free for GPUGrid on each machine; not good for my WCG crunching.

Overall I am seeing performance LOSSES of a few minutes on WinXP (480) and enormous LOSSES (at least an hour per WU) on Win7 (a 285 and a 295). The only WU type I am seeing an improvement on is TONI_HERG, by a couple of minutes on both systems. I am seriously considering rolling back.

Perhaps the betas proved the programming was stable, and perhaps I was overly optimistic in thinking the beta runtimes / ms per step were indicative of production performance. Sorry if I sound harsh; I think I had my hopes up too high and should crunch for aliens and esoteric math problems to regain my focus on GPUGrid, which I really do think is a fantastic project. Time for a break from posting.

Thanks - Steve

---

Joined: 22 May 10 · Posts: 20 · Credit: 85,355,427 · RAC: 0

Results seem to vary drastically on my WinXP machine.

I392-TONI_KIDln-12-100-RND6196_0

    # Device 0: "GeForce GTX 470"
    SWAN: Using synchronization method 0
    # Time per step (avg over 1300000 steps): 4.924 ms!!!
    # Approximate elapsed time for entire WU: 6401.594 s

h232f99r440-TONI_CAPBINDsp2-25-100-RND7492_0

    # Device 0: "GeForce GTX 470"
    SWAN: Using synchronization method 0
    # Time per step (avg over 650000 steps): 16.178 ms
    # Approximate elapsed time for entire WU: 10515.719 s

Not sure what the difference between these two is, but if the second can be optimized like the first, that would be seriously impressive.

---

GDF
Joined: 14 Mar 07 · Posts: 1958 · Credit: 629,356 · RAC: 0

The beta workunit has a simulation box of 64x64x64, and CUDA 3.1 optimizes FFTs whose sizes are a power of two. So it was a bit faster, and other machines (WinXP) reproduced it; as they were overclocked, they even improved on the reference speed we had found locally. After these tests on GPUGRID, it seems that the other workunits are no better than under CUDA 3.0. I will do some tests next week on other workunit types.

Nevertheless, CUDA 3.1 also works well on older cards and seems to fix several bugs, so it is a good release.

gdf
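
To make the point about box sizes concrete, here is a minimal, standalone cuFFT timing sketch (this is not ACEMD code). It times a 3D real-to-complex transform for a 64^3 box against a 100^3 box; the 100^3 size is only an assumed example of a non-power-of-two grid, and real WU grids will differ.

```cpp
// Minimal cuFFT sketch (not ACEMD code): time a 3D R2C transform for a
// power-of-two box (64^3, as in the beta WUs) versus an assumed
// non-power-of-two box (100^3). Build with: nvcc fft_box.cu -lcufft
#include <cstdio>
#include <cuda_runtime.h>
#include <cufft.h>

static float msPerTransform(int n)
{
    cufftReal    *d_in;
    cufftComplex *d_out;
    // Buffer contents don't matter for timing purposes.
    cudaMalloc((void **)&d_in,  sizeof(cufftReal)    * n * n * n);
    cudaMalloc((void **)&d_out, sizeof(cufftComplex) * n * n * (n / 2 + 1));

    cufftHandle plan;
    cufftPlan3d(&plan, n, n, n, CUFFT_R2C);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cufftExecR2C(plan, d_in, d_out);            // warm-up
    cudaEventRecord(start, 0);
    for (int i = 0; i < 100; ++i)               // average over 100 transforms
        cufftExecR2C(plan, d_in, d_out);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cufftDestroy(plan);
    cudaFree(d_in);
    cudaFree(d_out);
    return ms / 100.0f;                          // ms per transform
}

int main()
{
    printf("64^3  box: %.3f ms per FFT\n", msPerTransform(64));
    printf("100^3 box: %.3f ms per FFT\n", msPerTransform(100));
    return 0;
}
```

If the 64^3 plan comes out noticeably faster per transform than the 100^3 one, that is consistent with the explanation above of why the 64^3 beta WUs gained more from CUDA 3.1 than the production WUs did.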

---

Joined: 5 Jan 09 · Posts: 670 · Credit: 2,498,095,550 · RAC: 0

@GDF: Please read this and give me an opt-out of 3.1 if that's possible, because otherwise the result will probably be the loss of two or more crunchers for this project.

Radio Caroline, the world's most famous offshore pirate radio station. Great music since April 1964. Support Radio Caroline Team - Radio Caroline

---

Joined: 4 Apr 09 · Posts: 450 · Credit: 539,316,349 · RAC: 0

On a serious note: the majority of the degraded performance on my 2xx-series cards was because, the last time I switched hardware around, I forgot to delete the SWAN_SYNC environment variable, and apparently the new version of the app recognizes and uses it. That is why I had an even greater amount of kernel usage than normal. While some WUs are faster and some are slower, I hope the project gets the benefit of a more stable app.

Thanks - Steve
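
For readers wondering how a leftover environment variable can show up as extra kernel/CPU time: the usual trade-off behind this kind of switch is spinning (polling) versus blocking while the CPU waits for the GPU. The sketch below illustrates that trade-off with the CUDA runtime API; it is not the ACEMD source, and the way SWAN_SYNC is read here is purely an assumption for illustration.

```cpp
// Illustrative sketch only (not ACEMD code): how an environment variable such
// as SWAN_SYNC could select between a spin wait and a blocking wait. Spinning
// keeps one CPU core near 100% (much of it kernel time) but reacts to the GPU
// a little sooner; blocking lets the CPU sleep until the GPU finishes.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

__global__ void dummyKernel() { }   // stand-in for a real simulation kernel

int main()
{
    // Assumed semantics: variable present -> spin wait. The real app may differ.
    const bool spin = (getenv("SWAN_SYNC") != NULL);

    // Must be called before the CUDA context is created, i.e. before any
    // runtime call that touches the device.
    cudaSetDeviceFlags(spin ? cudaDeviceScheduleSpin
                            : cudaDeviceScheduleBlockingSync);

    dummyKernel<<<1, 1>>>();
    cudaDeviceSynchronize();        // CPU behaviour here depends on the flag

    printf("SWAN_SYNC %s -> %s wait while the GPU runs\n",
           spin ? "set" : "not set",
           spin ? "spin (one core busy)" : "blocking");
    return 0;
}
```

With the spin flag, the waiting thread shows up as a fully busy core, which matches the extra kernel usage described above; removing the variable and using a blocking wait trades a little latency for a mostly idle CPU.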

---

Joined: 24 Dec 08 · Posts: 738 · Credit: 200,909,904 · RAC: 0

> @GDF: Please read this and give me an opt-out of 3.1 if that's possible, because otherwise the result will probably be the loss of two or more crunchers for this project.

You could just go back to older drivers.

BOINC blog

---

Retvari Zoltan
Joined: 20 Jan 09 · Posts: 2380 · Credit: 16,897,957,044 · RAC: 0

I've just installed the v258.19 driver, and it reports itself like this in BOINC Manager:

    2010.07.03. 23:09:18 NVIDIA GPU 0: GeForce GTX 480 (driver version 25819, CUDA version 3020, compute capability 2.0, 1536MB, 1345 GFLOPS peak)

It means that soon(?) there will be CUDA 3.2.

---

skgiven
Joined: 23 Apr 09 · Posts: 3968 · Credit: 1,995,359,260 · RAC: 0

This CUDA 3010 WU looks promising!
http://www.gpugrid.net/result.php?resultid=2611239
It finished about 40% faster than the CUDA 3000 tasks. So although there are some that are about 50% slower, there are some faster WUs as well. GDF just needs to work out what the difference is and make sure we get the fast ones rather than the slow ones. Good luck with that.

Details of the fast one:

    2611239 1660106 3 Jul 2010 13:39:37 UTC 3 Jul 2010 18:51:08 UTC Completed and validated 6,601.92 6,598.16 4,727.93 7,091.89 ACEMD2: GPU molecular dynamics v6.09 (cuda31)
    Name: I531-TONI_KIDln-13-100-RND2019_0
    Workunit: 1660106
    Created: 3 Jul 2010 13:19:57 UTC
    Sent: 3 Jul 2010 13:39:37 UTC
    Received: 3 Jul 2010 18:51:08 UTC
    Server state: Over
    Outcome: Success
    Client state: Done
    Exit status: 0 (0x0)
    Computer ID: 71363
    Report deadline: 8 Jul 2010 13:39:37 UTC
    Run time: 6601.921875
    CPU time: 6598.156
    stderr out:
    <core_client_version>6.10.56</core_client_version>
    <![CDATA[
    <stderr_txt>
    # Using device 0
    # There is 1 device supporting CUDA
    # Device 0: "GeForce GTX 470"
    # Clock rate: 1.41 GHz
    # Total amount of global memory: 1341718528 bytes
    # Number of multiprocessors: 14
    # Number of cores: 112
    SWAN: Using synchronization method 0
    MDIO ERROR: cannot open file "restart.coor"
    # Time per step (avg over 1300000 steps): 5.077 ms
    # Approximate elapsed time for entire WU: 6599.953 s
    called boinc_finish
    </stderr_txt>
    ]]>
    Validate state: Valid
    Claimed credit: 4727.92939814815
    Granted credit: 7091.89409722222
    Application version: ACEMD2: GPU molecular dynamics v6.09 (cuda31)

An example of the older CUDA 3000 tasks:

    2603160 1654062 2 Jul 2010 1:29:51 UTC 2 Jul 2010 6:15:33 UTC Completed and validated 8,218.06 8,105.92 4,428.01 6,642.02 ACEMD2: GPU molecular dynamics v6.05 (cuda30)

An example of the newer slow WUs:

    2611961 1657969 3 Jul 2010 17:01:16 UTC 3 Jul 2010 22:26:14 UTC Completed and validated 12,871.38 12,867.41 4,535.61 6,803.41 ACEMD2: GPU molecular dynamics v6.09 (cuda31)

---

Joined: 22 May 10 · Posts: 20 · Credit: 85,355,427 · RAC: 0

I just want to suggest that everyone take a slow, deep breath... and remind everyone that we are using bleeding-edge drivers with bleeding-edge developer tools. Let's not prematurely flee like rats from a sinking ship just because every single GPU transistor is not working at 100% optimization and productivity.

Remember that the main underlying goal is to help crunch protein simulations for medical research. If some work units are slower now than before, so what? GPU crunching is still many times faster than crunching the same thing with CPU-based applications. If we keep our wits about us and work together, we can help improve the application's productivity by providing constructive feedback.

---

liveonc
Joined: 1 Jan 10 · Posts: 292 · Credit: 41,567,650 · RAC: 0

I second that! I guess I could run Prime95 or Linpack 24/7 if all I wanted to do was produce heat and waste electricity, especially in summer. If I just liked getting points, I'm sure I could do that on Facebook for free without needing more than a netbook. But I like GPUGRID. They do work, they have a friendly forum where stupid questions aren't answered with a verbal gutting or impalement, and they're quick to adopt all the new things that are still raw.

I can only hope that one day there will be other things than research into mental disorders. I'm more interested in cancer research, cures for STDs, and illnesses that normally don't get enough funding; this is none of the above, but if GPUGRID shows the way, maybe the rest of the pack will follow one day.

---

Retvari Zoltan
Joined: 20 Jan 09 · Posts: 2380 · Credit: 16,897,957,044 · RAC: 0

I got one of those fast work units:

    # Time per step (avg over 1300000 steps): 4.617 ms
    # Approximate elapsed time for entire WU: 6002.672 s
    Claimed credit: 4728
    Granted credit: 7092

This WU generated 94% CPU load, which makes me content. But I know the other WUs ran almost at this 'efficiency level' before, and I'd like to have that back, or more. :)

---

skgiven
Joined: 23 Apr 09 · Posts: 3968 · Credit: 1,995,359,260 · RAC: 0

These CUDA 3.1 I531-TONI_KIDln WUs are clearly much faster than the CUDA 3000 WUs, and the 3.1 TONI_HERG WUs are just the same as 3.0, but the TONI_CAPBIND and KASHIF_HIVPR ones are much slower. TONI_KID fast, and again.

Before last weekend's downtime my Fermi on XP had a steady RAC of 65K. It has now dropped to 55K and is showing no signs of increasing.

- If I just crunched TONI_CAPBIND it would drop to 45K!
- If I just crunched KASHIF_HIVPR it would drop to about 53K.
- If I could just crunch TONI_KIDln it would rise to 92K. Yes, these are twice as fast as TONI_CAPBIND under CUDA 3.1.

So, as you can see, there is a massive performance difference between tasks! Before you (the GPUGrid team) sit down to work out why this is, could you build the work units accordingly, so the Fermis just get TONI_KIDln work units? The slower tasks could just be compiled for 2.2, as they are faster that way. This would immediately speed up the research on Fermi cards (almost double it).

Thanks,

---

GDF
Joined: 14 Mar 07 · Posts: 1958 · Credit: 629,356 · RAC: 0

We can reproduce the problem so we can probably fix it.
http://www.gpugrid.net/forum_thread.php?id=2209&nowrap=true#17865

gdf