ACEMD2 CUDA3.1 is now out

Snow Crash

Message 17824 - Posted: 2 Jul 2010, 16:00:12 UTC

My first return on a full size 3.1 is slower ... very disappointing.
No changes to system setup: GTX480 on WinXP
Examples are from the same WU type: TONI_CAPBIND*

Old version: average runtime was 6550 seconds (very little delta between runs)
# Time per step (avg over 650000 steps): 10.065 ms
# Approximate elapsed time for entire WU: 6541.937 s

New version: the one result so far had a runtime of 7768 seconds.
# Time per step (avg over 650000 steps): 11.947 ms
# Approximate elapsed time for entire WU: 7765.391 s

Thanks - Steve
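As a rough cross-check of the figures in this post, the elapsed time should be close to the per-step time multiplied by the number of steps. The sketch below is only an illustration of that arithmetic; the function names are made up here and are not part of ACEMD2 or BOINC.

```python
def elapsed_seconds(ms_per_step: float, steps: int) -> float:
    """Estimate total run time in seconds from the average time per step."""
    return ms_per_step * steps / 1000.0


def slowdown_percent(old_seconds: float, new_seconds: float) -> float:
    """Percentage increase in run time going from the old to the new value."""
    return (new_seconds / old_seconds - 1.0) * 100.0


# Figures quoted above for the TONI_CAPBIND* workunits (650000 steps each).
old = elapsed_seconds(10.065, 650000)
new = elapsed_seconds(11.947, 650000)
print(f"old {old:.0f} s, new {new:.0f} s, "
      f"slowdown {slowdown_percent(old, new):.1f}%")
```

With these inputs the new version comes out roughly 19% slower per workunit, which matches the reported elapsed times to within a second.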
Snow Crash

Message 17827 - Posted: 2 Jul 2010, 16:33:39 UTC
Last modified: 2 Jul 2010, 16:42:55 UTC

The next result on the same system is for a HERG* type WU.
It is about the same as with the old app version.

Old App average runtime 6250 sec. (very little delta between WUs)
# Time per step (avg over 625000 steps): 10.001 ms
# Approximate elapsed time for entire WU: 6250.359 s

New app, one result so far:
# Time per step (avg over 625000 steps): 9.815 ms
# Approximate elapsed time for entire WU: 6134.672 s
Thanks - Steve
ftpd

Message 17829 - Posted: 2 Jul 2010, 17:38:15 UTC

Two CUDA 3.1 (v6.09) WUs on Windows XP Pro with a GTX 480 took more than 3 hours per WU.
That is more than 18.000 ms per step, averaged over 625000 steps.

So this is not good?!
Ton (ftpd) Netherlands
nenym
Message 17830 - Posted: 2 Jul 2010, 17:51:15 UTC

This is my first standard task to finish successfully in more than a year (65 nm GTX 260, 64-bit Win XP).
TONI_CAPBIND*
SWAN: Using synchronization method 0
# Time per step (avg over 325000 steps): 27.879 ms
# Approximate elapsed time for entire WU: 18121.656 s
Thanks GDF.
GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist

Message 17834 - Posted: 2 Jul 2010, 20:20:14 UTC - in response to Message 17823.  

For Linux, when?

Please.



Next week.

gdf
[AF>Libristes>Jip] Elgrande71
Message 17837 - Posted: 2 Jul 2010, 21:47:22 UTC - in response to Message 17834.  

Thank you.
My GPU drivers are updated to the 256.35 Linux version.
Retvari Zoltan
Message 17839 - Posted: 2 Jul 2010, 22:24:14 UTC
Last modified: 2 Jul 2010, 22:33:36 UTC

I have completed four 6.09 tasks, and I'm a little disappointed.
There is a performance gain, but it's not the way I thought it would be.
Now my GPU is cooler with the new client, but that's not the important factor for me. :)
These 6.09 WUs take longer, because they use less of the GPU.
According to GPU-Z v0.4.4 the GPU load is now around 60% to 74%.
Could you optimize the code so it would cause higher GPU load (on GTX 480)?
It was over 80% with the V6.05 client.

Version  Workunit                                       Time per step  Elapsed time
V6.09    m2-IBUCH_201b_pYEEI_100304-40-80-RND1404_1     15.422 ms      9639.047 s
V6.05    p41-IBUCH_501_pYEEI_100301-39-40-RND6587_1     12.876 ms      8047.344 s
V6.09    h232f99r412-TONI_CAPBINDsp2-29-100-RND6253_0   16.624 ms      10805.422 s
V6.05    h232f99r558-TONI_CAPBINDsp2-22-100-RND1136_1   12.253 ms      7964.406 s

Also, one WU failed at approximately 80%.
Snow Crash

Message 17840 - Posted: 2 Jul 2010, 23:59:02 UTC
Last modified: 2 Jul 2010, 23:59:52 UTC

While I am seeing higher GPU usage, I am also seeing higher CPU usage, which is causing more kernel thrashing, and I believe that is what is holding this version of the app back. I have cut BOINC Manager down by another thread to see if this helps. I was already leaving one free, so now I have two threads open for GPUGrid on each machine, which is not good for my WCG crunching. Overall I am seeing performance LOSSES of a few minutes on WinXP (480) and enormous LOSSES (at least an hour per WU) on Win7 (a 285 and a 295). The only WU type on which I am seeing an improvement is TONI_HERG, by a couple of minutes on both systems. I am seriously considering rolling back.

Perhaps the betas proved the programming was stable, and perhaps I was overly optimistic in thinking the beta runtimes / ms per step were indicative of production performance.

Sorry if I sound harsh, I think I had my hopes up high and should crunch for aliens and esoteric math problems to regain my focus on GPUGrid which I really do think is a fantastic project. Time for a break from posting.
Thanks - Steve
coldFuSion

Message 17841 - Posted: 3 Jul 2010, 0:38:12 UTC
Last modified: 3 Jul 2010, 0:40:30 UTC

Results seem to vary drastically on my WinXP machine:

I392-TONI_KIDln-12-100-RND6196_0
# Device 0: "GeForce GTX 470"
SWAN: Using synchronization method 0
# Time per step (avg over 1300000 steps): 4.924 ms!!!
# Approximate elapsed time for entire WU: 6401.594 s

h232f99r440-TONI_CAPBINDsp2-25-100-RND7492_0
# Device 0: "GeForce GTX 470"
SWAN: Using synchronization method 0
# Time per step (avg over 650000 steps): 16.178 ms
# Approximate elapsed time for entire WU: 10515.719 s

I'm not sure what the difference between these 2 applications is, but if the second can be optimized like the first, that would be seriously impressive.
GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist

Message 17845 - Posted: 3 Jul 2010, 7:04:28 UTC - in response to Message 17841.  

The beta workunit has a simulation box size of 64x64x64. CUDA 3.1 optimizes FFTs for multiples of 2, so it was a bit faster, and other machines (WinXP) reproduced this. As they were overclocked, they even improved on the reference speed we had found locally.

After these tests on GPUGrid, it seems that other workunits are no better than with CUDA 3.0. I will do some tests next week on other workunit types.

Nevertheless, CUDA 3.1 also works well on older cards, and seems to fix several bugs. So it is a good release.

gdf
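To make the box-size remark concrete, here is a minimal sketch of a check for whether every dimension of a simulation box is a power of two. It assumes that "multiples of 2" refers to the power-of-two sizes that FFT libraries generally handle fastest; that reading, and the helper names, are assumptions for illustration and are not taken from the ACEMD2 code.

```python
def is_power_of_two(n: int) -> bool:
    """True if n is a positive integer power of two (1, 2, 4, 8, ...)."""
    return n > 0 and (n & (n - 1)) == 0


def box_is_power_of_two(box) -> bool:
    """True if every dimension of the box is a power of two."""
    return all(is_power_of_two(dim) for dim in box)


# The beta workunit's box size quoted above.
print(box_is_power_of_two((64, 64, 64)))
```

A box such as (64, 60, 64) would fail this check, which is one plausible reason other workunit types might not see the same speedup.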
Betting Slip

Message 17847 - Posted: 3 Jul 2010, 8:16:30 UTC - in response to Message 17845.  

@GDF Please read this and give me an opt-out of 3.1 if that's possible, because otherwise the result will probably be the loss of two or more crunchers for this project.


Radio Caroline, the world's most famous offshore pirate radio station.
Great music since April 1964. Support Radio Caroline Team -
Radio Caroline
Snow Crash

Message 17851 - Posted: 3 Jul 2010, 11:52:54 UTC

Serious note ... the majority of the degraded performance on my 2xx series cards was because, the last time I switched hardware around, I forgot to delete the SWAN_SYNC environment variable, and apparently the new version of the app recognizes and uses it. This is why I had an even greater amount of kernel usage than normal. While some WUs are faster and some are slower, I hope the project gets the benefits of a more stable app.
Thanks - Steve
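For anyone wanting to check for the same leftover setting, a minimal sketch follows. It only reads the SWAN_SYNC environment variable named in this post; the helper name is hypothetical, and nothing here says which values the app accepts or how it interprets them.

```python
import os


def swan_sync_setting(environ=os.environ):
    """Return the SWAN_SYNC value if the variable is set, otherwise None."""
    return environ.get("SWAN_SYNC")


value = swan_sync_setting()
if value is None:
    print("SWAN_SYNC is not set")
else:
    print(f"SWAN_SYNC is set to {value!r}; remove it if that is unintended")
```

Note that this reports the environment of the process running the script, so it should be run under the same user account and environment that BOINC uses.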
MarkJ
Volunteer moderator
Volunteer tester

Message 17852 - Posted: 3 Jul 2010, 12:30:21 UTC - in response to Message 17847.  

@GDF Please read this and give me an optout of 3.1 if that's possible because the result will be the probable loss of two crunchers or more for this project.


You could just go back to older drivers.
BOINC blog
Retvari Zoltan
Message 17854 - Posted: 3 Jul 2010, 21:18:52 UTC
Last modified: 3 Jul 2010, 21:20:42 UTC

I've just installed the v258.19 driver, and it reports itself like this in BOINC Manager:

2010.07.03. 23:09:18 NVIDIA GPU 0: GeForce GTX 480 (driver version 25819, CUDA version 3020, compute capability 2.0, 1536MB, 1345 GFLOPS peak)

It means that soon(?) there will be CUDA 3.2.
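The "CUDA version 3020" field in that log line follows the usual CUDA encoding of 1000 x major + 10 x minor, which is how 3020 reads as 3.2 (and 3010 as 3.1). A minimal sketch of the decoding, with an illustrative function name:

```python
def decode_cuda_version(code: int) -> str:
    """Decode a CUDA version integer (1000 * major + 10 * minor) to 'major.minor'."""
    major = code // 1000
    minor = (code % 1000) // 10
    return f"{major}.{minor}"


for code in (3000, 3010, 3020):
    print(code, "->", decode_cuda_version(code))
```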
skgiven
Volunteer moderator
Volunteer tester
Message 17855 - Posted: 3 Jul 2010, 23:10:34 UTC - in response to Message 17854.  

This CUDA 3010 WU looks promising!
http://www.gpugrid.net/result.php?resultid=2611239

It finished about 40% faster than the CUDA 3000 tasks. So although there are some that are about 50% slower, there are some faster WUs as well.

GDF just needs to work out what the difference is, and make sure we get the fast ones rather than the slow ones.

Good luck with that,

Details:

The fast one,
Task 2611239, workunit 1660106, sent 3 Jul 2010 13:39:37 UTC, reported 3 Jul 2010 18:51:08 UTC, completed and validated, run time 6,601.92 s, CPU time 6,598.16 s, claimed credit 4,727.93, granted credit 7,091.89, ACEMD2: GPU molecular dynamics v6.09 (cuda31)

Name I531-TONI_KIDln-13-100-RND2019_0
Workunit 1660106
Created 3 Jul 2010 13:19:57 UTC
Sent 3 Jul 2010 13:39:37 UTC
Received 3 Jul 2010 18:51:08 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 71363
Report deadline 8 Jul 2010 13:39:37 UTC
Run time 6601.921875
CPU time 6598.156
stderr out

<core_client_version>6.10.56</core_client_version>
<![CDATA[
<stderr_txt>
# Using device 0
# There is 1 device supporting CUDA
# Device 0: "GeForce GTX 470"
# Clock rate: 1.41 GHz
# Total amount of global memory: 1341718528 bytes
# Number of multiprocessors: 14
# Number of cores: 112
SWAN: Using synchronization method 0
MDIO ERROR: cannot open file "restart.coor"
# Time per step (avg over 1300000 steps): 5.077 ms
# Approximate elapsed time for entire WU: 6599.953 s
called boinc_finish

</stderr_txt>
]]>

Validate state Valid
Claimed credit 4727.92939814815
Granted credit 7091.89409722222
application version ACEMD2: GPU molecular dynamics v6.09 (cuda31)
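For comparing results like these across tasks, the "Time per step" line can be pulled out of the stderr text with a small regular expression. This is only a sketch that assumes the line format stays exactly as quoted in this thread; the pattern and function name are illustrative.

```python
import re

# Matches lines such as:
#   # Time per step (avg over 1300000 steps): 5.077 ms
STEP_LINE = re.compile(
    r"# Time per step \(avg over (\d+) steps\):\s*([\d.]+) ms"
)


def parse_time_per_step(stderr_text: str):
    """Return (steps, ms_per_step) from stderr text, or None if not found."""
    match = STEP_LINE.search(stderr_text)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2))


sample = "# Time per step (avg over 1300000 steps): 5.077 ms"
steps, ms_per_step = parse_time_per_step(sample)
print(f"{steps} steps at {ms_per_step} ms each")
```

Multiplying the two values and dividing by 1000 gives an elapsed time in seconds that can be compared with the "Approximate elapsed time" line.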

An example of the older CUDA 3000 tasks,
Task 2603160, workunit 1654062, sent 2 Jul 2010 1:29:51 UTC, reported 2 Jul 2010 6:15:33 UTC, completed and validated, run time 8,218.06 s, CPU time 8,105.92 s, claimed credit 4,428.01, granted credit 6,642.02, ACEMD2: GPU molecular dynamics v6.05 (cuda30)

An example of the newer slow WUs,
Task 2611961, workunit 1657969, sent 3 Jul 2010 17:01:16 UTC, reported 3 Jul 2010 22:26:14 UTC, completed and validated, run time 12,871.38 s, CPU time 12,867.41 s, claimed credit 4,535.61, granted credit 6,803.41, ACEMD2: GPU molecular dynamics v6.09 (cuda31)

coldFuSion

Message 17857 - Posted: 4 Jul 2010, 0:11:59 UTC

just want to suggest that everyone take a slow deep breath....


and remind everyone we are using bleeding edge drivers with bleeding edge developer tools.

Let's not prematurely flee like rats from a sinking ship just because every single GPU transistor is not working at 100% optimization and productivity.

Remember the main underlying goal is to help crunch protein simulations for medical research.

If some work units are slower now than before...so what? GPU crunching is still exponentially faster than crunching the same thing on CPU based applications.

If we keep our wits about us and work together we can help improve application productivity by providing constructive feedback.
liveonc
Message 17858 - Posted: 4 Jul 2010, 2:13:36 UTC - in response to Message 17857.  

I second that!

I guess that I could run Prime95 or Linpack 24/7 if all I wanted to do was produce heat & waste electricity, especially in summer. If I just liked getting points, I'm sure I could do that on Facebook for free without needing more than a netbook.

But I like GPUGRID. They do work, they have a friendly forum where stupid questions aren't answered with a verbal gutting or impalement, & they're quick to adapt to all the new things which are still raw.

That one day there will be things other than research into mental disorders is something I can only hope for. I'm more interested in cancer research, cures for STDs, & illnesses that normally don't get enough funding. This is none of the above, but if GPUGRID shows the way, maybe the rest of the pack will follow one day...
Retvari Zoltan
Message 17860 - Posted: 4 Jul 2010, 9:15:18 UTC - in response to Message 17855.  

I got one of those fast work units:
# Time per step (avg over 1300000 steps): 4.617 ms
# Approximate elapsed time for entire WU: 6002.672 s
Claimed credit 4728
Granted credit 7092
This WU generated 94% CPU load, which makes me content.
But I know the other WUs did run almost at this 'efficiency level' before, and I'd like to have it back, or more :)
skgiven
Volunteer moderator
Volunteer tester
Message 17862 - Posted: 4 Jul 2010, 11:20:04 UTC - in response to Message 17860.  
Last modified: 4 Jul 2010, 11:44:54 UTC

These CUDA 3.1 I531-TONI_KIDln WUs are clearly much faster than the CUDA 3000 WUs, and the 3.1 TONI_HERG WUs are just the same as under 3.0, but the TONI_CAPBIND and KASHIF_HIVPR WUs are much slower.

TONI_KID fast, and again

Before last weekend's downtime my Fermi on XP had a steady RAC of 65K. It has now dropped to 55K and is showing no signs of increasing.
If I just crunched TONI_CAPBIND it would drop to 45K!
If I just crunched KASHIF_HIVPR it would drop to about 53K.
If I could just crunch TONI_KIDln it would rise to 92K. Yes, these are twice as fast as TONI_CAPBIND under CUDA 3.1. So, as you can see, there is a massive performance difference between tasks!
Before you (GPUGrid team) sit down to work out why this is, could you build the Work Units according to this, so that the Fermis just get TONI_KIDln Work Units? The slower tasks could just be compiled for 2.2, as they are faster that way. This would immediately speed up the research on Fermi cards (almost double it).
Thanks,
GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist

Message 17866 - Posted: 4 Jul 2010, 12:00:33 UTC - in response to Message 17862.  

We can reproduce the problem so we can probably fix it.
http://www.gpugrid.net/forum_thread.php?id=2209&nowrap=true#17865

gdf

©2026 Universitat Pompeu Fabra