Message boards : Graphics cards (GPUs) : GTX 970, 2 WU (Long), 2 CPU.

Francois Normandin
Joined: 8 Mar 11
Posts: 71
Credit: 641,960,551
RAC: 1,199
Message 41485 - Posted: 5 Jul 2015 | 18:43:29 UTC

I average 19 hours each for 2 WUs (long ones) on my GTX 970, with a dedicated CPU core for each. Does anyone know if this seems like OK performance?

GERARD_FXCXCL12_LIG_1035426

Also running Rosetta@home on 6 cores.

GPU load 84%
Overclocked to 1440 MHz, stable. +150 MHz on memory.
Windows 7 64-bit
FX-8350 at 4 GHz
Asus M5A97 R2.0
8 GB RAM

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2058
Credit: 15,019,398,669
RAC: 4,416,896
Message 41487 - Posted: 6 Jul 2015 | 8:38:17 UTC - in response to Message 41485.
Last modified: 6 Jul 2015 | 8:45:12 UTC

I average 19 hours each for 2 WUs (long ones) on my GTX 970, with a dedicated CPU core for each. Does anyone know if this seems like OK performance?

GERARD_FXCXCL12_LIG_1035426
GPU load 84%
Overclocked to 1440 MHz, stable. +150 MHz on memory.
Windows 7 64-bit
FX-8350 at 4 GHz
Asus M5A97 R2.0
8 GB RAM

19 hours is a bit too long for this GPU; it should be around 12 hours (even on a WDDM OS like Windows 7).

Also running Rosetta@home on 6 cores.

Rosetta@home has the most demanding CPU app in terms of CPU and memory usage (bandwidth and working set size), so this could be why the GPU tasks take this long to finish on your host. If the Rosetta@home project usually grants fewer credits for finished tasks than your host claims, that is a sign the host is overcommitted. See your host vs. my laptop (lately my laptop runs only CPU tasks). As GPU tasks are more rewarding, I usually prioritize them (i.e. I reduce the number of running CPU tasks until the GPU tasks no longer suffer from the lack of bandwidth). 8 CPU cores need a lot of RAM bandwidth, so this CPU's dual-channel memory controller could be a serious bottleneck when all cores run many instances of the same demanding application (like Rosetta@home's). If you have only one RAM module in this host, I suggest adding an identical one to get dual-channel operation (as it will double the RAM's theoretical bandwidth).

Trotador
Joined: 25 Mar 12
Posts: 90
Credit: 1,431,740,570
RAC: 8,227
Message 41488 - Posted: 6 Jul 2015 | 9:20:46 UTC

I'm not sure, but I think he is saying that 19 hours is the time for two WUs running at the same time.

If that is the case, it is not that bad, but I'm just guessing.

Francois Normandin
Joined: 8 Mar 11
Posts: 71
Credit: 641,960,551
RAC: 1,199
Message 41489 - Posted: 6 Jul 2015 | 10:16:03 UTC

Yes, my bad.

Two WUs running at the same time complete in 19-20 hours of work (about 10 hours per WU).

On the CPU, 2 cores are at around 50%, 2 cores at around 90%, and the last 4 cores at 99-100%. (The CPU seems to be feeding the card; CPU usage is 80%.)

RAM is 2 x 4 GB.

I will test later whether Rosetta@home hurts anything on the GPUGRID side.
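
For anyone comparing numbers, here is a minimal Python sketch of the per-WU arithmetic used above. It only divides the wall-clock time by the number of WUs running side by side and sets the result next to the roughly 12 hours quoted earlier for a single WU. The function name is made up for illustration, and the comparison is rough, since it ignores the idle time between WUs.

```python
def effective_hours_per_wu(wall_hours: float, concurrent: int) -> float:
    """Wall-clock hours divided by the number of WUs finished in that time."""
    if concurrent < 1:
        raise ValueError("concurrent must be at least 1")
    return wall_hours / concurrent


# Figures from this thread: two WUs side by side take 19-20 hours.
low = effective_hours_per_wu(19.0, 2)
high = effective_hours_per_wu(20.0, 2)

# Rough single-WU figure quoted earlier in the thread for a GTX 970.
single_task_estimate = 12.0

print(f"{low:.1f}-{high:.1f} h per WU, vs ~{single_task_estimate:.0f} h one at a time")
```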







Trotador
Joined: 25 Mar 12
Posts: 90
Credit: 1,431,740,570
RAC: 8,227
Message 41491 - Posted: 6 Jul 2015 | 11:05:50 UTC

It would be nice if some people with the right cards could do this same exercise with the GTX 980 and GTX 980 Ti, i.e. two simultaneous GERARD WUs on a single card.

The results would be useful for re-evaluating the cost/efficiency relation among these three card models.

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2058
Credit: 15,019,398,669
RAC: 4,416,896
Message 41492 - Posted: 6 Jul 2015 | 11:37:05 UTC - in response to Message 41489.

Then it's ok. :)

Francois Normandin
Joined: 8 Mar 11
Posts: 71
Credit: 641,960,551
RAC: 1,199
Message 41496 - Posted: 6 Jul 2015 | 18:49:07 UTC

Thanks Trotador. If I remember correctly, I went from 71% GPU load to 84%, so maybe a 10% gain? My only problem is that GPUGRID lets me download just two WUs at a time, so when one finishes, I have to wait for its upload and the download of the next one before crunching again at full GPU load.

Thanks Retvari Zoltan.

skgiven
Volunteer moderator
Project tester
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 82,949
Message 41504 - Posted: 8 Jul 2015 | 5:33:56 UTC - in response to Message 41496.
Last modified: 8 Jul 2015 | 6:08:59 UTC

For reference/comparison (task, run time in seconds, credit):

970 on WinXP (one task at a time)*:
GERARD_FXCXCL12_LIG_6644051-0-1-RND1102_1 34,159.06 (~9.5h) 255,000.00
NOELIA_ETQunboundx2-0-2-RND2064_0 14,912.33 (~4.1h) 75,000.00

970 on Win7 (two tasks at a time)†:
GERARD_FXCXCL12_LIG 75,167.70 (~20.9h) 255,000.00
NOELIA_ETQunboundx1 23,301.72 (~6.5h) 75,000.00

* Slower system (CPU/system bus); only 90% power, 85% GPU usage.

† I have these GPUs throttled to 80% power, though they sneak an extra 5%. That makes them roughly 7% slower while using 15% less energy each (based on runtimes with 2 tasks at a time), so ~9% more efficient, on top of any overall performance gain from running 2 tasks at a time. Note that running two tasks at a time doesn't (and can't) make up for the WDDM overhead.
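
A short Python sketch of the arithmetic behind the footnote, for anyone who wants to redo it with their own numbers. It assumes efficiency means work done per unit of energy, so the relative gain is the speed factor divided by the power factor. The second part halves the Win7 GERARD run time to get a per-task share and compares it with the WinXP single-task run time; that gap mixes OS, throttling and host differences, so treat it as a rough indication only. All names are illustrative.

```python
def relative_efficiency(speed_factor: float, power_factor: float) -> float:
    """Work per unit of energy relative to an unthrottled baseline of 1.0."""
    if power_factor <= 0:
        raise ValueError("power_factor must be positive")
    return speed_factor / power_factor


# Rough figures from the footnote: ~7% slower at ~15% less power.
speed_factor = 1.0 - 0.07
power_factor = 1.0 - 0.15
efficiency_gain = relative_efficiency(speed_factor, power_factor) - 1.0
print(f"efficiency gain from throttling: {efficiency_gain:.1%}")

# GERARD run times from the table above, in seconds.
win7_two_at_a_time_s = 75_167.70
winxp_one_at_a_time_s = 34_159.06

# With two tasks sharing the card, each task's share of the wall time is half.
win7_per_task_s = win7_two_at_a_time_s / 2
per_task_gap = win7_per_task_s / winxp_one_at_a_time_s - 1.0
print(f"Win7 per-task time vs WinXP: {per_task_gap:+.1%}")
```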

eXaPower
Joined: 25 Sep 13
Posts: 280
Credit: 1,449,568,667
RAC: 89
Message 41505 - Posted: 8 Jul 2015 | 10:26:31 UTC - in response to Message 41504.
Last modified: 8 Jul 2015 | 11:14:15 UTC

PCIe 2.0 x4 vs. PCIe 3.0 x8 on a GTX 970, one task at a time on Win8.1, for reference/comparison:

My 970 is pushed to its ACEMD-stable OC limit: 1519 MHz for NOELIAs and 1506 MHz for GERARDs. No CPU tasks are running, nor is SWAN enabled.

Everything is the same except PCIe width:

3.0 x8 lane NOELIA_ETQunbound: 13258sec (135W) 73% core
2.0 x4 lane NOELIA_ETQunbound: 16903sec (120W) 73% core

-- NOELIAs are >20% slower on 2.0 x4.

-- GPU core temps rose another 4C with 3.0 x8 compared to 2.0 x4 (30C ambient).
-- That is a +20C delta between core and ambient; the average delta used to be +14 to 18C depending on ambient. With a 20-25C ambient, the GPU core would be at 35-45C depending on the ACEMD WU type.

2.0 x4 lane GERARD_FXCXCL: 46500sec (140W) 72% core
3.0 x8 lane GERARD_FXCXCL: estimated 36500sec (160W) 78% core

-- 2.0 x4 is again >20% slower for GERARD.
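
For comparison purposes, a small Python sketch that redoes the slowdown arithmetic from the run times above and also estimates energy per task. It assumes the wattage figures are average power draw over the whole run, which the post does not state, and the GERARD 3.0 x8 run time is the estimate given above rather than a measured value. Function and variable names are illustrative.

```python
# (run time in seconds, power in watts) as listed in the post.
runs = {
    "NOELIA 3.0 x8": (13_258, 135),
    "NOELIA 2.0 x4": (16_903, 120),
    "GERARD 3.0 x8": (36_500, 160),  # run time is an estimate in the post
    "GERARD 2.0 x4": (46_500, 140),
}


def extra_runtime(fast_s: float, slow_s: float) -> float:
    """Fraction of additional run time on the slower link."""
    return slow_s / fast_s - 1.0


def energy_kwh(seconds: float, watts: float) -> float:
    """Energy per task, assuming the wattage is an average over the run."""
    return seconds * watts / 3.6e6


for wu in ("NOELIA", "GERARD"):
    fast_s, fast_w = runs[f"{wu} 3.0 x8"]
    slow_s, slow_w = runs[f"{wu} 2.0 x4"]
    print(
        f"{wu}: 2.0 x4 takes {extra_runtime(fast_s, slow_s):.1%} longer; "
        f"energy per task {energy_kwh(fast_s, fast_w):.2f} kWh (3.0 x8) "
        f"vs {energy_kwh(slow_s, slow_w):.2f} kWh (2.0 x4)"
    )
```

Under that assumption, the narrower link draws less power but still uses more energy per task, because the longer run time outweighs the lower wattage.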
