Big Maxwell GM2*0
| Author | Message |
|---|---|
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
My guess is that a Titan X or 980Ti would perform much better on XP or Linux. On Vista through W10, overall performance would be much better if SWAN_SYNC=1 was used, no CPU tasks were running, and 2 or more tasks were run at the same time. I would even suggest that running 2 tasks on Linux/XP might be beneficial. On W7 I had to alter the L1-P5 performance using NVidia Inspector just to get my clocks to normal. You really need to watch the performance details carefully: these tasks are prone to downclocking (though that might make them better suited to running 2 tasks simultaneously). FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help |
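Running 2 tasks at a time on one GPU is normally done in BOINC with an app_config.xml file placed in the project's folder inside the BOINC data directory. A minimal sketch that writes such a file, assuming the GPUGRID app is named `acemdlong` (a placeholder; check the real app name in client_state.xml or on the project's applications page). A `gpu_usage` of 0.5 tells BOINC that two tasks share one GPU, and `cpu_usage` reserves one CPU thread per task:

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, tasks_per_gpu: int, cpu_per_task: float = 1.0) -> str:
    """Return app_config.xml text that lets BOINC run `tasks_per_gpu` tasks of one app per GPU."""
    if tasks_per_gpu < 1:
        raise ValueError("tasks_per_gpu must be >= 1")
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu = ET.SubElement(app, "gpu_versions")
    # Fractional GPU usage: 1/2 = 0.5 means two tasks are scheduled on each GPU.
    ET.SubElement(gpu, "gpu_usage").text = str(1.0 / tasks_per_gpu)
    # CPU threads reserved per task (a full thread suits SWAN_SYNC's polling).
    ET.SubElement(gpu, "cpu_usage").text = str(cpu_per_task)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # "acemdlong" is a hypothetical app name used only for illustration.
    print(build_app_config("acemdlong", tasks_per_gpu=2))
```

Save the output as app_config.xml, then use Options > Read config files in BOINC Manager (or restart the client) so the change takes effect. Reserving a full CPU thread per task also fits the SWAN_SYNC advice, since SWAN_SYNC keeps a CPU thread busy.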
|
Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0
|
Going by Localizer's returns, NOELIA_ETQunboundx1 tasks look to be only 2 or 3% faster on a GTX980Ti than on a GTX980, but SWAN_SYNC isn't on, and who knows how hard the CPU is being pushed? With NOELIA_ETQunboundx (no SWAN_SYNC), GK107s show a 4% utilization drop (88% vs. 92%) with 2 CPU tasks running vs. none. GM200/GM204 will see a drop of more than 4% if the CPU is running other WUs, with or without SWAN. On GERALD tasks, a quad-core CPU + GM204 loses <5% GPU utilization with two CPU tasks; 1 CPU task costs 3%, while 4 CPU WUs cause a loss of 7%. (I haven't tested the utilization loss on the current NOELIA tasks with GM204/GM107.) I assume that, with or without SWAN and no matter how many PCIe 3.0 lanes are active, any GPU's core/MCU/bus ACEMD usage will still drop when the CPU is computing other tasks. On my GTX970s these NOELIA_ETQunboundx1 tasks are struggling to push the cards hard enough to keep them interested (GPU0: power 20%, usage 83%, 41C, 405MHz [downclocked]; GPU1: power 76%, usage 67%, 64C, 1266MHz). This batch looks to be working the GPU's cache harder. Runtime theory: the Titan X (24 SMM) and the 980Ti (22 SMM) both have 3072KB of L2 cache, which may affect their runtimes ever so slightly. Will the 980ti truly be faster than the Titan X at ACEMD? A stalled cache can render Maxwell overclocks null and void. Currently (10K vs. 30K s runtimes), my 750's (4 SMM, 2048KB L2 cache) NOELIA_ETQunboundx output is 33% of the 980ti's. The 980ti is a 5.5x larger GPU (2816 vs. 512 CUDA cores); the 750's core count is about 18% of the GTX980ti's. From the look of it, the 980ti's runtimes (10K s) are roughly twice as fast as GK104's (16-28K s). GTX980ti ETQunboundx runtimes are 8 to 9x better than my GK107's (77K s). |
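The core-count and runtime figures above can be combined into a rough per-core comparison. A sketch using only the numbers quoted in the post (512 vs. 2816 CUDA cores, ~30K vs. ~10K s runtimes); it deliberately ignores clock speed, memory bandwidth and L2 cache, so treat the per-core figure as a back-of-the-envelope ratio, not a measurement:

```python
# Figures quoted in the post: CUDA core counts and NOELIA_ETQunboundx runtimes in seconds.
GTX750 = {"cores": 512, "runtime_s": 30_000}
GTX980TI = {"cores": 2816, "runtime_s": 10_000}


def throughput_ratio(a: dict, b: dict) -> float:
    """Output of card `a` relative to card `b`; throughput is inversely proportional to runtime."""
    return b["runtime_s"] / a["runtime_s"]


def core_ratio(a: dict, b: dict) -> float:
    """CUDA core count of card `a` relative to card `b`."""
    return a["cores"] / b["cores"]


def per_core_ratio(a: dict, b: dict) -> float:
    """Work per CUDA core of `a` relative to `b` (ignores clocks, memory and cache)."""
    return throughput_ratio(a, b) / core_ratio(a, b)


if __name__ == "__main__":
    print(f"output:   {throughput_ratio(GTX750, GTX980TI):.1%}")  # 33.3%
    print(f"cores:    {core_ratio(GTX750, GTX980TI):.1%}")        # 18.2%
    print(f"per core: {per_core_ratio(GTX750, GTX980TI):.2f}x")   # 1.83x
```

With these inputs the 750 delivers about a third of the 980ti's output with about 18% of its cores, i.e. roughly 1.8x the work per core.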
|
Joined: 8 Mar 12 Posts: 411 Credit: 2,083,882,218 RAC: 0
|
By the way, I thought it was SWAN_SYNC=0? Or did that change to 1? |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
It's no longer 'necessary' for good performance, but in my experience it still speeds things up by a few %. The 334 driver brought things back to normal: the CUDA runtime could again be set to a low-CPU mode without performance dropping away (before that, SWAN_SYNC was needed to get good performance and made more of a difference). On Windows the variable might just need to exist now, but on Linux I think it needs to be set to 1 (both used to be 0). Basically, with SWAN_SYNC the CPU thread polls continuously instead of sleeping, keeping the GPU supplied with work. When things didn't work properly, Process Lasso and raising task priority were also (or alternatively) used to expedite tasks, with some success, but all of this depends on CPU usage: completely saturate the CPU and you're just not going to get good results. It also varies by CPU type/model. |
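As a loose analogy for the trade-off described above (not ACEMD's actual code, which isn't shown here), the sketch below contrasts a waiting thread that busy-polls a completion flag with one that sleeps until signalled. The polling version reacts sooner but occupies a CPU core the whole time, which is why a saturated CPU undermines it. `simulate` and the 50 ms delay are hypothetical stand-ins for a GPU kernel finishing:

```python
import threading


def wait_spin(done: threading.Event) -> int:
    """Busy-poll the flag: quickest reaction, but burns a CPU core while waiting."""
    polls = 0
    while not done.is_set():
        polls += 1
    return polls


def wait_block(done: threading.Event) -> bool:
    """Sleep until signalled: frees the CPU, but wake-up depends on the OS scheduler."""
    return done.wait(timeout=5.0)


def simulate(waiter, delay_s: float = 0.05):
    """Hypothetical stand-in: a timer sets the flag as if a GPU kernel finished after delay_s."""
    done = threading.Event()
    timer = threading.Timer(delay_s, done.set)
    timer.start()
    result = waiter(done)
    timer.join()
    return result


if __name__ == "__main__":
    print("spin polls:", simulate(wait_spin))
    print("blocking wait returned:", simulate(wait_block))
```

The number of polls grows with the wait time, which is the CPU time a spinning thread consumes and why it competes with other CPU tasks.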
|
Joined: 8 Mar 12 Posts: 411 Credit: 2,083,882,218 RAC: 0
|
Enabled swan_sync, took another task off WCG, and got 7.26 on Gerard. The overclock is a tad over 1400MHz now with a .12mV voltage increase. |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
That's around a 9.5% improvement over your previous setup and not far off the performance of a Titan X (on the chart: 150% vs 156%), though I don't know the setup used there (optimizations, CPU usage). You said the Gerald utilization was around 75%. What was the NOELIA utilization? What does the memory run at by default (3505MHz or 3005MHz)? You should try running 2 tasks at a time on the GTX980Ti to get some idea of the gain. I'm trying this now on two GTX970s and it looks promising, but these GMs are high maintenance; I have to use NVidia Inspector to keep the clocks high. |
|
Joined: 8 Mar 12 Posts: 411 Credit: 2,083,882,218 RAC: 0
|
With swan_sync on, Gerald was actually 81%. I'll have to check the default memory clock when I get back home, and I don't remember the NOELIA usage. I'm going to Japan for 2 weeks on Monday, so I probably won't try running 2 tasks until I get back. |
|
Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0
|
"You should try to run 2 tasks at a time on the GTX980Ti to get some idea of what the gain is. I'm trying this now on two GTX970's and it looks promising, but these GM's are high maintenance; I have to use NVidia inspector to keep the clocks high." A look at hostid=137780 (Win8.1), a 970 computing two at a time, shows ~8 NOELIA_ETQunboundx WUs per 24hr (2 tasks finish in 22K s). Running 2 WUs at a time gives the 970 up to a 40% daily ETQunboundx improvement vs. 1 WU at a time. A Win7 GTX980ti's daily ETQunboundx output is 10.1 WUs (1 WU per 2.36hr), >20% higher than the 970 running 2 at a time. GPUs running one ETQ task at a time: RZ's 980 outputs 9 WUs per day (my 17SMM completes 9 per day, or 1 WU per 2.66hr). The 980ti has 37.5% more cores than a 980 (2816 vs. 2048); its daily ETQ performance is 10% higher than the fastest XP 980 and >15% higher than WDDM 980s. (The 10-15% figures concern 1 WU at a time.) |
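The daily figures above follow from per-WU runtimes and the number of concurrent tasks. A sketch using the runtimes quoted in the post (22K s for two concurrent tasks on the 970, 2.36 h and 2.66 h per WU for the 980Ti and 980). The core ratio is 2816/2048 = 1.375, i.e. 37.5% more cores on the 980Ti (equivalently, the 980 has 27.3% fewer):

```python
SECONDS_PER_DAY = 86_400


def wu_per_day(runtime_s: float, concurrent: int = 1) -> float:
    """Work units finished per day when `concurrent` tasks each take `runtime_s` seconds."""
    return concurrent * SECONDS_PER_DAY / runtime_s


if __name__ == "__main__":
    gtx970_x2 = wu_per_day(22_000, concurrent=2)  # two at a time, ~22K s each
    gtx980ti = wu_per_day(2.36 * 3600)            # one at a time, 2.36 h per WU
    gtx980 = wu_per_day(2.66 * 3600)              # one at a time, 2.66 h per WU
    print(f"970 (2 at a time): {gtx970_x2:.1f} WU/day")
    print(f"980Ti: {gtx980ti:.1f} WU/day ({gtx980ti / gtx970_x2 - 1:.0%} above the 970)")
    print(f"980Ti vs 980: {gtx980ti / gtx980 - 1:+.0%}, core ratio {2816 / 2048:.3f}")
```

Per-WU runtimes translate directly into WU/day, so a card that halves its runtime doubles its daily output only if the concurrency stays the same.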
|
Joined: 8 Mar 12 Posts: 411 Credit: 2,083,882,218 RAC: 0
|
Just to get back to you, the memory is at 3505MHz. |
|
Joined: 17 Apr 08 Posts: 113 Credit: 1,656,514,857 RAC: 0
|
I've had the 980Ti for a couple of days now. In my setup it runs comfortably at 1400; any higher and I lose a few WUs. I have enabled swan_sync, and Noelia WUs come in at about 8.5K seconds. I'm just finishing up my first Gerard, and that looks to run under 7 hours if the last few % don't throw any surprises. Thanks for the tip about swan_sync; I'm now using it on all my hosts. With swan_sync on I am getting 75% usage on Noelia WUs and 79-80% on Gerard WUs. Can anyone post a link to the 'How-to' thread on running multiple WUs on a single GPU? I'll give that a go. |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
"the interesting tidbit for me was that the 980ti is doing noelia in around 10k seconds, with my older 680 rig doing them in 18k seconds." And now you're returning in <9K: Run time 8,985.24 s, CPU time 8,834.44 s, https://www.gpugrid.net/result.php?resultid=14259463
I ran 2 NOELIA tasks at a time on my GTX970s to see what I could do. These are from the faster card (name, ID, run time s, CPU time s, credit):
e3s32_e1s258f65-NOELIA_ETQunboundx2-1-2-RND7802_0 10992237 21,718.60 20,296.04 75,000.00
e5s76_e1s526f70-NOELIA_ETQunboundx2-0-2-RND6354_0 10992080 21,638.48 20,227.95 75,000.00
My fastest NOELIA run by itself:
e1s451_1-NOELIA_ETQunboundx1-0-2-RND8377_0 10984428 13,889.76 13,889.76 75,000.00
That suggests a 28% overall improvement on a little GTX970. Localizer: "75% usage on Noelia WUs & 79-80% usage on Gerard WUs." Plenty of room there for 2 tasks :) How To create and use an app_config.xml file in Windows |
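The 28% figure can be reproduced from the run times listed above: two concurrent tasks finish two results in roughly 21.7K s, while one task alone finishes one result in about 13.9K s. A minimal sketch, assuming the concurrent tasks started together and ran for about the same time:

```python
from typing import Sequence


def concurrency_gain(single_runtime_s: float, concurrent_runtimes_s: Sequence[float]) -> float:
    """Throughput gain from running tasks together vs. one at a time.

    n concurrent tasks deliver n results per mean(concurrent) seconds;
    a lone task delivers 1 result per single_runtime_s seconds.
    """
    n = len(concurrent_runtimes_s)
    mean_concurrent = sum(concurrent_runtimes_s) / n
    return n * single_runtime_s / mean_concurrent - 1.0


if __name__ == "__main__":
    gain = concurrency_gain(13_889.76, [21_718.60, 21_638.48])
    print(f"{gain:.1%}")
```

This prints 28.1%, matching the estimate in the post.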
|
Joined: 8 Mar 12 Posts: 411 Credit: 2,083,882,218 RAC: 0
|
@localizer, I'm surprised you're losing WUs above 1400. I'm sitting at over 1450 now, and still think I may be able to push higher. Do you have a reference card, or one with better cooling? So far, I'm rather impressed with the clocks these can achieve with minimal or no voltage increase. |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
5pot, your GPU is under 60C whereas localizer's is reaching 71C. |
|
Joined: 8 Mar 12 Posts: 411 Credit: 2,083,882,218 RAC: 0
|
Well, I tried to hit 1500 and walked away for the night. Whoops. Lol. I won't be able to stop it until tomorrow. I do think I can make it stable with an overvoltage; I had it working fine at 1460 without one. C'est la vie. E: unless there's a way I can make it stop getting tasks through the website I suppose. |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
"E: unless there's a way I can make it stop getting tasks through the website I suppose." You should be able to do that by editing your preferences and deselecting the ACEMD GPU apps, "If no work for selected applications is available, accept work from other applications?" and "Use Graphics Processing Unit (GPU) if available". |
|
Joined: 17 Apr 08 Posts: 113 Credit: 1,656,514,857 RAC: 0
|
... my 980Ti is a reference card in an SFF case. I'm confident I can stabilise it over 1400, but I haven't had time to fine-tune it yet. |
|
Joined: 20 Jul 14 Posts: 732 Credit: 130,089,082 RAC: 0
|
"the interesting tidbit for me was that the 980ti is doing noelia in around 10k seconds, with my older 680 rig doing them in 18k seconds." https://twitter.com/TEAM_CSF/status/609620024215633920 [CSF] Thomas H.V. Dupont Founder of the team CRUNCHERS SANS FRONTIERES 2.0 www.crunchersansfrontieres |
|
Joined: 8 Mar 12 Posts: 411 Credit: 2,083,882,218 RAC: 0
|
"E: unless there's a way I can make it stop getting tasks through the website I suppose." Thank you for reminding me of this; I changed it when you posted. I've clocked it down to 1375 for the 2 weeks I'll be away, because I want to make sure I don't lose that much crunching time for an OC. @Localizer, these cards can do really well. Good luck with fine-tuning your card. @Dupont, thanks for the shout-out. :) When I get back, I'll try to hit 1500, but I'm beginning to think I'll need more overvolt than I'd like to run permanently. Still, it's always fun to see where the max is. |
|
Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0
|
Running 1 WU, I don't expect the 980Ti to be 37.5% faster than a GTX980. NOELIA_46 (467x) update: single-WU per-step performance of the 980ti (2.903ms) is +18.3 to 30% over a GTX980 (3.553ms); the 980ti's per-step efficiency is +30 to 50% over the routine 970 (4.168ms); the GTX750, at 10.878ms per step, is 73.3% slower than a 980ti. Surprisingly, GM107s have nearly the same NOELIA_46 runtimes as cut-down GK104s. So it would be better to compare the 980Ti against the well-established performance of the GTX980. The 56K-atom NOELIA_46 WUs bring out the superior ACEMD performance of GM200 (and GM107/206/204). NOELIA per-WU utilization tends to be higher than that of the 32K-atom GERALD design. GM200's single-WU-at-a-time throughput can steadily be 30% above the fastest GM204s at ACEMD. Under peak compute conditions, a 50% increase over GM204 is viable. If the many ACEMD environment factors that can increase (Maxwell) WU runtime are active, a 50 to 70% output loss is incurred compared to running 2 WUs at a time. (I've purposely found the negative limit for my 970 at 1.5GHz. Next up: achieving top performance. ACEMD is sensitive to slowdown factors.) skgiven recently outlined some ACEMD performance factors: "That is a bit faster, albeit only ~2.5%. This is because the latest apps are quite good at utilizing the GPU without much CPU reliance and you have 4 real cores with 4 true threads. On CPU's with HT the difference tends to be higher." For efficiency's sake, PCIe x8 is required for any Maxwell, big or small. Single GPU or multi, ACEMD is very sensitive to serial links and to CPUs below 3GHz. An underclocked CPU and a motherboard whose PCIe lanes are saturated hurt runtimes. Bus-demanding tasks on x4 fall short of optimal performance: ACEMD is one of the BOINC tasks that lose 20% on x4 compared to x8/x16, and PrimeGrid's N17 Genefer is another. A PCIe NVMe/SATAe SSD can hamper multi-GPU compute runtimes (Z97 chipset). BOINC on a USB stick or HDD shows zero interference with the GPU, unlike a PCIe SSD.
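The percentages in the NOELIA_46 update can be checked from the per-step times, since a shorter time per step means proportionally higher speed. A sketch using only the quoted ms-per-step values:

```python
# NOELIA_46 milliseconds per simulation step, as quoted in the post.
MS_PER_STEP = {"GTX980Ti": 2.903, "GTX980": 3.553, "GTX970": 4.168, "GTX750": 10.878}


def speedup(card: str, reference: str) -> float:
    """Relative speed of `card` vs. `reference`, minus one (speed is 1 / time per step)."""
    return MS_PER_STEP[reference] / MS_PER_STEP[card] - 1.0


if __name__ == "__main__":
    for other in ("GTX980", "GTX970"):
        print(f"980Ti vs {other}: {speedup('GTX980Ti', other):+.1%}")
    print(f"750 vs 980Ti: {speedup('GTX750', 'GTX980Ti'):+.1%}")
```

The 980Ti works out about 22.4% faster than the 980 and 43.6% faster than the 970, inside the quoted ranges, and the 750 about 73.3% slower than the 980Ti.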
Skylake CPUs will improve transfers. FYI: the Maxwell BIOS Tweaker program reads out the clocks involved with the GPU. For example, the GM107 L2 cache clock runs 26-39MHz below the GPC clock. GM204 and GM200 stock vBIOSes sometimes show a 150-200MHz difference between the four boost/clock domains XBAR/SYS/GPC/L2C. AMD's hybrid-cooled flagship (Fury X) is available today; will the GM200 MSRP drop in a couple of weeks? http://wccftech.com/amd-radeon-r9-fury-launch-reviews-roundup/ Tom's Hardware: "Whereas GM200 measures 601mm², Fiji is almost as large at 596mm². AMD crams a claimed 8.9 billion transistors into that space, and then mounts the chip on a 1011mm² silicon interposer, flanking it with four stacks of High Bandwidth Memory." Thermal imagery: http://www.tomshardware.com/reviews/amd-radeon-r9-fury-x,4196-8.html Its FP64 rate is 1/16 of FP32, similar to the GCN1.2 Tonga cores rather than the 1/8 of gaming GCN1.1 Hawaii (the Hawaii FirePro series is 1/2). If a professional Fiji arch is released, its FP64 rate might be 1/2, 1/4 or 1/8, or stay at 1/16. |
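For context on the x4 vs. x8/x16 point, raw PCIe 3.0 link bandwidth can be estimated from the 8 GT/s per-lane rate and 128b/130b encoding. This sketch ignores packet and protocol overhead, so real throughput is lower, and it does not model the ~20% ACEMD slowdown on x4, which is an observation in the post rather than something derived here:

```python
# PCIe 3.0: 8 GT/s per lane, 128b/130b line encoding.
GT_PER_S = 8.0
ENCODING_EFFICIENCY = 128 / 130
BITS_PER_BYTE = 8


def pcie3_gb_per_s(lanes: int) -> float:
    """Approximate one-direction link bandwidth in GB/s (no protocol overhead)."""
    return lanes * GT_PER_S * ENCODING_EFFICIENCY / BITS_PER_BYTE


if __name__ == "__main__":
    for lanes in (4, 8, 16):
        print(f"x{lanes}: {pcie3_gb_per_s(lanes):.2f} GB/s")
```

Each halving of lanes halves the link bandwidth: roughly 3.94, 7.88 and 15.75 GB/s per direction for x4, x8 and x16.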
|
Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0
|
http://www.guru3d.com/articles_pages/zotac_geforce_gtx_980_ti_amp_extreme_review,8.html Custom-PCB GM200 cards (MSI/ASUS/EVGA/ZOTAC/GALAX/Inno3D/Gigabyte) have been released, and a couple have been reviewed. Overclocked Tis surpass reference Titan X and Fury X performance. Custom-PCB GM200 cards carry a premium of $50 or more over reference. The heavy 3-slot, 10-phase (8+2) Zotac AMP Extreme (320W OP) so far has the highest out-of-box clocks (1203 base/1355 boost), rated higher than the 18-phase HOF. The not-yet-released 17-phase (14+3) Kingpin edition is rumored to be 4 bins (52MHz) better at stock core/boost. The 14+3 EVGA Classifieds are rated 1 bin lower at stock base clock than the Zotac. The Gigabyte G1 has shown itself to be an overclocking beast with its 600W heatsink and 10-phase (8+2) custom PCB. The reference 6+2-phase PCB limits a water-cooled GM200 to about 1500 24/7 if you're lucky, before the VRMs start breaking down. Liquid- or air-cooled custom-PCB GM200s at >1600MHz are the forefront of 24/7 overclocks. (At these speeds, overclocks will either fail quickly or hold up, depending on the code. Precision testing is mandatory for error-free long-term compute.) My 970's stable clock differs by 100MHz across three BOINC projects: 1.5GHz for ACEMD, 1.6GHz for PrimeGrid's CUDA PPS Sieve, with POEM's OpenCL in the middle. The Zotac 980ti Extreme seems not to include three features of the 13-phase (8+3+2) 980/970 AMP Omega/Extreme: the LN2 BIOS switch, manual voltage read-out points, and a Texas Instruments MSP430 USB microcontroller that can be custom-programmed. The EVGA Hybrid, Inno3D Black and ZOTAC ArcticStorm (full water block plus 3 fans) use the standard 6+2-phase reference PCB with a VRM heatsink. Buying a reference 980ti and an EK water block costs a touch more than a ready-made liquid-cooled GM200. All these choices make the decision hard. My newly purchased MSI Z97 MPOWER motherboard with three x16 PCIe 3.0 slots is waiting for an upgrade. The PCIe slot furthest from the CPU has true 8-lane circuitry, not 4-lane (the underside of the board shows 8, as do the powered pins in the x16 physical slot).
So with 3 GPUs it should run 8x/8x/8x, and the Ti on an x8 slot should be enough. A 980ti/970/750 board makes around 1.7M RAC/day at 500W (GPU). If I replace the 750 with a 980, RAC would be 2.2M/day at 650W GPU and <750W system total. Is a platinum 1000W PSU with an 85A single 12V rail enough to drive a 980ti/980/970, all at 1.5GHz, 24/7/365? Or do I need a 1200W PSU with a >100A rail? The consumption/RAC ratio for all Maxwells is much better than for Keplers. My 970 currently draws 125W on NOELIA_ETQ and 140W on GERALD with <73% core usage. It pulled 175W at ~83% core usage on NOELIA_S4 during April. 212W is the highest I've seen (AIDA64 Integer benchmark and SiSoftware). For DP code: 52 total DP cores = 130W. An overclocked GM200/204/206 will operate 30% above its "rated" TDP if the heatsink can handle the extra heat. GM200 overclocks can add up to 50% performance vs. reference speeds, depending on how the application scales with MHz. A side note: the GM206 GTX950(ti) is supposedly going to be released soon with 768 or 896 cores (the ROP/memory subsystem is unknown). If the 950(ti)'s cache and ROPs match the 960's, it will be a reasonably fast GPU. Like the 750(ti) before it, the 950(ti) could be a choice GPU for an economical ACEMD compute system. "in my opinion the best nvidia GFC 980 based, while published as custom design pcb and cooling.." Indeed, although some were experiencing problematic voltage settings that limited overclocks. Zotac's (broken) FireStorm program is supposedly designed to work with the USB microcontroller, but multiple GPUs on one board cause FireStorm to stop working. (RIVA-based overclocking programs also have an issue with the GM204 AMP Omega/Extreme voltage sliders, while NV Inspector works fine.) I only recommend NV Inspector for overclocks; its footprint is much smaller than MSI's or EVGA's tools. Zotac made 4000 Omega/Extreme cards in total: 1000 each of the 970 Omega, 980 Omega, 970 Extreme and 980 Extreme. Zotac's customer service was informative and helpful towards me, providing detailed GPU dynamics, e.g. absolute values from the life-expectancy algorithm, assuring ~1.6GHz for years to come. So far, stable OCs have held through three months of non-stop BOINC projects. |
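The power questions above come down to simple arithmetic: a 12V rail's capacity is volts times amps, and efficiency is RAC divided by GPU power. A sketch using the quoted figures (1.7M RAC at 500W, a projected 2.2M RAC at 650W, an 85A rail, a <750W system on a 1000W PSU). It does not account for transient spikes or the extra draw of overclocked cards running above rated TDP, so it cannot by itself settle whether the PSU is enough:

```python
def rac_per_watt(rac_per_day: float, gpu_watts: float) -> float:
    """Credit per day earned per watt of GPU power."""
    return rac_per_day / gpu_watts


def rail_watts(amps: float, volts: float = 12.0) -> float:
    """Maximum power a single rail can deliver: P = V * I."""
    return amps * volts


def load_fraction(system_watts: float, psu_watts: float) -> float:
    """Share of the PSU's rated output used by the system."""
    return system_watts / psu_watts


if __name__ == "__main__":
    print(f"980ti/970/750: {rac_per_watt(1.7e6, 500):,.0f} RAC per GPU watt")
    print(f"980ti/980/970: {rac_per_watt(2.2e6, 650):,.0f} RAC per GPU watt")
    print(f"85 A rail: {rail_watts(85):.0f} W")
    print(f"750 W on a 1000 W PSU: {load_fraction(750, 1000):.0%} load")
```

An 85A rail works out to 1020W, and a 750W system load is 75% of a 1000W unit; the two configurations land at about 3400 and 3385 RAC per GPU watt.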
©2025 Universitat Pompeu Fabra