Message boards : Graphics cards (GPUs) : Maxwell now
skgiven · Joined: 23 Apr 09 · Posts: 3968 · Credit: 1,995,359,260 · RAC: 0
I was able to clock the GDDR5 to 3500MHz and over, even when OC'ing the GPU core and with power >110%. That said, I did get an error at silly speeds. Initially I was just being inquisitive, trying to understand why the 256-bit bus shows such high usage for some tasks, how much of a problem that is, and where the happy spot lies WRT power and efficiency.

The P-states are still a bit of a conundrum, and I suspect we are looking at a simplified GUI that attempts to control more complex functions than NV Inspector indicates. They are certainly not easy to control. P-states are apparently distinct from boost, and it's not clear what impact, if any, CPU usage has on boost: when I don't fix the P-states, the GPU clocks drop under certain circumstances. For example, while running a CPU MT app the GPU sat at 1050MHz (no boost); when CPU usage changed to 6 individual CPU apps, power went from 58% to 61%, GPU usage rose from 82% to 84%, and the GPU clock rose to 1075MHz - still no boost (all while running a long SDOERR_thrombin WU). When I force the GPU clock to 1190MHz while using 6 CPU cores for CPU work, the GPU still does not boost - not even when I reduce the CPU usage to 5, 4 or 3 cores. By that stage GPU usage is 86 to 87% and power is 66%. I expect a system restart would allow the GPU to boost again, but IMO boost is basically broken and the P-states don't line up properly.

FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
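The clock, power, P-state and utilisation figures quoted above can also be logged programmatically rather than read from a GUI. A minimal sketch using the NVML library follows (NVML is an assumption here, not something referenced in the post; device index 0 and the one-second poll interval are arbitrary choices):

```c
/* Minimal NVML polling sketch: log P-state, clocks, power and GPU utilisation
 * while varying the CPU load. Assumes the NVML headers/library shipped with
 * the CUDA toolkit; build with e.g.: gcc mon.c -o mon -lnvidia-ml */
#include <stdio.h>
#include <unistd.h>
#include <nvml.h>

int main(void)
{
    nvmlDevice_t dev;
    if (nvmlInit() != NVML_SUCCESS) return 1;
    if (nvmlDeviceGetHandleByIndex(0, &dev) != NVML_SUCCESS) return 1;

    for (int i = 0; i < 60; i++) {                      /* one sample per second */
        nvmlPstates_t pstate;
        nvmlUtilization_t util = {0};
        unsigned int gpuClk = 0, memClk = 0, mw = 0;

        nvmlDeviceGetPerformanceState(dev, &pstate);    /* P0, P2, ...           */
        nvmlDeviceGetClockInfo(dev, NVML_CLOCK_GRAPHICS, &gpuClk);
        nvmlDeviceGetClockInfo(dev, NVML_CLOCK_MEM, &memClk);
        nvmlDeviceGetPowerUsage(dev, &mw);              /* milliwatts            */
        nvmlDeviceGetUtilizationRates(dev, &util);

        printf("P%d  core %u MHz  mem %u MHz  %.1f W  GPU %u%%\n",
               (int)pstate, gpuClk, memClk, mw / 1000.0, util.gpu);
        sleep(1);
    }
    nvmlShutdown();
    return 0;
}
```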
eXaPower · Joined: 25 Sep 13 · Posts: 293 · Credit: 1,897,601,978 · RAC: 0
skgiven: Have you tried NVAPI for the 343 branch? It might help to break through the P2 memory lock and the boost inconsistency, or at the very least provide new information. New data structures are included for Maxwell, and reference documents are also available. I've seen 157 API functions with 343.98 and 163(?) for 344.60.

- NV_GPU_PERF_PSTATES20_INFO_V2
- NV_GPU_CLOCK_FREQUENCIES_V1
- NV_GPU_CLOCK_FREQUENCIES_V2
- NV_GPU_DYNAMIC_PSTATES_INFO_EX
- NV_GPU_PERF_PSTATES20_INFO_V1 - used in the NvAPI_GPU_GetPstates20() interface call
- NV_GPU_PERF_PSTATES20_PARAM_DELTA - used to describe both voltage and frequency deltas
- NV_GPU_PERF_PSTATES_INFO_V1
- NV_GPU_PERF_PSTATES_INFO_V2
- NV_GPU_PSTATE20_BASE_VOLTAGE_ENTRY_V1 - used to describe a single base voltage entry
- NV_GPU_PSTATE20_CLOCK_ENTRY_V1 - used to describe a single clock entry
- NV_GPU_THERMAL_SETTINGS_V1
- NV_GPU_THERMAL_SETTINGS_V2

https://developer.nvidia.com/nvapi
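For anyone wanting to poke at these structures directly, a rough sketch of an NvAPI_GPU_GetPstates20() query is shown below. This is an illustration only: the build setup and the exact field names (numPstates, numClocks, pstateId) are taken from the public NVAPI headers and should be verified against the 343/344 SDK before relying on them.

```c
/* Rough sketch of querying the P-state table via NVAPI (untested illustration).
 * Assumes the NVAPI SDK headers and import library are available. */
#include <stdio.h>
#include "nvapi.h"

int main(void)
{
    NvPhysicalGpuHandle gpus[NVAPI_MAX_PHYSICAL_GPUS];
    NvU32 gpuCount = 0;

    if (NvAPI_Initialize() != NVAPI_OK) return 1;
    if (NvAPI_EnumPhysicalGPUs(gpus, &gpuCount) != NVAPI_OK || gpuCount == 0) return 1;

    NV_GPU_PERF_PSTATES20_INFO pstates = {0};
    pstates.version = NV_GPU_PERF_PSTATES20_INFO_VER;   /* versioned struct */

    if (NvAPI_GPU_GetPstates20(gpus[0], &pstates) == NVAPI_OK) {
        printf("P-states reported: %u (clock domains per state: %u)\n",
               (unsigned)pstates.numPstates, (unsigned)pstates.numClocks);
        for (NvU32 i = 0; i < pstates.numPstates; i++)
            printf("  entry %u: P%u\n", (unsigned)i, (unsigned)pstates.pstates[i].pstateId);
    }

    NvAPI_Unload();
    return 0;
}
```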
Beyond · Joined: 23 Nov 08 · Posts: 1112 · Credit: 6,162,416,256 · RAC: 0
(Sorry for TLDR posts) Thanks eXaPower for those long posts. A lot of information concerning the NV architectures. Very interesting.
eXaPower · Joined: 25 Sep 13 · Posts: 293 · Credit: 1,897,601,978 · RAC: 0
Maxwell and Kepler per-SM resource counts (CUDA cores / LD/ST / SFU / TMU / ROP / warp schedulers / instruction cache buffer / dispatch units / issue ports / crossbar / PolyMorph Engine):

"SMM": 128 CUDA + 32 LD/ST + 4 ROP + 32 SFU + 8 TMU = 204, plus 4 WS + 4 IB + 8 DPU + 9 issue + 1 PME + 4 CrB. SMM total: 234
"SMX": 192 CUDA + 16 TMU + 8 ROP + 32 LD/ST + 32 SFU = 280, plus 1 IC + 4 WS + 8 DPU + 1 CrB + 9 issue + 1 PME. SMX total: 304

An "SMM" equals 77% of a Kepler "SMX".

Kepler GK110: 304 × 15 = 4560
Maxwell GM204: 234 × 16 = 3744
Kepler GK104: 304 × 8 = 2432

GK104 is 65% of GM204; GK104 is 53.3% of GK110.
A GK110/GK104 "SMX" consists of 63.1% CUDA cores; a GM204 "SMM" is 54.7% CUDA cores.
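For anyone who wants to re-check those ratios, a small arithmetic sketch follows (the per-SM totals are the ones listed above; nothing else is assumed):

```c
/* Recompute the per-SM totals and ratios quoted above. */
#include <stdio.h>

int main(void)
{
    /* Per-SM resource totals from the post. */
    const int smm = 128 + 32 + 4 + 32 + 8 + (4 + 4 + 8 + 9 + 1 + 4);   /* 234 */
    const int smx = 192 + 16 + 8 + 32 + 32 + (1 + 4 + 8 + 1 + 9 + 1);  /* 304 */

    const int gk110 = smx * 15;   /* 4560 */
    const int gm204 = smm * 16;   /* 3744 */
    const int gk104 = smx * 8;    /* 2432 */

    printf("SMM/SMX           = %.1f%%\n", 100.0 * smm / smx);       /* prints ~77.0  */
    printf("GK104/GM204       = %.1f%%\n", 100.0 * gk104 / gm204);   /* prints ~65.0  */
    printf("GK104/GK110       = %.1f%%\n", 100.0 * gk104 / gk110);   /* prints ~53.3  */
    printf("CUDA share of SMX = %.1f%%\n", 100.0 * 192 / smx);       /* prints ~63.2  */
    printf("CUDA share of SMM = %.1f%%\n", 100.0 * 128 / smm);       /* prints ~54.7  */
    return 0;
}
```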
Joined: 17 Aug 08 · Posts: 2705 · Credit: 1,311,122,549 · RAC: 0
I wrote my vendor (Galax) an email - hopefully I'm not just getting some useless standard reply.

@eXaPower: how could I "try NVAPI for the 343 branch"? I know the tweak tools are using it, but I have no experience using it myself.

@SK: "but IMO boost is basically broken". Quite the contrary, I would say! It's better than ever: quicker and more powerful, with a wider dynamic range. However, it's obviously not perfect yet. Besides something sometimes getting messed up (e.g. your example), there's also a problem when a card with a borderline OC is switched from a low boost state to a high one. Here the clock is raised faster than the voltage, which can leave the card in an unstable state for a short period of time - enough time to cause a driver reset for gamers. But this could easily be fixed by changing the internal timings of boost; it's certainly not fundamentally broken.

@Zoltan: in my example it's easy to agree that the lower memory clock makes my card stable, while the higher one doesn't. But why is it stable in Heaven for hours? Running GPU-Grid I'm getting BSODs within minutes. Surely a demanding benchmark can't be that fault-tolerant, and games etc. should also crash frequently. Or put another way: if GP-GPU isn't stable at those clocks, IMO they couldn't sell the cards like this, because other software would crash or error too frequently.

MrS
Scanning for our furry friends since Jan 2002
Joined: 17 Aug 08 · Posts: 2705 · Credit: 1,311,122,549 · RAC: 0
Finally found the time to summarize it and post it at Einstein. I also sent it as a bug report to nVidia.

MrS
Scanning for our furry friends since Jan 2002
eXaPower · Joined: 25 Sep 13 · Posts: 293 · Credit: 1,897,601,978 · RAC: 0
Any information regarding the Maxwell update?
skgiven · Joined: 23 Apr 09 · Posts: 3968 · Credit: 1,995,359,260 · RAC: 0
In time there may well be a GTX960, 990, 960Ti, 950Ti, 950, 940, 930 and/or others, and when GM200/GM210 turns up there could be many variants in the GeForce, Quadro and Tesla ranges...

FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
eXaPower · Joined: 25 Sep 13 · Posts: 293 · Credit: 1,897,601,978 · RAC: 0
Nvidia released a statement about the GTX970 memory allocation issues. There are reports that the GTX970 can't properly utilize its 4GB.

For reference: Kepler's 8 dispatch units feed 1 large crossbar into 10 issue ports routed to the SMX CUDA/LD/ST/SFU units. An SMX has one issue port per 32 CUDA cores, one per 16 SFUs and one per 16 LD/ST units - totaling 2 issue ports for 32 SFUs, 2 for 32 LD/ST units and 6 for 192 CUDA cores inside 1 SMX. An SMM consists of 12 issue ports and 8 dispatch units: Maxwell's crossbar is split into 4 slices per SMM, with 2 dispatch units per slice feeding 3 issue ports - 1 for 32 CUDA cores, 1 for 8 SFUs and 1 for 8 LD/ST units. Per SMM that totals 4 issue ports for 128 CUDA cores, 4 for 32 SFUs and 4 for 32 LD/ST units.

The GTX980 consists of 64 crossbar slices for 192 issue ports and 128 dispatch units, while the 970 has 52 slices with 156 issue ports and 104 dispatch units. A GTX780Ti has 15 crossbars with 150 issue ports and 120 dispatch units. Keep in mind that, counting all resources within an SMX or SMM, the CUDA core percentage is higher in an SMX than in an SMM: a GK110/GK104 "SMX" consists of 63.1% CUDA cores, a GM204 "SMM" of 54.7%.

Nvidia states: "The GeForce GTX 970 is equipped with 4GB of dedicated graphics memory. However the 970 has a different configuration of SMs than the 980, and fewer crossbar resources to the memory system. To optimally manage memory traffic in this configuration, we segment graphics memory into a 3.5GB section and a 0.5GB section. The GPU has higher priority access to the 3.5GB section. When a game needs less than 3.5GB of video memory per draw command then it will only access the first partition, and 3rd party applications that measure memory usage will report 3.5GB of memory in use on GTX 970, but may report more for GTX 980 if there is more memory used by other commands. When a game requires more than 3.5GB of memory then we use both segments."

http://www.pcper.com/news/Graphics-Cards/NVIDIA-Responds-GTX-970-35GB-Memory-Issue
http://images.anandtech.com/doci/7764/SMX_575px.png
http://images.anandtech.com/doci/7764/SMMrecolored_575px.png
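The chip-level totals above follow directly from the per-SMM and per-SMX figures; a short sketch to re-derive them (the SM counts per card - 16, 13 and 15 - are the standard configurations for these GPUs, everything else is from the post):

```c
/* Derive the crossbar-slice, issue and dispatch totals quoted above
 * (GTX980 = 16 SMM, GTX970 = 13 SMM, GTX780Ti = 15 SMX). */
#include <stdio.h>

int main(void)
{
    /* Per-SM figures from the post. */
    const int smm_slices = 4, smm_issue = 12, smm_dispatch = 8;
    const int smx_issue = 10, smx_dispatch = 8;

    const int smm_980 = 16, smm_970 = 13, smx_780ti = 15;

    printf("GTX980  : %d slices, %d issue, %d dispatch\n",
           smm_980 * smm_slices, smm_980 * smm_issue, smm_980 * smm_dispatch);
    printf("GTX970  : %d slices, %d issue, %d dispatch\n",
           smm_970 * smm_slices, smm_970 * smm_issue, smm_970 * smm_dispatch);
    printf("GTX780Ti: %d crossbars, %d issue, %d dispatch\n",
           smx_780ti, smx_780ti * smx_issue, smx_780ti * smx_dispatch);
    return 0;   /* prints 64/192/128, 52/156/104, 15/150/120 */
}
```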
eXaPower · Joined: 25 Sep 13 · Posts: 293 · Credit: 1,897,601,978 · RAC: 0
Nvidia admits the GTX970 only allows 1792KB of L2 cache and 56 ROPs to be accessed.

http://www.pcper.com/reviews/Graphics-Cards/NVIDIA-Discloses-Full-Memory-Structure-and-Limitations-GTX-970

Despite initial reviews and information from NVIDIA, the GTX 970 actually has fewer ROPs and less L2 cache than the GTX 980. NVIDIA says this was an error in the reviewer's guide and a misunderstanding between the engineering team and the technical PR team on how the architecture itself functioned. That means the GTX 970 has 56 ROPs and 1792 KB of L2 cache, compared to 64 ROPs and 2048 KB of L2 cache for the GTX 980.

http://anandtech.com/show/8935/geforce-gtx-970-correcting-the-specs-exploring-memory-allocation

The benchmarking program SiSoftware Sandra has reported 1.8MB (1792KB) of cache for the GTX970 since the beginning; it was always chalked up as a bug.
skgiven · Joined: 23 Apr 09 · Posts: 3968 · Credit: 1,995,359,260 · RAC: 0
Here (running GPUGrid), MCU usage is lower on the GTX970 than on the GTX980. Although the 970's bus width is effectively 224+32 bits, and the suggestion is that the 224-bit portion is predominately used, the lower MCU usage might still be explained by the marginally more favourable (by 7%) shader-to-bus ratio of the 970 compared to the 980 (1664 shaders over 224 bits vs 2048 over 256 bits) and its slightly lower GPU clocks. However, and despite some interpretations, I think it's possible that all 256 bits of the bus are actually used when accessing up to 3.5GB of GDDR5, and that only beyond that does it split into 224 bits for the first 3.5GB and 32 bits for the next 0.5GB.

While on the whole the drop from 2MB to 1.75MB of L2 cache doesn't appear to have any impact, it might explain some of the relative performance variation between different WU types.

FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
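A quick check of the "marginally more favourable (by 7%)" shader-to-bus ratio mentioned above (plain arithmetic on the figures from the post, no further assumptions):

```c
/* Shader-to-bus-width ratios behind the "7%" remark. */
#include <stdio.h>

int main(void)
{
    const double r970 = 1664.0 / 224.0;   /* ~7.43 shaders per bus bit */
    const double r980 = 2048.0 / 256.0;   /*  8.00 shaders per bus bit */

    printf("GTX970: %.2f shaders per bus bit\n", r970);
    printf("GTX980: %.2f shaders per bus bit\n", r980);
    printf("The 970 has %.1f%% fewer shaders per bit of bus width\n",
           100.0 * (1.0 - r970 / r980));   /* prints ~7.1 */
    return 0;
}
```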
Joined: 17 Aug 08 · Posts: 2705 · Credit: 1,311,122,549 · RAC: 0
Well, we can surely say that the smaller L2 cache doesn't hurt GPU-Grid performance much. But we can't say anything more specific, can we?

Regarding the "new" bandwidth: from the explanation given to AnandTech it's clear that the card cannot read or write at the full 256-bit width. The pipe between the memory controllers and the crossbar just isn't wide enough for that. You can see it where the traffic from the memory controller with the disabled ROP/L2 is routed through the ROP/L2 of its companion: at that point 2 x 32 bit would have to share a 1 x 32 bit bus. That bus is bidirectional, though, so you can use all memory controllers at once if the 7-partition and the 1-partition are performing different operations (one reading while the other writes) - which I imagine is difficult to exploit efficiently via software.

MrS
Scanning for our furry friends since Jan 2002
skgiven · Joined: 23 Apr 09 · Posts: 3968 · Credit: 1,995,359,260 · RAC: 0
eXaPower · Joined: 25 Sep 13 · Posts: 293 · Credit: 1,897,601,978 · RAC: 0
You're correct: Kepler disables a full memory controller and the 8 ROPs that come with an SMX. The GTX970 is still the best NVidia performance/cost card for the amount of feature sets included. To make matters even more confusing about SMX disabling: the GT630 (2 SMX / CC 3.5) has a 64-bit bus with 512KB of cache. Another example: the GTX650 Ti Boost (4 SMX / GK106) is a die that keeps the full GK106 cache (384KB), 24 ROPs and memory bus (192-bit) along with 3 GPCs - the same as the 5-SMX GTX660.

In the prior generation of Kepler-derived GPUs, Alben explained, any chips with faulty portions of L2 cache would need to have an entire memory partition disabled. For example, the GeForce GTX 660 Ti is based on a GK104 chip with several SMs and an entire memory partition inactive, so it has an aggregate 192-bit connection to memory, down 64 bits from the full chip's capabilities.

From Damien Triolet at Hardware.fr: the pixel fillrate can be linked to the number of ROPs for some GPUs, but it's been limited elsewhere for years for many Nvidia GPUs. Basically there are 3 levels that might have a say in what the peak fillrate is:

Testing (on forums) reveals that GTX970s from different manufacturers have varying results for peak rasterization rates and peak pixel fill rates (at the same clock/memory speeds). Is this because GTX970 SMM/cache/ROP/memory structures are disabled differently from one another? Reports say that not ALL GTX970s are affected by the 224+32-bit bus or by the separate 512MB pool with its 20-28GB/s bandwidth slowdown. Does this mean NVidia changed the SMM disablement process? Has a second revision of the GTX970 been produced?

AnandTech's decent explanations of SMX structures can be found in the GTX660Ti and GTX650Ti reviews, to compare against the recent SMM articles. When sorting through for reliable information, bear in mind that some misinformed 970 comments in tech forum threads are hyperbolic - completely unprofessional and full of trolling.