Message boards : Graphics cards (GPUs) : Possible to run two instances of GPUGrid on a GTX 660?
| Author | Message |
|---|---|
|
Joined: 14 Oct 11, Posts: 31, Credit: 81,420,504, RAC: 0
|
Greetings! The background is that I've upgraded from a GTX 260 to a GTX 660. I'm happy to report my desktop experience is now beautifully fluid even with GPUGrid crunching! So fluid, in fact, that I've made these observations:
- GPUGrid now consumes 100% of a CPU core, so it now seems to be CPU-bound. (Gee, I didn't think an i7-860 was ~that~ shabby a CPU.)
- After a few hours of 3D gaming, I noticed it didn't make that much of a dent in the time of the results returned.

This makes me think... maybe I can crunch more work by running 2 instances of GPUGrid simultaneously? Is there a straightforward way to convince BOINC to do this? I found some promising info in the following link, but I have to say, I'm completely lost:
|
|
Joined: 14 Oct 11, Posts: 31, Credit: 81,420,504, RAC: 0
|
Update - the bottom of this page: http://boinc.berkeley.edu/wiki/Client_configuration suggests an easy-sounding way: adding an 'app_config.xml' file to the project's directory. However, this is only available with 7.0.40+ clients, while the official BOINC release is still at 7.0.28 :-/ Hmmm... should I bother with a possibly unstable version... |
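For reference, such an app_config.xml might look like the sketch below. The `<name>` value here is an assumption, not a confirmed GPUGrid application name (the real one can be read from the `<app>` entries in client_state.xml); `gpu_usage` of 0.5 is what tells a 7.0.40+ client to run two tasks per GPU.

```xml
<!-- app_config.xml - placed in the GPUGrid project folder under BOINC's data directory.
     "acemdshort" is a guessed app name; check client_state.xml for the actual one. -->
<app_config>
  <app>
    <name>acemdshort</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>  <!-- 0.5 GPU per task => two tasks share one GPU -->
      <cpu_usage>1.0</cpu_usage>  <!-- each instance still wants a full CPU thread -->
    </gpu_versions>
  </app>
</app_config>
```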
Retvari Zoltan (Joined: 20 Jan 09, Posts: 2380, Credit: 16,897,957,044, RAC: 0)
|
> GPUGrid now consumes 100% of a CPU core, so it now seems to be CPU-bound. (Gee, I didn't think an i7-860 was ~that~ shabby a CPU.)

It's not because your CPU became shabby. It's because the GPUGrid client is programmed to use a full CPU thread to feed any member of the Kepler GPU family.

> After a few hours of 3D gaming, I noticed it didn't make that much of a dent in the time of the results returned.

The Kepler GPUs are superscalar, a feature the GPUGrid client can't fully exploit (it can use only about two thirds of the shaders), but games can use it. That means running a game simultaneously would not significantly impact the GPUGrid client's performance (depending on many factors).

> This makes me think... maybe I can crunch more work by running 2 instances of GPUGrid simultaneously?

No. You can check GPU usage with third-party tools like MSI Afterburner. It will show GPU usage above 95% (strictly speaking, that figure is the utilization of the work-distributing components inside the GPU, not the percentage of shaders in use). If you ran two instances at the same time, the running times would double (at least), and you would lose the bonus for fast result returns.

> Is there a straightforward way to convince BOINC to do this?

There is only a very complicated way to do that...

> I found some promising info in the following link, but I have to say, I'm completely lost:

...as you already know. Other projects have clients with low GPU utilization; those can gain performance from that method. It's counterproductive here at GPUGrid. |
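Zoltan's point can be put in back-of-envelope form. The numbers below are illustrative assumptions, not measured GPUGrid figures: one task is assumed to keep ~95% of the GPU busy and to take 6 hours running alone.

```python
# Toy model: what happens to runtime and throughput when N tasks share one GPU?
# util and base_hours are assumed, illustrative values.

def stats(tasks, util=0.95, base_hours=6.0):
    """Per-task runtime (hours) and tasks/hour with `tasks` instances on one GPU."""
    share = 1.0 / tasks                           # GPU fraction each instance gets
    slowdown = util / share if share < util else 1.0
    hours_each = base_hours * slowdown
    return hours_each, tasks / hours_each

for n in (1, 2):
    hours, thru = stats(n)
    print(f"{n} instance(s): {hours:.1f} h per task, {thru:.3f} tasks/hour")
```

With these assumed numbers, total throughput rises only about 5% (the idle sliver of the GPU) while per-task turnaround nearly doubles, which is exactly why the fast-return bonus would be lost.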
|
Joined: 14 Oct 11, Posts: 31, Credit: 81,420,504, RAC: 0
|
Köszi Zoltán! Good to know it's not worth the bother.

> The Kepler GPUs are superscalar (...)

Cheers for the keyword tip :-) A quick Google helped me understand your statement better. (http://www.anandtech.com/show/3809/nvidias-geforce-gtx-460-the-200-king/2)

> You can check the GPU usage by third party tools like MSI Afterburner. It will show you the GPU usage is above 95% (actually it shows the utilization of the work distributing components inside the GPU, not the percentage of the utilized shaders)

Indeed, I've been using Process Explorer, which showed GPU "Engine 1" around 90% (and Engine 0 around 5%) with GPUGrid running. Running a game bumps Engine 0 to 50-70%. So I'd realised it's not possible to infer the actual GPU 'saturation' (if you like) from this metric. Thanks also for the tip about why that is.

> If you would run two instances at the same time, the running times would double (at least), and you would lose the bonus for fast result returns.

I'm actually running the short queue since my PC is only on 8 hours a day, so I thought it might still be useful in this context. But apparently not. |
|
Joined: 14 Oct 11, Posts: 31, Credit: 81,420,504, RAC: 0
|
I still can't help but think that either:
(a) The dedicated GPUGrid thread (calling 'cuStreamSynchronize', according to Process Monitor) really has maxed out one CPU core (12.48% CPU with 8 logical CPUs), which I would presume implies that some queues must be running empty somewhere in the GPU(??). The question would then be whether the additional GPU scheduling overhead from 2 processes would exceed any small gain. (I remember reading articles saying Windows 7 has done wonders with regard to GPU scheduling...)
(b) Or are you saying GPUGrid is polling & spinning (rather than waiting on some event) with Kepler-based cards? That sounds inefficient, if so.
|
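The spin-versus-wait distinction in (b) can be sketched in a few lines. This is only a loose analogy for what a driver thread waiting on something like cuStreamSynchronize might do, not GPUGrid's actual code: a thread that polls in a tight loop burns a full core of CPU time, while one that blocks on an event uses almost none.

```python
# Illustrative only: spin-polling vs. blocking on an event.
import threading
import time

def worker(done):
    time.sleep(0.5)   # stands in for a GPU kernel running
    done.set()

def spin_wait(done):
    while not done.is_set():   # busy-polls: burns CPU the whole time
        pass

def block_wait(done):
    done.wait()                # sleeps until signalled: near-zero CPU

for strategy in (spin_wait, block_wait):
    done = threading.Event()
    threading.Thread(target=worker, args=(done,)).start()
    t0 = time.process_time()
    strategy(done)
    print(strategy.__name__, "used", round(time.process_time() - t0, 2), "CPU seconds")
```

Both strategies wait the same 0.5 s of wall time; only the spinning one shows up as a maxed-out core, which matches the observation that a slower GPU clock doesn't reduce the CPU usage.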
Retvari Zoltan (Joined: 20 Jan 09, Posts: 2380, Credit: 16,897,957,044, RAC: 0)
|
> (I remember reading articles saying Windows 7 has done wonders with regards to GPU scheduling...)

Maybe that's because Windows XP came out 8 years earlier than Windows 7, so it couldn't know about this fresh marketing BS; GPU tasks actually run 10% faster on WinXP than on Windows 7. (In reality it's because of the WDDM overhead in Windows 7.)

> (b)... are you saying GPUGrid is polling & spinning (rather than waiting on some event) with Kepler-based cards?

I'm not the programmer of the GPUGrid application, but as far as I can tell it works this way. Earlier (and it still applies to the CUDA 3.1 application) there was a system environment variable called SWAN_SYNC. If this variable was set to "0", the CUDA 3.1 client continuously polled the GPU, causing a significant increase in GPU usage (at the time the Fermis came out).

> It sounds inefficient, if so.

Quite the contrary. It would be inefficient if the GPU ran at lower utilization. However, the lesser Keplers obviously benefit less from this method than the high-end ones.

> UPDATE: I had a brain-wave how to test this: I under-clocked the card to 50% memory + core speeds and confirmed that GPUGrid processing slowed down. Yet the CPU usage stayed at 100% of 1 CPU core. Wow, that really does back theory (b)

You benefit more on GPUGrid than you lose on the CPU project. You should also consider that your CPU has only 4 cores (i.e. 4 FPUs) while each core can handle 2 threads at the same time, so 4 really efficient CPU tasks could saturate the 4 FPUs in theory (in that case you wouldn't gain any performance by running more than 4 tasks simultaneously). In reality, the gain from running more than 4 tasks simultaneously depends on the FPU-utilization efficiency of the given application's code (i.e. the project). Maybe you wouldn't see any drop in the RAC of your CPU project on this host by reducing the number of CPU-only tasks. |
|
Joined: 14 Oct 11, Posts: 31, Credit: 81,420,504, RAC: 0
|
Nice way to put it :-) It was more the academic stuff that took my interest, e.g. GPUView, which this guy developed during an internship at Microsoft: http://graphics.stanford.edu/~mdfisher/GPUView.html I found it while digging around trying to understand these "GPU Engines". Personally, smooth multitasking matters a lot to me, and this is the first hardware+OS combo that has (finally) impressed me. I can live with 10% if WDDM gives me that.

All very good points. My limit of 3 CPU tasks (now 2 CPU + GPUGrid) is to avoid fan noise. Since you make a convincing argument for the high CPU usage, I may look into getting a water cooler so I can add 2 more CPU threads. |
skgiven (Joined: 23 Apr 09, Posts: 3968, Credit: 1,995,359,260, RAC: 0) |
It's still 11%, assuming you optimize the operating system (adjust for best performance, turn off Aero). If you don't optimize and you use the system a lot, then performance could be 15 or 20% less than with XP or Linux.

Polling isn't the same as crunching on the CPU; it won't produce as much heat, and it has less impact on the CPU, especially a hyper-threaded one. The way the CPU is now used to support Kepler cards is an improvement on previous systems: it's better optimized for GPU utilization and takes some of the configuration away from the cruncher (it's done for you). The result is improved credit and fewer errors.

The price of a CPU water cooler can be two or three times the price of a good heatsink and fan: £60 to £100 vs £20 to £30. The benefits can be reduced noise, reduced CPU temperature, increased space, and reduced heat radiation into the system (and onto the GPU in particular). The cons might be the longevity of the cooler, especially running 24/7, and potential leaking.

FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help |
|
Joined: 17 Aug 08, Posts: 2705, Credit: 1,311,122,549, RAC: 0 |
Zoltan said it all quite nicely, nothing to add there :) However, regarding your CPU fan noise problem: the cheapest thing to do would be undervolting the CPU. There should be a margin of at least 0.1 V at stock clocks, probably even more. This automatically reduces the CPU's power consumption, fan noise, and your electricity bill. Go too far down and the CPU becomes unstable, though, so some stability testing would be needed. I'm also assuming you're using the Intel stock heatsink & fan. These are sufficient for normal use, but too weak for continuous crunching at comfortable noise levels. Any modern mid-range 30€ air cooler (e.g. Arctic Cooling i30, Thermalright Macho) should have no problem with this CPU at stock settings. No need for expensive water cooling. MrS Scanning for our furry friends since Jan 2002 |
|
Joined: 14 Oct 11, Posts: 31, Credit: 81,420,504, RAC: 0
|
Oh, brilliant advice. What I did specifically was edit my Balanced power plan and nudge the "maximum processor state" down to 99%. This seems to simply tell the i7 "stop overclocking yourself": CPU-Z showed the core speed drop from 2933 MHz (turbo) to the stock 2800 MHz, and the voltage went from teetering around 1.14 V to around 1.08 V. The temperature visibly dropped within seconds. Nice. I then launched my 3rd BOINC CPU task again, and it's still blissfully silent. So that's 43% extra CPU computation within the temperature constraint, with no measurable impact on GPUGrid. Nice! |
|
Joined: 17 Aug 08, Posts: 2705, Credit: 1,311,122,549, RAC: 0 |
Glad to hear it helped :) With this first generation of turbo mode, Intel was playing it quite safe and used unnecessarily high voltages. And I'm positive that your chip could do 2.8 GHz at significantly less than 1.08 V... but this was a nice first step anyway! MrS Scanning for our furry friends since Jan 2002 |
|
Joined: 14 Oct 11, Posts: 31, Credit: 81,420,504, RAC: 0
|
I'd like to keep dynamic voltage, and I just checked my BIOS: while it supports setting a voltage offset, it only allows a positive one. So I took the next logical step within Windows: 95% power = 2666 MHz / 1.07 V, which lets me run 4 BOINC CPU tasks, i.e. 80% more CPU work, in near silence. Quite happy with that. |
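The savings reported in this thread line up with the standard dynamic-power model, P ∝ f · V². The frequencies and voltages below are the ones quoted in these posts; the model ignores static leakage, so treat the percentages as ballpark estimates only.

```python
# Rough dynamic-power estimate (P ~ f * V^2) relative to the stock turbo
# operating point reported earlier in the thread (2933 MHz @ ~1.14 V).
def relative_power(freq_mhz, volts, ref_freq=2933, ref_volts=1.14):
    return (freq_mhz / ref_freq) * (volts / ref_volts) ** 2

for label, f, v in [("99% plan (2800 MHz, 1.08 V)", 2800, 1.08),
                    ("95% plan (2666 MHz, 1.07 V)", 2666, 1.07)]:
    print(f"{label}: ~{relative_power(f, v):.0%} of stock power per core")
```

By this estimate the 99% plan runs each core at roughly 86% of stock power and the 95% plan at roughly 80%, which is consistent with the temperature drops observed.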
|
Joined: 17 Aug 08, Posts: 2705, Credit: 1,311,122,549, RAC: 0 |
Uhh... I've got such a stupid mobo at work as well. I can't clock the i7 2600 as high as before because I can't go below stock voltage any more. Anyway, problem solved :) MrS Scanning for our furry friends since Jan 2002 |
©2025 Universitat Pompeu Fabra