GPU Task Performance (vs. CPU core usage, app_config, multiple GPU tasks on 1 GPU, etc.)

Profile Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Message 29353 - Posted: 6 Apr 2013, 17:50:15 UTC - in response to Message 29326.  

Careful, guys. The GTX 650 Ti (John's GPUs) sounds like it's almost the same as a GTX 660 Ti (Jacob's GPUs), but it's actually about a factor of 2 slower. Currently the 70k-credit long-runs take John 33k seconds, and running 2 of them at once might push each one to ~60 ks. That's approaching 17 hours, so we're getting close to missing the credit-bonus deadline, especially for even longer tasks (some give 150k credits, so they should take over twice as long).

Another big fly in the ointment regarding the 2x suggestion: Jacob's 660 Ti has 3GB of memory, while the 650 Ti in question has only 1GB. I doubt it can even run 2x. I have a 650 Ti and ordered another for this project; good bang for the buck and for power usage. I would not, however, suggest running 2x WUs on a 650 Ti. Personally I don't think running multiple concurrent WUs on GPUGrid is a good idea in general, as Jacob's seemingly invalid (but validated) WUs would indicate.
ID: 29353
Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Message 29356 - Posted: 6 Apr 2013, 19:01:48 UTC - in response to Message 29353.  
Last modified: 6 Apr 2013, 19:29:28 UTC

You are correct that there is a memory concern; thank you for reminding me about it.
My testing indicates that going beyond your GPU's memory limit will result in an immediate task failure for the task being added.

To test/prove this, I wanted to overload my secondary GPU, the GTX 460 1 GB (which is not connected to any monitor).

Below are my setup and testing results:

I suspended all tasks, suspended network activity, closed BOINC, and made a copy of my data directory (so I could later undo all this testing without losing active work). Then I changed cc_config.xml to exclude GPUGrid on device 0 (the 660 Ti), changed the GPUGrid app_config.xml <gpu_usage> value to 0.25 (yup, this is a stress test), and started resuming GPUGrid tasks 1 at a time, while monitoring Memory Usage (Dynamic) in GPU-Z.
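
For reference, the two config changes looked roughly like this (a minimal sketch: the project URL matches the one used on this site, but the app short name "acemdshort" and the <cpu_usage> value are illustrative assumptions; check the <name> entries in client_state.xml for the exact names):

cc_config.xml, keeping GPUGrid work off device 0 so only the GTX 460 gets loaded:

<cc_config>
  <options>
    <!-- exclude GPUGrid from device 0 (the 660 Ti) -->
    <exclude_gpu>
      <url>http://www.gpugrid.net/</url>
      <device_num>0</device_num>
    </exclude_gpu>
  </options>
</cc_config>

app_config.xml, telling BOINC each GPUGrid task only needs a quarter of a GPU, so up to 4 can be started on one card:

<app_config>
  <app>
    <name>acemdshort</name>
    <gpu_versions>
      <!-- 0.25 GPUs per task: BOINC may start 4 tasks on one GPU -->
      <gpu_usage>0.25</gpu_usage>
      <cpu_usage>0.5</cpu_usage>
    </gpu_versions>
  </app>
</app_config>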

Here are the scenarios that I saw (I had to reset the data directory for nearly every scenario):

Scenario 1:

    * Added a Long-run Nathan dhfr, GPU Mem Usage became 394 MB
    * Added a Short-run Nathan RPS1, GPU Mem Usage increased to 840 MB, no problems
    * Added a Long-run Nathan dhfr, GPU Mem Usage spiked up to 1004 MB, and the task immediately failed with Computation Error



Scenario 2:


    * Added a Short-run Nathan RPS1, GPU Mem Usage became 898 MB
    * Added a Long-run Nathan dhfr, GPU Mem Usage spiked up to 998 MB, and the task immediately failed with Computation Error



Scenario 3:


    * Added a Long-run Nathan dhfr, GPU Mem Usage became 394 MB
    * Added a Long-run Nathan dhfr, GPU Mem Usage increased to 789 MB, no problems
    * Added a Short-run Nathan RPS1, GPU Mem Usage spiked, and the task immediately failed with Computation Error



Scenario 4:


    * Added a Long-run Nathan dhfr, GPU Mem Usage became 394 MB
    * Added a Long-run Nathan dhfr, GPU Mem Usage increased to 789 MB, no problems
    * Added a Long-run Nathan dhfr, GPU Mem Usage increased to 998 MB, and surprisingly, all 3 tasks appear to be crunching OK



CONCLUSIONS:


    * If adding a task will put the memory usage beyond your GPU's memory limit, the task being added will immediately fail with Computation Error.
    * If you look carefully at those Short-runs being added, it looks like they "detect" the available GPU RAM, and run in a mode made to fit within that limit. For instance, in Scenario 1, I think the task detected 610 MB free, thought it was a 512 MB card, and limited itself to (840-394=) 446 MB. Then, in Scenario 2, it saw 1004 MB, thought it was a 1GB card, and limited itself to 898 MB. Then, in Scenario 3, there was only 215 MB free, and it couldn't load at all.
    * If you look at the Long-runs, they follow a similar pattern. In Scenario 1, the first one saw a 1GB card, but maybe the tasks were built to only ever need a 512 MB card, so it limited itself to 394 MB. In Scenario 2, it couldn't load within 106 MB. Scenario 4 is interesting, because the third task saw 215 MB free and was able to run using (998-789=) 209 MB, so perhaps it can scale down to 256 MB cards. Looking back at Scenario 1, we see that the third task only had 164 MB free, which wasn't enough.
    * So... I'm going to have to revisit my settings. I'd like to tell BOINC to run 2 tasks at a time on device 0, but only 1 task at a time on device 1... but I don't think that's an option. Another thing I might do is disable certain GPUGrid applications on device 1, so that I only double up the ones I know would work (see the sketch after this list). But, because GPUGrid mixes "types" within a single "app" (like NOELIA and NATHAN, both within the same application), I don't think that's a valid play either. I may just devote device 1 to another project, whilst device 0 is set for 2 tasks. Still deciding, and am open to any suggestions you may have.
    * Note, this issue hasn't bitten me yet, because device 1 is currently focused on World Community Grid's (WCG) Help Conquer Cancer (HCC) tasks, so GPUGrid tasks haven't doubled up much on my device 1 at all yet. But they will soon, when WCG HCC runs out of work in < 1 month, and I'll need a plan by then.
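
A minimal sketch of that per-application exclusion in cc_config.xml (<exclude_gpu> accepts an optional <app> element; the short name "acemdlong" is an assumption, take the exact names from client_state.xml):

<cc_config>
  <options>
    <!-- keep the long-run app off device 1; other GPUGrid apps can still use it -->
    <exclude_gpu>
      <url>http://www.gpugrid.net/</url>
      <device_num>1</device_num>
      <app>acemdlong</app>
    </exclude_gpu>
  </options>
</cc_config>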



Hope you find this helpful -- it took 40 minutes to test and write up, and I encourage you to do your own testing too!

Regards,
Jacob

ID: 29356
ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 29370 - Posted: 7 Apr 2013, 11:48:19 UTC

Nice testing Jacob, thanks!

I've read somewhere that ACEMD supports 2 different modes: a faster one using more RAM and a slower one using less memory. Which one is employed is decided automatically based on available memory. I'm not sure how flexible the memory consumption is, but it's certainly a point to keep in mind. We don't want to run 2 WUs for higher throughput and thereby force the app into a slower algorithm, which would very probably lead to a net performance loss.

MrS
Scanning for our furry friends since Jan 2002
ID: 29370
Profile Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Message 29372 - Posted: 7 Apr 2013, 12:18:43 UTC

Another thing to keep in mind concerning the 2x strategy: these current Nathans are the shortest-running long-type WUs we've seen in quite a while (BTW, not complaining at all). The Tonis are longer, and the Noelia and Gianni WUs are a lot longer.
ID: 29372
Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 29373 - Posted: 7 Apr 2013, 12:58:11 UTC - in response to Message 29372.  
Last modified: 7 Apr 2013, 18:11:35 UTC

My impression so far is that if you have a sub-optimal GPU setup, you are more likely to see improvements from running two GPU tasks than if you have already optimized as much as possible for the GPUGrid project.

Clearly, with some WU types there is something to be gained on some systems and setups, but running other WU's will not gain anything, and in some cases will result in massive losses; with my setup, running two short NATHAN_RPS1_respawn WU's (6.52 app) was desperate.

I never really thought doing this on anything but the top GPU's could yield much, and with the possible exception of Titans (which don't yet work) I doubt that running more than 2 tasks would improve throughput for any GPUGrid work. That said, I think the memory crash test has led to a better understanding among crunchers (though the researchers might feel differently).

It's clear that different WU types require different amounts of GDDR memory and different amounts of CPU attention. When I was running with swan on, I noticed that as well as the runtimes going up, the CPU times also went up (in the task listings below, the three numbers after "Completed and validated" are Run Time in seconds, CPU Time in seconds, and Credit):

Single task:
063ppx48x2-NOELIA_Klebe_Equ-0-1-RND3600_0 4339021 7 Apr 2013 | 13:11:15 UTC 7 Apr 2013 | 15:21:46 UTC Completed and validated 6,133.10 2,529.98 23,700.00 Short runs (2-3 hours on fastest card) v6.52 (cuda42)

Two tasks together:
041px2x1-NOELIA_Klebe_Equ-0-1-RND3215_0 4338518 7 Apr 2013 | 6:43:58 UTC 7 Apr 2013 | 10:27:29 UTC Completed and validated 13,320.82 4,158.10 23,700.00 Short runs (2-3 hours on fastest card) v6.52 (cuda42)

041px21x3-NOELIA_Klebe_Equ-0-1-RND6607_0 4338582 7 Apr 2013 | 6:07:54 UTC 7 Apr 2013 | 10:08:02 UTC Completed and validated 13,243.66 5,032.64 23,700.00 Short runs (2-3 hours on fastest card) v6.52 (cuda42)

Obviously this is undesirable, especially seeing as running two together gave no real throughput gain.
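
(A quick check using the Run Time column: the two concurrent tasks finished in roughly 13,280 s of wall-clock time, i.e. about 6,640 s per completed task, versus 6,133 s for the single task, so this particular pairing actually cost around 8% in throughput.)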

On the 2GB versions of the GTX660Ti, the memory controllers have to interleave unevenly. I know most tasks are only ~300MB, so two would not reach 1.5GB, but I don't know if all the controllers are used simultaneously or how that works. Anyway, could this memory layout hinder the 2GB GTX660Ti cards when running two tasks, but not the 3GB GTX660Ti cards?

- Started running two more NOELIA_Klebe_Equ WU's. The first used 934MB and when the second ran the total went up to 1742MB. That means the second is only using 808MB, but between them they are entering that 1.5 to 2.0GB limited memory bandwidth zone for 2GB GTX660Ti's. Perhaps this is having some influence on my results compared to those on a 3GB card?
ID: 29373
ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 29379 - Posted: 7 Apr 2013, 18:30:32 UTC - in response to Message 29373.  
Last modified: 7 Apr 2013, 18:30:43 UTC

Well, the speed-up for your 1st task would be really cool, if it also applied to the 2nd one. 2 possible reasons why it didn't: the task uses less memory and may have just switched to the slower algorithm, or is just generally slower with less memory even if the faster algorithm is still used. Or it's really the memory controller. As far as I have heard all 3 controllers are used for the first 1.5 GB and upon using the last 512 MB only 2 controllers are used. GPU-Grid has never been that bandwidth-hungry, however.

MrS
Scanning for our furry friends since Jan 2002
ID: 29379
Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 29380 - Posted: 7 Apr 2013, 19:35:40 UTC - in response to Message 29379.  
Last modified: 7 Apr 2013, 19:41:06 UTC

I have not yet seen any overall improvement running any two tasks with my system/setup but Jacob has. Fairly different rigs though (LGA1366 vs 1155, 3GB vs 2GB, 100% CPU usage vs 75% to 87%). Too many variables for my liking, and I still think some CPU tasks could influence performance (my SATA rattles away running POGS but not malaria). My system is fairly well optimized for GPU crunching, so I might have less headroom and the 3GB cards could well have an advantage, only seen with lots of memory. Perhaps the app uses the faster equations when it sees X amount of RAM available (1GB or 1.5GB perhaps). That parameter might be app or even task specific.

In the past when I tested this it was only really beneficial when the GPU utilization was quite low, under 80%, and there was an abundance of one task type - but new WU's, apps and GPU's have arrived since then.

According to GPU-Z, the memory controller load on the GTX660Ti is 40% when both tasks are running and 38% when one task is running. There is not that much difference, but then GPU utilization could only go up by ~10%. 40% is relatively high; a GTX470 running @95% GPU utilization has a controller load of 23%, and IIRC a GTX260 was only around 15%, but I don't know how significant that is.

When I suspended one of the running NOELIA_Klebe_Equ tasks, to see what the controller load was for 1 task, I got an app crash. I quickly closed BOINC, waited a minute, then dismissed the Windows error and started BOINC again. The suspended task started OK when I resumed it, but this highlights the fact that the tasks are not designed to run this way.
ID: 29380
Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Message 29381 - Posted: 7 Apr 2013, 19:37:59 UTC - in response to Message 29380.  

Just a heads up - Some NOELIA tasks are currently crashing when suspended or closed, even when running just 1. Those crashes are not the fault of running 2-at-once, so far as I know.
ID: 29381
Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 29382 - Posted: 7 Apr 2013, 19:43:27 UTC - in response to Message 29381.  

Thanks. I suggest people don't try to run them unless they have selected the recommended "Use GPU while computer is in use" preference.
ID: 29382
Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 29388 - Posted: 7 Apr 2013, 23:16:16 UTC - in response to Message 29382.  
Last modified: 7 Apr 2013, 23:17:42 UTC

Well, I now have a set of results that is positive for 2 tasks, albeit only by about 6%.
Note this is with swan off, and relative to a task where swan was on.

Two tasks running at the same time:
148nx13x1-NOELIA_Klebe_Equ-0-1-RND7041_0 4339235 7 Apr 2013 | 17:38:21 UTC 7 Apr 2013 | 20:54:29 UTC Completed and validated 11,493.31 3,749.67 23,700.00 Short runs (2-3 hours on fastest card) v6.52 (cuda42)
109nx42x1-NOELIA_Klebe_Equ-0-1-RND5274_0 4339165 7 Apr 2013 | 17:37:44 UTC 7 Apr 2013 | 20:52:54 UTC Completed and validated 11,534.36 3,623.14 23,700.00 Short runs (2-3 hours on fastest card) v6.52 (cuda42)

One task by itself (ref):
063ppx48x2-NOELIA_Klebe_Equ-0-1-RND3600_0 4339021 7 Apr 2013 | 13:11:15 UTC 7 Apr 2013 | 15:21:46 UTC Completed and validated 6,133.10 2,529.98 23,700.00 Short runs (2-3 hours on fastest card) v6.52 (cuda42)
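
(Worked out from the Run Time column: the two concurrent tasks each took about 11,514 s, so one task completed roughly every 5,757 s of wall-clock time, versus 6,133 s for the single reference task; 6,133 / 5,757 ≈ 1.065, hence the ~6% gain.)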

What I'm interested in is why the CPU time is still higher than for the single task that ran with swan on. Obviously this isn't a good thing: you gain a little GPU throughput but lose some CPU capacity.

These were my results with swan on:
041px21x3-NOELIA_Klebe_Equ-0-1-RND6607_0 4338582 7 Apr 2013 | 6:07:54 UTC 7 Apr 2013 | 10:08:02 UTC Completed and validated 13,243.66 5,032.64 23,700.00 Short runs (2-3 hours on fastest card) v6.52 (cuda42)

041px2x1-NOELIA_Klebe_Equ-0-1-RND3215_0 4338518 7 Apr 2013 | 6:43:58 UTC 7 Apr 2013 | 10:27:29 UTC Completed and validated 13,320.82 4,158.10 23,700.00 Short runs (2-3 hours on fastest card) v6.52 (cuda42)
ID: 29388
Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Message 29389 - Posted: 7 Apr 2013, 23:52:26 UTC - in response to Message 29388.  
Last modified: 7 Apr 2013, 23:53:36 UTC

First, I believe the correct time figure to use when comparing is Run Time (which determines overall throughput, i.e. how quickly you can complete tasks, and can be considered overall performance), not CPU Time (which may vary depending on several factors).

Second, for whatever reason, I thought that the "Swan" setting wasn't used anymore. Do you believe it's still used? If so, is it still set via a system environment variable called "SWAN_SYNC" with a value of 0 or 1? Is it maybe only used on certain types of GPUs (like non-Kepler)? Is there a task type I could easily test with Swan on and off, to immediately see the resulting CPU usage variation in Task Manager?

Regards,
Jacob
ID: 29389
ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 29392 - Posted: 8 Apr 2013, 20:49:32 UTC - in response to Message 29389.  

I understood that the SWAN_SYNC environment variable is not being used any more. For Keplers they went the brute-force way and always request ~1 core per task and GPU. SWAN_SYNC doesn't work well anymore once cards become too fast (relative to the CPU scheduler's time-slice length); that's why we started turning it off with fast Fermis already (SWAN_SYNC=0, default was 1). Not sure if it still works on Fermis, but that's not really of current interest, I think.

Actually, taking a quick look at my results I see "GPU time = CPU time" for long-runs, whereas short runs use less CPU (~40% of the GPU time).

MrS
Scanning for our furry friends since Jan 2002
ID: 29392
Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Message 29393 - Posted: 8 Apr 2013, 21:06:08 UTC - in response to Message 29392.  
Last modified: 8 Apr 2013, 21:06:32 UTC

Yep, I'm seeing similar times when comparing Run Time to CPU Time.

And Task Manager agrees with those times...
For me, for the Nathan tasks, Task Manager shows:
- the long-run tasks use a full CPU core all the time while processing
- the short-run tasks only use a portion of the CPU

So, because the CPU usage can vary from task to task (even between different task types within the same app), I have set <cpu_usage> to 0.001 for each app, allowing my CPU tasks to pick up any remaining actual CPU slack.
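
In app_config.xml terms, that is just the <cpu_usage> entry inside each app's <gpu_versions> block, something like this fragment (sketch only):

    <gpu_versions>
      <gpu_usage>1.0</gpu_usage>  <!-- or whatever GPU share you are using -->
      <!-- reserve almost no CPU in BOINC's scheduler; the GPU task still takes whatever CPU it actually needs -->
      <cpu_usage>0.001</cpu_usage>
    </gpu_versions>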
ID: 29393
Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 29394 - Posted: 8 Apr 2013, 22:00:45 UTC - in response to Message 29389.  
Last modified: 8 Apr 2013, 22:13:52 UTC

I've been Run Time centric from the outset. The SWAN_SYNC variable now appears to be inert/inactive, though it might be built into the Long apps? I'm seeing ~31% CPU usage per GPUGrid app for the Short NOELIA_Klebe_Equ tasks. The Long tasks were using close to 100% of a CPU thread.

Presently running a I1R132-NATHAN_RPS1_respawn3-14-32-RND2200_1 and a 306px29x2-NOELIA_Klebe_Equ-0-1-RND4174_0 WU (SWAN disabled/off): 98% GPU utilization, 1.7GB GDDR5 in use.

The Short NATHAN_RPS1_respawn3 WU's used 1 full CPU thread, which might explain why 2 of them didn't run well together (on my setup):

I1R137-NATHAN_RPS1_respawn3-14-32-RND8262_0 4338051 7 Apr 2013 | 0:05:04 UTC 7 Apr 2013 | 3:05:55 UTC Completed and validated 8,745.14 8,731.06 16,200.00 Short runs (2-3 hours on fastest card) v6.52 (cuda42)
I1R414-NATHAN_RPS1_respawn3-24-32-RND6785_1 4337248 6 Apr 2013 | 19:28:08 UTC 7 Apr 2013 | 0:39:58 UTC Completed and validated 8,756.17 8,740.61 16,200.00 Short runs (2-3 hours on fastest card) v6.52 (cuda42)
I1R439-NATHAN_RPS1_respawn3-20-32-RND6989_0 4337286 6 Apr 2013 | 17:11:24 UTC 6 Apr 2013 | 22:13:55 UTC Completed and validated 12,182.96 12,106.08 16,200.00 Short runs (2-3 hours on fastest card) v6.52 (cuda42)
I1R81-NATHAN_RPS1_respawn3-25-32-RND4658_0 4336752 6 Apr 2013 | 14:24:43 UTC 6 Apr 2013 | 18:35:50 UTC Completed and validated 14,656.44 14,508.81 16,200.00 Short runs (2-3 hours on fastest card) v6.52 (cuda42)

Note that Nathan has more than one task type. It's important to be task-type specific.
ID: 29394
Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Message 29426 - Posted: 12 Apr 2013, 13:41:09 UTC - in response to Message 29297.  

FYI:

Per Toni (a project administrator), because of application and validator problems, I was told to disable running 2-at-a-time. So I have suspended my testing/research, and have put <gpu_usage> to 1, in the app_config.xml file.

http://www.gpugrid.net/forum_thread.php?id=3332#29425

Regards,
Jacob Klein
ID: 29426
tomba
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Message 29654 - Posted: 3 May 2013, 16:47:08 UTC

For the past two years I've run an OCed ASUS GTX 460 1GB; core 850, shader 1700, memory 2000. It's stable (perhaps I should try pushing it further...?). The fan is silent, and temp is around 66C. Current long Nathans take about nine hours.

If I run two WUs concurrently, what % increase in throughput might I see?

Thanks, Tom
ID: 29654
Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 29656 - Posted: 3 May 2013, 17:41:17 UTC - in response to Message 29654.  

tomba, running two tasks on a GTX 460 1GB is a bad idea; many tasks use too much GDDR (~700MB each, so two would want ~1.4GB, well beyond the card's 1GB) and they would eat each other. You wouldn't see any benefit. You would probably see a slowdown, and possibly failures or system crashes. Overall benefits are likely to only be seen on cards with 3GB or more GDDR5.
ID: 29656
tomba
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Message 29665 - Posted: 4 May 2013, 7:55:23 UTC - in response to Message 29656.  

tomba, running two tasks on a GTX 460 1GB is a bad idea


Thanks for the heads-up.

ID: 29665
Profile Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Message 29673 - Posted: 4 May 2013, 11:52:05 UTC - in response to Message 29656.  
Last modified: 4 May 2013, 11:52:37 UTC

running two tasks on a GTX 460 1GB is a bad idea;

No, it's a BAD idea :-)
ID: 29673
Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Message 29905 - Posted: 13 May 2013, 18:03:34 UTC - in response to Message 29426.  
Last modified: 13 May 2013, 18:13:58 UTC

FYI:

Per Toni (a project administrator), because of application and validator problems, I was told to disable running 2-at-a-time. So I have suspended my testing/research, and have put <gpu_usage> to 1, in the app_config.xml file.

http://www.gpugrid.net/forum_thread.php?id=3332#29425

Regards,
Jacob Klein


I have solved the problem I was having where Nathan tasks were not processing correctly on my machine.
The problem was completely unrelated to the app_config.xml file.
Details here: http://www.gpugrid.net/forum_thread.php?id=3332&nowrap=true#29894

So, we can resume app_config testing (including 2-tasks-on-1-GPU).

I still recommend using the app_config.xml file that is in this post:
http://www.gpugrid.net/forum_thread.php?id=3319&nowrap=true#29216
... and I only recommend trying 2-tasks-on-1-GPU if you are running GPUGrid on GPUs that have 2GB or more RAM; if that's the case, you might try using a <gpu_usage> value of 0.5 in your app_config.xml file, to see whether GPU Load increases (check with GPU-Z) and throughput increases (check the task Run Times).
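
A minimal sketch of what that 0.5 setting looks like (the app short names here are assumptions; take the exact <name> values from client_state.xml, and re-read config files or restart BOINC afterwards):

<app_config>
  <app>
    <name>acemdlong</name>
    <gpu_versions>
      <!-- 0.5 GPUs per task: BOINC may run 2 of these tasks on one GPU -->
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>0.001</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>acemdshort</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>0.001</cpu_usage>
    </gpu_versions>
  </app>
</app_config>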

Thanks,
Jacob
ID: 29905