Kepler - Not fully using CPU?

Message boards : Number crunching : Kepler - Not fully using CPU?

Jacob Klein

Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level: His
Message 34890 - Posted: 4 Feb 2014, 4:08:35 UTC
Last modified: 4 Feb 2014, 4:11:11 UTC

I am currently running a Long-run task called "901x-SANTI_MARwtcap310-2-32-RND7131_0" with the 8.15 app version and the "cuda55" plan class, on my Kepler GTX 660 Ti with the new 334.67 BETA drivers, on Windows 8.1.

I expected both Task Manager and Process Explorer to show this task making the "acemd.815-55.exe" process fully utilize a virtual CPU core. However, it is only using a small portion (~25%) of the core. Furthermore, while this task is running the GPU sometimes does not jump to its boost clock and instead stays at the standard 3D clock; GPU utilization is at 85% with no other tasks running, and I suspect this is related to the CPU usage issue.

So... to my questions....
Did something break how SWAN SYNC is automatically selected? Why is the process not using a full virtual core on my Kepler GPU? Is the app broken, or is the driver broken, or is this behavior expected? Are certain GPUGrid tasks intentionally set up to not use a full core on Kepler GPUs?

Thanks,
Jacob
ID: 34890 · Rating: 0
Jim1348

Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level: His
Message 34892 - Posted: 4 Feb 2014, 19:02:26 UTC - in response to Message 34890.  

Isn't it just the same issue we have been discussing in the other thread?
http://www.gpugrid.net/forum_thread.php?id=3561&nowrap=true#34709

Unless there is something very different about Win8.1, it is probably just due to the work unit (or portions of it) being more demanding than the card can handle, so it downclocks. (I assume that affects the CPU too, but I have never investigated that to any extent.)

The usual suspects to change are:
    Increase power limit
    Reduce GPU/Memory clocks
    Increase GPU core voltage

I have seen this problem a number of times on GTX 660s and 650 Ti, and have them all running perfectly now by doing the above.

ID: 34892 · Rating: 0
Jacob Klein

Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level: His
Message 34893 - Posted: 4 Feb 2014, 19:04:50 UTC - in response to Message 34892.  
Last modified: 4 Feb 2014, 19:06:07 UTC

Jim,

This is a different issue altogether. For some reason, tasks on Kepler GPUs are no longer utilizing the full CPU core.

My questions remain. I welcome answers.
ID: 34893 · Rating: 0
ROBtheLIONHEART

Joined: 21 Nov 13
Posts: 34
Credit: 636,026,131
RAC: 0
Level: Lys
Message 34894 - Posted: 4 Feb 2014, 19:41:04 UTC - in response to Message 34890.  

Hello, I only crunch here part-time and am not a super tech, but I did notice that you and Tomba (the other user who noticed the same thing) use the same beta driver. I have not upgraded my 770 or 780 yet (still on 331.82), and my recent tasks still show the standard CPU usage. So perhaps it's the driver?

Rob
ID: 34894 · Rating: 0
Jacob Klein

Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level: His
Message 34895 - Posted: 4 Feb 2014, 19:44:51 UTC - in response to Message 34894.  
Last modified: 4 Feb 2014, 19:45:24 UTC

I am thinking it might be a conflict between the driver and the 8.15 GPUGrid application. I'm going to do some testing tonight, by installing the prior driver version, to prove/verify that.

We still will need someone from GPUGrid to pinpoint the exact nature of the problem (if it is a problem?), though. I can't report an issue to NVIDIA unless someone at GPUGrid confirms that the driver has a problem.

GPUGrid admins? Any input?
ID: 34895 · Rating: 0
skgiven
Volunteer moderator
Volunteer tester

Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level: His
Message 34896 - Posted: 4 Feb 2014, 21:20:52 UTC - in response to Message 34893.  
Last modified: 4 Feb 2014, 21:21:13 UTC

All my tasks have been using a full CPU core for some time. The only exception was a beta that ran 4 days ago.

Your system contains a GTX460 (which is a Fermi, not a Kepler). When a task runs on the GTX460 it does not use a full CPU core (because it's a Fermi), and when a task starts on the GTX660Ti, then stops and restarts on the GTX460, it will stop using the full CPU core. This is normal:

GTX460: I5-SANTI_baxbimSPW2-38-62-RND6735_4 (5106678)
Sent 24 Jan 2014 6:44:07 UTC, reported 24 Jan 2014 18:24:22 UTC, Completed and validated
Run time 21,501.92
CPU time 4,728.77
Credit 20,550.00
Short runs (2-3 hours on fastest card) v8.15 (cuda55)

GTX 660 Ti: I949-SANTI_baxbimSPW2-59-62-RND9438_0 (5108086)
Sent 24 Jan 2014 6:44:07 UTC, reported 24 Jan 2014 9:47:14 UTC, Completed and validated
Run time 10,793.23
CPU time 10,738.45
Credit 20,550.00
Short runs (2-3 hours on fastest card) v8.15 (cuda55)
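The difference between these two tasks is easiest to see as CPU time divided by run time. A minimal Python sketch using only the figures above; `cpu_fraction` is a hypothetical helper name, and reading the ratio as "share of one core" assumes the task keeps a single host thread busy:

```python
# Run time and CPU time (seconds) copied from the two task results above.
TASKS = {
    "GTX460 (Fermi)": (21_501.92, 4_728.77),
    "GTX 660 Ti (Kepler)": (10_793.23, 10_738.45),
}


def cpu_fraction(run_time_s: float, cpu_time_s: float) -> float:
    """Fraction of the wall-clock run time during which the task kept a CPU core busy."""
    if run_time_s <= 0:
        raise ValueError("run time must be positive")
    return cpu_time_s / run_time_s


for card, (run_s, cpu_s) in TASKS.items():
    print(f"{card}: {cpu_fraction(run_s, cpu_s):.1%} of one core")
```

This prints about 22.0% for the GTX460 and 99.5% for the GTX 660 Ti: the Fermi task leaves most of a core idle, while the Kepler task keeps it busy for essentially the whole run.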
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help
ID: 34896 · Rating: 0
Jacob Klein

Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level: His
Message 34897 - Posted: 4 Feb 2014, 21:27:03 UTC - in response to Message 34896.  
Last modified: 4 Feb 2014, 21:55:31 UTC

---------------------------------
skgiven:

I fully understand that. The way we are used to seeing it work is that the GTX 660 Ti (a Kepler card) uses a full core (shown as 12.5% CPU in both Task Manager and Process Explorer), whereas my GTX 460 does not (it uses about 3.25% CPU).

What I'm saying is that, with the 334.67 BETA drivers, the behavior has changed. With 334.67, all of the tasks use 3.25% CPU, even those running on the Kepler.

I've verified that 332.21 WHQL drivers exhibit the "normal" behavior, whereas the 334.67 BETA drivers exhibit this "new" behavior. Can you verify the new behavior using the new drivers?
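As a rough cross-check of these percentages: assuming the host has 8 logical CPUs (inferred from 12.5% = 100% / 8; the thread count is not stated), the per-process figure in Task Manager or Process Explorer is a share of the whole CPU, so 12.5% is one full core and 3.25% is about a quarter of a core, consistent with the ~25% in the first post. A small Python sketch of the conversion (the function name is illustrative):

```python
LOGICAL_CPUS = 8  # assumption: inferred from 12.5% = 100% / 8, not stated in the post


def percent_of_one_core(total_cpu_percent: float, logical_cpus: int = LOGICAL_CPUS) -> float:
    """Convert a 'percent of the whole CPU' reading into 'percent of one core'."""
    one_core_percent = 100.0 / logical_cpus
    return 100.0 * total_cpu_percent / one_core_percent


print(percent_of_one_core(12.5))  # 100.0 -> one full core (Kepler, older driver)
print(percent_of_one_core(3.25))  # 26.0  -> about a quarter of a core
```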

So, again, my questions remain:
Did something break how SWAN SYNC is automatically selected? Why is the process not using a full virtual core on my Kepler GPU? Is the app broken, or is the driver broken, or is this behavior expected? Are certain GPUGrid tasks intentionally set up to not use a full core on Kepler GPUs?

---------------------------------
MJH?
GPUGrid Admins?
Is something not working as expected?
ID: 34897 · Rating: 0
Retvari Zoltan

Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 0
Level: Trp
Message 34898 - Posted: 4 Feb 2014, 22:22:29 UTC
Last modified: 4 Feb 2014, 22:24:04 UTC

I had this issue with older drivers on WinXPx64 (and on WinXPx86 too), but only with NOELIA_DIPEPT-0-2 workunits:
Tasks 7717507, 7718044, 7717962, 7717686, 7718047, 7717487.
The really strange part is that the two workunits following task 7717507 were OK without any intervention: tasks 7720966 and 7722153.
All of my hosts processed the subsequent NOELIA_DIPEPT-0-2 workunits normally.
ID: 34898 · Rating: 0
Jacob Klein

Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level: His
Message 34899 - Posted: 4 Feb 2014, 22:23:46 UTC - in response to Message 34898.  

For me, my issue seems related to the new driver version, and not batches of tasks.
ID: 34899 · Rating: 0
skgiven
Volunteer moderator
Volunteer tester

Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level: His
Message 34900 - Posted: 4 Feb 2014, 22:45:57 UTC - in response to Message 34897.  

The 8.15 app was added to the long queue on 23 Jan 2014.
ID: 34900 · Rating: 0
ROBtheLIONHEART

Joined: 21 Nov 13
Posts: 34
Credit: 636,026,131
RAC: 0
Level: Lys
Message 34901 - Posted: 4 Feb 2014, 22:56:57 UTC

I noticed another odd thing in the run times when I read Tomba's post on the server board(?) and checked his times. After the driver change, CPU time is about 75% lower, but GPU run times before and after the driver change are about the same on similar WUs.

It is odd that with 75% less CPU usage the completion time would stay about the same.

JK, are you seeing the same?
ID: 34901 · Rating: 0
Jacob Klein

Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level: His
Message 34902 - Posted: 4 Feb 2014, 23:00:38 UTC - in response to Message 34901.  
Last modified: 4 Feb 2014, 23:02:09 UTC

It's going to be hard for me to determine whether the task run time is affected. I stop and start tasks many times before they are completed, and I have 2 heterogeneous GPUs, the 660 Ti and the 460, so the same task usually has part of it run on the 660 Ti and part on the 460. I will therefore likely be unable to determine whether the new behavior of "tasks not using a full CPU core on Kepler GPUs with the 334.67 drivers" is going to affect my task run times.

But the behavior is new/different, and I'm still hoping that skgiven can confirm the behavior, and that someone can answer the questions, especially if it is expected behavior or not.
ID: 34902 · Rating: 0
skgiven
Volunteer moderator
Volunteer tester

Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level: His
Message 34903 - Posted: 4 Feb 2014, 23:18:12 UTC - in response to Message 34902.  
Last modified: 5 Feb 2014, 11:15:14 UTC

The 'problem' is with the Beta driver; it does prevent a full CPU core from being used.

Manually applying SWAN_SYNC=0 does not work, even after a restart.

I'm using BOINC 7.2.33 (x64), so it has nothing to do with the BOINC beta.
I'm using W7, so it has nothing to do with Win 8.1.
I have a GTX670 and a GTX770 in the test system, so it has nothing to do with a Fermi GPU.
The issue applies to both of my GPUs.

To check the times, go to BOINC Tasks, select a GPUGrid WU and click Properties. Note down the run time and CPU time, then do the same again after 10 minutes. If the run time and CPU time are not very close and there is a large gap between them, you can see that this is happening.
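The check described above amounts to comparing how much CPU time a task gains against how much run time it gains between two snapshots. A minimal Python sketch; the helper names, the 0.9 cut-off for "not very close" and the snapshot values in the example are all hypothetical, chosen for illustration rather than taken from this thread:

```python
# Hypothetical cut-off for "not very close": the thread gives no number.
FULL_CORE_THRESHOLD = 0.9


def cpu_share_between_snapshots(run0_s: float, cpu0_s: float,
                                run1_s: float, cpu1_s: float) -> float:
    """CPU seconds gained per second of run time between two Properties snapshots."""
    elapsed = run1_s - run0_s
    if elapsed <= 0:
        raise ValueError("take the second snapshot after the first")
    return (cpu1_s - cpu0_s) / elapsed


def looks_affected(share: float, threshold: float = FULL_CORE_THRESHOLD) -> bool:
    """True when the task is using well under a full core while it runs."""
    return share < threshold


# Made-up example: 10 minutes (600 s) more run time, but only 150 s more CPU time.
share = cpu_share_between_snapshots(3_600, 3_580, 4_200, 3_730)
print(f"CPU share: {share:.2f}, affected: {looks_affected(share)}")
```

With these made-up values the task gained only 0.25 CPU seconds per second of run time, so it would be flagged; a task polling on a full core would come out close to 1.0.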
ID: 34903 · Rating: 0
Jacob Klein

Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level: His
Message 34904 - Posted: 4 Feb 2014, 23:23:17 UTC - in response to Message 34903.  
Last modified: 4 Feb 2014, 23:24:40 UTC

Thanks for testing, skgiven. I'm glad you reproduced the behavior.

I would still like the following questions answered:
1) Is it an actual problem, or is it correct behavior?
2) If it is a problem, is the problem in the driver, or is the problem in the application code that determines how the acemd process functions?

Can anyone close to the acemd 8.15 application answer those questions?

Thanks,
Jacob
ID: 34904 · Rating: 0
skgiven
Volunteer moderator
Volunteer tester

Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level: His
Message 34905 - Posted: 4 Feb 2014, 23:35:43 UTC - in response to Message 34904.  
Last modified: 4 Feb 2014, 23:37:53 UTC

It is an actual problem in that the application intends to force the use of a full CPU core (for many reasons). However, how this change affects different cards and setups remains to be seen. I suspect that if you are not using your CPU for anything else it will not make much difference to WU performance, but it will take a couple of days of results to know for sure whether it has any impact when you do crunch CPU tasks for other projects...

The app hasn't changed, but might need to (if this isn't a bug in the app).
The driver has changed. Nothing else on my system changed, other than how the ACEMD app performs.
ID: 34905 · Rating: 0
Jacob Klein

Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level: His
Message 34906 - Posted: 4 Feb 2014, 23:39:19 UTC - in response to Message 34905.  
Last modified: 4 Feb 2014, 23:39:52 UTC

I know the app hasn't changed, but the driver has. Even if this means a problem has surfaced, we don't know whether the problem is in the app or in the driver.

I'll monitor my results, to at least try to get a feel of whether it affects my GPUGrid performance on my GTX 660 Ti.

I'm hoping an admin can chime in, so that we can determine whether it's a bug, and if it is, where the bug is.
ID: 34906 · Rating: 0
skgiven
Volunteer moderator
Volunteer tester

Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level: His
Message 34907 - Posted: 4 Feb 2014, 23:56:19 UTC - in response to Message 34906.  
Last modified: 5 Feb 2014, 0:14:30 UTC

If it is a problem, it's with the driver. The app calls the driver and it's a one-way thing. The ACEMD-based 8.15 app can use CUDA 4.2 or CUDA 5.5, yet the behavior is the same for both. To me this suggests that the driver is testing some feature of the forthcoming CUDA 6 that has changed the way something worked in both 4.2 and 5.5.
ID: 34907 · Rating: 0
Jacob Klein

Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level: His
Message 34908 - Posted: 4 Feb 2014, 23:58:14 UTC - in response to Message 34907.  
Last modified: 4 Feb 2014, 23:58:28 UTC

Thanks for the information. I have put out a request for more info from NVIDIA, in their 334.67 driver feedback thread.

My post is here:
https://forums.geforce.com/default/topic/679611/geforce-drivers/official-nvidia-334-67-beta-display-driver-feedback-thread-released-1-27-14-/post/4112315/#4112315
ID: 34908 · Rating: 0
skgiven
Volunteer moderator
Volunteer tester

Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level: His
Message 34910 - Posted: 5 Feb 2014, 11:12:32 UTC - in response to Message 34908.  
Last modified: 5 Feb 2014, 13:12:49 UTC

This is just a first glance at tasks as they come in, but it appears that tasks run just as fast. To me this suggests that the driver 'fixes' something: tasks take as long but 'use' less CPU. That is obviously better, provided the freed-up processor power can be used for something else without any impact on GPUGrid run times.

Both tasks below are the same type, on the same system and setup, and ran only on the GTX770 (CUDA 5.5):

Older driver only,
316x-SANTI_MARwtcap310-0-32-RND3599_0
Run time 29,309.21
CPU time 29,195.03

Older driver (for ~75%) + Beta driver (for ~25%),
824x-SANTI_MARwtcap310-3-32-RND7471_1
Run time 29,177.10
CPU time 22,138.07


Same type of tasks, on the same system and setup, run only on the GTX770:

Older driver only,
I78R4-NATHAN_KIDKIXc22_6-48-50-RND2248_0
Run time 28,945.29
CPU time 28,801.58

Beta driver only,
I71R6-NATHAN_KIDKIXc22_6-49-50-RND1077_0
Run time 29,050.67
CPU time 6,376.88
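Putting these figures side by side as ratios makes the point concrete: CPU time drops sharply under the beta driver while run time barely moves. A small Python sketch that uses only the numbers listed above (the dictionary labels are just descriptive names):

```python
# (run time s, CPU time s) for the four GTX770 tasks listed above.
RESULTS = {
    "SANTI, older driver only": (29_309.21, 29_195.03),
    "SANTI, ~75% older + ~25% beta driver": (29_177.10, 22_138.07),
    "NATHAN, older driver only": (28_945.29, 28_801.58),
    "NATHAN, beta driver only": (29_050.67, 6_376.88),
}

for name, (run_s, cpu_s) in RESULTS.items():
    print(f"{name}: CPU/run = {cpu_s / run_s:.3f}")

# Relative change in run time for the NATHAN pair, beta driver vs. older driver.
old_run, _ = RESULTS["NATHAN, older driver only"]
new_run, _ = RESULTS["NATHAN, beta driver only"]
run_change = (new_run - old_run) / old_run
print(f"NATHAN run time change: {run_change:+.2%}")
```

For the NATHAN pair the CPU/run ratio falls from about 0.995 to about 0.220, while run time changes by only about +0.36%.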


The estimated run time/CPU time of a task I'm presently running is ~11h/3h.

This needs to be checked for other card types, on other systems and setups (with more and fewer CPU cores in use), and on the short queue, but on my system it appears the driver has reduced the need for CPU polling. It's likely a CUDA improvement.
I have been running 5 CPU tasks, 2 GPU tasks and some NCI tasks. Yesterday the CPU usage was around 92%; today it's around 72%.
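The "CPU polling" mentioned above is the difference between a host thread that repeatedly checks whether the GPU has finished and one that sleeps until it is woken. The following is only an analogy in plain Python (no CUDA involved, and not a description of how ACEMD or the NVIDIA driver actually waits): both waits take about the same wall-clock time, but only the spinning one accumulates CPU time, which mirrors run time staying flat while CPU time drops.

```python
import threading
import time

WAIT_S = 0.2  # how long the pretend "GPU work" takes in this toy example


def spin_wait(seconds: float) -> None:
    """Busy-poll the clock until the deadline: keeps one core fully busy."""
    deadline = time.perf_counter() + seconds
    while time.perf_counter() < deadline:
        pass


def blocking_wait(seconds: float) -> None:
    """Wait on an event nobody sets: the OS parks the thread until the timeout."""
    threading.Event().wait(seconds)


def measure(wait_fn, seconds: float = WAIT_S) -> tuple[float, float]:
    """Return (wall-clock seconds, process CPU seconds) spent inside wait_fn."""
    wall0, cpu0 = time.perf_counter(), time.process_time()
    wait_fn(seconds)
    return time.perf_counter() - wall0, time.process_time() - cpu0


for fn in (spin_wait, blocking_wait):
    wall, cpu = measure(fn)
    print(f"{fn.__name__}: wall {wall:.2f}s, CPU {cpu:.2f}s")
```

Exact CPU figures depend on the OS timer resolution, but the spinning wait should report CPU time close to its wall time and the blocking wait close to zero.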

At some stage I will increase the CPU usage for CPU projects, and see how it affects the GPUGrid runtimes.
ID: 34910 · Rating: 0
Jacob Klein

Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level: His
Message 34912 - Posted: 5 Feb 2014, 13:19:00 UTC - in response to Message 34910.  
Last modified: 5 Feb 2014, 13:20:44 UTC

Phenomenal information, skgiven, thanks for sharing. I love the angle of "perhaps it's a fix instead of a bug" :)

This has prompted me to change the cpu_usage values in my app_config.xml. I had been using 0.5, so that with a GPUGrid task running on each GPU (and knowing a full core would be used because of the Kepler), the summed values would be 0.5+0.5=1.0 CPU. BOINC therefore counted one CPU as used by the GPU tasks and did not over-commit my system. But now, with these 334.67 drivers, I'm using a cpu_usage value of 0.2 in order to keep my system fully committed. Essentially, this lets my system run an additional CPU task!
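A minimal sketch of the kind of app_config.xml change described above, written in Python so the summed cpu_usage arithmetic sits next to the file it produces. Assumptions, not taken from this thread: the app name `acemdlong` for the long-run queue, a gpu_usage of 1.0 (one task per GPU), and the helper name `build_app_config`. Check the application name your BOINC client reports before using anything like this.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, cpu_usage: float, gpu_usage: float = 1.0) -> str:
    """Return app_config.xml text with one <app> entry for a GPU application."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu_versions = ET.SubElement(app, "gpu_versions")
    ET.SubElement(gpu_versions, "gpu_usage").text = str(gpu_usage)
    ET.SubElement(gpu_versions, "cpu_usage").text = str(cpu_usage)
    ET.indent(root)  # pretty-print the tree (Python 3.9+)
    return ET.tostring(root, encoding="unicode")


# Two GPUs, each running one GPUGrid task: the summed CPU value BOINC sees for them.
for cpu_usage in (0.5, 0.2):
    print(f"cpu_usage={cpu_usage}: GPU tasks count as {2 * cpu_usage:.1f} CPU")

# "acemdlong" is an assumed app name; use the one your BOINC client shows.
print(build_app_config("acemdlong", cpu_usage=0.2))
```

With 0.5 the two GPU tasks sum to 1.0 CPU, matching the old setup; with 0.2 they sum to only 0.4 CPU, which is how the change frees room for an extra CPU task.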

I'll chime back in if I hear anything from NVIDIA.
ID: 34912 · Rating: 0

©2026 Universitat Pompeu Fabra