Kepler - Not fully using CPU?
| Author | Message |
|---|---|
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
I am currently running a Long-run task called "901x-SANTI_MARwtcap310-2-32-RND7131_0" with the "cuda55" plan class and the 8.15 app version, on my Kepler GTX 660 Ti, with the new 334.67 BETA drivers, on Windows 8.1. I expected Task Manager and Process Explorer to both show the "acemd.815-55.exe" process fully utilizing a virtual CPU core. However, it is only using a small portion (~25%) of the core. Furthermore, this GPU is sometimes not jumping to boost clock while this task is running, and is instead sometimes staying at the standard 3D clock; utilization is at 85% with no other tasks running, and I expect the behavior is related to the CPU usage issue. So, to my questions: Did something break how SWAN_SYNC is automatically selected? Why is the process not using a full virtual core on my Kepler GPU? Is the app broken, or is the driver broken, or is this behavior expected? Are certain GPUGrid tasks intentionally set up to not use a full core on Kepler GPUs? Thanks, Jacob |
|
Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0
|
Isn't it just the same issue we have been discussing in the other thread? http://www.gpugrid.net/forum_thread.php?id=3561&nowrap=true#34709 Unless there is something very different about Win8.1, it is probably just due to the work unit (or portions thereof) being more difficult than the card can handle, so it downclocks. (I assume that affects the CPU too, but I have never investigated that to any extent.) The usual things to change are:
- Reduce GPU/memory clocks
- Increase GPU core voltage
|
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
Jim, this is a different issue altogether. For some reason, tasks on Kepler GPUs are no longer utilizing the full CPU core. My questions remain. I welcome answers. |
|
Joined: 21 Nov 13 Posts: 34 Credit: 636,026,131 RAC: 0
|
Hello, I only crunch here part time and am not a super tech. I did notice, though, that you and Tomba (the other user who noticed the same) use the same beta driver. I have not upgraded my 770 or 780 yet (still 331.82) and still have the standard CPU usage on my recent tasks. So perhaps the driver? Rob |
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
I am thinking it might be a conflict between the driver and the 8.15 GPUGrid application. I'm going to do some testing tonight, by installing the prior driver version, to prove/verify that. We still will need someone from GPUGrid to pinpoint the exact nature of the problem (if it is a problem?), though. I can't report an issue to NVIDIA unless someone at GPUGrid confirms that the driver has a problem. GPUGrid admins? Any input? |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
All my tasks have been using a full CPU core for some time, the only exception being a beta that ran 4 days ago. Your system contains a GTX460 (which is a Fermi, not a Kepler). When a task runs on the GTX460 it does not use a full CPU core (because it's a Fermi), and when a task starts on the GTX660Ti, then stops and restarts on the GTX460, it will stop using the full CPU core. This is normal. GTX460: I5-SANTI_baxbimSPW2-38-62-RND6735_4 (5106678), sent 24 Jan 2014 6:44:07 UTC, reported 24 Jan 2014 18:24:22 UTC, Completed and validated, run time 21,501.92 s, CPU time 4,728.77 s, credit 20,550.00, Short runs (2-3 hours on fastest card) v8.15 (cuda55). GTX 660 Ti: I949-SANTI_baxbimSPW2-59-62-RND9438_0 (5108086), sent 24 Jan 2014 6:44:07 UTC, reported 24 Jan 2014 9:47:14 UTC, Completed and validated, run time 10,793.23 s, CPU time 10,738.45 s, credit 20,550.00, Short runs (2-3 hours on fastest card) v8.15 (cuda55). FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help |
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
skgiven: I fully understand that. The behavior we are used to seeing is that the GTX 660 Ti (a Kepler card) uses a full core (shown as 12.5% CPU in both Task Manager and Process Explorer), whereas my GTX 460 does not (it uses about 3.25% CPU). What I'm saying is that, with the 334.67 BETA drivers, the behavior has changed. With 334.67, all of the tasks use about 3.25% CPU, even those running on the Kepler. I've verified that the 332.21 WHQL drivers exhibit the "normal" behavior, whereas the 334.67 BETA drivers exhibit this "new" behavior. Can you verify the new behavior using the new drivers? So, again, my questions remain: Did something break how SWAN_SYNC is automatically selected? Why is the process not using a full virtual core on my Kepler GPU? Is the app broken, or is the driver broken, or is this behavior expected? Are certain GPUGrid tasks intentionally set up to not use a full core on Kepler GPUs? MJH? GPUGrid admins? Is something not working as expected? |
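The Task Manager figures quoted above can be turned into a fraction of one core. This is a small sketch, not anything from the GPUGrid app: it assumes the 12.5 and 3.25 readings are system-wide percentages and that the machine exposes 8 logical processors (inferred from a full core showing as 12.5). The function name `core_fraction` is made up for illustration.

```python
def core_fraction(process_cpu_percent: float, logical_cores: int) -> float:
    """Convert a system-wide CPU percentage (as Task Manager shows it)
    into the fraction of ONE logical core that the process is using."""
    if logical_cores <= 0:
        raise ValueError("logical_cores must be positive")
    one_core_percent = 100.0 / logical_cores  # 12.5 on an 8-thread CPU
    return process_cpu_percent / one_core_percent


# Assumption: 8 logical processors, so a fully busy core reads as 12.5.
full_core = core_fraction(12.5, 8)     # reading reported for the usual Kepler behavior
reduced_core = core_fraction(3.25, 8)  # reading reported with the 334.67 BETA driver

print(f"usual Kepler reading: {full_core:.2f} of a core")
print(f"334.67 BETA reading:  {reduced_core:.2f} of a core")
```

Under those assumptions the second reading works out to roughly a quarter of a core, which matches the "~25%" mentioned in the opening post.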
Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0
|
I had this issue with older drivers on WinXPx64 (and on WinXPx86 too), but only with NOELIA_DIPEPT-0-2 workunits: Task 7717507, 7718044, 7717962, 7717686, 7718047, 7717487. The really strange part is that the two workunits following task 7717507 were ok without any intervention: task 7720966, 7722153 All of my hosts processed the subsequent NOELIA_DIPEPT-0-2 workunits normally. |
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
For me, my issue seems related to the new driver version, and not batches of tasks. |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
The 8.15 app was added to the long queue on 23 Jan 2014. |
|
Joined: 21 Nov 13 Posts: 34 Credit: 636,026,131 RAC: 0
|
I saw another odd thing in the run times when I read Tomba's post on the server board(?) and checked his times. After the driver change, CPU time is about 75% lower, but GPU run times before and after the driver change are about the same on similar WUs. It is odd that with 75% less CPU usage the completion time would stay about the same. JK, are you seeing the same? |
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
It's going to be hard for me to determine if the task run time is affected. I stop and start tasks many times before they are completed, and I have 2 heterogeneous GPUs, the 660 Ti and the 460. So, the same task usually has a portion of it run on the 660 Ti, and a portion run on the 460. So, I will likely be unable to determine if the new behavior of "Tasks are not using a full CPU on Kepler GPUs using 334.67 drivers" is going to affect my task run times. But the behavior is new/different, and I'm still hoping that skgiven can confirm the behavior, and that someone can answer the questions, especially whether it is expected behavior or not. |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
The 'problem' is with the Beta driver; it does prevent a full CPU core from being used. Manually applying SWAN_SYNC=0 does not work, even after a restart. I'm using 7.2.33 (x64), so it's nothing to do with the Boinc Beta. I'm using W7, so nothing to do with Win 8.1. I have a GTX670 and a GTX770 in the test system, so nothing to do with a Fermi GPU. The issue applies to both of my GPUs. To test the times, go to Boinc Tasks. Select a GPUGrid WU and click properties. Note down the run time and CPU time, and after 10 minutes do the same again. If the run time and CPU time are far apart rather than very close, you can see that this is happening. |
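The two-readings check described above can be written down as a small calculation. This is only a sketch: the `Sample` type and `cpu_share_between` function are hypothetical names, and the readings in the example are made-up numbers, not measurements from this thread. It compares how much CPU time accumulated against how much run time passed between the two readings.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One reading copied by hand from a task's properties, in seconds."""
    run_time_s: float
    cpu_time_s: float


def cpu_share_between(first: Sample, second: Sample) -> float:
    """Fraction of a CPU core used between two readings of the same task."""
    elapsed = second.run_time_s - first.run_time_s
    if elapsed <= 0:
        raise ValueError("second sample must be taken later than the first")
    return (second.cpu_time_s - first.cpu_time_s) / elapsed


# Made-up readings taken 10 minutes (600 s) apart, for illustration only.
earlier = Sample(run_time_s=3600.0, cpu_time_s=900.0)
later = Sample(run_time_s=4200.0, cpu_time_s=1050.0)

share = cpu_share_between(earlier, later)
print(f"CPU share over the interval: {share:.0%}")
```

A share close to 100% would indicate the task is holding a full core; a much lower value would indicate the reduced-CPU behavior discussed in this thread.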
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
Thanks for testing, skgiven. I'm glad you reproduced the behavior. I still need the following questions answered: 1) Is it an actual problem, or is it correct behavior? 2) If it is a problem, is the problem in the driver, or is the problem in the application code that determines how the acemd process functions? Can anyone close to the acemd 8.15 application answer those questions? Thanks, Jacob |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
It is an actual problem in that the application intends to force the use of a full CPU core (for many reasons). However, how this change impacts different cards and setups remains to be seen. I suspect that if you did not use your CPU for anything else it would not make much difference in WU performance, but it will take a couple of days of results to know for sure whether it has any impact when you do crunch CPU tasks at other projects. The app hasn't changed, but might need to (if this isn't a bug in the app). The driver has changed. Nothing else on my system changed, other than how the ACEMD app performs. |
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
I know the app hasn't changed, but the driver has. Even if this means a problem has surfaced, we don't know whether the problem is in the app or in the driver. I'll monitor my results, to at least try to get a feel for whether it affects my GPUGrid performance on my GTX 660 Ti. I'm hoping an admin can chime in, so that we can determine if it's a bug or not, and if it is, where the bug is. |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
If it is a problem, it's with the driver. The app calls the driver, and it's a one-way thing. The ACEMD-based 8.15 app can use CUDA4.2 or CUDA5.5; however, the behavior is the same for both. To me this suggests that the driver is testing some feature of the forthcoming CUDA6 which has changed the way something worked in both 4.2 and 5.5. |
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
Thanks for the information. I have put out a request for more info from NVIDIA, in their 334.67 driver feedback thread. My post is here: https://forums.geforce.com/default/topic/679611/geforce-drivers/official-nvidia-334-67-beta-display-driver-feedback-thread-released-1-27-14-/post/4112315/#4112315 |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
Just a first glance at tasks as they come in, but it appears that tasks run just as fast. To me this suggests that the driver 'fixes' something; tasks take as long but 'use' less CPU. That is obviously better, if the freed-up processor power can be used for something else without any impact on the GPUGrid run times. Two tasks of the same type, on the same system and setup, run only on the GTX770 (CUDA5.5): older driver only, 316x-SANTI_MARwtcap310-0-32-RND3599_0, run time 29,309.21, CPU time 29,195.03; older driver (for ~75%) + Beta driver (for ~25%), 824x-SANTI_MARwtcap310-3-32-RND7471_1, run time 29,177.10, CPU time 22,138.07. Two more tasks of the same type, on the same system and setup, run only on the GTX770: older driver only, I78R4-NATHAN_KIDKIXc22_6-48-50-RND2248_0, run time 28,945.29, CPU time 28,801.58; Beta driver only, I71R6-NATHAN_KIDKIXc22_6-49-50-RND1077_0, run time 29,050.67, CPU time 6,376.88. The estimated run time/CPU time of a task I'm presently running is ~11h/3h. This needs to be checked for other card types, on other systems and setups (more and fewer CPU cores in use), and on the short queue, but for my system it appears the driver has reduced the need for CPU polling. It's likely a CUDA improvement. I have been running 5 CPU tasks, 2 GPU tasks and some NCI tasks. Yesterday the CPU usage was around 92%; today it's around 72%. At some stage I will increase the CPU usage for CPU projects, and see how it affects the GPUGrid run times. |
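To make the comparison above easier to read, the CPU-time to run-time ratio can be computed for each of the four quoted tasks. This is a sketch that uses only the numbers posted above; the helper name `cpu_to_run_ratio` is made up for illustration.

```python
def cpu_to_run_ratio(run_time_s: float, cpu_time_s: float) -> float:
    """CPU time as a fraction of wall-clock run time for one task."""
    if run_time_s <= 0:
        raise ValueError("run_time_s must be positive")
    return cpu_time_s / run_time_s


# (task name, driver used, run time in s, CPU time in s), as quoted in the post above.
tasks = [
    ("316x-SANTI_MARwtcap310-0-32-RND3599_0", "older only", 29309.21, 29195.03),
    ("824x-SANTI_MARwtcap310-3-32-RND7471_1", "older ~75% + Beta ~25%", 29177.10, 22138.07),
    ("I78R4-NATHAN_KIDKIXc22_6-48-50-RND2248_0", "older only", 28945.29, 28801.58),
    ("I71R6-NATHAN_KIDKIXc22_6-49-50-RND1077_0", "Beta only", 29050.67, 6376.88),
]

ratios = {name: cpu_to_run_ratio(run, cpu) for name, _driver, run, cpu in tasks}

for name, driver, _run, _cpu in tasks:
    print(f"{driver:24s} {ratios[name]:6.1%}  {name}")
```

With the older driver the ratio is close to 1 (a full core), while the Beta-only task comes out at a bit over a fifth of a core, with run times that differ by well under 1% within each pair.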
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
Phenomenal information, skgiven, thanks for sharing. I love the angle of "perhaps it's a fix instead of a bug" :) This has prompted me to change my app_config.xml cpu_usage values. I had been using 0.5, such that when a GPUGrid task was running on each GPU, and I knew a full core would be used because of the Kepler, the summed values would be 0.5+0.5=1.0 CPU, so BOINC considered a CPU used by the GPU tasks, and did not over-commit my system. But now, with these 334.67 drivers, I'm using a cpu_usage value of 0.2, in order to keep my system fully committed. Essentially, this results in my system being able to run an additional CPU task! I'll chime back in if I hear anything from NVIDIA. |
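For readers who have not edited app_config.xml before, here is a minimal sketch of what such a change might look like, generated from Python so the arithmetic can sit next to it. The element names follow BOINC's app_config.xml layout (`app`, `name`, `gpu_versions`, `gpu_usage`, `cpu_usage`). The application name `acemdlong` and the `gpu_usage` value of 1.0 are assumptions for illustration, not values taken from this thread; check the application names your own client reports before using anything like this.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, cpu_usage: float, gpu_usage: float = 1.0) -> str:
    """Build an app_config.xml document with one <app> entry."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu_versions = ET.SubElement(app, "gpu_versions")
    ET.SubElement(gpu_versions, "gpu_usage").text = str(gpu_usage)
    ET.SubElement(gpu_versions, "cpu_usage").text = str(cpu_usage)
    if hasattr(ET, "indent"):  # pretty-printing helper, available in Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# "acemdlong" is an assumed application name, used here only as a placeholder.
xml_text = build_app_config("acemdlong", cpu_usage=0.2)
print(xml_text)

# CPU budget with one GPUGrid task per GPU on a two-GPU host, as in the post above.
old_budget = 0.5 + 0.5  # sums to a whole CPU
new_budget = 0.2 + 0.2  # sums to less than half a CPU
print(f"old setting: {old_budget:.1f} CPU, new setting: {new_budget:.1f} CPU")
```

The printed sums mirror the reasoning in the post: two tasks at 0.5 add up to a whole CPU, while two tasks at 0.2 do not, which is what leaves room for an additional CPU task.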