Message boards : Number crunching : GPUGRID Performance Problem After Win7 Reinstall

Author Message
tomba
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Level
Lys
Message 32066 - Posted: 19 Aug 2013 | 14:52:57 UTC

Yesterday I did a complete reinstall of Win 7 Home Edition 64, having been bitten by malware. After multitudinous Windows Updates I installed BOINC 7.0.64 (x64) and Nvidia driver 320.49, then set GPUGRID running on my GTX660.

The first WU just finished. See here. The run time is normal for me but the elapsed time is more than double.

Below is my GPU Monitor as of now. I have never before seen the GPU and MC loads fluctuate like this. Help!

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Message 32240 - Posted: 24 Aug 2013 | 21:49:57 UTC

Do you still have that problem? It seems like the work is being paused for short intervals. Possible reasons: TThrottle being active or setting BOINC to use <100% of CPU time (I don't think this applies to GPUs, but haven't tested this). Or some other program could grab CPU time so heavily that the GPU gets starved of new work. Any anomalies in task manager? Does it help to reduce the number of cores available for BOINC or stop all CPU projects?

MrS
____________
Scanning for our furry friends since Jan 2002

tomba
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Level
Lys
Message 32296 - Posted: 26 Aug 2013 | 17:14:38 UTC - in response to Message 32240.

Thanks for responding, ETA. I forgot to subscribe to the thread!!

Do you still have that problem?


Yes, in spades!!

It seems like the work is being paused for short intervals.


For LONG intervals! Previously, a WU finished in about 12 hours of processing and 12 hours of elapsed time, indicating that the GPU was working 100% of the time. Since the Win 7 rebuild, the elapsed time has doubled!

Possible reasons: TThrottle being active or setting BOINC to use <100% of CPU time (I don't think this applies to GPUs, but haven't tested this).


I don't use TThrottle.

Or some other program could grab CPU time so heavily that the GPU gets starved of new work.


Nothing I run would meet that...

Any anomalies in task manager?


No.

Does it help to reduce the number of cores available for BOINC or stop all CPU projects?


I don't run any CPU apps because of the fan noise, but I'll set it to zero.

I do note that Poem WUs exhibit the same variation in GPU usage.

Thanks for responding!!

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 32297 - Posted: 26 Aug 2013 | 17:27:39 UTC - in response to Message 32296.
Last modified: 26 Aug 2013 | 19:19:20 UTC

Possible reasons: TThrottle being active or setting BOINC to use <100% of CPU time (I don't think this applies to GPUs, but haven't tested this).

I don't use TThrottle.

That only answers half of the question/suggestion!
Is Boinc set to use less than 100% of the CPU?
- Boinc Manager (Advanced View), Tools, Computing preferences..., processor usage tab, needs to be Use at most 100.00 % CPU time.

If not I have a few more suggestions...
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

tomba
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Level
Lys
Message 32299 - Posted: 26 Aug 2013 | 18:09:29 UTC - in response to Message 32297.

That only answers half of the question/suggestion!

Oops - I thought I answered them all...

Is Boinc set to use less than 100% of the CPU?
- Boinc Manager (Advanced View), processor usage tab, needs to be Use at most 100.00 % CPU time.

I don't have a processor usage tab:

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 32304 - Posted: 26 Aug 2013 | 19:20:25 UTC - in response to Message 32299.

- Boinc Manager (Advanced View), Tools, Computing preferences..., processor usage tab, needs to be Use at most 100.00 % CPU time.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

werdwerdus
Joined: 15 Apr 10
Posts: 123
Credit: 1,004,473,861
RAC: 0
Level
Met
Message 32313 - Posted: 27 Aug 2013 | 4:29:18 UTC

yes IIRC the default is only 60%
____________
XtremeSystems.org - #1 Team in GPUGrid

tomba
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Level
Lys
Message 32323 - Posted: 27 Aug 2013 | 16:51:05 UTC - in response to Message 32304.

- Boinc Manager (Advanced View), Tools, Computing preferences..., processor usage tab, needs to be Use at most 100.00 % CPU time.

It was set at 80%. I upped it to 100%. Since then I've had one Nathan short (surprise since shorts are switched off...) which completed in not much more elapsed time than the run time, and I'm now processing a Nathan long which looks like elapsed time is going to equal run time, as before the opsys rebuild. Many thanks for the pointer!

I wonder why CPU usage % relates to GPU usage...

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 32336 - Posted: 27 Aug 2013 | 21:52:49 UTC - in response to Message 32323.

tomba, thanks for getting back to us regarding the issue - it's always good to know the suggested fix worked.

Most 'GPU' projects actually use a GPU + a CPU (to varying extents), so by limiting the CPU you are inadvertently limiting the GPU.
In the most extreme cases it's beneficial to run a WU for every available CPU core/thread on the one GPU (POEM).


____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

werdwerdus
Joined: 15 Apr 10
Posts: 123
Credit: 1,004,473,861
RAC: 0
Level
Met
Message 32341 - Posted: 28 Aug 2013 | 1:41:02 UTC

this "feature" cropped up in one of the v7 boinc clients IIRC
____________
XtremeSystems.org - #1 Team in GPUGrid

tomba
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Level
Lys
Message 32572 - Posted: 31 Aug 2013 | 15:41:34 UTC - in response to Message 32323.

I'm now processing a Nathan long which looks like elapsed time is going to equal run time, as before the opsys rebuild.

Oops - no - it took 20 hours elapsed (I94R2-NATHAN_KIDKIXc22_6-3-50-RND1053). Problem not solved.

Have a look here:



Before the Win 7 reinstall, allowing up to a couple of hours for the result to upload on my rather slow Internet connection, run time equalled elapsed time.

After the Win 7 reinstall, with the exception of the short Nathans, the elapsed time has almost doubled!

What is going on?

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Message 32590 - Posted: 1 Sep 2013 | 12:44:05 UTC - in response to Message 32572.

Are you sure your "elapsed time after reinstall" numbers are correct? Looking at your task list I can not see such WUs. 45ks for a NATHAN_KIDKIXc22 seems about right for a GTX660 and runtime and CPU time agree (give or take a bit).

MrS
____________
Scanning for our furry friends since Jan 2002

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,222,865,968
RAC: 1,764,666
Level
Trp
Message 32595 - Posted: 1 Sep 2013 | 14:23:28 UTC - in response to Message 32572.
Last modified: 1 Sep 2013 | 14:24:08 UTC

What source did you use for reinstall?
It's clear that something is 'stealing' CPU time on your host, which makes the GPU starve, and this results in longer processing time. This 'something' could be malicious software like a rootkit.

tomba
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Level
Lys
Message 32596 - Posted: 1 Sep 2013 | 15:40:01 UTC - in response to Message 32590.

Are you sure your "elapsed time after reinstall" numbers are correct? Looking at your task list I can not see such WUs. 45ks for a NATHAN_KIDKIXc22 seems about right for a GTX660 and runtime and CPU time agree (give or take a bit).

Thanks for responding, MrS.

Agreed that runtime and CPU time agreement at around 45k seconds on a GTX 660 is right. My problem is the elapsed time for those WUs; time from sent, to time reported. Before the Win 7 reinstall, elapsed time was a couple of hours longer than runtime and CPU time, those two hours being used up for wait-to-start time and upload time. Now it is often more than double.

This WU ran for ~44.6k CPU seconds but its elapsed time, from 22 Aug 2013 @ 17:25:19 to 24 Aug 2013 @ 7:12:45, was about 38 hours!
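For anyone wanting to check the arithmetic, here is a minimal sketch in Python. It uses only the two timestamps and the ~44.6k CPU seconds quoted above; the year 2013 is taken from the post dates, and the variable names are purely illustrative.

```python
from datetime import datetime

# Sent and reported times quoted in the post above (year assumed from the post date).
sent = datetime(2013, 8, 22, 17, 25, 19)
reported = datetime(2013, 8, 24, 7, 12, 45)

# Wall-clock time between "sent" and "reported", in hours.
elapsed_hours = (reported - sent).total_seconds() / 3600

# Approximate CPU time reported for the task (~44.6k seconds), in hours.
cpu_hours = 44_600 / 3600

print(f"sent to reported: {elapsed_hours:.1f} h")
print(f"CPU time:         {cpu_hours:.1f} h")
print(f"unaccounted for:  {elapsed_hours - cpu_hours:.1f} h")
```

The gap between the two figures is the time the task spent waiting, paused, or uploading rather than crunching.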

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Message 32600 - Posted: 1 Sep 2013 | 18:12:07 UTC - in response to Message 32596.

Does your machine run GPU-Grid 24/7? If so, there's still a serious problem. Before the reinstall you managed to return 2 tasks each day, on average. If the time between download and upload is now larger, it might not matter all that much as long as 2 tasks per day are still returned. Which is not the case, so it's not running full-throttle any more.

MrS
____________
Scanning for our furry friends since Jan 2002

TJ
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Message 32601 - Posted: 1 Sep 2013 | 18:50:27 UTC

I had a similar problem. I had a GTX285 with Vista x64 running smoothly. I put a GTX660 in it and the trouble began. It took longer to finish a WU than with the 285. Kernel times were high and the system was slow. After searching and posting here, it seemed that PC Angel, installed by the system builder, could be an issue. I removed it and rebooted the system, and although there was some improvement, performance was still bad.

I put in a new 660 with the same results. I put in the old 285, and no problems, kernel times low again.
So I did a complete new installation of the OS, including formatting the hard disk, and installed only the minimum software to get BOINC working.
Fast with the 285 and still bad with the 660. A BIOS update was not possible as XFX is not supporting that anymore.
I even tried a GTX770 and another new installation of the OS but it didn't help.

Of course your rig did work before with the 660.
Can you check your kernel times? If they are almost as high as CPU usage, that is an indication of some problem(s). You find these in Task Manager, show kernel times.
I don't know, tomba, if you have another GPU, but you could try that one. See how it performs.
____________
Greetings from TJ

tomba
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Level
Lys
Message 32602 - Posted: 1 Sep 2013 | 19:00:22 UTC - in response to Message 32595.
Last modified: 1 Sep 2013 | 19:13:30 UTC

Thanks for responding, Retvari.

What source did you use for reinstall?

The CD that came with my PC.

It's clear that something is 'stealing' CPU time on your host, which makes the GPU starve, and this results in longer processing time. This 'something' could be malicious software like a rootkit.

Here's a snapshot of my Task Manager:



It never varies much from this configuration. I don't run other BOINC CPU tasks in the eight threads because the fan noise is unacceptable.

Here's my GPU Monitor for the current GPUGrid task:

Looks good to me!



I ran Malwarebyte to look for nasties. It found five, which I deleted:



Could one of them be the culprit?

tomba
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Level
Lys
Message 32604 - Posted: 1 Sep 2013 | 19:35:59 UTC - in response to Message 32601.

Thank you for responding, TJ,


Can you check your kernel times? If they are almost as high as CPU usage, that is an indication of some problem(s). You find these in Task Manager, show kernel times.

I don't find 'kernel times' in Task Manager...

I don't know, tomba, if you have another GPU, but you could try that one. See how it performs.

I still have my trusty old GTX 460, which I guess will do a Nathan in ~24 hours. Should I try it, to confirm whether or not my GTX 660 is up to the job??

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,222,865,968
RAC: 1,764,666
Level
Trp
Message 32606 - Posted: 1 Sep 2013 | 20:12:05 UTC - in response to Message 32602.

What source did you use for reinstall?

The CD that came with my PC.

I assume that this is the same source from which the OS was originally installed. In that case we can rule this out.
Do you have a router? In that case your PC will have an IP address which looks like: 192.168.x.x
Or is your PC connected directly to the internet with a public IP address?
In that case it is possible that your OS was infected with malware during the installation, before it could download the security updates.
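As a rough way to tell the two cases apart, here is a minimal Python sketch using the standard `ipaddress` module. The function name `looks_private` and the sample addresses are only illustrations. A private address suggests, but does not prove, that the PC sits behind a router; note too that `is_private` is also true for loopback and link-local addresses.

```python
import ipaddress


def looks_private(addr):
    """Return True if addr is in a private/reserved range such as 192.168.x.x."""
    return ipaddress.ip_address(addr).is_private


# Sample addresses, chosen only for illustration.
for sample in ("192.168.1.10", "8.8.8.8"):
    kind = "private (probably behind a router)" if looks_private(sample) else "public"
    print(sample, "->", kind)
```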

Here's a snapshot of my Task Manager:

The snapshot looks fine. However, rootkits are hard to find (you won't spot one in the task manager).

Here's my GPU Monitor for the current GPUGrid task:

Looks good to me!

Looks good to me as well.

I ran Malwarebyte to look for nasties. It found five, which I deleted:
Could one of them be the culprit?

No, they are not that harmful. Besides only the 4th seems to be the one which was run after the reinstall.

Did you notice anything unusual during the installation?

Malware can be very hard to remove: even formatting or removing a partition will not remove it, because it can reside in the MBR (Master Boot Record) and in the area between the partition table and the first logical block of the first partition. In this case, I usually destroy (overwrite with zeroes) the first couple of megabytes on the HDD before the re-installation. To do that you need a bootable USB device (pendrive) or another PC with an OS and HDD safe-erase software (or HDD sentinel) on it.

Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level
His
Message 32612 - Posted: 2 Sep 2013 | 3:17:20 UTC
Last modified: 2 Sep 2013 | 3:21:53 UTC

Guys, it seems that "Task run time (sec)" is no longer an issue. He has fixed the first problem by changing the "CPU Throttle" from 80% to 100%; the NATHAN_KIDKIX tasks are now taking the same amount of time to complete.

His new problem deals with comparing "time the task was sent" to "time the result was received". That means the problem is that either:
a) the task wasn't running non-stop during the duration of that time difference, or
b) a network setting or problem prevented the result from uploading

tomba, Can you please post the contents of your global_prefs.xml and global_prefs_override.xml files? They are found in your data directory, which is listed at the top of the Event Log around line 10, and is often C:\ProgramData\BOINC

I'd be curious what your exact settings are, to see if some network setting is conflicting.
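For reference, a minimal Python sketch of how the two files could be inspected for the CPU throttle settings discussed earlier in the thread. It assumes the default data directory mentioned above, and it assumes the relevant tags are `cpu_usage_limit` ("use at most N % CPU time") and `max_ncpus_pct`; check your own files if they differ. The helper name `read_prefs` is hypothetical. If a file contains several venue sections, only the first match for each tag is reported.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Assumed default data directory; the real one is shown in the BOINC Event Log.
DATA_DIR = Path(r"C:\ProgramData\BOINC")
PREF_FILES = ("global_prefs.xml", "global_prefs_override.xml")
# Assumed tag names for the CPU-time and CPU-count limits.
TAGS = ("cpu_usage_limit", "max_ncpus_pct")


def read_prefs(xml_text):
    """Return {tag: value} for the tags of interest found in a prefs document."""
    root = ET.fromstring(xml_text)
    found = {}
    for tag in TAGS:
        node = root.find(f".//{tag}")  # first matching descendant, if any
        if node is not None and node.text:
            found[tag] = float(node.text)
    return found


if __name__ == "__main__":
    for name in PREF_FILES:
        path = DATA_DIR / name
        if path.exists():
            print(name, read_prefs(path.read_text()))
        else:
            print(name, "not found")
```

A `cpu_usage_limit` below 100 in either file would match the 80% throttle that caused the first problem.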

tomba
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Level
Lys
Message 32616 - Posted: 2 Sep 2013 | 6:47:03 UTC

Just found this:



I swiftly changed it to never sleep. Let's see what happens....

tomba
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Level
Lys
Message 32617 - Posted: 2 Sep 2013 | 7:07:11 UTC

I changed the power plan so the PC sleeps in one minute. Here's what happened:



Seems the Win 7 default is to sleep after 30 minutes, not good for 24/7 BOINCing!
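The same change can be scripted rather than clicked through. Below is a minimal Python sketch that builds a `powercfg` call setting the sleep (standby) timeout to 0, which means "never". The exact switch spelling is an assumption based on the older `powercfg -change` syntax, so verify it with `powercfg /?` on your own machine; the helper name `never_sleep_command` is hypothetical.

```python
import platform
import subprocess


def never_sleep_command(power_source="ac"):
    """Build the powercfg call that sets the standby timeout to 0 (never sleep)."""
    if power_source not in ("ac", "dc"):
        raise ValueError("power_source must be 'ac' (mains) or 'dc' (battery)")
    return ["powercfg", "-change", f"-standby-timeout-{power_source}", "0"]


if __name__ == "__main__":
    cmd = never_sleep_command("ac")
    if platform.system() == "Windows":
        # May need to be run from an elevated prompt.
        subprocess.run(cmd, check=True)
    else:
        print("Not Windows; would run:", " ".join(cmd))
```

Running something like this once after a reinstall avoids relying on the 30-minute default noted above.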

tomba
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Level
Lys
Message 32621 - Posted: 2 Sep 2013 | 8:50:51 UTC - in response to Message 32612.
Last modified: 2 Sep 2013 | 9:04:37 UTC

tomba, Can you please post the contents of your global_prefs.xml and global_prefs_override.xml files? They are found in your data directory, which is listed at the top of the Event Log around line 10, and is often C:\ProgramData\BOINC

Couldn't find 'em. Did a search on c: and I found them in c:\windows.old

TJ
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Message 32622 - Posted: 2 Sep 2013 | 8:53:48 UTC

tomba, for the kernel times you go to the task manager, where the menu bar shows you four options: File, Options, View, Help.
Go to View, and then at the bottom, Show kernel times. This will show a red line in the graph, below the green line.
Ideally the red line (kernel times) is low; if it is high (close to the green line) this is an indication of a (hardware) problem.
If it's low, the 660 is okay. I guess it is in your case, as you have used it before in this system and many people use this card as well.

Perhaps you could send Jacob Klein the files he asked for from the BOINC Data folder. He may help you better than I can.
____________
Greetings from TJ

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,222,865,968
RAC: 1,764,666
Level
Trp
Message 32625 - Posted: 2 Sep 2013 | 9:51:13 UTC - in response to Message 32616.

Just found this:



I swiftly changed it to never sleep. Let's see what happens....

This could be the source of your problem...
It's so self-evident that a PC used for crunching should be set to never go to sleep that I didn't think of that (and I didn't find it in the FAQ topic). Sometimes I forgot to set it after a reinstall, so I've made a customized Windows (XP x64) installation package with the necessary drivers and presets.

tomba
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Level
Lys
Message 32629 - Posted: 2 Sep 2013 | 11:02:17 UTC - in response to Message 32625.

This could be the source of your problem...

Looking promising. After I set the beast not to go to sleep I observed the in-process Nathan long. It was 71% complete and 19 hours had elapsed since it was sent.

It just finished. The 29% unfinished was done in 3h33min. That's equivalent to 12+ hours for a full Nathan long so the GPU was running without interruption for 3h33min.
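The extrapolation above can be checked with a few lines of Python. This is only a sketch of the arithmetic, using the 29% and 3h33min figures from the post and assuming progress is linear over the whole task.

```python
# Figures from the post: the last 29% of the task took 3h33min.
remaining_fraction = 0.29
remaining_hours = 3 + 33 / 60

# Assuming linear progress, scale up to a full task.
full_task_hours = remaining_hours / remaining_fraction

print(f"estimated full run: {full_task_hours:.1f} h")
```

That lands a little over 12 hours, consistent with the run times seen before the reinstall.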

Now, if only I could get another Nathan long....

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Message 32641 - Posted: 2 Sep 2013 | 21:03:00 UTC - in response to Message 32600.

Good to hear it works now! Well.. actually this or similar issues are why I asked yesterday:

Does your machine run GPU-Grid 24/7?

Just in case you're being asked this again ;)

MrS
____________
Scanning for our furry friends since Jan 2002

tomba
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Level
Lys
Message 32656 - Posted: 3 Sep 2013 | 13:59:03 UTC

Just to confirm that I'm out of the woods :-)

This morning I completed a Noelia long that had run overnight. Run time was 11h:58min and elapsed time, from sent to reported, was exactly 13 hours (and the credit was a very generous 180k! Perhaps a little payback for the many failed Noelias these past weeks?).

Grateful thanks to all posters for your support.

Tom
