Message boards : Graphics cards (GPUs) : GPU monitoring

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 8327 - Posted: 9 Apr 2009 | 22:03:42 UTC

Does anyone know of a GPU monitoring tool? I am trying to find one that tells you if the GPU is actually doing something. Right now the only way that I know we can sniff this out is to look at the temperatures of the cores to see if they are running "hot"...

I tried a debug tool off the Nvidia site but it seemed to be oriented towards being used to debug graphics applications ... besides I really only need to know if the GPU is running tasks ...

Jeremy
Joined: 15 Feb 09
Posts: 55
Credit: 3,542,733
RAC: 0
Message 8332 - Posted: 10 Apr 2009 | 3:27:24 UTC - in response to Message 8327.

There's no direct way that I know of, only incidentals. Clock speed and GPU temp are usually pretty good indicators of the current load on the GPU. GPU-Z is a nice lightweight program to see all that.

schizo1988
Joined: 16 Dec 08
Posts: 16
Credit: 10,644,256
RAC: 0
Message 8333 - Posted: 10 Apr 2009 | 5:12:27 UTC - in response to Message 8332.

Actually, there is a CUDA-Z app available for download from SourceForge (just Google "cuda-z"). It gives info about the GPU core performance.

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 8335 - Posted: 10 Apr 2009 | 5:46:35 UTC - in response to Message 8333.

Actually, there is a CUDA-Z app available for download from SourceForge (just Google "cuda-z"). It gives info about the GPU core performance.


Thanks for the hint, but it registers operational numbers regardless of whether anything is running. So it is logging the effective speed and not the operational state, à la Task Manager for the CPUs in the system.

JockMacMad TSBT
Joined: 26 Jan 09
Posts: 31
Credit: 3,877,912
RAC: 0
Message 8337 - Posted: 10 Apr 2009 | 8:35:37 UTC - in response to Message 8335.
Last modified: 10 Apr 2009 | 8:38:22 UTC

GPU-Z.

Go to the Sensors tab and it shows GPU Load. Not sure how it works this out though and how accurate it is.

RivaTuner also has a GPU Usage meter.

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 8352 - Posted: 10 Apr 2009 | 18:23:59 UTC - in response to Message 8337.

GPU-Z.

Go to the Sensors tab and it shows GPU Load. Not sure how it works this out though and how accurate it is.

RivaTuner also has a GPU Usage meter.

Well, I just installed RivaTuner 2.24 and I sure don't see a usage meter, only the ability to configure registry entries. I guess I don't get it ... and the documentation so far has not been much help to me ... sigh ...

jrobbio
Joined: 13 Mar 09
Posts: 59
Credit: 324,366
RAC: 0
Message 8382 - Posted: 14 Apr 2009 | 10:20:37 UTC - in response to Message 8352.

Would this do any good?:
Perfhud 6 GPU Performance Analysis

Regards,

Rob

JockMacMad TSBT
Joined: 26 Jan 09
Posts: 31
Credit: 3,877,912
RAC: 0
Message 8383 - Posted: 14 Apr 2009 | 10:28:24 UTC - in response to Message 8382.

Paul, on the RivaTuner Main tab, look just under where you select the GPU. It will say something like 256-bit RV770 (that is off my ATI machine, but heh). At the end of this field is a small button that looks like a 3-4-5 triangle. Select that and a mini popup bar opens. The last item on it is Hardware Monitoring. This will open graphs for each GPU present, and you will see GPU Usage is one of them.

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 8409 - Posted: 14 Apr 2009 | 15:28:10 UTC - in response to Message 8383.

Paul, on the RivaTuner Main tab, look just under where you select the GPU. It will say something like 256-bit RV770 (that is off my ATI machine, but heh). At the end of this field is a small button that looks like a 3-4-5 triangle. Select that and a mini popup bar opens. The last item on it is Hardware Monitoring. This will open graphs for each GPU present, and you will see GPU Usage is one of them.

Hmm, I had not tried that button because I did not want to make changes ... :)

Now that I have found it, I just tried it, and so far the closest thing I can find is the graph for the CPU, but that is for the main CPUs and not the processors on the GPU cards. The rest are the same set of temperature sensors and clocks.

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 8410 - Posted: 14 Apr 2009 | 15:32:29 UTC - in response to Message 8382.

Would this do any good?:
Perfhud 6 GPU Performance Analysis

Regards,

Rob

Rob,

Yeah this was the other tool I have tried that seems to only be able to relate and graph WITHIN a graphics application. Again, I may be missing something and not using it right.

I am also asking Rom to make a change to the system that will change the output message so we can see which GPU is being scheduled by BOINC to be used on a task.

But the question is still out there whether BOINC is properly scheduling tasks on the available GPUs. It is hard to tell what is going on, and the symptom is that tasks that should run in 6 hours are taking as long as 24 hours to run. The first question is whether this bug is in 6.6.20, our imagination, or something else, like bad tasks from GPUGRID.

Stefan Ledwina
Joined: 16 Jul 07
Posts: 464
Credit: 288,467,046
RAC: 2,071,714
Message 8415 - Posted: 14 Apr 2009 | 16:10:53 UTC - in response to Message 8410.
Last modified: 14 Apr 2009 | 16:12:58 UTC

Well, I don't think the problem is only in our imagination Paul...
Since I got my HD4870 for Milkyway, I swapped the GTX 260 into my Linux box alongside another 8800GT card. During the GPUGRID server outage I had some time to watch the tasks running. I watched two pxxx-GIANNI- tasks that started at exactly the same time. The weird thing is that they needed the same wall-clock time to complete (and the same CPU time, which should be different for these cards)!
I'd say the GTX 260 should be a little bit faster than an 8800GT! ;-)

The host where I saw this is - http://www.gpugrid.net/results.php?hostid=32169. BOINC client version is 6.6.20 for Linux x86_64.

Looking at the stderr of the last four results (521310, 521292, 521244, 521238) I completed during the server outage, I noticed that the WUs never ran on only one GPU. They always switched between the two cards, some WUs only once and some more often. But my "switch between applications every x minutes" is set to 1440 minutes (= 24 hours, though it seems BOINC ignores that anyway) and nothing was in deadline trouble. Nothing was running at "high priority", so there should never be any task switching between the two cards. To me it looks like this is slowing down the computation a good bit...

[edit]Sorry Paul, I know it's off-topic in this thread, but I just wanted to show you that it looks like there's really something wrong with the GPU scheduling if you have two or more (different) cards in one computer...
____________

pixelicious.at - my little photoblog

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 8434 - Posted: 14 Apr 2009 | 20:08:47 UTC - in response to Message 8415.

Well, I don't think the problem is only in our imagination Paul...

...

[edit]Sorry Paul, I know it's off-topic in this thread, but I just wanted to show you that it looks like there's really something wrong with the GPU scheduling if you have two or more (different) cards in one computer...

Not at all off topic ... why do you think I am searching for tools? :)

I also asked Rom to change the output message listing the task vs. GPU so that it indicates the actual GPU: instead of "(0.18 CPUs, 1 CUDA)" it would read "(0.18 CPUs, CUDA x)", so we can track this ... and the tools to see what is actually being used ... I know that I am one of the very few that has multiple GPUs in a single system and, as usual, am on the cutting edge ...

I have been concerned about the resource scheduler (formerly CPU Scheduler) for a long time and your intuition may be correct. The problem is that it seems to be a little intermittent so it is not clear why it is happening.

One of the questions I have is whether the tasks are being run on only some of the n GPUs, leaving a GPU idle ... in my case it would be running on two of four or three of four while still attempting to run 4 total tasks ...

You are very correct that BOINC does not respect the TSI as a lower limit, it is actually treated as an upper limit in the sense that rescheduling tasks can occur as often as 5-6 times per minute (on my systems) with every upload, scheduler request, timer expire, bird transit, and dog bark triggering an examination of the scheduling with rr_sim being run and new decisions being made.

John defends this status quo though I have not seen where he puts up a "real" defense that holds water in the face of the countervailing arguments, like my experience with a "fast" and "wide" system that regularly produces 300 to 500 completed tasks each day ... and the fact that I don't have a long queue and have only rarely had tasks that are really in deadline trouble.

In summary I have been jawboning on the mailing lists about these issues and mostly have been the recipient of the slings and arrows of those that would rather that we ignore issues rather than to recognize them so that they can be corrected... ah well ...

The Brain QC
Joined: 27 Oct 08
Posts: 27
Credit: 3,211,916
RAC: 0
Message 8437 - Posted: 14 Apr 2009 | 20:42:32 UTC

Try EVGA Precision from EVGA.com. It cannot monitor exactly which resources the GPUs are given, but it is a really interesting tool for monitoring temps and overclocks.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 8470 - Posted: 15 Apr 2009 | 20:24:35 UTC

GPU-Z and RivaTuner can monitor GPU activity on ATI cards. I recently swapped my NV for an ATI and the "monitors" appeared, no software upgrades required.

Having said that I must admit I don't know a better way than checking the temperature. A long time ago I read something about implementing a 2D downclock on G92 chips for power saving (which NV gracefully omits). I think they used RivaTuner and somehow configured it to detect GPU load. Sorry, can't remember much more.

MrS
____________
Scanning for our furry friends since Jan 2002

JockMacMad TSBT
Joined: 26 Jan 09
Posts: 31
Credit: 3,877,912
RAC: 0
Message 8482 - Posted: 16 Apr 2009 | 0:15:12 UTC - in response to Message 8470.
Last modified: 16 Apr 2009 | 0:16:53 UTC

Good point, MrS. I looked on my ATI machine. When I switched back to the nVidia machine I lost the GPU Utilization monitors :(

Sorry Paul, my bad.

I also have multiple GPUs, and the ID of the GPU would be useful indeed.

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 8485 - Posted: 16 Apr 2009 | 2:33:05 UTC - in response to Message 8482.

Good point, MrS. I looked on my ATI machine. When I switched back to the nVidia machine I lost the GPU Utilization monitors :(

Sorry Paul, my bad.

I also have multiple GPUs, and the ID of the GPU would be useful indeed.

Well, at the moment, and for what it is worth, 6.6.23 SEEMS to be behaving. Tasks are racking up about 5 to 6:30 in run time which is about what I would expect.

Not sure if my suggestion will take root, probably not ... but only time will tell ...

jrobbio
Joined: 13 Mar 09
Posts: 59
Credit: 324,366
RAC: 0
Message 8491 - Posted: 16 Apr 2009 | 9:39:54 UTC - in response to Message 8482.

Good point, MrS. I looked on my ATI machine. When I switched back to the nVidia machine I lost the GPU Utilization monitors :(

Sorry Paul, my bad.

I also have multiple GPUs, and the ID of the GPU would be useful indeed.


It looks like Everest has some form of GPU monitoring that Nvidia cards can provide. Check out Kazgirl's post on this thread

I haven't tested it yet.

Rob

jrobbio
Joined: 13 Mar 09
Posts: 59
Credit: 324,366
RAC: 0
Message 8492 - Posted: 16 Apr 2009 | 10:15:52 UTC - in response to Message 8491.

I just found this very good article on how to use Performance Monitor (and various other tools) to view the Nvidia cards' various performance indicators.

I downloaded Perfkit 6.5 from here

Rob

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 8498 - Posted: 16 Apr 2009 | 14:48:32 UTC - in response to Message 8492.

I just found this very good article on how to use Performance Monitor (and various other tools) to view the Nvidia cards' various performance indicators.

I downloaded Perfkit 6.5 from here

Rob

I downloaded that toolkit but could not make sense out of it ... but, that article may have some clues on what I might try next time.

For the moment, the problem that I was looking to diagnose is not showing up on 6.6.23 so, the need may be moot...

Hmmm, here is a task that looks to be like it is going to take 7 hours to run ...

Anyway, thanks for the tips ...

Andrew
Joined: 9 Dec 08
Posts: 29
Credit: 18,754,468
RAC: 0
Message 8966 - Posted: 26 Apr 2009 | 22:31:40 UTC

@ETA, with RivaTuner you can create an underclocked profile, and your normal profile, plus any extras you might want. Then in the scheduler you can set a trigger for the underclocked profile if the hardware acceleration monitor is 0, or your normal profile if the hardware accel monitor is 1.
On Vista I use Aero, so the HW accel monitor is always on 1; I use framerate as an indicator instead. The framerate is 0 while using CUDA and Aero, so you can rev your card back up for games.

Personally I have two ways of running my hot 8800GT:
- For games I run my 120% overclock with auto fan from the launcher (disable scheduler)
- For crunching I set my fan level manually depending on the amount of noise I can stand, and then have downclocking thresholds set up so that the GPU runs as fast as possible without getting too hot (not over 83C), and leave it on overnight etc. A constant fan isn't as annoying as a fan constantly oscillating its speed, even by a small amount.
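
The downclocking-threshold idea in the second bullet could be sketched roughly like this. It is only an illustration in Python and not how RivaTuner's scheduler works internally: the clock steps and the 5-degree margin before clocking back up are made-up values, and only the 83C ceiling comes from the post.

```python
# Hypothetical clock steps in MHz, fastest first; real values depend on the card.
CLOCK_STEPS_MHZ = [720, 660, 600, 540]
MAX_TEMP_C = 83      # ceiling mentioned in the post
HYSTERESIS_C = 5     # assumed margin before clocking back up


def next_clock_index(current_index, temp_c):
    """Return the index into CLOCK_STEPS_MHZ to use after one temperature reading."""
    if temp_c > MAX_TEMP_C and current_index < len(CLOCK_STEPS_MHZ) - 1:
        return current_index + 1          # too hot: step down one level
    if temp_c < MAX_TEMP_C - HYSTERESIS_C and current_index > 0:
        return current_index - 1          # comfortably cool: step back up
    return current_index                  # inside the band: hold the current clock


def simulate(temps_c, start_index=0):
    """Apply next_clock_index to a series of readings and return the chosen clocks."""
    index = start_index
    clocks = []
    for temp in temps_c:
        index = next_clock_index(index, temp)
        clocks.append(CLOCK_STEPS_MHZ[index])
    return clocks
```

The margin is there so the clock does not flip back and forth around 83C, for much the same reason a constant fan is less annoying than an oscillating one.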

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 9022 - Posted: 27 Apr 2009 | 21:27:52 UTC - in response to Message 8966.

Thanks for the information. Triggering on the hardware acceleration monitor is indeed what they suggested.. upon reading I remembered :)

So what Paul would need / want is some setting which alerts him whenever the framerate is not zero? Well, or maybe only when it stays non-zero for more than 5s, to allow for WU switches. This still couldn't detect hanging or slow WUs, where the GPU is still executing CUDA code.
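
A rough sketch of that alarm logic in Python, assuming something else supplies (timestamp, framerate) readings; the function name and the sample format are invented for illustration. It relies on the observation from the earlier post that the framerate reads 0 while CUDA work is running, so it shares the limitation just mentioned: a hung WU that keeps the GPU busy would not trigger it.

```python
def idle_alarms(samples, grace_seconds=5.0):
    """Yield the timestamps at which an alarm should fire.

    samples: iterable of (timestamp_seconds, framerate) pairs in time order.
    An alarm fires once per run of non-zero readings, and only after that
    run has lasted longer than grace_seconds.
    """
    nonzero_since = None   # when the current run of non-zero readings began
    fired = False          # whether this run has already raised an alarm
    for timestamp, framerate in samples:
        if framerate == 0:
            # GPU looks busy with CUDA again: forget the previous run.
            nonzero_since = None
            fired = False
            continue
        if nonzero_since is None:
            nonzero_since = timestamp
        if not fired and timestamp - nonzero_since > grace_seconds:
            fired = True
            yield timestamp
```

A short blip during a WU switch resets the timer as soon as the framerate drops back to 0, so only a sustained non-zero reading raises the alarm.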

MrS
____________
Scanning for our furry friends since Jan 2002

Abhinav Gaur
Joined: 21 Aug 11
Posts: 1
Credit: 0
RAC: 0
Message 21865 - Posted: 21 Aug 2011 | 11:34:14 UTC - in response to Message 8327.

I know this is a late reply. I am posting this just for the sake of completing the thread. I think nvidia-smi is an answer to your problem.
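
For example, a small Python wrapper along these lines could poll it. This is a sketch, assuming a driver recent enough that nvidia-smi supports the `--query-gpu` and `--format=csv` options; the helper names are made up, and some cards may report a non-numeric placeholder for a field, which is stored as None here.

```python
import subprocess

# Ask for one CSV line per GPU: index, utilization (%), temperature (C).
QUERY = ["nvidia-smi",
         "--query-gpu=index,utilization.gpu,temperature.gpu",
         "--format=csv,noheader,nounits"]


def _to_int(field):
    """Convert a CSV field to int, or None if the driver did not report a number."""
    field = field.strip()
    return int(field) if field.isdigit() else None


def parse_gpu_lines(text):
    """Turn nvidia-smi CSV output into a list of dicts, one per GPU."""
    gpus = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        index, util, temp = line.split(",")
        gpus.append({"index": _to_int(index),
                     "utilization_pct": _to_int(util),
                     "temperature_c": _to_int(temp)})
    return gpus


def read_gpus():
    """Run nvidia-smi once and return the parsed readings."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_lines(result.stdout)
```

Calling `read_gpus()` every few seconds and checking `utilization_pct` per index would show whether each card is actually doing something, which is what the original question was after.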
