GPU monitoring
| Author | Message |
|---|---|
Paul D. Buck, Joined: 9 Jun 08, Posts: 1050, Credit: 37,321,185, RAC: 0
|
Does anyone know of a GPU monitoring tool? I am trying to find one that tells you if the GPU is actually doing something. Right now the only way that I know we can sniff this out is to look at the temperatures of the cores to see if they are running "hot"... I tried a debug tool off the Nvidia site but it seemed to be oriented towards being used to debug graphics applications ... besides I really only need to know if the GPU is running tasks ... |
|
Joined: 15 Feb 09, Posts: 55, Credit: 3,542,733, RAC: 0
|
There's no direct way that I know of, only incidentals. Clock speed and GPU temp are usually a pretty good indicator of the current load on the GPU. GPU-Z is a nice lightweight program to see all that. |
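The indirect approach described above (inferring load from clock speed and temperature) can be sketched as a small heuristic. This is only an illustration: the `GpuSample` type, the `looks_busy` function and both margins are made up for the example, and the idle baselines would have to be read off your own card with a tool such as GPU-Z.

```python
from dataclasses import dataclass


@dataclass
class GpuSample:
    """One reading taken by hand or from a monitoring tool (hypothetical type)."""
    core_clock_mhz: float
    temp_c: float


def looks_busy(sample, idle_clock_mhz, idle_temp_c,
               clock_margin=0.10, temp_margin_c=10.0):
    """Guess whether the GPU is under load.

    The GPU is flagged as busy when either the core clock is clearly above
    the idle clock or the temperature is clearly above the idle temperature.
    Both margins are assumptions, not measured values.
    """
    clock_raised = sample.core_clock_mhz > idle_clock_mhz * (1 + clock_margin)
    temp_raised = sample.temp_c > idle_temp_c + temp_margin_c
    return clock_raised or temp_raised
```

Cards that never drop their clocks when idle will only ever trip the temperature test, so the result is a hint rather than a measurement.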
|
Joined: 16 Dec 08, Posts: 16, Credit: 10,644,256, RAC: 0
|
Actually, there is a CUDA-Z app available for download from SourceForge (just search Google for "cuda-z"); it gives information about GPU core performance. |
Paul D. Buck, Joined: 9 Jun 08, Posts: 1050, Credit: 37,321,185, RAC: 0
|
Thanks for the hint, but it reports operational numbers regardless of whether anything is running. So it is logging the effective speed and not the operational state, à la Task Manager for the CPUs in the system. |
JockMacMad TSBT, Joined: 26 Jan 09, Posts: 31, Credit: 3,877,912, RAC: 0
|
GPU-Z. Go to the Sensors tab and it shows GPU Load. I'm not sure how it works this out, though, or how accurate it is. RivaTuner also has a GPU Usage meter. |
Paul D. Buck, Joined: 9 Jun 08, Posts: 1050, Credit: 37,321,185, RAC: 0
|
"GPU-Z." Well, I just installed RivaTuner 2.24 and I sure don't see a usage meter, only the ability to configure registry entries. I guess I don't get it ... and the documentation so far has not been much help to me ... sigh ... |
|
Joined: 13 Mar 09, Posts: 59, Credit: 324,366, RAC: 0
|
|
JockMacMad TSBT, Joined: 26 Jan 09, Posts: 31, Credit: 3,877,912, RAC: 0
|
Paul, on the RivaTuner Main tab, look just under where you select the GPU; it will say something like "256-bit RV770" (that's off my ATI machine, but hey). At the end of this field is a small button that looks like a 3-4-5 triangle. Select that and a mini popup bar opens. The last item on it is Hardware Monitoring. This opens graphs for each GPU present, and GPU Usage is one of them. |
Paul D. Buck, Joined: 9 Jun 08, Posts: 1050, Credit: 37,321,185, RAC: 0
|
Hmm, I had not tried that button because I did not want to make changes ... :) Now that I have found it, I just tried it, and so far the closest thing I can find is the graph for the CPU, but it covers the main CPUs and not the processors on the GPU cards. The rest are the same set of temperature sensors and clocks. |
Paul D. Buck, Joined: 9 Jun 08, Posts: 1050, Credit: 37,321,185, RAC: 0
|
"Would this do any good?" Rob, yeah, this was the other tool I have tried that seems to only be able to relate and graph WITHIN a graphics application. Again, I may be missing something and not using it right. I am also asking Rom to make a change to the system that will change the output message so we can see which GPU BOINC is scheduling for a task. But the question is still out there whether BOINC is properly scheduling tasks on the available GPUs. It is hard to tell what is going on, and the symptom is that tasks that should run in 6 hours are taking as long as 24 hours to run. The first question is whether this bug is in 6.6.20, our imagination, or something else, like bad tasks from GPUGRID. |
Stefan Ledwina, Joined: 16 Jul 07, Posts: 464, Credit: 298,573,998, RAC: 0
|
Well, I don't think the problem is only in our imagination, Paul... Since I got my HD4870 for Milkyway, I swapped the GTX 260 in my Linux box with another 8800GT card. During the GPUGRID server outage I had some time to watch the tasks running. I watched two pxxx-GIANNI- tasks that started at exactly the same time. The weird thing is that they needed the same wall-clock time to complete (also CPU time, which should be different for these cards)! I'd say the GTX260 should be a little bit faster than an 8800GT! ;-) The host where I saw this is http://www.gpugrid.net/results.php?hostid=32169. The BOINC client version is 6.6.20 for Linux x86_64. Looking at the stderr of the last four results (521310, 521292, 521244, 521238) I completed during the server outage, I noticed that the WUs never ran on only one GPU. They always switched between the two cards, some WUs only once and some more often... But my "switch between applications every x minutes" is actually set to 1440 minutes (= 24 hours, though BOINC seems to ignore that anyway) and nothing was in deadline trouble. Nothing was running at "high priority", so there should never be any task switching between the two cards. To me it looks like this is slowing down the computation a good bit... [edit] Sorry Paul, I know it's off-topic in this thread, but I just wanted to show you that it looks like there's really something wrong with the GPU scheduling if you have two or more (different) cards in one computer... pixelicious.at - my little photoblog |
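The check described above (reading a task's stderr and noting each time it moved to the other card) amounts to counting changes in a sequence of device numbers. The sketch below assumes you have already pulled that sequence out of the stderr by hand; the stderr format itself is not reproduced here, and `count_device_switches` is a hypothetical helper, not part of BOINC or the GPUGRID application.

```python
def count_device_switches(devices):
    """Count how often consecutive entries in a device sequence differ.

    `devices` is the list of GPU numbers a single task ran on, in the
    order it (re)started, e.g. [0, 1, 1, 0].
    """
    return sum(1 for prev, cur in zip(devices, devices[1:]) if prev != cur)


def switched_tasks(runs):
    """Map task id -> number of switches, for tasks that moved at least once.

    `runs` maps a task id to its device sequence (hand-collected data).
    """
    counts = {task: count_device_switches(seq) for task, seq in runs.items()}
    return {task: n for task, n in counts.items() if n > 0}
```

A task that stays on one card for its whole run gives a count of zero, which is what the "switch between applications" setting would lead you to expect.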
Paul D. Buck, Joined: 9 Jun 08, Posts: 1050, Credit: 37,321,185, RAC: 0
|
"Well, I don't think the problem is only in our imagination Paul..." Not at all off topic ... why do you think I am searching for tools? :) I also asked Rom to change the output message listing the task vs. GPU to indicate the actual GPU, so that "(0.18 CPUs, 1 CUDA)" becomes "(0.18 CPUs, CUDA x)" and we can track this ... and the tools to see what is actually being used ... I know that I am one of the very few that has multiple GPUs in a single system and, as usual, on the cutting edge ... I have been concerned about the resource scheduler (formerly CPU Scheduler) for a long time and your intuition may be correct. The problem is that it seems to be a little intermittent, so it is not clear why it is happening. One of the questions I have is whether the tasks are being run on only some of the n GPUs, leaving a GPU idle ... in my case it would be running on two of four or three of four while still attempting to run 4 total tasks ... You are very correct that BOINC does not respect the TSI as a lower limit; it is actually treated as an upper limit, in the sense that rescheduling tasks can occur as often as 5-6 times per minute (on my systems), with every upload, scheduler request, timer expiry, bird transit, and dog bark triggering an examination of the scheduling, with rr_sim being run and new decisions being made. John defends this status quo, though I have not seen where he puts up a "real" defense that holds water in the face of the countervailing arguments, like my experience with a "fast" and "wide" system that regularly produces 300 to 500 completed tasks each day ... and the fact that I don't have a long queue and have only rarely had tasks that are really in deadline trouble. In summary, I have been jawboning on the mailing lists about these issues and have mostly been the recipient of the slings and arrows of those who would rather we ignore issues than recognize them so that they can be corrected... ah well ... |
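To make the requested message change concrete, here is a minimal sketch of the two formats quoted above and of how the new one could be read back to see which GPU a task was given. This is not BOINC code: `format_usage` and `parse_device` are hypothetical names, and only the two message strings come from the post.

```python
import re


def format_usage(cpu_fraction, cuda_device=None):
    """Build the resource string shown next to a task.

    Without a device number this gives the current style,
    "(0.18 CPUs, 1 CUDA)"; with one it gives the proposed style,
    "(0.18 CPUs, CUDA x)".
    """
    if cuda_device is None:
        return f"({cpu_fraction:.2f} CPUs, 1 CUDA)"
    return f"({cpu_fraction:.2f} CPUs, CUDA {cuda_device})"


# Matches only the proposed style, where the device number follows "CUDA".
_DEVICE_RE = re.compile(r"CUDA (\d+)\)")


def parse_device(message):
    """Return the GPU number from a proposed-style message, else None."""
    match = _DEVICE_RE.search(message)
    return int(match.group(1)) if match else None
```

With the device number in the message, tallying `parse_device` over the log lines would show at a glance whether one card is being left idle.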
|
Joined: 27 Oct 08, Posts: 27, Credit: 3,211,916, RAC: 0
|
Try EVGA Precision from EVGA.com. It does not monitor the exact resources the GPUs are given, but it is a really interesting tool for monitoring temperatures and overclocks. |
|
Joined: 17 Aug 08, Posts: 2705, Credit: 1,311,122,549, RAC: 0 |
GPU-Z and RivaTuner can monitor GPU activity on ATI cards. I recently swapped my NV for an ATI and the "monitors" appeared, no software upgrades required. Having said that I must admit I don't know a better way than checking the temperature. A long time ago I read something about implementing a 2D downclock on G92 chips for power saving (which NV gracefully omits). I think they used RivaTuner and somehow configured it to detect GPU load. Sorry, can't remember much more. MrS Scanning for our furry friends since Jan 2002 |
JockMacMad TSBT, Joined: 26 Jan 09, Posts: 31, Credit: 3,877,912, RAC: 0
|
Good point, MrS: I looked on my ATI machine. When I switched back to the nVidia machine I lost the GPU utilization monitors :( Sorry Paul, my bad. I also have multiple GPUs, and the ID of the GPU would be useful indeed. |
Paul D. Buck, Joined: 9 Jun 08, Posts: 1050, Credit: 37,321,185, RAC: 0
|
"Good point MrS I looked on my ATI machine. When I switched back to the nVidia machine I lost the GPU Utilization monitors :(" Well, at the moment, and for what it is worth, 6.6.23 SEEMS to be behaving. Tasks are racking up about 5 to 6:30 in run time, which is about what I would expect. Not sure if my suggestion will take root, probably not ... but only time will tell ... |
|
Joined: 13 Mar 09, Posts: 59, Credit: 324,366, RAC: 0
|
"Good point MrS I looked on my ATI machine. When I switched back to the nVidia machine I lost the GPU Utilization monitors :(" It looks like Everest has some form of GPU monitoring that Nvidia cards can provide. Check out Kazgirls post on this thread. I haven't tested it yet. Rob |
|
Joined: 13 Mar 09, Posts: 59, Credit: 324,366, RAC: 0
|
|
Paul D. Buck, Joined: 9 Jun 08, Posts: 1050, Credit: 37,321,185, RAC: 0
|
I just found this very good article on how to use Performance Monitor (and various other tools) to view the Nvidia cards' various performance indicators. I downloaded that toolkit but could not make sense of it ... but that article may have some clues on what I might try next time. For the moment, the problem that I was looking to diagnose is not showing up on 6.6.23, so the need may be moot... Hmmm, here is a task that looks like it is going to take 7 hours to run ... Anyway, thanks for the tips ... |
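For readers on a current NVIDIA driver, one way to get the kind of per-GPU indicator asked for in this thread is the `nvidia-smi` command-line tool with its `--query-gpu` option. The sketch below is an assumption-laden illustration, not something the posters used: the helper names and the 10% "busy" threshold are invented, and on cards or drivers that do not report utilization the field comes back as a non-numeric placeholder, which is mapped to `None` here.

```python
import csv
import io
import subprocess

# Query fields as listed by `nvidia-smi --help-query-gpu`.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu,temperature.gpu",
    "--format=csv,noheader,nounits",
]


def _to_int(text):
    """Convert a field to int, or None for placeholders such as "[N/A]"."""
    try:
        return int(text)
    except ValueError:
        return None


def parse_gpu_csv(text):
    """Parse 'index, utilization, temperature' rows into a list of dicts."""
    rows = []
    for fields in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        index, util, temp = (field.strip() for field in fields)
        rows.append({"index": int(index),
                     "util_pct": _to_int(util),
                     "temp_c": _to_int(temp)})
    return rows


def busy_gpus(rows, min_util_pct=10):
    """Return indices of GPUs at or above the (assumed) utilization threshold."""
    return [r["index"] for r in rows
            if r["util_pct"] is not None and r["util_pct"] >= min_util_pct]


def read_gpus():
    """Run nvidia-smi and parse its output; needs the NVIDIA driver installed."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_csv(result.stdout)
```

Calling `busy_gpus(read_gpus())` while tasks are running would then list the cards that are actually doing work, which is the Task Manager style view the thread is looking for.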
|
Joined: 9 Dec 08, Posts: 29, Credit: 18,754,468, RAC: 0
|
@ETA, with RivaTuner you can create an underclocked profile and your normal profile, plus any extras you might want. Then in the scheduler you can set a trigger for the underclocked profile if the hardware acceleration monitor is 0, or your normal profile if the hardware acceleration monitor is 1. On Vista I use Aero, so the hardware acceleration monitor is always on 1; I use framerate as an indicator instead. The framerate is 0 while using CUDA and Aero, so you can rev your card back up for games. Personally I have two ways of running my hot 8800GT: (1) for games, I run my 120% overclock with auto fan from the launcher (scheduler disabled); (2) for crunching, I set my fan level manually depending on the amount of noise I can stand, and then have downclocking thresholds set up so that the GPU runs as fast as possible without getting too hot (not over 83C), and leave it on overnight etc. A constant fan isn't as annoying as a fan constantly oscillating its speed, even by a small amount. |
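The downclocking thresholds described above can be sketched as a two-threshold (hysteresis) rule, which also avoids flipping between profiles every few seconds. This is only an illustration of the idea, not RivaTuner's scheduler: the profile names and `next_profile` are hypothetical, the 83C limit comes from the post, and the 78C resume point is an assumption.

```python
def next_profile(current, temp_c, high_c=83.0, resume_c=78.0):
    """Pick the clock profile for the next interval.

    Switch to the downclocked profile once the temperature goes over
    `high_c`, and return to the normal profile only after it has fallen
    below `resume_c`. Between the two thresholds the current profile is
    kept, so the card does not oscillate around a single limit.
    """
    if temp_c > high_c:
        return "downclocked"
    if temp_c < resume_c:
        return "normal"
    return current


def simulate(temps, start="normal"):
    """Apply next_profile to a series of temperature readings."""
    profile = start
    history = []
    for temp in temps:
        profile = next_profile(profile, temp)
        history.append(profile)
    return history
```

The gap between the two thresholds plays the same role as the fixed fan speed in the post: it trades a little performance for steadier behaviour.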
©2025 Universitat Pompeu Fabra