Message boards : Graphics cards (GPUs) : New application version
| Author | Message |
|---|---|
|
Send message Joined: 25 Nov 08 Posts: 51 Credit: 980,186 RAC: 0
|
The new Windows application 6.61 certainly seems to be sucking up huge amounts of CPU time. On the machine with the GTX 260 card, switching on the monitor just gave a black screen - the only way to get the display back was to reboot, having first suspended BOINC activity from the other machine. The other machine, with the slower 8600 GTS card, didn't suffer from any black screens - I'm not sure why that should be the case. Both machines drive the same monitor via a KVM switch. On both machines I've noticed one of the other projects' tasks getting a much smaller share of the CPU and as a result taking 50% longer than normal to complete. Accordingly, I've had to switch both machines to 1 + 3 mode by setting % of processors to 99 - something I've been able to avoid doing until this point. Phoneman1 |
|
Send message Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 |
Well, I didn't announce anything, I just interpreted GDF's comments on 6.61. It seems I may have been wrong regarding a possible speed-up due to the higher priority.. but then it was a question to the guys already running it, not a statement. And Kokomiko, you couldn't possibly get a speed-up (due to higher priority) if you're already running 3+1 on a quad. It could only help if you're running 4+1 and your GPU is underutilized. Why do we have to use polling at all? Sure, the concept seems very ill-conceived, but that's actually what nVidia proposed for CUDA. @Stefan: the new WU types feature more complex models, so if the time per step goes up on your GTX 260 you approach the performance region of the older cards with the old WUs, where things get sluggish. MrS Scanning for our furry friends since Jan 2002 |
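(For readers unfamiliar with the polling question being discussed: the host launches a kernel and then repeatedly asks the driver whether it has finished. Below is a minimal sketch of that pattern, assuming a hypothetical simulation kernel named md_step; the sleep length is the knob that trades CPU load against GPU idle time.)

```c
/* Minimal sketch of CPU-side polling as discussed above. The kernel
 * body and launch geometry are hypothetical stand-ins. */
#include <cuda_runtime.h>
#ifdef _WIN32
#include <windows.h>
#define SLEEP_MS(ms) Sleep(ms)
#else
#include <unistd.h>
#define SLEEP_MS(ms) usleep((ms) * 1000)
#endif

__global__ void md_step(void) { /* stand-in for the real simulation kernel */ }

int main(void)
{
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    md_step<<<128, 256, 0, stream>>>();

    /* cudaStreamQuery() returns immediately, so the host keeps asking.
     * With no sleep this burns a full CPU core; a longer sleep lowers
     * CPU load but leaves the GPU idle between steps. */
    while (cudaStreamQuery(stream) == cudaErrorNotReady) {
        SLEEP_MS(1);
    }

    cudaStreamDestroy(stream);
    return 0;
}
```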
K1atOdessa Send message Joined: 25 Feb 08 Posts: 249 Credit: 444,646,963 RAC: 0
|
|
Paul D. Buck Send message Joined: 9 Jun 08 Posts: 1050 Credit: 37,321,185 RAC: 0
|
Why do we have to use polling at all? Sure, the concept seems very ill-conceived, but that's actually what nVidia proposed for CUDA. Not sure I was trying to give the impression that it was ill-conceived. Sometimes a fast-acting polling loop with a large amount of sleep time is a good enough solution to the problem. Now that we have the realities and some experience, I was asking the question so that maybe we can move in the direction where we are using a more sophisticated mode of operation. If you look closely I was asking broader questions than that. I know it CAN seem like carping if you are in the mood to read it that way. What I am doing is brainstorming what I can see and what I can infer, and comparing that with what I have experienced as a systems engineer ... Like the memory load. I have been pondering the memory load and the CPU load, and I mused about the difference betwixt SaH and GPU Grid and speculated that the SaH CPU load was a lot lower. Of course that is likely because the whole of the SaH task can be loaded into VRAM and then the program executed. GPU Grid appears larger and thus we have load and store actions. THAT said, I also pondered whether, on the faster cards with 1 GB or more VRAM, we could lower that CPU load if we used bigger slices of data, slices sized for the amount of available VRAM. Using IRQs, or a single polling task which dispatches to "grooming" tasks that do the other operations, means that we only have one task that sucks up CPU while doing the polling of ALL of the GPUs, and only when grooming needs to be done do we run the other threads. Operationally, what would happen is that when BOINC launched a task it would check to see if there was a polling task; if there was not, one would be launched ... if there was one, it would be used ... in other words, with one GPU we have two tasks running ... for two, three ... and for that lucky guy with 3 GTX 295s, 7 ... Anyway, all this in an attempt to make things better for all of us. Wasted CPU is wasted CPU ... if this IS the only way to get er done ... then we will live with it ... but I think we should be wanting to look at all of the possibilities ... |
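(Paul's one-poller-for-all-GPUs idea could look roughly like the sketch below. Everything here is hypothetical, and it assumes a CUDA runtime - 4.0 or later - that lets a single host thread switch between devices, something the runtime of that era did not yet allow.)

```c
/* Hypothetical sketch: one polling thread for N GPUs, waking a per-GPU
 * "grooming" worker only when that GPU's step has finished. This is an
 * illustration of the idea above, not the project's actual design. */
#include <cuda_runtime.h>
#include <pthread.h>
#include <unistd.h>

#define NUM_GPUS 3

typedef struct {
    cudaStream_t    stream;  /* stream the simulation kernel runs in */
    pthread_mutex_t lock;
    pthread_cond_t  done;    /* signaled when the GPU step completes */
    int             ready;
} gpu_slot_t;

static gpu_slot_t slots[NUM_GPUS];

/* The single poller: check every registered GPU, sleep once, repeat. */
static void *poller(void *arg)
{
    (void)arg;
    for (;;) {
        for (int i = 0; i < NUM_GPUS; ++i) {
            cudaSetDevice(i);
            if (cudaStreamQuery(slots[i].stream) == cudaSuccess) {
                pthread_mutex_lock(&slots[i].lock);
                slots[i].ready = 1;
                pthread_cond_signal(&slots[i].done); /* wake the groomer */
                pthread_mutex_unlock(&slots[i].lock);
            }
        }
        usleep(1000); /* one loop, one sleep - not N spinning tasks */
    }
    return NULL;
}

int main(void)
{
    for (int i = 0; i < NUM_GPUS; ++i) {
        cudaSetDevice(i);
        cudaStreamCreate(&slots[i].stream);
        pthread_mutex_init(&slots[i].lock, NULL);
        pthread_cond_init(&slots[i].done, NULL);
    }
    pthread_t t;
    pthread_create(&t, NULL, poller, NULL);
    pthread_join(t, NULL); /* the sketch polls forever */
    return 0;
}
```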
K1atOdessa Send message Joined: 25 Feb 08 Posts: 249 Credit: 444,646,963 RAC: 0
|
Lots of reported errors with 6.61. Sorry, wrong link. Correct link: GDF's response to the issue |
|
Send message Joined: 18 Jul 08 Posts: 33 Credit: 3,233,174 RAC: 0
|
Finally a version that runs fine with 4+1 tasks (4 for CPU, 1 for ACEMD). BOINC 6.6.2 + ACEMD 6.61 does the trick. CPU utilisation is at about 12% (quad Q6600 @ 3.4 GHz). Finally a version that produces the same heat on the GTX 260 as 3+1 tasks did (one core dedicated to GPUGRID by using "processlasso"). 4+1 with 6.6.2 and 6.5.0 produced about 18% CPU usage. Way to go guys - I'm really impressed that you nearly got it working now. The only thing left is the implementation of less time-consuming polling. |
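(Dedicating a core with a tool like Process Lasso comes down to restricting the process affinity mask. A minimal Win32 sketch - the choice of core 3 is arbitrary and purely illustrative.)

```c
/* Sketch of what "dedicating a core" via a tool such as Process Lasso
 * does under the hood on Windows: pin this process to one logical CPU. */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    DWORD_PTR mask = (DWORD_PTR)1 << 3;  /* bit 3: logical CPU 3 only */

    if (!SetProcessAffinityMask(GetCurrentProcess(), mask)) {
        fprintf(stderr, "SetProcessAffinityMask failed: %lu\n",
                (unsigned long)GetLastError());
        return 1;
    }
    /* ... the GPU-feeding work would run here, confined to CPU 3 ... */
    return 0;
}
```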
Paul D. Buck Send message Joined: 9 Jun 08 Posts: 1050 Credit: 37,321,185 RAC: 0
|
Finally a version that runs fine with 4+1 tasks (4 for CPU, 1 for ACEMD). You're lucky ... I just tried it and my results stay the same ... 22% CPU usage ... which means essentially one full core used only for babysitting ... |
Venturini Dario [VENETO] Send message Joined: 26 Jul 08 Posts: 44 Credit: 4,832,360 RAC: 0
|
I can just point out: before - 1 hour of CPU time per WU. Now - 6 hours. NOT the way to go. I strongly encourage a solution. P.S. On Linux |
|
Send message Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 |
Hi Paul, about the "ill-conceived": that was actually coming from me, I was not trying to read it into your comment. I think it's a wrong concept that the CPU has to ask the GPU if it's ready. The GPU should be able to tell the CPU about this. Don't know if there is an IRQ for this.. seems too straightforward if there was ;) And I forgot about your comment regarding one polling loop for all GPUs. Surely seems like a good idea, but it may not be necessary if they can get the CPU usage down as expected. The only problem I see with that approach is that in BOINC every task is insulated from the others. This is by design and it's actually a strength.. it lets you utilize multiple cores rather easily and protects you from all kinds of strange interactions (which you usually get when you parallelize outside a core). One would have to program this carefully, but it seems manageable nevertheless. Regarding the VRAM: right now the WUs use about 70 - 80 MB of VRAM. More complex models could be simulated, which would require more memory, but then the time per step would go up and the screen would get really sluggish even on GT200 cards. I guess that's not what people want right now ;) Similarly, one could tell the GPU to compute several steps within one polling interval (to reduce the polling frequency), but I don't know if that would be possible at all. Bottom line: I don't see potential for improvement by using more VRAM. Regards, MrS Scanning for our furry friends since Jan 2002 |
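(There is, in fact, a driver-level alternative to the CPU asking: the CUDA runtime can be told to block the host thread on an OS primitive until the GPU signals completion. A sketch, assuming a CUDA version that exposes the blocking-sync scheduling flag; the kernel is the same hypothetical stand-in as before.)

```c
/* Sketch of the interrupt-style alternative to polling: let the driver
 * put the host thread to sleep until the GPU is done, instead of
 * spin-waiting. Assumes the blocking-sync flag is available. */
#include <cuda_runtime.h>

__global__ void md_step(void) { /* stand-in for the real simulation kernel */ }

int main(void)
{
    /* Must be called before the CUDA context is created on this thread. */
    cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync);
    cudaSetDevice(0);

    md_step<<<128, 256>>>();

    /* With the flag above, this blocks on an OS synchronization object
     * (near-zero CPU) rather than busy-waiting inside the driver. */
    cudaDeviceSynchronize();
    return 0;
}
```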
|
Send message Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 |
Oh, and my 6.61 result for a 2479 credit WU, running BOINC 6.5.0 under XP 32 and 4+1:

| App | CPU time (s) | Time per step (ms) |
|---|---|---|
| 6.61 | 28500 | 67.8 |
| 6.55 | 14700 | 77.5 |
| 6.55 | 21200 | 72.3 |
| 6.55 | 20600 | 75.6 |
| 6.55 | 17300 | 87.0 (odd) |
| 6.55 | 21200 | 70.4 |
| 6.55 | 19200 | 72.8 |

If I discard the odd one I get an average of 19400 s and 73.7 ms/step for the old app. So it seems I am seeing an increase in GPU speed and an increase in CPU time. Based on my current RAC of 5000, the speed-up of 73.7 / 67.8 ≈ 1.09 would give me 5430 RAC. Not bad. MrS Scanning for our furry friends since Jan 2002 |
|
Send message Joined: 25 Nov 08 Posts: 51 Credit: 980,186 RAC: 0
|
CPU time. Based on my current RAC of 5000 that would give me 5430 RAC. Not bad. But have your other project(s) suffered more than the GPU has gained??? You rarely get anything for nothing! Phoneman1 |
Paul D. Buck Send message Joined: 9 Jun 08 Posts: 1050 Credit: 37,321,185 RAC: 0
|
CPU time. Based on my current RAC of 5000 that would give me 5430 RAC. Not bad. Yes ... And this is especially acute if GPU Grid is not a high-interest project for you. I don't know, there are, what, 211 protein folding projects out there? If not that many, maybe only 197 ... :) Anyway, the question is not directly winners and losers on a project-by-project basis, but science as a whole ... The change from 6.55 to 6.61 increased the load on the CPUs by 50%, which means that for every hour of GPU Grid I lose half an hour of some other project just due to idle polling. I know they are working on it and I am NOT ranting ... I am just expressing a little frustration at the way things are working, and though I have been dedicating more resources here than I normally would, it is because this is a new technology I want to see get off the ground. THAT said, I will also say that at this time I like the project's responsiveness, with "G" whatsis-name stopping by to see what is on our minds (if anything, sometimes mine is a blank) ... but I do not want it to be forgotten that "playing nice" is important ... Just as an example, if GPU Grid cannot get the CPU load down for their project tasks ... well, I stop work for projects all the time ... or reduce the share they get ... the only advantage they have at the moment is that they are the only science project around using the GPU on BOINC ... SETI@Home is not doing science, they are exploring, whole nother animal ... Anyway, in the last month I have gone from a 9800 GT to add a GTX 280 AND just now a GTX 295 ... which experience I will talk about in another thread soon ... |
Paul D. Buck Send message Joined: 9 Jun 08 Posts: 1050 Credit: 37,321,185 RAC: 0
|
Hi Paul, Hi ... No harm, no foul (fowl?) ... I worry about communication because people sometimes read into what I write ... and I know I miss a lot of the detail because I am so literal-minded ... a regular Star Trek Mr. Spock type ... that is me ... The video cards do have an IRQ assigned to them, which is why I asked about using the IRQs instead. And I forgot about your comment regarding one polling loop for all GPUs. Surely seems like a good idea, but it may not be necessary if they can get the CPU usage down as expected. The only problem I see with that approach is that in BOINC every task is insulated from the others. This is by design and it's actually a strength.. it lets you utilize multiple cores rather easily and protects you from all kinds of strange interactions (which you usually get when you parallelize outside a core). One would have to program this carefully, but it seems manageable nevertheless. If they are going to continue to use a polling loop this is still a good idea. The point being that we reduce the number of polling loops to the minimum required, which is one. As to BOINC and task isolation, this is not necessarily a show-stopper. The first task starts, checks to see if there is a polling task alive; if not, it starts one, otherwise it continues after registering with the live polling loop. In between servicing its GPU, the task would sleep ... the polling loop would not strictly be a BOINC task; the BOINC task would be the task that you currently have, sans the polling loop ... Regarding the VRAM: right now the WUs use about 70 - 80 MB of VRAM. More complex models could be simulated, which would require more memory, but then the time per step would go up and the screen would get really sluggish even on GT200 cards. I guess that's not what people want right now ;) Similarly, one could tell the GPU to compute several steps within one polling interval (to reduce the polling frequency), but I don't know if that would be possible at all. Bottom line: I don't see potential for improvement by using more VRAM. Just a thought ... when I look at the application using one of the nVidia tools I see about 50% memory load ... sometimes the way to reduce loads like that is to upload more stuff if there is room ... the bottom line is I want more efficiency ... :) |
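(The check-for-a-live-poller handshake Paul describes maps naturally onto a named OS object: the first task to create it becomes the poller, and later tasks just register with it. A hypothetical Win32 sketch - the object name and helper functions are invented for illustration.)

```c
/* Hypothetical sketch of the "first task starts the poller" handshake
 * using a named Win32 mutex. The name and helpers are made up. */
#include <windows.h>

static void start_polling_thread(void) { /* hypothetical: become the poller */ }
static void register_with_poller(void) { /* hypothetical: join the poller */ }

int main(void)
{
    HANDLE h = CreateMutexA(NULL, FALSE, "Global\\gpugrid_poller_sketch");

    if (h != NULL && GetLastError() == ERROR_ALREADY_EXISTS) {
        register_with_poller();   /* a poller is already alive */
    } else if (h != NULL) {
        start_polling_thread();   /* we won the race: we are the poller */
    }

    /* Keep h open for the lifetime of the task so the name stays
     * registered while this GPU task is running. */
    return 0;
}
```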
|
Send message Joined: 2 Jan 09 Posts: 40 Credit: 16,762,688 RAC: 0
|
When the beastly software gets out of control, don't fear to slay it. ;-) Meanwhile, I wonder why the Windows port of GPUGRID loads the CPU so much... Today I see loads of 12% to 14% per core on a 2.4 GHz Phenom under Linux 2.6.28 and nvidia-kernel 180.22, with GPUGRID application version 6.59 and GTX 260 GPU hardware. Could it be that my GPUs are underutilized? Or is the Windows software that inefficient? |
Paul D. Buck Send message Joined: 9 Jun 08 Posts: 1050 Credit: 37,321,185 RAC: 0
|
Could it be that my GPUs are underutilized? You are now a witness to a great truth ... Microsoft does not actually write software that is all that good. :) Though the load of 12-14% is also pretty high as these things SHOULD go. On a quad that is a 50% CPU load (25% of the total power of the system) ... On my Windows quad I am seeing about a 22% load, and on the i7 about 7% per GPU task (I now have 3 GPU cores running, two in the new GTX 295 plus the GTX 280) ... This is why I was suggesting a single polling loop for all GPU Grid processes, with the processes registered with BOINC being the processes that groom the individual GPU cores. I tried to get a GPU going on the Linux box and I guess I am too stupid, because I was not able to do so. When I downloaded the driver package it told me I needed other things, and it was just such a mess that I said the heck with it for the moment. I may play with it later, but I fell out of love with DOS-style command lines back about the time I bought my Lisa ... Anyway, you are losing half the processing power of your core just to see if the GPU needs attention. The real cost should be about 1% ... anything above that is actually quite high. And if it is per GPU core, that 3% figure for my system, for example, would mean that 6% more would be really, really wasted overhead. I don't mind 3% overhead that much, but 3% times 3 cores is 9% of the system CPU in an idle-detection loop ... not good ... So, I am back to my IRQ, single polling thread, or the ability to select "Nice" vs. Performance for the GPU ... |
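(For what a "Nice" mode might mean in practice: on Linux the feeder process could simply drop its own scheduling priority so the polling loop yields to real CPU work. A sketch, with the obvious trade-off that the GPU may then wait longer to be serviced.)

```c
/* Sketch of a "Nice" mode for the GPU-feeding process on Linux: lower
 * our own priority so the polling loop loses out to real CPU work. */
#include <sys/resource.h>
#include <stdio.h>

int main(void)
{
    /* Nice value 19 = lowest priority; 0 would be the default
     * "Performance" behaviour. */
    if (setpriority(PRIO_PROCESS, 0 /* this process */, 19) != 0) {
        perror("setpriority");
        return 1;
    }
    /* ... the polling / grooming loop would run here ... */
    return 0;
}
```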
|
Send message Joined: 25 Aug 08 Posts: 143 Credit: 64,937,578 RAC: 0
|
Got a 6.61 WU. 30-40% of CPU on Athlon X2. GTX 260. BOINC 6.5.0. It's terrible. |
|
Send message Joined: 21 Oct 08 Posts: 144 Credit: 2,973,555 RAC: 0
|
Tried a few 6.61 units on my Athlon X2 3800+, 9500 GT, BOINC 6.5.0, 32-bit Vista Home Premium. The GPU tasks took over one full core (never dropping below 47% of total CPU usage, going as high as 73%, and most typically in the range of 52-56%). With a GPU task running, the machine is virtually unusable, as it takes several seconds to switch between simple tasks such as a Mozilla web page tab and an already open BOINC Manager. Unfortunately, I am forced to suspend crunching GPUGRID on this box at least until another app version is available. |
|
Send message Joined: 20 Aug 07 Posts: 18 Credit: 1,319,274 RAC: 0
|
Something is going on with the project. Athlon 64 X2 6000, Win Vista Home Premium 32-bit, SLI: 2 XFX 9600 GSO with 798 MB DDR2 RAM each (almost 1.6 GB total), 2 GB PC6400 DDR2. This started just a couple of days ago. When working on multiple BOINC projects along with GPUGRID, the work units under GPUGRID operate normally, however the other projects just seem to crawl along. It's been taking some 8 actual seconds to complete one CPU second of work. A couple of days ago there were no conflicts in any BOINC project. How much actual memory is this project taking to cause such a slowdown? It just seems to go against the whole idea of CUDA and GPU processing.
|
Paul D. Buck Send message Joined: 9 Jun 08 Posts: 1050 Credit: 37,321,185 RAC: 0
|
Something is going on with the project. The project is still in its early days, and this is a major complaint about the GPU Grid application: the CPU usage is much higher than desired / expected. The project is aware and is promising a newer version of the application that uses less CPU, though that has not happened yet. The latest release, 6.61, almost doubled the CPU usage over 6.55, where we had hoped that the opposite would happen - that the usage of 6.55 would be halved. The idea was that the GPU run times would be lowered, though that has not been my experience to this point. Most of us agree that the theoretical ideal would be negligible CPU use by the GPU tasks, on the order of 1% to 3% of a core, where the lowest I have experienced is 3% of total system resource usage. Sadly, we are stuck in the situation that if we want to keep the GPU spinning we have to pay for it with significant reductions in the productivity of the CPU projects we support. |
|
Send message Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 |
Scott, are your CPU usage numbers based on 100% = one core? Otherwise they would look strange. And the very slow responsiveness you're seeing is due to your relatively slow card (32 shaders vs. the recommended 50+). Nothing has changed here across the different client versions, but the WUs have changed: now we also crunch more complex models, which take even longer to process (and thus make the lag worse). MrS Scanning for our furry friends since Jan 2002 |