Message boards : Graphics cards (GPUs) : Need 6.3.14 Multi-GPU Help.....

naja002
Joined: 25 Sep 08
Posts: 111
Credit: 10,352,599
RAC: 0
Message 2911 - Posted: 9 Oct 2008 | 16:46:37 UTC


Hey Everybody,

Going to copy some quotes over from this thread in order to start a new thread (this one) and not derail that thread.

First is the problem:

Anybody tried multi-gpu on 6.3.14 yet? I'm having issues on my 2x GPU rig. It runs F@H GPU2 fine. My other 2 rigs are single GPU and I'm using the cc_config file and not having any issues.

On the multi-GPU rig:

Vista x64
2x XFX 8800GS

Same effect whether I use the cc_config file or not. I start Boinc (I run WCG and GpuGrid) and it will start running 4 of the WCG WUs and 2 of the GpuGrid WUs, but after a few minutes....sometimes 10-15 mins....it will kick both of the GpuGrid WUs into "Waiting to run" mode. The 4 WCG WUs will continue on...and it will not start any other WUs to replace the GpuGrid WUs. For some reason it's kicking them out. This doesn't happen on either of my single GPU rigs. And as stated before: this is with or w/o the cc_config file, and it doesn't seem to matter what ncpus is set to when the file is used. Yes, I rebooted. Desktop is extended via vga-dvi dummy. I tried them OCed and at stock.

Driver should be 177.92

All 3 rigs are running Vista x64.

Kinda lost atm...just wondering if anyone else has tried multi-gpu with this version.....?


I tried the suggestions made in the other thread--without luck:


Is your resource share >33%? I'm not sure if it matters for GPUs, but it may confuse BOINC if it's too low. And something else: the BOINC scheduler sometimes gets confused by the debt. I suggest the following: stop BOINC, open the client_state.xml in the editor, search for "debt" and you'll find <long_term_debt> xxxxx </long_term_debt> and <short_term_debt> yyyyy </short_term_debt> for every project. Set them all to 0 and see if it helps.

MrS
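For anyone who would rather script the debt reset than edit the file by hand, here is a minimal Python sketch. It assumes the debt entries look like the tags quoted above and treats the file as plain text so everything else stays untouched. The function name and the sample XML are made up for illustration; stop BOINC and keep a backup of client_state.xml before trying anything like this.

```python
import re

# Matches <long_term_debt>...</long_term_debt> and <short_term_debt>...</short_term_debt>.
# The tag names come from the post above; the rest of the file layout is assumed.
DEBT_TAG = re.compile(r"<(long_term_debt|short_term_debt)>[^<]*</\1>")


def reset_debts(xml_text: str) -> str:
    """Return xml_text with every debt value replaced by 0 (hypothetical helper)."""
    return DEBT_TAG.sub(lambda m: f"<{m.group(1)}>0</{m.group(1)}>", xml_text)


# Made-up sample in the shape described in the post, not a real client_state.xml.
sample = """<project>
    <project_name>Example</project_name>
    <short_term_debt>-1234.500000</short_term_debt>
    <long_term_debt> 98765.400000 </long_term_debt>
</project>"""

cleaned = reset_debts(sample)
print(cleaned)
```

To use it on a real file you would read client_state.xml from the BOINC data directory, pass the text through reset_debts, and write it back while the client is stopped.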



So, now I'm a bit more lost than before......
It seems to run the 2 WUs for ~10 mins and then kicks them into "waiting to run" mode as quoted above. At least once last night and once today...Boinc went into "Not Responding" mode. Today it showed 3 WCG WUs at 25% cpu, 1 at 0% and the Boinc Manager at 25% while in Not responding mode.

According to the Vista Performance Monitor, when the GPUGRID WUs are running they are using less than 1% cpu....like 0.01-0.06, so something isn't right.

According to boincview the 2 1x GPU rigs run ~200Mflops, but the 2xGPU rig was running:

5.8Mflops
7.7Mflops




So, something is definitely wrong here. Things ran better before I changed both the File and the resource share (33%). I've changed the--long_term_debt--back to zero many times while trying to sort this out.

Not sure where to go from here.

The 2xGPU rig is:

Q6600 @ 3.58
Asus P5K-E
G.Skill 2x 1GB 1066
OCZ 800w PSU
2x XFX 8800GS SC
Vid driver is 177.92
Boinc Ver. 6.3.14
GPU1 connected via KVM
GPU2-Monitor extended via Vga-Dvi Dummy (works fine on F@H GPU2)
Cpu, NB, and Both GPU are water cooled.


This is my first time trying Multi-GPU on GPUGRID. Prior was single GPU. I run multi-GPU on F@H GPU2-NP.

Anybody having any luck or problems with this 6.3.14 and Multi-GPU?

naja002
Message 2921 - Posted: 9 Oct 2008 | 18:43:55 UTC - in response to Message 2911.
Last modified: 9 Oct 2008 | 19:02:09 UTC

Ok, this isn't going to make much sense, but things actually run Better! with both of the F@H GPU2 clients going.

The GG WUs show 7-9% cpu usage in task manager, flops go up into the Gflops...according to Boincview.

Yes, it still kicks the GpuGrid WUs into "waiting to run" mode after ~10mins, but this is not making any sense what-so-ever.....

BTW, Flops are all over from Kflops, Mflops, Gflops....

Also, in the screenshot above...the 2 GG WUs on Farm1 and Farm2 are also Yellow--which means that they are "Running Slow" according to Boincview. Don't know why, but I just learned that (why they were yellow) a little bit ago.

naja002
Message 2926 - Posted: 9 Oct 2008 | 19:39:06 UTC - in response to Message 2921.
Last modified: 9 Oct 2008 | 19:42:34 UTC



A couple of screenshots--these are from Farm1, which is a single GPU rig. Trying to make sense of this stuff, but....this doesn't make any sense to me:


With the F@H GPU2 client paused (--it works the same w/o the client even started). Date, time and paused F@H GPU2 client at bottom right:




F@H GPU2 client restarted and GG WU flops go up about 6x:




Cpu usage changes also. Does this make any sense to anybody?



naja002
Message 2929 - Posted: 9 Oct 2008 | 21:00:49 UTC - in response to Message 2926.

Tried actually hooking up a 2nd monitor--no change. With the F@H GPU2 clients running the Flops of the GG WUs go up 6x....but they go down on 3 out of 4 WCG WUs. Cpu usage increases on the GG WUs and decreases on 3 out of 4 WCG WUs. So, it's just robbing cpu from the WCG WUs....


Been running these GPUs at stock since last night. Just shutdown Rivatuner 2.10--doesn't appear to be any change. I restarted Boinc with RT shutdown....


Not sure what to try next. I'm pretty lost.....



ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 2932 - Posted: 9 Oct 2008 | 22:23:56 UTC

How does BoincView determine the GFlops? Does it know anything about the GPU-client? If it tries to judge GFlops by the BOINC benchmark and CPU time it will get things totally wrong. So this "6x increase" from 200 MFlops to ~1 GFlop doesn't tell you much. An 8800GT is capable of ~500 GFlops, so in practice it should achieve at least tens of GFlops, if not 100+.

MrS
____________
Scanning for our furry friends since Jan 2002
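To illustrate the point about benchmark-and-CPU-time estimates, here is a toy Python sketch. It is not BoincView's actual method (which the thread does not document), only a guess at what a CPU-only estimator would compute. The benchmark figure and CPU fractions are hypothetical numbers chosen for illustration.

```python
def cpu_based_flops(benchmark_flops_per_core: float, cpu_fraction: float) -> float:
    """FLOPS a monitor would report if it only scaled the CPU benchmark by CPU share.

    This is an assumed model, not taken from BoincView's source or documentation.
    """
    return benchmark_flops_per_core * cpu_fraction


BENCHMARK = 2.5e9  # hypothetical CPU benchmark: 2.5 GFLOPS per core

# A GPU task that barely touches the CPU looks "slow" ...
idle = cpu_based_flops(BENCHMARK, 0.02)
# ... and looks "faster" as soon as something forces it to burn more CPU time.
busy = cpu_based_flops(BENCHMARK, 0.08)

print(f"reported at 2% CPU: {idle / 1e6:.0f} MFLOPS")
print(f"reported at 8% CPU: {busy / 1e6:.0f} MFLOPS")
print(f"apparent change: {busy / idle:.1f}x")
```

In a model like this the reading tracks CPU share only, so it can rise while the GPU is actually getting less work done.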

naja002
Message 2934 - Posted: 9 Oct 2008 | 22:44:21 UTC - in response to Message 2932.

How does BoincView determine the GFlops? Does it know anything about the GPU-client? If it tries to judge GFlops by the BOINC benchmark and CPU time it will get things totally wrong. So this "6x increase" from 200 MFlops to ~1 GFlop doesn't tell you much. An 8800GT is capable of ~500 GFlops, so in practice it should achieve at least tens of GFlops, if not 100+.

MrS




You are correct. I just timed the GG WU % increase with and w/o the F@H GPU2 client running. With it running--a single step increase in % took 31 seconds. W/o it running it took 7-8 seconds. So, things move along faster W/O the F@H GPU2 running.

I started assuming what You are getting at: that BV only reads the amount of CPU used, so when F@H kicks in the GG WU is pushed to use more Cpu--increasing the flops reading in BV. W/O F@H running the GG WU % increases almost 4x faster. So, it IS using the GPU and does run faster w/o the F@H client.
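The "almost 4x" figure can be checked from the two timings given above. A quick Python sketch of the arithmetic, using only the numbers from this post (31 seconds per progress step with F@H running, 7-8 seconds without):

```python
with_fah = 31.0           # seconds per progress step with the F@H GPU2 client running
without_fah = (7.0, 8.0)  # seconds per progress step without it (range from the post)

# Speedup = time with F@H / time without F@H, for both ends of the range.
speedups = [with_fah / t for t in without_fah]
low, high = min(speedups), max(speedups)

print(f"GG WU progress without F@H: {low:.1f}x to {high:.1f}x faster")
```

So the GG WU advances roughly four times faster when it has the GPU to itself, which matches the conclusion that the BoincView flops reading was pointing the wrong way.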

Now I still have the multi-GPU issue...Why both GG WUs kick into "waiting to run" mode after ~10 mins.....?

ExtraTerrestrial Apes
Message 2951 - Posted: 10 Oct 2008 | 7:58:37 UTC - in response to Message 2934.

I started assuming what You are getting at: ...


You're right. And I'm sorry I can't help you with your actual problem.

MrS

naja002
Message 3017 - Posted: 13 Oct 2008 | 11:23:57 UTC - in response to Message 2951.
Last modified: 13 Oct 2008 | 11:57:18 UTC

I started assuming what You are getting at: ...


You're right. And I'm sorry I can't help you with your actual problem.

MrS




NP.....I just need to get it out here in hopes that we can get it figured out...and apologies for the delay, wkends are my busy time, so I have very little time/energy to mess with this...

I think that I may have realized what the problem is: On this particular setup, it seems that the GG WUs run fine--as long as they have access to a core each.

I let the GG WUs run over the wkend on the other 2 rigs, but turned them off on this rig and ran F@H, because anything else would have just been an exercise in futility. I did not suspend them though....I just let boinc kick the GG WUs into "waiting to run" mode and fired up the F@H clients. Apparently as the deadline approached Boinc started them up, but only on a total of 4 cores (ie, 2x WCG WUs and 2x GG WUs). It's been running through the GG WUs just fine that way.

This has been reflected over the wkend and via Flops in boincview--its using the GPU....but needing 1 core each to do it.

So, apparently there is still a bug on the multi-gpu level that needs to be worked out. I definitely can't do it! ;) WCG is my main project, so I need to run it at 100%. I realize that by running GPU clients--I'm not actually running WCG at 100%. But that's ok. The little bit of cpu that properly working GPU clients take is acceptable to me. But I cannot run the GG WUs on this rig the way things currently are...it cuts my WCG production in half.

I changed the cc_config back to 6 cpus ~15 mins ago so now I'm waiting to see if it kicks something into waiting to run mode....maybe the deadline will take priority and it will run....maybe it will kick a couple of WCG WUs out....or finish a couple and not restart more. Right now--I don't know, but I'm going to keep an eye on it. I expect to change the cc_config file to 5 cpus and see if maybe the GG client is happy with 50% core each...never know.....
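For reference, the cc_config.xml setting being toggled here is the ncpus option. A small Python sketch that renders such a file is below. The helper name is made up, and whether 5 or 6 is the right value for a quad-core with two GPUs is exactly what is being tested in this thread, so treat the numbers as experiments, not recommendations.

```python
def render_cc_config(ncpus: int) -> str:
    """Return the text of a minimal cc_config.xml that overrides the CPU count.

    Hypothetical helper for illustration; it only sets the <ncpus> option.
    """
    if ncpus < 1:
        raise ValueError("ncpus must be at least 1")
    return (
        "<cc_config>\n"
        "  <options>\n"
        f"    <ncpus>{ncpus}</ncpus>\n"
        "  </options>\n"
        "</cc_config>\n"
    )


# 4 physical cores + 2 GPU tasks: the "6 cpus" setup described above.
config_text = render_cc_config(6)
print(config_text)
```

The output would be saved as cc_config.xml in the BOINC data directory, after which the client needs to re-read its config file or be restarted.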

I'll report back....

EDIT: It's been running fine now for the last 35 mins--doesn't mean much. But I changed the cc_config to 6 cpus (which didn't help before). Flops in BV went from Gflops down to ~200 Mflops (which is about where it should be in my experience). Task Manager is showing that the GG WUs are using 0-3% cpu with an average of ~2%. So, it's working exactly as it should--ATM. Not sure why, and I don't expect it to last, but I would guess that it has something to do with the GG WU deadlines....Again, I'll report back....


EDIT 2: Strange......it completed a WCG WU and started another one--just like it's supposed to. It's been running for ~50 mins now and is overall running--exactly--like it's supposed to.....hmmmmm....not sure what's going on other than some connection to the GG WUs deadline....but even that is making less sense...

naja002
Message 3019 - Posted: 13 Oct 2008 | 12:27:30 UTC - in response to Message 3017.
Last modified: 13 Oct 2008 | 12:28:56 UTC



EDIT 3: Just wanted to add a thought/note: The WCG WUs are "Running", the GG WUs are "Running, high priority". In the F@H GPU2 config it is necessary to select:

"Core Priority: Slightly higher {use if other distributed computing applications are stopping F@H}".

My guess is: The "High Priority" given to the GG WUs because of the deadline is what's causing things to work properly, and that this "Slightly Higher" priority needs to be built into either the WUs or the GG client.....

Stefan Ledwina
Joined: 16 Jul 07
Posts: 464
Credit: 288,598,037
RAC: 1,971,738
Message 3022 - Posted: 13 Oct 2008 | 13:18:03 UTC - in response to Message 3019.

"Running, high priority" only means that the BOINC scheduler is giving the running task a higher priority to meet its deadline. It does not obey the resource share until the project running in high priority has finished the task that is in deadline trouble.

It has nothing to do with the priority settings for Windows tasks.
Therefore it has also nothing to do with the setting in the FAH GPU2 client.

We ran some tests a few days ago where we tried to set a higher priority for GPUGRID tasks than for normal BOINC tasks. There wasn't any difference whether they ran with normal priority or with the lowest...
____________

pixelicious.at - my little photoblog

naja002
Message 3024 - Posted: 13 Oct 2008 | 14:14:39 UTC - in response to Message 3022.
Last modified: 13 Oct 2008 | 14:19:06 UTC

"Running, high priority" only means that the BOINC scheduler is giving the running task a higher priority to meet its deadline. It does not obey the resource share until the project running in high priority has finished the task that is in deadline trouble.


Right, but it also means that there is a way for "slightly higher priority" to be given within the Boinc Manager....

It has nothing to do with the priority settings for Windows tasks.
Therefore it has also nothing to do with the setting in the FAH GPU2 client.


I understand that priority control is within the Boinc manager. Also, I'm not sure if the 2nd statement is a mis-statement or not--I realize that the F@H config/client/etc....has no effect/control over Boinc or the projects within. I was just pointing to a necessity of a similar client. It may turn out to be a necessity for GG also....that's all.


We ran some tests a few days ago where we tried to set a higher priority for GPUGRID tasks than for normal BOINC tasks. There wasn't any difference whether they ran with normal priority or with the lowest...


I understand what you are saying. At this point--I don't know what's going on--all I know is what my experiences were and are. Right now everything is running beautifully. Why? I have no idea. I've been busy and haven't changed anything until I got up this morning. Now instead of running 4 WUs (2x WCG + 2x GG), it's running 6 (4x WCG + 2x GG)....the only thing I've changed this morning was the cc_config file to 6 cpus. But that didn't help last week. The only difference that I can see is the "Running, High priority" because of the expired deadline--that's it. What's going on? I don't know. I'm shooting in the dark here using what I've got to work with to make guesses.....that's all.

3 WCG WUs have completed and 3 more have started. So, everything is running as it should. I won't really know what the deal is until the first running GG WU completes....then I'll be able to see what happens....

Krunchin-Keith [USA]
Joined: 17 May 07
Posts: 512
Credit: 111,288,061
RAC: 0
Message 3037 - Posted: 13 Oct 2008 | 20:38:00 UTC

Note, there is some new intended behavior in 6.3 clients.

This case occurs when there are CPU tasks in deadline trouble ("High Priority"): the client will run those in addition to a CPU/CUDA task in order to keep the GPU in full use too.

"High Priority" only means they are on top of the BOINC to-do list, so they get done first, and can be done outside of your resource share. Things will balance out in the long run: this project gets more time now to complete before the deadline, and less time later to even the resource shares back out. It has nothing to do with the system priority like Low, Normal, High, etc.

Everyone should read this new wiki note which explains the new behavior of 6.3 clients

The new behavior will be to make max use of CPUs and GPUs, even if it means sometimes using more CPU than you physically have or have allocated.

naja002
Message 3038 - Posted: 13 Oct 2008 | 21:17:14 UTC - in response to Message 3037.
Last modified: 13 Oct 2008 | 21:17:59 UTC

Ok, well it's working like it's supposed to on all 3 rigs (2x 1-GPU + 1x 2-GPU). I have no idea what changed on this multi-GPU rig, but it's working. The 2 High priority WUs completed and each time it started a new GG WU and is running them in regular running mode.

Hoping it lasts....
