
Message boards : Graphics cards (GPUs) : No work? Fixed a bug in the scheduler

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Message 4641 - Posted: 20 Dec 2008 | 21:41:54 UTC
Last modified: 20 Dec 2008 | 21:42:25 UTC

Thanks for all your feedback. The BOINC developers finally found the problem in the scheduler. I have already upgraded the server software; please report whether the problem is fixed.

Thanks, gdf.

Profile koschi
Joined: 14 Aug 08
Posts: 124
Credit: 792,979,198
RAC: 799
Message 4643 - Posted: 20 Dec 2008 | 21:49:39 UTC - in response to Message 4641.

For me the problem is fixed. Earlier today I didn't get any new work, my Dualcore with 8800GTS was out of work :-/
Just some minutes ago I forced an update, got 2 units and now the GPU is slowly heating up again :-D

Great :)

Profile Nognlite
Joined: 9 Nov 08
Posts: 69
Credit: 25,106,923
RAC: 0
Message 4645 - Posted: 20 Dec 2008 | 21:51:31 UTC - in response to Message 4641.

Got a new WU without a manual update!! Ya-a-ay!!

What's still weird is that one WU at 32hrs is high priority and one at 38hrs is normal. Both are 850-step WUs.

As well, the two new WUs are 500-step, 1887-credit WUs, one at 64hrs and one at 6.5hrs. Seems way off, but my DCF is heading back down for both my rigs.

Pass on the Well Done's!!!, GDF.

Cheers

Pat

Profile Nightlord
Joined: 22 Jul 08
Posts: 61
Credit: 5,461,041
RAC: 0
Message 4647 - Posted: 20 Dec 2008 | 22:13:24 UTC

Thank you GDF, my GPUs are running warm again!

Please pass on our thanks to the developers on a weekend.

NL
____________

Profile Megacruncher TSBT
Joined: 7 Aug 08
Posts: 8
Credit: 5,690,158,672
RAC: 593,010
Message 4648 - Posted: 20 Dec 2008 | 23:57:38 UTC - in response to Message 4647.

All my GPUs are back in business too! Many thanks.

YeeHaarr!

____________
The Scottish Boinc Team

samsausage
Joined: 18 Nov 08
Posts: 12
Credit: 70,480,919
RAC: 0
Message 4649 - Posted: 20 Dec 2008 | 23:58:00 UTC
Last modified: 20 Dec 2008 | 23:58:44 UTC

I still can't get any WUs using 6.4.2.

I was getting them but it stopped. I tried suspending all projects and manually updating; this worked at first, but it's not working for me anymore.

Getting the "not available for your type of computer" message

Rowpie of The Scottish Bo...
Joined: 20 Dec 08
Posts: 4
Credit: 3,155,051
RAC: 0
Message 4650 - Posted: 21 Dec 2008 | 0:27:54 UTC - in response to Message 4649.

Indeed thanks.

Today was my first attempt at getting my new system online and I was convinced I was doing something wrong; however, it's nice to know it wasn't on my side.

Many thanks for the regular updates and the actual fix itself.
____________

Desti
Joined: 10 Jul 07
Posts: 19
Credit: 1,272,950
RAC: 0
Message 4651 - Posted: 21 Dec 2008 | 0:37:40 UTC

If it is still not working try this:

-detach from gpugrid
-stop boinc
-remove all files with *gpugrid* that were left in the boinc dir
-restart boinc and attach to gpugrid
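
The file-removal step above can be sketched in Python. This is only a minimal sketch under assumptions: the BOINC data directory location is something you pass in yourself, the function names are hypothetical, and it defaults to a dry run that only lists matches so nothing is deleted by accident. It should only be run after detaching and stopping BOINC.

```python
import shutil
from pathlib import Path


def find_leftovers(boinc_dir, pattern="*gpugrid*"):
    """Return every path under boinc_dir whose name matches pattern."""
    return sorted(Path(boinc_dir).rglob(pattern))


def remove_leftovers(boinc_dir, dry_run=True):
    """List (dry_run=True) or delete (dry_run=False) the leftover paths."""
    matches = find_leftovers(boinc_dir)
    for path in matches:
        if dry_run:
            print("would remove:", path)
        elif path.is_dir():
            # A matching directory is removed together with its contents.
            shutil.rmtree(path, ignore_errors=True)
        elif path.exists():
            # Skips files already gone because their parent was removed.
            path.unlink()
    return matches
```

Calling `remove_leftovers(some_dir)` first and reading the list before repeating with `dry_run=False` keeps the step reversible up to the last moment. Note that the match is case-sensitive on Linux.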
____________
Linux Users Everywhere @ BOINC

Profile Nognlite
Joined: 9 Nov 08
Posts: 69
Credit: 25,106,923
RAC: 0
Message 4654 - Posted: 21 Dec 2008 | 3:13:15 UTC
Last modified: 21 Dec 2008 | 3:13:57 UTC

Well, I executed a manual update and got:

12/20/2008 10:04:51 PM|GPUGRID|Sending scheduler request: Requested by user. Requesting 1059370 seconds of work, reporting 1 completed tasks
12/20/2008 10:05:01 PM|GPUGRID|Scheduler request completed: got 0 new tasks
12/20/2008 10:05:01 PM|GPUGRID|Message from server: No work sent
12/20/2008 10:05:01 PM|GPUGRID|Message from server: Full-atom molecular dynamics for Cell processor is not available for your type of computer.
12/20/2008 10:05:01 PM|GPUGRID|Message from server: Full-atom molecular dynamics on Cell processor is not available for your type of computer.
12/20/2008 10:05:01 PM|GPUGRID|Message from server: (won't finish in time) BOINC runs 99.1% of time, computation enabled 100.0% of that
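
For anyone comparing several of these exchanges, the log lines above can be condensed with a few lines of Python. This is a rough sketch that assumes the `time|project|message` layout and the exact wording seen in this thread; the function name and the summary keys are made up for illustration, and other BOINC versions may word the messages differently.

```python
import re

REQUEST_RE = re.compile(
    r"Requesting (\d+) seconds of work, reporting (\d+) completed tasks"
)
GOT_RE = re.compile(r"got (\d+) new tasks")
SERVER_PREFIX = "Message from server: "


def summarize(lines):
    """Summarize one scheduler exchange from 'time|project|message' lines."""
    summary = {"requested_seconds": None, "reported": None,
               "got": None, "server_messages": []}
    for line in lines:
        parts = line.strip().split("|", 2)
        if len(parts) != 3:
            continue  # not a message-log line
        message = parts[2]
        request = REQUEST_RE.search(message)
        got = GOT_RE.search(message)
        if request:
            summary["requested_seconds"] = int(request.group(1))
            summary["reported"] = int(request.group(2))
        elif got:
            summary["got"] = int(got.group(1))
        elif message.startswith(SERVER_PREFIX):
            # Keep only the text the server sent back.
            summary["server_messages"].append(message[len(SERVER_PREFIX):])
    return summary
```

Feeding it the six lines above gives the requested seconds, the number of tasks reported and received, and the list of server messages, which makes a "(won't finish in time)" reply easy to tell apart from a plain "No work sent".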

This on a system with 2 280s. My DCF is still going down, thank my lucky stars.

Just completed a second manual update and got a WU. Glad I didn't wait the 24hrs!!

Going to let it run for 24hrs and see what errors I get.

Pat

samsausage
Joined: 18 Nov 08
Posts: 12
Credit: 70,480,919
RAC: 0
Message 4679 - Posted: 21 Dec 2008 | 15:59:21 UTC

I just got a WU, it's working again for me.

Profile Nognlite
Joined: 9 Nov 08
Posts: 69
Credit: 25,106,923
RAC: 0
Message 4692 - Posted: 21 Dec 2008 | 21:59:55 UTC

Well, I don't know what is going on but here is the outcome:

12/21/2008 4:47:34 PM|GPUGRID|Sending scheduler request: To fetch work. Requesting 981948 seconds of work, reporting 0 completed tasks
12/21/2008 4:47:39 PM|GPUGRID|Scheduler request completed: got 0 new tasks
12/21/2008 4:47:39 PM|GPUGRID|Message from server: No work sent
12/21/2008 4:47:39 PM|GPUGRID|Message from server: Full-atom molecular dynamics for Cell processor is not available for your type of computer.
12/21/2008 4:47:39 PM|GPUGRID|Message from server: Full-atom molecular dynamics on Cell processor is not available for your type of computer.

This after trying a manual update. I at least have two WUs processing, but once again, after they are done, my 280s will sit idle.

Also my DCF is going up on host 16824. It's now at 4.767947. That tells me that my 280s will take 4.7 times the allotted time to complete a WU. How can this be?
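
As a rough illustration of that arithmetic: the duration correction factor (DCF) is a multiplier the client applies to the estimated runtime of a task. The sketch below uses the DCF quoted above, but the 8-hour raw estimate and the 24-hour deadline are invented numbers, the function names are hypothetical, and the real client takes more into account when deciding whether a task can finish in time.

```python
def corrected_estimate(raw_estimate_hours, dcf):
    """Scale a raw runtime estimate by the duration correction factor."""
    return raw_estimate_hours * dcf


def finishes_in_time(raw_estimate_hours, dcf, hours_until_deadline):
    """True if the DCF-corrected estimate still fits before the deadline."""
    return corrected_estimate(raw_estimate_hours, dcf) <= hours_until_deadline


dcf = 4.767947         # value reported for host 16824
raw_hours = 8.0        # hypothetical raw estimate for one WU
deadline_hours = 24.0  # hypothetical time left until the deadline

print(corrected_estimate(raw_hours, dcf))
print(finishes_in_time(raw_hours, dcf, deadline_hours))
```

With these made-up inputs an 8-hour estimate grows to roughly 38 hours, which no longer fits a 24-hour window. An inflated DCF could therefore contribute to a "(won't finish in time)" reply even on fast cards.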

Pat

Profile K1atOdessa
Joined: 25 Feb 08
Posts: 249
Credit: 422,354,314
RAC: 938,897
Message 4696 - Posted: 21 Dec 2008 | 22:40:00 UTC - in response to Message 4641.

Thanks for all your feedback. The BOINC developers finally found the problem in the scheduler. I have already upgraded the server software; please report whether the problem is fixed.

Thanks, gdf.


My computer has been unattended for 48 hours. It just reported 3 WUs and got 4 new ones. Everything is working fine for me.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 4704 - Posted: 21 Dec 2008 | 23:40:07 UTC - in response to Message 4692.
Last modified: 21 Dec 2008 | 23:42:05 UTC

Also my DCF is going up on host 16824. It's now at 4.767947. That tells me that my 280s will take 4.7 times the allotted time to complete a WU. How can this be?


The DCFs are tied to the estimated WU runtimes. I think those are still messed up and are being changed gradually to correct values, as an abrupt change caused other problems.
Edit @nognlite: which BOINC version?

MrS
____________
Scanning for our furry friends since Jan 2002

Profile Nognlite
Joined: 9 Nov 08
Posts: 69
Credit: 25,106,923
RAC: 0
Message 4708 - Posted: 22 Dec 2008 | 0:24:48 UTC - in response to Message 4704.

I'm attempting to use 6.5.0 with 180.48 drivers.

Pat

Alain Maes
Joined: 8 Sep 08
Posts: 63
Credit: 1,664,330,860
RAC: 561,698
Message 4719 - Posted: 22 Dec 2008 | 8:55:27 UTC - in response to Message 4708.

Worked well on 21 Dec until after 2000 UTC, with 4 WUs for me: one running, three waiting.
Then again the dreadful "no work" message, also after a manual update of a finished WU this morning. Down to two WUs now, one to finish soon. Will check what happens then.

Kind regards and happy crunching

Alain

hostID 16551

Alain Maes
Joined: 8 Sep 08
Posts: 63
Credit: 1,664,330,860
RAC: 561,698
Message 4721 - Posted: 22 Dec 2008 | 9:21:55 UTC - in response to Message 4719.

Last WU finished and manual update forced download of master file and three more WUs.

Great!

Alain

Profile mike047
Joined: 21 Dec 08
Posts: 47
Credit: 7,330,049
RAC: 0
Message 4726 - Posted: 22 Dec 2008 | 13:00:12 UTC - in response to Message 4721.
Last modified: 22 Dec 2008 | 13:00:38 UTC

I have reinstalled everything fresh. 6.4.2 and 177.82.

also tried 6.4.5/180.06

I get;
"no work sent"
"Full-atom molecular dynamics for cell processor not available"

I'm not having much luck with this:(

Card is a 9600GSO and Ubuntu 6.04 os.

Have I screwed the pooch??

Any help will be appreciated and attempted when I get back from the doctor for blood work:D

mike

Profile Nognlite
Joined: 9 Nov 08
Posts: 69
Credit: 25,106,923
RAC: 0
Message 4727 - Posted: 22 Dec 2008 | 13:26:20 UTC
Last modified: 22 Dec 2008 | 13:37:35 UTC

Same here again. I'm all for progress but maybe we should go back to one size of WU and put everything back to the way it was before Dec 10th. That's when this all started.

Just noticed that if I shut down BOINCmgr and restart it 20-30sec later I get WUs. There could be a problem with releasing available GPU memory which causes the no-work problem. Wasn't watching, will check next time.

Pat

frankhagen
Joined: 18 Sep 08
Posts: 65
Credit: 3,037,414
RAC: 0
Message 4731 - Posted: 22 Dec 2008 | 16:34:36 UTC - in response to Message 4727.
Last modified: 22 Dec 2008 | 16:46:03 UTC

even after a fresh clean install of 6.5.0 i still can't get work.. :(

tried again and after some dozen unsuccessful attempts i got 4 in a row..

Profile Fish
Joined: 7 Oct 08
Posts: 7
Credit: 2,515,001
RAC: 0
Message 4734 - Posted: 22 Dec 2008 | 16:49:10 UTC - in response to Message 4731.

Frank, have you tried the 180.84 driver yet? It seems to have solved all my 64bit issues. It's only been a day, but no problems since :) I don't know if or how a driver could have anything to do with getting work... but I have both SETI and GPUGrid on my machine.



Fish

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 4736 - Posted: 22 Dec 2008 | 17:24:06 UTC

@Mike: sorry, don't know much about how to handle this in Linux. Could it have anything to do with your distro being rather *old*?

@Pat: the different WU sizes are not causing these problems. And when you close and restart BOINC manager, do you also shut down BOINC? Or just the manager? Seems like you're one of the few who can still reproduce the issue on a regular basis, so you may have the chance to find the cause ;)

MrS
____________
Scanning for our furry friends since Jan 2002

frankhagen
Joined: 18 Sep 08
Posts: 65
Credit: 3,037,414
RAC: 0
Message 4752 - Posted: 22 Dec 2008 | 19:36:21 UTC - in response to Message 4734.

Frank, have you tried the 180.84 driver yet? It seems to have solved all my 64bit issues. It's only been a day, but no problems since :)
Fish


180.84 has been running fine for some days - in the middle of PG-challenge boinc suddenly wasn't able to get fresh work from GPU.

Profile mike047
Joined: 21 Dec 08
Posts: 47
Credit: 7,330,049
RAC: 0
Message 4756 - Posted: 22 Dec 2008 | 20:17:54 UTC - in response to Message 4736.
Last modified: 22 Dec 2008 | 20:22:57 UTC

@Mike: sorry, don't know much about how to handle this in Linux. Could it have anything to do with your distro being rather *old*?

@Pat: the different WU sizes are not causing these problems. And when you close and restart BOINC manager, do you also shut down BOINC? Or just the manager? Seems like you're one of the few who can still reproduce the issue on a regular basis, so you may have the chance to find the cause ;)

MrS


6.04 was a typing mistake on my part; it is actually Ubuntu 8.04 LTS.

Ubuntu 8.04 LTS isn't outdated; there is one newer release, 8.10. Both are reliable OSes.

There is some other issue, because yesterday it worked for a couple of hours.

I guess I will put this project on the back burner for a while. Good science but difficult [for me] to set up and run without babysitting it.

mike

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 4759 - Posted: 22 Dec 2008 | 21:55:50 UTC

You're right, 8.04 is surely not causing this problem.

MrS
____________
Scanning for our furry friends since Jan 2002

Profile Nognlite
Joined: 9 Nov 08
Posts: 69
Credit: 25,106,923
RAC: 0
Message 4762 - Posted: 22 Dec 2008 | 22:06:30 UTC - in response to Message 4736.
Last modified: 22 Dec 2008 | 22:08:57 UTC

MrS:

Seems that both rigs are responding properly (knock on wood) but will keep an eye on it.

When I shut BOINCmgr down I also selected "Stop running science applications when exiting the Manager".

Pat

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Message 4770 - Posted: 22 Dec 2008 | 23:24:43 UTC - in response to Message 4756.

Mike,
have you installed BOINC for Linux x86_64 or 32-bit?

gdf

Profile mike047
Joined: 21 Dec 08
Posts: 47
Credit: 7,330,049
RAC: 0
Message 4782 - Posted: 23 Dec 2008 | 9:34:40 UTC - in response to Message 4770.
Last modified: 23 Dec 2008 | 9:42:12 UTC

64 bit with ia32-libs

mike

Edit: I just tried again and got 2 work units, and it is working on one. I did not change anything... let's see how long it will work.

Profile Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 4810 - Posted: 23 Dec 2008 | 21:19:41 UTC - in response to Message 4782.

Edit: I just tried again and got 2 work units, and it is working on one. I did not change anything... let's see how long it will work.

I just got up a little bit ago and my machine had completed one task and I reported it in and got another. Just like it is supposed to work ... :)

Now, was it a one time miracle or will it repeat?

About half way through the next task and I have two in queue ... so, theory says I should be keeping about that many locally ...

Fingers crossed ...

Profile mike047
Joined: 21 Dec 08
Posts: 47
Credit: 7,330,049
RAC: 0
Message 4820 - Posted: 24 Dec 2008 | 8:52:17 UTC

My two units will run about 8 hours and then go to "waiting on memory". I reboot and then it will pick one back up and crunch????? When it goes to waiting on memory, it does a constant write to the hard drive. I will try the 180.06 drivers later today, or after it finishes these units, and see if it will behave:)

mike

Profile koschi
Joined: 14 Aug 08
Posts: 124
Credit: 792,979,198
RAC: 799
Message 4824 - Posted: 24 Dec 2008 | 12:31:24 UTC

Sounds like the memory leak in the linux app. See also:
http://www.gpugrid.net/forum_thread.php?id=571

Donnie
Joined: 13 Nov 08
Posts: 11
Credit: 11,185,470
RAC: 0
Message 4829 - Posted: 24 Dec 2008 | 17:49:42 UTC - in response to Message 4692.

It's back!!!

12/24/2008 11:47:25 AM|GPUGRID|Sending scheduler request: Requested by user. Requesting 387007 seconds of work, reporting 0 completed tasks
12/24/2008 11:47:30 AM|GPUGRID|Scheduler request completed: got 0 new tasks
12/24/2008 11:47:30 AM|GPUGRID|Message from server: No work sent
12/24/2008 11:47:30 AM|GPUGRID|Message from server: Full-atom molecular dynamics for Cell processor is not available for your type of computer.
12/24/2008 11:47:30 AM|GPUGRID|Message from server: Full-atom molecular dynamics on Cell processor is not available for your type of computer.

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Message 4831 - Posted: 24 Dec 2008 | 19:20:50 UTC - in response to Message 4829.

The server was simply out of work.

More workunits now and many more in the next few days.

gdf

Donnie
Joined: 13 Nov 08
Posts: 11
Credit: 11,185,470
RAC: 0
Message 4834 - Posted: 24 Dec 2008 | 23:40:28 UTC - in response to Message 4831.

My bad!!! Thanks GDF!!! I guess the next time I cry wolf, I'll check the server first. Thanks again to you and all of your staff (if any) for all of your hard & dedicated work to correct these problems and listening to all of us complain.

Profile Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 4836 - Posted: 25 Dec 2008 | 0:03:07 UTC

GDF,

A little cheer maybe. The one machine I am running at the moment seemed to have auto-magically obtained another task on the 24th at about 1400 and my last 2.56 task is nearing completion. So ... encouraging news and I am getting tempted to fire up the other machine and let it rip! :)

But, it is looking like, at least for me, that I am in the "normal" group now (about the only thing normal about me).
