Recent problems for WUs on older GPUs

Message boards : Graphics cards (GPUs) : Recent problems for WUs on older GPUs

uBronan
Joined: 1 Feb 09
Posts: 139
Credit: 575,023
RAC: 0
Message 9906 - Posted: 17 May 2009, 12:48:21 UTC

Sadly I was not paying attention, so the last one errored out again. To be honest, I was expecting it to fail anyway, since I had to restart it three times in a row before it started showing progress.

I am on Win XP Pro with the 182.50 driver and BOINC 6.6.28. For me there was indeed some gain with 185.85, but I just wanted to make sure the drivers aren't the issue.

The newer driver gave slightly faster finishing times: with the old one (182.50) a WU took 20-27 hours, with 185.85 between 19 and 23 hours.
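A rough way to compare the two ranges is to take the midpoint of each. This is only a sketch: treating the midpoint as representative of a typical WU is an assumption, and the ranges are simply the ones reported in this post.

```python
# Rough comparison of the per-WU completion-time ranges reported above (hours).
# Assumption: the midpoint of each range is representative of a typical runtime.

def midpoint(lo: float, hi: float) -> float:
    """Return the midpoint of the closed range [lo, hi]."""
    return (lo + hi) / 2

old_range = (20.0, 27.0)  # driver 182.50
new_range = (19.0, 23.0)  # driver 185.85

old_mid = midpoint(*old_range)
new_mid = midpoint(*new_range)
reduction_pct = (old_mid - new_mid) / old_mid * 100

print(f"182.50 midpoint: {old_mid:.1f} h")
print(f"185.85 midpoint: {new_mid:.1f} h")
print(f"Approximate runtime reduction: {reduction_pct:.1f}%")
```

With these figures the midpoints are 23.5 h and 21.0 h, i.e. roughly a 10% shorter runtime, which fits "a little faster" rather than a dramatic change.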

I have been trying to test the older 180.xx driver, but it made my system unstable for some reason, so I cleared out all the NVIDIA software and reinstalled the 182.50 WHQL version.

I am now going to switch BOINC back to 6.5.0.

[AF>Amis des Lapins]Gilloox
Joined: 21 Mar 08
Posts: 7
Credit: 24,394,688
RAC: 0
Message 9908 - Posted: 17 May 2009, 13:24:06 UTC

Thank you for the link. I opened the web page for my PC: for the GTX 260 in this PC I am running BOINC 6.6.20, which has been satisfactory for me, with NVIDIA driver 182.08 on Win XP Pro x64.


http://www.gpugrid.net/hosts_user.php?userid=1695

On the other hand, in points per 24 hours: 10,000 points on the overclocked GTX 260 :( versus all the GTX 280/285 cards.

See you!

[AF>Amis des Lapins]Gilloox
Joined: 21 Mar 08
Posts: 7
Credit: 24,394,688
RAC: 0
Message 9909 - Posted: 17 May 2009, 13:29:32 UTC

NVIDIA 1xx.xx drivers: http://www.nvidia.fr/Download/Find.aspx?lang=fr

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 9910 - Posted: 17 May 2009, 15:11:06 UTC - in response to Message 9908.  

I am with boinc 6.6.20 who satisfied me


Except for the fact that some of your tasks take longer than they should?

MrS
Scanning for our furry friends since Jan 2002

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 9912 - Posted: 17 May 2009, 15:27:21 UTC - in response to Message 9903.  
Last modified: 17 May 2009, 15:31:13 UTC

I really gotta call that electrician to change my old 3-phase 230 V UPS socket into a 30 A 115 V supply...


Do you think that's a good idea? I don't know your 230 V, but at 115 V power supplies lose efficiency compared to 230 V. 30 A @ 115 V is about 3.5 kW, quite massive :D
I know we can draw at least 2 kW from a regular 230 V socket, whereas I've heard a US outlet may deliver something around 1.5 kW at 110 V. Our 3-phase plugs are 380 V and I think you can get 5-6 kW from them... but you're not talking about those, right?

Yes, it does. The problem is that a 230 V UPS is about twice as expensive as a normal one... the last time I looked, one about the size I would need was about 3K...

The problem is that I can tell I am drawing far too much on the circuits in use... if I add a dedicated line, I can leave some machines on the current room sockets and put the rest on the dedicated line.

The only point of the exercise is to get more power to the room... I think adding new GPUs is pushing me up to the limit again... at least I got rid of the power-hungry systems that were slower than dirt.

In a month or so I will probably get an upgrade card to replace the 9800GT, though I will likely keep the 9800GT in the closet for when I upgrade to a motherboard with more slots and might need a slot filler...
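The figures in the exchange above follow from P = V × I. The sketch below assumes a power factor of 1 (roughly true for power supplies with active PFC) and ignores PSU efficiency; it only shows how the 30 A / 115 V figure is reached and why the same load draws about twice the current at 115 V as at 230 V.

```python
# P = V * I and I = P / V, using the figures quoted in this exchange.
# Assumption: power factor of 1; PSU efficiency losses are not modelled.

def power_watts(volts: float, amps: float) -> float:
    """Real power in watts for a load with power factor 1."""
    return volts * amps

def current_amps(watts: float, volts: float) -> float:
    """Current drawn by a load of `watts` at `volts` (power factor 1)."""
    return watts / volts

dedicated_line = power_watts(115, 30)  # proposed 30 A, 115 V line
print(f"30 A @ 115 V: {dedicated_line / 1000:.2f} kW")

# The same 2 kW load needs about twice the current at 115 V as at 230 V.
for volts in (230, 115):
    print(f"2 kW @ {volts} V: {current_amps(2000, volts):.1f} A")
```

This gives 3.45 kW for the dedicated line (rounded to 3.5 kW in the quote), and roughly 8.7 A versus 17.4 A for a 2 kW load at 230 V and 115 V.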

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 9915 - Posted: 17 May 2009, 17:05:16 UTC - in response to Message 9912.  

OK, apart from the cost, there's nothing to argue against a dedicated line :)

MrS
Scanning for our furry friends since Jan 2002

[AF>Amis des Lapins]Gilloox
Joined: 21 Mar 08
Posts: 7
Credit: 24,394,688
RAC: 0
Message 9918 - Posted: 17 May 2009, 18:20:11 UTC - in response to Message 9910.  
Last modified: 17 May 2009, 18:57:45 UTC

Yes, really: 84000 s instead of 42000 s for 14-KASHIF_HIVPR_dim_ba3-8-100-RND7871_1.

http://www.gpugrid.net/result.php?Resultid=680472

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 9921 - Posted: 17 May 2009, 19:29:15 UTC - in response to Message 9918.  

OK, to put it more clearly: you don't like the long runtime, but you say 6.6.18/20 satisfied you. The post I linked to says that the long runtime is caused by a bug in 6.6.20 and some earlier clients. So something doesn't add up, and you may want to upgrade or downgrade ;)

MrS
Scanning for our furry friends since Jan 2002

[AF>Amis des Lapins]Gilloox
Joined: 21 Mar 08
Posts: 7
Credit: 24,394,688
RAC: 0
Message 9928 - Posted: 17 May 2009, 20:45:20 UTC
Last modified: 17 May 2009, 20:50:33 UTC

I have switched to BOINC 6.6.28. It is possible that SETI Beta is responsible for this problem. Thanks for your help; I will keep PS3GRID posted on what happens next.

Aardvark
Joined: 27 Nov 08
Posts: 28
Credit: 82,362,324
RAC: 0
Message 9932 - Posted: 17 May 2009, 22:47:37 UTC - in response to Message 9921.  

I rolled back my drivers from 185.85 to 182.50, on Windows Vista 64-bit with BOINC client 6.6.28. Since then I have returned three successful results, one of which had run for 30 hours on one core of my 9800 GX2 and gave me just over 10,000 credits :-)
So at present this driver rollback is working for me (touch wood).

I also rolled back the driver on my other machine from 185.85 to 180.48, on Windows Vista 32-bit with BOINC client 6.6.20 (yes, I know :-) ). This has so far returned one result, with another well on its way. I realise that neither of these is a large sample, but it looks promising given the number of failures I had seen just before changing drivers.

I will now leave things alone for a few days and see how they turn out.

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 9933 - Posted: 18 May 2009, 1:03:27 UTC

I am finding it hard to tell what is going on... I seem to be getting tasks out of order, so they don't sort well on the results pages. As I watch the computers, they seem to be returning mostly good results... with occasional errors.

Well, I guess I will have to wait till Monday when the staff comes back in and fixes the universe ... :)

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 9937 - Posted: 18 May 2009, 8:46:07 UTC

I still cannot make heads or tails of the pattern of errors. One of the problems, of course, is how difficult it is to gather data about the failures.

Some of the older tasks that failed on one of my systems passed on another, very similar system. I thought I was onto something about memory size, since some of my cards have 895 MB instead of 1 GB and the tasks passed on the 1 GB cards. Alas, I quickly found another case where a task failed on mine and passed on someone else's card, and they too had only 895 MB of VRAM.

My systems running driver 182.50 failed, but the systems where the task passed were also running the same version.

The tasks are of all name classes...

Even my i7 with the pair of GTX 295 cards finally had one failure (http://www.gpugrid.net/result.php?resultid=685755), and the error message is singularly unhelpful.

MarkJ
Volunteer moderator
Volunteer tester
Joined: 24 Dec 08
Posts: 738
Credit: 200,909,904
RAC: 0
Message 9938 - Posted: 18 May 2009, 8:55:14 UTC

I upgraded just the drivers on all my machines to 185.85. A couple of machines started getting errors.

Interestingly, SETI doesn't get errors with its app. However, as I'm using an app_info for SETI, I dropped in the latest DLLs. It may just be that their app is more compatible, or maybe it is the combination of the current driver with the CUDA 2.2 DLLs that makes it work.

Has anyone tried updating the DLLs to see if that cures the problem? The only way I can see to do this is to set up an app_info so that you don't run into issues with the file signatures.

I'll downgrade Maul (it has 2 x GTX 260s) to 182.50 once it has finished its current CUDA work. At least it can get back to being productive while this issue gets worked out. My other machines can concentrate on SETI for a while.
BOINC blog

uBronan
Joined: 1 Feb 09
Posts: 139
Credit: 575,023
RAC: 0
Message 9942 - Posted: 18 May 2009, 13:19:05 UTC - in response to Message 9928.  
Last modified: 18 May 2009, 13:27:35 UTC

I am crossed has 6.6.28 boinc It is possible that Seti beta is responsble of this probleme. Thanks for your help, I keep posted PS3GRID about suite.


Believe me, you don't want to run SETI Beta together with other projects.

SETI itself has also been crashing my GPUGRID units, though sometimes they ran without problems. If you don't use the optimized applications, SETI seems to use only CUDA 1.0 instructions, with no optimisations.
The optimized KWSN application has caused me failures on GPUGRID as well.

But that's probably because SETI was running at the same time as GPUGRID while I have only one CUDA device.

I advise you not to run SETI and GPUGRID at the same time; in my experience it has crashed many units.
Although sometimes it looks like nothing is wrong, I found that some units keep their memory locked, so when they finish the RAM is not released properly, causing other projects (GPUGRID) to error out.

Another project that can give you problems together with GPUGRID is CPDN, which has units that eat at least 1.5 GB of memory each. For me, 4 units at a minimum of 1.5 GB each gave a load of 7.2 GB of RAM in use :D
Now believe me, that causes trouble. If I had booted into 64-bit Windows I probably could have run them, since I have 8 GB of memory.
But since I run 32-bit Windows, it only uses 3.2 GB.

Has anyone tried to use updated DLLs?


Believe me, I tried all combinations of drivers, BOINC and CUDA versions.

Every time the result is the same: in the end some units simply crash. Even when I babysit them, they seem to know when I am busy with other tasks and crash ;)
So it looks to me like if a unit gets stuck, it will die unless you pause and restart it in time.
By that I mean: the unit is stuck at x.xxx % for at least an hour. If the % does not move, you can try the pause/restart trick, but some units will still crash no matter what I do.
Make sure not to restart it again too quickly after it has started back up, because that will surely crash it too!!
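The manual rule of thumb above (treat a unit as stuck if its percentage has not moved for about an hour, and do not restart it again too soon) can be written down as a small check. This is a hypothetical sketch: it is not part of BOINC, the `Sample` type, `is_stalled` and both thresholds are assumptions, and it assumes you already have timestamped progress readings from somewhere.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical thresholds based on the rule of thumb above.
STALL_SECONDS = 3600.0   # progress unchanged for at least an hour
RESTART_GRACE = 600.0    # assumption: leave a resumed task alone for 10 minutes


@dataclass
class Sample:
    t: float         # seconds since an arbitrary reference point
    progress: float  # fraction done, 0.0 to 1.0


def is_stalled(samples: list[Sample], last_resume_t: float,
               stall_seconds: float = STALL_SECONDS,
               grace: float = RESTART_GRACE) -> bool:
    """Return True if progress has not advanced for at least `stall_seconds`
    and the task was not resumed within the last `grace` seconds."""
    if not samples:
        return False
    now = samples[-1].t
    if now - last_resume_t < grace:
        return False  # too soon after a restart; restarting again may kill it
    current = samples[-1].progress
    # Walk back to the earliest sample in the trailing run at the current value.
    stalled_since = now
    for s in reversed(samples):
        if s.progress < current:
            break
        stalled_since = s.t
    return now - stalled_since >= stall_seconds


if __name__ == "__main__":
    flat = [Sample(t, 0.423) for t in range(0, 4201, 600)]  # 70 min, no movement
    moving = flat[:-1] + [Sample(4200, 0.424)]               # last reading moved
    print(is_stalled(flat, last_resume_t=-10_000))
    print(is_stalled(moving, last_resume_t=-10_000))
```

The stall time is measured from the first reading at the current value, so it slightly underestimates how long the unit has really been stuck; that errs on the side of not restarting too early.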

GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1958
Credit: 629,356
RAC: 0
Message 9958 - Posted: 18 May 2009, 22:37:42 UTC - in response to Message 9942.  

We are running a set of workunits called

x-GIANNI_newFB-...

If they run OK, then we have isolated the problem with G90 chips. It is not solved yet, but at least we would know where to look.

gdf

Zydor
Joined: 8 Feb 09
Posts: 252
Credit: 1,309,451
RAC: 0
Message 9961 - Posted: 19 May 2009, 9:19:15 UTC - in response to Message 9942.  

The CPDN memory limit of 1.5 GB is set that way to allow four to run on a quad, with enough left over for the OS etc. within the quoted figure of 1.5 GB. Each of the larger CPDN WUs takes up 210-220 MB in memory, so four of them will eat around 850 MB, leaving a comfortable margin for the OS etc. within the stated 1.5 GB. It's not 1.5 GB per WU: the figure they state is advisory and refers to total memory on the PC, not per WU.

Most CPDN models are much smaller, either side of 100 MB, albeit on the large side compared with most BOINC WUs. I have happily run four of the biggest ones on my quad without issues while running GPUGRID on the 9800GTX+. Usually I have two of the bigger CPDN ones running with two SETI Astropulse tasks on the quad and GPUGRID on the 9800GTX+, and they run fine with no issues.
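The disagreement about CPDN memory comes down to whether 1.5 GB is a per-PC total or a per-WU figure. A minimal sketch of both readings, using only figures quoted in this thread; the helper name is hypothetical and taking 1 GB as 1024 MB is an assumption.

```python
from __future__ import annotations

# Quick memory-budget check for tasks running at the same time.
# Figures come from this thread; 1 GB is taken as 1024 MB (an assumption).

def total_and_fits(task_mb: list[float], budget_mb: float) -> tuple[float, bool]:
    """Return the combined footprint of the tasks and whether it fits the budget."""
    total = sum(task_mb)
    return total, total <= budget_mb

# Reading 1: four large CPDN WUs at ~215 MB each (midpoint of 210-220 MB),
# checked against 1.5 GB as a whole-PC figure.
per_pc = total_and_fits([215.0] * 4, 1.5 * 1024)

# Reading 2: 1.5 GB per WU, checked against the ~3.2 GB usable under
# 32-bit Windows mentioned earlier in the thread.
per_wu = total_and_fits([1.5 * 1024] * 4, 3.2 * 1024)

print(f"Per-PC reading: {per_pc[0]:.0f} MB, fits: {per_pc[1]}")
print(f"Per-WU reading: {per_wu[0]:.0f} MB, fits: {per_wu[1]}")
```

Under the per-PC reading four large WUs total about 860 MB and fit easily; under the per-WU reading they would need about 6 GB, which cannot fit on a 32-bit Windows install.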

Regards
Zy

GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1958
Credit: 629,356
RAC: 0
Message 9964 - Posted: 19 May 2009, 11:24:57 UTC - in response to Message 9961.  

We should be able to test a fix by tomorrow.
It's a test, as the problem is not completely understood.

gdf

uBronan
Joined: 1 Feb 09
Posts: 139
Credit: 575,023
RAC: 0
Message 9967 - Posted: 19 May 2009, 13:20:26 UTC
Last modified: 19 May 2009, 13:24:15 UTC

Zydor, did you actually select the big units on your account page? By default they are not sent, and you really should read what is stated there.
Those units need at least 1.5 GB of memory, nothing less; the warning is clear >.<

Sadly I forgot to take a screenshot when I was running 4 of those biggest units at the same time.

Of course, I think the chance that you get one, let alone more than two, of those big ones is very small.

I have not seen any of those big units recently, so it could have been a freak moment that I received 4 of them at once. It could also be that these huge units are only sent to x64 machines; I have not been following any news about them.

The only thing I can say about you running SETI and GPUGRID together is that in my case it ended up several times crashing my PC or the unit, but again, I was having more problems with SETI Beta than with regular SETI.

Just because it does not happen to you does not mean other people will be so lucky that everything goes well.

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 9969 - Posted: 19 May 2009, 15:10:24 UTC

I just had a case where a suspended CPDN task caused two GPUGRID tasks to go into the "waiting for memory" state. I had to stop BOINC and restart it so that the CPDN task (only 300K) would be swapped out...

As usual I reported it, so that there is another bug for UCB to ignore... :)

Zydor
Joined: 8 Feb 09
Posts: 252
Credit: 1,309,451
RAC: 0
Message 9981 - Posted: 19 May 2009, 21:32:48 UTC - in response to Message 9967.  
Last modified: 19 May 2009, 21:36:08 UTC

Task Manager reported the larger ones taking up 220 MB minimum and going to 400 MB at times, and four did run fine. You can get four by setting your preferences to accept only those units.

I thought the same as you regarding the 1.5 GB, but I also thought it strange that they would produce one that size even these days; it would cripple many PCs, which is not a good thing for general release.

I therefore checked with CPDN, and the response was that they take up 500 MB at most, and four of them would fit on a PC with 3 GB with no issues. When I ran the four, Task Manager reported 220-400 MB each in use; it may well have gone to 500 MB when I was not watching, but I didn't log it. The post and response are:

http://climateprediction.net/board/viewtopic.php?f=21&t=8675

I had no doubt you had issues running the two. It's also the case that reports of successful combinations can provide as much debugging information as failures, since a point of comparison can help isolate an issue. I've often cursed when something hasn't worked, then scratched my head when I discovered others were having some success, and that helped me.

Regards
Zy

©2025 Universitat Pompeu Fabra