Recent problems for WUs on older GPUs
| Author | Message |
|---|---|
|
Joined: 1 Feb 09 Posts: 139 Credit: 575,023 RAC: 0
|
Sadly I was not paying attention, so the last one errored out again, but to be honest I was expecting it to fail anyway, since I had to restart it 3 times in a row before I started seeing progress. I am on Win XP Pro with the 182.50 driver and BOINC 6.6.28. For me there was indeed some gain with 185.85, but I just wanted to make sure the drivers aren't the issue. The newer driver gave slightly faster finishing times: 20 - 27 hours with the old driver versus 19 - 23 hours with 185.85. I have been trying to test the older 180.XX driver, but it made my system unstable for some reason, so I cleared out all the NVIDIA stuff and reinstalled the 182.50 WHQL version. I am now going to change BOINC back to 6.5.0 |
[AF>Amis des Lapins]Gilloox Joined: 21 Mar 08 Posts: 7 Credit: 24,394,688 RAC: 0
|
Thank you for the link. I have opened up the web page for my PCs. On the PC with the GTX 260 GPU I am on BOINC 6.6.20, which satisfies me, and NVIDIA 182.08 on Win XP Pro x64. http://www.gpugrid.net/hosts_user.php?userid=1695 On the other hand, as for points over 24 hours: 10,000 points on the overclocked GTX 260 :( all GPU 280/285GTX. See you later |
[AF>Amis des Lapins]Gilloox Joined: 21 Mar 08 Posts: 7 Credit: 24,394,688 RAC: 0
|
NVIDIA 1XX.XX drivers: http://www.nvidia.fr/Download/Find.aspx?lang=fr |
|
Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 |
"I am on BOINC 6.6.20, which satisfies me" Except for the fact that some of your tasks take longer than they should? MrS Scanning for our furry friends since Jan 2002 |
Paul D. Buck Joined: 9 Jun 08 Posts: 1050 Credit: 37,321,185 RAC: 0
|
I really gotta call that electrician to change my old 3 phase 230 V UPS socket into a 30A 115 V supply... Yes it does, the problem is that a 230V UPS is about twice as expensive as a normal one ... the last time I looked, one about the size I would need was about 3K ... The problem is that I can tell that I am pulling way high on the circuits in use ... if I change to another dedicated line, well, then I can leave some on the current room sockets and the rest on the dedicated line. The only point of the exercise is to get more power to the room ... I think adding new GPUs is pushing me up to the line again ... at least I got rid of the power hungry systems that were slower than dirt. In a month or so I will likely get an upgrade card to replace the 9800GT, though I will likely keep it in the closet for the time when I upgrade to a wider MB and might need a slot filler ... |
|
Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 |
OK, except for cost there's nothing to argue against a dedicated line :) MrS Scanning for our furry friends since Jan 2002 |
[AF>Amis des Lapins]Gilloox Joined: 21 Mar 08 Posts: 7 Credit: 24,394,688 RAC: 0
|
Yes, really: 84000 s instead of 42000 s for 14-KASHIF_HIVPR_dim_ba3-8-100-RND7871_1 http://www.gpugrid.net/result.php?Resultid=680472 |
|
Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 |
OK, to put it more clearly: you don't like the long runtime, but you say 6.6.18/20 satisfied you. The post I linked to says that the long runtime is caused by an error in 6.6.20 and some previous clients. So something doesn't add up and you may want to up- or downgrade ;) MrS Scanning for our furry friends since Jan 2002 |
[AF>Amis des Lapins]Gilloox Joined: 21 Mar 08 Posts: 7 Credit: 24,394,688 RAC: 0
|
I have switched to BOINC 6.6.28. It is possible that SETI Beta is responsible for this problem. Thanks for your help; I will keep PS3GRID posted on what follows. |
Aardvark Joined: 27 Nov 08 Posts: 28 Credit: 82,362,324 RAC: 0
|
I rolled back my drivers from 185.85 to 182.50, with Windows Vista 64 bit and BOINC client 6.6.28. Since then I have returned three successful results, one of which had run for 30 hours on one core of my 9800 GX2 and gave me just over 10,000 credits :-) So at present this rollback of the driver is working for me (touch wood). I also rolled back the driver on my other machine from 185.85 to 180.48, with Windows Vista 32 bit and BOINC client 6.6.20 (Yes, I know :-) ). This has so far returned one result, plus another well on its way. I realise that neither of these is a large sample, but it looks promising given the quantity of failures I had seen just prior to changing drivers. I will now leave things alone for a few days and see how they turn out. |
Paul D. Buck Joined: 9 Jun 08 Posts: 1050 Credit: 37,321,185 RAC: 0
|
I am finding it hard to tell what is going on... I seem to be getting tasks out of order so that they don't sort well on the results pages. As I watch the computers they seem to be returning mostly good results ... with occasional errors. Well, I guess I will have to wait till Monday when the staff comes back in and fixes the universe ... :) |
Paul D. Buck Joined: 9 Jun 08 Posts: 1050 Credit: 37,321,185 RAC: 0
|
I still cannot make heads nor tails of the pattern of errors. One of the problems of course is the difficulty of gathering data about the failures. Some of the older tasks that failed on one of my systems passed on another system that is very much alike. I thought I was onto something about memory size, where some of my cards report 895 MB instead of 1 GB and the tasks passed on the 1 GB cards. Alas, I quickly found another case where a task failed on mine and passed on someone else's card, and they too had only 895 MB of VRAM. Driver version 182.50 on my systems failed, but the systems where the task passed were also running the same version. The tasks are of all name classes... Even my i7 with the pair of GTX295 cards finally had one fail (http://www.gpugrid.net/result.php?resultid=685755); the message is singularly unhelpful. |
|
Joined: 24 Dec 08 Posts: 738 Credit: 200,909,904 RAC: 0 |
I upgraded just the drivers on all my machines to 185.85, and a couple of machines started getting errors. Interestingly, SETI doesn't get errors with their app. However, as I'm using an app_info for them, I dropped in the latest DLLs. It may just be that their app is more compatible, or maybe it is the combination of the current driver with the CUDA 2.2 DLLs that makes it work. Has anyone tried updating the DLLs to see if that cures the problem? The only way I could see to do this is to set up an app_info so that you don't get issues with the file signatures. I'll downgrade Maul (it has 2 x GTX260s) to 182.50 once it's knocked over its current CUDA work. At least it can get back to being productive while this issue gets worked out. My other machines can concentrate on SETI for a while. BOINC blog |
|
Joined: 1 Feb 09 Posts: 139 Credit: 575,023 RAC: 0
|
"I have switched to BOINC 6.6.28. It is possible that SETI Beta is responsible for this problem. Thanks for your help; I will keep PS3GRID posted on what follows." Believe me, you don't want to run SETI Beta together with other projects. SETI itself has been crashing my GPUGRID units too, though sometimes it ran without problems. SETI seems to use only CUDA 1.0 instructions with no optimisations unless you use the optimized apps. The optimized KWSN application has caused me failures on GPUGRID as well, but that's probably because SETI was running at the same time as GPUGRID while I have only 1 CUDA device. I advise you not to run SETI and GPUGRID at the same time; in my experience it has crashed many units. Although sometimes it looks like nothing is wrong, I found that some units keep the memory locked, so when they finish the RAM is not released properly, causing other projects (GPUGRID) to error out. Another one which is going to give you problems together with GPUGRID can be CPDN, which has units that eat at least 1.5 GB of memory each, so for me 4 units at 1.5 GB minimum meant a load of 7.2 GB of RAM in use :D Now believe me, that makes trouble. If I had booted under 64-bit Windows I probably could have run them, since I have 8 GB of memory, but since I run 32-bit Windows it only uses 3.2 GB. "Has anyone tried updating the DLLs?" Believe me, I tried all combinations of drivers, BOINC and CUDA versions. Every time the result is the same in the end: some units simply crash. Even when I babysit them, they seem to know when I am busy with other tasks and crash ;) So it looks to me like a unit that gets locked will die if you are not in time to pause and restart it. By that I mean: the unit is stuck at x.xxx % for at least an hour. If the % does not move you can try the pause/restart trick, but some units will still crash no matter what I do. Also make sure not to restart it too quickly after it has started again, because that will surely crash it too !! |
GDF Joined: 14 Mar 07 Posts: 1958 Credit: 629,356 RAC: 0 |
We are running this set of workunits called x-GIANNI_newFB-... If they run OK, then we have isolated the problem with G90 chips. It is not solved yet, but at least we would know where to look. gdf |
Zydor Joined: 8 Feb 09 Posts: 252 Credit: 1,309,451 RAC: 0
|
The CPDN memory limit of 1.5 GB is set that way to allow for four tasks running on a quad, with enough left over for the operating system etc. within the quoted figure of 1.5 GB. Each of the larger CPDN WUs takes up 210-220 MB in memory, therefore four of them will eat around 850 MB, with a comfortable margin for the operating system etc. within the stated 1.5 GB. It's not 1.5 GB per WU; that figure, which they state as advisory, is total memory on the PC, not per WU. Most CPDN models are much smaller, either side of 100 MB, albeit on the larger side compared with most BOINC WUs. I have happily run four of the biggest ones on my quad without issues, with GPUGRID on the 9800GTX+. Usually I have two of the bigger CPDN ones running with two SETI Astropulse on the quad and GPUGRID on the 9800GTX+, and they run fine with no issues. Regards Zy |
GDF Joined: 14 Mar 07 Posts: 1958 Credit: 629,356 RAC: 0 |
We should be able to test a fix by tomorrow. It's a test, as the problem is not completely understood. gdf |
|
Joined: 1 Feb 09 Posts: 139 Credit: 575,023 RAC: 0
|
Zydor, did you actually select the big units on your account page? By default they are not loaded; you really should read what is stated there. Those units need a minimum of 1.5 GB of memory, nothing else; the warning is clear >.< Sadly I forgot to take a screenshot when I was running 4 of those biggest units at the same time. Of course I think the chance that you get 1, or even more than 2, of those big ones is very small. I have not seen any of those big units recently, so it could have been a freak moment that I received 4 of them at once. It could also be that these huge units are only sent to x64 machines; I have not been following any news about them. The only thing I can say about you running SETI and GPUGRID together is that in my case it ended up several times crashing my PC or the unit, but again I was having more problems with SETI Beta than with normal SETI. That it does not happen to you does not mean other people will be so lucky that all goes well. |
Paul D. Buck Joined: 9 Jun 08 Posts: 1050 Credit: 37,321,185 RAC: 0
|
I just had a case where a suspended CPDN task caused two GPUGRID tasks to go into the "waiting for memory" state. I had to stop BOINC and restart it so that the CPDN task (only 300K) would be swapped out ... As usual I reported it so that there is another bug for UCB to ignore ... :) |
Zydor Joined: 8 Feb 09 Posts: 252 Credit: 1,309,451 RAC: 0
|
Task Manager reported the larger ones taking up 220 MB minimum, going up to 400 MB at times, and four did run fine. You can get four by setting preferences for only those units. I thought the same as you regarding the 1.5 GB, but also thought it strange they would produce one that size even these days; it would cripple many PCs, not a good thing for general release. I therefore checked it out with CPDN. The response was that they take up 500 MB at the most, and four of them would fit on a PC with 3 GB with no issues. When I ran the four, Task Manager reported between 220 and 400 MB in use each; it may well have gone to 500 MB when I was not watching, I didn't log it. The post and response are at: http://climateprediction.net/board/viewtopic.php?f=21&t=8675 I had no doubts you had issues running the two. It's also the case that reports of combinations that work can often produce as much info to help debug as failures, since a comparator can help isolate an issue. I've often cursed when something hasn't worked, then scratched my head when I discovered others were having some success, which helped me. Regards Zy |
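
The two readings of the CPDN memory figure argued over in this thread come down to simple arithmetic, which can be sketched in a few lines of Python. This is only an illustration, not something CPDN or BOINC provides: the helper names `total_mb` and `fits_in_ram` are hypothetical, the per-task figures are the ones quoted by the posters, and 1 GB is assumed to mean 1024 MB.

```python
# Rough memory-budget check for running several CPDN tasks at once.
# Helper names are hypothetical; figures are those quoted in the thread.
# Assumption: 1 GB = 1024 MB.

def total_mb(tasks: int, per_task_mb: int) -> int:
    """Total memory, in MB, used by `tasks` tasks of `per_task_mb` each."""
    return tasks * per_task_mb


def fits_in_ram(tasks: int, per_task_mb: int, usable_ram_mb: int) -> bool:
    """True if the tasks alone fit in the usable RAM (no allowance for the OS)."""
    return total_mb(tasks, per_task_mb) <= usable_ram_mb


# Reading 1: 210-220 MB typical and 500 MB at most per task.
typical = total_mb(4, 220)                # 4 * 220 MB
worst_case = total_mb(4, 500)             # 4 * 500 MB
fits_3gb = fits_in_ram(4, 500, 3 * 1024)  # four worst-case tasks on a 3 GB PC

# Reading 2: at least 1.5 GB per task, on 32-bit Windows with about 3.2 GB usable.
per_task_big = int(1.5 * 1024)
usable_32bit = int(3.2 * 1024)
fits_32bit = fits_in_ram(4, per_task_big, usable_32bit)

print(typical, worst_case, fits_3gb, fits_32bit)
```

On these numbers, four tasks at the 500 MB ceiling would fit on a 3 GB machine, while four tasks at 1.5 GB each would not fit in the roughly 3.2 GB that a 32-bit Windows install exposes. The check ignores the operating system and other running projects, so it is a lower bound on what is really needed.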
©2025 Universitat Pompeu Fabra