More bad WUs? ------ KASHIF_HIVPR_auto_spawn

Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Message 17094 - Posted: 17 May 2010, 18:17:43 UTC

Seems like many if not most of the KASHIF_HIVPR_auto_spawn WUs are failing :-(

Betting Slip
Joined: 5 Jan 09
Posts: 670
Credit: 2,498,095,550
RAC: 0
Message 17095 - Posted: 17 May 2010, 18:54:44 UTC - in response to Message 17094.  

Seems like many if not most of the KASHIF_HIVPR_auto_spawn WUs are failing :-(


Would you like to point to a few to back up that statement?


Radio Caroline, the world's most famous offshore pirate radio station.
Great music since April 1964. Support Radio Caroline Team -
Radio Caroline

Beyond
Message 17096 - Posted: 17 May 2010, 19:20:06 UTC - in response to Message 17095.  

Seems like many if not most of the KASHIF_HIVPR_auto_spawn WUs are failing :-(

Would you like to point to a few to back up that statement?

Looks like they were just released today, but here's some of the results I've found so far:

http://www.gpugrid.net/workunit.php?wuid=1483784
http://www.gpugrid.net/workunit.php?wuid=1483852
http://www.gpugrid.net/workunit.php?wuid=1483846
http://www.gpugrid.net/workunit.php?wuid=1483936
http://www.gpugrid.net/workunit.php?wuid=1483879
http://www.gpugrid.net/workunit.php?wuid=1483947
http://www.gpugrid.net/workunit.php?wuid=1483953
http://www.gpugrid.net/workunit.php?wuid=1483863
http://www.gpugrid.net/workunit.php?wuid=1483862
http://www.gpugrid.net/workunit.php?wuid=1483861
http://www.gpugrid.net/workunit.php?wuid=1483787
http://www.gpugrid.net/workunit.php?wuid=1483792
http://www.gpugrid.net/workunit.php?wuid=1483799
http://www.gpugrid.net/workunit.php?wuid=1483805

Seems like whenever someone reports a problem on this forum, people get all defensive. These bad WUs are simple to find if you take 5 minutes to look. They're so new that most of them have only one failure so far, but I've only found 2 that completed. I'd post a lot more but it's such a pain to add URLs on this forum...


Betting Slip
Message 17097 - Posted: 17 May 2010, 19:40:30 UTC - in response to Message 17096.  
Last modified: 17 May 2010, 19:42:57 UTC

Thanks for that. They all appear to have failed within a few seconds, and I've got one on one of my machines that has been running for over 3 hours now, so we'll see what happens. It had been sent to someone else and failed within a few seconds there as well.

Here

Beyond
Message 17098 - Posted: 17 May 2010, 19:51:13 UTC - in response to Message 17097.  

Could be that it's a big coincidence but thought I'd report it since pretty much all I was finding was failed ones.

Betting Slip
Message 17099 - Posted: 17 May 2010, 19:53:02 UTC - in response to Message 17096.  

Just had a quick look at 6 of those units you posted and the machines that errored are not great examples of anything other than they produce a lot of errored WU's of all types.
Couldn't be bothered to look through the whole list.



Beyond
Message 17100 - Posted: 17 May 2010, 20:02:35 UTC

The reason I noticed it at all was that the one that failed for me was on a GPU that seldom ever fails, so I started looking at the results from the top RAC machines. BTW, the above comment about "defensiveness" was not aimed at you :-)

Betting Slip
Message 17101 - Posted: 17 May 2010, 20:12:47 UTC - in response to Message 17100.  

It's okay, it didn't bother me as I'm the last person to defend a project if they've got it wrong. :)

I'll let you know if I have any problems with these WU's.



skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 17102 - Posted: 17 May 2010, 22:30:01 UTC - in response to Message 17101.  

I for one have had a nightmare when it comes to 6.72 WU failures.
We eventually got it sorted, well sort of - the 197.xx drivers were to blame on my Vista x64 machine with four GT240s, but I tried so many different drivers on my GTX260sp216 that in the end I just gave up.
It is on Win7 x64 - the same as your two GTX275's!
My GTX260 is a good card, and kept failing immediately, even when natively clocked and with temps under 60 deg C. I see you dropped your clocks as well, just in case.

I guess you are having a similar problem.

Beyond
Message 17104 - Posted: 17 May 2010, 22:56:32 UTC - in response to Message 17102.  

I for one have had a nightmare when it comes to 6.72 WU failures.
We eventually got it sorted, well sort of - the 197.xx drivers were to blame on my Vista x64 machine with four GT240s, but I tried so many different drivers on my GTX260sp216 that in the end I just gave up.
It is on Win7 x64 - the same as your two GTX275's!
My GTX260 is a good card, and kept failing immediately, even when natively clocked and with temps under 60 deg C. I see you dropped your clocks as well, just in case.

I guess you are having a similar problem.

Neither of us has a GTX 275, let alone 2. But if you'd like to send me a couple I'd be happy to accept them as a gift :-)

Betting Slip
Message 17105 - Posted: 17 May 2010, 23:48:11 UTC - in response to Message 17102.  

My 2 GT240's are working fine with that setup, SK, but my 2 remote machines (Core 2 E4300 and Dual Core E6300) are giving me problems. Why is it always remote machines you have problems with?

You started OK with your 4 GT240's, so I don't know what the problem could be. I have mine clocked at Core 640, Memory 2000, and Shaders 1580 for my 2 GT240 GDDR5 on my Quad and Core 630, Memory 840, and Shaders 1580 on my GDDR3 machines. All clocks are as GPU-Z shows them.


Betting Slip
Message 17109 - Posted: 18 May 2010, 6:47:04 UTC

1 success and 1 failure to date for these units "KASHIF_HIVPR_auto_spawn"

Fail

Success



Snow Crash
Joined: 4 Apr 09
Posts: 450
Credit: 539,316,349
RAC: 0
Message 17112 - Posted: 18 May 2010, 8:47:21 UTC

My really stable 295 failed 12 of these with the following error:

ERROR: file ntnbrlist.cpp line 63: Insufficent memory available for pairlists. Set pairlistdist to match the cutoff


It also failed 2 TONI (also 6.72) with an error:
SWAN: FATAL : swanMalloc failed

I am now working on 2 more TONIs which are OK after 2+ hours so I think they will be fine.


Thanks - Steve

ftpd
Joined: 6 Jun 08
Posts: 152
Credit: 328,250,382
RAC: 0
Message 17113 - Posted: 18 May 2010, 8:55:00 UTC

Task 2347063 OK with GTX470.
Ton (ftpd) Netherlands

skgiven
Message 17116 - Posted: 18 May 2010, 9:59:44 UTC - in response to Message 17104.  
Last modified: 18 May 2010, 10:18:58 UTC

Beyond, I meant GTX260. On which we were/are both having problems with 6.72 tasks under Win7.
I tried several drivers and clients but the 6.72 WUs still failed (running native).

KASHIF_HIVPR WUs are also failing for me on my GT240s. This is after going back to the 196.21 driver (which is still working for all TONI_HERG 6.72 WUs).

Beyond
Message 17123 - Posted: 18 May 2010, 15:18:30 UTC - in response to Message 17116.  

Beyond, I meant GTX260. On which we were/are both having problems with 6.72 tasks under Win7.
I tried several drivers and clients but the 6.72 WUs still failed (running native).

Actually if you look at that machine, it really has a GTX 260 and a GT 240. For some reason the BOINC server code lists 2 GTX 260s. I reported the problem on the BOINC Dev list. It's running XP64 (not Win7) and NV 197.45. Also it's not having problems with v6.72 WUs. There have been 86 successful, 2 failures on the GT 240 and 2 on the GTX 260. Those were my fault as I was experimenting with higher shader clocks. For me the v6.72 WUs have been FAR more reliable than v6.03. I've been following your messages about various drivers/OSes/BOINC vers with the GT 240. I have 3 of the GDDR5 models running for a long time and have had no issues with any of them with any BOINC version. They've run on XP32, XP64, Win7-32 and Win7-64 machines at various times with no problems. They've run on NV v195.62, 196.21 and v197.45 with no problems. They've run on a large variety of BOINC clients from 6.10.18 up to v6.10.45 with no problems. Will be going to v6.10.56 today on some. Have you tried pulling one of your GT 240 cards from that 4x machine yet?

KASHIF_HIVPR WUs are also failing for me on my GT240s. This is after going back to the 196.21 driver (which is still working for all TONI_HERG 6.72 WUs).

I notice that most of the KASHIF_HIVPR_auto_spawn WUs that I listed above are now validating with subsequent machines so I probably jumped the gun by posting this thread. Like I said before, trying to be proactive. Seems they run fine on some machines, not so fine on others.
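
For anyone who wants to put a number on the v6.72 reliability figures above (86 successful, 2 failures on the GT 240 and 2 on the GTX 260), here is a minimal Python sketch. The function name `failure_rate` is made up for illustration, and it assumes the 86 successes do not include the 4 failures:

```python
def failure_rate(successes, failures):
    """Return the fraction of tasks that failed."""
    total = successes + failures
    if total == 0:
        raise ValueError("no tasks recorded")
    return failures / total


# Figures quoted in the post: 86 successes, 2 + 2 failures.
rate = failure_rate(86, 2 + 2)
print(f"{rate:.1%}")  # prints "4.4%"
```

That works out to 4 failures in 90 tasks, and all 4 were attributed to experimenting with shader clocks rather than to the WUs themselves.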

liveonc
Joined: 1 Jan 10
Posts: 292
Credit: 41,567,650
RAC: 0
Message 17126 - Posted: 18 May 2010, 16:00:18 UTC
Last modified: 18 May 2010, 16:07:54 UTC

What are the temps on these failed WU's? I know that there is a need for speed & it's always nicer to get these WU's done faster. But it's "my personal opinion" that GPUGRID is becoming more & more elitist. It's enthusiast friendly but consumer unfriendly when WU's become so aggressive that even stock clocked GPU's are failing due to overheating. Not everybody has water cooled systems. I've moved around just about every cable to enhance the airflow of my PC's, but now that Linux is so good & Windows is so slow, I'm almost forced to stop using Linux because my GPU's are running at 80-90 degrees on Linux no matter how they're clocked or how great the airflow is. I've got 2 PC's in enclosures that aren't meant for this, & the only way to improve airflow there is to get new enclosures.

If GPUGRID keeps on improving their WU's for Windows, I soon won't be able to use Windows either. But that's no reason not to. I'm just itching for an option to throttle down the use of the GPU, just as it's possible to set a max CPU use of x%.

Snow Crash
Message 17127 - Posted: 18 May 2010, 16:27:23 UTC - in response to Message 17126.  

Heat is not a problem for me ... the 295 that has received lots of errors is running a cool 75 degrees C.

The errors as I posted above appear to be memory allocation issues ... with
1896 MB shared between both cards I doubt it really is "insufficient".

Perhaps CUDA or the driver is reporting incorrectly, or there is a bug at some internal condition? That will not be easy for the GPUGrid devs to identify, but because the errors are always the same, that might point in a particular direction for investigation.

When HERG WUs fail (some do pass) it is always the same error =
"SWAN: FATAL : swanMalloc failed"

When KASHIF WUs fail (I have not had any success yet with them) it is always the same error =
"ERROR: file ntnbrlist.cpp line 63: Insufficent memory available for pairlists. Set pairlistdist to match the cutoff."

This is a dedicated cruncher so the only other thing going on is WCG which should not matter.
Thanks - Steve
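
Since each WU type seems to fail with its own fixed message, a quick tally can confirm the pattern across many results. This is only a rough Python sketch: the labels, the function names `classify` and `tally`, and the idea of feeding it (WU family, stderr text) pairs copied from the result pages are all assumptions for illustration, not anything GPUGRID provides. The two search strings are taken verbatim from the errors quoted above, including the original spelling.

```python
from collections import Counter

# Error signatures quoted in this thread, mapped to short labels.
# The labels are hypothetical names chosen for this sketch.
SIGNATURES = {
    "Insufficent memory available for pairlists": "pairlist_memory",
    "swanMalloc failed": "swan_malloc",
}


def classify(stderr_text):
    """Return the label of the first known signature found, or 'other'."""
    for needle, label in SIGNATURES.items():
        if needle in stderr_text:
            return label
    return "other"


def tally(results):
    """Count (family, error label) pairs.

    results is an iterable of (family, stderr_text) tuples, where family
    is a tag such as "KASHIF" or "TONI" assigned by whoever collects them.
    """
    counts = Counter()
    for family, stderr_text in results:
        counts[(family, classify(stderr_text))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        ("KASHIF", "ERROR: file ntnbrlist.cpp line 63: Insufficent memory "
                   "available for pairlists. Set pairlistdist to match the cutoff"),
        ("TONI", "SWAN: FATAL : swanMalloc failed"),
    ]
    for key, count in sorted(tally(sample).items()):
        print(key, count)
```

If every KASHIF failure lands in one bucket and every TONI failure in the other, that supports the idea that these are two distinct allocation problems rather than random hardware faults.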

Beyond
Message 17129 - Posted: 18 May 2010, 16:39:17 UTC - in response to Message 17126.  
Last modified: 18 May 2010, 16:41:03 UTC

What are the temps on these failed WU's? I know that there is a need for speed & it's always nicer to get these WU's done faster. But it's "my personal opinion" that GPUGRID is becoming more & more elitist. It's enthusiast friendly but consumer unfriendly when WU's become so aggressive that even stock clocked GPU's are failing due to overheating. Not everybody has water cooled systems. I've moved around just about every cable to enhance the airflow of my PC's, but now that Linux is so good & Windows is so slow, I'm almost forced to stop using Linux because my GPU's are running at 80-90 degrees on Linux no matter how they're clocked or how great the airflow is. I've got 2 PC's in enclosures that aren't meant for this, & the only way to improve airflow there is to get new enclosures.

If GPUGRID keeps on improving their WU's for Windows, I soon won't be able to use Windows either. But that's no reason not to. I'm just itching for an option to throttle down the use of the GPU, just as it's possible to set a max CPU use of x%.

Actually it looks like your Win7 GTX 260 machine is running the v6.72 WUs at a higher credit/hour rate than your similar cards in Linux. There would be even a larger difference if you were running XP. I'm currently using Win7 and XP for my GPUGRID crunching, all cards are at 54C - 64C, max GPU fan is 57%. I use Antec 300 cases which can often be had for around $50 and add two extra low to mid speed 120mm fans: 1 in the front and 1 on the side, both blowing inward. Most are running 2 GPUs. An efficient PSU is also helpful.

The XP machines are running at 93-96% GPU versus 80-85% in Win7, undoubtedly the reason that XP is markedly faster in GPUGRID. I'm hoping that the client can be changed so that Win7 will run in the 90%+ range.

I do hear what you're saying though. GPU computing of any kind produces considerable heat and the stock Dell, HP, etc. machines are not built to handle the loads. Better to build our own machines:-)

Bikermatt
Joined: 8 Apr 10
Posts: 37
Credit: 4,431,457,619
RAC: 36,378
Message 17138 - Posted: 18 May 2010, 19:45:24 UTC

They have all failed on my system within 10 seconds. I am running BOINC 6.10.43 and the 197.13 driver on 3 GT 240s.
-Matt



