More bad WUs? ------ KASHIF_HIVPR_auto_spawn
| Author | Message |
|---|---|
Beyond Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0
|
Seems like many if not most of the KASHIF_HIVPR_auto_spawn WUs are failing :-( |
|
Joined: 5 Jan 09 Posts: 670 Credit: 2,498,095,550 RAC: 0
|
> Seems like many if not most of the KASHIF_HIVPR_auto_spawn WUs are failing :-(

Would you like to point to a few to back up that statement?

Radio Caroline, the world's most famous offshore pirate radio station. Great music since April 1964. Support Radio Caroline Team - Radio Caroline |
Beyond Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0
|
> Seems like many if not most of the KASHIF_HIVPR_auto_spawn WUs are failing :-(

Looks like they were just released today, but here are some of the results I've found so far:

- http://www.gpugrid.net/workunit.php?wuid=1483784
- http://www.gpugrid.net/workunit.php?wuid=1483852
- http://www.gpugrid.net/workunit.php?wuid=1483846
- http://www.gpugrid.net/workunit.php?wuid=1483936
- http://www.gpugrid.net/workunit.php?wuid=1483879
- http://www.gpugrid.net/workunit.php?wuid=1483947
- http://www.gpugrid.net/workunit.php?wuid=1483953
- http://www.gpugrid.net/workunit.php?wuid=1483863
- http://www.gpugrid.net/workunit.php?wuid=1483862
- http://www.gpugrid.net/workunit.php?wuid=1483861
- http://www.gpugrid.net/workunit.php?wuid=1483787
- http://www.gpugrid.net/workunit.php?wuid=1483792
- http://www.gpugrid.net/workunit.php?wuid=1483799
- http://www.gpugrid.net/workunit.php?wuid=1483805

Seems like whenever someone reports a problem on this forum, people get all defensive. These bad WUs are simple to find if you take 5 minutes to look. They're so new that most of them have only one failure so far, but I've only found 2 that completed. I'd post a lot more but it's such a pain to add URLs on this forum... |
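Since adding the links by hand is described as a pain, here is a minimal sketch of generating them from workunit IDs. It assumes only the URL pattern visible in the links above; the function name `workunit_urls` and the variable names are hypothetical, and nothing here queries the project server.

```python
# URL pattern copied from the workunit links posted in this thread.
WORKUNIT_URL = "http://www.gpugrid.net/workunit.php?wuid={wuid}"


def workunit_urls(wuids):
    """Return one workunit page URL per ID, preserving the input order."""
    return [WORKUNIT_URL.format(wuid=wuid) for wuid in wuids]


# The first three IDs from the list in the post.
failed_wuids = [1483784, 1483852, 1483846]
urls = workunit_urls(failed_wuids)

# One URL per line, ready to paste into a forum post.
print("\n".join(urls))
```

The IDs still have to be collected by looking at the results pages; the sketch only saves the typing.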
|
Joined: 5 Jan 09 Posts: 670 Credit: 2,498,095,550 RAC: 0
|
Thanks for that. They all appear to have failed within a few seconds, and I've got one on one of my machines that has been running for over 3 hours now, so we'll see what happens. It had been to someone else and failed within a few seconds there as well. |
Beyond Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0
|
Could be that it's a big coincidence but thought I'd report it since pretty much all I was finding was failed ones. |
|
Joined: 5 Jan 09 Posts: 670 Credit: 2,498,095,550 RAC: 0
|
Just had a quick look at 6 of the units you posted, and the machines that errored are not great examples of anything, other than that they produce a lot of errored WUs of all types. Couldn't be bothered to look through the whole list. |
Beyond Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0
|
The reason I noticed it at all was that the one that failed for me was on a GPU that seldom fails, so I started looking at the results from the top RAC machines. BTW, the above comment about "defensiveness" was not aimed at you :-) |
|
Joined: 5 Jan 09 Posts: 670 Credit: 2,498,095,550 RAC: 0
|
It's okay, it didn't bother me, as I'm the last person to defend a project if they've got it wrong. :) I'll let you know if I have any problems with these WUs. |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
I for one have had a nightmare when it comes to 6.72 WU failures. We eventually got it sorted, well, sort of: the 197.xx drivers were to blame on my Vista x64 machine with four GT240s. But I tried so many different drivers on my GTX260sp216 that in the end I just gave up. It is on Win7 x64, the same as your two GTX275s! My GTX260 is a good card, and it kept failing immediately, even when natively clocked and with temps under 60 deg C. I see you dropped your clocks as well, just in case. I guess you are having a similar problem. |
Beyond Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0
|
> I for one have had a nightmare when it comes to 6.72 WU failures.

Neither of us has a GTX 275, let alone 2. But if you'd like to send me a couple I'd be happy to accept them as a gift :-) |
|
Joined: 5 Jan 09 Posts: 670 Credit: 2,498,095,550 RAC: 0
|
My 2 GT240s are working fine with that setup, SK, but my 2 remote machines (Core2 E4300 and Dual Core E6300) are giving me problems. Why is it always remote machines you have problems with? You started OK with your 4 GT240s, so I don't know what the problem could be. I have mine clocked at Core 640, Memory 2000, and Shaders 1580 for my 2 GT240 GDDR5 on my Quad, and Core 630, Memory 840, and Shaders 1580 on my GDDR3 machines. All clocks are as GPU-Z shows them. |
|
Joined: 5 Jan 09 Posts: 670 Credit: 2,498,095,550 RAC: 0
|
1 success and 1 failure to date for these units "KASHIF_HIVPR_auto_spawn" Fail Success |
|
Joined: 4 Apr 09 Posts: 450 Credit: 539,316,349 RAC: 0
|
My really stable 295 failed 12 of these with the following error: "ERROR: file ntnbrlist.cpp line 63: Insufficent memory available for pairlists. Set pairlistdist to match the cutoff". It also failed 2 TONI (also 6.72) with an error: "SWAN: FATAL : swanMalloc failed". I am now working on 2 more TONIs, which are OK after 2+ hours, so I think they will be fine. Thanks - Steve |
|
Joined: 6 Jun 08 Posts: 152 Credit: 328,250,382 RAC: 0
|
Task 2347063 OK with gtx470. Ton (ftpd) Netherlands |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
Beyond, I meant GTX260, on which we were/are both having problems with 6.72 tasks under Win7. I tried several drivers and clients but the 6.72 WUs still failed (running native). KASHIF_HIVPR WUs are also failing for me on my GT240s. This is after going back to the 196.21 driver (which is still working for all TONI_HERG 6.72 WUs). |
Beyond Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0
|
> Beyond, I meant GTX260, on which we were/are both having problems with 6.72 tasks under Win7.

Actually, if you look at that machine, it really has a GTX 260 and a GT 240. For some reason the BOINC server code lists 2 GTX 260s. I reported the problem on the BOINC Dev list. It's running XP64 (not Win7) and NV 197.45. Also, it's not having problems with v6.72 WUs. There have been 86 successful, with 2 failures on the GT 240 and 2 on the GTX 260. Those were my fault, as I was experimenting with higher shader clocks. For me the v6.72 WUs have been FAR more reliable than v6.03.

I've been following your messages about various drivers/OSes/BOINC versions with the GT 240. I have 3 of the GDDR5 models that have been running for a long time, and I have had no issues with any of them with any BOINC version. They've run on XP32, XP64, Win7-32 and Win7-64 machines at various times with no problems. They've run on NV v195.62, 196.21 and v197.45 with no problems. They've run on a large variety of BOINC clients from 6.10.18 up to v6.10.45 with no problems. Will be going to v6.10.56 today on some. Have you tried pulling one of your GT 240 cards from that 4x machine yet?

> KASHIF_HIVPR WUs are also failing for me on my GT240s. This is after going back to the 196.21 driver (which is still working for all TONI_HERG 6.72 WUs).

I notice that most of the KASHIF_HIVPR_auto_spawn WUs that I listed above are now validating with subsequent machines, so I probably jumped the gun by posting this thread. Like I said before, I was trying to be proactive. Seems they run fine on some machines, not so fine on others. |
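As a rough illustration of the counts quoted in the post above (86 successful tasks, 2 failures on the GT 240 and 2 on the GTX 260), the failure rate can be worked out as below. This is only a sketch of the arithmetic; the function name `failure_rate` is hypothetical, and the numbers are the ones from the post, not a fresh query of the host's results.

```python
def failure_rate(successes, failures):
    """Fraction of tasks that failed out of all recorded tasks."""
    total = successes + failures
    if total == 0:
        raise ValueError("no tasks recorded")
    return failures / total


# Counts reported in the post: 86 successes, 2 + 2 failures.
rate = failure_rate(86, 2 + 2)
print(f"{rate:.1%}")  # 4 failures out of 90 tasks -> 4.4%
```

With all four failures attributed to overclocking experiments, that host's rate on v6.72 at stock clocks would be lower still.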
liveonc Joined: 1 Jan 10 Posts: 292 Credit: 41,567,650 RAC: 0
|
What are the temps on these failed WUs? I know that there is a need for speed & it's always nicer to get these WUs done faster. But it's "my personal opinion" that GPUGRID is becoming more & more elitist. It's Enthusiast friendly, Consumer unfriendly when WUs become so aggressive that even stock clocked GPUs are failing due to overheating. Not everybody has water cooled systems. I've moved around just about every cable to enhance the airflow of my PCs, but now that Linux is so good & Windows is so slow, I'm almost forced to stop using Linux because my GPUs are running at 80-90 degrees on Linux no matter how they're clocked or how great the airflow is. I've got 2 PCs in enclosures that aren't meant for this, & the only way to improve airflow there is to get new enclosures. If GPUGRID keeps on improving their WUs for Windows, I soon won't be able to use Windows either. But that's no reason not to. I'm just itching for an option to throttle GPU use, just as it is possible to set a max CPU use of x%.
|
|
Joined: 4 Apr 09 Posts: 450 Credit: 539,316,349 RAC: 0
|
Heat is not a problem for me ... the 295 that has received lots of errors is running at a cool 75 degrees C. The errors, as I posted above, appear to be memory allocation issues ... with 1896 MB shared between both cards I doubt it really is "insufficient". Perhaps CUDA or the driver is reporting incorrectly, or there is a bug at some internal condition? That will not be easy for the GPUGrid devs to identify, but because the errors are always the same, that might point in a particular direction for investigation. When HERG WUs fail (some do pass) it is always the same error = "SWAN: FATAL : swanMalloc failed". When KASHIF WUs fail (I have not had any success yet with them) it is always the same error = "ERROR: file ntnbrlist.cpp line 63: Insufficent memory available for pairlists. Set pairlistdist to match the cutoff." This is a dedicated cruncher, so the only other thing going on is WCG, which should not matter. Thanks - Steve |
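Because the post above observes that each WU type always fails with the same message, one way to check that across many results is to tally the error text per task. The following is a minimal sketch, assuming the task output has already been copied into plain strings; the labels and function names (`classify`, `tally`) are hypothetical, and the match substrings are taken from the two messages quoted in this thread.

```python
from collections import Counter

# Substrings copied from the two error messages quoted in the thread.
# The labels on the right are hypothetical names for this sketch.
KNOWN_ERRORS = {
    "swanMalloc failed": "swan_malloc",
    "memory available for pairlists": "pairlist_memory",
}


def classify(stderr_text):
    """Return the label of the first known error found, else 'other'."""
    for needle, label in KNOWN_ERRORS.items():
        if needle in stderr_text:
            return label
    return "other"


def tally(stderr_texts):
    """Count how often each error label occurs across task outputs."""
    return Counter(classify(text) for text in stderr_texts)


# Example inputs built from the messages quoted above.
samples = [
    "SWAN: FATAL : swanMalloc failed",
    "ERROR: file ntnbrlist.cpp line 63: Insufficent memory available "
    "for pairlists. Set pairlistdist to match the cutoff.",
    "ERROR: file ntnbrlist.cpp line 63: Insufficent memory available "
    "for pairlists. Set pairlistdist to match the cutoff.",
]
counts = tally(samples)
print(counts)
```

Matching on a substring after the misspelled word keeps the check working whether or not the message spelling is ever corrected in the application.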
Beyond Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0
|
> What are the temps on these failed WUs? I know that there is a need for speed & it's always nicer to get these WUs done faster. But it's "my personal opinion" that GPUGRID is becoming more & more elitist. It's Enthusiast friendly, Consumer unfriendly when WUs become so aggressive that even stock clocked GPUs are failing due to overheating. Not everybody has water cooled systems. I've moved around just about every cable to enhance the airflow of my PCs, but now that Linux is so good & Windows is so slow, I'm almost forced to stop using Linux because my GPUs are running at 80-90 degrees on Linux no matter how they're clocked or how great the airflow is. I've got 2 PCs in enclosures that aren't meant for this, & the only way to improve airflow there is to get new enclosures.

Actually, it looks like your Win7 GTX 260 machine is running the v6.72 WUs at a higher credit/hour rate than your similar cards in Linux. There would be an even larger difference if you were running XP. I'm currently using Win7 and XP for my GPUGRID crunching; all cards are at 54C - 64C, and max GPU fan is 57%. I use Antec 300 cases, which can often be had for around $50, and add two extra low to mid speed 120mm fans: 1 in the front and 1 on the side, both blowing inward. Most are running 2 GPUs. An efficient PSU is also helpful.

The XP machines are running at 93-96% GPU versus 80-85% in Win7, undoubtedly the reason that XP is markedly faster in GPUGRID. I'm hoping that the client can be changed so that Win7 will run in the 90%+ range. I do hear what you're saying though. GPU computing of any kind produces considerable heat, and the stock Dell, HP, etc. machines are not built to handle the loads. Better to build our own machines :-) |
Bikermatt Joined: 8 Apr 10 Posts: 37 Credit: 4,431,457,619 RAC: 36,378
|
They have all failed on my system within 10 seconds. I am running BOINC 6.10.43 and the 197.13 driver on 3 GT 240s. -Matt |
©2026 Universitat Pompeu Fabra