More bad WUs? ------ KASHIF_HIVPR_auto_spawn

Message boards : Graphics cards (GPUs) : More bad WUs? ------ KASHIF_HIVPR_auto_spawn

Richard Haselgrove

Message 17139 - Posted: 18 May 2010, 19:51:36 UTC

Just finished one of these on a Fermi (470) - 285-KASHIF_HIVPR_auto_spawn_2_90_ba1-0-100-RND3939_2.

Two errors before mine, on a GT 240M and a GTX 295, which possibly suggests that they're tough. Also, I saw a lot more screen freezing while it was running, for up to 10 seconds at a time: made the computer almost unusable. I'm running with Swan_Sync=0, and BOINC restricted to 7 out of 8 cores.

BTW, although this is a "stock Dell", I don't think cooling is going to be the issue. It's a Precision 490 workstation, with two factory-fitted front case fans and even a separate dinky little fan angled onto the RAM. Even though it's three and a half years old, it took Windows 7 and the Fermi with no problems at all.

skgiven
Volunteer moderator
Volunteer tester
Message 17140 - Posted: 18 May 2010, 20:04:30 UTC - in response to Message 17139.  
Last modified: 18 May 2010, 20:29:20 UTC

This is my present situation:

My GTX470 is working well on XP x86. It ran slowly on Win7, and as far as I know this is the case for everyone using a Fermi on Win7.

My GTX260sp216 works well for 6.03 tasks on Win7 x64, but will not crunch any 6.72 WUs of any kind. I have tried many versions of Boinc and many drivers. It is a solid card and has been working well for many months on many other WUs and in several systems. I will move that card into a different system at some stage, when the 6.03 WUs run out. I expect a much earlier driver would do the trick, but I have had enough of drivers for one week.

My single GT240 systems are reasonably stable. These are on XP, and Win7.
One Win7 card occasionally drops its shaders to 400MHz and then misses the bonus deadline (working on that), but I did solve the intermittent connection problem it had. It is networked using a wireless USB dongle, and the system very occasionally disabled the USB ports in some sort of power-saving effort (apparently at random, and despite the high performance power mode being selected). It took a BIOS update and a reconfiguration of the advanced power-saving features (USB) to fix it; so much for plug and play!
This Win7 card (and the other) can crunch the TONI_HERG and IBUCH 6.72 WUs but not the KASHIF_HIVPR 6.72 WUs (using 196.34 & 196.21).

The XP x86 card crunches everything perfectly (197.45, Boinc 6.10.51).

My four card system is doing reasonably well on Vista x64:
It crunches the TONI_HERG 6.72 WUs very well, with no failures since 6.10.21 went on. However, it fails ALL the KASHIF_HIVPR 6.72 WUs (usually in about 8 seconds).
Using Boinc 6.10.56. My RAC with that system is 54K and rising towards the potential 70K. Pulling a card won’t make any difference; I have already demonstrated that the system can crunch 6.72 WUs and that the initial problem was the driver, not power. Perhaps this is a different driver issue, or a WU issue. Perhaps the KASHIF_HIVPR 6.72 WUs just don’t like any of my XFX GT240 DDR5 cards, my Gigabyte DDR5 card, my Gigabyte DDR3 card, or my GTX260?
I’m getting the impression that Vista & Win7 don’t like KASHIF_HIVPR 6.72 WUs.

My temperatures are all fine:
GT240s all below 65 deg C.
GTX260 60 deg C (native)
GTX470 is about 78 deg C, but is OC’d.

liveonc, if you can, try removing a back plate close to the GPU; it sometimes helps a bit. Could you turn the GPU fan speed up?

I also doubt that 475MB of RAM is too little to run the tasks. I suspect that the Boinc code is looking for exactly 512MB of video RAM, and the latest drivers are reporting less. As this only seems to happen with Vista & Win7, I suspect the drivers are reserving that memory specifically for Aero, to stop other applications from trying to use it and crashing apps and systems. The code might be detecting an error from the difference between two values reported by the drivers: perhaps the drivers report what the card has and then what is available, and as these don’t match (because of Aero or something else), Boinc reports that some of the RAM is erroneous and ends the tasks early. ???

Paul D. Buck
Message 17144 - Posted: 18 May 2010, 20:55:56 UTC - in response to Message 17140.  

skgiven,

BOINC 6.10.51 will randomly stop running tasks on your GPUs, and it also has a memory leak. Move up to 6.10.56, which has neither of these problems. I don't know for sure whether it will cure any of the issues you are having, but it cannot hurt to use the better version. All the versions between 6.10.45 and 6.10.55 have these two issues at the very least; I know, I tried most of them.
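Paul's advice boils down to a version-range check. This is a minimal sketch, assuming the affected range is exactly 6.10.45 to 6.10.55 inclusive, as stated above; the helper names are made up for illustration and are not part of BOINC.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a BOINC version string such as '6.10.51' into (6, 10, 51)."""
    return tuple(int(part) for part in text.strip().split("."))


# Range reported in this thread as having the GPU-stall and memory-leak problems.
AFFECTED_FIRST = parse_version("6.10.45")
AFFECTED_LAST = parse_version("6.10.55")


def has_known_issues(version: str) -> bool:
    """True if the client version falls inside the reported bad range (inclusive)."""
    return AFFECTED_FIRST <= parse_version(version) <= AFFECTED_LAST


for v in ["6.10.21", "6.10.51", "6.10.56"]:
    print(v, "upgrade recommended" if has_known_issues(v) else "ok")
```

Comparing integer tuples rather than raw strings avoids the trap where "6.10.9" would sort after "6.10.56" as text.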

Beyond
Message 17204 - Posted: 21 May 2010, 15:25:32 UTC - in response to Message 17123.  

KASHIF_HIVPR WU are also failing for me on my GT240s. This is after going back to the 19621 driver (which is still working for all TONI_HERG 6.72 WUs).

I notice that most of the KASHIF_HIVPR_auto_spawn WUs that I listed above are now validating with subsequent machines so I probably jumped the gun by posting this thread. Like I said before, trying to be proactive. Seems they run fine on some machines, not so fine on others.

An update on the original topic. The KASHIF_HIVPR_auto_spawn WUs are running fine on my GTX 260 but so far have not worked on any of my GT 240 cards. The theory posted above by Richard that these WUs are "tough" may be the answer. Has anyone had success with them on a GT 240?

Beyond
Message 17212 - Posted: 21 May 2010, 20:04:08 UTC

This is what I'm getting on every KASHIF_HIVPR_auto_spawn WU on any GT 240, no matter what driver or OS:


<core_client_version>6.10.56</core_client_version>
<![CDATA[
<message>
- exit code 98 (0x62)
</message>
<stderr_txt>
# Using device 0
# There is 1 device supporting CUDA
# Device 0: "GeForce GT 240"
# Clock rate: 1.55 GHz
# Total amount of global memory: 536870912 bytes
# Number of multiprocessors: 12
# Number of cores: 96
ERROR: file ntnbrlist.cpp line 63: Insufficent memory available for pairlists. Set pairlistdist to match the cutoff.
called boinc_finish

</stderr_txt>
]]>

Seems the KASHIF_HIVPR_auto_spawn WUs are asking for more memory than the 512MB these cards have?
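The byte count in the log settles the units: 536870912 bytes is exactly 512 × 1024 × 1024, i.e. 512MB, not 512KB. The sketch below does that arithmetic and models the kind of check the error message implies, comparing the memory actually available with what the pairlist needs. The function names and the "required" figure are hypothetical; the real requirement inside ntnbrlist.cpp is not given anywhere in this thread.

```python
MIB = 1024 * 1024  # bytes in one mebibyte


def mib(n_bytes: int) -> float:
    """Convert a byte count to mebibytes."""
    return n_bytes / MIB


# "Total amount of global memory" as printed in the stderr output above.
total_global = 536870912
print(mib(total_global))  # 512.0


def pairlist_fits(free_bytes: int, required_bytes: int) -> bool:
    """Hypothetical check: the allocation fails when free memory is below the requirement."""
    return free_bytes >= required_bytes


# Illustrative numbers only: about 475 MiB free versus an assumed 500 MiB requirement.
print(pairlist_fits(475 * MIB, 500 * MIB))   # False
print(pairlist_fits(1000 * MIB, 500 * MIB))  # True
```

If what matters is the memory left after the OS and driver take their share, rather than the card's total, that would be consistent with 1GB cards succeeding where 512MB cards fail.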

skgiven
Volunteer moderator
Volunteer tester
Message 17213 - Posted: 21 May 2010, 23:47:37 UTC - in response to Message 17212.  

Only one GT240 works for me:
XP Pro SP3 x86, driver 19745, was Boinc 6.10.51, now moved to 6.10.56.

I have had no failures on that card for any task, but Boinc did eventually lock up, hence the late move to 6.10.56.
Shaders at 1.6GHz working fine on a DDR3 card.

http://www.gpugrid.net/result.php?resultid=2349243

Looks like we are on our own when it comes to drivers, so please, post up any working specs!

Betting Slip
Message 17217 - Posted: 22 May 2010, 11:52:30 UTC - in response to Message 17213.  

Those units succeed on all my GT240 1GB GDDR3 cards but fail on my GT240s with 512MB GDDR5.
Radio Caroline, the world's most famous offshore pirate radio station.
Great music since April 1964. Support Radio Caroline Team -
Radio Caroline

Richard Haselgrove
Message 17278 - Posted: 25 May 2010, 8:56:09 UTC

WU 1496749 looks like a bad job, by any standard and on any card. Just failed on my Fermi.

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Message 17286 - Posted: 25 May 2010, 11:51:24 UTC - in response to Message 17278.  

Hopefully they fail immediately and don't cause credit loss.

skgiven
Volunteer moderator
Volunteer tester
Message 17289 - Posted: 25 May 2010, 12:02:42 UTC - in response to Message 17217.  

Those units succeed on all my gt240 1 gig gddr3 cards but fail on my gt240's with 512mb gddr5 cards.


That seems to be the case with Win7.
Vista is a similar picture.

I have a 512MB GDDR3 GT240 on Win XP x86 and it has still not had a single error on any task, including the KASHIF_HIVPR WUs. I might try the GTX260 in that system with an early driver, to see how it fares, as it is sitting idle.

Richard Haselgrove
Message 17325 - Posted: 26 May 2010, 7:47:02 UTC - in response to Message 17286.  

Hopefully they fail immediately and don't cause credit loss.

Yes, they do fail quickly, like today's WU 1496749, but they still cause research data loss ;-)

skgiven
Volunteer moderator
Volunteer tester
Message 17882 - Posted: 5 Jul 2010, 11:03:48 UTC - in response to Message 17289.  

My GTX260 worked well on XP and on Linux.
Unfortunately all the HIV WUs still seem to fail after about 13 seconds for me on systems with a GT240 that has only 512MB of RAM. This Vista system, for example.
Fortunately the other WUs run fine and I am picking up enough of them.

KASHIF_HIVPR does not run on these systems,
19621 drivers, 4xGT240 (512MB GDDR5), Vista x64, Phenom II 940, 4GB RAM, 1TB Drive.
19562 drivers also fail on Microsoft Windows Server 2008 R2 x64 with a GT240 (512MB GDDR5).
19621 drivers also fail on Win 7 x64, again with a GT240 (512MB).

KASHIF_HIVPR does work on these Windows setups,
19745 drivers work for Win XP x86, again with a GT240 (512MB), as do other drivers.
19634 drivers work on Win7 for a GT240 (1024MB).

It is clear (as reported before) that the issue lies with the amount of RAM on the card and with the operating system: if the card has 512MB it will not complete KASHIF_HIVPR tasks on Vista, Win7 or Server 2008 R2 with any driver, but it will on XP and Linux. This is with a range of drivers from 19562 through to 19745. I have not put the latest drivers on, as these slow crunching down further on Vista and Win7.

If the 1GB cards succeed while the 512MB cards fail (depending on driver and OS) then this might continue into the future, and we are about to see another wave of Fermi cards with varying RAM amounts – just to make things more complicated.

Perhaps the servers could be made to distinguish between cards that fail and cards that succeed on each task type, and allocate work accordingly. It would reduce the Internet overhead and improve performance slightly. At the moment this is done using a per-system failure rate across all tasks, rather than a per-card failure rate for each task type.

If I have a GT240 that, under a given OS, can run one task type perfectly but fails others, I would like it to pick up only the tasks it will run successfully. Picking up tasks at random can leave the card with no tasks it can complete. Obviously there will be a learning period with new tasks, but you could send out a few new tasks alongside many tasks known to work. Crunchers don’t have the option to select projects, so we cannot do this for ourselves, and automated systems tend to be more foolproof.
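The allocation idea above can be sketched as a small scheduler that keeps success and failure counts per (card, OS, task type) instead of one failure rate per host. This is only an illustration under stated assumptions: the class and method names, the 50% success threshold, the five-result minimum and the two-task trial quota for the "learning period" are all hypothetical, and nothing here describes how the GPUGRID server actually works.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Record:
    ok: int = 0
    failed: int = 0

    @property
    def total(self) -> int:
        return self.ok + self.failed


class TaskTypeAllocator:
    """Hypothetical per-card, per-task-type allocator; not GPUGRID's real scheduler."""

    def __init__(self, min_samples: int = 5, min_success: float = 0.5, trial_quota: int = 2):
        self.min_samples = min_samples  # results needed before trusting the success rate
        self.min_success = min_success  # success rate required to keep sending a task type
        self.trial_quota = trial_quota  # learning period: trial tasks for an untested pairing
        self.history: dict[tuple[str, str, str], Record] = defaultdict(Record)

    def report(self, card: str, os_name: str, task_type: str, success: bool) -> None:
        """Record one returned result for this card/OS/task-type combination."""
        rec = self.history[(card, os_name, task_type)]
        if success:
            rec.ok += 1
        else:
            rec.failed += 1

    def eligible(self, card: str, os_name: str, task_type: str) -> bool:
        """Decide whether to send this task type to this card/OS combination."""
        rec = self.history.get((card, os_name, task_type), Record())
        if rec.total < self.min_samples:
            # Learning period: send a few trials; keep going only if one has succeeded.
            return rec.total < self.trial_quota or rec.ok > 0
        return rec.ok / rec.total >= self.min_success


alloc = TaskTypeAllocator()
for _ in range(2):
    alloc.report("GT240 512MB", "Vista x64", "KASHIF_HIVPR", success=False)
for _ in range(5):
    alloc.report("GT240 512MB", "Vista x64", "TONI_HERG", success=True)
print(alloc.eligible("GT240 512MB", "Vista x64", "KASHIF_HIVPR"))  # False
print(alloc.eligible("GT240 512MB", "Vista x64", "TONI_HERG"))     # True
print(alloc.eligible("GT240 512MB", "Vista x64", "IBUCH"))         # True (untried, trial allowed)
```

The trial quota is the "send out a few new tasks" part: an untested pairing gets a couple of attempts, and only a success keeps it in the rotation until there is enough history to judge by rate.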

Doing this might also map good drivers to cards for specific work units, something that could perhaps be published on the site from time to time.
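A published "known good" list could be built by grouping returned results by card, OS, driver and task type. This is a minimal sketch, assuming each result is available as a simple tuple; the field layout, function name and 90% cut-off are hypothetical. The sample rows only restate single reports from this thread, so each rate is 0 or 1; a real table would need many results per combination to mean anything.

```python
from collections import defaultdict

# One row per returned result: (card, os, driver, task_type, success).
Result = tuple[str, str, str, str, bool]


def good_combinations(results: list[Result], min_rate: float = 0.9) -> list[tuple]:
    """Group results by card/OS/driver/task type and keep combinations that mostly succeed."""
    counts: dict[tuple[str, str, str, str], list[int]] = defaultdict(lambda: [0, 0])
    for card, os_name, driver, task_type, success in results:
        tally = counts[(card, os_name, driver, task_type)]
        tally[0] += int(success)  # successes
        tally[1] += 1             # attempts
    table = []
    for key, (ok, total) in sorted(counts.items()):
        rate = ok / total
        if rate >= min_rate:
            table.append((*key, rate))
    return table


# Illustrative rows restating single reports from this thread (one result each).
reports = [
    ("GT240 512MB", "XP x86", "197.45", "KASHIF_HIVPR", True),
    ("GT240 512MB", "Vista x64", "196.21", "KASHIF_HIVPR", False),
    ("GT240 512MB", "Win7 x64", "196.21", "KASHIF_HIVPR", False),
    ("GT240 1024MB", "Win7", "196.34", "KASHIF_HIVPR", True),
]
for row in good_combinations(reports):
    print(row)
```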


Beyond
Message 17884 - Posted: 5 Jul 2010, 13:28:52 UTC - in response to Message 17882.  

Unfortunately all the HIV WUs still seem to fail after about 13sec for me on systems with a GT240 and only 512MB RAM. This Vista System for example.
Fortunately the other WUs run fine and I am picking up enough of them.

KASHIF_HIVPR does not run on these systems,
19621 drivers, 4xGT240 (512MB GDDR5), Vista x64, Phenom II 940, 4GB RAM, 1TB Drive.
19562 drivers also fail on Microsoft Windows Server 2008 R2 x64 with a GT240 (512MB GDDR5).
19621 drivers also fail on Win 7 x64, again with a GT240 (512MB).

Same here. All my GT 240 cards fail on the KASHIF_HIVPR_auto_spawn WUs. They're also all 512MB cards, and they fail in both Win7 & XP. This is always the message:

- exit code 98 (0x62)
</message>
<stderr_txt>
# Using device 0
# There is 1 device supporting CUDA
# Device 0: "GeForce GT 240"
# Clock rate: 1.50 GHz
# Total amount of global memory: 536870912 bytes
# Number of multiprocessors: 12
# Number of cores: 96
ERROR: file ntnbrlist.cpp line 63: Insufficent memory available for pairlists. Set pairlistdist to match the cutoff.
called boinc_finish

Looks like the WUs are asking for too much memory.

Betting Slip
Message 17885 - Posted: 5 Jul 2010, 13:51:07 UTC - in response to Message 17884.  

Upgrade to the 257.21 drivers and they will work OK



skgiven
Volunteer moderator
Volunteer tester
Message 17887 - Posted: 5 Jul 2010, 14:35:14 UTC - in response to Message 17885.  
Last modified: 5 Jul 2010, 14:41:40 UTC

Betting Slip, well spotted!

The latest drivers facilitate CUDA 3.1 tasks, even for CC1.2 cards.
So the KASHIF_HIVPR WUs compiled using CUDA 3.1 (6.09) work on GT240 cards with 512MB RAM, while the older 6.05 WUs do not work on these cards with the earlier CUDA 3.0 drivers.
I'm not sure whether you will still be able to crunch 6.05 KASHIF_HIVPR WUs after installing the latest drivers, but as long as you only pick up 6.09 tasks the problem is solved.
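To check which side of that line a host is on, the driver version can be compared with 257.21. This is a minimal sketch, assuming 257.21 (the version recommended above) is the minimum needed for the CUDA 3.1 (6.09) tasks; the helper names are hypothetical. The parser also accepts the five-digit shorthand used throughout this thread, such as 19621 for 196.21.

```python
def normalize_driver(version: str) -> tuple[int, int]:
    """Accept '257.21' or the thread's shorthand '19621' and return (major, minor)."""
    version = version.strip()
    if "." in version:
        major, minor = version.split(".", 1)
    else:
        # Shorthand used in this thread: the last two digits are the minor number.
        major, minor = version[:-2], version[-2:]
    return int(major), int(minor)


# Assumption: treat 257.21, the driver recommended above, as the CUDA 3.1 minimum.
CUDA31_DRIVER = (257, 21)


def supports_cuda31(version: str) -> bool:
    """True if the driver is at or above the assumed CUDA 3.1 threshold."""
    return normalize_driver(version) >= CUDA31_DRIVER


for v in ["19621", "19745", "257.21"]:
    print(v, "->", "%d.%02d" % normalize_driver(v), supports_cuda31(v))
```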

However, we took a big speed hit from the last few drivers.
Fortunately GDF said he knows why some WUs are running slower under 3.1, and in a few days (probably) they will manage to correct it. I'm not sure whether this applies only to Fermi tasks, or whether tasks will also speed up on CC1.1, CC1.2 and CC1.3 cards.
I think I will move one system over at a time.

skgiven
Volunteer moderator
Volunteer tester
Message 17893 - Posted: 5 Jul 2010, 17:46:00 UTC - in response to Message 17887.  

On second thoughts, I think I will sit tight, and wait this one out.
Good luck,

©2026 Universitat Pompeu Fabra