
Message boards : Number crunching : Calculation Errors

Profile Adam Alexander
Joined: 8 Aug 08
Posts: 18
Credit: 11,326,180
RAC: 0
Message 6489 - Posted: 7 Feb 2009 | 20:49:04 UTC

For a couple of days in a row I had calculation errors on three out of four WUs, including one morning when the system was locked up. Today I saw what was happening: when one WU finished and began uploading, the next WU started and hit a calculation error. When the third WU started, the system locked up. Has anyone else experienced this, and if so, do you have a fix? I have detached and reattached the client.

This machine is running BOINC 6.6.2 on a Core 2 Quad Q6700 with 8 GB of RAM, an 8800 GT 512 MB video card, and Vista Ultimate 64-bit.

Thanks
____________

Currently running:
2 x Intel Core 2 Quad CPU Q6700 @ 2.66GHz
PS3PF

Profile Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 6490 - Posted: 7 Feb 2009 | 21:03:52 UTC - in response to Message 6489.

What version of the video drivers are you running?

Profile Adam Alexander
Joined: 8 Aug 08
Posts: 18
Credit: 11,326,180
RAC: 0
Message 6491 - Posted: 7 Feb 2009 | 22:12:45 UTC - in response to Message 6490.
Last modified: 7 Feb 2009 | 22:13:02 UTC

According to Device Manager, the driver version is 7.15.11.7813.

Profile K1atOdessa
Joined: 25 Feb 08
Posts: 249
Credit: 392,697,553
RAC: 1,469,522
Message 6492 - Posted: 8 Feb 2009 | 3:17:57 UTC - in response to Message 6491.

You could always try the current version: 181.22

Your 7.15.11.7813 corresponds to Nvidia driver version 178.13 (the last five digits, 17813), which was released on Sept 25th, 2008. The newest version (181.22) is the fourth release since then. You could start with the newest and see if it resolves your issue. If not, try one of the others using the Advanced Search on Nvidia's site.
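
The Device Manager string 7.15.11.7813 and the Nvidia release number 178.13 are related through the last five digits of the Windows string. Here is a minimal Python sketch of that conversion. It assumes the last-five-digits convention holds for the driver in question, and the function name is hypothetical, not part of any Nvidia or BOINC tool.

```python
def nvidia_release_from_windows(driver_version: str) -> str:
    """Convert a Windows driver version string to an Nvidia release number.

    Assumption: the release number is the last five digits of the
    Windows string with the dots removed, formatted as XXX.XX.
    """
    digits = driver_version.replace(".", "")
    if len(digits) < 5 or not digits.isdigit():
        raise ValueError(f"unexpected driver version: {driver_version!r}")
    last_five = digits[-5:]
    return f"{last_five[:3]}.{last_five[3:]}"


if __name__ == "__main__":
    # The version reported earlier in this thread.
    print(nvidia_release_from_windows("7.15.11.7813"))
```

Comparing the result against the version listed on Nvidia's download page shows whether a newer driver is available.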

Profile Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 6493 - Posted: 8 Feb 2009 | 9:58:37 UTC - in response to Message 6492.

I guess I am just slow, but this is the first time I have been able to figure out where the heck Nvidia puts the version number ... all these years ...

Profile Adam Alexander
Joined: 8 Aug 08
Posts: 18
Credit: 11,326,180
RAC: 0
Message 6513 - Posted: 9 Feb 2009 | 0:47:21 UTC - in response to Message 6492.

Thanks, I'll give that a shot. The problem has not happened again since I detached and reattached.

Profile KWSN imcrazynow
Joined: 27 Jan 09
Posts: 26
Credit: 3,572,637
RAC: 0
Message 6561 - Posted: 11 Feb 2009 | 13:09:51 UTC

My last two tasks have errored out and I have no idea why. I'm running BOINC 6.5.0 and the 181.22 driver for the card. The tasks are as follows:
http://www.gpugrid.net/result.php?resultid=294827
http://www.gpugrid.net/result.php?resultid=295877
Any ideas? I hate having bad results. :(

ignasi
Joined: 10 Apr 08
Posts: 254
Credit: 16,836,000
RAC: 0
Message 6562 - Posted: 11 Feb 2009 | 14:29:38 UTC - in response to Message 6561.
Last modified: 11 Feb 2009 | 14:53:18 UTC

We've already spotted and fixed the source of the problem: an incorrect input parameter. An extended explanation is provided in this post:
http://www.gpugrid.net/forum_thread.php?id=726&nowrap=true#6558

[edit]
New WUs of these series are out. They look like this one:
lF22075-SMD10_1-0-1-SH2
[/edit]

thanks,
ignasi

Profile KWSN imcrazynow
Joined: 27 Jan 09
Posts: 26
Credit: 3,572,637
RAC: 0
Message 6585 - Posted: 11 Feb 2009 | 23:44:59 UTC - in response to Message 6562.

Thanks for the update.

Profile Adam Alexander
Joined: 8 Aug 08
Posts: 18
Credit: 11,326,180
RAC: 0
Message 6586 - Posted: 11 Feb 2009 | 23:57:05 UTC - in response to Message 6562.
Last modified: 11 Feb 2009 | 23:57:20 UTC

Thanks! I thought I had this issue fixed, and then yesterday I had 10 WUs fail with compute errors after 0.00 seconds of run time. Everything seems to be running okay now, though.

Profile KWSN imcrazynow
Joined: 27 Jan 09
Posts: 26
Credit: 3,572,637
RAC: 0
Message 6611 - Posted: 13 Feb 2009 | 13:04:35 UTC - in response to Message 6586.

You got lucky. Mine seemed to run for almost the full time before failing.

Profile Adam Alexander
Joined: 8 Aug 08
Posts: 18
Credit: 11,326,180
RAC: 0
Message 6616 - Posted: 13 Feb 2009 | 21:05:15 UTC - in response to Message 6611.

I've had to do quite a bit of fiddling since my last post, and the WUs are either larger or taking longer. I wound up reinstalling BOINC, and now the only two projects it will get work from are GPU and SHA-1. That doesn't bother me at the moment since SHA-1 is the POTM for KWSN, but I'll have to get this ironed out eventually.

Profile Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 6623 - Posted: 14 Feb 2009 | 9:44:47 UTC - in response to Message 6616.

Which version of BOINC did you install?

Profile Adam Alexander
Joined: 8 Aug 08
Posts: 18
Credit: 11,326,180
RAC: 0
Message 6632 - Posted: 14 Feb 2009 | 15:35:39 UTC - in response to Message 6623.

I've got it fixed. It looks like the problem was that I installed a 64-bit version on top of a 32-bit version. After a reinstall, it works fine. The 32-bit version was 6.2.28 (WCG) and the 64-bit version is 6.5.0.

Profile Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 6639 - Posted: 14 Feb 2009 | 18:47:26 UTC - in response to Message 6632.

Cool!

So, which SHOULD you have installed? :)

From all the reports 6.5.0 is still the "best" of the available options with even 6.6.7 not quite working right ... though it seems to be a little better than the prior versions. As of earlier this morning there were no developer notes on what they had done from 6.6.4 on ... sadly ...

Though if they can get this version right, 6.7.x should be pretty decent in that there are a number of Trac tickets being addressed in the changes ...

Profile Adam Alexander
Joined: 8 Aug 08
Posts: 18
Credit: 11,326,180
RAC: 0
Message 6641 - Posted: 14 Feb 2009 | 18:51:52 UTC - in response to Message 6639.

Good question. I was running 6.6.2 for a while and ran into a problem where it wouldn't stop getting new WUs. So far, 6.5.0 seems stable, but I've only had it running about 18 hours.

Profile Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 6650 - Posted: 14 Feb 2009 | 21:37:20 UTC - in response to Message 6641.

Well, I have been running 6.5.0 for at least a month now ... so I am pretty happy with it ...

Actually I was referring to 64-bit vs. 32-bit ...

Profile Adam Alexander
Joined: 8 Aug 08
Posts: 18
Credit: 11,326,180
RAC: 0
Message 6653 - Posted: 14 Feb 2009 | 22:24:13 UTC - in response to Message 6650.

The 64.
