GPU problem

Temujin

Message 1316 - Posted: 21 Jul 2008, 19:31:56 UTC - in response to Message 1313.  

If you go over quota for the day, let me have your hostid.

got 3,
2 normal & 1 shortie, running the shortie now

thanks

GDF
Message 1317 - Posted: 21 Jul 2008, 19:34:04 UTC - in response to Message 1315.  




Hi,

Your card seems to be overclocked, which makes it unstable and causes the errors!
Is that right?

GDF
who, me?

not as far as I know, I've certainly not tweaked anything.

nvclock gives the following
-- General info --
Card: Unknown Nvidia card
Architecture: G92 A2
PCI id: 0x606
GPU clock: 601.712 MHz
Bustype: PCI-Express

-- Shader info --
Clock: 1674.000 MHz
Stream units: 96 (1b)
ROP units: 12 (1b)
-- Memory info --
Amount: 384 MB
Type: 128 bit DDR3
Clock: 899.996 MHz

-- PCI-Express info --
Current Rate: 16X
Maximum rate: 16X

-- Sensor info --
Sensor: GPU Internal Sensor
GPU temperature: 18C

-- VideoBios information --
Version: 62.92.29.00.00
Signon message: ASUS EN8800GS TOP VGA BIOS Ver 62.92.29.00.AS13
Performance level 0: gpu 600MHz/shader 1700MHz/memory 900MHz/0.00V/100%
VID mask: 3
Voltage level 0: 0.95V, VID: 0
Voltage level 1: 1.00V, VID: 1
Voltage level 2: 1.05V, VID: 2
Voltage level 3: 1.10V, VID: 3




Maybe it was overclocked by the vendor.
According to
http://en.wikipedia.org/wiki/GeForce_8_Series

the shader should be clocked at 1375 MHz.

You could try reducing it.
GDF
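
To put the numbers above side by side, here is a rough sketch that pulls the shader clock out of saved nvclock-style output and compares it with the 1375 MHz reference quoted from the Wikipedia page. The function names `shader_clock` and `overclock_pct` are made up for illustration, and the parser assumes the output layout shown in the post (a "-- Shader info --" header followed by a "Clock:" line); other nvclock versions may print something different.

```shell
#!/bin/sh
# Percentage by which a reported clock exceeds a reference clock.
# Both arguments are in MHz. The function name is hypothetical.
overclock_pct() {
    awk -v actual="$1" -v ref="$2" \
        'BEGIN { printf "%.1f\n", (actual - ref) / ref * 100 }'
}

# Read nvclock-style text on stdin and print the shader clock value.
# Assumes the layout from the post: a "-- Shader info --" header,
# then a "Clock: <value> MHz" line inside that section.
shader_clock() {
    awk '/^-- Shader info --/ { in_shader = 1; next }
         /^-- /               { in_shader = 0 }
         in_shader && $1 == "Clock:" { print $2; exit }'
}

# Trimmed copy of the output quoted in the thread.
sample='-- General info --
GPU clock: 601.712 MHz

-- Shader info --
Clock: 1674.000 MHz
-- Memory info --
Clock: 899.996 MHz'

clock=$(printf '%s\n' "$sample" | shader_clock)
echo "shader clock: $clock MHz, $(overclock_pct "$clock" 1375)% above 1375 MHz"
```

With the figures from the thread this works out to a shader clock a little over a fifth above the reference value, which fits the factory-overclock explanation.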

MJH
Message 1318 - Posted: 21 Jul 2008, 19:37:36 UTC
Last modified: 21 Jul 2008, 19:43:05 UTC

Hi,

According to the Asus website (http://www.asus.co.nz/news_show.aspx?id=9578), your card is a factory-overclocked 8800GS. Whilst overclocking is often acceptable for games, the GPUGRID application works the card very hard, so it is quite possible that the card is becoming unstable. I suggest that you try reducing the clocks back to the standard settings shown in the Wikipedia article given by GDF.

Before you can change the clock frequencies, you may need to add the following to the screen or device section of /etc/X11/xorg.conf:

Option "Coolbits" "1"

Then restart X. The nvidia-settings program will then have a panel called "clock settings".

MJH
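
As a hedged illustration of the step above, the sketch below checks whether an xorg.conf-style file already contains an active Coolbits option. It works on a temporary file, so the real /etc/X11/xorg.conf is not touched. The helper name `has_coolbits` and the `Identifier` value are placeholders, and the Device section shown is only an assumed typical layout.

```shell
#!/bin/sh
# Report whether an xorg.conf-style file sets the Coolbits option.
# has_coolbits is a hypothetical helper; commented-out lines do not match
# because the pattern only allows whitespace before "Option".
has_coolbits() {
    grep -Eq '^[[:space:]]*Option[[:space:]]+"Coolbits"[[:space:]]+"[0-9]+"' "$1"
}

# Illustrative Device section, written to a temporary file.
conf=$(mktemp)
cat > "$conf" <<'EOF'
Section "Device"
    Identifier "Videocard0"
    Driver     "nvidia"
    Option     "Coolbits" "1"
EndSection
EOF

if has_coolbits "$conf"; then
    echo "Coolbits is set"
else
    echo "Coolbits is not set"
fi
rm -f "$conf"
```

Pointing `has_coolbits` at the real configuration file before and after editing is a quick way to confirm the option was saved where X will read it.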

Stefan Ledwina
Message 1319 - Posted: 21 Jul 2008, 19:40:11 UTC
Last modified: 21 Jul 2008, 19:41:56 UTC

If you meant me with the oc'd card - no they aren't overclocked...

My 9800GTX is an EVGA 9800 GTX SC (super clocked) and is a little bit overclocked by default, but it was stable 24/7 over one week when I was running the Folding@home GPU Client under Windows - Haven't had one error on this card at FAH. And the other two cards do show the same errors with ps3grid and they are for sure not oc'd - not by me and not by the vendor...

-----
Just had another error after nine hours, with a new error code.
The WU was http://www.ps3grid.net/PS3GRID/result.php?resultid=38719 and the error was

<core_client_version>6.3.5</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>

</stderr_txt>
]]>

pixelicious.at - my little photoblog

Temujin
Message 1320 - Posted: 21 Jul 2008, 19:59:25 UTC - in response to Message 1318.  

Hi,

According to the Asus website (http://www.asus.co.nz/news_show.aspx?id=9578), your card is a factory-overclocked 8800GS. Whilst overclocking is often acceptable for games, the GPUGRID application works the card very hard, so it is quite possible that the card is becoming unstable. I suggest that you try reducing the clocks back to the standard settings shown in the Wikipedia article given by GDF.
Yep, that would make sense.
Must admit, I didn't realise it was an overclocked card, I just bought it because it was a cheap 8800.
As far as temps go, it's 16C at idle and a max of 28C while running WUs.

Before you can change the clock frequencies, you may need to add the following to the screen or device section of /etc/X11/xorg.conf:

Option "Coolbits" "1"

Then restart X. The nvidia-settings program will then have a panel called "clock settings".
Yep, done that and up pops the extra panel, but I can only adjust the GPU (at its default of 600 MHz) and Memory (at its default of 900 MHz) settings. There's no access to the Shader settings.

I'm willing to knock both GPU & Memory down if that will help, any suggestions what to set it to?

I've also done a "nvclock -r" (reset) but it didn't seem to change the output from "nvclock -i -f"

Temujin
Message 1322 - Posted: 21 Jul 2008, 20:23:43 UTC - in response to Message 1320.  
Last modified: 21 Jul 2008, 20:26:19 UTC

ok, I've gone for GPU @ 550 and Memory @ 850
shader is still at 1700 though, anyone know how to adjust that?

edit
oops, nope, it's dropped down to 1566 MHz
I'll have a play around with GPU & Mem settings

UBT - NaRyan
Message 1323 - Posted: 21 Jul 2008, 20:39:29 UTC
Last modified: 21 Jul 2008, 20:53:21 UTC

My 8800GT is factory overclocked; it should be 600MHz Core, 1500MHz Shader & 1800MHz Memory.
However it runs at 700MHz Core, 1700MHz Shader & 2000MHz Memory.

And it's the computer that's so far been 100% stable.

I do however have the fan set to 100% on it.

*EDIT*
oops forgot it was Quad systems having probs not dual core ones *ahem*

Stefan Ledwina
Message 1324 - Posted: 21 Jul 2008, 20:49:45 UTC

Ok, that's your Dualcore without problems, but the Quadcore has also some WUs with errors...

Any word from GDF or MJH on the fact that these errors seem to appear only on Quadcore computers?

pixelicious.at - my little photoblog

GDF
Message 1325 - Posted: 21 Jul 2008, 21:31:11 UTC - in response to Message 1324.  

Ok, that's your Dualcore without problems, but the Quadcore has also some WUs with errors...

Any word from GDF or MJH on the fact that these errors seem to appear only on Quadcore computers?


I don't see any reason why quad cores could create problems.

GDF

UBT - NaRyan
Message 1328 - Posted: 21 Jul 2008, 22:22:47 UTC - in response to Message 1325.  

Ok, that's your Dualcore without problems, but the Quadcore has also some WUs with errors...

Any word from GDF or MJH on the fact that these errors seem to appear only on Quadcore computers?


I don't see any reason why quad cores could create problems.

GDF


Thing is, looking at the Top Computers every single Quad Core has them :(
Need someone with an AMD Quad to join to see if that also has the same probs.

Stefan Ledwina
Message 1330 - Posted: 21 Jul 2008, 23:12:06 UTC
Last modified: 21 Jul 2008, 23:14:26 UTC

Ok, let's look a little bit further at the top hosts list...

Computers with computation errors:

stefan@home Intel Quad Core
stefan@home Intel Quad Core
JG4KEZ(Koichi Soraku) Intel Quad Core Xeon
sneakysaurus Intel Quad Core
Anonymous user Intel Quad Core
UBT-NaRyan Intel Quad Core
stefan@home Intel Quad Core
Anonymous user Intel Dual Core !
[AF>Linux>Gentoo] elgrande71 Intel Dual Core !

Computers without computation errors:

UBT-NaRyan AMD Dual Core

OK, these are all the GPU computers I could find (except GDF's computer, which I excluded because its computation errors were dated before the official start of gpugrid)... It seems the errors are not only related to Quad Cores, but the only computer without errors is an AMD Dual Core...
Don't know what it means, but at least all Intel computers show these errors...

pixelicious.at - my little photoblog

Temujin
Message 1333 - Posted: 22 Jul 2008, 8:12:52 UTC - in response to Message 1316.  

got 3, 2 normal & 1 shortie, running the shortie now

The shortie turned out to be not a shortie :(

1st one crashed again but that was before I underclocked the card.
Current WU is now 12 hours in with an estimated 1.5 hours to go.

UBT - NaRyan
Message 1334 - Posted: 22 Jul 2008, 8:21:24 UTC - in response to Message 1333.  
Last modified: 22 Jul 2008, 8:22:37 UTC

got 3, 2 normal & 1 shortie, running the shortie now

The shortie turned out to be not a shortie :(

1st one crashed again but that was before I underclocked the card.
Current WU is now 12 hours in with an estimated 1.5 hours to go.


The shorties have the name "xxxxxxx-FASTTEST-x-x-xxxxx" and for me it took about 47 Seconds to complete for 1.99 credits :)

Down with the Kredit Kops!!!

Temujin
Message 1336 - Posted: 22 Jul 2008, 12:27:10 UTC - in response to Message 1334.  

The shorties have the name "xxxxxxx-FASTTEST-x-x-xxxxx" and for me it took about 47 Seconds to complete for 1.99 credits :)
I didn't get any of them then :(

Temujin
Message 1351 - Posted: 24 Jul 2008, 10:52:26 UTC
Last modified: 24 Jul 2008, 11:27:14 UTC

I think I may have sorted my GPU problems.

I run my own little boinc stats database for my team and mysqld gets a tad busy during updates for about 10 minutes.
I'm also in the process of upgrading my machine from Fedora 7 to Fedora 9.
To do that, I've borrowed a machine from work and have moved the stats over to that machine while I do the upgrade.
I'm waiting a couple of days before upgrading the main machine until I'm sure everything is running OK on the temp machine.

Since moving the stats, all GPU WUs have completed successfully.
OK, it's only done 2 and a bit WUs, but it's never managed that before. I was lucky if I had 1 in 5 succeed, and never 2 in a row.

It's a bit early to claim 100% victory but it's looking good, maybe, touch wood :D

UBT - NaRyan
Message 1354 - Posted: 25 Jul 2008, 2:46:22 UTC

I just got another error on my quad. Task ID 39247

Last one I had on the Quad was 4 days ago, so can't moan about it.
And the AMD dual core is still plodding along as happy as can be :)

Down with the Kredit Kops!!!

Nightlord
Message 1366 - Posted: 26 Jul 2008, 18:57:36 UTC
Last modified: 26 Jul 2008, 18:59:20 UTC

Hi guys! A couple of error types to report.

I loaded up two boxes with 8800GT's (the first box ran a 8600GT as a test bed for a couple of days).

One box (this one) seems a bit unstable. Lots of compute errors. I've reduced the clock using nvclock as discussed earlier in the thread and we'll see what happens.

On the other box, I have some missing libcudart.so errors. Was there a fix found for the missing libcudart.so discussed earlier? This host seems to do that on every second WU - immediately it tries to start up the second WU after completing the first, it fails for missing libcudart. I've checked and the file is present in the /projects/ps3grid.net folder, so stumped really.

Both boxes are running ubuntu 8.04, with 8800GT's fed from dual core E4300's, which have been fine on boinc up till now.

GDF
Message 1376 - Posted: 28 Jul 2008, 9:47:42 UTC - in response to Message 1366.  

Hi guys! A couple of error types to report.

I loaded up two boxes with 8800GT's (the first box ran a 8600GT as a test bed for a couple of days).

One box (this one) seems a bit unstable. Lots of compute errors. I've reduced the clock using nvclock as discussed earlier in the thread and we'll see what happens.

On the other box, I have some missing libcudart.so errors. Was there a fix found for the missing libcudart.so discussed earlier? This host seems to do that on every second WU - immediately it tries to start up the second WU after completing the first, it fails for missing libcudart. I've checked and the file is present in the /projects/ps3grid.net folder, so stumped really.

Both boxes are running ubuntu 8.04, with 8800GT's fed from dual core E4300's, which have been fine on boinc up till now.



We have a workaround for the missing libcudart problem. It seems to happen in a strange way, which we cannot replicate on our Fedora box.
The workaround is to install the Nvidia toolkit (on the same page as the driver) and put the following command in your .bashrc file:

export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/cuda/lib

This should not be needed in the future. I will keep you updated when we find a solution.

gdf
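
One detail worth noting about the export line: if LD_LIBRARY_PATH is unset or empty, the result starts with a colon, and the dynamic loader treats that empty entry as the current working directory. The sketch below is one possible way to add the directory without creating an empty entry or a duplicate. The helper name `add_lib_dir` is hypothetical, and the path is the one given in the post, which may differ on other installs.

```shell
#!/bin/sh
# Append a directory to LD_LIBRARY_PATH without creating an empty entry.
# An empty entry (leading or trailing colon) makes the loader search the
# current working directory, which is usually not intended.
# add_lib_dir is a hypothetical helper name.
add_lib_dir() {
    dir=$1
    case ":${LD_LIBRARY_PATH}:" in
        *":${dir}:"*) ;;  # already listed, nothing to do
        *) LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+${LD_LIBRARY_PATH}:}${dir}" ;;
    esac
    export LD_LIBRARY_PATH
}

# Path taken from the post; adjust if the toolkit lives elsewhere.
add_lib_dir /usr/local/cuda/lib
echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
```

Placed in .bashrc, the function can be called for each extra library directory, and repeated logins will not keep lengthening the variable.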



Nightlord
Message 1380 - Posted: 28 Jul 2008, 17:38:52 UTC - in response to Message 1376.  



We have a workaround for the missing libcudart problem. It seems to happen in a strange way, which we cannot replicate on our Fedora box.
The workaround is to install the Nvidia toolkit (on the same page as the driver) and put the following command in your .bashrc file:

export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/cuda/lib

This should not be needed in the future. I will keep you updated when we find a solution.

gdf





Thanks for your tip, I'll keep a watch on the boxes and patch if needed.

I reduced the clocks slightly on the machine that appeared to have a stability problem, it seems fine now. The other machine has not encountered a libcudart.so fail since Saturday evening.




Temujin
Message 1409 - Posted: 4 Aug 2008, 12:20:18 UTC
Last modified: 4 Aug 2008, 12:22:48 UTC

Hi

I had a WU running for ~9 hours when the benchmarks kicked in.
The WU resumed after the benchmarks and promptly failed with error 1

I know my card isn't the most stable but this failure looked to be caused by the benchmarks.

If a benchmark is due within the estimated WU run period, is there any way the benchmark could be run before the WU starts?

©2025 Universitat Pompeu Fabra