GPU problem

GDF
Volunteer moderator · Project administrator · Project developer · Project tester · Volunteer developer · Volunteer tester · Project scientist
Joined: 14 Mar 07 · Posts: 1958 · Credit: 629,356 · RAC: 0
Message 1410 - Posted: 4 Aug 2008, 14:18:34 UTC - in response to Message 1409.  

[quote]Hi

I had a WU running for ~9 hours when the benchmarks kicked in.
The WU resumed after the benchmarks and promptly failed with error 1.

I know my card isn't the most stable, but this failure looked to be caused by the benchmarks.

If a benchmark is due within the estimated WU run period, is there any way that the benchmark could be run before starting the WU?[/quote]

The only thing we saw is that you were first running with the factory default frequency:

# Clock rate: 1674000 kilohertz

After the restart, you were running with the underclocked values and crashed:

# Clock rate: 1350000 kilohertz

It would have been less surprising the other way round...

gdf
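
The two "# Clock rate:" lines in this post can be compared mechanically. Here is a minimal sketch, assuming the application's stderr contains lines of exactly the form shown here; the function names are hypothetical and are not part of BOINC or the PS3Grid application.

```python
import re

# Matches lines such as "# Clock rate: 1674000 kilohertz"
# (format taken from the post; other formats are not handled).
CLOCK_LINE = re.compile(r"^#\s*Clock rate:\s*(\d+)\s*kilohertz\s*$", re.MULTILINE)


def clock_rates_khz(stderr_text: str) -> list[int]:
    """Return every reported clock rate, in kHz, in order of appearance."""
    return [int(value) for value in CLOCK_LINE.findall(stderr_text)]


def clock_changes(rates: list[int]) -> list[tuple[int, int, float]]:
    """Return (before, after, percent_change) for each consecutive pair that differs."""
    changes = []
    for before, after in zip(rates, rates[1:]):
        if before != after:
            changes.append((before, after, 100.0 * (after - before) / before))
    return changes


# Example input built from the two lines quoted in the post.
log = """# Clock rate: 1674000 kilohertz
# Clock rate: 1350000 kilohertz
"""
for before, after, pct in clock_changes(clock_rates_khz(log)):
    print(f"clock changed from {before} kHz to {after} kHz ({pct:+.1f}%)")
```

A check like this only shows that the reported clock changed between runs; it says nothing about why the application saw the old value first.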

Filipe
Joined: 28 Apr 08 · Posts: 3 · Credit: 1,994,582 · RAC: 0
Message 1411 - Posted: 4 Aug 2008, 14:31:12 UTC

Hi all,

Keep up the good work! I look forward to progressing with the first BOINC project utilizing GPUs.

I've started 4 GPU test work units and 3 have failed (compute error); system details are below.

1 x 9600GT, slightly OC'ed (by the manufacturer)
Ubuntu Hardy 8.04
CUDA driver 177.13
BOINC client 6.3.5
Pentium D processor

Out of 4 units, 3 failed with a stderr error of

"process exited with code 1 (0x1, -255)"

but the 4th work unit passed with exit code 0 (0x0).
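
The three numbers in such a message appear to be one exit code shown three ways. In the two failure messages quoted in this thread, "code 1 (0x1, -255)" and "code 193 (0xc1, -63)", the second number is the hexadecimal form and the third equals the code minus 256. A minimal sketch that parses a line and checks that relationship; the helper names are hypothetical, and the "minus 256" rule is inferred from these two examples only, not taken from BOINC documentation.

```python
import re

# Pattern for lines like: process exited with code 1 (0x1, -255)
EXIT_LINE = re.compile(r"process exited with code (\d+) \(0x([0-9a-fA-F]+), (-?\d+)\)")


def parse_exit_line(line: str) -> tuple[int, int, int]:
    """Return (decimal code, value of the hex field, signed field) from one message."""
    match = EXIT_LINE.search(line)
    if match is None:
        raise ValueError(f"not an exit-code line: {line!r}")
    return int(match.group(1)), int(match.group(2), 16), int(match.group(3))


def is_consistent(code: int, hex_value: int, signed_value: int) -> bool:
    """Check the relationship observed in this thread: hex == code, signed == code - 256."""
    return hex_value == code and signed_value == code - 256


for text in (
    "process exited with code 1 (0x1, -255)",
    "process exited with code 193 (0xc1, -63)",
):
    print(text, "->", is_consistent(*parse_exit_line(text)))
```

In other words, the line carries a single piece of information, the exit code itself; the other two fields do not add a separate error number.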

The 3 failed units had run for 35k, 384 (yes, small) and 63k CPU seconds before the compute error. I know that the 1st failure occurred when, at 35k, I hit the sleep button (ridiculously placed on the keyboard); when the system came out of sleep mode, the compute error occurred. The others failed on their own... really. So my questions are below, and please excuse any that may seem simple for Linux, as I'm still picking it up after a few years off and getting used to the Ubuntu differences.

1) It seems to me that the logging of actual GPU results as they are processed (or at set points) is a little buggy. After sleep mode the unit just didn't restart, and for the 384-second unit I actually closed BOINC Manager normally and opened it again, and the compute error happened. So does the data get stored correctly in this beta version as it's being processed? Or are there known issues here? Or is there something I haven't done right?

2) Do I need to set the BOINC files location in my PATH or library path? I have to click the BOINC client and then BOINC Manager to start the project; clicking BOINC Manager by itself doesn't start the process. That's where I had "connection refused" coming up, before I realised that running the client created the files needed for BOINC Manager to connect to client 6.3.5 and start processing.

3) My dual-core CPU shows both cores at 90-100% all the time. I saw from other posts that only 1 core should be close to 100% because of polling. No other apps were running.

4) The messages section of BOINC Manager says it can't download any more files because of a 1 CPU limit. I guess this should say 1 GPU limit? Or if it really is a CPU limit, that could also explain 3) above.

Other comments
After a compute error the claimed credit says 1987.41 no matter what the computation time was before the error occurred, i.e. 384 s or 63,000 s. I'm not too worried if I don't get any credit for these errors, as really this is all about science and testing a new system for future advancement... but it's fun :-)

Cheers,
Fil.
P.S. Sorry about any spelling mistakes... it's late and I'm going to sleep now.


GDF
Message 1413 - Posted: 4 Aug 2008, 14:41:28 UTC - in response to Message 1411.  

[quote]Hi all,[/quote]

1) Restart is usually rock solid. What went wrong here is that sleep mode caused the GPU program to crash and report a compute error, so the client did not even try to restart. Even heavy desktop use could cause the application to crash, since the display has priority over CUDA runs.

2) There is a problem when running the manager directly; this is a BOINC problem related to SELinux that was present even before the GPU application. Workaround: start the client first with boinc -daemon, and then the graphical interface.

3) We use only 1 CPU to sync. What processes are using the other one?

4) At the moment, BOINC checks the number of CPUs. If you guys collect proven BOINC issues in a thread, I will present them to D. Anderson when he comes to Barcelona in September.

gdf
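
The workaround in point 2 amounts to: start the client, wait until it accepts connections, then open the manager. A minimal sketch, assuming the client listens for manager connections on TCP port 31416 (BOINC's default GUI RPC port) and that the manager binary is called boincmgr on this install; "boinc -daemon" is the command from the post, and the helper names are hypothetical.

```python
import socket
import subprocess
import time
from typing import Callable

GUI_RPC_PORT = 31416  # assumed default; adjust if the client is configured differently


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # includes "connection refused"
        return False


def wait_until(probe: Callable[[], bool], timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Call probe() repeatedly until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while not probe():
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True


def start_client_then_manager() -> None:
    """Start the client first, then the manager once the client is reachable."""
    subprocess.run(["boinc", "-daemon"], check=True)  # command taken from the post
    if not wait_until(lambda: port_open("localhost", GUI_RPC_PORT)):
        raise RuntimeError("client did not start listening in time")
    subprocess.Popen(["boincmgr"])  # assumed manager binary name
```

Nothing is launched until start_client_then_manager() is called; run it from the BOINC directory in place of the two manual clicks. Waiting for the port is one way to avoid the "connection refused" message that appears when the manager starts before the client.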

Temujin
Joined: 12 Jul 07 · Posts: 100 · Credit: 21,848,502 · RAC: 0
Message 1414 - Posted: 4 Aug 2008, 14:50:24 UTC - in response to Message 1410.  

Ahh, that's interesting.
I upgraded my machine to Fedora 9 yesterday, installed the 173 CUDA drivers, downloaded a WU and it crashed straight away.
I checked nvclock and the card had reset to the factory overclock, so I underclocked the card back down to 450 & 800 and then downloaded the WU in question.

The NVIDIA X utility and nvclock both reported the lower clock values before that WU started, but BOINC saw the higher values until after the benchmarks, almost 9 hours later??
me=puzzled :D



Filipe
Message 1421 - Posted: 6 Aug 2008, 13:09:17 UTC - in response to Message 1413.  

Thanks gdf for your comments,

In relation to point 3), the client seems to be using 48-50% of my dual-core CPU, as reported by the Processes tab of the System Monitor in Ubuntu.
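
A reading of 48-50% is consistent with one fully busy core if the Processes tab reports usage as a share of the whole CPU. A quick check of the arithmetic, assuming (this is an assumption, not something confirmed in the thread) that the figure is averaged over both cores; the function name is made up for illustration.

```python
def busy_cores(percent_of_total: float, n_cores: int) -> float:
    """Convert a whole-machine CPU percentage into an equivalent number of busy cores.

    Assumes the percentage is normalized over all cores, so 100% means
    every core is fully busy.
    """
    return percent_of_total / 100.0 * n_cores


# 48-50% of a dual-core CPU corresponds to roughly one busy core.
for pct in (48.0, 50.0):
    print(f"{pct:.0f}% of 2 cores = {busy_cores(pct, 2):.2f} cores")
```

Under that assumption the client alone accounts for about one core, which matches the "only 1 CPU to sync" description.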

However, I previously mentioned that 100% of both cores was being used. This is true when viewing the Resources animation tab, which reports both at 100% (most of the time). Looking on the web, I found that there is a bug with the compiz plugins 0.7.4 for Ubuntu (Hardy 8.04 included); actually there is a whole list of issues to be addressed with the compiz plugins, as reported at
https://bugs.launchpad.net/ubuntu/+source/compiz/+bug/218726

I unloaded all the compiz plugins but I still get the problem. I guess it's not a PS3Grid/BOINC issue but an Ubuntu/Linux issue.

cheers,
Fil.




Bender10
Joined: 3 Dec 07 · Posts: 167 · Credit: 8,368,897 · RAC: 0
Message 1434 - Posted: 10 Aug 2008, 22:34:38 UTC


Hi, I just added my AMD Quad to the mix.

Nothing here OC'd

EVGA 8800GS
AMD 9550
Gigabyte GA-M78SM-S2H motherboard, with 4 GB RAM
Ubuntu 8.04
BOINC 6.3.5
NVIDIA 173.14 drivers

I have been running WUs from other BOINC projects for about a week, just to test out the setup (with BOINC 6.3.5), and everything has been fine until today...

I just got it running on PS3Grid a few minutes ago, and went through 4 failed WUs with this error:

"process exited with code 193 (0xc1, -63)"

I'm not sure what is going on. And I am a Linux noob....



Consciousness: That annoying time between naps......

Experience is a wonderful thing: it enables you to recognize a mistake every time you repeat it.

Bender10
Message 1435 - Posted: 10 Aug 2008, 23:37:26 UTC


Ooops...Here are the tasks...

http://www.ps3grid.net/results.php?hostid=5914





Nightlord
Joined: 22 Jul 08 · Posts: 61 · Credit: 5,461,041 · RAC: 0
Message 1436 - Posted: 11 Aug 2008, 11:18:54 UTC
Last modified: 11 Aug 2008, 11:29:29 UTC

I had one of these overnight too.

Running on a 3 GHz P4, not overclocked, with Ubuntu 8.04, a slow 8600GT and the 6.3.8 client.

Link to WU: http://www.ps3grid.net/result.php?resultid=42123

It failed right at the start of the run. I noticed some error messages in the message tab. Sorry, I'm not at the box now, so I can't be certain, but it was something like "result file not found". I'll post the exact message later today if anyone needs it.

/edit/ I should add that other boxes were converted over to 6.3.8 last night without issue.


Bender10
Message 1437 - Posted: 11 Aug 2008, 11:22:20 UTC


I installed the new 6.3.8 over the 6.3.5 version. It runs PrimeGrid WUs fine. I'll try PS3Grid WUs again this morning.




[AF>HFR>RR] Jim PROFIT
Joined: 3 Jun 07 · Posts: 107 · Credit: 31,331,137 · RAC: 0
Message 1438 - Posted: 11 Aug 2008, 15:34:03 UTC
Last modified: 11 Aug 2008, 15:35:35 UTC

So after some tests, I think I have a problem, but I can't solve it.

I have only one WU that finished without problems!!
All the others ended with errors.

I don't understand why, but sometimes, after BOINC finishes a WU for another project and starts PS3Grid, my PC freezes!
And not at the same point each time.
Yesterday it was at nearly 2 h. But I saw it this morning too!

And when I restart, the WU is dead with computation errors!
As I can't be in front of the PC all the time, I think I will not try to crunch another WU. And after 4 WUs with errors, I can't download another one.
If I try other projects, I don't have any problems, so I think it's not Ubuntu that has a problem. I don't use any applications on it; it is just for testing PS3Grid and the GPU.

For information:
Ubuntu 8.04.1
NVIDIA 177.13
BOINC 6.3.8
Q9450 @ 3.4 GHz
GTX 260, not OC
8 GB DDR2

Maybe I will try one more time, but I am eagerly waiting for the Windows version.
I don't have any problems with Windows, and I continue to think that Linux is not so stable.

Nightlord
Message 1439 - Posted: 11 Aug 2008, 17:22:39 UTC
Last modified: 11 Aug 2008, 17:22:50 UTC

I found on one of my 8800GT boxes that a heavy O/C on the CPU made it unstable on this project. It has been 100% stable on CPU projects before. I reduced the O/C on the CPU by about 10% and it has crunched without fail since.

Hope that helps.


GDF
Message 1440 - Posted: 11 Aug 2008, 18:18:02 UTC - in response to Message 1439.  




What is O/C?

gdf

Nightlord
Message 1441 - Posted: 11 Aug 2008, 18:29:34 UTC
Last modified: 11 Aug 2008, 18:30:51 UTC

Sorry.....O/C = OverClock

His CPU is heavily overclocked and on one of my machines that made the GPU card unstable on this project. Just a 10% reduction in the CPU overclock brought it back - might be the same for him?


Bender10
Message 1442 - Posted: 12 Aug 2008, 0:50:06 UTC
Last modified: 12 Aug 2008, 1:47:17 UTC


OK, I tried changing the default GPU (550) and MEM (800) settings to 500 and 700. I picked up 2 new WUs (http://www.ps3grid.net/results.php?hostid=5914). So far the 1st WU has been running for ~10 minutes. I also have another BOINC project running on the other 3 CPU cores.

If this works, I will bump up the settings a bit.

Time will tell....


EVGA 8800GS
AMD 9550
Gigabyte GA-M78SM-S2H motherboard, with 4 GB RAM
Ubuntu 8.04
BOINC 6.3.5
NVIDIA 173.14 drivers




Stefan Ledwina
Joined: 16 Jul 07 · Posts: 464 · Credit: 298,573,998 · RAC: 0
Message 1444 - Posted: 12 Aug 2008, 5:38:22 UTC - in response to Message 1438.  
Last modified: 12 Aug 2008, 5:39:43 UTC



I have had the same problems with a GTX 260 and the 177.13 drivers!
These seem to be driver-related problems...
I already sent a bug report to NVIDIA a few weeks ago but got no answer, so I had to install Vista on this PC to crunch for Folding@home...
Hope there will be a Windows version of the GPUGRID application very soon! ;-)

pixelicious.at - my little photoblog

©2025 Universitat Pompeu Fabra