GPU problem
| Author | Message |
|---|---|
GDF · Joined: 14 Mar 07 · Posts: 1958 · Credit: 629,356 · RAC: 0 |
Hi, the only thing we saw is that you were first running at the factory default frequency (# Clock rate: 1674000 kilohertz). After the restart, you were running with the underclocked values (# Clock rate: 1350000 kilohertz) and crashed. It would have been less surprising the other way round... gdf |
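The two clock rates quoted in the post above can be compared with a short script. This is only a sketch: it assumes the task's stderr text contains lines of the form "# Clock rate: N kilohertz", as quoted by gdf, and the function name `clock_rates_mhz` is made up for illustration.

```python
import re

# Matches lines such as "# Clock rate: 1674000 kilohertz"
# (format taken from the post above; anything else is ignored).
CLOCK_RE = re.compile(r"#\s*Clock rate:\s*(\d+)\s*kilohertz")


def clock_rates_mhz(stderr_text):
    """Return every reported clock rate in MHz, in order of appearance."""
    return [int(khz) / 1000 for khz in CLOCK_RE.findall(stderr_text)]


if __name__ == "__main__":
    sample = (
        "# Clock rate: 1674000 kilohertz\n"
        "# Clock rate: 1350000 kilohertz\n"
    )
    before, after = clock_rates_mhz(sample)
    print(f"before restart: {before:.0f} MHz, after restart: {after:.0f} MHz")
    # Relative change of the second run with respect to the first.
    print(f"change: {(after - before) / before:.1%}")
```

For the two values in the post, the second run is clocked roughly 19% lower than the first, which is why a crash at the lower clock is the surprising case.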
|
Joined: 28 Apr 08 · Posts: 3 · Credit: 1,994,582 · RAC: 0
|
Hi all, keep up the good work! I look forward to progressing with the first BOINC project utilizing GPUs. I've started 4 GPU test work units and 3 have failed (compute error); system details: 1 x 9600GT, slightly OC'ed (by the manufacturer); Ubuntu Hardy 8.04; CUDA driver 177.13; client 6.3.5; Pentium D processor. Out of 4 units, 3 failed with a stderr error of "process exited with code 1 (0x1, -255)", but the 4th work unit passed with exit code 0 (0x0). The 3 failed units had run for 35k, 384 (yes, small) and 63k CPU seconds before the compute error. Now, I know that the 1st failure occurred when, at 35k, I hit the sleep button (ridiculously placed on the keyboard); when the system came out of sleep mode, the compute error occurred. The others failed on their own... really. So my questions are below, and please excuse any that may seem simple with Linux, as I'm still picking it up after a few years off and getting used to the Ubuntu differences. 1) It seems to me that the logging of actual GPU results as they are processed (or at set points) is a little buggy: after sleep mode the unit just didn't restart, and with the 384-second unit I actually closed BOINC Manager normally and opened it again, and the compute error happened. So does the data get stored correctly in this beta version as it's being processed? Or are there known issues here? Or haven't I done something right? 2) Do I need to set the BOINC files location in my path or lib path? I have to click the BOINC client and then BOINC Manager to start the project. Clicking BOINC Manager by itself doesn't start the process; that's where I had "connection refused" coming up, before I realised that running the client created the files needed for BOINC Manager to connect to client 6.3.5 and start processing. 3) My dual-core CPU indicates that both cores are at 90-100% all the time? I saw from other posts that only 1 core should indicate close to 100% because of polling. No other apps were running.
4) In the messages section of BOINC Manager it says it can't download any more files because of a 1 CPU limit. I guess this should say 1 GPU limit? Or, if it is a CPU limit, that could also explain 3) above. Other comments: after a compute error the claimed credit says 1987.41 no matter what the computation time before the error occurred, i.e. 384 s or 63,000 s. I'm not too worried if I don't get any credit for these errors, as really this is all about science and testing a new system for future advancement... but it's fun :-) Cheers, Fil. P.S. Sorry about any spelling mistakes... it's late and I'm going to sleep now. |
GDF · Joined: 14 Mar 07 · Posts: 1958 · Credit: 629,356 · RAC: 0 |
Hi all, 1) Restart is usually rock solid. What went wrong here is that sleep mode caused the GPU program to crash and report a compute error, so the client did not even try to restart. Even heavy desktop use could cause the application to crash, because the display has priority over CUDA runs. 2) There is a problem when running the manager directly; this is a BOINC problem related to SELinux that was present even before the GPU application. Workaround: start the client first with boinc -daemon and then the graphical interface. 3) We use only 1 CPU to sync. What processes are using the other one? 4) At the moment, BOINC checks the number of CPUs. If you guys collect proven BOINC issues in a thread, I will present them to D. Anderson when he comes to Barcelona in September. gdf |
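The workaround in point 2) above (client first, then the graphical interface) can be sketched as a small launcher. This is not an official BOINC script: the command `boinc -daemon` is the one quoted in the post, while the manager binary name `boincmgr`, the 5-second pause, and the assumption that both programs are on the PATH and run from the BOINC data directory are illustrative guesses that may not match a given installation.

```python
import subprocess
import time

CLIENT_CMD = ["boinc", "-daemon"]  # command quoted in the post above
MANAGER_CMD = ["boincmgr"]         # ASSUMPTION: manager binary name on this system


def start_boinc(client_cmd=CLIENT_CMD, manager_cmd=MANAGER_CMD,
                wait_seconds=5, run=subprocess.Popen):
    """Start the client first, pause briefly, then start the manager.

    `run` is injectable so the start order can be checked without
    actually launching anything.
    """
    started = [run(client_cmd)]
    # ASSUMPTION: a few seconds is enough for the client to create the
    # files the manager needs in order to connect.
    time.sleep(wait_seconds)
    started.append(run(manager_cmd))
    return started


if __name__ == "__main__":
    start_boinc()
```

Keeping the order inside one function makes it harder to launch the manager alone by accident, which is the case that produced the "connection refused" message described earlier in the thread.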
|
Joined: 12 Jul 07 · Posts: 100 · Credit: 21,848,502 · RAC: 0
|
Ahh, that's interesting. I upgraded my machine to Fedora 9 yesterday, installed the 173 CUDA drivers, downloaded a WU and it crashed straight away. I checked nvclock and the card had reset to the factory overclock, so I underclocked the card back down to 450 & 800 and then downloaded the WU in question. The NVIDIA X utility and nvclock both reported the lower clock values before that WU started, but BOINC saw the higher values until after the benchmarks, almost 9 hours later?? me=puzzled :D |
|
Joined: 28 Apr 08 · Posts: 3 · Credit: 1,994,582 · RAC: 0
|
Thanks, gdf, for your comments. In relation to point 3), the client seems to be using 48-50% of my dual-core CPU, as reported by the Processes tab of the System Monitor in Ubuntu. However, I previously mentioned that 100% of both cores was being used. That is true when viewing the animated graph in the Resources tab, which reports both cores at 100% (most of the time). Looking on the web, I found that there is a bug with the compiz plugins 0.7.4 for Ubuntu (Hardy 8.04 included); actually, there is a whole list of issues to be addressed with the compiz plugins, see https://bugs.launchpad.net/ubuntu/+source/compiz/+bug/218726 for the report. I unloaded all the compiz plugins but I still get the problem. I guess it's not a PS3Grid/BOINC issue but an Ubuntu/Linux issue. Cheers, Fil. |
Bender10 · Joined: 3 Dec 07 · Posts: 167 · Credit: 8,368,897 · RAC: 0
|
Hi, I just added my AMD Quad to the mix. Nothing here is OC'd: EVGA 8800GS; AMD 9550; Gigabyte GA-M78SM-S2H motherboard with 4 GB RAM; Ubuntu 8.04; BOINC 6.3.5; NVIDIA 173.14 drivers. I have been running other BOINC projects' WUs for about a week, just to test out the setup (with BOINC 6.3.5), and everything was fine until today... I got it running on PS3Grid a few minutes ago and went through 4 failed WUs with this error: "process exited with code 193 (0xc1, -63)". I'm not sure what is going on, and I am a Linux noob.... Consciousness: That annoying time between naps...... Experience is a wonderful thing: it enables you to recognize a mistake every time you repeat it. |
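The error messages quoted in this thread show the same exit code three ways. In both quoted cases, "1 (0x1, -255)" and "193 (0xc1, -63)", the third number equals the exit code minus 256. The sketch below reproduces that pattern; whether the BOINC client actually computes the value this way is an assumption, and the function name `describe_exit_code` is made up for illustration.

```python
def describe_exit_code(code):
    """Format an exit code as decimal, hex, and (code - 256).

    The (code - 256) part is inferred from the two messages quoted in
    the thread, not taken from the BOINC source.
    """
    return f"process exited with code {code} ({hex(code)}, {code - 256})"


if __name__ == "__main__":
    for code in (1, 193):
        print(describe_exit_code(code))
```

Reading the numbers this way only shows that the three values are the same code in different notations; it does not by itself say what went wrong in the application.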
Bender10 · Joined: 3 Dec 07 · Posts: 167 · Credit: 8,368,897 · RAC: 0
|
Oops... Here are the tasks: http://www.ps3grid.net/results.php?hostid=5914 |
Nightlord · Joined: 22 Jul 08 · Posts: 61 · Credit: 5,461,041 · RAC: 0
|
I had one of these overnight too, on a 3 GHz P4 (not overclocked) running Ubuntu 8.04 with a slow 8600GT and the 6.3.8 client. Link to WU: http://www.ps3grid.net/result.php?resultid=42123 It failed right at the start of the run. I noticed some error messages in the Messages tab. Sorry, I'm not at the box now, so I can't be certain, but it was something like "result file not found". I'll post the exact message later today if anyone needs it. /edit/ I should add that other boxes were converted over to 6.3.8 last night without issue. |
Bender10 · Joined: 3 Dec 07 · Posts: 167 · Credit: 8,368,897 · RAC: 0
|
|
[AF>HFR>RR] Jim PROFIT · Joined: 3 Jun 07 · Posts: 107 · Credit: 31,331,137 · RAC: 0
|
So after some tests, I think I have a problem, but I can't solve it. I have had only one WU finish without problems!! All the others ended with errors. I don't understand why, but sometimes, after BOINC finishes a WU for another project and starts PS3Grid, my PC freezes! And not at the same point each time. Yesterday it was near 2 h. But I saw it again this morning! And when I restart, the WU is dead with computation errors! As I can't be in front of the PC all the time, I think I will not try to crunch another WU. And after 4 WUs with errors, I can't download another one. If I try other projects, I don't have any problems. So I think it's not Ubuntu that has the problem. I don't use any applications on it; it is just for testing PS3Grid and the GPU. For information: Ubuntu 8.04.1; NVIDIA 177.13; BOINC 6.3.8; Q9450 @ 3.4 GHz; GTX 260, not OC; 8 GB DDR2. Maybe I will try one more time, but I am waiting expectantly for the Windows version. I don't have any problems with Windows, and I continue to think that Linux is not so stable. |
Nightlord · Joined: 22 Jul 08 · Posts: 61 · Credit: 5,461,041 · RAC: 0
|
I found on one of my 8800GT boxes that a heavy O/C on the CPU made it unstable on this project. It had been 100% stable on CPU projects before. I reduced the O/C on the CPU by about 10% and it has crunched without fail since. Hope that helps. |
GDF · Joined: 14 Mar 07 · Posts: 1958 · Credit: 629,356 · RAC: 0 |
[quote]I found on one of my 8800GT boxes that a heavy O/C on the CPU made it unstable on this project. It has been 100% stable on CPU projects before. I reduced the O/C on the CPU by about 10% and it has crunched without fail since.[/quote] What is O/C? gdf |
Nightlord · Joined: 22 Jul 08 · Posts: 61 · Credit: 5,461,041 · RAC: 0
|
Sorry... O/C = overclock. His CPU is heavily overclocked, and on one of my machines that made the GPU card unstable on this project. Just a 10% reduction in the CPU overclock brought it back - might be the same for him? |
Bender10 · Joined: 3 Dec 07 · Posts: 167 · Credit: 8,368,897 · RAC: 0
|
OK, I tried changing the default GPU (550) and MEM (800) settings to 500 and 700. I picked up 2 new WUs (http://www.ps3grid.net/results.php?hostid=5914). So far the 1st WU has been running for ~10 minutes. I also have another BOINC project running on the other 3 CPUs. If this works, I will bump up the settings a bit. Time will tell.... EVGA 8800GS; AMD 9550; Gigabyte GA-M78SM-S2H motherboard with 4 GB RAM; Ubuntu 8.04; BOINC 6.3.5; NVIDIA 173.14 drivers. |
Stefan Ledwina · Joined: 16 Jul 07 · Posts: 464 · Credit: 298,573,998 · RAC: 0
|
[quote]So after some test, i think i have a problem but i can't solve it.[/quote] I have had the same problems with a GTX 260 and the 177.13 drivers! It seems these are driver-related problems... I sent a bug report to NVIDIA a few weeks ago but got no answer, so I had to install Vista on this PC to crunch for Folding@home... I hope there will be a Windows version of the GPUGRID application very soon! ;-) pixelicious.at - my little photoblog |
©2025 Universitat Pompeu Fabra