Eight compute errors in a row
Joined: 11 Apr 09 · Posts: 17 · Credit: 11,086,149 · RAC: 0

So, after finally getting the server to send me work, the next units went down with compute errors. They all failed within 5 to 10 seconds of starting and showed an on-screen error message. I didn't get a chance to copy it down, but an .exe was failing to start. The units came from a few different subprojects (KASHIF, IBUCH, etc.). After running so smoothly for over a month, it's strange to suddenly have so many problems. I'll try to capture the error when the server sends more work units; right now it won't, saying I have exceeded my quota and that the server has no work. Does anyone know of another GPU-intensive BOINC project besides SETI? Milkyway never seems to have any work. Any other projects?
Paul D. Buck · Joined: 9 Jun 08 · Posts: 1050 · Credit: 37,321,185 · RAC: 0

At the moment only three projects have a "steady" supply of GPU work: GPU Grid, SaH (SETI@home), and SaH Beta. The Lattice Project, Ramsey, Aqua, and Milky Way are all in the start-up phase, with unknown work availability. Aqua only came online in the last 24 hours; I have not tried it yet, and they have been having problems, so I don't know their status. Milky Way is just getting going, so they do not have work yet. The Lattice Project has always been intermittent with work; they ran a limited CUDA test, but I don't know whether they are issuing work at this time. Einstein said they were on the verge of a CUDA release but have not said anything positive about it in a month or so...
Joined: 1 Feb 09 · Posts: 139 · Credit: 575,023 · RAC: 0

Well, there is another project, "folding@home". They make their own application, which runs on many video cards and also supports SMP, meaning 4 CPUs can work on 1 unit. I'm not sure whether the GPU client also uses the CPUs, but it probably does. There are clients for ATI and Nvidia GPUs, for SMP, and a normal one, so it's an impressive build in any case. Sadly, they turned their back on BOINC, so it's not a BOINC project anymore; they reported BOINC to be much too unstable a platform :D
Joined: 11 Apr 09 · Posts: 17 · Credit: 11,086,149 · RAC: 0

Yeah, and the last time I tried to install the GPU version of the Folding At Home client (a few weeks ago), it was an impressively complicated install. I've been using computers since I was 4, so I know I can do it, but given the effort involved, the number of changes it makes to my computer, and the fact that this rig is important for some other things, I'm not going to do it. Hmm, I'm still getting the no-work-available error from GPUgrid, so I can't tell whether the situation is fixed.
Paul D. Buck · Joined: 9 Jun 08 · Posts: 1050 · Credit: 37,321,185 · RAC: 0

You will likely get that message for 24 hours... Try Aqua; they may have their issues worked out. I have not tried it myself, but you may have nothing to lose but some time... There are a couple of threads on the boards about the CUDA experience there, though they don't have much in them yet... be the first... start a fashion... :)
Joined: 17 Aug 08 · Posts: 2705 · Credit: 1,311,122,549 · RAC: 0

anthonmg, you had one task that failed after quite some time, and subsequently all the other WUs failed immediately, with BOINC again reporting "device emulation". Your computer may have needed a restart after the crash; in such a state I suppose most, if not all, other 3D software will fail as well. 182.65 is an unusual driver. If you experience more problems, try removing it and installing 182.50 (WHQL).

uBronan, I wouldn't say it "is not a BOINC project any more". Actually, they never were. They are among the DC pioneers and were running great long before BOINC. Apparently they evaluated it at some point and were not pleased. I've actually been running folding@home GPU1 on my old ATI 1950Pro. That was cool: the first GPU crunching, long before anyone officially said "CUDA" :D Oh, and why I mention it: I thought it was pretty easy and straightforward to set up. It only got messy with new and/or multiple cards.

MrS
Scanning for our furry friends since Jan 2002
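The pattern MrS describes (one task crashing after a long run, then every following WU failing within seconds) can be spotted in a task history. The sketch below is a minimal illustration, not a BOINC feature: the `TaskResult` record, the `find_error_cascade` helper, the task names, and the 60-second "quick failure" threshold are all hypothetical and would need adapting to however you export your task list.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    name: str         # hypothetical task name
    runtime_s: float  # wall-clock seconds before the task ended
    ok: bool          # True if the task completed successfully


def find_error_cascade(results, quick_fail_s=60.0, min_followers=3):
    """Return the index of a long-running failure that was followed by at
    least `min_followers` consecutive quick failures, or None if none found."""
    for i, first in enumerate(results):
        # Only a failure that ran for a while can be the "trigger" crash.
        if first.ok or first.runtime_s < quick_fail_s:
            continue
        # Count the consecutive quick failures right after this crash.
        followers = 0
        for later in results[i + 1:]:
            if not later.ok and later.runtime_s < quick_fail_s:
                followers += 1
            else:
                break
        if followers >= min_followers:
            return i
    return None


history = [
    TaskResult("wu_a", 20000, True),
    TaskResult("wu_b", 15000, False),  # long run that crashed
    TaskResult("wu_c", 6, False),
    TaskResult("wu_d", 5, False),
    TaskResult("wu_e", 8, False),
]
idx = find_error_cascade(history)
if idx is not None:
    print(f"{history[idx].name} crashed; the later failures look like fallout. Try a reboot.")
```

A hit suggests the quick failures are fallout from the first crash rather than bad work units, which is the case where restarting the machine is worth trying first.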
Joined: 17 Aug 08 · Posts: 2705 · Credit: 1,311,122,549 · RAC: 0

anthonmg, I've just seen that you already asked the same question (well, asked for help by stating the problem) in the other thread and got plenty of help over there. Please try to keep the discussion focused, so people's time is not wasted.

MrS
Joined: 11 Apr 09 · Posts: 17 · Credit: 11,086,149 · RAC: 0

Sorry about that. This thread was originally about a problem related to the main one, but when the second problem cropped up, seemingly unrelated to the first, I started a new thread on it and got different help in each one. It's finally working.
Joined: 17 Aug 08 · Posts: 2705 · Credit: 1,311,122,549 · RAC: 0

Glad to hear the problem is solved!

MrS
Joined: 9 Apr 09 · Posts: 7 · Credit: 25,115,977 · RAC: 0

I have one GPU that hits a runtime error every week or so. It kills the running WU, then the next 5 or 6. After a reboot, it runs fine for another week. Any ideas why this one machine has this problem and how I can fix it?
Paul D. Buck · Joined: 9 Jun 08 · Posts: 1050 · Credit: 37,321,185 · RAC: 0

If you raised the OC, lower it... Check the fans for dirt, check running temperatures, and make sure you are running one of the "approved" driver versions for your OS... Check for viruses and malware... and, not to be too flip, reboot the machine every three days... :)
Joined: 9 Apr 09 · Posts: 7 · Credit: 25,115,977 · RAC: 0

This one GPU does run hotter than my other 3, but I have increased the fan to 90%. Is "runtime" an OS program, or Nvidia, or CUDA?
Joined: 17 Aug 08 · Posts: 2705 · Credit: 1,311,122,549 · RAC: 0

I checked one of your hosts and it gets quite a few errors. It's running at 1.40 GHz, whereas standard is 1.25 GHz, so it could be the clock speed. Keep in mind that individual chips differ in their frequency and temperature headroom, and that they degrade over time. "Runtime" just means the error happened while you were actually running the app, not during compilation or whatever.

MrS
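MrS's comparison of 1.40 GHz against the 1.25 GHz standard clock works out to about 12% above stock. A one-function sketch of that arithmetic; the `overclock_percent` helper is hypothetical:

```python
def overclock_percent(actual_ghz: float, stock_ghz: float) -> float:
    """Return how far `actual_ghz` is above `stock_ghz`, in percent."""
    return (actual_ghz / stock_ghz - 1.0) * 100.0


# MrS's figures for the failing host: 1.40 GHz running vs. 1.25 GHz standard.
print(f"{overclock_percent(1.40, 1.25):.1f}% over stock")  # prints "12.0% over stock"
```

Whether 12% is safe depends on the individual chip, since, as MrS notes, headroom varies between chips and degrades over time.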
Joined: 9 Apr 09 · Posts: 7 · Credit: 25,115,977 · RAC: 0

This MSI card is factory overclocked; I have not done any overclocking myself. In fact, EVGA Precision shows it at less than MSI claims: 648 vs. 655 MHz respectively. I did overclock the entire system and have cut that back a bit. Maybe that will help.
Beyond · Joined: 23 Nov 08 · Posts: 1112 · Credit: 6,162,416,256 · RAC: 0

> This one GPU does run hotter than my other 3, but I have increased the fan to 90%.

outlnder, since that card is running hotter than your other three MSI factory-OCed GTX 260s, it most likely has a problem, maybe a heatsink that isn't making good contact. Since it's a very new card, you may want to consider an RMA. Have you tried it in a different machine (not that you have many to spare :-)?

Edit: It looks like you might have already swapped it with a different card, since the failing one is listed as your oldest client. Personally, I'd RMA it.
Joined: 9 Apr 09 · Posts: 7 · Credit: 25,115,977 · RAC: 0

The last errored-out WU also took down the Docking WUs being run on the CPU. That suggests to me that it isn't the GPU causing this problem. I will keep watching it, but I think the OS may be causing the errors.
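The reasoning here is that a CPU task and a GPU task failing at the same moment point toward a cause they share (the host) rather than the card. A minimal sketch of that check, assuming you have failure times for each device type; the `Failure` record, the `cpu_and_gpu_failed_together` helper, and the 5-minute window are hypothetical choices:

```python
from dataclasses import dataclass


@dataclass
class Failure:
    t: float     # time of failure in seconds (any common reference point)
    device: str  # "cpu" or "gpu"


def cpu_and_gpu_failed_together(failures, window_s=300.0):
    """True if any CPU failure and any GPU failure lie within window_s seconds."""
    cpu_times = [f.t for f in failures if f.device == "cpu"]
    gpu_times = [f.t for f in failures if f.device == "gpu"]
    return any(abs(c - g) <= window_s for c in cpu_times for g in gpu_times)


log = [Failure(1000, "gpu"), Failure(1030, "cpu"), Failure(50000, "gpu")]
print(cpu_and_gpu_failed_together(log))  # True: GPU and CPU failures 30 s apart
```

A coincidence in time hints at a shared, host-level cause, but it does not say which part of the host (OS, drivers, CPU, RAM) is responsible.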
Joined: 17 Aug 08 · Posts: 2705 · Credit: 1,311,122,549 · RAC: 0

That may be a very useful observation. Swap GPUs with a system that works. If the errors travel with the GPU, it looks like hardware (clock speed / temperature), but if the same machine errors out, then it's not the GPU and an RMA won't help.

People frequently blame the OS when anything goes wrong, but mostly that's not the reason. There could be file corruption or a dodgy driver installation, but there are also CPUs overclocked too far and defective memory, or, much more commonly, RAM set to the wrong timings, either by the user during overclocking or by the BIOS in automatic mode.

MrS
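MrS's swap test reduces to a small decision table. This sketch encodes it; the function name and the wording of the verdicts are hypothetical, and the "errors in both machines" branch is an added assumption that the thread does not discuss.

```python
def interpret_swap_test(errors_in_original_host: bool, errors_in_other_host: bool) -> str:
    """Interpret a swap test: the suspect GPU has been moved into a working
    machine, and that machine's GPU into the suspect machine.

    errors_in_original_host: errors still appear in the machine that had them.
    errors_in_other_host: errors now appear in the machine that got the suspect GPU.
    """
    if errors_in_other_host and not errors_in_original_host:
        # Errors travelled with the card.
        return "GPU: likely hardware (clock speed / temperature); an RMA may help"
    if errors_in_original_host and not errors_in_other_host:
        # Errors stayed with the machine.
        return "host: check CPU/RAM overclock, RAM timings, drivers; an RMA won't help"
    if errors_in_original_host and errors_in_other_host:
        # Assumption: this case is not covered in the thread.
        return "inconclusive: errors in both machines"
    return "no errors after the swap: keep watching"


print(interpret_swap_test(errors_in_original_host=True, errors_in_other_host=False))
```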
©2025 Universitat Pompeu Fabra