FAQ - Why does my run fail? Some answers
| Author | Message |
|---|---|
GDF Joined: 14 Mar 07 Posts: 1958 Credit: 629,356 RAC: 0 |
We are monitoring the causes of some crashes, in some cases with the help of volunteers who give us access to their machines for testing. NORMAL BEHAVIOUR IS THAT YOU EXPERIENCE NO CRASHES AT ALL (let's say <1%). These are the most common causes of errors:
1) OVERCLOCKING. SYMPTOMS: the application usually succeeds, but more or less often it crashes with errors appearing at random in several different GPU kernels (shake, langevin, pme, whatever). SOLUTION: Reduce the clocks to the recommended values for your board (note that some manufacturers raise the clocks, so your GPU may be overclocked even though you did nothing yourself). See Wikipedia for the correct frequencies.
2) POOR COOLING. SYMPTOMS: Same as before, random errors in different kernels. SOLUTIONS: If your board is not overclocked according to the numbers given by Wikipedia, then it could be cooling. Open your case or buy extra fans. Air has to come in from the front of the GPU and leave from the rear.
3) NVIDIA BUGS. SYMPTOMS: You change driver and it stops working, or the error is always in the same kernel (PME, FFT; right now, for instance, we have the infamous FFT bug). SOLUTIONS: If the driver works, do not update unless you need to for some game. If it stops working, try updating the driver. The FFT bug that we reported to Nvidia was fixed in the 190 drivers for G80 chips. It is still there for some GTX 260 (216) cards (it is unclear whether these 216 cards work with the 182 drivers. Try.)
4) BOINC BUG. SYMPTOMS: Various. SOLUTIONS: Stick to a client that works for you; only change if we require it or you are willing to experiment.
5) POOR DRIVER INSTALLATION. SYMPTOMS: You can't run any workunits at all and the application crashes immediately. This is rather common on Windows machines. SOLUTIONS: Reinstall the drivers properly. Try this: http://www.guru3d.com/category/driversweeper/
In general, new drivers and new BOINC versions add features and fix old bugs, but they also introduce new ones. This is normal; find your own best balance. Happy crunching. GDF |
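For anyone who prefers to see the symptom rules above written down mechanically, here is a minimal sketch in Python. It is not a project tool: the function name `triage` and its inputs are hypothetical, and the kernel names have to be collected by hand from the error output of your failed tasks. It only encodes the rules of thumb from the list above (random kernels point to clocks or cooling, always the same kernel points to a driver bug, immediate crashes point to the driver installation).

```python
from collections import Counter


def triage(failed_kernels, fails_immediately=False):
    """Map a failure pattern to the FAQ cause it most resembles.

    failed_kernels: kernel name reported by each failed task
    (hypothetical input, gathered by hand from task error output).
    fails_immediately: True if no workunit ever starts properly.
    """
    if fails_immediately:
        return "5) poor driver installation"
    if not failed_kernels:
        return "no failures recorded"
    # Compare kernel names case-insensitively ("FFT" and "fft" are the same).
    counts = Counter(name.lower() for name in failed_kernels)
    if len(counts) > 1:
        return "1) overclocking or 2) poor cooling (random kernels)"
    if len(failed_kernels) > 1:
        return "3) driver bug (always the same kernel)"
    return "not enough data (only one failure)"


if __name__ == "__main__":
    print(triage(["shake", "langevin", "pme"]))
    print(triage(["fft", "fft", "fft"]))
    print(triage([], fails_immediately=True))
```

Cause 4 (BOINC bugs) is deliberately left out, since its symptoms are listed only as "various" and cannot be told apart this way.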
|
Joined: 21 Dec 08 Posts: 7 Credit: 251,750,735 RAC: 0
|
GDF, it is very clear to me that the one eVGA 260-216 that I'm having issues with works just fine on driver 182.50. It will not work on anything 185.xx or higher. It just shoots out errors on the higher drivers. (Machine #20013) The card part number is 896-P3-1267-FR. It's their "superclocked" edition. My other 260-216s seem to be working fine on a mix from 185.85 to the newest driver, 190.62. Hope this helps others. Bob |
|
Joined: 2 Mar 09 Posts: 159 Credit: 13,639,818 RAC: 0
|
TBH, my card is already factory overclocked (FOC), but just to try some stuff I overclocked my 216-core 260 card from 630 MHz to 675 MHz. This didn't produce any errors, and not much of a speed-up really. I keep my fan at a constant 70% speed whether I'm gaming or running CUDA. I've gone through the 6 series with no real errors to speak of. Running 6.10.0 right now. |
|
Joined: 4 Jul 09 Posts: 76 Credit: 114,610,402 RAC: 0
|
I have a GTX 260 192 that will not run any game or GPUgrid on any driver above 182.50. With 182.50 it ran GPUGrid fine and F@H fine with never an error. |
|
Joined: 2 Mar 09 Posts: 159 Credit: 13,639,818 RAC: 0
|
"I have a GTX 260 192 that will not run any game or GPUgrid on any driver above 182.50. With 182.50 it ran GPUGrid fine and F@H fine with never an error."
No game? Sounds like you now have a bad card! |
|
Joined: 11 Dec 08 Posts: 43 Credit: 2,216,617 RAC: 0
|
Just as a general comment across different operating systems (Windows XP x64, Suse Linux, Sabayon and Ubuntu), I have found the following to be true with my GTX 260 and AMD X2.
1. In Linux, I cannot start the BOINC manager and Gpugrid with the video card overclocked. I have to start the program, run it for 5 minutes and then suspend calculations to start the overclock.
2. To overclock successfully, I do much better if I use a light window manager like IceWM in Linux, or change the Windows video settings to maximum performance rather than highest quality.
3. I have been able to run Gpugrid with any Nvidia driver that was supported, except for the period when the Linux 185 drivers would error out all the time. I am now using Ubuntu 64-bit with the Nvidia 190 driver with no issues.
4. I can run any video operation with the current drivers and run Gpugrid. I prefer to suspend Gpugrid if I am transcoding video or doing heavy file copying operations.
5. Linux has been the most stable setup for running Gpugrid day in and day out without issues. Windows XP x64 was a close second. Windows Vista 32-bit was less stable for me for some reason.
6. If the system crashes for any reason in any operating system, I am better off either deleting the affected work units or resetting the project as soon as I start BOINC.
7. The first time that I start a new install of BOINC and Gpugrid, it will always freeze. I then reboot, do a reset of the project and proceed.
8. Being more demanding on the video card means that Gpugrid is less stable than the Folding@home, Seti, Aqua or Einstein CUDA applications. The Gpugrid program does more useful work and demands less from my CPU. |
Zydor Joined: 8 Feb 09 Posts: 252 Credit: 1,309,451 RAC: 0
|
Too many people trust the graphics driver deinstall routine in Windows. That used to include me, despite being told many times over the years: clean out the drivers once you have deinstalled them and before installing the new graphics drivers. I learnt recently, the hard way, not to be idle about this and to religiously go through the clean-out routine. There has been yet another example of this over on the number crunching forum. It boils down to a simple fact: the Windows deinstall routine will not delete a file if it's flagged as "in use". On top of that, not all graphics driver file layouts are the same; they vary from version to version. End result: bits of old driver installs are left behind and cause issues with the new driver. NVidia used to make great play of proper deinstalls; they don't now. I suspect some PR guru clown has poked his nose into the real world ... and this is not only NVidia, it applies to ATI drivers as well. The Guru3d Driver Cleaner sweeps ATI drivers too for that reason. Similar issues occur with sound drivers. Some make great play of switching drivers left, right and centre trying to squeeze out the last drop of performance. Whether or not it achieves that is for another day; what it will achieve is a rapid build-up of undeleted garbage that is not always overwritten by a new install, no matter how the deinstall was done (if they did a deinstall at all; many just install over the top of existing installations). Deinstall the old drivers via Windows, then boot into safe mode, run Guru3d Driver Sweeper, reboot, and install the new drivers. It's only a few minutes of extra work, but it will save days if not weeks of grief. It's not the be-all and end-all of graphics issues, that's for sure, but I'll bet my pension it accounts for the majority ..... Regards Zy |
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 428
|
Has anybody done any analysis of when tasks fail? I mean, what time of day? Mine show a distinct tendency to fail in the early hours of the morning (between 3am and 6am, local time). It's not so farfetched that there could be a correlation. I've noticed when working on server installations with UPSs that the electricity supply voltage can vary over 24 hours - lower when local demand is high, higher when everyone is asleep in bed and most appliances are switched off. So a cooling solution which is adequate when you're around to measure it might be inadequate under higher power draw (likely if the input voltage is higher). |
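A rough way to check the time-of-day idea is to count failures per hour of the day. The sketch below is only an illustration: the function names are made up, and it assumes you have copied the failure timestamps out of your BOINC messages by hand in the `dd/mm/yyyy hh:mm:ss` form that BOINC message lines quoted in this thread use. If failures were spread evenly, a 3-hour window would hold about 3/24 = 12.5% of them, so a much larger share in the 3am-6am window would support the suspicion.

```python
from collections import Counter
from datetime import datetime


def failures_by_hour(timestamps, fmt="%d/%m/%Y %H:%M:%S"):
    """Return a 24-element list: number of failures in each local hour."""
    hours = Counter(datetime.strptime(t, fmt).hour for t in timestamps)
    return [hours.get(h, 0) for h in range(24)]


def share_in_window(counts, start=3, end=6):
    """Fraction of all failures that fall in hours start..end-1."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return sum(counts[start:end]) / total


if __name__ == "__main__":
    # Hypothetical sample data, not taken from any real host.
    sample = [
        "13/09/2009 03:15:00",
        "13/09/2009 04:40:10",
        "14/09/2009 05:05:05",
        "14/09/2009 16:32:48",
    ]
    counts = failures_by_hour(sample)
    print(counts)
    print(f"share between 03:00 and 06:00: {share_in_window(counts):.0%}")
```

A handful of failures proves nothing either way; this only becomes meaningful with a few weeks of data from the same host.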
|
Joined: 10 Nov 07 Posts: 10 Credit: 12,777,491 RAC: 0
|
Palit GeForce GTX 260 Sonic 216 SP - Vista64 - 190.62 x64 drivers, no issues, everything working fine. No failures so far. This card is slightly overclocked by the vendor, 625 MHz instead of 585 MHz. But it has two fans, so it stays at 55°C even after several days of continuous work. |
|
Joined: 16 Aug 09 Posts: 1 Credit: 542,905 RAC: 0
|
I've got an MSI GTX 260 OCv3, Windows Media Center (32-bit) and an AMD64 3400+ (such an old processor for this video card) in an eMachines, and I've never had a problem with errors in GPUGRID. The only error I got was from cancelling the first task, because I had a GeForce 8400GS and tried GPUGRID with it (until I read the list of GPUs supported for this project; and really, the speed it was processing at was ridiculous, something like .04% in an hour or so). Keep in mind that this graphics card runs at 655/1408/2100, compared with the stock one at 576/1240/1998. I have to admit, though, that in Seti@Home it produced a lot of errors compared with when I was using the 8400GS for that project, where I had no errors or maybe only one or two. Although I had a lot of tasks turned in fast, I sometimes got a lot of errors, like 2 to 5 in a row. |
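To put numbers like these in perspective, a tiny calculation shows how far a card is above its reference clocks. This is just a sketch; the helper name `overclock_percent` is made up, and the clock triples are the ones quoted in the post above (655/1408/2100 against 576/1240/1998), read here as core/shader/memory, which is an assumption about the order.

```python
def overclock_percent(actual_mhz, reference_mhz):
    """Percentage by which a clock exceeds its reference value."""
    return (actual_mhz / reference_mhz - 1.0) * 100.0


if __name__ == "__main__":
    # (actual, reference) pairs quoted in the post above; the
    # core/shader/memory labels are an assumed reading of the triples.
    card = {
        "core": (655, 576),
        "shader": (1408, 1240),
        "memory": (2100, 1998),
    }
    for domain, (actual, reference) in card.items():
        print(f"{domain}: {overclock_percent(actual, reference):+.1f}%")
```

The same function can be used with any other pair of clocks, for example your own card's values against the reference frequencies listed on Wikipedia.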
ocgbargas Joined: 18 Jun 09 Posts: 12 Credit: 4,327,530 RAC: 0
|
Hello everyone. First, please forgive my English, as it is translated with Google. I am posting again because I still cannot complete any GPUGRID unit. I have gone through this post http://www.gpugrid.net/forum_thread.php?id=1172&nowrap=true#10792 and still have the same problem. I have tried all of these drivers: 181.20, 185.85, 186.18, 190.38, 190.6; and all of these BOINC versions: 6.6.20, 6.6.28 and 6.6.36. The graphics card is a Zotac GTX 260 (216) at the manufacturer's values of 576/1242/999. The computer is an i7 920 with 6 GB RAM on a Gigabyte board. The OS is Win7 64. The temperature of the card does not exceed 70 °C. I know that people are having more trouble with the 260 than with any other card, but this is not normal, since I have spent more than a month running Folding or Collatz tasks and not one has given me an error. I stressed the card for over 4 hours with the most demanding games and did not get a single error. Something is happening and it is not being resolved. The error always comes from the same place, the famous nvlddmkm that you see in the Event Viewer. I hope this helps. Thanks and best regards. |
Jet Joined: 14 Jun 09 Posts: 25 Credit: 5,835,455 RAC: 0
|
I did some analysis of the failures; basically, there is no pattern. I should say that the key problem is overclocking. Its consequence is slight overheating (close to the edge of stability) and then, probably, a power surge at peak load. A small spike is enough to cause a failure if your cards are running very close to the maximum power rating of the power supply. In short, the facts:
1. PowerLux power supply, rated at 750 watts.
2. 3 x GTX 260 Matrix by ASUS, with two fans and a heat pipe system.
3. Intel Quad Core Q9550 on an ASUS WS Evolution board, overclocked by 10%. Running MW on all 4 cores, so 100+% of the power load.
4. Manually overclocked from 576 MHz GPU / 999 MHz memory to 756 MHz GPU / 1111 MHz memory.
5. The cards sit very close to each other. The card in the middle, due to the lack of incoming air, normally runs at 57-60C. This is too high, which is the reason for an additional big external fan mounted over it, plus a constantly running room air conditioning system set to 23C.
6. Every day I get one typical error: "redundant result" or "computation error". Sometimes these errors are combined, sometimes not.
7. Additionally, I should state that none of the GTX 260s is mechanically fixed in its slot; they are held on the motherboard only by their own weight. So there could be purely mechanical or poor-contact causes as well.
So, in general, taking the above facts into consideration, the system described should be an error generator, but in fact it is not. Regards, Jet |
GDF Joined: 14 Mar 07 Posts: 1958 Credit: 629,356 RAC: 0 |
I have added cause 5 to the starting message. gdf
5) POOR DRIVER INSTALLATION. SYMPTOMS: You can't run any workunits at all and the application crashes immediately. This is often a problem for Windows users. SOLUTIONS: Reinstall the drivers properly. Try this: http://www.gpugrid.net/forum_thread.php?id=1293 |
ocgbargas Joined: 18 Jun 09 Posts: 12 Credit: 4,327,530 RAC: 0
|
I have always followed these steps to uninstall and install the drivers: restart in safe mode, uninstall the driver, then run Driver Sweeper to clean out everything that is Nvidia. Run CCleaner to clean up the debris. Reboot into safe mode again and install the new driver. Reboot again into normal mode. Regards. |
|
Joined: 18 Feb 09 Posts: 12 Credit: 13,624,069 RAC: 0
|
"I have added cause 5 to the starting message."
I suggest linking directly to http://www.guru3d.com/category/driversweeper/ It will save a lot of people the trouble of reading through loads of text; erm, it would be more efficient. The thread itself doesn't include any practical info other than the link for the inexperienced. |
GDF Joined: 14 Mar 07 Posts: 1958 Credit: 629,356 RAC: 0 |
Done. gdf |
|
Joined: 16 Apr 09 Posts: 163 Credit: 921,733,849 RAC: 0
|
Nice troubleshooting note. As a follow up, I've one workstation that has started erroring out on GPUGrid (but not Collatz) in the last couple of days. It is the only workstation I have with a GTS 250. Running Windows XP, on an AMD 945. Some comments to the troubleshooting note for this workstation.
|
|
Joined: 25 Oct 08 Posts: 42 Credit: 42,812,268 RAC: 0
|
Always the same problem at home, with my GTX260 (216) and one 8800 GT on Intel 8400. Vista 32, driver 182.5 and BOINC 6.6.36.
13/09/2009 16:32:48 GPUGRID Computation for task 225-GIANNI_BIND001-24-100-RND7793_0 finished |
|
Joined: 16 Apr 09 Posts: 163 Credit: 921,733,849 RAC: 0
|
Troubleshooting for root causes should probably also include a look at the work units being sent. When computation errors go from, say, 1 in 25 (still too high) to 1 in 4 or 1 in 3, with no change in the end user's hardware or software configuration, it strikes me that another variable should be considered. So far, it seems that this possible source of problems is not being considered, and frankly, from the end user's point of view, there is nothing the end user can do to address it. |
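One way to tell whether a jump like that is more than bad luck is a simple binomial tail calculation: if the true error rate were still 1 in 25 (4%), how likely would it be to see this many failures in a recent batch? The sketch below is only an illustration; the function name is made up, and the batch of 20 tasks with 5 failures (1 in 4) is a hypothetical example, not data from any host. It also assumes tasks fail independently of each other.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more failures
    in n tasks if each task fails independently with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    baseline_rate = 1 / 25  # the "1 in 25" figure from the post above
    tasks, failures = 20, 5  # hypothetical recent batch (1 in 4)
    p = prob_at_least(failures, tasks, baseline_rate)
    print(f"P(>= {failures} failures in {tasks} tasks at 4%) = {p:.4f}")
```

If that probability comes out very small, the old error rate does not explain the recent batch, and something has changed: possibly the work units, but it could equally be the driver, the temperature or the client, so this narrows the question rather than answering it.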
ocgbargas Joined: 18 Jun 09 Posts: 12 Credit: 4,327,530 RAC: 0
|
I have always been in medical projects and I would love to continue with GPUGRID, but since the last CUDA update that GPUGRID made I have not been able to complete a single unit of this project. I have my card processing Collatz, which is not a project I especially like, but that is better than leaving the card idle. I would like you to come up with a solution, because there are many of us with this problem, which does not occur when processing for other projects. Regards. |
©2025 Universitat Pompeu Fabra