Stability of the WUs
| Author | Message |
|---|---|
|
Joined: 18 Nov 09 Posts: 7 Credit: 52,996,450 RAC: 0
|
Hey hey, I love the new badge system and so on, but I am quite worried about something else. Lately an increasing number of jobs have ended with computing errors. At the moment it happens so often that I think I would be better off putting my GPU on another project. I don't know what causes it. I have the latest beta drivers from Nvidia installed here (290.53). They fixed the display driver crashes (caused by Firefox's hardware acceleration, I think). When the display driver crashes, the GPUGRID task crashes too (I have not seen an exception yet). Besides that, the GPUGRID workunits still crash too often for other reasons, like: http://www.gpugrid.net/result.php?resultid=4879675 An incorrect function, how is that possible? I would very much appreciate some effort going into making the workunits more stable. Regards, iconized. |
|
Joined: 15 Jan 10 Posts: 42 Credit: 18,255,462 RAC: 0
|
Are you overclocking the GPU? |
nenym Joined: 31 Mar 09 Posts: 137 Credit: 1,429,587,071 RAC: 0
|
Try changing the core clock to 880-890 MHz. I had the same problem with a factory-OCed GTX 560 Ti. |
|
Joined: 18 Nov 09 Posts: 7 Credit: 52,996,450 RAC: 0
|
Yes, a factory OC of 900 MHz, quite a bit higher than 822 MHz (the norm). I will lower it. Thanks for the replies! |
|
Joined: 8 Jan 12 Posts: 20 Credit: 5,132,859 RAC: 0
|
At the moment neither of my cards can even finish a WU at stock speed, while the first few WUs were done with a reasonable OC. For instance:

Stderr output
<core_client_version>6.10.60</core_client_version>
<![CDATA[
<message>
The system cannot find the path specified. (0x3) - exit code 3 (0x3)
</message>
<stderr_txt>
# Using device 0
# There is 1 device supporting CUDA
# Device 0: "GeForce GTX 460"
# Clock rate: 1.53 GHz
# Total amount of global memory: 804847616 bytes
# Number of multiprocessors: 7
# Number of cores: 56
MDIO: cannot open file "restart.coor"
SWAN: FATAL : swanMemcpyDtoH failed
Assertion failed: 0, file swanlib_nv.c, line 390
This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
</stderr_txt>
]]>

and

Stderr output
<core_client_version>6.10.60</core_client_version>
<![CDATA[
<message>
- exit code 98 (0x62)
</message>
<stderr_txt>
# Using device 0
# There is 1 device supporting CUDA
# Device 0: "GeForce GTX 460"
# Clock rate: 1.84 GHz
# Total amount of global memory: 1073283072 bytes
# Number of multiprocessors: 7
# Number of cores: 56
SWAN: Using synchronization method 0
MDIO: cannot open file "restart.coor"
# Using device 0
# There is 1 device supporting CUDA
# Device 0: "GeForce GTX 460"
# Clock rate: 1.80 GHz
# Total amount of global memory: 1073283072 bytes
# Number of multiprocessors: 7
# Number of cores: 56
SWAN: Using synchronization method 0
MDIO: cannot open file "restart.coor"
ERROR: # Energies have become nan
called boinc_finish
</stderr_txt>
]]> |
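When comparing many failed tasks, it can help to reduce each stderr dump to its exit code and a recognizable failure signature. The following is only a rough sketch in Python: the function name `summarize_stderr` and the labels are hypothetical, and the two signatures are simply the strings that appear in the logs quoted above, not an official list of GPUGRID error types.

```python
import re

# Signatures copied from the stderr dumps in this thread; the labels on the
# right are informal descriptions (an assumption, not project terminology).
KNOWN_SIGNATURES = [
    ("Energies have become nan", "simulation diverged (NaN energies)"),
    ("swanMemcpyDtoH failed", "device-to-host memory copy failed"),
]

# Matches the BOINC message line, e.g. "- exit code 98 (0x62)".
EXIT_CODE_RE = re.compile(r"exit code (-?\d+) \((0x[0-9a-fA-F]+)\)")


def summarize_stderr(text):
    """Return the exit code (or None) and any known signatures found in text."""
    match = EXIT_CODE_RE.search(text)
    exit_code = int(match.group(1)) if match else None
    labels = [label for needle, label in KNOWN_SIGNATURES if needle in text]
    return {"exit_code": exit_code, "labels": labels}


if __name__ == "__main__":
    sample = "<message> - exit code 98 (0x62) </message> ERROR: # Energies have become nan"
    print(summarize_stderr(sample))
```

Grouping results this way makes it easier to see whether one card mostly produces memory-copy failures and another mostly NaN energies, which is the kind of "task failure type" observation that helps when asking for help.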
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
When you start getting errors you should make some observations: temperatures of the GPU, CPU and board, fan speeds, task failure types, and system usage at the time of failure.

There are several generic things you can do:
- Restart the system (stops system-related runaway errors)
- Increase fan speed / improve ventilation (reduces temps)
- Free up a CPU core/thread (stops some heartbeat issues)
- Reduce CPU clocks if the CPU is overclocked (reduces system temperature and motherboard/component overheating issues, especially the chipset)
- Reduce GPU clocks (start by reducing the memory speed, then move on to the GPU core if need be, but you shouldn't have to reduce by more than about 10%)
- Roll back, reinstall or upgrade drivers
- Increase GPU voltage very slightly

If none of these work, there's more:
- Clean the GPU and system
- Reset the BIOS
- Re-seat the GPU (take it out, reboot, power down, re-seat the GPU)
- Restore or reinstall the operating system

FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help |
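The observations suggested above (GPU temperature, fan speed, clocks, load) can be sampled from a script rather than read off a monitoring tool by hand. This is a minimal sketch in Python, assuming an Nvidia driver whose `nvidia-smi` supports the `--query-gpu` option; older drivers may not, and the helper names `parse_smi_line` and `read_gpu_stats` are made up for illustration.

```python
import subprocess

# Query fields understood by `nvidia-smi --query-gpu`.
FIELDS = ["temperature.gpu", "fan.speed", "clocks.sm", "clocks.mem", "utilization.gpu"]


def parse_smi_line(line):
    """Turn one CSV line of nvidia-smi output into a {field: value} dict."""
    values = [value.strip() for value in line.split(",")]
    return dict(zip(FIELDS, values))


def read_gpu_stats():
    """Run nvidia-smi once and return one dict per GPU (requires the tool on PATH)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS), "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [parse_smi_line(line) for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    for index, stats in enumerate(read_gpu_stats()):
        print(index, stats)
```

Running this every minute or so and appending the output to a file gives a record of the temperature and clocks just before a task fails, which is more useful than a reading taken after the fact.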
|
Joined: 8 Jan 12 Posts: 20 Credit: 5,132,859 RAC: 0
|
Well, I came here from MW@H because they ran out of WUs, with a GTX460FTW/920 and a GTX460SC/835, which also worked fine on S@H. After a few WUs the trouble started, but I had done nothing to the settings or anything else. No problems with temps; if temps go up I take my compressor and clean the whole lot. So back to stock speed: no results. I tried SWAN_SYNC=0 and freed a core, but then I saw "Suspend work when non-BOINC CPU usage is above 25%", which is a bit strange if you set the GPU to always work. This "Suspend" comes back irregularly without me having changed a thing. It is getting extremely annoying. I'm just getting a bit tired of trying everything again and again. Perhaps later I'll try a clean install of everything and replace the AM2+ mobo with an ASRock 870 Xtreme, the 4 GB DDR2 with 8 GB DDR3, and the HD. |
|
Joined: 18 Nov 09 Posts: 7 Credit: 52,996,450 RAC: 0
|
My factory OC-ed 560 Ti normally runs at: core clock 900 MHz, shader clock 1800 MHz, memory clock 2004 MHz. I have been running a bit lower for a couple of days now: core clock 800 MHz, shader clock 1600 MHz, memory clock 1800 MHz. These are all MSI Afterburner numbers, so there might be some rounding errors. I also ran a video memory stress test (vmt) with these settings and had no problems. I keep getting errors, and all the latest errors are with the NATHAN units: http://www.gpugrid.net/results.php?hostid=111996 All the latest failing units gave this error: Incorrect function. (0x1) - exit code 1 (0x1). I don't have problems with other GPU projects (PrimeGrid, Mfaktc for GIMPS) with the factory OC. So perhaps it is a problem with those work units? |
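To put the clock figures quoted in this thread side by side, the relative changes can be worked out with a few lines of Python. This is only an arithmetic sketch: the helper `percent_change` is a hypothetical name, and the 822 MHz reference core clock is the "norm" quoted earlier in the thread, not a value checked against a spec sheet.

```python
def percent_change(old_mhz, new_mhz):
    """Relative change from old_mhz to new_mhz, in percent (negative = lower)."""
    return (new_mhz - old_mhz) / old_mhz * 100.0


if __name__ == "__main__":
    # Factory overclock relative to the reference core clock quoted in the thread.
    print(round(percent_change(822, 900), 1))    # 9.5
    # Reduced settings relative to the factory overclock.
    print(round(percent_change(900, 800), 1))    # -11.1 (core)
    print(round(percent_change(1800, 1600), 1))  # -11.1 (shader)
    print(round(percent_change(2004, 1800), 1))  # -10.2 (memory)
    # Reduced core clock relative to the reference clock.
    print(round(percent_change(822, 800), 1))    # -2.7
```

On these numbers the reduced core clock is already slightly below the quoted reference clock, so errors that persist at these settings are hard to attribute to the factory overclock alone.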
|
Joined: 8 Jan 12 Posts: 20 Credit: 5,132,859 RAC: 0
|
A clean upgrade to XP 64, a fresh BOINC install, driver 258.96 and the card at stock settings did not get rid of the errors. I tried it before and got errors then too. It looks like GPUGRID is not for me. FYI: 23 NATHANs and 1 GIANNI. |
|
Joined: 8 Jan 12 Posts: 20 Credit: 5,132,859 RAC: 0
|
Sjips, my sgs is stuttering |
|
Joined: 21 Dec 11 Posts: 2 Credit: 742,012,065 RAC: 2
|
I have the exact same issue (Assertion failed: 0, file swanlib_nv.c, line 390) since I updated to Nvidia's beta drivers (295.51). Everything was working nicely prior to that. I don't believe that's a coincidence. |
|
Joined: 18 Nov 09 Posts: 7 Credit: 52,996,450 RAC: 0
|
This is also funny: http://www.gpugrid.net/workunit.php?wuid=3127645 http://www.gpugrid.net/workunit.php?wuid=3127506 But not correlated. |
Damaraland Joined: 7 Nov 09 Posts: 152 Credit: 16,181,924 RAC: 0
|
I think I have the same problem, with a brand new GPU, hardware and distro. This was my first WU, but the card is crunching well at Einstein@Home. acemd2_6.14_x86_64-pc-linux-gnu__cuda31: swanlib_nv.c:388: error: Assertion `0' failed. One weird thing I noticed is that the screen had a "scrambled" image. I restarted... let's see tomorrow. |
©2025 Universitat Pompeu Fabra