Error while computing
| Author | Message |
|---|---|
|
Joined: 23 Jul 09 Posts: 2 Credit: 332,582 RAC: 0
|
Try turning off TDR (Timeout Detection and Recovery). http://www.microsoft.com/whdc/device/display/wddm_timeout.mspx
1. Create a text file named update.reg; make sure it does not end in a .txt extension.
2. Edit the file and add these lines:
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers]
"TdrLevel"=dword:00000000
3. Run update.reg and select Yes when asked to update the registry.
4. Restart. |
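The registry steps above can be sketched as a small Python helper that builds the text of update.reg. This is only a sketch under assumptions: the name `build_tdr_reg` is made up for illustration, and the `Windows Registry Editor Version 5.00` first line does not appear in the post; it is the header that files exported by regedit start with, so it is included here. Note that a later reply points out these keys are documented for testing only.

```python
# Hypothetical helper (not from the post): builds the text of a .reg file
# that sets the TdrLevel value under the GraphicsDrivers key.
REG_HEADER = "Windows Registry Editor Version 5.00"  # assumption: absent from the post
GRAPHICS_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers"


def build_tdr_reg(tdr_level: int = 0) -> str:
    """Return .reg file text that sets TdrLevel to the given DWORD value."""
    if not 0 <= tdr_level <= 0xFFFFFFFF:
        raise ValueError("TdrLevel must fit in an unsigned 32-bit DWORD")
    lines = [
        REG_HEADER,
        "",
        f"[{GRAPHICS_KEY}]",
        # DWORD values in .reg files are written as 8 hex digits.
        f'"TdrLevel"=dword:{tdr_level:08x}',
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Print the file contents; save them as update.reg to follow the steps.
    print(build_tdr_reg(0))
```

Printing the text rather than writing the registry directly keeps the sketch harmless to run on any machine; importing the file is still the manual step described in the post.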
|
Joined: 4 Apr 09 Posts: 450 Credit: 539,316,349 RAC: 0
|
Makes for interesting reading ... Even though it specifically says to use these reg keys only for testing, I wonder if your suggestion of disabling detection and recovery would actually improve performance, because (hopefully) the OS will no longer spend as many cycles watching what the GPU is doing? slicedbread ... have you tried this yourself? Thanks - Steve |
|
Joined: 23 Jul 09 Posts: 2 Credit: 332,582 RAC: 0
|
Yes, I've tried this because I had errors. It works on Windows 7. Not sure if this will give you a performance boost. :/ |
|
Joined: 6 May 10 Posts: 80 Credit: 98,784,188 RAC: 0 |
> If it starts running unstable while the PC is untouched (I was on holiday when this started to happen), then it can only be something in GPUGRID causing this. "Error while computing" as an error message does not give me any information, so maybe a GPUGRID member can investigate the real reason why the WUs have an error. If it is in my system, I know what I can fix; if it is in GPUGRID, they can fix it.

This has effectively already been done. When a work unit fails, an identical task is automatically reissued to a different computer. Comparing your results to the results of others is an excellent troubleshooting technique. If a work unit fails on your system and also fails on other systems, the work unit is most likely "bad". OTOH, if a work unit fails on your system but other volunteers complete it without errors, the problem is most likely your system.

> I don't see the point of running another OS especially for GPUGRID. Many other projects (e.g. MilkyWay like my GPU also)....

The point of running a different OS is to differentiate between hardware and software issues. That, and FatDog-64 is totally cool and easy (including the nVidia drivers, which install with a single click). If your system works perfectly with one OS and less than perfectly with a different OS, it is likely that there is some sort of software issue. |
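The comparison described above (did the same work unit fail elsewhere too?) can be written down as a tiny decision rule. This is a sketch only: `HostResult`, `diagnose` and the host IDs in the usage lines are hypothetical names and values for illustration, and the outcomes would have to be read by hand from the work unit page.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HostResult:
    """One host's outcome for a single work unit (hypothetical structure)."""
    host_id: int
    succeeded: bool


def diagnose(my_host_id: int, results: List[HostResult]) -> str:
    """Apply the rule of thumb from the post to one work unit's results."""
    mine = [r for r in results if r.host_id == my_host_id]
    others = [r for r in results if r.host_id != my_host_id]
    if not mine or any(r.succeeded for r in mine):
        return "no failure on this host"
    if not others:
        return "inconclusive: nobody else has returned this work unit yet"
    if any(r.succeeded for r in others):
        # Others completed the same work unit, so the task itself is fine.
        return "suspect this host"
    # Everyone who tried it failed, so the work unit is probably bad.
    return "suspect the work unit"


if __name__ == "__main__":
    # Illustrative host IDs only.
    print(diagnose(1, [HostResult(1, False), HostResult(2, True)]))
    print(diagnose(1, [HostResult(1, False), HostResult(2, False)]))
```

A single work unit is weak evidence either way; the rule becomes more convincing when the same verdict repeats across many failed tasks.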
|
Joined: 28 Aug 09 Posts: 12 Credit: 4,537,060 RAC: 0
|
So you're asking me to throw away my current OS with my current programs solely for GPUGRID's sake. Too bad that most programs I use are not available for Linux. OTOH, the system has been running without problems from the beginning. While no hardware change and no software change was made, only GPUGRID started to run unstable. It is a pity that problems are pinned on the (volunteering) users. For the next batch of GPU tasks, can you print a message in the BOINC message list saying WHY there is an "error in computing"? There is a reason the computation fails, GPUGRID is able to detect it, and it just says "Error in computing"... It would be handy if it gave the real reason for the failure instead of a phrase that means nothing to anyone. "Workunit Corrupt", "NVIDIA Driver incompatible" or another such message would be at least a little handy. |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
Barts, more error info would probably help the scientists too. GPUGrid has to use NVidia drivers, CUDA from NVidia, and Boinc. If there is a problem with the drivers, a CUDA bug or an issue with Boinc, it makes things difficult to trace and fix. Differences in card designs also make it more difficult: one GTX275 will work fine, but another fails tasks, and the only difference seems to be the amount of RAM on the card. Under Win7 my Palit GTX260-216 worked, then started to fail more and more task types (no matter which driver I used); possibly a CUDA bug. When I installed XP it worked fine again, and when I installed Linux it ran equally well. You could dual boot the system with Linux; all you need is a Linux CD and some space on your existing drive or a USB stick. I would first try the latest Boinc Beta version along with the latest drivers; the Boinc Beta says it fixed a CUDA leak, so it might help. |
|
Joined: 28 Aug 09 Posts: 12 Credit: 4,537,060 RAC: 0
|
I know all about dual booting, but it would be no more than a test, adding another 'PC' to my account with yet another starting date etc. I will give the beta BOINC a try... Meanwhile I will leave my OS as it is; my PC is not dedicated to GPUGRID only, I use it for other things too. |
|
Joined: 28 Aug 09 Posts: 12 Credit: 4,537,060 RAC: 0
|
The beta does not work either.
Milkyway = Running correctly - no failures
Collatz Conjecture = Running correctly - no failures
GPUGRID = Failing 85% of the WUs
For me 1+1=2... there must be something wrong in GPUGRID.
Back to the latest release version of BOINC. The beta does not show the message tab anymore, which makes it even harder to find out whether there are failures or not.
Does anyone have options for getting GPUGRID running again as it was before May 2010? (Before that time it was running correctly!! And no, it did not start failing because of changes in the PC's OS, software or drivers, as it started failing during a holiday!!) |
|
Joined: 4 Apr 09 Posts: 450 Credit: 539,316,349 RAC: 0
|
> The beta does not work either.

I can understand your frustration, but if you take a look through the "Top Hosts" listing you can find lots of 275 cards that are returning error-free results. Not only that, but the very WUs that are erroring on your machine are completing successfully on others. Maybe your card is starting to go bad? Milkyway and Collatz do not exercise your card as much as GPUGrid, so I don't think they are good bellwethers for determining a card's functionality/stability. Have you tried running any of the standard GPU benchmark programs lately? Furmark, OCCT, etc. Thanks - Steve |
|
Joined: 23 Nov 09 Posts: 29 Credit: 17,591,899 RAC: 0
|
In case anyone is tracking broken workunits, task ID 2778863, a TONI_CAPBIND, threw an unhandled exception after 1.01 sec. I see it also crashed on (all 5) other hosts. The stderr looks very complete, including runtime debugger output. This is the first WU crash I've had since upgrading to a GTX 465SC and figuring out what overclock was tolerable. The computer ID is 57387.
|
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
barts, the only way you are going to know for sure if your card is stuffed is to try it on Linux or XP running GPUGrid tasks; a 7 min task elsewhere will not tell you much.
jjwhalen, 6 failures now, so it is a bad task/bug. The work unit's errors field reads: Too many errors (may have bug) |
KPX Joined: 29 Sep 09 Posts: 5 Credit: 116,222,589 RAC: 0
|
I have this "Error while computing" problem as well. In my case, it seems GPUGrid is not detecting my graphics card... I thought installing the latest nVidia driver would fix this, but it didn't. Any idea what's wrong? I am posting the failed WU details, and the computer details below that:
-------------------------------------------------------------------------------
Name: h232f99r168-TONI_CAPBINDsp2-72-100-RND1083_0
Workunit: 1789399
Created: 11 Aug 2010 5:21:12 UTC
Sent: 11 Aug 2010 5:47:17 UTC
Received: 11 Aug 2010 5:48:51 UTC
Server state: Over
Outcome: Client error
Client state: Compute error
Exit status: -40 (0xffffffffffffffd8)
Computer ID: 71984
Report deadline: 16 Aug 2010 5:47:17 UTC
Run time: 0
CPU time: 0
stderr out:
<core_client_version>6.10.57</core_client_version>
<![CDATA[
<message>
- exit code -40 (0xffffffd8)
</message>
<stderr_txt>
# Using device 0
# There is no device supporting CUDA.
# Device 0: "Device Emulation (CPU)"
# Clock rate: 1.35 GHz
# Total amount of global memory: -1 bytes
# Number of multiprocessors: 16
# Number of cores: 128
SWAN: FATAL : No device found
</stderr_txt>
]]>
Validate state: Invalid
Claimed credit: 0
Granted credit: 0
Application version: ACEMD2: GPU molecular dynamics v6.05 (cuda)
-------------------------------------------------------------------------------
CPU type: GenuineIntel Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz [Family 6 Model 23 Stepping 10]
Number of processors: 4
Coprocessors: NVIDIA GeForce GT 240 (474MB) driver: 25896
Operating System: Microsoft Windows 7 Ultimate x64 Edition, (06.01.7600.00)
BOINC client version: 6.10.57
Memory: 4095.12 MB
Cache: 6144 KB
Swap space: 8188.38 MB
Total disk space: 149.05 GB
Free Disk Space: 101.51 GB
Measured floating point speed: 2849.9 million ops/sec
Measured integer speed: 8782.37 million ops/sec
Average upload rate: 32.48 KB/sec
Average download rate: 300.82 KB/sec
Average turnaround time: 0.97 days
Maximum daily WU quota per CPU: 1/day
Tasks: 33
Number of times client has contacted server: 286
Last time contacted server: 11 Aug 2010 5:48:51 UTC
% of time BOINC client is running: 99.9352 %
While BOINC running, % of time host has an Internet connection: 100 %
While BOINC running, % of time work is allowed: 99.9917 %
Task duration correction factor: 2.510605 |
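Two details in the report above are easy to check mechanically. The exit status -40 is shown as both 0xffffffd8 and 0xffffffffffffffd8, which are simply the 32-bit and 64-bit two's-complement views of the same number, and the stderr text carries the lines that point at a driver problem. The sketch below assumes nothing about BOINC itself; `as_unsigned` and `scan_stderr` are hypothetical helper names, and the expected core count must be supplied by the reader (the next reply gives 96 for a GT 240).

```python
import re
from typing import List, Optional


def as_unsigned(code: int, bits: int = 32) -> int:
    """Two's-complement view of a (possibly negative) exit code."""
    return code & ((1 << bits) - 1)


def scan_stderr(stderr_text: str, expected_cores: Optional[int] = None) -> List[str]:
    """Collect warning signs found in a task's stderr text."""
    findings = []
    if "There is no device supporting CUDA" in stderr_text:
        findings.append("no CUDA device reported")
    if "Device Emulation (CPU)" in stderr_text:
        findings.append("device 0 is the CPU emulation device")
    match = re.search(r"Number of cores:\s*(\d+)", stderr_text)
    if match and expected_cores is not None:
        reported = int(match.group(1))
        if reported != expected_cores:
            findings.append(
                f"reported {reported} cores, expected {expected_cores}"
            )
    return findings


if __name__ == "__main__":
    print(hex(as_unsigned(-40, 32)))
    print(hex(as_unsigned(-40, 64)))
    sample = (
        "# There is no device supporting CUDA.\n"
        '# Device 0: "Device Emulation (CPU)"\n'
        "# Number of cores: 128\n"
        "SWAN: FATAL : No device found\n"
    )
    for finding in scan_stderr(sample, expected_cores=96):
        print(finding)
```

Matching on literal message text is brittle across application versions, so a scan like this is only a convenience for reading many failed results, not a diagnosis.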
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
Your GT240 has 96 shaders, not 128, so the installed driver needs to be uninstalled. Then restart in Safe Mode and install the correct driver. After that, restart again. Update Boinc while you are at it. |
KPX Joined: 29 Sep 09 Posts: 5 Credit: 116,222,589 RAC: 0
|
You are right, the number of shaders is detected incorrectly. But what do you mean by correct driver? I have installed the latest one from the nvidia website... why is that not correct? |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
I see you have not updated Boinc yet and still have 112 shaders. Uninstall Boinc, restart, uninstall the present (probably corrupt) driver, then restart into Safe Mode. Install the latest (25896) driver. Restart, install Boinc and restart again before trying any tasks. |
|
Joined: 9 Mar 09 Posts: 1 Credit: 42,239 RAC: 0
|
I'm getting computational errors now as well on my Win7 64-bit machine; I believe this just started. I'll let the project run a few more days, and if it continues I'll just drop the project. It's not worth the hassle for me to troubleshoot this, since these are home computers that I set up to run projects while not in use. If you want additional information on the error, I'd be happy to post what I get. Regards. |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
You have a G210M graphics card. With only 16 shaders this card is not up to running GPUGRID tasks; even if it did not crash tasks, it would probably take 4 days to complete one. You should stop trying to use it with GPUGRID, as all your tasks are failing and the card is too slow to complete them in a reasonable time. It may be of some use to other GPU projects (SETI, Einstein, Folding@home, Collatz) but not all; it will not work on MilkyWay. |
robertmiles Joined: 16 Apr 09 Posts: 503 Credit: 769,991,668 RAC: 0
|
One idea on a possible cause for the errors: on my computer, they appear to happen only if all three of the following programs are running at once:
1. A GPUGRID workunit.
2. Norton Internet Security 2010, in full scan mode, especially if manually started in this mode. BOINC directories excluded from scanning.
3. Windows Live Mail version 2009 (Build 14.0.8117.0416) - the current version for 64-bit Vista; in newsgroups mode.
When the error occurs, many flashing dots appear on the screen - too many to read the screen well; and the GPUGRID workunit tries to restart but eventually fails.
How close is this combination to what others are running when they see failures?
9/21/2010 3:06:14 PM Starting BOINC client version 6.10.56 for windows_x86_64
9/21/2010 3:06:14 PM log flags: file_xfer, sched_ops, task
9/21/2010 3:06:14 PM Libraries: libcurl/7.19.7 OpenSSL/0.9.8l zlib/1.2.3
9/21/2010 3:06:14 PM Data directory: C:\ProgramData\BOINC
9/21/2010 3:06:14 PM Running under account Bobby
9/21/2010 3:06:16 PM Processor: 4 GenuineIntel Intel(R) Core(TM)2 Quad CPU Q9650 @ 3.00GHz [Family 6 Model 23 Stepping 10]
9/21/2010 3:06:16 PM Processor: 6.00 MB cache
9/21/2010 3:06:16 PM Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 syscall nx lm vmx smx tm2 pbe
9/21/2010 3:06:16 PM OS: Microsoft Windows Vista: Home Premium x64 Edition, Service Pack 2, (06.00.6002.00)
9/21/2010 3:06:16 PM Memory: 8.00 GB physical, 16.11 GB virtual
9/21/2010 3:06:16 PM Disk: 919.67 GB total, 723.13 GB free
9/21/2010 3:06:16 PM Local time is UTC -5 hours
9/21/2010 3:06:42 PM NVIDIA GPU 0: GeForce 9800 GT (driver version 19621, CUDA version 3000, compute capability 1.1, 1024MB, 336 GFLOPS peak)
9/21/2010 3:06:43 PM GPUGRID URL http://www.gpugrid.net/; Computer ID 48221; resource share 35
About a dozen other BOINC projects, but all other GPU-using projects disabled when the errors occurred. |
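When comparing setups across posts like the one above, the GPU line of the startup log holds most of what matters (card, driver, CUDA version, compute capability, memory). A minimal sketch for pulling those fields out follows; the pattern is written against the single log line quoted in this post, so the exact layout for other BOINC client versions is an assumption, and `parse_gpu_line` is a hypothetical name.

```python
import re
from typing import Dict, Optional

# Pattern modelled on the one GPU line quoted in the post; other client
# versions may format this line differently (assumption).
GPU_LINE = re.compile(
    r"NVIDIA GPU (?P<index>\d+): (?P<name>.+?) "
    r"\(driver version (?P<driver>\d+), CUDA version (?P<cuda>\d+), "
    r"compute capability (?P<compute_capability>[\d.]+), "
    r"(?P<memory_mb>\d+)MB, (?P<gflops_peak>\d+) GFLOPS peak\)"
)


def parse_gpu_line(line: str) -> Optional[Dict[str, str]]:
    """Return the GPU fields from a startup log line, or None if absent."""
    match = GPU_LINE.search(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    log_line = (
        "9/21/2010 3:06:42 PM NVIDIA GPU 0: GeForce 9800 GT "
        "(driver version 19621, CUDA version 3000, "
        "compute capability 1.1, 1024MB, 336 GFLOPS peak)"
    )
    print(parse_gpu_line(log_line))
```

All fields are returned as strings, which keeps values such as the compute capability "1.1" exactly as the client logged them.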
|
Joined: 19 Aug 07 Posts: 46 Credit: 45,339,082 RAC: 34
|
I had a task p35-IBUCH_1_TRYP_101025-3-4-RND1655_0 fail after 4.38 hours with the following errors:
MDIO ERROR: cannot open file "restart.coor"
ERROR: file tclutil.cpp line 31: get_Dvec() element 0 (b)
called boinc_finish
I'm running Win7 64 bit, Boinc 6.10.58, with a GTX 470 and driver 260.89.
Link to result 3205760
Exit status 98 (0x62) |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
Update to 26099 from 26089 - a different issue, but you should still do it. I don't know the reason for this specific IBUCH error; only one of the scientists could tell you (unless it is a driver issue). You might want to read this thread: http://www.gpugrid.net/forum_thread.php?id=2123 GPU crunching is folly at times; better luck with your next task. |
©2025 Universitat Pompeu Fabra