Error while computing

Message boards : Number crunching : Error while computing

slicedbread
Joined: 23 Jul 09
Posts: 2
Credit: 332,582
RAC: 0
Message 18146 - Posted: 23 Jul 2010, 15:18:51 UTC - in response to Message 17849.  
Last modified: 23 Jul 2010, 15:31:12 UTC

Try turning off TDR (Timeout Detection and Recovery).

http://www.microsoft.com/whdc/device/display/wddm_timeout.mspx


1. Make a text file, call it update.reg, and make sure it does not keep a hidden .txt extension.
2. Edit it and add these lines:

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers]
"TdrLevel"=dword:00000000

3. Run update.reg and select Yes when asked to update the registry.
4. Restart.
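If Notepad keeps saving the file as update.reg.txt, the same file can be generated by a script instead. This is a minimal sketch, assuming Python 3 and that regedit accepts a UTF-16 little-endian file with a byte-order mark and the "Windows Registry Editor Version 5.00" header (the format regedit itself uses when exporting). The helper names `build_reg` and `write_reg` are illustrative. A TdrLevel of 0 turns timeout detection off.

```python
# Minimal sketch: write the update.reg file described in the steps above.
# Assumption: regedit accepts a UTF-16 LE file with a BOM, the format
# regedit itself uses when exporting keys.
import codecs
from pathlib import Path

# Registry key from the post above.
KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers"


def build_reg(tdr_level: int = 0) -> str:
    """Return the .reg text; TdrLevel 0 disables timeout detection."""
    lines = [
        "Windows Registry Editor Version 5.00",  # required first line
        "",
        f"[{KEY}]",
        f'"TdrLevel"=dword:{tdr_level:08x}',  # DWORD written as 8 hex digits
        "",
    ]
    return "\r\n".join(lines)  # .reg files use Windows line endings


def write_reg(path: Path, tdr_level: int = 0) -> Path:
    """Write raw bytes so no text-mode newline translation can occur."""
    data = codecs.BOM_UTF16_LE + build_reg(tdr_level).encode("utf-16-le")
    path.write_bytes(data)
    return path
```

Usage: `write_reg(Path("update.reg"))`, then run the file and restart as in steps 3 and 4. Writing bytes directly sidesteps both the hidden-extension problem and any newline translation.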

Snow Crash
Joined: 4 Apr 09
Posts: 450
Credit: 539,316,349
RAC: 0
Message 18147 - Posted: 23 Jul 2010, 16:58:21 UTC - in response to Message 18146.  

Makes for interesting reading... Even though it says specifically to use these registry keys only for testing, I wonder whether your suggestion of disabling detection and recovery would actually improve performance, since (hopefully) the OS would no longer spend as many cycles watching what the GPU is doing.

slicedbread, have you tried this yourself?
Thanks - Steve

slicedbread
Joined: 23 Jul 09
Posts: 2
Credit: 332,582
RAC: 0
Message 18150 - Posted: 23 Jul 2010, 19:36:57 UTC - in response to Message 18147.  

Yes, I've tried this because I had errors. It works on Windows 7.

Not sure if this will give you a performance boost. :/

bigtuna
Volunteer moderator
Joined: 6 May 10
Posts: 80
Credit: 98,784,188
RAC: 0
Message 18153 - Posted: 24 Jul 2010, 9:13:46 UTC - in response to Message 17849.  

Quote: "If it starts running unstable while the PC is untouched (I was on holiday when this started to happen), then it can only be something in GPUGRID causing this. 'Error while computing' as an error message does not give me any information, so maybe a GPUGRID member can investigate the real reason why the WUs have an error. If it is in my system, I know what I can fix; if it is in GPUGRID, they can fix it."

This has effectively already been done. When a work unit fails, an identical task is automatically reissued to a different computer. Comparing your results with the results of others is an excellent troubleshooting technique. If a work unit fails on your system and also fails on other systems, the work unit is most likely "bad". OTOH, if a work unit fails on your system but other volunteers complete it without errors, the problem is most likely your system.
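The comparison described above can be written down as a small decision rule. This is a hypothetical sketch, not anything GPUGRID or BOINC actually runs: `Replica` and `triage` are illustrative names, and the outcomes would be read by hand from the work unit's page. Mixed results (some hosts finish, some fail) are reported as inconclusive, since the reasoning above only covers the two clear cases.

```python
# Hypothetical triage rule: compare your result for a work unit with the
# results of the replica tasks that were reissued to other hosts.
from dataclasses import dataclass
from typing import List


@dataclass
class Replica:
    host_id: int
    succeeded: bool


def triage(mine: Replica, others: List[Replica]) -> str:
    """Heuristic guess at where a failure most likely lies."""
    if mine.succeeded:
        return "no failure on this host"
    if not others:
        return "inconclusive: no other results yet"
    if all(not r.succeeded for r in others):
        return "likely a bad work unit"  # it fails everywhere
    if all(r.succeeded for r in others):
        return "likely your system"  # others completed it without errors
    return "inconclusive: mixed results"
```

For example, jjwhalen's TONI_CAPBIND task later in this thread, which crashed on all five other hosts, would come out as "likely a bad work unit".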

Quote: "I don't see the point of running another OS especially for GPUGRID. Many other projects (e.g. MilkyWay) like my GPU too...."

The point of running a different OS is to differentiate between hardware and software issues. That, and FatDog-64 is totally cool and easy (including the NVIDIA drivers; they install with a single click). If your system works perfectly with one OS and less than perfectly with a different OS, it is likely that there is some sort of software issue.

barts
Joined: 28 Aug 09
Posts: 12
Credit: 4,537,060
RAC: 0
Message 18220 - Posted: 1 Aug 2010, 13:41:47 UTC - in response to Message 18153.  

So you're asking me to throw away my current OS, along with my current programs, solely for GPUGRID's sake. Too bad most of the programs I use are not available for Linux.

OTOH, the system has been running without problems from the beginning. No hardware change AND no software change was made, yet only (and solely) GPUGRID started to run unstable. It is a pity that problems are pinned on the (volunteering) users. For the next batch of GPU tasks, could you print a message in the BOINC message list saying WHY there is an "Error while computing"?

There is a reason the computation fails, and GPUGRID is able to detect it, yet it just says "Error while computing"... It would be handy if it gave the real reason for the failure instead of a phrase that means nothing to anyone.

"Workunit Corrupt", "NVIDIA Driver incompatible" or some other such message would at least be a little helpful.
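As a rough illustration of what such a message could look like, the hypothetical sketch below maps stderr signatures quoted in failed tasks elsewhere in this thread to a readable reason. The signature table and the `explain` function are made up for illustration; this is not how GPUGRID's application or BOINC actually reports errors.

```python
# Hypothetical sketch: map known stderr signatures to a readable reason.
# The signatures are copied from failed tasks quoted in this thread; the
# table is illustrative and is not GPUGRID's real error handling.
SIGNATURES = [
    ("There is no device supporting CUDA", "No usable CUDA device (check the NVIDIA driver)"),
    ("SWAN: FATAL : No device found", "No usable CUDA device (check the NVIDIA driver)"),
    ('MDIO ERROR: cannot open file "restart.coor"', "Restart file missing or unreadable"),
]


def explain(stderr_txt: str) -> str:
    """Return the reason for the first matching signature, if any."""
    for needle, reason in SIGNATURES:
        if needle in stderr_txt:
            return reason
    return "Error while computing (no known signature)"
```

A real implementation would need the project's own list of failure modes; the point is only that a known signature can be turned into a sentence a volunteer can act on.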

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 18229 - Posted: 2 Aug 2010, 10:16:46 UTC - in response to Message 18220.  

Barts, more error info would probably help the scientists too.

GPUGrid has to rely on NVIDIA drivers, NVIDIA's CUDA and BOINC. If there is a problem with the drivers, a CUDA bug or an issue with BOINC, it is difficult to trace and fix.

Differences in card designs also make it more difficult: one GTX275 will work fine while another fails tasks, and the only difference seems to be the amount of RAM on the card. Under Win7 my Palit GTX260-216 worked, then started to fail more and more task types (no matter which driver I used), possibly because of a CUDA bug. When I installed XP it worked fine again, and when I installed Linux it ran equally well.

You could dual boot the system with Linux; all you need is a Linux CD and some space on your existing drive or a USB stick.

I would first try the latest BOINC beta version along with the latest drivers; the BOINC beta says it fixes a CUDA leak, so it might help.

barts
Joined: 28 Aug 09
Posts: 12
Credit: 4,537,060
RAC: 0
Message 18238 - Posted: 3 Aug 2010, 13:25:51 UTC - in response to Message 18229.  

I know all about dual booting, but it would be no more than a test, adding another 'PC' to my account with yet another start date, etc.

I will give the BOINC beta a try... Meanwhile I will leave my OS as it is; my PC is not dedicated to GPUGRID only, I use it for other things too.

barts
Joined: 28 Aug 09
Posts: 12
Credit: 4,537,060
RAC: 0
Message 18250 - Posted: 5 Aug 2010, 18:30:09 UTC - in response to Message 18238.  

The beta does not work either.

MilkyWay = running correctly, no failures
Collatz Conjecture = running correctly, no failures
GPUGRID = failing 85% of the WUs

For me 1+1=2... there must be something wrong in GPUGRID.

I went back to the latest release version of BOINC. The beta no longer shows the Messages tab, which makes it even harder to find out whether there are failures or not.

Does anyone have ideas for getting GPUGRID running again the way it ran before May 2010? (Before then it was running correctly!! And no, it did not start failing because of changes to the PC's OS, software or drivers, as it started failing during a holiday!!)

Snow Crash
Joined: 4 Apr 09
Posts: 450
Credit: 539,316,349
RAC: 0
Message 18251 - Posted: 5 Aug 2010, 21:14:29 UTC - in response to Message 18250.  

I can understand your frustration, but if you look through the "Top Hosts" listing you can find lots of GTX 275 cards that are returning results error-free.

Not only that, but the very WUs that are erroring on your machine are completing successfully on others.

Maybe your card is starting to go bad? MilkyWay and Collatz do not exercise your card as much as GPUGrid, so I don't think they are good bellwethers for determining a card's functionality/stability.

Have you tried running any of the standard GPU benchmark programs lately? FurMark, OCCT, etc.
Thanks - Steve

jjwhalen
Joined: 23 Nov 09
Posts: 29
Credit: 17,591,899
RAC: 0
Message 18252 - Posted: 5 Aug 2010, 21:15:36 UTC

In case anyone is tracking broken workunits: taskID 2778863, a TONI_CAPBIND, threw an unhandled exception after 1.01 sec. I see it also crashed on all 5 other hosts. The stderr looks very complete, including runtime debugger output.

This is the first WU crash I've had since upgrading to a GTX 465SC and figuring out what overclock was tolerable. The computer ID is 57387.

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 18256 - Posted: 5 Aug 2010, 23:46:43 UTC - in response to Message 18252.  
Last modified: 5 Aug 2010, 23:51:52 UTC

barts, the only way you are going to know for sure whether your card is stuffed is to try it on Linux or XP running GPUGrid tasks; a 7-minute task elsewhere will not tell you much.

jjwhalen,
6 failures now, so it is a bad task/bug:

errors: Too many errors (may have bug)

KPX
Joined: 29 Sep 09
Posts: 5
Credit: 116,222,589
RAC: 0
Message 18322 - Posted: 11 Aug 2010, 15:45:16 UTC

I have this "Error while computing" problem as well. In my case, it seems GPUGrid is not detecting my graphics card... I thought installing the latest NVIDIA driver would fix this, but it didn't. Any idea what's wrong? Below are the failed WU details, followed by the computer details:
-------------------------------------------------------------------------------
Name h232f99r168-TONI_CAPBINDsp2-72-100-RND1083_0
Workunit 1789399
Created 11 Aug 2010 5:21:12 UTC
Sent 11 Aug 2010 5:47:17 UTC
Received 11 Aug 2010 5:48:51 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status -40 (0xffffffffffffffd8)
Computer ID 71984
Report deadline 16 Aug 2010 5:47:17 UTC
Run time 0
CPU time 0
stderr out

<core_client_version>6.10.57</core_client_version>
<![CDATA[
<message>
- exit code -40 (0xffffffd8)
</message>
<stderr_txt>
# Using device 0
# There is no device supporting CUDA.
# Device 0: "Device Emulation (CPU)"
# Clock rate: 1.35 GHz
# Total amount of global memory: -1 bytes
# Number of multiprocessors: 16
# Number of cores: 128
SWAN: FATAL : No device found

</stderr_txt>
]]>

Validate state Invalid
Claimed credit 0
Granted credit 0
application version ACEMD2: GPU molecular dynamics v6.05 (cuda)

-------------------------------------------------------------------------------
CPU type GenuineIntel Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz [Family 6 Model 23 Stepping 10]
Number of processors 4
Coprocessors NVIDIA GeForce GT 240 (474MB) driver: 25896
Operating System Microsoft Windows 7 Ultimate x64 Edition, (06.01.7600.00)
BOINC client version 6.10.57
Memory 4095.12 MB
Cache 6144 KB
Swap space 8188.38 MB
Total disk space 149.05 GB
Free Disk Space 101.51 GB
Measured floating point speed 2849.9 million ops/sec
Measured integer speed 8782.37 million ops/sec
Average upload rate 32.48 KB/sec
Average download rate 300.82 KB/sec
Average turnaround time 0.97 days
Maximum daily WU quota per CPU 1/day
Tasks 33
Number of times client has contacted server 286
Last time contacted server 11 Aug 2010 5:48:51 UTC
% of time BOINC client is running 99.9352 %
While BOINC running, % of time host has an Internet connection 100 %
While BOINC running, % of time work is allowed 99.9917 %
Task duration correction factor 2.510605
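The "Exit status" line shows the same signed code in two notations: -40 and its two's-complement bit pattern. A short worked example, assuming only what the output above shows: the task page prints the 64-bit pattern and the stderr `<message>` block prints the 32-bit one. `as_hex` is an illustrative helper.

```python
# Worked example: a negative exit code printed as an unsigned hex pattern.
def as_hex(code: int, bits: int) -> str:
    """Two's-complement bit pattern of a signed exit code, in hex."""
    return hex(code & ((1 << bits) - 1))


print(as_hex(-40, 64))  # 0xffffffffffffffd8, as on the "Exit status" line
print(as_hex(-40, 32))  # 0xffffffd8, as in the stderr <message> block
print(as_hex(98, 32))   # 0x62, a positive code is unchanged
```

So the two hex values are not different errors; both are -40 written at different widths.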

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 18323 - Posted: 11 Aug 2010, 16:24:11 UTC - in response to Message 18322.  
Last modified: 11 Aug 2010, 16:27:48 UTC

Your GT240 has 96 shaders, not 128, so the installed driver needs to be uninstalled.
Then restart in Safe Mode and install the correct driver.
After that restart again.

-Update Boinc while you are at it.
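The shader count follows from the multiprocessor count. On compute capability 1.x cards such as the GT 240, each multiprocessor has 8 CUDA cores (32 on 2.0 cards such as the GTX 470, 48 on 2.1), so the 16 multiprocessors reported by the emulated device give 128, while a real GT 240 has 12 × 8 = 96. The table below only covers these generations, and `shader_count` is an illustrative helper.

```python
# Shaders (CUDA cores) = multiprocessors x cores per multiprocessor.
# Cores per multiprocessor for the compute capabilities seen in this thread.
CORES_PER_SM = {
    (1, 0): 8, (1, 1): 8, (1, 2): 8, (1, 3): 8,
    (2, 0): 32, (2, 1): 48,
}


def shader_count(major: int, minor: int, multiprocessors: int) -> int:
    """Total shaders for a card of the given compute capability."""
    try:
        per_sm = CORES_PER_SM[(major, minor)]
    except KeyError:
        raise ValueError(f"compute capability {major}.{minor} not in table") from None
    return per_sm * multiprocessors


print(shader_count(1, 2, 12))  # 96: GT 240
print(shader_count(1, 2, 16))  # 128: 16 multiprocessors at 8 cores each
print(shader_count(1, 1, 14))  # 112
print(shader_count(2, 0, 14))  # 448: GTX 470
```

A reported count that does not match the card's known multiprocessor count is a hint that the driver, not the card, is misreporting the device.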

KPX
Joined: 29 Sep 09
Posts: 5
Credit: 116,222,589
RAC: 0
Message 18334 - Posted: 13 Aug 2010, 0:40:28 UTC - in response to Message 18323.  

You are right, the number of shaders is detected incorrectly. But what do you mean by the correct driver? I have installed the latest one from the NVIDIA website... why is that not correct?

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 18337 - Posted: 13 Aug 2010, 9:41:45 UTC - in response to Message 18334.  
Last modified: 13 Aug 2010, 9:46:03 UTC

I see you have not updated Boinc yet and still have 112 shaders.

1. Uninstall BOINC and restart.
2. Uninstall the present (probably corrupt) driver and restart into Safe Mode.
3. Install the latest driver (25896, i.e. 258.96).
4. Restart, install BOINC and restart again before trying any tasks.

Terry
Joined: 9 Mar 09
Posts: 1
Credit: 42,239
RAC: 0
Message 18384 - Posted: 22 Aug 2010, 4:54:46 UTC - in response to Message 18337.  

I'm now getting computation errors as well on my Win7 64-bit machine; I believe this just started. I'll let the project run a few more days, and if it continues I'll just drop the project. It's not worth the hassle for me to troubleshoot this, since these are home computers that I set up to run projects while not in use.

If you want additional information about the errors, I'd be happy to post what I get.

Regards.

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 18385 - Posted: 22 Aug 2010, 15:58:40 UTC - in response to Message 18384.  
Last modified: 22 Aug 2010, 16:21:36 UTC

You have a G210M graphics card.
With only 16 shaders this card is not up to running GPUGRID tasks; even if it did not crash tasks, it would probably take 4 days to complete one.
You should stop trying to use it with GPUGRID, as all your tasks are failing and the card is too slow to complete them in a reasonable time.
It may be of some use to other GPU projects (SETI, Einstein, Folding@home, Collatz), but not all; it will not work on MilkyWay, which needs double-precision support that this card lacks.

robertmiles
Joined: 16 Apr 09
Posts: 503
Credit: 769,991,668
RAC: 0
Message 18757 - Posted: 23 Sep 2010, 1:37:35 UTC
Last modified: 23 Sep 2010, 1:38:56 UTC

One idea on a possible cause for the errors: on my computer, they appear to happen only when all three of the following are running at once:

- A GPUGRID workunit.
- Norton Internet Security 2010 in full scan mode, especially if manually started in this mode (BOINC directories excluded from scanning).
- Windows Live Mail 2009 (Build 14.0.8117.0416), the current version for 64-bit Vista, in newsgroups mode.

When the error occurs, many flashing dots appear on the screen, too many to read the screen well, and the GPUGRID workunit tries to restart but eventually fails.

How close is this combination to what others are running when they see failures?

9/21/2010 3:06:14 PM Starting BOINC client version 6.10.56 for windows_x86_64
9/21/2010 3:06:14 PM log flags: file_xfer, sched_ops, task
9/21/2010 3:06:14 PM Libraries: libcurl/7.19.7 OpenSSL/0.9.8l zlib/1.2.3
9/21/2010 3:06:14 PM Data directory: C:\ProgramData\BOINC
9/21/2010 3:06:14 PM Running under account Bobby
9/21/2010 3:06:16 PM Processor: 4 GenuineIntel Intel(R) Core(TM)2 Quad CPU Q9650 @ 3.00GHz [Family 6 Model 23 Stepping 10]
9/21/2010 3:06:16 PM Processor: 6.00 MB cache
9/21/2010 3:06:16 PM Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 syscall nx lm vmx smx tm2 pbe
9/21/2010 3:06:16 PM OS: Microsoft Windows Vista: Home Premium x64 Edition, Service Pack 2, (06.00.6002.00)
9/21/2010 3:06:16 PM Memory: 8.00 GB physical, 16.11 GB virtual
9/21/2010 3:06:16 PM Disk: 919.67 GB total, 723.13 GB free
9/21/2010 3:06:16 PM Local time is UTC -5 hours
9/21/2010 3:06:42 PM NVIDIA GPU 0: GeForce 9800 GT (driver version 19621, CUDA version 3000, compute capability 1.1, 1024MB, 336 GFLOPS peak)
9/21/2010 3:06:43 PM GPUGRID URL http://www.gpugrid.net/; Computer ID 48221; resource share 35
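The `NVIDIA GPU 0:` line packs several values into compact notation: the driver version without its decimal point (19621 is 196.21, just as 25896 earlier in the thread is 258.96) and the CUDA version as 1000 × major + 10 × minor (3000 is 3.0). The sketch below decodes it with a regular expression written against this one log line, so treat the pattern and the `parse_gpu_line` name as assumptions; other client versions may format the line differently.

```python
import re

# Pattern written against the BOINC 6.10 log line quoted above; other client
# versions may format this line differently (an assumption).
GPU_LINE = re.compile(
    r"NVIDIA GPU (?P<index>\d+): (?P<name>.+?) \(driver version (?P<driver>\d+), "
    r"CUDA version (?P<cuda>\d+), compute capability (?P<cc>\d+\.\d+), "
    r"(?P<mem>\d+)MB, (?P<gflops>\d+) GFLOPS peak\)"
)


def parse_gpu_line(line: str) -> dict:
    """Decode one 'NVIDIA GPU n:' line from the BOINC startup log."""
    m = GPU_LINE.search(line)
    if m is None:
        raise ValueError("not an NVIDIA GPU line")
    driver, cuda = int(m["driver"]), int(m["cuda"])
    return {
        "name": m["name"],
        "driver": f"{driver // 100}.{driver % 100:02d}",  # 19621 -> 196.21
        "cuda": f"{cuda // 1000}.{(cuda % 1000) // 10}",  # 3000 -> 3.0
        "compute_capability": m["cc"],
        "memory_mb": int(m["mem"]),
        "peak_gflops": int(m["gflops"]),
    }


log_line = ("9/21/2010 3:06:42 PM NVIDIA GPU 0: GeForce 9800 GT (driver version 19621, "
            "CUDA version 3000, compute capability 1.1, 1024MB, 336 GFLOPS peak)")
print(parse_gpu_line(log_line))
```

The 336 GFLOPS figure is consistent with 112 shaders × 1.5 GHz × 2 floating-point operations per clock, assuming the reference 9800 GT shader clock of 1.5 GHz.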

About a dozen other BOINC projects are attached, but all other GPU-using projects were disabled when the errors occurred.

Speedy
Joined: 19 Aug 07
Posts: 46
Credit: 45,339,082
RAC: 34
Message 19102 - Posted: 29 Oct 2010, 21:51:35 UTC

I had a task p35-IBUCH_1_TRYP_101025-3-4-RND1655_0 fail after 4.38 hours with the following errors:

MDIO ERROR: cannot open file "restart.coor"
ERROR: file tclutil.cpp line 31: get_Dvec() element 0 (b)
called boinc_finish

I'm running Win7 64-bit, BOINC 6.10.58 and a GTX 470 with driver 260.89. Link to result 3205760. Exit status 98 (0x62).

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 19105 - Posted: 30 Oct 2010, 0:04:16 UTC - in response to Message 19102.  

Update the driver to 260.99 (26099) from 260.89 (26089); it is a different issue, but you should still do it.

I don't know the reason for this specific IBUCH error; only one of the scientists could tell you (unless it is a driver issue).

You might want to read this thread: http://www.gpugrid.net/forum_thread.php?id=2123

GPU crunching is folly at times; better luck with your next task.

©2025 Universitat Pompeu Fabra