Message boards : Graphics cards (GPUs) : GA Errors

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 15912 - Posted: 22 Mar 2010 | 14:13:10 UTC

Had a rare Invalid result today for task f1308-TONI_GA6-0-1-RND2170. So here are some details just in case there is an issue:

Sent: 21 Mar 2010 17:31:52 UTC
Reported: 22 Mar 2010 11:09:14 UTC
Status: Error while computing
Run time (sec): 35,624.88
CPU time (sec): 4,521.77
Claimed credit: 5,830.52
Granted credit: ---
Application: ACEMD - GPU molecular dynamics v6.03 (cuda)

The task may have errored around the time it finished, or was due to finish.

The WU has been sent out 5 times so far, with 3 errors.

Name: f1308-TONI_GA6-0-1-RND2170_2
Workunit: 1272759
Created: 21 Mar 2010 17:20:06 UTC
Sent: 21 Mar 2010 17:31:52 UTC
Received: 22 Mar 2010 11:09:14 UTC
Server state: Over
Outcome: Client error
Client state: Compute error
Exit status: 1 (0x1)
Computer ID: 55951
Report deadline: 26 Mar 2010 17:31:52 UTC
Run time: 35624.884787
CPU time: 4521.767
stderr out:

<core_client_version>6.10.18</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# There is 1 device supporting CUDA
# Device 0: "GeForce GT 240"
# Clock rate: 1.49 GHz
# Total amount of global memory: 536870912 bytes
# Number of multiprocessors: 12
# Number of cores: 96
MDIO ERROR: cannot open file "restart.coor"
# There is 1 device supporting CUDA
# Device 0: "GeForce GT 240"
# Clock rate: 1.49 GHz
# Total amount of global memory: 536870912 bytes
# Number of multiprocessors: 12
# Number of cores: 96
# There is 1 device supporting CUDA
# Device 0: "GeForce GT 240"
# Clock rate: 1.49 GHz
# Total amount of global memory: 536870912 bytes
# Number of multiprocessors: 12
# Number of cores: 96
# There is 1 device supporting CUDA
# Device 0: "GeForce GT 240"
# Clock rate: 1.49 GHz
# Total amount of global memory: 536870912 bytes
# Number of multiprocessors: 12
# Number of cores: 96
# There is 1 device supporting CUDA
# Device 0: "GeForce GT 240"
# Clock rate: 1.49 GHz
# Total amount of global memory: 536870912 bytes
# Number of multiprocessors: 12
# Number of cores: 96

</stderr_txt>
]]>
Validate state: Invalid
Claimed credit: 5830.51736111111
Granted credit: 0
Application version: ACEMD - GPU molecular dynamics v6.03 (cuda)
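
For anyone comparing several failed results, here is a minimal sketch of how a stderr dump like the one above could be summarized. It is not part of the project's tooling; the function name `summarize_stderr` is made up for illustration, and treating each "There is 1 device supporting CUDA" banner as one start of the application is an assumption, not something the log states.

```python
import re

def summarize_stderr(stderr_text):
    """Return (banner_count, error_lines) for a stderr dump (hypothetical helper)."""
    lines = [ln.strip() for ln in stderr_text.splitlines()]
    # Assumption: each device banner marks one start (or restart) of the app.
    banner_count = sum(
        1 for ln in lines if re.match(r"#\s*There is \d+ device", ln)
    )
    # Collect every line containing the word ERROR, e.g. the MDIO message.
    error_lines = [ln for ln in lines if "ERROR" in ln]
    return banner_count, error_lines

# Shortened sample in the same shape as the log above.
sample = """\
# There is 1 device supporting CUDA
# Device 0: "GeForce GT 240"
MDIO ERROR: cannot open file "restart.coor"
# There is 1 device supporting CUDA
# Device 0: "GeForce GT 240"
"""

starts, errors = summarize_stderr(sample)
print(starts, errors)
```

Run against the full stderr above, this would count five banners and one MDIO ERROR line, which makes it easier to compare with other hosts' results for the same workunit.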


Richard Haselgrove
Joined: 11 Jul 09
Posts: 1626
Credit: 9,384,566,723
RAC: 19,075,423
Level
Tyr
Message 15916 - Posted: 22 Mar 2010 | 15:23:30 UTC
Last modified: 22 Mar 2010 | 15:25:52 UTC

I lost f1301-TONI_GA4-0-1-RND9765_1 a couple of days ago - about two-thirds of the way through.

The same host completed these tasks successfully before and after the failure:
f1285-TONI_GA4-0-1-RND6458_0
f527-TONI_GA5-0-1-RND8726_2
f838-TONI_GA6-0-1-RND0092_0

Snow Crash
Joined: 4 Apr 09
Posts: 450
Credit: 539,316,349
RAC: 0
Level
Lys
Message 15922 - Posted: 22 Mar 2010 | 17:54:48 UTC - in response to Message 15916.

Lots of TONI tasks seem to be having trouble ...
I got an error on a normally stable machine:
http://www.gpugrid.net/workunit.php?wuid=1272937
____________
Thanks - Steve

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Level
Ser
Message 15934 - Posted: 23 Mar 2010 | 10:45:37 UTC - in response to Message 15922.
Last modified: 23 Mar 2010 | 10:46:19 UTC

These are all "GA" workunits. I'll look at the errors and rename the thread.
