Workunit failures

ianmbaker · Joined: 23 Jul 08 · Posts: 2 · Credit: 1,015,635 · RAC: 0
Message 5877 - Posted: 22 Jan 2009, 19:25:27 UTC

Since my system downloaded version 6.61, all of the workunits have failed. The messages from BOINC say:

22/01/2009 18:43:19|GPUGRID|Sending scheduler request: To fetch work. Requesting 3021 seconds of work, reporting 1 completed tasks
22/01/2009 18:43:24|GPUGRID|Scheduler request completed: got 1 new tasks
22/01/2009 18:43:26|GPUGRID|Started download of kA28006-SH2_US_4-0-10-SH2_US_41140000-LICENSE
22/01/2009 18:43:26|GPUGRID|Started download of kA28006-SH2_US_4-0-10-SH2_US_41140000-COPYRIGHT
22/01/2009 18:43:27|GPUGRID|Finished download of kA28006-SH2_US_4-0-10-SH2_US_41140000-LICENSE
22/01/2009 18:43:27|GPUGRID|Finished download of kA28006-SH2_US_4-0-10-SH2_US_41140000-COPYRIGHT
22/01/2009 18:43:27|GPUGRID|Started download of kA28006-SH2_US_4-0-10-SH2_US_41140000-smd.1140000.coor
22/01/2009 18:43:27|GPUGRID|Started download of kA28006-SH2_US_4-0-10-SH2_US_41140000-smd.1140000.vel
22/01/2009 18:43:28|GPUGRID|Finished download of kA28006-SH2_US_4-0-10-SH2_US_41140000-smd.1140000.vel
22/01/2009 18:43:28|GPUGRID|Started download of kA28006-SH2_US_4-0-10-SH2_US_41140000-input.idx
22/01/2009 18:43:29|GPUGRID|Finished download of kA28006-SH2_US_4-0-10-SH2_US_41140000-input.idx
22/01/2009 18:43:29|GPUGRID|Started download of kA28006-SH2_US_4-0-10-SH2_US_41140000-complex_full.sol.ionized.pdb
22/01/2009 18:43:38|GPUGRID|Finished download of kA28006-SH2_US_4-0-10-SH2_US_41140000-smd.1140000.coor
22/01/2009 18:43:38|GPUGRID|Started download of kA28006-SH2_US_4-0-10-SH2_US_41140000-complex_full.sol.ionized.psf
22/01/2009 18:43:56|GPUGRID|Finished download of kA28006-SH2_US_4-0-10-SH2_US_41140000-complex_full.sol.ionized.pdb
22/01/2009 18:43:56|GPUGRID|Started download of kA28006-SH2_US_4-0-10-SH2_US_41140000-parameters
22/01/2009 18:43:59|GPUGRID|Finished download of kA28006-SH2_US_4-0-10-SH2_US_41140000-parameters
22/01/2009 18:43:59|GPUGRID|Started download of kA28006-SH2_US_4-0-10-SH2_US_41140000-SH2_US_41140000
22/01/2009 18:44:00|GPUGRID|Finished download of kA28006-SH2_US_4-0-10-SH2_US_41140000-SH2_US_41140000
22/01/2009 18:44:18|GPUGRID|Finished download of kA28006-SH2_US_4-0-10-SH2_US_41140000-complex_full.sol.ionized.psf
22/01/2009 18:44:20|GPUGRID|Starting kA28006-SH2_US_4-0-10-SH2_US_41140000_1
22/01/2009 18:44:20|GPUGRID|Starting task kA28006-SH2_US_4-0-10-SH2_US_41140000_1 using acemd version 661
22/01/2009 18:44:23|GPUGRID|Computation for task kA28006-SH2_US_4-0-10-SH2_US_41140000_1 finished
22/01/2009 18:44:23|GPUGRID|Output file kA28006-SH2_US_4-0-10-SH2_US_41140000_1_1 for task kA28006-SH2_US_4-0-10-SH2_US_41140000_1 absent
22/01/2009 18:44:23|GPUGRID|Output file kA28006-SH2_US_4-0-10-SH2_US_41140000_1_2 for task kA28006-SH2_US_4-0-10-SH2_US_41140000_1 absent
22/01/2009 18:44:23|GPUGRID|Output file kA28006-SH2_US_4-0-10-SH2_US_41140000_1_3 for task kA28006-SH2_US_4-0-10-SH2_US_41140000_1 absent
22/01/2009 18:44:25|GPUGRID|Started upload of kA28006-SH2_US_4-0-10-SH2_US_41140000_1_0
22/01/2009 18:44:27|GPUGRID|Finished upload of kA28006-SH2_US_4-0-10-SH2_US_41140000_1_0

There are similar messages for every workunit that downloaded and failed. There were no problems with the previous 6.55 version. Any ideas on how I can get crunching again?

Ian
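Since the same pattern repeats for every failed workunit (the task "finishes" about three seconds after it starts and its output files are reported absent), a saved copy of the BOINC message log can be scanned for those lines to see how many tasks were hit. This is a minimal sketch, assuming the log has been saved as plain text in the "date time|project|message" layout shown in the messages above; tasks_with_missing_output is a hypothetical helper, not part of BOINC.

```python
import re
from collections import defaultdict

# Matches BOINC message-log lines in the layout quoted in this thread (assumed):
#   DD/MM/YYYY HH:MM:SS|PROJECT|Output file <file> for task <task> absent
ABSENT_RE = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\|(?P<project>[^|]+)\|"
    r"Output file (?P<file>\S+) for task (?P<task>\S+) absent$"
)


def tasks_with_missing_output(log_lines):
    """Map each task name to the output files BOINC reported as absent for it."""
    missing = defaultdict(list)
    for line in log_lines:
        match = ABSENT_RE.match(line.strip())
        if match:
            missing[match.group("task")].append(match.group("file"))
    return dict(missing)


if __name__ == "__main__":
    # Lines copied from the log in the first post
    sample = [
        "22/01/2009 18:44:23|GPUGRID|Computation for task kA28006-SH2_US_4-0-10-SH2_US_41140000_1 finished",
        "22/01/2009 18:44:23|GPUGRID|Output file kA28006-SH2_US_4-0-10-SH2_US_41140000_1_1 for task kA28006-SH2_US_4-0-10-SH2_US_41140000_1 absent",
        "22/01/2009 18:44:23|GPUGRID|Output file kA28006-SH2_US_4-0-10-SH2_US_41140000_1_2 for task kA28006-SH2_US_4-0-10-SH2_US_41140000_1 absent",
        "22/01/2009 18:44:23|GPUGRID|Output file kA28006-SH2_US_4-0-10-SH2_US_41140000_1_3 for task kA28006-SH2_US_4-0-10-SH2_US_41140000_1 absent",
        "22/01/2009 18:44:25|GPUGRID|Started upload of kA28006-SH2_US_4-0-10-SH2_US_41140000_1_0",
    ]
    for task, files in tasks_with_missing_output(sample).items():
        print(f"{task}: {len(files)} output file(s) absent")
```

Matching only the "absent" lines keeps the check independent of how long the download phase took; the scheduler and transfer lines are simply skipped.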

ExtraTerrestrial Apes · Volunteer moderator · Volunteer tester · Joined: 17 Aug 08 · Posts: 2705 · Credit: 1,311,122,549 · RAC: 0
Message 5880 - Posted: 22 Jan 2009, 20:31:13 UTC - in response to Message 5877.

Strange, your error message is:

MDIO ERROR: read error for file "input.vel", byte number 4: number of atoms (1625495040) != (39910) expected
ERROR: mdioload.cu, line 146: Unable to read binvelfile

Sorry, I've never seen this message before and don't know what you could do.

MrS
Scanning for our furry friends since Jan 2002
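The wording of the error suggests that the application reads an atom count from the first 4 bytes of the binary velocity file (input.vel, presumably the downloaded smd.1140000.vel) and compares it with the 39910 atoms the rest of the input describes. A value like 1625495040 would then suggest a damaged or mismatched input file rather than a fault on the host. The following is a hypothetical Python re-creation of such a check, not ACEMD's actual code in mdioload.cu. It assumes a NAMD-style binary layout (a 4-byte little-endian integer atom count followed by three 8-byte doubles per atom), and the function name read_bin_velocities is made up for illustration.

```python
import os
import struct
import tempfile


def read_bin_velocities(path, expected_atoms):
    """Read an (assumed) NAMD-style binary velocity file and validate its header.

    Assumed layout: int32 atom count, then 3 float64 values (vx, vy, vz) per atom.
    Returns a list of (vx, vy, vz) tuples.
    """
    with open(path, "rb") as f:
        header = f.read(4)
        if len(header) != 4:
            raise ValueError(f"read error for file {path!r}: truncated header")
        (n_atoms,) = struct.unpack("<i", header)  # little-endian int32 (assumption)
        # Check the header before trusting it to size the rest of the read
        if n_atoms != expected_atoms:
            raise ValueError(
                f"read error for file {path!r}, byte number 4: "
                f"number of atoms ({n_atoms}) != ({expected_atoms}) expected"
            )
        payload = f.read(24 * n_atoms)  # 3 doubles x 8 bytes per atom
        if len(payload) != 24 * n_atoms:
            raise ValueError(f"read error for file {path!r}: file too short")
        values = struct.unpack(f"<{3 * n_atoms}d", payload)
    # Group the flat value list into one (vx, vy, vz) tuple per atom
    return [values[i:i + 3] for i in range(0, len(values), 3)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        bad = os.path.join(tmp, "input.vel")
        # Header whose first 4 bytes decode to the bogus count from the error message
        with open(bad, "wb") as f:
            f.write(struct.pack("<i", 1625495040))
        try:
            read_bin_velocities(bad, 39910)
        except ValueError as err:
            print(err)
```

Validating the count before reading the payload means a corrupt header fails fast with a clear message instead of attempting a multi-gigabyte read.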

PeteS · Joined: 1 Jan 09 · Posts: 7 · Credit: 3,602,175 · RAC: 0
Message 5882 - Posted: 22 Jan 2009, 21:06:20 UTC - in response to Message 5877.

I also had all workunits failing. Finally I was told that the daily quota of 8 WUs had been exceeded, and I was not allocated any more. You can see the failed WUs here: http://www.gpugrid.net/results.php?userid=12774

UL1 · Joined: 16 Sep 07 · Posts: 56 · Credit: 35,013,195 · RAC: 0
Message 5884 - Posted: 22 Jan 2009, 21:09:25 UTC - in response to Message 5880.

I've now got almost 15 WUs erroring out with exactly the same error message that ianmbaker got... but I am running Linux with app version 6.59... Any ideas what's going wrong here?

Donnie · Joined: 13 Nov 08 · Posts: 11 · Credit: 11,185,470 · RAC: 0
Message 5886 - Posted: 22 Jan 2009, 22:55:41 UTC

I also had 9 WUs error out. They appear to have been back to back, from 18:21:28 UTC through 18:30:37 UTC. I have 1 "good" 6.61 running at 69% completion. Same error message, as below:

MDIO ERROR: read error for file "input.vel", byte number 4: number of atoms (1078071040) != (39910) expected
ERROR: mdioload.cu, line 146: Unable to read binvelfile

Now I've reached my daily quota and will have two GTX 260 216 cards sitting idle. I suppose I could do Folding@home.

K1atOdessa · Joined: 25 Feb 08 · Posts: 249 · Credit: 444,646,963 · RAC: 0
Message 5888 - Posted: 22 Jan 2009, 23:21:05 UTC

I've errored out on a bunch of WUs after doing 1 successfully. Looks like 6.61 needs to be pulled and we should go back to 6.55.

Paul D. Buck · Joined: 9 Jun 08 · Posts: 1050 · Credit: 37,321,185 · RAC: 0
Message 5891 - Posted: 22 Jan 2009, 23:32:15 UTC

I have completed two so far, both a success ...

Michael Milan · Joined: 19 Jan 09 · Posts: 4 · Credit: 1,037,300 · RAC: 0
Message 5895 - Posted: 23 Jan 2009, 3:02:50 UTC

I have a strange workunit error: http://www.gpugrid.net/result.php?resultid=238028

Does anyone know what this means?

"Cuda error: Kernel [frc_sum_kernel_bond] failed in file 'force.cu' in line 283 : unknown error."

Paul D. Buck · Joined: 9 Jun 08 · Posts: 1050 · Credit: 37,321,185 · RAC: 0
Message 5896 - Posted: 23 Jan 2009, 3:11:53 UTC - in response to Message 5895.

> I have a strange workunit error: http://www.gpugrid.net/result.php?resultid=238028
> Does anyone know what this means?
> "Cuda error: Kernel [frc_sum_kernel_bond] failed in file 'force.cu' in line 283 : unknown error."

Bad unknown things ... Really bad, and really, really unknown things ... But we know exactly where ... :)

Sorry, I could not resist ... and it is the only sense of humor that I have ...

Scott Brown · Joined: 21 Oct 08 · Posts: 144 · Credit: 2,973,555 · RAC: 0
Message 5897 - Posted: 23 Jan 2009, 4:45:16 UTC - in response to Message 5896.

> Bad unknown things ... Really bad, and really, really unknown things ... But we know exactly where ... :)
> Sorry, I could not resist ... and it is the only sense of humor that I have ...

Well... there are different kinds of unknowns... see http://www.youtube.com/watch?v=_RpSv3HjpEw :)

X1900AIW · Joined: 12 Sep 08 · Posts: 74 · Credit: 23,566,124 · RAC: 0
Message 5898 - Posted: 23 Jan 2009, 6:11:46 UTC - in response to Message 5891.

> I have completed two so far, both a success ...

Same here: success with both WUs 236301 and 236294, no errors. Regardless, I have now suspended further downloads and am waiting for all the 6.61 results (5 of 7) in my task queue.

Paul D. Buck · Joined: 9 Jun 08 · Posts: 1050 · Credit: 37,321,185 · RAC: 0
Message 5899 - Posted: 23 Jan 2009, 7:05:50 UTC - in response to Message 5897.

> Well... there are different kinds of unknowns... see http://www.youtube.com/watch?v=_RpSv3HjpEw :)

As an engineer I lived by those rules ... and it was almost always the unknown unknowns that got me ... which is why I tried so hard to find out what they might be ...

Chris S · Joined: 18 Jan 09 · Posts: 21 · Credit: 3,950,530 · RAC: 0
Message 5905 - Posted: 23 Jan 2009, 10:34:27 UTC

I had 2 with this error, but a 6.61 is running OK now.

MDIO ERROR: read error for file "input.vel", byte number 4: number of atoms (1625495040) != (39910) expected
ERROR: mdioload.cu, line 146: Unable to read binvelfile

ianmbaker · Joined: 23 Jul 08 · Posts: 2 · Credit: 1,015,635 · RAC: 0
Message 5911 - Posted: 23 Jan 2009, 11:58:07 UTC

It looks like the problem reported in thread http://www.gpugrid.net/forum_thread.php?id=671 was the cause. The failing workunits I had were in that series. Thanks to all who responded.

Ian

Nognlite · Joined: 9 Nov 08 · Posts: 69 · Credit: 25,106,923 · RAC: 0
Message 5914 - Posted: 23 Jan 2009, 15:14:34 UTC

Same here. I've had two successful WUs in the past 4 days. The rest of the time I maxed out my daily quota, and now it has been reduced to 1/day. Now I sit idle (well, not me, I'm at work, but my computer). I have no problems with my 8800GT on Vista 64. All my issues are with 2 GTX280s on a Vista 64 machine.

Pat

X1900AIW · Joined: 12 Sep 08 · Posts: 74 · Credit: 23,566,124 · RAC: 0
Message 5921 - Posted: 23 Jan 2009, 18:51:09 UTC - in response to Message 5905.

> I had 2 with this error, but a 6.61 is running OK now.
> MDIO ERROR: read error for file "input.vel", byte number 4: number of atoms (1625495040) != (39910) expected
> ERROR: mdioload.cu, line 146: Unable to read binvelfile

Same error; this is the first time a 6.61 has crashed for me: 239257, exit status 98 (0x62)

MDIO ERROR: read error for file "input.vel", byte number 4: number of atoms (1625495040) != (39910) expected
ERROR: mdioload.cu, line 146: Unable to read binvelfile

mikaok · Joined: 16 Jan 09 · Posts: 12 · Credit: 639,094 · RAC: 0
Message 5922 - Posted: 23 Jan 2009, 19:22:54 UTC - in response to Message 5921. Last modified: 23 Jan 2009, 19:24:14 UTC

>> I had 2 with this error, but a 6.61 is running OK now.
>> MDIO ERROR: read error for file "input.vel", byte number 4: number of atoms (1625495040) != (39910) expected
>> ERROR: mdioload.cu, line 146: Unable to read binvelfile
>
> Same error; this is the first time a 6.61 has crashed for me: 239257, exit status 98 (0x62)
> MDIO ERROR: read error for file "input.vel", byte number 4: number of atoms (1625495040) != (39910) expected
> ERROR: mdioload.cu, line 146: Unable to read binvelfile

Same problem here: exit code 98 for the last 11 WUs in a row. Most of those were SH2_US_4 units, but there was also at least one SH2_US_5. Hope I don't get a penalty for these :D