Message boards : Server and website : Result no longer usable

Bedrich Hajek
Joined: 28 Mar 09
Posts: 381
Credit: 4,784,759,839
RAC: 1,004,444
Message 21267 - Posted: 30 May 2011 | 15:09:32 UTC

During a routine update request, the following happened:

5/30/2011 10:41:58 AM | GPUGRID | update requested by user
5/30/2011 10:42:01 AM | GPUGRID | Sending scheduler request: Requested by user.
5/30/2011 10:42:01 AM | GPUGRID | Not reporting or requesting tasks
5/30/2011 10:42:03 AM | GPUGRID | Scheduler request completed
5/30/2011 10:42:03 AM | GPUGRID | Result A587-TONI_AGGsoup1-7-100-RND6153_1 is no longer usable
5/30/2011 10:42:03 AM | GPUGRID | Result p17-IBUCH_7_wtEGFR_110419-18-20-RND5765_1 is no longer usable
5/30/2011 10:42:03 AM | GPUGRID | Result A163-TONI_AGGdense1-3-100-RND8813_2 is no longer usable
5/30/2011 10:42:04 AM | GPUGRID | Computation for task A587-TONI_AGGsoup1-7-100-RND6153_1 finished
5/30/2011 10:42:04 AM | GPUGRID | Computation for task p17-IBUCH_7_wtEGFR_110419-18-20-RND5765_1 finished
5/30/2011 10:42:37 AM | GPUGRID | Sending scheduler request: To report completed tasks.
5/30/2011 10:42:37 AM | GPUGRID | Reporting 3 completed tasks, requesting new tasks for NVIDIA GPU
5/30/2011 10:42:41 AM | GPUGRID | Scheduler request completed: got 2 new tasks

I don't see why this happened. I updated to the latest driver and BOINC versions a while back. Two of the units were partially complete, with no errors. The two new units are running fine so far.

skgiven
Volunteer moderator
Project tester
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,991,617,060
RAC: 73,299
Message 21268 - Posted: 30 May 2011 | 19:55:03 UTC - in response to Message 21267.

Tasks listed as Client Detached

Are you using Bam?

Bedrich Hajek
Message 21269 - Posted: 30 May 2011 | 21:13:15 UTC - in response to Message 21268.

I don't even know what Bam is. So, I would say no.

Bedrich Hajek
Message 21313 - Posted: 6 Jun 2011 | 1:28:10 UTC

It happened to me again:

6/5/2011 9:09:13 PM | GPUGRID | Sending scheduler request: To fetch work.
6/5/2011 9:09:13 PM | GPUGRID | Requesting new tasks for NVIDIA GPU
6/5/2011 9:09:15 PM | GPUGRID | Scheduler request completed: got 1 new tasks
6/5/2011 9:09:15 PM | GPUGRID | Result A570-TONI_AGGsoup1-13-100-RND6941_0 is no longer usable
6/5/2011 9:09:15 PM | GPUGRID | Result A229-TONI_AGGsoup1-11-100-RND0938_1 is no longer usable
6/5/2011 9:09:16 PM | GPUGRID | Computation for task A570-TONI_AGGsoup1-13-100-RND6941_0 finished
6/5/2011 9:09:16 PM | GPUGRID | Computation for task A229-TONI_AGGsoup1-11-100-RND0938_1 finished
6/5/2011 9:09:39 PM | GPUGRID | Started download of A435-TONI_AGG1-24-LICENSE
6/5/2011 9:09:39 PM | GPUGRID | Started download of A435-TONI_AGG1-24-COPYRIGHT
6/5/2011 9:09:40 PM | GPUGRID | Finished download of A435-TONI_AGG1-24-LICENSE
6/5/2011 9:09:40 PM | GPUGRID | Finished download of A435-TONI_AGG1-24-COPYRIGHT
6/5/2011 9:09:40 PM | GPUGRID | Started download of A435-TONI_AGG1-24-A435-TONI_AGG1-23-100-RND2826_1
6/5/2011 9:09:40 PM | GPUGRID | Started download of A435-TONI_AGG1-24-A435-TONI_AGG1-23-100-RND2826_2
6/5/2011 9:09:48 PM | GPUGRID | Finished download of A435-TONI_AGG1-24-A435-TONI_AGG1-23-100-RND2826_2
6/5/2011 9:09:48 PM | GPUGRID | Started download of A435-TONI_AGG1-24-A435-TONI_AGG1-23-100-RND2826_3
6/5/2011 9:09:49 PM | GPUGRID | Finished download of A435-TONI_AGG1-24-A435-TONI_AGG1-23-100-RND2826_1
6/5/2011 9:09:49 PM | GPUGRID | Started download of A435-TONI_AGG1-24-pdb_file
6/5/2011 9:09:49 PM | GPUGRID | Sending scheduler request: To report completed tasks.
6/5/2011 9:09:49 PM | GPUGRID | Reporting 2 completed tasks, requesting new tasks for NVIDIA GPU
6/5/2011 9:09:52 PM | GPUGRID | Finished download of A435-TONI_AGG1-24-A435-TONI_AGG1-23-100-RND2826_3

This started after I upgraded to BOINC 6.12.26 and NVIDIA driver version 270.61, and then to 275.33. It is only happening on my Windows 7 machine. Everything was going fine until the scheduler request. Is the problem in my machine or in the server?

Dagorath
Joined: 16 Mar 11
Posts: 509
Credit: 179,005,236
RAC: 0
Message 21314 - Posted: 6 Jun 2011 | 3:29:06 UTC - in response to Message 21313.

6/5/2011 9:09:15 PM | GPUGRID | Result A570-TONI_AGGsoup1-13-100-RND6941_0 is no longer usable
6/5/2011 9:09:15 PM | GPUGRID | Result A229-TONI_AGGsoup1-11-100-RND0938_1 is no longer usable


The server decided that the 2 results in question were no longer usable and sent the 2 messages quoted above to your BOINC client. Your computer responded to the above messages by ending computation on the 2 tasks, as indicated in the 2 messages quoted below.

6/5/2011 9:09:16 PM | GPUGRID | Computation for task A570-TONI_AGGsoup1-13-100-RND6941_0 finished
6/5/2011 9:09:16 PM | GPUGRID | Computation for task A229-TONI_AGGsoup1-11-100-RND0938_1 finished
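A minimal Python sketch of that sequence, for anyone who wants to scan their own event log for the same pattern. It assumes the log lines use the `timestamp | project | message` layout quoted in this thread; the function name `aborted_tasks` is made up for illustration and is not part of BOINC.

```python
import re

# Sample lines copied from the event log quoted in this thread.
LOG = """\
6/5/2011 9:09:15 PM | GPUGRID | Scheduler request completed: got 1 new tasks
6/5/2011 9:09:15 PM | GPUGRID | Result A570-TONI_AGGsoup1-13-100-RND6941_0 is no longer usable
6/5/2011 9:09:15 PM | GPUGRID | Result A229-TONI_AGGsoup1-11-100-RND0938_1 is no longer usable
6/5/2011 9:09:16 PM | GPUGRID | Computation for task A570-TONI_AGGsoup1-13-100-RND6941_0 finished
6/5/2011 9:09:16 PM | GPUGRID | Computation for task A229-TONI_AGGsoup1-11-100-RND0938_1 finished
"""

UNUSABLE = re.compile(r"Result (\S+) is no longer usable")
FINISHED = re.compile(r"Computation for task (\S+) finished")


def aborted_tasks(log_text):
    """Map each task flagged 'no longer usable' to the time it was
    flagged and the time the client reported its computation finished."""
    events = {}
    for line in log_text.splitlines():
        parts = [p.strip() for p in line.split("|", 2)]
        if len(parts) != 3:
            continue  # not a 'timestamp | project | message' line
        timestamp, _project, message = parts
        match = UNUSABLE.search(message)
        if match:
            events[match.group(1)] = {"flagged": timestamp, "finished": None}
            continue
        match = FINISHED.search(message)
        if match and match.group(1) in events:
            events[match.group(1)]["finished"] = timestamp
    return events


for name, ev in aborted_tasks(LOG).items():
    print(name, "| flagged:", ev["flagged"], "| finished:", ev["finished"])
```

Each flagged task shows a "finished" entry one second after the server's message, which is the client ending the computation in response.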


What's puzzling is that when I look at the time the tasks were sent to you and the time they were returned, I see that one task was returned about 5 hours after it was sent and the other about 7 hours after it was sent. Tasks should not get the "is no longer usable" message until after the deadline has passed. The deadline is 5 days, but your 2 tasks were canceled less than 12 hours after they were sent to you. That's weird.

Another weird thing is that on your list of tasks for that computer, both of those tasks show a status of "Client detached". If they were canceled by the server then they should show status "Canceled" not "Client detached". That makes me wonder... Did you detach the client from GPUgrid after the tasks were canceled and before they were reported?

So we have 2 mysteries here. It's hard to give you any advice about what to do about the problem until those 2 mysteries are solved.

Bedrich Hajek
Message 21315 - Posted: 6 Jun 2011 | 10:39:56 UTC

No, I did not detach the client from the project. This was in fact a scheduler request made by the computer itself.

Dagorath
Message 21316 - Posted: 6 Jun 2011 | 11:45:32 UTC - in response to Message 21315.


Hmmm. If you didn't detach from the project and the project server canceled your 2 tasks less than 12 hours after you received them, as the messages you posted indicate, then you've exposed some bugs on the server, perhaps in the client too. The problem is nobody else seems to be afflicted by the bugs so I doubt they exist.

The only other scenario I can think of is not pretty. In that scenario you did in fact detach from the project and you fabricated the messages to make it look like the server canceled the 2 tasks prematurely. Maybe you're just screwing with our heads? Sorry if that offends and I hope I'm wrong. I sincerely hope someone else can think of a third scenario to explain what's going on.

Bedrich Hajek
Message 21317 - Posted: 6 Jun 2011 | 12:11:54 UTC

No, I am not making this up. I know you had to mention this to rule out all possibilities, but please don't mention this again. Thank you!

Snow Crash
Joined: 4 Apr 09
Posts: 450
Credit: 539,316,349
RAC: 0
Message 21320 - Posted: 6 Jun 2011 | 16:49:58 UTC

Is it possible that the WUs were resends because the original, which was sent to someone else, had not been returned within 48 hours, but then the original was finally returned before you finished crunching them?
____________
Thanks - Steve

Dagorath
Message 21322 - Posted: 6 Jun 2011 | 17:42:38 UTC - in response to Message 21320.

No, that did not happen. Follow the audit trail for yourself if you wish.

Click on Bedrich's name to the left of one of his messages. Then click on the "Computers: View" link to bring up his list of attached computers. Now recall from one of his posts that he said it happened on his Windows 7 machine. That would be computer number 74707. Click on the "Tasks" link for 74707 then find the 2 tasks with status "Client detached". Click the "work unit ID" link for each task. Note that for each task the work unit name at the top of that page matches one of the 2 canceled tasks in the messages he quoted in the first post in this thread.

Now note that for work unit 2516266 the first replication was reported 5 Jun 2011 13:50:57 UTC but Bedrich's computer 74707 didn't receive the second replication until 5 Jun 2011 18:10:34 UTC, about 4 hours after the first replication was returned. Thus Bedrich's computer was not crunching the work unit simultaneously with another host.

For work unit 2516397 we can see that Bedrich's computer 74707 reported the task at 5 Jun 2011 23:22:49 UTC and the next computer in line didn't receive the second replication until 6 Jun 2011 4:40:53 UTC. Again, the work unit was not being crunched simultaneously on 2 computers.
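The gaps can be checked with a few lines of Python. This is only a sketch using the four timestamps quoted above; the helper name `gap` and the dictionary layout are made up for illustration.

```python
from datetime import datetime

FMT = "%d %b %Y %H:%M:%S"  # e.g. "5 Jun 2011 13:50:57" (all times UTC)


def gap(reported, next_sent):
    """Time between one replication being reported and the next being sent.
    A positive gap means the two hosts never held the work unit at once."""
    return datetime.strptime(next_sent, FMT) - datetime.strptime(reported, FMT)


# work unit ID -> (earlier replication reported, later replication sent)
work_units = {
    "2516266": ("5 Jun 2011 13:50:57", "5 Jun 2011 18:10:34"),
    "2516397": ("5 Jun 2011 23:22:49", "6 Jun 2011 4:40:53"),
}

for wu, (reported, next_sent) in work_units.items():
    print(wu, gap(reported, next_sent))
```

Both gaps come out positive (roughly 4 hours 20 minutes and 5 hours 18 minutes), so in neither case did two hosts hold the same work unit at the same time.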

skgiven
Message 21323 - Posted: 6 Jun 2011 | 18:33:34 UTC - in response to Message 21317.

I'm sure Bedrich Hajek is not making it up; such errors have been seen and reported before (here and in other project threads).
If it's a server-side problem (task issue or other), one of the researchers might be able to help. If it's the result of corrupt user data, a project reset might help. It could also be the result of an antivirus or firewall blocking something, or something else I can't think of.

Have you recently changed your user details, and do they match up well with other projects?
Did your system do a system restore, or recover from a problem?

Bedrich Hajek
Message 21329 - Posted: 6 Jun 2011 | 23:45:06 UTC - in response to Message 21323.

Have you recently changed your user details, and do they match up well with other projects?

No.

Did your system do a system restore, or recover from a problem?

No.

The only unusual things I did were updating BOINC and NVIDIA, as I mentioned earlier. I only updated the NVIDIA drivers. Today I reinstalled the NVIDIA driver with a clean installation, and afterwards I cleaned out the registry. Maybe this will do the trick. If it doesn't work, I can go back to the earlier versions.
