What am I doing wrong
| Author | Message |
|---|---|
|
Joined: 24 Nov 12 Posts: 17 Credit: 453,679,903 RAC: 0
|
I now have about 15 WUs that spontaneously abort at 1 hour, like this one: Name: e10s3_e1s67f460-GERARD_CXCL12_LIG11_CGENFF2-1-2-RND2153_0; Workunit: 10707801; Created: 1 Mar 2015, 1:31:51 UTC; Sent: 1 Mar 2015, 8:34:50 UTC; Received: 1 Mar 2015, 9:34:55 UTC; Server state: Over; Outcome: Abandoned; Client state: New; Exit status: 0 (0x0); Computer ID: 191787; Report deadline: 6 Mar 2015, 8:34:50 UTC; Run time: 0.00; CPU time: 0.00; Validate state: Initial; Credit: 0.00; Application version: Long runs (8-12 hours on fastest card) v8.47 (cuda65). They are running on an NVIDIA GTX 770 under Win-7. Any ideas?? What info do "you" need to help diagnose this? Ed F |
|
Joined: 24 Nov 12 Posts: 17 Credit: 453,679,903 RAC: 0
|
(I don't see an "edit" option.) The most recent WU is now at 1:45 ... must have been a bad batch?? Ed F. Edit: nope ... this one died at 2:00. Ed F |
|
Joined: 5 Dec 12 Posts: 84 Credit: 1,663,883,415 RAC: 0
|
They all say for a status "Abandoned". That's so weird! |
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
I see that your computer's details show: BOINC version 6.12.34 .... That is ANCIENT. Please try the latest release, BOINC v7.4.36. http://boinc.berkeley.edu/download.php |
Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0
|
Dayle Diamond wrote: "They all say for a status 'Abandoned'. That's so weird!" The root of this phenomenon should be some kind of access rights problem with the BOINC work folder. Is there more than one user on this PC? Is BOINC installed as a system service (protected execution mode)? Jacob Klein wrote: "I see that your computer's details show: ..." That's true, but it still has to work under Windows 7 x64. Until recently, I used 6.10.60 on my hosts. The only reason I updated was to be able to run spare projects that use OpenCL. Jacob Klein wrote: "Please try the latest release, BOINC v7.4.36." Updating to this version is still a good idea, as it will update the folder access rights. |
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 318
|
No, 'Abandoned' is a server-only phenomenon - the tasks are marked thus in the server database record, but as the OP stated in the first post, the BOINC client locally knows nothing about this, and carries on processing - there are no permission problems locally. In general, when this happens, local tasks continue running until the user notices or all tasks are completed. I'm wondering (and this is pure speculation) whether the 'spontaneous abort' is actually the regular once-per-hour scheduler request 'requested by project', which is specific to this project. If the scheduler reply says the work is no longer viable, that could trigger the abort. That could be checked in the Event Log. As to why the server is marking the tasks as abandoned - nobody really knows, and I'd appreciate more help in tracking it down. It's done by the function mark_results_over(), which is called in two places in sched/handle_request.cpp (and nowhere else). It's supposed to happen "when there's evidence that the host has detached.", or "If the [RPC] seqno from the host is less than what we expect, the user must have copied the state file to a different host". But it seems to happen more than that, and the finger of suspicion seems to point at communication problems between host and server resulting in RPC requests being processed out of order on the server. As to running BOINC v6.12.34, that's fine. I run it here too, because it's the last version allowed to run GPUs in Service Mode under Windows XP. Works fine. |
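The sequence-number rule quoted in the post above can be illustrated with a small model. This is only a sketch under stated assumptions, not BOINC's actual server code: `HostRecord`, `last_seqno`, and the Python `handle_request()` below are hypothetical names invented for illustration, and only the name `mark_results_over()` and the "seqno less than expected" condition come from the post. It shows how two requests processed out of order could be enough to get every in-progress task marked as abandoned.

```python
from dataclasses import dataclass, field


@dataclass
class HostRecord:
    """Hypothetical, simplified server-side view of one host."""
    last_seqno: int = 0                              # seqno of the last request processed
    in_progress: set = field(default_factory=set)    # tasks the server thinks are running
    abandoned: set = field(default_factory=set)      # tasks marked 'Abandoned'


def mark_results_over(host: HostRecord) -> None:
    """Move every in-progress task of this host to the abandoned set."""
    host.abandoned |= host.in_progress
    host.in_progress.clear()


def handle_request(host: HostRecord, seqno: int, detached: bool = False) -> bool:
    """Process one scheduler request; return True if any task was abandoned.

    Assumed rule (paraphrasing the post): abandon the host's tasks when the
    host appears to have detached, or when the request's seqno is lower than
    the one the server already recorded.
    """
    abandoned_now = False
    if detached or seqno < host.last_seqno:
        abandoned_now = bool(host.in_progress)
        mark_results_over(host)
    host.last_seqno = seqno
    return abandoned_now


if __name__ == "__main__":
    host = HostRecord(last_seqno=5, in_progress={"task_a", "task_b"})
    print(handle_request(host, 6))   # in order
    print(handle_request(host, 8))   # request 8 overtakes request 7
    print(handle_request(host, 7))   # late arrival looks like a copied state file
    print(sorted(host.abandoned))
```

In this model the client has done nothing wrong: request 7 was simply processed after request 8, which the server cannot distinguish from a state file copied to another host. That matches the speculation in the post, but whether the real server behaves this way would have to be confirmed against sched/handle_request.cpp.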
|
Joined: 24 Nov 12 Posts: 17 Credit: 453,679,903 RAC: 0
|
Well ... I have no idea ... but I removed the project and reconnected ... I have completed 1 WU and am 1:45 into the next ... However WU 10709889 is nowhere to be seen ... must have fallen through the cracks during the disconnect?? Anyway ... all SEEMS to be well now ... I have no idea what went wrong ... but ... Thanks for the response! Ed F |
Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0
|
"As to why the server is marking the tasks as abandoned - nobody really knows, and I'd appreciate more help in tracking it down. It's done by the function mark_results_over(), which is called in two places in sched/handle_request.cpp (and nowhere else). It's supposed to happen 'when there's evidence that the host has detached.', or 'If the [RPC] seqno from the host is less than what we expect, the user must have copied the state file to a different host'. But it seems to happen more than that, and the finger of suspicion seems to point at communication problems between host and server resulting in RPC requests being processed out of order on the server." It happened on one of my dual-boot (WinXP/Win7) hosts when I tried to make the BOINC manager use the same working folder on both OSes. I succeeded in doing this on my other, similar host by setting the proper access rights for the BOINC work folder (which is located on the Win7 partition on that host), but on the first host the ongoing GPUGrid workunits get abandoned whenever I boot into Win7 (the BOINC working folder is located on the WinXP partition on this host). |
©2025 Universitat Pompeu Fabra