... still babysitting ...

Author	Message
Kokomiko Joined: 18 Jul 08 Posts: 190 Credit: 24,093,690 RAC: 0
Message 4847 - Posted: 25 Dec 2008, 13:41:50 UTC
The scheduler in 6.5.0 must still have problems with long-running projects. I have CPDN (2 tasks, one with over 200 hours, one with 1600 hours), PrimeGrid, MilkyWay and ABC running as CPU tasks. In this case the BOINC Manager (BM) doesn't ask for work automatically. If I stop one or two projects, I get this request and reply:
25.12.2008 13:33:54|GPUGRID|Sending scheduler request: Requested by user. Requesting 167135 seconds of work, reporting 0 completed tasks
25.12.2008 13:33:59|GPUGRID|Scheduler request completed: got 0 new tasks
25.12.2008 13:33:59|GPUGRID|Message from server: No work sent
25.12.2008 13:33:59|GPUGRID|Message from server: Full-atom molecular dynamics for Cell processor is not available for your type of computer.
25.12.2008 13:33:59|GPUGRID|Message from server: Full-atom molecular dynamics on Cell processor is not available for your type of computer.
If I stop everything I get:
25.12.2008 14:06:34|GPUGRID|Sending scheduler request: Requested by user. Requesting 218577 seconds of work, reporting 1 completed tasks
25.12.2008 14:08:15|GPUGRID|Scheduler request completed: got 1 new tasks
25.12.2008 14:08:47|GPUGRID|Sending scheduler request: To fetch work. Requesting 216986 seconds of work, reporting 0 completed tasks
25.12.2008 14:08:52|GPUGRID|Scheduler request completed: got 1 new tasks
25.12.2008 14:09:23|GPUGRID|Sending scheduler request: To fetch work. Requesting 216750 seconds of work, reporting 0 completed tasks
25.12.2008 14:09:28|GPUGRID|Scheduler request completed: got 1 new tasks
The behavior of the scheduler in 6.3.21 was much better. If I leave home now for more than 24 hours I have to downgrade or my boxes will all run dry.
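
A log excerpt like the one in this post can be summarized with a short script. The following is a minimal Python sketch, assuming the message log uses `|` as the field separator (timestamp, project, message) and that each request line is followed by its matching "got N new tasks" line. The function name `summarize` and the regular expressions are illustrative and not part of BOINC.

```python
import re

# Patterns for the two message types of interest (assumed wording, taken
# from the log lines quoted in the post).
REQUEST_RE = re.compile(r"Requesting (\d+) seconds of work")
REPLY_RE = re.compile(r"got (\d+) new tasks")


def summarize(log_text):
    """Pair each work request with the number of tasks the server returned.

    Returns a list of (timestamp, project, requested_seconds, tasks_received).
    """
    results = []
    pending = None  # the most recent request that has no reply yet
    for line in log_text.splitlines():
        parts = line.split("|", 2)
        if len(parts) != 3:
            continue  # not a "timestamp|project|message" line
        timestamp, project, message = parts
        match = REQUEST_RE.search(message)
        if match:
            pending = (timestamp, project, int(match.group(1)))
            continue
        match = REPLY_RE.search(message)
        if match and pending is not None:
            results.append(pending + (int(match.group(1)),))
            pending = None
    return results


SAMPLE = """\
25.12.2008 13:33:54|GPUGRID|Sending scheduler request: Requested by user. Requesting 167135 seconds of work, reporting 0 completed tasks
25.12.2008 13:33:59|GPUGRID|Scheduler request completed: got 0 new tasks
25.12.2008 14:06:34|GPUGRID|Sending scheduler request: Requested by user. Requesting 218577 seconds of work, reporting 1 completed tasks
25.12.2008 14:08:15|GPUGRID|Scheduler request completed: got 1 new tasks
"""

for ts, project, seconds, tasks in summarize(SAMPLE):
    print(f"{ts} {project}: asked for {seconds / 86400:.2f} days, got {tasks} task(s)")
```

Converting the requested seconds to days makes it easier to compare a request against the configured cache size.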

Paul D. Buck Joined: 9 Jun 08 Posts: 1050 Credit: 37,321,185 RAC: 0
Message 4848 - Posted: 25 Dec 2008, 13:57:11 UTC
You can also try to extend the cache size. I know that those of us on high-speed connections should run with a lean cache, but I had to raise mine to 0.4 days to get work reliably.

Kokomiko Joined: 18 Jul 08 Posts: 190 Credit: 24,093,690 RAC: 0
Message 4850 - Posted: 25 Dec 2008, 14:09:25 UTC - in response to Message 4848.
"You can also try to extend the cache size. I know that those of us on high-speed connections should run with a lean cache, but I had to raise mine to 0.4 days to get work reliably."
My work cache is set to 2.00 days. The problem exists on the boxes with the fast cards (GTX280 and GTX260²). BTW: The GTX280 runs more than 30% faster with 3+1, the GTX260² runs fine with 4+1, all with Vista 64 bit.

[BOINC@Poland]AiDec Joined: 2 Sep 08 Posts: 53 Credit: 9,213,937 RAC: 0
Message 4851 - Posted: 25 Dec 2008, 15:04:49 UTC
I would just like to confirm this strange behavior (not requesting more work if other projects are not suspended).

ExtraTerrestrial Apes Volunteer moderator Volunteer tester Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0
Message 4860 - Posted: 25 Dec 2008, 23:16:56 UTC - in response to Message 4851.
"I would just like to confirm this strange behavior (not requesting more work if other projects are not suspended)."
Me too, on 6.4.2.
MrS
Scanning for our furry friends since Jan 2002

DoctorNow Joined: 18 Aug 07 Posts: 83 Credit: 144,208,752 RAC: 59
Message 4863 - Posted: 26 Dec 2008, 7:53:49 UTC - in response to Message 4860.
"I would just like to confirm this strange behavior (not requesting more work if other projects are not suspended)."
"Me too, on 6.4.2."
Yep, me too now, even on 6.3.21. :-\ At first I thought it was only a client problem in 6.5.0. That's why I switched back, because I didn't have this problem before with 6.3.21...
Member of BOINC@Heidelberg and ATA!

Paul D. Buck Joined: 9 Jun 08 Posts: 1050 Credit: 37,321,185 RAC: 0
Message 4865 - Posted: 26 Dec 2008, 8:30:27 UTC
Just to dip my oar in, my experience is different. I am running 6.5.0 and it has been returning and fetching work normally for me. Though I just got one task with a 168 hour estimated run time... forcing the prior task into high priority (it just completed). Now that the new task is running, the estimate is coming down quite nicely, thank you. Even more interesting is that the task AFTER that came in at 17:25, which is about the run time I would expect for the long task. YMMV

Kokomiko Joined: 18 Jul 08 Posts: 190 Credit: 24,093,690 RAC: 0
Message 4867 - Posted: 26 Dec 2008, 11:49:27 UTC
I have a mix of long-running and short-running projects. On MilkyWay and PrimeGrid I also get work if the BM asks for just one second of free time. One CPDN task has 850 hours of work left. So the requests for work on GPUGrid are too small to get work. If I stop CPDN, MW and PG immediately fetch a lot of WUs, and after restarting CPDN some projects go into high-priority mode. I have to set all projects (except GPUGrid) to NNW and then suspend them. That's the only way for me to get new work on this Vista 64 machine with the GTX280. But I don't understand why the other box, with the same OS, a GTX260², the same projects and 2 CPDN WUs with 1232 hours of work in total, doesn't have the problem. I think the main problem is that the calculated free time for a work request is the same for CPU and GPU tasks. Is it possible to calculate the free time for work requests separately for GPU and CPU work?
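
The idea of a separate free-time calculation for CPU and GPU work can be illustrated with a small Python sketch. This is not BOINC's actual work-fetch code. It is a toy model under stated assumptions: every task is tagged as either "cpu" or "gpu", the "free time" is the cache size times the number of processors minus the queued work, and the names `Task`, `combined_shortfall` and `per_resource_shortfall` are hypothetical.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86400


@dataclass
class Task:
    resource: str             # "cpu" or "gpu" (assumed tagging)
    remaining_seconds: float  # estimated time left


def combined_shortfall(tasks, cache_days, instances):
    """One pooled estimate: all queued work counts against all processors."""
    capacity = cache_days * SECONDS_PER_DAY * sum(instances.values())
    queued = sum(t.remaining_seconds for t in tasks)
    return max(0.0, capacity - queued)


def per_resource_shortfall(tasks, cache_days, instances):
    """Separate estimates: CPU work only counts against CPUs, GPU work against GPUs."""
    shortfall = {}
    for resource, count in instances.items():
        capacity = cache_days * SECONDS_PER_DAY * count
        queued = sum(t.remaining_seconds for t in tasks if t.resource == resource)
        shortfall[resource] = max(0.0, capacity - queued)
    return shortfall


# One long CPU task (850 hours, as in the post) and a GPU task with 3 hours left.
tasks = [Task("cpu", 850 * 3600), Task("gpu", 3 * 3600)]
instances = {"cpu": 4, "gpu": 1}  # hypothetical quad-core box with one GPU

print(combined_shortfall(tasks, 2.0, instances))
print(per_resource_shortfall(tasks, 2.0, instances))
```

In this toy model the pooled estimate sees no free time because the long CPU task fills the whole cache, while the per-resource estimate still reports a shortfall for the GPU.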

Paul D. Buck Joined: 9 Jun 08 Posts: 1050 Credit: 37,321,185 RAC: 0
Message 4873 - Posted: 26 Dec 2008, 13:00:36 UTC - in response to Message 4867.
"I have a mix of long-running and short-running projects. On MilkyWay and PrimeGrid I also get work if the BM asks for just one second of free time. One CPDN task has 850 hours of work left. So the requests for work on GPUGrid are too small to get work. If I stop CPDN, MW and PG immediately fetch a lot of WUs, and after restarting CPDN some projects go into high-priority mode. I have to set all projects (except GPUGrid) to NNW and then suspend them. That's the only way for me to get new work on this Vista 64 machine with the GTX280. But I don't understand why the other box, with the same OS, a GTX260², the same projects and 2 CPDN WUs with 1232 hours of work in total, doesn't have the problem. I think the main problem is that the calculated free time for a work request is the same for CPU and GPU tasks. Is it possible to calculate the free time for work requests separately for GPU and CPU work?"
This is the subject of a post in the SaH NC forum where I discuss how GPU processing breaks the resource share model, which is the basis for making these calculations. Splitting the model into two separate calculations only works when a project is pure CPU or pure GPU. That model also breaks when you have a situation like SaH, where the project can run on both processing elements. More interesting to me is the fact that the long-neglected issues with credit calculations COULD have been the solution to this conundrum. Even sadder is that we predicted issues such as this back in beta testing of BOINC when discussing the future. Unfortunately the developers kept telling us we did not understand and that a correctly operating credit calculation system was not important.
However, if we had a correct characterization of the processing capabilities of the CPU and GPU, as defined by the original Cobblestone model, then we would know the processing capability of each processor system. From there you know the total capacity, can look at the current loading, and allocate the resources. From THERE, you can ask for the correct type of work to properly "balance" the resource allocation ... and so on ... I hate to be right all the time ... :)
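
The capacity-based approach described in this post (know each processor's capacity, sum it, compare against current loading, then ask for work where the balance is off) can be sketched in a few lines of Python. This is a minimal illustration, not how BOINC works: capacities are given in arbitrary credit-like units per day rather than real Cobblestone figures, and the function names `target_allocation` and `deficits` as well as all numbers are hypothetical.

```python
def target_allocation(capacities, shares):
    """Split the total capacity among projects in proportion to resource share."""
    total_capacity = sum(capacities.values())
    total_share = sum(shares.values())
    return {
        project: total_capacity * share / total_share
        for project, share in shares.items()
    }


def deficits(targets, current_load):
    """Positive value: the project is owed capacity. Negative: it has too much."""
    return {
        project: target - current_load.get(project, 0.0)
        for project, target in targets.items()
    }


# Hypothetical host: the GPU is worth three times the CPU in capacity units.
capacities = {"cpu": 10.0, "gpu": 30.0}
# Two projects with equal resource share.
shares = {"GPUGRID": 100, "CPDN": 100}
# Current loading: GPUGRID holds the whole GPU, CPDN the whole CPU.
current_load = {"GPUGRID": 30.0, "CPDN": 10.0}

targets = target_allocation(capacities, shares)
print(targets)
print(deficits(targets, current_load))
```

With these numbers the GPU-only project exceeds its share simply by keeping the GPU busy, while the CPU-only project cannot make up its deficit because it has no GPU application. That is one way to picture why a single resource share breaks down once projects are tied to one processor type.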

Kokomiko Joined: 18 Jul 08 Posts: 190 Credit: 24,093,690 RAC: 0
Message 4878 - Posted: 26 Dec 2008, 15:57:38 UTC
... and again ... it's no fun, can't grab new work ...
12/26/08 16:51:11|GPUGRID|Sending scheduler request: Requested by user. Requesting 115479 seconds of work, reporting 0 completed tasks
12/26/08 16:51:16|GPUGRID|Scheduler request completed: got 0 new tasks
12/26/08 16:51:16|GPUGRID|Message from server: No work sent
12/26/08 16:51:16|GPUGRID|Message from server: Full-atom molecular dynamics for Cell processor is not available for your type of computer.
12/26/08 16:51:16|GPUGRID|Message from server: Full-atom molecular dynamics on Cell processor is not available for your type of computer.
12/26/08 16:52:01|GPUGRID|Sending scheduler request: Requested by user. Requesting 186492 seconds of work, reporting 0 completed tasks
12/26/08 16:52:07|GPUGRID|Scheduler request completed: got 0 new tasks
12/26/08 16:52:07|GPUGRID|Message from server: No work sent
12/26/08 16:52:07|GPUGRID|Message from server: Full-atom molecular dynamics on Cell processor is not available for your type of computer.
12/26/08 16:52:07|GPUGRID|Message from server: Full-atom molecular dynamics for Cell processor is not available for your type of computer.
12/26/08 16:53:27|GPUGRID|Sending scheduler request: Requested by user. Requesting 255197 seconds of work, reporting 0 completed tasks
12/26/08 16:53:32|GPUGRID|Scheduler request completed: got 0 new tasks
12/26/08 16:53:32|GPUGRID|Message from server: No work sent
12/26/08 16:53:32|GPUGRID|Message from server: Full-atom molecular dynamics for Cell processor is not available for your type of computer.
12/26/08 16:53:32|GPUGRID|Message from server: Full-atom molecular dynamics on Cell processor is not available for your type of computer.
12/26/08 16:54:23|GPUGRID|Sending scheduler request: Requested by user. Requesting 255197 seconds of work, reporting 0 completed tasks
12/26/08 16:54:28|GPUGRID|Scheduler request completed: got 0 new tasks
12/26/08 16:54:28|GPUGRID|Message from server: No work sent
12/26/08 16:54:28|GPUGRID|Message from server: Full-atom molecular dynamics for Cell processor is not available for your type of computer.
12/26/08 16:54:28|GPUGRID|Message from server: Full-atom molecular dynamics on Cell processor is not available for your type of computer.

GDF Volunteer moderator Project administrator Project developer Project tester Volunteer developer Volunteer tester Project scientist Joined: 14 Mar 07 Posts: 1958 Credit: 629,356 RAC: 0
Message 4879 - Posted: 26 Dec 2008, 16:16:55 UTC - in response to Message 4878.
Which host ID? Please also report these issues on the boinc-alpha mailing list.
gdf

Black Beard Joined: 16 Nov 08 Posts: 7 Credit: 982,855 RAC: 0
Message 4881 - Posted: 26 Dec 2008, 16:37:23 UTC
I'm having this problem also. In my case I got the impression that the problem was caused by everything wanting to run in 'high priority' mode. If I want to download GPUGRID WUs I must suspend the other three projects I run on this machine. If I want to download WUs for any of the other projects I must suspend GPUGRID. My host ID is 19688. How can I stop all my projects from running in high priority mode? I have my cache set to three days plus one day extra in my computing preferences.

Kokomiko Joined: 18 Jul 08 Posts: 190 Credit: 24,093,690 RAC: 0
Message 4885 - Posted: 26 Dec 2008, 19:14:08 UTC - in response to Message 4879. Last modified: 26 Dec 2008, 19:14:33 UTC
"Which host ID? Please also report these issues on the boinc-alpha mailing list. gdf"
Host ID 7785

ExtraTerrestrial Apes Volunteer moderator Volunteer tester Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0
Message 4900 - Posted: 26 Dec 2008, 23:27:26 UTC - in response to Message 4860.
"I would just like to confirm this strange behavior (not requesting more work if other projects are not suspended)."
"Me too, on 6.4.2."
To add more detail:
- I have 2 CPU projects, one with many, many WUs (due to some previous error, never mind) and a normal one
- all WUs (CPU+GPU) are in high priority mode due to this massive amount of cached WUs
- cache size is set to 1.25 days
- GPU-Grid has 37.5% resource share
And even if the current GPU WU is down to a few hours of runtime, BOINC won't request new work until I suspend the project with many WUs.
MrS
Scanning for our furry friends since Jan 2002

Paul D. Buck Joined: 9 Jun 08 Posts: 1050 Credit: 37,321,185 RAC: 0
Message 4912 - Posted: 27 Dec 2008, 0:46:39 UTC
Just a note: Dr. Anderson has acknowledged that there are issues with the work fetch policy which are contributing to our misery. He plans to start working on this soon ... Others have chimed in with suggestions (including me) and hopefully he will actually look at the real problem, which is a little bit bigger ... but that is only Paul's opinion ... In the meantime we will have to fiddle with it to get work, I think ...