Early WU Downloads

tomba · Joined: 21 Feb 09 · Posts: 497 · Credit: 700,690,702 · RAC: 0
Message 34675 - Posted: 15 Jan 2014, 14:52:48 UTC - in response to Message 34670.

> 15/01/2014 11:51:50 | | log flags: file_xfer, sched_ops, task, file_xfer_debug
> 15/01/2014 11:51:50 | | Libraries: libcurl/7.25.0 OpenSSL/1.0.1 zlib/1.2.6
>
> The red line lists the logging flags currently turned on. It looks like you turned on the <file_xfer_debug> instead of <work_fetch_debug>.

cc_config.xml fixed. I really must listen to instructions!!
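A mix-up like file_xfer_debug vs. work_fetch_debug can be caught before restarting the client by listing which flags a cc_config.xml actually turns on. This is a minimal sketch assuming the usual layout, with each flag as a child of <log_flags> set to 1; enabled_log_flags is a hypothetical helper, not part of BOINC.

```python
import xml.etree.ElementTree as ET

# Assumed layout: <cc_config><log_flags><flag_name>1</flag_name>...</log_flags></cc_config>
CC_CONFIG = """
<cc_config>
  <log_flags>
    <work_fetch_debug>1</work_fetch_debug>
  </log_flags>
</cc_config>
"""


def enabled_log_flags(xml_text: str) -> set:
    """Return the names of the log flags set to 1 in a cc_config.xml document."""
    root = ET.fromstring(xml_text.strip())
    log_flags = root.find("log_flags")
    if log_flags is None:
        return set()
    return {flag.tag for flag in log_flags if (flag.text or "").strip() == "1"}


if __name__ == "__main__":
    flags = enabled_log_flags(CC_CONFIG)
    print("work_fetch_debug" in flags)  # True
    print("file_xfer_debug" in flags)   # False
```

The client echoes the flags it picked up on the "log flags:" line at startup, so comparing that line with this list is a quick sanity check.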

tomba · Joined: 21 Feb 09 · Posts: 497 · Credit: 700,690,702 · RAC: 0
Message 34676 - Posted: 15 Jan 2014, 14:57:31 UTC

Here's the log of a work_fetch cycle. Is the line in red normal?

    15/01/2014 15:56:42 | | [work_fetch] entering choose_project()
    15/01/2014 15:56:42 | | [work_fetch] ------- start work fetch state -------
    15/01/2014 15:56:42 | | [work_fetch] target work buffer: 180.00 + 0.00 sec
    15/01/2014 15:56:42 | | [work_fetch] --- project states ---
    15/01/2014 15:56:42 | GPUGRID | [work_fetch] REC 71611.135 prio -48.201093 can req work
    15/01/2014 15:56:42 | | [work_fetch] --- state for CPU ---
    15/01/2014 15:56:42 | | [work_fetch] shortfall 540.00 nidle 3.00 saturated 0.00 busy 0.00
    15/01/2014 15:56:42 | GPUGRID | [work_fetch] fetch share 0.000 (no apps)
    15/01/2014 15:56:42 | | [work_fetch] --- state for NVIDIA ---
    15/01/2014 15:56:42 | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 30261.82 busy 0.00
    15/01/2014 15:56:42 | GPUGRID | [work_fetch] fetch share 1.000
    15/01/2014 15:56:42 | | [work_fetch] ------- end work fetch state -------
    15/01/2014 15:56:42 | | [work_fetch] No project chosen for work fetch

Richard Haselgrove · Joined: 11 Jul 09 · Posts: 1639 · Credit: 10,159,968,649 · RAC: 0
Message 34677 - Posted: 15 Jan 2014, 15:22:14 UTC - in response to Message 34676.

> Here's the log of a work_fetch cycle. Is the line in red normal?

It's normal when the "--- state for NVIDIA ---" value "saturated 30261.82" [seconds] is larger than "target work buffer: 180.00 + 0.00 sec[onds]". In other words, you have enough work for now, and don't need any more.
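Richard's rule of thumb can be written as a small predicate. This is a simplified model for reading the log, not BOINC's actual code: it assumes a resource is considered for work fetch only when an instance is idle or its saturation has dropped below the first number on the "target work buffer" line, and only if at least one project has apps for it. The real client also weighs backoffs, project priority and resource shares. ResourceState and wants_work are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ResourceState:
    """Numbers read from one '--- state for X ---' block of the log."""
    name: str
    saturated: float         # seconds all instances are projected to stay busy
    nidle: float             # instances idle right now
    projects_with_apps: int  # projects not marked "(no apps)" for this resource


def wants_work(r: ResourceState, min_buffer: float) -> bool:
    """Simplified: hungry if something is idle or saturation is below the
    low-water mark, and only useful if some project can supply that resource."""
    hungry = r.nidle > 0 or r.saturated < min_buffer
    return hungry and r.projects_with_apps > 0


# tomba's cycle from message 34676 (target work buffer: 180.00 + 0.00 sec)
cpu = ResourceState("CPU", saturated=0.0, nidle=3.0, projects_with_apps=0)
nvidia = ResourceState("NVIDIA", saturated=30261.82, nidle=0.0, projects_with_apps=1)

for r in (cpu, nvidia):
    print(r.name, wants_work(r, min_buffer=180.0))  # both False
```

Both resources come out False, which matches "No project chosen for work fetch": the CPUs are hungry but no attached project has CPU apps, and the GPUs are saturated well past 180 s.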

Jacob Klein · Joined: 11 Oct 08 · Posts: 1127 · Credit: 1,901,927,545 · RAC: 0
Message 34678 - Posted: 15 Jan 2014, 15:22:25 UTC - in response to Message 34676. Last modified: 15 Jan 2014, 15:33:24 UTC

> Here's the log of a work_fetch cycle. Is the line in red normal?

Let's teach you how to read this.

target work buffer: Says you need work to keep busy for at least 180 seconds. That's the 3 minutes I was talking about earlier: even if you set min_buffer to 0, BOINC intentionally uses 3 minutes, since it can take around 3 minutes to ask projects for work. This line also means "when getting work, try not to get much more than 180.00 + 0.00 seconds", which takes your max_addition_buffer setting into account. For reference, I use 0.1 days and 0.5 days for my buffer settings, so my line says: target work buffer: 8640.00 + 43200.00 sec

project states: GPUGrid is listed as "can req work". If you had it set to No New Tasks, or suspended, that would be noted here, and it would then be excluded from work fetch operations.

state for CPU: shortfall 540 means that, to keep all your CPUs busy for that min_buffer setting, you'd need 540 instance-seconds of CPU work. nidle 3 means that 3 of your CPUs are currently completely idle. (Note: This saddens me; it might prove beneficial to put those to work with some CPU projects.) Notice that the GPUGrid entry in that block says (no apps), meaning that the project told BOINC it doesn't have CPU apps, so BOINC will never request CPU work from it.

state for NVIDIA: shortfall 0 means that, to keep all your NVIDIA GPUs busy for that min_buffer setting, you'd need 0 seconds of work. In fact, you have saturation: all instances are projected to be busy for 30261.82 seconds (8.4 hours).

end work fetch state: Here is where it decides, based on the info above, whether or not to request work from a project. You have no CPU projects available for your idle CPUs, so they are left idle :sadface: You have no idle NVIDIA devices, and your saturation level (30261.82 seconds) is greater than your low water mark (180 seconds), so you don't need NVIDIA work either. So it correctly says "No project chosen for work fetch", and doesn't request work.

Does that help? You should now be ready to read these log messages on your own, I'd think. Feel free to change some buffer values, or set GPUGrid to No New Tasks, to see the effects on this work_fetch_debug output.
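The numbers in that walkthrough can be checked with a little arithmetic. This is a minimal sketch assuming that shortfall counts instance-seconds over the buffer window, and that idle instances stay idle for the whole window while busy ones stay busy. Under that assumption it reproduces 540 = 3 × 180 here, and the 1044 = 1 × (180 + 864) that shows up in later logs in this thread. The function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def target_buffer(min_days: float, additional_days: float) -> tuple:
    """Convert the two buffer preferences (days) into the seconds shown on
    the 'target work buffer: X + Y sec' log line."""
    return min_days * SECONDS_PER_DAY, additional_days * SECONDS_PER_DAY


def idle_shortfall(nidle: float, window_secs: float) -> float:
    """Instance-seconds needed to keep `nidle` empty instances busy for the
    whole window (simplification: already-busy instances are ignored)."""
    return nidle * window_secs


print(idle_shortfall(3, 180.0))          # tomba's 3 idle CPUs: 540.0
print(idle_shortfall(1, 180.0 + 864.0))  # one idle GPU, later log: 1044.0
print(target_buffer(0.1, 0.5))           # Jacob's 0.1 + 0.5 day settings
print(round(30261.82 / 3600, 1))         # NVIDIA saturation in hours: 8.4
```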

tomba · Joined: 21 Feb 09 · Posts: 497 · Credit: 700,690,702 · RAC: 0
Message 34680 - Posted: 15 Jan 2014, 16:58:18 UTC - in response to Message 34678.

> Let's teach you how to read this.

Thanks for that, Jacob. I shall study it carefully.

> You have no CPU projects available for your idle CPUs, so they get left idle :sadface:

Yes. I've been feeling guilty about that, so I'm now running six Rosettas too. I'm a bit worried that the CPU fan has gone from 3700 rpm to 4300 rpm and the CPU temperature from 55C to 64C, but I guess that's a question for my other thread.

Jacob Klein · Joined: 11 Oct 08 · Posts: 1127 · Credit: 1,901,927,545 · RAC: 0
Message 34681 - Posted: 15 Jan 2014, 17:08:56 UTC - in response to Message 34680. Last modified: 15 Jan 2014, 17:15:13 UTC

Note 1: Richard's previous post in this thread is likely correct.

Note 2: REC is Recent Estimated Credit, and BOINC uses it in the "prio" priority calculation when choosing which project to ask for work. The projects are listed in "prio" order, so you can easily see which would be "next in line" for a work request.

Note 3: In case you're curious to see a more involved work fetch cycle, or want a list of the projects I'm attached to, below is a work_fetch_debug cycle that shows them. I run a lot of different CPU and NVIDIA projects on this machine.

Note 4: Further information about work fetch can be found in this slightly outdated, but highly useful, document: http://boinc.berkeley.edu/trac/wiki/ClientSched

------------------------------
A cycle of my work fetch:
------------------------------

    1/15/2014 12:09:59 PM | | [work_fetch] entering choose_project()
    1/15/2014 12:09:59 PM | | [work_fetch] ------- start work fetch state -------
    1/15/2014 12:09:59 PM | | [work_fetch] target work buffer: 8640.00 + 43200.00 sec
    1/15/2014 12:09:59 PM | | [work_fetch] --- project states ---
    1/15/2014 12:09:59 PM | DrugDiscovery | [work_fetch] REC 0.000 prio -0.000000 can req work
    1/15/2014 12:09:59 PM | The Lattice Project | [work_fetch] REC 0.000 prio -0.000000 can req work
    1/15/2014 12:09:59 PM | superlinkattechnion | [work_fetch] REC 0.126 prio -0.000000 can't req work: master URL fetch pending (backoff: 43700.11 sec)
    1/15/2014 12:09:59 PM | pogs | [work_fetch] REC 0.000 prio -0.000000 can't req work: "no new tasks" requested via Manager
    1/15/2014 12:09:59 PM | Quake-Catcher Network | [work_fetch] REC 0.000 prio 0.000000 can't req work: non CPU intensive
    1/15/2014 12:09:59 PM | ralph@home | [work_fetch] REC 0.000 prio -0.000000 can req work
    1/15/2014 12:09:59 PM | DNA@Home | [work_fetch] REC 0.000 prio -0.000000 can req work
    1/15/2014 12:09:59 PM | correlizer | [work_fetch] REC 0.014 prio -0.000000 can req work
    1/15/2014 12:09:59 PM | WUProp@Home | [work_fetch] REC 0.014 prio -0.000002 can't req work: non CPU intensive
    1/15/2014 12:09:59 PM | MindModeling@Beta | [work_fetch] REC 110.106 prio -0.002449 can req work
    1/15/2014 12:09:59 PM | LHC@home 1.0 | [work_fetch] REC 221.382 prio -0.004923 can req work
    1/15/2014 12:09:59 PM | Test4Theory@Home | [work_fetch] REC 221.453 prio -0.004925 can req work
    1/15/2014 12:09:59 PM | boincsimap | [work_fetch] REC 248.218 prio -0.005520 can req work
    1/15/2014 12:09:59 PM | World Community Grid | [work_fetch] REC 1063.070 prio -0.006005 can req work
    1/15/2014 12:09:59 PM | rosetta@home | [work_fetch] REC 287.694 prio -0.007551 can req work
    1/15/2014 12:09:59 PM | climateprediction.net | [work_fetch] REC 304.466 prio -0.007763 can req work
    1/15/2014 12:09:59 PM | Cosmology@Home | [work_fetch] REC 306.047 prio -0.010929 can req work
    1/15/2014 12:09:59 PM | Docking | [work_fetch] REC 288.875 prio -0.011575 can req work
    1/15/2014 12:09:59 PM | Poem@Home | [work_fetch] REC 590.890 prio -0.013141 can req work
    1/15/2014 12:09:59 PM | climateathome | [work_fetch] REC 251.853 prio -0.022648 can req work
    1/15/2014 12:09:59 PM | Milkyway@Home | [work_fetch] REC 229.825 prio -0.058248 can req work
    1/15/2014 12:09:59 PM | RNA World | [work_fetch] REC 446.313 prio -0.070412 can req work
    1/15/2014 12:09:59 PM | GPUGRID | [work_fetch] REC 662565.023 prio -14.773922 can req work
    1/15/2014 12:09:59 PM | SETI@home | [work_fetch] REC 76065.475 prio -169.162250 can req work
    1/15/2014 12:09:59 PM | Einstein@Home | [work_fetch] REC 80649.354 prio -179.356354 can req work
    1/15/2014 12:09:59 PM | SETI@home Beta Test | [work_fetch] REC 80680.357 prio -179.425301 can req work
    1/15/2014 12:09:59 PM | Albert@Home | [work_fetch] REC 86519.931 prio -192.412500 can req work
    1/15/2014 12:09:59 PM | | [work_fetch] --- state for CPU ---
    1/15/2014 12:09:59 PM | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 302655.24 busy 0.00
    1/15/2014 12:09:59 PM | DrugDiscovery | [work_fetch] fetch share 0.048
    1/15/2014 12:09:59 PM | The Lattice Project | [work_fetch] fetch share 0.048
    1/15/2014 12:09:59 PM | superlinkattechnion | [work_fetch] fetch share 0.000
    1/15/2014 12:09:59 PM | pogs | [work_fetch] fetch share 0.000
    1/15/2014 12:09:59 PM | ralph@home | [work_fetch] fetch share 0.048
    1/15/2014 12:09:59 PM | DNA@Home | [work_fetch] fetch share 0.048
    1/15/2014 12:09:59 PM | correlizer | [work_fetch] fetch share 0.048
    1/15/2014 12:09:59 PM | MindModeling@Beta | [work_fetch] fetch share 0.048
    1/15/2014 12:09:59 PM | LHC@home 1.0 | [work_fetch] fetch share 0.048
    1/15/2014 12:09:59 PM | Test4Theory@Home | [work_fetch] fetch share 0.048
    1/15/2014 12:09:59 PM | boincsimap | [work_fetch] fetch share 0.048
    1/15/2014 12:09:59 PM | World Community Grid | [work_fetch] fetch share 0.190
    1/15/2014 12:09:59 PM | rosetta@home | [work_fetch] fetch share 0.048
    1/15/2014 12:09:59 PM | climateprediction.net | [work_fetch] fetch share 0.048
    1/15/2014 12:09:59 PM | Cosmology@Home | [work_fetch] fetch share 0.048
    1/15/2014 12:09:59 PM | Docking | [work_fetch] fetch share 0.048
    1/15/2014 12:09:59 PM | Poem@Home | [work_fetch] fetch share 0.048
    1/15/2014 12:09:59 PM | climateathome | [work_fetch] fetch share 0.048
    1/15/2014 12:09:59 PM | Milkyway@Home | [work_fetch] fetch share 0.048
    1/15/2014 12:09:59 PM | RNA World | [work_fetch] fetch share 0.048
    1/15/2014 12:09:59 PM | GPUGRID | [work_fetch] fetch share 0.000 (no apps)
    1/15/2014 12:09:59 PM | SETI@home | [work_fetch] fetch share 0.000
    1/15/2014 12:09:59 PM | Einstein@Home | [work_fetch] fetch share 0.000
    1/15/2014 12:09:59 PM | SETI@home Beta Test | [work_fetch] fetch share 0.000
    1/15/2014 12:09:59 PM | Albert@Home | [work_fetch] fetch share 0.000
    1/15/2014 12:09:59 PM | | [work_fetch] --- state for NVIDIA ---
    1/15/2014 12:09:59 PM | | [work_fetch] shortfall 68058.62 nidle 0.00 saturated 8942.84 busy 0.00
    1/15/2014 12:09:59 PM | DrugDiscovery | [work_fetch] fetch share 0.111
    1/15/2014 12:09:59 PM | The Lattice Project | [work_fetch] fetch share 0.000 (no apps)
    1/15/2014 12:09:59 PM | superlinkattechnion | [work_fetch] fetch share 0.000
    1/15/2014 12:09:59 PM | pogs | [work_fetch] fetch share 0.000 (no apps)
    1/15/2014 12:09:59 PM | ralph@home | [work_fetch] fetch share 0.111
    1/15/2014 12:09:59 PM | DNA@Home | [work_fetch] fetch share 0.111
    1/15/2014 12:09:59 PM | correlizer | [work_fetch] fetch share 0.111
    1/15/2014 12:09:59 PM | MindModeling@Beta | [work_fetch] fetch share 0.000 (no apps)
    1/15/2014 12:09:59 PM | LHC@home 1.0 | [work_fetch] fetch share 0.000 (no apps)
    1/15/2014 12:09:59 PM | Test4Theory@Home | [work_fetch] fetch share 0.000 (no apps)
    1/15/2014 12:09:59 PM | boincsimap | [work_fetch] fetch share 0.000 (no apps)
    1/15/2014 12:09:59 PM | World Community Grid | [work_fetch] fetch share 0.000 (no apps)
    1/15/2014 12:09:59 PM | rosetta@home | [work_fetch] fetch share 0.111
    1/15/2014 12:09:59 PM | climateprediction.net | [work_fetch] fetch share 0.000 (no apps)
    1/15/2014 12:09:59 PM | Cosmology@Home | [work_fetch] fetch share 0.111
    1/15/2014 12:09:59 PM | Docking | [work_fetch] fetch share 0.111
    1/15/2014 12:09:59 PM | Poem@Home | [work_fetch] fetch share 0.111
    1/15/2014 12:09:59 PM | climateathome | [work_fetch] fetch share 0.000 (no apps)
    1/15/2014 12:09:59 PM | Milkyway@Home | [work_fetch] fetch share 0.000 (blocked by configuration file)
    1/15/2014 12:09:59 PM | RNA World | [work_fetch] fetch share 0.000 (no apps)
    1/15/2014 12:09:59 PM | GPUGRID | [work_fetch] fetch share 0.111
    1/15/2014 12:09:59 PM | SETI@home | [work_fetch] fetch share 0.001
    1/15/2014 12:09:59 PM | Einstein@Home | [work_fetch] fetch share 0.001
    1/15/2014 12:09:59 PM | SETI@home Beta Test | [work_fetch] fetch share 0.001
    1/15/2014 12:09:59 PM | Albert@Home | [work_fetch] fetch share 0.001
    1/15/2014 12:09:59 PM | | [work_fetch] ------- end work fetch state -------
    1/15/2014 12:09:59 PM | | [work_fetch] No project chosen for work fetch
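Note 2's point about "prio" order can be illustrated by parsing the "--- project states ---" lines and ranking the projects that can be asked. This is a minimal sketch assuming the "date time | project | [work_fetch] REC ... prio ... status" layout shown above. It ignores which resources each project has apps for, so it only approximates "next in line" and is not BOINC's selection code. parse_project_states and ranked are hypothetical helpers.

```python
import re

PROJECT_RE = re.compile(
    r"\|\s*(?P<project>[^|]+?)\s*\|\s*\[work_fetch\] "
    r"REC (?P<rec>[\d.]+) prio (?P<prio>-?[\d.]+) (?P<status>.*)$"
)

SAMPLE = """\
1/15/2014 12:09:59 PM | pogs | [work_fetch] REC 0.000 prio -0.000000 can't req work: "no new tasks" requested via Manager
1/15/2014 12:09:59 PM | Quake-Catcher Network | [work_fetch] REC 0.000 prio 0.000000 can't req work: non CPU intensive
1/15/2014 12:09:59 PM | GPUGRID | [work_fetch] REC 662565.023 prio -14.773922 can req work
1/15/2014 12:09:59 PM | SETI@home | [work_fetch] REC 76065.475 prio -169.162250 can req work
1/15/2014 12:09:59 PM | Einstein@Home | [work_fetch] REC 80649.354 prio -179.356354 can req work
"""


def parse_project_states(text: str) -> list:
    """Turn 'REC ... prio ...' lines into dicts; all other lines are skipped."""
    states = []
    for line in text.splitlines():
        m = PROJECT_RE.search(line)
        if m:
            states.append({
                "project": m["project"],
                "rec": float(m["rec"]),
                "prio": float(m["prio"]),
                "can_request": m["status"].startswith("can req work"),
            })
    return states


def ranked(states: list) -> list:
    """Projects that may be asked for work, highest (least negative) prio first."""
    eligible = [s for s in states if s["can_request"]]
    return [s["project"] for s in sorted(eligible, key=lambda s: s["prio"], reverse=True)]


print(ranked(parse_project_states(SAMPLE)))
# ['GPUGRID', 'SETI@home', 'Einstein@Home']
```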

tomba · Joined: 21 Feb 09 · Posts: 497 · Credit: 700,690,702 · RAC: 0
Message 34757 - Posted: 22 Jan 2014, 7:58:47 UTC

Just had an early download; eight CPUs and two GPUs busy and no sign of a GPUGrid WU stopping:

    22/01/2014 08:07:49 | GPUGRID | [work_fetch] fetch share 1.000
    22/01/2014 08:07:49 | | [work_fetch] ------- end work fetch state -------
    22/01/2014 08:07:49 | | [work_fetch] No project chosen for work fetch
    22/01/2014 08:08:29 | | [work_fetch] Request work fetch: application exited
    22/01/2014 08:08:29 | GPUGRID | [work_fetch] REC 272425.128 prio -2.579669 can req work
    22/01/2014 08:08:29 | | [work_fetch] --- state for CPU ---
    22/01/2014 08:08:29 | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 2030.85 busy 0.00
    22/01/2014 08:08:29 | GPUGRID | [work_fetch] fetch share 0.000 (no apps)
    22/01/2014 08:08:29 | | [work_fetch] --- state for NVIDIA ---
    22/01/2014 08:08:29 | | [work_fetch] shortfall 1044.00 nidle 1.00 saturated 0.00 busy 0.00
    22/01/2014 08:08:29 | GPUGRID | [work_fetch] fetch share 0.000
    22/01/2014 08:08:29 | | [work_fetch] ------- end work fetch state -------
    22/01/2014 08:08:29 | GPUGRID | [work_fetch] set_request() for NVIDIA: ninst 2 nused_total 1.000000 nidle_now 1.000000 fetch share 0.000000 req_inst 1.000000 req_secs 1044.000000
    22/01/2014 08:08:29 | GPUGRID | [work_fetch] request: CPU (0.00 sec, 0.00 inst) NVIDIA (1044.00 sec, 1.00 inst)
    22/01/2014 08:08:29 | GPUGRID | Sending scheduler request: To fetch work.
    22/01/2014 08:08:29 | GPUGRID | Requesting new tasks for NVIDIA
    22/01/2014 08:08:32 | GPUGRID | Scheduler request completed: got 1 new tasks
    22/01/2014 08:08:32 | | [work_fetch] Request work fetch: RPC complete
    22/01/2014 08:08:34 | GPUGRID | Started download of 72x-SANTI_MAR420cap310-29-LICENSE

Jacob Klein · Joined: 11 Oct 08 · Posts: 1127 · Credit: 1,901,927,545 · RAC: 0
Message 34764 - Posted: 22 Jan 2014, 14:04:15 UTC

Work fetch asked GPUGRID for NVIDIA work because it detected 1 idle NVIDIA instance. Is it possible that it was correct? If you think it was incorrect, then the next step might be to turn on cpu_sched and coproc_debug, to "prove" that work fetch made a bad call.
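The set_request() line in tomba's log is what ties the request to the idle GPU, and a small parser makes its fields easy to read. This is a sketch that assumes the "set_request() for RESOURCE: name value ..." layout seen in that log; parse_set_request is a hypothetical helper. In this log, req_inst matches nidle_now and req_secs matches the NVIDIA shortfall (1044), even though the fetch share is 0.

```python
import re

SET_REQUEST_RE = re.compile(r"set_request\(\) for (?P<resource>\w+): (?P<fields>.*)$")
# Field names are lowercase words; "fetch share" is the one name containing a space.
FIELD_RE = re.compile(r"([a-z_]+(?: share)?) (-?\d+(?:\.\d+)?)")

LINE = ("22/01/2014 08:08:29 | GPUGRID | [work_fetch] set_request() for NVIDIA: "
        "ninst 2 nused_total 1.000000 nidle_now 1.000000 fetch share 0.000000 "
        "req_inst 1.000000 req_secs 1044.000000")


def parse_set_request(line: str):
    """Return (resource, {field: value}), or None if this isn't a set_request() line."""
    m = SET_REQUEST_RE.search(line)
    if m is None:
        return None
    fields = {name: float(value) for name, value in FIELD_RE.findall(m["fields"])}
    return m["resource"], fields


resource, f = parse_set_request(LINE)
print(resource, f["nidle_now"], f["req_inst"], f["req_secs"], f["fetch share"])
# NVIDIA 1.0 1.0 1044.0 0.0
```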

Richard Haselgrove · Joined: 11 Jul 09 · Posts: 1639 · Credit: 10,159,968,649 · RAC: 0
Message 34765 - Posted: 22 Jan 2014, 15:52:52 UTC - in response to Message 34764.

> Work fetch asked GPUGRID for NVIDIA work because it detected 1 idle NVIDIA instance. Is it possible that it was correct? If you think it was incorrect, then the next step might be to turn on cpu_sched and coproc_debug, to "prove" that work fetch made a bad call.

It might be possible to work out what happened from the message log entries immediately before and after the section Tomba posted. Did a task restart, for example?

The trouble with the extra log flags is that you can't use them retrospectively to diagnose a problem that has already happened: you have to set them in advance and wait for it to happen again.

tomba · Joined: 21 Feb 09 · Posts: 497 · Credit: 700,690,702 · RAC: 0
Message 34769 - Posted: 22 Jan 2014, 17:38:45 UTC - in response to Message 34764.

> Work fetch asked GPUGRID for NVIDIA work because it detected 1 idle NVIDIA instance. Is it possible that it was correct?

As I said in my post, "no sign of a GPUGrid WU stopping", and I did check back for 30 minutes.

> If you think it was incorrect, then the next step might be to turn on cpu_sched and coproc_debug, to "prove" that work fetch made a bad call.

OK. I'll check 'em out.

Richard Haselgrove · Joined: 11 Jul 09 · Posts: 1639 · Credit: 10,159,968,649 · RAC: 0
Message 34771 - Posted: 22 Jan 2014, 18:27:56 UTC - in response to Message 34769.

> > Work fetch asked GPUGRID for NVIDIA work because it detected 1 idle NVIDIA instance. Is it possible that it was correct?
>
> As I said in my post, "no sign of a GPUGrid WU stopping", and I did check back for 30 minutes.

Unfortunately, a task stopping isn't necessarily logged with normal settings; I think you'd need to add <task_debug> to be sure of seeing that. But you should see the restart afterwards in the normal logs (I think).

tomba · Joined: 21 Feb 09 · Posts: 497 · Credit: 700,690,702 · RAC: 0
Message 34793 - Posted: 23 Jan 2014, 18:11:30 UTC

We start with two WUs confirmed running, followed by a couple of nidles of 0. Then, oops, only one WU is running. One more 0 nidle, then a 1 nidle, and one more 0 nidle!! Then we have a 1 "nidle_now", followed by an early WU fetch...

    23/01/2014 12:17:43 | GPUGRID | [coproc] NVIDIA instance 0: confirming for I91R1-NATHAN_KIDc22_full2-3-10-RND2112_0
    23/01/2014 12:17:43 | GPUGRID | [coproc] NVIDIA instance 1: confirming for 75x-SANTI_MARwtcap310-22-32-RND0081_0
    23/01/2014 12:18:10 | | [work_fetch] entering choose_project()
    23/01/2014 12:18:10 | | [work_fetch] ------- start work fetch state -------
    23/01/2014 12:18:10 | | [work_fetch] target work buffer: 180.00 + 864.00 sec
    23/01/2014 12:18:10 | | [work_fetch] --- project states ---
    23/01/2014 12:18:10 | GPUGRID | [work_fetch] REC 275978.848 prio -3.498618 can req work
    23/01/2014 12:18:10 | | [work_fetch] --- state for CPU ---
    23/01/2014 12:18:10 | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 2568.97 busy 0.00
    23/01/2014 12:18:10 | GPUGRID | [work_fetch] fetch share 0.000 (no apps)
    23/01/2014 12:18:10 | | [work_fetch] --- state for NVIDIA ---
    23/01/2014 12:18:10 | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 4775.59 busy 0.00
    23/01/2014 12:18:10 | GPUGRID | [work_fetch] fetch share 0.500
    23/01/2014 12:18:10 | | [work_fetch] ------- end work fetch state -------
    23/01/2014 12:18:10 | | [work_fetch] No project chosen for work fetch
    23/01/2014 12:18:13 | | [work_fetch] Request work fetch: application exited
    23/01/2014 12:18:13 | GPUGRID | [coproc] NVIDIA instance 0: confirming for I91R1-NATHAN_KIDc22_full2-3-10-RND2112_0
    23/01/2014 12:18:15 | | [work_fetch] entering choose_project()
    23/01/2014 12:18:15 | | [work_fetch] ------- start work fetch state -------
    23/01/2014 12:18:15 | | [work_fetch] target work buffer: 180.00 + 864.00 sec
    23/01/2014 12:18:15 | | [work_fetch] --- project states ---
    23/01/2014 12:18:15 | GPUGRID | [work_fetch] REC 275979.860 prio -3.498092 can req work
    23/01/2014 12:18:15 | | [work_fetch] --- state for CPU ---
    23/01/2014 12:18:15 | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 2560.84 busy 0.00
    23/01/2014 12:18:15 | GPUGRID | [work_fetch] fetch share 0.000 (no apps)
    23/01/2014 12:18:15 | | [work_fetch] --- state for NVIDIA ---
    23/01/2014 12:18:15 | | [work_fetch] shortfall 1044.00 nidle 1.00 saturated 0.00 busy 0.00
    23/01/2014 12:18:15 | GPUGRID | [work_fetch] fetch share 0.000
    23/01/2014 12:18:15 | | [work_fetch] ------- end work fetch state -------
    23/01/2014 12:18:19 | | [work_fetch] Request work fetch: RPC complete
    23/01/2014 12:18:24 | | [work_fetch] entering choose_project()
    23/01/2014 12:18:24 | | [work_fetch] ------- start work fetch state -------
    23/01/2014 12:18:24 | | [work_fetch] target work buffer: 180.00 + 864.00 sec
    23/01/2014 12:18:24 | | [work_fetch] --- project states ---
    23/01/2014 12:18:24 | GPUGRID | [work_fetch] REC 275979.860 prio -2.505735 can req work
    23/01/2014 12:18:24 | | [work_fetch] --- state for CPU ---
    23/01/2014 12:18:24 | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 2546.51 busy 0.00
    23/01/2014 12:18:24 | GPUGRID | [work_fetch] fetch share 0.000 (no apps)
    23/01/2014 12:18:24 | | [work_fetch] --- state for NVIDIA ---
    23/01/2014 12:18:24 | | [work_fetch] shortfall 1044.00 nidle 1.00 saturated 0.00 busy 0.00
    23/01/2014 12:18:24 | GPUGRID | [work_fetch] fetch share 0.000
    23/01/2014 12:18:24 | | [work_fetch] ------- end work fetch state -------
    23/01/2014 12:18:24 | GPUGRID | [work_fetch] set_request() for NVIDIA: ninst 2 nused_total 1.000000 nidle_now 1.000000 fetch share 0.000000 req_inst 1.000000 req_secs 1044.000000
    23/01/2014 12:18:24 | GPUGRID | [work_fetch] request: CPU (0.00 sec, 0.00 inst) NVIDIA (1044.00 sec, 1.00 inst)
    23/01/2014 12:18:24 | GPUGRID | Sending scheduler request: To fetch work.
    23/01/2014 12:18:24 | GPUGRID | Requesting new tasks for NVIDIA
    23/01/2014 12:18:27 | GPUGRID | Scheduler request completed: got 1 new tasks
    23/01/2014 12:18:27 | | [work_fetch] Request work fetch: RPC complete
    23/01/2014 12:18:29 | GPUGRID | Started download of 98x-SANTI_MARwtcap310-30-LICENSE

Jacob Klein · Joined: 11 Oct 08 · Posts: 1127 · Credit: 1,901,927,545 · RAC: 0
Message 34794 - Posted: 23 Jan 2014, 18:40:43 UTC

> 23/01/2014 12:18:13 | | [work_fetch] Request work fetch: application exited

Any idea what application exited, causing the work fetch request? Also, are you using CPU throttling (the "Use at most X% CPU Time" setting)? Also, can you please include the first messages at the beginning of the event log, so we can see what version you are using?

I agree this looks a bit suspicious, but it sounds like a GPU task got unloaded, and work fetch decided to fill an idle spot, even if the timing isn't exactly perfect.

Richard Haselgrove · Joined: 11 Jul 09 · Posts: 1639 · Credit: 10,159,968,649 · RAC: 0
Message 34795 - Posted: 23 Jan 2014, 19:12:50 UTC - in response to Message 34793.

I think that log sequence is pretty definitive. There are two sets of nidle:

The --- state for CPU --- remains at zero throughout. No problems there.
The --- state for NVIDIA --- starts at 0, jumps to 1, and then drops to 0 again.

At the point of the jump, we can see

    23/01/2014 12:18:13 | | [work_fetch] Request work fetch: application exited

and "NVIDIA instance 1: confirming for 75x-SANTI_MARwtcap310-22-32-RND0081_0" disappears from the record. That's result 7689115, which, as you can see, has a pause in the middle:

    # The simulation has become unstable. Terminating to avoid lock-up (1)
    # Attempting restart (step 6109000)

I imagine that if you look a bit further down, you'd see, perhaps first, a 'restarting' entry for 75x-SANTI_MARwtcap310-22-32-RND0081_0, and then two task instances being confirmed again at each [coproc] step.

The good news is that 75x-SANTI_MARwtcap310-22-32-RND0081_0 completed successfully and validated, despite the pause in the middle.
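Reading the two nidle series separately, as Richard does, can be automated by tracking which "--- state for X ---" block each shortfall line belongs to. This is a minimal sketch over the state lines from message 34793. It assumes each "shortfall ... nidle ..." line directly follows its own state header, as it does in these logs, and nidle_series and changes are hypothetical helpers.

```python
import re

STATE_RE = re.compile(r"--- state for (?P<resource>\w+) ---")
SHORTFALL_RE = re.compile(r"^(?P<ts>\S+ \S+) .*shortfall [\d.]+ nidle (?P<nidle>[\d.]+)")

LOG = """\
23/01/2014 12:18:10 | | [work_fetch] --- state for CPU ---
23/01/2014 12:18:10 | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 2568.97 busy 0.00
23/01/2014 12:18:10 | | [work_fetch] --- state for NVIDIA ---
23/01/2014 12:18:10 | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 4775.59 busy 0.00
23/01/2014 12:18:13 | | [work_fetch] Request work fetch: application exited
23/01/2014 12:18:15 | | [work_fetch] --- state for CPU ---
23/01/2014 12:18:15 | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 2560.84 busy 0.00
23/01/2014 12:18:15 | | [work_fetch] --- state for NVIDIA ---
23/01/2014 12:18:15 | | [work_fetch] shortfall 1044.00 nidle 1.00 saturated 0.00 busy 0.00
23/01/2014 12:18:24 | | [work_fetch] --- state for CPU ---
23/01/2014 12:18:24 | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 2546.51 busy 0.00
23/01/2014 12:18:24 | | [work_fetch] --- state for NVIDIA ---
23/01/2014 12:18:24 | | [work_fetch] shortfall 1044.00 nidle 1.00 saturated 0.00 busy 0.00
"""


def nidle_series(text: str, resource: str) -> list:
    """(timestamp, nidle) for every state block of the given resource."""
    current = None
    series = []
    for line in text.splitlines():
        header = STATE_RE.search(line)
        if header:
            current = header["resource"]
            continue
        m = SHORTFALL_RE.search(line)
        if m and current == resource:
            series.append((m["ts"], float(m["nidle"])))
    return series


def changes(series: list) -> list:
    """Keep only the points where nidle differs from the previous value."""
    out, prev = [], None
    for ts, n in series:
        if n != prev:
            out.append((ts, n))
            prev = n
    return out


print(changes(nidle_series(LOG, "CPU")))     # [('23/01/2014 12:18:10', 0.0)]
print(changes(nidle_series(LOG, "NVIDIA")))  # [('23/01/2014 12:18:10', 0.0), ('23/01/2014 12:18:15', 1.0)]
```

The CPU series never changes, while the NVIDIA series jumps to 1 at the first cycle after the 12:18:13 "application exited" line. Feeding it the lines that follow would show the drop back to 0.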

Jacob Klein · Joined: 11 Oct 08 · Posts: 1127 · Credit: 1,901,927,545 · RAC: 0
Message 34796 - Posted: 23 Jan 2014, 19:16:35 UTC - in response to Message 34795.

And so, for that brief pause, work fetch correctly tried to fill an idle GPU, right?

Richard Haselgrove · Joined: 11 Jul 09 · Posts: 1639 · Credit: 10,159,968,649 · RAC: 0
Message 34797 - Posted: 23 Jan 2014, 19:51:32 UTC - in response to Message 34796.

> And so, for that brief pause, work fetch correctly tried to fill an idle GPU, right?

That's my guess. And I'm also guessing that BOINC restarted the missing 75x-SANTI_MARwtcap310-22-32-RND0081_0 (allowing it to run to completion and report success) before the file downloads for the replacement (probably result 7690601) had completed and allowed it to be started on the idle GPU.

Dagorath · Joined: 16 Mar 11 · Posts: 509 · Credit: 179,005,236 · RAC: 0
Message 34799 - Posted: 23 Jan 2014, 21:55:29 UTC - in response to Message 34797.

Makes sense to me. Thanks for the lesson in debug message interpretation, Richard and Jacob; I swear I'll get it eventually.

So what's causing the simulation to become unstable and pause to catch its breath? Clocks too high? Shouldn't the client recognize the pause as a temporary suspend and not request more work?

Stefan · Project administrator, Project developer, Project tester, Project scientist · Joined: 5 Mar 13 · Posts: 348 · Credit: 0 · RAC: 0
Message 34802 - Posted: 24 Jan 2014, 10:11:27 UTC

Is this maybe the pause that Matt implemented in the latest versions, where after a crash it pauses to avoid consecutive crashes, or something like that? It rings a bell.

MJH · Joined: 12 Nov 07 · Posts: 696 · Credit: 27,266,655 · RAC: 0
Message 34803 - Posted: 24 Jan 2014, 10:29:50 UTC - in response to Message 34802.

> Is this maybe the pause that Matt implemented in the latest versions, where after a crash it pauses to avoid consecutive crashes, or something like that?

Quite likely. If the task crashes having made some progress, it will be restarted after a delay of at least 60 sec. BOINC will try to start some other work during this idle time, which may involve downloading another task. In the worst case, a very unreliable machine may continuously cycle between two or three tasks, each time making just enough progress to merit a restart attempt.

MJH
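MJH's description accounts for the "early" download in tomba's log: a crashed task leaves its GPU idle while it waits to be restarted, and any work fetch cycle inside that gap sees nidle 1. This is a toy model assuming a fixed 60 s delay (MJH only says "at least 60 sec") and using the times from message 34793. to_secs and idle_checks are hypothetical helpers, not GPUGRID or BOINC code.

```python
def to_secs(hms: str) -> int:
    """'12:18:13' -> seconds since midnight."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def idle_checks(crash: str, checks: list, restart_delay: int = 60) -> list:
    """Work fetch cycles that fall inside [crash, crash + restart_delay),
    i.e. while the crashed task is still waiting to be restarted."""
    start = to_secs(crash)
    end = start + restart_delay
    return [c for c in checks if start <= to_secs(c) < end]


# Task exit at 12:18:13; work fetch cycles at 12:18:10, 12:18:15 and 12:18:24.
print(idle_checks("12:18:13", ["12:18:10", "12:18:15", "12:18:24"]))
# ['12:18:15', '12:18:24']
```

Both cycles the model flags are the ones where the log shows NVIDIA nidle 1. The second of those is the one that sent the request for 1044 s of NVIDIA work.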

Retvari Zoltan · Joined: 20 Jan 09 · Posts: 2380 · Credit: 16,897,957,044 · RAC: 0
Message 34814 - Posted: 25 Jan 2014, 13:55:44 UTC - in response to Message 34803.

> > Is this maybe the pause that Matt implemented in the latest versions, where after a crash it pauses to avoid consecutive crashes, or something like that?
>
> Quite likely. If the task crashes having made some progress, it will be restarted after a delay of at least 60 sec. BOINC will try to start some other work during this idle time, which may involve downloading another task. In the worst case, a very unreliable machine may continuously cycle between two or three tasks, each time making just enough progress to merit a restart attempt.
>
> MJH

Exactly that happened when my Gigabyte GTX 780Ti OC was unreliable.