Early WU Downloads

*Joined: 21 Feb 09 · Posts: 497 · Credit: 700,690,702 · RAC: 0*

`cc_config.xml` fixed. I really must listen to instructions!!

*Joined: 21 Feb 09 · Posts: 497 · Credit: 700,690,702 · RAC: 0*

Here's the log of a work_fetch cycle. Is the line in red normal?

    15/01/2014 15:56:42 | | [work_fetch] entering choose_project()
    15/01/2014 15:56:42 | | [work_fetch] ------- start work fetch state -------
    15/01/2014 15:56:42 | | [work_fetch] target work buffer: 180.00 + 0.00 sec
    15/01/2014 15:56:42 | | [work_fetch] --- project states ---
    15/01/2014 15:56:42 | GPUGRID | [work_fetch] REC 71611.135 prio -48.201093 can req work
    15/01/2014 15:56:42 | | [work_fetch] --- state for CPU ---
    15/01/2014 15:56:42 | | [work_fetch] shortfall 540.00 nidle 3.00 saturated 0.00 busy 0.00
    15/01/2014 15:56:42 | GPUGRID | [work_fetch] fetch share 0.000 (no apps)
    15/01/2014 15:56:42 | | [work_fetch] --- state for NVIDIA ---
    15/01/2014 15:56:42 | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 30261.82 busy 0.00
    15/01/2014 15:56:42 | GPUGRID | [work_fetch] fetch share 1.000
    15/01/2014 15:56:42 | | [work_fetch] ------- end work fetch state -------
    15/01/2014 15:56:42 | | [work_fetch] No project chosen for work fetch

*Joined: 11 Jul 09 · Posts: 1639 · Credit: 10,159,968,649 · RAC: 2*

> Here's the log of a work_fetch cycle. Is the line in red normal?

It's normal when the `--- state for NVIDIA ---` figure `saturated 30261.82` [seconds] is larger than `target work buffer: 180.00 + 0.00 sec`[onds]; in other words, you have enough work for now and don't need any more.
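The rule of thumb above can be written down as a tiny predicate. This is a simplified sketch, not BOINC's actual client code: it assumes a resource wants work only when an instance is idle right now (`nidle > 0`) or when `saturated` has dropped below the low-water mark (the first number on the `target work buffer` line). The real client also weighs backoffs, fetch share and per-project state, all ignored here, and `should_fetch` is a hypothetical name.

```python
def should_fetch(nidle: float, saturated: float, low_water_secs: float) -> bool:
    """Simplified, hypothetical per-resource work-fetch check.

    nidle          -- instances idle right now (the 'nidle' field)
    saturated      -- seconds all instances are projected to stay busy
    low_water_secs -- first number of 'target work buffer: X + Y sec'
    """
    return nidle > 0 or saturated < low_water_secs


# NVIDIA values from the log above: busy for ~8.4 h against a 180 s low-water mark.
print(should_fetch(nidle=0.0, saturated=30261.82, low_water_secs=180.0))  # False
# CPU values from the same log: 3 idle CPUs, so work is wanted,
# even though no attached project can supply CPU work.
print(should_fetch(nidle=3.0, saturated=0.0, low_water_secs=180.0))  # True
```

Note that "wanting" work is not the same as getting it: the CPU case returns `True`, yet nothing is requested because GPUGRID reports no CPU apps.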

*Joined: 11 Oct 08 · Posts: 1127 · Credit: 1,901,927,545 · RAC: 0*

> Here's the log of a work_fetch cycle. Is the line in red normal?

Let's teach you how to read this.

`target work buffer:` says you need enough work to keep busy for at least 180 seconds. That's the 3 minutes I was talking about earlier: even if you set min_buffer to 0, BOINC intentionally uses 3 minutes, since it can take around 3 minutes to ask projects for work. The line also means "when getting work, try not to get much more than 180.00 + 0.00 seconds", where the second number comes from your additional work buffer setting. For reference, I use 0.1 days and 0.5 days for my buffer settings, so my line says: `target work buffer: 8640.00 + 43200.00 sec`.

`project states:` GPUGrid is listed as "can req work". If you had it set to No New Tasks, or suspended, that would be noted here, and it would then be excluded from work fetch operations.

`state for CPU:` shortfall 540 means that, to keep all your CPUs busy for the min_buffer period, you'd need 540 instance-seconds of CPU work. nidle 3 means that 3 of your CPUs are currently completely idle. (Note: this saddens me; it might be worth putting them to work with some CPU projects.) Notice that the GPUGrid entry in that block says (no apps), meaning that the project told BOINC it doesn't have CPU apps, so BOINC will never request CPU work from it.

`state for NVIDIA:` shortfall 0 means that, to keep all your NVIDIA GPUs busy for the min_buffer period, you'd need 0 seconds of work. In fact, you have saturation: all instances are projected to be busy for 30261.82 seconds (8.4 hours).

`end work fetch state:` this is where BOINC decides, based on the information above, whether or not to request work from a project. You have no CPU projects available for your idle CPUs, so they are left idle :sadface: You have no idle NVIDIA devices, and your saturation level (30261.82 seconds) is greater than your low-water mark (180 seconds), so you don't need NVIDIA work either. So it correctly says "No project chosen for work fetch" and doesn't request work.

Does that help? I'd think you should now be ready to read these log messages on your own. Feel free to change some buffer values, or set GPUGrid to No New Tasks, to see the effects on this work_fetch_debug output.
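The numbers in this walkthrough can be checked with a little arithmetic. A minimal Python sketch, assuming the two buffer preferences are given in days and that each fully idle instance contributes the whole buffer period to the shortfall (which matches 3 idle CPUs × 180 s = 540 above); the helper names are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def buffer_seconds(min_days: float, additional_days: float) -> tuple:
    """Convert the two buffer preferences (in days) into the two numbers
    printed on the 'target work buffer: X + Y sec' line."""
    return min_days * SECONDS_PER_DAY, additional_days * SECONDS_PER_DAY


def idle_shortfall(nidle: float, buffer_secs: float) -> float:
    """Instance-seconds of work needed to keep `nidle` idle instances busy
    for `buffer_secs` (instances that are already busy are ignored here)."""
    return nidle * buffer_secs


print(buffer_seconds(0.1, 0.5))            # (8640.0, 43200.0): 0.1 and 0.5 day settings
print(idle_shortfall(3, 180.0))            # 540.0: 3 idle CPUs x 180 s
print(round(30261.82 / 3600, 1), "hours")  # 8.4 hours of NVIDIA saturation
```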

*Joined: 21 Feb 09 · Posts: 497 · Credit: 700,690,702 · RAC: 0*

> Let's teach you how to read this.

Thanks for that, Jacob. I shall study it carefully.

> You have no CPU projects available for your idle CPUs, so they get left idle :sadface:

Yes, I've been feeling guilty about that, so I'm now running six Rosettas too. I'm a bit worried that the CPU fan has gone from 3700 rpm to 4300 rpm and the CPU temperature from 55 °C to 64 °C, but I guess that's a question for my other thread.

*Joined: 11 Oct 08 · Posts: 1127 · Credit: 1,901,927,545 · RAC: 0*

Note 1: Richard's previous post in this thread is likely correct.

Note 2: REC is Recent Estimated Credit, which BOINC uses in the "prio" (priority) calculation when choosing which project to ask for work. The projects are listed in "prio" order, so you can easily see which one would be next in line for a work request.

Note 3: In case you're curious to see a more involved work fetch cycle, or want a list of the projects I'm attached to, below is a work_fetch_debug cycle from this machine, which runs many different CPU and NVIDIA projects.

Note 4: Further information about work fetch can be found in this slightly outdated, but highly useful, document: http://boinc.berkeley.edu/trac/wiki/ClientSched

A cycle of my work fetch:

    1/15/2014 12:09:59 PM | | [work_fetch] entering choose_project()
    1/15/2014 12:09:59 PM | | [work_fetch] ------- start work fetch state -------
    1/15/2014 12:09:59 PM | | [work_fetch] target work buffer: 8640.00 + 43200.00 sec
    1/15/2014 12:09:59 PM | | [work_fetch] --- project states ---
    1/15/2014 12:09:59 PM | DrugDiscovery | [work_fetch] REC 0.000 prio -0.000000 can req work
    1/15/2014 12:09:59 PM | The Lattice Project | [work_fetch] REC 0.000 prio -0.000000 can req work
    1/15/2014 12:09:59 PM | superlinkattechnion | [work_fetch] REC 0.126 prio -0.000000 can't req work: master URL fetch pending (backoff: 43700.11 sec)
    1/15/2014 12:09:59 PM | pogs | [work_fetch] REC 0.000 prio -0.000000 can't req work: "no new tasks" requested via Manager
    1/15/2014 12:09:59 PM | Quake-Catcher Network | [work_fetch] REC 0.000 prio 0.000000 can't req work: non CPU intensive
    1/15/2014 12:09:59 PM | ralph@home | [work_fetch] REC 0.000 prio -0.000000 can req work
    1/15/2014 12:09:59 PM | DNA@Home | [work_fetch] REC 0.000 prio -0.000000 can req work
    1/15/2014 12:09:59 PM | correlizer | [work_fetch] REC 0.014 prio -0.000000 can req work
    1/15/2014 12:09:59 PM | WUProp@Home | [work_fetch] REC 0.014 prio -0.000002 can't req work: non CPU intensive
    1/15/2014 12:09:59 PM | MindModeling@Beta | [work_fetch] REC 110.106 prio -0.002449 can req work
    1/15/2014 12:09:59 PM | LHC@home 1.0 | [work_fetch] REC 221.382 prio -0.004923 can req work
    1/15/2014 12:09:59 PM | Test4Theory@Home | [work_fetch] REC 221.453 prio -0.004925 can req work
    1/15/2014 12:09:59 PM | boincsimap | [work_fetch] REC 248.218 prio -0.005520 can req work
    1/15/2014 12:09:59 PM | World Community Grid | [work_fetch] REC 1063.070 prio -0.006005 can req work
    1/15/2014 12:09:59 PM | rosetta@home | [work_fetch] REC 287.694 prio -0.007551 can req work
    1/15/2014 12:09:59 PM | climateprediction.net | [work_fetch] REC 304.466 prio -0.007763 can req work
    1/15/2014 12:09:59 PM | Cosmology@Home | [work_fetch] REC 306.047 prio -0.010929 can req work
    1/15/2014 12:09:59 PM | Docking | [work_fetch] REC 288.875 prio -0.011575 can req work
    1/15/2014 12:09:59 PM | Poem@Home | [work_fetch] REC 590.890 prio -0.013141 can req work
    1/15/2014 12:09:59 PM | climateathome | [work_fetch] REC 251.853 prio -0.022648 can req work
    1/15/2014 12:09:59 PM | Milkyway@Home | [work_fetch] REC 229.825 prio -0.058248 can req work
    1/15/2014 12:09:59 PM | RNA World | [work_fetch] REC 446.313 prio -0.070412 can req work
    1/15/2014 12:09:59 PM | GPUGRID | [work_fetch] REC 662565.023 prio -14.773922 can req work
    1/15/2014 12:09:59 PM | SETI@home | [work_fetch] REC 76065.475 prio -169.162250 can req work
    1/15/2014 12:09:59 PM | Einstein@Home | [work_fetch] REC 80649.354 prio -179.356354 can req work
    1/15/2014 12:09:59 PM | SETI@home Beta Test | [work_fetch] REC 80680.357 prio -179.425301 can req work
    1/15/2014 12:09:59 PM | Albert@Home | [work_fetch] REC 86519.931 prio -192.412500 can req work
    1/15/2014 12:09:59 PM | | [work_fetch] --- state for CPU ---
    1/15/2014 12:09:59 PM | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 302655.24 busy 0.00
    1/15/2014 12:09:59 PM | DrugDiscovery | [work_fetch] fetch share 0.048
    1/15/2014 12:09:59 PM | The Lattice Project | [work_fetch] fetch share 0.048
    1/15/2014 12:09:59 PM | superlinkattechnion | [work_fetch] fetch share 0.000
    1/15/2014 12:09:59 PM | pogs | [work_fetch] fetch share 0.000
    1/15/2014 12:09:59 PM | ralph@home | [work_fetch] fetch share 0.048
    1/15/2014 12:09:59 PM | DNA@Home | [work_fetch] fetch share 0.048
    1/15/2014 12:09:59 PM | correlizer | [work_fetch] fetch share 0.048
    1/15/2014 12:09:59 PM | MindModeling@Beta | [work_fetch] fetch share 0.048
    1/15/2014 12:09:59 PM | LHC@home 1.0 | [work_fetch] fetch share 0.048
    1/15/2014 12:09:59 PM | Test4Theory@Home | [work_fetch] fetch share 0.048
    1/15/2014 12:09:59 PM | boincsimap | [work_fetch] fetch share 0.048
    1/15/2014 12:09:59 PM | World Community Grid | [work_fetch] fetch share 0.190
    1/15/2014 12:09:59 PM | rosetta@home | [work_fetch] fetch share 0.048
    1/15/2014 12:09:59 PM | climateprediction.net | [work_fetch] fetch share 0.048
    1/15/2014 12:09:59 PM | Cosmology@Home | [work_fetch] fetch share 0.048
    1/15/2014 12:09:59 PM | Docking | [work_fetch] fetch share 0.048
    1/15/2014 12:09:59 PM | Poem@Home | [work_fetch] fetch share 0.048
    1/15/2014 12:09:59 PM | climateathome | [work_fetch] fetch share 0.048
    1/15/2014 12:09:59 PM | Milkyway@Home | [work_fetch] fetch share 0.048
    1/15/2014 12:09:59 PM | RNA World | [work_fetch] fetch share 0.048
    1/15/2014 12:09:59 PM | GPUGRID | [work_fetch] fetch share 0.000 (no apps)
    1/15/2014 12:09:59 PM | SETI@home | [work_fetch] fetch share 0.000
    1/15/2014 12:09:59 PM | Einstein@Home | [work_fetch] fetch share 0.000
    1/15/2014 12:09:59 PM | SETI@home Beta Test | [work_fetch] fetch share 0.000
    1/15/2014 12:09:59 PM | Albert@Home | [work_fetch] fetch share 0.000
    1/15/2014 12:09:59 PM | | [work_fetch] --- state for NVIDIA ---
    1/15/2014 12:09:59 PM | | [work_fetch] shortfall 68058.62 nidle 0.00 saturated 8942.84 busy 0.00
    1/15/2014 12:09:59 PM | DrugDiscovery | [work_fetch] fetch share 0.111
    1/15/2014 12:09:59 PM | The Lattice Project | [work_fetch] fetch share 0.000 (no apps)
    1/15/2014 12:09:59 PM | superlinkattechnion | [work_fetch] fetch share 0.000
    1/15/2014 12:09:59 PM | pogs | [work_fetch] fetch share 0.000 (no apps)
    1/15/2014 12:09:59 PM | ralph@home | [work_fetch] fetch share 0.111
    1/15/2014 12:09:59 PM | DNA@Home | [work_fetch] fetch share 0.111
    1/15/2014 12:09:59 PM | correlizer | [work_fetch] fetch share 0.111
    1/15/2014 12:09:59 PM | MindModeling@Beta | [work_fetch] fetch share 0.000 (no apps)
    1/15/2014 12:09:59 PM | LHC@home 1.0 | [work_fetch] fetch share 0.000 (no apps)
    1/15/2014 12:09:59 PM | Test4Theory@Home | [work_fetch] fetch share 0.000 (no apps)
    1/15/2014 12:09:59 PM | boincsimap | [work_fetch] fetch share 0.000 (no apps)
    1/15/2014 12:09:59 PM | World Community Grid | [work_fetch] fetch share 0.000 (no apps)
    1/15/2014 12:09:59 PM | rosetta@home | [work_fetch] fetch share 0.111
    1/15/2014 12:09:59 PM | climateprediction.net | [work_fetch] fetch share 0.000 (no apps)
    1/15/2014 12:09:59 PM | Cosmology@Home | [work_fetch] fetch share 0.111
    1/15/2014 12:09:59 PM | Docking | [work_fetch] fetch share 0.111
    1/15/2014 12:09:59 PM | Poem@Home | [work_fetch] fetch share 0.111
    1/15/2014 12:09:59 PM | climateathome | [work_fetch] fetch share 0.000 (no apps)
    1/15/2014 12:09:59 PM | Milkyway@Home | [work_fetch] fetch share 0.000 (blocked by configuration file)
    1/15/2014 12:09:59 PM | RNA World | [work_fetch] fetch share 0.000 (no apps)
    1/15/2014 12:09:59 PM | GPUGRID | [work_fetch] fetch share 0.111
    1/15/2014 12:09:59 PM | SETI@home | [work_fetch] fetch share 0.001
    1/15/2014 12:09:59 PM | Einstein@Home | [work_fetch] fetch share 0.001
    1/15/2014 12:09:59 PM | SETI@home Beta Test | [work_fetch] fetch share 0.001
    1/15/2014 12:09:59 PM | Albert@Home | [work_fetch] fetch share 0.001
    1/15/2014 12:09:59 PM | | [work_fetch] ------- end work fetch state -------
    1/15/2014 12:09:59 PM | | [work_fetch] No project chosen for work fetch
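Note 2 says the project-state lines are already listed in `prio` order. As an illustration of reading them programmatically, here is a small Python sketch that pulls the project, REC and prio out of such lines and sorts them so that the project closest to zero (highest priority) comes first. The regular expression assumes exactly the line format shown in the log above, and `rank_projects` is a hypothetical helper, not part of BOINC.

```python
import re

# Assumes the project-state line format shown in the log above, e.g.
# "... | GPUGRID | [work_fetch] REC 662565.023 prio -14.773922 can req work"
STATE_RE = re.compile(
    r"\|\s*(?P<project>[^|]+?)\s*\|\s*\[work_fetch\] "
    r"REC (?P<rec>[\d.]+) prio (?P<prio>-?[\d.]+)"
)


def rank_projects(lines):
    """Return (project, rec, prio) tuples, highest prio (closest to zero) first."""
    rows = []
    for line in lines:
        m = STATE_RE.search(line)
        if m:
            rows.append((m["project"], float(m["rec"]), float(m["prio"])))
    # sorted() is stable even with reverse=True, so ties keep their log order.
    return sorted(rows, key=lambda row: row[2], reverse=True)


log = [
    "1/15/2014 12:09:59 PM | GPUGRID | [work_fetch] REC 662565.023 prio -14.773922 can req work",
    "1/15/2014 12:09:59 PM | rosetta@home | [work_fetch] REC 287.694 prio -0.007551 can req work",
    "1/15/2014 12:09:59 PM | SETI@home | [work_fetch] REC 76065.475 prio -169.162250 can req work",
    "1/15/2014 12:09:59 PM | | [work_fetch] --- state for CPU ---",
]
for project, rec, prio in rank_projects(log):
    print(f"{project:<14} REC {rec:>11.3f}  prio {prio:.6f}")
# rosetta@home is printed first and SETI@home last, as in the log's own order.
```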

*Joined: 21 Feb 09 · Posts: 497 · Credit: 700,690,702 · RAC: 0*

Just had an early download; eight CPUs and two GPUs busy and no sign of a GPUGrid WU stopping:

    22/01/2014 08:07:49 | GPUGRID | [work_fetch] fetch share 1.000
    22/01/2014 08:07:49 | | [work_fetch] ------- end work fetch state -------
    22/01/2014 08:07:49 | | [work_fetch] No project chosen for work fetch
    22/01/2014 08:08:29 | | [work_fetch] Request work fetch: application exited
    22/01/2014 08:08:29 | GPUGRID | [work_fetch] REC 272425.128 prio -2.579669 can req work
    22/01/2014 08:08:29 | | [work_fetch] --- state for CPU ---
    22/01/2014 08:08:29 | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 2030.85 busy 0.00
    22/01/2014 08:08:29 | GPUGRID | [work_fetch] fetch share 0.000 (no apps)
    22/01/2014 08:08:29 | | [work_fetch] --- state for NVIDIA ---
    22/01/2014 08:08:29 | | [work_fetch] shortfall 1044.00 nidle 1.00 saturated 0.00 busy 0.00
    22/01/2014 08:08:29 | GPUGRID | [work_fetch] fetch share 0.000
    22/01/2014 08:08:29 | | [work_fetch] ------- end work fetch state -------
    22/01/2014 08:08:29 | GPUGRID | [work_fetch] set_request() for NVIDIA: ninst 2 nused_total 1.000000 nidle_now 1.000000 fetch share 0.000000 req_inst 1.000000 req_secs 1044.000000
    22/01/2014 08:08:29 | GPUGRID | [work_fetch] request: CPU (0.00 sec, 0.00 inst) NVIDIA (1044.00 sec, 1.00 inst)
    22/01/2014 08:08:29 | GPUGRID | Sending scheduler request: To fetch work.
    22/01/2014 08:08:29 | GPUGRID | Requesting new tasks for NVIDIA
    22/01/2014 08:08:32 | GPUGRID | Scheduler request completed: got 1 new tasks
    22/01/2014 08:08:32 | | [work_fetch] Request work fetch: RPC complete
    22/01/2014 08:08:34 | GPUGRID | Started download of 72x-SANTI_MAR420cap310-29-LICENSE
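The `request:` line records exactly what the client asked the project for. A Python sketch that extracts those numbers, assuming the CPU-then-NVIDIA layout shown in this log (hosts with other resources would print more fields); `parse_request` is a hypothetical helper. The 1044 s presumably equals 180 + 864 s, the total target buffer shown in the later log from the same host in this thread, i.e. one idle instance's worth of buffer.

```python
import re
from typing import Dict, Optional

# Assumes the exact wording of the '[work_fetch] request:' line in the log above.
REQUEST_RE = re.compile(
    r"request: CPU \((?P<cpu_secs>[\d.]+) sec, (?P<cpu_inst>[\d.]+) inst\) "
    r"NVIDIA \((?P<gpu_secs>[\d.]+) sec, (?P<gpu_inst>[\d.]+) inst\)"
)


def parse_request(line: str) -> Optional[Dict[str, float]]:
    """Return requested seconds/instances per resource, or None if the line
    is not a work request."""
    m = REQUEST_RE.search(line)
    if m is None:
        return None
    return {key: float(value) for key, value in m.groupdict().items()}


line = ("22/01/2014 08:08:29 | GPUGRID | [work_fetch] request: "
        "CPU (0.00 sec, 0.00 inst) NVIDIA (1044.00 sec, 1.00 inst)")
print(parse_request(line))
# {'cpu_secs': 0.0, 'cpu_inst': 0.0, 'gpu_secs': 1044.0, 'gpu_inst': 1.0}
```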

*Joined: 11 Oct 08 · Posts: 1127 · Credit: 1,901,927,545 · RAC: 0*

Work fetch asked GPUGRID for NVIDIA work because it detected 1 idle NVIDIA instance. Is it possible that it was correct? If you think it was incorrect, the next step might be to turn on the cpu_sched and coproc_debug log flags, to "prove" that work fetch made a bad call.

*Joined: 11 Jul 09 · Posts: 1639 · Credit: 10,159,968,649 · RAC: 2*

> Work fetch asked GPUGRID for NVIDIA work because it detected 1 idle NVIDIA instance. Is it possible that it was correct?

It might be possible to work out what happened from the message log entries immediately before and after the section Tomba posted. Did a task restart, for example?

The trouble with the extra log flags is that you can't use them retrospectively to diagnose a problem that has already happened: you have to set them in advance and wait for it to happen again.

*Joined: 21 Feb 09 · Posts: 497 · Credit: 700,690,702 · RAC: 0*

> Work fetch asked GPUGRID for NVIDIA work because it detected 1 idle NVIDIA instance. Is it possible that it was correct?

As I said in my post, "no sign of a GPUGrid WU stopping", and I did check back for 30 minutes.

> if you think it was incorrect, then the next steps might be to turn on cpu_sched and coproc_debug, to "prove" that work fetch made a bad call.

OK. I'll check 'em out.

*Joined: 11 Jul 09 · Posts: 1639 · Credit: 10,159,968,649 · RAC: 2*

> Work fetch asked GPUGRID for NVIDIA work because it detected 1 idle NVIDIA instance. Is it possible that it was correct?

Unfortunately, a task stopping isn't necessarily logged with normal settings; I think you'd need to add `<task_debug>` to be sure of seeing that. But you should see the restart afterwards in the normal logs (I think).
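For reference, the log flags mentioned in this exchange (work_fetch_debug, cpu_sched, coproc_debug, task_debug) are switched on in `cc_config.xml`, as children of a `log_flags` element under the `cc_config` root. Keeping to Python for the examples, here is a minimal sketch that builds such a file with the standard library. It writes a fresh document, so if you already have a `cc_config.xml` with other options, merge the flags into it by hand rather than overwriting it; the client typically has to re-read its config files (or be restarted) before new flags take effect. `build_cc_config` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def build_cc_config(flags):
    """Return cc_config.xml text with each named log flag set to 1."""
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for flag in flags:
        ET.SubElement(log_flags, flag).text = "1"
    ET.indent(root)  # pretty-print with two-space indentation (Python 3.9+)
    return ET.tostring(root, encoding="unicode")


# The flags discussed in this thread.
print(build_cc_config(["work_fetch_debug", "cpu_sched", "coproc_debug", "task_debug"]))
```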

*Joined: 21 Feb 09 · Posts: 497 · Credit: 700,690,702 · RAC: 0*

We start with two WUs confirmed running, followed by a couple of nidles of 0. Then, oops, only one WU is running. One more 0 nidle, then a 1 nidle, and one more 0 nidle!! Then we have a 1 "nidle_now", followed by an early WU fetch....

    23/01/2014 12:17:43 | GPUGRID | [coproc] NVIDIA instance 0: confirming for I91R1-NATHAN_KIDc22_full2-3-10-RND2112_0
    23/01/2014 12:17:43 | GPUGRID | [coproc] NVIDIA instance 1: confirming for 75x-SANTI_MARwtcap310-22-32-RND0081_0
    23/01/2014 12:18:10 | | [work_fetch] entering choose_project()
    23/01/2014 12:18:10 | | [work_fetch] ------- start work fetch state -------
    23/01/2014 12:18:10 | | [work_fetch] target work buffer: 180.00 + 864.00 sec
    23/01/2014 12:18:10 | | [work_fetch] --- project states ---
    23/01/2014 12:18:10 | GPUGRID | [work_fetch] REC 275978.848 prio -3.498618 can req work
    23/01/2014 12:18:10 | | [work_fetch] --- state for CPU ---
    23/01/2014 12:18:10 | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 2568.97 busy 0.00
    23/01/2014 12:18:10 | GPUGRID | [work_fetch] fetch share 0.000 (no apps)
    23/01/2014 12:18:10 | | [work_fetch] --- state for NVIDIA ---
    23/01/2014 12:18:10 | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 4775.59 busy 0.00
    23/01/2014 12:18:10 | GPUGRID | [work_fetch] fetch share 0.500
    23/01/2014 12:18:10 | | [work_fetch] ------- end work fetch state -------
    23/01/2014 12:18:10 | | [work_fetch] No project chosen for work fetch
    23/01/2014 12:18:13 | | [work_fetch] Request work fetch: application exited
    23/01/2014 12:18:13 | GPUGRID | [coproc] NVIDIA instance 0: confirming for I91R1-NATHAN_KIDc22_full2-3-10-RND2112_0
    23/01/2014 12:18:15 | | [work_fetch] entering choose_project()
    23/01/2014 12:18:15 | | [work_fetch] ------- start work fetch state -------
    23/01/2014 12:18:15 | | [work_fetch] target work buffer: 180.00 + 864.00 sec
    23/01/2014 12:18:15 | | [work_fetch] --- project states ---
    23/01/2014 12:18:15 | GPUGRID | [work_fetch] REC 275979.860 prio -3.498092 can req work
    23/01/2014 12:18:15 | | [work_fetch] --- state for CPU ---
    23/01/2014 12:18:15 | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 2560.84 busy 0.00
    23/01/2014 12:18:15 | GPUGRID | [work_fetch] fetch share 0.000 (no apps)
    23/01/2014 12:18:15 | | [work_fetch] --- state for NVIDIA ---
    23/01/2014 12:18:15 | | [work_fetch] shortfall 1044.00 nidle 1.00 saturated 0.00 busy 0.00
    23/01/2014 12:18:15 | GPUGRID | [work_fetch] fetch share 0.000
    23/01/2014 12:18:15 | | [work_fetch] ------- end work fetch state -------
    23/01/2014 12:18:19 | | [work_fetch] Request work fetch: RPC complete
    23/01/2014 12:18:24 | | [work_fetch] entering choose_project()
    23/01/2014 12:18:24 | | [work_fetch] ------- start work fetch state -------
    23/01/2014 12:18:24 | | [work_fetch] target work buffer: 180.00 + 864.00 sec
    23/01/2014 12:18:24 | | [work_fetch] --- project states ---
    23/01/2014 12:18:24 | GPUGRID | [work_fetch] REC 275979.860 prio -2.505735 can req work
    23/01/2014 12:18:24 | | [work_fetch] --- state for CPU ---
    23/01/2014 12:18:24 | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 2546.51 busy 0.00
    23/01/2014 12:18:24 | GPUGRID | [work_fetch] fetch share 0.000 (no apps)
    23/01/2014 12:18:24 | | [work_fetch] --- state for NVIDIA ---
    23/01/2014 12:18:24 | | [work_fetch] shortfall 1044.00 nidle 1.00 saturated 0.00 busy 0.00
    23/01/2014 12:18:24 | GPUGRID | [work_fetch] fetch share 0.000
    23/01/2014 12:18:24 | | [work_fetch] ------- end work fetch state -------
    23/01/2014 12:18:24 | GPUGRID | [work_fetch] set_request() for NVIDIA: ninst 2 nused_total 1.000000 nidle_now 1.000000 fetch share 0.000000 req_inst 1.000000 req_secs 1044.000000
    23/01/2014 12:18:24 | GPUGRID | [work_fetch] request: CPU (0.00 sec, 0.00 inst) NVIDIA (1044.00 sec, 1.00 inst)
    23/01/2014 12:18:24 | GPUGRID | Sending scheduler request: To fetch work.
    23/01/2014 12:18:24 | GPUGRID | Requesting new tasks for NVIDIA
    23/01/2014 12:18:27 | GPUGRID | Scheduler request completed: got 1 new tasks
    23/01/2014 12:18:27 | | [work_fetch] Request work fetch: RPC complete
    23/01/2014 12:18:29 | GPUGRID | Started download of 98x-SANTI_MARwtcap310-30-LICENSE
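The moment instance 1 drops out of the `[coproc]` confirmations can also be spotted mechanically. A Python sketch, assuming the `[coproc] NVIDIA instance N: confirming for TASK` format shown in the log above (these lines only appear with the coproc_debug flag on); it groups the confirmed instance numbers by timestamp. `confirmed_instances` is a hypothetical helper.

```python
import re
from collections import defaultdict

# Assumes the '[coproc] NVIDIA instance N: confirming for TASK' lines above.
COPROC_RE = re.compile(
    r"^(?P<ts>\S+ \S+) \| [^|]* \| \[coproc\] NVIDIA instance (?P<inst>\d+): "
    r"confirming for (?P<task>\S+)"
)


def confirmed_instances(lines):
    """Map each timestamp to the sorted NVIDIA instance numbers confirmed then."""
    seen = defaultdict(list)
    for line in lines:
        m = COPROC_RE.match(line)
        if m:
            seen[m["ts"]].append(int(m["inst"]))
    return {ts: sorted(insts) for ts, insts in seen.items()}


log = [
    "23/01/2014 12:17:43 | GPUGRID | [coproc] NVIDIA instance 0: confirming for I91R1-NATHAN_KIDc22_full2-3-10-RND2112_0",
    "23/01/2014 12:17:43 | GPUGRID | [coproc] NVIDIA instance 1: confirming for 75x-SANTI_MARwtcap310-22-32-RND0081_0",
    "23/01/2014 12:18:13 | | [work_fetch] Request work fetch: application exited",
    "23/01/2014 12:18:13 | GPUGRID | [coproc] NVIDIA instance 0: confirming for I91R1-NATHAN_KIDc22_full2-3-10-RND2112_0",
]
print(confirmed_instances(log))
# {'23/01/2014 12:17:43': [0, 1], '23/01/2014 12:18:13': [0]}
```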

*Joined: 11 Oct 08 · Posts: 1127 · Credit: 1,901,927,545 · RAC: 0*

> `23/01/2014 12:18:13 | | [work_fetch] Request work fetch: application exited`

Any idea what application exited, causing the work fetch request? Also, are you using CPU throttling (the "Use at most X% CPU Time" setting)? And can you please include the first messages at the beginning of the event log, so we can see what version you are using?

I agree this looks a bit suspicious, but it sounds like a GPU task got unloaded, and work fetch decided to fill an idle spot, even if the timing isn't exactly perfect.

*Joined: 11 Jul 09 · Posts: 1639 · Credit: 10,159,968,649 · RAC: 2*

I think that log sequence is pretty definitive. There are two sets of nidle values. The `--- state for CPU ---` value remains at zero throughout; no problems there. The `--- state for NVIDIA ---` value starts at 0, jumps to 1, and then drops to 0 again.

At the point of the jump, we can see `23/01/2014 12:18:13 | | [work_fetch] Request work fetch: application exited`, and `NVIDIA instance 1: confirming for 75x-SANTI_MARwtcap310-22-32-RND0081_0` disappears from the record. That's result 7689115, which you can see has a pause in the middle:

    # The simulation has become unstable. Terminating to avoid lock-up (1)

I imagine that if you look a bit further down, you'd see, perhaps first, a 'restarting' entry for 75x-SANTI_MARwtcap310-22-32-RND0081_0, and then two task instances being confirmed again at each [coproc] step. The good news is that 75x-SANTI_MARwtcap310-22-32-RND0081_0 completed successfully and validated, despite the pause in the middle.
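Following the NVIDIA `nidle` value from one work-fetch pass to the next, as done above, can be automated along these lines. A Python sketch, assuming each `--- state for NVIDIA ---` header is immediately followed by its `shortfall ... nidle ...` line, as in the logs in this thread; the function names are hypothetical.

```python
import re

# Assumes the '[work_fetch] shortfall X nidle Y ...' lines shown in the logs above.
STATE_RE = re.compile(
    r"^(?P<ts>\S+ \S+) \|.*\[work_fetch\] shortfall [\d.]+ nidle (?P<nidle>[\d.]+)"
)


def nvidia_nidle_history(lines):
    """Return (timestamp, nidle) for every NVIDIA state block in the log."""
    history = []
    in_nvidia = False
    for line in lines:
        if "--- state for " in line:
            # Only the block headed 'state for NVIDIA' is of interest.
            in_nvidia = "--- state for NVIDIA ---" in line
            continue
        m = STATE_RE.match(line)
        if m and in_nvidia:
            history.append((m["ts"], float(m["nidle"])))
            in_nvidia = False
    return history


def nidle_changes(history):
    """List (timestamp, old, new) wherever nidle differs from the previous pass."""
    return [
        (ts, old, new)
        for (_, old), (ts, new) in zip(history, history[1:])
        if old != new
    ]


log = [
    "23/01/2014 12:18:10 | | [work_fetch] --- state for CPU ---",
    "23/01/2014 12:18:10 | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 2568.97 busy 0.00",
    "23/01/2014 12:18:10 | | [work_fetch] --- state for NVIDIA ---",
    "23/01/2014 12:18:10 | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 4775.59 busy 0.00",
    "23/01/2014 12:18:15 | | [work_fetch] --- state for NVIDIA ---",
    "23/01/2014 12:18:15 | | [work_fetch] shortfall 1044.00 nidle 1.00 saturated 0.00 busy 0.00",
    "23/01/2014 12:18:24 | | [work_fetch] --- state for NVIDIA ---",
    "23/01/2014 12:18:24 | | [work_fetch] shortfall 1044.00 nidle 1.00 saturated 0.00 busy 0.00",
]
history = nvidia_nidle_history(log)
print(nidle_changes(history))  # [('23/01/2014 12:18:15', 0.0, 1.0)]
```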

*Joined: 11 Oct 08 · Posts: 1127 · Credit: 1,901,927,545 · RAC: 0*

And so, for that brief pause, work fetch correctly tried to fill an idle GPU, right?

*Joined: 11 Jul 09 · Posts: 1639 · Credit: 10,159,968,649 · RAC: 2*

> And so, for that brief pause, work fetch correctly tried to fill an idle GPU, right?

That's my guess. I'm also guessing that BOINC restarted the missing 75x-SANTI_MARwtcap310-22-32-RND0081_0 (allowing it to run to completion and report success) before the file downloads for the replacement, probably result 7690601, had completed and allowed it to start on the idle GPU.

*Joined: 16 Mar 11 · Posts: 509 · Credit: 179,005,236 · RAC: 0*

Makes sense to me. Thanks for the lesson in debug message interpretation, Richard and Jacob; I swear I'll get it eventually.

So what's causing the simulation to become unstable and pause to catch its breath? Clocks too high? Shouldn't the client recognize the pause as a temporary suspend and not request more work?

BOINC <<--- credit whores, pedants, alien hunters

*Joined: 5 Mar 13 · Posts: 348 · Credit: 0 · RAC: 0*

Is this maybe the pause time that Matt implemented in the latest versions, where after a crash the task pauses to avoid consecutive crashes, or something like that? It rings a bell.

**MJH** · *Joined: 12 Nov 07 · Posts: 696 · Credit: 27,266,655 · RAC: 0*

Quite likely. If the task crashes having made some progress, it will be restarted after a delay of at least 60 seconds. BOINC will try to start some other work during this idle time, which may involve downloading another task. In the worst case, a very unreliable machine may continuously cycle between two or three tasks, each time making just enough progress to merit a restart attempt.

MJH
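To see why the client fetched work during such a pause, here is a toy timeline in Python. It assumes the crash is the `application exited` entry at 12:18:13 in the earlier log and that the restart waits exactly the 60-second minimum mentioned above; the real delay may be longer, and the helper names are hypothetical. Every work-fetch pass that falls inside that window sees an idle GPU instance.

```python
from datetime import datetime, timedelta

FMT = "%d/%m/%Y %H:%M:%S"                  # timestamp format used in these logs
MIN_RESTART_DELAY = timedelta(seconds=60)  # "at least 60 seconds", per the post above


def idle_window(crash, delay=MIN_RESTART_DELAY):
    """Interval during which the crashed task's GPU instance has nothing to run."""
    return crash, crash + delay


def gpu_idle_at(check, window):
    """True if a work-fetch pass at `check` falls inside the idle window."""
    start, end = window
    return start <= check < end


window = idle_window(datetime.strptime("23/01/2014 12:18:13", FMT))
for ts in ("23/01/2014 12:18:10", "23/01/2014 12:18:15", "23/01/2014 12:18:24"):
    print(ts, gpu_idle_at(datetime.strptime(ts, FMT), window))
# 23/01/2014 12:18:10 False
# 23/01/2014 12:18:15 True
# 23/01/2014 12:18:24 True
```

That matches the NVIDIA nidle values of 0, 1 and 1 logged at those three times.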

**Retvari Zoltan** · *Joined: 20 Jan 09 · Posts: 2380 · Credit: 16,897,957,044 · RAC: 0*

Exactly that happened when my Gigabyte GTX 780Ti OC was unreliable. |
©2026 Universitat Pompeu Fabra