Why am I getting TOO MANY WU's?

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Message 33021 - Posted: 16 Sep 2013, 21:30:58 UTC

Yes. Only Intel has not yet got its head around the fact that if they want their GPUs to be used as coprocessors (do they?), they should consider the remote possibility that someone wants to crunch some numbers on them via OpenCL without showing the result on an attached display.

In short: only Intel requires a display or dummy.

MrS
Scanning for our furry friends since Jan 2002
MJH
Message 33027 - Posted: 16 Sep 2013, 21:58:24 UTC - in response to Message 33021.  


In short: only Intel requires a display or dummy.


That sort of thing is usually down to the BIOS - some systems will insist on disabling the iGPU if there's a discrete card present, others let you choose which has priority. Still others let you choose but then bugger things up anyway.

Matt
ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Message 33074 - Posted: 18 Sep 2013, 20:05:31 UTC - in response to Message 33027.  

That's not what I'm talking about. I'm starting from the point where all GPUs are crunching happily ever after... until you remove the display from the iGPU. At this point, or when the next WU is supposed to start, BOINC can't detect the card as an OpenCL device any more because Intel's driver sent it to bed. The BIOS has no say in this if you're already in Windows and both can run simultaneously in principle.

MrS
Scanning for our furry friends since Jan 2002
Rick A. Sponholz
Message 33151 - Posted: 22 Sep 2013, 20:37:18 UTC
Last modified: 22 Sep 2013, 20:37:47 UTC

I am happy to report successful use of two iGPUs (both on 4770s). For me, it required the BIOS setting for the iGPU to be set to "Always On", my monitor attached to my GTX690s, AND a VGA dummy plug attached to the iGPU. They have been running for 5 hours now without a problem. So far only SETI Beta WUs have run, but I hope to get Einstein WUs soon, and I'll report when some have run successfully. Getting my 3770's iGPU working with the same techniques was a dismal failure, but I will keep trying. BTW, I'm still having the problem of getting too many GPUGRID WUs at a time (usually 6 WUs for my 4 cores). I hope someone can help me with that too. Thanks to all for the help. Regards, Rick
Jacob Klein

Message 33153 - Posted: 22 Sep 2013, 21:16:51 UTC - in response to Message 33151.  
Last modified: 22 Sep 2013, 21:27:07 UTC

I think, by default, you'll get "up to 2 GPUGrid tasks per GPU".

The best way to keep BOINC work fetch from fetching too many tasks is to:
- Make sure you are using the latest supported version of BOINC (7.0.64 currently I believe)
- Limit the buffer settings (My settings are: 0.05 days minimum buffer, 0.15 days max additional buffer)
- Use settings within cc_config.xml (in your data directory) to ensure you specifically exclude any GPUs that you do not want to do GPUGrid work. See http://boinc.berkeley.edu/wiki/Client_configuration. Regarding GPUGrid, this really should only be needed if you have any nVidia GPUs that you do not want to do GPUGrid work on.
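A minimal sketch of the buffer settings from the list above, written as a local override. It assumes the standard global_prefs_override.xml file in the BOINC data directory; the two values are the ones quoted above, and the same limits can be set through the computing preferences in BOINC Manager or on the project website instead.

```xml
<global_preferences>
    <!-- Sketch only: values taken from the post above, adjust to taste. -->
    <!-- Minimum work buffer: 0.05 days -->
    <work_buf_min_days>0.05</work_buf_min_days>
    <!-- Maximum additional work buffer: 0.15 days -->
    <work_buf_additional_days>0.15</work_buf_additional_days>
</global_preferences>
```

A local override like this takes precedence over the web preferences on that host only, and should be picked up when the client restarts.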

For reference, here's my cc_config.xml file, which shows how I have excluded GPUGrid from GPU #2 (in addition to several other GPU exclusions across the 3 GPUs). Note that I have some sections commented out using xml comment blocks.


<cc_config>


	<log_flags>
		<!-- The 3 flags that are on by default are: file_xfer, sched_ops, task -->

		<file_xfer>1</file_xfer>
		<file_xfer_debug>0</file_xfer_debug>

		<sched_ops>1</sched_ops>
		<sched_op_debug>0</sched_op_debug>

		<task>1</task>
		<task_debug>0</task_debug>

		<unparsed_xml>1</unparsed_xml>

		<work_fetch_debug>0</work_fetch_debug>
		<rr_simulation>0</rr_simulation>
		<rrsim_detail>0</rrsim_detail>

		<cpu_sched>0</cpu_sched>
		<cpu_sched_debug>0</cpu_sched_debug>
		<cpu_sched_status>0</cpu_sched_status>
		<coproc_debug>1</coproc_debug>

		<mem_usage_debug>0</mem_usage_debug>
		<checkpoint_debug>1</checkpoint_debug>

		<http_debug>0</http_debug>
		<http_xfer_debug>0</http_xfer_debug>
		<network_status_debug>0</network_status_debug>

		<scrsave_debug>1</scrsave_debug>
		<notice_debug>0</notice_debug>

		<app_msg_receive>0</app_msg_receive>
		<app_msg_send>0</app_msg_send>
		<async_file_debug>0</async_file_debug>
		<benchmark_debug>0</benchmark_debug>
		<dcf_debug>0</dcf_debug>
		<disk_usage_debug>0</disk_usage_debug>
		<priority_debug>0</priority_debug>
		<gui_rpc_debug>0</gui_rpc_debug>
		<heartbeat_debug>0</heartbeat_debug>
		<poll_debug>0</poll_debug>
		<proxy_debug>0</proxy_debug>
		<slot_debug>0</slot_debug>
		<state_debug>0</state_debug>
		<statefile_debug>0</statefile_debug>
		<suspend_debug>0</suspend_debug>
		<time_debug>0</time_debug>
		<trickle_debug>0</trickle_debug>

	</log_flags>




	<options>
		<!-- =================================================== TESTING OPTIONS =================================================== -->
<!--
		<start_delay>20</start_delay>
		<ncpus>8</ncpus>
		<exclusive_app>NotepadTest01.exe</exclusive_app>
		<exclusive_gpu_app>NotepadTest02.exe</exclusive_gpu_app>
-->

		<!-- =================================================== REGULAR OPTIONS =================================================== -->
		<report_results_immediately>0</report_results_immediately>
		<fetch_on_update>0</fetch_on_update>
		<max_event_log_lines>0</max_event_log_lines>

		<max_file_xfers>10</max_file_xfers>
		<max_file_xfers_per_project>4</max_file_xfers_per_project>

		<exclusive_app>iRacingSim.exe</exclusive_app>
		<exclusive_app>iRacingSim64.exe</exclusive_app>
		<exclusive_app>Aces.exe</exclusive_app>
		<exclusive_app>TmForever.exe</exclusive_app>
		<exclusive_app>TmForeverLauncher.exe</exclusive_app>

		<!-- ===================================================== SETUP GPUS ====================================================== -->
		<use_all_gpus>1</use_all_gpus>

		<!-- =========================================== SETUP GPU 0: GeForce GTX 660 Ti =========================================== -->
<!--
		<ignore_nvidia_dev>0</ignore_nvidia_dev>
-->

		<!-- Exclude World Community Grid's "Help Conquer Cancer" GPU app (hcc1) on main display - makes graphics slow, even on 660 Ti -->
		<!-- Commenting out, for now, since this round of hcc1 is completed, and next round may not exhibit the issue. -->
<!--
		<exclude_gpu>
			<url>http://www.worldcommunitygrid.org</url>
			<device_num>0</device_num>
			<app>hcc1</app>
		</exclude_gpu>
-->

		<!-- Exclude Einstein/Albert, since work from other GPU projects should give enough work to keep this GPU busy. -->
		<exclude_gpu>
			<url>http://einstein.phys.uwm.edu/</url>
			<device_num>0</device_num>
		</exclude_gpu>
		<exclude_gpu>
			<url>http://albert.phys.uwm.edu/</url>
			<device_num>0</device_num>
		</exclude_gpu>

		<!-- Exclude SETI/Beta, since work from other GPU projects should give enough work to keep this GPU busy. -->
		<exclude_gpu>
			<url>http://setiathome.berkeley.edu/</url>
			<device_num>0</device_num>
		</exclude_gpu>
		<exclude_gpu>
			<url>http://setiweb.ssl.berkeley.edu/beta/</url>
			<device_num>0</device_num>
		</exclude_gpu>

		<!-- Exclude Milkyway@Home, since work from other GPU projects should give enough work to keep this GPU busy. -->
		<exclude_gpu>
			<url>http://milkyway.cs.rpi.edu/milkyway/</url>
			<device_num>0</device_num>
		</exclude_gpu>

		<!-- =========================================== SETUP GPU 1: GeForce GTX 460 =========================================== -->
<!--
		<ignore_nvidia_dev>1</ignore_nvidia_dev>
-->

		<!-- Exclude POEM's "POEM++ OpenCL version" GPU app (poemcl) from a second heterogeneous GPU, since it does not work properly -->
		<!-- Note: Although 320.18 drivers successfully run smalltest_3, the drivers still do not work right with POEM. -->
		<!-- Note: Also, it appears that running POEM only on the GTX 460, does not work. So, it must run on the GTX 660 Ti! -->
		<exclude_gpu>
			<url>http://boinc.fzk.de/poem/</url>
			<device_num>1</device_num>
			<app>poemcl</app>
		</exclude_gpu>

		<!-- Reminder: For GPUGrid.net, if going to run 2-tasks-on-1-GPU, exclude this GPU (it only has 1 GB memory) -->
<!--
		<exclude_gpu>
			<url>http://www.gpugrid.net</url>
			<device_num>1</device_num>
		</exclude_gpu>
-->

		<!-- Exclude Einstein/Albert, since work from other GPU projects should give enough work to keep this GPU busy. -->
		<exclude_gpu>
			<url>http://einstein.phys.uwm.edu/</url>
			<device_num>1</device_num>
		</exclude_gpu>
		<exclude_gpu>
			<url>http://albert.phys.uwm.edu/</url>
			<device_num>1</device_num>
		</exclude_gpu>

		<!-- Exclude SETI/Beta, since work from other GPU projects should give enough work to keep this GPU busy. -->
		<exclude_gpu>
			<url>http://setiathome.berkeley.edu/</url>
			<device_num>1</device_num>
		</exclude_gpu>
		<exclude_gpu>
			<url>http://setiweb.ssl.berkeley.edu/beta/</url>
			<device_num>1</device_num>
		</exclude_gpu>

		<!-- Exclude Milkyway@Home, since work from other GPU projects should give enough work to keep this GPU busy. -->
		<exclude_gpu>
			<url>http://milkyway.cs.rpi.edu/milkyway/</url>
			<device_num>1</device_num>
		</exclude_gpu>

		<!-- =========================================== SETUP GPU 2: GeForce GTS 240 =========================================== -->
<!--
		<ignore_nvidia_dev>2</ignore_nvidia_dev>
-->

		<!-- Exclude World Community Grid's Help Conquer Cancer GPU app -->
		<!-- GPU not supported per https://secure.worldcommunitygrid.org/help/viewTopic.do?shortName=GPU#610 -->
		<exclude_gpu>
			<url>http://www.worldcommunitygrid.org</url>
			<device_num>2</device_num>
			<app>hcc1</app>
		</exclude_gpu>

		<!-- Exclude POEM's "POEM++ OpenCL version" GPU app (poemcl) from a second heterogeneous GPU, since it does not work properly -->
		<!-- Also, GPU is not supported, as all tasks immediately error out -->
		<exclude_gpu>
			<url>http://boinc.fzk.de/poem/</url>
			<device_num>2</device_num>
			<app>poemcl</app>
		</exclude_gpu>

		<!-- Exclude GPUGrid.net -->
		<!-- GPU not supported per http://www.gpugrid.net/forum_thread.php?id=2507 -->
		<exclude_gpu>
			<url>http://www.gpugrid.net/</url>
			<device_num>2</device_num>
		</exclude_gpu>

		<!-- Exclude Milkyway@Home -->
		<!-- GPU not supported, as all tasks immediately error out -->
		<exclude_gpu>
			<url>http://milkyway.cs.rpi.edu/milkyway/</url>
			<device_num>2</device_num>
		</exclude_gpu>

	</options>


</cc_config>



If done successfully, you'll see the exclusions listed towards the beginning of your Event Log when you restart BOINC. For instance, mine says:

9/21/2013 9:13:51 PM | Einstein@Home | Config: excluded GPU. Type: all. App: all. Device: 0
9/21/2013 9:13:51 PM | Albert@Home | Config: excluded GPU. Type: all. App: all. Device: 0
9/21/2013 9:13:51 PM | SETI@home | Config: excluded GPU. Type: all. App: all. Device: 0
9/21/2013 9:13:51 PM | SETI@home Beta Test | Config: excluded GPU. Type: all. App: all. Device: 0
9/21/2013 9:13:51 PM | Milkyway@Home | Config: excluded GPU. Type: all. App: all. Device: 0
9/21/2013 9:13:51 PM | Poem@Home | Config: excluded GPU. Type: all. App: poemcl. Device: 1
9/21/2013 9:13:51 PM | Einstein@Home | Config: excluded GPU. Type: all. App: all. Device: 1
9/21/2013 9:13:51 PM | Albert@Home | Config: excluded GPU. Type: all. App: all. Device: 1
9/21/2013 9:13:51 PM | SETI@home | Config: excluded GPU. Type: all. App: all. Device: 1
9/21/2013 9:13:51 PM | SETI@home Beta Test | Config: excluded GPU. Type: all. App: all. Device: 1
9/21/2013 9:13:51 PM | Milkyway@Home | Config: excluded GPU. Type: all. App: all. Device: 1
9/21/2013 9:13:51 PM | World Community Grid | Config: excluded GPU. Type: all. App: hcc1. Device: 2
9/21/2013 9:13:51 PM | Poem@Home | Config: excluded GPU. Type: all. App: poemcl. Device: 2
9/21/2013 9:13:51 PM | GPUGRID | Config: excluded GPU. Type: all. App: all. Device: 2
9/21/2013 9:13:51 PM | Milkyway@Home | Config: excluded GPU. Type: all. App: all. Device: 2

Good luck!
Rick A. Sponholz
Message 33160 - Posted: 23 Sep 2013, 1:31:50 UTC - in response to Message 33153.  

Thanks for sharing your setup, Jacob. I too use a cc_config to try to exclude my iGPU from the calculations BOINC makes to get GPUGRID work. My buffer settings are similar to yours: 0.02 minimum days, 0.23 additional days. I do use project-specific app_config.xml files to allow multiple projects to run simultaneously on my GTX690s. Other than those, I let BOINC get work for all my projects and allow BOINC access to all my computer's capability. I still wish I'd get only 1 GPUGRID WU per GPU, with no new work fetched until the queue drops to the 0.02-day minimum shown in my preferences, and then only 1 WU at a time (because 1 long WU is already longer than the 0.25 days of work also shown in my preferences). Oh well, I'll keep trying, but I hate aborting GPUGRID WUs because they've been sitting in my queue unable to run because I got more than 4 WUs. Thanks again, Regards, Rick
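For readers who want to try the same iGPU exclusion, here is a minimal sketch. Rick's actual cc_config.xml is not shown in the thread, so this is an assumption about what such an exclusion could look like: it reuses the exclude_gpu element from Jacob's file above and assumes a client version that accepts intel_gpu as a type value.

```xml
<cc_config>
    <options>
        <!-- Sketch only: keep GPUGRID off any Intel GPU on this host. -->
        <!-- The <type> element restricts the exclusion to one GPU type; -->
        <!-- without <device_num> it applies to all devices of that type. -->
        <exclude_gpu>
            <url>http://www.gpugrid.net/</url>
            <type>intel_gpu</type>
        </exclude_gpu>
    </options>
</cc_config>
```

If a cc_config.xml already exists, only the exclude_gpu element needs to be added inside its existing options section. If it is accepted, the Event Log should show a matching "Config: excluded GPU" line at startup, as in Jacob's listing.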
Jacob Klein

Message 33161 - Posted: 23 Sep 2013, 3:25:28 UTC - in response to Message 33160.  

If you want me to do some more research into it, I can. What I'd need you to do is to turn on work_fetch_debug in the cc_config log_flags, then capture a segment where you believe BOINC fetched work erroneously.

I'm well-versed in reading the BOINC work fetch log files.
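A minimal sketch of the log flag change requested above. It shows only the relevant flag; in an existing cc_config.xml like the one posted earlier, the same effect comes from changing the work_fetch_debug value from 0 to 1 inside the existing log_flags section.

```xml
<cc_config>
    <log_flags>
        <!-- Log the client's work fetch decisions to the Event Log -->
        <work_fetch_debug>1</work_fetch_debug>
    </log_flags>
</cc_config>
```

This flag is verbose, so it is probably worth setting it back to 0 once the relevant segment of the log has been captured.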
ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Message 33269 - Posted: 29 Sep 2013, 20:25:57 UTC

I figured out a reason for overfetch in my BOINC 7.1.18:

I have GPU-Grid set up as a backup project, with POEM being primary and providing a sporadic work supply. The cache setting is relatively low (0.05 + 0.35 days) and long runs take ~11 hours (0.46 days) on my GPU.

Yet when POEM ran out, BOINC got 2 GPU-Grid WUs. I increased the time to switch between apps to 15h to make it finish the 1st WU, which had already started, but the other one may sit there for days without starting once POEM supplies work again.

So there I sat thinking why the **** does BOINC fetch 22h worth of work when it's supposed to get at most 1 for a backup project anyway? Why not wait until the 1st WU is almost finished, or at least when the expected remaining runtime falls below the cache setting? This made me switch to short tasks, where the entire problem is just not as large.

Today I think I figured out what's happening: I switched to long runs again and got a beta WU, where this didn't happen (only one was fetched, as it should be). The difference on my side between beta and long runs? The former is set to 1 GPU, whereas the latter is set to 0.51 to allow up to 3 POEMs alongside the GPU-Grid task, increasing GPU utilization from ~85% to 98%.

Apparently BOINC sees an nVidia GPU that is not fully utilized when the backup project kicks in (correct, it's only 0.51), but does not yet factor in that fetching another WU of 0.51 won't help in this case. On the other hand, this behaviour is not too bad, since with correct backup-project behaviour I'd have to wait for the download of the new WU when the current one finishes. And I'd rather miss some early-return bonus than risk an idle GPU ;)
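A minimal sketch of the kind of app_config.xml setting described above, placed in the GPU-Grid project directory. The actual file is not shown in the thread, so the application name acemdlong and the cpu_usage value are assumptions for illustration; only the 0.51 GPU fraction comes from the post.

```xml
<app_config>
    <app>
        <!-- Assumed name of the long-run application -->
        <name>acemdlong</name>
        <gpu_versions>
            <!-- Fraction of one GPU reserved per task (from the post) -->
            <gpu_usage>0.51</gpu_usage>
            <!-- Assumed CPU reservation per task -->
            <cpu_usage>1.0</cpu_usage>
        </gpu_versions>
    </app>
</app_config>
```

With this value two such tasks would need 0.51 + 0.51 = 1.02 GPUs, so a second long run cannot start on the same GPU while the first is running, even though the GPU is counted as only 51% used.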

Long story short: do you have less than 1 GPU set per GPU-Grid task?

MrS
Scanning for our furry friends since Jan 2002
Rick A. Sponholz
Message 33271 - Posted: 30 Sep 2013, 3:20:21 UTC - in response to Message 33269.  

Long story short: do you have less than 1 GPU set per GPU-Grid task?

MrS


Yes, I do have GPUGRID set at .75 GPU via app_config, so I can get better utilization from my GTX690's. So, you think BOINC can't figure out how to properly feed the GPU's? Interesting perspective. Thanks for your thoughts MrS. Regards, Rick
ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Message 33310 - Posted: 1 Oct 2013, 20:51:00 UTC

Rick, just curious: why are you setting 0.75? Any other project running besides GPU-Grid, like POEM in my case?

SK: to be fair, scheduling is in this case based on "completely utilize the GPUs before anything else", which is in principle just what I want. The failure seems to be a minor bug / unwanted behaviour in that the current logic only says "GPU not fully utilized, so get more work" without taking into account that the new WU won't be able to run due to my settings (and in fact shouldn't start to run before the other one).

MrS
Scanning for our furry friends since Jan 2002

©2025 Universitat Pompeu Fabra