Message boards : Server and website : Not getting short WUs

Scalextrix[Gridcoin]
Joined: 27 Jan 09
Posts: 34
Credit: 130,147,406
RAC: 0
Message 37265 - Posted: 13 Jul 2014 | 14:41:23 UTC
Last modified: 13 Jul 2014 | 14:50:24 UTC

I added a cc_config to exclude my GTX 670 card from acemdlong WUs; I have a GTX 780 Ti, which is good for those. I wanted the GTX 670 to work on acemdshort. However, even though there are thousands of unsent WUs according to the Server Status page, I can't get any...

My GPUGRID preferences allow all projects.

Any ideas?

EDIT: Here is the relevant section of my cc_config:
<options>
<exclude_gpu>
<url>http://boinc.fzk.de/poem/</url>
<device_num>1</device_num>
<app>poemcl</app>
</exclude_gpu>
<exclude_gpu>
<url>http://www.gpugrid.net/</url>
<device_num>1</device_num>
<app>acemdlong</app>
</exclude_gpu>

MarkJ
Volunteer moderator
Project tester
Volunteer tester
Joined: 24 Dec 08
Posts: 732
Credit: 197,194,445
RAC: 3
Message 37266 - Posted: 13 Jul 2014 | 18:46:41 UTC
Last modified: 13 Jul 2014 | 18:52:43 UTC

You're missing the closing tag for <options> and you need an opening and closing tag for cc_config. If you want to debug the scheduler stuff set the sched_op_debug tags with a value of 1 so you can see what it's requesting and getting back. They go within the log_flags tags.

You will probably also need to restart the client as some flags can only be set when it first starts. The BOINC event log should show what flags are set at start up and any exclusions.
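
A quick way to catch the missing-tag problem described above is to run the file through a strict XML parser before restarting the client. This is only a minimal sketch: the function name `check_cc_config` is made up for illustration, and BOINC's own parser may be more or less lenient than Python's, so a pass here is a sanity check rather than a guarantee.

```python
import xml.etree.ElementTree as ET


def check_cc_config(text):
    """Return (ok, message) for a candidate cc_config.xml body.

    Hypothetical helper: it checks strict XML well-formedness and the
    root element name only. It does not validate individual options.
    """
    try:
        root = ET.fromstring(text)
    except ET.ParseError as err:
        return False, f"not well-formed XML: {err}"
    if root.tag != "cc_config":
        return False, f"root element is <{root.tag}>, expected <cc_config>"
    return True, "well-formed, root is <cc_config>"


# The excerpt as first posted: <options> is opened but never closed.
EXCERPT = """<options>
<exclude_gpu>
<url>http://www.gpugrid.net/</url>
<device_num>1</device_num>
<app>acemdlong</app>
</exclude_gpu>
"""

# The same excerpt with the missing closing tag and the cc_config wrapper.
COMPLETE = "<cc_config>\n" + EXCERPT + "</options>\n</cc_config>\n"

if __name__ == "__main__":
    print(check_cc_config(EXCERPT))
    print(check_cc_config(COMPLETE))
```

The bare excerpt is rejected and the wrapped version is accepted, which matches the two points raised in the reply: close `<options>` and wrap everything in `<cc_config>`.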
____________
BOINC blog

Scalextrix[Gridcoin]
Message 37271 - Posted: 15 Jul 2014 | 20:13:50 UTC - in response to Message 37266.

Thanks for the advice. I didn't paste the entire cc_config as it's quite long, so I just put in an excerpt; I do have the opening and closing tags.

I'll try the debug though, great tip.

Scalextrix[Gridcoin]
Message 37293 - Posted: 18 Jul 2014 | 9:22:45 UTC - in response to Message 37271.

Well, I still can't get tasks when I have the cc_config set to exclude GPU 1 from acemdlong. An excerpt of the scheduler request from the debugging event log is as follows:

18/07/2014 10:04:48 | GPUGRID | [sched_op] Starting scheduler request
18/07/2014 10:04:48 | GPUGRID | Sending scheduler request: To fetch work.
18/07/2014 10:04:48 | GPUGRID | Requesting new tasks for intel_gpu
18/07/2014 10:04:48 | GPUGRID | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
18/07/2014 10:04:48 | GPUGRID | [sched_op] NVIDIA work request: 0.00 seconds; 0.00 devices
18/07/2014 10:04:48 | GPUGRID | [sched_op] intel_gpu work request: 86400.00 seconds; 1.00 devices
18/07/2014 10:04:50 | GPUGRID | Scheduler request completed: got 0 new tasks

It seems to be saying that when I exclude GPU 1 from acemdlong, I have 0.00 NVIDIA devices available...?

Here is the cc_config, this time in full:
<cc_config>
<log_flags>
<file_xfer>1</file_xfer>
<sched_ops>1</sched_ops>
<task>1</task>
<android_debug>0</android_debug>
<app_msg_receive>0</app_msg_receive>
<app_msg_send>0</app_msg_send>
<async_file_debug>0</async_file_debug>
<benchmark_debug>0</benchmark_debug>
<checkpoint_debug>0</checkpoint_debug>
<coproc_debug>0</coproc_debug>
<cpu_sched>0</cpu_sched>
<cpu_sched_debug>0</cpu_sched_debug>
<cpu_sched_status>0</cpu_sched_status>
<dcf_debug>0</dcf_debug>
<disk_usage_debug>0</disk_usage_debug>
<file_xfer_debug>0</file_xfer_debug>
<gui_rpc_debug>0</gui_rpc_debug>
<heartbeat_debug>0</heartbeat_debug>
<http_debug>0</http_debug>
<http_xfer_debug>0</http_xfer_debug>
<mem_usage_debug>0</mem_usage_debug>
<network_status_debug>0</network_status_debug>
<notice_debug>0</notice_debug>
<poll_debug>0</poll_debug>
<priority_debug>0</priority_debug>
<proxy_debug>0</proxy_debug>
<rr_simulation>0</rr_simulation>
<rrsim_detail>0</rrsim_detail>
<sched_op_debug>1</sched_op_debug>
<scrsave_debug>0</scrsave_debug>
<slot_debug>0</slot_debug>
<state_debug>0</state_debug>
<statefile_debug>0</statefile_debug>
<suspend_debug>0</suspend_debug>
<task_debug>0</task_debug>
<time_debug>0</time_debug>
<trickle_debug>0</trickle_debug>
<unparsed_xml>0</unparsed_xml>
<work_fetch_debug>0</work_fetch_debug>
</log_flags>
<options>
<exclude_gpu>
<url>http://boinc.fzk.de/poem/</url>
<device_num>1</device_num>
<type>NVIDIA</type>
<app>poemcl</app>
</exclude_gpu>
<exclude_gpu>
<url>http://www.gpugrid.net/</url>
<device_num>1</device_num>
<type>NVIDIA</type>
<app>acemdlong</app>
</exclude_gpu>
<abort_jobs_on_exit>0</abort_jobs_on_exit>
<allow_multiple_clients>0</allow_multiple_clients>
<allow_remote_gui_rpc>0</allow_remote_gui_rpc>
<client_version_check_url>http://boinc.berkeley.edu/download.php?xml=1</client_version_check_url>
<client_new_version_text></client_new_version_text>
<client_download_url>http://boinc.berkeley.edu/download.php</client_download_url>
<disallow_attach>0</disallow_attach>
<dont_check_file_sizes>0</dont_check_file_sizes>
<dont_contact_ref_site>0</dont_contact_ref_site>
<exit_after_finish>0</exit_after_finish>
<exit_before_start>0</exit_before_start>
<exit_when_idle>0</exit_when_idle>
<fetch_minimal_work>0</fetch_minimal_work>
<fetch_on_update>0</fetch_on_update>
<force_auth>default</force_auth>
<http_1_0>0</http_1_0>
<http_transfer_timeout>300</http_transfer_timeout>
<http_transfer_timeout_bps>10</http_transfer_timeout_bps>
<max_event_log_lines>2000</max_event_log_lines>
<max_file_xfers>8</max_file_xfers>
<max_file_xfers_per_project>2</max_file_xfers_per_project>
<max_stderr_file_size>0</max_stderr_file_size>
<max_stdout_file_size>0</max_stdout_file_size>
<max_tasks_reported>0</max_tasks_reported>
<ncpus>-1</ncpus>
<network_test_url>http://www.google.com/</network_test_url>
<no_alt_platform>0</no_alt_platform>
<no_gpus>0</no_gpus>
<no_info_fetch>0</no_info_fetch>
<no_priority_change>0</no_priority_change>
<os_random_only>0</os_random_only>
<proxy_info>
<socks_server_name></socks_server_name>
<socks_server_port>80</socks_server_port>
<http_server_name></http_server_name>
<http_server_port>80</http_server_port>
<socks5_user_name></socks5_user_name>
<socks5_user_passwd></socks5_user_passwd>
<http_user_name></http_user_name>
<http_user_passwd></http_user_passwd>
<no_proxy></no_proxy>
</proxy_info>
<rec_half_life_days>10.000000</rec_half_life_days>
<report_results_immediately>0</report_results_immediately>
<run_apps_manually>0</run_apps_manually>
<save_stats_days>30</save_stats_days>
<skip_cpu_benchmarks>0</skip_cpu_benchmarks>
<simple_gui_only>0</simple_gui_only>
<start_delay>0</start_delay>
<stderr_head>0</stderr_head>
<suppress_net_info>0</suppress_net_info>
<unsigned_apps_ok>0</unsigned_apps_ok>
<use_all_gpus>1</use_all_gpus>
<use_certs>0</use_certs>
<use_certs_only>0</use_certs_only>
<vbox_window>0</vbox_window>
</options>
</cc_config>


Richard Haselgrove
Joined: 11 Jul 09
Posts: 899
Credit: 2,111,819,945
RAC: 1,374,820
Message 37298 - Posted: 18 Jul 2014 | 12:34:55 UTC - in response to Message 37293.

Scalextrix[Gridcoin] wrote (Message 37293, with the scheduler log excerpt shown there):

"It seems to be saying that when I exclude GPU 1 from acemdlong, that I have 0.00 NVIDIA devices available...?"

No, it's not saying that. It's saying that, at the time you made that request, you didn't need any more NVidia work, but you had a completely idle intel_gpu - which, of course, GPUGrid can't supply any work for.
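
To make that reading of the log easier to check, the `[sched_op]` lines can be pulled apart per resource. This is a rough sketch that assumes the line format is exactly as in the excerpt from Message 37293; the regex and the function name `work_requests` are illustrative, not part of BOINC.

```python
import re

# Matches lines such as:
# "... | [sched_op] NVIDIA work request: 0.00 seconds; 0.00 devices"
SCHED_OP = re.compile(
    r"\[sched_op\] (?P<resource>\S+) work request: "
    r"(?P<seconds>[\d.]+) seconds; (?P<devices>[\d.]+) devices"
)


def work_requests(log_text):
    """Map each resource name to (seconds requested, devices requested)."""
    requests = {}
    for line in log_text.splitlines():
        match = SCHED_OP.search(line)
        if match:
            requests[match.group("resource")] = (
                float(match.group("seconds")),
                float(match.group("devices")),
            )
    return requests


# Lines copied from the event log excerpt in Message 37293.
LOG = """\
18/07/2014 10:04:48 | GPUGRID | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
18/07/2014 10:04:48 | GPUGRID | [sched_op] NVIDIA work request: 0.00 seconds; 0.00 devices
18/07/2014 10:04:48 | GPUGRID | [sched_op] intel_gpu work request: 86400.00 seconds; 1.00 devices"""

if __name__ == "__main__":
    for resource, (seconds, devices) in work_requests(LOG).items():
        status = "asking for work" if seconds > 0 else "not asking for work"
        print(f"{resource}: {status}")
```

In this excerpt only `intel_gpu` has a non-zero request, so the numbers describe what the client asked for in that request, not how many NVIDIA cards are installed.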

But I note from http://www.gpugrid.net/results.php?hostid=165969 that you were allocated two short tasks round about the time that you posted, so it appears that the process has worked.

Scalextrix[Gridcoin]
Message 37304 - Posted: 19 Jul 2014 | 9:23:09 UTC - in response to Message 37298.

Hi Richard, I couldn't get any new tasks, so I removed the exclude_gpu from cc_config and instantly got new tasks. If I add it back, no new tasks...

Unless there is a solution I will just let my GPUs do what they want. Thanks forum for the help.

skgiven
Volunteer moderator
Project tester
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,976,312,010
RAC: 125,698
Message 37306 - Posted: 19 Jul 2014 | 9:40:29 UTC - in response to Message 37304.
Last modified: 19 Jul 2014 | 16:27:36 UTC

Firstly, a GTX 670 is quite capable of running Long WU's.

Secondly, this isn't going to work by itself,

<exclude_gpu>
<url>http://www.gpugrid.net/</url>
<device_num>1</device_num>
<type>NVIDIA</type>
<app>acemdlong</app>
</exclude_gpu>

You would need to do more:
Make sure you also have short tasks selected for that system profile, otherwise it's just not going to work.
Presuming you only want to run long tasks on the bigger card you would also need to exclude short tasks for that GPU.

On the server the app names are displayed as,
Long runs (8-12 hours on fastest card)
Short runs (2-3 hours on fastest card)
However they are really acemdlong and acemdshort

Device type and number in this order,
<type>NVIDIA</type>
<device_num>0</device_num>

testing,

    <exclude_gpu>
    <url>http://www.gpugrid.net/</url>
    <type>NVIDIA</type>
    <device_num>1</device_num>
    <app>acemdlong</app>
    </exclude_gpu>

    <exclude_gpu>
    <url>http://www.gpugrid.net/</url>
    <type>NVIDIA</type>
    <device_num>0</device_num>
    <app>acemdshort</app>
    </exclude_gpu>



If you needed to alter your profile, then do a project update afterwards from BOINC.
After reconfiguring your cc_config file(s), have BOINC re-read the config files (Advanced menu in BOINC Manager).

Note that there might not always be short tasks!

- Correcting...
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Scalextrix[Gridcoin]
Message 37307 - Posted: 19 Jul 2014 | 11:17:54 UTC - in response to Message 37306.

Thanks skgiven. I don't care if GPU 0 processes both long and short tasks; I just want GPU 1 to work only on shorts. I agree that a GTX 670 can do long tasks, but as I don't run 24/7 it's better for me to limit it to short tasks only, so they complete in a 'decent amount of time'.

I moved the <type>NVIDIA</type> statement as you suggested. However, when I used your suggested <app>Long runs (8-12 hours on fastest card)</app>, BOINC Manager gave me an error:

"A GPU exclusion on your cc_config.xml file refers to an unknown application 'Long runs (8-12 hours on fastest card)'. Known applications: 'acemdlong', 'acemdshort', 'andriod'."

So I changed cc_config back to:
<exclude_gpu>
<url>http://www.gpugrid.net/</url>
<type>NVIDIA</type>
<device_num>1</device_num>
<app>acemdlong</app>
</exclude_gpu>

I definitely do have Short Tasks selected on the project preferences webpage.
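
The error quoted above shows that the client wants the short app names, not the descriptions shown on the website. As a rough sketch, the `<app>` values in a config can be checked against a known list before BOINC ever reads the file. The `KNOWN_APPS` set here is copied from that error message (including its spelling 'andriod'), and `unknown_excluded_apps` and `wrap` are names invented for this example.

```python
import xml.etree.ElementTree as ET

# Assumption: the short names listed in the error message quoted above,
# spelled exactly as the client reported them.
KNOWN_APPS = {"acemdlong", "acemdshort", "andriod"}


def unknown_excluded_apps(config_text, known_apps=KNOWN_APPS):
    """Return <app> values inside <exclude_gpu> that are not known short names."""
    root = ET.fromstring(config_text)
    unknown = []
    for exclusion in root.iter("exclude_gpu"):
        app = (exclusion.findtext("app") or "").strip()
        if app and app not in known_apps:
            unknown.append(app)
    return unknown


def wrap(app_name):
    """Build a one-exclusion cc_config body for the given app name."""
    return (
        "<cc_config><options><exclude_gpu>"
        "<url>http://www.gpugrid.net/</url>"
        "<type>NVIDIA</type>"
        "<device_num>1</device_num>"
        f"<app>{app_name}</app>"
        "</exclude_gpu></options></cc_config>"
    )


if __name__ == "__main__":
    print(unknown_excluded_apps(wrap("acemdlong")))
    print(unknown_excluded_apps(wrap("Long runs (8-12 hours on fastest card)")))
```

The first call reports nothing and the second flags the descriptive name, mirroring what BOINC Manager complained about.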

Scalextrix[Gridcoin]
Message 37308 - Posted: 19 Jul 2014 | 11:22:22 UTC - in response to Message 37307.

I have given up on this effort; I'll just let the GPUs process what they want. Thanks to the forums for your efforts.

skgiven
Volunteer moderator
Project tester
Volunteer tester
Message 37311 - Posted: 19 Jul 2014 | 18:24:20 UTC - in response to Message 37308.

I made a couple of corrections to my previous post. I thought it worked but my setup is complex and another cc_config file kicked in. If I get it working I'll let you know...

skgiven
Volunteer moderator
Project tester
Volunteer tester
Message 37312 - Posted: 19 Jul 2014 | 18:24:30 UTC - in response to Message 37311.
Last modified: 20 Jul 2014 | 13:02:28 UTC

I think I sort of know what the issue is.

My setup is different and to test this I wanted to only run short tasks on GPU 0 (a GTX770), but run long tasks on a GTX660Ti (I don't always want to crunch on the 770).

My cc_config file (without the logs) is,

    <cc_config>
    <options>
    <start_delay>30</start_delay>
    <use_all_gpus>0</use_all_gpus>

    <exclude_gpu>
    <url>http://www.gpugrid.net/</url>
    <type>NVIDIA</type>
    <device_num>0</device_num>
    <app>acemdlong</app>
    </exclude_gpu>

    <exclude_gpu>
    <url>http://www.gpugrid.net/</url>
    <type>NVIDIA</type>
    <device_num>1</device_num>
    <app>acemdshort</app>
    </exclude_gpu>

    </options>
    </cc_config>



I was running one Long WU on GPU 1. GPU 0 was free, and remained so after a restart, but Boinc would not ask for short work for GPU 0.

Instead I got the message,

19/07/2014 19:03:31 | GPUGRID | [coproc] NVIDIA instance 0; 1.000000 pending for 37x2-NOELIA_BI_3-10-14-RND2323_1

19/07/2014 19:03:31 | GPUGRID | [coproc] NVIDIA instance 1: confirming 1.000000 instance for 37x2-NOELIA_BI_3-10-14-RND2323_1

To me this suggests a WU is being linked with both GPU's (GPU0 has 1 pending WU - which is wrong because it's running on GPU1) so Boinc thinks the GPU isn't available to run other tasks because there is a pending WU?
If my interpretation is correct-ish then there is a bug. Maybe this has already been fixed in a beta?
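
For anyone wanting to look for the same pattern in their own event log, here is a small sketch that groups `[coproc]` lines by task and reports tasks that show up as pending on one instance while being confirmed on another. It assumes the two line shapes quoted above (note the `;` in one and the `:` in the other); the function names are made up, and finding such a task does not by itself prove a client bug.

```python
import re
from collections import defaultdict

# Matches both shapes seen in the log:
#   "[coproc] NVIDIA instance 0; 1.000000 pending for <task>"
#   "[coproc] NVIDIA instance 1: confirming 1.000000 instance for <task>"
COPROC = re.compile(
    r"\[coproc\] \S+ instance (?P<inst>\d+)[;:] "
    r"(?:[\d.]+ (?P<pending>pending)|(?P<confirming>confirming) [\d.]+ instance) "
    r"for (?P<task>\S+)"
)


def instances_by_task(log_text):
    """Map task name -> {'pending': {instances}, 'confirming': {instances}}."""
    seen = defaultdict(lambda: {"pending": set(), "confirming": set()})
    for line in log_text.splitlines():
        match = COPROC.search(line)
        if match:
            kind = "pending" if match.group("pending") else "confirming"
            seen[match.group("task")][kind].add(int(match.group("inst")))
    return dict(seen)


def tasks_on_two_instances(log_text):
    """Tasks listed as pending on one instance and confirmed on a different one."""
    return sorted(
        task
        for task, state in instances_by_task(log_text).items()
        if state["pending"]
        and state["confirming"]
        and state["pending"] != state["confirming"]
    )


# Lines copied from the event log quoted above.
LOG = """\
19/07/2014 19:03:31 | GPUGRID | [coproc] NVIDIA instance 0; 1.000000 pending for 37x2-NOELIA_BI_3-10-14-RND2323_1
19/07/2014 19:03:31 | GPUGRID | [coproc] NVIDIA instance 1: confirming 1.000000 instance for 37x2-NOELIA_BI_3-10-14-RND2323_1"""

if __name__ == "__main__":
    for task in tasks_on_two_instances(LOG):
        print(task, instances_by_task(LOG)[task])
```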

I suspected it might have worked in 4h, when the Long WU completed, but I didn't want to wait, so I tried to force the issue.
First I told my system to only get short tasks and then to start using both cards for all types of work. After downloading a short task, I suspended both tasks, read the cc_config file, and then enabled the tasks. They started running on the correct cards.

I've changed back to both types of work in the profile. I suppose I will have to wait and see if it keeps working before I know for sure, but I think that demonstrates that it will work, but not straight away - running tasks would first need to complete...

Another problem is that Boinc doesn't ask for a specific task type when downloading work - it just asks for work, so if you need short tasks but download a long task then it will sit in the queue and you will not get a short task,

20/07/2014 09:28:04 | GPUGRID | [coproc] NVIDIA instance 0; 1.000000 pending for 2x19x1x6-NOELIA_THROMBIN1-2-3-RND8291_0
20/07/2014 09:28:04 | GPUGRID | [coproc] NVIDIA instance 1: confirming 1.000000 instance for 2x19x1x6-NOELIA_THROMBIN1-2-3-RND8291_0
20/07/2014 09:28:04 | GPUGRID | [coproc] Insufficient NVIDIA for 2x26x1x13-NOELIA_THROMBIN1-2-3-RND1179_0; need 1, available 0

So unless you changed profile before asking for tasks (every time), to ensure you only get short (or long) tasks when you need them, this won't work. IMO this is all too complicated as is, so even if a fix turned up the way forward can only be hardware orientation (allocate per hardware unit). Try writing a detailed cc_config file for a system with an Intel CPU/GPU, an ATI, 2 NVidia's and a mix of external mining devices.

The only other thing I can think of would be to run two instances of Boinc, disabling one GPU in each (and I don't know if that would actually work or not).

...
Tried 7.4.8, same problem,
20/07/2014 14:03:47 | GPUGRID | [coproc] NVIDIA instance 0; 1.000000 pending for 2x26x1x13-NOELIA_THROMBIN1-2-3-RND1179_0
20/07/2014 14:03:47 | GPUGRID | [coproc] NVIDIA instance 1: confirming 1.000000 instance for 2x26x1x13-NOELIA_THROMBIN1-2-3-RND1179_0

Changed profile to ask for Short tasks and other tasks if no short tasks available!

20/07/2014 13:36:06 | GPUGRID | No tasks sent
20/07/2014 13:36:06 | GPUGRID | No tasks are available for Short runs (2-3 hours on fastest card)
20/07/2014 13:36:06 | GPUGRID | No tasks are available for CPU only app

Scalextrix[Gridcoin]
Message 37332 - Posted: 21 Jul 2014 | 8:21:05 UTC - in response to Message 37312.

Thanks skgiven. I didn't follow everything in your post, but your description seems to match what I'm seeing: if I ask BOINC to exclude one app from one GPU, I don't get any tasks from GPUGRID at all. As you say, if I spend time going to the project's webpage, switching preferences, and messing with cc_config, then I can get short tasks onto GPU 1, but frankly suspending/resuming tasks in BOINC Manager to switch acemdshort off GPU 0 is quicker, has fewer steps, and has the same result.

However, note that I have not had the same problem on POEM@HOME: I successfully excluded GPU 1 from receiving poemcl tasks in cc_config, and they still download and process on GPU 0. I assume the difference is that POEM has only one GPU app available, whereas GPUGRID has many.
