Message boards : Graphics cards (GPUs) : No GPUGRID jobs for over a week

Paracelsus
Joined: 11 Aug 10
Posts: 11
Credit: 21,424,870
RAC: 0
Message 47632 - Posted: 18 Jul 2017 | 22:02:03 UTC

I seem to have stopped receiving GPUGRID jobs for the last ~12 days, though other GPU BOINC projects run jobs and there seem to be enough jobs in the queue:

Event log:

Tue 18 Jul 2017 05:45:25 PM EDT | GPUGRID | update requested by user
Tue 18 Jul 2017 05:45:28 PM EDT | GPUGRID | sched RPC pending: Requested by user
Tue 18 Jul 2017 05:45:28 PM EDT | GPUGRID | [sched_op] Starting scheduler request
Tue 18 Jul 2017 05:45:28 PM EDT | GPUGRID | Sending scheduler request: Requested by user.
Tue 18 Jul 2017 05:45:28 PM EDT | GPUGRID | Requesting new tasks for CPU and NVIDIA GPU
Tue 18 Jul 2017 05:45:28 PM EDT | GPUGRID | [sched_op] CPU work request: 216900.00 seconds; 5.00 devices
Tue 18 Jul 2017 05:45:28 PM EDT | GPUGRID | [sched_op] NVIDIA GPU work request: 43380.00 seconds; 1.00 devices
Tue 18 Jul 2017 05:45:30 PM EDT | GPUGRID | Scheduler request completed: got 0 new tasks
Tue 18 Jul 2017 05:45:30 PM EDT | GPUGRID | [sched_op] Server version 613
Tue 18 Jul 2017 05:45:30 PM EDT | GPUGRID | No tasks sent
Tue 18 Jul 2017 05:45:30 PM EDT | GPUGRID | Project requested delay of 31 seconds
Tue 18 Jul 2017 05:45:30 PM EDT | GPUGRID | [sched_op] Deferring communication for 00:00:31
Tue 18 Jul 2017 05:45:30 PM EDT | GPUGRID | [sched_op] Reason: requested by project
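The event-log lines above share a `timestamp | project | message` layout, so a short script can summarise how much work was requested and how many tasks came back. What follows is only a sketch in Python: the names `parse_line` and `summarise` are made up for illustration, and the line layout is assumed from the excerpt above rather than taken from BOINC documentation.

```python
import re

# Layout assumed from the pasted log: "<timestamp> | <project> | <message>".
def parse_line(line):
    """Split one event-log line into (timestamp, project, message)."""
    parts = [p.strip() for p in line.split("|", 2)]
    if len(parts) != 3:
        return None  # not an event-log line
    return tuple(parts)

REQUEST_RE = re.compile(
    r"\[sched_op\] (.+) work request: ([\d.]+) seconds; ([\d.]+) devices"
)
GOT_RE = re.compile(r"Scheduler request completed: got (\d+) new tasks")

def summarise(lines, project="GPUGRID"):
    """Return ({resource: requested_seconds}, tasks_received) for one project."""
    requested = {}
    tasks = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None or parsed[1] != project:
            continue
        message = parsed[2]
        match = REQUEST_RE.search(message)
        if match:
            requested[match.group(1)] = float(match.group(2))
        match = GOT_RE.search(message)
        if match:
            tasks = int(match.group(1))
    return requested, tasks

# Four lines copied from the log above.
sample = [
    "Tue 18 Jul 2017 05:45:28 PM EDT | GPUGRID | [sched_op] CPU work request: 216900.00 seconds; 5.00 devices",
    "Tue 18 Jul 2017 05:45:28 PM EDT | GPUGRID | [sched_op] NVIDIA GPU work request: 43380.00 seconds; 1.00 devices",
    "Tue 18 Jul 2017 05:45:30 PM EDT | GPUGRID | Scheduler request completed: got 0 new tasks",
    "Tue 18 Jul 2017 05:45:30 PM EDT | GPUGRID | No tasks sent",
]

requested, tasks = summarise(sample)
print(requested, tasks)
```

Run against a longer log, this makes it easy to confirm that every request asked for a non-zero amount of GPU work and still received zero tasks.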


Local cc_config and client version

calculus:~ # cat /var/lib/boinc/cc_config.xml
<cc_config>
<log_flags>
<coproc_debug>1</coproc_debug>
<sched_op_debug>1</sched_op_debug>
</log_flags>
</cc_config>

calculus:~ # rpm -qa | grep -i boinc
boinc-client-7.6.33-2.2.x86_64
boinc-client-lang-7.6.33-2.2.noarch
boinc-manager-7.6.33-2.2.x86_64
boinc-manager-lang-7.6.33-2.2.noarch
libboinc7-7.6.33-2.2.x86_64


Another odd thing: my project stats show one and only one task (from Jan 2016) and nothing credited since, which is not correct.

http://www.gpugrid.net/results.php?userid=63993

Any hints would be appreciated.

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 6,169
Message 47633 - Posted: 18 Jul 2017 | 22:22:16 UTC - in response to Message 47632.

Exit the BOINC manager, choosing to stop the running scientific applications as well, and then update your NVIDIA driver.

Paracelsus
Joined: 11 Aug 10
Posts: 11
Credit: 21,424,870
RAC: 0
Message 47634 - Posted: 18 Jul 2017 | 23:27:37 UTC

The installed driver is the latest NVIDIA long-lived branch release (375.66).

Is it necessary to move to the short-lived branch (381.22)?

calculus:~ # nvidia-smi
Tue Jul 18 19:22:34 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.66                 Driver Version: 375.66                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 650...  Off  | 0000:01:00.0     N/A |                  N/A |
| 30%   39C    P8    N/A /  N/A |    316MiB /   975MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

Paracelsus
Joined: 11 Aug 10
Posts: 11
Credit: 21,424,870
RAC: 0
Message 47635 - Posted: 19 Jul 2017 | 11:30:29 UTC

I updated the proprietary NVIDIA driver to the latest short-lived branch release (381.22) but am still not getting jobs.

calculus:/home/paracelsus # nvidia-smi
Wed Jul 19 07:22:22 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 381.22                 Driver Version: 381.22                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 650...  Off  | 0000:01:00.0     N/A |                  N/A |
| 32%   43C    P8    N/A /  N/A |    177MiB /   975MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+


Are there other appropriate cc_config.xml debug flags that can be set to help determine why? I'm not seeing the reason in the log so far.

Requests for new tasks always result in the following, even when more than 2000 tasks are available:

Wed 19 Jul 2017 07:25:33 AM EDT | GPUGRID | update requested by user
Wed 19 Jul 2017 07:25:36 AM EDT | GPUGRID | sched RPC pending: Requested by user
Wed 19 Jul 2017 07:25:36 AM EDT | GPUGRID | [sched_op] Starting scheduler request
Wed 19 Jul 2017 07:25:37 AM EDT | GPUGRID | Sending scheduler request: Requested by user.
Wed 19 Jul 2017 07:25:37 AM EDT | GPUGRID | Requesting new tasks for NVIDIA GPU
Wed 19 Jul 2017 07:25:37 AM EDT | GPUGRID | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
Wed 19 Jul 2017 07:25:37 AM EDT | GPUGRID | [sched_op] NVIDIA GPU work request: 43380.00 seconds; 1.00 devices
Wed 19 Jul 2017 07:25:38 AM EDT | GPUGRID | Scheduler request completed: got 0 new tasks
Wed 19 Jul 2017 07:25:38 AM EDT | GPUGRID | [sched_op] Server version 613
Wed 19 Jul 2017 07:25:38 AM EDT | GPUGRID | No tasks sent
Wed 19 Jul 2017 07:25:38 AM EDT | GPUGRID | Project requested delay of 31 seconds
Wed 19 Jul 2017 07:25:38 AM EDT | GPUGRID | [sched_op] Deferring communication for 00:00:31
Wed 19 Jul 2017 07:25:38 AM EDT | GPUGRID | [sched_op] Reason: requested by project


Thanks for any tips, I'd like to keep contributing to the project.




wabr101
Joined: 3 Feb 12
Posts: 4
Credit: 196,595,724
RAC: 0
Message 47637 - Posted: 20 Jul 2017 | 19:17:32 UTC - in response to Message 47635.

I had the same problem: the client kept deferring communication and got no tasks even though they were available.

Paracelsus
Joined: 11 Aug 10
Posts: 11
Credit: 21,424,870
RAC: 0
Message 47647 - Posted: 22 Jul 2017 | 23:15:04 UTC
Last modified: 22 Jul 2017 | 23:42:43 UTC

I've enabled other debug flags but am unable to determine why no work units have been received since early July.

I've enjoyed contributing to this project, particularly seeing the resultant publications, and would like to continue. Of course there is WCG, but I'd like to continue with GPUGrid as well.

Other things I've checked:

Updated client and manager to boinc-client-7.6.33-2.2.x86_64 (Suse Tumbleweed repos)

Ensured the required 32-bit libraries were installed
http://boinc.berkeley.edu/wiki/Installing_on_Linux

Turned on additional debugging in /var/lib/boinc/cc_config.xml

Verified client_state.xml values look okay
https://boinc.berkeley.edu/dev/forum_thread.php?id=10936

Checked client logs in ~/.BOINC

Any other suggestions for how to troubleshoot would be appreciated.


Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [work_fetch] set_request() for CPU: ninst 5 nused_total 0.00 nidle_now 5.00 fetch share 1.00 req_inst 5.00 req_secs 216900.00
Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [work_fetch] set_request() for NVIDIA GPU: ninst 1 nused_total 0.00 nidle_now 1.00 fetch share 1.00 req_inst 1.00 req_secs 43380.00
Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [sched_op] Starting scheduler request
Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [work_fetch] request: CPU (216900.00 sec, 5.00 inst) NVIDIA GPU (43380.00 sec, 1.00 inst)
Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | Sending scheduler request: To fetch work.
Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | Requesting new tasks for CPU and NVIDIA GPU
Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [sched_op] CPU work request: 216900.00 seconds; 5.00 devices
Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [sched_op] NVIDIA GPU work request: 43380.00 seconds; 1.00 devices
Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [http] HTTP_OP::init_post(): http://www.ps3grid.net/PS3GRID_cgi/cgi
Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [http] [ID#1] Info: Connection 4 seems to be dead!
Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [http] [ID#1] Info: Closing connection 4
Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [http] [ID#1] Info: Connection 5 seems to be dead!
Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [http] [ID#1] Info: Closing connection 5
Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [http] [ID#1] Info: Connection 7 seems to be dead!
Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [http] [ID#1] Info: Closing connection 7
Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [http] [ID#1] Info: Connection 8 seems to be dead!
Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [http] [ID#1] Info: Closing connection 8
Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [http] [ID#1] Info: Hostname www.ps3grid.net was found in DNS cache
Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [http] [ID#1] Info: Trying 84.89.134.145...
Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [http] [ID#1] Info: TCP_NODELAY set
Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [http] [ID#1] Info: Connected to www.ps3grid.net (84.89.134.145) port 80 (#9)
Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [http] [ID#1] Sent header to server: POST /PS3GRID_cgi/cgi HTTP/1.1
Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [http] [ID#1] Sent header to server: Host: www.ps3grid.net
Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [http] [ID#1] Sent header to server: User-Agent: BOINC client (x86_64-pc-linux-gnu 7.6.33)
Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [http] [ID#1] Sent header to server: Accept: */*
Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [http] [ID#1] Sent header to server: Accept-Encoding: deflate, gzip
Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [http] [ID#1] Sent header to server: Content-Type: application/x-www-form-urlencoded
Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [http] [ID#1] Sent header to server: Accept-Language: en_US
Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [http] [ID#1] Sent header to server: Content-Length: 10407
Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [http] [ID#1] Sent header to server: Expect: 100-continue
Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [http] [ID#1] Sent header to server:
Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [http] [ID#1] Sent header to server: urlencoded
Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [http] [ID#1] Sent header to server: Accept-Language: en_US
Sat 22 Jul 2017 07:07:31 PM EDT | GPUGRID | [http] [ID#1] Sent header to server:
a
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: HTTP/1.1 100 Continue
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Info: We are completely uploaded and fine
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: HTTP/1.1 200 OK
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: Date: Sat, 22 Jul 2017 23:07:32 GMT
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: Server: Apache/2.4.6 (CentOS) OpenSSL/1.0.1e-fips mod_auth_gssapi/1.3.1 mod_auth_kerb/5.4 mod_fcgid/2.3.9 PHP/5.4.16 mod_wsgi/3.4 Python/2.7.5
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: Transfer-Encoding: chunked
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: Content-Type: text/xml
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server:
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: fe8
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <scheduler_reply>
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <scheduler_version>613</scheduler_version>
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <master_url>http://www.gpugrid.net/</master_url>
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <request_delay>31.000000</request_delay>
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <message priority="low">No tasks sent</message>
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <project_name>GPUGRID</project_name>
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <next_rpc_delay>3600.000000</next_rpc_delay>
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <userid>63993</userid>
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <user_name>Paracelsus</user_name>
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <user_total_credit>21424870.080154</user_total_credit>
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <user_expavg_credit>7855.265631</user_expavg_credit>
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <user_create_time>1281536516</user_create_time>
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <email_hash>5aab033e6a675cbde84a2d225a74a6a8</email_hash>
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <cross_project_id>9145123c6a8f6eb97a39746d118e87f2</cross_project_id>
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <host_total_credit>20486475.000000</host_total_credit>
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <host_expavg_credit>7833.474464</host_expavg_credit>
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <host_venue></host_venue>
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <host_create_time>1444494370</host_create_time>
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <team_name></team_name>
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <no_cpu_apps>0</no_cpu_apps>
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <no_cuda_apps>0</no_cuda_apps>
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <no_ati_apps>1</no_ati_apps>
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <gui_urls>
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <gui_url>
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <name>Your account</name>
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <description>View your account information and credit totals</description>
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <url>http://www.gpugrid.net/show_user.php?userid=63993</url>
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: </gui_url>
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server:
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <gui_url>
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <name>Your results</name>
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <description>Your recently completed tasks</description>
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <url>http://www.gpugrid.net/results.php?userid=63993</url>
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: </gui_url>
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <gui_url>
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <name>Server state</name>
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <description>Status of GPUGRID's server</description>
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <url>http://www.gpugrid.net/server_status.php</url>
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: </gui_url>
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <gui_url>
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <name>Science</name>
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <description>Small contributions, great causes.</description>
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <url>http://www.gpugrid.net/science.php</url>
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: </gui_url>
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <gui_url>
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <name>Donate</name>
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <description>Thank you for considering a donation to GPUGRID</description>
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <url>http://www.gpugrid.net/gpugrid_donations.php</url>
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: </gui_url>
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <gui_url>
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <name>Forum / Help</name>
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server: <description>Questions, support and discussions</description>
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Received header from server:
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [http] [ID#1] Info: Connection #9 to host www.ps3grid.net left intact
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | Scheduler request completed: got 0 new tasks
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [sched_op] Server version 613
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | No tasks sent
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | Project requested delay of 31 seconds
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [work_fetch] backing off CPU 870 sec
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [work_fetch] backing off NVIDIA GPU 301 sec
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [sched_op] Deferring communication for 00:00:31
Sat 22 Jul 2017 07:07:32 PM EDT | GPUGRID | [sched_op] Reason: requested by project
Sat 22 Jul 2017 07:07:32 PM EDT | | [work_fetch] Request work fetch: RPC complete
Sat 22 Jul 2017 07:07:37 PM EDT | | [work_fetch] ------- start work fetch state -------
Sat 22 Jul 2017 07:07:37 PM EDT | | [work_fetch] target work buffer: 180.00 + 43200.00 sec
Sat 22 Jul 2017 07:07:37 PM EDT | | [work_fetch] --- project states ---
Sat 22 Jul 2017 07:07:37 PM EDT | climateprediction.net | [work_fetch] REC 744.244 prio -0.768 can't request work: suspended via Manager
Sat 22 Jul 2017 07:07:37 PM EDT | GPUGRID | [work_fetch] REC 5838.797 prio 0.000 can't request work: scheduler RPC backoff (25.92 sec)

Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Message 47648 - Posted: 22 Jul 2017 | 23:40:40 UTC - in response to Message 47647.
Last modified: 22 Jul 2017 | 23:41:51 UTC

Have you tried the 384.47 Linux driver?

http://www.nvidia.com
> Drivers
> All NVIDIA Drivers
> Beta and Older Drivers
> Recommended/Beta: All

http://www.nvidia.com/Download/Find.aspx?lang=en-us

Paracelsus
Joined: 11 Aug 10
Posts: 11
Credit: 21,424,870
RAC: 0
Message 47649 - Posted: 23 Jul 2017 | 0:29:47 UTC - in response to Message 47648.

Hi Jacob,

Unfortunately, updating the Nvidia driver to 384.47 didn't improve things.

Cheers,
Pete

Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Message 47650 - Posted: 23 Jul 2017 | 0:45:55 UTC - in response to Message 47649.

It seems to be an issue with the server software. Might be impossible to troubleshoot further, without a GPUGRID dev/admin to look into it. Sorry.

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 6,169
Message 47653 - Posted: 23 Jul 2017 | 8:03:59 UTC

Perhaps the server has blacklisted your host permanently. To fix this, try forcing the BOINC manager to request a new host ID for your host: stop the BOINC manager, edit client_state.xml, search for <hostid>260678</hostid>, replace the number with the host ID of one of your previous hosts (or a random number, if you don't have an older host), save client_state.xml, and restart the BOINC manager. It may not work the first time, so you might need to try this a couple of times.
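For anyone who prefers to script the edit rather than use a text editor, here is a minimal sketch in Python. It works on the file contents as a string; the function name `replace_hostid` is hypothetical, the sample XML is a cut-down stand-in for a real client_state.xml, and the new ID 94323 is just an example value. BOINC must be fully stopped before the real file is touched.

```python
def replace_hostid(state_xml, old_hostid, new_hostid):
    """Return state_xml with <hostid>old</hostid> swapped for the new ID."""
    old_tag = f"<hostid>{int(old_hostid)}</hostid>"
    new_tag = f"<hostid>{int(new_hostid)}</hostid>"
    if old_tag not in state_xml:
        raise ValueError(f"{old_tag} not found; nothing changed")
    # Only the exact old ID is replaced, so host IDs that other
    # projects store in the same file are left alone.
    return state_xml.replace(old_tag, new_tag)

# Cut-down stand-in for the GPUGRID section of client_state.xml (assumed shape).
sample = """<project>
    <master_url>http://www.gpugrid.net/</master_url>
    <hostid>260678</hostid>
</project>
"""

edited = replace_hostid(sample, 260678, 94323)
print(edited)

# To apply it to a real file, read the text, call replace_hostid, and
# write the result back, keeping a backup copy of the original first.
```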

Paracelsus
Joined: 11 Aug 10
Posts: 11
Credit: 21,424,870
RAC: 0
Message 47668 - Posted: 23 Jul 2017 | 15:47:04 UTC - in response to Message 47653.
Last modified: 23 Jul 2017 | 15:48:41 UTC

Thanks for the suggestion on changing hostid in client_state.xml - and for the tip on stopping boinc manager (and verifying with ps) as otherwise the value reverts.

I replaced the old hostid (260678) and re-used an older hostid (94323), which has propagated up and is reflected as the currently active host at

http://www.gpugrid.net/hosts_user.php?sort=rpc_time&rev=0&show_all=1&userid=63993

Still no new tasks are received.

Is the matter now perhaps one of scheduling priorities with other projects? When suspending all other projects and updating GPUGRID, the log shows

Sun 23 Jul 2017 11:47:59 AM EDT | GPUGRID | [prio] recent est credit: 0.00G in 60.23 sec, 5564.327652 + -0.268874 ->5564.058778
Sun 23 Jul 2017 11:48:15 AM EDT | GPUGRID | [prio] -1.000000 rsf 1.000000 rt 5564.058778 rs 5564.058778
Sun 23 Jul 2017 11:48:15 AM EDT | | [work_fetch] ------- start work fetch state -------
Sun 23 Jul 2017 11:48:15 AM EDT | | [work_fetch] target work buffer: 17280.00 + 25920.00 sec
Sun 23 Jul 2017 11:48:15 AM EDT | | [work_fetch] --- project states ---
Sun 23 Jul 2017 11:48:15 AM EDT | GPUGRID | [work_fetch] REC 5564.059 prio -1.000 can request work
Sun 23 Jul 2017 11:48:15 AM EDT | | [work_fetch] --- state for CPU ---
Sun 23 Jul 2017 11:48:15 AM EDT | | [work_fetch] shortfall 345600.00 nidle 8.00 saturated 0.00 busy 0.00
Sun 23 Jul 2017 11:48:15 AM EDT | GPUGRID | [work_fetch] share 0.000 project is backed off (resource backoff: 114.76, inc 600.00)
Sun 23 Jul 2017 11:48:15 AM EDT | | [work_fetch] --- state for NVIDIA GPU ---
Sun 23 Jul 2017 11:48:15 AM EDT | | [work_fetch] shortfall 43200.00 nidle 1.00 saturated 0.00 busy 0.00
Sun 23 Jul 2017 11:48:15 AM EDT | GPUGRID | [work_fetch] share 0.000 project is backed off (resource backoff: 261.47, inc 1200.00)
Sun 23 Jul 2017 11:48:15 AM EDT | | [work_fetch] ------- end work fetch state -------
Sun 23 Jul 2017 11:48:15 AM EDT | | [work_fetch] No project chosen for work fetch

Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Message 47670 - Posted: 23 Jul 2017 | 16:29:11 UTC
Last modified: 23 Jul 2017 | 16:39:16 UTC

If you're asking how to read the work_fetch_debug, here goes. Pay attention.

Sun 23 Jul 2017 11:48:15 AM EDT | | [work_fetch] ------- start work fetch state -------

A "work fetch iteration" has happened. This usually happens every few seconds, but can also happen when the user has changed something like Suspend/Resume, No-New-Work, Update-click, etc.

Sun 23 Jul 2017 11:48:15 AM EDT | | [work_fetch] target work buffer: 17280.00 + 25920.00 sec

Your current buffer settings are: Maintain at least 17280 seconds (0.2 days) of work for all resources, and when asking for work optionally ask for an additional 25920 seconds (0.3 days).

Sun 23 Jul 2017 11:48:15 AM EDT | | [work_fetch] --- project states ---
Sun 23 Jul 2017 11:48:15 AM EDT | GPUGRID | [work_fetch] REC 5564.059 prio -1.000 can request work

- "can request work" means that you are not actively setting suspend or no-new-tasks.
- In this case, there is not a "project backoff" (which a project could request after you contact it).

Sun 23 Jul 2017 11:48:15 AM EDT | | [work_fetch] --- state for CPU ---
Sun 23 Jul 2017 11:48:15 AM EDT | | [work_fetch] shortfall 345600.00 nidle 8.00 saturated 0.00 busy 0.00
Sun 23 Jul 2017 11:48:15 AM EDT | GPUGRID | [work_fetch] share 0.000 project is backed off (resource backoff: 114.76, inc 600.00)

- "shortfall" is 345600 seconds. This is (17280 + 25920) * 8 CPUs. Basically, all your CPUs don't have any work. In fact, "nidle" (number idle), is 8, meaning all 8 cpu resources are currently idle. WE NEED CPU WORK!
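The arithmetic can be checked with a few lines of Python. This is a sketch that only covers the situation in this log, where every instance is idle; when work is already queued the client's own calculation will differ, and the function name `idle_shortfall` is made up for illustration.

```python
SECONDS_PER_DAY = 86400

def idle_shortfall(min_buffer_s, extra_buffer_s, idle_instances):
    """Shortfall in seconds when all instances of a resource are idle."""
    return (min_buffer_s + extra_buffer_s) * idle_instances

MIN_BUFFER_S = 17280    # "target work buffer" first value
EXTRA_BUFFER_S = 25920  # "target work buffer" second value

cpu_shortfall = idle_shortfall(MIN_BUFFER_S, EXTRA_BUFFER_S, 8)
gpu_shortfall = idle_shortfall(MIN_BUFFER_S, EXTRA_BUFFER_S, 1)

print("CPU shortfall:", cpu_shortfall)  # compare with "shortfall 345600.00"
print("GPU shortfall:", gpu_shortfall)  # compare with "shortfall 43200.00"
print("buffer in days:", MIN_BUFFER_S / SECONDS_PER_DAY,
      "+", EXTRA_BUFFER_S / SECONDS_PER_DAY)
```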

Sun 23 Jul 2017 11:48:15 AM EDT | | [work_fetch] --- state for NVIDIA GPU ---
Sun 23 Jul 2017 11:48:15 AM EDT | | [work_fetch] shortfall 43200.00 nidle 1.00 saturated 0.00 busy 0.00
Sun 23 Jul 2017 11:48:15 AM EDT | GPUGRID | [work_fetch] share 0.000 project is backed off (resource backoff: 261.47, inc 1200.00)

- "shortfall" is 43200 seconds. This is (17280 + 25920) * 1 NVIDIA GPU. "nidle" is 1. WE NEED GPU WORK!
- GPUGRID says "project is backed off (resource backoff: 261.47, inc 1200.00)". This is a RESOURCE BACKOFF. Because you didn't get work for this resource type (NVIDIA GPU) the last time you asked, your BOINC client backs off (stops asking this project for work for this resource type) for a time interval (261.47 seconds remaining). That interval can grow exponentially (1200 seconds on the next increment), up to 24 hours.
- Note: I believe clicking "Update" will clear any project backoffs or resource backoffs.
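To get a feel for how quickly such a backoff grows, here is a small Python sketch. It assumes the interval starts at 600 seconds and doubles after each failed request, which is only inferred from the "inc 600.00" and "inc 1200.00" values in this log; the 24-hour cap is the one mentioned above. The real client may also randomize the wait, so treat the numbers as illustrative.

```python
MAX_BACKOFF_S = 24 * 3600  # 24-hour ceiling mentioned above

def next_interval(current_s, first_s=600):
    """Next backoff interval: start at first_s, then double up to the cap.

    The starting value and the doubling rule are assumptions read off
    the log, not taken from the BOINC source.
    """
    if current_s <= 0:
        return first_s
    return min(current_s * 2, MAX_BACKOFF_S)

def backoff_schedule(failures, first_s=600):
    """Intervals after each of `failures` consecutive empty replies."""
    schedule = []
    current = 0
    for _ in range(failures):
        current = next_interval(current, first_s)
        schedule.append(current)
    return schedule

intervals = backoff_schedule(9)
print(intervals)
```

Under these assumptions the cap is reached after nine consecutive empty replies, at which point the client would ask this project for that resource only about once a day unless the backoff is cleared.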

Sun 23 Jul 2017 11:48:15 AM EDT | | [work_fetch] ------- end work fetch state -------
Sun 23 Jul 2017 11:48:15 AM EDT | | [work_fetch] No project chosen for work fetch

No "request for work" for you. :) work_fetch_debug correctly decided: Do not ask GPUGrid for work.

Sorry this doesn't help solve your problem. But now you know a bit about reading work_fetch_debug.

Paracelsus
Joined: 11 Aug 10
Posts: 11
Credit: 21,424,870
RAC: 0
Message 47671 - Posted: 23 Jul 2017 | 16:55:11 UTC - in response to Message 47670.

Thanks Jacob!

I've been reading posts and the wiki, piecing together the responses and their meaning, but your post was one of the clearest I've seen.

I suspended other projects today while debugging (which may be exactly the wrong thing to do, as the priority scheduler may then never escalate GPUGRID?), hence no CPU or GPU tasks are running.

The Update button is being used but does not clear any project backoffs or resource backoffs, though again perhaps this is related to the scheduler and resource share tracking?

Would temporarily detaching from all other projects eliminate scheduling contention as a possible reason for the backoff, or is that not a correct route to pursue? (The odd thing, though, is that no changes were made in early July, when tasks stopped being received, but of course there may be more than one issue in the stack being resolved.)

Thank you everyone for the suggestions.

Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Message 47672 - Posted: 23 Jul 2017 | 17:09:43 UTC
Last modified: 23 Jul 2017 | 17:10:17 UTC

Clicking "Update", while project GPUGrid is highlighted, should clear its backoffs.

If you don't mind losing in-progress work, or stats on your host, or setting up your BOINC environment again ..... then you might consider uninstalling BOINC, removing your data folder, then reinstalling BOINC.

As I said before, I think there must be a server-side bug that is preventing the server-side-scheduler from sending work to Linux clients. Unfortunately, GPUGrid admins haven't been offering much help to us peon users as of late.

w1hue
Joined: 28 Sep 09
Posts: 21
Credit: 338,642,011
RAC: 219,587
Message 47678 - Posted: 24 Jul 2017 | 4:03:44 UTC

I have also been having trouble getting tasks for my Win7 machine: I keep getting "no tasks available" when the server status shows that plenty are available. This has been going on for several weeks. Now and then I will actually get a task, but usually not.

It has also been happening with my XP machine, but not as often. (No remarks about XP please . . .)

Paracelsus
Joined: 11 Aug 10
Posts: 11
Credit: 21,424,870
RAC: 0
Message 47687 - Posted: 24 Jul 2017 | 23:40:14 UTC

So long GPUGrid...

I've enjoyed contributing to this project (off and on since 2010) but the amount of debug time, just to get tasks, is far too high - and the periods of no tasks far too long. Other BOINC projects run fine, with no feeder issues and rare server issues.

Several years ago I ran Folding at Home on a PS3, but never checked out their Linux client. A quick visit to their site and minutes later I'm crunching long runs on my GPU. It was so easy.

I might give GPUGrid another shot in the future, but for now I'm glad to be able to contribute once again to another GPU cancer research project.

I'm glad there are many users who are able to contribute to GPUGrid, but happy I decided to look again at F@H.

Thanks to everyone who took the time to answer my questions.

Cheers,

Stefan
Project administrator
Project developer
Project tester
Project scientist
Joined: 5 Mar 13
Posts: 348
Credit: 0
RAC: 0
Message 47755 - Posted: 8 Aug 2017 | 22:08:16 UTC - in response to Message 47687.

Thanks as well for crunching for us. Sorry we can't help you right now but we are aware of the problems with the current implementation. Hopefully in the near future we will find time to improve on it.

sis651
Joined: 25 Nov 13
Posts: 66
Credit: 193,925,538
RAC: 0
Message 47764 - Posted: 9 Aug 2017 | 20:28:49 UTC - in response to Message 47687.

I just wish F@H were part of the BOINC environment.

wolfman1360
Joined: 19 Feb 17
Posts: 5
Credit: 36,563,552
RAC: 0
Message 47772 - Posted: 12 Aug 2017 | 14:38:10 UTC

Hi,
I'm having the exact same problem here, running a Windows 10 laptop with a GTX 670M. I haven't received new tasks from this project for weeks, either for CPU or GPU, despite 1000+ being available. All other projects crunch on the GPU just fine, and I haven't changed anything on the machine. I'm glad I'm not the only one having this problem, but I must say it is pretty frustrating - I think it may be time for me to look elsewhere, too, until this gets resolved.
Here's hoping it's soon!
