Message boards : Server and website : Unable to get WUs

Author Message
liderbug
Joined: 29 Jul 16
Posts: 22
Credit: 57,673,885
RAC: 0
Message 47652 - Posted: 23 Jul 2017 | 1:28:40 UTC

Since the server change some time back I haven't received any WUs.
Also "Your app_config.xml file refers to an unknown application 'acemdbeta'. Known applications: None" and acemdlong and acemdshort.

Is there something I need to change on my end? Update the server (how)? Remove the app from boincmgr and re-add it?
Thanks

Jacob Klein
Joined: 11 Oct 08
Posts: 1049
Credit: 1,061,410,614
RAC: 825,487
Message 47655 - Posted: 23 Jul 2017 | 11:15:28 UTC - in response to Message 47652.
Last modified: 23 Jul 2017 | 11:18:29 UTC

I don't know why you're not getting work, sorry. I'm betting something is wrong with the server scheduler software setup, for Linux clients.

You should try running the latest driver, from:
http://www.nvidia.com/Download/Find.aspx?lang=en-us

Note: If you haven't yet received apps of those names, then it's normal for an app_config.xml to say "unknown application". You may safely ignore those warnings.

Erich56
Joined: 1 Jan 15
Posts: 346
Credit: 1,454,527,327
RAC: 2,688,891
Message 47669 - Posted: 23 Jul 2017 | 15:50:58 UTC - in response to Message 47652.

Also "Your app_config.xml file refers to an unknown application 'acemdbeta'. Known applications: None" and acemdlong and acemdshort.

This is normal behaviour; just ignore it.

liderbug
Joined: 29 Jul 16
Posts: 22
Credit: 57,673,885
RAC: 0
Message 47674 - Posted: 23 Jul 2017 | 21:24:20 UTC - in response to Message 47655.
Last modified: 23 Jul 2017 | 21:28:48 UTC

My Nvidia was 367.57, now 381.22 (l&g). Restarted and nope |-(
No tasks.


Question: app_config.xml - is it required to have an entry for, say, acemdshort, to get WUs? What if app_config.xml were empty or not even there? If gpugrid were having a problem with Linux boxes I'd think I'd be the last to notice (then again...). Is it possible my xml file has a problem?


<app_config>
  <app>
    <name>acemdshort</name> <!-- and likewise for acemdlong and acemdbeta -->
    <max_concurrent>4</max_concurrent>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>0.499</cpu_usage>
    </gpu_versions>
  </app>
</app_config>

although the exact same format works for Einstein <sigh>. Are acemdshort/long/beta the only 'name' entries?

Twisty little passages all alike. You're at Witt's End.
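
One way to rule out a malformed app_config.xml is to run it through an XML parser. The sketch below is only an illustration and not part of BOINC: list_app_names is a hypothetical helper, and SAMPLE is an assumed cleaned-up copy of the file quoted above. Note that a literal remark such as "(and long & beta)" left inside the real file would make it invalid, because a bare & is not allowed in XML text.

```python
import xml.etree.ElementTree as ET


def list_app_names(xml_text):
    """Return the <name> of every <app> in an app_config.xml string.

    Hypothetical helper for illustration only. Raises ET.ParseError if
    the text is not well-formed XML.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "app_config":
        raise ValueError("root element is <%s>, expected <app_config>" % root.tag)
    # A missing <name> is reported as an empty string rather than None.
    return [(app.findtext("name") or "").strip() for app in root.findall("app")]


# Assumed sample: the structure from the post, without the inline remark.
SAMPLE = """
<app_config>
  <app>
    <name>acemdshort</name>
    <max_concurrent>4</max_concurrent>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>0.499</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
"""

if __name__ == "__main__":
    print(list_app_names(SAMPLE))
```

If the parser accepts the file and the names match what the project lists on its applications page, the file itself is unlikely to be the problem.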

Richard Haselgrove
Joined: 11 Jul 09
Posts: 775
Credit: 1,314,702,970
RAC: 1,418,329
Message 47675 - Posted: 23 Jul 2017 | 21:54:46 UTC - in response to Message 47674.

I'm pretty certain that app_config.xml won't be the source of your problem. app_config modifies the behaviour of tasks after you've received them, but it isn't supposed to modify the way that work is requested or allocated, and I've never seen any evidence that it does.

One important thing to check is that you have set your GPUGRID preferences properly for the venue(s) you've assigned your computer(s) to.

It's a long time since I've fiddled in that area, but I think the minimum requirements are:

Use NVIDIA GPU: yes
ACEMD long runs: yes (no short runs or beta work currently)
Use Graphics Processing Unit (GPU) if available: yes

And you need to be sure that your GPU is detected at startup as a CUDA-capable device - that information needs to have been recorded in the opening lines of the Event Log. Having OpenCL detection too is nice, but not needed for this project.
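
The CUDA detection check described above can be done by eye, but a small script makes it repeatable. This is a minimal sketch, assuming the event log has been saved as plain text and that the detection line has the shape of the one posted later in this thread. The names CUDA_LINE and find_cuda_gpus are made up for the example.

```python
import re

# Matches the client's CUDA detection line. The pattern is an assumption
# based on the log excerpt posted in this thread.
CUDA_LINE = re.compile(
    r"CUDA: NVIDIA GPU (?P<index>\d+): (?P<model>.+?) "
    r"\(driver version (?P<driver>[\d.]+), "
    r"CUDA version (?P<cuda>[\d.]+), "
    r"compute capability (?P<cc>[\d.]+)"
)


def find_cuda_gpus(log_text):
    """Return one dict per CUDA device reported in the event log text."""
    return [m.groupdict() for m in CUDA_LINE.finditer(log_text)]


SAMPLE_LOG = (
    "24-Jul-2017 15:22:33 [---] CUDA: NVIDIA GPU 0: GeForce GTX 770 "
    "(driver version 381.22, CUDA version 8.0, compute capability 3.0, "
    "4031MB, 3780MB available, 3653 GFLOPS peak)\n"
    "24-Jul-2017 15:22:33 [---] OpenCL: NVIDIA GPU 0: GeForce GTX 770 "
    "(driver version 381.22, device version OpenCL 1.2 CUDA, 4031MB, "
    "3780MB available, 3653 GFLOPS peak)\n"
)

if __name__ == "__main__":
    for gpu in find_cuda_gpus(SAMPLE_LOG):
        print(gpu["model"], "driver", gpu["driver"], "cc", gpu["cc"])
```

An empty result would mean the client did not report a CUDA device at startup. The OpenCL line is deliberately not matched.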

liderbug
Joined: 29 Jul 16
Posts: 22
Credit: 57,673,885
RAC: 0
Message 47683 - Posted: 24 Jul 2017 | 17:10:02 UTC - in response to Message 47675.


Use NVIDIA GPU: yes
ACEMD long runs: yes (no short runs or beta work currently)
Use Graphics Processing Unit (GPU) if available: yes

All there. And I have 2 other projects using GPU.
Question: If I were to "remove" gpugrid and then re-add?

tks

Richard Haselgrove
Joined: 11 Jul 09
Posts: 775
Credit: 1,314,702,970
RAC: 1,418,329
Message 47684 - Posted: 24 Jul 2017 | 17:48:41 UTC - in response to Message 47683.

Nothing to be lost by trying, if you have nothing. But it's best to try and work out what's the cause, what's the effect.

I don't think I've seen any start-up messages from your event log yet?

liderbug
Joined: 29 Jul 16
Posts: 22
Credit: 57,673,885
RAC: 0
Message 47685 - Posted: 24 Jul 2017 | 21:25:39 UTC - in response to Message 47684.

ls -l job_log_www.gpugrid.net.txt
-rw-r--r--. 1 boinc boinc 80135 Apr 16 02:50 job_log_www.gpugrid.net.txt

1469895773 ue 49826.670670 ct 15416.700000 fe 5000000000000000 nm e28s8_e23s12p0f34-GERARD_CXCL12VOLKDIM_12998741_1-0-1-RND4083_0 et 80564.874344 es 0
1469912950 ue 49826.670670 ct 15712.970000 fe 5000000000000000 nm e30s18_e1s37p0f294-GERARD_CXCL12VOLKDIM_12998741_2-0-1-RND4077_0 et 79586.773434 es 0
1470006079 ue 49874.687629 ct 14010.640000 fe 5000000000000000 nm e30s8_e29s2p0f67-GERARD_CXCL12VOLKDIM_4181455_2-0-1-RND2161_0 et 84578.133858 es 0<snip>

----------------
24-Jul-2017 15:22:33 [---] Starting BOINC client version 7.6.22 for x86_64-pc-linux-gnu
24-Jul-2017 15:22:33 [---] log flags: file_xfer, sched_ops, task, cpu_sched
24-Jul-2017 15:22:33 [---] Libraries: libcurl/7.40.0 NSS/3.21 Basic ECC zlib/1.2.8 libidn/1.32 libssh2/1.5.0
24-Jul-2017 15:22:33 [---] Running as a daemon
24-Jul-2017 15:22:33 [---] Data directory: /var/lib/boinc
24-Jul-2017 15:22:33 [---] CUDA: NVIDIA GPU 0: GeForce GTX 770 (driver version 381.22, CUDA version 8.0, compute capability 3.0, 4031MB, 3780MB available, 3653 GFLOPS peak)
24-Jul-2017 15:22:33 [---] OpenCL: NVIDIA GPU 0: GeForce GTX 770 (driver version 381.22, device version OpenCL 1.2 CUDA, 4031MB, 3780MB available, 3653 GFLOPS peak)
24-Jul-2017 15:22:33 [---] Host name: lightning
24-Jul-2017 15:22:33 [---] Processor: 6 AuthenticAMD AMD Phenom(tm) II X6 1090T Processor [Family 16 Model 10 Stepping 0]
24-Jul-2017 15:22:33 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt cpb hw_pstate npt lbrv svm_lock nrip_save pausefilter vmmcall
24-Jul-2017 15:22:33 [---] OS: Linux: 4.4.14-200.fc22.x86_64
24-Jul-2017 15:22:33 [---] Memory: 19.61 GB physical, 3.87 GB virtual
24-Jul-2017 15:22:33 [---] Disk: 49.09 GB total, 11.90 GB free
24-Jul-2017 15:22:33 [---] Local time is UTC -6 hours
24-Jul-2017 15:22:33 [Einstein@Home] Found app_config.xml
24-Jul-2017 15:22:33 [GPUGRID] Found app_config.xml
24-Jul-2017 15:22:33 [GPUGRID] Your app_config.xml file refers to an unknown application 'acemdshort'. Known applications: None
24-Jul-2017 15:22:33 [Rosetta@home] Found app_config.xml
24-Jul-2017 15:22:33 [SETI@home] Found app_config.xml
24-Jul-2017 15:22:33 [World Community Grid] Found app_config.xml
24-Jul-2017 15:22:33 [---] Config: GUI RPC allowed from any host
24-Jul-2017 15:22:33 [---] Config: GUI RPCs allowed from:
24-Jul-2017 15:22:33 [---] 192.168.0.6
24-Jul-2017 15:22:33 [---] 192.168.0.7
24-Jul-2017 15:22:33 [---] 192.168.0.8
24-Jul-2017 15:22:33 [---] 192.168.0.99
24-Jul-2017 15:22:33 [---] Config: report completed tasks immediately
24-Jul-2017 15:22:33 [---] Config: use all coprocessors
24-Jul-2017 15:22:33 [Einstein@Home] URL http://einstein.phys.uwm.edu/; Computer ID 12277310; resource share 30
24-Jul-2017 15:22:33 [GPUGRID] URL http://www.gpugrid.net/; Computer ID 358985; resource share 100
24-Jul-2017 15:22:33 [Quake-Catcher Network] URL http://quakecatcher.net/sensor/; Computer ID 59873; resource share 100
24-Jul-2017 15:22:33 [Rosetta@home] URL http://boinc.bakerlab.org/rosetta/; Computer ID 1728788; resource share 200
24-Jul-2017 15:22:33 [SETI@home] URL http://setiathome.berkeley.edu/; Computer ID 8011539; resource share 200
24-Jul-2017 15:22:33 [World Community Grid] URL http://www.worldcommunitygrid.org/; Computer ID 3594037; resource share 100
24-Jul-2017 15:22:33 [World Community Grid] General prefs: from World Community Grid (last modified 19-Jun-2017 15:53:13)
24-Jul-2017 15:22:33 [World Community Grid] Host location: none
24-Jul-2017 15:22:33 [World Community Grid] General prefs: using your defaults
24-Jul-2017 15:22:33 [---] Reading preferences override file
24-Jul-2017 15:22:33 [---] Preferences:
24-Jul-2017 15:22:33 [---] max memory usage when active: 10039.31MB
24-Jul-2017 15:22:33 [---] max memory usage when idle: 18070.76MB
24-Jul-2017 15:22:33 [---] max disk usage: 13.08GB
24-Jul-2017 15:22:33 [---] max CPUs used: 4
24-Jul-2017 15:22:33 [---] don't compute while active
24-Jul-2017 15:22:33 [---] don't use GPU while active
24-Jul-2017 15:22:33 [---] suspend work if non-BOINC CPU load exceeds 75%
24-Jul-2017 15:22:33 [---] (to change preferences, visit a project web site or select Preferences in the Manager)
24-Jul-2017 15:22:33 [---] Suspending computation - initial delay
24-Jul-2017 15:22:34 [World Community Grid] [cpu_sched] Restarting task MCM1_0134794_9906_1 using mcm1 version 736 in slot 2
24-Jul-2017 15:22:34 [Einstein@Home] [cpu_sched] Restarting task LATeah0036L_1012.0_0_0.0_13473680_0 using hsgamma_FGRPB1G version 120 (FGRPopencl1K-nvidia) in slot 0
24-Jul-2017 15:22:34 [Einstein@Home] [cpu_sched] Restarting task LATeah0036L_1020.0_0_0.0_125500_1 using hsgamma_FGRPB1G version 120 (FGRPopencl1K-nvidia) in slot 1
24-Jul-2017 15:22:34 [World Community Grid] [cpu_sched] Restarting task ZIKA_000267231_x5gj4_NS2BNS3pr_ZIKV_A_EF_0414_1 using zika version 708 in slot 3
-------------
Please see if you can see anything in the above
tks

Richard Haselgrove
Joined: 11 Jul 09
Posts: 775
Credit: 1,314,702,970
RAC: 1,418,329
Message 47686 - Posted: 24 Jul 2017 | 22:08:01 UTC - in response to Message 47685.

Your job log (before the <snip>) shows that you last completed a GPUGrid task at Sun, 31 Jul 2016 23:01:19 GMT - almost a year ago. We probably need to see the end of that file, rather than the start.

The key line from the other log is probably

driver version 381.22, CUDA version 8.0, compute capability 3.0

Was CC 3.0 one of the ones that the new app had problems with? It's late for me in this time zone - I'll leave that question hanging.
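
The completion date quoted above comes from the first field of each job log line, which is a Unix timestamp. A minimal sketch of that conversion follows, assuming each line is a timestamp followed by alternating key/value tokens as in the excerpt posted in this thread. parse_job_log_line is a hypothetical helper name, and the meaning of the two-letter keys is not assumed here.

```python
from datetime import datetime, timezone


def parse_job_log_line(line):
    """Split one job log line into a dict (illustrative helper).

    The first token is treated as a Unix timestamp in seconds. The
    remaining tokens are read as alternating key/value pairs and kept
    as strings.
    """
    tokens = line.split()
    record = {
        "completed_utc": datetime.fromtimestamp(int(tokens[0]), tz=timezone.utc)
    }
    record.update(zip(tokens[1::2], tokens[2::2]))
    return record


LINE = (
    "1470006079 ue 49874.687629 ct 14010.640000 fe 5000000000000000 "
    "nm e30s8_e29s2p0f67-GERARD_CXCL12VOLKDIM_4181455_2-0-1-RND2161_0 "
    "et 84578.133858 es 0"
)

if __name__ == "__main__":
    rec = parse_job_log_line(LINE)
    # Prints the completion time in UTC: 2016-07-31 23:01:19
    print(rec["completed_utc"].strftime("%Y-%m-%d %H:%M:%S"), rec["nm"])
```

Running the same function over the last line of the file, rather than the first few, shows when the most recent task finished.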

liderbug
Joined: 29 Jul 16
Posts: 22
Credit: 57,673,885
RAC: 0
Message 47689 - Posted: 25 Jul 2017 | 22:11:16 UTC - in response to Message 47686.

tail job..log
1492091701 ue 38840.844523 ct 13110.690000 fe 5000000000000000 nm e17s9_e10s11p3f19-ADRIA_FAAH_FBP_0-0-4-RND6247_0 et 35346.739654 es 0
1492160233 ue 38805.903475 ct 16470.370000 fe 5000000000000000 nm e16s207_e9s37p0f109-PABLO_contact_goal_KIX_CMYB-1-4-RND9431_0 et 68530.181182 es 0
1492194987 ue 38805.903475 ct 11511.790000 fe 5000000000000000 nm e34s42_e26s4p0f404-PABLO_P01106_0_IDP-0-1-RND9775_0 et 34752.300236 es 0
1492332652 ue 38530.012474 ct 12372.740000 fe 5000000000000000 nm e69s20_e55s24p0f194-PABLO_P04637_1_IDP-0-1-RND6969_3 et 46490.001711 es 0
-----------------------------
ls -l job_log_www.gpugrid.net.txt
-rw-r--r--. 1 boinc boinc 80135 Apr 16 02:50 job_log_www.gpugrid.net.txt

And Apr 16 - isn't that about the time gpugrid changed servers?

Richard Haselgrove
Joined: 11 Jul 09
Posts: 775
Credit: 1,314,702,970
RAC: 1,418,329
Message 47690 - Posted: 25 Jul 2017 | 23:22:34 UTC - in response to Message 47689.
Last modified: 25 Jul 2017 | 23:23:11 UTC

Yes, last tasks completed Sun, 16 Apr 2017 08:50:52 GMT - the day before.

But look at the opening post in App update 17 April 2017:

The peculiar exception for sm 3.0 devices is due to a compiler problem with CUDA 80 that affects only that hardware version.


But that post is about Windows only, and the new v9.18 cuda80 app for Windows released that day.

For Linux, https://www.gpugrid.net/apps.php shows a cuda80 app v9.14 deployed 1 Nov 2016 | 21:27:32 UTC. You should have been running that already, and the April update shouldn't have affected you.

Unless:
a) They only noticed the cc 3.0 bug that day, and blocked it.
b) In blocking cc 3.0 for Windows, they inadvertently blocked Linux as well.

I can't help you distinguish between those cases - either would apply to the server only, and you would need an admin to come and help.

liderbug
Joined: 29 Jul 16
Posts: 22
Credit: 57,673,885
RAC: 0
Message 47691 - Posted: 26 Jul 2017 | 12:45:06 UTC - in response to Message 47690.

<sigh> Reading your response and looking things up I thought <sigh> upgrade CUDA <sigh> and reboot. <sigh>....... reboot.... reboot recovery ... ah... up OK time to upgrade my Fedora22 to 25 <sigh>... started the download, went to sleep. Now waiting for reboot and waiting and waiting and... <waaaaaaaaaaa> just shoot me

liderbug
Joined: 29 Jul 16
Posts: 22
Credit: 57,673,885
RAC: 0
Message 47695 - Posted: 26 Jul 2017 | 19:24:34 UTC - in response to Message 47691.

OK, me sheen R-back running Fedora-25. I've dnf erased boinc* and re-installed
Hoop1, hoop2, hoop3 ...
einstein, running
SETI, running .339cpu+1nvidgpu + 4x on cpu
gpugrid, got - new tasks

CUDA 8.0

I did find in client_state.xml
<scheduler_url>http://www.ps3grid.net/PS3GRID_cgi/cgi</scheduler_url>
which returns:

<scheduler_reply>
<scheduler_version>613</scheduler_version>
<master_url>http://www.gpugrid.net/</master_url>
<request_delay>31.000000</request_delay>
<message priority="low">Error in request message: xp.get_tag() failed</message>
<project_name>GPUGRID</project_name>
</scheduler_reply>


Which says ... ??? And while I've been typing this a Seti.gpu is at 45%, 46...

Hello Witts End.
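
For anyone wanting to read a reply like the one above field by field, here is a minimal sketch that pulls out the parts that matter. summarize_reply is a hypothetical helper, and REPLY is the text quoted in the post. The wording of the message suggests the scheduler could not parse the request it was sent, which might simply be what happens when the URL is opened without a proper scheduler request. That reading is an assumption and has not been confirmed by the project.

```python
import xml.etree.ElementTree as ET

# Reply text as quoted in the post above.
REPLY = """<scheduler_reply>
<scheduler_version>613</scheduler_version>
<master_url>http://www.gpugrid.net/</master_url>
<request_delay>31.000000</request_delay>
<message priority="low">Error in request message: xp.get_tag() failed</message>
<project_name>GPUGRID</project_name>
</scheduler_reply>"""


def summarize_reply(xml_text):
    """Extract project, master URL, delay and messages from a reply."""
    root = ET.fromstring(xml_text)
    return {
        "project": root.findtext("project_name"),
        "master_url": root.findtext("master_url"),
        "request_delay": float(root.findtext("request_delay")),
        # Each message is kept as a (priority, text) pair.
        "messages": [
            (m.get("priority"), (m.text or "").strip())
            for m in root.findall("message")
        ],
    }


if __name__ == "__main__":
    for key, value in summarize_reply(REPLY).items():
        print(key, "=", value)
```

The master_url field shows that the old ps3grid.net scheduler address still answers on behalf of GPUGRID.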


liderbug
Joined: 29 Jul 16
Posts: 22
Credit: 57,673,885
RAC: 0
Message 47702 - Posted: 27 Jul 2017 | 21:44:54 UTC - in response to Message 47695.

I somehow stumbled across ps3grid.net which seems to be a 99% clone of gpugrid.net. And it knows me - auto logged in ..?????

And I found a page listing half a dozen people who work?run?manage? xxxgrid.net. One problem is that the link to their ?bio? doesn't.

Side note, looking at the Performance page and the listing of Users:computers they all show MS Windows 10. Is my problem that I don't do business with Bill Gates?

Richard Haselgrove
Joined: 11 Jul 09
Posts: 775
Credit: 1,314,702,970
RAC: 1,418,329
Message 47703 - Posted: 27 Jul 2017 | 21:59:01 UTC - in response to Message 47702.

As in your previous thread (see my reply there): this project was called PS3Grid before it was called GPUGrid, and this site is the clone of that site.

That was before Sony locked down the PS3s and prevented them from running third-party operating systems.

liderbug
Joined: 29 Jul 16
Posts: 22
Credit: 57,673,885
RAC: 0
Message 47711 - Posted: 29 Jul 2017 | 14:10:49 UTC - in response to Message 47703.

I went to the gpugrid web site -> Volunteers and found only 2 that are doing work on Linux boxes (everything else was Win-10). I sent a PM and received 1 answer saying he was running Mint. I've loaded his version of the Nvidia driver - no difference.

I'd like to raise the question of why I/we can't get any response from management. Have they all moved on? Someone was there to change servers. If there is only 1 body with too much on their plate... hey, go open, there are thousands of us out here who kind of know what we're doing. Hello????
