New multicore app and WUs

klepel

Joined: 23 Dec 09
Posts: 189
Credit: 4,798,881,008
RAC: 311
Level
Arg
Message 48176 - Posted: 13 Nov 2017, 16:03:57 UTC

I just wanted to report back:
My host ID: 420971 gets work and finishes the latest version successfully!
My host ID: 452211 does not get any work. The message is: "There is no work available." This host does not have a GPU and runs from a USB stick.
Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Level
Ser
Message 48177 - Posted: 13 Nov 2017, 16:15:25 UTC - in response to Message 48176.  
Last modified: 13 Nov 2017, 16:21:37 UTC

Working/not working pairs are useful for debugging indeed (if they have the same preferences, that is). It was suggested that it was the presence of a GPU, but there are GPU-less counter-examples, like this. The scheduler is a software nightmare...

I'll resume tests later this week. In the meantime, there are 1000 more CPU WUs (QC310big).
Jim1348

Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Message 48180 - Posted: 13 Nov 2017, 17:45:45 UTC
Last modified: 13 Nov 2017, 17:52:18 UTC

Today is my lucky day. I just enabled the multicore app, and immediately picked up two of them on my i7-3770 machine running Ubuntu 16.04.3 (Linux 4.10.0.38), and BOINC 7.8.3. They run on 7 cores, with one core reserved for GPU support as set by BOINC preferences, not in the app_config (though I use one for other purposes).

However, suspending them does not shut them down with LAIM ("Leave applications in memory") enabled, as noted before. I have not tried the non-LAIM case.

If it matters, this machine was attached to GPUGrid earlier, and I had run a few GPU work units on the GTX 980, though I am requesting only the CPU work now. But maybe that has something to do with why I am getting them.

EDIT: Also, I have "Run test applications?" enabled, though I don't know if that is necessary in this case.
Conan

Joined: 25 Mar 09
Posts: 25
Credit: 582,385
RAC: 0
Level
Gly
Message 48183 - Posted: 13 Nov 2017, 22:42:44 UTC

Both of my computers that are getting (or have gotten) CPU work had been attached to the project before.
The new computer I attached does not get work and reports "No work available" even when there is plenty.

Conan
el_gallo_azul

Joined: 14 Jun 14
Posts: 9
Credit: 28,094,797
RAC: 0
Level
Val
Message 48184 - Posted: 14 Nov 2017, 0:19:29 UTC - in response to Message 48157.  
Last modified: 14 Nov 2017, 0:20:27 UTC

OK, thanks @mmonnin.

I've just run
    which readlink
followed by
    sudo ln -sf /bin/readlink /usr/bin/readlink
and am now waiting for some more WUs.
Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Level
Ser
Message 48187 - Posted: 14 Nov 2017, 9:55:22 UTC - in response to Message 48184.  

Do not make symlinks. The problem is already solved.
Coleslaw

Joined: 24 Jul 08
Posts: 36
Credit: 363,857,679
RAC: 0
Level
Asp
Message 48192 - Posted: 14 Nov 2017, 20:09:33 UTC
Last modified: 14 Nov 2017, 20:11:47 UTC

Since it’s the first time we have a CPU app out, I’ll test the behavior of GPUGRID with a relatively large batch that you will see soon.


I just started reading this thread. I thought I would point out that there was a multi-threaded CPU application back in 2014. It just wasn't necessarily for Quantum Chemistry.
Conan

Joined: 25 Mar 09
Posts: 25
Credit: 582,385
RAC: 0
Level
Gly
Message 48198 - Posted: 16 Nov 2017, 7:21:52 UTC - in response to Message 48192.  

Since it’s the first time we have a CPU app out, I’ll test the behavior of GPUGRID with a relatively large batch that you will see soon.


I just started reading this thread. I thought I would point out that there was a multi-threaded CPU application back in 2014. It just wasn't necessarily for Quantum Chemistry.


Yes, I ran that one on both 32-bit Windows and 64-bit Linux, which is where nearly all my points came from. I had to stop GPU use a few years ago, so I ran the CPU app instead.

Conan
Dayle Diamond

Joined: 5 Dec 12
Posts: 84
Credit: 1,663,883,415
RAC: 0
Level
His
Message 48228 - Posted: 23 Nov 2017, 16:47:19 UTC
Last modified: 23 Nov 2017, 17:05:04 UTC

On a 1950x it's reserving all 32 threads but not running them near the maximum.
It seems to be switching which cores are active - my System Monitor CPU usage chart looks like a long line of infinity symbols.

If you divide the CPU time by the run time, you get an average of about seventeen cores in use. Everything else is going to waste.

16713948 12878079 453935 23 Nov 2017 | 12:59:03 UTC 23 Nov 2017 | 16:09:15 UTC Completed and validated 680.18 11,586.25 67.70 Quantum Chemistry v3.10 (mt)

16713947 12878078 453935 23 Nov 2017 | 12:59:03 UTC 23 Nov 2017 | 14:12:17 UTC Completed and validated 761.12 12,984.46 267.57 Quantum Chemistry v3.10 (mt)

16713946 12878077 453935 23 Nov 2017 | 12:59:03 UTC 23 Nov 2017 | 15:11:46 UTC Completed and validated 702.76 11,639.75
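
As a rough check of that division, here is a minimal Python sketch using the three tasks listed above. It assumes, as on the standard BOINC task list, that the two numbers after the status are run time and CPU time in seconds; the function name is only illustrative.

```python
# (run time in seconds, CPU time in seconds), copied from the task rows above.
tasks = {
    16713948: (680.18, 11586.25),
    16713947: (761.12, 12984.46),
    16713946: (702.76, 11639.75),
}

def average_busy_threads(run_time_s, cpu_time_s):
    """Average number of threads kept busy: CPU time divided by wall-clock run time."""
    return cpu_time_s / run_time_s

for task_id, (run_time_s, cpu_time_s) in tasks.items():
    busy = average_busy_threads(run_time_s, cpu_time_s)
    print(f"task {task_id}: about {busy:.1f} of 32 threads busy on average")
```

All three tasks come out at roughly 17 of the 32 reserved threads.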

PS. It's running at top priority over World Community Grid, but they've got similar deadlines. Is this intentional?
dfygrvty

Joined: 21 Nov 17
Posts: 2
Credit: 2,826,188
RAC: 0
Level
Ala
Message 48229 - Posted: 23 Nov 2017, 17:53:43 UTC - in response to Message 48127.  

I'm getting a ton of Quantum Chemistry tasks on my AWS EC2 p2.xlarge instance.
The task names are a47-toni_qc310k-0-1-*. Are these the new multicore tasks you talked about? The machine takes a task to 66% in 2 seconds and then sits at that percentage for ~10 minutes.

I think the task stops reporting progress at 66%. Is that a bug? I compiled the BOINC client on the EC2 instance myself, so it could definitely be user error as well.
klepel

Joined: 23 Dec 09
Posts: 189
Credit: 4,798,881,008
RAC: 311
Level
Arg
Message 48230 - Posted: 23 Nov 2017, 18:05:48 UTC

Same here: stuck at 66%. I will go to lunch and see if it has finished in the meantime.
dfygrvty

Joined: 21 Nov 17
Posts: 2
Credit: 2,826,188
RAC: 0
Level
Ala
Message 48231 - Posted: 23 Nov 2017, 18:28:37 UTC - in response to Message 48229.  

They finish about 10-15 minutes after they 'hang' on my EC2 instance.
klepel

Joined: 23 Dec 09
Posts: 189
Credit: 4,798,881,008
RAC: 311
Level
Arg
Message 48232 - Posted: 23 Nov 2017, 18:50:00 UTC

Here as well! The difference in run times is consistent with the other computer having more threads and a higher clock frequency.
Dayle Diamond

Joined: 5 Dec 12
Posts: 84
Credit: 1,663,883,415
RAC: 0
Level
His
Message 48233 - Posted: 23 Nov 2017, 20:44:43 UTC

I'm using Ubuntu's bundled system monitor to display CPU usage graphs. That 66% thing is just a bug with the work unit time estimation, but my cores really were gradually rising and falling from 0 to 100%. Like a helix on its side, but with 32 lines.

(It's not thermal throttling.)

If at all possible, consider limiting each multicore task to four cores. The thread count of almost every modern CPU is divisible by four, so this would ensure the highest throughput, as no thread would go to waste.
Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Level
Ser
Message 48234 - Posted: 23 Nov 2017, 21:57:07 UTC - in response to Message 48233.  
Last modified: 23 Nov 2017, 22:00:00 UTC

The 66% is due to our using the boinc wrapper for an app which doesn't report its progress. There are three steps in the WU (install, update, compute) and the third is the long one, hence the 2/3.
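
A minimal sketch of why the bar stops at about two thirds, assuming the three steps are weighted equally and progress only advances when a whole step finishes. The function name and the equal weighting are illustrative assumptions, not the wrapper's actual code.

```python
def wrapper_fraction_done(steps_completed, total_steps=3):
    """Progress as seen by the client when only whole finished steps are counted."""
    if not 0 <= steps_completed <= total_steps:
        raise ValueError("steps_completed must be between 0 and total_steps")
    return steps_completed / total_steps

steps = ["install", "update", "compute"]
for done in range(len(steps) + 1):
    print(f"{done} step(s) finished: {wrapper_fraction_done(done):.0%}")
```

With install and update done, the reported fraction stays at 2/3 for the whole of the long compute step.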

If I figure out how, I'll try to limit the number of CPUs requested. I think the client has some control over it as well.
Petr Kriz

Joined: 22 Feb 09
Posts: 3
Credit: 114,900
RAC: 0
Level

Message 48235 - Posted: 23 Nov 2017, 22:46:53 UTC

I just tried to run a few tasks and am still getting the same error:

<core_client_version>7.6.22</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)
</message>
<stderr_txt>
23:27:04 (6871): wrapper (7.7.26016): starting
23:27:04 (6871): wrapper (7.7.26016): starting
23:27:04 (6871): wrapper: running ../../projects/www.gpugrid.net/Miniconda3-4.3.30-Linux-x86_64.sh (-b -f -p /var/lib/boinc/projects/www.gpugrid.net/miniconda)
Python 3.6.3 :: Anaconda, Inc.
23:33:01 (6871): task miniconda-installer reached time limit 360
23:33:01 (6871): wrapper: running /var/lib/boinc/projects/www.gpugrid.net/miniconda/bin/python (pre_script.py)
Traceback (most recent call last):
  File "pre_script.py", line 1, in <module>
    import conda.cli
ModuleNotFoundError: No module named 'conda'
23:33:02 (6871): $PROJECT_DIR/miniconda/bin/python exited; CPU time 0.025285
23:33:02 (6871): app exit status: 0x1
23:33:02 (6871): called boinc_finish(195)

</stderr_txt>
]]>

Any idea how to solve it?
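
Reading the log above, the miniconda-installer step hit the wrapper's 360-second time limit, and the next step then failed to import conda, which suggests the install never completed on this host. A small Python sketch that flags this pattern, assuming stderr lines in the format shown above (the function name is hypothetical):

```python
import re

TIME_LIMIT = re.compile(r"task (\S+) reached time limit (\d+)")
MISSING_MODULE = re.compile(r"ModuleNotFoundError: No module named '([^']+)'")

def diagnose(stderr_text):
    """Return human-readable findings from a wrapper stderr dump."""
    findings = []
    for line in stderr_text.splitlines():
        m = TIME_LIMIT.search(line)
        if m:
            findings.append(f"step {m.group(1)} was stopped after {m.group(2)} s")
        m = MISSING_MODULE.search(line)
        if m:
            findings.append(f"Python module {m.group(1)} is missing")
    return findings

# Two lines copied from the stderr output above.
sample = """\
23:33:01 (6871): task miniconda-installer reached time limit 360
ModuleNotFoundError: No module named 'conda'
"""
for finding in diagnose(sample):
    print(finding)
```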
klepel

Joined: 23 Dec 09
Posts: 189
Credit: 4,798,881,008
RAC: 311
Level
Arg
Message 48236 - Posted: 24 Nov 2017, 1:56:57 UTC

el_gallo_azul

Joined: 14 Jun 14
Posts: 9
Credit: 28,094,797
RAC: 0
Level
Val
Message 48237 - Posted: 24 Nov 2017, 4:44:33 UTC

Since I had 100% errors (Message 48156 - Posted: 12 Nov 2017 | 2:36:31 UTC) on my first batch of these CPU tasks, I created a symlink as instructed, then deleted the symlink as subsequently instructed, but I have never received a single task since my 12 Nov 2017 post.
GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist

Joined: 14 Mar 07
Posts: 1958
Credit: 629,356
RAC: 0
Level
Gly
Message 48239 - Posted: 24 Nov 2017, 11:10:37 UTC - in response to Message 48237.  

OK, we will start production mode next week. Unfortunately we will need more than 50x the current number of CPUs, but it is just the start now, so it is ok.

gdf
mmonnin

Joined: 2 Jul 16
Posts: 338
Credit: 7,987,341,558
RAC: 178,897
Level
Tyr
Message 48241 - Posted: 24 Nov 2017, 11:17:28 UTC - in response to Message 48228.  

On a 1950x it's reserving all 32 threads but not running them near the maximum.
It seems to be switching which cores are active - my System Monitor CPU usage chart looks like a long line of infinity symbols.

If you divide the CPU time by the run time, you get an average of about seventeen cores in use. Everything else is going to waste.

16713948 12878079 453935 23 Nov 2017 | 12:59:03 UTC 23 Nov 2017 | 16:09:15 UTC Completed and validated 680.18 11,586.25 67.70 Quantum Chemistry v3.10 (mt)

16713947 12878078 453935 23 Nov 2017 | 12:59:03 UTC 23 Nov 2017 | 14:12:17 UTC Completed and validated 761.12 12,984.46 267.57 Quantum Chemistry v3.10 (mt)

16713946 12878077 453935 23 Nov 2017 | 12:59:03 UTC 23 Nov 2017 | 15:11:46 UTC Completed and validated 702.76 11,639.75

PS. It's running at top priority over World Community Grid, but they've got similar deadlines. Is this intentional?


It is pretty typical of multithreaded apps (in any BOINC project) that they do not scale well past 4-8 cores. I typically use an app_config to limit mt apps such as LHC, Cosmology, yafu, etc. to 4 cores.
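
For anyone who wants to try the same thing here, below is a minimal Python sketch that generates such an app_config.xml. The app name "QC" is a placeholder (the real name should be taken from client_state.xml), and whether this app actually limits its own threads to avg_ncpus is an assumption; some mt apps also need a thread-count option passed via a cmdline element.

```python
import xml.etree.ElementTree as ET

def build_app_config(app_name, plan_class, ncpus):
    """Build app_config.xml text asking the client to budget `ncpus` CPUs per task."""
    root = ET.Element("app_config")
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "plan_class").text = plan_class
    ET.SubElement(version, "avg_ncpus").text = str(ncpus)
    return ET.tostring(root, encoding="unicode")

# "QC" is a placeholder app name, not confirmed for this project.
print(build_app_config("QC", "mt", 4))
```

The output would be saved as app_config.xml in the project's directory under the BOINC data directory (projects/www.gpugrid.net/) and picked up by re-reading the config files or restarting the client.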

©2025 Universitat Pompeu Fabra