New multicore app and WUs
Author | Message |
---|---|
Joined: 23 Dec 09 Posts: 189 Credit: 4,798,881,008 RAC: 311 |
I just wanted to report back: my host ID 420971 gets work and finishes the latest version successfully! My host ID 452211 does not get any work. The message is: "There is no work available." This host does not have a GPU and runs from a USB stick. |
Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 |
Working/not working pairs are useful for debugging indeed (if they have the same preferences, that is). It was suggested that it was the presence of a GPU, but there are GPU-less counter-examples, like this. The scheduler is a software nightmare... I'll resume tests later this week. In the meantime, there are 1000 more CPU WUs (QC310big). |
Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 |
Today is my lucky day. I just enabled the multicore app, and immediately picked up two of them on my i7-3770 machine running Ubuntu 16.04.3 (Linux 4.10.0.38), and BOINC 7.8.3. They run on 7 cores, with one core reserved for GPU support as set by BOINC preferences, not in the app_config (though I use one for other purposes). However, suspending them does not shut them down with LAIM enabled, as noted before. I have not tried the non-LAIM case. If it matters, this machine was attached to GPUGrid earlier, and I had run a few GPU work units on the GTX 980, though I am requesting only the CPU work now. But maybe that has something to do with why I am getting them. EDIT: Also, I have "Run test applications?" enabled, though I don't know if that is necessary in this case. |
Joined: 25 Mar 09 Posts: 25 Credit: 582,385 RAC: 0 |
My two computers that are getting, or have gotten, CPU work have both been connected before. The new computer I attached does not get work and says "No work available" even when there is plenty. Conan |
Joined: 14 Jun 14 Posts: 9 Credit: 28,094,797 RAC: 0 |
OK, thanks @mmonnin. I've just run `which readlink` followed by `sudo ln -sf /bin/readlink /usr/bin/readlink`, and am now waiting for some more WUs. |
Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 |
Do not make symlinks. The problem is already solved. |
Joined: 24 Jul 08 Posts: 36 Credit: 363,857,679 RAC: 0 |
"Since it’s the first time we have a CPU app out, I’ll test the behavior of GPUGRID with a relatively large batch that you will see soon." I just started reading this thread. I thought I would point out that there was a multi-threaded CPU application back in 2014. It just wasn't necessarily for Quantum Chemistry. |
Joined: 25 Mar 09 Posts: 25 Credit: 582,385 RAC: 0 |
"Since it’s the first time we have a CPU app out, I’ll test the behavior of GPUGRID with a relatively large batch that you will see soon." Yes, I ran that one on both 32-bit Windows and 64-bit Linux, which is where nearly all my points came from: I had to stop GPU use a few years ago, so I ran the CPU app instead. Conan |
Joined: 5 Dec 12 Posts: 84 Credit: 1,663,883,415 RAC: 0 |
On a 1950x it's reserving all 32 threads but not running them near the maximum. It seems to be switching which cores are active: my System Monitor CPU usage chart looks like a long line of infinity symbols. If you divide the CPU time by the run time, you'll see an average usage of about seventeen of the 32 threads. Everything else is going to waste. Task 16713948 (work unit 12878079, host 453935): sent 23 Nov 2017 12:59:03 UTC, reported 23 Nov 2017 16:09:15 UTC, completed and validated, run time 680.18 s, CPU time 11,586.25 s, credit 67.70, Quantum Chemistry v3.10 (mt). Task 16713947 (work unit 12878078, host 453935): sent 23 Nov 2017 12:59:03 UTC, reported 23 Nov 2017 14:12:17 UTC, completed and validated, run time 761.12 s, CPU time 12,984.46 s, credit 267.57, Quantum Chemistry v3.10 (mt). Task 16713946 (work unit 12878077, host 453935): sent 23 Nov 2017 12:59:03 UTC, reported 23 Nov 2017 15:11:46 UTC, completed and validated, run time 702.76 s, CPU time 11,639.75 s. PS. It's running at top priority over World Community Grid, but they've got similar deadlines. Is this intentional? |
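The "CPU time divided by run time" estimate above can be checked with a few lines of Python. This is only a sketch: it assumes the two numeric columns in the pasted results are run time and CPU time in seconds (the usual BOINC task-list layout), and the function name `average_busy_threads` is made up for illustration.

```python
# (run time in s, CPU time in s) for the three tasks quoted in the post.
# Assumption: these columns follow the usual BOINC results-table order.
tasks = {
    16713948: (680.18, 11586.25),
    16713947: (761.12, 12984.46),
    16713946: (702.76, 11639.75),
}

TOTAL_THREADS = 32  # threads reserved on the 1950x, per the post


def average_busy_threads(run_time_s, cpu_time_s):
    """Average number of threads kept busy over the whole run."""
    return cpu_time_s / run_time_s


for task_id, (run_s, cpu_s) in tasks.items():
    busy = average_busy_threads(run_s, cpu_s)
    share = busy / TOTAL_THREADS
    print(f"task {task_id}: {busy:.1f} threads busy on average ({share:.0%} of {TOTAL_THREADS})")
```

All three tasks come out between 16 and 18 busy threads, which matches the "about seventeen" figure and suggests roughly half of the reserved threads sit idle.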
Joined: 21 Nov 17 Posts: 2 Credit: 2,826,188 RAC: 0 |
getting a ton of quantum chemistry tasks on my aws ec2 p2.xlarge instance. a47-toni_qc310k-0-1-* are the names of the tasks. Are these the new multicore tasks you talked about? The machine takes a task to 66% in 2 seconds and then sits at that percentage for ~10 minutes. I think the task stops reporting progress @ 66%? bug? I compiled the boinc client on the ec2 instance, so it could definitely be user error as well. |
Joined: 23 Dec 09 Posts: 189 Credit: 4,798,881,008 RAC: 311 |
Same here, stuck at 66%. I'll go to lunch and see whether it has finished in the meantime. |
Joined: 21 Nov 17 Posts: 2 Credit: 2,826,188 RAC: 0 |
they finish about 10-15 minutes after they 'hang' on my ec2 instance. |
Joined: 23 Dec 09 Posts: 189 Credit: 4,798,881,008 RAC: 311 |
Here as well! The run times are in line with the higher thread count and clock frequency of the other computer. |
Joined: 5 Dec 12 Posts: 84 Credit: 1,663,883,415 RAC: 0 |
I'm using Ubuntu's bundled system monitor to display CPU usage graphs. That 66% thing is just a bug with the work unit time estimation, but my cores really were gradually rising and falling from 0 to 100%. Like a helix on its side, but with 32 lines. (It's not thermal throttling.) If at all possible, consider limiting each multicore app to four cores: the thread count of almost every modern CPU is divisible by four, so no thread would go to waste and throughput would be highest. |
Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 |
The 66% is due to our using the boinc wrapper for an app which doesn't report its progress. There are three steps in the WU (install, update, compute) and the third is the long one, hence the 2/3. If I figure out how, I'll try to limit the number of CPUs requested. I think the client has some control over it as well. |
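To illustrate where the 2/3 comes from, here is a minimal sketch of step-based progress reporting. It assumes that, when a step reports no progress of its own, overall progress only advances as whole steps complete, each counted in proportion to a weight. The function name and the skewed weights are hypothetical; the thread does not show the project's actual job file.

```python
def fraction_done(weights, completed_steps):
    """Progress shown when only whole completed steps are counted.

    weights: relative weight of each step, in execution order.
    completed_steps: number of steps that have already finished.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(weights[:completed_steps]) / total


steps = ["install", "update", "compute"]

# Equal weights: after install and update, the bar sits at 2/3 (about 66%)
# for the entire compute step.
equal_weights = [1.0, 1.0, 1.0]
print(f"equal weights:  {fraction_done(equal_weights, 2):.1%}")

# Hypothetical weights that reflect how long each step really takes:
# the bar would then sit near 2% while computing, which is closer to reality.
skewed_weights = [1.0, 1.0, 98.0]
print(f"skewed weights: {fraction_done(skewed_weights, 2):.1%}")
```

With equal weights the bar jumps to about 66% within seconds and stays there until the compute step ends, which is the behaviour reported above.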
Joined: 22 Feb 09 Posts: 3 Credit: 114,900 RAC: 0 |
I just tried to run a few tasks and am still getting the same error: <core_client_version>7.6.22</core_client_version> <![CDATA[ <message> process exited with code 195 (0xc3, -61) </message> <stderr_txt> 23:27:04 (6871): wrapper (7.7.26016): starting 23:27:04 (6871): wrapper (7.7.26016): starting 23:27:04 (6871): wrapper: running ../../projects/www.gpugrid.net/Miniconda3-4.3.30-Linux-x86_64.sh (-b -f -p /var/lib/boinc/projects/www.gpugrid.net/miniconda) Python 3.6.3 :: Anaconda, Inc. 23:33:01 (6871): task miniconda-installer reached time limit 360 23:33:01 (6871): wrapper: running /var/lib/boinc/projects/www.gpugrid.net/miniconda/bin/python (pre_script.py) Traceback (most recent call last): File "pre_script.py", line 1, in <module> import conda.cli ModuleNotFoundError: No module named 'conda' 23:33:02 (6871): $PROJECT_DIR/miniconda/bin/python exited; CPU time 0.025285 23:33:02 (6871): app exit status: 0x1 23:33:02 (6871): called boinc_finish(195) </stderr_txt> ]]> Any idea how to solve it? |
Joined: 23 Dec 09 Posts: 189 Credit: 4,798,881,008 RAC: 311 |
This one hung for about 6 hours: http://www.gpugrid.net/result.php?resultid=16717461 |
Joined: 14 Jun 14 Posts: 9 Credit: 28,094,797 RAC: 0 |
Since I had 100% errors (Message 48156 - Posted: 12 Nov 2017 | 2:36:31 UTC) on my first batch of these CPU tasks, I created a symlink as instructed, then deleted the symlink as subsequently instructed, but I have never received a single task since my 12 Nov 2017 post. |
Joined: 14 Mar 07 Posts: 1958 Credit: 629,356 RAC: 0 |
OK, we will start production mode next week. Unfortunately we will need more than 50x the current number of CPUs, but it is just the start now, so it is ok. gdf |
Joined: 2 Jul 16 Posts: 338 Credit: 7,987,341,558 RAC: 178,897 |
"On a 1950x it's reserving all 32 threads but not running them near the maximum." It is pretty typical of multithreaded apps (of any BOINC project) that they do not scale that well past 4-8 cores. I typically use an app_config to limit mt apps like LHC, Cosmology, yafu, etc. to 4 cores. |
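For anyone who wants to try the 4-core limit mentioned above, here is a rough sketch that generates an `app_config.xml` with the standard BOINC `app_version` / `avg_ncpus` elements. The app name `"QC"` is a placeholder, not the confirmed short name of the Quantum Chemistry app (look it up in `client_state.xml`). Also note that `avg_ncpus` mainly tells the client how many CPUs to budget per task; whether this particular app then really runs only four threads is not confirmed anywhere in this thread.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, ncpus, plan_class="mt"):
    """Return the text of an app_config.xml that budgets `ncpus` CPUs per task."""
    root = ET.Element("app_config")
    app_version = ET.SubElement(root, "app_version")
    ET.SubElement(app_version, "app_name").text = app_name
    ET.SubElement(app_version, "plan_class").text = plan_class
    ET.SubElement(app_version, "avg_ncpus").text = str(ncpus)
    if hasattr(ET, "indent"):  # pretty-printing helper, Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# "QC" is a hypothetical placeholder for the app's short name.
print(build_app_config("QC", 4))
```

The output would go into the project directory (for example `projects/www.gpugrid.net/app_config.xml`, the path style visible in the error log earlier in the thread), followed by "Read config files" in the BOINC Manager or a client restart.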