New multicore app and WUs

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Message 48150 - Posted: 11 Nov 2017, 10:51:23 UTC - in response to Message 48149.  
Last modified: 11 Nov 2017, 10:55:02 UTC

I'll add /bin to the path in the next app update. That may work, unless there is some weird sandboxing thing going on. You shouldn't need to tweak your system: just let them fail (they should fail fast, so no CPU loss).

As for why some hosts are not receiving WUs, it baffles me. It is not a matter of hosts already having GPUs, because my own machine has one and it did not get tasks either. It may be related to the "reliable hosts" classification.
Toni
Message 48151 - Posted: 11 Nov 2017, 11:38:12 UTC - in response to Message 48150.  

@Daniel: can you list one of your hosts which gets QC tasks and one which doesn't?

Thanks
[B@P] Daniel
Joined: 17 Sep 16
Posts: 5
Credit: 382,453,727
RAC: 0
Message 48152 - Posted: 11 Nov 2017, 12:09:40 UTC - in response to Message 48151.  

@Daniel: can you list one of your hosts which gets QC tasks and one which doesn't?

Thanks

Hosts which get tasks: 449991, 449992, 391907
Hosts which did not get any: 444456, 452231
John C MacAlister

Joined: 17 Feb 13
Posts: 181
Credit: 144,871,276
RAC: 0
Message 48153 - Posted: 11 Nov 2017, 12:28:20 UTC - in response to Message 48127.  

Many thanks for this: I look forward to the Windows version!

Dears,

we would like to test our new CPU multicore application for quantum chemistry tasks ("QC"). Since it’s the first time we have a CPU app out, I’ll test the behavior of GPUGRID with a relatively large batch that you will see soon. Workunits are named "*QC309big*".

Here are some features of the app, in short (subject to change):

* Platform: Linux only for now, generic x64.
* Threads: as many as BOINC decides. I guess it depends on your machine, your preferences, and other running tasks in ways which are obscure to me…
* Run time: about 1 CPU hour per WU (so, shorter if multithreading)
* Credit: computed with the default algorithm (tasks are short, don’t expect much). Bonus mechanism for fast turnaround is still on.
* Known bugs: restarts and checkpoints. This should be mitigated with the “keep in memory when suspended” option. Sorry about that, it’s outside of our control.
* Network behavior: the first time you get a WU of this kind it downloads a Python interpreter (miniconda) and then some open-source packages, and installs them in the project directory. The installation is reused whenever possible.
* Disk usage: could go around 1 GB, perhaps more when tasks are running. Resetting the project should remove everything.
* Memory usage: should be around 1 GB when running.

Depending on the results of this test, we’ll start thinking about other platforms.

Thanks and nice crunching!

Toni


John
Conan
Joined: 25 Mar 09
Posts: 25
Credit: 582,385
RAC: 0
Message 48154 - Posted: 11 Nov 2017, 22:48:37 UTC
Last modified: 11 Nov 2017, 22:51:20 UTC

Two of my computers have received tasks and processed them with no trouble.
Both run Fedora (16 and 21), host ids are 192138 and 189186.
My 8 core (16 thread) computer (running Fedora 25) has yet to receive a task.

Host 192138 is a 6 core computer and Host 189186 is a four core computer.

The 6 core has shorter run times per task and more CPU time than the 4 core.

This is as expected given the core count. However, the 4 core computer gets higher credit per task than the 6 core, which does not make sense.

6 core: around 1,500 sec run time, 8,600 sec CPU time and about 66 credits.

4 core: around 3,200 sec run time, 6,900 sec CPU time and about 85+ credits.
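
To make the comparison concrete, the approximate figures above can be turned into rates. This is only a sketch: the numbers are the rounded values quoted in this post (with "85+" taken as 85), and the function name `credit_rates` is made up for illustration.

```python
# Approximate figures quoted in the post; rounded values, not exact measurements.
hosts = {
    "6-core (host 192138)": {"run_s": 1500, "cpu_s": 8600, "credit": 66},
    "4-core (host 189186)": {"run_s": 3200, "cpu_s": 6900, "credit": 85},
}


def credit_rates(run_s, cpu_s, credit):
    """Return (credit per CPU-hour, credit per wall-clock hour, average busy cores)."""
    per_cpu_hour = credit / (cpu_s / 3600)
    per_wall_hour = credit / (run_s / 3600)
    busy_cores = cpu_s / run_s
    return per_cpu_hour, per_wall_hour, busy_cores


for name, figures in hosts.items():
    per_cpu, per_wall, cores = credit_rates(**figures)
    print(
        f"{name}: {per_cpu:.1f} credit/CPU-h, "
        f"{per_wall:.1f} credit/wall-h, ~{cores:.1f} cores busy"
    )
```

With these inputs the 4 core earns roughly 44 credits per CPU-hour against roughly 28 for the 6 core, while the 6 core still earns more per wall-clock hour (about 158 against about 96) because it finishes tasks faster.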

A bit odd perhaps?

Conan
Toni
Message 48155 - Posted: 11 Nov 2017, 23:04:48 UTC - in response to Message 48154.  
Last modified: 11 Nov 2017, 23:29:30 UTC

Credit assignment logic has historically been problematic (see here), to the point that I am inclined to think it has no best solution. For the time being the credit algorithm is the old default one from BOINC. I think it relies heavily on the self-computed FLOPS and, yes, that seems paradoxical.
el_gallo_azul

Joined: 14 Jun 14
Posts: 9
Credit: 28,094,797
RAC: 0
Message 48156 - Posted: 12 Nov 2017, 2:36:31 UTC

I haven't been able to successfully process a WU on my computer. I've received many, but they've all resulted in "Computation error".

See screenshot: https://imgur.com/z0vLkoh
mmonnin

Joined: 2 Jul 16
Posts: 338
Credit: 7,987,341,558
RAC: 197,587
Message 48157 - Posted: 12 Nov 2017, 3:14:36 UTC - in response to Message 48156.  

I haven't been able to successfully process a WU on my computer. I've received many, but they've all resulted in "Computation error".

See screenshot: https://imgur.com/z0vLkoh


You'll have to try one of the suggestions posted by Daniel or [VENETO] sabayonino above. I'm waiting for more WUs to try myself.
gianni

Joined: 11 Jul 08
Posts: 18
Credit: 105,098
RAC: 0
Message 48158 - Posted: 12 Nov 2017, 4:39:28 UTC - in response to Message 48142.  

We are not aware of fast and free GPU QM applications. If you know one, let us know.
Toni
Message 48159 - Posted: 12 Nov 2017, 8:40:07 UTC - in response to Message 48157.  
Last modified: 12 Nov 2017, 9:30:30 UTC

Please do not tweak your system. The current application (QC 3.10) should solve the problem.
eXaPower

Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Message 48160 - Posted: 12 Nov 2017, 13:28:59 UTC - in response to Message 48158.  

We are not aware of fast and free GPU QM applications. If you know one, let us know.


@UF & @UNC developed ANAKIN-ME to create fast, accurate quantum mechanical simulations. See the demo at #SC17 http://nvda.ws/2zyBhKj


https://twitter.com/NVIDIADC
GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist

Joined: 14 Mar 07
Posts: 1958
Credit: 629,356
RAC: 0
Message 48161 - Posted: 12 Nov 2017, 16:14:10 UTC - in response to Message 48160.  

Yes, we have that and it is nice, but limited and not a QM code.
mmonnin

Message 48167 - Posted: 13 Nov 2017, 13:14:17 UTC

I completed one this morning in Ubuntu.
Toni
Message 48169 - Posted: 13 Nov 2017, 14:03:22 UTC - in response to Message 48167.  

The new app has 0% failure rate. However, only a handful of hosts are receiving it, for reasons utterly obscure.

This is the only indication I found in the logs:

2017-11-10 20:06:33.9454 [PID=182743] [quota] Overall limits on jobs in progress:
2017-11-10 20:06:33.9454 [PID=182743] [quota] CPU: base 2 scaled 112 njobs 0
2017-11-10 20:06:33.9454 [PID=182743] [quota] GPU: base 2 scaled 0 njobs 0


That "njobs 0" seems to prevent result sending. Any clue hugely appreciated...
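
For anyone who wants to scan a larger scheduler log for the same pattern, here is a minimal Python sketch that pulls the numeric fields out of `[quota]` lines shaped like the three above. The regular expression is inferred from those sample lines only, and `parse_quota_lines` is a made-up helper name; the code does not assume what `base`, `scaled` or `njobs` mean inside the scheduler.

```python
import re

# Matches lines such as:
#   2017-11-10 20:06:33.9454 [PID=182743] [quota] CPU: base 2 scaled 112 njobs 0
# The layout is inferred from the sample lines in this post, nothing more.
QUOTA_RE = re.compile(
    r"\[PID=(?P<pid>\d+)\]\s+\[quota\]\s+(?P<resource>\w+): "
    r"base (?P<base>\d+) scaled (?P<scaled>\d+) njobs (?P<njobs>\d+)"
)


def parse_quota_lines(lines):
    """Yield one dict per per-resource [quota] line; other lines are skipped."""
    for line in lines:
        match = QUOTA_RE.search(line)
        if match:
            record = {
                key: int(value)
                for key, value in match.groupdict().items()
                if key != "resource"
            }
            record["resource"] = match.group("resource")
            yield record


sample = """\
2017-11-10 20:06:33.9454 [PID=182743] [quota] Overall limits on jobs in progress:
2017-11-10 20:06:33.9454 [PID=182743] [quota] CPU: base 2 scaled 112 njobs 0
2017-11-10 20:06:33.9454 [PID=182743] [quota] GPU: base 2 scaled 0 njobs 0
""".splitlines()

for record in parse_quota_lines(sample):
    print(record)
```

Grouping the records by PID would make it possible to compare requests from hosts that get work with requests from hosts that do not.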

Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 326,008
Message 48170 - Posted: 13 Nov 2017, 14:19:55 UTC - in response to Message 48169.  

The new app has 0% failure rate. However, only a handful of hosts are receiving it, for reasons utterly obscure.

This is the only indication I found in the logs:

2017-11-10 20:06:33.9454 [PID=182743] [quota] Overall limits on jobs in progress:
2017-11-10 20:06:33.9454 [PID=182743] [quota] CPU: base 2 scaled 112 njobs 0
2017-11-10 20:06:33.9454 [PID=182743] [quota] GPU: base 2 scaled 0 njobs 0


That "njobs 0" seems to prevent result sending. Any clue hugely appreciated...

The only reading material I can suggest is http://boinc.berkeley.edu/trac/wiki/ProjectOptions#Joblimits, but I imagine you know that already. Remember to read the following 'Job limits (advanced)' section too.
captainjack

Joined: 9 May 13
Posts: 171
Credit: 4,594,296,466
RAC: 130,244
Message 48171 - Posted: 13 Nov 2017, 14:44:30 UTC

For those interested in controlling the number of threads used by the multicore app, the following app_config.xml entries seem to work.

<app_config>
  <app>
    <name>QC</name>
    <max_concurrent>1</max_concurrent>
  </app>
  <app_version>
    <app_name>QC</app_name>
    <plan_class>mt</plan_class>
    <avg_ncpus>9</avg_ncpus>
    <cmdline>--nthreads 9</cmdline>
  </app_version>
</app_config>

The <avg_ncpus> entry tells BOINC the number of threads to reserve for the app.

The <cmdline> entry tells the app the number of threads available for processing.
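
Since the two thread counts have to agree, a small generator script can keep them in sync. The following is a sketch, not a project-supplied tool: `build_app_config` is a made-up name, and the app name `QC`, the plan class `mt` and the `--nthreads` flag are taken from the entries above rather than from any official documentation.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, nthreads: int, max_concurrent: int = 1) -> str:
    """Return app_config.xml text that reserves and uses `nthreads` threads."""
    if nthreads < 1:
        raise ValueError("nthreads must be at least 1")

    root = ET.Element("app_config")

    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)

    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "plan_class").text = "mt"
    # Keep both counts identical: avg_ncpus is read by BOINC,
    # --nthreads is passed through to the application.
    ET.SubElement(version, "avg_ncpus").text = str(nthreads)
    ET.SubElement(version, "cmdline").text = f"--nthreads {nthreads}"

    if hasattr(ET, "indent"):  # pretty-printing helper, Python 3.9+
        ET.indent(root, space="  ")
    return ET.tostring(root, encoding="unicode")


print(build_app_config("QC", 9))
```

The printed text can be saved as app_config.xml in the project's directory under the BOINC data directory; the client then has to re-read its config files (or be restarted) before the change applies.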
Toni
Message 48172 - Posted: 13 Nov 2017, 14:49:59 UTC - in response to Message 48171.  

Can anybody comment on the suspend/resume behavior under a variety of conditions (i.e., with and without "keep in memory")? I expect the calculation to restart from scratch, but not crash.
bormolino

Joined: 16 May 13
Posts: 41
Credit: 145,731,947
RAC: 0
Message 48173 - Posted: 13 Nov 2017, 15:17:03 UTC

Like many others, I don't get any WUs on my Linux machines.
captainjack

Message 48174 - Posted: 13 Nov 2017, 15:32:43 UTC

Can anybody comment on the suspend/resume behavior under a variety of conditions (i.e., with and without "keep in memory")? I expect the calculation to restart from scratch, but not crash.


When I suspended a task with LAIM ("leave applications in memory") on, BOINC Manager showed that it was suspended, but the system monitor showed that the task was still busy using all the threads that were allocated to it.

When I suspended a task with LAIM off, BOINC Manager showed that the task was suspended and the task disappeared from the system monitor. When the task was resumed, it restarted from 0 and appears to be running normally.
Toni
Message 48175 - Posted: 13 Nov 2017, 15:54:35 UTC - in response to Message 48174.  

@captainjack - thanks, appreciated.

©2025 Universitat Pompeu Fabra