Updates to the QMML app

Message boards : Multicore CPUs : Updates to the QMML app

Author	Message
Toni Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 Level Scientific publications	Message 48835 - Posted: 6 Feb 2018 \| 10:47:20 UTC
	Two changes were made yesterday:
	* CPU threads are limited to 4 (you should still be able to crunch multiple WUs at once, please check)
	* Credits should be in line with other projects'
	Let us know.

biodoc Send message Joined: 26 Aug 08 Posts: 183 Credit: 10,085,929,375 RAC: 1,168 Level Scientific publications	Message 48841 - Posted: 6 Feb 2018 \| 15:28:23 UTC
	I have a 2600K processor with 6 logical cores available for the QC app. This app_config.xml starts 3 concurrent QC apps with 2 cores each. We'll see how it goes.
	<app_config>
	  <app>
	    <name>QC</name>
	    <max_concurrent>3</max_concurrent>
	  </app>
	  <app_version>
	    <app_name>QC</app_name>
	    <plan_class>mt</plan_class>
	    <avg_ncpus>2</avg_ncpus>
	    <cmdline>--nthreads 2</cmdline>
	  </app_version>
	</app_config>
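For readers who want to try other combinations, here is a minimal Python sketch that builds the same app_config.xml from two numbers. It uses only the elements that appear in this post; the function names `build_app_config` and `cores_used` are made up for illustration, and whether the client schedules tasks the way you expect is still up to BOINC.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, max_concurrent: int, nthreads: int) -> str:
    """Return app_config.xml text using only the elements shown in the post."""
    root = ET.Element("app_config")

    # <app> block: how many tasks of this app may run at once.
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)

    # <app_version> block: threads per task for the "mt" plan class.
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "plan_class").text = "mt"
    ET.SubElement(version, "avg_ncpus").text = str(nthreads)
    ET.SubElement(version, "cmdline").text = f"--nthreads {nthreads}"

    # Pretty-print when available (ElementTree.indent needs Python 3.9+).
    if hasattr(ET, "indent"):
        ET.indent(root, space="  ")
    return ET.tostring(root, encoding="unicode")


def cores_used(max_concurrent: int, nthreads: int) -> int:
    """Logical cores occupied when every allowed task is running."""
    return max_concurrent * nthreads


if __name__ == "__main__":
    # The values from this post: 3 concurrent tasks with 2 threads each.
    print(build_app_config("QC", max_concurrent=3, nthreads=2))
    print("logical cores used:", cores_used(3, 2))
```

The product of the two settings is the number of logical cores the QC app can occupy, so it is worth checking against the cores you actually want to give it before saving the file.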

biodoc Send message Joined: 26 Aug 08 Posts: 183 Credit: 10,085,929,375 RAC: 1,168 Level Scientific publications	Message 48844 - Posted: 6 Feb 2018 \| 16:48:38 UTC
	It seems one task of 3 is progressing much more slowly than the other 2 so I'll reduce max concurrent tasks to 2 in the app_config. Is there a limit to using 4 logical cores per processor?

klepel Send message Joined: 23 Dec 09 Posts: 189 Credit: 4,759,881,008 RAC: 618,108 Level Scientific publications	Message 48846 - Posted: 6 Feb 2018 \| 17:57:45 UTC - in response to Message 48835.
	Toni, there is another reason I abstained from the QMML app on the AMD 1700X computer, apart from failing WUs and the computer freezing: the QMML app clogs the BOINC scheduler and therefore blocks the other projects from downloading additional WUs automatically when required (I run only one instance of the QMML app on this computer). So the computer ends up running only one instance of the QMML app and a GPU app. When I manually ask for additional WUs on the other projects, they get downloaded. So the choice is to run only GPUGRID on this computer or to run GPUGRID only on the GPU.
	"* CPU threads are limited to 4 (you should still be able to crunch multiple WUs at once, please check)"
	This would not have been necessary, as it could easily be changed by app_config, and a working app_config is already circulating in the forums. This now prevents power users from making their own adjustments, i.e. letting a single QMML app WU run on all threads. See problem above.

Keith Myers Send message Joined: 13 Dec 17 Posts: 1376 Credit: 8,054,669,922 RAC: 6,103,121 Level Scientific publications	Message 48848 - Posted: 6 Feb 2018 \| 21:54:16 UTC
	I haven't had any issue with the QC tasks running alongside my usual SETI tasks. I reduce the number of Seti cpu tasks when I run the QC task set to use 4 cores. The Seti and Einstein projects download and run tasks normally without any manual intervention. I did have to increase my disk space for the QC task I ran this morning when GPUGrid complained it needed an additional 600 MB of space. I have a larger than normal amount of Seti work on board today to make it through the scheduled outage. Maybe the disk space needs to be increased or the resource share changed. I have an AMD 1800X in my Ryzen cruncher.

klepel Send message Joined: 23 Dec 09 Posts: 189 Credit: 4,759,881,008 RAC: 618,108 Level Scientific publications	Message 48849 - Posted: 6 Feb 2018 \| 22:20:29 UTC
	Thanks Keith for your comments. I do not run SETI on the CPU; I run PRIMEGRID and SETI on the GPU only. I do run ODLK1 on the CPU, as this is the only project that does not frequently freeze my computer/CPU (once a day). The ODLK1 tasks, however, do not download when I am crunching the QMML app. The freezing might well be because I have mildly overclocked the CPU (3770 MHz) and run the RAM at 2966 MHz (RAM is rated at 3000 MHz), but I have not had the time to investigate further.

Keith Myers Send message Joined: 13 Dec 17 Posts: 1376 Credit: 8,054,669,922 RAC: 6,103,121 Level Scientific publications	Message 48852 - Posted: 7 Feb 2018 \| 0:38:42 UTC - in response to Message 48849.
	I've gotten pretty good at tuning Ryzen for 24/7 distributed computing. I have had the 1700X since launch in March of last year. I run the 1700X at 3.9 GHz and the memory at 3333 MHz CL14 with fast timings. The 1800X, being newer and better made, runs at 3.95 GHz and 3333 MHz CL14 fast timings. Do you get BSODs or black screens? BSODs are almost invariably due to aggressive memory clocks, memory instability or IMC weakness. Black screens, (computer appears frozen, no display or keyboard or mouse input recognized) with no error logs generated are caused by cpu lockup because of insufficient VDDCR cpu voltage for the desired cpu clocks. Both my Ryzens run for weeks without errors. Only reason uptime is not longer is because of OS updates or whatever.

klepel Send message Joined: 23 Dec 09 Posts: 189 Credit: 4,759,881,008 RAC: 618,108 Level Scientific publications	Message 48854 - Posted: 7 Feb 2018 \| 4:07:03 UTC - in response to Message 48852.
	"Black screens, (computer appears frozen, no display or keyboard or mouse input recognized) “with no error logs generated” are caused by cpu lockup because of insufficient VDDCR cpu voltage for the desired cpu clocks."
	It is a black screen with the symptoms you describe. I would even say I am not overclocking at all: I have an ASUS Prime X370-Pro motherboard, and in the BIOS settings it asks me if I am on water cooling, which I am (Corsair Liquid CPU Cooler H60), and then it gives me 3770 MHz; that is all I did. Similarly with the RAM: I just adjusted the frequency in the BIOS to the frequency in the RAM specification, nothing else. So if you could help with overclocking or with stabilizing the system at these frequencies, it would be highly appreciated. Then I will try to switch back to the QMML app.

Keith Myers Send message Joined: 13 Dec 17 Posts: 1376 Credit: 8,054,669,922 RAC: 6,103,121 Level Scientific publications	Message 48855 - Posted: 7 Feb 2018 \| 5:06:23 UTC - in response to Message 48854.
	We probably should converse via PM so as to not pollute or hijack the thread.

Keith Myers Send message Joined: 13 Dec 17 Posts: 1376 Credit: 8,054,669,922 RAC: 6,103,121 Level Scientific publications	Message 48856 - Posted: 7 Feb 2018 \| 5:34:48 UTC
	Wow, how did you manage to get 1200-1300 credits for your QC tasks today? What's your secret?

NUCCpod_NAPTIMELABS_01 Send message Joined: 18 Aug 17 Posts: 6 Credit: 174,440,173 RAC: 0 Level Scientific publications	Message 48860 - Posted: 7 Feb 2018 \| 7:54:45 UTC - in response to Message 48835.
	So far with testing, I have only been able to run QMML work units on systems with up to 4 cores. On any of my systems with 8 or 16 cores, attempting to run multiple QMMLs, they all end prematurely with a computational error.

mmonnin Send message Joined: 2 Jul 16 Posts: 337 Credit: 7,773,367,558 RAC: 72,226 Level Scientific publications	Message 48861 - Posted: 7 Feb 2018 \| 11:28:32 UTC - in response to Message 48860.
	"So far with testing, I have only been able to run QMML work units on systems with up to 4 cores. On any of my systems with 8 or 16 cores, attempting to run multiple QMMLs, they all end prematurely with a computational error."
	Probably because there are issues if two tasks start up at the same time. You'll have to limit QC tasks to 1 concurrent task at a time.

klepel Send message Joined: 23 Dec 09 Posts: 189 Credit: 4,759,881,008 RAC: 618,108 Level Scientific publications	Message 48865 - Posted: 7 Feb 2018 \| 16:32:31 UTC
	@Keith: Thanks for your PM.
	@Keith: I did nothing! Toni changed the credits and my two computers have higher credits.
	@NUCCpod_NAPTIMELABS_01 and mmonnin: It is correct that you have to limit the QMML app to only one concurrent task at a time. Then it works on my AMDs. There is an app_config circulating in the forums that works.

Keith Myers Send message Joined: 13 Dec 17 Posts: 1376 Credit: 8,054,669,922 RAC: 6,103,121 Level Scientific publications	Message 48868 - Posted: 7 Feb 2018 \| 17:03:50 UTC - in response to Message 48865.
	@klepel: Well Toni didn't change the credits for me it appears.

Keith Myers Send message Joined: 13 Dec 17 Posts: 1376 Credit: 8,054,669,922 RAC: 6,103,121 Level Scientific publications	Message 48875 - Posted: 7 Feb 2018 \| 19:42:17 UTC
	I just downloaded a couple more QC tasks hoping that the one I did yesterday was a fluke or carryover from the "old" tasks with the tiny credit. Nope. Still getting very little credit for these QC tasks and not worth tying up 4 cores. Haven't a clue why I get such little credit and others are getting 24 times more for the same cpu elapsed times.
	Run time: 1,854.06
	CPU time: 7,224.95
	Validate state: Valid
	Credit: 47.13

Jim1348 Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level Scientific publications	Message 48878 - Posted: 7 Feb 2018 \| 23:01:29 UTC - in response to Message 48875.
	"Haven't a clue why I get such little credit and others are getting 24 times more for the same cpu elapsed times."
	Not consistently. They vary all over the place. Your values are a little low for the moment, but you need more data points to draw much of a conclusion. Mine vary a lot too (Ryzen 1700, not overclocked).
	http://www.gpugrid.net/results.php?hostid=452287&offset=0&show_names=0&state=3&appid=
	Note that those are with two cores per work unit, but that should not affect the credit per work unit, in a perfect world at least. And note that the longer work units often get less credit than the shorter ones, so the credit system is strange in any case.
	I think the points are a little more consistent on my Intel machines, and probably a little higher than on the Ryzen machine on average, though I have not tried to calculate it yet.
	i7-3770: http://www.gpugrid.net/results.php?hostid=433866&offset=0&show_names=0&state=3&appid=
	i7-4790: http://www.gpugrid.net/results.php?hostid=334241&offset=0&show_names=0&state=3&appid=
	However, I normally pay no attention to credits, and the Ryzen seems to run about as fast as the Intels as far as I can see at the moment, which is the only thing that matters to me.

Keith Myers Send message Joined: 13 Dec 17 Posts: 1376 Credit: 8,054,669,922 RAC: 6,103,121 Level Scientific publications	Message 48882 - Posted: 8 Feb 2018 \| 1:13:22 UTC - in response to Message 48878.
	Just give me one QC task that gets as much credit as yours or klepel's and I would have hope. Alas the 110 credits I got yesterday for this Task 16998146 is the most I've ever seen. My credits have ranged from 6-47 over 35 tasks so far with the one above the only outlier.

Keith Myers Send message Joined: 13 Dec 17 Posts: 1376 Credit: 8,054,669,922 RAC: 6,103,121 Level Scientific publications	Message 48885 - Posted: 8 Feb 2018 \| 20:29:14 UTC
	OK, so I once again crunched some more QC tasks. This time I reduced the core count to two to see if it made any difference. Nope. Still extremely low credit compared to everyone else that has posted in these threads.
	Run time: 4,362.76, CPU time: 8,635.24, Validate state: Valid, Credit: 91.52
	Run time: 4,270.13, CPU time: 8,440.52, Validate state: Valid, Credit: 90.31
	Task 17003892
	Task 17003905
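To compare results like these across different core counts, it can help to normalize them. The Python sketch below takes the figures from Message 48875 and from this post and works out credit per CPU-hour and the average number of busy cores. It assumes the run time and CPU time values are in seconds, which the posts do not state, and the helper names are made up for illustration.

```python
def credit_per_cpu_hour(credit: float, cpu_seconds: float) -> float:
    """Credit granted per hour of CPU time (assumes times are in seconds)."""
    return credit / cpu_seconds * 3600.0


def effective_cores(run_seconds: float, cpu_seconds: float) -> float:
    """Average number of busy cores: CPU time divided by wall-clock run time."""
    return cpu_seconds / run_seconds


# (label, run time, CPU time, credit) copied from Messages 48875 and 48885.
RESULTS = [
    ("48875, 4 cores", 1854.06, 7224.95, 47.13),
    ("48885, first result", 4362.76, 8635.24, 91.52),
    ("48885, second result", 4270.13, 8440.52, 90.31),
]

if __name__ == "__main__":
    for label, run_s, cpu_s, credit in RESULTS:
        print(f"{label}: {effective_cores(run_s, cpu_s):.2f} cores busy, "
              f"{credit_per_cpu_hour(credit, cpu_s):.1f} credit per CPU-hour")
```

Three tasks are far too few to say anything about the credit system itself; the sketch only puts the posted numbers on a common scale.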

DRSMT Send message Joined: 23 Feb 17 Posts: 21 Credit: 5,528,142,362 RAC: 1,893,481 Level Scientific publications	Message 48886 - Posted: 9 Feb 2018 \| 9:05:03 UTC
	The problem with multiple WUs starting at the same time should be fixed (or otherwise a lot of calculation errors will be produced).

Keith Myers Send message Joined: 13 Dec 17 Posts: 1376 Credit: 8,054,669,922 RAC: 6,103,121 Level Scientific publications	Message 48889 - Posted: 9 Feb 2018 \| 18:42:29 UTC - in response to Message 48886.
	Or just set max_concurrent to 1 and avoid the issue entirely until the applications and software for the problem gets resolved.

mmonnin Send message Joined: 2 Jul 16 Posts: 337 Credit: 7,773,367,558 RAC: 72,226 Level Scientific publications	Message 48890 - Posted: 9 Feb 2018 \| 21:58:26 UTC - in response to Message 48889. Last modified: 9 Feb 2018 \| 22:03:54 UTC
	"Or just set max_concurrent to 1 and avoid the issue entirely until the applications and software for the problem gets resolved."
	Until the BM queue fills up with just QC tasks and all other cores go idle. Better to just avoid the entire application in the 1st place. If it's not worth the admins' time to fix known issues that cause errors 100% of the time in known situations, then it's not worth the time for donors to run.

Keith Myers Send message Joined: 13 Dec 17 Posts: 1376 Credit: 8,054,669,922 RAC: 6,103,121 Level Scientific publications	Message 48891 - Posted: 9 Feb 2018 \| 22:44:25 UTC - in response to Message 48890.
	Every donor is different. I don't have GPUGrid as my sole project so the crunchers never go idle, there is always work being done for someone. As with most projects, there is always a shortage of manpower, money or time for keeping applications current and working.

Dayle Diamond Send message Joined: 5 Dec 12 Posts: 84 Credit: 1,663,883,415 RAC: 0 Level Scientific publications	Message 48925 - Posted: 13 Feb 2018 \| 3:27:21 UTC
	I'll be watching for updates, but for now I'm also turning off CPU tasks. Scheduler is telling batches to activate all at once. Rows of errors. I acknowledge that user fixes have been recommended but I don't want to program a solution that's beyond my capacity to correct once circumstances change. Hope to be back soon!

biodoc Send message Joined: 26 Aug 08 Posts: 183 Credit: 10,085,929,375 RAC: 1,168 Level Scientific publications	Message 48939 - Posted: 14 Feb 2018 \| 15:48:12 UTC
	I set up the quantum chem app on a second computer. I started with the following app_config.xml file:
	<app_config>
	  <app>
	    <name>QC</name>
	    <max_concurrent>1</max_concurrent>
	  </app>
	  <app_version>
	    <app_name>QC</app_name>
	    <plan_class>mt</plan_class>
	    <avg_ncpus>4</avg_ncpus>
	    <cmdline>--nthreads 4</cmdline>
	  </app_version>
	</app_config>
	Once the Work Units downloaded and 1 WU started w/4 threads, I edited the app_config.xml to change <max_concurrent>1</max_concurrent> to <max_concurrent>2</max_concurrent>. Then I restarted the boinc client and now I have 2 work units running simultaneously with 4 threads each. So far so good.
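The manual edit described in this post can be sketched in Python as a small text transformation. This is only an illustration: `set_max_concurrent` is a made-up helper, it changes nothing on disk by itself, and the client still has to pick up the new file (the post does that by restarting the BOINC client).

```python
import xml.etree.ElementTree as ET


def set_max_concurrent(xml_text: str, app_name: str, value: int) -> str:
    """Return app_config.xml text with <max_concurrent> changed for one app."""
    root = ET.fromstring(xml_text)
    for app in root.findall("app"):
        if app.findtext("name") == app_name:
            node = app.find("max_concurrent")
            if node is None:
                # Add the element if the <app> block did not have one yet.
                node = ET.SubElement(app, "max_concurrent")
            node.text = str(value)
    return ET.tostring(root, encoding="unicode")


# The starting configuration from this post: one task, four threads.
START_CONFIG = """<app_config>
  <app>
    <name>QC</name>
    <max_concurrent>1</max_concurrent>
  </app>
  <app_version>
    <app_name>QC</app_name>
    <plan_class>mt</plan_class>
    <avg_ncpus>4</avg_ncpus>
    <cmdline>--nthreads 4</cmdline>
  </app_version>
</app_config>"""

if __name__ == "__main__":
    # Step two of the procedure: raise the limit once the first WU is running.
    print(set_max_concurrent(START_CONFIG, "QC", 2))
```

The sketch only covers the file edit. Deciding when it is safe to raise the limit (that is, after the first task has actually started) is still a manual judgment here.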

Keith Myers Send message Joined: 13 Dec 17 Posts: 1376 Credit: 8,054,669,922 RAC: 6,103,121 Level Scientific publications	Message 48948 - Posted: 14 Feb 2018 \| 21:53:11 UTC
	But there is still the issue that whenever your QC tasks finish and more than 2 tasks are downloaded, your system can try to start both tasks at the same time and then both will fail. There is no guaranteed method to stagger starting of multiple QC tasks in an unattended system on auto. The only way to get around this situation is to do exactly what you posted. But that requires your intervention at each new task startup. Or just set max_concurrent to 1 and be done with it.

Conan Send message Joined: 25 Mar 09 Posts: 25 Credit: 582,385 RAC: 0 Level Scientific publications	Message 48950 - Posted: 14 Feb 2018 \| 23:10:33 UTC Last modified: 14 Feb 2018 \| 23:12:52 UTC
	I am not seeing this situation at all. My AMD Linux Fedora systems download more than one at a time but only ever try to run one at a time. Even on my 8 core (+ 8 HT) only one starts and other projects keep running. I am also running other applications but on a 4 core computer when GPU Grid starts it is the only thing that runs and only one at a time. Not trying to control how things are run. So I don't know what could be causing problems on your computers.
	BOINC versions are 7.4.25 and 7.6.22
	Conan

Dayle Diamond Send message Joined: 5 Dec 12 Posts: 84 Credit: 1,663,883,415 RAC: 0 Level Scientific publications	Message 48954 - Posted: 15 Feb 2018 \| 1:19:37 UTC
	Quick update: I checked my logs and this thread again to see if I could resume tasks. To my surprise, tasks were resumed while I had still opted out.
	ACEMD short runs (2-3 hours on fastest card): yes
	ACEMD long runs (8-12 hours on fastest GPU): yes
	ACEMD Beta: yes
	Quantum Chemistry (Linux, CPU): no
	Python Runtime: yes
	I don't know what's broken but it's not cool.

Keith Myers Send message Joined: 13 Dec 17 Posts: 1376 Credit: 8,054,669,922 RAC: 6,103,121 Level Scientific publications	Message 48955 - Posted: 15 Feb 2018 \| 2:55:35 UTC - in response to Message 48950.
	"I am not seeing this situation at all. My AMD Linux Fedora systems download more than one at a time but only ever try to run one at a time. Even on my 8 core (+ 8 HT) only one starts and other projects keep running. I am also running other applications but on a 4 core computer when GPU Grid starts it is the only thing that runs and only one at a time. Not trying to control how things are run. So I don't know what could be causing problems on your computers. BOINC versions are 7.4.25 and 7.6.22 Conan"
	Several posters have reported that starting two QC tasks at the same time or within 5 seconds of each other causes the first task to error out. See Message 48589

captainjack Send message Joined: 9 May 13 Posts: 171 Credit: 4,379,345,966 RAC: 9,019,539 Level Scientific publications	Message 48956 - Posted: 15 Feb 2018 \| 3:20:37 UTC
	Dayle Diamond said:
	"Quick update: I checked my logs and this thread again to see if I could resume tasks. To my surprise, tasks were resumed while I had still opted out."
	Dayle, do you perchance have the preference box checked for "If no work for selected applications is available, accept work from other applications?"

mmonnin Send message Joined: 2 Jul 16 Posts: 337 Credit: 7,773,367,558 RAC: 72,226 Level Scientific publications	Message 48959 - Posted: 15 Feb 2018 \| 3:50:20 UTC
	If one has more than 4 cores and is running max concurrent = 1, there is still the chance that boinc manager will flood your queue with all QC tasks and nothing from the other project. Especially when just starting up that setup, before BM gets a better handle on the resource share, since it is a long-term resource share. I've personally had this happen using max concurrent on another project. 4 threads working, 28 idle. It's far from a perfect solution.

Dayle Diamond Send message Joined: 5 Dec 12 Posts: 84 Credit: 1,663,883,415 RAC: 0 Level Scientific publications	Message 48982 - Posted: 18 Feb 2018 \| 20:19:20 UTC - in response to Message 48956.
	"Dayle, do you perchance have the preference box checked for If no work for selected applications is available, accept work from other applications?"
	Oops. Thank you ><. Happy Crunching, I'll be back with these tasks once things have stabilized.

biodoc Send message Joined: 26 Aug 08 Posts: 183 Credit: 10,085,929,375 RAC: 1,168 Level Scientific publications	Message 48993 - Posted: 19 Feb 2018 \| 12:45:44 UTC - in response to Message 48939.
	"I set up the quantum chem app on a second computer. I started with the following app_config.xml file:
	<app_config>
	  <app>
	    <name>QC</name>
	    <max_concurrent>1</max_concurrent>
	  </app>
	  <app_version>
	    <app_name>QC</app_name>
	    <plan_class>mt</plan_class>
	    <avg_ncpus>4</avg_ncpus>
	    <cmdline>--nthreads 4</cmdline>
	  </app_version>
	</app_config>
	Once the Work Units downloaded and 1 WU started w/4 threads, I edited the app_config.xml to change <max_concurrent>1</max_concurrent> to <max_concurrent>2</max_concurrent>. Then I restarted the boinc client and now I have 2 work units running simultaneously with 4 threads each. So far so good."
	This approach, although a bit cumbersome, works. I've completed 189 Work Units with no errors. I think special badges for Quantum Chemistry contribution would be a draw for more users.

NUCCpod_NAPTIMELABS_01 Send message Joined: 18 Aug 17 Posts: 6 Credit: 174,440,173 RAC: 0 Level Scientific publications	Message 49065 - Posted: 22 Feb 2018 \| 2:23:34 UTC
	So, forgive me if this has been answered. Is this planned to be fixed at some point?

Keith Myers Send message Joined: 13 Dec 17 Posts: 1376 Credit: 8,054,669,922 RAC: 6,103,121 Level Scientific publications	Message 49066 - Posted: 22 Feb 2018 \| 2:41:29 UTC - in response to Message 49065.
	No it hasn't been answered. Or even addressed by the developer as far as I can tell. Seems the resources lately have been in deploying and debugging the WSL QC app.
