Message boards :
News :
New workunits
Message board moderation
Previous · 1 . . . 3 · 4 · 5 · 6 · 7 · 8 · 9 . . . 11 · Next
Author | Message |
---|---|
![]() Send message Joined: 13 Dec 17 Posts: 1416 Credit: 9,119,446,190 RAC: 614,515 Level ![]() Scientific publications ![]() ![]() ![]() ![]() ![]() |
Thanks, but I believe you misread me. The CPU is fine. The measurement is wrong. No, I believe the measurement is incorrect but is still going to be rather high in actuality. The Ryzen 3600 ships with the Wraith Stealth cooler which is just the normal Intel solution of a copper plug embedded into a aluminum casting. It just doesn't have the ability to quickly move heat away from the IHS. You would see much better temps if you switched to the Wraith MAX or Wraith Prism cooler which have real heat pipes and normal sized fans. The temps are correct for the Ryzen and Ryzen+ cpus, but the k10temp driver which is stock in Ubuntu didn't get the change needed to accommodate the Ryzen 2 cpus with the correct 0 temp offset. That only is shipping in the 5.3.4 or 5.4 kernels. https://www.phoronix.com/scan.php?page=news_item&px=AMD-Zen2-k10temp-Patches There are other solutions you could use in the meantime like the ASUS temp driver if you have a compatible motherboard or there also is a zenpower driver that can report the proper temp as well as the cpu power. https://github.com/ocerman/zenpower |
Send message Joined: 30 Jun 14 Posts: 153 Credit: 129,654,684 RAC: 0 Level ![]() Scientific publications ![]() ![]() ![]() ![]() ![]() ![]() |
Damn! Wishful thinking! How about 4.75. To many numbers on my screen It's because it shows 4075 and then I automatically drop in the . at 2 places not realizing my mistake! As far as temperature goes, I am only reporting the CPU temp at the sensor point. I have sent a webform to Arctic asking them what the temp would be at the radiator after passing by the CPU heatsink. The air temp of the exhaust air does not feel anywhere near 80. I would put it down around 40C or less. I will see what they say and let you know. |
Send message Joined: 30 Jun 14 Posts: 153 Credit: 129,654,684 RAC: 0 Level ![]() Scientific publications ![]() ![]() ![]() ![]() ![]() ![]() |
Tony - I keep getting this on random tasks unknown error) - exit code 195 (0xc3)</message> <stderr_txt> 13:11:40 (25792): wrapper (7.9.26016): starting 13:11:40 (25792): wrapper: running acemd3.exe (--boinc input --device 1) # Engine failed: Particle coordinate is nan 13:37:25 (25792): acemd3.exe exited; CPU time 1524.765625 13:37:25 (25792): app exit status: 0x1 13:37:25 (25792): called boinc_finish(195) It runs 1524 seconds and bombs. What's up with that? It also appears that BOINC or the task is ignoring the appconfig command to use only my 1050. I see another task that is starting on the 1050 and then jumping to the 650 since the 1050 is tied up with another project. |
Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level ![]() Scientific publications ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
The temps are correct for the Ryzen and Ryzen+ cpus, but the k10temp driver which is stock in Ubuntu didn't get the change needed to accommodate the Ryzen 2 cpus with the correct 0 temp offset. That only is shipping in the 5.3.4 or 5.4 kernels. Then it is probably reading 20C too high, and the CPU is really at 75C. Yes, I can improve on that. Thanks. |
Send message Joined: 4 Aug 14 Posts: 266 Credit: 2,219,935,054 RAC: 0 Level ![]() Scientific publications ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
Tony - I keep getting this on random tasks # Engine failed: Particle coordinate is nan Two issues can cause this error: 1. Error in the Task. This would mean all Hosts fail the task. See this link for details: https://github.com/openmm/openmm/issues/2308 2. If other Hosts do not fail the task, the error could be in the GPU Clock rate. I have tested this on one of my hosts and am able to produce this error when I Clock the GPU too high. It also appears that BOINC or the task is ignoring the appconfig command to use only my 1050. One setting to try....In Boinc Manager, Computer Preferences, set the "Switch between tasks every xxx minutes" to between 800 - 9999. This should allow the task to finish on the same GPU it started on. Can you post your app_config.xml file contents? |
![]() Send message Joined: 13 Dec 17 Posts: 1416 Credit: 9,119,446,190 RAC: 614,515 Level ![]() Scientific publications ![]() ![]() ![]() ![]() ![]() |
I've had a couple of the NaN errors. One where everyone errors out the task and another recently where it errored out after running through to completion. I had already removed all overclocking on the card but it still must have been too hot for the stock clockrate. It is my hottest card being sandwiched in the middle of the gpu stack with very little airflow. I am going to have to start putting in negative clock offset on it to get the temps down I think to avoid any further NaN errors on that card. |
Send message Joined: 4 Aug 14 Posts: 266 Credit: 2,219,935,054 RAC: 0 Level ![]() Scientific publications ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
I've had a couple of the NaN errors. One where everyone errors out the task and another recently where it errored out after running through to completion. I had already removed all overclocking on the card but it still must have been too hot for the stock clockrate. It is my hottest card being sandwiched in the middle of the gpu stack with very little airflow. I am going to have to start putting in negative clock offset on it to get the temps down I think to avoid any further NaN errors on that card. Would be interested to hear if the Under Clocking / Heat reduction fixes the issue. I am fairly confident this is the issue, but need validation / more data from fellow volunteers to be sure. |
Send message Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 869 Level ![]() Scientific publications ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
http://www.gpugrid.net/show_host_detail.php?hostid=147723 that's really interesting: the comparison of above two tasks shows that the host with the GTX1660ti yields lower GFLOP figures (single as well as double) as the host with the GTX1650. In both hosts, the CPU ist the same: Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz. And now the even more surprising fact: by coincidence, exactly the same CPU is running in one of my hosts (http://www.gpugrid.net/show_host_detail.php?hostid=205584) with a GTX750ti - and here the GFLOP figures are even markedly higher than in the abeove cited hosts with more modern GPUs. So, is the conclusion now: the weaker the GPU, the higher the number of GFLOPs generated by the system? |
![]() ![]() Send message Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 1 Level ![]() Scientific publications ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
The "Integer" (I hope it's called this way in English) speed measured is way much higher under Linux than under Windows.http://www.gpugrid.net/show_host_detail.php?hostid=147723that's really interesting: the comparison of above two tasks shows that the host with the GTX1660ti yields lower GFLOP figures (single as well as double) as the host with the GTX1650. (the 1st and 2nd host use Linux, the 3rd use Windows) See the stats of my dual boot host: Linux 139876.18 - Windows 12615.42 There's more than one order of magnitude difference between the two OS on the same hardware, one of them must be wrong. |
Send message Joined: 30 Jun 14 Posts: 153 Credit: 129,654,684 RAC: 0 Level ![]() Scientific publications ![]() ![]() ![]() ![]() ![]() ![]() |
Damn! Wishful thinking! ------------------------ Hi Greg I talked to my colleague who is in the Liquid Freezer II Dev. Team and he said that theese temps are normal with this kind of load. Installation sounds good to me. With kind regards Your ARCTIC Team, Stephan Arctic/Service Manager |
Send message Joined: 30 Jun 14 Posts: 153 Credit: 129,654,684 RAC: 0 Level ![]() Scientific publications ![]() ![]() ![]() ![]() ![]() ![]() |
Tony - I keep getting this on random tasks -------------------- <?xml version="1.0"?> -<app_config> -<exclude_gpu> <url>www.gpugrid.net</url> <device_num>1</device_num> <type>NVIDIA</type> </exclude_gpu> </app_config> I was having some issues with LHC ATLAS and was in the process of putting the tasks on pause and then disconnecting the client. In this process I discovered that another instance popped up right after I closed the one I was looking at and then I got another instance popping up with a message saying that there were two running. I shut that down and it shut down the last instance. This is a first for me. I have restarted my computer and now will wait and see whats going on. |
Send message Joined: 10 Jun 13 Posts: 9 Credit: 295,692,471 RAC: 0 Level ![]() Scientific publications ![]() ![]() ![]() ![]() ![]() |
What you posted is a mix of app_config.xml and cc_config.xml. Be so kind as to strictly follow the hints and examples on this page: https://boinc.berkeley.edu/wiki/Client_configuration |
Send message Joined: 30 Jun 14 Posts: 153 Credit: 129,654,684 RAC: 0 Level ![]() Scientific publications ![]() ![]() ![]() ![]() ![]() ![]() |
What you posted is a mix of app_config.xml and cc_config.xml. You give me a page on CC config. I jumped down to what appears to be stuff related to app_config and copied this <exclude_gpu> <url>project_URL</url> [<device_num>N</device_num>] [<type>NVIDIA|ATI|intel_gpu</type>] [<app>appname</app>] </exclude_gpu> project id is the gpugrid.net device = 1 type is nvidia removed app name since app name changes so much *****GPUGRID: Notice from BOINC Missing <app_config> in app_config.xml 11/28/2019 8:24:51 PM*** This is why I had it in the text. |
Send message Joined: 30 Jun 14 Posts: 153 Credit: 129,654,684 RAC: 0 Level ![]() Scientific publications ![]() ![]() ![]() ![]() ![]() ![]() |
What the heck now???!!! A slew of Exit child errors! What is this? Is this speed problems with OC? Also getting restart on different device errors!!! Now this...is that because something is not right in the app_config? |
![]() Send message Joined: 26 Feb 14 Posts: 211 Credit: 4,496,324,562 RAC: 0 Level ![]() Scientific publications ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
<cc_config> <exclude_gpu> <url>project_URL</url> [<device_num>N</device_num>] [<type>NVIDIA|ATI|intel_gpu</type>] [<app>appname</app>] </exclude_gpu </cc_config> This needs to go into the Boinc folder not the GPUGrid project folder ![]() ![]() |
![]() Send message Joined: 13 Dec 17 Posts: 1416 Credit: 9,119,446,190 RAC: 614,515 Level ![]() Scientific publications ![]() ![]() ![]() ![]() ![]() |
If you are going to use an exclude, then you need to exclude all dissimilar devices than the one you want to use. That is how to get rid of restart on different device errors. Or just set the switch between tasks to 360minutes or greater and don't exit BOINC while the task is running. The device number you use in the exclude statement is defined by how BOINC enumerates the cards in the Event Log at startup. The gpu_exclude statement goes into cc_config.xml in the main BOINC directory, not a project directory. |
Send message Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 Level ![]() Scientific publications ![]() ![]() ![]() ![]() |
What the heck now???!!! I see two types of errors: ERROR: src\mdsim\context.cpp line 322: Cannot use a restart file on a different device! as the name says, exclusion not working. And # Engine failed: Particle coordinate is nan this usually indicates mathematical errors in the operations performed, memory corruption, or similar (or a faulty wu, unlikely in this case). Maybe a reboot will solve it. |
Send message Joined: 10 Jun 13 Posts: 9 Credit: 295,692,471 RAC: 0 Level ![]() Scientific publications ![]() ![]() ![]() ![]() ![]() |
You give me a page on CC config. I posted the official documentation for more than just cc_config.xml: cc_config.xml nvc_config.xml app_config.xml It's worth to carefully read this page a couple of times as it provides all you need to know. Long ago the page had a direct link to the app_config.xml section. Unfortunately that link is not available any more but you may use your browser's find function. |
Send message Joined: 30 Jun 14 Posts: 153 Credit: 129,654,684 RAC: 0 Level ![]() Scientific publications ![]() ![]() ![]() ![]() ![]() ![]() |
If you are going to use an exclude, then you need to exclude all dissimilar devices than the one you want to use. That is how to get rid of restart on different device errors. Or just set the switch between tasks to 360minutes or greater and don't exit BOINC while the task is running. Ok, on point 1, it was set for 360 already because that's a good time for LHC ATLAS to run complete. I moved it up to 480 now to try and deal with this stuff in GPUGRID. Point 2 - Going to try a cc_config with a triple exclude gpu code block for here and for 2 other projects. From what I read this should be possible. |
Send message Joined: 30 Jun 14 Posts: 153 Credit: 129,654,684 RAC: 0 Level ![]() Scientific publications ![]() ![]() ![]() ![]() ![]() ![]() |
What the heck now???!!! One of these days I will get this problem solved. Driving me nuts! |
©2025 Universitat Pompeu Fabra