Peer certificate cannot be authenticated
| Author | Message |
|---|---|
|
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 1
|
After a long break, I intended to resume GPUGRID crunching, so I tried to download GPU tasks on my machine with a GTX 980 Ti inside. However, the BOINC event log shows the following:
04.10.2021 07:14:02 | GPUGRID | Requesting new tasks for CPU and NVIDIA GPU
04.10.2021 07:14:04 | GPUGRID | Scheduler request failed: Peer certificate cannot be authenticated with given CA certificates
04.10.2021 07:14:05 | | Project communication failed: attempting access to reference site
04.10.2021 07:14:06 | | Internet access OK - project servers may be temporarily down.
I don't think the server is down, because on the server status page I can see the number of available tasks dropping continually. So what's the problem with the "peer certificate"?
Edit: I just tried it with another PC, same problem :-( FYI: both systems run Windows 10. |
|
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 1
|
Something else I found out: on a new PC with two RTX 3070s inside (OS: Windows 10), I tried to add the GPUGRID project, and it did not work. BOINC gave me the message "adding of project failed". However, I don't think the GPUGRID servers are down. On the server status page I can see the figure for "unsent" tasks changing constantly, and I can access everything on the GPUGRID website. So why can I 1) not download any tasks, and 2) not add GPUGRID as a project on a new PC? |
|
Joined: 13 Dec 17 Posts: 1419 Credit: 9,119,446,190 RAC: 891
|
Please use the fix for the peer certificate problem on Windows BOINC hosts that is posted at the BOINC forums: https://boinc.berkeley.edu/forum_forum.php?id=10 https://boinc.berkeley.edu/forum_thread.php?id=14413 |
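To check that a replacement CA bundle actually fixes the handshake before restarting BOINC, one option is a TLS connection that trusts only that file. The Python sketch below is illustrative, not part of the BOINC fix: it assumes the bundle BOINC uses on Windows is `ca-bundle.crt` in the BOINC program folder, and it only tests the certificate chain, not BOINC's own networking.

```python
import socket
import ssl


def check_ca_bundle(host: str, cafile: str, port: int = 443,
                    timeout: float = 10.0) -> tuple[bool, str]:
    """Try a TLS handshake with `host`, trusting ONLY the CAs in `cafile`.

    Returns (True, negotiated TLS version) if the chain verifies, or
    (False, reason) if it does not. Raises OSError if `cafile` cannot be
    read or the host cannot be reached.
    """
    # Passing cafile= loads just this bundle instead of the OS trust store.
    context = ssl.create_default_context(cafile=cafile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                return True, tls.version() or "unknown"
    except ssl.SSLCertVerificationError as exc:
        # A chain-verification failure, the kind of problem behind BOINC's
        # "Peer certificate cannot be authenticated" message.
        return False, exc.verify_message
```

For example, `check_ca_bundle("www.gpugrid.net", r"C:\Program Files\BOINC\ca-bundle.crt")` (the path is an assumption; adjust it to the actual installation) should return `(True, ...)` once an up-to-date bundle is in place.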
|
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 1
|
Thank you, Keith, for the hint. I downloaded the certificate and replaced it in the BOINC program folder. After that, downloading tasks worked fine. However, both tasks failed after about 11 seconds with 195 (0xc3) EXIT_CHILD_FAILED; see here: https://www.gpugrid.net/result.php?resultid=32649276 and here: https://www.gpugrid.net/result.php?resultid=32649255 Does anyone have any idea what the problem is? |
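The result pages in this thread show failures at two levels: BOINC's own exit status (195, EXIT_CHILD_FAILED) and, further down in stderr, the app's exit status as a Windows status code (0xc0000135 appears later in the thread). A hypothetical helper for decoding just these two values might look like this; the lookup tables are illustrative, not a complete list of BOINC or Windows codes.

```python
# Only the codes quoted in this thread are mapped here.
BOINC_EXIT_CODES = {
    195: "EXIT_CHILD_FAILED",  # a child process started by the task failed
}
NTSTATUS_CODES = {
    0xC0000135: "STATUS_DLL_NOT_FOUND",  # Windows could not load a required DLL
}


def describe_exit_code(code: int) -> str:
    """Format an exit/status code as 'decimal (0xhex) NAME'."""
    # Windows status values are unsigned 32-bit but are often printed as
    # negative signed integers; masking maps both forms to the same value.
    code &= 0xFFFFFFFF
    name = BOINC_EXIT_CODES.get(code) or NTSTATUS_CODES.get(code) or "UNKNOWN"
    return f"{code} (0x{code:x}) {name}"


print(describe_exit_code(195))          # 195 (0xc3) EXIT_CHILD_FAILED
print(describe_exit_code(-1073741515))  # 3221225781 (0xc0000135) STATUS_DLL_NOT_FOUND
```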
Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0
|
You should update the Microsoft Visual C++ runtime library as well: https://aka.ms/vs/16/release/vc_redist.x86.exe https://aka.ms/vs/16/release/vc_redist.x64.exe Then restart Windows. |
|
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 1
|
Zoltan, thanks for the hint. I have now tried to crunch tasks on the new PC with the two RTX 3070s inside. On this machine I installed the Microsoft Visual C++ runtime libraries several weeks ago, after a problem with Folding@home. However, the tasks are failing here, too. One interesting thing: of the three tasks that were downloaded, one had CUDA version 101 and the other two had 1121.
https://www.gpugrid.net/result.php?resultid=32649296 (101)
https://www.gpugrid.net/result.php?resultid=32649291 (1121)
https://www.gpugrid.net/result.php?resultid=32649279 (1121)
So I suspect the tasks don't run on Ampere cards yet :-( |
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 428
|
All three tasks failed with app exit status 0xc0000135. That's the characteristic signature of a missing VC runtime package. Are you sure you installed the correct version? There should be a file 'vcruntime140_1.dll' in your C:\Windows\System32 directory. Once that is fixed, Ampere cards should run the 1121 version of the app, but the 101 version will still fail. |
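A quick way to confirm the presence of the DLL Richard mentions is to list System32 and look for it. The sketch below assumes 64-bit Python. A 32-bit process asking for System32 is silently redirected to SysWOW64 by Windows, which would give a misleading answer. Apart from 'vcruntime140_1.dll', the DLL names listed are an assumption about what else the app may need.

```python
import os
from pathlib import Path

# 'vcruntime140_1.dll' is the file named in this thread; the other two are
# assumed companions from the same Visual C++ redistributable.
REQUIRED_DLLS = ("vcruntime140.dll", "vcruntime140_1.dll", "msvcp140.dll")


def missing_dlls(directory: Path, names=REQUIRED_DLLS) -> list[str]:
    """Return the names from `names` that are not present in `directory`."""
    # Windows file names are case-insensitive, so compare in lower case.
    present = {p.name.lower() for p in directory.iterdir() if p.is_file()}
    return [n for n in names if n.lower() not in present]


def system32() -> Path:
    """Locate System32 via the SystemRoot environment variable (Windows)."""
    return Path(os.environ.get("SystemRoot", r"C:\Windows")) / "System32"
```

On Windows, `missing_dlls(system32())` returning `[]` means all three files are present. It does not check their versions.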
|
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 1
|
Richard, 'vcruntime140_1.dll' is indeed missing, although I'm sure I installed the two Visual C++ files several weeks ago. No idea what happened. I will reinstall the two files and see what happens. |
|
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 1
|
I reinstalled the VC runtime package, and a double check showed that 'vcruntime140_1.dll' is present in the C:\Windows\System32 directory. Then I restarted the system and downloaded new GPUGRID tasks. However, they also failed after a few seconds: https://www.gpugrid.net/result.php?resultid=32649623 https://www.gpugrid.net/result.php?resultid=32649596 The reason obviously is: ACEMD v2.18 (cuda101). Why did I not receive the correct version, cuda1121? What's going wrong? |
|
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 1
|
> The reason obviously is: ACEMD v2.18 (cuda101)

I retried downloading more tasks, and this time I apparently received ones with the correct CUDA version, 1121, because crunching works well. What I notice is that these tasks load the GPU quite heavily: the GPUs get markedly hotter than with, e.g., Folding@home or WCG GPU tasks. So I had to lower the power limit accordingly to avoid overheating my two RTX 3070s; I usually run them at 60/61 °C, not higher.
On another machine with a GTX 970 inside, a task was downloaded and is running, after I installed the new peer certificate there as well. However, a rough calculation of the total runtime of this task (e2s130_e1s109p0f1198-ADRIA_AdB_KIXCMYB_HIP-0-2-RND3205_2) yields about 70 hours :-((( This shows that these new tasks are obviously not well suited to older cards.
For the Ampere cards, however, the CUDA version issue needs to be resolved quickly. Otherwise, it's a lottery :-( |
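To keep an eye on temperatures while lowering the power limit, one can poll `nvidia-smi` and parse its CSV output. The sketch below uses the standard `--query-gpu` fields and takes the 61 °C threshold from the post above. It assumes GPU names contain no commas and that every field reports a number, since some cards print `[N/A]` for power draw. Setting a limit is a separate command, `nvidia-smi -i <index> -pl <watts>`, which needs administrator rights and is not shown here.

```python
import subprocess
from dataclasses import dataclass

# Standard field names for `nvidia-smi --query-gpu` (see --help-query-gpu).
QUERY = "index,name,temperature.gpu,power.draw,power.limit"


@dataclass
class GpuReading:
    index: int
    name: str
    temperature_c: float
    power_draw_w: float
    power_limit_w: float


def parse_query_output(text: str) -> list[GpuReading]:
    """Parse `--format=csv,noheader,nounits` output, one GPU per line."""
    readings = []
    for line in text.strip().splitlines():
        index, name, temp, draw, limit = (f.strip() for f in line.split(","))
        readings.append(GpuReading(int(index), name, float(temp),
                                   float(draw), float(limit)))
    return readings


def read_gpus() -> list[GpuReading]:
    """Run nvidia-smi once and return one reading per GPU."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_query_output(out)


def too_hot(readings: list[GpuReading], limit_c: float = 61.0) -> list[int]:
    """Indices of GPUs running above `limit_c` degrees Celsius."""
    return [r.index for r in readings if r.temperature_c > limit_c]
```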
Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0
|
> The reason obviously is: ACEMD v2.18 (cuda101)

I think you should install the latest NVIDIA driver (472.12) to fix this. |
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 6,423
|
> The reason obviously is: ACEMD v2.18 (cuda101)

His drivers on the 3070 host are already adequate for the cuda1121 app (which is why he received some). The problem is that the project scheduler is sending an incompatible app to the Ampere cards. A cuda101 app will never work on Ampere with the GPU-architecture checks in place in the app: a cuda101 app has no knowledge of the Ampere architecture, and that knowledge can't be added afterwards. This is why a CUDA 11.2+ app was required for Ampere cards. (Technically, the admins could make their app architecture-agnostic by building PTX versions of the kernels to include with the app, but it's clear they don't want to do this, or it would require too much work on their end.) The project admins are aware of the cuda101 app being sent to Ampere hosts and have said they are working on a fix to the scheduler.
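The argument above comes down to a simple lookup: each CUDA toolkit can only target GPU architectures up to a certain compute capability. According to NVIDIA's release history, compute capability 8.0 arrived in CUDA 11.0 and 8.6, which covers the RTX 3070/3080, arrived in CUDA 11.1. Mapping the plan-class labels `cuda101` and `cuda1121` to toolkits 10.1 and 11.2 is an assumption based on this thread. The sketch below models only the upper bound on native targets. It does not cover lower bounds or the PTX route mentioned above.

```python
# Highest compute capability each CUDA toolkit can target natively.
MAX_ARCH = {
    (10, 1): (7, 5),  # Turing is the newest architecture CUDA 10.x knows
    (11, 0): (8, 0),  # first toolkit with Ampere (GA100, sm_80)
    (11, 1): (8, 6),  # adds GA10x (sm_86), e.g. RTX 3070/3080
    (11, 2): (8, 6),
}

# Assumed mapping from the plan-class labels seen in this thread.
PLAN_CLASS_TOOLKIT = {"cuda101": (10, 1), "cuda1121": (11, 2)}


def can_target(plan_class: str, compute_capability: tuple[int, int]) -> bool:
    """True if the app's toolkit knows this GPU architecture."""
    toolkit = PLAN_CLASS_TOOLKIT[plan_class]
    # Tuples compare element-wise, so (8, 6) > (7, 5).
    return compute_capability <= MAX_ARCH[toolkit]


print(can_target("cuda101", (8, 6)))   # RTX 3070, cuda101 app: False
print(can_target("cuda1121", (8, 6)))  # RTX 3070, cuda1121 app: True
print(can_target("cuda101", (5, 2)))   # GTX 970 / 980 Ti, cuda101 app: True
```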
|
PDW Joined: 7 Mar 14 Posts: 18 Credit: 6,575,125,525 RAC: 2
|
Today I successfully completed 2 tasks labelled cuda101 on separate Ampere hosts, although the same hosts also had other cuda101 tasks fail today. Drivers are 470.xx. |
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 6,423
|
> I have successfully completed 2 tasks today labelled as cuda101 on separate Ampere hosts

Since your hosts are hidden, can you link to the Ampere hosts that processed a cuda101 task? When you say "Ampere host", do you mean it has ONLY Ampere GPU(s) installed? Or could a secondary, older GPU have actually performed the computation? Was it one of the beta cuda101 apps?
|
PDW Joined: 7 Mar 14 Posts: 18 Credit: 6,575,125,525 RAC: 2
|
> since your hosts are hidden, can you link to these Ampere hosts that processed a cuda101 task?

I only run single-GPU systems, unless you count the iGPUs in some Windows machines (which I don't use for BOINC). At the moment I'm only running GPUGrid on some of my Linux machines, which have no iGPUs, so no secondary GPUs are involved. No, it wasn't a beta cuda101 task or app. I'll show you the more interesting task, because it first failed on a 3070 with a newer driver than mine. It is labelled "New version of ACEMD v2.18 (cuda101)", and it showed as cuda101 while running in the client. http://gpugrid.net/workunit.php?wuid=27080654 I also have a third cuda101 task running, nearly 70% complete, on a different machine with only an Ampere card in it. |
|
Joined: 13 Dec 17 Posts: 1419 Credit: 9,119,446,190 RAC: 891
|
Did you reboot the host after replacing the file? It looks like BOINC still threw a fault, possibly in the zipping feature, which may still depend on the expired SSL certificate. You may have to wait for a new BOINC version, which has been hinted to be in development for release ASAP. |
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 6,423
|
> since your hosts are hidden, can you link to these Ampere hosts that processed a cuda101 task?

My best guess is that it's some bug or idiosyncrasy on your system that confuses the cuda101 and cuda1121 apps. Maybe it "thinks" it's running the 101 executable but somehow actually calls the 1121 executable. That's just a guess, based on the fact that the vast majority of 101 tasks on that system fail for the wrong architecture, even after the successful run.
|
PDW Joined: 7 Mar 14 Posts: 18 Credit: 6,575,125,525 RAC: 2
|
> My best guess is it’s some bug/idiosyncrasy on your system where it’s confusing the cuda101 and cuda1121 apps. Maybe it “thinks” it’s running the 101 executable but somehow actually called up the 1121 executable. Just a guess based on the vast majority of 101 failures on that system for the wrong architecture, even after the successful run.

This "bug/idiosyncrasy" is also present on a different host with a different card and a different OS! When a cuda101 task normally fails, its stderr file contains something like this:
08:31:00 (56461): wrapper: running bin/acemd3 (--boinc --device 0)
ACEMD failed:
Error compiling program: nvrtc: error: invalid value for --gpu-architecture (-arch)
08:31:01 (56461): bin/acemd3 exited; CPU time 1.217885
So the program that runs is acemd3 (with those parameters); it then fails and exits. The cuda101 task currently running on a 3080 in a different machine shows acemd3 in 'top' with the same parameters. Surely that is the same program? The progress.log file in the slot folder shows a 3080 running, so it knows what hardware it has to work with. |
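Since an architecture mismatch leaves this distinctive nvrtc message, stderr files can be sorted automatically. That separates wrong-build failures from other causes such as the missing-DLL case earlier in the thread. The helper below is a hypothetical sketch that matches only the strings quoted above. Other failure messages are not modelled.

```python
import re

ARCH_ERROR = "invalid value for --gpu-architecture"
EXIT_LINE = re.compile(r"bin/acemd3 exited; CPU time ([0-9.]+)")


def classify_stderr(stderr: str) -> str:
    """Very rough triage of a task's stderr text."""
    if ARCH_ERROR in stderr:
        # nvrtc in this build does not recognise the GPU's architecture,
        # i.e. the app's CUDA version is too old for the card.
        return "wrong CUDA build for this GPU"
    if "ACEMD failed" in stderr:
        return "ACEMD failed (other reason)"
    return "no known failure signature"


def cpu_seconds(stderr: str) -> list[float]:
    """CPU times reported on each 'bin/acemd3 exited' line."""
    return [float(t) for t in EXIT_LINE.findall(stderr)]
```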
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 6,423
|
It could be a bug in the app too, I guess; I just haven't seen it anywhere else. If you look in the gpugrid.net project folder, you'll find two separate executables for acemd3. The only difference is that one is compiled with cuda101 and the other with cuda1121; in top, both appear simply as "acemd3". I don't know the exact mechanism that could cause one app to be used in place of the other, but since both are present, I can imagine it happening. BOINC isn't the most robust software and has lots of bugs, especially in older versions. But to me this makes more sense than some cuda101 tasks randomly working on Ampere when cuda101 has no knowledge of Ampere.
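Because `top` shows only the process name, which is "acemd3" for both builds, it cannot tell them apart. On Linux, `/proc/<pid>/exe` refers to the file that was actually executed. Hashing it and comparing the result with hashes of the two executables in the project folder would show which build is really running. This is a sketch under stated assumptions. It needs root or the account BOINC runs tasks under (a separate user on many Linux installs). It also assumes the project folder holds the two builds as plain files, although they may be packaged differently. Their exact file names are not given in the thread.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path) -> str:
    """SHA-256 of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def running_binaries(name: str = "acemd3",
                     proc: Path = Path("/proc")) -> dict[int, str]:
    """Map PID -> SHA-256 of the executable for processes called `name`."""
    result = {}
    for entry in proc.iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as /proc/self
        try:
            if (entry / "comm").read_text().strip() != name:
                continue
            result[int(entry.name)] = sha256_file(entry / "exe")
        except OSError:
            # The process exited meanwhile, or we lack permission to read it.
            continue
    return result
```

A hash matching `sha256_file(...)` of the cuda1121 executable while the task is labelled cuda101 would support the "wrong executable" theory.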
|
©2025 Universitat Pompeu Fabra