Peer certificate cannot be authenticated

Erich56 · Message 57437 · Posted: 4 Oct 2021, 5:19:54 UTC · Last modified: 4 Oct 2021, 5:24:24 UTC
After a long time, I intended to resume GPUGRID crunching, so I tried to download GPU tasks on my machine with a GTX 980 Ti inside. However, the BOINC event log shows the following:
    04.10.2021 07:14:02 | GPUGRID | Requesting new tasks for CPU and NVIDIA GPU
    04.10.2021 07:14:04 | GPUGRID | Scheduler request failed: Peer certificate cannot be authenticated with given CA certificates
    04.10.2021 07:14:05 | | Project communication failed: attempting access to reference site
    04.10.2021 07:14:06 | | Internet access OK - project servers may be temporarily down.
I don't think the server is down, as on the server status page I can see the number of available tasks dropping continually. So, what's the problem with the "Peer certificate"?
Edit: I just tried it with another PC - same problem :-( FYI: both systems run Windows 10.

Erich56 · Message 57439 · Posted: 4 Oct 2021, 5:57:14 UTC
What I also found out now: on a new PC with two RTX 3070s inside (OS: Windows 10), I tried to add the GPUGRID project; it did not work. BOINC gave me the message "adding of project failed". However, I don't think the GPUGRID servers are down: on the server status page I can observe a constantly changing figure for "unsent" tasks, and I am able to access everything on the GPUGRID website.
So what's the reason that I can 1) not download any tasks, and 2) not add GPUGRID as a project on a new PC?

Keith Myers · Message 57440 · Posted: 4 Oct 2021, 6:17:22 UTC
Please use the fix for the peer certificate problem with Windows BOINC hosts at the BOINC forums:
https://boinc.berkeley.edu/forum_forum.php?id=10
https://boinc.berkeley.edu/forum_thread.php?id=14413
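For readers who want to confirm that a replaced CA bundle actually authenticates the project server, here is a minimal Python sketch using only the standard `ssl` and `socket` modules. It attempts a TLS handshake that trusts only the given bundle file, which is roughly the check that fails when BOINC reports "Peer certificate cannot be authenticated". The bundle location (assumed to be something like `C:\Program Files\BOINC\ca-bundle.crt` on a default Windows install) and the host name `www.gpugrid.net` are assumptions for illustration; this tests the certificate chain only, not the BOINC client itself.

```python
import socket
import ssl
from typing import Tuple


def check_ca_bundle(host: str, cafile: str, port: int = 443,
                    timeout: float = 10.0) -> Tuple[bool, str]:
    """Try a TLS handshake with `host`, trusting only the CAs listed in `cafile`."""
    try:
        # Passing cafile loads only this bundle, not the operating system's trust store.
        context = ssl.create_default_context(cafile=cafile)
    except OSError as exc:  # missing/unreadable file; ssl.SSLError is also an OSError
        return False, f"could not load CA bundle: {exc}"
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                return True, f"certificate verified ({tls.version()})"
    except ssl.SSLCertVerificationError as exc:
        # The server's chain does not lead to any CA in the bundle (or the name mismatches).
        return False, f"certificate verification failed: {exc.verify_message}"
    except OSError as exc:  # DNS failure, timeout, refused connection, other TLS errors
        return False, f"connection failed: {exc}"
```

Usage: `check_ca_bundle("www.gpugrid.net", r"C:\Program Files\BOINC\ca-bundle.crt")` returns `(True, ...)` if the bundle verifies the server, or `(False, reason)` otherwise.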

Erich56 · Message 57441 · Posted: 4 Oct 2021, 6:38:34 UTC
Thank you, Keith, for the hint. I downloaded the certificate and replaced it in the BOINC program folder. After that, downloading tasks worked fine.
However, both tasks failed after about 11 seconds with 195 (0xc3) EXIT_CHILD_FAILED; see here:
https://www.gpugrid.net/result.php?resultid=32649276
and here:
https://www.gpugrid.net/result.php?resultid=32649255
Does anyone have any idea what the problem is?

Retvari Zoltan · Message 57442 · Posted: 4 Oct 2021, 6:53:12 UTC · in response to Message 57441 · Last modified: 4 Oct 2021, 6:53:51 UTC
You should update the Microsoft Visual C++ runtime library as well:
https://aka.ms/vs/16/release/vc_redist.x86.exe
https://aka.ms/vs/16/release/vc_redist.x64.exe
Then restart Windows.

Erich56 · Message 57443 · Posted: 4 Oct 2021, 7:05:26 UTC · in response to Message 57442
Zoltan, thanks for the hint. I have now tried to crunch tasks on the new PC with the two RTX 3070s inside. On this machine, I updated the Microsoft Visual C++ runtime libraries several weeks ago, after there was a problem with Folding@home. However, the tasks are failing, too.
One interesting thing: of the three tasks that were downloaded, one had CUDA version 101 and the other two had 1121:
https://www.gpugrid.net/result.php?resultid=32649296 (101)
https://www.gpugrid.net/result.php?resultid=32649291 (1121)
https://www.gpugrid.net/result.php?resultid=32649279 (1121)
So I suspect the tasks don't run on Ampere cards yet :-(

Richard Haselgrove · Message 57444 · Posted: 4 Oct 2021, 7:15:40 UTC · in response to Message 57443 · Last modified: 4 Oct 2021, 7:22:11 UTC
All three tasks failed with app exit status 0xc0000135. That's the characteristic signal of the missing VC runtime package. Are you sure you installed the correct version? There should be a file 'vcruntime140_1.dll' in your C:\Windows\System32 directory.
With that fixed, Ampere cards should run the 1121 version of the app, but the 101 version will still fail.
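The two exit codes in this exchange point at different problems, and the DLL check Richard describes is easy to script. A small Python sketch, assuming the standard meanings of the codes: 0xC0000135 is the Windows NTSTATUS value STATUS_DLL_NOT_FOUND, and 195 (0xc3) is BOINC's EXIT_CHILD_FAILED. The helper names and hint messages are hypothetical, and the default DLL name simply follows Richard's post.

```python
from pathlib import Path
from typing import Iterable, List

# Exit codes quoted in this thread (constant names are for readability).
STATUS_DLL_NOT_FOUND = 0xC0000135  # Windows NTSTATUS: a required DLL could not be loaded
EXIT_CHILD_FAILED = 195            # BOINC exit code, 0xc3 in hex


def missing_dlls(system_dir: str,
                 names: Iterable[str] = ("vcruntime140_1.dll",)) -> List[str]:
    """Return the DLL names from `names` that are absent from `system_dir`."""
    folder = Path(system_dir)
    return [name for name in names if not (folder / name).is_file()]


def describe_exit_status(code: int) -> str:
    """Give a rough, human-readable hint for the exit codes discussed above."""
    if code == STATUS_DLL_NOT_FOUND:
        return "DLL not found: check the Visual C++ runtime (e.g. vcruntime140_1.dll)"
    if code == EXIT_CHILD_FAILED:
        return "EXIT_CHILD_FAILED: the science app failed; see the task's stderr"
    return f"unrecognized exit status 0x{code:x}"
```

On a Windows host, `missing_dlls(r"C:\Windows\System32")` should return an empty list once the runtime package is correctly installed.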

Erich56 · Message 57445 · Posted: 4 Oct 2021, 7:43:58 UTC · in response to Message 57444
Richard, 'vcruntime140_1.dll' is indeed missing, although I am sure I had installed the two Visual C++ files several weeks ago. No idea what happened. I will try to reinstall the two files and see what happens.

Erich56 · Message 57446 · Posted: 4 Oct 2021, 10:18:13 UTC
VC runtime package re-installed; a double check has shown that 'vcruntime140_1.dll' is present in the C:\Windows\System32 directory. System restart and new download of GPUGRID tasks. However, they also failed after a few seconds:
https://www.gpugrid.net/result.php?resultid=32649623
https://www.gpugrid.net/result.php?resultid=32649596
The reason obviously is: ACEMD v2.18 (cuda101).
Why did I not receive the correct version, cuda1121? What's going wrong?

Erich56 · Message 57447 · Posted: 4 Oct 2021, 10:35:38 UTC · in response to Message 57446
> The reason obviously is: ACEMD v2.18 (cuda101).
> Why did I not receive the correct version, cuda1121? What's going wrong?
I retried downloading more tasks, and this time I obviously received ones with the correct CUDA version, 1121, because crunching works well.
What I notice is that these tasks challenge the GPU quite a lot; the GPUs become markedly hotter than with, e.g., Folding@home or WCG GPU tasks. So I had to reduce the power input accordingly in order not to overheat my two RTX 3070s; I usually run them at 60/61°C, not higher.
On another machine with a GTX 970 inside, the task got downloaded and is running, after I installed the new peer certificate there as well. However, a rough calculation of the total runtime of this task (e2s130_e1s109p0f1198-ADRIA_AdB_KIXCMYB_HIP-0-2-RND3205_2) yields about 70 hours :-((( This shows that these new tasks are obviously not really good for older cards.
However, for the Ampere cards, the CUDA version issue needs to be resolved quickly. Otherwise, it's like a lottery :-(
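A "rough calculation" of total runtime like the one above is usually a linear extrapolation from elapsed time and fraction done. A minimal Python sketch, assuming progress grows linearly with time (real tasks can speed up or slow down); the function name and the figures in the usage line are hypothetical, not taken from the GTX 970 task.

```python
from typing import Tuple


def estimate_runtime(elapsed_hours: float, fraction_done: float) -> Tuple[float, float]:
    """Extrapolate (total, remaining) hours, assuming progress is linear in time."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    total = elapsed_hours / fraction_done
    return total, total - elapsed_hours
```

For example, a hypothetical task that is 5% done after 3.5 hours extrapolates to about 70 hours in total: `estimate_runtime(3.5, 0.05)`.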

Retvari Zoltan · Message 57455 · Posted: 4 Oct 2021, 14:58:13 UTC · in response to Message 57447
>> The reason obviously is: ACEMD v2.18 (cuda101).
>> Why did I not receive the correct version, cuda1121? What's going wrong?
> I retried downloading more tasks, and this time I obviously received ones with the correct CUDA version, 1121 ...
> However, for the Ampere cards, the CUDA version issue needs to be resolved quickly. Otherwise, it's like a lottery :-(
I think you should install the latest NVIDIA driver (472.12) to fix this.

Ian&Steve C. · Message 57456 · Posted: 4 Oct 2021, 15:14:27 UTC · in response to Message 57455
>>> The reason obviously is: ACEMD v2.18 (cuda101).
>>> Why did I not receive the correct version, cuda1121? What's going wrong?
>> I retried downloading more tasks, and this time I obviously received ones with the correct CUDA version, 1121 ...
>> However, for the Ampere cards, the CUDA version issue needs to be resolved quickly. Otherwise, it's like a lottery :-(
> I think you should install the latest NVIDIA driver (472.12) to fix this.
His drivers on the 3070 host are already adequate for the cuda1121 app (which is why he received some). The problem is the project scheduler sending an incompatible app to the Ampere cards. A cuda101 app will never work on Ampere with the GPU architecture checks in place in the app: a cuda101 app has no knowledge of the Ampere architecture, and that knowledge can't be added in afterwards. This is why a CUDA 11.2+ app was required for Ampere cards. (Technically, the admins could make their app architecture-agnostic if they built PTX versions of the kernels to include with the app, but it's clear they don't want to do this, or it would require too much work on their end.)
The project admins are aware of the issue of the cuda101 app being sent to Ampere hosts and have commented that they are working on a fix to the scheduler.
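The compatibility rule described above can be sketched as a lookup: a CUDA toolkit release can only compile for GPU architectures up to the newest one it knows, and anything newer needs either a newer toolkit or shipped PTX that the driver JIT-compiles. The Python sketch below lists only the releases relevant here (CUDA 10.1 tops out at Turing, compute capability 7.5; consumer Ampere cards such as the RTX 3070/3080 are compute capability 8.6, first supported in CUDA 11.1). Treating "cuda1121" as a CUDA 11.2 build follows Ian's "CUDA 11.2+" remark; the function name and the `ships_ptx` flag are hypothetical, it models only the upper bound, and it is not the GPUGRID scheduler's actual logic.

```python
from typing import Dict, Tuple

Version = Tuple[int, int]

# Newest compute capability each CUDA toolkit release can compile for.
# Only the releases relevant to this thread are listed.
NEWEST_ARCH: Dict[Version, Version] = {
    (10, 1): (7, 5),  # Turing is the newest architecture CUDA 10.x knows about
    (11, 0): (8, 0),  # adds data-centre Ampere (A100, sm_80)
    (11, 1): (8, 6),  # adds consumer Ampere (RTX 3070/3080, sm_86)
    (11, 2): (8, 6),
}


def app_can_run(app_cuda: Version, gpu_cc: Version, ships_ptx: bool = False) -> bool:
    """Hypothetical check: can an app built with `app_cuda` run on a GPU of `gpu_cc`?

    A GPU no newer than the toolkit's newest architecture is covered; a newer GPU
    only works if PTX was shipped for the driver to JIT-compile.
    Raises KeyError for toolkit versions missing from NEWEST_ARCH.
    """
    return gpu_cc <= NEWEST_ARCH[app_cuda] or ships_ptx
```

Under these assumptions, `app_can_run((10, 1), (8, 6))` is `False` for an RTX 3070, while `app_can_run((10, 1), (5, 2))` is `True` for a Maxwell card such as the GTX 970.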

PDW · Message 57457 · Posted: 4 Oct 2021, 15:39:31 UTC · in response to Message 57456
I have successfully completed 2 tasks today labelled as cuda101 on separate Ampere hosts; the same hosts also had other cuda101 tasks that failed today. Drivers are 470.xx.

Ian&Steve C. · Message 57459 · Posted: 4 Oct 2021, 17:27:16 UTC · in response to Message 57457
> I have successfully completed 2 tasks today labelled as cuda101 on separate Ampere hosts
Since your hosts are hidden, can you link to these Ampere hosts that processed a cuda101 task? When you say "Ampere host", do you mean that they ONLY have Ampere GPU(s) installed? Or perhaps some secondary GPU was older and actually performed the computation? Was it one of the Beta cuda101 apps?

PDW · Message 57465 · Posted: 4 Oct 2021, 19:15:14 UTC · in response to Message 57459
> Since your hosts are hidden, can you link to these Ampere hosts that processed a cuda101 task? When you say "Ampere host", do you mean that they ONLY have Ampere GPU(s) installed? Or perhaps some secondary GPU was older and actually performed the computation? Was it one of the Beta cuda101 apps?
I only run single-GPU systems, unless you count the iGPUs (which I don't use for BOINC) in some Windows machines. I'm only running some of my Linux machines with no iGPUs on GPUGrid at the moment, so no secondary GPUs are involved.
No, it wasn't a Beta cuda101 task or app.
I'll show you the more interesting task, as it first failed on a 3070 with a newer driver than mine. It is labelled as "New version of ACEMD v2.18 (cuda101)" and it showed as cuda101 whilst running on the client.
http://gpugrid.net/workunit.php?wuid=27080654
I also have a third cuda101 task running, nearly 70% completed, on a different machine with only an Ampere card in it.

Keith Myers · Message 57467 · Posted: 4 Oct 2021, 21:34:09 UTC · in response to Message 57441
Did you reboot the host after fixing the file? It looks like BOINC still threw a fault, possibly with the zipping feature, which may still depend on the expired SSL certificate. You may have to wait for a new BOINC release, which has been hinted at as being in development for release ASAP.

Ian&Steve C. · Message 57469 · Posted: 4 Oct 2021, 22:18:16 UTC · in response to Message 57465
>> Since your hosts are hidden, can you link to these Ampere hosts that processed a cuda101 task? When you say "Ampere host", do you mean that they ONLY have Ampere GPU(s) installed? Or perhaps some secondary GPU was older and actually performed the computation? Was it one of the Beta cuda101 apps?
> I only run single-GPU systems, unless you count the iGPUs (which I don't use for BOINC) in some Windows machines. I'm only running some of my Linux machines with no iGPUs on GPUGrid at the moment, so no secondary GPUs are involved.
> No, it wasn't a Beta cuda101 task or app.
> I'll show you the more interesting task, as it first failed on a 3070 with a newer driver than mine. It is labelled as "New version of ACEMD v2.18 (cuda101)" and it showed as cuda101 whilst running on the client.
> http://gpugrid.net/workunit.php?wuid=27080654
> I also have a third cuda101 task running, nearly 70% completed, on a different machine with only an Ampere card in it.
My best guess is it's some bug/idiosyncrasy on your system where it's confusing the cuda101 and cuda1121 apps. Maybe it "thinks" it's running the 101 executable but somehow actually called up the 1121 executable. Just a guess, based on the vast majority of 101 failures on that system for the wrong architecture, even after the successful run.

PDW · Message 57471 · Posted: 4 Oct 2021, 22:51:05 UTC · in response to Message 57469
> My best guess is it's some bug/idiosyncrasy on your system where it's confusing the cuda101 and cuda1121 apps. Maybe it "thinks" it's running the 101 executable but somehow actually called up the 1121 executable. Just a guess, based on the vast majority of 101 failures on that system for the wrong architecture, even after the successful run.
This "bug/idiosyncrasy" is also present on a different host with a different card and a different OS!
When a cuda101 task normally fails, it has something like this in the stderr file:
    08:31:00 (56461): wrapper: running bin/acemd3 (--boinc --device 0)
    ACEMD failed:
        Error compiling program: nvrtc: error: invalid value for --gpu-architecture (-arch)
    08:31:01 (56461): bin/acemd3 exited; CPU time 1.217885
So the program that is run is acemd3 (with the parameters); it then fails and exits.
The cuda101 task that is currently running on a 3080 on a different machine is showing acemd3 with the parameters in 'top'. Surely that is the same program? The progress.log file in the slot folder shows a 3080 running, so it knows what hardware it has to work with.
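To scan many task stderr outputs for the failure signature PDW quotes, a short regex-based Python sketch can pull out the launched program and flag the NVRTC architecture error. The patterns are based only on the excerpt above (other ACEMD or wrapper versions may word their output differently), and the function name is hypothetical. Note that in this excerpt the launched path is just `bin/acemd3`, so the stderr alone may not reveal which CUDA build actually ran.

```python
import re
from typing import Dict, Optional, Union

# Patterns taken from the stderr excerpt quoted above.
LAUNCH = re.compile(r"wrapper: running (\S+) \(([^)]*)\)")
ARCH_ERROR = re.compile(r"nvrtc: error: invalid value for --gpu-architecture")


def classify_stderr(text: str) -> Dict[str, Union[Optional[str], bool]]:
    """Extract the launched program/arguments and flag the NVRTC -arch failure."""
    launch = LAUNCH.search(text)
    return {
        "program": launch.group(1) if launch else None,
        "args": launch.group(2) if launch else None,
        # True when NVRTC rejected the GPU architecture (the cuda101-on-Ampere symptom)
        "arch_failure": ARCH_ERROR.search(text) is not None,
    }
```

Run against the excerpt above, this reports program `bin/acemd3`, arguments `--boinc --device 0`, and `arch_failure` set to `True`.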

Ian&Steve C. · Message 57473 · Posted: 5 Oct 2021, 1:10:41 UTC · in response to Message 57471
It could be a bug in the app too, I guess. I just haven't seen that anywhere else.
If you look in the gpugrid.net project folder, you'll find two separate executables for acemd3. The only difference is that one is compiled with cuda101 and the other with cuda1121; according to top, both are referred to as just "acemd3". I don't know the exact mechanism that could cause one app to be used in place of another, but since both are present I can imagine it happening. BOINC isn't the most robust software; it has lots of bugs, especially in older versions.
But to me this makes more sense than some cuda101 tasks randomly working on Ampere when cuda101 has no knowledge of Ampere.
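Since `top` shows both builds as just "acemd3", one way to test this hypothesis on a Linux host is to compare the contents of the binary a running process was started from against the two executables in the project folder. The Python sketch below finds processes by name via `/proc/<pid>/comm` and reads each one's binary through `/proc/<pid>/exe`, matching files by SHA-256. It assumes Linux and enough permissions to read the BOINC user's processes (typically running as that user or as root); the function names are hypothetical, and the thread does not give the two executables' file names, so you must substitute their real paths.

```python
import hashlib
import os
from typing import Iterable, List, Optional


def sha256_of(path: str) -> str:
    """Hash a file in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def running_pids(name: str) -> List[int]:
    """PIDs whose /proc/<pid>/comm equals `name` (Linux only; empty list elsewhere)."""
    try:
        entries = os.listdir("/proc")
    except OSError:
        return []
    pids = []
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as handle:
                if handle.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            continue  # process exited meanwhile or is not readable
    return pids


def identify_binary(running_exe: str, candidates: Iterable[str]) -> Optional[str]:
    """Return the candidate file whose contents match `running_exe`, if any."""
    target = sha256_of(running_exe)
    for candidate in candidates:
        if sha256_of(candidate) == target:
            return candidate
    return None
```

For each PID from `running_pids("acemd3")`, calling `identify_binary(f"/proc/{pid}/exe", [path_to_cuda101_build, path_to_cuda1121_build])` would show which build is really running; both path names are placeholders.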