Server problems
Joined: 19 Mar 14 · Posts: 5 · Credit: 14,682,787 · RAC: 0
The project is back OK now. Try a reset, or re-attaching, and it should pick up, as it did for me just now. I had a 'fun' few hours forcing a download of 100Mb, but only got to 60% before the plug was pulled server-side. I see that the previous CUDA 1101 zip file is no longer attached, only CUDA 101. I had wondered whether this was a factor (apart from file size and expired certificates), but probably not. However, the larger CUDA file is still downloading at only ~8 KB/s, so the unit won't be running for quite a while.
Joined: 11 Jul 09 · Posts: 1639 · Credit: 10,159,968,649 · RAC: 295,172
Just try a normal update, or a retry on any stalled transfers, before going any further. A full project reset shouldn't be needed, and all those extra transfers will just slow down the project recovery for everyone.
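Both of those steps can also be issued from the command line with `boinccmd`, the BOINC client's command-line tool. The sketch below is an illustration, not something posted in this thread: it only builds the `--project URL update` and `--file_transfer URL filename retry` invocations and, by default, prints them instead of running them. `PROJECT_URL` and the transfer file name are hypothetical placeholders (real file names can be listed with `boinccmd --get_file_transfers`), and actually executing the commands assumes `boinccmd` is on the PATH and can authenticate to the local client, for example by being run from the BOINC data directory.

```python
import shlex
import subprocess
from typing import List, Optional

# Hypothetical placeholder: use the project's master URL exactly as the client shows it.
PROJECT_URL = "https://example.org/project/"


def update_cmd(project_url: str) -> List[str]:
    """Ask the client to contact the project's scheduler now (a 'normal update')."""
    return ["boinccmd", "--project", project_url, "update"]


def retry_transfer_cmd(project_url: str, filename: str) -> List[str]:
    """Retry one stalled upload or download, identified by its file name."""
    return ["boinccmd", "--file_transfer", project_url, filename, "retry"]


def run(cmd: List[str], dry_run: bool = True) -> Optional[subprocess.CompletedProcess]:
    """Print the command; only execute it when dry_run is False."""
    print(shlex.join(cmd))
    if dry_run:
        return None
    # check=True raises CalledProcessError if boinccmd exits with a non-zero status
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(update_cmd(PROJECT_URL))
    # Hypothetical file name; list the real ones with: boinccmd --get_file_transfers
    run(retry_transfer_cmd(PROJECT_URL, "example_stalled_file"))
```

Keeping the dry run as the default means nothing touches the client until the printed commands have been checked, which fits the advice to try the lightest fix first.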
Joined: 22 May 20 · Posts: 110 · Credit: 115,525,136 · RAC: 345
Having worked through your excellent instructions, Richard, I finally succeeded in getting the pending uploads through. I forgot to set NNW (No New Work), so the manager requested 2 new tasks. However, I didn't bother to fix the new-work download issue; I had 2 pending downloads, which I aborted. This morning, when everything was fixed, I repeatedly got the 'scheduler request completed – downloads stalled' message, so I reattached the project. It took less than 1.5 minutes to download all the necessary project files, and now I am back to business as usual. Only this approach seemed to clear that annoying scheduler message.
Joined: 2 Jul 16 · Posts: 338 · Credit: 7,987,341,558 · RAC: 178,897
Site is back to normal. No need for funny business or resets. No more 'insecure' warning when browsing. Uploads and downloads work as intended.
Joined: 14 Mar 07 · Posts: 1958 · Credit: 629,356 · RAC: 0
There was a problem with the certificate renewal. It's fine now.
Joined: 11 Feb 18 · Posts: 41 · Credit: 579,891,424 · RAC: 0
> There was a problem with the certificate renewal. It's fine now.

Everyone has known that for three days!!! It took three days to solve the problem! Don't be surprised if fewer and fewer users stick with your project; only those racing for Formula BOINC are still following it. Thank you, Richard Haselgrove, for your help.
Joined: 11 Feb 18 · Posts: 41 · Credit: 579,891,424 · RAC: 0
> Just try a normal update, or a retry on any stalled transfers, before going any further.

Thank you, Richard, for all your help. Why don't you join this team as a computer scientist? You are everywhere, with a very deep knowledge of BOINC. I'm writing now, after the site admin posted that "it was a certificate problem". All of us knew that! The admin reacted very late! Best regards from Belgium. (Sorry for my English, I'm trying to do my best.)
Joined: 11 Jul 09 · Posts: 1639 · Credit: 10,159,968,649 · RAC: 295,172
Thanks for the kind words. Trying to solve these little problems goes some way to keeping those little grey cells in working order. I did, many years ago, try to put forward the concept of 'technical moderators' as a specialist position within BOINC: people with technical knowledge who could bridge the gap between the mass of volunteers and the project scientists or administrators. There's a need for people who can decipher [5-year-old voice on] Mummeeeee - it's not working! [5-year-old voice off] and turn it into a technical description of what needs to be tweaked. The idea never took off (the project side couldn't see the need), but I've gone on trying to live the dream.
Joined: 21 Nov 16 · Posts: 36 · Credit: 164,429,114 · RAC: 12,554
Richard... and because you try, there will be a small, quiet, wonderful place in heaven for you. Bill F
Joined: 11 Jul 09 · Posts: 1639 · Credit: 10,159,968,649 · RAC: 295,172
Ta. Continuing on our theme of 'things the volunteers have noticed, and project admin might like to take a look at...'

We are now running ACEMD3 v2.19, deployed on 10 Nov 2021. The first tasks had a data error, but we've now been running ADRIA_BanditGPCR tasks successfully since Friday 26 November. The apps come in two flavours, cuda101 and cuda1121. I'll let the owners of Ampere cards pursue their own private grief, but I'm worried about the rest of us.

The machines I run here all have GTX 1660 series cards: all modern and efficient, and fast enough to complete these tasks in under 24 hours. Five cards have returned four tasks each, and all twenty have validated. All machines have tried both cuda101 and cuda1121, and on four out of five, cuda1121 is clearly faster than cuda101. The fifth is a bit ambiguous.

That means that BOINC should be moving towards issuing cuda1121 preferentially. In fact, it should have reached that point by now: we must have completed well over 100 tasks globally since this version was launched. But all four of my 'clear advantage' machines are currently running cuda101, and only the ambiguous one is trying cuda1121 again. That's the wrong way round. Why?

Looking at the details for each of our computers, there's a link for "Application details: Show". That brings up the history of that computer running each application it has tried. The crucial lines here are 'Number of tasks completed' and 'Average processing rate'. Once 'Number of tasks completed' reaches 11, the server should compare the APRs and preferentially assign the fastest app for new work, when there's a choice. But my hosts are showing zero tasks completed, despite the 'Consecutive valid tasks' count being filled in.

If the project-global count of completed tasks (which we can't inspect directly) is also not filling in properly, that would explain the bias towards cuda101. But I can't explain why the stats aren't being recorded properly.
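The selection rule described above can be modelled in a few lines. This is a deliberately simplified, hypothetical sketch of the rule as stated in the post (compare APRs only once every app version has at least 11 completed tasks, otherwise express no preference), not BOINC's actual scheduler code, which is more elaborate. The class and field names, the all-versions threshold test and the random fallback are assumptions made for illustration. It shows why a 'Number of tasks completed' stuck at zero would stop the server from ever settling on the faster version.

```python
import random
from dataclasses import dataclass
from typing import List

# Threshold as described in the post; an assumption, not taken from BOINC source.
MIN_COMPLETED = 11


@dataclass
class AppVersionStats:
    plan_class: str       # e.g. "cuda101" or "cuda1121"
    tasks_completed: int  # 'Number of tasks completed' on the Application details page
    apr: float            # 'Average processing rate' (higher means faster)


def choose_version(versions: List[AppVersionStats], rng=random) -> AppVersionStats:
    """Pick the app version to send to a host (simplified, hypothetical model)."""
    if not versions:
        raise ValueError("no app versions available")
    if all(v.tasks_completed >= MIN_COMPLETED for v in versions):
        # Enough history for every version: prefer the highest APR.
        return max(versions, key=lambda v: v.apr)
    # Not enough history yet: no preference, so any version may be sent.
    return rng.choice(versions)


if __name__ == "__main__":
    # Illustrative numbers only, not real host data.
    healthy = [AppVersionStats("cuda101", 20, 300.0), AppVersionStats("cuda1121", 20, 400.0)]
    stuck = [AppVersionStats("cuda101", 0, 300.0), AppVersionStats("cuda1121", 0, 400.0)]
    print(choose_version(healthy).plan_class)  # cuda1121
    print(choose_version(stuck).plan_class)    # either version
```

In this model a host whose completed count never leaves zero stays on the no-preference path indefinitely, which matches the symptom described; it does not, however, account for the specific bias towards cuda101.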
Joined: 11 Jul 09 · Posts: 1639 · Credit: 10,159,968,649 · RAC: 295,172
Well, I don't know how it happened, but after writing all that yesterday... Today's rotation has brought me a clean sweep of cuda1121 tasks across all five machines. Coincidence, or a tweak to the server? No way of knowing externally, but it's good news both for the project (the science will be done more quickly) and for the volunteers.
Joined: 9 Feb 16 · Posts: 78 · Credit: 656,229,684 · RAC: 0
> There was a problem with the certificate renewal. It's fine now.

Nice. How about the problem that the server does not accept result file sizes above approx. 500 MB and, instead of crediting them appropriately, throws these in the bin? Has this issue been solved as well?

Michael.
President of Rechenkraft.net - Germany's first and largest distributed computing organization.
Joined: 20 Jan 09 · Posts: 2380 · Credit: 16,897,957,044 · RAC: 1
> How about the problem that the server does not accept result file sizes above approx. 500 MB and, instead of crediting them appropriately, throws these in the bin? Has this issue been solved as well?

Kind of. The present workunits are much shorter, so their result files are much smaller (~270 MB) as well.
Joined: 9 Feb 16 · Posts: 78 · Credit: 656,229,684 · RAC: 0
> > How about the problem that the server does not accept result file sizes above approx. 500 MB and, instead of crediting them appropriately, throws these in the bin? Has this issue been solved as well?
>
> Kind of.

Well, that's not a solid solution to the problem.

Michael.
President of Rechenkraft.net - Germany's first and largest distributed computing organization.
©2025 Universitat Pompeu Fabra