Server problems

TrevG

Joined: 19 Mar 14
Posts: 5
Credit: 14,682,787
RAC: 0
Level
Pro
Message 57954 - Posted: 29 Nov 2021, 10:01:37 UTC - in response to Message 57942.  
Last modified: 29 Nov 2021, 10:10:05 UTC

The project is back OK now.
Try a reset or re-attaching, and it should pick up, as it did for me just now.
I had a 'fun' few hours forcing a download of 100 MB, but only got to 60% before the plug was pulled server side.
I see that the previous Cuda 1101 zip file is no longer attached, only Cuda101. I had wondered if this was a factor, apart from file size and expired certs, but probably not.
However, the larger Cuda file is still only downloading at ~8 KB/s, so the unit won't be running for quite a while.
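
For a rough sense of "quite a while", here is a back-of-the-envelope estimate in Python. It is only a sketch: the post does not give the size of the larger Cuda file, so the 100 MB figure (borrowed from the earlier download) is an assumption, the function name `transfer_seconds` is made up for illustration, and 1 MB is taken as 1000 KB.

```python
def transfer_seconds(size_mb: float, rate_kb_per_s: float, done_fraction: float = 0.0) -> float:
    """Seconds needed to fetch the rest of a file at a steady rate (1 MB = 1000 KB)."""
    if rate_kb_per_s <= 0:
        raise ValueError("rate must be positive")
    if not 0.0 <= done_fraction <= 1.0:
        raise ValueError("done_fraction must be between 0 and 1")
    # Only the part that has not arrived yet still has to be transferred.
    remaining_kb = size_mb * 1000 * (1.0 - done_fraction)
    return remaining_kb / rate_kb_per_s


if __name__ == "__main__":
    # Assumed size: 100 MB, fetched from scratch at the ~8 KB/s quoted in the post.
    secs = transfer_seconds(100, 8)
    print(f"{secs / 3600:.1f} hours")
```

Under those assumptions a full download comes to roughly three and a half hours, before any retries or backoff are counted.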
ID: 57954 · Rating: 0
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 295,172
Level
Trp
Message 57955 - Posted: 29 Nov 2021, 10:20:03 UTC - in response to Message 57954.  

Just try a normal update, or a retry on any stalled transfers, before going any further.

A full project reset shouldn't be needed, and all those extra transfers will just slow down the project recovery for everyone.
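
For anyone who prefers the command line to BOINC Manager, the two gentler steps can be scripted through `boinccmd`. This is a minimal sketch, not an official recipe: `PROJECT_URL` is a placeholder (use the URL your own client shows for the project), the helper names are made up for illustration, and it assumes `boinccmd` is on the PATH and can reach the running client.

```python
import subprocess
from typing import List

# Placeholder only: replace with the project URL listed in your own client.
PROJECT_URL = "https://project.example/"


def update_command(project_url: str) -> List[str]:
    """argv for a normal project update (same as the Manager's Update button)."""
    return ["boinccmd", "--project", project_url, "update"]


def retry_transfers_command() -> List[str]:
    """argv asking the client to retry deferred network communication,
    which should include file transfers that are waiting in backoff."""
    return ["boinccmd", "--network_available"]


def run(argv: List[str]) -> int:
    """Run one boinccmd invocation and return its exit code."""
    return subprocess.run(argv, check=False).returncode


if __name__ == "__main__":
    # Try the cheap options first; no reset or re-attach is involved.
    for argv in (retry_transfers_command(), update_command(PROJECT_URL)):
        print(" ".join(argv), "->", run(argv))
```

Neither command discards work in progress or re-downloads the application files, which is the point of trying them before a reset.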
ID: 57955 · Rating: 0
bozz4science

Joined: 22 May 20
Posts: 110
Credit: 115,525,136
RAC: 345
Level
Cys
Message 57956 - Posted: 29 Nov 2021, 10:31:13 UTC - in response to Message 57955.  
Last modified: 29 Nov 2021, 10:32:05 UTC

Having worked through your excellent instructions, Richard, I finally succeeded in getting the pending uploads through. I forgot to set NNW (No New Work), so the manager requested 2 new tasks. However, I didn't bother to fix the new work download issue, and aborted the 2 pending downloads. This morning, when all was fixed, I repeatedly got the "scheduler request completed – downloads stalled" message and so reattached the project. It took less than 1.5 min to download all the necessary project files, and now I am back to business as usual. Only this approach seemed to get rid of that annoying scheduler message.
ID: 57956 · Rating: 0
mmonnin

Joined: 2 Jul 16
Posts: 338
Credit: 7,987,341,558
RAC: 178,897
Level
Tyr
Message 57957 - Posted: 29 Nov 2021, 11:43:13 UTC

Site is back to normal. No need for funny business or resets.
No more Unsecure message when browsing.
Uploads/Downloads work as intended.
ID: 57957 · Rating: 0
GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist

Joined: 14 Mar 07
Posts: 1958
Credit: 629,356
RAC: 0
Level
Gly
Message 57961 - Posted: 29 Nov 2021, 16:25:02 UTC - in response to Message 57957.  

There was a problem with the certificate renewal. It is fine now.
ID: 57961 · Rating: 0
GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist

Joined: 14 Mar 07
Posts: 1958
Credit: 629,356
RAC: 0
Level
Gly
Message 57962 - Posted: 29 Nov 2021, 16:25:07 UTC - in response to Message 57957.  

There was a problem with the certificate renewal. It is fine now.
ID: 57962 · Rating: 0
marsinph

Joined: 11 Feb 18
Posts: 41
Credit: 579,891,424
RAC: 0
Level
Lys
Message 57970 - Posted: 29 Nov 2021, 19:52:59 UTC - in response to Message 57961.  

There was a problem with the certificate renewal. It is fine now.



Everyone has known this for three days!
It took three days to solve the problem!
Don't be surprised that more and more users are leaving your project.
Only those racing in Formula Boinc are still following.
Thank you, Richard Haselgrove, for your help.

ID: 57970 · Rating: 0
marsinph

Joined: 11 Feb 18
Posts: 41
Credit: 579,891,424
RAC: 0
Level
Lys
Message 57972 - Posted: 29 Nov 2021, 19:58:34 UTC - in response to Message 57955.  

Just try a normal update, or a retry on any stalled transfers, before going any further.

A full project reset shouldn't be needed, and all those extra transfers will just slow down the project recovery for everyone.



Thank you, Richard, for all your help. Why don't you join this team as a computer scientist?
You are everywhere, with very deep knowledge of BOINC.
I am writing now, after the site admin's post saying "it was a certificate problem".
All of us knew that! The admin's reaction just came very late!
Best regards from Belgium. (Sorry for my English, I try to do my best.)



ID: 57972 · Rating: 0
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 295,172
Level
Trp
Message 57974 - Posted: 29 Nov 2021, 21:16:51 UTC - in response to Message 57972.  

Thanks for the kind words. Trying to solve these little problems goes some way to keeping those little grey cells in working order.

I did - many years ago - try to put forward the concept of 'technical moderators' as a specialist position within BOINC: people with technical knowledge who could bridge the gap between the mass of volunteers and the project scientists or administrators. There's a need for people who can decipher [5-year-old voice on] Mummeeeee - it's not working! [5-year-old voice off] and turn it into a technical description of what needs to be tweaked.

The idea never took off (the project side couldn't see the need), but I've gone on trying to live the dream.
ID: 57974 · Rating: 0
Bill F

Joined: 21 Nov 16
Posts: 36
Credit: 164,429,114
RAC: 12,554
Level
Ile
Message 57977 - Posted: 30 Nov 2021, 3:42:40 UTC - in response to Message 57974.  
Last modified: 30 Nov 2021, 3:43:02 UTC

Richard... and because you try, there will be a small, quiet, wonderful place in heaven for you.

Bill F
ID: 57977 · Rating: 0
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 295,172
Level
Trp
Message 57979 - Posted: 30 Nov 2021, 14:02:27 UTC

Ta. Continuing on our theme of 'things the volunteers have noticed, and project admin might like to take a look at...'

We are now running ACEMD3 v2.19, deployed on 10 Nov 2021. The first tasks had a data error, but we've now been running ADRIA_BanditGPCR tasks successfully since Friday 26 November.

The apps come in two flavours, cuda 101 and cuda 1121. I'll let the owners of Ampere cards pursue their own private grief, but I'm worried about the rest of us.

The machines I run here all have GTX 1660 series cards - all modern and efficient, and fast enough to complete these tasks in under 24 hours. Five cards have returned four tasks each, and all twenty have validated.

All machines have tried both cuda101 and cuda1121, and on four out of five cuda1121 is clearly faster than cuda101. The fifth is a bit ambiguous.

That means that BOINC should be moving towards issuing cuda1121 preferentially. In fact, it should have reached that point by now - we must have completed well over 100 tasks globally since this version was launched.

But all four of my 'clear advantage' machines are currently running cuda101, and only the ambiguous one is trying cuda1121 again. That's the wrong way round.

Why? Looking at the details for each of our computers, there's a link for "Application details: Show". That brings up the history for that computer, running each application that it's tried.

The crucial lines here are 'Number of tasks completed' and 'Average processing rate'. Once 'Number of tasks completed' reaches 11, the server should compare the APRs and preferentially assign the fastest app for new work, when there's a choice.

But my hosts are showing zero tasks completed, despite the 'Consecutive valid tasks' count being filled in. If the project-global count of completed tasks (which we can't inspect directly) is also not filling in properly, that would explain the bias towards cuda101. But I can't explain why the stats aren't being recorded properly.
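
To make the expected behaviour concrete, here is a minimal Python sketch of the selection rule as it is described in this post. It is not the BOINC scheduler's actual code: the class `AppVersionStats`, the function `pick_app_version`, and the APR figures in the example are all invented for illustration. Only the threshold of 11 completed tasks comes from the text above.

```python
from dataclasses import dataclass
from typing import List, Optional

# Threshold quoted in the post: APRs are only compared once this many tasks are recorded.
MIN_COMPLETED = 11


@dataclass
class AppVersionStats:
    plan_class: str        # e.g. "cuda101" or "cuda1121"
    tasks_completed: int   # 'Number of tasks completed' on the Application details page
    apr: float             # 'Average processing rate' (illustrative units)


def pick_app_version(versions: List[AppVersionStats]) -> Optional[AppVersionStats]:
    """Return the fastest version with enough recorded tasks, or None if none qualifies."""
    eligible = [v for v in versions if v.tasks_completed >= MIN_COMPLETED]
    if not eligible:
        return None  # no trustworthy APR yet, so no speed-based preference
    return max(eligible, key=lambda v: v.apr)


if __name__ == "__main__":
    # Hypothetical numbers: both versions have enough history, cuda1121 is faster.
    recorded = [AppVersionStats("cuda101", 12, 400.0), AppVersionStats("cuda1121", 12, 450.0)]
    print(pick_app_version(recorded).plan_class)

    # The situation reported above: validated tasks, but the completed count stuck at zero.
    stuck = [AppVersionStats("cuda101", 0, 400.0), AppVersionStats("cuda1121", 0, 450.0)]
    print(pick_app_version(stuck))
```

With zero tasks recorded as completed, no version qualifies and the function returns `None`, which is consistent with a server that never gets as far as comparing APRs.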
ID: 57979 · Rating: 0
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 295,172
Level
Trp
Message 57986 - Posted: 1 Dec 2021, 9:57:21 UTC - in response to Message 57979.  

Well, I don't know how it happened, but after writing all that yesterday...

Today's rotation has brought me a clean sweep of cuda1121 tasks across all five machines. Coincidence, or a tweak to the server? No way of knowing externally, but it's good news both for the project (the science will be done more quickly) and for the volunteers.
ID: 57986 · Rating: 0
Michael H.W. Weber

Joined: 9 Feb 16
Posts: 78
Credit: 656,229,684
RAC: 0
Level
Lys
Message 58006 - Posted: 1 Dec 2021, 23:07:32 UTC - in response to Message 57962.  

There was a problem with the certificate renewal. It is fine now.

Nice.
How about the problem that the server does not accept result file sizes above approx. 500 MB and instead of crediting them appropriately throws these in the bin? Has this issue been solved as well?

Michael.
President of Rechenkraft.net - Germany's first and largest distributed computing organization.
ID: 58006 · Rating: 0
Retvari Zoltan

Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 1
Level
Trp
Message 58007 - Posted: 1 Dec 2021, 23:15:17 UTC - in response to Message 58006.  

How about the problem that the server does not accept result file sizes above approx. 500 MB and instead of crediting them appropriately throws these in the bin? Has this issue been solved as well?
Kind of.
The present workunits are much shorter, thus their result file is much smaller (~270 MB) as well.
ID: 58007 · Rating: 0
Michael H.W. Weber

Joined: 9 Feb 16
Posts: 78
Credit: 656,229,684
RAC: 0
Level
Lys
Message 58013 - Posted: 2 Dec 2021, 1:47:14 UTC - in response to Message 58007.  

How about the problem that the server does not accept result file sizes above approx. 500 MB and instead of crediting them appropriately throws these in the bin? Has this issue been solved as well?
Kind of.
The present workunits are much shorter, thus their result file is much smaller (~270 MB) as well.

Well, that's not a solid solution to the problem.

Michael.
President of Rechenkraft.net - Germany's first and largest distributed computing organization.
ID: 58013 · Rating: 0

©2025 Universitat Pompeu Fabra