Server only allows one connection at a time from an IP? 30s cooldown is too short.

Ian&Steve C. Joined: 21 Feb 20 Posts: 1117 Credit: 40,876,970,595 RAC: 0
Message 54935 - Posted: 24 May 2020, 15:26:31 UTC

I have a gigabit up/down fiber connection at both locations that run my computers (separate external IPs) and experience the same problem at both locations if one system is running the default 30s cooldown.

Changing the default cooldown on the project server side to something longer like 5-10mins will largely solve this problem for everyone without the need for each user to run a custom client to work around the problem.

Toni, please implement this on the project servers.

Richard Haselgrove Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 0
Message 54936 - Posted: 24 May 2020, 16:12:02 UTC - in response to Message 54935.

Ian, what are your current statistics?

My two fastest machines are the two Linux boxes, each with 2x GTX 1660 Super or GTX 1660 Ti. I've got 321 valid tasks showing at the moment - since the start of the current run, probably. The runtimes are

Max  8,994.80 sec  149 minutes
Min  1,180.58 sec   19 minutes
Avg  3,114.27 sec   51 minutes

I'm guessing your fastest will be better than 19 minutes - maybe we ought to ask Toni to start with a 5 minute delay, and see how we go, before upping it to 10 minutes if we have to?

I'm also worrying about what happens if we get more bad batches - these machines spit out the error tasks in just 3 seconds. Blow two of those in succession, and I'm left waiting for the next scheduler contact.

Ian&Steve C. Joined: 21 Feb 20 Posts: 1117 Credit: 40,876,970,595 RAC: 0
Message 54939 - Posted: 24 May 2020, 16:52:39 UTC - in response to Message 54936.

The shortest I've seen on my 2080ti (PL 225W) is about 800s (13.3mins)
The longest I've seen on my 2080ti (PL 225W) is about 3200s (53.3mins)
The shortest I've seen on my 2070 (PL 150W) is about 1200s (20mins)
The longest I've seen on my 2070 (PL 150W) is about 6000s (1.67hrs)

They could also allow more than 2 WU per GPU, and increase the max in-progress to reflect that.

But really, things like bad batches shouldn't be considered when figuring the cooldown, IMO. Treat that as an edge case. Plan for things to work normally most of the time.
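
To put the proposed cooldowns next to the runtimes quoted so far, here is a rough Python sketch. It only does arithmetic on the figures from the posts above. The names `shortest_runtime_s`, `cooldowns_s`, `to_minutes` and `cooldown_share` are made up for this illustration, and comparing the cooldown against the shortest task runtime is an assumption about what matters here, not something taken from the project's server configuration.

```python
# Shortest runtimes quoted in the thread, in seconds.
shortest_runtime_s = {
    "2080ti (PL 225W)": 800.0,
    "2070 (PL 150W)": 1200.0,
    "Richard's valid tasks (min)": 1180.58,
}

# Current default cooldown and the two proposed values, in seconds.
cooldowns_s = {"30s (current)": 30, "5 min": 300, "10 min": 600}


def to_minutes(seconds):
    """Convert seconds to minutes, rounded to one decimal place."""
    return round(seconds / 60, 1)


def cooldown_share(cooldown_s, runtime_s):
    """Return the cooldown as a fraction of one task's runtime."""
    return cooldown_s / runtime_s


if __name__ == "__main__":
    # Print every cooldown against every quoted shortest runtime.
    for c_label, cooldown in cooldowns_s.items():
        for r_label, runtime in shortest_runtime_s.items():
            share = cooldown_share(cooldown, runtime)
            print(f"{c_label:>14} vs {r_label:<28} "
                  f"({to_minutes(runtime)} min): {share:.0%} of runtime")
```

On these figures, even the 10 minute cooldown (600s) is shorter than the fastest task quoted in the thread (about 800s). Hosts with several GPUs, or more than one task per GPU, finish tasks more often than a single runtime suggests, so this is only a first approximation.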

RFGuy_KCCO Joined: 13 Feb 14 Posts: 6 Credit: 1,068,161,100 RAC: 0
Message 54968 - Posted: 26 May 2020, 16:30:36 UTC

Please fix this issue, as it is clearly causing problems with receiving and sending work for many users. I am setting "No New Work" on this project until the issue is corrected.

Erich56 Joined: 1 Jan 15 Posts: 1171 Credit: 12,662,148,501 RAC: 3,588
Message 54969 - Posted: 26 May 2020, 18:05:25 UTC

Until now, I had no connection problems. Since this afternoon, however, there have been many of them. This should be fixed ASAP.

Gunnar Hjern Joined: 22 May 20 Posts: 2 Credit: 22,042,067 RAC: 0
Message 54970 - Posted: 26 May 2020, 19:10:59 UTC - in response to Message 54969.
Last modified: 26 May 2020, 20:10:45 UTC

Hi!

I just experienced the same problem: I have two old HP Z220 with GTX-960 and GTX-750Ti, and both were standing still with files that wouldn't download, and no new tasks, as downloads were pending on the current task.

It didn't help to abort the stalled downloads, or to abort the whole task - it was STILL complaining about those downloads!! :-(

In the end there was nothing to do but hit the "reset project" button on both machines, but that resulted in several hundred MB of downloading for each one! :-O

Now both machines are up and running again - let's see how long it'll last. Hope the admins will sort this problem out as soon as possible, before the server lines get all bogged down.

Happy crunching!!!
//Gunnar

Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0
Message 54972 - Posted: 26 May 2020, 22:43:03 UTC - in response to Message 54970.

"It didn't help to abort the stalled downloads, or to abort the whole task - it was STILL complaining about those downloads!!"

That's a different problem. These tasks were created before the http->https transition, so they still want to download through http, but that won't succeed. You have to abort the downloads, then restart the BOINC manager, or manually edit the client_state.xml file (see the "Warning: bad tasks re-appearing in the download queue" thread for details).

Gunnar Hjern Joined: 22 May 20 Posts: 2 Credit: 22,042,067 RAC: 0
Message 54973 - Posted: 26 May 2020, 23:43:34 UTC - in response to Message 54972.

Thanks for pointing that out! I didn't know about that problem, as I'm pretty new to this project, and when the same thing happened on both my computers simultaneously, I thought it was related to this problem. :-)

Hope those faulty tasks will be cleaned out of the database ASAP!! They are effectively locking up my machines and forcing me to reset the project manually.

Happy crunching!!!