
Message boards : Server and website : Website unreachable

Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Message 49955 - Posted: 19 Jul 2018 | 13:33:09 UTC

The connection has timed out

The server at www.gpugrid.net is taking too long to respond.


I am getting this more and more often, usually every couple of minutes. And trying to set up machines is getting to be impossible. It is hard to even post this.

I suggest that GPUGrid look into it while they still have some users left.

AuxRx
Joined: 3 Jul 18
Posts: 22
Credit: 2,758,801
RAC: 0
Message 49956 - Posted: 19 Jul 2018 | 18:23:24 UTC - in response to Message 49955.

Not an official response, but from another volunteer: I haven't had any issues. Maybe it's local, with your ISP, or a larger routing issue?

Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Message 49957 - Posted: 19 Jul 2018 | 20:11:15 UTC - in response to Message 49956.
Last modified: 19 Jul 2018 | 20:32:07 UTC

Not an official response, but from another volunteer: I haven't had any issues. Maybe it's local, with your ISP, or a larger routing issue?

Thanks. I was wondering why there were not more complaints. I don't have problems with other projects, but there must be something about this route. It has been bad for a long time, and getting worse.

(Possibly a change in DNS servers will help.)

Keith Myers
Joined: 13 Dec 17
Posts: 1284
Credit: 4,920,631,959
RAC: 6,336,120
Message 49959 - Posted: 19 Jul 2018 | 21:42:24 UTC - in response to Message 49957.

I've been getting this problem for a while now myself. The 1st or 2nd connection attempt to GPUGrid.net times out; then the next attempt gets through. If I make a task download request on one system, the next system to make a request invariably gets a very long request cycle or times out.

Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Message 49960 - Posted: 19 Jul 2018 | 22:03:48 UTC - in response to Message 49959.

That is it. But my own ISP's DNS servers have been known to be unreliable in the past, so I have now set the OpenDNS servers in my router:

OpenDNS
Primary: 208.67.222.222
Secondary: 208.67.220.220

Thus far, they are connecting without a problem, which is an improvement.
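To compare resolvers before and after a change like this, a minimal Python sketch (assuming Python 3) can time a lookup and list the addresses returned. It goes through whatever resolver the operating system is configured to use, so it reflects the router/OS setting rather than querying OpenDNS directly; the function name `resolve` is illustrative.

```python
from __future__ import annotations

import socket
import time


def resolve(host: str, port: int = 80) -> tuple[list[str], float]:
    """Resolve `host` via the OS resolver.

    Returns (sorted unique IPv4 addresses, seconds taken). Raises
    socket.gaierror if the name cannot be resolved, which is the
    "couldn't resolve host name" class of failure.
    """
    start = time.monotonic()
    infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    elapsed = time.monotonic() - start
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    addresses = sorted({info[4][0] for info in infos})
    return addresses, elapsed
```

Calling `resolve("www.gpugrid.net")` should list the server address other posters in this thread report (84.89.134.145). Repeated calls may be answered from a local cache and therefore be faster than the first.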

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1576
Credit: 5,600,061,851
RAC: 8,791,184
Message 49961 - Posted: 19 Jul 2018 | 22:35:35 UTC

My experience is like Keith's. I have up to five computers attached to GPUGrid, all on the same home LAN and connected via a single router - so all sharing the same public IP address.

Any one computer can contact the project at full speed. But if two computers try to connect in quick succession, the second can't establish communications. This occurs at the operating system/web server level, not at the BOINC level. After a pause of a few minutes with no activity, the next attempt goes through at normal speed.
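This pattern can be checked outside BOINC and the browser. A minimal Python sketch, assuming Python 3: it opens several raw TCP connections to the web server in quick succession and records whether each handshake completes and how long it took. Running it on two machines behind the same router, one straight after the other, would show whether the second machine's handshake stalls. The function name and default values are illustrative, not part of BOINC.

```python
from __future__ import annotations

import socket
import time


def probe(host: str, port: int = 80, attempts: int = 3,
          gap: float = 1.0, timeout: float = 30.0) -> list[tuple[bool, float]]:
    """Open and close `attempts` TCP connections, `gap` seconds apart.

    Returns one (connected, seconds_taken) pair per attempt. Only the TCP
    handshake is tested; no HTTP request is sent.
    """
    results = []
    for i in range(attempts):
        if i:
            time.sleep(gap)  # pause between attempts, not before the first
        start = time.monotonic()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                ok = True
        except OSError:  # covers socket.timeout as well as refused/reset connections
            ok = False
        results.append((ok, time.monotonic() - start))
    return results
```

For example, `probe("www.gpugrid.net", 80, attempts=2, gap=5)` on one machine, followed immediately by the same call on a second machine.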

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 7,520
Message 49962 - Posted: 20 Jul 2018 | 0:09:22 UTC - in response to Message 49960.

You can try Google's public DNS servers too:
Primary: 8.8.8.8
Secondary: 8.8.4.4

Or Cloudflare's:
Primary: 1.1.1.1
Secondary: 1.0.0.1

- OR -
Windows users can put GPUGrid's IP address into the OS's hosts file; this way the OS doesn't have to ask the DNS servers for GPUGrid's IP address. (However, you have to change it manually if the IP address changes.)
Press Windows key + R
Type, or copy and paste the following:
notepad c:\windows\system32\drivers\etc\hosts

insert the following line at the end of the file:
84.89.134.145 www.gpugrid.net

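Save the file and exit Notepad. (Saving to this location requires administrator rights, so you may need to start Notepad with "Run as administrator".)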

Probably there's a similar workaround for Linux (the equivalent file there is /etc/hosts).
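On either OS the edit can be scripted. A minimal Python sketch, assuming the standard hosts-file locations (C:\Windows\System32\drivers\etc\hosts on Windows, /etc/hosts on Linux) and that it is run with administrator/root rights; `add_hosts_entry` is a hypothetical helper, not part of BOINC. It skips the write if the name is already mapped, so re-running it is harmless, but an entry pinned this way still has to be updated by hand if the server's IP address changes.

```python
from __future__ import annotations

import sys
from pathlib import Path

# Standard locations; writing to either needs administrator/root rights.
DEFAULT_HOSTS = (
    Path(r"C:\Windows\System32\drivers\etc\hosts")
    if sys.platform.startswith("win")
    else Path("/etc/hosts")
)


def add_hosts_entry(ip: str, hostname: str, hosts_path: Path = DEFAULT_HOSTS) -> bool:
    """Append an 'ip<TAB>hostname' line unless hostname is already mapped.

    Returns True if a line was written, False if an entry already existed
    (in which case an outdated IP must be fixed by hand).
    """
    text = hosts_path.read_text() if hosts_path.exists() else ""
    wanted = hostname.lower()  # hosts-file names are case-insensitive
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        if wanted in (name.lower() for name in fields[1:]):
            return False
    # Make sure the new entry starts on its own line.
    prefix = "" if not text or text.endswith("\n") else "\n"
    with hosts_path.open("a") as f:
        f.write(f"{prefix}{ip}\t{hostname}\n")
    return True
```

Usage: `add_hosts_entry("84.89.134.145", "www.gpugrid.net")`, using the address given above.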

BTW from the description of the error I suspect that this is not a DNS issue, rather session limit / bandwidth problem.

mmonnin
Joined: 2 Jul 16
Posts: 332
Credit: 3,772,896,065
RAC: 4,765,302
Message 49963 - Posted: 20 Jul 2018 | 2:00:27 UTC

I get this too in the US. Using a proxy server located in the EU immediately solves the timeouts. Turn off the VPN and the timeouts are back. It's absolutely location-based. Watching upload speeds go down and down, then spike back up, then drop again in a continuous loop says there are network issues somewhere crossing the pond.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1576
Credit: 5,600,061,851
RAC: 8,791,184
Message 49964 - Posted: 20 Jul 2018 | 7:03:08 UTC - in response to Message 49962.

If it was a DNS problem, you would see entries like

27-May-2018 02:10:07 [Einstein@Home] Scheduler request failed: Couldn't resolve host name

in the BOINC Event log. That's the only failure I've had since April, and it wasn't concerning GPUGrid.
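This rule of thumb can be applied mechanically to a saved Event Log. A minimal Python sketch; the two message strings are copied from BOINC log lines quoted in this thread, while the function names are hypothetical. A `dns` hit points at name resolution; a `connect` hit means the name resolved but the server could not be reached.

```python
from __future__ import annotations

from collections import Counter

# Error texts as they appear in BOINC Event Log lines quoted in this thread.
FAILURE_KINDS = {
    "Couldn't resolve host name": "dns",      # name lookup failed: a DNS problem
    "Couldn't connect to server": "connect",  # name resolved, but the connection failed
}


def classify(line: str) -> str | None:
    """Return 'dns', 'connect', or None for a single Event Log line."""
    for message, kind in FAILURE_KINDS.items():
        if message in line:
            return kind
    return None


def summarize(lines: list[str]) -> Counter:
    """Count failure kinds across a saved Event Log."""
    return Counter(kind for kind in map(classify, lines) if kind is not None)
```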

Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Message 49965 - Posted: 20 Jul 2018 | 8:24:32 UTC - in response to Message 49964.
Last modified: 20 Jul 2018 | 9:20:25 UTC

If it was a DNS problem, you would see entries like
27-May-2018 02:10:07 [Einstein@Home] Scheduler request failed: Couldn't resolve host name in the BOINC Event log.

I haven't seen that, but I wasn't looking for it when I had the failures, and have rebooted since, so that log is gone (maybe saved?). But since switching to OpenDNS, website access has been much faster and more reliable. But I still got one failure, so I created a hosts file entry as Zoltan suggested. It seems to have eliminated the problem entirely. Previously, I could not connect twice in succession, but had to wait a couple of minutes between attempts.

EDIT: The problems that I have noticed are only on my Windows 7 64-bit machine, where I access the website with Firefox. I do the crunching (both CPU and GPU) on dedicated Ubuntu machines that I access over the LAN, and haven't noticed any hung transfers there in BoincTasks. Maybe any problems there are just hidden, I don't know. But could it be a Windows issue?

EDIT2: The BoincTasks logs for my Ubuntu machines are still available after several days, since I don't reboot them often, and I don't see any DNS or other problems there, for whatever that is worth. In fact, it may just be a Firefox timing issue of some sort.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1576
Credit: 5,600,061,851
RAC: 8,791,184
Message 49966 - Posted: 20 Jul 2018 | 9:25:08 UTC - in response to Message 49965.

Next time it happens, I'll do some digging in the event logs. That probably won't be until we get a Windows GPU workflow running again - I've set 'no new work' except for one test probe for the time being.

I use one machine to view the website, but five to contact the BOINC scheduler for the project. I most commonly have problems contacting the scheduler, but it can affect the website and upload/download servers too.

mmonnin
Joined: 2 Jul 16
Posts: 332
Credit: 3,772,896,065
RAC: 4,765,302
Message 49967 - Posted: 20 Jul 2018 | 10:43:26 UTC - in response to Message 49965.

It's not Firefox, as the website hangs for me with Chrome too. And the variable upload speeds I see are on Ubuntu.

Keith Myers
Joined: 13 Dec 17
Posts: 1284
Credit: 4,920,631,959
RAC: 6,336,120
Message 49971 - Posted: 20 Jul 2018 | 21:42:48 UTC

My symptoms are exactly as Richard described. I have mostly Linux machines with one Windows 10 machine. I have the problem on all machines so not OS specific.

I have never seen any BOINC network issues logged, even with http_debug set; BOINC never indicates a problem. The website is simply inaccessible to any machine if another machine has contacted it within roughly the last minute. The first machine gets a normal connection with fast response and downloads. For any subsequent connection from another machine, I need to wait five minutes before attempting to connect in order to get a good connection.

I have multiple DNS resources available to any machine on any connection. I think it is a bandwidth issue from California to GPUGrid.net. I can't even open another GPUGrid.net tab on this same computer, where I am typing this reply, and have that tab connect.

Frustrating.

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 7,520
Message 49972 - Posted: 20 Jul 2018 | 22:56:26 UTC - in response to Message 49971.

It could be one of the negative side effects of discarding net neutrality.

Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Message 49975 - Posted: 21 Jul 2018 | 1:27:05 UTC - in response to Message 49972.
Last modified: 21 Jul 2018 | 1:28:10 UTC

It could be one of the negative side effects of discarding net neutrality.

Some ISP would have to be monitoring you pretty carefully to distinguish between the first and second attempt. And I am not sure what they would gain. I have 50 Mbps down/10 Mbps up for a flat rate. I don't think they care what it is. At least they haven't tried to charge me more for the second attempt to GPUGrid. Chances are, the problem is more at the project end, but I don't know if you are seeing it in Europe?

Erich56
Joined: 1 Jan 15
Posts: 1090
Credit: 6,603,906,926
RAC: 21,893,126
Message 49977 - Posted: 21 Jul 2018 | 7:10:53 UTC - in response to Message 49975.
Last modified: 21 Jul 2018 | 7:11:22 UTC

... but I don't know if you are seeing it in Europe?

I now have tried to open various GPUGRID pages one after the other, in a sequence of a few seconds - no problem here (in Austria) ...

Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Message 49979 - Posted: 21 Jul 2018 | 8:15:34 UTC - in response to Message 49977.

no problem here (in Austria) ...

I wonder if there is more than one problem? Since fixing the DNS issue, I can open multiple web pages here without a problem that I see. There may be something slow in getting across the Atlantic though that does not appear in Europe.

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 7,520
Message 49980 - Posted: 21 Jul 2018 | 8:27:18 UTC - in response to Message 49975.

It could be one of the negative side effects of discarding net neutrality.

Some ISP would have to be monitoring you pretty carefully to distinguish between the first and second attempt. And I am not sure what they would gain. I have 50 Mbps down/10 Mbps up for a flat rate.
I suggest using http://speedtest.net and choosing a server in Barcelona to test your connection speed (at different times of day).

I don't think they care what it is. At least they haven't tried to charge me more for the second attempt to GPUGrid.
That's what a flat rate is about. But if they prioritize traffic from Netflix, Amazon, or whatever content provider pays them, traffic from other parties will suffer. International/intercontinental data traffic costs your ISP a lot, because the cables on the ocean floor were expensive to build and have limited transfer capacity (compared to the number of computers connected through them). So your ISP does care where in the world its customers connect to (this is called traffic shaping, and QoS, Quality of Service).

Chances are, the problem is more at the project end, but I don't know if you are seeing it in Europe?
Rarely, but that's because we're closer (geographically, and also IT-wise: there is less network equipment between GPUGrid's server and my computers than between it and computers on other continents).

Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Message 49981 - Posted: 21 Jul 2018 | 9:03:35 UTC - in response to Message 49980.
Last modified: 21 Jul 2018 | 9:06:26 UTC

Speedtest from Barcelona is no problem, about the same as the U.S.
https://postimg.cc/gallery/1jo3afioi/

I usually have no problem with Europe, since I am not that far (probably less than 200 km) from where the cable lands. I download routinely from CERN at over 1 Mbps for example.

Net neutrality really is not it. That is for the large video services, and I have never seen it even there yet. It is more a theoretical possibility. They would have to charge me more to make any money from it, and they have never tried to do that yet. There is plenty of bandwidth in the U.S. Cutting down on something like GPUGrid would cost them more to monitor than it is worth. About the only exception I can think of is wireless, which is monitored more anyway.

But my problem is solved. It appears that is not the case for everyone, which is why it might be more than just DNS, but I am not enough of a network expert to suggest beyond that. (I am resisting the temptation to say that it is a GDPR check, but probably won't be able to for long.)

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 7,520
Message 49982 - Posted: 21 Jul 2018 | 9:44:34 UTC - in response to Message 49981.

Speedtest from Barcelona is no problem, about the same as the U.S.
That is good news, though it makes the original issue really strange.
However, it could be a misconfigured router, which we can't identify from here.

Net neutrality really is not it.
OK, that was only one of many possibilities. It could be any heavy traffic (on any router in between) that hinders others (for example, torrent traffic increases drastically when a new episode of a popular series is released on torrent sites).

I am not enough of a network expert to suggest beyond that.
Neither am I, and even if one of us were, it couldn't be figured out without analyzing router traffic logs, which are available only to ISP staff.

(I am resisting the temptation to say that it is a GDPR check, but probably won't be able to for long.)
GDPR (besides being a pain in the arse for European companies too) is about handling personal data, not about handling (shaping) data traffic. If the issue were caused by GDPR, it would affect your connection to CERN too.

Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Message 49983 - Posted: 21 Jul 2018 | 12:13:32 UTC - in response to Message 49982.
Last modified: 21 Jul 2018 | 12:30:51 UTC

GDPR (besides being a pain in the arse for European companies too) is about handling personal data, not about handling (shaping) data traffic.

Humor does not always make it across the pond.

PS - I did not have a speed problem, but a connectivity issue of some sort. So there might be another factor at work for some people.

Also, it appears that some people monitor their upload/downloads much better than I do, and may be catching problems that I don't see. I only saw the obvious problem of not reaching the website, but that is now fixed.

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 7,520
Message 50132 - Posted: 28 Jul 2018 | 23:39:09 UTC
Last modified: 28 Jul 2018 | 23:40:14 UTC

I have been receiving "Website unreachable" messages lately.
At first I thought that the increased traffic caused by the recent problems with the Windows app was behind this. But now that the Windows app is working (almost) normally again, I still receive "Website unreachable" messages. I started to investigate by opening an elevated command prompt and pinging www.gpugrid.net continuously (i.e. ping www.gpugrid.net -t).
I've made two observations:
1. while the ping runs in the background, I don't receive "Website unreachable" messages.
2. The ping statistics are the following:

Ping statistics for 84.89.134.145:
    Packets: Sent = 1273, Received = 1255, Lost = 18 (1% loss),
Approximate round trip times in milli-seconds:
    Minimum = 82ms, Maximum = 136ms, Average = 83ms
I have no idea what could cause this, but those who have such access problems should try to run ping in the background and test the accessibility of the GPUGrid website.
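Windows `ping` reports packet loss as a whole percentage, which hides the difference between, say, 1.0% and 1.9%. A minimal Python sketch that pulls the numbers out of a statistics summary like the one above and recomputes the loss exactly; it assumes the English-language Windows output format (Linux `ping` prints a different summary and will not match), and the function name is illustrative. For the run above it gives 18 / 1273 ≈ 1.41%.

```python
from __future__ import annotations

import re

# Matches the summary block printed by Windows `ping` (English locale).
PING_STATS = re.compile(
    r"Sent = (?P<sent>\d+), Received = (?P<received>\d+), Lost = (?P<lost>\d+)"
    r".*?Minimum = (?P<min_ms>\d+)ms, Maximum = (?P<max_ms>\d+)ms, Average = (?P<avg_ms>\d+)ms",
    re.DOTALL,  # the summary may be split across lines or flattened onto one
)


def parse_ping_stats(text: str) -> dict[str, float]:
    """Extract packet counts and round-trip times, plus the exact loss percentage."""
    m = PING_STATS.search(text)
    if m is None:
        raise ValueError("no Windows ping statistics found")
    stats: dict[str, float] = {k: int(v) for k, v in m.groupdict().items()}
    # Windows shows the loss as a whole percent; recompute it precisely.
    stats["loss_pct"] = 100.0 * stats["lost"] / stats["sent"]
    return stats
```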

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Message 50133 - Posted: 29 Jul 2018 | 7:39:34 UTC - in response to Message 50132.

Thanks for the data. My speculation is that there is already some preferential routing for well-known high-traffic websites, while gpugrid is not a "well known" one and is throttled.

Fixing global routing is definitely outside of our possibilities. :)

The fact that changing DNS improves the situation is comforting though.
Keep in mind that Firefox and other browsers do their own caching of DNS entries.

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 7,520
Message 50134 - Posted: 29 Jul 2018 | 8:23:19 UTC

I let ping run all night; the statistics show the same:

Ping statistics for 84.89.134.145:
    Packets: Sent = 28889, Received = 28544, Lost = 345 (1% loss),
Approximate round trip times in milli-seconds:
    Minimum = 82ms, Maximum = 174ms, Average = 85ms

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1576
Credit: 5,600,061,851
RAC: 8,791,184
Message 50135 - Posted: 29 Jul 2018 | 9:48:25 UTC

While I don't discount the possibility of DNS problems - it's worth checking - I think there are other problems to explore.

I've just got this from my Chrome browser:

This site can’t be reached
www.gpugrid.net took too long to respond.

That was opening a page shortly (less than a minute) after a different machine on my network had reported a completed task. There is no evidence of a timing problem on the reporting machine:

29/07/2018 10:37:11 | GPUGRID | Finished upload of e24s22_e15s67p0f28-PABLO_2IDP_P01106_2_ASNP21P_IDP-0-1-RND3729_0_1
29/07/2018 10:37:11 | GPUGRID | Sending scheduler request: To report completed tasks.
29/07/2018 10:37:14 | GPUGRID | Scheduler request completed
29/07/2018 10:37:14 | GPUGRID | [sched_op] handle_scheduler_reply(): got ack for task e24s22_e15s67p0f28-PABLO_2IDP_P01106_2_ASNP21P_IDP-0-1-RND3729_0

I don't think the browser error message indicates a DNS problem in this case, but I do think it's related to the other machine reporting.

mmonnin
Joined: 2 Jul 16
Posts: 332
Credit: 3,772,896,065
RAC: 4,765,302
Message 50137 - Posted: 29 Jul 2018 | 16:23:50 UTC

Still location-based, and it's a very old issue. EU is fine, but across the pond we get timeouts.

Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Message 50138 - Posted: 29 Jul 2018 | 16:31:39 UTC - in response to Message 50137.

Still location-based, and it's a very old issue. EU is fine, but across the pond we get timeouts.

I don't have the short timeouts that I was getting originally, the DNS change fixed that. But I occasionally get long timeouts (after 30 seconds), so something is happening somewhere.

Keith Myers
Joined: 13 Dec 17
Posts: 1284
Credit: 4,920,631,959
RAC: 6,336,120
Message 50143 - Posted: 30 Jul 2018 | 0:33:30 UTC - in response to Message 50135.

This is the same kind of behavior that I observe. It's almost as if the GPUGrid.net database locks the user ID for a short period of time after the first machine reports or accesses that host's database. The next host that tries to contact the site, either to report or to access its stats, gets the timeout.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1576
Credit: 5,600,061,851
RAC: 8,791,184
Message 50145 - Posted: 30 Jul 2018 | 7:08:58 UTC - in response to Message 50143.

This is the same kind of behavior that I observe. It's almost as if the GPUGrid.net database locks the user ID for a short period of time after the first machine reports or accesses that host's database. The next host that tries to contact the site, either to report or to access its stats, gets the timeout.

Something like that, except it won't be the database: the problem is 'failure to connect', and the UserID can't be exchanged until there's a connection to communicate over.

It seems to be happier today:

30/07/2018 07:51:19 | GPUGRID | [sched_op] Starting scheduler request
30/07/2018 07:51:20 | GPUGRID | [http] [ID#1] Info: Connected to www.ps3grid.net (84.89.134.145) port 80 (#11712)
30/07/2018 07:51:20 | GPUGRID | [http] [ID#1] Received header from server: Server: Apache/2.4.6 (CentOS) OpenSSL/1.0.1e-fips mod_auth_gssapi/1.3.1 mod_auth_kerb/5.4 mod_fcgid/2.3.9 PHP/5.4.16 mod_wsgi/3.4 Python/2.7.5

I'll keep my eyes open as we settle back to normal running, but Apache would be blocking IP addresses, if anything.
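A block at that level would key on the source IP address, and all machines behind one home router share a single public IP, which would explain why they interfere with each other. The Python sketch below is a purely hypothetical model of such a rule (a sliding window of accepted connections per source address); nothing in this thread confirms the GPUGrid server is configured this way, it does not account for the reported difference between Europe and North America, and the limit of one connection per 120 seconds is an assumption chosen to resemble the "wait a couple of minutes" reports.

```python
from __future__ import annotations

from collections import defaultdict, deque


class PerIpLimiter:
    """Hypothetical model: allow at most `max_conns` new connections per source IP
    within any `window`-second span; extra attempts are silently dropped."""

    def __init__(self, max_conns: int, window: float):
        self.max_conns = max_conns
        self.window = window
        self._seen: defaultdict[str, deque] = defaultdict(deque)  # ip -> accepted times

    def allow(self, ip: str, now: float) -> bool:
        times = self._seen[ip]
        while times and now - times[0] >= self.window:
            times.popleft()  # forget connections older than the window
        if len(times) >= self.max_conns:
            return False  # a dropped SYN shows up client-side as a connect timeout
        times.append(now)
        return True
```

With `PerIpLimiter(1, 120.0)`, a second machine behind the same router connecting 20 seconds after the first would be dropped, while a machine on a different public IP would get through.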

Keith Myers
Joined: 13 Dec 17
Posts: 1284
Credit: 4,920,631,959
RAC: 6,336,120
Message 50161 - Posted: 30 Jul 2018 | 17:44:24 UTC - in response to Message 50145.

Thanks for the comms protocol explanation, Richard. I have to agree. The site is much more amenable today. I just hit the servers for work round-robin on three machines within 20 seconds of each other, and they all got work immediately with no problems contacting the server. The fourth machine also connected, but the RTS (ready-to-send) buffer had run dry by that time.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1576
Credit: 5,600,061,851
RAC: 8,791,184
Message 50185 - Posted: 1 Aug 2018 | 7:11:44 UTC

OK, I finally got a BOINC log for the failed connections. As usual, I updated two machines on my network in quick succession, and this is the log from the second.

01/08/2018 07:50:54 | GPUGRID | update requested by user
01/08/2018 07:50:54 | | [http] HTTP_OP::init_get(): http://www.gpugrid.net/notices.php?userid=30277&auth=30277_35c13b5a51da7043408976de34dc6a07
01/08/2018 07:50:54 | | [http] HTTP_OP::libcurl_exec(): ca-bundle set
01/08/2018 07:50:55 | | [http] [ID#0] Info: Connection 13303 seems to be dead!
01/08/2018 07:50:55 | | [http] [ID#0] Info: Closing connection 13303
01/08/2018 07:50:55 | | [http] [ID#0] Info: Connection 13302 seems to be dead!
01/08/2018 07:50:55 | | [http] [ID#0] Info: Closing connection 13302
01/08/2018 07:50:55 | | [http] [ID#0] Info: Trying 84.89.134.145...
01/08/2018 07:50:59 | GPUGRID | sched RPC pending: Requested by user
01/08/2018 07:50:59 | GPUGRID | [sched_op] Starting scheduler request
01/08/2018 07:50:59 | GPUGRID | Sending scheduler request: Requested by user.
01/08/2018 07:50:59 | GPUGRID | Requesting new tasks for NVIDIA GPU and Intel GPU
01/08/2018 07:50:59 | GPUGRID | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
01/08/2018 07:50:59 | GPUGRID | [sched_op] NVIDIA GPU work request: 39307.93 seconds; 0.00 devices
01/08/2018 07:50:59 | GPUGRID | [sched_op] Intel GPU work request: 88550.78 seconds; 1.00 devices
01/08/2018 07:50:59 | GPUGRID | [http] HTTP_OP::init_post(): http://www.ps3grid.net/PS3GRID_cgi/cgi
01/08/2018 07:50:59 | GPUGRID | [http] HTTP_OP::libcurl_exec(): ca-bundle set
01/08/2018 07:50:59 | GPUGRID | [http] [ID#1] Info: Trying 84.89.134.145...
01/08/2018 07:51:16 | | [http] [ID#0] Info: connect to 84.89.134.145 port 80 failed: Timed out
01/08/2018 07:51:16 | | [http] [ID#0] Info: Failed to connect to www.gpugrid.net port 80: Timed out
01/08/2018 07:51:16 | | [http] [ID#0] Info: Closing connection 13304
01/08/2018 07:51:16 | | [http] HTTP error: Couldn't connect to server
01/08/2018 07:51:20 | GPUGRID | [http] [ID#1] Info: connect to 84.89.134.145 port 80 failed: Timed out
01/08/2018 07:51:20 | GPUGRID | [http] [ID#1] Info: Failed to connect to www.ps3grid.net port 80: Timed out
01/08/2018 07:51:20 | GPUGRID | [http] [ID#1] Info: Closing connection 13305
01/08/2018 07:51:20 | GPUGRID | [http] HTTP error: Couldn't connect to server
01/08/2018 07:51:21 | GPUGRID | Scheduler request failed: Couldn't connect to server
01/08/2018 07:51:21 | GPUGRID | Sending scheduler request: Requested by user.
01/08/2018 07:51:21 | GPUGRID | Requesting new tasks for NVIDIA GPU and Intel GPU
01/08/2018 07:51:21 | GPUGRID | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
01/08/2018 07:51:21 | GPUGRID | [sched_op] NVIDIA GPU work request: 39307.93 seconds; 0.00 devices
01/08/2018 07:51:21 | GPUGRID | [sched_op] Intel GPU work request: 88550.78 seconds; 1.00 devices
01/08/2018 07:51:21 | GPUGRID | [http] HTTP_OP::init_post(): https://www.gpugrid.net/PS3GRID_cgi/cgi
01/08/2018 07:51:21 | GPUGRID | [http] HTTP_OP::libcurl_exec(): ca-bundle set
01/08/2018 07:51:21 | GPUGRID | [http] [ID#1] Info: Trying 84.89.134.145...
01/08/2018 07:51:43 | GPUGRID | [http] [ID#1] Info: connect to 84.89.134.145 port 443 failed: Timed out
01/08/2018 07:51:43 | GPUGRID | [http] [ID#1] Info: Failed to connect to www.gpugrid.net port 443: Timed out
01/08/2018 07:51:43 | GPUGRID | [http] [ID#1] Info: Closing connection 13306
01/08/2018 07:51:43 | GPUGRID | [http] HTTP error: Couldn't connect to server
01/08/2018 07:51:43 | GPUGRID | Scheduler request failed: Couldn't connect to server
01/08/2018 07:51:43 | GPUGRID | [sched_op] Deferring communication for 00:01:49
01/08/2018 07:51:43 | GPUGRID | [sched_op] Reason: Scheduler request failed

It's not clear to me (I'll investigate later) why it tried to contact the project on both port 80 (http) and port 443 (https). And having ps3grid.net still in there can't be helping either. I'm out all day, but I'll have a delve this evening.

Again as usual, the automatic retry a couple of minutes later got through without problems:

01/08/2018 07:53:37 | GPUGRID | Requesting new tasks for NVIDIA GPU and Intel GPU
01/08/2018 07:53:38 | GPUGRID | Scheduler request completed: got 0 new tasks
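The failing log above makes it possible to time each handshake. A minimal Python sketch, assuming the `dd/mm/yyyy hh:mm:ss | project | [http] [ID#n] Info: ...` layout shown (BOINC `[http]` debug output with a UK date format); `connect_durations` is a hypothetical helper. It pairs each `Trying 84.89.134.145...` with the next outcome for the same `[ID#n]`; for the failing excerpt it finds three attempts of 21, 21 and 22 seconds, each ending in `connect to ... failed: Timed out`, so the connection was never established at all rather than receiving a slow HTTP reply.

```python
from __future__ import annotations

import re
from datetime import datetime

# "dd/mm/yyyy hh:mm:ss | project | [http] [ID#n] Info: message"
LINE = re.compile(r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) \|.*?\[ID#(\d+)\] Info: (.*)$")


def connect_durations(log: str) -> list[tuple[int, float, bool]]:
    """Pair each 'Trying <ip>...' with the next outcome for the same [ID#n].

    Returns (request id, seconds taken, connected?) for every attempt whose
    outcome appears in the log; lines without an [ID#n] tag are ignored.
    """
    pending: dict[int, datetime] = {}  # request id -> time of the open attempt
    results = []
    for raw in log.splitlines():
        m = LINE.match(raw.strip())
        if m is None:
            continue
        stamp, req_id, info = m.groups()
        when = datetime.strptime(stamp, "%d/%m/%Y %H:%M:%S")
        req = int(req_id)
        if info.startswith("Trying "):
            pending[req] = when
        elif req in pending and ("Connected to" in info or "failed: Timed out" in info):
            started = pending.pop(req)
            results.append((req, (when - started).total_seconds(), "Connected to" in info))
    return results
```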

Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Message 50230 - Posted: 8 Aug 2018 | 19:31:00 UTC

When I click on too many GPUGrid links in quick succession, I get a Firefox timeout after 20 seconds. It is almost repeatable, but not entirely consistent.

Zalster
Joined: 26 Feb 14
Posts: 211
Credit: 4,496,324,562
RAC: 0
Message 50366 - Posted: 2 Sep 2018 | 4:35:06 UTC

Looks like the server has gone belly up. I'm unable to report a completed task, and the server status hasn't changed much over the last few hours.

tullio
Joined: 8 May 18
Posts: 190
Credit: 104,426,808
RAC: 0
Message 50367 - Posted: 2 Sep 2018 | 5:51:18 UTC

All is nominal here.
Tullio

Erich56
Joined: 1 Jan 15
Posts: 1090
Credit: 6,603,906,926
RAC: 21,893,126
Message 50368 - Posted: 2 Sep 2018 | 6:26:46 UTC - in response to Message 50366.

Unable to report completed task.

I experienced that 2 days ago - but after about 3-4 hours, all was back to normal.
