SOS-Downloads stuck

Beyond
Message 44861 - Posted: 26 Oct 2016, 17:13:21 UTC - in response to Message 44857.  

The Computing Preferences tab in the BOINC Options menu has upload and download rate limits. On some of my systems I set these to less than half of what the connection can handle, and I have seen these user-side limited connections pause and time out less often, if at all. It may be that the university is limiting the bandwidth and that one of the triggers is a noticeable spike on a single connection.

Doubt it. My max DL speed is 5 Mbps. That can't tax anybody's server.

(CenturyLink monopoly DSL. We don't care, we don't have to. We're the phone company...)

Vagelis Giannadakis
Message 44866 - Posted: 27 Oct 2016, 10:14:58 UTC - in response to Message 44837.  

Stefan wrote:
Is there something we need to take a look at or is it an individual issue? Can someone give me a tl;dr?


You can start by asking the university / campus IT people whether they are doing any form of traffic shaping on incoming connections to servers in the university. If there is some traffic shaping going on, you can tell them your contributors have reported problems downloading files (tasks) from certain servers (grosso??) and ask them to monitor the traffic shaping for incoming connections to your servers. Finally, ask them to report any findings to you and, if you do find we are victims of any mechanism limiting bandwidth or the number of connections, start to exercise the fine art of negotiating for "MOAR BANDWIDTH!!" :D

Stefan
Project administrator
Project developer
Project tester
Project scientist
Message 45060 - Posted: 31 Oct 2016, 8:58:43 UTC
Last modified: 31 Oct 2016, 13:06:00 UTC

University staff have a "won't bother looking into it till you prove it" attitude, so right now Jose is running a script from home testing the connection over a few days. Then we can throw the cold, hard data at them and tell them to fix it.
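For anyone who wants to collect the same kind of evidence from home, a test of this sort could be sketched roughly as below. This is only a guess at what such a script might look like, not the script Jose is running: the function names, the CSV columns, the 300-second interval and the example URL in the comment are all assumptions, and you would point it at a file the project actually serves.

```python
import csv
import http.client
import time
import urllib.request
from datetime import datetime, timezone

FIELDS = ["utc", "ok", "bytes", "seconds", "error"]


def probe(url, timeout=60, opener=urllib.request.urlopen):
    """Download url once and return the outcome, byte count and duration."""
    started = time.monotonic()
    received = 0
    error = ""
    try:
        with opener(url, timeout=timeout) as response:
            while True:
                chunk = response.read(65536)
                if not chunk:
                    break
                received += len(chunk)
    except (OSError, http.client.HTTPException) as exc:
        # URLError, socket timeouts and connection resets are all OSErrors.
        error = repr(exc)
    return {
        "utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "ok": error == "",
        "bytes": received,
        "seconds": round(time.monotonic() - started, 2),
        "error": error,
    }


def run(url, log_path, interval=300, rounds=3, timeout=60,
        opener=urllib.request.urlopen):
    """Probe url `rounds` times, `interval` seconds apart, appending to a CSV."""
    with open(log_path, "a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        for i in range(rounds):
            writer.writerow(probe(url, timeout, opener))
            handle.flush()  # keep the log usable even if the run is interrupted
            if i + 1 < rounds:
                time.sleep(interval)


# Hypothetical usage: one probe every 5 minutes for a day.
# run("https://www.gpugrid.net/", "gpugrid_probe.csv", interval=300, rounds=288)
```

Rows with ok=False and a byte count well short of the file size are the stalls; a few days of those with UTC timestamps is the kind of thing that is hard for an IT department to wave away.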

Stefan
Project administrator
Project developer
Project tester
Project scientist
Message 45063 - Posted: 31 Oct 2016, 13:08:34 UTC

Does anyone notice download problems on weekends?

captainjack
Message 45066 - Posted: 31 Oct 2016, 14:27:35 UTC

Stefan asked:

Does anyone notice download problems on weekends?


Yes
Sun 30 Oct 2016 11:45:55 PM CDT |  | Project communication failed: attempting access to reference site
Sun 30 Oct 2016 11:45:55 PM CDT | GPUGRID | Temporarily failed download of e26s11_e22s4p0f35-SDOERR_CASP11_crystal_ss_20ns_ntl9_0-0-psf_file: transient HTTP error
Sun 30 Oct 2016 11:45:55 PM CDT | GPUGRID | Backing off 00:06:21 on download of e26s11_e22s4p0f35-SDOERR_CASP11_crystal_ss_20ns_ntl9_0-0-psf_file
Sun 30 Oct 2016 11:45:57 PM CDT |  | Internet access OK - project servers may be temporarily down.


Sun 30 Oct 2016 04:42:45 PM CDT |  | Project communication failed: attempting access to reference site
Sun 30 Oct 2016 04:42:45 PM CDT | GPUGRID | Temporarily failed download of e12s17_e4s21p0f210-PABLO_SH2TRIPEP_Q_TRI_2-0-pdb_file: transient HTTP error
Sun 30 Oct 2016 04:42:45 PM CDT | GPUGRID | Backing off 00:04:07 on download of e12s17_e4s21p0f210-PABLO_SH2TRIPEP_Q_TRI_2-0-pdb_file
Sun 30 Oct 2016 04:42:46 PM CDT |  | Internet access OK - project servers may be temporarily down.


Richard Haselgrove
Message 45071 - Posted: 31 Oct 2016, 18:04:02 UTC - in response to Message 45063.  

Does anyone notice download problems on weekends?

Yes, here too. Examples from just one machine:
29-Oct-2016 01:51:04 [GPUGRID] Temporarily failed download of e10s3_e9s8p0f10-SDOERR_CASP11_crystal_contacts_20ns_a3D_0-0-coor_file: transient HTTP error

29-Oct-2016 05:08:31 [GPUGRID] Temporarily failed download of e16s12_e9s18p0f486-GERARD_CXCL12CHALCLD_mol0_2-0-coor_file: transient HTTP error

29-Oct-2016 10:31:01 [GPUGRID] Temporarily failed download of e6s1_e5s2p0f181-SDOERR_CASP11_crystal_ss_50ns_a3D_0-0-pdb_file: transient HTTP error

30-Oct-2016 14:03:47 [GPUGRID] Temporarily failed download of e28s4_e27s3p0f1-SDOERR_CASP11_crystal_ss_20ns_ntl9_1-0-psf_file: transient HTTP error

30-Oct-2016 22:11:57 [GPUGRID] Temporarily failed download of e13s11_e10s4p0f159-SDOERR_CASP11_crystal_ss_contacts_20ns_a3D_1-0-pdb_file: transient HTTP error

I do have a copy of Wireshark available and I can try to capture a log, if that would be helpful?

Retvari Zoltan
Message 45073 - Posted: 31 Oct 2016, 18:38:02 UTC - in response to Message 45071.  

I do have a copy of Wireshark available and I can try to capture a log, if that would be helpful?

You can give it a try, but we'll see similar events: some HTTP requests remain unanswered, and we won't know which device blocked or dropped the packet (or why). Perhaps if it's a packet fragmentation issue we'll see something useful in the log.

mindcrime
Message 45074 - Posted: 31 Oct 2016, 18:50:05 UTC - in response to Message 44756.  

While I don't think the staff of GPUGrid could do anything about your HTTP timeout problem, out of curiosity I'd ask you to run some very basic network diagnostics:
If you have a Windows-based PC on the same network as your crunching box, please open a command prompt and type

ping www.gpugrid.net -n 100

You can do it on Linux also, but I'm not familiar with its command syntax (the -n 100 parameter tells the ping command to try 100 times).
You'll see a lot of (exactly 100, if everything's going well) messages like:

Reply from 84.89.134.145: bytes=32 time=83ms TTL=49

Then, at the end:

Ping statistics for 84.89.134.145:
    Packets: Sent = 100, Received = 100, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 83ms, Maximum = 88ms, Average = 83ms

These are the actual results from my host; I'm curious about your statistics.
I expect your packet loss and round trip times to be significantly higher than what I experience.
Unfortunately these numbers do not reveal which device is responsible for your problem, but I'm quite confident that it's closer to your end (most probably at your ISP) than to the GPUGrid site (otherwise many more users would have such difficulties).

You could also try a traceroute command:

tracert www.gpugrid.net

This gives you a list of the devices between your end and grosso.upf.edu (on which the gpugrid.net project resides).
Perhaps this list could help us to figure out what's wrong, especially if it gives you very different results when you run it multiple times.
In some cases these errors are simply caused by network congestion (when the ISP has limited bandwidth to certain destinations), but that could depend on the time of day. On your end, however, P2P file sharing applications or appliances, or a faulty router/switch, could cause such strange errors (but I'm sure in that case there would be problems with other sites as well).



Nanoprobe's network/setup/config is NOT the issue. I've experienced this issue many times on different machines with different OSes and connections. The issue is exactly as he describes: usually the larger file will hang and some of the smaller ones will finish. Then, after timing out, the big one will restart, make a small amount of progress and hang again. This is unique to this project; I have no issues elsewhere. It IS on GPUGrid's side, not sure if it's the project or their provider.

People regularly mention download/upload problems around here. I'm currently experiencing this on Win7 64-bit and Linux 64-bit.

mindcrime
Message 45075 - Posted: 31 Oct 2016, 18:51:56 UTC - in response to Message 45060.  
Last modified: 31 Oct 2016, 19:38:30 UTC

University staff have a "won't bother looking into it till you prove it" attitude, so right now Jose is running a script from home testing the connection over a few days. Then we can throw the hard cold data at them and tell them to fix it.



Tell them to install the BOINC client and add GPUGrid to it. I bet they'll get some hangs.

I currently have a stalled transfer of libcufft.so.6.5. After it stalls and times out, I can watch my network activity when I retry it: it spikes up but immediately comes back down and stalls, like 180 degrees of a sine wave.

I'm not an IT guy. I could do tracerts and whatnot, but I have no idea how to diagnose this, as everything points to the server side for the following reasons:

- This is the only project I have this problem on.
- Many other people have posted about "transient HTTP error" for over 6 months. This kind of error is almost unheard of on other projects.
- But most importantly, veteran crunchers with years of BOINC experience are telling you that they cannot contribute.

And what kind of IT department doesn't do the IT work? It sounds like they said "we don't want to figure it out, you figure it out." That's pretty messed up.

Edit: I felt I should follow up, since I made some progress after I got ranty.

I edited my cc_config on a Linux machine to allow at most 1 transfer per project, and I was able to get all the files without interruptions. It feels like there's something at play affecting parallel downloads to the same IP/host?
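For reference, a change like this could be generated with a small helper such as the one below. It is a sketch under assumptions: the post does not name the setting, so this assumes it was the BOINC client option max_file_xfers_per_project, and the helper name build_cc_config is made up for illustration. It only produces the file text; it does not touch an existing cc_config.xml.

```python
def build_cc_config(options):
    """Return the text of a minimal cc_config.xml with the given options."""
    lines = ["<cc_config>", "  <options>"]
    for name, value in options.items():
        # One option per line, e.g. <max_file_xfers_per_project>1</...>
        lines.append(f"    <{name}>{value}</{name}>")
    lines += ["  </options>", "</cc_config>"]
    return "\n".join(lines) + "\n"


# Assumed option name: limit the client to one file transfer per project.
print(build_cc_config({"max_file_xfers_per_project": 1}), end="")
```

The printed text would go into cc_config.xml in the BOINC data directory. If you already have a cc_config.xml with other options, merge the new line into its existing options section by hand rather than overwriting the file.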

Retvari Zoltan
Message 45077 - Posted: 31 Oct 2016, 19:33:15 UTC - in response to Message 45074.  

Nanoprobe's network/setup/config is NOT the issue, ...

I'm aware of that. ISPs do some traffic shaping (or QoS), which could result in issues like this one.
Most probably the campus' ISP (or WAN operator, or IT staff) is to blame.
This issue began when there was a change in the network at the campus about a year ago.
It was much worse than now in the beginning, but it seems that there is still something which escaped their attention.

I've experienced this issue many times on different machines with different OSes and connections. The issue is exactly as he describes: usually the larger file will hang and some of the smaller ones will finish. Then, after timing out, the big one will restart, make a small amount of progress and hang again.

This is probability at work: large files are divided into many more packets than smaller ones, so if a packet gets lost from time to time, a larger file has a higher probability of getting stuck (even many times).
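To put rough numbers on that argument, here is a small sketch. It assumes every packet independently has the same small chance of triggering a stall, which is a simplification (real TCP retransmits lost packets, and the drops seen here may not be independent). The 1460-byte payload and the 1-in-100,000 rate are illustrative guesses, not measurements from GPUGrid.

```python
import math


def stall_probability(file_bytes, per_packet_rate, payload_bytes=1460):
    """Chance that at least one packet of the file triggers a stall."""
    packets = math.ceil(file_bytes / payload_bytes)
    # P(at least one) = 1 - P(none) = 1 - (1 - p) ** n
    return 1.0 - (1.0 - per_packet_rate) ** packets


for label, size in [("100 KB", 100_000), ("100 MB", 100_000_000)]:
    print(label, f"{stall_probability(size, 1e-5):.4f}")
```

With these made-up numbers the small file stalls well under 1% of the time, while the large one stalls on roughly every second attempt, which matches the pattern of small files finishing while the big ones hang.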

This is unique to this project; I have no issues elsewhere. It IS on GPUGrid's side, not sure if it's the project or their provider.

Perhaps GPUGrid's BOINC server log (compared to the user's log) could help in deciding this.

[AF>P4G] anthony
Message 45079 - Posted: 31 Oct 2016, 20:07:07 UTC
Last modified: 31 Oct 2016, 20:11:18 UTC

Hello,

The problem is solved for me. I edited my cc_config file as caffeineyellow5 described in the second message.

If you don't have a cc_config.xml, create a file in Notepad ("Bloc-notes") containing the following setting, inside the <cc_config><options> section:
<http_transfer_timeout>10</http_transfer_timeout>
(I lowered the value to save time.)
Then rename the file to cc_config.xml

Put it into C:\ProgramData\BOINC

Stefan
Project administrator
Project developer
Project tester
Project scientist
Message 45132 - Posted: 3 Nov 2016, 10:33:46 UTC

We sent them our tests which show the timeouts and now they are looking into it. Let's hope we get some news soon.

Beyond
Message 45232 - Posted: 7 Nov 2016, 15:58:39 UTC - in response to Message 45079.  
Last modified: 7 Nov 2016, 16:11:56 UTC

Hello, The problem is solved for me. I edited my cc_config file as caffeineyellow5 described in the second message.

If you don't have a cc_config.xml, create a file in Notepad ("Bloc-notes") containing the following setting:
<http_transfer_timeout>10</http_transfer_timeout>

Put it into C:\ProgramData\BOINC

Anthony, it doesn't really solve the problem; it simply masks it somewhat so that DLs don't hang for hours. BTW, this was first suggested by Richard Haselgrove. A more complete workaround is the one I posted in the 5th message:

https://www.gpugrid.net/forum_thread.php?id=4399&nowrap=true#44724

Realize that these are only workarounds and not a real solution. The bad news is that they might tend to hammer the server with more requests than should be necessary if everything was working correctly. Personally, I wouldn't go under 60 for http_transfer_timeout.

Also the same DL problem is evident when trying to access long threads on the message board: the thread DL stalls especially on threads with too many graphics (like the crunchathlon thread for instance). Quite irritating. Again, this happens on no other projects except GPUGrid.

Arif Mert Kapicioglu
Message 45253 - Posted: 12 Nov 2016, 18:19:49 UTC

Have you received any new info on this issue? I can confirm that the problem occurs when downloading files smaller than 1 MB. In fact, I had to attach a backup project just to keep my GPUs working.

caffeineyellow5
Message 45277 - Posted: 16 Nov 2016, 2:05:25 UTC

I still suspect a cache/packet size, open active connection limit, timeout, or throttling issue at the university's IT level, which stands between the project and the world.

Beyond
Message 45283 - Posted: 16 Nov 2016, 18:12:12 UTC

Downloads still stalling here...

pvh
Message 45811 - Posted: 21 Dec 2016, 12:54:55 UTC

I too am now wrestling with the download issue, manually trying to force libcufft through... This really reminds me of how the internet worked 20 years ago when it was hopelessly overloaded. I agree with caffeineyellow5 that this all points to some overloaded or malfunctioning network component on the campus causing it to randomly drop packets. This is something the network guys on the campus need to solve...

Richard Haselgrove
Message 45812 - Posted: 21 Dec 2016, 14:08:31 UTC

If you have a 'modern' internet connection, you can download the full darn official toolkit for CUDA 8.0 direct from NVidia:

https://developer.nvidia.com/cuda-toolkit

It's 1.2 GB in total, but only took me about 8 minutes to download - NVidia have good servers and connections.

I wouldn't bother installing the whole package: just use an archive manager (I used 7-zip) to pull the file(s) you need from cufft\bin\

For the Windows cufft64_80.dll I get a file size of 145,769,016 bytes (142,353 KB), and an MD5 of fe5ab557e61c775e6eda899a229dd42b - all identical to the file distributed by GPUGrid.
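If you want to repeat that comparison on your own copy before dropping it into the project folder, a check along these lines should do. The function names are made up for this sketch; the size and MD5 in the commented usage are the ones quoted above for the Windows cufft64_80.dll.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, read in chunks to keep memory use low."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_size, expected_md5):
    """True if the file has exactly the expected byte size and MD5."""
    if os.path.getsize(path) != expected_size:
        return False
    return md5_of_file(path) == expected_md5.lower()


# Usage with the values from this post (the path is wherever you extracted it):
# matches("cufft64_80.dll", 145769016, "fe5ab557e61c775e6eda899a229dd42b")
```

Checking the size first is just a cheap early exit; the MD5 comparison is what actually confirms the two files are the same.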

I'd need to rename the file from cufft64_80.dll to _cufft64_80.dll, and then drop it into the GPUGrid project folder in the BOINC data directory: click 'retry download' and it should accept that the download is complete.

The same procedure should work for other operating systems too, though you may need to mark the file as executable.

LSG
Message 45815 - Posted: 21 Dec 2016, 15:00:31 UTC

Downloads are hanging anywhere from 0.00% to 92.71% of the download. I've aborted several transfers that remain hung for an hour and keep cycling between "Download: active" (with nothing downloading), "Download: pending" (ditto), and "Download: retry in {time}." Very frustrating. Unproductive, too. This problem has not occurred in the three other BOINC projects I'm subscribed to. My location: NH, USA.

Tomas Brada
Message 45818 - Posted: 21 Dec 2016, 17:27:49 UTC - in response to Message 45815.  

I've aborted several transfers that remain hung for an hour and keep cycling between "Download: active" (with nothing downloading), "Download: pending" (ditto), and "Download: retry in {time}".

Next time you can select "Suspend network activity" in the manager, wait a few seconds, and then resume it. This pauses the download and closes the stalled TCP connection, so that a fresh one is started on resume.

©2026 Universitat Pompeu Fabra