SOS-Downloads stuck
| Author | Message |
|---|---|
Beyond Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0
|
"The Computing Preferences tab of the BOINC options has upload and download limiting. On some of my systems I set this to less than half of what they can push, and I have seen these user-side limited connections pause and time out less often, if at all. It may be that the university is limiting the bandwidth and one of the triggers is a noticeable spike for a single connection."

Doubt it. My max DL speed is 5 Mbps. That can't tax anybody's server. (CenturyLink monopoly DSL. We don't care, we don't have to. We're the phone company...) |
|
Joined: 5 May 13 Posts: 187 Credit: 349,254,454 RAC: 0
|
Stefan wrote: You can start by asking the university / campus IT people whether they are doing any form of traffic shaping on incoming connections to servers in the university. If there is some traffic shaping going on, you can tell them your contributors have reported problems downloading files (tasks) from certain servers (grosso??) and ask them to monitor the traffic shaping for incoming connections to your servers. Finally, ask them to report any findings to you and, if you do find we are victim to any bandwidth / number of connections limiting mechanism, start to exercise the fine art of negotiating for "MOAR BANDWIDTH!!" :D
|
|
Joined: 5 Mar 13 Posts: 348 Credit: 0 RAC: 0 |
University staff have a "won't bother looking into it till you prove it" attitude, so right now Jose is running a script from home testing the connection over a few days. Then we can throw the hard cold data at them and tell them to fix it. |
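The script Jose ran is not shown in the thread. As a rough sketch of what such a connection test could look like, the Python below repeatedly downloads one file with a timeout and records each attempt as a CSV row. The function names (`probe`, `run`), the CSV columns, and any URL you pass in are assumptions made for illustration, not the project's actual tooling.

```python
import csv
import http.client
import time
import urllib.request
from datetime import datetime, timezone


def probe(url, timeout=30.0, opener=urllib.request.urlopen):
    """Download url once and return (utc_time, ok, seconds, bytes_read, error)."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    started = time.monotonic()
    bytes_read = 0
    ok, error = True, ""
    try:
        with opener(url, timeout=timeout) as response:
            # Read in chunks so a stall part-way through still reports progress.
            while True:
                chunk = response.read(64 * 1024)
                if not chunk:
                    break
                bytes_read += len(chunk)
    except (OSError, http.client.HTTPException) as exc:
        # URLError, timeouts and connection resets are OSError subclasses.
        ok, error = False, repr(exc)
    elapsed = round(time.monotonic() - started, 3)
    return stamp, ok, elapsed, bytes_read, error


def run(url, attempts, pause_s, out, opener=urllib.request.urlopen):
    """Probe url `attempts` times, write one CSV row per attempt, return failure count."""
    writer = csv.writer(out)
    writer.writerow(["utc_time", "ok", "seconds", "bytes", "error"])
    failures = 0
    for i in range(attempts):
        row = probe(url, opener=opener)
        writer.writerow(row)
        if not row[1]:
            failures += 1
        if i + 1 < attempts:
            time.sleep(pause_s)
    return failures
```

To collect data over a few days, one could call `run()` with a file opened for writing, a large attempt count, and a pause of several minutes. The timestamps and byte counts of the failed rows are the kind of evidence the post describes handing to the campus IT staff.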
|
Joined: 5 Mar 13 Posts: 348 Credit: 0 RAC: 0 |
Does anyone notice download problems on weekends? |
|
Joined: 9 May 13 Posts: 171 Credit: 4,739,796,466 RAC: 334,273
|
Stefan asked: "Does anyone notice download problems on weekends?"

Yes.

Sun 30 Oct 2016 11:45:55 PM CDT | | Project communication failed: attempting access to reference site
Sun 30 Oct 2016 11:45:55 PM CDT | GPUGRID | Temporarily failed download of e26s11_e22s4p0f35-SDOERR_CASP11_crystal_ss_20ns_ntl9_0-0-psf_file: transient HTTP error
Sun 30 Oct 2016 11:45:55 PM CDT | GPUGRID | Backing off 00:06:21 on download of e26s11_e22s4p0f35-SDOERR_CASP11_crystal_ss_20ns_ntl9_0-0-psf_file
Sun 30 Oct 2016 11:45:57 PM CDT | | Internet access OK - project servers may be temporarily down.
Sun 30 Oct 2016 04:42:45 PM CDT | | Project communication failed: attempting access to reference site
Sun 30 Oct 2016 04:42:45 PM CDT | GPUGRID | Temporarily failed download of e12s17_e4s21p0f210-PABLO_SH2TRIPEP_Q_TRI_2-0-pdb_file: transient HTTP error
Sun 30 Oct 2016 04:42:45 PM CDT | GPUGRID | Backing off 00:04:07 on download of e12s17_e4s21p0f210-PABLO_SH2TRIPEP_Q_TRI_2-0-pdb_file
Sun 30 Oct 2016 04:42:46 PM CDT | | Internet access OK - project servers may be temporarily down. |
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 0
|
"Does anyone notice download problems on weekends?"

Yes, here too. Examples from just one machine:

29-Oct-2016 01:51:04 [GPUGRID] Temporarily failed download of e10s3_e9s8p0f10-SDOERR_CASP11_crystal_contacts_20ns_a3D_0-0-coor_file: transient HTTP error
29-Oct-2016 05:08:31 [GPUGRID] Temporarily failed download of e16s12_e9s18p0f486-GERARD_CXCL12CHALCLD_mol0_2-0-coor_file: transient HTTP error
29-Oct-2016 10:31:01 [GPUGRID] Temporarily failed download of e6s1_e5s2p0f181-SDOERR_CASP11_crystal_ss_50ns_a3D_0-0-pdb_file: transient HTTP error
30-Oct-2016 14:03:47 [GPUGRID] Temporarily failed download of e28s4_e27s3p0f1-SDOERR_CASP11_crystal_ss_20ns_ntl9_1-0-psf_file: transient HTTP error
30-Oct-2016 22:11:57 [GPUGRID] Temporarily failed download of e13s11_e10s4p0f159-SDOERR_CASP11_crystal_ss_contacts_20ns_a3D_1-0-pdb_file: transient HTTP error

I do have a copy of Wireshark available and I can try to capture a log, if that would be helpful? |
Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0
|
"I do have a copy of Wireshark available and I can try to capture a log, if that would be helpful?"

You can give it a try, but we'll see similar events: some HTTP requests remain unanswered, and we won't know which device blocked or dropped the packet (or why). If it's a packet fragmentation issue, perhaps we'll see something useful in the log. |
|
Joined: 27 Feb 14 Posts: 4 Credit: 121,376,887 RAC: 0
|
"While I don't think the staff of GPUGrid could do anything about your HTTP timeout problem, out of curiosity I ask you to run a very basic network diagnostics:"

Nanoprobe's network/setup/config is NOT the issue. I've experienced this issue many times on different machines with different OSes and connections. The issue is exactly as he describes: usually the larger file will hang and some of the smaller ones will finish. Then, after timing out, the big one will restart, make a small amount of progress, and hang again. This is unique to this project; I have no issues elsewhere. It IS on GPUGrid's side, though I'm not sure whether it's the project or their provider. People regularly mention download/upload problems around here. I'm currently experiencing this on 64-bit Windows 7 and 64-bit Linux. |
|
Joined: 27 Feb 14 Posts: 4 Credit: 121,376,887 RAC: 0
|
"University staff have a 'won't bother looking into it till you prove it' attitude, so right now Jose is running a script from home testing the connection over a few days. Then we can throw the hard cold data at them and tell them to fix it."

Tell them to install the BOINC client and add GPUGrid to it. I bet they'll get some hangs. I currently have a stalled transfer of libcufft.so.6.5; after it stalls and times out, I can watch my network activity when I retry it. It spikes up but immediately comes back down and stalls, looking like 180 degrees of a sine wave. I'm not an IT guy. I could do tracerts and whatnot, but I have no idea how to diagnose this, as everything points to the server side for the following reasons:

- This is the only project I have this problem on.
- Many other people have posted about "transient HTTP error" for over 6 months. This kind of error is almost unheard of on other projects.
- Most importantly, veteran crunchers with years of BOINC experience are telling you that they cannot contribute.

And what kind of IT department doesn't do the IT work? It sounds like they said, "We don't want to figure it out; you figure it out." That's pretty messed up.

Edit: I felt I should follow up, since I made some progress after I got ranty. I edited my cc_config on a Linux machine to allow 1 max transfer per project, and I was able to get all the files without interruptions. It feels like there's something at play affecting parallel downloads to the same IP/host? |
Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0
|
"Nanoprobe's network/setup/config is NOT the issue, ..."

I'm aware of that. ISPs do some traffic shaping (or QoS), which could result in issues like this one. Most probably the campus' ISP (or WAN operator, or IT staff) is to blame. This issue began when there was a change in the campus network about a year ago. It was much worse in the beginning than it is now, but it seems there is still something that escaped their attention.

"I've experienced this issue many times on different machines with different OSes and connections. The issue is exactly as he describes: usually the larger file will hang and some of the smaller ones will finish. Then, after timing out, the big one will restart, make a small amount of progress, and hang again."

This is probability at work: large files are divided into many more packets than smaller ones, so if a packet gets lost from time to time, a larger file has a higher probability of getting stuck (even many times).

"This is unique to this project; I have no issues elsewhere. It IS on GPUGrid's side, though I'm not sure whether it's the project or their provider."

Perhaps GPUGrid's BOINC server log (compared with the user's log) could help in deciding this. |
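The probability argument in the post above can be made concrete with a deliberately simplified model. The sketch assumes each packet independently has a small chance `p` of triggering a stall, so a file of `n` packets stalls with probability `1 - (1 - p)**n`. The independence assumption, the function name, and the example value of `p` are illustrative assumptions, and real TCP retransmits lost packets, so `p` here stands for an unrecovered loss rather than any single dropped packet. The default payload of 1460 bytes is the usual TCP payload on a 1500-byte Ethernet MTU.

```python
import math


def stall_probability(file_bytes, p, payload_bytes=1460):
    """Chance that at least one of the file's packets triggers a stall.

    Assumes packets fail independently with probability p (a simplification).
    """
    packets = math.ceil(file_bytes / payload_bytes)
    return 1.0 - (1.0 - p) ** packets


# Compare a small and a large file under the same assumed per-packet rate.
small = stall_probability(100 * 1024, p=1e-6)         # about 100 KB
large = stall_probability(100 * 1024 * 1024, p=1e-6)  # about 100 MB
```

Under these assumptions the large file is on the order of a thousand times more likely to stall than the small one, which fits the pattern reported in the thread: small files finish while the big ones hang repeatedly.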
|
Joined: 14 Mar 10 Posts: 14 Credit: 501,938,373 RAC: 0
|
Hello, the problem is solved for me. I edited my cc_config file as caffeineyellow5 described in the second message. If you don't have a cc_config.xml, create a file with Notepad containing the following option: <http_transfer_timeout>10</http_transfer_timeout> (I lowered the value to save time). Then name the file cc_config.xml and put it into C:\ProgramData\BOINC. |
|
Joined: 5 Mar 13 Posts: 348 Credit: 0 RAC: 0 |
We sent them our tests which show the timeouts and now they are looking into it. Let's hope we get some news soon. |
Beyond Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0
|
"Hello, the problem is solved for me. I edited my cc_config file as caffeineyellow5 described in the second message."

Anthony, it doesn't really solve the problem; it simply masks it somewhat so that DLs don't hang for hours. BTW, this was first suggested by Richard Haselgrove. A more complete workaround is the one I posted in the 5th message: https://www.gpugrid.net/forum_thread.php?id=4399&nowrap=true#44724

Realize that these are only workarounds and not a real solution. The bad news is that they might tend to hammer the server with more requests than would be necessary if everything were working correctly. Personally, I wouldn't go under 60 for http_transfer_timeout.

Also, the same DL problem is evident when trying to access long threads on the message board: the thread DL stalls, especially on threads with many graphics (like the crunchathlon thread, for instance). Quite irritating. Again, this happens on no other project except GPUGrid. |
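For readers who want to try the two workarounds mentioned in this thread (a shorter `http_transfer_timeout`, and one transfer per project), here is a hedged sketch that builds the text of a `cc_config.xml`. It assumes, based on my understanding of the BOINC client configuration layout, that both options live inside `<cc_config><options>` and that the per-project limit is called `max_file_xfers_per_project`; the posts name only `http_transfer_timeout` explicitly. The function name and defaults are illustrative, with 60 chosen because the post above advises against going lower.

```python
import xml.etree.ElementTree as ET


def build_cc_config(http_transfer_timeout=60, max_file_xfers_per_project=1):
    """Return cc_config.xml text with the two workaround options set."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    # Seconds of no transfer progress before the client gives up and retries.
    timeout = ET.SubElement(options, "http_transfer_timeout")
    timeout.text = str(http_transfer_timeout)
    # Limit simultaneous file transfers per project (assumed option name).
    xfers = ET.SubElement(options, "max_file_xfers_per_project")
    xfers.text = str(max_file_xfers_per_project)
    if hasattr(ET, "indent"):  # ET.indent needs Python 3.9 or newer
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# The sketch only returns text; it does not touch any existing file.
config_text = build_cc_config()
```

The function returns a string on purpose, so that an existing `cc_config.xml` with other settings is not overwritten by accident. After merging the options into the file in the BOINC data directory, the client has to re-read its config files or be restarted for the change to take effect.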
|
Joined: 26 May 10 Posts: 6 Credit: 597,131,550 RAC: 0
|
Have you received any new info on this issue? I can confirm that the problem also occurs when downloading files smaller than 1 MB. In fact, I had to attach a backup project just to keep my GPUs working. |
caffeineyellow5 Joined: 30 Jul 14 Posts: 225 Credit: 2,658,976,345 RAC: 0
|
I still suspect a cache/packet size, open active connection limit, timeout, or throttling issue at the University's IT level, which stands between the project and the world. |
Beyond Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0
|
Downloads still stalling here... |
|
Joined: 17 Mar 10 Posts: 23 Credit: 1,173,824,416 RAC: 0
|
I too am now wrestling with the download issue, manually trying to force libcufft through... This really reminds me of how the internet worked 20 years ago when it was hopelessly overloaded. I agree with caffeineyellow5 that this all points to some overloaded or malfunctioning network component on the campus causing it to randomly drop packets. This is something the network guys on the campus need to solve... |
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 0
|
If you have a 'modern' internet connection, you can download the full darn official toolkit for CUDA 8.0 direct from NVidia: https://developer.nvidia.com/cuda-toolkit It's 1.2 GB in total, but only took me about 8 minutes to download - NVidia have good servers and connections. I wouldn't bother installing the whole package: just use an archive manager (I used 7-zip) to pull the file(s) you need from cufft\bin\ For the Windows cufft64_80.dll I get a file size of 145,769,016 bytes (142,353 KB), and an MD5 of fe5ab557e61c775e6eda899a229dd42b - all identical to the file distributed by GPUGrid. I'd need to rename the file from cufft64_80.dll to _cufft64_80.dll, and then drop it into the GPUGrid project folder in the BOINC data directory: click 'retry download' and it should accept that the download is complete. The same procedure should work for other operating systems too, though you may need to mark the file as executable. |
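Before dropping a manually obtained file into the project folder, it may be worth checking it against the size and MD5 quoted in the post above. The following is a minimal sketch using Python's standard `hashlib`; the function names and the example path are hypothetical, and only the size (145,769,016 bytes) and the MD5 value come from the post.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_reference(path, expected_size, expected_md5):
    """True only if both the byte size and the MD5 match the reference values."""
    if os.path.getsize(path) != expected_size:
        return False
    return md5_of_file(path) == expected_md5.lower()


# Reference values quoted in the post for the Windows cufft64_80.dll.
CUFFT_SIZE = 145_769_016
CUFFT_MD5 = "fe5ab557e61c775e6eda899a229dd42b"
# Example call (the path is a placeholder for wherever the file was extracted):
# matches_reference(r"C:\temp\cufft64_80.dll", CUFFT_SIZE, CUFFT_MD5)
```

The size is compared first because it is cheap, and a truncated extraction is the most likely failure. MD5 serves here only as an integrity check against the values reported in the thread, not as a security guarantee.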
|
Joined: 22 Nov 10 Posts: 4 Credit: 647,970,482 RAC: 0
|
Downloads are hanging anywhere from 0.00% to 92.71% of the download. I've aborted several transfers that remain hung for an hour and keep cycling between "Download: active" (with nothing downloading), "Download: pending" (ditto), and "Download: retry in {time}." Very frustrating. Unproductive, too. This problem has not occurred in the three other BOINC projects I'm subscribed to. My location: NH, USA. |
|
Joined: 3 Nov 15 Posts: 38 Credit: 6,768,093 RAC: 0
|
"I've aborted several transfers that remain hung for an hour and keep cycling between "Download: active" (with nothing downloading), "Download: pending" (ditto), and "Download: retry in {time}"."

Next time, you can select "Suspend network activity" in the manager, wait a few seconds, and then resume it. This pauses the download, closes the stalled TCP connection, and then starts a fresh one.
|
©2026 Universitat Pompeu Fabra