Message boards :
Server and website :
Warning: bad tasks re-appearing in the download queue
| Author | Message |
|---|---|
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 0
|
There were a number of bad workunits accidentally created around 13:00 - 14:00 UTC on 21 May. Examples: WU 20154512, WU 20154529. They are timing out, and being resent.

Although I'm pretty well acquainted with the tricksy ways of BOINC, I'm finding them hard to cope with. All the file downloads go into persistent 'transient HTTP error' - but it's not transient.

One successful way is to:
1. Wait until GPUGrid is idle (all previous tasks reported)
2. Stop BOINC
3. Edit client_state.xml
4. Change https:// to http:// for the affected download files only
5. Save file
6. Restart BOINC

But be very careful when editing that file - use text mode only.

I'll try some other ways when the machine runs dry again - but downloading the files via a browser didn't work for me this time. |
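The https-to-http edit described above could be scripted roughly as follows. This is only a sketch under assumptions, not a tested BOINC tool: it assumes each download URL sits on a single line of client_state.xml and contains the file's name, and the function names and the `.bak` backup suffix are hypothetical. It treats the file purely as text, in keeping with the "text mode only" advice, and should only be run while BOINC is stopped.

```python
import shutil
from pathlib import Path


def downgrade_download_urls(state_text: str, bad_file_names: list[str]) -> str:
    """Change https:// to http:// only on lines that mention a listed file name.

    Assumption: every URL is on one line and includes the file name.
    Lines for other files are returned unchanged.
    """
    fixed_lines = []
    for line in state_text.splitlines(keepends=True):
        if "https://" in line and any(name in line for name in bad_file_names):
            line = line.replace("https://", "http://")
        fixed_lines.append(line)
    return "".join(fixed_lines)


def patch_client_state(path: Path, bad_file_names: list[str]) -> None:
    """Back up the state file, then rewrite it in place (BOINC must be stopped)."""
    # Keep a copy next to the original in case the edit goes wrong.
    shutil.copy2(path, path.with_name(path.name + ".bak"))
    # newline="" preserves the file's existing line endings.
    with open(path, encoding="utf-8", newline="") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(downgrade_download_urls(text, bad_file_names))
```

The list of file names would have to come from the stuck entries in the Transfers list; nothing here identifies bad workunits automatically.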
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 0
|
We seem to have passed the bad batch, but I've got a fair few to deal with - and more will appear as people try to abort them, or in five days' time when they time out again. |
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,876,970,595 RAC: 2
|
I had like 10 of them in a row just hanging out and the system failed over to the backup project. I just nuked em all and it downloaded new work
|
|
Joined: 12 Jul 17 Posts: 404 Credit: 17,412,649,587 RAC: 8,996
|
Not sure how to uniquely identify the defective WUs. But I had dozens sitting in my Transfer list this morning with Project backoffs of 3 to 5 hours. Aborting them also required a reboot to get more WUs to DL and many of those failed to DL. For me GG is going idle. |
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 0
|
I think it only needs a BOINC restart to clean them up, rather than a reboot - but I agree, it takes more than just a simple abort. The trouble with aborting / nuking them is that they'll turn up again, like a bad penny. They'll each hang around until they reach their eighth error - which could be another 30 days away. That's why I started this thread in the server area - I hoped I would attract advanced, skilled, users who knew how to edit files safely and efficiently, and help to get these blighters out of the way for good. |
|
Joined: 12 Jul 17 Posts: 404 Credit: 17,412,649,587 RAC: 8,996
|
Can't the entire batch of bad pennies be extirpated from the server side??? Kicking the can down the road is only going to make this worse. |
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 0
|
That's the other reason for posting in the server area - I was hoping Toni would notice and come up with a Cunning Plan. Take a look at host 508381. Reported the last working task at 26 May 2020 | 16:11:12, got three new ones at 26 May 2020 | 16:13:00. Takes less than two minutes to shut down, edit the file, and start again, once you've got the hang of it. Another one bites the dust. |
Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0
|
> Wait until GPUGrid is idle (all previous tasks reported)

I'm not sure that the task generated by a "repaired" task would come using https, so we may do it 10 times. (I've aborted the transfer of these tasks, then restarted the BOINC manager, as it thinks that "some downloads are stalled". It refers to the ones I've aborted.) |
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 0
|
> I'm not sure that the task generated by a "repaired" task would come using https, so we may do it 10 times.

It probably doesn't, but I'm less worried about that. It downloads, it computes, it returns results, and it validates - that's the object of the exercise. Check the examples I posted in the opening post - they won't be troubling us any more. But yours will be coming back, unless cancelled. |
|
Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 |
I tried to update the URLs in the database, so hopefully reissued results will have the correct URL. Thanks. |
|
Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 |
Also, for curiosity, the reason of the problem is that in some place the non-https gpugrid.org was used. Our certificate covered several domains but not that one. |
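The kind of mismatch described above, a download URL whose host name is not among the names a certificate covers, can be illustrated with a small check. This is a simplified sketch, not the matching a real TLS library performs, and the host names and certificate name list in the usage note are hypothetical placeholders, not the project's actual certificate contents.

```python
def name_matches(hostname: str, cert_name: str) -> bool:
    """Simplified match of a host name against one certificate name.

    Supports an exact match, or a leading "*." wildcard that stands for
    exactly one extra label (so "*.example.net" does not cover "example.net").
    """
    hostname = hostname.lower().rstrip(".")
    cert_name = cert_name.lower().rstrip(".")
    if cert_name.startswith("*."):
        suffix = cert_name[1:]  # e.g. ".example.net"
        head, sep, tail = hostname.partition(".")
        return bool(head) and sep == "." and "." + tail == suffix
    return hostname == cert_name


def covered(hostname: str, cert_names: list[str]) -> bool:
    """True if any name on the certificate matches the host name."""
    return any(name_matches(hostname, name) for name in cert_names)
```

For example, with a hypothetical certificate listing `["www.example.org", "*.example.net"]`, `covered("www.example.org", ...)` holds while `covered("example.org", ...)` does not, so an https download from the bare domain would fail certificate verification even though the server itself is reachable.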
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 0
|
> Also, for curiosity, the reason of the problem is that in some place the non-https gpugrid.org was used. Our certificate covered several domains but not that one.

Ah, thanks. If any do leak out (let's hope not), I'll try using that as a fix, and report back. |
©2026 Universitat Pompeu Fabra