Hi.
I can't upload my last Long Run result. The big 40.44 MB package uploads to 99% and then breaks off...
Sorry for my bad English... |
|
|
|
I too am having upload problems. Two short runs get a tiny amount of data sent, then stop.
The server is OUT of disk space. So until new capacity is added or data is purged from the server, we are stuck |
|
|
MrJo
Joined: 18 Apr 14 Posts: 43 Credit: 1,192,135,172 RAC: 0
|
The same for me: 5 machines, and none of them can upload.
____________
Regards, Josef
|
|
|
petebe
Joined: 19 Nov 12 Posts: 31 Credit: 1,549,545,867 RAC: 0
|
Same here - can't upload: "Server is out of disk space". |
|
|
|
same here :-(
|
|
|
|
Three shorts awaiting upload...... |
|
|
|
Same issue here. |
|
|
|
Same here. 3 longs. All the smaller files uploaded but the three 53MB files still waiting. |
|
|
|
Someone please make some space on the server.
2014. 08. 16. 22:21:01 GPUGRID Started upload of e2s356_e1s863f335-SANTI_marsalWTbound2-30-32-RND8784_0_9
2014. 08. 16. 22:21:01 GPUGRID Started upload of I4R44-SDOERR_BARNA5-12-100-RND6142_0_9
2014. 08. 16. 22:21:03 GPUGRID [error] Error reported by file upload server: Server is out of disk space
2014. 08. 16. 22:21:03 GPUGRID [error] Error reported by file upload server: Server is out of disk space
2014. 08. 16. 22:21:03 GPUGRID Temporarily failed upload of e2s356_e1s863f335-SANTI_marsalWTbound2-30-32-RND8784_0_9: transient upload error
2014. 08. 16. 22:21:03 GPUGRID Backing off 40 min 29 sec on upload of e2s356_e1s863f335-SANTI_marsalWTbound2-30-32-RND8784_0_9
2014. 08. 16. 22:21:03 GPUGRID Temporarily failed upload of I4R44-SDOERR_BARNA5-12-100-RND6142_0_9: transient upload error
2014. 08. 16. 22:21:03 GPUGRID Backing off 2 min 15 sec on upload of I4R44-SDOERR_BARNA5-12-100-RND6142_0_9
|
|
|
|
And of course I can't get new tasks.
8/16/2014 8:34:07 PM | GPUGRID | Sending scheduler request: Requested by user.
8/16/2014 8:34:07 PM | GPUGRID | Requesting new tasks for CPU and NVIDIA
8/16/2014 8:34:09 PM | GPUGRID | Scheduler request completed: got 0 new tasks
8/16/2014 8:34:09 PM | GPUGRID | No tasks sent
8/16/2014 8:34:09 PM | GPUGRID | No tasks are available for Long runs (8-12 hours on fastest card)
8/16/2014 8:34:09 PM | GPUGRID | This computer has reached a limit on tasks in progress
|
|
|
Mumak
Joined: 7 Dec 12 Posts: 92 Credit: 225,897,225 RAC: 0
|
The entire staff on vacation? |
|
|
|
Now 4 uploads waiting, and no new work. I am crunching E@H in the meantime. |
|
|
TJ
Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0
|
Upload problems for me too, but I got a new task, which is running now. The small files all uploaded; the bigger ones don't.
Two old errors from 2013 on my list can be removed from the server, Matt; that makes a tiny bit of room... :)
http://www.gpugrid.net/results.php?userid=29115&offset=0&show_names=1&state=5&appid=
____________
Greetings from TJ |
|
|
|
Hi TJ, yes, I see your computer ID 163838 got two uploads through at 21:19 & 21:56 UTC, hence the new task. I wonder how they squeezed in? Maybe the fix has started :) |
|
|
|
Yay -- I get to test BOINC's ability to have my 0-resource-share GPU-backup projects kick in again! (My GPU-backup projects: SETI, Einstein, SETIBETA, and Albert)
By the way, it's always good to have backup projects set up, so your GPUs can stay busy when things like this happen to your main project!
Until then, we just have to wait patiently. So, make sure your GPUs keep busy during the wait!
- Jacob |
|
|
|
Well, now this is affecting the <24hr return bonuses. Still no word from the project? :( |
|
|
|
Well, I was able to hit "retry" a couple of times on my oldest tasks, got them to upload, and got new work before the 24hr mark. |
|
|
|
I think that involves getting lucky enough to "shove your uploads" into the space that was freed up elsewhere. I too was able to do that to get some of mine to upload. But some are still failing with "[error] Error reported by file upload server: Server is out of disk space"
So... We just have to be patient. I hope somebody can fix it Monday, but MJH mentioned vacations, so, we'll see. Just be patient, and make sure your backup projects can keep the GPUs busy :)
|
|
|
|
9289 GPUGRID 17-08-2014 11:31:59 [error] Error reported by file upload server: can't write file /home/ps3grid/projects/PS3GRID/upload/288/I16R45-SDOERR_BARNA5-13-100-RND1176_1_9: No space left on server
|
|
|
tomba
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0
|
I installed Windows 8.1 this morning. First job was to install BOINC and connect to GPUGrid. Got a long and it's processing. But...
I have two GPUs, both seen by Device Manager, but I cannot get another WU after many "update" commands.
Is this due to the current problem or must I do something to get a second?
|
|
|
skgiven (Volunteer moderator, Volunteer tester)
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0
|
I emailed Matt and Gianni regarding the disk space problems...
____________
FAQs
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
|
I installed Windows 8.1 this morning. First job was to install BOINC and connect to GPUGrid. Got a long and it's processing. But...
I have two GPUs, both seen by Device Manager, but I cannot get another WU after many "update" commands.
Is this due to the current problem or must I do something to get a second?
This is a different problem.
Possibly you can resolve this by creating a line containing <use_all_gpus>1</use_all_gpus> under the <options> section of the cc_config.xml file on your host.
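For anyone unfamiliar with that file, a minimal cc_config.xml sketch containing only that option might look like this (assuming an otherwise default setup; the file goes in the BOINC data directory, and the client picks it up on restart):
<cc_config>
<options>
<!-- use every GPU in the host, not just the most capable one -->
<use_all_gpus>1</use_all_gpus>
</options>
</cc_config>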
See this post. |
|
|
tomba
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0
|
Many thanks Retvari. That hit the spot!! |
|
|
|
Just be patient, and make sure your backup projects can keep the GPUs busy :)
Hopefully Einstein has new enough applications and no longer dislikes some of my x4 PCIe 1.0 slots ^^ They are GPUGrid-only crunchers, so they are built as cheaply as I could get them to run here at full speed *hehe* But we will see; we can only wait, as mentioned. One machine seemed to upload enough files overnight to get itself two new workunits :)
Edit: no sir, Einstein still doesn't like low PCIe bandwidth. BOINCTasks shows two 570s, and one of them is 50% slower than the other, noooo ^^
____________
DSKAG Austria Research Team: http://www.research.dskag.at
|
|
|
Toni (Volunteer moderator, Project administrator, Project developer, Project tester, Project scientist)
Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0
|
freed up some space this morning. |
|
|
Aedazan
Joined: 8 Apr 11 Posts: 1 Credit: 84,538,560 RAC: 0
|
Not enough, I am still getting the error. :P
Feel free to free some more ;)
In light of this issue I think I will switch to long runs on my server from now on :D |
|
|
|
Still getting no luck on uploads, and server refuses to give me more tasks to run. |
|
|
|
freed up some space this morning.
Thanks, but we are still unable to upload, due to no space being available. |
|
|
opr
Joined: 24 May 11 Posts: 7 Credit: 93,272,937 RAC: 0
|
Hello!
So... some disk is full, right? I have the biggest part of a short-run WU waiting to upload here too. BOINC says the transfer is 100% but the status remains "sending". I guess it will send some day?
Regards, opr |
|
|
tomba
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0
|
And I see the server is fast running out of longs... |
|
|
|
BOINC says the transfer is 100% but the status remains "sending".
+1, retries every 4-5 hours fail. No space on disk; it reminds me of other poorly managed DC projects.
|
|
|
|
The entire staff on vacation?
I'd expect most of them to be away from the project on Sundays. There might not be any there who know how to free any disk space without creating problems that are even worse.
An additional hard disk for the server is fairly cheap, but would they need to buy something else (such as a hard drive cabinet) in order to allow the server to use it? They might mention the price they would have to pay in order to add a terabyte or so of hard drive space to the server, and then ask for donations to pay for this.
They should, however, tell the server to extend the deadlines of all workunits still in progress by approximately the length of the upload outages.
Downloading workunits MIGHT help free some disk space, so you should probably keep on downloading and running workunits as long as you have enough hard drive space on your computers to store the output files of the workunits - unless the problem on the server gets bad enough to prevent any more downloads.
|
|
|
MrJo
Joined: 18 Apr 14 Posts: 43 Credit: 1,192,135,172 RAC: 0
|
An additional hard disk for the server is fairly cheap..
It is hard for me to imagine that no one seems to have been monitoring the available disk space for the entire project. Hmm...
____________
Regards, Josef
|
|
|
Misfit
Joined: 23 May 08 Posts: 33 Credit: 610,551,356 RAC: 0
|
8/17/2014 1:53:41 PM | GPUGRID | [error] Error reported by file upload server: Server is out of disk space
Woohoo! |
|
|
|
I was able to upload my files now, and get new work. The issue might be resolved. |
|
|
|
Thank god =)
____________
DSKAG Austria Research Team: http://www.research.dskag.at
|
|
|
tomba
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0
|
I was able to upload my files now, and get new work. The issue might be resolved.
Not for me! This morning I found an upload that had stalled at 100%. I did a retry and it is now at 48%.
|
|
|
|
Ehm, OK, not all units uploaded; it's full again now O.o
Hopefully something happens today, Monday; the deadline is coming closer on some units, and there will be a massive resend, because many units will take more than 2 1/2 days to report by the time they have "repaired" it O.o
____________
DSKAG Austria Research Team: http://www.research.dskag.at
|
|
|
MrJo
Joined: 18 Apr 14 Posts: 43 Credit: 1,192,135,172 RAC: 0
|
The issue might be resolved.
Unfortunately, not for me.
____________
Regards, Josef
|
|
|
|
Still have the issue on my two hosts; although one host did manage to report 4 WUs overnight, it is now stalled again. |
|
|
tomba
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0
|
I was able to upload my files now, and get new work. The issue might be resolved.
Not for me! This morning I found an upload that had stalled at 100%. I did a retry and it is now at 48%.
The same WU stalled at 100% and is now back at 49%! |
|
|
|
The issue is not resolved at this time. |
|
|
TJ
Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0
|
Sometimes uploading works, sometimes not. I got new work, and some results uploaded normally; others only after 24 hours. So there seems to be progress, but it is not resolved yet.
____________
Greetings from TJ |
|
|
|
Yes, I think whenever the right workunits get through and finish a sequence, the server writes down the result and deletes the big files (for example, three workunits' worth), freeing up some space that 100 users then try to poke their uploads into, hopefully the right ones so that some other sequences get finished too ^^ I want my 500M milestone!! :P
____________
DSKAG Austria Research Team: http://www.research.dskag.at
|
|
|
tomba
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0
|
Well - it's Monday afternoon in Spain and still no reaction. Clearly no-one is minding the shop... |
|
|
|
I think in America it could still be early morning? I don't know where the staff is located.
____________
DSKAG Austria Research Team: http://www.research.dskag.at
|
|
|
mikey
Joined: 2 Jan 09 Posts: 298 Credit: 6,652,620,188 RAC: 15,180,127
|
Well - it's Monday afternoon in Spain and still no reaction. Clearly no-one is minding the shop...
Yeah, I am out of work and have units to send back; I set my PCs to No New Tasks and attached to a different project. |
|
|
|
August in Spain.....
Well - it's Monday afternoon in Spain and still no reaction. Clearly no-one is minding the shop...
Yeah, I am out of work and have units to send back; I set my PCs to No New Tasks and attached to a different project.
|
|
|
|
Well - it's Monday afternoon in Spain and still no reaction. Clearly no-one is minding the shop...
Yeah, I am out of work and have units to send back; I set my PCs to No New Tasks and attached to a different project.
There's no reason to set No New Tasks. Just let BOINC's transmit-retry back-off do its job until GPUGrid can complete the transfers, and make sure you are attached to backup GPU projects. |
|
|
Stefan (Project administrator, Project developer, Project tester, Project scientist)
Joined: 5 Mar 13 Posts: 348 Credit: 0 RAC: 0
|
We are actually in the shop :) The problem is figuring out where all the disk space has vanished, which seems to take some time, and then finding the best course of action. But we are definitely working on it. |
|
|
Mumak
Joined: 7 Dec 12 Posts: 92 Credit: 225,897,225 RAC: 0
|
Will we lose credit for not reporting long tasks within 24h? |
|
|
MrJo
Joined: 18 Apr 14 Posts: 43 Credit: 1,192,135,172 RAC: 0
|
The problem is figuring out where all the disk space has vanished
A mysterious disappearance of hard disk space. Most probably extraterrestrial forces were involved. Maybe we will find them with Seti@home ;-)
____________
Regards, Josef
|
|
|
Toni (Volunteer moderator, Project administrator, Project developer, Project tester, Project scientist)
Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0
|
Believe it or not, dedicated people are working on the overworked server regardless of the time of the day. When the disk is full it takes time even to delete files. |
|
|
|
Thank you Toni. Keep up the good work.
My backup GPU projects kicked in briefly, and everything worked as intended. Now as GPUGrid comes back, and can handle the transfers, BOINC will transition to using it again. All without user intervention.
Again, thank you for continuing to make GPUGrid run relatively smoothly! |
|
|
tomba
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0
|
Thank you Toni. Keep up the good work.
Hear, hear!
Jacob,
I took your advice and decided to connect to einstein@home. It's now downloading ~50 files at 5 megs each! And given my BOINC settings they look like they're CPU tasks.
What project do you recommend to guarantee GPU tasks?
|
|
|
MrJo
Joined: 18 Apr 14 Posts: 43 Credit: 1,192,135,172 RAC: 0
|
My backup GPU projects kicked in briefly, and everything worked as intended. Now as GPUGrid comes back, and can handle the transfers, BOINC will transition to using it again. All without user intervention.
Hi Jacob,
can you please tell me how you manage this? Sounds like:
If (GPUGrid fails) Then ... Else ... End If
I would be very grateful for your solution.
____________
Regards, Josef
|
|
|
|
My backup GPU projects kicked in briefly, and everything worked as intended. Now as GPUGrid comes back, and can handle the transfers, BOINC will transition to using it again. All without user intervention.
Hi Jacob,
can you please tell me how you manage this? Sounds like:
If (GPUGrid fails) Then ... Else ... End If
I would be very grateful for your solution.
......... Pick another project & set the resource share to 0 in BOINC Manager - that way work will only be requested when projects with a greater than 0 resource share are unavailable.
I have Einstein set to a resource share of 0 all the time - so when GPUGrid developed this problem, my system started requesting a single Einstein GPU task at a time - a resource share of 0 does not download more than one task at a time.
Make sure you edit your Einstein preferences to only use your nVidia GPU - or you may find CPU tasks appearing ........ |
|
|
tomba
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0
|
Don't know where the staff is located.
GPUGrid is based in Barcelona, Spain.
|
|
|
|
Believe it or not, dedicated people are working on the overworked server regardless of the time of the day. When the disk is full it takes time even to delete files.
Glad to hear the server is overworked; that's always good news for a project. Why not ask a big company to donate a new server in exchange for exposure? Maybe get someone to send begging letters to all the computer suppliers on a daily basis, like Andy Dufresne did in The Shawshank Redemption. Excellent film! |
|
|
Toni (Volunteer moderator, Project administrator, Project developer, Project tester, Project scientist)
Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0
|
An update: we have been freeing up space since Sunday night. The resolution is not instantaneous because:
1. deleting a huge number of files is slow
2. as soon as files are deleted, the freed-up space is used by newly uploaded WUs.
So, please don't worry if things take some time to get back to normal.
The increase in required space is partly due to new WU types (e.g. the CPU experiments).
Adding HD space has of course been considered, but with the current server it is sadly not possible, for hardware reasons. |
|
|
|
......... Pick another project & set the resource share to 0 in BOINC Manager - that way work will only be requested when projects with a greater than 0 resource share are unavailable.
I have Einstein set to a resource share of 0 all the time - so when GPUGrid developed this problem, my system started requesting a single Einstein GPU task at a time - a resource share of 0 does not download more than one task at a time.
Make sure you edit your Einstein preferences to only use your nVidia GPU - or you may find CPU tasks appearing ........
That's right -- to use a "backup project", make sure its Resource Share is 0. See, that's a special case -- "0 Resource Share" means "Only ask this project for work if all other projects couldn't give me any, and even then, just give me enough work to keep busy right now; don't build a cache of work."
So, I have Einstein/SETI/Albert/SETIBeta all set to 0 resource share as backup projects, and all allowed to download GPU tasks.
I have tons of CPU projects (so I don't have much chance of running out of CPU work), but I only have GPUGrid/POEM for GPU projects. And so, basically, I end up never asking those backup projects for work, except when GPUGrid has an issue. And when that happens, my setup works flawlessly to keep the GPUs busy with backup-project GPU tasks, all without user intervention. |
|
|
tomba
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0
|
Pick another project & set the resource share to 0 in BOINC Manager
OK. I have three GPUGrid uploads @ 100%, and one GPUGrid and one Einstein GPU task running.
I've tried but I cannot find how to set the resource share by project in BOINC Manager. Help!!
|
|
|
|
"Resource Share" is a project setting.
http://boinc.berkeley.edu/wiki/preferences
It can be set either at:
- the project's website (and then issue an Update command in BOINC), or
- if you use an account manager (like BOINCStats), you can set it there (and then issue an Update command in BOINC).
It can even be set "per venue/location", but I don't use venues/locations to do my settings. |
|
|
tomba
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0
|
"Resource Share" is a project setting.
Ah! It's not a BOINC Manager setting.
I have GPUGrid @ 100% and Einstein @ 1% (it does not like 0%), so I guess I'm OK for the eventual return to normal of GPUGrid!
|
|
|
|
...and Einstein @ 1% (it does not like 0%)
Can you please explain?
You should be able to put 0, click "Update preferences", and then it'll show "---" meaning "it is now set as a backup project". |
|
|
tomba
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0
|
...and Einstein @ 1% (it does not like 0%)
Can you please explain?
You should be able to put 0, click "Update preferences", and then it'll show "---" meaning "it is now set as a backup project".
The first time I tried it, it would not let me enter a "0". Just tried it again and it did!! And it now shows --- |
|
|
|
Yes, it's definitely running with 0%; it's working here too. I would prefer to run POEM as a backup, but they have two problems:
1.) Not enough workunits all the time for a 100% working backup project, because they need a backup project themselves :D
2.) It does not like dual-GPU machines; yes, I know, exclude blabla, but then point 1 applies again for one of the GPUs ^^
So Einstein is the only science-based backup you can use that really works as it should. I'm sure they are happy about the additional hardcore GPUs which are running on their project :D
____________
DSKAG Austria Research Team: http://www.research.dskag.at
|
|
|
BarryAZ
Joined: 16 Apr 09 Posts: 163 Credit: 920,927,307 RAC: 4,983
|
Hmm
I was hoping that with the beginning of the work week the disk space issues would be confirmed by a project message, and some information regarding a lasting solution shared with people here.
I am still hopeful we'll see something along these lines.
I too would revert to POEM GPUs -- but they are exceedingly rare at the moment as they work through application issues.
Instead, I'm running work from PrimeGrid or Collatz or Moo. My definite preference with Nvidia cards is GPUGrid, which has been running well over the past several months.
So here's hoping we get some message from the project letting folks know of a projected time frame for resolution.
____________
|
|
|
tomba
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0
|
So Einstein is the only science-based backup you can use that really works as it should.
I'm not so sure. I'm running my last GPUGrid and one Einstein. I set Einstein to "No CPU" and did an "update". So what's this enormous 500 MB download?:
There's much more below... |
|
|
|
Seems you're getting very many workunits ;) Did you set it to 0% and update your client before you accepted new work?
____________
DSKAG Austria Research Team: http://www.research.dskag.at
|
|
|
BarryAZ
Joined: 16 Apr 09 Posts: 163 Credit: 920,927,307 RAC: 4,983
|
OK -- I see I missed an earlier message from Toni describing the problem -- my apologies.
I am wondering if it is possible to connect an external drive not on the array and to use it to offload files for archive purposes and thus free up space. It is possible that the server hardware doesn't directly support an external USB or ESATA drive though -- in which case, it would require downing the server (never any fun) and installing a card to provide the interface. Just a thought since the message suggests this is going to be an ongoing issue.
Hmm
I was hoping that with the beginning of the work week the disk space issues would be confirmed by a project message, and some information regarding a lasting solution shared with people here.
I am still hopeful we'll see something along these lines.
I too would revert to POEM GPUs -- but they are exceedingly rare at the moment as they work through application issues.
Instead, I'm running work from PrimeGrid or Collatz or Moo. My definite preference with Nvidia cards is GPUGrid, which has been running well over the past several months.
So here's hoping we get some message from the project letting folks know of a projected time frame for resolution.
|
|
|
MrJo
Joined: 18 Apr 14 Posts: 43 Credit: 1,192,135,172 RAC: 0
|
@Jacob: Thanks, Einstein now works perfectly as a backup project.
But as with Tomba, there are a lot of WUs now. And the GPU load varies between 0 and 50%; a CPU application is running as well, although I have only GPU applications ticked in the settings. Einstein does not work the way I'm used to from GPUGrid. Hmm...
____________
Regards, Josef
|
|
|
tomba
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0
|
Seems you're getting very many workunits ;) Did you set it to 0% and update your client before you accepted new work?
Yep. I set it to 0% and did an update.
Those 50+ files are downloaded and they are NOT CPU WUs. Not sure what they are but I'm currently running my final GPUGrid WU and one Einstein.
|
|
|
|
@Jacob: Thanks, Einstein now works perfectly as a backup project.
But as with Tomba, there are a lot of WUs now. And the GPU load varies between 0 and 50%; a CPU application is running as well, although I have only GPU applications ticked in the settings. Einstein does not work the way I'm used to from GPUGrid. Hmm...
If you accidentally got tons of work units without having the Resource Share set to 0, but now have it correctly set to 0 Resource Share, you can feel free to abort any unstarted work units, without fear of wasting any work. You can abort started ones too, but it's preferable not to do that, since you'd be throwing away some work that has already been done.
Regarding GPU Usage, yeah, if it's an "OpenCL" task, those only use the GPU at certain times. So, you'll see usage fluctuate often, and maybe even be 0 most of the time, but at least you're trying to keep it busy. That's just the nature of their OpenCL app, which is indeed quite different than a GPUGrid CUDA app.
Just keep the GPUs busy, or at least try :) |
|
|
MrJo
Joined: 18 Apr 14 Posts: 43 Credit: 1,192,135,172 RAC: 0
|
you can feel free to abort any unstarted work units, without fear of wasting any work.
Done :)
Edit:
Got much better GPU utilization with the following app_config.xml in the Einstein directory:
<app_config>
<app>
<name>einsteinbinary_BRP4G</name>
<gpu_versions>
<!-- 0.5 GPUs per task: two BRP4G tasks share one GPU -->
<gpu_usage>0.5</gpu_usage>
<cpu_usage>0.2</cpu_usage>
</gpu_versions>
</app>
<app>
<name>hsgamma_FGRP3</name>
<gpu_versions>
<!-- likewise two FGRP3 tasks per GPU, each reserving 0.2 of a CPU core -->
<gpu_usage>0.50</gpu_usage>
<cpu_usage>0.2</cpu_usage>
</gpu_versions>
</app>
</app_config>
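(Note for anyone copying this: BOINC reads app_config.xml when the client starts, so after creating or editing the file, restart the client, or use the Manager's read-config-files command if your version has one.)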
____________
Regards, Josef
|
|
|
enels
Joined: 16 Sep 08 Posts: 9 Credit: 915,807,167 RAC: 0
|
Seems to be working. |
|
|
|
Edit:
Got much better GPU utilization with the following app_config.xml in the Einstein directory:
<app_config>
<app>
<name>einsteinbinary_BRP4G</name>
<gpu_versions>
<gpu_usage>0.5</gpu_usage>
<cpu_usage>0.2</cpu_usage>
</gpu_versions>
</app>
<app>
<name>hsgamma_FGRP3</name>
<gpu_versions>
<gpu_usage>0.50</gpu_usage>
<cpu_usage>0.2</cpu_usage>
</gpu_versions>
</app>
</app_config>
Generally, for most tasks, E@H GPU utilization is not as efficient as GPUGRID's. The app_config.xml above is very useful and, by changing the "cpu_usage" value, may help to address the "CPU suffocation" issues that E@H GPU tasks can cause on certain computers, if not video cards. If you want a simpler option, then follow these simple steps:
1) Open up your "Einstein@Home preferences" page;
2) Scroll down to the bottom of the "default/home/work/school" subsection where you see:
"GPU utilization factor of [Name of] apps DANGEROUS! Only touch this if you are absolutely sure of what you are doing! Wrong setting might even damage your computer! Use solely on your own risk! Min: -1.0 / Max: 1.0 / Default: 1.0, negative values will disable GPU tasks of this type"
3) The number "1" is the default setting of 1 GPU task per GPU card. If you want a higher amount, then change the setting as follows:
0.5 means 2 GPU tasks per GPU card;
0.25 means 4 GPU tasks per GPU card, etc.
Please remember to save the changes. Default CPU usage values for E@H remain unaffected by this method.
Hope this helps and good luck! |
|
|
TJ
Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0
|
So Einstein is the only science-based backup you can use that really works as it should. I'm sure they are happy about the additional hardcore GPUs which are running on their project :D
Or MilkyWay@home; multiple GPUs work great there too.
____________
Greetings from TJ |
|
|
MrJo
Joined: 18 Apr 14 Posts: 43 Credit: 1,192,135,172 RAC: 0
|
Everything runs fine again. Thanks to all the people involved in solving the problem!
____________
Regards, Josef
|
|
|
|
So Einstein is the only science-based backup you can use that really works as it should. I'm sure they are happy about the additional hardcore GPUs which are running on their project :D
Or MilkyWay@home; multiple GPUs work great there too.
I'm really not using Nvidia GPUs where DP performance is needed ;) I have a 7950 running MW 24/7; that must be enough until POEM has enough ATI workunits ^^
But yes, everything is running fine again now. Great! :)
____________
DSKAG Austria Research Team: http://www.research.dskag.at
|
|
|
nate
Joined: 6 Jun 11 Posts: 124 Credit: 2,928,865 RAC: 0
|
There is now plenty of space on the server, so no one should be having upload/download problems. Of course, please let us know if any of you still have problems. We still have work to do to ensure this doesn't happen again, but that's for us to worry about. |
|
|
|
Thanks for all the hard work you all did to sort it out!!! |
|
|
mikey
Joined: 2 Jan 09 Posts: 298 Credit: 6,652,620,188 RAC: 15,180,127
|
Well - it's Monday afternoon in Spain and still no reaction. Clearly no-one is minding the shop...
Yeah, I am out of work and have units to send back; I set my PCs to No New Tasks and attached to a different project.
There's no reason to set No New Tasks. Just let BOINC's transmit-retry back-off do its job until GPUGrid can complete the transfers, and make sure you are attached to backup GPU projects.
I did that to avoid problems with the project that IS sending me work, and its own deadlines. I don't like units hanging out in my cache just sitting there, so I keep it pretty short, about a day total. I have now finished the 'other' project's units and am back here again. I KNOW... I am a micro-manager, but it is who I am. |
|
|
Beyond
Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0
|
Uploads slowed to a crawl a few hours ago and are getting slower and slower. Other projects are uploading normally. Checked my connection with Ookla Speedtest and the speed is normal. WU uploads are even starting to stall and lose connection:
2422 GPUGRID 10-07-14 22:39 Started upload of 2x2162-NOELIA_5MG-2-3-RND6808_0_9
2423 GPUGRID 10-07-14 22:39 [error] Error reported by file upload server: [2x2162-NOELIA_5MG-2-3-RND6808_0_9] locked by file_upload_handler PID=22012
2424 GPUGRID 10-07-14 22:39 Temporarily failed upload of 2x2162-NOELIA_5MG-2-3-RND6808_0_9: transient upload error
2425 GPUGRID 10-07-14 22:39 Backing off 00:02:50 on upload of 2x2162-NOELIA_5MG-2-3-RND6808_0_9
2426 GPUGRID 10-07-14 22:39 Started upload of 2x2162-NOELIA_5MG-2-3-RND6808_0_9
2427 GPUGRID 10-07-14 22:40 Temporarily failed upload of 2x2162-NOELIA_5MG-2-3-RND6808_0_9: transient HTTP error
2428 GPUGRID 10-07-14 22:40 Backing off 00:06:50 on upload of 2x2162-NOELIA_5MG-2-3-RND6808_0_9
2429 10-07-14 22:40 Project communication failed: attempting access to reference site
2430 10-07-14 22:40 Internet access OK - project servers may be temporarily down.
|
|
|
Beyond
Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0
|
Things seem to be normal again. Thanks! |
|
|
|
Yo Guys/Gals. Is there a problem with the server again?
Uploads are maxing at 8 KBps and then stalling.
My log shows this error:
20/11/2014 6:40:32 PM | GPUGRID | Temporarily failed upload of 20mgx465-NOELIA_20MG2-25-50-RND2603_0_9: transient HTTP error
No problems with the internet from this machine, and WCG tasks have uploaded at 50-80 KBps while GPUGrid has been stuck.
I have 2 tasks struggling to upload at the moment.
HELP!!!
|
|
|
BarryAZ
Joined: 16 Apr 09 Posts: 163 Credit: 920,927,307 RAC: 4,983
|
Been like this for the past hour or so.
1/6/2015 9:45:10 PM | GPUGRID | Backing off 05:36:00 on upload of 1x81-GERARD_CXCL12_LIG10-4-5-RND5536_0_9
1/6/2015 9:46:42 PM | GPUGRID | Started upload of 1x81-GERARD_CXCL12_LIG10-4-5-RND5536_0_9
1/6/2015 9:46:44 PM | GPUGRID | [error] Error reported by file upload server: [1x81-GERARD_CXCL12_LIG10-4-5-RND5536_0_9] locked by file_upload_handler PID=17978
1/6/2015 9:46:44 PM | GPUGRID | Temporarily failed upload of 1x81-GERARD_CXCL12_LIG10-4-5-RND5536_0_9: transient upload error
1/6/2015 9:46:44 PM | GPUGRID | Backing off 04:24:32 on upload of 1x81-GERARD_CXCL12_LIG10-4-5-RND5536_0_9
1/6/2015 9:58:25 PM | GPUGRID | Started upload of 1x81-GERARD_CXCL12_LIG10-4-5-RND5536_0_9
1/6/2015 9:58:27 PM | GPUGRID | [error] Error reported by file upload server: [1x81-GERARD_CXCL12_LIG10-4-5-RND5536_0_9] locked by file_upload_handler PID=17978
1/6/2015 9:58:27 PM | GPUGRID | Temporarily failed upload of 1x81-GERARD_CXCL12_LIG10-4-5-RND5536_0_9: transient upload error
1/6/2015 9:58:27 PM | GPUGRID | Backing off 04:27:53 on upload of 1x81-GERARD_CXCL12_LIG10-4-5-RND5536_0_9
|
|
|
skgiven (Volunteer moderator, Volunteer tester)
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0
|
Sounds similar to this communication/server problem described by mundayweb,
(03. Error code -121 to -130 explained).
ERR_UPLOAD_TRANSIENT -127
First, an explanation of what transient means. Transient refers to a module that, once loaded into main memory, is expected to remain in memory for a short time.
This is a server error.
The file you are trying to upload is locked on the server. The file_upload_handler put an advisory lock on the file, to prevent other file upload handlers from writing to the file.
This can only be fixed by the project.
Extra messages in the 5.8 branch of BOINC.
can't open file - Advisory file locking is not guaranteed reliable when used with stream buffered IO.
can't lock file - File Upload Handler can't put an advisory lock on the file to prevent it being overwritten by other FUHs.
Maintenance underway: file uploads are temporarily disabled. - You can't upload as the server is down for maintenance.
http://boincfaq.mundayweb.com/index.php?viewCat=3
I can upload and report work, as can others.
Is this specific to one task and one system?
____________
FAQs
HOW TO:
- Opt out of Beta Tests
- Ask for Help
|
|
|
BarryAZ
Joined: 16 Apr 09 Posts: 163 Credit: 920,927,307 RAC: 4,983
|
It was and it cleared about 2 hours later.
|
|
|
BarryAZ
Joined: 16 Apr 09 Posts: 163 Credit: 920,927,307 RAC: 4,983
|
It appears this problem has resurfaced. It seems the AMD uploads are processing fine, but the Nvidia ones are not.
So far this is on two different systems:
2/7/2015 10:23:11 PM | GPUGRID | Started upload of e2s134_e1s85f79-GERARD_BENTRYP_GAAMPCGEN2-0-1-RND0868_0_9
2/7/2015 10:23:12 PM | GPUGRID | [error] Error reported by file upload server: [e2s134_e1s85f79-GERARD_BENTRYP_GAAMPCGEN2-0-1-RND0868_0_9] locked by file_upload_handler PID=27570
2/7/2015 10:23:12 PM | GPUGRID | Temporarily failed upload of e2s134_e1s85f79-GERARD_BENTRYP_GAAMPCGEN2-0-1-RND0868_0_9: transient upload error
2/7/2015 10:23:12 PM | GPUGRID | Backing off 00:21:54 on upload of e2s134_e1s85f79-GERARD_BENTRYP_GAAMPCGEN2-0-1-RND0868_0_9
When I've seen the transient upload error on other projects, it has seemed to be a case of a lack of disk space.
It was and it cleared about 2 hours later.
|
|
|
|
When I've seen the transient upload error on other projects, it has seemed to be a case of a lack of disk space.
When the problem is a lack of disk space, it says so in the event log message. We saw both of these at LHC recently:
Server is out of disk space
can't write file ... No space left on server
There are plenty of other failure modes which aren't reported explicitly, but the cause can be discovered by enabling <http_debug> logging for the duration.
In your case, the single file you wanted to upload was "locked by file_upload_handler PID=27570" - probably a comms glitch hadn't cleared properly. As you found, the process times out after a while, the lock is released, and the upload can proceed normally. It was an individual problem with that one single file, not an indication of server trouble in general. |
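For reference, the <http_debug> flag Richard mentions is a log flag in the client's cc_config.xml. A minimal sketch of enabling it (remember to remove it again afterwards, since the event log becomes very verbose, as in the dump further down this thread):
<cc_config>
<log_flags>
<!-- log the full HTTP request/response traffic for each transfer -->
<http_debug>1</http_debug>
</log_flags>
</cc_config>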
|
|
BarryAZ
Joined: 16 Apr 09 Posts: 163 Credit: 920,927,307 RAC: 4,983
|
Richard -- thanks for the reply -- the problem uploads on both computers did clear last night.
I did find it curious that I encountered it with two different computers at around the same time.
|
|
|
|
Upload issue.
3/18/2015 8:12:17 PM | GPUGRID | Started upload of e10s60_e8s18f109-NOELIA_27x3-1-2-RND6719_0_0
3/18/2015 8:12:19 PM | GPUGRID | Temporarily failed upload of e10s60_e8s18f109-NOELIA_27x3-1-2-RND6719_0_0: transient HTTP error
3/18/2015 8:12:19 PM | GPUGRID | Backing off 04:37:43 on upload of e10s60_e8s18f109-NOELIA_27x3-1-2-RND6719_0_0
3/18/2015 8:12:22 PM | | Project communication failed: attempting access to reference site
3/18/2015 8:12:23 PM | | Internet access OK - project servers may be temporarily down.
Just Me? |
|
|
BarryAZ
Joined: 16 Apr 09 Posts: 163 Credit: 920,927,307 RAC: 4,983
|
I got that on a couple of systems -- not sure why. To resolve it, I went to the newest version of BOINC; for some reason that fixed it on both systems.
|
|
|
|
Thanks Barry. That did work. Not sure why, but it was time to upgrade anyway. |
|
|
|
Had the same problem and when I upgraded from 7.0.28 to 7.4.36 everything worked again. |
|
|
|
When I was troubleshooting, I did a project update and noticed something in the log about an error in the certificate. Maybe related to the update to HTTPS? |
|
|
|
I'm also having upload problems. I am on the latest BOINC; the largest file gets to various percentages and then stops.
I rebooted the PC and managed to upload the large file, but still have problems with the small ones.
19/03/2015 07:12:04 | GPUGRID | Temporarily failed upload of e16s55_e1s204f48-NOELIA_1mgx1-1-4-RND3607_0_1: transient HTTP error |
|
|
JugNut
Joined: 27 Nov 11 Posts: 11 Credit: 1,021,749,297 RAC: 0
|
Yeah, same here; the log message I received says...
29741 GPUGRID 19/03/2015 6:30:15 PM Scheduler request failed: Peer certificate cannot be authenticated with given CA certificates
|
|
|
|
Are you running Windows or Linux, JugNut? Sounds like you have a possible problem with your ca-bundle.crt certificate. Are you running the latest version? |
|
|
JugNut
Joined: 27 Nov 11 Posts: 11 Credit: 1,021,749,297 RAC: 0
|
Yeah, that was it. I updated to the latest BOINC version & it's now updating fine :)
(win 7 x64) |
|
|
|
I cannot upload any tasks. Transient HTTP error... |
|
|
BarryAZ
Joined: 16 Apr 09 Posts: 163 Credit: 920,927,307 RAC: 4,983
|
I've encountered it now four separate times -- once I lost a completed work unit -- due to a configuration issue with BOINC I suspect.
Perhaps the change to HTTPS is causing a bit of confusion with the uploads.
Again, each time I moved to the current BOINC client, the uploads completed.
|
|
|
|
The only reason for the upload issues is an outdated CA certificate.
There's no reason to upgrade to a new BOINC version. Simply replace the ca-bundle.crt file in the BOINC program directory with the one below (extracted from BOINC 7.4.36), restart BOINC, and that's it.
download -> http://www.boincunited.org/ca-bundle.crt
____________
Join BOINC United now! |
|
|
|
My upload starts OK, but after some time (10, 20, 50% progress) it stops. I think it's not related to the cert issue. |
|
|
|
The only reason for the upload issues is an outdated CA certificate.
There's no reason to upgrade to a new BOINC version. Simply replace the ca-bundle.crt file in the BOINC program directory with the one below (extracted from BOINC 7.4.36), restart BOINC, and that's it.
download -> http://www.boincunited.org/ca-bundle.crt
Thanks - that did the trick for my BOINC v6.12.34 (although I pulled the v7.4.36 bundle from another machine on my own network). Didn't even need to restart the (service-mode) client.
But I think we're not out of the woods yet, because I'm getting multiple re-directs (which may account for the stop/restart problem just reported)
19-Mar-2015 18:08:23 [---] [http] [ID#5324] Info: About to connect() to www.gpugrid.org port 80 (#3)
19-Mar-2015 18:08:23 [---] [http] [ID#5324] Info: Trying 193.146.190.61...
19-Mar-2015 18:08:23 [---] [http] [ID#5324] Info: Connected to www.gpugrid.org (193.146.190.61) port 80 (#3)
19-Mar-2015 18:08:23 [---] [http] [ID#5324] Sent header to server: POST /PS3GRID_cgi/file_upload_handler HTTP/1.1
19-Mar-2015 18:08:23 [---] [http] [ID#5324] Sent header to server: User-Agent: BOINC client (windows_intelx86 6.12.34)
19-Mar-2015 18:08:23 [---] [http] [ID#5324] Sent header to server: Host: www.gpugrid.org
19-Mar-2015 18:08:23 [---] [http] [ID#5324] Sent header to server: Accept: */*
19-Mar-2015 18:08:23 [---] [http] [ID#5324] Sent header to server: Accept-Encoding: deflate, gzip
19-Mar-2015 18:08:23 [---] [http] [ID#5324] Sent header to server: Content-Type: application/x-www-form-urlencoded
19-Mar-2015 18:08:23 [---] [http] [ID#5324] Sent header to server: Content-Length: 318
19-Mar-2015 18:08:23 [---] [http] [ID#5324] Sent header to server:
19-Mar-2015 18:08:23 [---] [http] [ID#5324] Received header from server: HTTP/1.1 301 Moved Permanently
19-Mar-2015 18:08:23 [---] [http] [ID#5324] Received header from server: Date: Thu, 19 Mar 2015 18:05:33 GMT
19-Mar-2015 18:08:23 [---] [http] [ID#5324] Received header from server: Server: Apache/2.2.3 (CentOS)
19-Mar-2015 18:08:23 [---] [http] [ID#5324] Info: the ioctl callback returned 0
19-Mar-2015 18:08:23 [---] [http] [ID#5324] Received header from server: Location: https://www.gpugrid.net/PS3GRID_cgi/file_upload_handler
19-Mar-2015 18:08:23 [---] [http] [ID#5324] Received header from server: Cache-Control: max-age=3600
19-Mar-2015 18:08:23 [---] [http] [ID#5324] Received header from server: Expires: Thu, 19 Mar 2015 19:05:33 GMT
19-Mar-2015 18:08:23 [---] [http] [ID#5324] Received header from server: Content-Length: 343
19-Mar-2015 18:08:23 [---] [http] [ID#5324] Received header from server: Content-Type: text/html; charset=iso-8859-1
19-Mar-2015 18:08:23 [---] [http] [ID#5324] Received header from server:
19-Mar-2015 18:08:24 [---] [http] [ID#5324] Info: Ignoring the response-body
19-Mar-2015 18:08:24 [---] [http] [ID#5324] Info: Connection #3 to host www.gpugrid.org left intact
19-Mar-2015 18:08:24 [---] [http] [ID#5324] Info: Issue another request to this URL: 'https://www.gpugrid.net/PS3GRID_cgi/file_upload_handler'
19-Mar-2015 18:08:24 [---] [http] [ID#5324] Info: Re-using existing connection! (#1) with host www.gpugrid.net
19-Mar-2015 18:08:24 [---] [http] [ID#5324] Info: Connected to www.gpugrid.net (193.146.190.61) port 443 (#1)
19-Mar-2015 18:08:24 [---] [http] [ID#5324] Sent header to server: POST /PS3GRID_cgi/file_upload_handler HTTP/1.1
19-Mar-2015 18:08:24 [---] [http] [ID#5324] Sent header to server: User-Agent: BOINC client (windows_intelx86 6.12.34)
19-Mar-2015 18:08:24 [---] [http] [ID#5324] Sent header to server: Host: www.gpugrid.net
19-Mar-2015 18:08:24 [---] [http] [ID#5324] Sent header to server: Accept: */*
19-Mar-2015 18:08:24 [---] [http] [ID#5324] Sent header to server: Accept-Encoding: deflate, gzip
19-Mar-2015 18:08:24 [---] [http] [ID#5324] Sent header to server: Referer: http://www.gpugrid.org/PS3GRID_cgi/file_upload_handler
19-Mar-2015 18:08:24 [---] [http] [ID#5324] Sent header to server: Content-Type: application/x-www-form-urlencoded
19-Mar-2015 18:08:24 [---] [http] [ID#5324] Sent header to server: Content-Length: 318
19-Mar-2015 18:08:24 [---] [http] [ID#5324] Sent header to server:
19-Mar-2015 18:08:24 [---] [http] [ID#5324] Received header from server: HTTP/1.1 200 OK
19-Mar-2015 18:08:24 [---] [http] [ID#5324] Received header from server: Date: Thu, 19 Mar 2015 18:05:33 GMT
19-Mar-2015 18:08:24 [---] [http] [ID#5324] Received header from server: Server: Apache/2.2.3 (CentOS)
19-Mar-2015 18:08:24 [---] [http] [ID#5324] Received header from server: Cache-Control: max-age=300
19-Mar-2015 18:08:24 [---] [http] [ID#5324] Received header from server: Expires: Thu, 19 Mar 2015 18:10:33 GMT
19-Mar-2015 18:08:24 [---] [http] [ID#5324] Received header from server: Transfer-Encoding: chunked
19-Mar-2015 18:08:24 [---] [http] [ID#5324] Received header from server: Content-Type: text/plain; charset=UTF-8
19-Mar-2015 18:08:24 [---] [http] [ID#5324] Received header from server:
19-Mar-2015 18:08:24 [---] [http] [ID#5324] Info: Connection #1 to host www.gpugrid.net left intact
19-Mar-2015 18:08:25 [---] [http] [ID#5324] Info: Re-using existing connection! (#3) with host www.gpugrid.org
19-Mar-2015 18:08:25 [---] [http] [ID#5324] Info: Connected to www.gpugrid.org (193.146.190.61) port 80 (#3)
19-Mar-2015 18:08:25 [---] [http] [ID#5324] Sent header to server: POST /PS3GRID_cgi/file_upload_handler HTTP/1.1
19-Mar-2015 18:08:25 [---] [http] [ID#5324] Sent header to server: User-Agent: BOINC client (windows_intelx86 6.12.34)
19-Mar-2015 18:08:25 [---] [http] [ID#5324] Sent header to server: Host: www.gpugrid.org
19-Mar-2015 18:08:25 [---] [http] [ID#5324] Sent header to server: Accept: */*
19-Mar-2015 18:08:25 [---] [http] [ID#5324] Sent header to server: Accept-Encoding: deflate, gzip
19-Mar-2015 18:08:25 [---] [http] [ID#5324] Sent header to server: Content-Type: application/x-www-form-urlencoded
19-Mar-2015 18:08:25 [---] [http] [ID#5324] Sent header to server: Content-Length: 411278
19-Mar-2015 18:08:25 [---] [http] [ID#5324] Sent header to server: Expect: 100-continue
19-Mar-2015 18:08:25 [---] [http] [ID#5324] Sent header to server:
19-Mar-2015 18:08:26 [---] [http] [ID#5324] Info: Done waiting for 100-continue
19-Mar-2015 18:08:35 [---] [http] [ID#5324] Received header from server: HTTP/1.1 301 Moved Permanently
19-Mar-2015 18:08:35 [---] [http] [ID#5324] Received header from server: Date: Thu, 19 Mar 2015 18:05:35 GMT
19-Mar-2015 18:08:35 [---] [http] [ID#5324] Received header from server: Server: Apache/2.2.3 (CentOS)
19-Mar-2015 18:08:35 [---] [http] [ID#5324] Info: the ioctl callback returned 0
19-Mar-2015 18:08:35 [---] [http] [ID#5324] Received header from server: Location: https://www.gpugrid.net/PS3GRID_cgi/file_upload_handler
19-Mar-2015 18:08:35 [---] [http] [ID#5324] Received header from server: Cache-Control: max-age=3600
19-Mar-2015 18:08:35 [---] [http] [ID#5324] Received header from server: Expires: Thu, 19 Mar 2015 19:05:35 GMT
19-Mar-2015 18:08:35 [---] [http] [ID#5324] Received header from server: Content-Length: 343
19-Mar-2015 18:08:35 [---] [http] [ID#5324] Received header from server: Connection: close
19-Mar-2015 18:08:35 [---] [http] [ID#5324] Received header from server: Content-Type: text/html; charset=iso-8859-1
19-Mar-2015 18:08:35 [---] [http] [ID#5324] Received header from server:
19-Mar-2015 18:08:35 [---] [http] [ID#5324] Info: Closing connection #3
19-Mar-2015 18:08:35 [---] [http] [ID#5324] Info: Issue another request to this URL: 'https://www.gpugrid.net/PS3GRID_cgi/file_upload_handler'
19-Mar-2015 18:08:35 [---] [http] [ID#5324] Info: Re-using existing connection! (#1) with host www.gpugrid.net
19-Mar-2015 18:08:35 [---] [http] [ID#5324] Info: Connected to www.gpugrid.net (193.146.190.61) port 443 (#1)
19-Mar-2015 18:08:35 [---] [http] [ID#5324] Sent header to server: POST /PS3GRID_cgi/file_upload_handler HTTP/1.1
19-Mar-2015 18:08:35 [---] [http] [ID#5324] Sent header to server: User-Agent: BOINC client (windows_intelx86 6.12.34)
19-Mar-2015 18:08:35 [---] [http] [ID#5324] Sent header to server: Host: www.gpugrid.net
19-Mar-2015 18:08:35 [---] [http] [ID#5324] Sent header to server: Accept: */*
19-Mar-2015 18:08:35 [---] [http] [ID#5324] Sent header to server: Accept-Encoding: deflate, gzip
19-Mar-2015 18:08:35 [---] [http] [ID#5324] Sent header to server: Referer: http://www.gpugrid.org/PS3GRID_cgi/file_upload_handler
19-Mar-2015 18:08:35 [---] [http] [ID#5324] Sent header to server: Content-Type: application/x-www-form-urlencoded
19-Mar-2015 18:08:35 [---] [http] [ID#5324] Sent header to server: Content-Length: 411278
19-Mar-2015 18:08:35 [---] [http] [ID#5324] Sent header to server: Expect: 100-continue
19-Mar-2015 18:08:35 [---] [http] [ID#5324] Sent header to server:
19-Mar-2015 18:08:35 [---] [http] [ID#5324] Received header from server: HTTP/1.1 100 Continue
19-Mar-2015 18:08:45 [---] [http] [ID#5324] Received header from server: HTTP/1.1 200 OK
19-Mar-2015 18:08:45 [---] [http] [ID#5324] Received header from server: Date: Thu, 19 Mar 2015 18:05:44 GMT
19-Mar-2015 18:08:45 [---] [http] [ID#5324] Received header from server: Server: Apache/2.2.3 (CentOS)
19-Mar-2015 18:08:45 [---] [http] [ID#5324] Received header from server: Cache-Control: max-age=300
19-Mar-2015 18:08:45 [---] [http] [ID#5324] Received header from server: Expires: Thu, 19 Mar 2015 18:10:44 GMT
19-Mar-2015 18:08:45 [---] [http] [ID#5324] Received header from server: Transfer-Encoding: chunked
19-Mar-2015 18:08:45 [---] [http] [ID#5324] Received header from server: Content-Type: text/plain; charset=UTF-8
19-Mar-2015 18:08:45 [---] [http] [ID#5324] Received header from server:
19-Mar-2015 18:08:45 [---] [http] [ID#5324] Info: Connection #1 to host www.gpugrid.net left intact |
|
|
|
1) Clients that are configured to use the HTTP presentation of GPUGRID shouldn't now be being redirected to HTTPS (for a while last night they were, just to see what would break)
2) If you are having uploads fail, this is probably because the receiving process on the server will only run for so long. I'll need to know not the %age completion but the walltime that the upload had been processing for.
3) 301 redirects are not a problem - the client knows to follow them.
Matt |
|
|
|
5518 GPUGRID 19.3.2015 20:17:53 Started upload of 754-NOELIA_POT-8-13-RND0792_0_0
5519 GPUGRID 19.3.2015 20:18:00 Finished upload of 754-NOELIA_POT-8-13-RND0792_0_0
5520 GPUGRID 19.3.2015 20:18:06 Started upload of 754-NOELIA_POT-8-13-RND0792_0_1
5521 GPUGRID 19.3.2015 20:18:15 Finished upload of 754-NOELIA_POT-8-13-RND0792_0_1
5522 GPUGRID 19.3.2015 20:18:34 Started upload of 754-NOELIA_POT-8-13-RND0792_0_9
5523 19.3.2015 20:19:09 Project communication failed: attempting access to reference site
5524 GPUGRID 19.3.2015 20:19:09 Temporarily failed upload of 754-NOELIA_POT-8-13-RND0792_0_9: transient HTTP error
5525 GPUGRID 19.3.2015 20:19:09 Backing off 00:04:41 on upload of 754-NOELIA_POT-8-13-RND0792_0_9
5526 19.3.2015 20:19:10 Internet access OK - project servers may be temporarily down.
|
|
|
klepel
Joined: 23 Dec 09 Posts: 189 Credit: 4,736,673,079 RAC: 572,603
|
Hi, all of my 3 machines have had the "transient HTTP error" since yesterday evening. I have not changed anything!
BUT THERE IS NOW A PARTICULAR ERROR MESSAGE: "Scheduler request failed: Peer certificate cannot be authenticated with known CA certificates". DOES ANYBODY HAVE THE SAME PROBLEM?
The log of one machine looks like this:
"19/03/2015 10:07:10 a.m. | GPUGRID | Sending scheduler request: Requested by project.
19/03/2015 10:07:10 a.m. | GPUGRID | Requesting new tasks for NVIDIA
19/03/2015 10:07:12 a.m. | GPUGRID | Scheduler request failed: Peer certificate cannot be authenticated with known CA certificates
19/03/2015 10:07:16 a.m. | | Project communication failed: attempting access to reference site
19/03/2015 10:07:19 a.m. | | Internet access OK - project servers may be temporarily down.
19/03/2015 01:25:04 p.m. | GPUGRID | Started upload of e18s8_e11s4f108-GERARD_CXCL12_LIG22_CGENFF2-0-2-RND4205_0_9
19/03/2015 01:25:04 p.m. | GPUGRID | Started upload of e16s6_e1s31f254-GERARD_CXCL12_Ctl11_GAAMPGAFF1-1-2-RND7394_0_0
19/03/2015 01:25:06 p.m. | GPUGRID | Temporarily failed upload of e18s8_e11s4f108-GERARD_CXCL12_LIG22_CGENFF2-0-2-RND4205_0_9: transient HTTP error
19/03/2015 01:25:06 p.m. | GPUGRID | Backing off 3 hr 29 min 2 sec on upload of e18s8_e11s4f108-GERARD_CXCL12_LIG22_CGENFF2-0-2-RND4205_0_9
19/03/2015 01:25:06 p.m. | GPUGRID | Temporarily failed upload of e16s6_e1s31f254-GERARD_CXCL12_Ctl11_GAAMPGAFF1-1-2-RND7394_0_0: transient HTTP error
19/03/2015 01:25:06 p.m. | GPUGRID | Backing off 23 min 0 sec on upload of e16s6_e1s31f254-GERARD_CXCL12_Ctl11_GAAMPGAFF1-1-2-RND7394_0_0
19/03/2015 01:25:09 p.m. | | Project communication failed: attempting access to reference site
19/03/2015 01:25:11 p.m. | | Internet access OK - project servers may be temporarily down.
19/03/2015 01:37:28 p.m. | GPUGRID | Sending scheduler request: Requested by project.
19/03/2015 01:37:28 p.m. | GPUGRID | Requesting new tasks for NVIDIA
19/03/2015 01:37:30 p.m. | GPUGRID | Scheduler request failed: Peer certificate cannot be authenticated with known CA certificates
19/03/2015 01:37:34 p.m. | | Project communication failed: attempting access to reference site
19/03/2015 01:37:36 p.m. | | Internet access OK - project servers may be temporarily down.
19/03/2015 03:17:52 p.m. | GPUGRID | update requested by user
19/03/2015 03:17:52 p.m. | GPUGRID | Sending scheduler request: Requested by user.
19/03/2015 03:17:52 p.m. | GPUGRID | Requesting new tasks for NVIDIA
19/03/2015 03:17:54 p.m. | GPUGRID | Scheduler request failed: Peer certificate cannot be authenticated with known CA certificates
19/03/2015 03:17:57 p.m. | | Project communication failed: attempting access to reference site
19/03/2015 03:18:01 p.m. | | Internet access OK - project servers may be temporarily down.
19/03/2015 03:19:39 p.m. | GPUGRID | Fetching scheduler list
19/03/2015 03:19:44 p.m. | | Project communication failed: attempting access to reference site
19/03/2015 03:19:46 p.m. | | Internet access OK - project servers may be temporarily down."
Please advise! Thanks! |
|
|
Carl
Joined: 2 May 13 Posts: 8 Credit: 2,996,314,724 RAC: 3,151,552
|
Yes, I had the same message. I upgraded to the latest version of BOINC and now all is well. |
|
|
Blurf
Joined: 20 Dec 11 Posts: 9 Credit: 120,872,506 RAC: 0
|
Files uploaded but now I can't report them |
|
|
BarryAZ
Joined: 16 Apr 09 Posts: 163 Credit: 920,927,307 RAC: 4,983
|
Blurf, similar problem -- now on the FIFTH of my computers, and ONLY with GPUGrid.
On the other four, updating to the newest version of the BOINC client resolved the GPUGrid-specific problem.
On this fifth computer -- it was already on that version; a repair of that version didn't resolve the problem, an uninstall/reinstall didn't resolve the problem, and a replacement of the credentials file didn't resolve the problem.
Since it is specific to GPUGrid (hoping the GPUGrid folks realize that), that particular computer is now running Collatz for GPU processing instead.
Oh, and a reset of the project didn't solve the problem; deleting the project and reattaching didn't solve it either.
It seems it is definitely a project-specific issue.
Perhaps if it affects everyone it will move to the top of the 'what the heck is going on' list.
That's two completed work units I've lost in the past 24 hours -- and two that the folks at the project are not going to get, unfortunately.
I REALLY hope the folks at the project figure this one out. |
|
|
BarryAZ
Joined: 16 Apr 09 Posts: 163 Credit: 920,927,307 RAC: 4,983
|
By the way, I took a ca file from another workstation which was (or at least still is) working, and brought it over to the workstation which doesn't believe in GPUGrid any more.
No joy; it said the server was offline (it isn't).
Something clearly is weird with GPUGrid and its cert handling or addressing -- in some cases it gets resolved by updating BOINC to the newest version -- even though the cert file is identical.
The one 'I don't believe GPUGrid server is alive' workstation uses a GTX-650 -- so I guess it isn't that much of a loss.
Still I wonder what in blazes is going on here.
I'm glad others are reporting the problem as that likely moves it up the 'let's look at this' list.
|
|
|
klepel
Joined: 23 Dec 09 Posts: 189 Credit: 4,736,673,079 RAC: 572,603
|
The good news first: all my result files have now uploaded to the GPUGRID servers!
The bad news is: they are still listed on the web site as running/calculating (in Spanish, "En progreso"), because there is no communication with the reference site on two (02) of my three (03) computers:
“20/03/2015 09:39:10 a.m. | GPUGRID | update requested by user
20/03/2015 09:39:12 a.m. | GPUGRID | Fetching scheduler list
20/03/2015 09:39:17 a.m. | | Project communication failed: attempting access to reference site
20/03/2015 09:39:21 a.m. | | Internet access OK - project servers may be temporarily down.”
It is very frustrating: the computers have finished the WUs, which I will pay for on the electric bill, but in the end the WUs will not count because they are not reported as finished. |
|
|
BarryAZ
Joined: 16 Apr 09 Posts: 163 Credit: 920,927,307 RAC: 4,983
|
I am guessing here in the absence of data, judging just from the timing, that something done during the shift back and forth from HTTP to HTTPS resulted in 'confusion' regarding the certification of the site (the CA certs).
Even though the cert file is the same as it was a year ago, at least *some* workstation installations see the GPUGrid site as no longer certified (out-of-date certs).
*Sometimes* when this has happened, particularly if you have a somewhat older version of the BOINC client, updating the BOINC client resolves this.
If you have the current version of the BOINC client and the 'can't connect' problem shows up, it seems there is no resolution.
However, it might also be a timing thing.
Last evening, when I tried to add the GPUGrid project back in on what was my 5th workstation with the problem (and the only one that already had the current BOINC client), I was not successful -- trying each and every approach discussed in this thread.
This morning on this same workstation, I was successful in connecting and downloading. I won't know if reporting works until tomorrow evening, as the workstation is running a GTX 650, which has about a 34-hour process time for the large workunits.
It seems, though, as the problem has been reported by several users and is specific to GPUGrid, that there was some change at the project, which will perhaps clear over time. It isn't clear that the folks at the project have the resources to evaluate this. (If it is a persistent problem for an increasing user population, it might get higher priority, I suspect.)
|
|
|