Message boards : News : ATM
| Author | Message |
|---|---|
|
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 1
|
Just BACE tasks affected. I've now aborted 14 tasks and half were credited. Really strange, isn't it? What's the criterion for granting or not granting credit? I have now aborted two such BACE tasks which could not upload. For one I got credit; for the other one it said "upload failure" - real junk :-((( 15 hours on an RTX3070 just for NOTHING :-((( |
|
Joined: 13 Dec 17 Posts: 1419 Credit: 9,119,446,190 RAC: 731
|
I just aborted my oversized upload and got some credit for it. I missed the 50% bonus because I held onto it for too long. I wish I had known that aborting a task may give you credit anyway. Better to know now, since we are likely to keep running into this situation because they will never fix the Apache misconfiguration. |
|
Joined: 4 Mar 18 Posts: 53 Credit: 2,815,476,011 RAC: 0
|
When I attempt to Abort two of these WUs, nothing seems to happen at all. Both tasks still show "Uploading" and the Transfers page still shows them at 0%. I have tried "Retry Now" on the Transfers page several times (each) to no avail. Should I instead "Abort Transfer"? |
|
Joined: 13 Dec 17 Posts: 1419 Credit: 9,119,446,190 RAC: 731
|
Yes, you want to go to the Transfers page, select the tasks that are in upload backoff, and click "Abort Transfer". |
|
Joined: 13 Apr 15 Posts: 11 Credit: 3,003,712,606 RAC: 2,573
|
Way to go GPUgrid. Sure are on a roll lately with the complete utter mess ups. |
|
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 1
|
During last night, two of my machines downloaded and started "Bace" tasks (first letter in upper case, the following ones in lower case), in contrast to the failing ones before, which were called "BACE" (all upper case). Did anyone get the "Bace" tasks before, and did they succeed, or was their upload file also too large? The question for me now is: should I abort them? |
|
Joined: 28 Mar 09 Posts: 490 Credit: 11,731,645,728 RAC: 57
|
during last night, two of my machines downloaded and started "Bace" tasks (first letter in upper case, the following ones in lower case). This Bace unit was successful: https://www.gpugrid.net/workunit.php?wuid=27494001 If you run them, watch the elapsed time and progress rate, and you will know within a few minutes how they will go. |
|
Joined: 28 Mar 09 Posts: 490 Credit: 11,731,645,728 RAC: 57
|
during last night, two of my machines downloaded and started "Bace" tasks (first letter in upper case, the following ones in lower case). See examples: This one is running well and should finish OK in a few hours. I am running two units simultaneously: https://www.gpugrid.net/workunit.php?wuid=27494007 This one was running long and I aborted it: https://www.gpugrid.net/workunit.php?wuid=27492188 This one was probably good and I shouldn't have aborted it: https://www.gpugrid.net/workunit.php?wuid=27494031 In the BOINC Manager, highlight the unit and click the Properties button on the left, and its progress rate will tell you whether it's good or not. |
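The check described above comes down to simple arithmetic: divide the elapsed time by the fraction done to project the total runtime. Here is a minimal sketch, assuming you read both values by hand from the task's Properties dialog. The function name and the sample readings are hypothetical; this is not a BOINC API call and not real task data.

```python
def projected_total_hours(elapsed_hours: float, fraction_done: float) -> float:
    """Project total runtime from elapsed time and fraction done.

    fraction_done is a value in (0, 1], e.g. 0.25 for 25%.
    """
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_hours / fraction_done


# Hypothetical readings: 1 hour elapsed with 25% done.
print(projected_total_hours(1.0, 0.25))
```

A projection far beyond what similar units needed on the same card is the warning sign. The estimate is only as good as the reported progress, so it is of no use when the progress figure itself is wrong.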
|
Joined: 28 Feb 23 Posts: 35 Credit: 0 RAC: 0 |
OK, confirmed - it is still the Apache problem. That's weird. I'll take a look. But this shouldn't happen, so cancel the BACE (uppercase) runs. I'll look into how to do it from here. With the last implementation there shouldn't be such file-size issues. All bad/buggy jobs should be cancelled by now. |
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 351
|
This seems like the clearest web advice: https://www.keycdn.com/support/413-request-entity-too-large |
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 5,269
|
OK, confirmed - it is still the Apache problem. this is not something new from this project. this has been a recurring issue from time to time. seems to pop up about every year or so whenever the result files get so large for one reason or another. so don't feel bad if you are unable to find the setting to fix the file size limit. no one else from the project has been able to for the last several years. why are the result files so large? 500+MB. that's the root cause of the issue. do you need the data in these files? if not, why are they being created?
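To see on the client side which output files would run into a size limit, something like the following could be used. It is only a sketch: the 500 MiB threshold is an assumption taken from the "500+MB" figure mentioned here (the server's actual limit is not stated anywhere in this thread), and the function name is made up for illustration. Point it at whichever directory holds the task's output files.

```python
import tempfile
from pathlib import Path

# Hypothetical limit for illustration only; the real server limit is unknown.
ASSUMED_LIMIT_BYTES = 500 * 1024 * 1024


def oversized_files(directory: Path, limit_bytes: int = ASSUMED_LIMIT_BYTES) -> list[tuple[str, int]]:
    """Return (name, size) for regular files larger than limit_bytes, largest first."""
    found = [
        (p.name, p.stat().st_size)
        for p in directory.iterdir()
        if p.is_file() and p.stat().st_size > limit_bytes
    ]
    return sorted(found, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Demo on a throwaway directory with a tiny limit, so no large files are needed.
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        (folder / "small.out").write_bytes(b"x" * 10)
        (folder / "large.dcd").write_bytes(b"x" * 100)
        print(oversized_files(folder, limit_bytes=50))
```

This does not fix anything on the server; it only tells you early which results are likely to get stuck in upload.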
|
|
Joined: 4 May 17 Posts: 15 Credit: 17,444,875,743 RAC: 240
|
These files hold the results from the last run, i.e. samples 1 to 70, which are needed to start the next run with samples 71 to 140. They are the checkpoint data. |
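The sample numbering above can be written down as a small helper. This is just an illustration of the 1 to 70 / 71 to 140 pattern, assuming every run covers 70 samples; the function name is hypothetical and is not part of ATM.

```python
def sample_range(run_index: int, samples_per_run: int = 70) -> tuple[int, int]:
    """Return (first, last) sample numbers, 1-based and inclusive, for a 0-based run index."""
    if run_index < 0 or samples_per_run <= 0:
        raise ValueError("run_index must be >= 0 and samples_per_run must be > 0")
    first = run_index * samples_per_run + 1
    last = first + samples_per_run - 1
    return first, last


for run in range(3):
    print(run, sample_range(run))
```

Each run picks up at the sample right after the last one stored in the previous run's checkpoint.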
|
Joined: 28 Feb 23 Posts: 35 Credit: 0 RAC: 0 |
OK, confirmed - it is still the Apache problem. The heavy files are the .dcd files, which technically I don't need in order to perform the final free energy calculation, but they are necessary in case something weird is happening and we want to revisit those frames. .dcd files contain the information and coordinates of all the atoms in the system, but uncompressed. Since there are other trajectory formats, such as .xtc, that compress this data and result in much smaller files, we asked for this file format to be implemented in OpenMM. As far as I know this has been implemented in our lab, but it needs the final approval of the "higher-ups" to get it running, and then ATM has to be modified to process .xtc trajectory files. Nevertheless, this shouldn't have happened (it ran OK in other instances with BACE), and I apologise for this. |
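A rough feel for why uncompressed .dcd trajectories get so large: the format stores each coordinate as a 32-bit float, so every atom costs 12 bytes per saved frame. The sketch below does only that arithmetic. The atom and frame counts are made-up illustration values, not the actual size of the BACE systems, and the small per-frame headers are ignored.

```python
BYTES_PER_COORD = 4  # DCD stores coordinates as 32-bit floats
COORDS_PER_ATOM = 3  # x, y, z


def dcd_size_bytes(n_atoms: int, n_frames: int) -> int:
    """Approximate DCD payload size, ignoring headers, record markers and unit-cell data."""
    return n_atoms * COORDS_PER_ATOM * BYTES_PER_COORD * n_frames


# Hypothetical system: 100,000 atoms, 400 saved frames.
size = dcd_size_bytes(n_atoms=100_000, n_frames=400)
print(f"{size / 1024**2:.0f} MiB")  # prints "458 MiB"
```

How much smaller an .xtc file would be depends on the system and the chosen precision, so no ratio is estimated here.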
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 351
|
I resolved my issue by spoofing client_state.xml - I said the over-size file had completed uploading, and that the task was ready to report. The server accepted it as valid. |
|
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 1
|
"ValueError: Energy is NaN" is back quite often :-( |
|
Joined: 12 Jul 17 Posts: 404 Credit: 17,408,899,587 RAC: 0
|
Looks like the Progress bar has stopped working again, all quickly pegged at 100%. |
|
Joined: 13 Dec 17 Posts: 1419 Credit: 9,119,446,190 RAC: 731
|
I noticed that too. It was working for a while, then with today's work it's back to 100% almost immediately. |
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 351
|
Remind yourselves of my explanation at message 60315. |
|
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 1
|
I just noticed that on one of my machines 2 "BACE" tasks are being processed, plus a third one is waiting. Will the same problem happen as before - upload file too large? So should I delete these 3 BACE tasks? Edit: a BACE task is also running on another machine right now. |
|
Joined: 27 Jul 11 Posts: 138 Credit: 539,953,398 RAC: 0
|
Why? I am uploading. task 33495789 |
©2025 Universitat Pompeu Fabra