ATM

Message boards : News : ATM

Erich56

Joined: 1 Jan 15
Posts: 1166
Credit: 12,260,898,501
RAC: 1
Level
Trp
Message 60396 - Posted: 9 May 2023, 16:36:10 UTC - in response to Message 60395.  
Last modified: 9 May 2023, 16:45:18 UTC

Just BACE tasks affected. I've now aborted 14 tasks and half were credited.

Really strange, isn't it? What's the criterion for granting or not granting credit?

I have now aborted two such BACE tasks which could not upload.
For one I got credit; for the other it said "upload failure". Real junk :-((( 15 hours on an RTX 3070 just for NOTHING :-(((
ID: 60396 · Rating: 0
Keith Myers

Joined: 13 Dec 17
Posts: 1419
Credit: 9,119,446,190
RAC: 731
Level
Tyr
Message 60397 - Posted: 9 May 2023, 17:38:03 UTC
Last modified: 9 May 2023, 17:38:40 UTC

I just aborted my too-large upload and got some credit for it. I missed the 50% bonus because I held onto it for too long.

Wish I had known that aborting a task may give you credits anyway.

I know better now, since we are likely to keep running into this situation because they will never fix the Apache misconfiguration.
ID: 60397 · Rating: 0
kksplace

Joined: 4 Mar 18
Posts: 53
Credit: 2,815,476,011
RAC: 0
Level
Phe
Message 60398 - Posted: 9 May 2023, 19:57:58 UTC

When I attempt to Abort two of these WUs, nothing seems to happen at all. Both tasks still show "Uploading" and the Transfers page still shows them at 0%. I have tried "Retry Now" on the Transfers page several times (each) to no avail. Should I instead "Abort Transfer"?
ID: 60398 · Rating: 0
Keith Myers

Joined: 13 Dec 17
Posts: 1419
Credit: 9,119,446,190
RAC: 731
Level
Tyr
Message 60399 - Posted: 9 May 2023, 22:06:00 UTC - in response to Message 60398.  

Yes, you want to go to the Transfers page, select the tasks that are in upload backoff, and click "Abort Transfer".
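
For anyone who prefers the command line to the Manager, the same step can be sketched with boinccmd, which has a --file_transfer operation that takes a project URL, a file name, and either retry or abort. The small Python helper below only builds the command and prints it; it does not run anything. The project URL and the file name are assumptions for illustration, so take the real values from the output of boinccmd --get_file_transfers on your own host.

```python
import shlex
from typing import List


def abort_transfer_command(project_url: str, filename: str) -> List[str]:
    """Build (but do not run) a boinccmd call that aborts one file transfer."""
    return ["boinccmd", "--file_transfer", project_url, filename, "abort"]


# Both arguments are placeholders, not values taken from this thread.
cmd = abort_transfer_command("https://www.gpugrid.net/", "example_result_file_0")

# Print the command so it can be checked before being pasted into a shell.
print(shlex.join(cmd))
```

Swapping "abort" for "retry" would give the command-line equivalent of "Retry Now".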
ID: 60399 · Rating: 0
bluestang

Joined: 13 Apr 15
Posts: 11
Credit: 3,003,712,606
RAC: 2,573
Level
Arg
Message 60400 - Posted: 10 May 2023, 3:01:27 UTC

Way to go, GPUgrid. You sure are on a roll lately with the complete and utter mess-ups.
ID: 60400 · Rating: 0
Erich56

Joined: 1 Jan 15
Posts: 1166
Credit: 12,260,898,501
RAC: 1
Level
Trp
Message 60401 - Posted: 10 May 2023, 4:17:08 UTC
Last modified: 10 May 2023, 4:17:42 UTC

Last night, two of my machines downloaded and started "Bace" tasks (first letter in upper case, the following ones in lower case),
in contrast to the failing ones before, which were called "BACE" (all upper case).

Did anyone get "Bace" tasks before, and did they succeed, or was their upload file also too large?

The question for me now is: should I abort them?
ID: 60401 · Rating: 0
Bedrich Hajek

Joined: 28 Mar 09
Posts: 490
Credit: 11,731,645,728
RAC: 57
Level
Trp
Message 60402 - Posted: 10 May 2023, 6:23:06 UTC - in response to Message 60401.  


https://www.gpugrid.net/workunit.php?wuid=27494001

The Bace unit was successful; see the link above. If you run them, watch the elapsed time and progress rate, and you will know within a few minutes how they will go.



ID: 60402 · Rating: 0
Bedrich Hajek

Joined: 28 Mar 09
Posts: 490
Credit: 11,731,645,728
RAC: 57
Level
Trp
Message 60403 - Posted: 10 May 2023, 6:53:49 UTC - in response to Message 60402.  

See examples:

This one is running well and should finish ok in a few hours. I am running two units simultaneously:

https://www.gpugrid.net/workunit.php?wuid=27494007


This one was running long and I aborted:

https://www.gpugrid.net/workunit.php?wuid=27492188


This one was probably good and I shouldn't have aborted it:

https://www.gpugrid.net/workunit.php?wuid=27494031


In the BOINC Manager, highlight the unit and click the Properties button on the left; its progress rate will tell you whether it's good or not.
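
As a rough sketch of that check: if progress is assumed to be roughly linear, the total run time can be projected from the elapsed time and the fraction done shown in the task properties. The readings below are hypothetical, and the estimate is only as trustworthy as the progress figure the task reports.

```python
def estimate_total_seconds(elapsed_seconds: float, fraction_done: float) -> float:
    """Project the total run time, assuming progress grows linearly with time."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be greater than 0 and at most 1")
    return elapsed_seconds / fraction_done


# Hypothetical readings: 10 minutes elapsed, 2% done.
total = estimate_total_seconds(600, 0.02)
print(f"Projected total run time: {total / 3600:.1f} h")
```

A projection far beyond what comparable units needed would be the hint that a unit is one of the long-running ones.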


ID: 60403 · Rating: 0
Quico
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist

Joined: 28 Feb 23
Posts: 35
Credit: 0
RAC: 0
Message 60404 - Posted: 10 May 2023, 7:53:02 UTC - in response to Message 60392.  
Last modified: 10 May 2023, 8:13:36 UTC

OK, confirmed - it is still the Apache problem.

Tue 09 May 2023 15:06:15 BST | GPUGRID | [http] [ID#15383] Received header from server: HTTP/1.1 413 Request Entity Too Large
Tue 09 May 2023 15:06:15 BST | GPUGRID | [http] [ID#15383] Received header from server: Date: Tue, 09 May 2023 14:06:15 GMT
Tue 09 May 2023 15:06:15 BST | GPUGRID | [http] [ID#15383] Received header from server: Server: Apache/2.4.6 (CentOS) OpenSSL/1.0.2k-fips mod_auth_gssapi/1.5.1 mod_auth_kerb/5.4 mod_fcgid/2.3.9 PHP/5.4.16 mod_wsgi/3.4 Python/2.7.5

File (the larger of two) is 754.1 MB (Linux decimal), 719.15 MB (Boinc binary).
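
The two observations above can be reproduced with a short sketch: one function pulls the HTTP status out of a "Received header from server" line from the event log, and another converts decimal megabytes to the binary megabytes BOINC displays. The function names are made up for illustration, and the regular expression only targets the log format quoted here.

```python
import re
from typing import Optional, Tuple

# Matches the status line as it appears in the quoted BOINC http debug output.
STATUS_RE = re.compile(r"Received header from server: HTTP/\d\.\d (\d{3}) (.*)$")


def parse_status(log_line: str) -> Optional[Tuple[int, str]]:
    """Return (status code, reason phrase), or None if the line has no status."""
    match = STATUS_RE.search(log_line)
    if match is None:
        return None
    return int(match.group(1)), match.group(2).strip()


def decimal_mb_to_binary_mb(megabytes: float) -> float:
    """Convert MB (10**6 bytes) to binary MB (2**20 bytes)."""
    return megabytes * 1_000_000 / (1024 * 1024)


line = ("Tue 09 May 2023 15:06:15 BST | GPUGRID | [http] [ID#15383] "
        "Received header from server: HTTP/1.1 413 Request Entity Too Large")
print(parse_status(line))
print(f"{decimal_mb_to_binary_mb(754.1):.2f}")
```

The conversion gives about 719.17 for 754.1 MB, which agrees with the reported 719.15 once the rounding of the decimal figure is taken into account.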

At this end, we have two choices:

1) Abort the data transfer, as Ian suggests.
2) Wait 90 days for somebody to find the key to the server closet.

Quico?


That's weird. I'll take a look.
This shouldn't happen, so cancel the BACE (uppercase) runs. I'll look into how to do it from here.
With the latest implementation there shouldn't be any such file-size issues.

All bad/buggy jobs should be cancelled by now.
ID: 60404 · Rating: 0
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 351
Level
Trp
Message 60405 - Posted: 10 May 2023, 8:01:15 UTC - in response to Message 60404.  

This seems like the clearest web advice:

https://www.keycdn.com/support/413-request-entity-too-large
ID: 60405 · Rating: 0
Ian&Steve C.

Joined: 21 Feb 20
Posts: 1116
Credit: 40,839,470,595
RAC: 5,269
Level
Trp
Message 60406 - Posted: 10 May 2023, 12:26:08 UTC - in response to Message 60404.  

this is not something new from this project. this has been a recurring issue from time to time. seems to pop up about every year or so whenever the result files get so large for one reason or another. so don't feel bad if you are unable to find the setting to fix the file size limit. no one else from the project has been able to for the last several years.

why are the result files so large? 500+MB. that's the root cause of the issue. do you need the data in these files? if not, why are they being created?

ID: 60406 · Rating: 0
bibi

Joined: 4 May 17
Posts: 15
Credit: 17,444,875,743
RAC: 240
Level
Trp
Message 60407 - Posted: 10 May 2023, 13:09:05 UTC - in response to Message 60406.  

These files hold the results from the last run, i.e. samples 1 to 70, which are needed to start the next run with samples 71 to 140. They are the checkpoint data.
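
A minimal sketch of that chaining, assuming every run covers a fixed block of 70 samples as in the example above (the function name and the fixed block size are illustrative, not taken from the ATM code):

```python
from typing import Tuple


def sample_range(run_index: int, samples_per_run: int = 70) -> Tuple[int, int]:
    """Return the first and last sample number covered by a 1-based run index."""
    if run_index < 1:
        raise ValueError("run_index starts at 1")
    first = (run_index - 1) * samples_per_run + 1
    last = run_index * samples_per_run
    return first, last


for run in (1, 2):
    first, last = sample_range(run)
    print(f"run {run}: samples {first} to {last}")
```

Each run after the first can only start from the uploaded output of the one before it, which is why a failed upload blocks the rest of the chain.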
ID: 60407 · Rating: 0
Quico
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist

Joined: 28 Feb 23
Posts: 35
Credit: 0
RAC: 0
Message 60408 - Posted: 10 May 2023, 14:46:26 UTC - in response to Message 60406.  

The heavy files are the .dcd files. Technically I don't need them to perform the final free energy calculation, but they are necessary in case something weird is happening and we want to revisit those frames. .dcd files contain the information and coordinates of all the system's atoms, uncompressed. Since other trajectory formats, such as .xtc, compress this data and produce much smaller files, we asked for that file format to be implemented in OpenMM. As far as I know this has been implemented in our lab, but it needs the final approval of the "higher-ups" before we can get it running and then modify ATM to process .xtc trajectory files.
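
To give a feel for why uncompressed trajectories get so large, here is a back-of-the-envelope sketch. It assumes DCD stores each coordinate as a 32-bit float (three per atom per frame) and ignores the small header and per-frame record overhead. The atom and frame counts are hypothetical and are not the actual sizes of the ATM systems.

```python
BYTES_PER_COORDINATE = 4  # assumption: one 32-bit float per x, y or z value


def dcd_coordinate_bytes(n_atoms: int, n_frames: int) -> int:
    """Approximate size of the coordinate data in an uncompressed DCD file."""
    return n_atoms * n_frames * 3 * BYTES_PER_COORDINATE


# Hypothetical system: 50,000 atoms saved for 1,000 frames.
size = dcd_coordinate_bytes(50_000, 1_000)
print(f"{size / 1_000_000:.0f} MB")
```

The size grows linearly with both the number of atoms and the number of saved frames, so a compressed format or fewer saved frames are the two obvious levers.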

Nevertheless, this shouldn't have happened (it ran OK in other instances with BACE), and I apologise for it.
ID: 60408 · Rating: 0
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 351
Level
Trp
Message 60409 - Posted: 11 May 2023, 9:40:00 UTC - in response to Message 60408.  

I resolved my issue by spoofing client_state.xml - I said the over-size file had completed uploading, and that the task was ready to report. The server accepted it as valid.
ID: 60409 · Rating: 0
Erich56

Joined: 1 Jan 15
Posts: 1166
Credit: 12,260,898,501
RAC: 1
Level
Trp
Message 60410 - Posted: 11 May 2023, 12:42:47 UTC

"ValueError: Energy is NaN" is back quite often :-(

ID: 60410 · Rating: 0
Aurum

Joined: 12 Jul 17
Posts: 404
Credit: 17,408,899,587
RAC: 0
Level
Trp
Message 60411 - Posted: 12 May 2023, 2:36:43 UTC

Looks like the progress bar has stopped working again; all tasks are quickly pegged at 100%.
ID: 60411 · Rating: 0
Keith Myers

Joined: 13 Dec 17
Posts: 1419
Credit: 9,119,446,190
RAC: 731
Level
Tyr
Message 60412 - Posted: 12 May 2023, 6:39:22 UTC - in response to Message 60411.  

I noticed that too. It was working for a while, but with today's work it's back to 100% almost immediately.
ID: 60412 · Rating: 0
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 351
Level
Trp
Message 60413 - Posted: 12 May 2023, 7:19:53 UTC

Remind yourselves of my explanation at message 60315.
ID: 60413 · Rating: 0
Erich56

Joined: 1 Jan 15
Posts: 1166
Credit: 12,260,898,501
RAC: 1
Level
Trp
Message 60414 - Posted: 12 May 2023, 11:50:58 UTC
Last modified: 12 May 2023, 11:52:13 UTC

I just noticed that on one of my machines two "BACE" tasks are being processed, and a third one is waiting.

Will the same problem happen as before (upload file too large)? Should I delete these three BACE tasks?

Edit: a BACE task is also running on another machine right now.
ID: 60414 · Rating: 0
KAMasud

Joined: 27 Jul 11
Posts: 138
Credit: 539,953,398
RAC: 0
Level
Lys
Message 60415 - Posted: 12 May 2023, 14:16:06 UTC
Last modified: 12 May 2023, 14:16:46 UTC

Why? I am uploading.
task 33495789
ID: 60415 · Rating: 0

©2025 Universitat Pompeu Fabra