ACEMD 4
| Author | Message |
|---|---|
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 428
|
It's reached 50% in 2 hours 7 minutes, so this task is heading for about four and a quarter hours on my GTX 1660 Super. That would have failed without manual intervention. @ Raimondas - you need to consider both speed and size when making adjustments. |
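The runtime estimate in the post above is a straight-line extrapolation from progress so far. A minimal sketch, assuming the task progresses at a constant rate (the function name is hypothetical, not part of BOINC):

```python
def projected_runtime_seconds(elapsed_seconds, fraction_done):
    """Linearly extrapolate total runtime from the progress made so far."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_seconds / fraction_done


elapsed = 2 * 3600 + 7 * 60  # 2 hours 7 minutes, as reported in the post
total = projected_runtime_seconds(elapsed, 0.50)
hours, remainder = divmod(int(total), 3600)
print(f"{hours} h {remainder // 60} min")  # 4 h 14 min
```

Comparing this projection with the elapsed time limit shown in the event log indicates whether a task is likely to be aborted before it finishes.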
|
Joined: 2 Jul 16 Posts: 338 Credit: 7,987,341,558 RAC: 259
|
File size too big on upload for both users. https://www.gpugrid.net/result.php?resultid=32882663 Just take all the limits and x100000000. OK, not that much, but it's sad that tasks error out on artificial limits, especially on upload. |
|
Joined: 9 May 13 Posts: 171 Credit: 4,594,296,466 RAC: 171
|
Thu 14 Apr 2022 09:27:26 AM CDT | GPUGRID | Aborting task T1_GAFF2_frag_00-RAIMIS_NNPMM-0-1-RND6653_1: exceeded elapsed time limit 7231.33 (1000000.00G/138.29G)
Thu 14 Apr 2022 09:27:28 AM CDT | GPUGRID | Computation for task T1_GAFF2_frag_00-RAIMIS_NNPMM-0-1-RND6653_1 finished
Thu 14 Apr 2022 09:27:28 AM CDT | GPUGRID | Output file T1_GAFF2_frag_00-RAIMIS_NNPMM-0-1-RND6653_1_4 for task T1_GAFF2_frag_00-RAIMIS_NNPMM-0-1-RND6653_1 exceeds size limit.
Thu 14 Apr 2022 09:27:28 AM CDT | GPUGRID | File size: 137187308.000000 bytes. Limit: 10000000.000000 bytes |
|
Joined: 13 Dec 17 Posts: 1419 Credit: 9,119,446,190 RAC: 891
|
Still getting elapsed time limit errors. Looks like the estimated GFLOPS was changed but still not enough.
exceeded elapsed time limit 2675.08 (1000000.00G/373.82G)
exceeded elapsed time limit 1758.43 (1000000.00G/568.69G)
exceeded elapsed time limit 1803.39 (1000000.00G/554.51G) |
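The two figures in parentheses in these messages appear to be the task's floating-point operations bound and the estimated speed of the app on that host, and the limit is their ratio. A minimal sketch that reproduces the logged limits under that reading (the function name is hypothetical, and the interpretation of the two figures is an assumption based on the log format):

```python
GIGA = 1e9


def elapsed_time_limit(fpops_bound, flops_estimate):
    """Seconds allowed before the task is aborted: operations bound / speed."""
    return fpops_bound / flops_estimate


# (logged limit in seconds, logged speed in GFLOPS), taken from the post above
logged = [
    (2675.08, 373.82),
    (1758.43, 568.69),
    (1803.39, 554.51),
]

for limit_s, gflops in logged:
    computed = elapsed_time_limit(1_000_000 * GIGA, gflops * GIGA)
    print(f"{gflops:7.2f} GFLOPS -> {computed:8.2f} s (logged {limit_s:.2f} s)")
```

Because the limit shrinks as the speed estimate grows, a faster card gets less wall-clock time for the same bound. Raising the bound is what lengthens the allowance.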
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 6,423
|
bombed out after 25mins and 20% completion on a 3080Ti
exceeded elapsed time limit 1538.81
|
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 428
|
Just got T1_GAFF2_frag_00-RAIMIS_NNPMM-0-1-RND6653_5. Mine don't (usually) start immediately, so I could get to it before it started. Added x1000 to the fpops measures, x100 to the _4 upload size (thanks captainjack). It's running now, should finish overnight. |
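The manual patch described above can be sketched as a text transformation. This is only an illustration: the function name is hypothetical, the sample fragment is a made-up stand-in rather than a copy of a real client_state.xml, and the enclosing `<file>` tag is an assumption that may differ between client versions. It works on a string only and never touches the client's files.

```python
import re


def scale_upload_limit(state_text, factor, suffix="_4", block_tag="file"):
    """Multiply <max_nbytes> by `factor` in each block whose <name> ends with `suffix`.

    `block_tag` is an assumed name for the element that wraps one file entry.
    """
    block_re = re.compile(rf"<{block_tag}>.*?</{block_tag}>", re.DOTALL)
    name_re = re.compile(rf"<name>[^<]*{re.escape(suffix)}</name>")
    limit_re = re.compile(r"(<max_nbytes>)([0-9.]+)(</max_nbytes>)")

    def bump(match):
        new_limit = float(match.group(2)) * factor
        return f"{match.group(1)}{new_limit:f}{match.group(3)}"

    def fix_block(match):
        block = match.group(0)
        # Only touch blocks that describe the oversized output file.
        if name_re.search(block):
            block = limit_re.sub(bump, block)
        return block

    return block_re.sub(fix_block, state_text)


# Illustrative fragment only (hypothetical names and layout).
SAMPLE = """
<file>
    <name>example_task_4</name>
    <max_nbytes>10000000.000000</max_nbytes>
</file>
<file>
    <name>example_task_0</name>
    <max_nbytes>10000000.000000</max_nbytes>
</file>
"""

patched = scale_upload_limit(SAMPLE, 100)
print(patched)
```

As later posts in the thread note, an edit like this is made with BOINC stopped, and restarting the client can itself cause problems on hosts with mixed GPUs.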
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 6,423
|
got another new task. flops bound had been increased by 10x from previous values (based on previous comments of what the value used to be). however, the max_nbytes of the _4 output file has not been increased at all, so I expect another computation error if the file size ends up too big. computation has already begun, and it's in a system with mixed GPUs, so stopping BOINC to edit the size limit and restarting is not a great option: it risks restarting on another GPU and insta-error.
as far as run behavior, on an RTX 3080Ti:
~80% GPU core use
~50% GPU memory bus use
~1% PCIe bus use
~2300MB VRAM used
~265W (with a 300W limit set)
not really taking full advantage of the GPU resources.
|
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 6,423
|
called it. ran for 2hrs18mins and errored right after completion.
T2_NNPMM_frag_01-RAIMIS_NNPMM-0-2-RND4664_0
upload failure: <file_xfer_error>
from the event log:
Tue 19 Apr 2022 02:13:28 PM EDT | GPUGRID | File size: 20443520.000000 bytes. Limit: 10000000.000000 bytes
it's important for the devs to see that these error out rather than fiddling with things on my end to ensure I get credit. otherwise they may be under the impression that it's not a problem.
|
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 428
|
"it's important for the devs to see that these error out rather than fiddling with things on my end to ensure I get credit. otherwise they may be under the impression that it's not a problem."
Agreed. But in that case, it's also helpful to post the local information from the event log that the devs can't easily see - like captainjack's note:
File size: 137187308.000000 bytes. Limit: 10000000.000000 bytes
That gives them the magnitude of the required correction, as well as its location. My (single) patched run did indeed complete successfully after surgery, so the file size should be the last correction needed. |
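The "magnitude of the required correction" can be read straight off an event log line like the one quoted above. A minimal sketch, assuming the line keeps the "File size: ... bytes. Limit: ... bytes" wording seen in this thread (the function name is hypothetical):

```python
import re

# Matches the size and limit figures in the event log wording quoted in this thread.
LOG_PATTERN = re.compile(r"File size: ([0-9.]+) bytes\. Limit: ([0-9.]+) bytes")


def required_multiplier(log_line):
    """Return how many times larger the file is than its limit, or None if no match."""
    match = LOG_PATTERN.search(log_line)
    if match is None:
        return None
    size, limit = float(match.group(1)), float(match.group(2))
    return size / limit


line = "File size: 137187308.000000 bytes. Limit: 10000000.000000 bytes"
print(f"{required_multiplier(line):.1f}x")  # 13.7x
```

Any new limit would need to be at least this multiple of the old one, with some headroom because output sizes vary between tasks.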
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 6,423
|
i just edited with that info from the log.
|
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 6,423
|
Two more, half the run time, and 10x the file size for _4 output file. both run on 3080Tis again.
odd that these ones showed different run behavior. more similar to how the ACEMD3 app works.
~96% GPU core use
~1-2% GPU memory bus use
~12% PCIe bus use
T2_GAFF2_frag_02-RAIMIS_NNPMM-0-1-RND3120_1
Tue 19 Apr 2022 07:54:14 PM EDT | GPUGRID | File size: 213191804.000000 bytes. Limit: 10000000.000000 bytes
T2_GAFF2_frag_00-RAIMIS_NNPMM-0-1-RND5192_2
Tue 19 Apr 2022 07:53:26 PM EDT | GPUGRID | File size: 213539276.000000 bytes. Limit: 10000000.000000 bytes
|
|
Joined: 13 Dec 17 Posts: 1419 Credit: 9,119,446,190 RAC: 891
|
Same thing here. Ran 2 1/2 hours to completion and then failed on too large an upload file.
upload failure: <file_xfer_error>
<file_name>T2_GAFF2_frag_02-RAIMIS_NNPMM-0-1-RND3120_2_4</file_name>
<error_code>-131 (file size too big)</error_code>
Waste of resources. Wish the app admin dev would fix this issue. Like right now! |
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 428
|
Woke up this morning to find two unstarted ACEMD4 tasks awaiting my attention (and downloaded a third since). I've fixed the file size problem, partly to stop them recirculating to other users, but also to get some real science done, if possible. The initial estimates (10 - 12 minutes, with DCF) look tight, but I've left them alone to check if the devs' adjustments are adequate. |
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 6,423
|
the estimated runtime of my tasks was very close. at inception they started right at around 2hrs and that's how long it took. that was with no adjustments. so the estimated flops seems correct. they just need to bump the file size limit by at least 25x. maybe 100x to be safe. it really is a waste to trash good work on something like an arbitrary and artificial file size limit.
|
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 428
|
The other thing they still have to sort out is checkpointing. I've just come home to find that BOINC had downloaded and started a new ACEMD4 task - for some reason, it pre-empted the two Einstein tasks running on the GPU I dedicate to GPUGrid. That must have been EDF kicking in, but with a six hour cache and a 24 hour deadline, it shouldn't have been needed. Anyway, I applied the upload correction, and the task restarted from 1% - I had stopped it at something like 16% in 44 minutes. That implies a longer runtime than this morning's group, so the output size may be larger as well. 25x may not be enough, if not all tasks are created equal. I'll check it when it finishes. |
|
Joined: 13 Dec 17 Posts: 1419 Credit: 9,119,446,190 RAC: 891
|
Another 3 hours wasted because of too large an upload file. |
ServicEnginIC Joined: 24 Sep 10 Posts: 592 Credit: 11,972,186,510 RAC: 1,447
|
"I've fixed the file size problem, partly to stop them recirculating to other users, but also to get some real science done, if possible."
It is highly likely that the one (and only) user listed under "successful users in last 24h" for ACEMD 4 tasks on the current Server Status page is you ;-) |
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 428
|
"I've fixed the file size problem, partly to stop them recirculating to other users, but also to get some real science done, if possible."
It might be one user, but it's four tasks and counting so far:
Host 132158
Host 508381
I'll try and see off this run of timewasters, even if I have to do it all myself! |
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 428
|
"That implies a longer runtime than this morning's group, so the output size may be larger as well."
Turned out not to be a problem - the file size was 20.4 MB, despite running nearly twice as long. I can't see anything about the filename which would reliably distinguish between "quick run, large file" and "slow run, small file" tasks. |
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 6,423
|
"That implies a longer runtime than this morning's group, so the output size may be larger as well."
T2_GAFF2_frag_00-RAIMIS_NNPMM = short run, large file size
T2_NNPMM_frag_01-RAIMIS_NNPMM = longer run, smaller (but still too big) file size
i processed several of both types.
|
©2025 Universitat Pompeu Fabra