ATM

Message boards : News : ATM

Erich56
Message 61481 - Posted: 4 May 2024, 20:18:23 UTC - in response to Message 61480.  

The work unit generator has an incorrect value for estimated time to complete in the task profile.

Same thing here, a short time ago:

https://www.gpugrid.net/result.php?resultid=35071561

Is this a new type of failure?

What a waste :-(
Could someone back at GPUGRID please take care of this?

Keith Myers
Message 61482 - Posted: 5 May 2024, 1:29:15 UTC - in response to Message 61481.  

One of the GPUGrid devs, Adria (from the acemd3/Insilico-binding-assay side), said on their Discord server that they would pass the "time limit exceeded" error reports on to the other devs, so that the task generator templates can be updated with the proper values for the new tasks.

Richard Haselgrove
Message 61483 - Posted: 5 May 2024, 7:56:16 UTC

Note that these tasks are ACEMD 3, rather than ATM - and they are indeed from a new version of that application, v2.27 deployed 19 Apr 2024 for Windows. So Erich is right to identify this as a new problem.

BOINC projects don't set a time limit explicitly. It's calculated by the client, on the host machine running the task. The calculation is done from:

rsc_fpops_bound - by default 10x the rsc_fpops_est, set by the project
Host average processing rate (avp) - shown as 38141 Gflops for Erich's machine, after just one successful task.
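
As a rough illustration of the calculation described above (a sketch, not the client's actual code): as I understand it, the limit works out to roughly rsc_fpops_bound divided by the processing rate the client holds for that app version. The function name, the rsc_fpops_est value and the 500 Gflops comparison rate are made up for the example; only the 10x default and the 38141 Gflops figure come from this post.

```python
def elapsed_time_limit(rsc_fpops_est, flops_rate, rsc_fpops_bound=None):
    """Approximate elapsed-time limit in seconds for one task.

    rsc_fpops_est   -- project's estimate of the task size, in floating-point ops
    flops_rate      -- processing rate used by the client, in flops
    rsc_fpops_bound -- upper bound on task size; assumed 10x the estimate if not given
    """
    if flops_rate <= 0:
        raise ValueError("flops_rate must be positive")
    if rsc_fpops_bound is None:
        rsc_fpops_bound = 10.0 * rsc_fpops_est
    return rsc_fpops_bound / flops_rate


GFLOPS = 1e9
est = 5e15  # hypothetical rsc_fpops_est, NOT taken from a real GPUGRID task

# Rate shown for Erich's machine after one successful task.
inflated = elapsed_time_limit(est, 38141 * GFLOPS)
# Hypothetical, more modest rate, for comparison only.
modest = elapsed_time_limit(est, 500 * GFLOPS)

print(f"limit at 38141 Gflops: {inflated:.0f} s")
print(f"limit at 500 Gflops:   {modest:.0f} s")
```

The point of the comparison: the higher the recorded rate, the shorter the allowed runtime, so an inflated rate can push the limit below what a task really needs.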

Keith - you'll remember that we had major problems with AVP at SETI, after it was first introduced in 2010. It isn't used by the server until 11 tasks have been completed and validated by that particular host.

I'll try and snag one of the new tasks on one of my Windows machines, so I can investigate further, but it'll be tricky.

Bedrich Hajek
Message 61484 - Posted: 5 May 2024, 11:11:38 UTC - in response to Message 61483.  

Note that these tasks are ACEMD 3, rather than ATM - and they are indeed from a new version of that application, v2.27 deployed 19 Apr 2024 for Windows. So Erich is right to identify this as a new problem.


This is indeed a pertinent conversation but, if I may point out, it is posted in the wrong thread, for the reason mentioned above.


Erich56
Message 61486 - Posted: 5 May 2024, 12:21:38 UTC - in response to Message 61484.  

Note that these tasks are ACEMD 3, rather than ATM - and they are indeed from a new version of that application, v2.27 deployed 19 Apr 2024 for Windows. So Erich is right to identify this as a new problem.


This is indeed a pertinent conversation but, if I may point out, it is posted in the wrong thread, for the reason mentioned above.


Sorry folks, I hadn't even caught that the tasks in question are ACEMD 3. So my complaint ended up in the wrong thread :-(

Anyway, I have now deselected ACEMD 3 for the time being.



Keith Myers
Message 61487 - Posted: 5 May 2024, 20:25:44 UTC

I wasn't lucky in snagging any of the new acemd3 tasks and app this last pass; I keep gorging on the QC tasks.

Has anyone running Linux run into the same time exceeded issue?

Richard Haselgrove
Message 61488 - Posted: 5 May 2024, 20:34:53 UTC - in response to Message 61487.  

Likewise. Linux has a continuous supply of QC, only interrupted by the occasional ATM. And no joy yet on Windows.

Bedrich Hajek
Message 61489 - Posted: 6 May 2024, 23:53:37 UTC - in response to Message 61487.  

I wasn't lucky in snagging any of the new acemd3 tasks and app this last pass; I keep gorging on the QC tasks.

Has anyone running Linux run into the same time exceeded issue?



I have:

https://www.gpugrid.net/results.php?hostid=610674&offset=0&show_names=0&state=0&appid=32

Twice.



Erich56
Message 61537 - Posted: 19 Jun 2024, 9:19:29 UTC

I restarted downloading ATMs this morning on three of my hosts.
About a third of the tasks errored out after about 2,700 - 2,900 secs.

No stderr is shown for any of the erroneous tasks, so this probably is part of the problem.

Pascal
Message 61538 - Posted: 19 Jun 2024, 11:49:56 UTC - in response to Message 61537.  

Nothing to report on my Linux Mint PC with an RTX 4060 and an RTX A2000.

https://www.gpugrid.net/results.php?userid=563937

Ian&Steve C.
Message 61539 - Posted: 19 Jun 2024, 14:58:32 UTC - in response to Message 61537.  

I restarted downloading ATMs this morning on three of my hosts.
About a third of the tasks errored out after about 2,700 - 2,900 secs.

No stderr is shown for any of the erroneous tasks, so this probably is part of the problem.


Not including tasks still in progress, you have 20 tasks processed and only 4 errors (that's about 1/5th). Of those four, 2 were aborted, not computation errors.
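
To spell out the arithmetic (a throwaway sketch; the variable names are made up, and the counts are just the ones quoted in this post):

```python
# Counts as stated in the post, excluding tasks still in progress.
processed = 20
errors = 4
aborted = 2  # user aborts, counted among the 4 errors

error_rate = errors / processed            # share of all errored tasks
computation_errors = errors - aborted      # errors that were not user aborts
computation_error_rate = computation_errors / processed

print(f"all errors:         {error_rate:.0%}")
print(f"computation errors: {computation_error_rate:.0%}")
```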

Erich56
Message 61540 - Posted: 19 Jun 2024, 18:02:29 UTC - in response to Message 61539.  

I restarted downloading ATMs this morning on three of my hosts.
About a third of the tasks errored out after about 2,700 - 2,900 secs.

No stderr is shown for any of the erroneous tasks, so this probably is part of the problem.


Not including tasks still in progress, you have 20 tasks processed and only 4 errors (that's about 1/5th). Of those four, 2 were aborted, not computation errors.

Why did I abort 2 tasks? You can see it: they were running, running, running for many hours, but with no CPU use at all. Hence, they were also erroneous.

Ian&Steve C.
Message 61541 - Posted: 19 Jun 2024, 19:52:59 UTC - in response to Message 61540.  
Last modified: 19 Jun 2024, 19:55:53 UTC

The two you aborted show less than an hour of runtime.

I was talking about these two:
http://www.gpugrid.net/result.php?resultid=35340996
http://www.gpugrid.net/result.php?resultid=35340990

I made my post before you aborted the other two, so now you have 4 that you aborted from that system.

"no CPU use at all" is a strange comment; these are GPU tasks. ATM also frequently stops GPU computation to write results to file. You stopped the other two tasks at around 19,000-20,000 seconds, which is in the same range as the time to completion for the tasks that completed successfully. Perhaps you got confused and the tasks were nearly complete?

Erich56
Message 61542 - Posted: 19 Jun 2024, 20:03:48 UTC - in response to Message 61541.  

...
"no CPU use at all" is a strange comment; these are GPU tasks. ATM also frequently stops GPU computation to write results to file. You stopped the other two tasks at around 19,000-20,000 seconds, which is in the same range as the time to completion for the tasks that completed successfully. Perhaps you got confused and the tasks were nearly complete?

As can easily be seen from the successfully completed tasks, CPU time is close to total runtime.
In the case of the tasks which I aborted, I saw in the Windows Task Manager that there was no CPU usage at all, at any time, so I aborted them. A look at the task list also shows that CPU usage was "0".
So, in some way, these tasks must have been faulty.

Ian&Steve C.
Message 61543 - Posted: 19 Jun 2024, 20:18:26 UTC - in response to Message 61542.  

Might be an intermittent problem with your computer, like a driver crash/recovery, since you have some tasks that are running fine.

EA6LE
Message 61549 - Posted: 20 Jun 2024, 22:36:12 UTC

6/20/2024 6:34:33 PM | GPUGRID | [error] Error reported by file upload server: Server is out of disk space


Keith Myers
Message 61551 - Posted: 21 Jun 2024, 2:06:10 UTC - in response to Message 61549.  

Been seeing this issue for several hours now.

Sent a PM through the GPUGrid Discord channel to Gianni.

Probably won't see any relief till tomorrow European time.

Richard Haselgrove
Message 61552 - Posted: 21 Jun 2024, 10:53:04 UTC

Some files have uploaded, and some tasks reported, but they've now stopped again with a slightly different set of messages. Compare:

21/06/2024 11:46:33 | GPUGRID | [error] Error reported by file upload server: can't write file /home/ps3grid/projects/PS3GRID/upload/2a/BACE_m26_m17_5-QUICO_ATM_GAFF2_RESP-4-7-RND2126_0_1: No space left on server
21/06/2024 11:46:34 | GPUGRID | [error] Error reported by file upload server: Server is out of disk space

I interpret the long version as meaning there's no space left on the backing store either, but that's a guess.
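
For anyone who wants to watch for these in a saved copy of the event log, here is a minimal sketch. It assumes the "timestamp | project | message" layout seen in the two lines above; the function name and the returned dictionary keys are my own invention, not anything from BOINC.

```python
UPLOAD_ERROR_PREFIX = "[error] Error reported by file upload server: "


def parse_upload_error(line):
    """Return details of an upload-server error line, or None if the line is something else."""
    # Split into at most three fields so any later " | " in the message is preserved.
    parts = line.strip().split(" | ", 2)
    if len(parts) != 3:
        return None
    timestamp, project, message = parts
    if not message.startswith(UPLOAD_ERROR_PREFIX):
        return None
    reason = message[len(UPLOAD_ERROR_PREFIX):]
    lowered = reason.lower()
    # Both wordings seen in this thread point at a full disk on the server side.
    disk_full = "out of disk space" in lowered or "no space left" in lowered
    return {
        "timestamp": timestamp,
        "project": project,
        "reason": reason,
        "disk_full": disk_full,
    }


# The two lines quoted above, verbatim.
sample_lines = [
    "21/06/2024 11:46:33 | GPUGRID | [error] Error reported by file upload server: "
    "can't write file /home/ps3grid/projects/PS3GRID/upload/2a/"
    "BACE_m26_m17_5-QUICO_ATM_GAFF2_RESP-4-7-RND2126_0_1: No space left on server",
    "21/06/2024 11:46:34 | GPUGRID | [error] Error reported by file upload server: "
    "Server is out of disk space",
]

results = [parse_upload_error(line) for line in sample_lines]
for result in results:
    print(result["timestamp"], "disk full:", result["disk_full"])
```

Matching on the message text is brittle, so treat the two phrases as examples from this incident rather than a complete list.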

Richard Haselgrove
Message 61553 - Posted: 21 Jun 2024, 12:31:53 UTC

And now they've all gone. The quota system is even allowing me to download new tasks again.

Keith Myers
Message 61555 - Posted: 22 Jun 2024, 0:36:21 UTC

Uploads are stalled out again.

©2025 Universitat Pompeu Fabra