
Message boards : Number crunching : ATMML Task running but without progress

homer__simpsons
Joined: 17 Nov 15
Posts: 14
Credit: 135,798,465
RAC: 758,227
Message 61892 - Posted: 17 Oct 2024 | 9:27:43 UTC
Last modified: 17 Oct 2024 | 9:29:25 UTC

Hello,

I had a work unit https://gpugrid.net/workunit.php?wuid=29624565 that was displayed as "running", but after several minutes it hadn't made any progress, neither in percentage nor in elapsed time.


So I had to cancel my task https://gpugrid.net/result.php?resultid=36261497.

Did this happen to someone else? I tried to suspend and resume it without success. Is there a way to "force restart" a task?

The task was later completed successfully by someone else who is on Linux, while I'm on Windows. Could that explain it?

Erich56
Joined: 1 Jan 15
Posts: 1142
Credit: 10,999,400,130
RAC: 21,485,241
Message 61893 - Posted: 17 Oct 2024 | 16:21:37 UTC - in response to Message 61892.

With this type of task, it takes several minutes until you can see progress in the BOINC Manager. So just be patient. Also, it takes a short while until the GPU starts its work.
Suspending these tasks will kill them. So don't do it. Just let it run, and be patient :-)

homer__simpsons
Message 61894 - Posted: 17 Oct 2024 | 16:43:26 UTC - in response to Message 61893.
Last modified: 17 Oct 2024 | 16:44:41 UTC

With this type of task, it takes several minutes until you can see progress in the BOINC Manager.


Okay. From what I remember, it was usually rather fast on my side (seconds), while this time I waited 5-10 minutes.

Also, it takes a short while until the GPU starts its work.


Yes, usually about 5-10 minutes on my side.

Suspending these tasks will kill them. So don't do it. Just let it run, and be patient :-)


I knew that suspending restarts them from the beginning; I was wondering whether it could somehow leave the task stuck. I will wait longer next time :)

Thanks for your reply!

homer__simpsons
Message 61899 - Posted: 19 Oct 2024 | 20:55:59 UTC
Last modified: 19 Oct 2024 | 21:00:47 UTC

So it looks like there is in fact a bug.

I have this work unit: https://gpugrid.net/workunit.php?wuid=29641527

It progressed successfully to 10%. Then I suspended it manually (to play some games) and resumed it, but it has now been stuck like this for 30 minutes (the UI is in French):


(image link: https://imgur.com/a/1LCD3Hr)

The task does resume correctly (from 0%) in the following scenarios, though:
- Computer suspend
- Computer off/on
- Paused because there is a high priority application running

I tried suspending my computer and resuming it, but it looks like the task still hasn't started (5 minutes later).

Is there any log I could provide to help debug this?

homer__simpsons
Message 61900 - Posted: 19 Oct 2024 | 21:56:39 UTC

So I just restarted my computer and the task started directly!

I am not sure whether this is a BOINC or a GPUGRID bug though.

Erich56
Message 61905 - Posted: 24 Oct 2024 | 6:34:29 UTC - in response to Message 61900.

So I just restarted my computer and the task started directly!

I am not sure whether this is a BOINC or a GPUGRID bug though.


Make your choice in the BOINC Manager: "Options" > "Other settings"

homer__simpsons
Message 61907 - Posted: 24 Oct 2024 | 8:48:35 UTC
Last modified: 24 Oct 2024 | 8:49:20 UTC

Make your choice in the BOINC Manager: "Options" > "Other settings"


Which choice? About starting BOINC when the computer starts? If so, that option is ticked, and it is the behavior I want.


The issue I'm speaking about is:
1. I manually suspend an ATMML task (suspended by user)
2. I manually resume it
3. It never starts (No CPU / GPU usage, no elapsed timer for 40+ minutes) but displays "in progress" <-- Here is the issue

If I then restart my computer it starts. So there is an issue either with the task or with BOINC, but I do not know how to identify where the issue is.

Next time I will try to stop BOINC and restart it without restarting the whole computer.
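
The symptom in step 3 can be written down as a small check. This is only a minimal Python sketch, assuming you note a few snapshots of the task by hand (or with any tool you like). `TaskSample`, its fields, and the `"running"` label are hypothetical names for illustration; they are not part of BOINC or GPUGRID.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TaskSample:
    """One observation of a task (hypothetical structure, not a BOINC type)."""
    state: str            # label shown by the manager, e.g. "running"
    elapsed_s: float      # elapsed time shown for the task, in seconds
    fraction_done: float  # progress between 0.0 and 1.0


def looks_stalled(samples: Sequence[TaskSample], min_samples: int = 3) -> bool:
    """Return True when every sample says "running" but neither the
    elapsed time nor the progress changed across the samples."""
    if len(samples) < min_samples:
        return False  # not enough observations to judge
    if any(s.state != "running" for s in samples):
        return False  # a suspended or waiting task is not "stalled"
    first = samples[0]
    return all(
        s.elapsed_s == first.elapsed_s and s.fraction_done == first.fraction_done
        for s in samples[1:]
    )
```

For example, three snapshots taken a few minutes apart that all show "running" with the same elapsed time and percentage would be flagged, whereas a task whose elapsed timer advances would not.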

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1627
Credit: 9,442,588,157
RAC: 17,098,132
Message 61908 - Posted: 24 Oct 2024 | 9:10:07 UTC - in response to Message 61907.

It probably depends on what else is running under the control of the BOINC client on that machine.

When you 'unsuspend' a task, you are merely giving permission for it to run when a resource of the specified type becomes available. If at that moment your GPUs are busy on other projects' tasks, the GPUGrid tasks will simply be left on one side.

homer__simpsons
Message 61909 - Posted: 24 Oct 2024 | 11:55:19 UTC - in response to Message 61908.

It probably depends on what else is running under the control of the BOINC client on that machine.

When you 'unsuspend' a task, you are merely giving permission for it to run when a resource of the specified type becomes available. If at that moment your GPUs are busy on other projects' tasks, the GPUGrid tasks will simply be left on one side.


Yes, but GPUGRID has a higher priority than the other projects, and the others are correctly suspended because the ATMML task is shown as "running".

KeithBriggs
Joined: 29 Aug 24
Posts: 23
Credit: 1,289,251,212
RAC: 11,734,276
Message 61912 - Posted: 24 Oct 2024 | 13:36:10 UTC - in response to Message 61909.

I'm running on a laptop and had to turn off going to sleep, even on battery. I didn't read the entire thread, so I could be missing something. I also set the activity tab to always run GPU and CPU, and I set liberal controls on the GPUGRID web settings page: for example, use at most 100% of everything and don't suspend for any reason. I'd suggest opening the floodgates for GPUGRID to see if it changes anything.

I haven't noticed any thread about process monitors but I use HWiNFO64. That might shed light on heat issues that throttle processing.

homer__simpsons
Message 61924 - Posted: 11 Nov 2024 | 20:02:52 UTC
Last modified: 11 Nov 2024 | 20:04:55 UTC

So I opened a discussion on BOINC's GitHub about this, and they say that this is an issue with the GPUGRID application (https://github.com/BOINC/boinc/discussions/5894).

Where can I raise this issue to the project owners?

When I checked the task manager while this was happening earlier, I noticed that there was no "python" process, although there should be one for an ATMML task.
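
The same check can be done from a script instead of the task manager. This is a rough Python sketch, not an official diagnostic: the function names are made up for illustration, and it only looks for any process whose name starts with "python", so it cannot tell an ATMML process apart from an unrelated Python program. It relies on the standard `tasklist` command on Windows and `ps` elsewhere.

```python
import csv
import subprocess
import sys
from typing import Iterable, List


def list_process_names() -> List[str]:
    """Return the names of running processes using the OS's own tool."""
    if sys.platform.startswith("win"):
        # tasklist prints CSV rows; the first column is the image name.
        out = subprocess.run(
            ["tasklist", "/FO", "CSV", "/NH"],
            capture_output=True, text=True, check=True,
        ).stdout
        return [row[0] for row in csv.reader(out.splitlines()) if row]
    # On Linux/macOS, print only the command of every process.
    out = subprocess.run(
        ["ps", "-A", "-o", "comm="],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]


def has_python_process(names: Iterable[str]) -> bool:
    """True if any process name (ignoring its directory) starts with 'python'."""
    for name in names:
        base = name.replace("\\", "/").rsplit("/", 1)[-1].lower()
        if base.startswith("python"):
            return True
    return False
```

Calling `has_python_process(list_process_names())` while the task is shown as "running" would return `False` in the situation described above.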

KeithBriggs
Message 61925 - Posted: 12 Nov 2024 | 22:18:52 UTC - in response to Message 61924.

Coincidentally, I inadvertently suspended GPUGRID, and when I noticed, I resumed the project. I too saw that the previously active task was in limbo. It said it was running, but it wasn't (the fans not running confirmed it). I simply ditched it, and the waiting task took off like normal.
