Question re rebooting during an active Work Unit in Progress

Message boards : Number crunching : Question re rebooting during an active Work Unit in Progress

STARBASEn
Joined: 17 Feb 09
Posts: 91
Credit: 1,603,303,394
RAC: 0
Message 47160 - Posted: 3 May 2017, 4:38:29 UTC

It has been some time since I last participated with GPUGRID, so I don't remember the proper procedure for rebooting a machine after a kernel update without losing an in-progress GPU work unit. This happened a few days ago: the WU errored out following the reboot and I lost 5 hours of crunching time.

I have checked the CPU slot using ACMED and did not see an obvious checkpoint file, so the point of this post is to ask whether it is possible to reboot with a work unit in progress and, if so, what the procedure is. I would hate to think that I have to stop fetching new work and wait until the current work unit completes before rebooting into a new kernel.

I use Fedora, and BOINC runs as a daemon on all my systems. Any advice is greatly appreciated.

Crunching since Feb 2003 (United Devices, Find-a-Drug)
skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 47223 - Posted: 15 May 2017, 19:02:32 UTC - in response to Message 47160.  

Select no new work from the project. Allow existing work to complete, then update+restart before accepting work from the project again.
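The drain-then-reboot procedure above could be scripted on a Linux host where BOINC runs as a daemon. What follows is a minimal Python sketch, not an official tool, and it rests on several assumptions: `boinccmd` is on the PATH and can reach the local client (you may need to run it from the BOINC data directory or pass `--passwd`); `boinccmd --get_tasks` prints a numbered header plus `project URL:` and `ready to report:` lines for each task; and `PROJECT_URL` is a hypothetical placeholder that must match the URL your client shows for the project. The helper names `parse_tasks`, `unfinished_tasks` and `drain` are illustrative only.

```python
import re
import subprocess
import time

# Hypothetical placeholder: replace with the exact project URL your client
# reports (e.g. via `boinccmd --get_project_status`).
PROJECT_URL = "http://www.gpugrid.net/"

# Assumed task header format in `boinccmd --get_tasks` output: "1) -----------"
TASK_HEADER = re.compile(r"^\s*\d+\)\s*-+\s*$")


def parse_tasks(text: str) -> list[dict[str, str]]:
    """Split `boinccmd --get_tasks` output into one dict per task.

    Assumes each task starts with a numbered header and is followed by
    indented 'key: value' lines; lines before the first header are ignored.
    """
    tasks: list[dict[str, str]] = []
    for line in text.splitlines():
        if TASK_HEADER.match(line):
            tasks.append({})
        elif tasks and ":" in line:
            # Split on the first colon only, so URLs in the value stay intact.
            key, _, value = line.partition(":")
            tasks[-1][key.strip()] = value.strip()
    return tasks


def unfinished_tasks(text: str, url: str) -> int:
    """Count tasks for `url` that are not yet 'ready to report'."""
    want = url.rstrip("/")
    return sum(
        1
        for task in parse_tasks(text)
        if task.get("project URL", "").rstrip("/") == want
        and task.get("ready to report") != "yes"
    )


def boinccmd(*args: str) -> str:
    """Run boinccmd with the given arguments and return its stdout."""
    result = subprocess.run(
        ["boinccmd", *args], capture_output=True, text=True, check=True
    )
    return result.stdout


def drain(url: str, poll_seconds: int = 600) -> None:
    """Stop fetching new work for `url`, then poll until its tasks finish."""
    boinccmd("--project", url, "nomorework")
    while unfinished_tasks(boinccmd("--get_tasks"), url) > 0:
        time.sleep(poll_seconds)
```

Usage sketch: call `drain(PROJECT_URL)`, install the kernel update and reboot, then re-enable fetching with `boinccmd --project <URL> allowmorework`. Tasks already marked "ready to report" are not waited on, since their output has already been uploaded and only the report to the server remains.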
Erich56
Joined: 1 Jan 15
Posts: 1166
Credit: 12,260,898,501
RAC: 1
Message 47260 - Posted: 17 May 2017, 18:32:37 UTC - in response to Message 47160.  

STARBASEn wrote: "... This happened a few days ago and I ended up losing 5 hours of crunching time causing the WU to error out following the reboot. ..."

Hm, whenever I close the BOINC Manager by pressing the "Exit" button, GPUGRID and other projects continue their work exactly where they were interrupted once the PC has rebooted.
STARBASEn
Joined: 17 Feb 09
Posts: 91
Credit: 1,603,303,394
RAC: 0
Message 47349 - Posted: 31 May 2017, 23:46:47 UTC

skgiven wrote: "Select no new work from the project. Allow existing work to complete, then update+restart before accepting work from the project again."

I was hoping to avoid having to do that, but you are correct: it appears to be the only safe way to avoid losing time.

Erich56 wrote: "hm, whenever I close the BOINC Manager by pushing the "exit" button, after a reboot of the PC GPUGRID and other projects continue their work exactly where they had interrupted it before."

I have had a couple of power outages since I first posted about this, and those WUs did continue where they left off after restarting. That leads me to believe the issue is more likely WU-related, or just an unfortunate coincidence.

Crunching since Feb 2003 (United Devices, Find-a-Drug)

©2025 Universitat Pompeu Fabra