ATM work units "bomb out"

Message boards : Number crunching : ATM work units "bomb out"
ARC3670

Joined: 25 Feb 22
Posts: 5
Credit: 42,182,903
RAC: 0
Message 61381 - Posted: 6 Mar 2024, 23:18:45 UTC
Last modified: 6 Mar 2024, 23:28:32 UTC

Below is an event log excerpt from a recent attempt to run a work unit. I enrolled my PC a long time ago, and I'm pretty sure not one work unit has ever completed successfully, unless they're supposed to run in under two minutes. Since then I have used different versions of Windows, different Nvidia hardware and driver versions, and different security software; none of that seems to make any difference. I have read everything that seemed relevant on this forum and haven't found anyone reporting this problem who actually found a solution. My PC does run Asteroids@home GPU units without any issues.

3/6/2024 3:32:09 PM | GPUGRID | Scheduler request completed: got 1 new tasks
3/6/2024 3:32:09 PM | GPUGRID | Project requested delay of 11 seconds
3/6/2024 3:32:11 PM | GPUGRID | Started download of Bace_m26_m15_2-QUICO_ATM_XFF-2-input
3/6/2024 3:32:11 PM | GPUGRID | Started download of Bace_m26_m15_2-QUICO_ATM_XFF-2-Bace_m26_m15_2-QUICO_ATM_XFF-1-7-RND5798_1
3/6/2024 3:33:53 PM | GPUGRID | Finished download of Bace_m26_m15_2-QUICO_ATM_XFF-2-input (5629576 bytes)
3/6/2024 3:44:40 PM | GPUGRID | Finished download of Bace_m26_m15_2-QUICO_ATM_XFF-2-Bace_m26_m15_2-QUICO_ATM_XFF-1-7-RND5798_1 (72249701 bytes)
3/6/2024 3:44:46 PM | GPUGRID | Starting task Bace_m26_m15_2-QUICO_ATM_XFF-2-7-RND5798_1
3/6/2024 3:46:15 PM | GPUGRID | Computation for task Bace_m26_m15_2-QUICO_ATM_XFF-2-7-RND5798_1 finished
3/6/2024 3:46:15 PM | GPUGRID | Output file Bace_m26_m15_2-QUICO_ATM_XFF-2-7-RND5798_1_0 for task Bace_m26_m15_2-QUICO_ATM_XFF-2-7-RND5798_1 absent
3/6/2024 3:46:16 PM | GPUGRID | Started upload of Bace_m26_m15_2-QUICO_ATM_XFF-2-7-RND5798_1_1
3/6/2024 3:48:01 PM | GPUGRID | Sending scheduler request: To fetch work.
3/6/2024 3:48:01 PM | GPUGRID | Reporting 1 completed tasks
3/6/2024 3:48:01 PM | GPUGRID | Requesting new tasks for NVIDIA GPU
3/6/2024 3:48:07 PM | GPUGRID | Scheduler request completed: got 0 new tasks
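
For anyone who wants to pull the same facts out of a longer event log, here is a minimal Python sketch. It assumes the "timestamp | project | message" layout and the US-style timestamps seen in the excerpt above; the function names parse_line and summarize are made up for illustration and are not part of BOINC.

```python
from datetime import datetime

# Three lines copied from the event log excerpt above.
LOG = """\
3/6/2024 3:44:46 PM | GPUGRID | Starting task Bace_m26_m15_2-QUICO_ATM_XFF-2-7-RND5798_1
3/6/2024 3:46:15 PM | GPUGRID | Computation for task Bace_m26_m15_2-QUICO_ATM_XFF-2-7-RND5798_1 finished
3/6/2024 3:46:15 PM | GPUGRID | Output file Bace_m26_m15_2-QUICO_ATM_XFF-2-7-RND5798_1_0 for task Bace_m26_m15_2-QUICO_ATM_XFF-2-7-RND5798_1 absent
"""


def parse_line(line):
    """Split 'timestamp | project | message' into (datetime, project, message)."""
    stamp, project, message = (part.strip() for part in line.split("|", 2))
    return datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p"), project, message


def summarize(log_text):
    """Map each task name to its wall-clock runtime and a missing-output flag."""
    started = {}
    summary = {}
    for line in log_text.splitlines():
        if line.count("|") < 2:
            continue  # not an event log line
        when, _project, message = parse_line(line)
        words = message.split()
        if message.startswith("Starting task "):
            started[words[2]] = when
        elif message.startswith("Computation for task ") and words[-1] == "finished":
            task = words[3]
            entry = summary.setdefault(task, {"runtime_s": None, "missing_output": False})
            if task in started:
                entry["runtime_s"] = (when - started[task]).total_seconds()
        elif message.startswith("Output file ") and words[-1] == "absent":
            task = words[5]  # "Output file X for task Y absent"
            entry = summary.setdefault(task, {"runtime_s": None, "missing_output": False})
            entry["missing_output"] = True
    return summary


for task, info in summarize(LOG).items():
    print(task, info)
```

On these three lines the task ran for 89 seconds (3:44:46 PM to 3:46:15 PM) and its output file was reported absent, which matches the "under two minutes" pattern described above.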
Ian&Steve C.

Joined: 21 Feb 20
Posts: 1116
Credit: 40,839,470,595
RAC: 6,423
Message 61382 - Posted: 7 Mar 2024, 0:01:49 UTC - in response to Message 61381.  

Unhide your system so we can properly look at the host details and the task errors.

The BOINC event log messages you've posted don't tell you anything.
Keith Myers
Joined: 13 Dec 17
Posts: 1419
Credit: 9,119,446,190
RAC: 891
Message 61383 - Posted: 7 Mar 2024, 0:46:50 UTC

The "output file absent" message likely means an antivirus app is restricting access to the task slots.

But I agree that you need to unhide your computers so we can read the stderr.txt output in the task result.
ARC3670

Message 61384 - Posted: 7 Mar 2024, 1:16:37 UTC

I corrected the hidden PC issue and updated the project, but I don't know where I can view that information.
ARC3670

Message 61385 - Posted: 7 Mar 2024, 1:22:29 UTC

I think I found the area here:

https://www.gpugrid.net/results.php?hostid=613766
ARC3670

Message 61386 - Posted: 7 Mar 2024, 2:17:56 UTC - in response to Message 61385.  

After reviewing the tasks that were processed and returned with what looks like a computation error, it seems they only appear to have failed. This task (https://www.gpugrid.net/result.php?resultid=34275043) appears to have been successful.
Keith Myers
Message 61389 - Posted: 7 Mar 2024, 17:35:09 UTC - in response to Message 61386.  
Last modified: 7 Mar 2024, 17:37:48 UTC

There is not much helpful information in the stderr.txt output, since this is a Windows host.

You are getting the typical error message for Windows:

<message>
The operating system cannot run %1.
(0xc3) - exit code 195 (0xc3)</message>


So the application is unable to set up the task environment properly.
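
A note on reading that message: 0xc3 is simply 195 in hexadecimal, so it carries one number, not two. As far as I can tell (this is an assumption, not something stated in this thread), the sentence "The operating system cannot run %1." is only the generic Windows system text for error number 195, and BOINC's own headers name exit code 195 EXIT_CHILD_FAILED, meaning a child process started by the wrapper failed. A minimal Python sketch of that decoding, with a hypothetical helper name parse_exit_code:

```python
import re

# The <message> text from the task result quoted above.
MESSAGE = "The operating system cannot run %1.\n(0xc3) - exit code 195 (0xc3)"

# ASSUMPTION: name recalled from BOINC's error_numbers.h, not confirmed in
# this thread. Check it against the BOINC source before relying on it.
BOINC_EXIT_NAMES = {195: "EXIT_CHILD_FAILED"}


def parse_exit_code(message):
    """Return the decimal exit code, cross-checked against its hex form."""
    match = re.search(r"exit code (\d+) \((0x[0-9a-fA-F]+)\)", message)
    if match is None:
        return None
    code = int(match.group(1))
    if code != int(match.group(2), 16):
        raise ValueError("decimal and hex exit codes disagree")
    return code


code = parse_exit_code(MESSAGE)
print(code, hex(code), BOINC_EXIT_NAMES.get(code, "unknown"))
```

If that reading is right, the message only says that something the wrapper launched exited with an error, which is why a run.log from the slot directory would be more informative than stderr.txt.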

Your GPU is also on the weak side, with a VRAM limit of only 8GB.

The tasks are very spiky in memory utilization, often exceeding 12GB, which produces a more usable "out of memory" error message on Linux hosts.

You got lucky with that one successful task.

I'd disable the ATM and QC tasks and wait for the acemd tasks to make an appearance again someday.

Or put the project on Suspend and move on to other projects that your GPU is more capable of running.
Ian&Steve C.

Send message
Joined: 21 Feb 20
Posts: 1116
Credit: 40,839,470,595
RAC: 6,423
Level
Trp
Scientific publications
wat
Message 61391 - Posted: 7 Mar 2024, 18:06:48 UTC - in response to Message 61389.  

He's running the ATM tasks, which don't use much VRAM, unlike the QChem tasks, so he's fine on VRAM.

But that error is the common issue many others are having with the Windows application. I don't think anyone has narrowed down exactly why some hosts work on Windows and others don't.
[BAT] Svennemans

Joined: 27 May 21
Posts: 54
Credit: 1,004,151,720
RAC: 0
Message 61392 - Posted: 7 Mar 2024, 19:48:52 UTC - in response to Message 61391.  

Anybody having this error should try to capture run.log. I know of several ways to achieve that, but unfortunately I'm one of the lucky (or, in this case, unlucky) users who never get this error on Windows.
ARC3670

Send message
Joined: 25 Feb 22
Posts: 5
Credit: 42,182,903
RAC: 0
Level
Val
Scientific publications
wat
Message 61393 - Posted: 8 Mar 2024, 0:48:51 UTC - in response to Message 61389.  

It turns out I was wrong about one task completing. I clicked a link somewhere that showed the result of a different computer, not mine. When I click the Properties button for the project in the BOINC Manager, it shows these statistics: tasks completed: 0, tasks failed: 141. So, batting .000, I guess. I have suspended the project for now. I'll watch this thread for a while to see if anyone suggests anything to try.
[BAT] Svennemans

Send message
Joined: 27 May 21
Posts: 54
Credit: 1,004,151,720
RAC: 0
Level
Met
Scientific publications
wat
Message 61394 - Posted: 8 Mar 2024, 7:53:35 UTC - in response to Message 61393.  

I suggest trying this:
1) suspend all your projects
2) edit or create cc_config.xml in the main BOINC data directory (probably C:\ProgramData\BOINC, but your setup may vary)
3) set the 'exit_after_finish' option to 1:
<cc_config>  <!-- mandatory root element; add it if you're creating the file from scratch -->
  <log_flags>
    <!-- this section may or may not be there; leave it alone -->
  </log_flags>
  <options>
    <!-- this section may or may not be there; if it's not, create it -->
    <exit_after_finish>1</exit_after_finish>
    <!-- leave any other flags that are already there untouched -->
  </options>
</cc_config>

4) save and close
5) in BOINC Manager, select the menu "Options" => "Read config files", or just restart the BOINC SERVICE (not just the Manager)
6) unsuspend GPUGRID and wait for an ATM unit to run and finish. BOINC will exit immediately after it finishes
7) in the 'slots' directory (probably C:\ProgramData\BOINC\slots) there should be one or more subdirectories 0, 1, 2, etc. Locate the one that contains the ATM work unit and copy its "run.log" file to a personal directory. Look for error messages at the end of this file and post them here
8) edit cc_config.xml and set 'exit_after_finish' back to 0
9) restart BOINC
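
If you would rather script steps 2, 3, 7 and 8 than edit files by hand, the following is a rough Python sketch, not an official BOINC tool. It assumes the default data directory C:\ProgramData\BOINC mentioned above, that you have write permission there, and that run.log sits directly inside a numbered slot directory; the function names set_exit_after_finish and tail_run_logs are hypothetical. Note that ElementTree drops any XML comments already in cc_config.xml when it rewrites the file.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# ASSUMPTION: default BOINC data directory on Windows; adjust for your setup.
BOINC_DIR = Path(r"C:\ProgramData\BOINC")


def set_exit_after_finish(config_path, enabled):
    """Create or update <options><exit_after_finish> in cc_config.xml."""
    config_path = Path(config_path)
    if config_path.exists():
        tree = ET.parse(str(config_path))
        root = tree.getroot()
        if root.tag != "cc_config":
            raise ValueError("unexpected root element: " + root.tag)
    else:
        root = ET.Element("cc_config")
        tree = ET.ElementTree(root)
    options = root.find("options")
    if options is None:
        options = ET.SubElement(root, "options")
    flag = options.find("exit_after_finish")
    if flag is None:
        flag = ET.SubElement(options, "exit_after_finish")
    flag.text = "1" if enabled else "0"
    tree.write(str(config_path), encoding="utf-8", xml_declaration=False)


def tail_run_logs(slots_dir, lines=30):
    """Return {path: last `lines` lines} for each run.log in a slot directory."""
    tails = {}
    for log in sorted(Path(slots_dir).glob("*/run.log")):
        text = log.read_text(encoding="utf-8", errors="replace")
        tails[log] = text.splitlines()[-lines:]
    return tails


if __name__ == "__main__":
    # Steps 2-3: switch the option on, then reload the config or restart BOINC.
    set_exit_after_finish(BOINC_DIR / "cc_config.xml", True)
    # Step 7: after BOINC has exited, show the end of every run.log found.
    for path, tail in tail_run_logs(BOINC_DIR / "slots").items():
        print("====", path)
        print("\n".join(tail))
```

Calling set_exit_after_finish(BOINC_DIR / "cc_config.xml", False) afterwards covers step 8. Copy run.log somewhere safe before restarting BOINC, since the slot directory may be cleaned up once the client runs again.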


©2025 Universitat Pompeu Fabra