ATM work units "bomb out"
| Author | Message |
|---|---|
|
Joined: 25 Feb 22 Posts: 5 Credit: 42,182,903 RAC: 0
|
Below is an event log excerpt from a recent attempt to run a work unit. I enrolled my PC a long time ago, and I'm pretty sure not one work unit has ever completed successfully, unless they're supposed to run in under two minutes. Since then I've tried different versions of Windows, different Nvidia hardware/driver versions, and different security software. None of that seems to make any difference. I have been reading anything that seemed relevant on this forum and haven't found anyone reporting this problem who actually found a solution. My PC does run asteroids@home GPU units without any issues.
3/6/2024 3:32:09 PM | GPUGRID | Scheduler request completed: got 1 new tasks
3/6/2024 3:32:09 PM | GPUGRID | Project requested delay of 11 seconds
3/6/2024 3:32:11 PM | GPUGRID | Started download of Bace_m26_m15_2-QUICO_ATM_XFF-2-input
3/6/2024 3:32:11 PM | GPUGRID | Started download of Bace_m26_m15_2-QUICO_ATM_XFF-2-Bace_m26_m15_2-QUICO_ATM_XFF-1-7-RND5798_1
3/6/2024 3:33:53 PM | GPUGRID | Finished download of Bace_m26_m15_2-QUICO_ATM_XFF-2-input (5629576 bytes)
3/6/2024 3:44:40 PM | GPUGRID | Finished download of Bace_m26_m15_2-QUICO_ATM_XFF-2-Bace_m26_m15_2-QUICO_ATM_XFF-1-7-RND5798_1 (72249701 bytes)
3/6/2024 3:44:46 PM | GPUGRID | Starting task Bace_m26_m15_2-QUICO_ATM_XFF-2-7-RND5798_1
3/6/2024 3:46:15 PM | GPUGRID | Computation for task Bace_m26_m15_2-QUICO_ATM_XFF-2-7-RND5798_1 finished
3/6/2024 3:46:15 PM | GPUGRID | Output file Bace_m26_m15_2-QUICO_ATM_XFF-2-7-RND5798_1_0 for task Bace_m26_m15_2-QUICO_ATM_XFF-2-7-RND5798_1 absent
3/6/2024 3:46:16 PM | GPUGRID | Started upload of Bace_m26_m15_2-QUICO_ATM_XFF-2-7-RND5798_1_1
3/6/2024 3:48:01 PM | GPUGRID | Sending scheduler request: To fetch work.
3/6/2024 3:48:01 PM | GPUGRID | Reporting 1 completed tasks
3/6/2024 3:48:01 PM | GPUGRID | Requesting new tasks for NVIDIA GPU
3/6/2024 3:48:07 PM | GPUGRID | Scheduler request completed: got 0 new tasks |
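To put a number on the "under two minutes" remark, here is a minimal Python sketch that measures how long the task actually ran, using the "Starting task" and "Computation for task ... finished" lines copied from the log above. The helper names (`parse_timestamp`, `task_runtime_seconds`) are made up for this illustration, and the timestamp format is assumed to be US-style month/day/year with a 12-hour clock, as the excerpt suggests.

```python
from datetime import datetime

# Two lines copied verbatim from the event log excerpt in the post.
LOG_LINES = [
    "3/6/2024 3:44:46 PM | GPUGRID | Starting task Bace_m26_m15_2-QUICO_ATM_XFF-2-7-RND5798_1",
    "3/6/2024 3:46:15 PM | GPUGRID | Computation for task Bace_m26_m15_2-QUICO_ATM_XFF-2-7-RND5798_1 finished",
]


def parse_timestamp(line: str) -> datetime:
    """Parse the leading timestamp field of a BOINC event log line.

    Assumes the 'M/D/YYYY H:MM:SS AM/PM' layout seen in the excerpt.
    """
    stamp = line.split("|", 1)[0].strip()
    return datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p")


def task_runtime_seconds(lines) -> float:
    """Seconds between the 'Starting task' and 'Computation for task' lines."""
    start = next(parse_timestamp(l) for l in lines if "Starting task" in l)
    end = next(parse_timestamp(l) for l in lines if "Computation for task" in l)
    return (end - start).total_seconds()


runtime = task_runtime_seconds(LOG_LINES)
print(f"Task ran for {runtime:.0f} seconds")  # prints: Task ran for 89 seconds
```

So the task stopped 89 seconds after it started, and the very next log line reports the output file as absent.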
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 6,423
|
unhide your system so we can properly look at the host details and the task errors. the BOINC event log messages you've posted don't tell you anything.
|
|
Joined: 13 Dec 17 Posts: 1419 Credit: 9,119,446,190 RAC: 891
|
The output file absent message likely means an antivirus app is restricting access to the task slots. But agree you need to unhide your computers so we can read the stderr.txt task result. |
|
Joined: 25 Feb 22 Posts: 5 Credit: 42,182,903 RAC: 0
|
I corrected the hidden PC issue and updated the project, but I don't know where I can view that information. |
|
Joined: 25 Feb 22 Posts: 5 Credit: 42,182,903 RAC: 0
|
I think I found the area here: https://www.gpugrid.net/results.php?hostid=613766 |
|
Joined: 25 Feb 22 Posts: 5 Credit: 42,182,903 RAC: 0
|
After reviewing the tasks that are being processed and returned with what looks like a computation error, it seems they only appear to have failed. This task (https://www.gpugrid.net/result.php?resultid=34275043) appears to have been successful. |
|
Joined: 13 Dec 17 Posts: 1419 Credit: 9,119,446,190 RAC: 891
|
There isn't much helpful information in the stderr.txt output, since this is a Windows host. You are getting the typical error message for Windows:
<message> The operating system cannot run %1. (0xc3) - exit code 195 (0xc3)</message>
So the task is unable to properly set up your task environment. Your gpu is also on the weak side with its VRAM limit of only 8GB. The tasks are very spiky in memory utilization, often exceeding 12GB, which will produce a more usable error message of "out of memory" on Linux hosts. You got lucky with that one successful task. I'd disable the ATM and QC tasks and wait for the acemd tasks to make an appearance again someday. Or put the project on Suspend and move on to other projects that your gpu is more capable of running. |
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 6,423
|
he's running the ATM tasks, which don't use much VRAM, not like the QChem tasks. so he's fine on VRAM. but that error is the common issue many others are having with the Windows application. I don't think anyone has narrowed down exactly why some hosts work on Windows and others don't.
|
|
Joined: 27 May 21 Posts: 54 Credit: 1,004,151,720 RAC: 0
|
Anybody having this error should try to capture run.log. I know of several ways to achieve that; unfortunately, I'm one of the lucky (or unlucky in this case) users who never gets this error on Windows. |
|
Joined: 25 Feb 22 Posts: 5 Credit: 42,182,903 RAC: 0
|
It turns out I was wrong about one task completing. I clicked a link somewhere which showed the result of a different computer, not mine. When I click the properties button for the project in the BOINC manager there is a statistic; tasks completed: 0 - Tasks failed: 141. So, batting .1000 I guess. I have suspended for now. I'll watch this thread for a while, see if anyone suggests anything to try. |
|
Joined: 27 May 21 Posts: 54 Credit: 1,004,151,720 RAC: 0
|
> It turns out I was wrong about one task completing. I clicked a link somewhere which showed the result of a different computer, not mine. When I click the properties button for the project in the BOINC manager there is a statistic; tasks completed: 0 - Tasks failed: 141. So, batting .1000 I guess. I have suspended for now. I'll watch this thread for a while, see if anyone suggests anything to try.

I'd suggest trying this:
1) suspend all your projects
2) edit or create cc_config.xml in the main BOINC directory (probably C:\ProgramData\BOINC, but your config may vary)
3) set the 'exit_after_finish' switch to 1:
<cc_config>
  <!-- mandatory section, so add it if you're creating this from scratch -->
  <log_flags>
    <!-- this section may or may not be there; ignore it -->
  </log_flags>
  <options>
    <!-- this section may or may not be there; if it's not: CREATE IT! -->
    <exit_after_finish>1</exit_after_finish>
    <!-- ignore all other flags that are already there -->
  </options>
</cc_config>
4) save and close
5) in BOINC Manager, select menu "Options" => "Read config files", or just restart the BOINC SERVICE (not just the manager)
6) unsuspend GPUGRID and wait for an ATM unit to run and finish. BOINC will exit immediately after it finishes
7) in the 'slots' directory (probably C:\ProgramData\BOINC\slots) there should be one or more subdirectories 0, 1, 2, etc. Locate the one that contains the ATM workunit. Copy the "run.log" file to some personal directory. Look for error messages at the end of this file and post them here
8) edit cc_config.xml and set 'exit_after_finish' back to 0
9) restart BOINC |
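For anyone who would rather script the steps above than do them by hand, the following is a rough Python sketch. It assumes the default data directory C:\ProgramData\BOINC mentioned in the post and that the ATM task leaves a file named run.log in its slot directory, as described there. The function names and the destination folder are invented for this illustration, and writing to the BOINC data directory may need administrator rights. Note also that ElementTree drops any XML comments already present in cc_config.xml when it rewrites the file.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def set_exit_after_finish(config_path: Path, enabled: bool) -> None:
    """Create or update <options><exit_after_finish> in cc_config.xml."""
    if config_path.exists():
        tree = ET.parse(config_path)
        root = tree.getroot()
    else:
        # No config yet: start from an empty <cc_config> root element.
        root = ET.Element("cc_config")
        tree = ET.ElementTree(root)
    options = root.find("options")
    if options is None:
        options = ET.SubElement(root, "options")
    flag = options.find("exit_after_finish")
    if flag is None:
        flag = ET.SubElement(options, "exit_after_finish")
    flag.text = "1" if enabled else "0"
    tree.write(config_path, encoding="utf-8")


def collect_run_logs(slots_dir: Path, dest_dir: Path, tail_lines: int = 20) -> dict:
    """Copy every slots/<n>/run.log to dest_dir and return its last lines."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    tails = {}
    for log in sorted(slots_dir.glob("*/run.log")):
        slot = log.parent.name
        shutil.copy2(log, dest_dir / f"slot{slot}_run.log")
        text = log.read_text(encoding="utf-8", errors="replace")
        tails[slot] = text.splitlines()[-tail_lines:]
    return tails


if __name__ == "__main__":
    boinc_data = Path(r"C:\ProgramData\BOINC")  # assumed default location
    if boinc_data.is_dir():
        # Hypothetical destination folder for the copied logs.
        saved = collect_run_logs(boinc_data / "slots", Path.home() / "atm_run_logs")
        for slot, lines in saved.items():
            print(f"--- slot {slot} ---")
            print("\n".join(lines))
```

A possible workflow: call `set_exit_after_finish(boinc_data / "cc_config.xml", True)` while the projects are suspended, restart BOINC and let an ATM unit fail, run the script to grab the tail of run.log, then call the function again with `False` and restart BOINC.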
©2025 Universitat Pompeu Fabra