Task failing after 3.669 seconds

Erich56

Joined: 1 Jan 15
Posts: 1166
Credit: 12,260,898,501
RAC: 1
Message 57608 - Posted: 14 Oct 2021, 16:50:07 UTC

Any idea why this task failed with "computation error" about 1 hour after start:

https://www.gpugrid.net/result.php?resultid=32654746
ID: 57608 · Rating: 0
Ian&Steve C.

Joined: 21 Feb 20
Posts: 1116
Credit: 40,839,470,595
RAC: 6,423
Message 57613 - Posted: 14 Oct 2021, 18:07:32 UTC - in response to Message 57608.  

Exit status 194 (0xc2) EXIT_ABORTED_BY_CLIENT
ID: 57613 · Rating: 0
Erich56

Message 57615 - Posted: 14 Oct 2021, 18:12:07 UTC - in response to Message 57613.  

> Exit status 194 (0xc2) EXIT_ABORTED_BY_CLIENT

That is definitely wrong. At least, if "client" refers to me personally: I certainly did NOT abort the WU.

Further, under both "result" and "client status" it says "Berechnungsfehler", i.e. "computation error".
ID: 57615 · Rating: 0
Ian&Steve C.

Message 57618 - Posted: 14 Oct 2021, 19:36:43 UTC - in response to Message 57615.  

Don't get too hung up on the verbiage used by BOINC.

ANY kind of error, be it a pre-computation issue, a during-computation issue, a manual abort, an automatic abort, or even something like an upload error (after computation has completed), will be classified as "Computation Error". This is the same for all projects. It's just the generic wording BOINC uses when there's an error it can't resolve; more detailed info is usually in the logs or stderr output.

Since it failed with "aborted by client", I can only assume some kind of issue between BOINC and the app, and that the BOINC client itself just killed the task.

(unknown error) - exit code 194 (0xc2)

Since all you have is "unknown error", I don't think there's much to run down here.
ID: 57618 · Rating: 0
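A note on the exit codes quoted above: the hexadecimal value in parentheses is simply the decimal code written in base 16, and the symbolic name comes from BOINC's own list of exit codes. The following is a minimal sketch of that mapping in Python. The table is deliberately partial: only 194 is confirmed by this thread, while 195 and 203 are written from memory of BOINC's `error_numbers.h` and should be checked against the BOINC source. `describe_exit_code` is a hypothetical helper, not part of BOINC.

```python
# Partial table of BOINC exit codes. Only 194 is confirmed by this thread;
# 195 and 203 are assumptions recalled from BOINC's error_numbers.h.
BOINC_EXIT_CODES = {
    194: "EXIT_ABORTED_BY_CLIENT",
    195: "EXIT_CHILD_FAILED",
    203: "EXIT_ABORTED_VIA_GUI",
}


def describe_exit_code(code: int) -> str:
    """Format an exit code as decimal plus hex, with a symbolic name if known."""
    name = BOINC_EXIT_CODES.get(code, "(not in this partial table)")
    return f"{code} (0x{code:x}) {name}"


if __name__ == "__main__":
    for code in (194, 195):
        print(describe_exit_code(code))
```

If the recalled table is right, an abort triggered by the user in the BOINC Manager would carry a different code than 194, which would fit the statement above that the task was not aborted by hand.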
Erich56

Message 57624 - Posted: 15 Oct 2021, 4:50:30 UTC - in response to Message 57618.  

> ...
> since it failed with aborted by client I can only assume some kind of issue between BOINC and the app, and the BOINC client itself just killed the task.
>
> (unknown error) - exit code 194 (0xc2)
>
> since all you have is "unknown error" I don't think there's much to run down here.

In a way, I was lucky that this happened after about 1 hour and not, say, after 15 hours or so :-)
ID: 57624 · Rating: 0
Erich56

Message 57695 - Posted: 30 Oct 2021, 17:02:48 UTC

Now a task has failed after about 16 hours, a few minutes before finishing:
https://www.gpugrid.net/result.php?resultid=32658707

very annoying, of course.

Can anyone tell me what was going wrong with this task?
ID: 57695 · Rating: 0
Keith Myers
Joined: 13 Dec 17
Posts: 1419
Credit: 9,119,446,190
RAC: 891
Message 57696 - Posted: 30 Oct 2021, 19:11:06 UTC - in response to Message 57695.  
Last modified: 30 Oct 2021, 19:12:50 UTC

Detected memory leaks!

Error invoking kernel: CUDA_ERROR_UNKNOWN (999)


Probably an error in the VRAM on the card. Try reducing the card temp by moving the fan speed up or reducing any overclocking.
ID: 57696 · Rating: 0
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 428
Message 57698 - Posted: 30 Oct 2021, 20:10:53 UTC - in response to Message 57696.  

> Detected memory leaks!

All Windows users have that report from the app, on perfectly good tasks. I wouldn't worry about that.

> Error invoking kernel: CUDA_ERROR_UNKNOWN (999)

Isn't that what happens after a reboot, particularly after the NVidia driver has been updated by Microsoft / Windows 10?

> Probably an error in the VRAM on the card. Try reducing the card temp by moving the fan speed up or reducing any overclocking.

I think we'd need more evidence before making a leap of interpretation like that.

Is the video card in question driving a monitor? If so, are there any problems with the visible display? Colour blocks, bad pixels, that sort of thing? Have you changed any operating parameters - overclocked? undervolted?
ID: 57698 · Rating: 0
mmonnin

Joined: 2 Jul 16
Posts: 338
Credit: 7,987,341,558
RAC: 259
Message 57700 - Posted: 31 Oct 2021, 16:46:22 UTC - in response to Message 57698.  

> > Detected memory leaks!
>
> All Windows users have that report from the app, on perfectly good tasks. I wouldn't worry about that.
>
> > Error invoking kernel: CUDA_ERROR_UNKNOWN (999)
>
> Isn't that what happens after a reboot, particularly after the NVidia driver has been updated by Microsoft / Windows 10?
>
> > Probably an error in the VRAM on the card. Try reducing the card temp by moving the fan speed up or reducing any overclocking.
>
> I think we'd need more evidence before making a leap of interpretation like that.
>
> Is the video card in question driving a monitor? If so, are there any problems with the visible display? Colour blocks, bad pixels, that sort of thing? Have you changed any operating parameters - overclocked? undervolted?

Windows updates don't include OpenCL, so that's not an issue here at GPUGrid.
ID: 57700 · Rating: 0
jiipee

Joined: 4 Jun 15
Posts: 19
Credit: 8,813,058,416
RAC: 114
Message 57709 - Posted: 2 Nov 2021, 9:09:28 UTC

Is acemd3 for Windows broken? All tasks seem to be failing:

Stderr output

<core_client_version>7.16.20</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code 195 (0xc3)</message>
<stderr_txt>
03:57:10 (9116): wrapper (7.9.26016): starting
03:57:10 (9116): wrapper: running bin/acemd3.exe (--boinc --device 0)
03:57:12 (9116): bin/acemd3.exe exited; CPU time 0.000000
03:57:12 (9116): app exit status: 0xc0000135
03:57:12 (9116): called boinc_finish(195)
0 bytes in 0 Free Blocks.
456 bytes in 4 Normal Blocks.
1144 bytes in 1 CRT Blocks.
0 bytes in 0 Ignore Blocks.
0 bytes in 0 Client Blocks.
Largest number used: 0 bytes.
Total allocations: 120166 bytes.
Dumping objects ->
...
ID: 57709 · Rating: 0
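A note on the log above: the `app exit status` line carries the raw status of `acemd3.exe`, which is more specific than the generic 195 reported by the wrapper. On Windows, 0xc0000135 is the NTSTATUS value STATUS_DLL_NOT_FOUND, which would suggest that the executable could not load a required DLL; nobody in this thread confirms that diagnosis, so treat it as a reading of the code only. The sketch below pulls such lines out of a wrapper log. The helper name and the two-entry table are illustrative, and the table only makes sense for Windows hosts.

```python
import re

# Two well-known Windows NTSTATUS values; this is not a complete table.
NTSTATUS_NAMES = {
    0xC0000135: "STATUS_DLL_NOT_FOUND",
    0xC0000005: "STATUS_ACCESS_VIOLATION",
}

APP_EXIT_RE = re.compile(r"app exit status: (0x[0-9a-fA-F]+)")


def app_exit_statuses(stderr_text: str) -> list[tuple[int, str]]:
    """Return (value, name) for every 'app exit status' line in a wrapper log."""
    results = []
    for match in APP_EXIT_RE.finditer(stderr_text):
        value = int(match.group(1), 16)
        results.append((value, NTSTATUS_NAMES.get(value, "unknown")))
    return results


# Lines copied from the stderr output quoted in the post above.
SAMPLE = """\
03:57:10 (9116): wrapper: running bin/acemd3.exe (--boinc --device 0)
03:57:12 (9116): bin/acemd3.exe exited; CPU time 0.000000
03:57:12 (9116): app exit status: 0xc0000135
03:57:12 (9116): called boinc_finish(195)
"""

if __name__ == "__main__":
    for value, name in app_exit_statuses(SAMPLE):
        print(f"0x{value:08x} {name}")
```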
Erich56

Message 57710 - Posted: 2 Nov 2021, 13:10:11 UTC - in response to Message 57709.  

> Is acemd3 for Windows broken? All tasks seem to be failing:

How come you are receiving tasks at all? There have not been any new ones available for several days.
ID: 57710 · Rating: 0
Keith Myers
Message 57717 - Posted: 2 Nov 2021, 16:52:10 UTC - in response to Message 57710.  

Check your preferences. I have been getting work every day, as have others.
ID: 57717 · Rating: 0
zooxit

Joined: 4 Jul 21
Posts: 23
Credit: 12,095,488,127
RAC: 0
Message 57791 - Posted: 10 Nov 2021, 18:40:35 UTC

Hi,
so what is the answer to this thread's title question (well, actually it is a statement :) )?
Why are Python apps failing after 2-4 seconds?
Should I install something on my machine (running Debian 11)?
ID: 57791 · Rating: 0
Keith Myers
Message 57793 - Posted: 10 Nov 2021, 19:07:39 UTC - in response to Message 57791.  

The tasks are beta and the scientists are still debugging the configuration parameters. Errors are to be expected.

If you have a task error, look at the task ID in your Tasks list and see if the task has been sent to many other hosts that have also errored out on it. If so, everything is normal.

However, if the wingmen for the task have completed it successfully, you need to look at the stderr.txt output of the task in the list, read to the end, and see what kind of error was generated. If the error is local, you might be able to do something about it by restarting the host or updating the video drivers.

You neither can nor need to do anything else, such as downloading libraries, because each task is bundled with exactly the resources it needs to complete successfully. Or at least in theory. Again, these are beta tasks and are still being debugged.
ID: 57793 · Rating: 0
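The rule of thumb in the post above can be written down as a small decision function. This is only a sketch of that reasoning, not a tool that talks to the project server: `HostResult`, `triage`, the outcome strings, and the threshold of two other failures are all assumptions made for illustration, and the outcomes would have to be read by hand from the workunit page.

```python
from dataclasses import dataclass


@dataclass
class HostResult:
    host: str      # hypothetical host label, e.g. "mine"
    outcome: str   # "success", "error", or "in progress"


def triage(my_host: str, results: list[HostResult], min_other_errors: int = 2) -> str:
    """Compare your own failure with what the other hosts reported for the same task."""
    others = [r for r in results if r.host != my_host]
    if any(r.outcome == "success" for r in others):
        return "local"       # a wingman finished it: read your own stderr to the end
    other_errors = sum(r.outcome == "error" for r in others)
    if other_errors >= min_other_errors:
        return "task"        # several hosts failed too: nothing to fix on your side
    return "undetermined"    # not enough reports from other hosts yet


if __name__ == "__main__":
    reports = [
        HostResult("mine", "error"),
        HostResult("host-b", "success"),
    ]
    print(triage("mine", reports))
```

The "local" branch is checked first on purpose: a single successful wingman is stronger evidence than any number of failures elsewhere.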
zooxit

Message 57807 - Posted: 11 Nov 2021, 20:11:40 UTC

Thanks @KeithMyers for the tips.

I checked the last 20 tasks I got and all of them failed (each after 3 seconds). They were all 'solved' by another host shortly thereafter, so...

Everything is updated on my host. It is Debian bullseye, though; the computers that finished the tasks mostly seemed to be running Ubuntu 20.04 LTS. But that is probably not the cause.

My STDERR says:
INTERNAL ERROR: cannot create temporary directory!
Might that be a permissions problem?


----------------------------------------------
The full STDERR:

<core_client_version>7.16.16</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
14:26:46 (1648902): wrapper (7.7.26016): starting
14:26:46 (1648902): wrapper (7.7.26016): starting
14:26:46 (1648902): wrapper: running /usr/bin/flock (/var/lib/boinc-client/projects/www.gpugrid.net/miniconda.lock -c "/bin/bash ./miniconda-installer.sh -b -u -p /var/lib/boinc-client/projects/www.gpugrid.net/miniconda &&
/var/lib/boinc-client/projects/www.gpugrid.net/miniconda/bin/conda install -m -y -p gpugridpy --file requirements.txt ")
[1648927] INTERNAL ERROR: cannot create temporary directory!
[1648931] INTERNAL ERROR: cannot create temporary directory!
14:26:47 (1648902): /usr/bin/flock exited; CPU time 0.139614
14:26:47 (1648902): app exit status: 0x1
14:26:47 (1648902): called boinc_finish(195)
ID: 57807 · Rating: 0
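One way to test the permissions theory raised above is to try creating a temporary directory by hand. The sketch below does that from Python with the standard `tempfile` module. It checks the same condition the error message describes, but it is not the code path the installer uses, and the function name is made up for this example. The result is only meaningful when the script runs as the same user and under the same restrictions as the BOINC client; a run from an interactive shell may succeed even though the service fails.

```python
import os
import tempfile


def can_create_temp_dir(parent=None):
    """Try to create and then remove a temporary directory under `parent`.

    With parent=None, Python picks its default location (TMPDIR, TEMP, TMP,
    then platform defaults such as /tmp). Returns a tuple (ok, detail).
    """
    try:
        path = tempfile.mkdtemp(dir=parent)
    except OSError as exc:
        # Covers a missing directory, missing permissions, a read-only file system, etc.
        return False, f"{type(exc).__name__}: {exc}"
    os.rmdir(path)
    return True, path


if __name__ == "__main__":
    ok, detail = can_create_temp_dir()
    print("temporary directory OK:" if ok else "temporary directory FAILED:", detail)
```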
ServicEnginIC
Joined: 24 Sep 10
Posts: 592
Credit: 11,972,186,510
RAC: 1,447
Message 57808 - Posted: 11 Nov 2021, 21:08:14 UTC - in response to Message 57807.  
Last modified: 11 Nov 2021, 21:09:22 UTC

> My STDERR says:
> INTERNAL ERROR: cannot create temporary directory!
> Might that be a permissions problem?

The same problem was addressed in Message #55986.
A workaround is detailed there; you may want to try it.
ID: 57808 · Rating: 0
zooxit

Message 57811 - Posted: 12 Nov 2021, 9:30:41 UTC

Thanks!
So, I tried:
sudo systemctl edit boinc-client.service
and added:
[Service]
PrivateTmp=true
then rebooted

Waiting for tasks now to see if it works...
ID: 57811 · Rating: 0
ServicEnginIC
Message 57816 - Posted: 12 Nov 2021, 18:44:17 UTC - in response to Message 57811.  

All right.
If that was it, you're done.
Now it's time to patiently wait for new Python WUs...
ID: 57816 · Rating: 0
Keith Myers
Message 57820 - Posted: 12 Nov 2021, 20:10:34 UTC

The bug where all tasks always ran on device #0 was fixed this morning.
It should be smooth sailing from now on for Python tasks.
ID: 57820 · Rating: 0
ServicEnginIC
Message 57822 - Posted: 13 Nov 2021, 0:19:07 UTC - in response to Message 57811.  

If it is still failing, please double-check that your boinc-client.service looks similar to this:

[Screenshot not preserved.]

After adding the stated lines, it is necessary to save the changes with Ctrl + O, confirm with Enter, exit with Ctrl + X, and then reboot.
(Excuse that the menus are shown in the Spanish version :-)
ID: 57822 · Rating: 0
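Besides looking at the override file in the editor, the effective setting can be read back from systemd. The sketch below wraps `systemctl show` from Python and parses its `KEY=VALUE` output. The unit name is the one used in this thread; the helper names are made up for this example, and the script assumes a host where `systemctl` is available.

```python
import subprocess


def parse_show_output(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines as printed by `systemctl show`."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines without '='
            props[key] = value
    return props


def query_unit_properties(unit: str, *names: str) -> dict[str, str]:
    """Ask systemd for the effective values of the given unit properties."""
    cmd = ["systemctl", "show", unit]
    for name in names:
        cmd += ["--property", name]
    done = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return parse_show_output(done.stdout)


# Example call, to be run on the BOINC host itself:
#   query_unit_properties("boinc-client.service", "PrivateTmp", "DropInPaths")
```

If the override was picked up, the `PrivateTmp` entry should reflect it and `DropInPaths` should list the override file. As far as I know, `systemctl edit` reloads the unit definition on its own, so restarting the service with `sudo systemctl restart boinc-client.service` may be enough instead of a full reboot.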
