Posts by TofPete

1) Message boards : Number crunching : ACEMD3 High error rates (Message 62189)
Posted 2 Feb 2025 by TofPete
Post:
I don't think swapping the memory sticks would change anything.
If there were a hardware memory problem, other applications would have problems as well.
2) Message boards : Number crunching : ACEMD3 High error rates (Message 62175)
Posted 28 Jan 2025 by TofPete
Post:
These crashes are not from my machine.

My machine sent another one this morning:
https://www.gpugrid.net/result.php?resultid=38275319

As far as I can see, it crashed 5 seconds after the task started, and the logs give no reason for the crash (no memory leak entry as before).
The task was not interrupted, CPU and GPU temperatures are normal, memory is OK, and the pagefile size was increased as suggested.

The only thing it has in common with the other crashes you mentioned is that my computer also runs Windows 10.

Any other ideas?


Two crashes from today
https://www.gpugrid.net/result.php?resultid=38242445
https://www.gpugrid.net/result.php?resultid=38238912

All crashes from acemd3 in the last few days have the same problem signature:
Problem Event Name: APPCRASH
Application Name: acemd.exe
Application Version: 0.0.0.0
Application Timestamp: 66e42355
Fault Module Name: acemd.exe
Fault Module Version: 0.0.0.0
Fault Module Timestamp: 66e42355
Exception Code: c0000005
Exception Offset: 0000000000075b6f

All crashes are on Windows 10. The tasks on WSL2 are running.
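
For reference, the exception code in the signature above, c0000005, is the NTSTATUS value STATUS_ACCESS_VIOLATION: the process touched memory it was not allowed to access. That alone does not say whether the cause is the application or the hardware. Below is a minimal Python sketch for decoding such a code; the function name and the small lookup table are illustrative assumptions, not part of any Windows or BOINC tool.

```python
# Illustrative lookup of a few well-known NTSTATUS exception codes.
# The table is deliberately tiny; anything else is reported as unknown.
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
    0xC0000409: "STATUS_STACK_BUFFER_OVERRUN",
}


def describe_exception_code(hex_text: str) -> str:
    """Turn a hex string from a crash report into a readable name."""
    code = int(hex_text, 16)
    name = KNOWN_NTSTATUS.get(code, "unknown code")
    return f"0x{code:08X} ({name})"


# Exception code copied from the crash signature above.
print(describe_exception_code("c0000005"))
```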
3) Message boards : Number crunching : ACEMD3 High error rates (Message 62173)
Posted 25 Jan 2025 by TofPete
Post:
Thanks for the replies.

Based on this report, I don't think the problem is with my computer, because many hosts have a similar problem.

However, I checked my computer to see whether everything is fine with it:

    * GPU temperature is fine, the maximum is 76 °C
    * CPU temperature is also fine, the maximum is 55 °C
    * pagefile size has just been increased, we will see if it helps...
    * memory seems fine: I don't experience any problems with other apps that could be caused by a memory issue, but I will run a memtest
    * I don't usually interrupt the calculation (BOINC is set to always run) and my computer is on all day, but I will keep an eye on this too

Anyway, this could also be a bug, because other hosts are affected as well and I cannot imagine that all of these computers have memory problems.
Is it possible to ask the ACEMD developers to check the code in parallel?

4) Message boards : Number crunching : ACEMD3 High error rates (Message 62166)
Posted 23 Jan 2025 by TofPete
Post:
I understand this, but my problem is that I don't know which settings need to be changed to fix these failures.

Sometimes I only lose a few minutes, but some tasks ran for about 9800 seconds before failing.

I think that 32 GB of RAM, a 3 GHz CPU and an Nvidia GTX 1050 Ti with 4 GB of VRAM should be enough for this kind of task. I run my CPU and video card at their default settings; I don't use overclocking, etc. And the strange thing is that the problem occurs only with ACEMD3 tasks.

And the logs say that there were memory leaks.
Why?
What settings should I change to prevent the leaks?

My host has been crunching these units non-stop without issue.

I initially thought that perhaps this was a Windows-specific problem, since that was the common factor among the people in this thread who complained and had their hosts visible. However, the work units in question were re-assigned to other Windows hosts, and those others processed their respective tasks to completion without issue.

The other possibility that came to mind was the task failing due to running out of VRAM. It's a possibility (especially if the host is being used for other graphical things or is simultaneously running other GPU work units), but in past instances I've seen of this, the task failed with a specific out-of-memory message.

Given that these tasks aren't universally failing, there's also the possibility that those who are reporting issues simply have failing or unreliable hardware. A quick (if perhaps disruptive) way of testing this is to power- or clock-limit the GPU and see if the errors stop.

5) Message boards : Number crunching : ACEMD3 High error rates (Message 62165)
Posted 23 Jan 2025 by TofPete
Post:
I looked at the setting mentioned and it is already checked.

How can I "unhide" my host to see more details about this problem?

Log in to your home page on this website (https://www.gpugrid.net/myaccount.php).

Under 'Preferences', choose GPUGRID preferences (https://www.gpugrid.net/prefs.php?subset=project).

[don't worry about the error messages - it still works]

Edit the top group - 'Primary (default) preferences'.

Check 'Should GPUGRID show your computers on its web site?' and update. That's all.

6) Message boards : Number crunching : ACEMD3 High error rates (Message 62160)
Posted 22 Jan 2025 by TofPete
Post:
I have the same problem: the error rate is about 50% (16 errors / 33 total tasks), which is annoying!
I use this computer for other computing projects as well, but the errors occur only with the ACEMD 3 tasks of GPUGrid.

I can see an unknown error and a memory leak in the logs:
https://www.gpugrid.net/result.php?resultid=37934054

All operating system and graphics card driver updates are installed on my machine, so I don't know what else I can do to fix these memory leak errors.

How can I "unhide" my host to see more details about this problem?


Unhide your hosts so that the whole error can be seen. Any "195" code is not helpful; that's just the generic error from the BOINC app or wrapper. The actual reason for the failure could be buried deeper in the stderr output and could very well be related to your hardware and software configuration (such as incorrect drivers).

7) Message boards : Number crunching : No new WU's due to limitation on tasks (Message 62044)
Posted 16 Dec 2024 by TofPete
Post:
Same deadlock for me!
Waiting for someone to clean out the server's disk...

With the server disk space problem, is there any way to get around the limitation on tasks that can be performed? I get the "This computer has reached a limit on tasks in progress" error since my other WUs are waiting to upload. I did a search but didn't find anything.

Is the alternative to not get any more WUs for now and wait until the problem is fixed?

Thanks!

8) Message boards : Number crunching : ACEMD3 High error rates (Message 62012)
Posted 10 Dec 2024 by TofPete
Post:
I have the same problem.

Most of the acemd3 tasks failed due to a memory leak or an unknown error.

It's a bit annoying that 32 of my 54 most recent tasks failed.
That's a failure rate of 59%.

Can anyone help me solve this?
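
The percentage quoted above is simply the number of failed tasks divided by the total. A small Python sketch of that arithmetic, with a hypothetical helper name, in case anyone wants to track the rate over time:

```python
def failure_rate(failed: int, total: int) -> float:
    """Return the share of failed tasks as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * failed / total


# Figures quoted in the post above: 32 failures out of 54 tasks.
print(f"{failure_rate(32, 54):.0f} %")
```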

9) Message boards : Number crunching : ACEMD 3 error: Unsupported PRMTOP version (Message 61824)
Posted 23 Sep 2024 by TofPete
Post:
Hi,

I received errors for some of the ACEMD 3 tasks recently with the following error message:
Stderr output
<core_client_version>8.0.2</core_client_version>
<![CDATA[
<message>
Incorrect function.
 (0x1) - exit code 1 (0x1)</message>
<stderr_txt>
16:01:42 (35604): wrapper (7.9.26016): starting
16:01:42 (35604): wrapper: running bin/acemd.exe (--boinc --device 0)

</stderr_txt>
]]>


Any idea?
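
A note on reading this output: "Incorrect function." is the standard Windows message text for error code 1, so the message line appears to say only that the application exited with status 1, and the stderr section contains nothing beyond the wrapper start-up lines. The following Python sketch pulls the exit code out of such a message block. The regular expression and function name are my own assumptions based on the two samples in this thread, not an official BOINC format description.

```python
import re

# Matches the "exit code N (0xHH)" fragment seen in the <message> element.
EXIT_CODE_RE = re.compile(r"exit code (\d+) \((0x[0-9a-fA-F]+)\)")


def extract_exit_code(stderr_output: str):
    """Return the decimal exit code, or None if the text has none."""
    match = EXIT_CODE_RE.search(stderr_output)
    if match is None:
        return None
    decimal, hexadecimal = int(match.group(1)), int(match.group(2), 16)
    # Both notations describe the same number; a mismatch means garbled input.
    if decimal != hexadecimal:
        raise ValueError("decimal and hex exit codes disagree")
    return decimal


sample = """<message>
Incorrect function.
 (0x1) - exit code 1 (0x1)</message>"""
print(extract_exit_code(sample))
```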
10) Message boards : Number crunching : ATMML (Message 61816)
Posted 18 Sep 2024 by TofPete
Post:
Hi,

Why have I recently been receiving error messages like this in ATMML tasks?

Stderr output
<core_client_version>8.0.2</core_client_version>
<![CDATA[
<message>
(unknown error) (0) - exit code 195 (0xc3)</message>
<stderr_txt>
09:59:48 (19024): wrapper (7.9.26016): starting
09:59:48 (19024): wrapper: running Library/usr/bin/tar.exe (xjvf input.tar.bz2)
aceforce_dft_v0.4.ckpt
11) Message boards : News : Discord channel for GPUGRID (Message 61802)
Posted 13 Sep 2024 by TofPete
Post:
Unable to accept invite. :(

Hi,
I have created a discord channel for GPUGRID. It kind of duplicates this in some aspects but maybe people are more used to Discord.

It is an additional channel that may help to build up the community.

JOIN:
https://discord.gg/abpWXawZ7v

GDF

12) Message boards : News : Experimental Python tasks (beta) - task description (Message 61749)
Posted 28 Aug 2024 by TofPete
Post:
Thank you

Those describe themselves as ATMML tasks - the clue is in the name.

There's been a major problem with ATMML tasks in the last 24 hours - all workunits created since around 13:00 UTC yesterday have a systemic failure which causes them to fail very early.

That's the project's problem, not your problem.

13) Message boards : News : Experimental Python tasks (beta) - task description (Message 61746)
Posted 28 Aug 2024 by TofPete
Post:
I think it's a Python task because the error message refers to a Python problem:
Fatal Python error: init_fs_encoding: failed to get the Python codec of the filesystem encoding
Python runtime state: core initialized
ModuleNotFoundError: No module named 'encodings'


I got these tasks today and in recent days:
Task received (UTC) | Computing status | Runtime (s) | Application name
28 Aug 2024 8:31:51 UTC | Error while computing | 672.11 | ATMML: Free energy with neural networks v1.01 (cuda1121)
28 Aug 2024 8:03:54 UTC | Error while computing | 703.63 | ATMML: Free energy with neural networks v1.01 (cuda1121)
28 Aug 2024 7:34:52 UTC | Error while computing | 708.96 | ATMML: Free energy with neural networks v1.01 (cuda1121)
28 Aug 2024 7:20:30 UTC | Error while computing | 714.93 | ATMML: Free energy with neural networks v1.01 (cuda1121)
28 Aug 2024 8:17:39 UTC | Error while computing | 709.18 | ATMML: Free energy with neural networks v1.01 (cuda1121)
28 Aug 2024 7:49:20 UTC | Error while computing | 724.49 | ATMML: Free energy with neural networks v1.01 (cuda1121)
27 Aug 2024 9:35:49 UTC | Error while computing | 776.90 | ATMML: Free energy with neural networks v1.01 (cuda1121)
27 Aug 2024 1:24:00 UTC | Error while computing | 60.60 | ATMML: Free energy with neural networks v1.01 (cuda1121)
26 Aug 2024 9:41:56 UTC | Error while computing | 20.18 | ATMML: Free energy with neural networks v1.01 (cuda1121)
14) Message boards : News : Experimental Python tasks (beta) - task description (Message 61744)
Posted 28 Aug 2024 by TofPete
Post:
Hi,

I'm receiving the following error message after about 700-800 seconds of running time:
09:33:09 (32292): Library/usr/bin/tar.exe exited; CPU time 0.000000
09:33:09 (32292): wrapper: running C:/Windows/system32/cmd.exe (/c call Scripts\activate.bat && Scripts\conda-unpack.exe && run.bat)
Could not find platform independent libraries <prefix>
Python path configuration:
  PYTHONHOME = (not set)
  PYTHONPATH = (not set)
  program name = '\\?\D:\ProgramData\BOINC\slots\4\python.exe'
  isolated = 0
  environment = 1
  user site = 1
  safe_path = 0
  import site = 1
  is in build tree = 0
  stdlib dir = 'D:\ProgramData\BOINC\slots\4\Lib'
  sys._base_executable = '\\\\?\\D:\\ProgramData\\BOINC\\slots\\4\\python.exe'
  sys.base_prefix = 'D:\\ProgramData\\BOINC\\slots\\4'
  sys.base_exec_prefix = 'D:\\ProgramData\\BOINC\\slots\\4'
  sys.platlibdir = 'DLLs'
  sys.executable = '\\\\?\\D:\\ProgramData\\BOINC\\slots\\4\\python.exe'
  sys.prefix = 'D:\\ProgramData\\BOINC\\slots\\4'
  sys.exec_prefix = 'D:\\ProgramData\\BOINC\\slots\\4'
  sys.path = [
    'D:\\ProgramData\\BOINC\\slots\\4\\python311.zip',
    'D:\\ProgramData\\BOINC\\slots\\4\\DLLs',
    'D:\\ProgramData\\BOINC\\slots\\4\\Lib',
    '\\\\?\\D:\\ProgramData\\BOINC\\slots\\4',
  ]
Fatal Python error: init_fs_encoding: failed to get the Python codec of the filesystem encoding
Python runtime state: core initialized
ModuleNotFoundError: No module named 'encodings'

Current thread 0x000058b0 (most recent call first):
  <no Python frame>
09:33:10 (32292): C:/Windows/system32/cmd.exe exited; CPU time 0.000000
09:33:10 (32292): app exit status: 0x1
09:33:10 (32292): called boinc_finish(195)


Any idea why this error has started happening recently?

Thanks,

Peter
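
The "No module named 'encodings'" error in the log above means the interpreter could not find the standard-library `encodings` package in any of the listed `sys.path` entries, which would fit an incompletely unpacked Python environment in the slot directory. Here is a rough Python sketch for checking a list of path entries for that package. The function names are hypothetical, and the check only looks for `encodings/__init__.py` in a directory or inside a zip archive, so treat it as a quick diagnostic rather than a full reimplementation of Python's import logic.

```python
import sys
import zipfile
from pathlib import Path


def has_encodings(entry: str) -> bool:
    """Check one sys.path entry for the standard-library 'encodings' package."""
    path = Path(entry)
    if path.is_dir():
        return (path / "encodings" / "__init__.py").is_file()
    if path.is_file() and zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as archive:
            names = set(archive.namelist())
        return bool({"encodings/__init__.py", "encodings/__init__.pyc"} & names)
    return False


def entries_with_encodings(sys_path):
    """Return the entries from which 'encodings' could be imported."""
    return [entry for entry in sys_path if has_encodings(entry)]


# On a working interpreter this lists at least one entry.
print(entries_with_encodings(sys.path))
```

Running `entries_with_encodings` on the four entries printed under `sys.path` in the log (for example `D:\ProgramData\BOINC\slots\4\Lib`) while a task is active would show whether the package is really missing; an empty list would match the error.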
15) Message boards : Number crunching : Computation error (Message 61652)
Posted 7 Aug 2024 by TofPete
Post:
Hi,

Recently, I have been receiving a computation error for all of my tasks:
https://www.gpugrid.net/result.php?resultid=35634634

I can see only the following error in the BOINC logs:
07/08/2024 17:07:19 | GPUGRID | [error] Can't rename output file slots/5/progress.log to projects/www.gpugrid.net/e16s9_e12s1p0f183-ADRIA_Explor_srcpp1_e2t_25ns_pp1coor_v2_10us_b0-0-1-RND8727_0_0: Error 32


Earlier there was no problem with GPUGrid tasks, but now I receive an error every time.
What could cause this?

Regards,

TofPeter
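
If the 32 in the log line above is a Win32 error code (an assumption; the log itself does not say so), it would correspond to ERROR_SHARING_VIOLATION, which means another process still had the file open when the rename was attempted. The Python sketch below extracts the source, destination and error number from such a line. The regular expression, the function name and the short code table are illustrative assumptions, not a BOINC API.

```python
import re

# Assumption: the trailing number is a Win32 error code. Only a few
# common file-related codes are listed here for illustration.
WIN32_FILE_ERRORS = {
    2: "ERROR_FILE_NOT_FOUND",
    5: "ERROR_ACCESS_DENIED",
    32: "ERROR_SHARING_VIOLATION",
}

RENAME_RE = re.compile(
    r"Can't rename output file (?P<src>\S+) to (?P<dst>\S+): Error (?P<code>\d+)"
)


def explain_rename_error(log_line: str):
    """Return (source, destination, error name), or None if no match."""
    match = RENAME_RE.search(log_line)
    if match is None:
        return None
    code = int(match.group("code"))
    name = WIN32_FILE_ERRORS.get(code, f"unknown ({code})")
    return match.group("src"), match.group("dst"), name
```

If the sharing-violation reading is right, one thing worth checking is whether some other program has `progress.log` open in the slot directory when the task finishes.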




©2026 Universitat Pompeu Fabra