ACEMD3 High error rates

TofPete

Message 62173 - Posted: 25 Jan 2025, 11:29:35 UTC - in response to Message 62172.  

Thanks for the replies.

Based on this report, I don't think the problem is with my computer, because many other hosts show a similar problem.

However, I checked my computer to make sure everything is fine with it:

    * GPU temperature is fine, the maximum is 76 °C
    * CPU temperature is also fine, the maximum is 55 °C
    * the pagefile size has just been increased, we will see if it helps...
    * memory seems fine: I don't see any problems in other apps that could be caused by a memory issue, but I will run a memtest
    * I usually don't interrupt the calculation (BOINC is set to always run) and my computer is on all day, but I will keep an eye on this too



Anyway, this could also be a bug, because other hosts are affected as well and I cannot imagine that all of these computers have memory problems.
Is it possible to ask the ACEMD developers to check the code in the meantime?

bibi

Message 62174 - Posted: 27 Jan 2025, 12:04:21 UTC
Last modified: 27 Jan 2025, 12:11:09 UTC

Two crashes from today
https://www.gpugrid.net/result.php?resultid=38242445
https://www.gpugrid.net/result.php?resultid=38238912

All acemd3 crashes in the last few days have the same problem signature:
Problem Event Name: APPCRASH
Application Name: acemd.exe
Application Version: 0.0.0.0
Application Timestamp: 66e42355
Fault Module Name: acemd.exe
Fault Module Version: 0.0.0.0
Fault Module Timestamp: 66e42355
Exception Code: c0000005
Exception Offset: 0000000000075b6f

All crashes are on Windows 10. The tasks under WSL2 keep running.
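
For anyone who wants to read the signature above: a minimal Python sketch that decodes the exception code and the timestamp field. It assumes the timestamp is the usual PE header link time stored as Unix epoch seconds (some reproducible builds put a hash there instead, so treat the date as a hint only). The function names and the small lookup table are made up for this example; only the hex values come from the crash report.

```python
from datetime import datetime, timezone

# A few well-known NTSTATUS values; this table is illustrative, not complete.
NTSTATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC000001D: "STATUS_ILLEGAL_INSTRUCTION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
}


def decode_exception_code(hex_code: str) -> str:
    """Map a hex exception code from a crash report to a symbolic name."""
    code = int(hex_code, 16)
    return NTSTATUS_NAMES.get(code, f"unknown (0x{code:08X})")


def decode_pe_timestamp(hex_stamp: str) -> datetime:
    """Interpret the timestamp as Unix epoch seconds (an assumption)."""
    return datetime.fromtimestamp(int(hex_stamp, 16), tz=timezone.utc)


if __name__ == "__main__":
    print(decode_exception_code("c0000005"))
    print(decode_pe_timestamp("66e42355").isoformat())
```

c0000005 is an access violation, i.e. the process read or wrote memory it was not allowed to touch. The same offset on every crash suggests the same instruction fails each time, which fits a software bug better than random hardware faults, but it does not prove it.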
TofPete

Message 62175 - Posted: 28 Jan 2025, 11:03:17 UTC - in response to Message 62174.  

These crashes are not from my machine.

My machine sent another one this morning:
https://www.gpugrid.net/result.php?resultid=38275319

As far as I can see, it crashed 5 seconds after the task started, and the logs give no reason for the crash (no memory leak entry as before).
The task was not interrupted, CPU and GPU temperatures are normal, memory is OK, and the pagefile size was increased as suggested.

The only thing it has in common with the other crashes you mentioned is that my computer also runs Windows 10.

Any other ideas?

KeithBriggs

Message 62176 - Posted: 28 Jan 2025, 16:51:27 UTC - in response to Message 62175.  

My observations: the last 2 failed runs ran for less than 3 minutes. You might just not worry about it. Neither of those 2 did any GPU processing, so they failed either during the initial CPU stage or while the work was being handed over to the GPU. Your GPU would be downright cold at that point in the process. I set my app_config to reserve a full 1.0 CPU for each task and I don't run CPU projects. Maybe those things don't help at all, but it seemed to me that I had more errors when I didn't. I have only been skimming the thread, so this may be a repeat: did you swap memory stick positions?
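
The app_config setting mentioned above could look roughly like this. This is a hedged Python sketch that only builds the text of a BOINC app_config.xml; the app name "acemd3" is an assumption taken from the thread title (check the real short name in your client's event log), and build_app_config is a hypothetical helper, not part of BOINC.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, cpu_usage: float = 1.0,
                     gpu_usage: float = 1.0) -> str:
    """Return app_config.xml text reserving cpu_usage CPUs per GPU task."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name  # assumed app short name
    gpu_versions = ET.SubElement(app, "gpu_versions")
    ET.SubElement(gpu_versions, "gpu_usage").text = str(gpu_usage)
    ET.SubElement(gpu_versions, "cpu_usage").text = str(cpu_usage)
    if hasattr(ET, "indent"):  # pretty-print where available (Python 3.9+)
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Print the XML; saving it into the project directory is left to the user.
    print(build_app_config("acemd3"))
```

The resulting file would go into the GPUGRID project folder inside the BOINC data directory, and the client has to re-read its config files (or be restarted) before the setting applies.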
TofPete

Message 62189 - Posted: 2 Feb 2025, 16:14:00 UTC - in response to Message 62176.  

I don't think that swapping the memory sticks would change anything.
If there were a hardware memory problem, other applications would show problems as well.

©2025 Universitat Pompeu Fabra