ACEMD3 High error rates
| Author | Message |
|---|---|
|
Joined: 17 Nov 15 Posts: 14 Credit: 136,767,025 RAC: 0
|
Looking at my host, 6 of 15 tasks failed (40%):
* https://www.gpugrid.net/result.php?resultid=36785317
* https://www.gpugrid.net/result.php?resultid=36785243
* https://www.gpugrid.net/result.php?resultid=36774440
* https://www.gpugrid.net/result.php?resultid=36774387
* https://www.gpugrid.net/result.php?resultid=36774151
* https://www.gpugrid.net/result.php?resultid=36759049
|
|
Joined: 21 Feb 09 Posts: 1 Credit: 42,661,435 RAC: 19
|
I have the same problem |
|
Joined: 13 Dec 17 Posts: 1419 Credit: 9,119,446,190 RAC: 891
|
> I have the same problem

You do not have the same problem referenced in this thread, since you haven't run any acemd3 tasks. All your errors are from ATMML tasks. |
|
Joined: 17 Nov 15 Posts: 14 Credit: 136,767,025 RAC: 0
|
I upgraded my Game Ready drivers to v566.14 just in case. I am now at 15 errors out of 37 ACEMD3 tasks, so still about 40%. The jobs fail rather early, so "this is fine", but it is still a waste of resources. If I can provide anything to help debug this, please let me know. |
|
Joined: 17 Mar 24 Posts: 15 Credit: 63,874,103 RAC: 0
|
I have the same problem. Most of the acemd3 tasks failed due to a memory leak or an unknown error:
* memory leak: https://www.gpugrid.net/result.php?resultid=37054183
* unknown error: https://www.gpugrid.net/result.php?resultid=37053274
|
den777 Joined: 29 Apr 13 Posts: 1 Credit: 71,060,506 RAC: 0
|
Same problem here. The worst thing is that Windows shows a popup about a memory access violation, and until I manually click OK, the task won't finish and just sits idle. |
Michael H.W. Weber Joined: 9 Feb 16 Posts: 78 Credit: 656,229,684 RAC: 0
|
Same problem - two error messages, the first being the major one:
(unknown error) (0) - exit code 195 (0xc3)
(unknown error) (87) - exit code 195 (0xc3)

Michael.
President of Rechenkraft.net - Germany's first and largest distributed computing organization. |
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 6,423
|
> Same problem - two error messages, the first being the major one:

Unhide your hosts so that the whole error can be seen. Any "195" code is not helpful; that's just the generic error from the BOINC app or wrapper. The actual reason for the failure is probably buried deeper in the stderr output and could very well be related to your hardware and software configuration (such as incorrect drivers).
|
|
Joined: 17 Mar 24 Posts: 15 Credit: 63,874,103 RAC: 0
|
I have the same problem: the error rate is about 50% (16 errors out of 33 tasks), which is annoying! I use this computer for other computing projects as well, but the errors occur only with GPUGrid's ACEMD 3 tasks. I can see an unknown error and a memory leak in the logs: https://www.gpugrid.net/result.php?resultid=37934054

All operating system and graphics card driver updates are installed on my machine, so I don't know what else I can do to solve these memory leak errors. How can I "unhide" my host to see more details about this problem?
|
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 428
|
> How can I "unhide" my host to see more details about this problem?

Log in to your home page on this website (https://www.gpugrid.net/myaccount.php).
Under 'Preferences', choose GPUGRID preferences (https://www.gpugrid.net/prefs.php?subset=project). [Don't worry about the error messages - it still works.]
Edit the top group - 'Primary (default) preferences'.
Check 'Should GPUGRID show your computers on its web site?' and update.
That's all. |
|
Joined: 22 Sep 24 Posts: 9 Credit: 195,120,851 RAC: 0
|
My host has been crunching these units non-stop without issue.

I initially thought that this might be a Windows-specific problem, since Windows was the common factor among the people in this thread who complained and had their hosts visible. However, the work units in question were re-assigned to other Windows hosts, and those hosts processed their tasks to completion without issue.

The other possibility that came to mind was the task failing because it ran out of VRAM. That could happen (especially if the host is being used for other graphical work or is simultaneously running other GPU work units), but in the past instances I've seen, the task failed with a specific out-of-memory message.

Given that these tasks aren't universally failing, there's also the possibility that those reporting issues simply have failing or unreliable hardware. A quick (if perhaps disruptive) way of testing this is to power- or clock-limit the GPU and see if the errors stop. |
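To make the VRAM and power-limit checks above concrete, here is a minimal Python sketch that reads memory use and power figures from `nvidia-smi`. It assumes the NVIDIA driver's `nvidia-smi` tool is on the PATH; the helper names (`parse_gpu_line`, `vram_fraction_used`, `query_gpus`) are made up for this illustration, and some cards report `[N/A]` for the power fields, which the sketch maps to `None`.

```python
import shutil
import subprocess

# Fields requested from nvidia-smi (memory in MiB, power in watts).
QUERY = "memory.used,memory.total,power.draw,power.limit"


def parse_gpu_line(line: str) -> dict:
    """Parse one CSV line of nvidia-smi output for the QUERY fields."""
    result = {}
    for key, value in zip(QUERY.split(","), line.split(",")):
        try:
            result[key] = float(value.strip())
        except ValueError:
            result[key] = None  # e.g. "[N/A]" on cards without power sensors
    return result


def vram_fraction_used(stats: dict):
    """Return used/total VRAM as a fraction, or None if it is unavailable."""
    used, total = stats.get("memory.used"), stats.get("memory.total")
    if used is None or not total:
        return None
    return used / total


def query_gpus() -> list:
    """Return one stats dict per GPU, or an empty list if nvidia-smi is missing."""
    if shutil.which("nvidia-smi") is None:
        return []
    proc = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        return []
    return [parse_gpu_line(l) for l in proc.stdout.splitlines() if l.strip()]


if __name__ == "__main__":
    for index, stats in enumerate(query_gpus()):
        fraction = vram_fraction_used(stats)
        vram = "n/a" if fraction is None else f"{fraction:.0%}"
        print(f"GPU {index}: VRAM used {vram}, power limit {stats['power.limit']} W")
```

Running this while an ACEMD3 task is active gives a rough idea of how much VRAM headroom is left. Lowering the power limit itself is normally done with `nvidia-smi -pl <watts>` from an administrator prompt, on cards that support it.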
|
Joined: 7 Oct 13 Posts: 5 Credit: 1,077,934,108 RAC: 207
|
My failure rate ranges from 1-2 to 7-9 ACEMD 3 tasks a day. Tasks usually fail within a few minutes of starting, so not many resources are wasted, though none would be better. I also noticed that the GPU time and the run time are similar, differing by a hundred seconds or so. Is this normal? |
|
Joined: 17 Mar 24 Posts: 15 Credit: 63,874,103 RAC: 0
|
> How can I "unhide" my host to see more details about this problem?

I checked the setting mentioned and it is already enabled. |
|
Joined: 17 Mar 24 Posts: 15 Credit: 63,874,103 RAC: 0
|
> My host has been crunching these units non-stop without issue.

I understand this, but my problem is that I don't know which settings need to be changed to stop these failures. Sometimes I only lose several minutes, but some tasks ran for about 9800 seconds before failing. I think that 32 GB RAM, a 3 GHz CPU and an Nvidia GTX 1050 Ti with 4 GB VRAM should be enough for this kind of task. I run my CPU and video card at their normal settings; I don't use overclocking, etc. The strange thing is that the problem occurs only with ACEMD3 tasks, and the logs say that there were memory leaks. Why? What settings should I change to prevent the leaks? |
|
Joined: 15 Jul 20 Posts: 95 Credit: 2,550,803,412 RAC: 248
|
Hello, you should try increasing the virtual memory, i.e. the pagefile size in Windows, to around 50-60 GB.
https://forums.cnetfrance.fr/tutoriels-windows-10/575813-windows-10-augmenter-la-memoire-de-pagination-ou-memoire-virtuelle
https://answers.microsoft.com/fr-fr/windows/forum/all/restauration-pagefilesys/628b8a32-f8cd-4481-95a1-2ebd1ef08ce1
It worked for me before I moved to Linux. |
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 428
|
And I can see your computer (host 619264) and tasks just fine - not sure why others were having problems.

Your computer is completing ATM tasks OK, but failing ACEMD3 tasks. The logs show the underlying errors:

16:28:06 (6248): bin/acemd.exe exited; CPU time 0.000000
09:46:12 (20156): bin/acemd.exe exited; CPU time 0.015625
03:13:45 (10704): bin/acemd.exe exited; CPU time 0.000000
17:47:38 (11904): bin/acemd.exe exited; CPU time 0.000000
14:00:30 (25420): bin/acemd.exe exited; CPU time 0.000000

The exit status (normally written 0xC0000005) is a Windows code defined as "STATUS_ACCESS_VIOLATION", which in full would be reported as 'The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be %s.' - BOINC hasn't passed on those extra parameters.

Many online 'answers' will suggest that this could be caused by faulty computer RAM, but that's not the only possible cause - it can also be caused by bad programming. In your case, every example still visible occurs as the application starts or restarts. I'd recommend that you try to avoid pausing ACEMD3 tasks mid-run - try to let them run continuously to completion. See if that reduces the error rate to an acceptable level. |
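As a small illustration of the exit status discussed above, the following Python sketch normalizes a Windows exit status to its unsigned 32-bit form and looks it up in a table. The table holds only a handful of well-known NTSTATUS values and is far from complete, and `describe_exit_status` is a hypothetical helper, not part of BOINC or ACEMD.

```python
# Small, deliberately incomplete table of well-known Windows NTSTATUS codes.
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
    0xC0000135: "STATUS_DLL_NOT_FOUND",
    0xC000013A: "STATUS_CONTROL_C_EXIT",
}


def describe_exit_status(status: int) -> str:
    """Return the 32-bit hex form of an exit status and its name, if known."""
    # Some tools print the status as a signed 32-bit integer; mask it so that
    # -1073741819 and 0xC0000005 are treated as the same value.
    unsigned = status & 0xFFFFFFFF
    name = KNOWN_NTSTATUS.get(unsigned, "unknown to this table")
    return f"0x{unsigned:08X} ({name})"


if __name__ == "__main__":
    print(describe_exit_status(-1073741819))  # signed form of 0xC0000005
    print(describe_exit_status(195))          # the generic code quoted earlier
```

The generic exit code 195 (0xc3) reported earlier in the thread is not an NTSTATUS value, so it falls through to the "unknown" branch here; the useful detail is the status of the `acemd.exe` child process in the stderr output.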
|
Joined: 13 Dec 17 Posts: 1419 Credit: 9,119,446,190 RAC: 891
|
I'd just increase the Windows pagefile size first to 50-60GB and reboot to see if that fixes the issue. If that fails I would start investigating your memory for errors. |
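Before changing the pagefile, it can help to see roughly how large it currently is. The sketch below is one way to estimate that from Python on Windows, using the Win32 `GlobalMemoryStatusEx` call through `ctypes`. It relies on the assumption that the reported commit limit is approximately physical RAM plus pagefile, so the result is an estimate only; `approx_pagefile_gib` and `read_memory_status` are names invented for this example.

```python
import ctypes
import sys

GIB = 1024 ** 3


class MEMORYSTATUSEX(ctypes.Structure):
    """Mirror of the Win32 MEMORYSTATUSEX structure."""
    _fields_ = [
        ("dwLength", ctypes.c_uint32),
        ("dwMemoryLoad", ctypes.c_uint32),
        ("ullTotalPhys", ctypes.c_uint64),
        ("ullAvailPhys", ctypes.c_uint64),
        ("ullTotalPageFile", ctypes.c_uint64),  # commit limit, not the file size
        ("ullAvailPageFile", ctypes.c_uint64),
        ("ullTotalVirtual", ctypes.c_uint64),
        ("ullAvailVirtual", ctypes.c_uint64),
        ("ullAvailExtendedVirtual", ctypes.c_uint64),
    ]


def approx_pagefile_gib(commit_limit_bytes: int, physical_bytes: int) -> float:
    """Estimate the pagefile size as commit limit minus physical RAM, in GiB."""
    return max(commit_limit_bytes - physical_bytes, 0) / GIB


def read_memory_status():
    """Return a filled MEMORYSTATUSEX on Windows, or None elsewhere or on failure."""
    if sys.platform != "win32":
        return None
    status = MEMORYSTATUSEX()
    status.dwLength = ctypes.sizeof(MEMORYSTATUSEX)
    if not ctypes.windll.kernel32.GlobalMemoryStatusEx(ctypes.byref(status)):
        return None
    return status


if __name__ == "__main__":
    status = read_memory_status()
    if status is None:
        print("Memory status is only available on Windows.")
    else:
        size = approx_pagefile_gib(status.ullTotalPageFile, status.ullTotalPhys)
        print(f"Estimated pagefile size: {size:.1f} GiB (suggested here: 50-60 GB)")
```

On a host with 32 GB of RAM, for example, a commit limit of about 82 GiB would correspond to a pagefile of roughly 50 GiB.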
|
Joined: 15 Jul 20 Posts: 95 Credit: 2,550,803,412 RAC: 248
|
To test the memory, use MemTest86 rather than the tool built into Windows; MemTest86 is more reliable. https://www.memtest86.com/ |
|
Joined: 15 Jul 20 Posts: 95 Credit: 2,550,803,412 RAC: 248
|
I also advise you to disable memory integrity. After that, if the problem continues, it's beyond my knowledge. https://www.malekal.com/desactiver-isolation-noyau-windows-11-10/ |
|
Joined: 15 Jul 20 Posts: 95 Credit: 2,550,803,412 RAC: 248
|
That said, no one is immune to work units that seem to have a bug; at least, I assume that is the cause here and not my PC. https://www.gpugrid.net/workunit.php?wuid=31255135

Remember to check the operating temperature of your graphics card, in case the fan is worn out. NVIDIA's Thermal and Power Specs list a maximum GPU temperature of 97 °C: https://www.nvidia.com/en-us/geforce/10-series/ |
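For the temperature check suggested above, a minimal Python sketch could read the GPU temperature from `nvidia-smi` and compare it with the 97 °C figure quoted from NVIDIA's specifications. It assumes `nvidia-smi` is on the PATH; the 10 °C safety margin and the helper names (`parse_temps`, `is_too_hot`, `read_gpu_temps`) are arbitrary choices for this illustration.

```python
import shutil
import subprocess

MAX_GPU_TEMP_C = 97  # maximum GPU temperature quoted above for the 10 series


def parse_temps(output: str) -> list:
    """Extract integer temperatures (one per GPU) from nvidia-smi CSV output."""
    temps = []
    for line in output.splitlines():
        line = line.strip()
        if line.isdigit():  # skips blanks and values such as "[N/A]"
            temps.append(int(line))
    return temps


def is_too_hot(temp_c: int, margin_c: int = 10) -> bool:
    """Flag temperatures within margin_c degrees of the quoted maximum."""
    return temp_c >= MAX_GPU_TEMP_C - margin_c


def read_gpu_temps() -> list:
    """Return current GPU temperatures, or an empty list if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    proc = subprocess.run(
        ["nvidia-smi", "--query-gpu=temperature.gpu", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
    )
    return parse_temps(proc.stdout) if proc.returncode == 0 else []


if __name__ == "__main__":
    for index, temp in enumerate(read_gpu_temps()):
        note = "close to the limit" if is_too_hot(temp) else "ok"
        print(f"GPU {index}: {temp} C ({note})")
```

Checking this while a task is running is more informative than checking at idle, since a tired fan mainly shows up under sustained load.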
©2025 Universitat Pompeu Fabra