ATMML work units erroring out with "Illegal instruction"

William Albert · Joined: 22 Sep 24 · Posts: 9 · Credit: 198,870,851 · RAC: 818
Message 61830 · Posted: 26 Sep 2024, 12:03:10 UTC

I have an older computer crunching full-time for WCG and GPUGRID. The computer in question: https://www.gpugrid.net/show_host_detail.php?hostid=626036

This computer seems to be unable to complete any ATMML work units, and it receives the following error whenever it tries:

    + python bin/rbfe_explicit_sync.py QB_A24_A36_asyncre.cntl
    run.sh: line 24: 12561 Illegal instruction (core dumped) python bin/rbfe_explicit_sync.py $CONFIG_FILE
    2024-09-25 15:43:12 (12505): bin/bash exited; CPU time 19.896937
    2024-09-25 15:43:12 (12505): app exit status: 0x84
    2024-09-25 15:43:12 (12505): called boinc_finish(195)

The full output of an example failed WU is here: https://www.gpugrid.net/result.php?resultid=36014912

Looking around online, this appears to be a common error with Python ML frameworks that ship pre-compiled binaries built for newer CPU targets (with SSE4, AVX, etc.) when the user runs them on a processor that doesn't support those instruction set extensions.

Looking at the GPUGRID apps list (https://www.gpugrid.net/apps.php), the only CPU requirement listed is a 64-bit version of Windows or Linux running on an x86-64 processor. And in case that listing is machine-generated and doesn't reflect the actual CPU requirements, GPUGRID's "Join Us" page (https://www.gpugrid.net/join.php) likewise lists the CPU requirement as "64-bit" with at least one core, which literally any x86-64 CPU should meet.

If support for instruction set extensions beyond the base x86-64 spec is a requirement of ATMML, then fair enough: this computer's processor is admittedly quite old (although the GPU is much newer and, AFAIK, supports all the GPUGRID apps currently running).
However, if the CPU is only being used to drive a CUDA app that does the actual heavy lifting, then ATMML's CPU requirements could be needlessly limiting which users are able to run these jobs. For the time being, I've opted out of ATMML work units in my preferences.
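The `app exit status: 0x84` line in the log above is consistent with the "Illegal instruction" diagnosis. A minimal sketch of the decoding, assuming the BOINC wrapper passes through the exit status of `run.sh` unchanged and the shell follows the usual POSIX convention of reporting a child killed by signal N as 128 + N: 0x84 is 132, and 132 - 128 = 4, which is `SIGILL` on Linux. The function name `describe_exit_status` is purely illustrative.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a POSIX-shell exit status.

    Assumption: shells report a child killed by signal N as 128 + N,
    so any status above 128 is decoded as a signal number.
    """
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known to this platform's signal module
            name = "unknown signal"
        return f"killed by {name} (signal {signum})"
    return f"exited with code {status}"


if __name__ == "__main__":
    # 0x84 is the "app exit status" from the failed work unit's log
    print(describe_exit_status(0x84))  # killed by SIGILL (signal 4)
```

The same arithmetic works for other crashes: for example, 139 (0x8b) would decode to `SIGSEGV` (signal 11) on Linux.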

William Albert · Joined: 22 Sep 24 · Posts: 9 · Credit: 198,870,851 · RAC: 818
Message 61831 · Posted: 26 Sep 2024, 12:17:07 UTC

I found this related entry in the Linux kernel log, if it's helpful:

    traps: python[12561] trap invalid opcode ip:73ddfba85876 sp:7ffc7f5a2a90 error:0 in libOpenMMPME.so[73ddfba85000+3d000]
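The kernel line above pins the faulting instruction to `libOpenMMPME.so`: `ip` is the address of the invalid opcode, and the bracketed `[base+size]` is the memory mapping it fell in. A rough sketch of turning that into an offset inside the mapping (0x73ddfba85876 - 0x73ddfba85000 = 0x876), assuming the line format shown here. Note that the base is the start of the mapping the kernel printed, not necessarily the library's load address, so resolving the offset to a function would also need that segment's file offset. The names `TRAP_RE` and `fault_offset` are hypothetical.

```python
import re

# Assumed format: "... ip:<hex> ... in <library>[<base hex>+<size hex>]"
TRAP_RE = re.compile(
    r"ip:(?P<ip>[0-9a-f]+).*? in (?P<lib>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def fault_offset(line: str) -> tuple[str, int]:
    """Return (library, offset of the faulting ip within the reported mapping)."""
    m = TRAP_RE.search(line)
    if m is None:
        raise ValueError("not a recognised trap line")
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    size = int(m["size"], 16)
    offset = ip - base
    # Sanity check: the instruction pointer should lie inside the mapping
    if not 0 <= offset < size:
        raise ValueError("ip lies outside the reported mapping")
    return m["lib"], offset


if __name__ == "__main__":
    trap = ("traps: python[12561] trap invalid opcode ip:73ddfba85876 "
            "sp:7ffc7f5a2a90 error:0 in libOpenMMPME.so[73ddfba85000+3d000]")
    lib, off = fault_offset(trap)
    print(f"{lib}+{off:#x}")  # libOpenMMPME.so+0x876
```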

Ian&Steve C. · Joined: 21 Feb 20 · Posts: 1116 · Credit: 40,876,970,595 · RAC: 2,714
Message 61832 · Posted: 26 Sep 2024, 12:17:19 UTC · in response to Message 61830

My hunch is that you're right about the CPU being the issue. It lacks SSE4.1, SSE4.2, AVX/AVX2, and anything newer. I wouldn't be surprised at all if SSE4 or AVX were used, since those extensions are ubiquitous in basically every x86_64 CPU from the last decade.

It's very common for an application not to be 100% GPU-based and still need the CPU to do some work; I believe the ATM/ATMML apps do this, and some Einstein apps do too.

Opting out is the right decision for you. If you want to run these tasks, you'll have to upgrade the system to something more modern.
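On Linux, you can check which of these extensions a CPU reports by reading the `flags` line of `/proc/cpuinfo`, where they appear as `sse4_1`, `sse4_2`, `avx` and `avx2`. A minimal sketch follows; the `WANTED` list reflects only the extensions suspected in this thread, not a documented ATMML requirement, and the helper names are hypothetical.

```python
from pathlib import Path

# Assumption: extensions suspected in this thread, not an official ATMML baseline
WANTED = ("sse4_1", "sse4_2", "avx", "avx2")


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the flag set from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_extensions(cpuinfo_text: str, wanted=WANTED) -> list[str]:
    """List the wanted extensions that the CPU does not advertise."""
    flags = cpu_flags(cpuinfo_text)
    return [ext for ext in wanted if ext not in flags]


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        missing = missing_extensions(path.read_text())
        print("missing:", ", ".join(missing) if missing else "none")
    else:
        print("/proc/cpuinfo not found (not Linux?)")
```

A non-empty "missing" list on a host doesn't prove a given app will crash, but it is a quick way to spot CPUs likely to hit "Illegal instruction" with binaries built for newer targets.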

pututu · Joined: 8 Oct 16 · Posts: 27 · Credit: 4,153,801,869 · RAC: 0
Message 61833 · Posted: 26 Sep 2024, 17:19:31 UTC

At least your 6GB 1060 card can crunch quantum chemistry tasks successfully with the CPU you have.