Anaconda Python 3 Environment v4.01 failures

Message boards : Number crunching : Anaconda Python 3 Environment v4.01 failures
Message board moderation

To post messages, you must log in.

Previous · 1 · 2 · 3 · 4 · 5 · Next

AuthorMessage
Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Send message
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Level
Ser
Scientific publications
watwatwatwat
Message 56121 - Posted: 21 Dec 2020, 15:53:49 UTC - in response to Message 56045.  

I increased the FLOPS estimate to 5e15 (was 3e12) and disk usage limit to 10 GB.

Unfortunately Python tasks tend to cache packages, which makes disk occupation go up and possibly reach some limit. If the disk size goes out of hand, please reset the project (while no tasks are running).
ID: 56121 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Send message
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Level
Ser
Scientific publications
watwatwatwat
Message 56122 - Posted: 21 Dec 2020, 15:56:01 UTC - in response to Message 56121.  

Also: there shouldn't be Windows Python tasks right now.

Also: spaces in directory names may well create problems. But why spaces? that's inviting problems.
ID: 56122 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Richard Haselgrove

Send message
Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 428
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56143 - Posted: 22 Dec 2020, 19:01:25 UTC - in response to Message 56121.  
Last modified: 22 Dec 2020, 19:21:20 UTC

I increased the FLOPS estimate to 5e15 (was 3e12) and disk usage limit to 10 GB.

Thanks, that's a great help. But could you do something similar to the speed estimate, too, please?

I'm working on a python task now. The machine also works on ACEMD Tasks, too, but the speeds are totally mis-aligned:

<app_version>
    <app_name>acemd3</app_name>
    <version_num>211</version_num>
    <platform>x86_64-pc-linux-gnu</platform>
    <flops>610819274798.369263</flops>
    ...
<app_version>
    <app_name>Python</app_name>
    <version_num>401</version_num>
    <platform>x86_64-pc-linux-gnu</platform>
    <flops>165617818.500620</flops>
    ...

That's 610 GigaFlops when running ACEMD, and a measly 165 MegaFlops when running Python (on the same GTX 1660 SUPER).

As a result, at 87% done, it's still estimating nearly 600 days to completion!

Edit - those figures are what your server sends in response to the 'average processing rate' in https://www.gpugrid.net/host_app_versions.php?hostid=537311
ID: 56143 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Richard Haselgrove

Send message
Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 428
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56144 - Posted: 22 Dec 2020, 19:05:06 UTC - in response to Message 56122.  

Also: spaces in directory names may well create problems. But why spaces? that's inviting problems.

Sadly, you've missed that boat. Long file names and directory names, including spaces in either, have been de rigeur in Windows since 1995!

The accepted solution is to enclose affected names "in quotation marks".
ID: 56144 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Profile trigggl

Send message
Joined: 6 Mar 09
Posts: 25
Credit: 102,324,681
RAC: 0
Level
Cys
Scientific publications
watwatwatwatwatwatwatwat
Message 56150 - Posted: 25 Dec 2020, 16:01:17 UTC - in response to Message 56144.  
Last modified: 25 Dec 2020, 16:03:34 UTC

Also: spaces in directory names may well create problems. But why spaces? that's inviting problems.

Sadly, you've missed that boat. Long file names and directory names, including spaces in either, have been de rigeur in Windows since 1995!

The accepted solution is to enclose affected names "in quotation marks".


This is a thread for an app that runs on Linux. I think most of us prefer to use the "_" underscore.
ID: 56150 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Azmodes

Send message
Joined: 7 Jan 17
Posts: 34
Credit: 1,371,429,518
RAC: 0
Level
Met
Scientific publications
watwatwat
Message 56164 - Posted: 28 Dec 2020, 12:09:12 UTC

Been getting "1 (0x1) Unknown error number" after a couple of minutes on a few of my machines (one of which also had valid ones a few days ago). Is this something on my end?
<core_client_version>7.16.6</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)</message>
<stderr_txt>
12:51:44 (69060): wrapper (7.7.26016): starting
12:51:44 (69060): wrapper (7.7.26016): starting
12:51:44 (69060): wrapper: running /usr/bin/flock (/var/lib/boinc-client/projects/www.gpugrid.net/miniconda.lock -c "/bin/bash ./miniconda-installer.sh -b -u -p /var/lib/boinc-client/projects/www.gpugrid.net/miniconda &&
                      /var/lib/boinc-client/projects/www.gpugrid.net/miniconda/bin/conda install -m -y -p gpugridpy --file requirements.txt ")

  0%|          | 0/96 [00:00<?, ?it/s]
Extracting : libgcc-ng-9.1.0-hdf63c60_0.conda:   0%|          | 0/96 [00:00<?, ?it/s]
Extracting : libgcc-ng-9.1.0-hdf63c60_0.conda:   1%|1         | 1/96 [00:00<00:23,  4.13it/s]
Extracting : tk-8.6.8-hbc83047_0.conda:   1%|1         | 1/96 [00:00<00:23,  4.13it/s]       
Extracting : setuptools-46.4.0-py37_0.conda:   2%|2         | 2/96 [00:00<00:22,  4.13it/s]
Extracting : requests-2.23.0-py37_0.conda:   3%|3         | 3/96 [00:00<00:22,  4.13it/s]  
Extracting : cudnn-7.6.5-cuda10.2_0.conda:   4%|4         | 4/96 [03:04<00:22,  4.13it/s]
Extracting : cudnn-7.6.5-cuda10.2_0.conda:   5%|5         | 5/96 [03:04<21:12, 13.99s/it]
Extracting : numpy-1.19.2-py37h54aff64_0.conda:   5%|5         | 5/96 [03:04<21:12, 13.99s/it]
Extracting : pysocks-1.7.1-py37_0.conda:   6%|6         | 6/96 [03:04<20:59, 13.99s/it]       
Extracting : cffi-1.14.0-py37he30daa8_1.conda:   7%|7         | 7/96 [03:04<20:45, 13.99s/it]
Extracting : conda-package-handling-1.6.1-py37h7b6447c_0.conda:   8%|8         | 8/96 [03:04<20:31, 13.99s/it]
Extracting : pycosat-0.6.3-py37h7b6447c_0.conda:   9%|9         | 9/96 [03:04<20:17, 13.99s/it]               
Extracting : wheel-0.34.2-py37_0.conda:  10%|#         | 10/96 [03:04<20:03, 13.99s/it]        
Extracting : libedit-3.1.20181209-hc058e9b_0.conda:  11%|#1        | 11/96 [03:04<19:49, 13.99s/it]
Extracting : sqlite-3.31.1-h62c20be_1.conda:  12%|#2        | 12/96 [03:04<19:35, 13.99s/it]       
Extracting : libstdcxx-ng-9.1.0-hdf63c60_0.conda:  14%|#3        | 13/96 [03:04<19:21, 13.99s/it]
Extracting : chardet-3.0.4-py37_1003.conda:  15%|#4        | 14/96 [03:04<19:07, 13.99s/it]      
Extracting : readline-8.0-h7b6447c_0.conda:  16%|#5        | 15/96 [03:04<18:53, 13.99s/it]
Extracting : python-3.7.7-hcff3b4d_5.conda:  17%|#6        | 16/96 [03:04<18:39, 13.99s/it]
Extracting : libffi-3.3-he6710b0_1.conda:  18%|#7        | 17/96 [03:04<18:25, 13.99s/it]  
Extracting : pip-20.0.2-py37_3.conda:  19%|#8        | 18/96 [03:04<18:11, 13.99s/it]    
Extracting : pyopenssl-19.1.0-py37_0.conda:  20%|#9        | 19/96 [03:04<17:57, 13.99s/it]
Extracting : _libgcc_mutex-0.1-main.conda:  21%|##        | 20/96 [03:04<17:43, 13.99s/it] 
Extracting : urllib3-1.25.8-py37_0.conda:  22%|##1       | 21/96 [03:04<17:29, 13.99s/it] 
Extracting : blas-1.0-mkl.conda:  23%|##2       | 22/96 [03:04<17:15, 13.99s/it]         
Extracting : certifi-2020.4.5.1-py37_0.conda:  24%|##3       | 23/96 [03:04<17:01, 13.99s/it]
Extracting : scipy-1.5.2-py37h0b6359f_0.conda:  25%|##5       | 24/96 [03:04<16:47, 13.99s/it]
Extracting : six-1.14.0-py37_0.conda:  26%|##6       | 25/96 [03:04<16:33, 13.99s/it]         
Extracting : idna-2.9-py_1.conda:  27%|##7       | 26/96 [03:04<16:19, 13.99s/it]    
Extracting : ncurses-6.2-he6710b0_1.conda:  28%|##8       | 27/96 [03:04<16:05, 13.99s/it]
Extracting : zlib-1.2.11-h7b6447c_3.conda:  29%|##9       | 28/96 [03:04<15:51, 13.99s/it]
Extracting : ld_impl_linux-64-2.33.1-h53a641e_7.conda:  30%|###       | 29/96 [03:04<15:37, 13.99s/it]
Extracting : xz-5.2.5-h7b6447c_0.conda:  31%|###1      | 30/96 [03:04<15:23, 13.99s/it]               
Extracting : ca-certificates-2020.1.1-0.conda:  32%|###2      | 31/96 [03:04<15:09, 13.99s/it]
Extracting : tqdm-4.46.0-py_0.conda:  33%|###3      | 32/96 [03:04<14:55, 13.99s/it]          
Extracting : pycparser-2.20-py_0.conda:  34%|###4      | 33/96 [03:04<14:41, 13.99s/it]
Extracting : openssl-1.1.1g-h7b6447c_0.conda:  35%|###5      | 34/96 [03:04<14:27, 13.99s/it]
Extracting : cudatoolkit-10.2.89-hfd86e86_1.conda:  36%|###6      | 35/96 [03:40<14:13, 13.99s/it]
Extracting : cudatoolkit-10.2.89-hfd86e86_1.conda:  38%|###7      | 36/96 [03:40<10:08, 10.14s/it]
Extracting : yaml-0.1.7-had09818_2.conda:  38%|###7      | 36/96 [03:40<10:08, 10.14s/it]         
Extracting : ruamel_yaml-0.15.87-py37h7b6447c_0.conda:  39%|###8      | 37/96 [03:40<09:58, 10.14s/it]
Extracting : numpy-base-1.19.2-py37hfa32c7d_0.conda:  40%|###9      | 38/96 [03:40<09:47, 10.14s/it]  
Extracting : cryptography-2.9.2-py37h1ba5d50_0.conda:  41%|####      | 39/96 [03:40<09:37, 10.14s/it]
Extracting : ncurses-6.2-h58526e2_3.tar.bz2:  42%|####1     | 40/96 [03:40<09:27, 10.14s/it]         
Extracting : libstdcxx-ng-9.3.0-h2ae2ef3_17.tar.bz2:  43%|####2     | 41/96 [03:40<09:17, 10.14s/it]
Extracting : cffi-1.14.3-py37h00ebd2e_1.tar.bz2:  44%|####3     | 42/96 [03:40<09:07, 10.14s/it]    
Extracting : networkx-2.5-py_0.tar.bz2:  45%|####4     | 43/96 [03:40<08:57, 10.14s/it]         
Extracting : libffi-3.2.1-he1b5a44_1007.tar.bz2:  46%|####5     | 44/96 [03:40<08:47, 10.14s/it]
Extracting : libllvm10-10.0.1-he513fc3_3.tar.bz2:  47%|####6     | 45/96 [03:40<08:36, 10.14s/it]
Extracting : mkl-2020.4-h726a3e6_304.tar.bz2:  48%|####7     | 46/96 [03:40<08:26, 10.14s/it]    
Extracting : brotlipy-0.7.0-py37hb5d75c8_1001.tar.bz2:  49%|####8     | 47/96 [03:40<08:16, 10.14s/it]
Extracting : six-1.15.0-pyh9f0ad1d_0.tar.bz2:  50%|#####     | 48/96 [03:40<08:06, 10.14s/it]         
Extracting : sqlite-3.33.0-h4cf870e_1.tar.bz2:  51%|#####1    | 49/96 [03:40<07:56, 10.14s/it]
Extracting : xz-5.2.5-h516909a_1.tar.bz2:  52%|#####2    | 50/96 [03:40<07:46, 10.14s/it]     
Extracting : mkl_fft-1.2.0-py37h161383b_1.tar.bz2:  53%|#####3    | 51/96 [03:40<07:36, 10.14s/it]
Extracting : urllib3-1.25.11-py_0.tar.bz2:  54%|#####4    | 52/96 [03:40<07:25, 10.14s/it]        
Extracting : conda-4.8.3-py37_0.tar.bz2:  55%|#####5    | 53/96 [03:40<07:15, 10.14s/it]  
Extracting : mkl_random-1.2.0-py37h9fdb41a_1.tar.bz2:  56%|#####6    | 54/96 [03:40<07:05, 10.14s/it]
Extracting : lark-parser-0.10.0-pyh9f0ad1d_0.tar.bz2:  57%|#####7    | 55/96 [03:40<06:55, 10.14s/it]
Extracting : zlib-1.2.11-h516909a_1010.tar.bz2:  58%|#####8    | 56/96 [03:40<06:45, 10.14s/it]      
Extracting : wheel-0.35.1-pyh9f0ad1d_0.tar.bz2:  59%|#####9    | 57/96 [03:40<06:35, 10.14s/it]
Extracting : tqdm-4.51.0-pyh9f0ad1d_0.tar.bz2:  60%|######    | 58/96 [03:40<06:25, 10.14s/it] 
Extracting : libgcc-ng-9.3.0-h5dbcf3e_17.tar.bz2:  61%|######1   | 59/96 [03:40<06:15, 10.14s/it]
Extracting : _libgcc_mutex-0.1-conda_forge.tar.bz2:  62%|######2   | 60/96 [03:40<06:04, 10.14s/it]
Extracting : numba-0.51.2-py37h9fdb41a_0.tar.bz2:  64%|######3   | 61/96 [03:40<05:54, 10.14s/it]  
Extracting : llvmlite-0.34.0-py37h5202443_2.tar.bz2:  65%|######4   | 62/96 [03:40<05:44, 10.14s/it]
Extracting : certifi-2020.6.20-py37he5f6b98_2.tar.bz2:  66%|######5   | 63/96 [03:40<05:34, 10.14s/it]
Extracting : torchani-2.2-pyh9f0ad1d_0.tar.bz2:  67%|######6   | 64/96 [03:40<05:24, 10.14s/it]       
Extracting : ld_impl_linux-64-2.35-h769bd43_9.tar.bz2:  68%|######7   | 65/96 [03:40<05:14, 10.14s/it]
Extracting : pycparser-2.20-pyh9f0ad1d_2.tar.bz2:  69%|######8   | 66/96 [03:40<05:04, 10.14s/it]     
Extracting : ca-certificates-2020.11.8-ha878542_0.tar.bz2:  70%|######9   | 67/96 [03:40<04:53, 10.14s/it]
Extracting : pysocks-1.7.1-py37he5f6b98_2.tar.bz2:  71%|#######   | 68/96 [03:40<04:43, 10.14s/it]        
Extracting : cryptography-3.2.1-py37hc72a4ac_0.tar.bz2:  72%|#######1  | 69/96 [03:40<04:33, 10.14s/it]
Extracting : python-dateutil-2.8.1-py_0.tar.bz2:  73%|#######2  | 70/96 [03:40<04:23, 10.14s/it]       
Extracting : acemd3-3.3.0-cuda100_0.tar.bz2:  74%|#######3  | 71/96 [03:40<04:13, 10.14s/it]    
Extracting : tk-8.6.10-hed695b0_1.tar.bz2:  75%|#######5  | 72/96 [03:40<04:03, 10.14s/it]  
Extracting : mkl-service-2.3.0-py37h8f50634_2.tar.bz2:  76%|#######6  | 73/96 [03:40<03:53, 10.14s/it]
Extracting : _openmp_mutex-4.5-1_llvm.tar.bz2:  77%|#######7  | 74/96 [03:40<03:42, 10.14s/it]        
Extracting : readline-8.0-he28a2e2_2.tar.bz2:  78%|#######8  | 75/96 [03:40<03:32, 10.14s/it] 
Extracting : decorator-4.4.2-py_0.tar.bz2:  79%|#######9  | 76/96 [03:40<03:22, 10.14s/it]   
Extracting : python_abi-3.7-1_cp37m.tar.bz2:  80%|########  | 77/96 [03:40<03:12, 10.14s/it]
Extracting : setuptools-49.6.0-py37he5f6b98_2.tar.bz2:  81%|########1 | 78/96 [03:40<03:02, 10.14s/it]
Extracting : libgfortran4-7.5.0-hae1eefd_17.tar.bz2:  82%|########2 | 79/96 [03:40<02:52, 10.14s/it]  
Extracting : acemd3-3.3.0_72_gcceda4a-cuda102_0.tar.bz2:  83%|########3 | 80/96 [03:40<02:42, 10.14s/it]
Extracting : ninja-1.10.1-hfc4b9b4_2.tar.bz2:  84%|########4 | 81/96 [03:40<02:32, 10.14s/it]           
Extracting : idna-2.10-pyh9f0ad1d_0.tar.bz2:  85%|########5 | 82/96 [03:40<02:21, 10.14s/it] 
Extracting : requests-2.24.0-pyh9f0ad1d_0.tar.bz2:  86%|########6 | 83/96 [03:40<02:11, 10.14s/it]
Extracting : openssl-1.1.1h-h516909a_0.tar.bz2:  88%|########7 | 84/96 [03:40<02:01, 10.14s/it]   
Extracting : pytorch-1.6.0-py3.7_cuda10.2.89_cudnn7.6.5_0.tar.bz2:  89%|########8 | 85/96 [04:56<01:51, 10.14s/it]
Extracting : pytorch-1.6.0-py3.7_cuda10.2.89_cudnn7.6.5_0.tar.bz2:  90%|########9 | 86/96 [04:56<01:15,  7.55s/it]
Extracting : pytz-2020.4-pyhd8ed1ab_0.tar.bz2:  90%|########9 | 86/96 [04:56<01:15,  7.55s/it]                    
Extracting : nnpops-pytorch-0.0.0a3-0.tar.bz2:  91%|######### | 87/96 [04:56<01:07,  7.55s/it]
Extracting : chardet-3.0.4-py37he5f6b98_1008.tar.bz2:  92%|#########1| 88/96 [04:56<01:00,  7.55s/it]
Extracting : pip-20.2.4-py_0.tar.bz2:  93%|#########2| 89/96 [04:56<00:52,  7.55s/it]                
Extracting : pandas-1.1.4-py37h10a2094_0.tar.bz2:  94%|#########3| 90/96 [04:56<00:45,  7.55s/it]
Extracting : libgfortran-ng-7.5.0-hae1eefd_17.tar.bz2:  95%|#########4| 91/96 [04:56<00:37,  7.55s/it]
Extracting : llvm-openmp-11.0.0-hfc4b9b4_1.tar.bz2:  96%|#########5| 92/96 [04:56<00:30,  7.55s/it]   
Extracting : moleculekit-0.4.4-py37_0.tar.bz2:  97%|#########6| 93/96 [04:56<00:22,  7.55s/it]     
Extracting : python-3.7.8-h6f2ec95_1_cpython.tar.bz2:  98%|#########7| 94/96 [04:56<00:15,  7.55s/it]
Extracting : pyopenssl-19.1.0-py_1.tar.bz2:  99%|#########8| 95/96 [04:56<00:07,  7.55s/it]          
                                                                                           
12:56:52 (69060): /usr/bin/flock exited; CPU time 141.751422
application ./gpugridpy/bin/python missing

</stderr_txt>
]]>
ID: 56164 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
biodoc

Send message
Joined: 26 Aug 08
Posts: 183
Credit: 10,085,929,375
RAC: 0
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56165 - Posted: 28 Dec 2020, 12:18:01 UTC

@azmodes, all of my python tasks have failed with the same error. Resetting the project did not make a difference.
ID: 56165 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
rod4x4

Send message
Joined: 4 Aug 14
Posts: 266
Credit: 2,219,935,054
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 56166 - Posted: 28 Dec 2020, 13:42:14 UTC - in response to Message 56165.  
Last modified: 28 Dec 2020, 14:16:09 UTC

@azmodes, all of my python tasks have failed with the same error. Resetting the project did not make a difference.

Also have the same error
application ./gpugridpy/bin/python missing

The next line on successful tasks would normally be
: wrapper: running ./gpugridpy/bin/python (run.py)

Issue seems to be with further testing from Project Admin.
There is a preliminary step to setup the environment for the app. This step is writing files to /tmp/ instead of the /var/lib/boinc-client/slots/ directory (This is the error for missing files). Hence the above error.
Nothing wrong with your system.
These tasks are Experimental. Errors are to be expected.
ID: 56166 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Ian&Steve C.

Send message
Joined: 21 Feb 20
Posts: 1116
Credit: 40,839,470,595
RAC: 6,423
Level
Trp
Scientific publications
wat
Message 56169 - Posted: 28 Dec 2020, 17:51:59 UTC
Last modified: 28 Dec 2020, 17:53:10 UTC

I did have one successful task today, but it might have been from the old set.

https://www.gpugrid.net/result.php?resultid=32364018

but all the others have been the same failure, about missing python. must be an error in the task creation, likely to be fixed by the admins when they are aware of it.
ID: 56169 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Richard Haselgrove

Send message
Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 428
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56170 - Posted: 28 Dec 2020, 18:43:45 UTC
Last modified: 28 Dec 2020, 19:08:34 UTC

I had a long series of failures on 537311 earlier today, while I was out. Then saw that I had two waiting to run on 508381 - looked through the task specs, but couldn't see anything odd. And behold, they're now running normally (one approaching 50%). First-run tasks, created around 16:20 UTC today, so maybe they've found the fault and sent out a new batch.

I've also got a recent new task waiting to start on 537311, so we'll see how that goes.

Edit - the new task on 537311 has started properly, as well. We might just be out of this particular thicket, even if not yet fully out of the woods.
ID: 56170 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Richard Haselgrove

Send message
Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 428
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56171 - Posted: 28 Dec 2020, 20:20:51 UTC

The first one finished successfully, but its replacement has a different early failure:

ERROR: /home/user/conda/conda-bld/acemd3_1592833101337/work/src/mdio/amberparm.cpp line 70: Failed to open PRMTOP file!
ID: 56171 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Ian&Steve C.

Send message
Joined: 21 Feb 20
Posts: 1116
Credit: 40,839,470,595
RAC: 6,423
Level
Trp
Scientific publications
wat
Message 56172 - Posted: 28 Dec 2020, 20:57:40 UTC

interesting error message on this one too. looks like it ran to term, but file size too big to upload

upload failure: <file_xfer_error>
<file_name>1k4g219000-RAIMIS_NNPMM-0-1-RND3269_1_0</file_name>
<error_code>-131 (file size too big)</error_code>
</file_xfer_error>


http://gpugrid.net/result.php?resultid=32369678
ID: 56172 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Profile trigggl

Send message
Joined: 6 Mar 09
Posts: 25
Credit: 102,324,681
RAC: 0
Level
Cys
Scientific publications
watwatwatwatwatwatwatwat
Message 56173 - Posted: 28 Dec 2020, 21:21:57 UTC
Last modified: 28 Dec 2020, 21:28:19 UTC

Had one finish and validate.
https://www.gpugrid.net/result.php?resultid=32368128

Some things I noticed about it while it was running.
    Ran at 99% GPU use
    Started with a Time Remaining of 189 days
    Used under 400mb of GPU ram

ID: 56173 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Azmodes

Send message
Joined: 7 Jan 17
Posts: 34
Credit: 1,371,429,518
RAC: 0
Level
Met
Scientific publications
watwatwat
Message 56178 - Posted: 29 Dec 2020, 10:57:58 UTC

I'll be removing this for the time being. Been getting a lot of tasks that error out after hours, lots of wasted computing time.
ID: 56178 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
biodoc

Send message
Joined: 26 Aug 08
Posts: 183
Credit: 10,085,929,375
RAC: 0
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56179 - Posted: 29 Dec 2020, 11:37:27 UTC - in response to Message 56172.  

interesting error message on this one too. looks like it ran to term, but file size too big to upload

upload failure: <file_xfer_error>
<file_name>1k4g219000-RAIMIS_NNPMM-0-1-RND3269_1_0</file_name>
<error_code>-131 (file size too big)</error_code>
</file_xfer_error>


http://gpugrid.net/result.php?resultid=32369678


I've had quite a few of these errors. From boinc messages:

52988: 29-Dec-2020 11:24:44 (low) [GPUGRID] Output file 2za0216000-RAIMIS_NNPMM-0-1-RND5553_3_0 for task 2za0216000-RAIMIS_NNPMM-0-1-RND5553_3 exceeds size limit.
52989: 29-Dec-2020 11:24:44 (low) [GPUGRID] File size: 158101319.000000 bytes.  Limit: 100000000.000000 bytes

ID: 56179 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
biodoc

Send message
Joined: 26 Aug 08
Posts: 183
Credit: 10,085,929,375
RAC: 0
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56180 - Posted: 29 Dec 2020, 12:17:03 UTC - in response to Message 56179.  

interesting error message on this one too. looks like it ran to term, but file size too big to upload

upload failure: <file_xfer_error>
<file_name>1k4g219000-RAIMIS_NNPMM-0-1-RND3269_1_0</file_name>
<error_code>-131 (file size too big)</error_code>
</file_xfer_error>


http://gpugrid.net/result.php?resultid=32369678


I've had quite a few of these errors. From boinc messages:

52988: 29-Dec-2020 11:24:44 (low) [GPUGRID] Output file 2za0216000-RAIMIS_NNPMM-0-1-RND5553_3_0 for task 2za0216000-RAIMIS_NNPMM-0-1-RND5553_3 exceeds size limit.
52989: 29-Dec-2020 11:24:44 (low) [GPUGRID] File size: 158101319.000000 bytes.  Limit: 100000000.000000 bytes



I went through the event log to look for more of the "output file exceeds size limit" errors. It looks like increasing the file upload size limit to 160000000 bytes or more would eliminate this problem.

53027: 29-Dec-2020 11:53:12 (low) [GPUGRID] File size: 123705476.000000 bytes.  Limit: 100000000.000000 bytes
53019: 29-Dec-2020 11:30:56 (low) [GPUGRID] File size: 107341284.000000 bytes.  Limit: 100000000.000000 bytes
52989: 29-Dec-2020 11:24:44 (low) [GPUGRID] File size: 158101319.000000 bytes.  Limit: 100000000.000000 bytes
52981: 29-Dec-2020 10:50:22 (low) [GPUGRID] File size: 123461155.000000 bytes.  Limit: 100000000.000000 bytes
52908: 29-Dec-2020 08:38:07 (low) [GPUGRID] File size: 123677635.000000 bytes.  Limit: 100000000.000000 bytes
52870: 29-Dec-2020 08:18:04 (low) [GPUGRID] File size: 158191729.000000 bytes.  Limit: 100000000.000000 bytes
52816: 29-Dec-2020 07:01:51 (low) [GPUGRID] File size: 138164005.000000 bytes.  Limit: 100000000.000000 bytes
52808: 29-Dec-2020 06:25:55 (low) [GPUGRID] File size: 158342958.000000 bytes.  Limit: 100000000.000000 bytes
52800: 29-Dec-2020 06:13:12 (low) [GPUGRID] File size: 158116895.000000 bytes.  Limit: 100000000.000000 bytes
52781: 29-Dec-2020 05:33:28 (low) [GPUGRID] File size: 107236868.000000 bytes.  Limit: 100000000.000000 bytes
52743: 29-Dec-2020 04:32:48 (low) [GPUGRID] File size: 138252720.000000 bytes.  Limit: 100000000.000000 bytes
52645: 29-Dec-2020 02:03:47 (low) [GPUGRID] File size: 123487055.000000 bytes.  Limit: 100000000.000000 bytes
52482: 29-Dec-2020 00:57:04 (low) [GPUGRID] File size: 123503099.000000 bytes.  Limit: 100000000.000000 bytes
52343: 28-Dec-2020 23:55:57 (low) [GPUGRID] File size: 107409143.000000 bytes.  Limit: 100000000.000000 bytes


ID: 56180 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Richard Haselgrove

Send message
Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 428
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56184 - Posted: 29 Dec 2020, 14:17:45 UTC - in response to Message 56180.  

interesting error message on this one too. looks like it ran to term, but file size too big to upload

upload failure: <file_xfer_error>
<file_name>1k4g219000-RAIMIS_NNPMM-0-1-RND3269_1_0</file_name>
<error_code>-131 (file size too big)</error_code>
</file_xfer_error>


http://gpugrid.net/result.php?resultid=32369678


I've had quite a few of these errors. From boinc messages:

52988: 29-Dec-2020 11:24:44 (low) [GPUGRID] Output file 2za0216000-RAIMIS_NNPMM-0-1-RND5553_3_0 for task 2za0216000-RAIMIS_NNPMM-0-1-RND5553_3 exceeds size limit.
52989: 29-Dec-2020 11:24:44 (low) [GPUGRID] File size: 158101319.000000 bytes.  Limit: 100000000.000000 bytes



I went through the event log to look for more of the "output file exceeds size limit" errors. It looks like increasing the file upload size limit to 160000000 bytes or more would eliminate this problem.

I can confirm that. I had four resends running this morning, which had failed this way on their previous hosts.

There's only one upload file for these tasks, which makes things easier. The limit is set at 100 (decimal) MB, or 95.3674 (binary) MiB. I changed that with an extra zero, and they ran to completion. The upload files ended at up to 135 MB, but uploaded fine and validated.

Instructions and notes
1) the tasks I worked on were reporting progress every 0.180%, but the latest ones are updating every 0.450%. Another change.
2) they claim to checkpoint each time, but don't. They restart right back at 10%, and previous work is lost (the output file starts again at 0 MB).
3) But the recorded runtime is not lost, and continues to increase. Anybody on the brink of the 20 credit sanity-check, take care.

How to
a) Let a new task start. I'd advise you let it run to 10.450% or whatever, to get past the 'Failed to open PRMTOP file!' failure point.
b) Stop the BOINC client, by whatever means is appropriate for your version of Linux.
c) Navigate to the BOINC data directory - often /var/lib/boinc-client, but YMMV.
d) Make sure you have write access - may need sudo or similar.
e) Open the file client_state.xml with a plain text editor.
f) Find the start of the GPUGrid project section.
g) Within that section, find the <file> description which contains an <upload_url> entry.
h) A couple of lines above, you should see
<max_nbytes>100000000.000000</max_nbytes>
Make that number bigger. I'd suggest allowing plenty of headroom - change the '1' to '2', or simply add a nought to make it ten times bigger.
i) Check that you have made no other changes to values or formats - this file is fragile.
j) Save it, and restart the BOINC client. You should be good to go, but check it's running properly before you walk away.
ID: 56184 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Profile trigggl

Send message
Joined: 6 Mar 09
Posts: 25
Credit: 102,324,681
RAC: 0
Level
Cys
Scientific publications
watwatwatwatwatwatwatwat
Message 56188 - Posted: 29 Dec 2020, 16:01:27 UTC

Is it possible to have the app clean up after itself in the /tmp folder? Each task leaves multiple files. Keep in mind that some people use ram for their /tmp folder, so zombie files are wasting ram

gpugrid files in my /tmp folder
-rw-r--r-- 1 boinc boinc  43314 Dec 28 10:23  7ea97c91d7361ec56632bfca26d5d9d13d642930_75_64
-rw-r--r-- 1 boinc boinc 126286 Dec 28 10:23  e0ef5982e1b68c789da96dc45363b9eb8c2aa57e_75_64
-rw-r--r-- 1 boinc boinc  80597 Dec 28 10:23  b4482d8d58bb8c6d2b7a91f73e7b773c0fd5dd94_75_64
-rw-r--r-- 1 boinc boinc  20647 Dec 28 10:23  714822709ce1b8bc002bdbbf552e28e75c6da4e6_75_64
-rw-r--r-- 1 boinc boinc   4733 Dec 28 10:23  fcf820cc959eabea7aebe1c17b691a0c2f4b0ff7_75_64
-rw-r--r-- 1 boinc boinc  10245 Dec 28 10:23  50f9f113993bfc730988bdc619171c634a15523f_75_64
-rw-r--r-- 1 boinc boinc  30374 Dec 28 10:23  f2dab3f09e77bbb4575c7ddc986723c25d6a4f3d_75_64
-rw-r--r-- 1 boinc boinc  20869 Dec 28 10:23  91f4a5a43a8501dea5dc4332cf4e8f9f5fd0359a_75_64
-rw-r--r-- 1 boinc boinc  13716 Dec 28 10:23  7980fb2556658945503bd805a0d8e93a948e72ba_75_64
-rw-r--r-- 1 boinc boinc  30377 Dec 28 10:23  93916279bb5343117c8bdbd1338022d943b665dc_75_64
-rw-r--r-- 1 boinc boinc  57843 Dec 28 10:23  7a1eb9372723ed7381f98389a26fd422b3bacdc9_75_64
-rw-r--r-- 1 boinc boinc  35422 Dec 28 10:23  39066daa18fdb7f9d5831a3a738a007a1c33bb21_75_64
-rw-r--r-- 1 boinc boinc  36936 Dec 28 10:23  97a2b7cad094f060cf83fbee7aa045351955025d_75_64
-rw-r--r-- 1 boinc boinc  36936 Dec 28 10:23  8764d8337c744fb5661e0b67108406e054c1ecf7_75_64
-rw-r--r-- 1 boinc boinc 126286 Dec 28 14:28  971f8bb04c1c0504d9d6226436fdcc5d07886be4_75_64
-rw-r--r-- 1 boinc boinc  80619 Dec 28 14:28  a102f87703190d04f8f9faa27478ab445379a41e_75_64
-rw-r--r-- 1 boinc boinc  10245 Dec 28 14:28  bceacf79d9e3a7852ce16958202b232509caa430_75_64
-rw-r--r-- 1 boinc boinc  30376 Dec 28 14:28  de46595abda900c61564054a4bd137ee3d70c9dd_75_64
-rw-r--r-- 1 boinc boinc  30379 Dec 28 14:28  b7c29d7e65c014012b9fd8c8c80b17c0e781422c_75_64
-rw-r--r-- 1 boinc boinc  57839 Dec 28 14:28  d12a034ced08fa013e7e9ba14395bb50e706b7c4_75_64
-rw-r--r-- 1 boinc boinc  35422 Dec 28 14:28  f58184112e590f9b8bac2ca1ba27f19730db8fe1_75_64
-rw-r--r-- 1 boinc boinc  36936 Dec 28 14:28  38eba1979b0e8c5c6c8337db39ec5e693353efd2_75_64
-rw-r--r-- 1 boinc boinc  36936 Dec 28 14:28  18682cb468a4f7ffe709f43654f80b950fd71e3e_75_64
-rw-r--r-- 1 boinc boinc 126295 Dec 28 18:18  9d841da4897f17abb1063a155a32a239510006d4_75_64
-rw-r--r-- 1 boinc boinc  80223 Dec 28 18:18  8f5b905c553a27de4570db40320a32e00282e107_75_64
-rw-r--r-- 1 boinc boinc  10245 Dec 28 18:18  87be55f7f9b2231f4bf29107ba0bc5586e26d8cc_75_64
-rw-r--r-- 1 boinc boinc  30391 Dec 28 18:18  56e68b46f69582f6255d5a6ea2b6c106147735ba_75_64
-rw-r--r-- 1 boinc boinc  30394 Dec 28 18:19  ee66161b69723e8e712d6305af6a5b0af4cb7d70_75_64
-rw-r--r-- 1 boinc boinc  57843 Dec 28 18:19  b0adc452ed6fa6fd24c9741c2b96482e35fa3923_75_64
-rw-r--r-- 1 boinc boinc  35428 Dec 28 18:19  742b600f272127103bb928076a71f4cd8c5bd736_75_64
-rw-r--r-- 1 boinc boinc  36942 Dec 28 18:19  9156034baf2ffa58ec15e153da37e52b70e396c2_75_64
-rw-r--r-- 1 boinc boinc  36942 Dec 28 18:19  60c5e6cea54031315f04913098148579068979fc_75_64
-rw-r--r-- 1 boinc boinc 126286 Dec 29 00:55  9233bab642d82fd791010f41501ae4d00de9a064_75_64
-rw-r--r-- 1 boinc boinc  80619 Dec 29 00:55  1700e1fa0b176d84ed7d27df18324a520188c225_75_64
-rw-r--r-- 1 boinc boinc  10245 Dec 29 00:55  5e0d2da29ad9b0dfc82c54d591826bf8ff0d142d_75_64
-rw-r--r-- 1 boinc boinc  30374 Dec 29 00:55  46d78c01e997eb624412f6a569d66daa53d1e29a_75_64
-rw-r--r-- 1 boinc boinc  30377 Dec 29 00:55  48b2935cd5e3da2f016c3332beb1a19cbd86e466_75_64
-rw-r--r-- 1 boinc boinc  57843 Dec 29 00:55  0e1f699fe80e24604c63598f5d787edeb8ea9f4e_75_64
-rw-r--r-- 1 boinc boinc  35422 Dec 29 00:55  a794b2186bfc30990054ab3ebb0426ce27632b6f_75_64
-rw-r--r-- 1 boinc boinc  36936 Dec 29 00:55  9d9a624fc68aa25d518928890e6ac3b36ca1d0cd_75_64
-rw-r--r-- 1 boinc boinc  36936 Dec 29 00:55  a836e46556989dc15e437e80577ddaa7c6393b75_75_64
-rw-r--r-- 1 boinc boinc 126286 Dec 29 04:51  fb42a08ad7eca6be05c99ee38f3b4f71f1a717b9_75_64
-rw-r--r-- 1 boinc boinc  80619 Dec 29 04:51  8408b5d9e631d0726f455499a816face8bab4174_75_64
-rw-r--r-- 1 boinc boinc  10245 Dec 29 04:51  e5cb623ff3b139a425e04b40a7c572c3005561d3_75_64
-rw-r--r-- 1 boinc boinc  30376 Dec 29 04:51  32c1d753d0db123091d675f042cf383ec465b4d9_75_64
-rw-r--r-- 1 boinc boinc  30379 Dec 29 04:51  b42845ba5a8b020687c1285c4f36c70f62a4ab9f_75_64
-rw-r--r-- 1 boinc boinc  57839 Dec 29 04:51  800c239c9c2ea4abf6b56821da41bd95c7dfbf74_75_64
-rw-r--r-- 1 boinc boinc  35422 Dec 29 04:51  2fa8fe0195793ba7d4867aeeca60dcd4c3941b88_75_64
-rw-r--r-- 1 boinc boinc  36936 Dec 29 04:51  1e91dd22529e2e5af37a6c1c356af9d7f3ffcf12_75_64
-rw-r--r-- 1 boinc boinc  36936 Dec 29 04:51  27c226ae18af868c3e7195046dd14dea61bf4101_75_64
-rw-r--r-- 1 boinc boinc 126295 Dec 29 08:26  8520d639fa8271b8962c3da8ae2803659d270827_75_64
-rw-r--r-- 1 boinc boinc  80223 Dec 29 08:26  ae54db82a21c8e2e5747967ed4f68427e90509bc_75_64
-rw-r--r-- 1 boinc boinc  10245 Dec 29 08:26  17fea8103490d36bfce7b9443641dfdedd2e24bb_75_64
-rw-r--r-- 1 boinc boinc  30391 Dec 29 08:26  33c9c1c55d8a8cb2db72b91b3344b4f33f5f15da_75_64
-rw-r--r-- 1 boinc boinc  30394 Dec 29 08:26  a6514953bc083c082f2ad6ebe386904bf2576ee2_75_64
-rw-r--r-- 1 boinc boinc  57845 Dec 29 08:26  573ac214fa460acfe44af24489e461d6aa3107ef_75_64
-rw-r--r-- 1 boinc boinc  35428 Dec 29 08:26  8ead506b530d7fe93e46f3a28ca9e77798609101_75_64
-rw-r--r-- 1 boinc boinc  36942 Dec 29 08:27  a1bee26d81077fe4b29a180e84aab497b09097d0_75_64
-rw-r--r-- 1 boinc boinc  36942 Dec 29 08:27  482ad04da8945ca75cd26712d462e899addae4f7_75_64

At the moment, I'm just asking for a task to do some cleanup when it finishes. Don't like seeing boinc outside of the boinc folder.
ID: 56188 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Profile ServicEnginIC
Avatar

Send message
Joined: 24 Sep 10
Posts: 592
Credit: 11,972,186,510
RAC: 1,447
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56211 - Posted: 31 Dec 2020, 13:03:57 UTC - in response to Message 56022.  

Returning to graphics card RAM size and Python tasks:
My Host #567828 and Host #557889, both running acemd3 tasks fine on their 2GB RAM graphics cards, haven't been able to process none of the Python tasks that they have received.

After the last rebuilt of Python tasks on December 29th, all my 2 GB VRAM graphics cards have started to succeed them:
These cards are based on GTX 750, GTX 750 TI, GTX 950, and GT 1030 GPUs.
Also failures regarding "file size too big" disappeared.
Some entity in the background seems to know well what is handling...
ID: 56211 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Ian&Steve C.

Send message
Joined: 21 Feb 20
Posts: 1116
Credit: 40,839,470,595
RAC: 6,423
Level
Trp
Scientific publications
wat
Message 56212 - Posted: 31 Dec 2020, 13:43:16 UTC - in response to Message 56211.  

yeah i noticed the baseline behavior changed from the last round of Python to this latest round.

VRAM use dropped from about 2GB to about 200MB.
PCIe use returned to the same as the MDAD tasks (using about as much as PCIe3.0 x4)
ID: 56212 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Previous · 1 · 2 · 3 · 4 · 5 · Next

Message boards : Number crunching : Anaconda Python 3 Environment v4.01 failures

©2025 Universitat Pompeu Fabra