Task 38486828

Name wu_3c5123d1-GIANNI_GLLM-0-1-RND3995_0
Workunit 31482136
Created 23 Apr 2025, 12:23:48 UTC
Sent 23 Apr 2025, 12:48:02 UTC
Report deadline 28 Apr 2025, 12:48:02 UTC
Received 23 Apr 2025, 13:04:02 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 195 (0x000000C3) EXIT_CHILD_FAILED
Computer ID 637819
Run time 3 min 53 sec
CPU time 3 min 34 sec
Validate state Invalid
Credit 0.00
Device peak FLOPS 15,485.45 GFLOPS
Application version LLM: LLMs for chemistry v1.00 (cuda124L) x86_64-pc-linux-gnu
Peak working set size 3.15 GB
Peak swap size 38.12 GB
Peak disk usage 8.15 GB

Stderr output

<core_client_version>7.19.0</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
2025-04-23 08:59:34 (4190915): wrapper (8.1.26018): starting
2025-04-23 09:00:28 (4190915): wrapper: running bin/python (bin/conda-unpack)
2025-04-23 09:00:28 (4190915): wrapper: created child process 4191136
2025-04-23 09:00:30 (4190915): bin/python exited; CPU time 0.953739
2025-04-23 09:00:30 (4190915): wrapper: running bin/tar (xjvf input.tar.bz2)
2025-04-23 09:00:30 (4190915): wrapper: created child process 4191145
2025-04-23 09:00:31 (4190915): bin/tar exited; CPU time 0.018925
2025-04-23 09:00:31 (4190915): wrapper: running bin/bash (run.sh)
2025-04-23 09:00:31 (4190915): wrapper: created child process 4191154
+ echo 'Setup environment'
+ source bin/activate
++ _conda_pack_activate
++ local _CONDA_SHELL_FLAVOR
++ '[' -n x ']'
++ _CONDA_SHELL_FLAVOR=bash
++ local script_dir
++ case "$_CONDA_SHELL_FLAVOR" in
+++ dirname bin/activate
++ script_dir=bin
+++ cd bin
+++ pwd
++ local full_path_script_dir=/home/ian/BOINC/slots/4/bin
+++ dirname /home/ian/BOINC/slots/4/bin
++ local full_path_env=/home/ian/BOINC/slots/4
+++ basename /home/ian/BOINC/slots/4
++ local env_name=4
++ '[' -n '' ']'
++ export CONDA_PREFIX=/home/ian/BOINC/slots/4
++ CONDA_PREFIX=/home/ian/BOINC/slots/4
++ export _CONDA_PACK_OLD_PS1=
++ _CONDA_PACK_OLD_PS1=
++ PATH=/home/ian/BOINC/slots/4/bin:/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/bin:/sbin:.
++ PS1='(4) '
++ case "$_CONDA_SHELL_FLAVOR" in
++ hash -r
++ local _script_dir=/home/ian/BOINC/slots/4/etc/conda/activate.d
++ '[' -d /home/ian/BOINC/slots/4/etc/conda/activate.d ']'
+ export PATH=/home/ian/BOINC/slots/4:/home/ian/BOINC/slots/4/bin:/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/bin:/sbin:.
+ PATH=/home/ian/BOINC/slots/4:/home/ian/BOINC/slots/4/bin:/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/bin:/sbin:.
+ echo 'Create a temporary directory'
+ export TMP=/home/ian/BOINC/slots/4/tmp
+ TMP=/home/ian/BOINC/slots/4/tmp
+ mkdir -p /home/ian/BOINC/slots/4/tmp
+ which python
+ pip install main_generation-0.1.0-py3-none-any.whl -v --no-deps
+ export CUDA_VISIBLE_DEVICES=0
+ CUDA_VISIBLE_DEVICES=0
+ export HF_HOME=../.cache
+ HF_HOME=../.cache
+ export VLLM_ASSETS_CACHE=../.cache
+ VLLM_ASSETS_CACHE=../.cache
+ export VLLM_CACHE_ROOT=../.cache
+ VLLM_CACHE_ROOT=../.cache
+ echo RUNNING
+ pythonbinary=/home/ian/BOINC/slots/4/lib/python3.12/site-packages/aiengine/main_generation.pyc
+ python /home/ian/BOINC/slots/4/lib/python3.12/site-packages/aiengine/main_generation.pyc --conf conf.yaml

Generating train split: 0 examples [00:00, ? examples/s]
Generating train split: 1000 examples [00:00, 213276.92 examples/s]

Loading safetensors checkpoint shards:   0% Completed | 0/2 [00:00<?, ?it/s]

Loading safetensors checkpoint shards:  50% Completed | 1/2 [00:01<00:01,  1.03s/it]

Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:02<00:00,  1.04s/it]


Loading safetensors checkpoint shards:   0% Completed | 0/2 [00:00<?, ?it/s]

Loading safetensors checkpoint shards:  50% Completed | 1/2 [00:01<00:01,  1.12s/it]

Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:02<00:00,  1.13s/it]


Map:   0%|          | 0/1000 [00:00<?, ? examples/s]
Map: 100%|██████████| 1000/1000 [00:00<00:00, 15459.22 examples/s]
run.sh: line 26: 4191183 Killed                  python ${pythonbinary} --conf conf.yaml
2025-04-23 09:03:26 (4190915): bin/bash exited; CPU time 35.252312
2025-04-23 09:03:26 (4190915): app exit status: 0x89
2025-04-23 09:03:26 (4190915): called boinc_finish(195)

</stderr_txt>
]]>
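The two failure codes in this log can be decoded with simple arithmetic. The wrapper reports the app exit status as 0x89, which is 137. Under the bash convention that a status above 128 means "terminated by signal (status − 128)", that is signal 9 (SIGKILL), which agrees with the `Killed` line printed by run.sh. SIGKILL is what the Linux OOM killer sends, but it can come from other sources too, so the log alone does not prove an out-of-memory kill. The client's `195 (0xc3, -61)` is a single exit code shown as decimal, hex, and a signed 8-bit value. The following is a minimal sketch, assuming the bash 128 + N convention. The helper names are hypothetical.

```python
import signal


def decode_shell_status(status: int) -> str:
    """Interpret a shell exit status using bash's 128 + N convention for signals.

    Note: a program may also exit normally with a code above 128, so this
    reading is a convention, not a guarantee.
    """
    if status > 128:
        sig = status - 128
        try:
            name = signal.Signals(sig).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {sig} ({name})"
    return f"exited normally with code {status}"


def as_signed_byte(code: int) -> int:
    """Reinterpret an 8-bit exit code as a signed (two's complement) value."""
    code &= 0xFF
    return code - 256 if code >= 128 else code


print(decode_shell_status(0x89))   # killed by signal 9 (SIGKILL)
print(0xC3, as_signed_byte(0xC3))  # 195 -61
```

The same check can be run on any other task's `app exit status` line to separate signal kills from ordinary nonzero exits.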


©2025 Universitat Pompeu Fabra