Task 38575405

Name wu_ec313f2d-GIANNI_GPROTO7-0-1-RND1778_1
Workunit 31541200
Created 22 Sep 2025, 23:04:14 UTC
Sent 22 Sep 2025, 23:04:18 UTC
Report deadline 27 Sep 2025, 23:04:18 UTC
Received 22 Sep 2025, 23:09:25 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 195 (0x000000C3) EXIT_CHILD_FAILED
Computer ID 619869
Run time 3 min 7 sec
CPU time 41 sec
Validate state Invalid
Credit 0.00
Device peak FLOPS 19,494.86 GFLOPS
Application version LLM: LLMs for chemistry v1.00 (cuda124L) x86_64-pc-linux-gnu
Peak working set size 652.28 MB
Peak swap size 8.48 GB
Peak disk usage 8.15 GB

Stderr output

<core_client_version>7.24.1</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
2025-09-23 07:04:33 (1223612): wrapper (8.1.26018): starting
2025-09-23 07:05:58 (1223612): wrapper: running bin/python (bin/conda-unpack)
2025-09-23 07:05:58 (1223612): wrapper: created child process 1223775
2025-09-23 07:06:56 (1223612): bin/python exited; CPU time 2.491811
2025-09-23 07:06:56 (1223612): wrapper: running bin/tar (xjvf input.tar.bz2)
2025-09-23 07:06:56 (1223612): wrapper: created child process 1223855
2025-09-23 07:06:57 (1223612): bin/tar exited; CPU time 0.041401
2025-09-23 07:06:57 (1223612): wrapper: running bin/bash (run.sh)
2025-09-23 07:06:57 (1223612): wrapper: created child process 1223857
+ echo 'Setup environment'
+ source bin/activate
++ _conda_pack_activate
++ local _CONDA_SHELL_FLAVOR
++ '[' -n x ']'
++ _CONDA_SHELL_FLAVOR=bash
++ local script_dir
++ case "$_CONDA_SHELL_FLAVOR" in
+++ dirname bin/activate
++ script_dir=bin
+++ cd bin
+++ pwd
++ local full_path_script_dir=/home/hfnl/yaoxingcan83/siyuanchen/opt/boinc-client/bin/slots/7/bin
+++ dirname /home/hfnl/yaoxingcan83/siyuanchen/opt/boinc-client/bin/slots/7/bin
++ local full_path_env=/home/hfnl/yaoxingcan83/siyuanchen/opt/boinc-client/bin/slots/7
+++ basename /home/hfnl/yaoxingcan83/siyuanchen/opt/boinc-client/bin/slots/7
++ local env_name=7
++ '[' -n '' ']'
++ export CONDA_PREFIX=/home/hfnl/yaoxingcan83/siyuanchen/opt/boinc-client/bin/slots/7
++ CONDA_PREFIX=/home/hfnl/yaoxingcan83/siyuanchen/opt/boinc-client/bin/slots/7
++ export _CONDA_PACK_OLD_PS1=
++ _CONDA_PACK_OLD_PS1=
++ PATH=/home/hfnl/yaoxingcan83/siyuanchen/opt/boinc-client/bin/slots/7/bin:/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/bin:/sbin:.
++ PS1='(7) '
++ case "$_CONDA_SHELL_FLAVOR" in
++ hash -r
++ local _script_dir=/home/hfnl/yaoxingcan83/siyuanchen/opt/boinc-client/bin/slots/7/etc/conda/activate.d
++ '[' -d /home/hfnl/yaoxingcan83/siyuanchen/opt/boinc-client/bin/slots/7/etc/conda/activate.d ']'
+ export PATH=/home/hfnl/yaoxingcan83/siyuanchen/opt/boinc-client/bin/slots/7:/home/hfnl/yaoxingcan83/siyuanchen/opt/boinc-client/bin/slots/7/bin:/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/bin:/sbin:.
+ PATH=/home/hfnl/yaoxingcan83/siyuanchen/opt/boinc-client/bin/slots/7:/home/hfnl/yaoxingcan83/siyuanchen/opt/boinc-client/bin/slots/7/bin:/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/bin:/sbin:.
+ echo 'Create a temporary directory'
+ export TMP=/home/hfnl/yaoxingcan83/siyuanchen/opt/boinc-client/bin/slots/7/tmp
+ TMP=/home/hfnl/yaoxingcan83/siyuanchen/opt/boinc-client/bin/slots/7/tmp
+ mkdir -p /home/hfnl/yaoxingcan83/siyuanchen/opt/boinc-client/bin/slots/7/tmp
+ which python
+ pip install main_generation-0.1.0-py3-none-any.whl -v --no-deps
+ export CUDA_VISIBLE_DEVICES=1
+ CUDA_VISIBLE_DEVICES=1
+ export HF_HOME=../.cache
+ HF_HOME=../.cache
+ export VLLM_ASSETS_CACHE=../.cache
+ VLLM_ASSETS_CACHE=../.cache
+ export VLLM_CACHE_ROOT=../.cache
+ VLLM_CACHE_ROOT=../.cache
+ echo RUNNING
+ pythonbinary=/home/hfnl/yaoxingcan83/siyuanchen/opt/boinc-client/bin/slots/7/lib/python3.12/site-packages/aiengine/main_generation.pyc
+ python /home/hfnl/yaoxingcan83/siyuanchen/opt/boinc-client/bin/slots/7/lib/python3.12/site-packages/aiengine/main_generation.pyc --conf conf.yaml

Generating train split: 0 examples [00:00, ? examples/s]
Generating train split: 2500 examples [00:00, 184702.75 examples/s]
Traceback (most recent call last):
  File "/home/hfnl/yaoxingcan83/siyuanchen/opt/boinc-client/bin/slots/7/lib/python3.12/site-packages/urllib3/connection.py", line 198, in _new_conn
    sock = connection.create_connection(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hfnl/yaoxingcan83/siyuanchen/opt/boinc-client/bin/slots/7/lib/python3.12/site-packages/urllib3/util/connection.py", line 60, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hfnl/yaoxingcan83/siyuanchen/opt/boinc-client/bin/slots/7/lib/python3.12/socket.py", line 978, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
socket.gaierror: [Errno -3] Temporary failure in name resolution

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/hfnl/yaoxingcan83/siyuanchen/opt/boinc-client/bin/slots/7/lib/python3.12/site-packages/urllib3/connectionpool.py", line 787, in urlopen
    response = self._make_request(
               ^^^^^^^^^^^^^^^^^^^
  File "/home/hfnl/yaoxingcan83/siyuanchen/opt/boinc-client/bin/slots/7/lib/python3.12/site-packages/urllib3/connectionpool.py", line 488, in _make_request
    raise new_e
  File "/home/hfnl/yaoxingcan83/siyuanchen/opt/boinc-client/bin/slots/7/lib/python3.12/site-packages/urllib3/connectionpool.py", line 464, in _make_request
    self._validate_conn(conn)
  File "/home/hfnl/yaoxingcan83/siyuanchen/opt/boinc-client/bin/slots/7/lib/python3.12/site-packages/urllib3/connectionpool.py", line 1093, in _validate_conn
    conn.connect()
  File "/home/hfnl/yaoxingcan83/siyuanchen/opt/boinc-client/bin/slots/7/lib/python3.12/site-packages/urllib3/connection.py", line 704, in connect
    self.sock = sock = self._new_conn()
                       ^^^^^^^^^^^^^^^^
  File "/home/hfnl/yaoxingcan83/siyuanchen/opt/boinc-client/bin/slots/7/lib/python3.12/site-packages/urllib3/connection.py", line 205, in _new_conn
    raise NameResolutionError(self.host, self, e) from e
urllib3.exceptions.NameResolutionError: <urllib3.connection.HTTPSConnection object at 0x14b15861eb40>: Failed to resolve 'huggingface.co' ([Errno -3] Temporary failure in name resolution)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/hfnl/yaoxingcan83/siyuanchen/opt/boinc-client/bin/slots/7/lib/python3.12/site-packages/requests/adapters.py", line 667, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
  File "/home/hfnl/yaoxingcan83/siyuanchen/opt/boinc-client/bin/slots/7/lib/python3.12/site-packages/urllib3/connectionpool.py", line 841, in urlopen
    retries = retries.increment(
              ^^^^^^^^^^^^^^^^^^
  File "/home/hfnl/yaoxingcan83/siyuanchen/opt/boinc-client/bin/slots/7/lib/python3.12/site-packages/urllib3/util/retry.py", line 519, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /api/models/Acellera/proto/tree/main?recursive=True&expand=False (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x14b15861eb40>: Failed to resolve 'huggingface.co' ([Errno -3] Temporary failure in name resolution)"))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/hfnl/yaoxingcan83/siyuanchen/opt/boinc-client/bin/slots/7/lib/python3.12/site-packages/vllm/transformers_utils/config.py", line 258, in get_config
    if is_gguf or file_or_path_exists(
                  ^^^^^^^^^^^^^^^^^^^^
  File "/home/hfnl/yaoxingcan83/siyuanchen/opt/boinc-client/bin/slots/7/lib/python3.12/site-packages/vllm/transformers_utils/config.py", line 180, in file_or_path_exists
    return file_exists(str(model),
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hfnl/yaoxingcan83/siyuanchen/opt/boinc-client/bin/slots/7/lib/python3.12/site-packages/vllm/transformers_utils/config.py", line 155, in file_exists
    file_list = list_repo_files(repo_id,
                ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hfnl/yaoxingcan83/siyuanchen/opt/boinc-client/bin/slots/7/lib/python3.12/site-packages/vllm/transformers_utils/config.py", line 144, in list_repo_files
    return with_retry(lookup_files, "Error retrieving file list")
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hfnl/yaoxingcan83/siyuanchen/opt/boinc-client/bin/slots/7/lib/python3.12/site-packages/vllm/transformers_utils/config.py", line 98, in with_retry
    return func()
           ^^^^^^
  File "/home/hfnl/yaoxingcan83/siyuanchen/opt/boinc-client/bin/slots/7/lib/python3.12/site-packages/vllm/transformers_utils/config.py", line 134, in lookup_files
    return hf_list_repo_files(repo_id,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hfnl/yaoxingcan83/siyuanchen/opt/boinc-client/bin/slots/7/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/hfnl/yaoxingcan83/siyuanchen/opt/boinc-client/bin/slots/7/lib/python3.12/site-packages/huggingface_hub/hf_api.py", line 2996, in list_repo_files
    for f in self.list_repo_tree(
             ^^^^^^^^^^^^^^^^^^^^
  File "/home/hfnl/yaoxingcan83/siyuanchen/opt/boinc-client/bin/slots/7/lib/python3.12/site-packages/huggingface_hub/hf_api.py", line 3131, in list_repo_tree
    for path_info in paginate(path=tree_url, headers=headers, params={"recursive": recursive, "expand": expand}):
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hfnl/yaoxingcan83/siyuanchen/opt/boinc-client/bin/slots/7/lib/python3.12/site-packages/huggingface_hub/utils/_pagination.py", line 36, in paginate
    r = session.get(path, params=params, headers=headers)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hfnl/yaoxingcan83/siyuanchen/opt/boinc-client/bin/slots/7/lib/python3.12/site-packages/requests/sessions.py", line 602, in get
    return self.request("GET", url, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hfnl/yaoxingcan83/siyuanchen/opt/boinc-client/bin/slots/7/lib/python3.12/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hfnl/yaoxingcan83/siyuanchen/opt/boinc-client/bin/slots/7/lib/python3.12/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hfnl/yaoxingcan83/siyuanchen/opt/boinc-client/bin/slots/7/lib/python3.12/site-packages/huggingface_hub/utils/_http.py", line 96, in send
    return super().send(request, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hfnl/yaoxingcan83/siyuanchen/opt/boinc-client/bin/slots/7/lib/python3.12/site-packages/requests/adapters.py", line 700, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: (MaxRetryError('HTTPSConnectionPool(host=\'huggingface.co\', port=443): Max retries exceeded with url: /api/models/Acellera/proto/tree/main?recursive=True&expand=False (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x14b15861eb40>: Failed to resolve \'huggingface.co\' ([Errno -3] Temporary failure in name resolution)"))'), '(Request ID: a0609825-80dc-40c0-8048-ddd7b7e31b1d)')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "wheel_contents/aiengine/main_generation.py", line 87, in <module>
  File "wheel_contents/aiengine/model.py", line 36, in __init__
  File "/home/hfnl/yaoxingcan83/siyuanchen/opt/boinc-client/bin/slots/7/lib/python3.12/site-packages/vllm/utils.py", line 1096, in inner
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/hfnl/yaoxingcan83/siyuanchen/opt/boinc-client/bin/slots/7/lib/python3.12/site-packages/vllm/entrypoints/llm.py", line 243, in __init__
    self.llm_engine = LLMEngine.from_engine_args(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hfnl/yaoxingcan83/siyuanchen/opt/boinc-client/bin/slots/7/lib/python3.12/site-packages/vllm/engine/llm_engine.py", line 514, in from_engine_args
    vllm_config = engine_args.create_engine_config(usage_context)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hfnl/yaoxingcan83/siyuanchen/opt/boinc-client/bin/slots/7/lib/python3.12/site-packages/vllm/engine/arg_utils.py", line 1137, in create_engine_config
    model_config = self.create_model_config()
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hfnl/yaoxingcan83/siyuanchen/opt/boinc-client/bin/slots/7/lib/python3.12/site-packages/vllm/engine/arg_utils.py", line 1026, in create_model_config
    return ModelConfig(
           ^^^^^^^^^^^^
  File "/home/hfnl/yaoxingcan83/siyuanchen/opt/boinc-client/bin/slots/7/lib/python3.12/site-packages/vllm/config.py", line 343, in __init__
    hf_config = get_config(self.hf_config_path or self.model,
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hfnl/yaoxingcan83/siyuanchen/opt/boinc-client/bin/slots/7/lib/python3.12/site-packages/vllm/transformers_utils/config.py", line 283, in get_config
    raise ValueError(error_message) from e
ValueError: Invalid repository ID or local directory specified: 'Acellera/proto'.
Please verify the following requirements:
1. Provide a valid Hugging Face repository ID.
2. Specify a local directory that contains a recognized configuration file.
   - For Hugging Face models: ensure the presence of a 'config.json'.
   - For Mistral models: ensure the presence of a 'params.json'.

2025-09-23 07:07:58 (1223612): bin/bash exited; CPU time 34.151994
2025-09-23 07:07:58 (1223612): app exit status: 0x1
2025-09-23 07:07:58 (1223612): called boinc_finish(195)

</stderr_txt>
]]>


©2025 Universitat Pompeu Fabra