Task 38577047

Name wu_dc352407-GIANNI_GPROTO7-0-1-RND8527_0
Workunit 31542622
Created 24 Sep 2025, 15:29:29 UTC
Sent 24 Sep 2025, 15:29:39 UTC
Report deadline 29 Sep 2025, 15:29:39 UTC
Received 24 Sep 2025, 19:40:35 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 195 (0x000000C3) EXIT_CHILD_FAILED
Computer ID 632751
Run time 18 min 11 sec
CPU time 2 min 28 sec
Validate state Invalid
Credit 0.00
Device peak FLOPS 83,567.27 GFLOPS
Application version LLM: LLMs for chemistry v1.01 (cuda124L) windows_x86_64
Peak working set size 4.86 GB
Peak swap size 20.08 GB
Peak disk usage 6.35 GB

Stderr output

<core_client_version>8.2.4</core_client_version>
<![CDATA[
<message>
The operating system cannot run (null).
 (0xc3) - exit code 195 (0xc3)</message>
<stderr_txt>
17:35:26 (24404): wrapper (7.9.26016): starting
17:35:26 (24404): wrapper: running Library/usr/bin/tar.exe (xjvf input.tar.bz2)
tasks.json
run.bat
conf.yaml
main_generation-0.1.0-py3-none-any.whl
run.sh
17:35:27 (24404): Library/usr/bin/tar.exe exited; CPU time 0.000000
17:35:27 (24404): wrapper: running C:/Windows/system32/cmd.exe (/c call Scripts\activate.bat && Scripts\conda-unpack.exe && run.bat)

Generating train split: 0 examples [00:00, ? examples/s]
Generating train split: 2500 examples [00:00, 192251.11 examples/s]
E:\BOINC\DATA\slots\0\Lib\site-packages\huggingface_hub\file_download.py:144: UserWarning: `huggingface_hub` cache-system uses symlinks by default to efficiently store duplicated files but your machine does not support them in E:\BOINC\DATA\slots\.cache\hub\models--Acellera--proto. Caching files will still work but in a degraded version that might require more space on your disk. This warning can be disabled by setting the `HF_HUB_DISABLE_SYMLINKS_WARNING` environment variable. For more details, see https://huggingface.co/docs/huggingface_hub/how-to-cache#limitations.
To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
  warnings.warn(message)
[W924 17:38:21.000000000 socket.cpp:759] [c10d] The client socket has failed to connect to [ZEUSLORD.fritz.box]:52488 (system error: 10049 - The requested address is not valid in its context.).

Loading safetensors checkpoint shards:   0% Completed | 0/2 [00:00<?, ?it/s]

Loading safetensors checkpoint shards:  50% Completed | 1/2 [00:21<00:21, 21.66s/it]

Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:49<00:00, 25.45s/it]

Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:49<00:00, 24.88s/it]


Loading safetensors checkpoint shards:   0% Completed | 0/2 [00:00<?, ?it/s]

Loading safetensors checkpoint shards:  50% Completed | 1/2 [00:03<00:03,  3.77s/it]

Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:07<00:00,  3.83s/it]

Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:07<00:00,  3.82s/it]


Capturing CUDA graph shapes:   0%|          | 0/35 [00:00<?, ?it/s]
Capturing CUDA graph shapes:   3%|2         | 1/35 [00:05<02:58,  5.24s/it]
Capturing CUDA graph shapes:   6%|5         | 2/35 [00:09<02:39,  4.83s/it]
Capturing CUDA graph shapes:   9%|8         | 3/35 [00:10<01:41,  3.17s/it]
Capturing CUDA graph shapes:  11%|#1        | 4/35 [00:12<01:13,  2.38s/it]
Capturing CUDA graph shapes:  14%|#4        | 5/35 [00:13<00:58,  1.96s/it]
Capturing CUDA graph shapes:  17%|#7        | 6/35 [00:14<00:49,  1.70s/it]
Capturing CUDA graph shapes:  20%|##        | 7/35 [00:15<00:43,  1.54s/it]
21:20:33 (16900): wrapper (7.9.26016): starting
21:20:33 (16900): wrapper: running C:/Windows/system32/cmd.exe (/c call Scripts\activate.bat && Scripts\conda-unpack.exe && run.bat)
A subdirectory or file named "E:\BOINC\DATA\slots\0\tmp" already exists.

Generating train split: 0 examples [00:00, ? examples/s]
Generating train split: 2500 examples [00:00, 206015.17 examples/s]
E:\BOINC\DATA\slots\0\Lib\site-packages\huggingface_hub\file_download.py:144: UserWarning: `huggingface_hub` cache-system uses symlinks by default to efficiently store duplicated files but your machine does not support them in E:\BOINC\DATA\slots\.cache\hub\models--Acellera--proto. Caching files will still work but in a degraded version that might require more space on your disk. This warning can be disabled by setting the `HF_HUB_DISABLE_SYMLINKS_WARNING` environment variable. For more details, see https://huggingface.co/docs/huggingface_hub/how-to-cache#limitations.
To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
  warnings.warn(message)
E:\BOINC\DATA\slots\0\Lib\site-packages\huggingface_hub\file_download.py:144: UserWarning: `huggingface_hub` cache-system uses symlinks by default to efficiently store duplicated files but your machine does not support them in E:\BOINC\DATA\slots\.cache\hub\models--unsloth--Qwen2.5-14B-Instruct-bnb-4bit. Caching files will still work but in a degraded version that might require more space on your disk. This warning can be disabled by setting the `HF_HUB_DISABLE_SYMLINKS_WARNING` environment variable. For more details, see https://huggingface.co/docs/huggingface_hub/how-to-cache#limitations.
To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
  warnings.warn(message)
[W924 21:26:04.000000000 socket.cpp:759] [c10d] The client socket has failed to connect to [ZEUSLORD.fritz.box]:55136 (system error: 10049 - The requested address is not valid in its context.).

Loading safetensors checkpoint shards:   0% Completed | 0/2 [00:00<?, ?it/s]

Loading safetensors checkpoint shards:  50% Completed | 1/2 [00:10<00:10, 10.05s/it]

Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:22<00:00, 11.32s/it]

Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:22<00:00, 11.13s/it]


Loading safetensors checkpoint shards:   0% Completed | 0/2 [00:00<?, ?it/s]

Loading safetensors checkpoint shards:  50% Completed | 1/2 [00:04<00:04,  4.06s/it]

Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:07<00:00,  3.86s/it]

Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:07<00:00,  3.89s/it]

[rank0]: Traceback (most recent call last):
[rank0]:   File "wheel_contents/aiengine/main_generation.py", line 87, in <module>
[rank0]:   File "wheel_contents/aiengine/model.py", line 36, in __init__
[rank0]:   File "E:\BOINC\DATA\slots\0\Lib\site-packages\vllm\utils.py", line 1096, in inner
[rank0]:     return fn(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^
[rank0]:   File "E:\BOINC\DATA\slots\0\Lib\site-packages\vllm\entrypoints\llm.py", line 243, in __init__
[rank0]:     self.llm_engine = LLMEngine.from_engine_args(
[rank0]:                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "E:\BOINC\DATA\slots\0\Lib\site-packages\vllm\engine\llm_engine.py", line 521, in from_engine_args
[rank0]:     return engine_cls.from_vllm_config(
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "E:\BOINC\DATA\slots\0\Lib\site-packages\vllm\engine\llm_engine.py", line 497, in from_vllm_config
[rank0]:     return cls(
[rank0]:            ^^^^
[rank0]:   File "E:\BOINC\DATA\slots\0\Lib\site-packages\vllm\engine\llm_engine.py", line 284, in __init__
[rank0]:     self._initialize_kv_caches()
[rank0]:   File "E:\BOINC\DATA\slots\0\Lib\site-packages\vllm\engine\llm_engine.py", line 446, in _initialize_kv_caches
[rank0]:     self.model_executor.initialize_cache(num_gpu_blocks, num_cpu_blocks)
[rank0]:   File "E:\BOINC\DATA\slots\0\Lib\site-packages\vllm\executor\executor_base.py", line 123, in initialize_cache
[rank0]:     self.collective_rpc("initialize_cache",
[rank0]:   File "E:\BOINC\DATA\slots\0\Lib\site-packages\vllm\executor\uniproc_executor.py", line 56, in collective_rpc
[rank0]:     answer = run_method(self.driver_worker, method, args, kwargs)
[rank0]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "E:\BOINC\DATA\slots\0\Lib\site-packages\vllm\utils.py", line 2359, in run_method
[rank0]:     return func(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "E:\BOINC\DATA\slots\0\Lib\site-packages\vllm\worker\worker.py", line 308, in initialize_cache
[rank0]:     self._init_cache_engine()
[rank0]:   File "E:\BOINC\DATA\slots\0\Lib\site-packages\vllm\worker\worker.py", line 314, in _init_cache_engine
[rank0]:     CacheEngine(self.cache_config, self.model_config,
[rank0]:   File "E:\BOINC\DATA\slots\0\Lib\site-packages\vllm\worker\cache_engine.py", line 66, in __init__
[rank0]:     self.cpu_cache = self._allocate_kv_cache(self.num_cpu_blocks, "cpu")
[rank0]:                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "E:\BOINC\DATA\slots\0\Lib\site-packages\vllm\worker\cache_engine.py", line 83, in _allocate_kv_cache
[rank0]:     layer_kv_cache = torch.zeros(kv_cache_shape,
[rank0]:                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: RuntimeError: CUDA error: resource already mapped
[rank0]: CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[rank0]: For debugging consider passing CUDA_LAUNCH_BLOCKING=1
[rank0]: Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

21:39:00 (16900): C:/Windows/system32/cmd.exe exited; CPU time 148.093750
21:39:00 (16900): app exit status: 0x16
21:39:00 (16900): called boinc_finish(195)
0 bytes in 0 Free Blocks.
176 bytes in 6 Normal Blocks.
1144 bytes in 1 CRT Blocks.
0 bytes in 0 Ignore Blocks.
0 bytes in 0 Client Blocks.
Largest number used: 0 bytes.
Total allocations: 17370347 bytes.
Dumping objects ->
{1594} normal block at 0x000001A4FEB6A610, 48 bytes long.
 Data: <PATH=E:\BOINC\DA> 50 41 54 48 3D 45 3A 5C 42 4F 49 4E 43 5C 44 41 
{1583} normal block at 0x000001A4FEB720D0, 32 bytes long.
 Data: <HOME=E:\BOINC\DA> 48 4F 4D 45 3D 45 3A 5C 42 4F 49 4E 43 5C 44 41 
{1572} normal block at 0x000001A4FEB726D0, 32 bytes long.
 Data: <TMP=E:\BOINC\DAT> 54 4D 50 3D 45 3A 5C 42 4F 49 4E 43 5C 44 41 54 
{1561} normal block at 0x000001A4FEB71DD0, 32 bytes long.
 Data: <TEMP=E:\BOINC\DA> 54 45 4D 50 3D 45 3A 5C 42 4F 49 4E 43 5C 44 41 
{1550} normal block at 0x000001A4FEB72310, 32 bytes long.
 Data: <TMPDIR=E:\BOINC\> 54 4D 50 44 49 52 3D 45 3A 5C 42 4F 49 4E 43 5C 
{1519} normal block at 0x000001A4FEB6A760, 48 bytes long.
 Data: <PATH=E:\BOINC\DA> 50 41 54 48 3D 45 3A 5C 42 4F 49 4E 43 5C 44 41 
..\api\boinc_api.cpp(309) : {1506} normal block at 0x000001A4FEB70440, 8 bytes long.
 Data: <        > 00 00 11 FF A4 01 00 00 
{300} normal block at 0x000001A4FEB701C0, 8 bytes long.
 Data: < 3      > D0 33 B7 FE A4 01 00 00 
{291} normal block at 0x000001A4FEB56C40, 80 bytes long.
 Data: </c call Scripts\> 2F 63 20 63 61 6C 6C 20 53 63 72 69 70 74 73 5C 
{290} normal block at 0x000001A4FEB70350, 16 bytes long.
 Data: < 2              > D8 32 B7 FE A4 01 00 00 00 00 00 00 00 00 00 00 
{289} normal block at 0x000001A4FEB702B0, 16 bytes long.
 Data: < 2              > B0 32 B7 FE A4 01 00 00 00 00 00 00 00 00 00 00 
{288} normal block at 0x000001A4FEB70260, 16 bytes long.
 Data: < 2              > 88 32 B7 FE A4 01 00 00 00 00 00 00 00 00 00 00 
{287} normal block at 0x000001A4FEB70210, 16 bytes long.
 Data: <`2              > 60 32 B7 FE A4 01 00 00 00 00 00 00 00 00 00 00 
{286} normal block at 0x000001A4FEB6FF40, 16 bytes long.
 Data: <82              > 38 32 B7 FE A4 01 00 00 00 00 00 00 00 00 00 00 
{285} normal block at 0x000001A4FEB705D0, 16 bytes long.
 Data: < 2              > 10 32 B7 FE A4 01 00 00 00 00 00 00 00 00 00 00 
{284} normal block at 0x000001A4FEB6A4C0, 48 bytes long.
 Data: <ComSpec=C:\Windo> 43 6F 6D 53 70 65 63 3D 43 3A 5C 57 69 6E 64 6F 
{283} normal block at 0x000001A4FEB700D0, 16 bytes long.
 Data: <                > 88 B1 B6 FE A4 01 00 00 00 00 00 00 00 00 00 00 
{282} normal block at 0x000001A4FEB72430, 32 bytes long.
 Data: <SystemRoot=C:\Wi> 53 79 73 74 65 6D 52 6F 6F 74 3D 43 3A 5C 57 69 
{281} normal block at 0x000001A4FEB703A0, 16 bytes long.
 Data: <`               > 60 B1 B6 FE A4 01 00 00 00 00 00 00 00 00 00 00 
{279} normal block at 0x000001A4FEB70120, 16 bytes long.
 Data: <8               > 38 B1 B6 FE A4 01 00 00 00 00 00 00 00 00 00 00 
{278} normal block at 0x000001A4FEB70A30, 16 bytes long.
 Data: <                > 10 B1 B6 FE A4 01 00 00 00 00 00 00 00 00 00 00 
{277} normal block at 0x000001A4FEB6FEF0, 16 bytes long.
 Data: <                > E8 B0 B6 FE A4 01 00 00 00 00 00 00 00 00 00 00 
{276} normal block at 0x000001A4FEB6FE00, 16 bytes long.
 Data: <                > C0 B0 B6 FE A4 01 00 00 00 00 00 00 00 00 00 00 
{275} normal block at 0x000001A4FEB703F0, 16 bytes long.
 Data: <                > 98 B0 B6 FE A4 01 00 00 00 00 00 00 00 00 00 00 
{274} normal block at 0x000001A4FEB72370, 32 bytes long.
 Data: <CUDA_DEVICE=0 PU> 43 55 44 41 5F 44 45 56 49 43 45 3D 30 00 50 55 
{273} normal block at 0x000001A4FEB6FDB0, 16 bytes long.
 Data: <p               > 70 B0 B6 FE A4 01 00 00 00 00 00 00 00 00 00 00 
{272} normal block at 0x000001A4FEB6B070, 320 bytes long.
 Data: <        p#      > B0 FD B6 FE A4 01 00 00 70 23 B7 FE A4 01 00 00 
{271} normal block at 0x000001A4FEB70C10, 16 bytes long.
 Data: < 1              > F0 31 B7 FE A4 01 00 00 00 00 00 00 00 00 00 00 
{270} normal block at 0x000001A4FEB70AD0, 16 bytes long.
 Data: < 1              > C8 31 B7 FE A4 01 00 00 00 00 00 00 00 00 00 00 
{269} normal block at 0x000001A4FEB72250, 32 bytes long.
 Data: <C:/Windows/syste> 43 3A 2F 57 69 6E 64 6F 77 73 2F 73 79 73 74 65 
{268} normal block at 0x000001A4FEB6FD10, 16 bytes long.
 Data: < 1              > A0 31 B7 FE A4 01 00 00 00 00 00 00 00 00 00 00 
{267} normal block at 0x000001A4FEB72970, 32 bytes long.
 Data: <xjvf input.tar.b> 78 6A 76 66 20 69 6E 70 75 74 2E 74 61 72 2E 62 
{266} normal block at 0x000001A4FEB6FD60, 16 bytes long.
 Data: < 0              > E8 30 B7 FE A4 01 00 00 00 00 00 00 00 00 00 00 
{265} normal block at 0x000001A4FEB709E0, 16 bytes long.
 Data: < 0              > C0 30 B7 FE A4 01 00 00 00 00 00 00 00 00 00 00 
{264} normal block at 0x000001A4FEB70B70, 16 bytes long.
 Data: < 0              > 98 30 B7 FE A4 01 00 00 00 00 00 00 00 00 00 00 
{263} normal block at 0x000001A4FEB6FCC0, 16 bytes long.
 Data: <p0              > 70 30 B7 FE A4 01 00 00 00 00 00 00 00 00 00 00 
{262} normal block at 0x000001A4FEB70990, 16 bytes long.
 Data: <H0              > 48 30 B7 FE A4 01 00 00 00 00 00 00 00 00 00 00 
{261} normal block at 0x000001A4FEB708F0, 16 bytes long.
 Data: < 0              > 20 30 B7 FE A4 01 00 00 00 00 00 00 00 00 00 00 
{259} normal block at 0x000001A4FEB707B0, 16 bytes long.
 Data: <                > 90 A9 B6 FE A4 01 00 00 00 00 00 00 00 00 00 00 
{258} normal block at 0x000001A4FEB6A990, 40 bytes long.
 Data: <        `       > B0 07 B7 FE A4 01 00 00 60 A7 B6 FE A4 01 00 00 
{257} normal block at 0x000001A4FEB706C0, 16 bytes long.
 Data: < 0              > 00 30 B7 FE A4 01 00 00 00 00 00 00 00 00 00 00 
{256} normal block at 0x000001A4FEB70620, 16 bytes long.
 Data: < /              > D8 2F B7 FE A4 01 00 00 00 00 00 00 00 00 00 00 
{255} normal block at 0x000001A4FEB72B50, 32 bytes long.
 Data: <Library/usr/bin/> 4C 69 62 72 61 72 79 2F 75 73 72 2F 62 69 6E 2F 
{254} normal block at 0x000001A4FEB70170, 16 bytes long.
 Data: < /              > B0 2F B7 FE A4 01 00 00 00 00 00 00 00 00 00 00 
{253} normal block at 0x000001A4FEB72FB0, 992 bytes long.
 Data: <p       P+      > 70 01 B7 FE A4 01 00 00 50 2B B7 FE A4 01 00 00 
{97} normal block at 0x000001A4FEB666F0, 32 bytes long.
 Data: <windows_x86_64__> 77 69 6E 64 6F 77 73 5F 78 38 36 5F 36 34 5F 5F 
{96} normal block at 0x000001A4FEB70B20, 16 bytes long.
 Data: <                > E0 A3 B6 FE A4 01 00 00 00 00 00 00 00 00 00 00 
{95} normal block at 0x000001A4FEB6A3E0, 40 bytes long.
 Data: <         f      > 20 0B B7 FE A4 01 00 00 F0 66 B6 FE A4 01 00 00 
{74} normal block at 0x000001A4FEB7E760, 16 bytes long.
 Data: <  :             > 80 EA 3A D2 F6 7F 00 00 00 00 00 00 00 00 00 00 
{73} normal block at 0x000001A4FEB7E080, 16 bytes long.
 Data: <@ :             > 40 E9 3A D2 F6 7F 00 00 00 00 00 00 00 00 00 00 
{72} normal block at 0x000001A4FEB7E3F0, 16 bytes long.
 Data: < W7             > F8 57 37 D2 F6 7F 00 00 00 00 00 00 00 00 00 00 
{71} normal block at 0x000001A4FEB7DF40, 16 bytes long.
 Data: < W7             > D8 57 37 D2 F6 7F 00 00 00 00 00 00 00 00 00 00 
{70} normal block at 0x000001A4FEB7DE00, 16 bytes long.
 Data: <P 7             > 50 04 37 D2 F6 7F 00 00 00 00 00 00 00 00 00 00 
{69} normal block at 0x000001A4FEB7DD10, 16 bytes long.
 Data: <0 7             > 30 04 37 D2 F6 7F 00 00 00 00 00 00 00 00 00 00 
{68} normal block at 0x000001A4FEB7DDB0, 16 bytes long.
 Data: <  7             > E0 02 37 D2 F6 7F 00 00 00 00 00 00 00 00 00 00 
{67} normal block at 0x000001A4FEB7DEF0, 16 bytes long.
 Data: <  7             > 10 04 37 D2 F6 7F 00 00 00 00 00 00 00 00 00 00 
{66} normal block at 0x000001A4FEB7DCC0, 16 bytes long.
 Data: <p 7             > 70 04 37 D2 F6 7F 00 00 00 00 00 00 00 00 00 00 
{65} normal block at 0x000001A4FEB7E710, 16 bytes long.
 Data: <  5             > 18 C0 35 D2 F6 7F 00 00 00 00 00 00 00 00 00 00 
Object dump complete.

</stderr_txt>
]]>
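The stderr above names two environment variables: HF_HUB_DISABLE_SYMLINKS_WARNING, which silences the huggingface_hub symlink warning, and CUDA_LAUNCH_BLOCKING=1, which makes CUDA calls synchronous so the "resource already mapped" traceback points at the call that actually failed. The following is a minimal Python sketch, assuming the task is re-run by hand from the slot directory. The helper name debug_env is hypothetical and is not part of the BOINC wrapper or the project's run.bat. Neither variable fixes the CUDA error itself; they only reduce noise and make the report more precise.

```python
import os


def debug_env(base=None):
    """Return a copy of an environment with the two variables named in the log.

    Hypothetical helper, not part of BOINC or the project. The input mapping
    (os.environ by default) is copied, never modified in place.
    """
    env = dict(os.environ if base is None else base)
    # Suppress the huggingface_hub warning about missing symlink support on Windows.
    env["HF_HUB_DISABLE_SYMLINKS_WARNING"] = "1"
    # Force synchronous CUDA calls so errors surface at the failing API call.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


# Hypothetical manual re-run from the slot directory:
# import subprocess
# subprocess.run(["cmd.exe", "/c", "run.bat"], env=debug_env(), check=False)
```

Returning a copy rather than editing os.environ keeps the settings scoped to the one child process that is being debugged.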


©2025 Universitat Pompeu Fabra