Experimental Python tasks (beta)

Author	Message
kotenok2000 Send message Joined: 18 Jul 13 Posts: 79 Credit: 218,778,292 RAC: 12,880 Level Scientific publications	Message 59482 - Posted: 18 Oct 2022, 21:03:38 UTC I have recieved my first new working task. ID: 59482 · Rating: 0 · rate: / Reply Quote

KAMasud Send message Joined: 27 Jul 11 Posts: 138 Credit: 539,953,398 RAC: 0 Level Scientific publications	Message 59483 - Posted: 19 Oct 2022, 5:15:25 UTC I wish I could get a sniff also. ID: 59483 · Rating: 0 · rate: / Reply Quote

GS Send message Joined: 16 Oct 22 Posts: 12 Credit: 1,382,500 RAC: 0 Level Scientific publications	Message 59484 - Posted: 19 Oct 2022, 8:28:42 UTC I got another one this morning, still no luck, the task failed as all the other before. Is there something, that I have to change on my side? This is the log file: <core_client_version>7.20.2</core_client_version> <![CDATA[ <message> (unknown error) - exit code 195 (0xc3)</message> <stderr_txt> 09:56:38 (11564): wrapper (7.9.26016): starting 09:56:38 (11564): wrapper: running .\7za.exe (x pythongpu_windows_x86_64__cuda1131.txz -y) 7-Zip (a) 22.01 (x86) : Copyright (c) 1999-2022 Igor Pavlov : 2022-07-15 Scanning the drive for archives: 1 file, 1976180228 bytes (1885 MiB) Extracting archive: pythongpu_windows_x86_64__cuda1131.txz -- Path = pythongpu_windows_x86_64__cuda1131.txz Type = xz Physical Size = 1976180228 Method = LZMA2:22 CRC64 Streams = 1523 Blocks = 1523 Cluster Size = 4210688 Everything is Ok Size: 6410311680 Compressed: 1976180228 09:58:33 (11564): .\7za.exe exited; CPU time 111.125000 09:58:33 (11564): wrapper: running C:\Windows\system32\cmd.exe (/C "del pythongpu_windows_x86_64__cuda1131.txz") 09:58:34 (11564): C:\Windows\system32\cmd.exe exited; CPU time 0.000000 09:58:34 (11564): wrapper: running .\7za.exe (x pythongpu_windows_x86_64__cuda1131.tar -y) 7-Zip (a) 22.01 (x86) : Copyright (c) 1999-2022 Igor Pavlov : 2022-07-15 Scanning the drive for archives: 1 file, 6410311680 bytes (6114 MiB) Extracting archive: pythongpu_windows_x86_64__cuda1131.tar -- Path = pythongpu_windows_x86_64__cuda1131.tar Type = tar Physical Size = 6410311680 Headers Size = 19965952 Code Page = UTF-8 Characteristics = GNU LongName ASCII Everything is Ok Files: 38141 Size: 6380353601 Compressed: 6410311680 10:01:10 (11564): .\7za.exe exited; CPU time 41.140625 10:01:10 (11564): wrapper: running C:\Windows\system32\cmd.exe (/C "del pythongpu_windows_x86_64__cuda1131.tar") 10:01:11 (11564): C:\Windows\system32\cmd.exe exited; CPU time 0.000000 10:01:11 (11564): wrapper: running python.exe (run.py) Starting!! Windows fix!! Define rollouts storage Define scheme Created CWorker with worker_index 0 Created GWorker with worker_index 0 Created UWorker with worker_index 0 Created training scheme. Define learner Created Learner. Look for a progress_last_chk file - if exists, adjust target_env_steps Define train loop Traceback (most recent call last): File "C:\ProgramData\BOINC\slots\4\lib\site-packages\pytorchrl\scheme\gradients\g_worker.py", line 196, in get_data self.next_batch = self.batches.__next__() AttributeError: 'GWorker' object has no attribute 'batches' During handling of the above exception, another exception occurred: Traceback (most recent call last): File "run.py", line 475, in <module> main() File "run.py", line 131, in main learner.step() File "C:\ProgramData\BOINC\slots\4\lib\site-packages\pytorchrl\learner.py", line 46, in step info = self.update_worker.step() File "C:\ProgramData\BOINC\slots\4\lib\site-packages\pytorchrl\scheme\updates\u_worker.py", line 118, in step self.updater.step() File "C:\ProgramData\BOINC\slots\4\lib\site-packages\pytorchrl\scheme\updates\u_worker.py", line 259, in step grads = self.local_worker.step(self.decentralized_update_execution) File "C:\ProgramData\BOINC\slots\4\lib\site-packages\pytorchrl\scheme\gradients\g_worker.py", line 178, in step self.get_data() File "C:\ProgramData\BOINC\slots\4\lib\site-packages\pytorchrl\scheme\gradients\g_worker.py", line 211, in get_data self.collector.step() File "C:\ProgramData\BOINC\slots\4\lib\site-packages\pytorchrl\scheme\gradients\g_worker.py", line 490, in step rollouts = self.local_worker.collect_data(listen_to=["sync"], data_to_cpu=False) File "C:\ProgramData\BOINC\slots\4\lib\site-packages\pytorchrl\scheme\collection\c_worker.py", line 168, in collect_data train_info = self.collect_train_data(listen_to=listen_to) File "C:\ProgramData\BOINC\slots\4\lib\site-packages\pytorchrl\scheme\collection\c_worker.py", line 251, in collect_train_data self.storage.insert_transition(transition) File "C:\ProgramData\BOINC\slots\4\python_dependencies\buffer.py", line 794, in insert_transition state_embeds = [i["StateEmbeddings"] for i in sample[prl.INFO]] File "C:\ProgramData\BOINC\slots\4\python_dependencies\buffer.py", line 794, in <listcomp> state_embeds = [i["StateEmbeddings"] for i in sample[prl.INFO]] KeyError: 'StateEmbeddings' Traceback (most recent call last): File "C:\ProgramData\BOINC\slots\4\lib\site-packages\pytorchrl\scheme\gradients\g_worker.py", line 196, in get_data self.next_batch = self.batches.__next__() AttributeError: 'GWorker' object has no attribute 'batches' During handling of the above exception, another exception occurred: Traceback (most recent call last): File "run.py", line 475, in <module> main() File "run.py", line 131, in main learner.step() File "C:\ProgramData\BOINC\slots\4\lib\site-packages\pytorchrl\learner.py", line 46, in step info = self.update_worker.step() File "C:\ProgramData\BOINC\slots\4\lib\site-packages\pytorchrl\scheme\updates\u_worker.py", line 118, in step self.updater.step() File "C:\ProgramData\BOINC\slots\4\lib\site-packages\pytorchrl\scheme\updates\u_worker.py", line 259, in step grads = self.local_worker.step(self.decentralized_update_execution) File "C:\ProgramData\BOINC\slots\4\lib\site-packages\pytorchrl\scheme\gradients\g_worker.py", line 178, in step self.get_data() File "C:\ProgramData\BOINC\slots\4\lib\site-packages\pytorchrl\scheme\gradients\g_worker.py", line 211, in get_data self.collector.step() File "C:\ProgramData\BOINC\slots\4\lib\site-packages\pytorchrl\scheme\gradients\g_worker.py", line 490, in step rollouts = self.local_worker.collect_data(listen_to=["sync"], data_to_cpu=False) File "C:\ProgramData\BOINC\slots\4\lib\site-packages\pytorchrl\scheme\collection\c_worker.py", line 168, in collect_data train_info = self.collect_train_data(listen_to=listen_to) File "C:\ProgramData\BOINC\slots\4\lib\site-packages\pytorchrl\scheme\collection\c_worker.py", line 251, in collect_train_data self.storage.insert_transition(transition) File "C:\ProgramData\BOINC\slots\4\python_dependencies\buffer.py", line 794, in insert_transition state_embeds = [i["StateEmbeddings"] for i in sample[prl.INFO]] File "C:\ProgramData\BOINC\slots\4\python_dependencies\buffer.py", line 794, in <listcomp> state_embeds = [i["StateEmbeddings"] for i in sample[prl.INFO]] KeyError: 'StateEmbeddings' 10:05:44 (11564): python.exe exited; CPU time 2660.984375 10:05:44 (11564): app exit status: 0x1 10:05:44 (11564): called boinc_finish(195) 0 bytes in 0 Free Blocks. 442 bytes in 9 Normal Blocks. 1144 bytes in 1 CRT Blocks. 0 bytes in 0 Ignore Blocks. 0 bytes in 0 Client Blocks. Largest number used: 0 bytes. Total allocations: 6550134 bytes. Dumping objects -> {10837} normal block at 0x0000024DEACAF4D0, 48 bytes long. Data: <PSI_SCRATCH=C:\P> 50 53 49 5F 53 43 52 41 54 43 48 3D 43 3A 5C 50 {10796} normal block at 0x0000024DEACAF310, 48 bytes long. Data: <HOMEPATH=C:\Prog> 48 4F 4D 45 50 41 54 48 3D 43 3A 5C 50 72 6F 67 {10785} normal block at 0x0000024DEACAEA50, 48 bytes long. Data: <HOME=C:\ProgramD> 48 4F 4D 45 3D 43 3A 5C 50 72 6F 67 72 61 6D 44 {10774} normal block at 0x0000024DEACAEF20, 48 bytes long. Data: <TMP=C:\ProgramDa> 54 4D 50 3D 43 3A 5C 50 72 6F 67 72 61 6D 44 61 {10763} normal block at 0x0000024DEACAEB30, 48 bytes long. Data: <TEMP=C:\ProgramD> 54 45 4D 50 3D 43 3A 5C 50 72 6F 67 72 61 6D 44 {10752} normal block at 0x0000024DEACAF3F0, 48 bytes long. Data: <TMPDIR=C:\Progra> 54 4D 50 44 49 52 3D 43 3A 5C 50 72 6F 67 72 61 {10671} normal block at 0x0000024DEAC990A0, 85 bytes long. Data: <<project_prefere> 3C 70 72 6F 6A 65 63 74 5F 70 72 65 66 65 72 65 ..\api\boinc_api.cpp(309) : {10668} normal block at 0x0000024DEACB0A60, 8 bytes long. Data: < {ìM > 00 00 7B EC 4D 02 00 00 {10030} normal block at 0x0000024DEAC9B890, 85 bytes long. Data: <<project_prefere> 3C 70 72 6F 6A 65 63 74 5F 70 72 65 66 65 72 65 {9426} normal block at 0x0000024DEACB0600, 8 bytes long. Data: < ÇÊêM > 80 C7 CA EA 4D 02 00 00 ..\zip\boinc_zip.cpp(122) : {545} normal block at 0x0000024DEACB12E0, 260 bytes long. Data: < > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 {532} normal block at 0x0000024DEACA99C0, 32 bytes long. Data: <ÐáÊêM ÀåÊêM > D0 E1 CA EA 4D 02 00 00 C0 E5 CA EA 4D 02 00 00 {531} normal block at 0x0000024DEACAE4E0, 52 bytes long. Data: < r ÍÍ > 01 00 00 00 72 00 CD CD 00 00 00 00 00 00 00 00 {526} normal block at 0x0000024DEACAE080, 43 bytes long. Data: < p ÍÍ > 01 00 00 00 70 00 CD CD 00 00 00 00 00 00 00 00 {521} normal block at 0x0000024DEACAE5C0, 44 bytes long. Data: < ÍÍáåÊêM > 01 00 00 00 00 00 CD CD E1 E5 CA EA 4D 02 00 00 {516} normal block at 0x0000024DEACAE1D0, 44 bytes long. Data: < ÍÍñáÊêM > 01 00 00 00 00 00 CD CD F1 E1 CA EA 4D 02 00 00 {506} normal block at 0x0000024DEACB39A0, 16 bytes long. Data: < ãÊêM > 20 E3 CA EA 4D 02 00 00 00 00 00 00 00 00 00 00 {505} normal block at 0x0000024DEACAE320, 40 bytes long. Data: < 9ËêM input.zi> A0 39 CB EA 4D 02 00 00 69 6E 70 75 74 2E 7A 69 {498} normal block at 0x0000024DEACB3950, 16 bytes long. Data: <è)ËêM > E8 29 CB EA 4D 02 00 00 00 00 00 00 00 00 00 00 {497} normal block at 0x0000024DEACB3450, 16 bytes long. Data: <À)ËêM > C0 29 CB EA 4D 02 00 00 00 00 00 00 00 00 00 00 {496} normal block at 0x0000024DEACB3770, 16 bytes long. Data: < )ËêM > 98 29 CB EA 4D 02 00 00 00 00 00 00 00 00 00 00 {495} normal block at 0x0000024DEACB37C0, 16 bytes long. Data: <p)ËêM > 70 29 CB EA 4D 02 00 00 00 00 00 00 00 00 00 00 {494} normal block at 0x0000024DEACB3900, 16 bytes long. Data: <H)ËêM > 48 29 CB EA 4D 02 00 00 00 00 00 00 00 00 00 00 {493} normal block at 0x0000024DEACB3A40, 16 bytes long. Data: < )ËêM > 20 29 CB EA 4D 02 00 00 00 00 00 00 00 00 00 00 {491} normal block at 0x0000024DEACB35E0, 16 bytes long. Data: <8úÊêM > 38 FA CA EA 4D 02 00 00 00 00 00 00 00 00 00 00 {490} normal block at 0x0000024DEACAA6E0, 32 bytes long. Data: <username=Compsci> 75 73 65 72 6E 61 6D 65 3D 43 6F 6D 70 73 63 69 {489} normal block at 0x0000024DEACB2E60, 16 bytes long. Data: < úÊêM > 10 FA CA EA 4D 02 00 00 00 00 00 00 00 00 00 00 {488} normal block at 0x0000024DEAC9A460, 64 bytes long. Data: <PYTHONPATH=.\lib> 50 59 54 48 4F 4E 50 41 54 48 3D 2E 5C 6C 69 62 {487} normal block at 0x0000024DEACB31D0, 16 bytes long. Data: <èùÊêM > E8 F9 CA EA 4D 02 00 00 00 00 00 00 00 00 00 00 {486} normal block at 0x0000024DEACAA3E0, 32 bytes long. Data: <PATH=.\Library\b> 50 41 54 48 3D 2E 5C 4C 69 62 72 61 72 79 5C 62 {485} normal block at 0x0000024DEACB3180, 16 bytes long. Data: <ÀùÊêM > C0 F9 CA EA 4D 02 00 00 00 00 00 00 00 00 00 00 {484} normal block at 0x0000024DEACB3A90, 16 bytes long. Data: < ùÊêM > 98 F9 CA EA 4D 02 00 00 00 00 00 00 00 00 00 00 {483} normal block at 0x0000024DEACB2DC0, 16 bytes long. Data: <pùÊêM > 70 F9 CA EA 4D 02 00 00 00 00 00 00 00 00 00 00 {482} normal block at 0x0000024DEACB3720, 16 bytes long. Data: <HùÊêM > 48 F9 CA EA 4D 02 00 00 00 00 00 00 00 00 00 00 {481} normal block at 0x0000024DEACB3040, 16 bytes long. Data: < ùÊêM > 20 F9 CA EA 4D 02 00 00 00 00 00 00 00 00 00 00 {480} normal block at 0x0000024DEACB36D0, 16 bytes long. Data: <øøÊêM > F8 F8 CA EA 4D 02 00 00 00 00 00 00 00 00 00 00 {479} normal block at 0x0000024DEACA9DE0, 32 bytes long. Data: <SystemRoot=C:\Wi> 53 79 73 74 65 6D 52 6F 6F 74 3D 43 3A 5C 57 69 {478} normal block at 0x0000024DEACB3C70, 16 bytes long. Data: <ÐøÊêM > D0 F8 CA EA 4D 02 00 00 00 00 00 00 00 00 00 00 {477} normal block at 0x0000024DEACA9F00, 32 bytes long. Data: <GPU_DEVICE_NUM=0> 47 50 55 5F 44 45 56 49 43 45 5F 4E 55 4D 3D 30 {476} normal block at 0x0000024DEACB39F0, 16 bytes long. Data: <¨øÊêM > A8 F8 CA EA 4D 02 00 00 00 00 00 00 00 00 00 00 {475} normal block at 0x0000024DEACAA2C0, 32 bytes long. Data: <NTHREADS=1 THREA> 4E 54 48 52 45 41 44 53 3D 31 00 54 48 52 45 41 {474} normal block at 0x0000024DEACB3B80, 16 bytes long. Data: < øÊêM > 80 F8 CA EA 4D 02 00 00 00 00 00 00 00 00 00 00 {473} normal block at 0x0000024DEACAF880, 480 bytes long. Data: < ;ËêM À¢ÊêM > 80 3B CB EA 4D 02 00 00 C0 A2 CA EA 4D 02 00 00 {472} normal block at 0x0000024DEACB3AE0, 16 bytes long. Data: < )ËêM > 00 29 CB EA 4D 02 00 00 00 00 00 00 00 00 00 00 {471} normal block at 0x0000024DEACB3310, 16 bytes long. Data: <Ø(ËêM > D8 28 CB EA 4D 02 00 00 00 00 00 00 00 00 00 00 {470} normal block at 0x0000024DEACB3590, 16 bytes long. Data: <°(ËêM > B0 28 CB EA 4D 02 00 00 00 00 00 00 00 00 00 00 {469} normal block at 0x0000024DEACAE160, 48 bytes long. Data: </C "del pythongp> 2F 43 20 22 64 65 6C 20 70 79 74 68 6F 6E 67 70 {468} normal block at 0x0000024DEACB3630, 16 bytes long. Data: <ø'ËêM > F8 27 CB EA 4D 02 00 00 00 00 00 00 00 00 00 00 {467} normal block at 0x0000024DEACB2FF0, 16 bytes long. Data: <Ð'ËêM > D0 27 CB EA 4D 02 00 00 00 00 00 00 00 00 00 00 {466} normal block at 0x0000024DEACB3B30, 16 bytes long. Data: <¨'ËêM > A8 27 CB EA 4D 02 00 00 00 00 00 00 00 00 00 00 {465} normal block at 0x0000024DEACB3400, 16 bytes long. Data: < 'ËêM > 80 27 CB EA 4D 02 00 00 00 00 00 00 00 00 00 00 {464} normal block at 0x0000024DEACB34F0, 16 bytes long. Data: <X'ËêM > 58 27 CB EA 4D 02 00 00 00 00 00 00 00 00 00 00 {463} normal block at 0x0000024DEACB38B0, 16 bytes long. Data: <0'ËêM > 30 27 CB EA 4D 02 00 00 00 00 00 00 00 00 00 00 {462} normal block at 0x0000024DEACB3220, 16 bytes long. Data: < 'ËêM > 10 27 CB EA 4D 02 00 00 00 00 00 00 00 00 00 00 {461} normal block at 0x0000024DEACB32C0, 16 bytes long. Data: <è&ËêM > E8 26 CB EA 4D 02 00 00 00 00 00 00 00 00 00 00 {460} normal block at 0x0000024DEACAA500, 32 bytes long. Data: <C:\Windows\syste> 43 3A 5C 57 69 6E 64 6F 77 73 5C 73 79 73 74 65 {459} normal block at 0x0000024DEACB3130, 16 bytes long. Data: <À&ËêM > C0 26 CB EA 4D 02 00 00 00 00 00 00 00 00 00 00 {458} normal block at 0x0000024DEACAE0F0, 48 bytes long. Data: <x pythongpu_wind> 78 20 70 79 74 68 6F 6E 67 70 75 5F 77 69 6E 64 {457} normal block at 0x0000024DEACB3270, 16 bytes long. Data: < &ËêM > 08 26 CB EA 4D 02 00 00 00 00 00 00 00 00 00 00 {456} normal block at 0x0000024DEACB3BD0, 16 bytes long. Data: <à%ËêM > E0 25 CB EA 4D 02 00 00 00 00 00 00 00 00 00 00 {455} normal block at 0x0000024DEACB3860, 16 bytes long. Data: <¸%ËêM > B8 25 CB EA 4D 02 00 00 00 00 00 00 00 00 00 00 {454} normal block at 0x0000024DEACB3540, 16 bytes long. Data: < %ËêM > 90 25 CB EA 4D 02 00 00 00 00 00 00 00 00 00 00 {453} normal block at 0x0000024DEACB2D20, 16 bytes long. Data: <h%ËêM > 68 25 CB EA 4D 02 00 00 00 00 00 00 00 00 00 00 {452} normal block at 0x0000024DEACB2F50, 16 bytes long. Data: <@%ËêM > 40 25 CB EA 4D 02 00 00 00 00 00 00 00 00 00 00 {451} normal block at 0x0000024DEACB2FA0, 16 bytes long. Data: < %ËêM > 20 25 CB EA 4D 02 00 00 00 00 00 00 00 00 00 00 {450} normal block at 0x0000024DEACB3680, 16 bytes long. Data: <ø$ËêM > F8 24 CB EA 4D 02 00 00 00 00 00 00 00 00 00 00 {449} normal block at 0x0000024DEACB3810, 16 bytes long. Data: <Ð$ËêM > D0 24 CB EA 4D 02 00 00 00 00 00 00 00 00 00 00 {448} normal block at 0x0000024DEACAE780, 48 bytes long. Data: </C "del pythongp> 2F 43 20 22 64 65 6C 20 70 79 74 68 6F 6E 67 70 {447} normal block at 0x0000024DEACB2E10, 16 bytes long. Data: < $ËêM > 18 24 CB EA 4D 02 00 00 00 00 00 00 00 00 00 00 {446} normal block at 0x0000024DEACB2F00, 16 bytes long. Data: <ð#ËêM > F0 23 CB EA 4D 02 00 00 00 00 00 00 00 00 00 00 {445} normal block at 0x0000024DEACB2D70, 16 bytes long. Data: <È#ËêM > C8 23 CB EA 4D 02 00 00 00 00 00 00 00 00 00 00 {444} normal block at 0x0000024DEACB33B0, 16 bytes long. Data: < #ËêM > A0 23 CB EA 4D 02 00 00 00 00 00 00 00 00 00 00 {443} normal block at 0x0000024DEACB3360, 16 bytes long. Data: <x#ËêM > 78 23 CB EA 4D 02 00 00 00 00 00 00 00 00 00 00 {442} normal block at 0x0000024DEACB34A0, 16 bytes long. Data: <P#ËêM > 50 23 CB EA 4D 02 00 00 00 00 00 00 00 00 00 00 {441} normal block at 0x0000024DEACB04C0, 16 bytes long. Data: <0#ËêM > 30 23 CB EA 4D 02 00 00 00 00 00 00 00 00 00 00 {440} normal block at 0x0000024DEACB08D0, 16 bytes long. Data: < #ËêM > 08 23 CB EA 4D 02 00 00 00 00 00 00 00 00 00 00 {439} normal block at 0x0000024DEACAA380, 32 bytes long. Data: <C:\Windows\syste> 43 3A 5C 57 69 6E 64 6F 77 73 5C 73 79 73 74 65 {438} normal block at 0x0000024DEACB02E0, 16 bytes long. Data: <à"ËêM > E0 22 CB EA 4D 02 00 00 00 00 00 00 00 00 00 00 {437} normal block at 0x0000024DEACAE710, 48 bytes long. Data: <x pythongpu_wind> 78 20 70 79 74 68 6F 6E 67 70 75 5F 77 69 6E 64 {436} normal block at 0x0000024DEACB0010, 16 bytes long. Data: <("ËêM > 28 22 CB EA 4D 02 00 00 00 00 00 00 00 00 00 00 {435} normal block at 0x0000024DEACAFF20, 16 bytes long. Data: < "ËêM > 00 22 CB EA 4D 02 00 00 00 00 00 00 00 00 00 00 {434} normal block at 0x0000024DEACB0880, 16 bytes long. Data: <Ø!ËêM > D8 21 CB EA 4D 02 00 00 00 00 00 00 00 00 00 00 {433} normal block at 0x0000024DEACB01A0, 16 bytes long. Data: <°!ËêM > B0 21 CB EA 4D 02 00 00 00 00 00 00 00 00 00 00 {432} normal block at 0x0000024DEACB0970, 16 bytes long. Data: < !ËêM > 88 21 CB EA 4D 02 00 00 00 00 00 00 00 00 00 00 {431} normal block at 0x0000024DEACB0150, 16 bytes long. Data: <`!ËêM > 60 21 CB EA 4D 02 00 00 00 00 00 00 00 00 00 00 {430} normal block at 0x0000024DEACB0E70, 16 bytes long. Data: <@!ËêM > 40 21 CB EA 4D 02 00 00 00 00 00 00 00 00 00 00 {429} normal block at 0x0000024DEACB06A0, 16 bytes long. Data: < !ËêM > 18 21 CB EA 4D 02 00 00 00 00 00 00 00 00 00 00 {428} normal block at 0x0000024DEACB0E20, 16 bytes long. Data: <ð ËêM > F0 20 CB EA 4D 02 00 00 00 00 00 00 00 00 00 00 {427} normal block at 0x0000024DEACB20F0, 2976 bytes long. Data: < ËêM .\7za.ex> 20 0E CB EA 4D 02 00 00 2E 5C 37 7A 61 2E 65 78 {66} normal block at 0x0000024DEACA3AB0, 16 bytes long. Data: < ê»¤ö > 80 EA BB A4 F6 7F 00 00 00 00 00 00 00 00 00 00 {65} normal block at 0x0000024DEACA42D0, 16 bytes long. Data: <@é»¤ö > 40 E9 BB A4 F6 7F 00 00 00 00 00 00 00 00 00 00 {64} normal block at 0x0000024DEACA3B50, 16 bytes long. Data: <øW¸¤ö > F8 57 B8 A4 F6 7F 00 00 00 00 00 00 00 00 00 00 {63} normal block at 0x0000024DEACA4460, 16 bytes long. Data: <ØW¸¤ö > D8 57 B8 A4 F6 7F 00 00 00 00 00 00 00 00 00 00 {62} normal block at 0x0000024DEACA46E0, 16 bytes long. Data: <P ¸¤ö > 50 04 B8 A4 F6 7F 00 00 00 00 00 00 00 00 00 00 {61} normal block at 0x0000024DEACA4280, 16 bytes long. Data: <0 ¸¤ö > 30 04 B8 A4 F6 7F 00 00 00 00 00 00 00 00 00 00 {60} normal block at 0x0000024DEACA3A60, 16 bytes long. Data: <à ¸¤ö > E0 02 B8 A4 F6 7F 00 00 00 00 00 00 00 00 00 00 {59} normal block at 0x0000024DEACA4140, 16 bytes long. Data: < ¸¤ö > 10 04 B8 A4 F6 7F 00 00 00 00 00 00 00 00 00 00 {58} normal block at 0x0000024DEACA3CE0, 16 bytes long. Data: <p ¸¤ö > 70 04 B8 A4 F6 7F 00 00 00 00 00 00 00 00 00 00 {57} normal block at 0x0000024DEACA4690, 16 bytes long. Data: < À¶¤ö > 18 C0 B6 A4 F6 7F 00 00 00 00 00 00 00 00 00 00 Object dump complete. </stderr_txt> ]]> ID: 59484 · Rating: 0 · rate: / Reply Quote

Richard Haselgrove Send message Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 0 Level Scientific publications	Message 59485 - Posted: 19 Oct 2022, 8:45:38 UTC An example of that: workunit 27329338 has failed for everyone, mine after about 10%. ID: 59485 · Rating: 0 · rate: / Reply Quote

abouh Send message Joined: 31 May 21 Posts: 200 Credit: 0 RAC: 0 Level Scientific publications	Message 59486 - Posted: 19 Oct 2022, 9:08:02 UTC - in response to Message 59481. I am sorry, old batch jobs are still being mixed with new ones that do run successfully (I have been monitoring them). BOINC will eventually run out of bad jobs, the problems is that it attempts to run them 8 times... ID: 59486 · Rating: 0 · rate: / Reply Quote

Richard Haselgrove Send message Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 0 Level Scientific publications	Message 59487 - Posted: 19 Oct 2022, 9:19:00 UTC - in response to Message 59486. the problems is that it attempts to run them 8 times... Look at that last workunit link. Above the list, it says: max # of error/total/success tasks 7, 10, 6 That's configurable by the project, I think at the application level. You might be able to reduce it a bit? ID: 59487 · Rating: 0 · rate: / Reply Quote

abouh Send message Joined: 31 May 21 Posts: 200 Credit: 0 RAC: 0 Level Scientific publications	Message 59488 - Posted: 20 Oct 2022, 5:51:53 UTC - in response to Message 59487. Yesterday I was unable to find the specific parameter that defines the number of job attempts. I will ask the main admin. Maybe it is set for all applications. ID: 59488 · Rating: 0 · rate: / Reply Quote

Keith Myers Send message Joined: 13 Dec 17 Posts: 1424 Credit: 9,189,946,190 RAC: 42,316 Level Scientific publications	Message 59489 - Posted: 20 Oct 2022, 7:43:13 UTC - in response to Message 59488. From looking at the server code in create_work.cpp module, the parameter is pulled from the work unit template file. You need to change the input (infile1, infile2 ...) file that feeds into the wu template file. Or directly change the wu template file. Refer to these documents. https://boinc.berkeley.edu/trac/wiki/JobSubmission https://boinc.berkeley.edu/trac/wiki/JobTemplates#Inputtemplates ID: 59489 · Rating: 0 · rate: / Reply Quote

Richard Haselgrove Send message Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 0 Level Scientific publications	Message 59490 - Posted: 20 Oct 2022, 7:43:50 UTC - in response to Message 59488. Found some documentation: in https://boinc.berkeley.edu/trac/wiki/JobSubmission The following job parameters may be passed in the input template, or as command-line arguments to create_work; the input template has precedence. If not specified, the given defaults will be used. --target_nresults x default 2 --max_error_results x default 3 --max_total_results x default 10 --max_success_results x default 6 I can't find any similar detail for Local web-based job submission or Remote job submission, but it must be buried somewhere in there. You're not using the stated default values, so somebody at GPUGrid must have found it at least once! ID: 59490 · Rating: 0 · rate: / Reply Quote

Erich56 Send message Joined: 1 Jan 15 Posts: 1168 Credit: 12,317,898,501 RAC: 91,654 Level Scientific publications	Message 59491 - Posted: 20 Oct 2022, 9:23:24 UTC - in response to Message 59477. Abouh wrote: Hello, thanks you for reporting the job errors. Sorry to all, there was an error on my side setting up a batch of experiment agents. ... They will fail briefly after starting as reported, so not a lot of compute will be wasted. well, whatever "they will fail briefly after starting" means :-) Mine are failing after 3.780 - 8.597 seconds :-( Is there no way to call them back or delete them from the server? ID: 59491 · Rating: 0 · rate: / Reply Quote

abouh Send message Joined: 31 May 21 Posts: 200 Credit: 0 RAC: 0 Level Scientific publications	Message 59492 - Posted: 20 Oct 2022, 13:46:39 UTC - in response to Message 59487. Last modified: 20 Oct 2022, 13:51:25 UTC I see these values can be set in the app workunit template as mentioned --max_error_results x default 3 --max_total_results x default 10 --max_success_results x default 6 I have checked, and for PythonGPU and PythonGPU apps the parameters are not specified, so the default values should apply (also coherent with the info previously posted). However, the number of times the server attempts to solve a task by sending it to a GPUGrid machine before giving up is 8. So it does not seem like it is specified by these parameters to me (shouldn't it be 3 according to the default value?). I have asked for help to the admin server, maybe the parameters are overwritten somewhere else. Even if not for this time, it will be convenient to know to solve future issues like this one. Sorry again for the problems. ID: 59492 · Rating: 0 · rate: / Reply Quote

KAMasud Send message Joined: 27 Jul 11 Posts: 138 Credit: 539,953,398 RAC: 0 Level Scientific publications	Message 59493 - Posted: 20 Oct 2022, 16:59:01 UTC - in response to Message 59491. Abouh wrote: Hello, thanks you for reporting the job errors. Sorry to all, there was an error on my side setting up a batch of experiment agents. ... They will fail briefly after starting as reported, so not a lot of compute will be wasted. well, whatever "they will fail briefly after starting" means :-) Mine are failing after 3.780 - 8.597 seconds :-( Is there no way to call them back or delete them from the server? Not anymore. Anyway, after 9.45 UTC something seems to have changed. I have two Wu's (fingers crossed and touch wood) that have reached 35% in six hours. ID: 59493 · Rating: 0 · rate: / Reply Quote

Erich56 Send message Joined: 1 Jan 15 Posts: 1168 Credit: 12,317,898,501 RAC: 91,654 Level Scientific publications	Message 59496 - Posted: 20 Oct 2022, 19:04:05 UTC can someone give me advice with regard to the following dilemma: Until last week, on my host with 2 RTX3070 inside I could process 2 Pythons concurrently on each GPU, i.e. 4 Pythons at a time. On device_0 VRAM became rather tight - it comes with 8.192MB, about 300-400MB were used for the monitor, and with the two Pythons the total VRAM usage was at around 8.112MB (as said: tight, but it worked fine). On device_1 it was not that tight, since no VRAM usage for the monitor. Since yesterday I notice that device_0 uses about 1.400MB for the monitor - no idea why. So no way to process 2 Pythons concurrently. And no way for device_2 to run 2 Pythons either, because any additional Python beyond the one running on device_0 and the one running on device_1 would automatically start on device_0. Hence, my question to the experts here: is there a way to tell the third Python to run on device_1, instead of device_0 ? Or, any idea how I could lower the VRAM usage for the monitor on device_0? As said, it was much less before, all of a sudden it jumped up (I was naiv enough to connect the monitor cable to device_1 - which did, of course, not work). Or any other ideas? ID: 59496 · Rating: 0 · rate: / Reply Quote

GS Send message Joined: 16 Oct 22 Posts: 12 Credit: 1,382,500 RAC: 0 Level Scientific publications	Message 59497 - Posted: 20 Oct 2022, 19:44:56 UTC Finally, WU #38 worked and was completed within two hours. Thanks, earned my first points here. ID: 59497 · Rating: 0 · rate: / Reply Quote

Ian&Steve C. Send message Joined: 21 Feb 20 Posts: 1116 Credit: 40,876,970,595 RAC: 9,834 Level Scientific publications	Message 59498 - Posted: 20 Oct 2022, 21:39:09 UTC - in response to Message 59496. reboot the system and free up the VRAM maybe. ID: 59498 · Rating: 0 · rate: / Reply Quote

Keith Myers Send message Joined: 13 Dec 17 Posts: 1424 Credit: 9,189,946,190 RAC: 42,316 Level Scientific publications	Message 59499 - Posted: 20 Oct 2022, 23:54:05 UTC Browser tabs are notorious RAM eaters. Both in the cpu and gpu if you have hardware acceleration enabled in the browser. You can use gpu_exclude statement in the cc_config.xml file to keep a gpu task off specific gpus. I do that for keeping the tasks off my fastest gpus which run other projects. But that is permanent for the BOINC session that is booted. You would have to edit cc_config files for different sessions and boot what you need as necessary to get around this issue. Doable but cumbersome. ID: 59499 · Rating: 0 · rate: / Reply Quote

Ian&Steve C. Send message Joined: 21 Feb 20 Posts: 1116 Credit: 40,876,970,595 RAC: 9,834 Level Scientific publications	Message 59500 - Posted: 21 Oct 2022, 1:28:32 UTC - in response to Message 59499. Browser tabs are notorious RAM eaters. Both in the cpu and gpu if you have hardware acceleration enabled in the browser. good call. forgot the browser can use some GPU resources. that's a good thing to check. ID: 59500 · Rating: 0 · rate: / Reply Quote

Erich56 Send message Joined: 1 Jan 15 Posts: 1168 Credit: 12,317,898,501 RAC: 91,654 Level Scientific publications	Message 59501 - Posted: 21 Oct 2022, 6:09:14 UTC many thanks, folks, for your replies regarding my VRAM problem. I have rebooted, and the VRAM usage of device_0 was almost 2GB. No browser open, no other apps either (except GPU-Z, the MSI Afterburner, MemInfo, DUMeter, and the Windows Task Manager - these apps had been present before, too). Now, with processing 1 Python on each GPU, the VRAM situation is as follows: device_0: 6.034MB device_1: 3.932MB hence, a second Python could be run on device_1. I know about the "gpu_exclude" thing in the cc_config.xml, but for sure this is a very cumbersome method; and I am not even sure whether in Windows a running Python survives a BOINC reboot (I think a did that once before, for a different reason, and the Python was gone). The only thing I could try again is to open the second instance of BOINC which I had configured some time ago, with the "gpu_exclude" provision for device_0. However, when I tried this out, everything crashed after a short while (1 or 2 hours). I did not find out why. Perhaps it was simply a coincidence and would not happen again? It's really a pitty that with all these various configuration possibilities via cc_config (and also app_config.xml) there is no way to have a configuration available which would solve my problem :-( ID: 59501 · Rating: 0 · rate: / Reply Quote

Keith Myers Send message Joined: 13 Dec 17 Posts: 1424 Credit: 9,189,946,190 RAC: 42,316 Level Scientific publications	Message 59502 - Posted: 21 Oct 2022, 6:39:35 UTC I think you may have to accept the tasks are what they are. Variable because of the different parameter sets. Some may use little RAM and some may use a lot. So you may not always be able to run doubles on your 8GB cards. ID: 59502 · Rating: 0 · rate: / Reply Quote

Erich56 Send message Joined: 1 Jan 15 Posts: 1168 Credit: 12,317,898,501 RAC: 91,654 Level Scientific publications	Message 59503 - Posted: 21 Oct 2022, 8:32:19 UTC - in response to Message 59502. Last modified: 21 Oct 2022, 8:39:11 UTC I think you may have to accept the tasks are what they are. Variable because of the different parameter sets. Some may use little RAM and some may use a lot. So you may not always be able to run doubles on your 8GB cards. yes, meanwhile I noticed on the other two hosts which are running Pythons ATM: the amount of VRAM used varies. No problem of course on the host with the Quadro P5000 which comes with 16GB. Out of which only some 7.5GB are being used even with 4 tasks in parallel, due to the lower number of CUDA cores of this GPU. ID: 59503 · Rating: 0 · rate: / Reply Quote

Experimental Python tasks (beta) - task description