Experimental Python tasks (beta) - task description
| Author | Message |
|---|---|
|
Joined: 13 Dec 17 Posts: 1419 Credit: 9,119,446,190 RAC: 731
|
I had a couple of the ATM tasks finish successfully a week ago, but they have long since been cleared from the database, so nobody can look at them now. |
|
Joined: 6 Jan 15 Posts: 76 Credit: 25,499,534,331 RAC: 0
|
Here is a completed one, Pop Piasa: https://www.gpugrid.net/result.php?resultid=33327466 |
|
Joined: 8 Aug 19 Posts: 252 Credit: 458,054,251 RAC: 0
|
Thanks Greger, it's good to have a successful example to compare with when examining errors. I appreciate it. |
|
Joined: 27 Jul 11 Posts: 138 Credit: 539,953,398 RAC: 0
|
Windows here. You know, sometimes these WUs go to sleep, then I click the mouse and they start running again. Not all WUs. Task 33333635 |
|
Joined: 18 Jul 13 Posts: 79 Credit: 210,528,292 RAC: 0
|
Maybe you can change the system power settings? Disable hard drive spin-down, for example? |
|
Joined: 22 Apr 19 Posts: 3 Credit: 24,096,692 RAC: 4
|
My recent results uploaded to GPUGRID often get "Error while computing" and lose all credit. I don't know why. What should I do?
Task 33359888; work unit 27429785; computer 604308; sent 17 Mar 2023 | 13:14:49 UTC; reported 19 Mar 2023 | 14:23:21 UTC; status: Error while computing; run time 50,964.34 s; CPU time 50,964.34 s; credit ---; application: Python apps for GPU hosts v4.04 (cuda1131)
19/3/2023 17:37:41 | | Starting BOINC client version 7.20.2 for windows_x86_64
19/3/2023 17:37:41 | | log flags: file_xfer, sched_ops, task
19/3/2023 17:37:41 | | Libraries: libcurl/7.84.0-DEV Schannel zlib/1.2.12
19/3/2023 17:37:41 | | Data directory: C:\ProgramData\BOINC
19/3/2023 17:37:41 | |
19/3/2023 17:37:41 | | CUDA: NVIDIA GPU 0: NVIDIA GeForce RTX 3060 (driver version 531.18, CUDA version 12.1, compute capability 8.6, 12288MB, 12288MB available, 12738 GFLOPS peak)
19/3/2023 17:37:41 | | OpenCL: NVIDIA GPU 0: NVIDIA GeForce RTX 3060 (driver version 531.18, device version OpenCL 3.0 CUDA, 12288MB, 12288MB available, 12738 GFLOPS peak)
19/3/2023 17:37:41 | | Windows processor group 0: 20 processors
19/3/2023 17:37:41 | | Host name: NGcomputer
19/3/2023 17:37:41 | | Processor: 20 GenuineIntel 12th Gen Intel(R) Core(TM) i7-12700F [Family 6 Model 151 Stepping 2]
19/3/2023 17:37:41 | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush acpi mmx fxsr sse sse2 ss htt tm pni ssse3 fma cx16 sse4_1 sse4_2 movebe popcnt aes f16c rdrandsyscall nx lm avx avx2 tm2 pbe fsgsbase bmi1 smep bmi2
19/3/2023 17:37:41 | | OS: Microsoft Windows Vista: Home Premium x64 Edition, Service Pack 2, (06.00.6002.00)
19/3/2023 17:37:41 | | Memory: 15.76 GB physical, 63.76 GB virtual
19/3/2023 17:37:41 | | Disk: 952.93 GB total, 700.19 GB free
19/3/2023 17:37:41 | | Local time is UTC +8 hours
19/3/2023 17:37:41 | | No WSL found.
19/3/2023 17:37:41 | | VirtualBox version: 7.0.6 |
|
Joined: 13 Dec 17 Posts: 1419 Credit: 9,119,446,190 RAC: 731
|
You have to look at the errored task results on the website to find out why they failed. Two of the tasks errored out because you don't have enough virtual memory available for the expansion phase, where the task sets up its libraries. On Windows, it is advised to set your system page file to at least 50 GB. |
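A minimal sketch of how one might scan a task's stderr text for the out-of-memory signature instead of reading the whole dump by eye. The pattern is the PyTorch CPU allocator message quoted elsewhere in this thread; the function name `find_allocation_failures` is hypothetical and not part of BOINC or GPUGRID.

```python
import re

# Signature of the PyTorch CPU allocator failure as it appears in the
# stderr output quoted in this thread.
ALLOC_FAILURE = re.compile(
    r"DefaultCPUAllocator: not enough memory: you tried to allocate (\d+) bytes"
)


def find_allocation_failures(stderr_text: str) -> list[int]:
    """Return the requested byte count of every failed CPU allocation found."""
    return [int(match.group(1)) for match in ALLOC_FAILURE.finditer(stderr_text)]


# Error line copied from the task output posted in this thread.
sample = (
    r"RuntimeError: [enforce fail at C:\cb\pytorch_1000000000000\work\c10\core\impl\alloc_cpu.cpp:81] data. "
    "DefaultCPUAllocator: not enough memory: you tried to allocate 3612672 bytes."
)

failures = find_allocation_failures(sample)
if failures:
    print(f"{len(failures)} allocation failure(s); largest request: {max(failures)} bytes")
else:
    print("No CPU allocator failures found.")
```

A small failed request, such as the few megabytes here, usually means the machine as a whole had run out of memory by then, not that this one allocation was unusually large.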
|
Joined: 22 Apr 19 Posts: 3 Credit: 24,096,692 RAC: 4
|
"You have to look at the errored task results on the website to find why you errored." Thank you very much. |
|
Joined: 27 Jul 11 Posts: 138 Credit: 539,953,398 RAC: 0
|
The server status shows WUs are available, but my machines have received no tasks since yesterday. |
|
Joined: 31 May 21 Posts: 200 Credit: 0 RAC: 0
|
Hello! The previous population experiment ended and I needed to analyse the results, but I am starting a new experiment today. |
|
Joined: 22 Apr 19 Posts: 3 Credit: 24,096,692 RAC: 4
|
I don't understand why my task failed. Why?
Name e00002a04604-ABOU_rnd_ppod_expand_demos29_2_exp5-0-1-RND9901_1
Workunit 27434170
Created 20 Mar 2023 | 22:42:54 UTC
Sent 21 Mar 2023 | 0:31:58 UTC
Received 21 Mar 2023 | 11:05:25 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 195 (0xc3) EXIT_CHILD_FAILED
Computer ID 604308
Report deadline 26 Mar 2023 | 0:31:58 UTC
Run time 13,385.72
CPU time 13,385.72
Validate state Invalid
Credit 0.00
Application version Python apps for GPU hosts v4.04 (cuda1131)
Stderr output
<core_client_version>7.20.2</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code 195 (0xc3)</message>
<stderr_txt>
08:32:00 (19880): wrapper (7.9.26016): starting
08:32:00 (19880): wrapper: running .\7za.exe (x pythongpu_windows_x86_64__cuda1131.txz -y)
7-Zip (a) 22.01 (x86) : Copyright (c) 1999-2022 Igor Pavlov : 2022-07-15
Scanning the drive for archives:
1 file, 1976180228 bytes (1885 MiB)
Extracting archive: pythongpu_windows_x86_64__cuda1131.txz
--
Path = pythongpu_windows_x86_64__cuda1131.txz
Type = xz
Physical Size = 1976180228
Method = LZMA2:22 CRC64
Streams = 1523
Blocks = 1523
Cluster Size = 4210688
Everything is Ok
Size: 6410311680
Compressed: 1976180228
08:34:12 (19880): .\7za.exe exited; CPU time 107.906250
08:34:12 (19880): wrapper: running C:\Windows\system32\cmd.exe (/C "del pythongpu_windows_x86_64__cuda1131.txz")
08:34:13 (19880): C:\Windows\system32\cmd.exe exited; CPU time 0.000000
08:34:13 (19880): wrapper: running .\7za.exe (x pythongpu_windows_x86_64__cuda1131.tar -y)
7-Zip (a) 22.01 (x86) : Copyright (c) 1999-2022 Igor Pavlov : 2022-07-15
Scanning the drive for archives:
1 file, 6410311680 bytes (6114 MiB)
Extracting archive: pythongpu_windows_x86_64__cuda1131.tar
--
Path = pythongpu_windows_x86_64__cuda1131.tar
Type = tar
Physical Size = 6410311680
Headers Size = 19965952
Code Page = UTF-8
Characteristics = GNU LongName ASCII
Everything is Ok
Files: 38141
Size: 6380353601
Compressed: 6410311680
08:35:04 (19880): .\7za.exe exited; CPU time 9.515625
08:35:04 (19880): wrapper: running C:\Windows\system32\cmd.exe (/C "del pythongpu_windows_x86_64__cuda1131.tar")
08:35:05 (19880): C:\Windows\system32\cmd.exe exited; CPU time 0.000000
08:35:05 (19880): wrapper: running python.exe (run.py)
Windows fix executed.
Detected GPUs: 1
Define environment factory
Define algorithm factory
Define storage factory
Define scheme
Created CWorker with worker_index 0
Created GWorker with worker_index 0
Created UWorker with worker_index 0
Created training scheme.
Define learner
Created Learner.
Detected memory leaks!
Dumping objects -> ..\api\boinc_api.cpp(309) : {16368} normal block at 0x0000025533011E30, 8 bytes long. Data: < 4U > 00 00 D1 34 55 02 00 00 ..\lib\diagnostics_win.cpp(417) : {15114} normal block at 0x000002553306C260, 1080 bytes long. Data: < > D8 1D 00 00 CD CD CD CD 8C 01 00 00 00 00 00 00 ..\zip\boinc_zip.cpp(122) : {550} normal block at 0x0000025532FFBE70, 260 bytes long. Data: < > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 {536} normal block at 0x0000025532FFEC80, 52 bytes long. Data: < r > 01 00 00 00 72 00 CD CD 00 00 00 00 00 00 00 00 {531} normal block at 0x0000025533009E00, 43 bytes long. Data: < p > 01 00 00 00 70 00 CD CD 00 00 00 00 00 00 00 00 {526} normal block at 0x000002553300A940, 44 bytes long. Data: < a 3U > 01 00 00 00 00 00 CD CD 61 A9 00 33 55 02 00 00 {521} normal block at 0x000002553300A080, 44 bytes long. Data: < 3U > 01 00 00 00 00 00 CD CD A1 A0 00 33 55 02 00 00 Object dump complete.
16:26:14 (3936): wrapper (7.9.26016): starting
16:26:14 (3936): wrapper: running python.exe (run.py)
Windows fix executed.
Detected GPUs: 1
Define environment factory
Define algorithm factory
Define storage factory
Define scheme
Created CWorker with worker_index 0
Created GWorker with worker_index 0
Created UWorker with worker_index 0
Created training scheme.
Define learner
Created Learner.
Look for a progress_last_chk file - if exists, adjust target_env_steps
Define train loop
Traceback (most recent call last):
  File "C:\ProgramData\BOINC\slots\16\lib\site-packages\pytorchrl\scheme\gradients\g_worker.py", line 196, in get_data
    self.next_batch = self.batches.__next__()
StopIteration
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "run.py", line 471, in <module>
    main()
  File "run.py", line 136, in main
    learner.step()
  File "C:\ProgramData\BOINC\slots\16\lib\site-packages\pytorchrl\learner.py", line 46, in step
    info = self.update_worker.step()
  File "C:\ProgramData\BOINC\slots\16\lib\site-packages\pytorchrl\scheme\updates\u_worker.py", line 118, in step
    self.updater.step()
  File "C:\ProgramData\BOINC\slots\16\lib\site-packages\pytorchrl\scheme\updates\u_worker.py", line 259, in step
    grads = self.local_worker.step(self.decentralized_update_execution)
  File "C:\ProgramData\BOINC\slots\16\lib\site-packages\pytorchrl\scheme\gradients\g_worker.py", line 178, in step
    self.get_data()
  File "C:\ProgramData\BOINC\slots\16\lib\site-packages\pytorchrl\scheme\gradients\g_worker.py", line 211, in get_data
    self.collector.step()
  File "C:\ProgramData\BOINC\slots\16\lib\site-packages\pytorchrl\scheme\gradients\g_worker.py", line 490, in step
    rollouts = self.local_worker.collect_data(listen_to=["sync"], data_to_cpu=False)
  File "C:\ProgramData\BOINC\slots\16\lib\site-packages\pytorchrl\scheme\collection\c_worker.py", line 168, in collect_data
    train_info = self.collect_train_data(listen_to=listen_to)
  File "C:\ProgramData\BOINC\slots\16\lib\site-packages\pytorchrl\scheme\collection\c_worker.py", line 242, in collect_train_data
    obs2, reward, done2, episode_infos = self.envs_train.step(clip_act)
  File "C:\ProgramData\BOINC\slots\16\lib\site-packages\pytorchrl\agent\env\vec_envs\vec_env_base.py", line 85, in step
    return self.step_wait()
  File "C:\ProgramData\BOINC\slots\16\lib\site-packages\pytorchrl\agent\env\vec_envs\vector_wrappers.py", line 72, in step_wait
    obs = torch.from_numpy(obs).float().to(self.device)
RuntimeError: [enforce fail at C:\cb\pytorch_1000000000000\work\c10\core\impl\alloc_cpu.cpp:81] data. DefaultCPUAllocator: not enough memory: you tried to allocate 3612672 bytes.
19:03:33 (3936): python.exe exited; CPU time 10494.140625
19:03:33 (3936): app exit status: 0x1
19:03:33 (3936): called boinc_finish(195)
0 bytes in 0 Free Blocks.
552 bytes in 9 Normal Blocks.
1144 bytes in 1 CRT Blocks.
0 bytes in 0 Ignore Blocks.
0 bytes in 0 Client Blocks.
Largest number used: 0 bytes.
Total allocations: 179414097 bytes.
Dumping objects -> {16455} normal block at 0x000001D92CFBFBE0, 48 bytes long. Data: <PSI_SCRATCH=C:\P> 50 53 49 5F 53 43 52 41 54 43 48 3D 43 3A 5C 50 {16414} normal block at 0x000001D92CFC08F0, 48 bytes long. Data: <HOMEPATH=C:\Prog> 48 4F 4D 45 50 41 54 48 3D 43 3A 5C 50 72 6F 67 {16403} normal block at 0x000001D92CFBFF50, 48 bytes long. Data: <HOME=C:\ProgramD> 48 4F 4D 45 3D 43 3A 5C 50 72 6F 67 72 61 6D 44 {16392} normal block at 0x000001D92CFC0790, 48 bytes long. Data: <TMP=C:\ProgramDa> 54 4D 50 3D 43 3A 5C 50 72 6F 67 72 61 6D 44 61 {16381} normal block at 0x000001D92CFC0630, 48 bytes long. Data: <TEMP=C:\ProgramD> 54 45 4D 50 3D 43 3A 5C 50 72 6F 67 72 61 6D 44 {16370} normal block at 0x000001D92CFC0160, 48 bytes long. Data: <TMPDIR=C:\Progra> 54 4D 50 44 49 52 3D 43 3A 5C 50 72 6F 67 72 61 {16289} normal block at 0x000001D92CF9C280, 140 bytes long. Data: <<project_prefere> 3C 70 72 6F 6A 65 63 74 5F 70 72 65 66 65 72 65 ..\api\boinc_api.cpp(309) : {16286} normal block at 0x000001D92CFB20C0, 8 bytes long. Data: < 8- > 00 00 38 2D D9 01 00 00 {15645} normal block at 0x000001D92CFAE470, 140 bytes long. Data: <<project_prefere> 3C 70 72 6F 6A 65 63 74 5F 70 72 65 66 65 72 65 {15033} normal block at 0x000001D92CFB2840, 8 bytes long. 
Data: <@ 7- > 40 18 37 2D D9 01 00 00 ..\zip\boinc_zip.cpp(122) : {550} normal block at 0x000001D92CF9B820, 260 bytes long. Data: < > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 {537} normal block at 0x000001D92CFAA2D0, 32 bytes long. Data: < , P , > B0 A9 FA 2C D9 01 00 00 50 AF FA 2C D9 01 00 00 {536} normal block at 0x000001D92CFC0580, 52 bytes long. Data: < r > 01 00 00 00 72 00 CD CD 00 00 00 00 00 00 00 00 {531} normal block at 0x000001D92CFAA0F0, 43 bytes long. Data: < p > 01 00 00 00 70 00 CD CD 00 00 00 00 00 00 00 00 {526} normal block at 0x000001D92CFAAF50, 44 bytes long. Data: < q , > 01 00 00 00 00 00 CD CD 71 AF FA 2C D9 01 00 00 {521} normal block at 0x000001D92CFAA9B0, 44 bytes long. Data: < , > 01 00 00 00 00 00 CD CD D1 A9 FA 2C D9 01 00 00 {511} normal block at 0x000001D92CFBBDB0, 16 bytes long. Data: < , > B0 AE FA 2C D9 01 00 00 00 00 00 00 00 00 00 00 {510} normal block at 0x000001D92CFAAEB0, 40 bytes long. Data: < , input.zi> B0 BD FB 2C D9 01 00 00 69 6E 70 75 74 2E 7A 69 {503} normal block at 0x000001D92CFBCAA0, 16 bytes long. Data: <h , > 68 F8 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00 {502} normal block at 0x000001D92CFBCA10, 16 bytes long. Data: <@ , > 40 F8 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00 {501} normal block at 0x000001D92CFBCC50, 16 bytes long. Data: < , > 18 F8 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00 {500} normal block at 0x000001D92CFBB0C0, 16 bytes long. Data: < , > F0 F7 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00 {499} normal block at 0x000001D92CFBC980, 16 bytes long. Data: < , > C8 F7 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00 {498} normal block at 0x000001D92CFBAFA0, 16 bytes long. Data: < , > A0 F7 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00 {496} normal block at 0x000001D92CFBBD20, 16 bytes long. Data: <X , > 58 E9 FA 2C D9 01 00 00 00 00 00 00 00 00 00 00 {495} normal block at 0x000001D92CFAAE10, 32 bytes long. 
Data: <username=Compsci> 75 73 65 72 6E 61 6D 65 3D 43 6F 6D 70 73 63 69 {494} normal block at 0x000001D92CFBCBC0, 16 bytes long. Data: <0 , > 30 E9 FA 2C D9 01 00 00 00 00 00 00 00 00 00 00 {493} normal block at 0x000001D92CF9C3B0, 64 bytes long. Data: <PYTHONPATH=.\lib> 50 59 54 48 4F 4E 50 41 54 48 3D 2E 5C 6C 69 62 {492} normal block at 0x000001D92CFBCE90, 16 bytes long. Data: < , > 08 E9 FA 2C D9 01 00 00 00 00 00 00 00 00 00 00 {491} normal block at 0x000001D92CFAAFF0, 32 bytes long. Data: <PATH=.\Library\b> 50 41 54 48 3D 2E 5C 4C 69 62 72 61 72 79 5C 62 {490} normal block at 0x000001D92CFBC350, 16 bytes long. Data: < , > E0 E8 FA 2C D9 01 00 00 00 00 00 00 00 00 00 00 {489} normal block at 0x000001D92CFBC1A0, 16 bytes long. Data: < , > B8 E8 FA 2C D9 01 00 00 00 00 00 00 00 00 00 00 {488} normal block at 0x000001D92CFBC8F0, 16 bytes long. Data: < , > 90 E8 FA 2C D9 01 00 00 00 00 00 00 00 00 00 00 {487} normal block at 0x000001D92CFBB420, 16 bytes long. Data: <h , > 68 E8 FA 2C D9 01 00 00 00 00 00 00 00 00 00 00 {486} normal block at 0x000001D92CFBBA50, 16 bytes long. Data: <@ , > 40 E8 FA 2C D9 01 00 00 00 00 00 00 00 00 00 00 {485} normal block at 0x000001D92CFBC110, 16 bytes long. Data: < , > 18 E8 FA 2C D9 01 00 00 00 00 00 00 00 00 00 00 {484} normal block at 0x000001D92CFAA730, 32 bytes long. Data: <SystemRoot=C:\Wi> 53 79 73 74 65 6D 52 6F 6F 74 3D 43 3A 5C 57 69 {483} normal block at 0x000001D92CFBC7D0, 16 bytes long. Data: < , > F0 E7 FA 2C D9 01 00 00 00 00 00 00 00 00 00 00 {482} normal block at 0x000001D92CFAA370, 32 bytes long. Data: <GPU_DEVICE_NUM=0> 47 50 55 5F 44 45 56 49 43 45 5F 4E 55 4D 3D 30 {481} normal block at 0x000001D92CFBC6B0, 16 bytes long. Data: < , > C8 E7 FA 2C D9 01 00 00 00 00 00 00 00 00 00 00 {480} normal block at 0x000001D92CFAAC30, 32 bytes long. Data: <NTHREADS=1 THREA> 4E 54 48 52 45 41 44 53 3D 31 00 54 48 52 45 41 {479} normal block at 0x000001D92CFBC620, 16 bytes long. 
Data: < , > A0 E7 FA 2C D9 01 00 00 00 00 00 00 00 00 00 00 {478} normal block at 0x000001D92CFAE7A0, 480 bytes long. Data: < , 0 , > 20 C6 FB 2C D9 01 00 00 30 AC FA 2C D9 01 00 00 {477} normal block at 0x000001D92CFBCE00, 16 bytes long. Data: < , > 80 F7 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00 {476} normal block at 0x000001D92CFBCD70, 16 bytes long. Data: <X , > 58 F7 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00 {475} normal block at 0x000001D92CFBB780, 16 bytes long. Data: <0 , > 30 F7 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00 {474} normal block at 0x000001D92CFAE6F0, 48 bytes long. Data: </C "del pythongp> 2F 43 20 22 64 65 6C 20 70 79 74 68 6F 6E 67 70 {473} normal block at 0x000001D92CFBC590, 16 bytes long. Data: <x , > 78 F6 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00 {472} normal block at 0x000001D92CFBB150, 16 bytes long. Data: <P , > 50 F6 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00 {471} normal block at 0x000001D92CFBC500, 16 bytes long. Data: <( , > 28 F6 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00 {470} normal block at 0x000001D92CFBB300, 16 bytes long. Data: < , > 00 F6 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00 {469} normal block at 0x000001D92CFBCCE0, 16 bytes long. Data: < , > D8 F5 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00 {468} normal block at 0x000001D92CFBCB30, 16 bytes long. Data: < , > B0 F5 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00 {467} normal block at 0x000001D92CFBC740, 16 bytes long. Data: < , > 90 F5 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00 {466} normal block at 0x000001D92CFBBB70, 16 bytes long. Data: <h , > 68 F5 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00 {465} normal block at 0x000001D92CFAA690, 32 bytes long. Data: <C:\Windows\syste> 43 3A 5C 57 69 6E 64 6F 77 73 5C 73 79 73 74 65 {464} normal block at 0x000001D92CFBBAE0, 16 bytes long. Data: <@ , > 40 F5 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00 {463} normal block at 0x000001D92CFA8510, 48 bytes long. 
Data: <x pythongpu_wind> 78 20 70 79 74 68 6F 6E 67 70 75 5F 77 69 6E 64 {462} normal block at 0x000001D92CFBC860, 16 bytes long. Data: < , > 88 F4 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00 {461} normal block at 0x000001D92CFBB810, 16 bytes long. Data: <` , > 60 F4 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00 {460} normal block at 0x000001D92CFBB030, 16 bytes long. Data: <8 , > 38 F4 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00 {459} normal block at 0x000001D92CFBC080, 16 bytes long. Data: < , > 10 F4 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00 {458} normal block at 0x000001D92CFBB9C0, 16 bytes long. Data: < , > E8 F3 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00 {457} normal block at 0x000001D92CFBE000, 16 bytes long. Data: < , > C0 F3 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00 {456} normal block at 0x000001D92CFBEB40, 16 bytes long. Data: < , > A0 F3 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00 {455} normal block at 0x000001D92CFBDF70, 16 bytes long. Data: <x , > 78 F3 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00 {454} normal block at 0x000001D92CFBDAF0, 16 bytes long. Data: <P , > 50 F3 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00 {453} normal block at 0x000001D92CFA8460, 48 bytes long. Data: </C "del pythongp> 2F 43 20 22 64 65 6C 20 70 79 74 68 6F 6E 67 70 {452} normal block at 0x000001D92CFBDEE0, 16 bytes long. Data: < , > 98 F2 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00 {451} normal block at 0x000001D92CFBD8B0, 16 bytes long. Data: <p , > 70 F2 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00 {450} normal block at 0x000001D92CFBD790, 16 bytes long. Data: <H , > 48 F2 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00 {449} normal block at 0x000001D92CFBDE50, 16 bytes long. Data: < , > 20 F2 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00 {448} normal block at 0x000001D92CFBEAB0, 16 bytes long. Data: < , > F8 F1 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00 {447} normal block at 0x000001D92CFBD9D0, 16 bytes long. 
Data: < , > D0 F1 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00 {446} normal block at 0x000001D92CFBD700, 16 bytes long. Data: < , > B0 F1 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00 {445} normal block at 0x000001D92CFBEA20, 16 bytes long. Data: < , > 88 F1 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00 {444} normal block at 0x000001D92CFAA4B0, 32 bytes long. Data: <C:\Windows\syste> 43 3A 5C 57 69 6E 64 6F 77 73 5C 73 79 73 74 65 {443} normal block at 0x000001D92CFBD820, 16 bytes long. Data: <` , > 60 F1 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00 {442} normal block at 0x000001D92CFA38F0, 48 bytes long. Data: <x pythongpu_wind> 78 20 70 79 74 68 6F 6E 67 70 75 5F 77 69 6E 64 {441} normal block at 0x000001D92CFBDA60, 16 bytes long. Data: < , > A8 F0 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00 {440} normal block at 0x000001D92CFBE900, 16 bytes long. Data: < , > 80 F0 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00 {439} normal block at 0x000001D92CFBE870, 16 bytes long. Data: <X , > 58 F0 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00 {438} normal block at 0x000001D92CFBEBD0, 16 bytes long. Data: <0 , > 30 F0 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00 {437} normal block at 0x000001D92CFBE6C0, 16 bytes long. Data: < , > 08 F0 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00 {436} normal block at 0x000001D92CFBE480, 16 bytes long. Data: < , > E0 EF FB 2C D9 01 00 00 00 00 00 00 00 00 00 00 {435} normal block at 0x000001D92CFBD4C0, 16 bytes long. Data: < , > C0 EF FB 2C D9 01 00 00 00 00 00 00 00 00 00 00 {434} normal block at 0x000001D92CFBDC10, 16 bytes long. Data: < , > 98 EF FB 2C D9 01 00 00 00 00 00 00 00 00 00 00 {433} normal block at 0x000001D92CFBD430, 16 bytes long. Data: <p , > 70 EF FB 2C D9 01 00 00 00 00 00 00 00 00 00 00 {432} normal block at 0x000001D92CFBEF70, 2976 bytes long. Data: <0 , .\7za.ex> 30 D4 FB 2C D9 01 00 00 2E 5C 37 7A 61 2E 65 78 {69} normal block at 0x000001D92CFACC20, 16 bytes long. 
Data: < ;* > 80 EA 3B 2A F6 7F 00 00 00 00 00 00 00 00 00 00 {68} normal block at 0x000001D92CFACA70, 16 bytes long. Data: <@ ;* > 40 E9 3B 2A F6 7F 00 00 00 00 00 00 00 00 00 00 {67} normal block at 0x000001D92CFAC0E0, 16 bytes long. Data: < W8* > F8 57 38 2A F6 7F 00 00 00 00 00 00 00 00 00 00 {66} normal block at 0x000001D92CFAC050, 16 bytes long. Data: < W8* > D8 57 38 2A F6 7F 00 00 00 00 00 00 00 00 00 00 {65} normal block at 0x000001D92CFAC9E0, 16 bytes long. Data: <P 8* > 50 04 38 2A F6 7F 00 00 00 00 00 00 00 00 00 00 {64} normal block at 0x000001D92CFAC680, 16 bytes long. Data: <0 8* > 30 04 38 2A F6 7F 00 00 00 00 00 00 00 00 00 00 {63} normal block at 0x000001D92CFACB00, 16 bytes long. Data: < 8* > E0 02 38 2A F6 7F 00 00 00 00 00 00 00 00 00 00 {62} normal block at 0x000001D92CFAC950, 16 bytes long. Data: < 8* > 10 04 38 2A F6 7F 00 00 00 00 00 00 00 00 00 00 {61} normal block at 0x000001D92CFAC8C0, 16 bytes long. Data: <p 8* > 70 04 38 2A F6 7F 00 00 00 00 00 00 00 00 00 00 {60} normal block at 0x000001D92CFAC710, 16 bytes long. Data: < 6* > 18 C0 36 2A F6 7F 00 00 00 00 00 00 00 00 00 00 Object dump complete. </stderr_txt> ]]> |
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 5,269
|
it's right in your message: "RuntimeError: [enforce fail at C:\cb\pytorch_1000000000000\work\c10\core\impl\alloc_cpu.cpp:81] data. DefaultCPUAllocator: not enough memory: you tried to allocate 3612672 bytes." that's why. this is a known problem with the windows app. you need to increase your virtual memory (page file) to like 50GB. also it looks like your host only has 16GB system RAM. if you're running other things that use lots of memory (like rosetta or einstein GW CPU tasks) then you might be running out of system memory too. these python tasks need about 10GB of system memory for each one.
|
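As a rough back-of-the-envelope check of the memory advice above, the sketch below estimates how many Python tasks fit in system RAM, assuming the roughly 10 GB per task figure mentioned in this thread. The function name `max_concurrent_tasks` and the 4 GB reserved for the operating system and other programs are assumptions made for illustration, not project guidance.

```python
def max_concurrent_tasks(
    total_ram_gb: float,
    per_task_gb: float = 10.0,  # approximate per-task figure quoted in this thread
    reserved_gb: float = 4.0,   # assumed headroom for the OS and other programs
) -> int:
    """Estimate how many Python tasks fit in system RAM at the same time."""
    usable_gb = total_ram_gb - reserved_gb
    if usable_gb <= 0:
        return 0
    return int(usable_gb // per_task_gb)


for ram_gb in (16, 32, 128):
    print(f"{ram_gb} GB RAM -> about {max_concurrent_tasks(ram_gb)} concurrent task(s)")
```

With these assumptions a 16 GB host has room for a single task, and even that leaves little margin if other memory-hungry projects run alongside it. The estimate covers physical RAM only; the page file advice is a separate requirement.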
|
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 1
|
I am experiencing a strange problem on my PC with two RTX 3070s inside, an Intel i9-10900KF CPU (10 cores/20 threads) and 128 GB RAM: until about 2 weeks ago, I crunched 4 Python tasks concurrently (2 per GPU). Then I processed ACEMD_3 and ATM tasks, the queues of which have now run dry. So I changed back to Python, and surprise: after downloading 4 tasks, only 3 started; the fourth one stays in status "ready to start". I had made no changes, neither in the hardware, nor in the software, nor in the settings. Does anyone have any idea what I can do to get the fourth task to run? |
|
Joined: 13 Dec 17 Posts: 1419 Credit: 9,119,446,190 RAC: 731
|
A consequence of running the acemd3 and ATM tasks is that the APR on the host dropped, and now the client thinks that you will not be able to finish the second Python task before the deadline. You probably have the single Python task in EDF mode now. Try adding <fraction_done_exact/> to every app section in your app_config.xml. That helps produce more realistic progress percentages and may persuade the client to let you run that second task on that GPU. But you may just have to let the APR mechanism balance out again. One of the many flaws in BOINC. |
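A minimal sketch of the edit suggested above, written as a small Python helper that adds `<fraction_done_exact/>` to every `<app>` section of an app_config.xml that lacks it. The helper name `add_fraction_done_exact`, the app names `PythonGPU` and `acemd3`, and the `gpu_usage`/`cpu_usage` values in the sample are assumptions for illustration; check the names your own client reports before using anything like this.

```python
import xml.etree.ElementTree as ET


def add_fraction_done_exact(app_config_xml: str) -> str:
    """Return the config text with <fraction_done_exact/> in every <app> section."""
    root = ET.fromstring(app_config_xml)
    for app in root.findall("app"):
        # Only add the flag where it is missing, so repeated runs change nothing.
        if app.find("fraction_done_exact") is None:
            ET.SubElement(app, "fraction_done_exact")
    return ET.tostring(root, encoding="unicode")


# Hypothetical app_config.xml content; app names and usage values are assumed.
example = """<app_config>
  <app>
    <name>PythonGPU</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>acemd3</name>
  </app>
</app_config>"""

print(add_fraction_done_exact(example))
```

After saving a changed app_config.xml into the project directory, the client has to re-read its config files (or be restarted) before the change takes effect.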
|
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 1
|
thank you, Keith, for the explanation :-) <fraction_done_exact/> has been in the app_config to begin with. So I am afraid I just need to wait ... |
|
Joined: 27 Jul 11 Posts: 138 Credit: 539,953,398 RAC: 0
|
Absolutely no GPU usage, only CPU. Task 27464783 |
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 5,269
|
for the first 5 minutes or so, there will only be CPU use and no GPU use because the task is extracting the python environment to the designated slot. after this, the task will run and start using both GPU and CPU. GPU use will be low.
|
|
Joined: 27 Jul 11 Posts: 138 Credit: 539,953,398 RAC: 0
|
No, it was not at 5% but 29% and stuck. I exited BOINC and restarted. The WU is now normal at 34%. |
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 5,269
|
i said 5 minutes not 5%. but sounds like an issue with your system, not the tasks. my tasks have never gotten stuck like that.
|
|
Joined: 18 Jul 13 Posts: 79 Credit: 210,528,292 RAC: 0
|
It takes 20 minutes on my HDD. |
©2025 Universitat Pompeu Fabra