Experimental Python tasks (beta) - task description

Keith Myers
Joined: 13 Dec 17
Posts: 1419
Credit: 9,119,446,190
RAC: 731
Message 59998 - Posted: 28 Feb 2023, 2:32:08 UTC - in response to Message 59997.  

I had a couple of the ATM tasks finish successfully a week ago, but they have long since been cleared from the database, so there is nothing left for anyone to look at.

Greger
Joined: 6 Jan 15
Posts: 76
Credit: 25,499,534,331
RAC: 0
Message 59999 - Posted: 28 Feb 2023, 17:42:01 UTC - in response to Message 59997.  


Pop Piasa
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Message 60000 - Posted: 28 Feb 2023, 19:40:14 UTC - in response to Message 59999.  

Thanks Greger, it's good to have a successful example to compare with when examining errors. I appreciate it.

KAMasud
Joined: 27 Jul 11
Posts: 138
Credit: 539,953,398
RAC: 0
Message 60001 - Posted: 3 Mar 2023, 10:26:16 UTC
Last modified: 3 Mar 2023, 10:42:25 UTC

Windows here. You know, sometimes these WUs go to sleep, then I click the mouse and they start running again. It does not happen with all WUs.

task 33333635

kotenok2000
Joined: 18 Jul 13
Posts: 79
Credit: 210,528,292
RAC: 0
Message 60004 - Posted: 3 Mar 2023, 10:44:06 UTC - in response to Message 60001.  

Maybe you can change the system power settings?
For example, disable spinning down the hard drive?

ngkachun1982
Joined: 22 Apr 19
Posts: 3
Credit: 24,096,692
RAC: 4
Message 60112 - Posted: 19 Mar 2023, 14:57:35 UTC

My recent results uploaded to GPUGRID often end with "Error while computing" and lose all credit. I don't know why. What should I do?

Task 33359888
Workunit 27429785
Computer ID 604308
Sent 17 Mar 2023 | 13:14:49 UTC
Reported 19 Mar 2023 | 14:23:21 UTC
Status Error while computing
Run time 50,964.34
CPU time 50,964.34
Credit ---
Application version Python apps for GPU hosts v4.04 (cuda1131)


19/3/2023 17:37:41 | | Starting BOINC client version 7.20.2 for windows_x86_64
19/3/2023 17:37:41 | | log flags: file_xfer, sched_ops, task
19/3/2023 17:37:41 | | Libraries: libcurl/7.84.0-DEV Schannel zlib/1.2.12
19/3/2023 17:37:41 | | Data directory: C:\ProgramData\BOINC
19/3/2023 17:37:41 | |
19/3/2023 17:37:41 | | CUDA: NVIDIA GPU 0: NVIDIA GeForce RTX 3060 (driver version 531.18, CUDA version 12.1, compute capability 8.6, 12288MB, 12288MB available, 12738 GFLOPS peak)
19/3/2023 17:37:41 | | OpenCL: NVIDIA GPU 0: NVIDIA GeForce RTX 3060 (driver version 531.18, device version OpenCL 3.0 CUDA, 12288MB, 12288MB available, 12738 GFLOPS peak)
19/3/2023 17:37:41 | | Windows processor group 0: 20 processors
19/3/2023 17:37:41 | | Host name: NGcomputer
19/3/2023 17:37:41 | | Processor: 20 GenuineIntel 12th Gen Intel(R) Core(TM) i7-12700F [Family 6 Model 151 Stepping 2]
19/3/2023 17:37:41 | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush acpi mmx fxsr sse sse2 ss htt tm pni ssse3 fma cx16 sse4_1 sse4_2 movebe popcnt aes f16c rdrandsyscall nx lm avx avx2 tm2 pbe fsgsbase bmi1 smep bmi2
19/3/2023 17:37:41 | | OS: Microsoft Windows Vista: Home Premium x64 Edition, Service Pack 2, (06.00.6002.00)
19/3/2023 17:37:41 | | Memory: 15.76 GB physical, 63.76 GB virtual
19/3/2023 17:37:41 | | Disk: 952.93 GB total, 700.19 GB free
19/3/2023 17:37:41 | | Local time is UTC +8 hours
19/3/2023 17:37:41 | | No WSL found.
19/3/2023 17:37:41 | | VirtualBox version: 7.0.6



Keith Myers
Joined: 13 Dec 17
Posts: 1419
Credit: 9,119,446,190
RAC: 731
Message 60113 - Posted: 19 Mar 2023, 17:21:18 UTC - in response to Message 60112.  

You have to look at the errored task results on the website to find out why they errored.

Two of the tasks errored out because you don't have enough virtual memory available for the expansion phase, where the task sets up its libraries.

On Windows, it is advised to set your system page file to at least 50 GB.

ngkachun1982
Joined: 22 Apr 19
Posts: 3
Credit: 24,096,692
RAC: 4
Message 60118 - Posted: 20 Mar 2023, 10:55:45 UTC - in response to Message 60113.  

Thank you very much

KAMasud
Joined: 27 Jul 11
Posts: 138
Credit: 539,953,398
RAC: 0
Message 60125 - Posted: 22 Mar 2023, 5:07:40 UTC

The server status shows that WUs are available, but my machines have received no tasks since yesterday.

abouh
Joined: 31 May 21
Posts: 200
Credit: 0
RAC: 0
Message 60127 - Posted: 22 Mar 2023, 7:44:52 UTC - in response to Message 60125.  
Last modified: 22 Mar 2023, 7:45:19 UTC

Hello!

The previous population experiment ended and I needed to analyse the results.
But I am starting a new experiment today.

ngkachun1982
Joined: 22 Apr 19
Posts: 3
Credit: 24,096,692
RAC: 4
Message 60131 - Posted: 22 Mar 2023, 14:42:28 UTC

I don't understand why my task failed. Why?

Name e00002a04604-ABOU_rnd_ppod_expand_demos29_2_exp5-0-1-RND9901_1
Workunit 27434170
Created 20 Mar 2023 | 22:42:54 UTC
Sent 21 Mar 2023 | 0:31:58 UTC
Received 21 Mar 2023 | 11:05:25 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 195 (0xc3) EXIT_CHILD_FAILED
Computer ID 604308
Report deadline 26 Mar 2023 | 0:31:58 UTC
Run time 13,385.72
CPU time 13,385.72
Validate state Invalid
Credit 0.00
Application version Python apps for GPU hosts v4.04 (cuda1131)
Stderr output

<core_client_version>7.20.2</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code 195 (0xc3)</message>
<stderr_txt>
08:32:00 (19880): wrapper (7.9.26016): starting
08:32:00 (19880): wrapper: running .\7za.exe (x pythongpu_windows_x86_64__cuda1131.txz -y)

7-Zip (a) 22.01 (x86) : Copyright (c) 1999-2022 Igor Pavlov : 2022-07-15

Scanning the drive for archives:
1 file, 1976180228 bytes (1885 MiB)

Extracting archive: pythongpu_windows_x86_64__cuda1131.txz
--
Path = pythongpu_windows_x86_64__cuda1131.txz
Type = xz
Physical Size = 1976180228
Method = LZMA2:22 CRC64
Streams = 1523
Blocks = 1523
Cluster Size = 4210688

Everything is Ok

Size: 6410311680
Compressed: 1976180228
08:34:12 (19880): .\7za.exe exited; CPU time 107.906250
08:34:12 (19880): wrapper: running C:\Windows\system32\cmd.exe (/C "del pythongpu_windows_x86_64__cuda1131.txz")
08:34:13 (19880): C:\Windows\system32\cmd.exe exited; CPU time 0.000000
08:34:13 (19880): wrapper: running .\7za.exe (x pythongpu_windows_x86_64__cuda1131.tar -y)

7-Zip (a) 22.01 (x86) : Copyright (c) 1999-2022 Igor Pavlov : 2022-07-15

Scanning the drive for archives:
1 file, 6410311680 bytes (6114 MiB)

Extracting archive: pythongpu_windows_x86_64__cuda1131.tar
--
Path = pythongpu_windows_x86_64__cuda1131.tar
Type = tar
Physical Size = 6410311680
Headers Size = 19965952
Code Page = UTF-8
Characteristics = GNU LongName ASCII

Everything is Ok

Files: 38141
Size: 6380353601
Compressed: 6410311680
08:35:04 (19880): .\7za.exe exited; CPU time 9.515625
08:35:04 (19880): wrapper: running C:\Windows\system32\cmd.exe (/C "del pythongpu_windows_x86_64__cuda1131.tar")
08:35:05 (19880): C:\Windows\system32\cmd.exe exited; CPU time 0.000000
08:35:05 (19880): wrapper: running python.exe (run.py)
Windows fix executed.
Detected GPUs: 1
Define environment factory
Define algorithm factory
Define storage factory
Define scheme
Created CWorker with worker_index 0
Created GWorker with worker_index 0
Created UWorker with worker_index 0
Created training scheme.
Define learner
Created Learner.
Detected memory leaks!
Dumping objects ->
..\api\boinc_api.cpp(309) : {16368} normal block at 0x0000025533011E30, 8 bytes long.
Data: < 4U > 00 00 D1 34 55 02 00 00
..\lib\diagnostics_win.cpp(417) : {15114} normal block at 0x000002553306C260, 1080 bytes long.
Data: < > D8 1D 00 00 CD CD CD CD 8C 01 00 00 00 00 00 00
..\zip\boinc_zip.cpp(122) : {550} normal block at 0x0000025532FFBE70, 260 bytes long.
Data: < > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
{536} normal block at 0x0000025532FFEC80, 52 bytes long.
Data: < r > 01 00 00 00 72 00 CD CD 00 00 00 00 00 00 00 00
{531} normal block at 0x0000025533009E00, 43 bytes long.
Data: < p > 01 00 00 00 70 00 CD CD 00 00 00 00 00 00 00 00
{526} normal block at 0x000002553300A940, 44 bytes long.
Data: < a 3U > 01 00 00 00 00 00 CD CD 61 A9 00 33 55 02 00 00
{521} normal block at 0x000002553300A080, 44 bytes long.
Data: < 3U > 01 00 00 00 00 00 CD CD A1 A0 00 33 55 02 00 00
Object dump complete.
16:26:14 (3936): wrapper (7.9.26016): starting
16:26:14 (3936): wrapper: running python.exe (run.py)
Windows fix executed.
Detected GPUs: 1
Define environment factory
Define algorithm factory
Define storage factory
Define scheme
Created CWorker with worker_index 0
Created GWorker with worker_index 0
Created UWorker with worker_index 0
Created training scheme.
Define learner
Created Learner.
Look for a progress_last_chk file - if exists, adjust target_env_steps
Define train loop
Traceback (most recent call last):
File "C:\ProgramData\BOINC\slots\16\lib\site-packages\pytorchrl\scheme\gradients\g_worker.py", line 196, in get_data
self.next_batch = self.batches.__next__()
StopIteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "run.py", line 471, in <module>
main()
File "run.py", line 136, in main
learner.step()
File "C:\ProgramData\BOINC\slots\16\lib\site-packages\pytorchrl\learner.py", line 46, in step
info = self.update_worker.step()
File "C:\ProgramData\BOINC\slots\16\lib\site-packages\pytorchrl\scheme\updates\u_worker.py", line 118, in step
self.updater.step()
File "C:\ProgramData\BOINC\slots\16\lib\site-packages\pytorchrl\scheme\updates\u_worker.py", line 259, in step
grads = self.local_worker.step(self.decentralized_update_execution)
File "C:\ProgramData\BOINC\slots\16\lib\site-packages\pytorchrl\scheme\gradients\g_worker.py", line 178, in step
self.get_data()
File "C:\ProgramData\BOINC\slots\16\lib\site-packages\pytorchrl\scheme\gradients\g_worker.py", line 211, in get_data
self.collector.step()
File "C:\ProgramData\BOINC\slots\16\lib\site-packages\pytorchrl\scheme\gradients\g_worker.py", line 490, in step
rollouts = self.local_worker.collect_data(listen_to=["sync"], data_to_cpu=False)
File "C:\ProgramData\BOINC\slots\16\lib\site-packages\pytorchrl\scheme\collection\c_worker.py", line 168, in collect_data
train_info = self.collect_train_data(listen_to=listen_to)
File "C:\ProgramData\BOINC\slots\16\lib\site-packages\pytorchrl\scheme\collection\c_worker.py", line 242, in collect_train_data
obs2, reward, done2, episode_infos = self.envs_train.step(clip_act)
File "C:\ProgramData\BOINC\slots\16\lib\site-packages\pytorchrl\agent\env\vec_envs\vec_env_base.py", line 85, in step
return self.step_wait()
File "C:\ProgramData\BOINC\slots\16\lib\site-packages\pytorchrl\agent\env\vec_envs\vector_wrappers.py", line 72, in step_wait
obs = torch.from_numpy(obs).float().to(self.device)
RuntimeError: [enforce fail at C:\cb\pytorch_1000000000000\work\c10\core\impl\alloc_cpu.cpp:81] data. DefaultCPUAllocator: not enough memory: you tried to allocate 3612672 bytes.
19:03:33 (3936): python.exe exited; CPU time 10494.140625
19:03:33 (3936): app exit status: 0x1
19:03:33 (3936): called boinc_finish(195)
0 bytes in 0 Free Blocks.
552 bytes in 9 Normal Blocks.
1144 bytes in 1 CRT Blocks.
0 bytes in 0 Ignore Blocks.
0 bytes in 0 Client Blocks.
Largest number used: 0 bytes.
Total allocations: 179414097 bytes.
Dumping objects ->
{16455} normal block at 0x000001D92CFBFBE0, 48 bytes long.
Data: <PSI_SCRATCH=C:\P> 50 53 49 5F 53 43 52 41 54 43 48 3D 43 3A 5C 50
{16414} normal block at 0x000001D92CFC08F0, 48 bytes long.
Data: <HOMEPATH=C:\Prog> 48 4F 4D 45 50 41 54 48 3D 43 3A 5C 50 72 6F 67
{16403} normal block at 0x000001D92CFBFF50, 48 bytes long.
Data: <HOME=C:\ProgramD> 48 4F 4D 45 3D 43 3A 5C 50 72 6F 67 72 61 6D 44
{16392} normal block at 0x000001D92CFC0790, 48 bytes long.
Data: <TMP=C:\ProgramDa> 54 4D 50 3D 43 3A 5C 50 72 6F 67 72 61 6D 44 61
{16381} normal block at 0x000001D92CFC0630, 48 bytes long.
Data: <TEMP=C:\ProgramD> 54 45 4D 50 3D 43 3A 5C 50 72 6F 67 72 61 6D 44
{16370} normal block at 0x000001D92CFC0160, 48 bytes long.
Data: <TMPDIR=C:\Progra> 54 4D 50 44 49 52 3D 43 3A 5C 50 72 6F 67 72 61
{16289} normal block at 0x000001D92CF9C280, 140 bytes long.
Data: <<project_prefere> 3C 70 72 6F 6A 65 63 74 5F 70 72 65 66 65 72 65
..\api\boinc_api.cpp(309) : {16286} normal block at 0x000001D92CFB20C0, 8 bytes long.
Data: < 8- > 00 00 38 2D D9 01 00 00
{15645} normal block at 0x000001D92CFAE470, 140 bytes long.
Data: <<project_prefere> 3C 70 72 6F 6A 65 63 74 5F 70 72 65 66 65 72 65
{15033} normal block at 0x000001D92CFB2840, 8 bytes long.
Data: <@ 7- > 40 18 37 2D D9 01 00 00
..\zip\boinc_zip.cpp(122) : {550} normal block at 0x000001D92CF9B820, 260 bytes long.
Data: < > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
{537} normal block at 0x000001D92CFAA2D0, 32 bytes long.
Data: < , P , > B0 A9 FA 2C D9 01 00 00 50 AF FA 2C D9 01 00 00
{536} normal block at 0x000001D92CFC0580, 52 bytes long.
Data: < r > 01 00 00 00 72 00 CD CD 00 00 00 00 00 00 00 00
{531} normal block at 0x000001D92CFAA0F0, 43 bytes long.
Data: < p > 01 00 00 00 70 00 CD CD 00 00 00 00 00 00 00 00
{526} normal block at 0x000001D92CFAAF50, 44 bytes long.
Data: < q , > 01 00 00 00 00 00 CD CD 71 AF FA 2C D9 01 00 00
{521} normal block at 0x000001D92CFAA9B0, 44 bytes long.
Data: < , > 01 00 00 00 00 00 CD CD D1 A9 FA 2C D9 01 00 00
{511} normal block at 0x000001D92CFBBDB0, 16 bytes long.
Data: < , > B0 AE FA 2C D9 01 00 00 00 00 00 00 00 00 00 00
{510} normal block at 0x000001D92CFAAEB0, 40 bytes long.
Data: < , input.zi> B0 BD FB 2C D9 01 00 00 69 6E 70 75 74 2E 7A 69
{503} normal block at 0x000001D92CFBCAA0, 16 bytes long.
Data: <h , > 68 F8 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00
{502} normal block at 0x000001D92CFBCA10, 16 bytes long.
Data: <@ , > 40 F8 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00
{501} normal block at 0x000001D92CFBCC50, 16 bytes long.
Data: < , > 18 F8 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00
{500} normal block at 0x000001D92CFBB0C0, 16 bytes long.
Data: < , > F0 F7 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00
{499} normal block at 0x000001D92CFBC980, 16 bytes long.
Data: < , > C8 F7 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00
{498} normal block at 0x000001D92CFBAFA0, 16 bytes long.
Data: < , > A0 F7 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00
{496} normal block at 0x000001D92CFBBD20, 16 bytes long.
Data: <X , > 58 E9 FA 2C D9 01 00 00 00 00 00 00 00 00 00 00
{495} normal block at 0x000001D92CFAAE10, 32 bytes long.
Data: <username=Compsci> 75 73 65 72 6E 61 6D 65 3D 43 6F 6D 70 73 63 69
{494} normal block at 0x000001D92CFBCBC0, 16 bytes long.
Data: <0 , > 30 E9 FA 2C D9 01 00 00 00 00 00 00 00 00 00 00
{493} normal block at 0x000001D92CF9C3B0, 64 bytes long.
Data: <PYTHONPATH=.\lib> 50 59 54 48 4F 4E 50 41 54 48 3D 2E 5C 6C 69 62
{492} normal block at 0x000001D92CFBCE90, 16 bytes long.
Data: < , > 08 E9 FA 2C D9 01 00 00 00 00 00 00 00 00 00 00
{491} normal block at 0x000001D92CFAAFF0, 32 bytes long.
Data: <PATH=.\Library\b> 50 41 54 48 3D 2E 5C 4C 69 62 72 61 72 79 5C 62
{490} normal block at 0x000001D92CFBC350, 16 bytes long.
Data: < , > E0 E8 FA 2C D9 01 00 00 00 00 00 00 00 00 00 00
{489} normal block at 0x000001D92CFBC1A0, 16 bytes long.
Data: < , > B8 E8 FA 2C D9 01 00 00 00 00 00 00 00 00 00 00
{488} normal block at 0x000001D92CFBC8F0, 16 bytes long.
Data: < , > 90 E8 FA 2C D9 01 00 00 00 00 00 00 00 00 00 00
{487} normal block at 0x000001D92CFBB420, 16 bytes long.
Data: <h , > 68 E8 FA 2C D9 01 00 00 00 00 00 00 00 00 00 00
{486} normal block at 0x000001D92CFBBA50, 16 bytes long.
Data: <@ , > 40 E8 FA 2C D9 01 00 00 00 00 00 00 00 00 00 00
{485} normal block at 0x000001D92CFBC110, 16 bytes long.
Data: < , > 18 E8 FA 2C D9 01 00 00 00 00 00 00 00 00 00 00
{484} normal block at 0x000001D92CFAA730, 32 bytes long.
Data: <SystemRoot=C:\Wi> 53 79 73 74 65 6D 52 6F 6F 74 3D 43 3A 5C 57 69
{483} normal block at 0x000001D92CFBC7D0, 16 bytes long.
Data: < , > F0 E7 FA 2C D9 01 00 00 00 00 00 00 00 00 00 00
{482} normal block at 0x000001D92CFAA370, 32 bytes long.
Data: <GPU_DEVICE_NUM=0> 47 50 55 5F 44 45 56 49 43 45 5F 4E 55 4D 3D 30
{481} normal block at 0x000001D92CFBC6B0, 16 bytes long.
Data: < , > C8 E7 FA 2C D9 01 00 00 00 00 00 00 00 00 00 00
{480} normal block at 0x000001D92CFAAC30, 32 bytes long.
Data: <NTHREADS=1 THREA> 4E 54 48 52 45 41 44 53 3D 31 00 54 48 52 45 41
{479} normal block at 0x000001D92CFBC620, 16 bytes long.
Data: < , > A0 E7 FA 2C D9 01 00 00 00 00 00 00 00 00 00 00
{478} normal block at 0x000001D92CFAE7A0, 480 bytes long.
Data: < , 0 , > 20 C6 FB 2C D9 01 00 00 30 AC FA 2C D9 01 00 00
{477} normal block at 0x000001D92CFBCE00, 16 bytes long.
Data: < , > 80 F7 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00
{476} normal block at 0x000001D92CFBCD70, 16 bytes long.
Data: <X , > 58 F7 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00
{475} normal block at 0x000001D92CFBB780, 16 bytes long.
Data: <0 , > 30 F7 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00
{474} normal block at 0x000001D92CFAE6F0, 48 bytes long.
Data: </C "del pythongp> 2F 43 20 22 64 65 6C 20 70 79 74 68 6F 6E 67 70
{473} normal block at 0x000001D92CFBC590, 16 bytes long.
Data: <x , > 78 F6 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00
{472} normal block at 0x000001D92CFBB150, 16 bytes long.
Data: <P , > 50 F6 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00
{471} normal block at 0x000001D92CFBC500, 16 bytes long.
Data: <( , > 28 F6 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00
{470} normal block at 0x000001D92CFBB300, 16 bytes long.
Data: < , > 00 F6 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00
{469} normal block at 0x000001D92CFBCCE0, 16 bytes long.
Data: < , > D8 F5 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00
{468} normal block at 0x000001D92CFBCB30, 16 bytes long.
Data: < , > B0 F5 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00
{467} normal block at 0x000001D92CFBC740, 16 bytes long.
Data: < , > 90 F5 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00
{466} normal block at 0x000001D92CFBBB70, 16 bytes long.
Data: <h , > 68 F5 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00
{465} normal block at 0x000001D92CFAA690, 32 bytes long.
Data: <C:\Windows\syste> 43 3A 5C 57 69 6E 64 6F 77 73 5C 73 79 73 74 65
{464} normal block at 0x000001D92CFBBAE0, 16 bytes long.
Data: <@ , > 40 F5 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00
{463} normal block at 0x000001D92CFA8510, 48 bytes long.
Data: <x pythongpu_wind> 78 20 70 79 74 68 6F 6E 67 70 75 5F 77 69 6E 64
{462} normal block at 0x000001D92CFBC860, 16 bytes long.
Data: < , > 88 F4 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00
{461} normal block at 0x000001D92CFBB810, 16 bytes long.
Data: <` , > 60 F4 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00
{460} normal block at 0x000001D92CFBB030, 16 bytes long.
Data: <8 , > 38 F4 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00
{459} normal block at 0x000001D92CFBC080, 16 bytes long.
Data: < , > 10 F4 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00
{458} normal block at 0x000001D92CFBB9C0, 16 bytes long.
Data: < , > E8 F3 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00
{457} normal block at 0x000001D92CFBE000, 16 bytes long.
Data: < , > C0 F3 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00
{456} normal block at 0x000001D92CFBEB40, 16 bytes long.
Data: < , > A0 F3 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00
{455} normal block at 0x000001D92CFBDF70, 16 bytes long.
Data: <x , > 78 F3 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00
{454} normal block at 0x000001D92CFBDAF0, 16 bytes long.
Data: <P , > 50 F3 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00
{453} normal block at 0x000001D92CFA8460, 48 bytes long.
Data: </C "del pythongp> 2F 43 20 22 64 65 6C 20 70 79 74 68 6F 6E 67 70
{452} normal block at 0x000001D92CFBDEE0, 16 bytes long.
Data: < , > 98 F2 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00
{451} normal block at 0x000001D92CFBD8B0, 16 bytes long.
Data: <p , > 70 F2 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00
{450} normal block at 0x000001D92CFBD790, 16 bytes long.
Data: <H , > 48 F2 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00
{449} normal block at 0x000001D92CFBDE50, 16 bytes long.
Data: < , > 20 F2 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00
{448} normal block at 0x000001D92CFBEAB0, 16 bytes long.
Data: < , > F8 F1 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00
{447} normal block at 0x000001D92CFBD9D0, 16 bytes long.
Data: < , > D0 F1 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00
{446} normal block at 0x000001D92CFBD700, 16 bytes long.
Data: < , > B0 F1 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00
{445} normal block at 0x000001D92CFBEA20, 16 bytes long.
Data: < , > 88 F1 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00
{444} normal block at 0x000001D92CFAA4B0, 32 bytes long.
Data: <C:\Windows\syste> 43 3A 5C 57 69 6E 64 6F 77 73 5C 73 79 73 74 65
{443} normal block at 0x000001D92CFBD820, 16 bytes long.
Data: <` , > 60 F1 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00
{442} normal block at 0x000001D92CFA38F0, 48 bytes long.
Data: <x pythongpu_wind> 78 20 70 79 74 68 6F 6E 67 70 75 5F 77 69 6E 64
{441} normal block at 0x000001D92CFBDA60, 16 bytes long.
Data: < , > A8 F0 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00
{440} normal block at 0x000001D92CFBE900, 16 bytes long.
Data: < , > 80 F0 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00
{439} normal block at 0x000001D92CFBE870, 16 bytes long.
Data: <X , > 58 F0 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00
{438} normal block at 0x000001D92CFBEBD0, 16 bytes long.
Data: <0 , > 30 F0 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00
{437} normal block at 0x000001D92CFBE6C0, 16 bytes long.
Data: < , > 08 F0 FB 2C D9 01 00 00 00 00 00 00 00 00 00 00
{436} normal block at 0x000001D92CFBE480, 16 bytes long.
Data: < , > E0 EF FB 2C D9 01 00 00 00 00 00 00 00 00 00 00
{435} normal block at 0x000001D92CFBD4C0, 16 bytes long.
Data: < , > C0 EF FB 2C D9 01 00 00 00 00 00 00 00 00 00 00
{434} normal block at 0x000001D92CFBDC10, 16 bytes long.
Data: < , > 98 EF FB 2C D9 01 00 00 00 00 00 00 00 00 00 00
{433} normal block at 0x000001D92CFBD430, 16 bytes long.
Data: <p , > 70 EF FB 2C D9 01 00 00 00 00 00 00 00 00 00 00
{432} normal block at 0x000001D92CFBEF70, 2976 bytes long.
Data: <0 , .\7za.ex> 30 D4 FB 2C D9 01 00 00 2E 5C 37 7A 61 2E 65 78
{69} normal block at 0x000001D92CFACC20, 16 bytes long.
Data: < ;* > 80 EA 3B 2A F6 7F 00 00 00 00 00 00 00 00 00 00
{68} normal block at 0x000001D92CFACA70, 16 bytes long.
Data: <@ ;* > 40 E9 3B 2A F6 7F 00 00 00 00 00 00 00 00 00 00
{67} normal block at 0x000001D92CFAC0E0, 16 bytes long.
Data: < W8* > F8 57 38 2A F6 7F 00 00 00 00 00 00 00 00 00 00
{66} normal block at 0x000001D92CFAC050, 16 bytes long.
Data: < W8* > D8 57 38 2A F6 7F 00 00 00 00 00 00 00 00 00 00
{65} normal block at 0x000001D92CFAC9E0, 16 bytes long.
Data: <P 8* > 50 04 38 2A F6 7F 00 00 00 00 00 00 00 00 00 00
{64} normal block at 0x000001D92CFAC680, 16 bytes long.
Data: <0 8* > 30 04 38 2A F6 7F 00 00 00 00 00 00 00 00 00 00
{63} normal block at 0x000001D92CFACB00, 16 bytes long.
Data: < 8* > E0 02 38 2A F6 7F 00 00 00 00 00 00 00 00 00 00
{62} normal block at 0x000001D92CFAC950, 16 bytes long.
Data: < 8* > 10 04 38 2A F6 7F 00 00 00 00 00 00 00 00 00 00
{61} normal block at 0x000001D92CFAC8C0, 16 bytes long.
Data: <p 8* > 70 04 38 2A F6 7F 00 00 00 00 00 00 00 00 00 00
{60} normal block at 0x000001D92CFAC710, 16 bytes long.
Data: < 6* > 18 C0 36 2A F6 7F 00 00 00 00 00 00 00 00 00 00
Object dump complete.

</stderr_txt>
]]>

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1116
Credit: 40,839,470,595
RAC: 5,269
Message 60133 - Posted: 22 Mar 2023, 14:50:47 UTC - in response to Message 60131.  
Last modified: 22 Mar 2023, 14:53:09 UTC

it's right in your message:
"RuntimeError: [enforce fail at C:\cb\pytorch_1000000000000\work\c10\core\impl\alloc_cpu.cpp:81] data. DefaultCPUAllocator: not enough memory: you tried to allocate 3612672 bytes."

that's why.
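
For anyone who wants to check a long stderr dump for this specific failure without scrolling through the memory-leak listing, here is a minimal sketch. The function name `find_cpu_oom` is made up for illustration, and the pattern only covers the exact PyTorch message quoted above, so other out-of-memory wordings would be missed.

```python
import re

# Matches the PyTorch CPU allocator failure quoted in this thread.
OOM_PATTERN = re.compile(
    r"DefaultCPUAllocator: not enough memory: you tried to allocate (\d+) bytes"
)


def find_cpu_oom(stderr_text):
    """Return the size of the failed allocation in bytes, or None if absent."""
    match = OOM_PATTERN.search(stderr_text)
    return int(match.group(1)) if match else None


# The error line from the task result posted above.
sample = (
    r"RuntimeError: [enforce fail at C:\cb\pytorch_1000000000000\work\c10"
    r"\core\impl\alloc_cpu.cpp:81] data. DefaultCPUAllocator: not enough "
    r"memory: you tried to allocate 3612672 bytes."
)

print(find_cpu_oom(sample))  # 3612672
```

The failed request is only about 3.4 MiB, which suggests memory was already exhausted by the time it was made rather than that one allocation being too large.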

this is a known problem with the windows app. you need to increase your virtual memory (page file) to like 50GB.

also it looks like your host only has 16GB system RAM. if you're running other things that use lots of memory (like rosetta or einstein GW CPU tasks) then you might be running out of system memory too. these python tasks need about 10GB of system memory for each one.
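
As a rough back-of-the-envelope check of the "about 10GB per task" figure, the arithmetic can be sketched like this. The function `max_concurrent_tasks` and the `reserved_gb` parameter are hypothetical names for illustration, and the result only considers physical RAM, not the page file.

```python
def max_concurrent_tasks(total_ram_gb, per_task_gb=10.0, reserved_gb=0.0):
    """Estimate how many tasks fit in physical RAM.

    per_task_gb defaults to the ~10 GB per Python task mentioned above;
    reserved_gb is memory assumed to be taken by other running work.
    """
    if per_task_gb <= 0:
        raise ValueError("per_task_gb must be positive")
    usable = total_ram_gb - reserved_gb
    if usable <= 0:
        return 0
    return int(usable // per_task_gb)


# Host from the startup log above: 15.76 GB physical memory.
print(max_concurrent_tasks(15.76))                   # 1
# Same host with a hypothetical 8 GB used by other projects.
print(max_concurrent_tasks(15.76, reserved_gb=8.0))  # 0
```

So on a 16GB host a single Python task fits only if little else is competing for memory.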

Erich56
Joined: 1 Jan 15
Posts: 1166
Credit: 12,260,898,501
RAC: 1
Message 60298 - Posted: 8 Apr 2023, 13:48:47 UTC

I am experiencing a strange problem on my PC with two RTX 3070s inside, an Intel i9-10900KF CPU (10 cores/20 threads), and 128 GB RAM:
until about 2 weeks ago, I crunched 4 Python tasks concurrently (2 per GPU).
Then I processed ACEMD_3 and ATM tasks, the queues of which have now run dry.
So I changed back to Python, and surprise: after downloading 4 tasks, only 3 started; the fourth one stays in status "ready to start".

I had made no changes, neither in the hardware, nor in the software, nor in the settings.
Does anyone have any idea what I can do to get the fourth task to run?

Keith Myers
Joined: 13 Dec 17
Posts: 1419
Credit: 9,119,446,190
RAC: 731
Message 60299 - Posted: 8 Apr 2023, 16:16:24 UTC - in response to Message 60298.  

A consequence of running the acemd3 and ATM tasks is that the APR (average processing rate) on the host dropped, and now the client thinks that you will not be able to finish the second Python task before the deadline.

You probably have the single Python task in EDF mode now.

Try adding <fraction_done_exact/> into every app section in your app_config.xml

That helps produce more realistic progress percentages and may persuade the client to let you run that second task on that GPU.

But you may just have to let the APR mechanism balance out again. One of the many flaws in BOINC.
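
For anyone who would rather script the app_config.xml change than edit it by hand, here is a minimal sketch using Python's standard library. The app names `PythonGPU` and `acemd3` and the usage values in the example are assumptions for illustration, so check them against your own file; `add_fraction_done_exact` is a made-up helper name.

```python
import xml.etree.ElementTree as ET


def add_fraction_done_exact(xml_text):
    """Return app_config XML with <fraction_done_exact/> in every <app>."""
    root = ET.fromstring(xml_text)
    for app in root.findall("app"):
        # Only add the element where it is missing, so reruns are harmless.
        if app.find("fraction_done_exact") is None:
            ET.SubElement(app, "fraction_done_exact")
    return ET.tostring(root, encoding="unicode")


# Hypothetical app_config.xml content; names and values are assumptions.
example = """<app_config>
  <app>
    <name>PythonGPU</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>acemd3</name>
    <fraction_done_exact/>
  </app>
</app_config>"""

print(add_fraction_done_exact(example))
```

After saving the result, the client still has to re-read its config files (or be restarted) before the change takes effect.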

Erich56
Joined: 1 Jan 15
Posts: 1166
Credit: 12,260,898,501
RAC: 1
Message 60300 - Posted: 8 Apr 2023, 16:57:26 UTC

thank you, Keith, for the explanation :-)

<fraction_done_exact/> has been in the app_config to begin with.

So I am afraid I just need to wait ...

KAMasud
Joined: 27 Jul 11
Posts: 138
Credit: 539,953,398
RAC: 0
Message 60345 - Posted: 23 Apr 2023, 13:00:19 UTC

Absolutely no usage of the GPU, only the CPU.
task 27464783

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1116
Credit: 40,839,470,595
RAC: 5,269
Message 60346 - Posted: 23 Apr 2023, 13:21:39 UTC - in response to Message 60345.  

for the first 5 minutes or so, there will only be CPU use and no GPU use because the task is extracting the python environment to the designated slot. after this, the task will run and start using both GPU and CPU. GPU use will be low.
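
To see how long that extraction phase took on a given host, the wrapper timestamps in the task's stderr can be compared. This is only a sketch: it assumes the line format shown in the stderr posted earlier in this thread, and `setup_seconds` is a made-up helper name.

```python
import re
from datetime import datetime, timedelta

# Wrapper lines look like: "08:32:00 (19880): wrapper: running .\7za.exe (...)"
LINE = re.compile(r"^(\d{2}:\d{2}:\d{2}) \(\d+\): wrapper: running (\S+)")


def setup_seconds(stderr_text):
    """Seconds from the first wrapper command to the launch of python.exe."""
    first = None
    for line in stderr_text.splitlines():
        m = LINE.match(line.strip())
        if not m:
            continue
        stamp = datetime.strptime(m.group(1), "%H:%M:%S")
        if first is None:
            first = stamp
        if m.group(2).lower().startswith("python"):
            delta = stamp - first
            if delta < timedelta(0):
                delta += timedelta(days=1)  # the run crossed midnight
            return int(delta.total_seconds())
    return None


# Lines copied from the stderr output posted earlier in the thread.
sample = r"""08:32:00 (19880): wrapper (7.9.26016): starting
08:32:00 (19880): wrapper: running .\7za.exe (x pythongpu_windows_x86_64__cuda1131.txz -y)
08:34:12 (19880): .\7za.exe exited; CPU time 107.906250
08:34:13 (19880): wrapper: running .\7za.exe (x pythongpu_windows_x86_64__cuda1131.tar -y)
08:35:05 (19880): wrapper: running python.exe (run.py)"""

print(setup_seconds(sample))  # 185
```

On that host the unpacking took about three minutes; slower disks will take longer.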

KAMasud
Joined: 27 Jul 11
Posts: 138
Credit: 539,953,398
RAC: 0
Message 60347 - Posted: 23 Apr 2023, 16:19:05 UTC

No, it was not at 5% but 29% and stuck. I exited BOINC and restarted. The WU is now normal at 34%.

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1116
Credit: 40,839,470,595
RAC: 5,269
Message 60348 - Posted: 23 Apr 2023, 16:23:51 UTC - in response to Message 60347.  

i said 5 minutes not 5%.

but sounds like an issue with your system, not the tasks. my tasks have never gotten stuck like that.

kotenok2000
Joined: 18 Jul 13
Posts: 79
Credit: 210,528,292
RAC: 0
Message 60349 - Posted: 24 Apr 2023, 11:11:31 UTC - in response to Message 60348.  

It takes 20 minutes on my HDD.

©2025 Universitat Pompeu Fabra