Project restarted
Joined: 20 Jan 09 · Posts: 2380 · Credit: 16,897,957,044 · RAC: 1
> But another GTX 1070 failed twice.

The error message `ACEMD failed: Error initializing CUDA: CUDA_ERROR_NO_DEVICE (100) at /opt/conda/conda-bld/openmm_1589507810497/work/platforms/cuda/src/CudaContext.cpp:148` looks to me like it was caused by a driver update, or some other intervention that made your CUDA device inaccessible.
Joined: 2 Apr 20 · Posts: 20 · Credit: 35,363,533 · RAC: 0
I aborted the jobs for my GT 730, as there was no way it was going to finish them in time, and I suspended that computer from taking work. My GTX 1660 seems to take about a day and a half to finish the jobs. Maybe make the WUs smaller or extend the deadlines? My GT 730 previously did quite a bit of work. I currently have that computer doing nothing but BOINC work.
Joined: 16 May 13 · Posts: 41 · Credit: 145,731,947 · RAC: 0
Same here with a small GT 710. But the PC always runs 24/7 and did a few WUs per week. It would be a pity if I could not continue to use this card :/
Joined: 2 Apr 20 · Posts: 20 · Credit: 35,363,533 · RAC: 0
I think my GT 730 took about 22 hours per WU. I think the current ones would have taken about 10 days.
Joined: 12 Oct 16 · Posts: 1 · Credit: 215,247,623 · RAC: 0
Work units 27023967 (e2s547_e1s101p0f147-ADRIA_D3RBandit_batch1-0-1-RND7525_0) and 27023977 (e2s557_e1s68p0f131-ADRIA_D3RBandit_batch1-0-1-RND8450_0) got stuck at 14% and were canceled after running for 10 hours overnight.

Log (stderr output):

```
<core_client_version>7.16.11</core_client_version>
<![CDATA[
<message>
aborted by user</message>
<stderr_txt>
21:59:52 (17660): wrapper (7.9.26016): starting
21:59:52 (17660): wrapper: running acemd3.exe (--boinc input --device 0)
Detected memory leaks!
Dumping objects ->
..\api\boinc_api.cpp(309) : {1599} normal block at 0x000001FE55531F80, 8 bytes long.
Data: < NU > 00 00 4E 55 FE 01 00 00
..\lib\diagnostics_win.cpp(417) : {202} normal block at 0x000001FE55530DC0, 1080 bytes long.
Data: < > EC 0E 00 00 CD CD CD CD 0C 01 00 00 00 00 00 00
Object dump complete.
23:49:15 (10692): wrapper (7.9.26016): starting
23:49:15 (10692): wrapper: running acemd3.exe (--boinc input --device 0)
Detected memory leaks!
Dumping objects ->
..\api\boinc_api.cpp(309) : {1599} normal block at 0x0000027A5CB916D0, 8 bytes long.
Data: < \z > 00 00 B4 5C 7A 02 00 00
..\lib\diagnostics_win.cpp(417) : {202} normal block at 0x0000027A5CB900B0, 1080 bytes long.
Data: <DJ > 44 4A 00 00 CD CD CD CD B0 00 00 00 00 00 00 00
Object dump complete.
20:24:59 (4020): wrapper (7.9.26016): starting
20:24:59 (4020): wrapper: running acemd3.exe (--boinc input --device 0)
Detected memory leaks!
Dumping objects ->
..\api\boinc_api.cpp(309) : {1599} normal block at 0x000001525997F2E0, 8 bytes long.
Data: < 1[R > 00 00 31 5B 52 01 00 00
..\lib\diagnostics_win.cpp(417) : {202} normal block at 0x000001525997FE90, 1080 bytes long.
Data: < 1 X > 04 31 00 00 CD CD CD CD 58 00 00 00 00 00 00 00
Object dump complete.
23:10:11 (3712): wrapper (7.9.26016): starting
23:10:11 (3712): wrapper: running acemd3.exe (--boinc input --device 0)
Detected memory leaks!
Dumping objects ->
..\api\boinc_api.cpp(309) : {1599} normal block at 0x000001CA91D1FCA0, 8 bytes long.
Data: < > 00 00 CF 91 CA 01 00 00
..\lib\diagnostics_win.cpp(417) : {202} normal block at 0x000001CA91D204D0, 1080 bytes long.
Data: < > D0 1C 00 00 CD CD CD CD 0C 01 00 00 00 00 00 00
Object dump complete.
```
Joined: 21 Feb 20 · Posts: 1114 · Credit: 40,838,348,595 · RAC: 4,765,598
I doubt it was stuck. These tasks take a LONG time to run. The “memory leaks” messages seem to be common for the Windows app. Just let it run next time. It will probably take ~48 hours on a 1060.
Joined: 11 Jul 09 · Posts: 1639 · Credit: 10,159,968,649 · RAC: 326,008
It took me 245,275 seconds on a GTX 1050 Ti; that's a smidge under 3 days. Since these tasks update their progress 150 times over a full run, that's 1,635 seconds between updates, almost half an hour. And I got the same memory leak warning, even though the task was successful and validated. Don't despair if nothing seems to be happening: have a cup of coffee and come back later. It's probably still alive.
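The arithmetic in that post is easy to check. This is a small sketch; the run time (245,275 s) and the 150 progress updates per run come from the post itself, and the helper names are hypothetical.

```python
SECONDS_PER_DAY = 86_400

def runtime_in_days(runtime_s: float) -> float:
    """Convert a task's elapsed run time from seconds to days."""
    return runtime_s / SECONDS_PER_DAY

def seconds_between_updates(runtime_s: float, updates: int) -> float:
    """Average gap between progress updates if a run reports `updates` times."""
    return runtime_s / updates

runtime_s = 245_275  # GTX 1050 Ti run time quoted above
print(f"{runtime_in_days(runtime_s):.2f} days")                    # 2.84 days
print(f"{seconds_between_updates(runtime_s, 150):.0f} s")          # 1635 s
print(f"{seconds_between_updates(runtime_s, 150) / 60:.1f} min")   # 27.3 min
```

So on a slower card the progress bar can sit still for close to half an hour without anything being wrong.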
Joined: 28 Jul 12 · Posts: 819 · Credit: 1,591,285,971 · RAC: 0
> They are all on Ubuntu 18.04/20.04.
>
> The error message `ACEMD failed: Error initializing CUDA: CUDA_ERROR_NO_DEVICE (100) at /opt/conda/conda-bld/openmm_1589507810497/work/platforms/cuda/src/CudaContext.cpp:148` looks to me like it was caused by a driver update, or some other intervention that made your CUDA device inaccessible.

That is probably it. I think I was updating to the 460 driver (CUDA 11.2), but I don't know why the second one failed right away. Maybe the driver was not yet initialized after the reboot?
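One way to see whether the driver can reach the GPU, independently of BOINC, is to call the CUDA driver API functions `cuInit` and `cuDeviceGetCount` directly. The sketch below assumes Linux, where the driver library is `libcuda.so.1` (on Windows it is `nvcuda.dll`); the helper names are hypothetical. If it fails right after a reboot, running it again a minute later would show whether the driver simply had not finished loading.

```python
import ctypes

# Subset of CUresult codes from the CUDA driver API header (cuda.h).
CUDA_SUCCESS = 0
CUDA_ERROR_NO_DEVICE = 100  # the code reported in the ACEMD error above
_NAMES = {CUDA_SUCCESS: "CUDA_SUCCESS", CUDA_ERROR_NO_DEVICE: "CUDA_ERROR_NO_DEVICE"}

def describe(code: int) -> str:
    """Render a CUresult like the ACEMD log does, e.g. 'CUDA_ERROR_NO_DEVICE (100)'."""
    return f"{_NAMES.get(code, 'CUresult')} ({code})"

def probe_cuda(libname: str = "libcuda.so.1") -> tuple[bool, str]:
    """Return (ok, message) after trying to initialize the CUDA driver."""
    try:
        cuda = ctypes.CDLL(libname)
    except OSError:
        return False, f"could not load {libname} (driver missing or not loaded?)"
    rc = cuda.cuInit(0)  # flags must be 0
    if rc != CUDA_SUCCESS:
        return False, "cuInit failed: " + describe(rc)
    count = ctypes.c_int(0)
    rc = cuda.cuDeviceGetCount(ctypes.byref(count))
    if rc != CUDA_SUCCESS:
        return False, "cuDeviceGetCount failed: " + describe(rc)
    return count.value > 0, f"{count.value} CUDA device(s) visible"

if __name__ == "__main__":
    print(probe_cuda())
```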
Joined: 5 May 19 · Posts: 36 · Credit: 711,308,218 · RAC: 46,014
Is there a way to configure BOINC to start downloading a second task only when the first one is nearly finished? I got 2 tasks, each 5M GFLOPS, with the same deadline. Obviously the second could not be completed on time, so I had to abort it (it was in READY status, so hopefully it will go to someone else). Or, if that's not possible, can I limit BOINC to 1 task only? That's less preferable, as time will be spent on communication and downloads, but better than expiring tasks someone else could have processed...
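Why the second task cannot make it is plain arithmetic: on one GPU running tasks back to back, the k-th downloaded task cannot finish before k × run time. The sketch below assumes equal run times and ignores download time; the 1.5-day run time is the GTX 1660 figure from earlier in the thread, and the 5-day deadline is only a placeholder (substitute the deadline BOINC Manager shows for the task).

```python
def finish_times(n_tasks: int, runtime_days: float, gpus: int = 1) -> list[float]:
    """Earliest completion time (days after download) of each queued task,
    assuming equal run times and tasks started back to back on `gpus` devices."""
    return [(i // gpus + 1) * runtime_days for i in range(n_tasks)]

def meets_deadline(n_tasks: int, runtime_days: float, deadline_days: float,
                   gpus: int = 1) -> list[bool]:
    """For each queued task, can it finish before the (shared) deadline?"""
    return [t <= deadline_days for t in finish_times(n_tasks, runtime_days, gpus)]

# Two tasks downloaded together with the same placeholder deadline of 5 days.
print(meets_deadline(2, runtime_days=1.5, deadline_days=5.0))   # [True, True]
print(meets_deadline(2, runtime_days=2.84, deadline_days=5.0))  # [True, False]
```

With a slower card the second task's earliest finish lands past the deadline before it has even started, which is the situation described above.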
Joined: 28 Jul 12 · Posts: 819 · Credit: 1,591,285,971 · RAC: 0
> Or, if that's not possible, can I limit BOINC to 1 task only? Less preferable, as time will be spent on comms and downloads, but better than expiring tasks someone else could've processed...

I think the problem will be solved after BOINC processes the first one and gets better time estimates; it should not download the second one after that. But that raises the question of why they don't either give these tasks better estimates to begin with, or limit them to one. They limited the earlier ones to two anyway.
Joined: 20 Jan 09 · Posts: 2380 · Credit: 16,897,957,044 · RAC: 1
> Is there a way to configure BOINC to start downloading a second task only when the first one is finishing?

You should set your work cache to 1 day or less. The smallest non-zero value is 0.01 days (14 minutes and 24 seconds); this is the unit of the setting. However, it will take a couple of workunits for the processing time estimate to become accurate.
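The 14 minutes and 24 seconds follows directly from the length of a day: 0.01 × 86,400 s = 864 s. A tiny conversion sketch (the function name is hypothetical):

```python
def days_to_min_sec(days: float) -> tuple[int, int]:
    """Convert a BOINC work-cache setting (in days) to whole minutes and seconds."""
    total_s = round(days * 86_400)  # round() absorbs float noise such as 864.0000000000001
    return divmod(total_s, 60)

print(days_to_min_sec(0.01))  # (14, 24)
print(days_to_min_sec(1.0))   # (1440, 0)
```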
Joined: 11 Jul 09 · Posts: 1639 · Credit: 10,159,968,649 · RAC: 326,008
You can set the work cache to exactly zero, if you like. In that case, BOINC will nominally initiate the request for work three minutes before the final task is estimated to finish. I say 'nominally', because BOINC only performs the 'work needed?' check once per minute, so the actual request may happen any time between three minutes and two minutes before the new task is needed.
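The two-to-three-minute window falls out of the once-per-minute check. The sketch below is a simplified model of the behaviour described in that post, not BOINC's actual scheduling code: checks run every 60 s from an arbitrary starting phase, and work is requested at the first check where the estimated time left is 180 s or less.

```python
def request_lead_time(finish_s: float, first_check_s: float,
                      threshold_s: float = 180.0, period_s: float = 60.0) -> float:
    """Seconds before the estimated finish at which work is requested, given
    'work needed?' checks every `period_s` seconds starting at `first_check_s`."""
    t = first_check_s
    while finish_s - t > threshold_s:  # not yet inside the 3-minute window
        t += period_s
    return finish_s - t

# Try every phase of the one-minute check cycle for a task finishing at t = 10,000 s.
leads = [request_lead_time(10_000, phase) for phase in range(60)]
print(min(leads), max(leads))  # 121 180
```

Whatever the phase, the request lands somewhere in the last three minutes but never later than the two-minute mark, matching the description above.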
Joined: 28 Jul 12 · Posts: 819 · Credit: 1,591,285,971 · RAC: 0
> You can set the work cache to exactly zero, if you like. In that case, BOINC will nominally initiate the request for work three minutes before the final task is estimated to finish.

Now that you mention it, won't setting the resource share to zero accomplish the same thing? That is how I often set it anyway.
Joined: 11 Jul 09 · Posts: 1639 · Credit: 10,159,968,649 · RAC: 326,008
Yup, that's exactly the same. Setting it through BOINC Manager takes effect immediately; going via resource share means a visit to the website and a project update. Edit: using 'work cache 0' affects all attached projects, while using 'resource share 0' only affects that one project.
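For the 'work cache 0' route, the same settings BOINC Manager exposes can also be written as local overrides in `global_prefs_override.xml` in the BOINC data directory (its location varies by platform). The sketch below only builds the file's contents with Python's standard library, assuming the override elements `work_buf_min_days` and `work_buf_additional_days`; if you already have an override file, edit those two elements rather than replacing it. After saving, `boinccmd --read_global_prefs_override` asks the running client to reload it.

```python
import xml.etree.ElementTree as ET

def zero_cache_override() -> str:
    """Build a minimal global_prefs_override.xml body that sets both
    work-buffer preferences to zero days (affects every attached project)."""
    root = ET.Element("global_preferences")
    ET.SubElement(root, "work_buf_min_days").text = "0"
    ET.SubElement(root, "work_buf_additional_days").text = "0"
    return ET.tostring(root, encoding="unicode")

print(zero_cache_override())
# <global_preferences><work_buf_min_days>0</work_buf_min_days><work_buf_additional_days>0</work_buf_additional_days></global_preferences>
```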
Joined: 16 Apr 09 · Posts: 503 · Credit: 769,991,668 · RAC: 0
> Or, if that's not possible, can I limit BOINC to 1 task only? Less preferable, as time will be spent on comms and downloads, but better than expiring tasks someone else could've processed...

One reason is that always having a second task downloaded while the first one is running avoids wasting GPU time with no task running while the second one downloads. I wouldn't mind if it waited until the first task was approaching its finish before starting this download, though. But note that the estimated time to completion is not a reliable indicator of when the task is about to finish until at least a few tasks have completed successfully.
Joined: 10 Nov 13 · Posts: 101 · Credit: 15,773,211,122 · RAC: 0
Wouldn't you be able to set the `max_concurrent` option in the GPUGRID app_config file to 1? Such as this:

```xml
<app_config>
  <app>
    <name>acemd3</name>
    <max_concurrent>1</max_concurrent>
    <gpu_versions>
      <gpu_usage>1</gpu_usage>
      <cpu_usage>1</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```
Joined: 21 Feb 20 · Posts: 1114 · Credit: 40,838,348,595 · RAC: 4,765,598
That only affects how many jobs run concurrently, not how many get downloaded.
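The distinction is easy to see in a toy model (a hypothetical class, not BOINC code): the work-cache settings decide how many tasks are downloaded, while `max_concurrent` only caps how many of them run at once. Anything above the cap still sits on the host with its deadline running.

```python
from dataclasses import dataclass

@dataclass
class ClientState:
    downloaded: int      # tasks fetched from the server (governed by the work cache)
    max_concurrent: int  # app_config.xml cap on tasks *running* at the same time

    @property
    def running(self) -> int:
        return min(self.downloaded, self.max_concurrent)

    @property
    def waiting(self) -> int:
        # Already on the host, so their deadlines are counting down.
        return self.downloaded - self.running

state = ClientState(downloaded=2, max_concurrent=1)
print(state.running, state.waiting)  # 1 1
```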
Joined: 13 Nov 19 · Posts: 5 · Credit: 8,496,529 · RAC: 0
I abort every task that shows 0.333%, 1.333% and so on. Every time I look at the app there is a bad WU; I abort it, and one minute later the next WU shows normal progress. So... every second WU works normally. By the way, here is a quote from the log of an aborted task:

```
<core_client_version>7.16.11</core_client_version>
<![CDATA[
<message>
aborted by user</message>
<stderr_txt>
20:41:32 (19816): wrapper (7.9.26016): starting
20:41:32 (19816): wrapper: running acemd3.exe (--boinc input --device 0)
02:44:06 (1896): wrapper (7.9.26016): starting
02:44:06 (1896): wrapper: running acemd3.exe (--boinc input --device 0)
Detected memory leaks!
Dumping objects ->
..\api\boinc_api.cpp(309) : {1546} normal block at 0x000001C43E1D27E0, 8 bytes long.
Data: < > > 00 00 17 3E C4 01 00 00
..\lib\diagnostics_win.cpp(417) : {205} normal block at 0x000001C43E1D68C0, 1080 bytes long.
Data: <0 > 30 0B 00 00 CD CD CD CD C4 01 00 00 00 00 00 00
Object dump complete.
17:20:35 (8916): wrapper (7.9.26016): starting
17:20:35 (8916): wrapper: running acemd3.exe (--boinc input --device 0)
17:30:43 (12568): wrapper (7.9.26016): starting
17:30:43 (12568): wrapper: running acemd3.exe (--boinc input --device 0)
Detected memory leaks!
Dumping objects ->
..\api\boinc_api.cpp(309) : {1546} normal block at 0x0000029AAAFE2EC0, 8 bytes long.
Data: < > 00 00 F9 AA 9A 02 00 00
..\lib\diagnostics_win.cpp(417) : {205} normal block at 0x0000029AAAFE68C0, 1080 bytes long.
Data: < 3 > FC 33 00 00 CD CD CD CD C8 01 00 00 00 00 00 00
Object dump complete.
</stderr_txt>
]]>
```
Joined: 5 May 19 · Posts: 36 · Credit: 711,308,218 · RAC: 46,014
> I wouldn't mind if it waited for this download until the first task was approaching its finish, though. But note that the estimated time to completion is not a reliable way of deciding when it is about to finish until at least a few tasks have completed successfully.

Agreed. If the second task were downloaded at, say, 80% or even 90% of the first task, it should work well. Short tasks are small, so they won't take long to download before the first one finishes. For large tasks, the last 10% may still take at least 15-30 min, which should be more than enough to complete the download. And with the recent large tasks, 10% is a few hours, so BOINC would definitely have time before the first task completes.
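The proposal can be put in numbers. The sketch below assumes a hypothetical client rule that requests the next task once the running one reaches a progress threshold, and computes how much run time is left at that point. The run times are those reported in this thread: 245,275 s on a GTX 1050 Ti, and 94,500 s as the midpoint of the 94k-95k s reported for a GTX 1070.

```python
def should_fetch(progress: float, threshold: float = 0.9) -> bool:
    """Hypothetical rule: ask for the next task once progress reaches `threshold`."""
    return progress >= threshold

def time_left_s(runtime_s: float, progress: float) -> float:
    """Run time still to go when the task is at `progress` (0..1)."""
    return runtime_s * (1.0 - progress)

for card, runtime_s in [("GTX 1050 Ti", 245_275), ("GTX 1070", 94_500)]:
    hours = time_left_s(runtime_s, 0.9) / 3600
    print(f"{card}: {hours:.1f} h left at 90%")
# GTX 1050 Ti: 6.8 h left at 90%
# GTX 1070: 2.6 h left at 90%
```

Both figures leave hours of slack for a download, which is the point made above; the rule uses the task's own progress fraction rather than BOINC's completion-time estimate.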
Joined: 17 Aug 08 · Posts: 2705 · Credit: 1,311,122,549 · RAC: 0
> clych wrote: I have aborted this WU, new one much more faster.

Sorry, but what you're doing does not make any sense. By now you have 9 aborted WUs. No matter how often you abort, these new monster WUs are just too big for your quite old card to return in time.

> clych wrote: Abort for every task, that shows 0.333%, 1,333% and so on.

No, this is wrong. See the answer from Ian&Steve.

BTW: my GTX 1070 takes 94k-95k seconds per WU under Windows 10, in line with the 26 h reported before under Linux.

MrS
Scanning for our furry friends since Jan 2002
©2025 Universitat Pompeu Fabra