Project restarted

Author	Message
Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0	Message 56474 - Posted: 13 Feb 2021, 15:11:47 UTC - in response to Message 56473.
Quote: But another GTX 1070 failed twice. http://www.gpugrid.net/results.php?hostid=528983 The first time was due to a reboot, and then the next one failed immediately thereafter. They are all on Ubuntu 18.04/20.04.
The error message:
ACEMD failed: Error initializing CUDA: CUDA_ERROR_NO_DEVICE (100) at /opt/conda/conda-bld/openmm_1589507810497/work/platforms/cuda/src/CudaContext.cpp:148
looks to me like it was due to a driver update, or some other intervention made your CUDA device inaccessible.

RJ The Bike Guy Joined: 2 Apr 20 Posts: 20 Credit: 35,363,533 RAC: 0	Message 56475 - Posted: 13 Feb 2021, 15:55:03 UTC
I aborted the jobs for my GT 730 as there was no way it was going to finish them in time. And I suspended that computer from taking work. My GTX 1660 seems to take about a day and a half to finish the jobs. Maybe make the WUs smaller or increase the deadlines? My GT 730 previously did quite a bit of work. I currently have that computer doing nothing but BOINC stuff.

bormolino Joined: 16 May 13 Posts: 41 Credit: 145,731,947 RAC: 0	Message 56476 - Posted: 13 Feb 2021, 16:05:20 UTC - in response to Message 56475.
Quote: Maybe make the WUs smaller or increase the deadlines? My GT 730 previously did quite a bit of work. I currently have that computer doing nothing but BOINC stuff.
Same here with a small GT 710. But the PC always runs 24/7 and did a few WUs per week. It would be a pity if I could not continue to use this card :/

RJ The Bike Guy Joined: 2 Apr 20 Posts: 20 Credit: 35,363,533 RAC: 0	Message 56477 - Posted: 13 Feb 2021, 17:06:07 UTC - in response to Message 56476.
Quote: Same here with a small GT 710. But the PC always runs 24/7 and did a few WUs per week. It would be a pity if I could not continue to use this card :/
I think my GT 730 took about 22 hours per WU. I think the current ones would have taken about 10 days.

BlueGhost Joined: 12 Oct 16 Posts: 1 Credit: 215,247,623 RAC: 0	Message 56479 - Posted: 13 Feb 2021, 17:44:33 UTC
Work units got stuck at 14% and were canceled after 10 hours of running overnight:
workunit 27023967 e2s547_e1s101p0f147-ADRIA_D3RBandit_batch1-0-1-RND7525_0
workunit 27023977 e2s557_e1s68p0f131-ADRIA_D3RBandit_batch1-0-1-RND8450_0
Log (Stderr Output):
<core_client_version>7.16.11</core_client_version>
<![CDATA[
<message>
aborted by user</message>
<stderr_txt>
21:59:52 (17660): wrapper (7.9.26016): starting
21:59:52 (17660): wrapper: running acemd3.exe (--boinc input --device 0)
Detected memory leaks!
Dumping objects ->
..\api\boinc_api.cpp(309) : {1599} normal block at 0x000001FE55531F80, 8 bytes long.
Data: < NU > 00 00 4E 55 FE 01 00 00
..\lib\diagnostics_win.cpp(417) : {202} normal block at 0x000001FE55530DC0, 1080 bytes long.
Data: < > EC 0E 00 00 CD CD CD CD 0C 01 00 00 00 00 00 00
Object dump complete.
23:49:15 (10692): wrapper (7.9.26016): starting
23:49:15 (10692): wrapper: running acemd3.exe (--boinc input --device 0)
Detected memory leaks!
Dumping objects ->
..\api\boinc_api.cpp(309) : {1599} normal block at 0x0000027A5CB916D0, 8 bytes long.
Data: < \z > 00 00 B4 5C 7A 02 00 00
..\lib\diagnostics_win.cpp(417) : {202} normal block at 0x0000027A5CB900B0, 1080 bytes long.
Data: <DJ > 44 4A 00 00 CD CD CD CD B0 00 00 00 00 00 00 00
Object dump complete.
20:24:59 (4020): wrapper (7.9.26016): starting
20:24:59 (4020): wrapper: running acemd3.exe (--boinc input --device 0)
Detected memory leaks!
Dumping objects ->
..\api\boinc_api.cpp(309) : {1599} normal block at 0x000001525997F2E0, 8 bytes long.
Data: < 1[R > 00 00 31 5B 52 01 00 00
..\lib\diagnostics_win.cpp(417) : {202} normal block at 0x000001525997FE90, 1080 bytes long.
Data: < 1 X > 04 31 00 00 CD CD CD CD 58 00 00 00 00 00 00 00
Object dump complete.
23:10:11 (3712): wrapper (7.9.26016): starting
23:10:11 (3712): wrapper: running acemd3.exe (--boinc input --device 0)
Detected memory leaks!
Dumping objects ->
..\api\boinc_api.cpp(309) : {1599} normal block at 0x000001CA91D1FCA0, 8 bytes long.
Data: < > 00 00 CF 91 CA 01 00 00
..\lib\diagnostics_win.cpp(417) : {202} normal block at 0x000001CA91D204D0, 1080 bytes long.
Data: < > D0 1C 00 00 CD CD CD CD 0C 01 00 00 00 00 00 00
Object dump complete.

Ian&Steve C. Joined: 21 Feb 20 Posts: 1116 Credit: 40,876,970,595 RAC: 8,067	Message 56480 - Posted: 13 Feb 2021, 17:49:13 UTC - in response to Message 56479.
I doubt it was stuck. These tasks take a LONG time to run. The “memory leaks” messages seem to be common for the Windows app. Just let it go next time. It will probably take ~48 hours to run on a 1060.

Richard Haselgrove Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 0	Message 56481 - Posted: 13 Feb 2021, 18:04:40 UTC
Took me 245,275 seconds on a GTX 1050 Ti - that's a smidge under 3 days. Since these tasks update their progress 150 times over a full run, that's 1,635 seconds between updates - almost half an hour. And I got the same memory leak warning, even though the task was successful and validated. Don't despair if nothing seems to be happening - have a cup of coffee and come back later. It's probably still alive.
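For anyone who wants to check those numbers on their own card, here is a minimal Python sketch. The function name is made up for illustration, and the 150 progress updates per run is simply the figure quoted in the post, not something read from the application.

```python
def progress_update_interval(total_runtime_s: float, updates_per_run: int = 150) -> float:
    """Return the average number of seconds between progress updates."""
    return total_runtime_s / updates_per_run


runtime_s = 245_275  # GTX 1050 Ti runtime reported in the post above
interval_s = progress_update_interval(runtime_s)

# Total runtime in days, then the gap between visible progress changes.
print(f"{runtime_s / 86_400:.2f} days total")
print(f"{interval_s:.0f} s (~{interval_s / 60:.1f} min) between updates")
```

Plugging in your own runtime shows how long the progress bar can sit still before the task should be considered stuck.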

Jim1348 Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0	Message 56482 - Posted: 13 Feb 2021, 21:04:40 UTC - in response to Message 56474. Last modified: 13 Feb 2021, 21:06:00 UTC
Quote: They are all on Ubuntu 18.04/20.04. The error message: ACEMD failed: Error initializing CUDA: CUDA_ERROR_NO_DEVICE (100) at /opt/conda/conda-bld/openmm_1589507810497/work/platforms/cuda/src/CudaContext.cpp:148 looks to me like it was due to a driver update, or some other intervention made your CUDA device inaccessible.
That is probably it. I think I was updating to the 460 driver (CUDA 11.2), but I don't know why the second one failed right away. Maybe the driver was not initialized yet after the reboot?

goldfinch Joined: 5 May 19 Posts: 36 Credit: 712,058,218 RAC: 9,651	Message 56483 - Posted: 13 Feb 2021, 21:06:34 UTC - in response to Message 56481.
Is there a way to configure BOINC to start downloading a second task only when the first one is finishing? I got 2 tasks, each 5M GFLOPS, with the same deadline. Obviously, the second wouldn't be computed on time, so I had to abort it (it was in READY status, so hopefully it will go to someone else). Or, if that's not possible, can I limit BOINC to 1 task only? Less preferable, as time will be spent on comms and downloads, but better than expiring tasks someone else could've processed...

Jim1348 Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0	Message 56484 - Posted: 13 Feb 2021, 21:16:30 UTC - in response to Message 56483.
Quote: Or, if that's not possible, can I limit BOINC to 1 task only? Less preferable, as time will be spent on comms and downloads, but better than expiring tasks someone else could've processed...
I think the problem will be solved after BOINC processes the first one and gets better time estimates. It should not then download the second one. But that raises the question of why they don't either give them better estimates to begin with, or limit them to one. They limited the earlier ones to two anyway.

Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0	Message 56485 - Posted: 13 Feb 2021, 21:43:02 UTC - in response to Message 56483.
Quote: Is there a way to configure BOINC to start downloading a second task only when the first one is finishing?
You should set your work cache to 1 day, or less. The smallest value is 0.01 days (14 minutes and 24 seconds); this is the unit for this setting. However, it will take a couple of workunits to make the processing time estimate accurate.
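The 0.01-day figure is easy to verify. A small Python sketch of the conversion follows; the helper name is hypothetical and this only does the arithmetic, it does not touch any BOINC setting.

```python
SECONDS_PER_DAY = 86_400


def cache_days_to_min_sec(days: float) -> tuple[int, int]:
    """Convert a work cache size in days to whole minutes and seconds."""
    total_seconds = round(days * SECONDS_PER_DAY)
    return divmod(total_seconds, 60)


minutes, seconds = cache_days_to_min_sec(0.01)
print(f"0.01 days = {minutes} min {seconds} s")
```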

Richard Haselgrove Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 0	Message 56486 - Posted: 13 Feb 2021, 22:09:50 UTC - in response to Message 56485.
You can set the work cache to exactly zero, if you like. In that case, BOINC will nominally initiate the request for work three minutes before the final task is estimated to finish. I say 'nominally', because BOINC only performs the 'work needed?' check once per minute, so the actual request may happen any time between three minutes and two minutes before the new task is needed.

Jim1348 Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0	Message 56487 - Posted: 13 Feb 2021, 22:12:46 UTC - in response to Message 56486.
Quote: You can set the work cache to exactly zero, if you like. In that case, BOINC will nominally initiate the request for work three minutes before the final task is estimated to finish.
Now that you mention it, won't setting the resource share to zero accomplish the same thing? That is how I often set it anyway.

Richard Haselgrove Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 0	Message 56489 - Posted: 13 Feb 2021, 22:34:26 UTC - in response to Message 56487. Last modified: 13 Feb 2021, 22:36:54 UTC
Yup, that's exactly the same. Setting it through BOINC Manager activates it immediately; going via resource share means a visit to the website and a project update.
Edit - using 'work cache 0' affects all attached projects, while using 'resource share 0' only affects that one project.

robertmiles Joined: 16 Apr 09 Posts: 503 Credit: 769,991,668 RAC: 0	Message 56491 - Posted: 13 Feb 2021, 23:30:14 UTC - in response to Message 56484.
Quote: Or, if that's not possible, can I limit BOINC to 1 task only? Less preferable, as time will be spent on comms and downloads, but better than expiring tasks someone else could've processed...
Quote: I think the problem will be solved after BOINC processes the first one and gets better time estimates. It should not then download the second one. But that raises the question of why they don't either give them better estimates to begin with, or limit them to one. They limited the earlier ones to two anyway.
One reason is that always having a second task downloaded while the first one is running avoids wasting GPU time with no task running while the second one downloads. I wouldn't mind if it waited for this download until the first task was approaching its finish, though. But note that the estimated time to completion is not a reliable way of deciding when it is about to finish until at least a few tasks have completed successfully.

jjch Joined: 10 Nov 13 Posts: 101 Credit: 15,776,211,122 RAC: 73	Message 56494 - Posted: 14 Feb 2021, 3:35:57 UTC - in response to Message 56491.
Wouldn't you be able to set the max_concurrent variable in the GPUGRID app_config file to 1? Such as this:
<app_config>
  <app>
    <name>acemd3</name>
    <max_concurrent>1</max_concurrent>
    <gpu_versions>
      <gpu_usage>1</gpu_usage>
      <cpu_usage>1</cpu_usage>
    </gpu_versions>
  </app>
</app_config>

Ian&Steve C. Joined: 21 Feb 20 Posts: 1116 Credit: 40,876,970,595 RAC: 8,067	Message 56495 - Posted: 14 Feb 2021, 5:07:21 UTC - in response to Message 56494.
That only affects how many jobs run concurrently, not how many get downloaded.

clych Joined: 13 Nov 19 Posts: 5 Credit: 8,496,529 RAC: 0	Message 56499 - Posted: 14 Feb 2021, 21:57:38 UTC - in response to Message 56403. Last modified: 14 Feb 2021, 22:01:15 UTC
I abort every task that shows 0.333%, 1.333% and so on. Every time I look at the app and there is a bad WU, I abort it, and one minute later the next WU shows normal work. So... every second WU works normally. By the way, here is a quote from the log of the aborted task:
<core_client_version>7.16.11</core_client_version>
<![CDATA[
<message>
aborted by user</message>
<stderr_txt>
20:41:32 (19816): wrapper (7.9.26016): starting
20:41:32 (19816): wrapper: running acemd3.exe (--boinc input --device 0)
02:44:06 (1896): wrapper (7.9.26016): starting
02:44:06 (1896): wrapper: running acemd3.exe (--boinc input --device 0)
Detected memory leaks!
Dumping objects ->
..\api\boinc_api.cpp(309) : {1546} normal block at 0x000001C43E1D27E0, 8 bytes long.
Data: < > > 00 00 17 3E C4 01 00 00
..\lib\diagnostics_win.cpp(417) : {205} normal block at 0x000001C43E1D68C0, 1080 bytes long.
Data: <0 > 30 0B 00 00 CD CD CD CD C4 01 00 00 00 00 00 00
Object dump complete.
17:20:35 (8916): wrapper (7.9.26016): starting
17:20:35 (8916): wrapper: running acemd3.exe (--boinc input --device 0)
17:30:43 (12568): wrapper (7.9.26016): starting
17:30:43 (12568): wrapper: running acemd3.exe (--boinc input --device 0)
Detected memory leaks!
Dumping objects ->
..\api\boinc_api.cpp(309) : {1546} normal block at 0x0000029AAAFE2EC0, 8 bytes long.
Data: < > 00 00 F9 AA 9A 02 00 00
..\lib\diagnostics_win.cpp(417) : {205} normal block at 0x0000029AAAFE68C0, 1080 bytes long.
Data: < 3 > FC 33 00 00 CD CD CD CD C8 01 00 00 00 00 00 00
Object dump complete.
</stderr_txt>
]]>

goldfinch Joined: 5 May 19 Posts: 36 Credit: 712,058,218 RAC: 9,651	Message 56500 - Posted: 14 Feb 2021, 22:08:01 UTC - in response to Message 56491. Last modified: 14 Feb 2021, 22:08:48 UTC
Quote: I wouldn't mind if it waited for this download until the first task was approaching its finish, though. But note that the estimated time to completion is not a reliable way of deciding when it is about to finish until at least a few tasks have completed successfully.
Agree. If a second task is downloaded at, say, 80% of the first task, or even 90%, it should work well. E.g., short-term tasks are small, so it won't take long to download them before the first finishes. For large tasks, 10% may still take some 15-30 min at least, which should also be more than enough to complete the download. And with the recent large tasks, 10% is a few hours, so BOINC will definitely finish the download before the first task completes.
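To put rough numbers on "10% is a few hours", here is a small Python sketch using runtimes reported elsewhere in this thread. The function name is hypothetical, the GTX 1070 value is an assumed midpoint of the 94k - 95k figure, and the estimate assumes progress advances linearly, which a real task may not do.

```python
def time_left_hours(total_runtime_s: float, progress_fraction: float) -> float:
    """Estimate hours remaining, assuming progress advances linearly."""
    return total_runtime_s * (1.0 - progress_fraction) / 3600


# Runtimes in seconds as reported in this thread; 94_500 is an assumed midpoint.
reported_runtimes = [("GTX 1050 Ti", 245_275), ("GTX 1070", 94_500)]

for label, runtime_s in reported_runtimes:
    print(f"{label}: {time_left_hours(runtime_s, 0.90):.1f} h left at 90%")
```

Either way the remaining time is far longer than any download is likely to take, which supports the point that fetching the next task at 80-90% would be early enough.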

ExtraTerrestrial Apes Volunteer moderator Volunteer tester Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0	Message 56501 - Posted: 14 Feb 2021, 22:39:17 UTC - in response to Message 56461.
clych wrote: I have aborted this WU, new one much more faster.
Sorry, but what you're doing does not make any sense. By now you have 9 aborted WUs. No matter how often you abort, these new monster WUs are just too big for your quite old card to return in time.
clych wrote: I abort every task that shows 0.333%, 1.333% and so on. Every time I look at the app and there is a bad WU, I abort it, and one minute later the next WU shows normal work. So... every second WU works normally.
No, this is wrong. See the answer from Ian&Steve: link
BTW: my GTX 1070 takes 94k - 95k seconds per WU in Win 10, in line with the 26h reported before under Linux.
MrS
Scanning for our furry friends since Jan 2002