Project restarted

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 1
Message 56474 - Posted: 13 Feb 2021, 15:11:47 UTC - in response to Message 56473.  

But another GTX 1070 failed twice.
http://www.gpugrid.net/results.php?hostid=528983
The first time was due to a reboot, and then the next one failed immediately thereafter.

They are all on Ubuntu 18.04/20.04.
The error message:
ACEMD failed:
    Error initializing CUDA: CUDA_ERROR_NO_DEVICE (100) at /opt/conda/conda-bld/openmm_1589507810497/work/platforms/cuda/src/CudaContext.cpp:148
suggests to me that a driver update, or some other intervention, made your CUDA device inaccessible.

RJ The Bike Guy
Joined: 2 Apr 20
Posts: 20
Credit: 35,363,533
RAC: 0
Message 56475 - Posted: 13 Feb 2021, 15:55:03 UTC

I aborted the jobs for my GT 730, as there was no way it was going to finish them in time, and I suspended that computer from taking work. My GTX 1660 seems to take about a day and a half to finish the jobs.

Maybe make the WUs smaller or increase the deadlines? My GT 730 previously did quite a bit of work. I currently have that computer doing nothing but boinc stuff.

bormolino
Joined: 16 May 13
Posts: 41
Credit: 145,731,947
RAC: 0
Message 56476 - Posted: 13 Feb 2021, 16:05:20 UTC - in response to Message 56475.  


Maybe make the WUs smaller or increase the deadlines? My GT 730 previously did quite a bit of work. I currently have that computer doing nothing but boinc stuff.


Same here with a small GT 710. But the PC always runs 24/7 and did a few WUs per week.

It would be a pity if I could not continue to use this card :/


RJ The Bike Guy
Joined: 2 Apr 20
Posts: 20
Credit: 35,363,533
RAC: 0
Message 56477 - Posted: 13 Feb 2021, 17:06:07 UTC - in response to Message 56476.  


Same here with a small GT 710. But the PC always runs 24/7 and did a few WUs per week.

It would be a pity if I could not continue to use this card :/


I think my GT 730 took about 22 hours per WU. I think the current ones would have taken about 10 days.

BlueGhost
Joined: 12 Oct 16
Posts: 1
Credit: 215,247,623
RAC: 0
Message 56479 - Posted: 13 Feb 2021, 17:44:33 UTC

Work units got stuck at 14% and were canceled after 10 hours of running overnight

workunit 27023967
e2s547_e1s101p0f147-ADRIA_D3RBandit_batch1-0-1-RND7525_0

workunit 27023977
e2s557_e1s68p0f131-ADRIA_D3RBandit_batch1-0-1-RND8450_0

Log

Stderr Output

<core_client_version>7.16.11</core_client_version>
<![CDATA[
<message>
aborted by user</message>
<stderr_txt>
21:59:52 (17660): wrapper (7.9.26016): starting
21:59:52 (17660): wrapper: running acemd3.exe (--boinc input --device 0)
Detected memory leaks!
Dumping objects ->
..\api\boinc_api.cpp(309) : {1599} normal block at 0x000001FE55531F80, 8 bytes long.
Data: < NU > 00 00 4E 55 FE 01 00 00
..\lib\diagnostics_win.cpp(417) : {202} normal block at 0x000001FE55530DC0, 1080 bytes long.
Data: < > EC 0E 00 00 CD CD CD CD 0C 01 00 00 00 00 00 00
Object dump complete.
23:49:15 (10692): wrapper (7.9.26016): starting
23:49:15 (10692): wrapper: running acemd3.exe (--boinc input --device 0)
Detected memory leaks!
Dumping objects ->
..\api\boinc_api.cpp(309) : {1599} normal block at 0x0000027A5CB916D0, 8 bytes long.
Data: < \z > 00 00 B4 5C 7A 02 00 00
..\lib\diagnostics_win.cpp(417) : {202} normal block at 0x0000027A5CB900B0, 1080 bytes long.
Data: <DJ > 44 4A 00 00 CD CD CD CD B0 00 00 00 00 00 00 00
Object dump complete.
20:24:59 (4020): wrapper (7.9.26016): starting
20:24:59 (4020): wrapper: running acemd3.exe (--boinc input --device 0)
Detected memory leaks!
Dumping objects ->
..\api\boinc_api.cpp(309) : {1599} normal block at 0x000001525997F2E0, 8 bytes long.
Data: < 1[R > 00 00 31 5B 52 01 00 00
..\lib\diagnostics_win.cpp(417) : {202} normal block at 0x000001525997FE90, 1080 bytes long.
Data: < 1 X > 04 31 00 00 CD CD CD CD 58 00 00 00 00 00 00 00
Object dump complete.
23:10:11 (3712): wrapper (7.9.26016): starting
23:10:11 (3712): wrapper: running acemd3.exe (--boinc input --device 0)
Detected memory leaks!
Dumping objects ->
..\api\boinc_api.cpp(309) : {1599} normal block at 0x000001CA91D1FCA0, 8 bytes long.
Data: < > 00 00 CF 91 CA 01 00 00
..\lib\diagnostics_win.cpp(417) : {202} normal block at 0x000001CA91D204D0, 1080 bytes long.
Data: < > D0 1C 00 00 CD CD CD CD 0C 01 00 00 00 00 00 00
Object dump complete.


Ian&Steve C.
Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,348,595
RAC: 4,765,598
Message 56480 - Posted: 13 Feb 2021, 17:49:13 UTC - in response to Message 56479.  

I doubt it was stuck. These tasks take a LONG time to run.

The “memory leaks” messages seem to be common for the Windows app.

Just let it go next time. Will probably take ~48hrs to run on a 1060.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 295,172
Message 56481 - Posted: 13 Feb 2021, 18:04:40 UTC

Took me 245,275 seconds on a GTX 1050 Ti - that's a smidge under 3 days. Since these tasks update their progress 150 times over a full run, that's 1,635 seconds between updates - almost half an hour.
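
As a quick check of the figures above, here is a minimal sketch. The runtime and the count of 150 progress updates are taken from the post; the function name is purely illustrative.

```python
SECONDS_PER_DAY = 86_400


def progress_update_interval(total_seconds: float, updates: int) -> float:
    """Average number of seconds between progress updates over a full run."""
    return total_seconds / updates


run_seconds = 245_275  # GTX 1050 Ti runtime reported in the post
updates = 150          # progress updates per full run, as stated in the post

interval = progress_update_interval(run_seconds, updates)
days = run_seconds / SECONDS_PER_DAY

print(f"{days:.2f} days total, {interval:.0f} s ({interval / 60:.1f} min) between updates")
```

This works out to about 2.84 days in total and roughly 27 minutes between visible progress changes, which is why a healthy task can look frozen.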

And I got the same memory leak warning, even though the task was successful and validated.

Don't despair if nothing seems to be happening - have a cup of coffee and come back later. It's probably still alive.

Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Message 56482 - Posted: 13 Feb 2021, 21:04:40 UTC - in response to Message 56474.  
Last modified: 13 Feb 2021, 21:06:00 UTC

They are all on Ubuntu 18.04/20.04.
The error message:
ACEMD failed:
    Error initializing CUDA: CUDA_ERROR_NO_DEVICE (100) at /opt/conda/conda-bld/openmm_1589507810497/work/platforms/cuda/src/CudaContext.cpp:148
suggests to me that a driver update, or some other intervention, made your CUDA device inaccessible.


That is probably it. I think I was updating to the 460 driver (CUDA 11.2), but I don't know why the second one failed right away. Maybe the driver was not initialized yet after the reboot?

goldfinch
Joined: 5 May 19
Posts: 36
Credit: 711,308,218
RAC: 41,661
Message 56483 - Posted: 13 Feb 2021, 21:06:34 UTC - in response to Message 56481.  

Is there a way to configure BOINC to start downloading a second task only when the first one is finishing? I got 2 tasks, each 5M GFLOPS, with the same deadline. Obviously, the second wouldn't be computed on time, so I had to abort it (it was in READY status, so hopefully it will go to someone else). Or, if that's not possible, can I limit BOINC to 1 task only? Less preferable, as time will be spent on comms and downloads, but better than expiring tasks someone else could've processed...

Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Message 56484 - Posted: 13 Feb 2021, 21:16:30 UTC - in response to Message 56483.  

Or, if that's not possible, can I limit BOINC to 1 task only? Less preferable, as time will be spent on comms and downloads, but better than expiring tasks someone else could've processed...

I think the problem will be solved after BOINC processes the first one and gets better time estimates. It should not then download the second one.

But that raises the question of why they don't either give them better estimates to begin with, or limit them to one. They limited the earlier ones to two anyway.

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 1
Message 56485 - Posted: 13 Feb 2021, 21:43:02 UTC - in response to Message 56483.  

Is there a way to configure BOINC to start downloading a second task only when the first one is finishing?
You should set your work cache to 1 day or less. The smallest value is 0.01 days (14 minutes and 24 seconds), which is also the step size of this setting.
However it will take a couple of workunits to make the processing time estimate accurate.
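
For reference, a small sketch of how a work-cache value in days converts to minutes and seconds. The helper name is hypothetical and not part of BOINC; only the 0.01-day and 1-day values come from the post.

```python
SECONDS_PER_DAY = 86_400


def cache_days_to_min_sec(days: float) -> tuple[int, int]:
    """Convert a work-cache size in days to whole (minutes, seconds)."""
    total_seconds = round(days * SECONDS_PER_DAY)
    return divmod(total_seconds, 60)


minutes, seconds = cache_days_to_min_sec(0.01)
print(f"0.01 days = {minutes} min {seconds} s")
```

With 0.01 days this gives 14 minutes and 24 seconds, matching the figure quoted above.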

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 295,172
Message 56486 - Posted: 13 Feb 2021, 22:09:50 UTC - in response to Message 56485.  

You can set the work cache to exactly zero, if you like. In that case, BOINC will nominally initiate the request for work three minutes before the final task is estimated to finish. I say 'nominally', because BOINC only performs the 'work needed?' check once per minute, so the actual request may happen any time between three minutes and two minutes before the new task is needed.
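
The two-to-three-minute window can be illustrated with a toy model. It assumes the estimated remaining time counts down steadily by one second per second and that the 'work needed?' check fires exactly every 60 seconds; the constants and the function name are illustrative and are not taken from the BOINC client source.

```python
REQUEST_THRESHOLD_S = 180  # "three minutes", as described in the post
POLL_INTERVAL_S = 60       # "once per minute", as described in the post


def lead_time_at_request(remaining_at_first_poll: int) -> int:
    """Estimated seconds left on the final task at the first poll
    that finds it at or below the request threshold."""
    remaining = remaining_at_first_poll
    while remaining > REQUEST_THRESHOLD_S:
        remaining -= POLL_INTERVAL_S  # wait for the next once-per-minute check
    return remaining


# Try every possible phase of the poll timer relative to the task's finish.
leads = [lead_time_at_request(start) for start in range(600, 660)]
print(f"request is sent between {min(leads)} s and {max(leads)} s before the estimated finish")
```

Under these assumptions the request always lands somewhere between just over two minutes and exactly three minutes before the estimated finish, depending on where the poll timer happens to fall.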

Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Message 56487 - Posted: 13 Feb 2021, 22:12:46 UTC - in response to Message 56486.  

You can set the work cache to exactly zero, if you like. In that case, BOINC will nominally initiate the request for work three minutes before the final task is estimated to finish.

Now that you mention it, won't setting the resource share to zero accomplish the same thing? That is how I often set it anyway.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 295,172
Message 56489 - Posted: 13 Feb 2021, 22:34:26 UTC - in response to Message 56487.  
Last modified: 13 Feb 2021, 22:36:54 UTC

Yup, that's exactly the same. Setting it through BOINC Manager activates it immediately; going via resource share means a visit to the website and a project update.

Edit - using 'work cache 0' affects all attached projects; using 'resource share 0' only affects that one project.

robertmiles
Joined: 16 Apr 09
Posts: 503
Credit: 769,991,668
RAC: 0
Message 56491 - Posted: 13 Feb 2021, 23:30:14 UTC - in response to Message 56484.  

Or, if that's not possible, can I limit BOINC to 1 task only? Less preferable, as time will be spent on comms and downloads, but better than expiring tasks someone else could've processed...

I think the problem will be solved after BOINC processes the first one and gets better time estimates. It should not then download the second one.

But that raises the question of why they don't either give them better estimates to begin with, or limit them to one. They limited the earlier ones to two anyway.

One reason is that always having a second task downloaded while the first one is running avoids wasting GPU time by having no task running while it is downloading the second one.

I wouldn't mind if it waited for this download until the first task was approaching its finish, though. But note that the estimated time to completion is not a reliable way of deciding when it is about to finish until at least a few tasks have completed successfully.

jjch
Joined: 10 Nov 13
Posts: 101
Credit: 15,773,211,122
RAC: 0
Message 56494 - Posted: 14 Feb 2021, 3:35:57 UTC - in response to Message 56491.  

Wouldn't you be able to set the max_concurrent variable in the GPUGRID app_config file to 1?

Such as this:

<app_config>
  <app>
    <name>acemd3</name>
    <max_concurrent>1</max_concurrent>
    <gpu_versions>
      <gpu_usage>1</gpu_usage>
      <cpu_usage>1</cpu_usage>
    </gpu_versions>
  </app>
</app_config>

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,348,595
RAC: 4,765,598
Message 56495 - Posted: 14 Feb 2021, 5:07:21 UTC - in response to Message 56494.  

That only affects how many jobs run concurrently. Not how many get downloaded.

clych
Joined: 13 Nov 19
Posts: 5
Credit: 8,496,529
RAC: 0
Message 56499 - Posted: 14 Feb 2021, 21:57:38 UTC - in response to Message 56403.  
Last modified: 14 Feb 2021, 22:01:15 UTC

I abort every task that shows 0.333%, 1.333% and so on.
Every time I look at the app and there is a bad WU, I abort it, and one minute later the next WU shows normal work.

So... every second WU works normally.


By the way, here is a quote from the log of the aborted task:

<core_client_version>7.16.11</core_client_version>
<![CDATA[
<message>
aborted by user</message>
<stderr_txt>
20:41:32 (19816): wrapper (7.9.26016): starting
20:41:32 (19816): wrapper: running acemd3.exe (--boinc input --device 0)
02:44:06 (1896): wrapper (7.9.26016): starting
02:44:06 (1896): wrapper: running acemd3.exe (--boinc input --device 0)
Detected memory leaks!
Dumping objects ->
..\api\boinc_api.cpp(309) : {1546} normal block at 0x000001C43E1D27E0, 8 bytes long.
 Data: <   >    > 00 00 17 3E C4 01 00 00 
..\lib\diagnostics_win.cpp(417) : {205} normal block at 0x000001C43E1D68C0, 1080 bytes long.
 Data: <0               > 30 0B 00 00 CD CD CD CD C4 01 00 00 00 00 00 00 
Object dump complete.
17:20:35 (8916): wrapper (7.9.26016): starting
17:20:35 (8916): wrapper: running acemd3.exe (--boinc input --device 0)
17:30:43 (12568): wrapper (7.9.26016): starting
17:30:43 (12568): wrapper: running acemd3.exe (--boinc input --device 0)
Detected memory leaks!
Dumping objects ->
..\api\boinc_api.cpp(309) : {1546} normal block at 0x0000029AAAFE2EC0, 8 bytes long.
 Data: <        > 00 00 F9 AA 9A 02 00 00 
..\lib\diagnostics_win.cpp(417) : {205} normal block at 0x0000029AAAFE68C0, 1080 bytes long.
 Data: < 3              > FC 33 00 00 CD CD CD CD C8 01 00 00 00 00 00 00 
Object dump complete.

</stderr_txt>
]]>

goldfinch
Joined: 5 May 19
Posts: 36
Credit: 711,308,218
RAC: 41,661
Message 56500 - Posted: 14 Feb 2021, 22:08:01 UTC - in response to Message 56491.  
Last modified: 14 Feb 2021, 22:08:48 UTC

I wouldn't mind if it waited for this download until the first task was approaching its finish, though. But note that the estimated time to completion is not a reliable way of deciding when it is about to finish until at least a few tasks have completed successfully.

Agreed. If a second task is downloaded at, say, 80% of the first task, or even 90%, it should work well. E.g., short-term tasks are small, so it won't take long to download them before the first finishes. For large tasks, 10% may still take some 15-30 min at least, so that should also be more than enough to complete the download. And with the recent large tasks, 10% is a few hours, so BOINC will definitely make it in time before completing the first task.
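
A rough sketch of how much time is left when a task reaches a given progress fraction, assuming progress is linear in time (real tasks may not behave this way). The 245,275-second runtime is the GTX 1050 Ti figure reported earlier in the thread; the function name is illustrative.

```python
def remaining_seconds(total_runtime_s: float, progress: float) -> float:
    """Time left at a given progress fraction, assuming linear progress."""
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must be between 0 and 1")
    return total_runtime_s * (1.0 - progress)


run_s = 245_275  # runtime reported earlier in the thread
for progress in (0.80, 0.90):
    left_h = remaining_seconds(run_s, progress) / 3600
    print(f"at {progress:.0%}: about {left_h:.1f} h left")
```

For that card this leaves roughly 13.6 hours at 80% and 6.8 hours at 90%, which is ample time for a download under this simple model.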

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 56501 - Posted: 14 Feb 2021, 22:39:17 UTC - in response to Message 56461.  

clych wrote:
I have aborted this WU; the new one is much faster.

Sorry, but what you're doing does not make any sense. By now you have 9 aborted WUs. No matter how often you abort, these new monster WUs are just too big for your quite old card to return in time.

clych wrote:
I abort every task that shows 0.333%, 1.333% and so on.
Every time I look at the app and there is a bad WU, I abort it, and one minute later the next WU shows normal work.

So... every second WU works normally.

No, this is wrong. See the answer from Ian&Steve: link

BTW: my GTX 1070 takes 94k - 95k seconds per WU in Win 10, in line with the 26h reported before under Linux.

MrS
Scanning for our furry friends since Jan 2002

©2025 Universitat Pompeu Fabra