
Message boards : News : Project restarted

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 9 Dec 08
Posts: 988
Credit: 5,068,599
RAC: 0
Level: Ser
Message 56118 - Posted: 21 Dec 2020 | 13:53:44 UTC

Sorry for the downtime.

WPrion
Joined: 30 Apr 13
Posts: 81
Credit: 1,064,621,611
RAC: 289
Level: Met
Message 56125 - Posted: 21 Dec 2020 | 17:03:21 UTC - in response to Message 56118.
Last modified: 21 Dec 2020 | 17:04:57 UTC


https://ibb.co/cTqYNMZ

Greg _BE
Joined: 30 Jun 14
Posts: 73
Credit: 78,583,439
RAC: 4
Level: Thr
Message 56128 - Posted: 21 Dec 2020 | 23:05:27 UTC - in response to Message 56118.

What happened?


Sorry for the downtime.

10esseeTony
Joined: 4 Feb 15
Posts: 8
Credit: 1,142,244,649
RAC: 62
Level: Met
Message 56129 - Posted: 21 Dec 2020 | 23:19:21 UTC - in response to Message 56128.

Outta disk space.

Greger
Joined: 6 Jan 15
Posts: 48
Credit: 6,096,382,366
RAC: 59,156
Level: Tyr
Message 56130 - Posted: 21 Dec 2020 | 23:39:42 UTC
Last modified: 21 Dec 2020 | 23:41:06 UTC

Thanks, Toni, for dealing with this in the busy time before Christmas.

Have a good one.

johnnymc
Joined: 7 Apr 11
Posts: 6
Credit: 92,079,090
RAC: 2,439
Level: Thr
Message 56135 - Posted: 22 Dec 2020 | 9:49:47 UTC

Many thanks, Toni, for keeping this project under control. Merry Christmas!
____________
Life's short; make fun of it!

Kolossus
Joined: 26 Feb 12
Posts: 2
Credit: 203,877,275
RAC: 0
Level: Leu
Message 56136 - Posted: 22 Dec 2020 | 14:58:03 UTC
Last modified: 22 Dec 2020 | 14:58:49 UTC

When will GPUGrid support the GeForce 3000 series?

Ian&Steve C.
Joined: 21 Feb 20
Posts: 324
Credit: 3,365,990,576
RAC: 4,059,647
Level: Arg
Message 56139 - Posted: 22 Dec 2020 | 15:27:10 UTC - in response to Message 56136.

When will GPUGrid support the GeForce 3000 series?

+1 :)

ZUSE
Joined: 10 Jun 20
Posts: 1
Credit: 28,629,698
RAC: 86,870
Level: Val
Message 56167 - Posted: 28 Dec 2020 | 14:07:55 UTC

Hello, is there already a way to use an RTX 3000 card under Windows 10 (BOINC 7.16.11)?

rod4x4
Joined: 4 Aug 14
Posts: 266
Credit: 2,200,441,910
RAC: 243,507
Level: Phe
Message 56168 - Posted: 28 Dec 2020 | 14:19:16 UTC - in response to Message 56167.
Last modified: 28 Dec 2020 | 14:21:46 UTC

Hello, is there already a way to use an RTX 3000 card under Windows 10 (BOINC 7.16.11)?


Not yet. Keep an eye on the News forum for any updates. There will be numerous posts on this subject when it happens.
From past experience, it takes 6 months or more. But we can always hope it will be sooner.

Kolossus
Joined: 26 Feb 12
Posts: 2
Credit: 203,877,275
RAC: 0
Level: Leu
Message 56224 - Posted: 1 Jan 2021 | 21:06:29 UTC

Why is it taking so long to complete support for the 3000s?
The same problem happened with support for the 2000s. No one will use old graphics cards.

rod4x4
Joined: 4 Aug 14
Posts: 266
Credit: 2,200,441,910
RAC: 243,507
Level: Phe
Message 56225 - Posted: 1 Jan 2021 | 22:17:35 UTC - in response to Message 56224.

No one will use old graphics cards.


If you check this post, you'll see that GTX 10xx cards are the most common GPUs:
http://www.gpugrid.net/forum_thread.php?id=5194&nowrap=true#55687

Keith Myers
Joined: 13 Dec 17
Posts: 688
Credit: 904,899,505
RAC: 377,648
Level: Glu
Message 56226 - Posted: 1 Jan 2021 | 23:48:05 UTC - in response to Message 56224.

Why is it taking so long to complete support for the 3000s?
The same problem happened with support for the 2000s. No one will use old graphics cards.

Who knows!?
Low priority, I assume, from the developers' standpoint.

I would think that only a very small code change, where the compute capability (CC) of the card is probed, would be needed.

Just open up the variable limits to include CC 8.0 and CC 8.6 for the Ampere cards.

I think that is all they did to make the Turing cards compatible with the app.
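Keith's "small code change" can be pictured as a lookup against a table of accepted compute capabilities. Here is a minimal Python sketch of such a gate; the names and the set of accepted values are hypothetical, since the thread doesn't show how the real app checks the card. Widening a gate like this is probably not enough on its own, though: the binary also needs GPU code built for Ampere, and native sm_86 code requires CUDA 11.1 or later.

```python
# Hypothetical compute-capability (CC) gate, for illustration only.
# The accepted set below is an assumption, not GPUGrid's real list.

# CC values the app is assumed to accept today
# (e.g. 6.1 = Pascal GTX 10xx, 7.5 = Turing RTX 20xx).
SUPPORTED_CC = {(6, 1), (7, 0), (7, 5)}

# Ampere: 8.0 = data-center A100, 8.6 = GeForce RTX 30xx.
AMPERE_CC = {(8, 0), (8, 6)}


def is_supported(cc, allowed=SUPPORTED_CC):
    """Return True if a (major, minor) compute capability is in `allowed`."""
    return tuple(cc) in allowed


rtx_3080 = (8, 6)
print(is_supported(rtx_3080))                            # False: rejected today
print(is_supported(rtx_3080, SUPPORTED_CC | AMPERE_CC))  # True: limits opened up
```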

Dayle Diamond
Joined: 5 Dec 12
Posts: 79
Credit: 1,550,002,318
RAC: 43,711
Level: His
Message 56227 - Posted: 2 Jan 2021 | 1:51:38 UTC

With the current round of work units wrapping up in a matter of hours, it is probably easier to let them finish before updating the app.

Keith Myers
Joined: 13 Dec 17
Posts: 688
Credit: 904,899,505
RAC: 377,648
Level: Glu
Message 56228 - Posted: 2 Jan 2021 | 2:07:18 UTC

You might be onto something. Could be a great stopping point to update the app with no more current work being distributed.

Ian&Steve C.
Joined: 21 Feb 20
Posts: 324
Credit: 3,365,990,576
RAC: 4,059,647
Level: Arg
Message 56230 - Posted: 2 Jan 2021 | 17:47:56 UTC - in response to Message 56228.

You might be onto something. Could be a great stopping point to update the app with no more current work being distributed.

We can only hope.

Ian&Steve C.
Joined: 21 Feb 20
Posts: 324
Credit: 3,365,990,576
RAC: 4,059,647
Level: Arg
Message 56231 - Posted: 2 Jan 2021 | 17:49:16 UTC - in response to Message 56228.

You might be onto something. Could be a great stopping point to update the app with no more current work being distributed.

We can only hope. I would imagine any Ampere support would hit the beta apps first, and I’m honestly surprised they haven’t tried adding Ampere support there in the recent rounds of beta tasks.

ub
Joined: 27 May 20
Posts: 2
Credit: 1,915,819
RAC: 0
Level: Ala
Message 56242 - Posted: 3 Jan 2021 | 22:45:35 UTC

I'm confused. Is GPUGRID officially back up and running since Dec 22, 2020?
My 2 Intel systems (both with Nvidia cards and Linux Mint 20) haven't had any GPUGrid jobs in several days. I've tried more than once to communicate with the server. Nothing. As a result, I've resorted to using my systems for other projects requiring Nvidia.

RJ The Bike Guy
Joined: 2 Apr 20
Posts: 16
Credit: 21,672,595
RAC: 64,924
Level: Pro
Message 56243 - Posted: 4 Jan 2021 | 2:33:00 UTC - in response to Message 56242.

I'm confused. Is GPUGRID officially back up and running since Dec 22, 2020?
My 2 Intel systems (both with Nvidia cards and Linux Mint 20) haven't had any GPUGrid jobs in several days. I've tried more than once to communicate with the server. Nothing. As a result, I've resorted to using my systems for other projects requiring Nvidia.


They fixed the problem they had before, but now they are out of work units.
http://www.gpugrid.net/server_status.php

ub
Joined: 27 May 20
Posts: 2
Credit: 1,915,819
RAC: 0
Level: Ala
Message 56244 - Posted: 4 Jan 2021 | 2:43:02 UTC - in response to Message 56243.

Thanks for the update. Now I understand.

Edd's-Pc
Joined: 14 Dec 20
Posts: 2
Credit: 921,778
RAC: 1
Level: Gly
Message 56247 - Posted: 7 Jan 2021 | 16:09:25 UTC

Does anyone know when the new work units will be available?

ServicEnginIC
Joined: 24 Sep 10
Posts: 345
Credit: 1,927,950,084
RAC: 392,962
Level: His
Message 56248 - Posted: 7 Jan 2021 | 16:40:07 UTC - in response to Message 56247.

Does anyone know when the new work units will be available?

169 WUs in progress at this time, and decreasing.
If "0 WUs in progress" is a required condition for this, it is about to happen...

Edd's-Pc
Joined: 14 Dec 20
Posts: 2
Credit: 921,778
RAC: 1
Level: Gly
Message 56249 - Posted: 7 Jan 2021 | 16:53:26 UTC - in response to Message 56248.

Cheers for the reply. It seems like those units stay in progress forever.

Ian&Steve C.
Joined: 21 Feb 20
Posts: 324
Credit: 3,365,990,576
RAC: 4,059,647
Level: Arg
Message 56250 - Posted: 7 Jan 2021 | 17:32:47 UTC - in response to Message 56249.

Cheers for the reply. It seems like those units stay in progress forever.


It's inevitable with BOINC projects, since some users download the tasks, then shut their system off or have some system problem that prevents timely completion. GPUGRID is better than most in this regard, with its short 5-day deadlines, so a stalled task gets re-sent rather quickly.


Greger
Joined: 6 Jan 15
Posts: 48
Credit: 6,096,382,366
RAC: 59,156
Level: Tyr
Message 56263 - Posted: 11 Jan 2021 | 18:06:10 UTC

Toni: When can we expect a new flow of work?

johndad5
Joined: 14 Nov 10
Posts: 6
Credit: 20,336,399
RAC: 1
Level: Pro
Message 56275 - Posted: 13 Jan 2021 | 5:22:32 UTC - in response to Message 56263.

I second the question of when will new tasks become available.

mikey
Joined: 2 Jan 09
Posts: 283
Credit: 555,276,959
RAC: 4,871
Level: Lys
Message 56276 - Posted: 13 Jan 2021 | 12:10:36 UTC - in response to Message 56275.

Me three

JohnMK
Joined: 1 Apr 20
Posts: 1
Credit: 37,337,456
RAC: 6,868
Level: Val
Message 56281 - Posted: 14 Jan 2021 | 5:46:26 UTC

No new work since Jan 2, 2021. What is going on?


Frank [RKN]
Joined: 30 Sep 17
Posts: 2
Credit: 103,453,955
RAC: 750
Level: Cys
Message 56289 - Posted: 15 Jan 2021 | 13:41:58 UTC - in response to Message 56281.

No new work since Jan 2, 2021. What is going on?

Nothing...
Maybe Toni is stuck in the snow?

johndad5
Joined: 14 Nov 10
Posts: 6
Credit: 20,336,399
RAC: 1
Level: Pro
Message 56290 - Posted: 15 Jan 2021 | 15:18:10 UTC
Last modified: 15 Jan 2021 | 15:18:34 UTC

No contact with Toni for 25 days! Hope he has not been struck down by Covid!

Pop Piasa
Joined: 8 Aug 19
Posts: 200
Credit: 337,048,583
RAC: 83,503
Level: Asp
Message 56292 - Posted: 15 Jan 2021 | 20:50:07 UTC - in response to Message 56289.

No new work since Jan 2, 2021. What is going on?

Nothing...
Maybe Toni is stuck in the snow?


Hi all,
Winter break occurs around the first of the year at most colleges. This batch likely ran out just as the break was starting, and the next batch would probably be delayed by it.
- Hopefully we'll get a notice if this project is in the computation-completed stage and no more batches are to come. In that case we will have to wait for the next project, which could be a 'good while'.

Meanwhile, give GPUGRID a much higher priority than your backup projects. That way, when more work is available, your BOINC Manager will automatically download and start GPUGRID tasks, pausing the backup projects.

robertmiles
Joined: 16 Apr 09
Posts: 492
Credit: 559,634,377
RAC: 19,638
Level: Lys
Message 56294 - Posted: 16 Jan 2021 | 1:19:18 UTC - in response to Message 56292.

No new work since Jan 2, 2021. What is going on?

Nothing...
Maybe Toni is stuck in the snow?


Hi all,
Winter break occurs around the first of the year at most colleges. This batch likely ran out just as the break was starting, and the next batch would probably be delayed by it.
- Hopefully we'll get a notice if this project is in the computation-completed stage and no more batches are to come. In that case we will have to wait for the next project, which could be a 'good while'.

You could help speed up the next project by donating enough for them to reach the estimated 50,000 euros needed to hire and support an intern for one year.

Pop Piasa
Joined: 8 Aug 19
Posts: 200
Credit: 337,048,583
RAC: 83,503
Level: Asp
Message 56295 - Posted: 16 Jan 2021 | 3:00:23 UTC - in response to Message 56294.
Last modified: 16 Jan 2021 | 3:45:34 UTC

You could help speed up the next project by donating enough for them to reach the estimated 50,000 euros needed to hire and support an intern for one year.


Or, I can crunch for Folding@home, which is supported monetarily, technically, and informationally by its host institution, and donate to the Covid Moonshot project so that third-world nations will have access to a cure.

Please watch:
https://www.youtube.com/watch?v=VnyaAmM1nhE

I'd gladly donate, but it's a tough decision when you're on a pension similar to what an intern earns.

Pietro
Joined: 3 Nov 19
Posts: 1
Credit: 11,997,473
RAC: 18,061
Level: Pro
Message 56296 - Posted: 16 Jan 2021 | 13:51:20 UTC - in response to Message 56118.

Hi,
Is the project still down?
I have tried removing and re-adding the project in BOINC, but no tasks run.

Thank you for your availability and collaboration.

KR
Pietro

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1095
Credit: 3,258,384,910
RAC: 181,881
Level: Arg
Message 56297 - Posted: 16 Jan 2021 | 16:12:37 UTC - in response to Message 56296.

On any BOINC project: check the server status page.

https://www.gpugrid.net/server_status.php

The project is running (we can type here), but there is no work available.
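For scripted checks, a BOINC server status page can usually also be fetched in XML form. The sketch below assumes an `?xml=1` variant of the page that reports counts in elements named `results_ready_to_send` and `results_in_progress`; treat the URL parameter and element names as assumptions to verify against the live page. It parses a small hand-written sample (not real GPUGrid output) so that it runs offline.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the assumed format (not real server output).
SAMPLE_STATUS = """<server_status>
  <database_file_states>
    <results_ready_to_send>0</results_ready_to_send>
    <results_in_progress>169</results_in_progress>
  </database_file_states>
</server_status>"""


def work_counts(xml_text):
    """Return (ready_to_send, in_progress); None for any count that is missing."""
    root = ET.fromstring(xml_text)

    def count(tag):
        text = root.findtext(f".//{tag}")  # search at any depth
        return int(text) if text is not None else None

    return count("results_ready_to_send"), count("results_in_progress")


ready, in_progress = work_counts(SAMPLE_STATUS)
if ready == 0:
    print(f"Server is up, but no work to send ({in_progress} tasks still out).")
```

To query the live page instead, the XML text could be read with `urllib.request.urlopen(...)` from the status URL above with the assumed `?xml=1` parameter appended.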

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 9 Dec 08
Posts: 988
Credit: 5,068,599
RAC: 0
Level: Ser
Message 56305 - Posted: 18 Jan 2021 | 8:52:55 UTC - in response to Message 56289.

No new work since Jan 2, 2021. What is going on?

Nothing...
Maybe Toni is stuck in the snow?


Worse, stuck in grant writing... :(
Sorry, I will attend to it later when I can.

Ian&Steve C.
Joined: 21 Feb 20
Posts: 324
Credit: 3,365,990,576
RAC: 4,059,647
Level: Arg
Message 56306 - Posted: 18 Jan 2021 | 17:01:22 UTC - in response to Message 56305.

No new work since Jan 2, 2021. What is going on?

Nothing...
Maybe Toni is stuck in the snow?


Worse, stuck in grant writing... :(
Sorry, I will attend to it later when I can.


Any chance to have a CUDA 11.1/11.2 app compiled for Ampere support before we come back?

Greg _BE
Joined: 30 Jun 14
Posts: 73
Credit: 78,583,439
RAC: 4
Level: Thr
Message 56307 - Posted: 19 Jan 2021 | 0:51:19 UTC - in response to Message 56305.

No new work since Jan 2, 2021. What is going on?

Nothing...
Maybe Toni is stuck in the snow?


Worse, stuck in grant writing... :(
Sorry, I will attend to it later when I can.


Your machine is hungry. It's calling out to us that it is starving.

bozz4science
Joined: 22 May 20
Posts: 75
Credit: 8,016,653
RAC: 40,942
Level: Ser
Message 56308 - Posted: 19 Jan 2021 | 20:55:54 UTC

I just saw some WUs popping up. Does that mean the project is slowly starting new work generation again?

RJ The Bike Guy
Joined: 2 Apr 20
Posts: 16
Credit: 21,672,595
RAC: 64,924
Level: Pro
Message 56311 - Posted: 19 Jan 2021 | 22:43:12 UTC

I see 10 tasks in progress now. Hmmmm

Bill F
Joined: 21 Nov 16
Posts: 7
Credit: 13,977,869
RAC: 4,112
Level: Pro
Message 56367 - Posted: 4 Feb 2021 | 4:02:28 UTC

Well, we are getting almost no work units these days, and I can't tell if the WUs coming out are new or recycled from prior users after a timeout or WU error.
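One way to tell resends from fresh work, assuming the standard BOINC naming scheme in which each task (result) is named after its work unit plus a trailing `_N` copy index: with one initial copy per work unit, `_0` is the original and anything higher was created later, for example after a timeout or error. The initial-copy count and the sample names below are assumptions for illustration; GPUGrid's actual replication settings aren't stated in this thread.

```python
def copy_index(task_name):
    """Return the trailing _N copy index of a BOINC task name."""
    _, sep, suffix = task_name.rpartition("_")
    if not sep or not suffix.isdigit():
        raise ValueError(f"no _N suffix in {task_name!r}")
    return int(suffix)


def looks_like_resend(task_name, initial_copies=1):
    """True if the index is beyond the initial copies (assumed 1 per work unit)."""
    return copy_index(task_name) >= initial_copies


# Hypothetical task names, for illustration only.
for name in ("EXAMPLE_WU_1234_0", "EXAMPLE_WU_1234_2"):
    print(name, "resend" if looks_like_resend(name) else "new")
```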

It would be wonderful for an Admin or moderator to post current Project status and news.

Thanks
Bill F

____________
In October of 1969 I took an oath to support and defend the Constitution of the United States against all enemies, foreign and domestic;
There was no expiration date.


Erich56
Joined: 1 Jan 15
Posts: 756
Credit: 3,402,889,227
RAC: 6,146
Level: Arg
Message 56368 - Posted: 4 Feb 2021 | 7:03:02 UTC - in response to Message 56367.


It would be wonderful for an Admin or moderator to post current Project status and news.

+ 1

RJ The Bike Guy
Joined: 2 Apr 20
Posts: 16
Credit: 21,672,595
RAC: 64,924
Level: Pro
Message 56369 - Posted: 4 Feb 2021 | 19:26:25 UTC - in response to Message 56367.

Well, we are getting almost no work units these days, and I can't tell if the WUs coming out are new or recycled from prior users after a timeout or WU error.

It would be wonderful for an Admin or moderator to post current Project status and news.

Thanks
Bill F


+2

BarryAZ
Joined: 16 Apr 09
Posts: 163
Credit: 908,263,446
RAC: 34,519
Level: Glu
Message 56370 - Posted: 4 Feb 2021 | 22:15:39 UTC - in response to Message 56369.

One wonders if the project has in fact been restarted or is going through a period of hibernation.

It would be wonderful for an Admin or moderator to post current Project status and news.

BladeD
Joined: 1 May 11
Posts: 9
Credit: 143,458,070
RAC: 26,675
Level: Cys
Message 56371 - Posted: 5 Feb 2021 | 0:28:39 UTC - in response to Message 56370.

One wonders if the project has in fact been restarted or is going through a period of hibernation.

It would be wonderful for an Admin or moderator to post current Project status and news.

+1

robertmiles
Joined: 16 Apr 09
Posts: 492
Credit: 559,634,377
RAC: 19,638
Level: Lys
Message 56372 - Posted: 5 Feb 2021 | 2:31:53 UTC - in response to Message 56370.

One wonders if the project has in fact been restarted or is going through a period of hibernation.

It would be wonderful for an Admin or moderator to post current Project status and news.

The last I read, the project staff was putting most of their time into seeking a grant to pay for the next thing they will do. Whoever provides that grant will probably have a large influence on what that next thing is.

Drago
Joined: 3 May 20
Posts: 1
Credit: 13,436,992
RAC: 85
Level: Pro
Message 56391 - Posted: 10 Feb 2021 | 17:49:34 UTC

I just got two new WUs for my two PCs, but they seem to be huge! 5,000,000 GFLOPs, holy cow! It will take a while to get those babies done! After 1 hour they are at 3%!

ServicEnginIC
Joined: 24 Sep 10
Posts: 345
Credit: 1,927,950,084
RAC: 392,962
Level: His
Message 56395 - Posted: 10 Feb 2021 | 21:48:40 UTC

This is to thank once more everyone involved in making new tasks available for us voracious crunchers.
I guess they don't grow spontaneously; there must be some thinking minds and corporate resources behind them, mustn't there?
Thanks again!

programagor
Joined: 3 Feb 21
Posts: 5
Credit: 1,046,250
RAC: 695
Level: Ala
Message 56399 - Posted: 11 Feb 2021 | 4:57:23 UTC
Last modified: 11 Feb 2021 | 4:58:29 UTC

All WUs that I receive now are failing. I don't see any errors, but I don't really know where to look. I'm running Debian 10 on a machine with Optimus, so my NVIDIA GPU has device id 1 (at id 0 I have an Intel graphics card). My NVIDIA driver is version 460.39.

When I run the acemd3 binary directly, I get this output:

# Looking for node-locked license in [/opt/acellera/license.dat]
# Looking for node-locked license in [/opt/acellera/.acellera/license.dat]
# Looking for node-locked license in [/opt/acellera/.htmd/license.dat]
# Looking for node-locked license in [/root/license.dat]
# Looking for node-locked license in [/root/.acellera/license.dat]
# Looking for node-locked license in [/root/.htmd/license.dat]
#
# ACEMD is running with a basic licence!
# Contact Acellera (info@acellera.com) for licencing.
#
# WARNING: ACEMD is limited to run on the GPU device 0 only!

Does anyone have any suggestions? This is the first time I'm trying to crunch tasks for GPUGrid, so I'm not sure if the problem is with these new WUs or with my setup.

clych
Joined: 13 Nov 19
Posts: 5
Credit: 8,496,529
RAC: 1
Level: Ser
Message 56400 - Posted: 11 Feb 2021 | 5:15:20 UTC - in response to Message 56391.
Last modified: 11 Feb 2021 | 5:16:03 UTC

2.666% after 09:29, and the deadline is 15/02...

Ian&Steve C.
Joined: 21 Feb 20
Posts: 324
Credit: 3,365,990,576
RAC: 4,059,647
Level: Arg
Message 56401 - Posted: 11 Feb 2021 | 5:41:54 UTC - in response to Message 56399.

All of the failed tasks in your host stats seem to indicate a problem with your drivers. It’s throwing CUDA errors. Try re-installing them.


Ian&Steve C.
Joined: 21 Feb 20
Posts: 324
Credit: 3,365,990,576
RAC: 4,059,647
Level: Arg
Message 56402 - Posted: 11 Feb 2021 | 5:43:11 UTC - in response to Message 56400.
Last modified: 11 Feb 2021 | 5:43:35 UTC

2.666% after 09:29, and the deadline is 15/02...

At that rate, you’ll take over 14 days to complete the task. Deadlines are 5 days.

I think your GTX 650 is just too slow for these tasks.
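The arithmetic behind the 14-day figure, as a small Python sketch. It assumes progress advances linearly with run time (real progress reporting may not), and reads "09:29" as 9 hours 29 minutes of elapsed time.

```python
DEADLINE_HOURS = 5 * 24  # 5-day deadline = 120 hours


def projected_total_hours(percent_done, elapsed_hours):
    """Linear projection: total time = elapsed time / fraction completed."""
    if percent_done <= 0:
        raise ValueError("percent_done must be positive")
    return elapsed_hours / (percent_done / 100)


elapsed = 9 + 29 / 60                            # 09:29 of run time
total = projected_total_hours(2.666, elapsed)
print(f"{total:.0f} h = {total / 24:.1f} days")  # 356 h = 14.8 days
print("misses deadline" if total > DEADLINE_HOURS else "meets deadline")
```

The same projection applied to the earlier report of 3% after 1 hour gives about 33 hours, comfortably inside the deadline.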

ServicEnginIC
Joined: 24 Sep 10
Posts: 345
Credit: 1,927,950,084
RAC: 392,962
Level: His
Message 56403 - Posted: 11 Feb 2021 | 6:30:59 UTC
Last modified: 11 Feb 2021 | 7:07:58 UTC

2.666% after 09:29, and the deadline is 15/02...

At that rate, you’ll take over 14 days to complete the task. Deadlines are 5 days.

I think your GTX 650 is just too slow for these tasks.

I agree.
These new ADRIA tasks are highly performance-demanding.
Any graphics card below an (overclocked) GTX 750 Ti working 24/7 is sure to miss the 120-hour deadline.

rod4x4
Joined: 4 Aug 14
Posts: 266
Credit: 2,200,441,910
RAC: 243,507
Level: Phe
Message 56404 - Posted: 11 Feb 2021 | 6:38:42 UTC - in response to Message 56399.

All your failed work units have also failed on every other host that tried the same work unit.
You have been unlucky and received a bad batch.
Keep trying; you may receive a valid work unit.

programagor
Joined: 3 Feb 21
Posts: 5
Credit: 1,046,250
RAC: 695
Level: Ala
Message 56406 - Posted: 11 Feb 2021 | 8:41:06 UTC - in response to Message 56404.

At this point it seems to me that there is a bug in the app itself. Are there any Linux/NVIDIA hosts which manage to run these ADRIA tasks successfully?

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1095
Credit: 3,258,384,910
RAC: 181,881
Level: Arg
Message 56407 - Posted: 11 Feb 2021 | 9:34:18 UTC - in response to Message 56406.

At this point it seems to me that there is a bug in the app itself. Are there any Linux/NVIDIA hosts which manage to run these ADRIA tasks successfully?

My host 132158 has two in progress at the moment.

Linux Mint 20.1 (Ubuntu derivative), dual GTX 1660 SUPER, driver 460.32

Approaching 40% after 11 hours. Look to be running normally so far, but I'll keep an eye on them.

Ian&Steve C.
Joined: 21 Feb 20
Posts: 324
Credit: 3,365,990,576
RAC: 4,059,647
Level: Arg
Message 56410 - Posted: 11 Feb 2021 | 11:57:12 UTC - in response to Message 56406.

At this point it seems to me that there is a bug in the app itself. Are there any Linux/NVIDIA hosts which manage to run these ADRIA tasks successfully?


Check my host or many other Linux hosts on the leaderboard. I’ve submitted many valid results. There’s no bug in the app.


Retvari Zoltan
Joined: 20 Jan 09
Posts: 2265
Credit: 15,986,076,810
RAC: 42,384
Level: Trp
Message 56411 - Posted: 11 Feb 2021 | 12:02:39 UTC - in response to Message 56410.

Check my host or many other Linux hosts on the leaderboard. I’ve submitted many valid results. There’s no bug in the app.

Perhaps there's no bug in the app now, but there was when programagor originally posted about it. The bug was that the app had a "basic" licence, so it could run only on GPU device 0, while programagor's NVIDIA GPU is device ID 1.

Ian&Steve C.
Joined: 21 Feb 20
Posts: 324
Credit: 3,365,990,576
RAC: 4,059,647
Level: Arg
Message 56413 - Posted: 11 Feb 2021 | 12:14:20 UTC - in response to Message 56411.
Last modified: 11 Feb 2021 | 12:24:55 UTC

You missed that he was trying to run the executable directly (outside of BOINC), which is likely why he received that error message. It’s running in basic license “mode” because it can’t find the license files listed above. He probably didn’t move them correctly to be able to run the tasks the same way BOINC does. All of my machines have devices on dev 1+, and one is even in the same situation, with an unusable card at dev 0 (which has been excluded), and only runs on the card on dev 1.
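To picture why a direct run falls back to the basic licence, here is a hypothetical Python reconstruction of the search order printed in programagor's log: for the install directory (/opt/acellera) and then the home directory (/root), it tries `license.dat`, `.acellera/license.dat` and `.htmd/license.dat`, and uses the first one found. This is inferred only from those log lines; it is not Acellera's code, and how BOINC's wrapper supplies the licence is not known from this thread.

```python
from pathlib import Path


def candidate_license_paths(install_dir, home_dir):
    """Yield licence locations in the order shown in the log (inferred)."""
    for base in (Path(install_dir), Path(home_dir)):
        for sub in ("", ".acellera", ".htmd"):
            yield base / sub / "license.dat"  # pathlib drops the empty "" part


def find_license(install_dir, home_dir):
    """Return the first licence file that exists, or None ("basic" mode)."""
    for path in candidate_license_paths(install_dir, home_dir):
        if path.is_file():
            return path
    return None


for p in candidate_license_paths("/opt/acellera", "/root"):
    print(p)
```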
____________

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1095
Credit: 3,258,384,910
RAC: 181,881
Level
Arg
Message 56414 - Posted: 11 Feb 2021 | 12:23:47 UTC - in response to Message 56411.

# WARNING: ACEMD is limited to run on the GPU device 0 only!

Perhaps there's no bug in the app now, but there was when programagor originally posted about it. The bug was that the app had a "basic" licence, so it could run only on GPU device 0, while programagor has his/her NVIDIA GPU on device ID 1.

The warning was shown "when I run the acemd3 binary directly". Most BOINC GPU apps default to device 0 when no BOINC control file is available to specify otherwise. I'm seeing tasks running happily on device 1 when that is specified in init_data.xml.
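To illustrate the difference between a BOINC launch and a manual one, here is a minimal Python sketch of how a launcher could choose the device: read the BOINC-assigned index from the slot's init_data.xml and fall back to 0 when there is none. It assumes the index is stored in a `<gpu_device_num>` element; the sample XML and the helpers `boinc_device` and `acemd3_args` are hypothetical, and this is not the actual GPUGrid wrapper code. Only the argument shape comes from the wrapper log lines in this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt of a slot's init_data.xml; real files hold many more fields.
SAMPLE_INIT_DATA = """<app_init_data>
<gpu_type>NVIDIA</gpu_type>
<gpu_device_num>1</gpu_device_num>
</app_init_data>"""

def boinc_device(init_data_xml, default=0):
    """Return the BOINC-assigned GPU index, or `default` when none is given
    (e.g. when the binary is launched by hand, outside BOINC)."""
    if init_data_xml is None:
        return default
    root = ET.fromstring(init_data_xml)
    node = root.find("gpu_device_num")  # assumed element name
    if node is None or node.text is None:
        return default
    return int(node.text.strip())

def acemd3_args(init_data_xml=None):
    """Build arguments shaped like the wrapper log line
    'running acemd3 (--boinc input --device N)'."""
    return ["--boinc", "input", "--device", str(boinc_device(init_data_xml))]

print(acemd3_args(SAMPLE_INIT_DATA))  # ['--boinc', 'input', '--device', '1']
print(acemd3_args())                  # ['--boinc', 'input', '--device', '0']
```

Without a control file the sketch silently lands on device 0, which mirrors why a hand-run binary targets device 0 regardless of which GPU BOINC would have used.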

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2265
Credit: 15,986,076,810
RAC: 42,384
Level
Trp
Message 56415 - Posted: 11 Feb 2021 | 12:28:40 UTC - in response to Message 56413.

You missed that he was trying to run the executable directly (outside of BOINC), which is likely why he received that error message. All of my machines have devices on dev 1+, and one is even in the same situation, with an unusable card at dev 0 (which has been excluded); it runs only on the card at dev 1.

You're right, I really missed that, but then that explains the strange licensing error.
It is not a good idea to run the GPUGrid app directly, as it needs a wrapper. Perhaps the wrapper contains the appropriate license, or tells the app where to look for it. We don't know how it works, so we can't use this method to debug this error.
Perhaps he installed the BOINC manager as a service?

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1095
Credit: 3,258,384,910
RAC: 181,881
Level
Arg
Message 56416 - Posted: 11 Feb 2021 | 12:39:05 UTC - in response to Message 56415.
Last modified: 11 Feb 2021 | 12:46:06 UTC

Perhaps he installed the BOINC manager as a service?

The Manager always runs in user space, but the client can run as a service.

My Linux machines do run as a service, without GPU problems. Windows machines can't run GPU apps on a service install, because of Microsoft driver security protocols.

Edit: programagor's Debian install on host 576641 looks OK from the outside. I'd suspect a driver problem - something like using a nouveau driver without the extra CUDA (computation) libraries provided through a manufacturer driver install.

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2265
Credit: 15,986,076,810
RAC: 42,384
Level
Trp
Message 56417 - Posted: 11 Feb 2021 | 13:50:31 UTC - in response to Message 56416.
Last modified: 11 Feb 2021 | 13:53:20 UTC

Edit: programagor's Debian install on host 576641 looks OK from the outside. I'd suspect a driver problem - something like using a nouveau driver without the extra CUDA (computation) libraries provided through a manufacturer driver install.
I installed a fresh Ubuntu 18.04 two days ago, and it downloaded the 460.32 driver on its own; that driver works with both FAH and GPUGrid. I upgraded it to 20.04 today.
EDIT: the 460.39 driver on his host should be from ppa:graphics-drivers/ppa (it works on my other host).

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1095
Credit: 3,258,384,910
RAC: 181,881
Level
Arg
Message 56419 - Posted: 11 Feb 2021 | 14:38:46 UTC - in response to Message 56417.

The new Linux Mint (v20.1) offers me a driver manager:



It defaulted to the open-source driver, but for computation I think the proprietary driver is better.

programagor
Joined: 3 Feb 21
Posts: 5
Credit: 1,046,250
RAC: 695
Level
Ala
Message 56432 - Posted: 11 Feb 2021 | 20:23:20 UTC
Last modified: 11 Feb 2021 | 20:31:34 UTC

When I ran the binary directly, I tried supplying the `--device 1` parameter, but to no avail: due to the basic license, the binary always uses device ID 0. Also, I don't see `license.dat[.*]` anywhere on my system. And my drivers are straight from NVIDIA, not nouveau. I can compile and run CUDA programs/kernels without any issue. For completeness' sake, I reinstalled my drivers, but the issue persists.

EDIT: I also looked inside the wrapper, and there is no `license.dat` string, which leads me to believe that the license file is missing, preventing me from running on GPU ID 1.

EDIT 2: I just noticed that the wrapper is launching acemd with device ID 0:

wrapper: running acemd3 (--boinc input --device 0)

clych
Joined: 13 Nov 19
Posts: 5
Credit: 8,496,529
RAC: 1
Level
Ser
Message 56433 - Posted: 11 Feb 2021 | 20:28:24 UTC - in response to Message 56403.

7.333% after 24 hours.
It is a pity; GPUGRID is the only project that doesn't make my computer lag.

Ian&Steve C.
Joined: 21 Feb 20
Posts: 324
Credit: 3,365,990,576
RAC: 4,059,647
Level
Arg
Message 56435 - Posted: 11 Feb 2021 | 21:17:23 UTC - in response to Message 56432.

When I ran the binary directly, I tried supplying the `--device 1` parameter, but to no avail: due to the basic license, the binary always uses device ID 0. Also, I don't see `license.dat[.*]` anywhere on my system. And my drivers are straight from NVIDIA, not nouveau. I can compile and run CUDA programs/kernels without any issue. For completeness' sake, I reinstalled my drivers, but the issue persists.

EDIT: I also looked inside the wrapper, and there is no `license.dat` string, which leads me to believe that the license file is missing, preventing me from running on GPU ID 1.

EDIT 2: I just noticed that the wrapper is launching acemd with device ID 0:
wrapper: running acemd3 (--boinc input --device 0)


Look in your BOINC event log at startup. Is your NVIDIA GPU device ID 0? When you see "device [id]" in BOINC, it's always the BOINC order, not the system order; the two can differ because of the way BOINC decides which is the best device.

Pop Piasa
Joined: 8 Aug 19
Posts: 200
Credit: 337,048,583
RAC: 83,503
Level
Asp
Message 56438 - Posted: 11 Feb 2021 | 21:48:02 UTC - in response to Message 56433.

7.333% after 24 hours.
It is a pity; GPUGRID is the only project that doesn't make my computer lag.


These WUs are too large for many GPUs to complete in the 5-day (120-hour) window before they expire.

It would be best to abort them on hosts that cannot meet the deadline, as running them would waste time and electricity. The same goes for keeping a spare WU waiting in your queue if your GPU takes 60 or more hours to complete one: the spare will expire before completion, yielding no credit.

I recommend a longer period before these "extra-long runs" expire. I think it would get them back sooner in the long run.

programagor
Send message
Joined: 3 Feb 21
Posts: 5
Credit: 1,046,250
RAC: 695
Level
Ala
Message 56439 - Posted: 11 Feb 2021 | 22:00:30 UTC - in response to Message 56435.

Look in your BOINC event log at startup. Is your NVIDIA GPU device ID 0? When you see "device [id]" in BOINC, it's always the BOINC order, not the system order; the two can differ because of the way BOINC decides which is the best device.


Right, my apologies: BOINC has my GPU at ID 0:
CUDA: NVIDIA GPU 0: GeForce GTX 1060 (driver version 460.39, CUDA version 11.2, compute capability 6.1, 4096MB, 3974MB available, 4276 GFLOPS peak)
OpenCL: NVIDIA GPU 0: GeForce GTX 1060 (driver version 460.39, device version OpenCL 1.2 CUDA, 6078MB, 3974MB available, 4276 GFLOPS peak)

So licensing is likely not the culprit in my case.

mmonnin
Joined: 2 Jul 16
Posts: 289
Credit: 1,152,511,238
RAC: 10,830
Level
Met
Message 56443 - Posted: 11 Feb 2021 | 23:24:22 UTC

The app has not changed.
https://www.gpugrid.net/apps.php

Two GPUs are running on one of my PCs without issue.

clych
Joined: 13 Nov 19
Posts: 5
Credit: 8,496,529
RAC: 1
Level
Ser
Message 56461 - Posted: 12 Feb 2021 | 18:25:42 UTC - in response to Message 56438.

I have aborted this WU; the new one is much faster.

Ian&Steve C.
Joined: 21 Feb 20
Posts: 324
Credit: 3,365,990,576
RAC: 4,059,647
Level
Arg
Message 56462 - Posted: 12 Feb 2021 | 18:36:27 UTC - in response to Message 56461.

I have aborted this WU; the new one is much faster.


Wait until it runs for a few hours. You will see that the initial percentage increase is not accurate; it is only an estimate from BOINC until the task hits a real checkpoint. The % will rise quickly until it hits the checkpoint, then reset to 0.333% or 0.666% and go very slowly from that point.

Pop Piasa
Joined: 8 Aug 19
Posts: 200
Credit: 337,048,583
RAC: 83,503
Level
Asp
Message 56465 - Posted: 12 Feb 2021 | 21:08:01 UTC - in response to Message 56462.

I have aborted this WU; the new one is much faster.


Wait until it runs for a few hours. You will see that the initial percentage increase is not accurate; it is only an estimate from BOINC until the task hits a real checkpoint. The % will rise quickly until it hits the checkpoint, then reset to 0.333% or 0.666% and go very slowly from that point.


And (sorry to butt in): once you reach a checkpoint, highlight the task in the Tasks window and click the Properties button. Check the progress rate near the bottom of the list. If it is less than 0.9% per hour, the GPU is too slow to make the 120-hour window, even crunching 24/7. Best to send it on to someone else before it expires.
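The rule of thumb above can be checked with a short Python sketch. It assumes a steady progress rate measured after the first real checkpoint (earlier percentages are only BOINC's estimate) and the 120-hour deadline. 100% over 120 h is about 0.83% per hour, so the 0.9% threshold leaves a few hours of slack. The `hours_per_day` parameter is a rough, hypothetical way to model hosts that don't crunch 24/7; the function names are illustrative only.

```python
DEADLINE_HOURS = 120  # 5-day deadline discussed in this thread

def rate_per_hour(percent_done, hours_elapsed):
    """Average progress rate in % per hour (only meaningful after a checkpoint)."""
    return percent_done / hours_elapsed

def hours_to_finish(rate):
    """Total crunching hours for a full run at a steady rate."""
    return 100.0 / rate

def can_meet_deadline(rate, hours_per_day=24.0, deadline_hours=DEADLINE_HOURS):
    """Rough check: wall-clock hours needed when the GPU crunches hours_per_day."""
    wall_hours = hours_to_finish(rate) * 24.0 / hours_per_day
    return wall_hours <= deadline_hours

min_rate = 100.0 / DEADLINE_HOURS   # bare minimum with no slack
example = rate_per_hour(7.333, 24)   # the 7.333% after 24 hours reported above
print(f"minimum rate for {DEADLINE_HOURS} h: {min_rate:.3f} %/h")
print(f"7.333% in 24 h: {hours_to_finish(example):.0f} h total, in time: {can_meet_deadline(example)}")
print(f"0.9 %/h: {hours_to_finish(0.9):.0f} h total, in time: {can_meet_deadline(0.9)}")
```

At 7.333% per day a task needs well over 300 hours, far beyond the deadline, while 0.9%/h finishes in roughly 111 hours of continuous crunching.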


Clive
Joined: 2 Jul 19
Posts: 21
Credit: 57,316,455
RAC: 1,519
Level
Thr
Message 56466 - Posted: 12 Feb 2021 | 21:17:14 UTC

Toni:

I have removed my laptop from crunching for GPUGRID. The laptop has a GTX 660M GPU, which is inadequate for these large work units. My desktop has a GTX 1060, which seems to have enough muscle to crunch them.

I hope all this crunching will benefit humanity in some way.

Clive

goldfinch
Joined: 5 May 19
Posts: 9
Credit: 74,684,299
RAC: 13,068
Level
Thr
Message 56467 - Posted: 12 Feb 2021 | 21:37:40 UTC - in response to Message 56466.

Toni:

I have removed my laptop from crunching for GPUGRID. The laptop has a GTX 660M GPU, which is inadequate for these large work units. My desktop has a GTX 1060, which seems to have enough muscle to crunch them.

I hope all this crunching will benefit humanity in some way.

Clive

I have a discrete GTX 1060 in my laptop, but after 2 days of crunching a single task, BOINC is still estimating 8 more days... I'm crunching 24/7, and the GPU's temperature is above 90 °C, so the card has to throttle. Are all new tasks that big? If so, not only will I be unable to finish tasks within 24 hours to get the bonus, but I also won't be able to complete them in the allocated timeframe.

Keith Myers
Joined: 13 Dec 17
Posts: 688
Credit: 904,899,505
RAC: 377,648
Level
Glu
Message 56468 - Posted: 12 Feb 2021 | 23:40:12 UTC - in response to Message 56467.

These are, I believe, the largest (longest) tasks in the history of the project.

The previous longest ran around 12 hours, back in the acemd2 (long-runs) application days.

If these are to become the normal type of task in the future, they really need to increase the deadlines.

Or restrict them to adequate hardware, like a discrete GTX 1060 or better.

The estimated task size of 5,000,000 GFLOPs seems roughly correct.

goldfinch
Joined: 5 May 19
Posts: 9
Credit: 74,684,299
RAC: 13,068
Level
Thr
Message 56470 - Posted: 13 Feb 2021 | 4:09:19 UTC - in response to Message 56468.

I agree that such long-running tasks should have longer deadlines. Otherwise, we gradually go back to supercomputers that no one can afford, whereas the purpose of crowd computing is that many can participate.
As for limiting tasks to certain GPUs, that's not quite adequate. As I said, my GTX 1060 isn't capable of handling those tasks, so it's not only the card that matters, but also where it's installed and what type of cooling is used. Unfortunately, my laptop isn't great at cooling, so both CPU and GPU heat up to 90-93 °C. Putting the laptop in a fridge is not an option... And taking all parameters into account, such as cooling, power supply, throttling, even the manufacturer, isn't feasible.

Ian&Steve C.
Joined: 21 Feb 20
Posts: 324
Credit: 3,365,990,576
RAC: 4,059,647
Level
Arg
Message 56471 - Posted: 13 Feb 2021 | 4:29:58 UTC - in response to Message 56470.

A normal, full (desktop) 1060 should be able to process these tasks in 5 days. It must be because yours is a laptop version.

RockLr
Joined: 14 Mar 20
Posts: 5
Credit: 9,171,845
RAC: 94
Level
Ser
Message 56472 - Posted: 13 Feb 2021 | 6:00:02 UTC - in response to Message 56470.

The 1050 Ti in my laptop can finish a task in 66 hours. Something must be wrong.

Jim1348
Joined: 28 Jul 12
Posts: 775
Credit: 1,557,993,086
RAC: 216
Level
His
Message 56473 - Posted: 13 Feb 2021 | 13:13:27 UTC

My GTX 1060 finished one work unit in 38 hours.
http://www.gpugrid.net/results.php?hostid=512821

The GTX 1070 took 26 hours.
http://www.gpugrid.net/results.php?hostid=524425

But another GTX 1070 failed twice.
http://www.gpugrid.net/results.php?hostid=528983
The first time was due to a reboot, and then the next one failed immediately thereafter.

They are all on Ubuntu 18.04/20.04.
I think they are better used on Folding.

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2265
Credit: 15,986,076,810
RAC: 42,384
Level
Trp
Message 56474 - Posted: 13 Feb 2021 | 15:11:47 UTC - in response to Message 56473.

But another GTX 1070 failed twice.
http://www.gpugrid.net/results.php?hostid=528983
The first time was due to a reboot, and then the next one failed immediately thereafter.

They are all on Ubuntu 18.04/20.04.
The error message:
ACEMD failed: Error initializing CUDA: CUDA_ERROR_NO_DEVICE (100) at /opt/conda/conda-bld/openmm_1589507810497/work/platforms/cuda/src/CudaContext.cpp:148
looks to me like a driver update, or some other intervention, made your CUDA device inaccessible.

RJ The Bike Guy
Joined: 2 Apr 20
Posts: 16
Credit: 21,672,595
RAC: 64,924
Level
Pro
Message 56475 - Posted: 13 Feb 2021 | 15:55:03 UTC

I aborted the jobs for my GT 730, as there was no way it was going to finish them in time, and I suspended that computer from taking work. My GTX 1660 seems to take about a day and a half to finish the jobs.

Maybe make the WUs smaller or increase the deadlines? My GT 730 previously did quite a bit of work. I currently have that computer doing nothing but BOINC stuff.

bormolino
Joined: 16 May 13
Posts: 34
Credit: 49,390,364
RAC: 96
Level
Val
Message 56476 - Posted: 13 Feb 2021 | 16:05:20 UTC - in response to Message 56475.


Maybe make the WUs smaller or increase the deadlines? My GT 730 previously did quite a bit of work. I currently have that computer doing nothing but BOINC stuff.


Same here with a small GT 710. But the PC always runs 24/7 and did a few WUs per week.

It would be a pity if I could not continue to use this card :/

RJ The Bike Guy
Joined: 2 Apr 20
Posts: 16
Credit: 21,672,595
RAC: 64,924
Level
Pro
Message 56477 - Posted: 13 Feb 2021 | 17:06:07 UTC - in response to Message 56476.


Same here with a small GT 710. But the PC always runs 24/7 and did a few WUs per week.

It would be a pity if I could not continue to use this card :/


I think my GT 730 took about 22 hours per WU. I think the current ones would have taken about 10 days.

BlueGhost
Joined: 12 Oct 16
Posts: 1
Credit: 165,980,616
RAC: 428
Level
Ile
Message 56479 - Posted: 13 Feb 2021 | 17:44:33 UTC

Work units got stuck at 14% and were canceled after 10 hours of running overnight:

workunit 27023967
e2s547_e1s101p0f147-ADRIA_D3RBandit_batch1-0-1-RND7525_0

workunit 27023977
e2s557_e1s68p0f131-ADRIA_D3RBandit_batch1-0-1-RND8450_0

Log

Stderr Output

<core_client_version>7.16.11</core_client_version>
<![CDATA[
<message>
aborted by user</message>
<stderr_txt>
21:59:52 (17660): wrapper (7.9.26016): starting
21:59:52 (17660): wrapper: running acemd3.exe (--boinc input --device 0)
Detected memory leaks!
Dumping objects ->
..\api\boinc_api.cpp(309) : {1599} normal block at 0x000001FE55531F80, 8 bytes long.
Data: < NU > 00 00 4E 55 FE 01 00 00
..\lib\diagnostics_win.cpp(417) : {202} normal block at 0x000001FE55530DC0, 1080 bytes long.
Data: < > EC 0E 00 00 CD CD CD CD 0C 01 00 00 00 00 00 00
Object dump complete.
23:49:15 (10692): wrapper (7.9.26016): starting
23:49:15 (10692): wrapper: running acemd3.exe (--boinc input --device 0)
Detected memory leaks!
Dumping objects ->
..\api\boinc_api.cpp(309) : {1599} normal block at 0x0000027A5CB916D0, 8 bytes long.
Data: < \z > 00 00 B4 5C 7A 02 00 00
..\lib\diagnostics_win.cpp(417) : {202} normal block at 0x0000027A5CB900B0, 1080 bytes long.
Data: <DJ > 44 4A 00 00 CD CD CD CD B0 00 00 00 00 00 00 00
Object dump complete.
20:24:59 (4020): wrapper (7.9.26016): starting
20:24:59 (4020): wrapper: running acemd3.exe (--boinc input --device 0)
Detected memory leaks!
Dumping objects ->
..\api\boinc_api.cpp(309) : {1599} normal block at 0x000001525997F2E0, 8 bytes long.
Data: < 1[R > 00 00 31 5B 52 01 00 00
..\lib\diagnostics_win.cpp(417) : {202} normal block at 0x000001525997FE90, 1080 bytes long.
Data: < 1 X > 04 31 00 00 CD CD CD CD 58 00 00 00 00 00 00 00
Object dump complete.
23:10:11 (3712): wrapper (7.9.26016): starting
23:10:11 (3712): wrapper: running acemd3.exe (--boinc input --device 0)
Detected memory leaks!
Dumping objects ->
..\api\boinc_api.cpp(309) : {1599} normal block at 0x000001CA91D1FCA0, 8 bytes long.
Data: < > 00 00 CF 91 CA 01 00 00
..\lib\diagnostics_win.cpp(417) : {202} normal block at 0x000001CA91D204D0, 1080 bytes long.
Data: < > D0 1C 00 00 CD CD CD CD 0C 01 00 00 00 00 00 00
Object dump complete.


Ian&Steve C.
Joined: 21 Feb 20
Posts: 324
Credit: 3,365,990,576
RAC: 4,059,647
Level
Arg
Message 56480 - Posted: 13 Feb 2021 | 17:49:13 UTC - in response to Message 56479.

I doubt it was stuck. These tasks take a LONG time to run.

The “memory leaks” messages seem to be common for the Windows app.

Just let it run next time. It will probably take ~48 hours on a 1060.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1095
Credit: 3,258,384,910
RAC: 181,881
Level
Arg
Message 56481 - Posted: 13 Feb 2021 | 18:04:40 UTC

It took me 245,275 seconds on a GTX 1050 Ti: that's a smidge under 3 days. Since these tasks update their progress 150 times over a full run, that's 1,635 seconds between updates, almost half an hour.

And I got the same memory leak warning, even though the task was successful and validated.

Don't despair if nothing seems to be happening; have a cup of coffee and come back later. It's probably still alive.
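The arithmetic is easy to reproduce for your own card with a few lines of Python. The sketch assumes, as stated above, 150 evenly spaced progress updates per run; each update then adds 100/150 ≈ 0.667% of progress, which roughly matches the 0.666% steps reported earlier in the thread. `checkpoint_stats` is an illustrative name.

```python
def checkpoint_stats(runtime_s, updates=150):
    """Seconds between progress updates and percent gained per update."""
    interval_s = runtime_s / updates
    step_pct = 100.0 / updates
    return interval_s, step_pct

runtime_s = 245_275  # GTX 1050 Ti run reported above
interval_s, step_pct = checkpoint_stats(runtime_s)
print(f"runtime: {runtime_s / 86_400:.2f} days")
print(f"update every {interval_s:.0f} s (~{interval_s / 60:.0f} min), {step_pct:.3f}% per update")
```

Plug in your own total runtime to see how long the display may sit unchanged between updates.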

Jim1348
Joined: 28 Jul 12
Posts: 775
Credit: 1,557,993,086
RAC: 216
Level
His
Message 56482 - Posted: 13 Feb 2021 | 21:04:40 UTC - in response to Message 56474.
Last modified: 13 Feb 2021 | 21:06:00 UTC

They are all on Ubuntu 18.04/20.04.
The error message:
ACEMD failed: Error initializing CUDA: CUDA_ERROR_NO_DEVICE (100) at /opt/conda/conda-bld/openmm_1589507810497/work/platforms/cuda/src/CudaContext.cpp:148
looks to me like a driver update, or some other intervention, made your CUDA device inaccessible.


That is probably it. I think I was updating to the 460 driver (CUDA 11.2), but I don't know why the second one failed right away. Maybe the driver was not initialized yet after the reboot?

goldfinch
Joined: 5 May 19
Posts: 9
Credit: 74,684,299
RAC: 13,068
Level
Thr
Message 56483 - Posted: 13 Feb 2021 | 21:06:34 UTC - in response to Message 56481.

Is there a way to configure BOINC to start downloading a second task only when the first one is finishing? I got 2 tasks, each 5M GFLOPs, with the same deadline. Obviously, the second couldn't be computed in time, so I had to abort it (it was in READY status, so hopefully it will go to someone else). Or, if that's not possible, can I limit BOINC to 1 task only? That's less preferable, as time will be spent on comms and downloads, but better than expiring tasks someone else could've processed...

Jim1348
Joined: 28 Jul 12
Posts: 775
Credit: 1,557,993,086
RAC: 216
Level
His
Message 56484 - Posted: 13 Feb 2021 | 21:16:30 UTC - in response to Message 56483.

Or, if that's not possible, can I limit BOINC to 1 task only? That's less preferable, as time will be spent on comms and downloads, but better than expiring tasks someone else could've processed...

I think the problem will be solved after BOINC processes the first one and gets better time estimates. After that, it should not download the second one.

But that raises the question: why don't they either give them better estimates to begin with, or limit them to one? They limited the earlier ones to two anyway.

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2265
Credit: 15,986,076,810
RAC: 42,384
Level
Trp
Message 56485 - Posted: 13 Feb 2021 | 21:43:02 UTC - in response to Message 56483.

Is there a way to configure BOINC to start downloading a second task only when the first one is finishing?
You should set your work cache to 1 day or less. The smallest value is 0.01 days (14 minutes and 24 seconds), which is the step size for this setting.
However, it will take a couple of workunits before the processing time estimate becomes accurate.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1095
Credit: 3,258,384,910
RAC: 181,881
Level
Arg
Message 56486 - Posted: 13 Feb 2021 | 22:09:50 UTC - in response to Message 56485.

You can set the work cache to exactly zero, if you like. In that case, BOINC will nominally initiate the request for work three minutes before the final task is estimated to finish. I say 'nominally', because BOINC only performs the 'work needed?' check once per minute, so the actual request may happen any time between three minutes and two minutes before the new task is needed.
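The timing described above can be modelled in a few lines of Python. This is a simplified sketch, assuming a "work needed?" poll every 60 s and a request as soon as the running task is estimated to be within 180 s of finishing; it is not BOINC's actual work-fetch code, and `request_time` and `phase` (the offset of the poll clock) are hypothetical names.

```python
import math

def request_time(finish_at, lead=180, check_every=60, phase=0):
    """Time of the first poll (phase + k*check_every) at which the running task
    is estimated to finish within `lead` seconds."""
    earliest = finish_at - lead          # moment the request condition becomes true
    if earliest <= phase:
        return phase
    k = math.ceil((earliest - phase) / check_every)
    return phase + k * check_every

finish = 10_000  # estimated finish of the current task, in seconds from now
for phase in (0, 20, 40, 59):
    t = request_time(finish, phase=phase)
    print(f"poll offset {phase:2d} s -> request at {t} s, {finish - t} s before finish")
```

Whatever the poll offset, the request lands between two and three minutes before the estimated finish, which is the window described above.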

Jim1348
Joined: 28 Jul 12
Posts: 775
Credit: 1,557,993,086
RAC: 216
Level
His
Message 56487 - Posted: 13 Feb 2021 | 22:12:46 UTC - in response to Message 56486.

You can set the work cache to exactly zero, if you like. In that case, BOINC will nominally initiate the request for work three minutes before the final task is estimated to finish.

Now that you mention it, won't setting the resource share to zero accomplish the same thing? That is how I often set it anyway.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1095
Credit: 3,258,384,910
RAC: 181,881
Level
Arg
Message 56489 - Posted: 13 Feb 2021 | 22:34:26 UTC - in response to Message 56487.
Last modified: 13 Feb 2021 | 22:36:54 UTC

Yup, that's exactly the same. Setting it through BOINC Manager takes effect immediately; going via resource share means a visit to the website and a project update.

Edit - using 'work cache 0' affects all attached projects, using 'resource share 0' only affects that one project.

robertmiles
Joined: 16 Apr 09
Posts: 492
Credit: 559,634,377
RAC: 19,638
Level
Lys
Message 56491 - Posted: 13 Feb 2021 | 23:30:14 UTC - in response to Message 56484.

Or, if that's not possible, can I limit BOINC to 1 task only? That's less preferable, as time will be spent on comms and downloads, but better than expiring tasks someone else could've processed...

I think the problem will be solved after BOINC processes the first one and gets better time estimates. After that, it should not download the second one.

But that raises the question: why don't they either give them better estimates to begin with, or limit them to one? They limited the earlier ones to two anyway.

One reason is that always having a second task downloaded while the first one is running avoids wasting GPU time sitting idle while the second one downloads.

I wouldn't mind if it waited for this download until the first task was approaching its finish, though. But note that the estimated time to completion is not a reliable way of deciding when it is about to finish until at least a few tasks have completed successfully.

jjch
Joined: 10 Nov 13
Posts: 43
Credit: 14,185,011,786
RAC: 672,697
Level
Trp
Message 56494 - Posted: 14 Feb 2021 | 3:35:57 UTC - in response to Message 56491.

Wouldn't you be able to set the max_concurrent variable in the GPUGRID app_config file to 1?

Such as this:

<app_config>
  <app>
    <name>acemd3</name>
    <max_concurrent>1</max_concurrent>
    <gpu_versions>
      <gpu_usage>1</gpu_usage>
      <cpu_usage>1</cpu_usage>
    </gpu_versions>
  </app>
</app_config>

Ian&Steve C.
Joined: 21 Feb 20
Posts: 324
Credit: 3,365,990,576
RAC: 4,059,647
Level
Arg
Message 56495 - Posted: 14 Feb 2021 | 5:07:21 UTC - in response to Message 56494.

That only affects how many jobs run concurrently. Not how many get downloaded.

clych
Joined: 13 Nov 19
Posts: 5
Credit: 8,496,529
RAC: 1
Level
Ser
Message 56499 - Posted: 14 Feb 2021 | 21:57:38 UTC - in response to Message 56403.
Last modified: 14 Feb 2021 | 22:01:15 UTC

I abort every task that shows 0.333%, 1.333% and so on.
Every time I look at the app there is a bad WU; I abort it, and one minute later the next WU shows normal progress.

So... every second WU works normally.


By the way, here is a quote from the log of the aborted task:

<core_client_version>7.16.11</core_client_version>
<![CDATA[
<message>
aborted by user</message>
<stderr_txt>
20:41:32 (19816): wrapper (7.9.26016): starting
20:41:32 (19816): wrapper: running acemd3.exe (--boinc input --device 0)
02:44:06 (1896): wrapper (7.9.26016): starting
02:44:06 (1896): wrapper: running acemd3.exe (--boinc input --device 0)
Detected memory leaks!
Dumping objects ->
..\api\boinc_api.cpp(309) : {1546} normal block at 0x000001C43E1D27E0, 8 bytes long.
Data: < > > 00 00 17 3E C4 01 00 00
..\lib\diagnostics_win.cpp(417) : {205} normal block at 0x000001C43E1D68C0, 1080 bytes long.
Data: <0 > 30 0B 00 00 CD CD CD CD C4 01 00 00 00 00 00 00
Object dump complete.
17:20:35 (8916): wrapper (7.9.26016): starting
17:20:35 (8916): wrapper: running acemd3.exe (--boinc input --device 0)
17:30:43 (12568): wrapper (7.9.26016): starting
17:30:43 (12568): wrapper: running acemd3.exe (--boinc input --device 0)
Detected memory leaks!
Dumping objects ->
..\api\boinc_api.cpp(309) : {1546} normal block at 0x0000029AAAFE2EC0, 8 bytes long.
Data: < > 00 00 F9 AA 9A 02 00 00
..\lib\diagnostics_win.cpp(417) : {205} normal block at 0x0000029AAAFE68C0, 1080 bytes long.
Data: < 3 > FC 33 00 00 CD CD CD CD C8 01 00 00 00 00 00 00
Object dump complete.

</stderr_txt>
]]>

goldfinch
Joined: 5 May 19
Posts: 9
Credit: 74,684,299
RAC: 13,068
Level
Thr
Message 56500 - Posted: 14 Feb 2021 | 22:08:01 UTC - in response to Message 56491.
Last modified: 14 Feb 2021 | 22:08:48 UTC

I wouldn't mind if it waited for this download until the first task was approaching its finish, though. But note that the estimated time to completion is not a reliable way of deciding when it is about to finish until at least a few tasks have completed successfully.

Agreed. If a second task is downloaded at, say, 80% or even 90% of the first task, it should work well. Short tasks are small, so they won't take long to download before the first finishes. For large tasks, 10% may still be some 15-30 minutes at least, which should also be more than enough to complete the download. And with the recent large tasks, 10% is a few hours, so BOINC will definitely make it in time before the first task completes.
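The headroom argument is simple arithmetic, sketched here in Python under the proposal's own assumption that the next task is fetched at a fixed fraction of the current one. The runtimes are the GTX 1070 and GTX 1060 figures reported in this thread; the download time is a hypothetical input, and the function names are illustrative only.

```python
def headroom_hours(runtime_hours, fetch_at=0.9):
    """Hours of crunching left on the current task when the next one is fetched."""
    return (1.0 - fetch_at) * runtime_hours

def fetch_in_time(runtime_hours, download_minutes, fetch_at=0.9):
    """True if a download of `download_minutes` finishes before the current task does."""
    return download_minutes <= headroom_hours(runtime_hours, fetch_at) * 60

# Runtimes reported in this thread: GTX 1070 ~26 h, GTX 1060 ~38 h.
for gpu, hours in (("GTX 1070", 26), ("GTX 1060", 38)):
    print(f"{gpu}: {headroom_hours(hours):.1f} h left when fetching at 90%")
```

Even a slow half-hour download fits comfortably inside the remaining hours of these long tasks; only very short tasks leave too little headroom at 90%.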

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2699
Credit: 1,309,814,736
RAC: 15,864
Level
Met
Message 56501 - Posted: 14 Feb 2021 | 22:39:17 UTC - in response to Message 56461.

clych wrote:
I have aborted this WU; the new one is much faster.

Sorry, but what you're doing does not make any sense. By now you have 9 aborted WUs. No matter how often you abort, these new monster WUs are just too big for your quite old card to return in time.

clych wrote:
I abort every task that shows 0.333%, 1.333% and so on.
Every time I look at the app there is a bad WU; I abort it, and one minute later the next WU shows normal progress.

So... every second WU works normally.

No, this is wrong. See Ian&Steve's answer above.

BTW: my GTX 1070 takes 94k-95k seconds per WU on Win 10, in line with the 26 h reported before under Linux.

MrS
____________
Scanning for our furry friends since Jan 2002

programagor
Joined: 3 Feb 21
Posts: 5
Credit: 1,046,250
RAC: 695
Level
Ala
Message 56502 - Posted: 15 Feb 2021 | 0:09:00 UTC

I have now solved my issue with all WUs failing with CUDA error 999.
The error I was receiving was the following:

ACEMD failed:
Error initializing CUDA: CUDA_ERROR_UNKNOWN (999) at /opt/conda/conda-bld/openmm_1589507810497/work/platforms/cuda/src/CudaContext.cpp:148

After looking around, I found this thread, where error 999 was encountered after entering sleep mode. A simple reboot fixed my issue. 🤦
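Both errors reported in this thread, CUDA_ERROR_NO_DEVICE (100) and CUDA_ERROR_UNKNOWN (999), surfaced while ACEMD was initializing CUDA, so one way to test a host outside BOINC is to call the CUDA driver API directly. Below is a minimal Python sketch using `ctypes` with the driver functions `cuInit` and `cuDeviceGetCount`. The library names (`libcuda.so.1` on Linux, `nvcuda.dll` on Windows) are the usual ones but may differ on your install, and this is only a diagnostic aid, not part of ACEMD or BOINC.

```python
import ctypes
import sys

# A few CUresult codes from the CUDA driver API (cuda.h).
CU_RESULT_NAMES = {
    0: "CUDA_SUCCESS",
    100: "CUDA_ERROR_NO_DEVICE",
    999: "CUDA_ERROR_UNKNOWN",
}

def describe(code):
    """Map a CUresult integer to its symbolic name, if known."""
    return CU_RESULT_NAMES.get(code, f"CUresult {code}")

def load_driver():
    """Try to load the NVIDIA driver library; return None if it is absent."""
    names = ["nvcuda.dll"] if sys.platform == "win32" else ["libcuda.so.1", "libcuda.so"]
    for name in names:
        try:
            return ctypes.CDLL(name)
        except OSError:
            continue
    return None

def probe_cuda():
    """Return (status, device_count) as reported by the CUDA driver."""
    lib = load_driver()
    if lib is None:
        return ("driver library not found", 0)
    rc = lib.cuInit(0)                       # same step that failed in the logs above
    if rc != 0:
        return (describe(rc), 0)
    count = ctypes.c_int(0)
    rc = lib.cuDeviceGetCount(ctypes.byref(count))
    return (describe(rc), count.value)

if __name__ == "__main__":
    status, devices = probe_cuda()
    print(f"cuInit/cuDeviceGetCount: {status}, devices visible: {devices}")
```

If this reports CUDA_ERROR_UNKNOWN after the machine wakes from sleep, a reboot (as above) is the simplest way to get the driver back into a usable state.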

lukeu
Joined: 14 Oct 11
Posts: 29
Credit: 71,042,677
RAC: 424
Level
Thr
Message 56503 - Posted: 15 Feb 2021 | 8:29:22 UTC

I crunch only during office hours*. It looks like my GTX 1060 will just make the 5-day limit on these biggies, but only if I hand-massage it each week. :( If I start one each Monday, the weekend won't get in the way.

(* more like 10-12 hours, computers are set to preheat my little office before I arrive.)

clych
Joined: 13 Nov 19
Posts: 5
Credit: 8,496,529
RAC: 1
Level
Ser
Message 56506 - Posted: 15 Feb 2021 | 10:28:08 UTC - in response to Message 56501.

No, a normal WU shows me a % one minute after start.
It took about 5 minutes per 1%.
A big one would show 0.333% after an hour.

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2265
Credit: 15,986,076,810
RAC: 42,384
Level
Trp
Message 56511 - Posted: 15 Feb 2021 | 12:46:56 UTC - in response to Message 56506.

No, a normal WU shows me a % one minute after start.
The present workunits are exceptionally long.
It took about 5 minutes per 1%.
A big one would show 0.333% after an hour.
It takes 4 minutes for an RTX 2080 Ti to reach 0.666% with these workunits.
Your GTX 650 is too slow for these workunits, you should set GPUGrid to "no new tasks" until this batch is over.
