More Acemd3 tests

Killersocke

Joined: 18 Oct 13
Posts: 53
Credit: 406,647,419
RAC: 0
Message 52764 - Posted: 2 Oct 2019, 17:58:55 UTC


My GPU feels extremely neglected 😪
PappaLitto

Joined: 21 Mar 16
Posts: 513
Credit: 4,673,458,277
RAC: 0
Message 52765 - Posted: 2 Oct 2019, 19:23:39 UTC - in response to Message 52764.  

If you want something to crunch, Folding@home always has work and is always looking for more volunteers.
Jim1348

Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Message 52767 - Posted: 2 Oct 2019, 21:47:43 UTC - in response to Message 52765.  

If you want something to crunch, Folding@home always has work and is always looking for more volunteers.

That is my standard line too. But at the moment, even they are having server problems in at least one of their locations, maybe two. No one seems to know exactly what is going on. Give them a week to figure it out.
KAMasud

Joined: 27 Jul 11
Posts: 138
Credit: 539,953,398
RAC: 0
Message 52769 - Posted: 3 Oct 2019, 7:22:06 UTC

I have decided not to restart or fiddle with my machine. Let us see if it finishes successfully.
If I may say so, according to Afterburner my GPU is running four degrees Celsius hotter than with the old WUs, or than normal. That is as hot as Einstein@Home, which is my backup when GPUGRID becomes lazy. Resource share is set to zero for Einstein.
KAMasud

Joined: 27 Jul 11
Posts: 138
Credit: 539,953,398
RAC: 0
Message 52770 - Posted: 3 Oct 2019, 7:58:54 UTC

Lost it. Power failure.
biodoc

Joined: 26 Aug 08
Posts: 183
Credit: 10,085,929,375
RAC: 0
Message 52772 - Posted: 4 Oct 2019, 11:19:38 UTC - in response to Message 52750.  

Dears, sorry for the slow progress but I determined (at least) a restart problem, and it is not related to the wrapper. It is Windows-only, CUDA 10 only, as far as I can tell from your reports, and manifests itself with the
"The periodic box size has decreased to less than twice the nonbonded cutoff."
message.

Unfortunately the root cause is hard to identify (may be external to our code).

I have compiled the wrapper myself (the binaries on the boinc page are old and had one important bug in variable substitution), but for now the failures seem unrelated.

It's a bit frustrating because everything else seems to work nicely.


If you are using OpenMM, line 375 of the CudaNonbondedUtilities.cpp source code is the following:

throw OpenMMException("The periodic box size has decreased to less than twice the nonbonded cutoff.");

https://github.com/openmm/openmm/blob/master/platforms/cuda/src/CudaNonbondedUtilities.cpp

Perhaps Peter Eastman can shed some light on this problem.

https://github.com/peastman
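
For readers unfamiliar with the message, the condition behind it can be sketched roughly as follows. This is only an illustration, assuming a rectangular box and a single cutoff given in the same length unit; the function name `box_too_small` and the example numbers are made up here, and this is not the OpenMM source, which is C++ and should be consulted at the link above for the exact test.

```python
def box_too_small(box_lengths, cutoff):
    """Return True if any box edge is shorter than twice the cutoff.

    Hypothetical helper for illustration only; not part of OpenMM.
    box_lengths: (x, y, z) edge lengths of a rectangular periodic box.
    cutoff: nonbonded cutoff distance, in the same unit as the box.
    """
    return any(length < 2.0 * cutoff for length in box_lengths)


# Made-up numbers: a 0.9 nm cutoff needs every edge to be at least 1.8 nm.
print(box_too_small((6.0, 6.0, 6.0), 0.9))  # False: box is large enough
print(box_too_small((6.0, 6.0, 1.7), 0.9))  # True: z edge is below 1.8 nm
```

With minimum-image periodic boundaries, an edge shorter than twice the cutoff would let a particle interact with more than one image of the same neighbour, which is why the run is stopped at that point.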
Killersocke

Joined: 18 Oct 13
Posts: 53
Credit: 406,647,419
RAC: 0
Message 52773 - Posted: 4 Oct 2019, 12:29:26 UTC - in response to Message 52772.  
Last modified: 4 Oct 2019, 12:30:19 UTC

Hi,
it is not only CUDA 10.
For example:
http://www.gpugrid.net/result.php?resultid=21425993
http://www.gpugrid.net/result.php?resultid=21425617
http://www.gpugrid.net/result.php?resultid=21425545
#SWAN: FATAL: cannot find image for module [.nonbonded.cu.] for device version 750

Stderr output
<core_client_version>7.14.2</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -59 (0xffffffc5)</message>
<stderr_txt>
# GPU [GeForce RTX 2080 Ti] Platform [Windows] Rev [3212] VERSION [80]
# SWAN Device 0 :
# Name : GeForce RTX 2080 Ti
# ECC : Disabled
# Global mem : 11264MB
# Capability : 7.5
# PCI ID : 0000:01:00.0
# Device clock : 1755MHz
# Memory clock : 7000MHz
# Memory width : 352bit
# Driver version : r436_45 : 43648
#SWAN: FATAL: cannot find image for module [.nonbonded.cu.] for device version 750

</stderr_txt>
]]>
rod4x4

Joined: 4 Aug 14
Posts: 266
Credit: 2,219,935,054
RAC: 0
Message 52774 - Posted: 4 Oct 2019, 14:17:44 UTC - in response to Message 52773.  

Hi,
it is not only CUDA 10.
For example:
http://www.gpugrid.net/result.php?resultid=21425993
http://www.gpugrid.net/result.php?resultid=21425617
http://www.gpugrid.net/result.php?resultid=21425545
#SWAN: FATAL: cannot find image for module [.nonbonded.cu.] for device version 750

Stderr output
<core_client_version>7.14.2</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -59 (0xffffffc5)</message>
<stderr_txt>
# GPU [GeForce RTX 2080 Ti] Platform [Windows] Rev [3212] VERSION [80]
# SWAN Device 0 :
# Name : GeForce RTX 2080 Ti
# ECC : Disabled
# Global mem : 11264MB
# Capability : 7.5
# PCI ID : 0000:01:00.0
# Device clock : 1755MHz
# Memory clock : 7000MHz
# Memory width : 352bit
# Driver version : r436_45 : 43648
#SWAN: FATAL: cannot find image for module [.nonbonded.cu.] for device version 750

</stderr_txt>
]]>


Turing GPU cards are only able to do TEST Work Units at the moment.
You will need to change your GPUGRID settings to ensure only TEST Work Units are accepted for your Turing GPU.
The above errors occur for ACEMD2 Work units on Turing based cards.
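
A rough way to spot this case from a task's stderr is to read the reported compute capability. The sketch below assumes the `# Capability : 7.5` line format shown in the log above and takes capability 7.5 to mean Turing; `compute_capability` and `is_turing` are made-up helper names, not part of BOINC or ACEMD.

```python
import re


def compute_capability(stderr_text):
    """Extract (major, minor) from a '# Capability : X.Y' line, or None."""
    match = re.search(r"#\s*Capability\s*:\s*(\d+)\.(\d+)", stderr_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_turing(stderr_text):
    """Hypothetical check: compute capability 7.5 is taken to mean Turing."""
    return compute_capability(stderr_text) == (7, 5)


# Three lines copied from the stderr quoted above.
sample = """# Name : GeForce RTX 2080 Ti
# Capability : 7.5
# Driver version : r436_45 : 43648"""

print(compute_capability(sample))  # (7, 5)
print(is_turing(sample))           # True
```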
Erich56

Joined: 1 Jan 15
Posts: 1166
Credit: 12,260,898,501
RAC: 869
Message 52777 - Posted: 4 Oct 2019, 15:21:32 UTC - in response to Message 52774.  

Turing GPU cards are only able to do TEST Work Units at the moment.
You will need to change your GPUGRID settings to ensure only TEST Work Units are accepted for your Turing GPU.
The above errors occur for ACEMD2 Work units on Turing based cards.

What about the other way round? Will the TEST Work Units run on pre-Turing cards?
Diplomat

Joined: 1 Sep 10
Posts: 15
Credit: 888,018,989
RAC: 36,601
Message 52779 - Posted: 4 Oct 2019, 17:56:52 UTC - in response to Message 52742.  


7.2.42 is really old (but the latest on the Berkeley download page). Very likely the client is estimating wrong in addition to misidentifying the CPU. apt-get under Ubuntu 18.04 got me BOINC version 7.16.1.


How did you get 7.16? I have the same version of Ubuntu, but the repos only offer 7.9.3.
Keith Myers

Joined: 13 Dec 17
Posts: 1416
Credit: 9,119,446,190
RAC: 614,515
Message 52780 - Posted: 4 Oct 2019, 18:20:37 UTC - in response to Message 52779.  

He must have installed the PPA. The Ubuntu 18.04 distro only has BOINC 7.9.3.
Or he went to the source and compiled it himself.
rod4x4

Joined: 4 Aug 14
Posts: 266
Credit: 2,219,935,054
RAC: 0
Message 52782 - Posted: 4 Oct 2019, 23:32:08 UTC - in response to Message 52777.  
Last modified: 4 Oct 2019, 23:50:31 UTC

Turing GPU cards are only able to do TEST Work Units at the moment.
You will need to change your GPUGRID settings to ensure only TEST Work Units are accepted for your Turing GPU.
The above errors occur for ACEMD2 Work units on Turing based cards.

What about the other way round? Will the TEST Work Units run on pre-Turing cards?

The TEST work units seem to be backward compatible.
My Pascal cards are receiving TEST work units and processing successfully. Interestingly, my Maxwell cards have not received a TEST work unit, but that could just be luck of the draw.
EDIT: The drivers on my Maxwell cards are quite old (v388 to v391), which would explain why they are not receiving TEST work units. Nvidia driver version 418.39 or above is required for CUDA 10.1.
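
The driver requirement can be checked with a simple version comparison. This is a sketch only: the 418.39 threshold is the figure quoted above, `parse_driver_version` and `driver_ok_for_cuda_10_1` are made-up helper names, and how the project's scheduler actually decides which hosts get TEST work units is not shown in this thread.

```python
MIN_DRIVER_CUDA_10_1 = (418, 39)  # threshold as quoted in the post above


def parse_driver_version(version):
    """Turn a string such as '391.35' into a tuple of ints, e.g. (391, 35)."""
    return tuple(int(part) for part in version.split("."))


def driver_ok_for_cuda_10_1(version):
    """Hypothetical helper: True if the driver meets the quoted minimum."""
    return parse_driver_version(version) >= MIN_DRIVER_CUDA_10_1


print(driver_ok_for_cuda_10_1("391.35"))  # False: too old for CUDA 10.1
print(driver_ok_for_cuda_10_1("436.48"))  # True
```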
Jim1348

Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Message 52783 - Posted: 5 Oct 2019, 1:26:59 UTC - in response to Message 52779.  

[PUGLIA] kidkidkid3

Joined: 23 Feb 11
Posts: 101
Credit: 1,589,743,957
RAC: 302,797
Message 52784 - Posted: 5 Oct 2019, 8:42:36 UTC - in response to Message 52783.  
Last modified: 5 Oct 2019, 8:43:43 UTC

Hi,
this is my latest error on Windows after a suspend/resume action.

http://www.gpugrid.net/result.php?resultid=21426852

I hope it helps.
K.
Dreams do not always come true. But not because they are too big or impossible. Why did we stop believing.
(Martin Luther King)
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 295,172
Message 52785 - Posted: 5 Oct 2019, 11:13:50 UTC - in response to Message 52783.  

How did you get 7.16?

https://launchpad.net/~costamagnagianfranco/+archive/ubuntu/boinc

Be careful with that one. It has not yet passed full release testing, and several serious bugs have been found already. Gianfranco is good at updating the PPA as bugs are eliminated, but doesn't increment the version number independently. I think the current PPA numbered 7.16.3 has all except one of the fixes needed: there will probably be at least a 7.16.4 before this saga is finished.
Jim1348

Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Message 52786 - Posted: 5 Oct 2019, 11:25:40 UTC - in response to Message 52785.  
Last modified: 5 Oct 2019, 11:35:02 UTC

Thank you for the expert advice. So far 7.16.3 has worked on three other Ubuntu machines and one Win7. I did get a lot of extra downloads on WCG that I had never seen before, but expect that is a problem at their end(?). That is mainly because I have seen their "settings" reset every few months, and don't entirely trust their servers on that. Also, I just did a manual update of BOINC with no more extraneous work units downloaded, so it seems to be OK now.
w1hue

Joined: 28 Sep 09
Posts: 21
Credit: 471,394,734
RAC: 47,512
Message 52793 - Posted: 6 Oct 2019, 23:01:48 UTC

When will there be some WUs that will run without erroring out on Windows machines? And not suck up an entire CPU in addition to the GPU?

[PUGLIA] kidkidkid3

Joined: 23 Feb 11
Posts: 101
Credit: 1,589,743,957
RAC: 302,797
Message 52794 - Posted: 7 Oct 2019, 10:40:59 UTC - in response to Message 52784.  

Hi,
this is my latest error on Windows after a suspend/resume action.

http://www.gpugrid.net/result.php?resultid=21426852

I hope it helps.
K.


Another error on a Windows acemd3 test WU, this time without a suspend/resume action.

http://www.gpugrid.net/result.php?resultid=21428934

K.
Aurum

Joined: 12 Jul 17
Posts: 404
Credit: 17,408,899,587
RAC: 2
Message 52796 - Posted: 7 Oct 2019, 14:48:41 UTC - in response to Message 52779.  
Last modified: 7 Oct 2019, 14:48:59 UTC


7.2.42 is really old (but the latest on the Berkeley download page). Very likely the client is estimating wrong in addition to misidentifying the CPU. apt-get under Ubuntu 18.04 got me BOINC version 7.16.1.


How did you get 7.16? I have the same version of Ubuntu, but the repos only offer 7.9.3.

https://boinc.berkeley.edu/download_all.php
Aurum

Joined: 12 Jul 17
Posts: 404
Credit: 17,408,899,587
RAC: 2
Message 52797 - Posted: 7 Oct 2019, 14:49:55 UTC - in response to Message 52793.  

When will there be some WUs that will run without erroring out on Windows machines? And not suck up an entire CPU in addition to the GPU?

They will always need their own CPU.

©2025 Universitat Pompeu Fabra