New CUDA65 beta app


MJH
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Message 38148 - Posted: 29 Sep 2014, 9:49:51 UTC

Dear all, please give the new acemdbeta app, ver 845, a workout. This supports all GPUs now.
It's Windows only - if you don't get WUs, you'll need to update your driver.

Matt

eXaPower
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Message 38150 - Posted: 29 Sep 2014, 10:02:06 UTC
Last modified: 29 Sep 2014, 10:10:54 UTC

Matt, is the 343.98 driver accepted? I've been trying to get beta tasks.

14/09/29 06:12:36 | GPUGRID | No tasks are available for ACEMD beta version

My preferences are set correctly: the "run testing app" and beta app options are checked, and I am not accepting short or long tasks. I never update to WHQL drivers, because they are limited in certain functional areas, unlike the beta or developer drivers.

MJH
Message 38152 - Posted: 29 Sep 2014, 10:23:10 UTC - in response to Message 38150.  
Last modified: 29 Sep 2014, 10:26:35 UTC

Huh, yes. You should be getting something...
According to the logs, your host #159309 was given work at 12:15 CEST.

eXaPower
Message 38153 - Posted: 29 Sep 2014, 10:49:17 UTC - in response to Message 38152.  

14/09/29 06:48:02 | GPUGRID | No tasks are available for ACEMD beta version

Strange, I see no beta tasks running in BOINC Manager. I just tried again. If the driver is accepted, I will keep trying. Thanks for the help.

14/09/29 06:50:55 | GPUGRID | No tasks are available for ACEMD beta version

MJH
Message 38154 - Posted: 29 Sep 2014, 10:52:33 UTC - in response to Message 38153.  

*Now* you should get something..

eXaPower
Message 38157 - Posted: 29 Sep 2014, 11:03:29 UTC - in response to Message 38156.  
Last modified: 29 Sep 2014, 11:33:21 UTC

I did, indeed.


Update: (unknown error) - exit code -97 (0xffffff9f) after 8 s

The simulation has become unstable. Terminating to avoid lock-up (1)

This is the first time I've had this error during my time at GPUGRID. GPU1 temperature was 58C.
If you don't mind errors, I will try again.

Update 2: same error. GPU usage goes to 90% for a few seconds, then drops to 0%, and then the task crashes.

Bedrich Hajek
Joined: 28 Mar 09
Posts: 490
Credit: 11,731,645,728
RAC: 47,738
Message 38158 - Posted: 29 Sep 2014, 11:14:37 UTC

On the first test unit, I got an error.

9/29/2014 7:13:41 AM | GPUGRID | Computation for task 21-MJHARVEY_TEST4000-0-10-RND0794_0 finished
9/29/2014 7:13:41 AM | GPUGRID | Output file 21-MJHARVEY_TEST4000-0-10-RND0794_0_1 for task 21-MJHARVEY_TEST4000-0-10-RND0794_0 absent
9/29/2014 7:13:41 AM | GPUGRID | Output file 21-MJHARVEY_TEST4000-0-10-RND0794_0_2 for task 21-MJHARVEY_TEST4000-0-10-RND0794_0 absent
9/29/2014 7:13:41 AM | GPUGRID | Output file 21-MJHARVEY_TEST4000-0-10-RND0794_0_3 for task 21-MJHARVEY_TEST4000-0-10-RND0794_0 absent



Name 21-MJHARVEY_TEST4000-0-10-RND0794_0
Workunit 10123268
Created 29 Sep 2014 | 9:50:11 UTC
Sent 29 Sep 2014 | 11:10:35 UTC
Received 29 Sep 2014 | 11:13:11 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status -97 (0xffffffffffffff9f) Unknown error number
Computer ID 127986
Report deadline 4 Oct 2014 | 11:10:35 UTC
Run time 4.10
CPU time 3.48
Validate state Invalid
Credit 0.00
Application version ACEMD beta version v8.45 (cuda65)
Stderr output
<core_client_version>7.2.42</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -97 (0xffffff9f)
</message>
<stderr_txt>
# GPU [GeForce GTX 690] Platform [Windows] Rev [3212] VERSION [65]
# SWAN Device 2 :
# Name : GeForce GTX 690
# ECC : Disabled
# Global mem : 2048MB
# Capability : 3.0
# PCI ID : 0000:07:00.0
# Device clock : 1019MHz
# Memory clock : 3004MHz
# Memory width : 256bit
# Driver version : r343_98 : 34411
# GPU 0 : 67C
# GPU 1 : 42C
# GPU 2 : 69C
# GPU 3 : 70C
# The simulation has become unstable. Terminating to avoid lock-up (1)

</stderr_txt>
]]>



eXaPower
Message 38163 - Posted: 29 Sep 2014, 12:46:15 UTC
Last modified: 29 Sep 2014, 13:30:12 UTC

Update #3: I've received 5 beta tasks. All have failed, but two caused a system hang (no error files).

FATAL : Cuda driver error 702 in file 'swanlibnv2.cpp' in line 1965/ Simulation unstable. Flag 11 value 1
# The simulation has become unstable. Terminating to avoid lock-up
# The simulation has become unstable. Terminating to avoid lock-up (2)

Simulation unstable. Flag 11 value 1
# The simulation has become unstable. Terminating to avoid lock-up
# The simulation has become unstable. Terminating to avoid lock-up (2)

Update #4: still failing on both cards with the same error:

(unknown error) - exit code -97 (0xffffff9f)


http://www.gpugrid.net/workunit.php?wuid=10099983

This work unit has 3 Linux failures (all with GTX 780) and 2 Win8.1 failures.

Update #5: I received 5 more beta tasks, for a total of ten. All failed with the same error number. All tasks started fine (90%+ GPU usage / 14% MCU), with progress advancing in .016 intervals, before failing. A wingman with a Tesla K20c/GTX 780 (C.C. 3.5), along with a C.C. 3.0 wingman, also failed.

MJH
Message 38164 - Posted: 29 Sep 2014, 12:50:59 UTC - in response to Message 38163.  

Yes, looks like CUDA65 is bad on everything but GM204s. Ho hum.

Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Message 38166 - Posted: 29 Sep 2014, 13:18:43 UTC
Last modified: 29 Sep 2014, 13:19:08 UTC

-97 error here, on my GTX 460
# Simulation unstable. Flag 11 value 1
# The simulation has become unstable. Terminating to avoid lock-up
# The simulation has become unstable. Terminating to avoid lock-up (2)

=========================

http://www.gpugrid.net/result.php?resultid=13149151
Name 43-MJHARVEY_TEST1999-1-10-RND5744_2
Workunit 10123176
Created 29 Sep 2014 | 11:40:28 UTC
Sent 29 Sep 2014 | 12:54:10 UTC
Received 29 Sep 2014 | 13:17:01 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status -97 (0xffffffffffffff9f) Unknown error number
Computer ID 153764
Report deadline 4 Oct 2014 | 12:54:10 UTC
Run time 2.56
CPU time 0.00
Validate state Invalid
Credit 0.00
Application version ACEMD beta version v8.45 (cuda65)
Stderr output

<core_client_version>7.4.22</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -97 (0xffffff9f)
</message>
<stderr_txt>
# GPU [GeForce GTX 460] Platform [Windows] Rev [3212] VERSION [65]
# SWAN Device 1 :
# Name : GeForce GTX 460
# ECC : Disabled
# Global mem : 1024MB
# Capability : 2.1
# PCI ID : 0000:07:00.0
# Device clock : 1526MHz
# Memory clock : 1900MHz
# Memory width : 256bit
# Driver version : r343_98 : 34411
# Simulation unstable. Flag 11 value 1
# The simulation has become unstable. Terminating to avoid lock-up
# The simulation has become unstable. Terminating to avoid lock-up (2)

</stderr_txt>
]]>

Jacob Klein
Message 38169 - Posted: 29 Sep 2014, 13:30:10 UTC

Yikes, I'm seeing these same errors on the Short Run queue -- I guess the Cuda65 app has been deployed there too?

MJH
Message 38171 - Posted: 29 Sep 2014, 13:49:11 UTC - in response to Message 38169.  

It was on acemdshort briefly. It is no longer.

Matt

eXaPower
Message 38172 - Posted: 29 Sep 2014, 14:10:59 UTC

14/09/29 09:48:39 | GPUGRID | No tasks are available for ACEMD beta version

Has beta app been pulled for non-C.C 5.2 cards?

MJH
Message 38173 - Posted: 29 Sep 2014, 14:39:10 UTC - in response to Message 38172.  


Has beta app been pulled for non-C.C 5.2 cards?


Yes, it's served its purpose there. The CUDA65 build is broken on non-5.2 cards.

Matt

MJH
Message 38175 - Posted: 29 Sep 2014, 17:37:26 UTC

846 on acemdbeta now. CUDA65 for sm 3.0 and higher.

Matt
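
For anyone checking their own results against the "sm 3.0 and higher" note, the stderr blocks posted earlier in this thread already contain a `# Capability : X.Y` line. The sketch below is only an illustration of reading that line and comparing it with a minimum; `parse_stderr` and `meets_minimum` are hypothetical helper names, the sample text is trimmed from the GTX 690 report above, and how the project server actually assigns app versions is not described in this thread.

```python
import re

# Trimmed from the GTX 690 stderr output quoted earlier in this thread.
SAMPLE = """\
# GPU [GeForce GTX 690] Platform [Windows] Rev [3212] VERSION [65]
# SWAN Device 2 :
# Name : GeForce GTX 690
# Capability : 3.0
# Driver version : r343_98 : 34411
# The simulation has become unstable. Terminating to avoid lock-up (1)
"""

# Matches "# key : value" lines; the key ends at the first colon.
FIELD = re.compile(r"^#\s*(?P<key>[^:]+?)\s*:\s*(?P<value>.+?)\s*$")


def parse_stderr(text):
    """Collect 'key : value' fields from an ACEMD-style stderr dump."""
    fields = {}
    for line in text.splitlines():
        match = FIELD.match(line)
        if match:
            # Keep the first occurrence of each key.
            fields.setdefault(match.group("key"), match.group("value"))
    return fields


def meets_minimum(capability, minimum=(3, 0)):
    """Compare a 'major.minor' compute capability string with a minimum tuple."""
    major, minor = (int(part) for part in capability.split("."))
    return (major, minor) >= minimum


if __name__ == "__main__":
    info = parse_stderr(SAMPLE)
    print(info["Name"], info["Capability"], meets_minimum(info["Capability"]))
    print("unstable:", "simulation has become unstable" in SAMPLE)
```

Comparing `(major, minor)` tuples rather than floats avoids mis-ordering values such as 3.10 and 3.5, should such version numbers ever appear.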

Jacob Klein
Message 38176 - Posted: 29 Sep 2014, 18:20:38 UTC

My 2 GTX 660 Tis, and my GTX 460, in my main rig, are now successfully simultaneously crunching 3 ACEMD beta version 8.46 (cuda65) tasks.

Thank you!

eXaPower
Message 38177 - Posted: 29 Sep 2014, 18:27:58 UTC - in response to Message 38175.  

So far, so good. Progress advances in .004% intervals: 1.000% in four minutes, with an estimated 24,000 s to completion.
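
The 24,000 s figure is consistent with a simple linear extrapolation of 1% in four minutes. A minimal sketch of that arithmetic follows, assuming progress stays linear for the whole task (which may not hold in practice); `estimate_total_seconds` is a hypothetical helper, not something BOINC exposes.

```python
def estimate_total_seconds(progress_percent, elapsed_seconds):
    """Linearly extrapolate total run time from progress made so far."""
    if progress_percent <= 0:
        raise ValueError("progress must be positive to extrapolate")
    return elapsed_seconds * 100.0 / progress_percent


if __name__ == "__main__":
    total = estimate_total_seconds(1.0, 4 * 60)  # 1% after four minutes
    print(f"{total:.0f} s total, about {total / 3600:.1f} hours")
```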

Jacob Klein
Message 38178 - Posted: 29 Sep 2014, 18:32:29 UTC

Matt:

I even think the canary behavior works better for me now. I tried the scenario where it was failing on the 8.41 app, and now it worked fine without failure on the 8.46 beta app.

Can you please explain, in detail, how the canary behavior was changed? How exactly does it behave in 8.46?

Thanks,
Jacob

Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Message 38179 - Posted: 29 Sep 2014, 18:46:49 UTC

Running fine after 25 minutes on a GTX 650 Ti. It will complete in 3 hours 16 minutes (344.11 driver, Win7 64-bit).

Jim1348
Message 38180 - Posted: 29 Sep 2014, 22:17:47 UTC
Last modified: 29 Sep 2014, 22:19:25 UTC

It completed OK on the GTX 650 Ti, but seems to be causing problems on some higher-end cards. But their versions of ACEMD probably have more changes than the one I got (8.46).
http://www.gpugrid.net/workunit.php?wuid=10123336

I will be trying my GTX 660 Ti next on the same machine to see what happens.

©2025 Universitat Pompeu Fabra