all WUs downloaded recently produce "computation error" right away

Jacob Klein
Message 46930 - Posted: 15 Apr 2017, 21:38:21 UTC
Last modified: 15 Apr 2017, 21:45:16 UTC

MJH:

Here's what I'm seeing with 9.15:
- My PC "Speed", that has GTX 980 Ti x2 .... appears to work fine with 9.15
- My PC "RacerX", that has GTX 970 and GTX 660 Ti x2 ... appears to have problems running 9.15 apps on the 2 GTX 660 Ti GPUs. See below.

I thought CC 3.0 GPUs were still supported.
Any ideas?

<core_client_version>7.7.2</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -59 (0xffffffc5)
</message>
<stderr_txt>
# GPU [GeForce GTX 660 Ti] Platform [Windows] Rev [3212] VERSION [80]
# SWAN Device 1	:
#	Name		: GeForce GTX 660 Ti
#	ECC		: Disabled
#	Global mem	: 3072MB
#	Capability	: 3.0
#	PCI ID		: 0000:07:00.0
#	Device clock	: 1045MHz
#	Memory clock	: 3004MHz
#	Memory width	: 192bit
#	Driver version	: r381_64 : 38178
#SWAN: FATAL: cannot find image for module [.nonbonded.cu.] for device version 300

</stderr_txt>
]]>
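For anyone triaging similar reports, here is a minimal Python sketch that pulls the device name, compute capability, and the SWAN fatal message out of stderr text shaped like the output above. The function name `summarize_stderr` and the regular expressions are assumptions made for illustration, based only on the log lines quoted in this thread; they are not part of BOINC or the GPUGRID application.

```python
import re
from typing import Dict, Optional

# Patterns inferred from the stderr lines quoted in this thread (an assumption,
# not a documented log format).
NAME_RE = re.compile(r"^#\s*Name\s*:\s*(.+?)\s*$", re.MULTILINE)
CAPABILITY_RE = re.compile(r"^#\s*Capability\s*:\s*(\d+)\.(\d+)\s*$", re.MULTILINE)
FATAL_RE = re.compile(r"^#?\s*SWAN\s*:\s*FATAL:?\s*(.+?)\s*$", re.MULTILINE)


def summarize_stderr(text: str) -> Dict[str, Optional[str]]:
    """Return the last reported device, its capability, and the first fatal message."""
    names = NAME_RE.findall(text)
    caps = CAPABILITY_RE.findall(text)
    fatal = FATAL_RE.search(text)
    return {
        "device": names[-1] if names else None,
        "capability": ".".join(caps[-1]) if caps else None,
        "fatal": fatal.group(1) if fatal else None,
    }


# Sample copied from the post above (tabs written out explicitly).
SAMPLE = "\n".join([
    "# GPU [GeForce GTX 660 Ti] Platform [Windows] Rev [3212] VERSION [80]",
    "# SWAN Device 1\t:",
    "#\tName\t\t: GeForce GTX 660 Ti",
    "#\tCapability\t: 3.0",
    "#\tDriver version\t: r381_64 : 38178",
    "#SWAN: FATAL: cannot find image for module [.nonbonded.cu.] for device version 300",
])

if __name__ == "__main__":
    print(summarize_stderr(SAMPLE))
```

Taking the last device block rather than the first matters for logs where a task restarts on a different GPU, since the fatal line belongs to the most recent device.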
ID: 46930 · Rating: 0
Richard Haselgrove
Message 46931 - Posted: 15 Apr 2017, 21:43:15 UTC - in response to Message 46930.  

Interesting. The machines I'm having problems with also have dual GPUs of different vintages - mine have a secondary GTX 750Ti in both cases. But GPUGrid is excluded from the 750s, and only runs on the 970s.
ID: 46931 · Rating: 0
MJH
Message 46932 - Posted: 15 Apr 2017, 21:43:26 UTC - in response to Message 46930.  

For some reason the sm 3.0 support (and only that sm version) is broken.
That'll need a 9.16
ID: 46932 · Rating: 0
MJH
Message 46933 - Posted: 15 Apr 2017, 21:47:24 UTC - in response to Message 46931.  

Richard, the problem with your machines is (at least) the driver version.
ID: 46933 · Rating: 0
Jacob Klein
Message 46934 - Posted: 15 Apr 2017, 21:57:43 UTC

Thanks MJH.

I'll do my best to test 9.16 when it's released, though I sure wish I could get email notifications for my subscribed threads ... that's been broken for a while :/ Maybe you could take a peek.
ID: 46934 · Rating: 0
MJH
Message 46935 - Posted: 15 Apr 2017, 22:08:54 UTC - in response to Message 46934.  

9.16 should be along in about 15 mins.
ID: 46935 · Rating: 0
Tom Miller
Message 46936 - Posted: 15 Apr 2017, 22:11:16 UTC - in response to Message 46933.  

Failing on all machines.

Windows 10 x64 on all.

4 X GTX670's
3 X GTX680's
4 X GTX770's
2 X GTX780Ti's

They all have late driver versions.
ID: 46936 · Rating: 0
Richard Haselgrove
Message 46937 - Posted: 15 Apr 2017, 22:13:01 UTC - in response to Message 46933.  
Last modified: 15 Apr 2017, 22:24:26 UTC

Richard, the problem with your machines is (at least) the driver version.

I was just coming to that conclusion myself. Clean install of 381.65 completed, machine rebooted, and task e67s40_e47s2p0f68-PABLO_P04637_0_IDP-0-1-RND0199_4 is running normally. But that's an old task from 13 April, with three previous v9.15 failures. Too late to investigate whether they might have lower drivers too, or some other problem.

I'd like to verify that tasks like yesterday's ADRIA_FOLDGREED10_crystal_ss_contacts_100_ubiquitin batch run OK before we completely sign this one off, but that can wait until tomorrow (or later next week).

Apologies for interrupting your weekend - hope you can have a good break in what remains of it.


Edit - OK, I peeked :-)

Task 16239400 has the ERR_TOO_MANY_EXITS problem, and it's described as

GeForce GTX 1060 6GB (4095MB) driver: 368.81

We're going to have to work out where the break-point occurs in the 360+ driver sequence, and put out an APB to upgrade - or a min_version in the plan_class. Next week. G'night.
ID: 46937 · Rating: 0
MJH
Message 46938 - Posted: 15 Apr 2017, 22:44:02 UTC - in response to Message 46937.  

9.16 is out now: this should work with sm 3.0 GPUs.

I've revised the scheduler to refuse work to anything with driver < 370.00, which was the previous minimum for CUDA 8. Seems the "supported version" reported by the driver is as unreliable as ever.

Matt
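The driver cut-off described above can be sketched in a few lines of Python. This is only an illustration of the comparison, not the actual scheduler code: the names `parse_driver` and `should_send_work` are hypothetical, and the integer encoding (major * 100 + minor) is an assumption based on the `38178` value that appears in the stderr logs in this thread.

```python
def parse_driver(version: str) -> int:
    """Convert '381.65' to 38165 (assumes a two-digit minor part, as in this thread)."""
    major, _, minor = version.strip().partition(".")
    return int(major) * 100 + int(minor or 0)


# Minimum quoted in the post above.
MIN_DRIVER = parse_driver("370.00")


def should_send_work(driver: str, minimum: int = MIN_DRIVER) -> bool:
    """True when the host's driver meets the minimum version."""
    return parse_driver(driver) >= minimum


if __name__ == "__main__":
    # Driver versions mentioned by posters in this thread.
    for version in ("359.06", "368.81", "376.53", "381.65"):
        print(version, should_send_work(version))
```

Comparing integers rather than floats avoids rounding surprises and mirrors how the version seems to be logged by the client.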
ID: 46938 · Rating: 0
Richard Haselgrove
Message 46939 - Posted: 15 Apr 2017, 22:54:22 UTC - in response to Message 46938.  

OK, when I start upgrading my other two tomorrow morning I'll start with 372.54, and if that works, probably stick at 373.06 (first and last of the 372 series respectively). If that doesn't work, rinse and repeat with 375.63 / 376.33 and so on.
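The upgrade plan above amounts to walking the driver series in order: try the first release of a series, and if it works, settle on the last release of that series. A small Python sketch of that loop follows. The `works` callback is hypothetical (in practice it stands for installing the driver and watching whether a task runs), and the series list contains only the versions named in the post.

```python
from typing import Callable, Optional, Sequence, Tuple

# (first, last) release of each driver series, as listed in the post above.
SERIES: Sequence[Tuple[str, str]] = [
    ("372.54", "373.06"),
    ("375.63", "376.33"),
]


def pick_driver(
    series: Sequence[Tuple[str, str]],
    works: Callable[[str], bool],
) -> Optional[str]:
    """Return the last release of the earliest series whose first release works."""
    for first, last in series:
        if works(first):
            return last
    return None  # nothing in the list worked; extend the list and repeat


if __name__ == "__main__":
    # Hypothetical outcome used only to exercise the loop.
    print(pick_driver(SERIES, lambda version: version.startswith("375")))
```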
ID: 46939 · Rating: 0
Jacob Klein
Message 46940 - Posted: 15 Apr 2017, 23:18:22 UTC
Last modified: 15 Apr 2017, 23:19:06 UTC

MJH:
9.16 tasks are still not working for my GTX 660 Ti GPUs.


Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	-52 (0xffffffffffffffcc) Unknown error number
Computer ID	153764
Report deadline	20 Apr 2017 | 23:01:28 UTC
Run time	2.88
CPU time	0.00
Validate state	Invalid
Credit	0.00
Application version	Long runs (8-12 hours on fastest card) v9.16 (cuda80)



Stderr output

<core_client_version>7.7.2</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -52 (0xffffffcc)
</message>
<stderr_txt>
# GPU [GeForce GTX 660 Ti] Platform [Windows] Rev [3212] VERSION [80]
# SWAN Device 1	:
#	Name		: GeForce GTX 660 Ti
#	ECC		: Disabled
#	Global mem	: 3072MB
#	Capability	: 3.0
#	PCI ID		: 0000:07:00.0
#	Device clock	: 1045MHz
#	Memory clock	: 3004MHz
#	Memory width	: 192bit
#	Driver version	: r381_64 : 38178
SWAN : FATAL Unable to load module .nonbonded.cu. (300)

</stderr_txt>
]]>
ID: 46940 · Rating: 0
Bedrich Hajek
Message 46941 - Posted: 16 Apr 2017, 0:18:40 UTC

I updated my Windows 10 computer from NVIDIA driver 359.06 to 381.65. Everything is running fine so far, but this driver is slightly slower.

Now, what is going to happen to Windows XP? I would like to see it supported a little bit longer, and please don't tell me to upgrade.

If you are going to retire CUDA 6.5, then please give us a warning beforehand, and don't do it in this abrupt and amateurish manner.

And do keep track of your licenses' expiration dates and give us ample warning when we need to upgrade our software!

Thank you!!


ID: 46941 · Rating: 0
PappaLitto
Message 46943 - Posted: 16 Apr 2017, 2:21:23 UTC - in response to Message 46941.  
Last modified: 16 Apr 2017, 2:21:32 UTC

Now, what is going to happen to Windows XP? I would like to see it supported a little bit longer, and please don't tell me to upgrade.

If you are going to retire CUDA 6.5, then please give us a warning beforehand, and don't do it in this abrupt and amateurish manner.

And do keep track of your licenses' expiration dates and give us ample warning when we need to upgrade our software!

Thank you!!

+1
ID: 46943 · Rating: 0
DrBob
Message 46944 - Posted: 16 Apr 2017, 2:23:42 UTC

My GTX750Ti - driver 376.53 & GTX1050Ti - driver 378.92 are working fine now.

2 GTX460 running driver 378.92 are not getting any work even though they are above the minimum driver level and show CUDA version 8.0...

4/15/2017 9:06:31 PM | GPUGRID | Sending scheduler request: To fetch work.
4/15/2017 9:06:31 PM | GPUGRID | Requesting new tasks for Miner ASIC and NVIDIA GPU
4/15/2017 9:06:33 PM | GPUGRID | Scheduler request completed: got 0 new tasks
4/15/2017 9:06:33 PM | GPUGRID | No tasks sent
4/15/2017 9:06:33 PM | GPUGRID | No tasks are available for Short runs (2-3 hours on fastest card)
4/15/2017 9:06:33 PM | GPUGRID | No tasks are available for Long runs (8-12 hours on fastest card)
ID: 46944 · Rating: 0
Jacob Klein
Message 46947 - Posted: 16 Apr 2017, 3:00:16 UTC
Last modified: 16 Apr 2017, 3:17:59 UTC

9.18 is also not working for my GTX 660 Ti GPUs.
SWAN : FATAL Unable to load module .nonbonded.cu. (300)

Also, the 9.x tasks don't respond well to suspending: it sometimes takes 10-20 seconds after issuing the suspend command before they actually suspend!


Here you can see where it was fine on the GTX 970, but immediately failed on the GTX 660 Ti:
https://www.gpugrid.net/result.php?resultid=16242220

Note: I am running a pre-release version of Windows 10 - but I wouldn't think that would matter, would it?

Stderr output

<core_client_version>7.7.2</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -52 (0xffffffcc)
</message>
<stderr_txt>
# GPU [GeForce GTX 970] Platform [Windows] Rev [3212] VERSION [80]
# SWAN Device 0	:
#	Name		: GeForce GTX 970
#	ECC		: Disabled
#	Global mem	: 4096MB
#	Capability	: 5.2
#	PCI ID		: 0000:09:00.0
#	Device clock	: 1367MHz
#	Memory clock	: 3505MHz
#	Memory width	: 256bit
#	Driver version	: r381_64 : 38178
# GPU 0 : 68C
# GPU 1 : 67C
# GPU 2 : 54C
# GPU 2 : 57C
# GPU 2 : 58C
# GPU 2 : 59C
# GPU 1 : 68C
# GPU 2 : 60C
# GPU 1 : 69C
# GPU 1 : 71C
# GPU 1 : 72C
# GPU [GeForce GTX 970] Platform [Windows] Rev [3212] VERSION [80]
# SWAN Device 0	:
#	Name		: GeForce GTX 970
#	ECC		: Disabled
#	Global mem	: 4096MB
#	Capability	: 5.2
#	PCI ID		: 0000:09:00.0
#	Device clock	: 1367MHz
#	Memory clock	: 3505MHz
#	Memory width	: 256bit
#	Driver version	: r381_64 : 38178
# GPU 0 : 67C
# GPU 1 : 66C
# GPU 2 : 58C
# GPU 1 : 69C
# GPU 2 : 59C
# GPU [GeForce GTX 660 Ti] Platform [Windows] Rev [3212] VERSION [80]
# SWAN Device 1	:
#	Name		: GeForce GTX 660 Ti
#	ECC		: Disabled
#	Global mem	: 3072MB
#	Capability	: 3.0
#	PCI ID		: 0000:07:00.0
#	Device clock	: 1045MHz
#	Memory clock	: 3004MHz
#	Memory width	: 192bit
#	Driver version	: r381_64 : 38178
SWAN : FATAL Unable to load module .nonbonded.cu. (300)

</stderr_txt>
]]>
ID: 46947 · Rating: 0
Erich56
Message 46950 - Posted: 16 Apr 2017, 4:43:35 UTC - in response to Message 46921.  

I was expecting it to work on 64-bit XP, actually. Given that it doesn't, there's not a tremendous amount I can do to fix it immediately.

We haven't had an XP test platform for a long time: Microsoft ended support for it 3 years ago! You really should upgrade...
Matt

Matt,
It's a bit off-topic, but let me explain:
These Windows XP x64 hosts are dedicated crunching boxes (so it does not matter that their OS is no longer supported). A lot of effort has been put into making the GTX 980 Ti work under Windows XP: selecting the right MB, "hacking" the NV driver to recognize the top-end cards, etc. The reason *not* to upgrade them from Windows XP is to maximize their throughput (avoiding WDDM).

... But for now, if you could make a fresh CUDA 6.5 client, that would be great (and it would save us a lot of work).
Thank you in advance!

I can only fully underline what Zoltan is saying, and hope that all the many crunchers using XP for good reason will be able to continue for a while.
ID: 46950 · Rating: 0
Erich56
Message 46951 - Posted: 16 Apr 2017, 6:15:40 UTC
Last modified: 16 Apr 2017, 6:35:10 UTC

I now tried task
e4s8_e2s3p0f357-ADRIA_FOLDGREED10_crystal_ss_contacts_100_ubiquitin_4-1-2-RND4569_2

on my Windows 10 64-bit, driver 376.53, acemd 918.80.

It errored out after 1.30 seconds:
(unknown error) - exit code -1073741790 (0xc0000022).

Why so?
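A note on reading these exit codes: the negative decimal number and the hex value in parentheses are the same 32-bit value, shown signed and unsigned. The Python sketch below reproduces the conversion for the codes quoted in this thread. The helper name `exit_code_hex` is made up for illustration. The lookup table has a single entry: 0xC0000022 is the Windows NTSTATUS value STATUS_ACCESS_DENIED. Whether access denial is the actual cause of this particular failure is not established anywhere in the thread.

```python
def exit_code_hex(code: int, bits: int = 32) -> str:
    """Render a signed exit code as fixed-width two's-complement hex."""
    mask = (1 << bits) - 1
    return f"0x{code & mask:0{bits // 4}x}"


# Only the one NTSTATUS value relevant here; not a complete table.
KNOWN_NTSTATUS = {
    0xC0000022: "STATUS_ACCESS_DENIED",
}

if __name__ == "__main__":
    for code in (-59, -52, -1073741790):
        unsigned = code & 0xFFFFFFFF
        label = KNOWN_NTSTATUS.get(unsigned, "application-defined or unknown")
        print(code, exit_code_hex(code), label)
    # The task summary page shows the same -52 widened to 64 bits.
    print(exit_code_hex(-52, bits=64))
```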
ID: 46951 · Rating: 0
Erich56
Message 46952 - Posted: 16 Apr 2017, 7:05:06 UTC - in response to Message 46951.  
Last modified: 16 Apr 2017, 7:28:30 UTC

on my Windows 10 64-bit, driver 376.53, acemd 918.80.

I now updated the driver to 381.65 and downloaded

e5s6_e3s4p0f494-ADRIA_FOLDGREED50_crystal_ss_contacts_100_ubiquitin_3-0-2-RND2192_0

It's been running well for 10 minutes ... so let's keep our fingers crossed.
The card is a GTX970.

Still I hope that a solution can be found for XP.
ID: 46952 · Rating: 0
[CSF] Thomas H.V. DUPONT
Message 46956 - Posted: 16 Apr 2017, 7:52:58 UTC

Windows 10/64-bit
GenuineIntel Intel(R) Core(TM) i7-4720HQ CPU @ 2.60GHz
NVIDIA GeForce GTX 960M (2048MB) driver: 381.65

Running for 17 minutes
No problem for now
Fingers crossed
[CSF] Thomas H.V. Dupont
Founder of the team CRUNCHERS SANS FRONTIERES 2.0
www.crunchersansfrontieres
ID: 46956 · Rating: 0
Erich56
Message 46957 - Posted: 16 Apr 2017, 9:30:29 UTC - in response to Message 46952.  
Last modified: 16 Apr 2017, 9:39:39 UTC

It's been running well for 10 minutes ... so let's keep our fingers crossed.
The card is a GTX970.

Unfortunately, even with the latest driver I am seeing the same problem I have been having for some time, which I described in this thread:

http://www.gpugrid.net/forum_thread.php?id=4511#46686

After a while (today: after about 2 hours), the GPU clock automatically drops to the "default clock" value of 1152 MHz, and power consumption drops to about 58%. The clock can only be raised again (via NVIDIA Inspector) after restarting the PC.
BTW, the same thing happens with the GTX 750 Ti in the other Windows 10 PC.

This has never ever happened with my two Windows XP PCs.
So one more reason NOT to give up XP by switching to Windows10 !!!
ID: 46957 · Rating: 0