
Message boards : Graphics cards (GPUs) : New Nvidia Driver error

Dingo
Joined: 1 Nov 07
Posts: 20
Credit: 120,466,630
RAC: 1
Message 52364 - Posted: 30 Jul 2019 | 1:21:56 UTC
Last modified: 30 Jul 2019 | 1:22:59 UTC

I updated the Nvidia driver to 431.60, and since then every GPUGRID task has failed with this error, which stops me from running GPUGRID at all. It is on my Windows machine with a 1080 Ti.

I can still run PrimeGrid on the machine after the update, so it looks like a project code issue?


This is the machine: https://www.gpugrid.net/results.php?hostid=453402
The task failed at the very end of processing:

Name e9s120_e3s89p1f137-PABLO_V4_UCB_p27_sj403_no_salt_IDP-0-2-RND6771_0
Workunit 16678301
Created 28 Jul 2019 | 19:10:51 UTC
Sent 28 Jul 2019 | 20:27:33 UTC
Received 29 Jul 2019 | 2:37:26 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status -55 (0xffffffffffffffc9) Unknown error number
Computer ID 453402
Report deadline 2 Aug 2019 | 20:27:33 UTC
Run time 21,941.82
CPU time 1,907.74
Validate state Invalid
Credit 0.00
Application version Long runs (8-12 hours on fastest card) v9.22 (cuda80)
<core_client_version>7.14.2</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -55 (0xffffffc9)</message>
<stderr_txt>
# GPU [GeForce GTX 1080 Ti] Platform [Windows] Rev [3212] VERSION [80]
# SWAN Device 0 :
# Name : GeForce GTX 1080 Ti
# ECC : Disabled
# Global mem : 11264MB
# Capability : 6.1
# PCI ID : 0000:0A:00.0
# Device clock : 1645MHz
# Memory clock : 5505MHz
# Memory width : 352bit
# Driver version : r431_31 : 43136
# GPU 0 : 71C
# GPU [GeForce GTX 1080 Ti] Platform [Windows] Rev [3212] VERSION [80]
# SWAN Device 0 :
# Name : GeForce GTX 1080 Ti
# ECC : Disabled
# Global mem : 11264MB
# Capability : 6.1
# PCI ID : 0000:0A:00.0
# Device clock : 1645MHz
# Memory clock : 5505MHz
# Memory width : 352bit
# Driver version : r431_31 : 43136
# GPU 0 : 68C
# GPU 0 : 69C
# GPU 0 : 70C
SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1965.
# SWAN swan_assert 0
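The task page reports the exit status as -55 (0xffffffffffffffc9), while the client message shows 0xffffffc9. A minimal Python sketch, assuming both hex values are simply the two's-complement bit patterns of the same signed code at 64 and 32 bits, shows that they agree (the helper name is hypothetical):

```python
def to_unsigned_hex(value: int, bits: int) -> str:
    """Return the two's-complement bit pattern of a signed int, as hex."""
    mask = (1 << bits) - 1  # e.g. 0xFFFFFFFF for 32 bits
    return hex(value & mask)

exit_status = -55  # signed exit status shown on the task page
print(to_unsigned_hex(exit_status, 64))  # 0xffffffffffffffc9
print(to_unsigned_hex(exit_status, 32))  # 0xffffffc9
```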

Erich56
Joined: 1 Jan 15
Posts: 757
Credit: 3,402,889,227
RAC: 5,041
Message 52365 - Posted: 30 Jul 2019 | 5:05:18 UTC

What seems strange to me is the gap between these two values:

Run time 21,941.82
CPU time 1,907.74

rod4x4
Joined: 4 Aug 14
Posts: 266
Credit: 2,200,441,910
RAC: 199,757
Message 52366 - Posted: 30 Jul 2019 | 5:26:44 UTC - in response to Message 52364.
Last modified: 30 Jul 2019 | 5:39:17 UTC

There are 4 recent errors reported for this host: 3 with v431.36 and 1 with v431.60.

v431.36 errors
1 task that failed was from a batch with a 68% failure rate, so this failure can be attributed to the bad batch.
2 tasks failed at exactly the same time with this error:

Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1965

I suspect the linked post explains this issue:
https://www.gpugrid.net/forum_thread.php?id=4652#48209


v431.60 error
An error appears to be reported (Access violation : progress made, try to restart), and then the task was aborted by the user. What was the issue that led you to abort it?
Other hosts are running v431.60 successfully, so I don't think the driver version is the issue. Perhaps try another task and see whether the problem recurs.
EDIT: This error could also be attributed to a bad batch; see this thread:
https://www.gpugrid.net/forum_thread.php?id=4634#48021

rod4x4
Joined: 4 Aug 14
Posts: 266
Credit: 2,200,441,910
RAC: 199,757
Message 52367 - Posted: 30 Jul 2019 | 5:28:52 UTC - in response to Message 52365.

what seems strange to me is:

Run time 21,941.82
CPU time 1,907.74


If SWAN_SYNC is not enabled, a CPU time far below the run time can be normal, especially on a fast processor such as this host's Ryzen 7 1800X.
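To put a number on the gap, here is a quick sketch using the two values quoted above. It only computes the ratio; the interpretation assumes that without SWAN_SYNC the CPU thread mostly waits on the GPU rather than polling it continuously, so a low share on its own is not a sign of a fault:

```python
run_time_s = 21_941.82  # "Run time" from the task page, in seconds
cpu_time_s = 1_907.74   # "CPU time" from the task page, in seconds

# Fraction of wall-clock run time during which the CPU thread was busy
cpu_share = cpu_time_s / run_time_s
print(f"CPU time is {cpu_share:.1%} of run time")  # CPU time is 8.7% of run time
```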

Dingo
Joined: 1 Nov 07
Posts: 20
Credit: 120,466,630
RAC: 1
Message 52368 - Posted: 30 Jul 2019 | 6:48:04 UTC
Last modified: 30 Jul 2019 | 6:48:15 UTC

OK, I will try another task and see what happens. This is the task that is running now:

https://www.gpugrid.net/workunit.php?wuid=16682627

Dingo
Joined: 1 Nov 07
Posts: 20
Credit: 120,466,630
RAC: 1
Message 52369 - Posted: 30 Jul 2019 | 13:32:40 UTC

OK, all is fine now. It must have been a problem with the update happening while GPUGRID was running?

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2265
Credit: 15,986,076,810
RAC: 34,769
Message 52371 - Posted: 30 Jul 2019 | 15:00:03 UTC - in response to Message 52369.

OK all is fine now. Must have been a problem of the update happening while GPUGRID was running ??
Exactly.
