
Message boards : Number crunching : Beware of 378.49 NVIDIA driver

Jacob Klein
Joined: 11 Oct 08
Posts: 1067
Credit: 1,146,172,214
RAC: 1,071,936
Message 46303 - Posted: 25 Jan 2017 | 2:21:05 UTC
Last modified: 25 Jan 2017 | 2:29:40 UTC

I am having major problems doing compute work with this 378.49 driver (the first R378 driver), released today!
It won't work at all for me! Details below.

Can you please test, then reply with your results?
- Do you have the same problems?
- What GPU? Maybe it's Maxwell-specific?
- The more results the better!

I'll be using DDU (Display Driver Uninstaller) to remove it, then doing a fresh install of 376.60 (the last R375 driver) so I can keep crunching. It's available here:
http://nvidia.custhelp.com/app/answers/detail/a_id/4293

Thanks,
Jacob

--------------------------------
GPUGrid tasks repeatedly report the following until the task stops retrying and ends in an error:

1/24/2017 8:30:01 PM | GPUGRID | Task e4s8_e3s14p0f83-ADRIA_FOLD_crystal_ss_contacts_20_ntl9_1-0-1-RND3321_0 exited with zero status but no 'finished' file
1/24/2017 8:30:01 PM | GPUGRID | If this happens repeatedly you may need to reset the project.
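If you want to check whether your own host is hitting this, counting the symptom per task in the event log is quicker than scrolling. Below is a minimal Python sketch, assuming lines copied from BOINC Manager's Event Log in the "date | project | message" layout shown in the excerpt above (other BOINC log files may use a different layout). The function name `count_symptom` is hypothetical.

```python
from collections import Counter

# Symptom text as it appears in the GPUGrid event-log excerpt above.
SYMPTOM = "exited with zero status but no 'finished' file"


def count_symptom(log_lines):
    """Count, per task, how often the event log reports the symptom.

    Assumes the 'date | project | message' layout of BOINC Manager's Event Log.
    """
    counts = Counter()
    for line in log_lines:
        parts = line.split(" | ", 2)
        if len(parts) != 3:
            continue  # not an event-log line in the assumed layout
        message = parts[2]
        if message.startswith("Task ") and SYMPTOM in message:
            task_name = message.split()[1]  # the task name follows "Task "
            counts[task_name] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "1/24/2017 8:30:01 PM | GPUGRID | Task e4s8_e3s14p0f83-ADRIA_FOLD_crystal_ss_contacts_20_ntl9_1-0-1-RND3321_0 exited with zero status but no 'finished' file",
        "1/24/2017 8:30:01 PM | GPUGRID | If this happens repeatedly you may need to reset the project.",
    ]
    for task, n in count_symptom(sample).items():
        print(task, n)
```

A task that shows up many times is stuck in the retry loop described above rather than failing once by chance.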

--------------------------------
PrimeGrid OpenCL Genefer tasks crash immediately.

If I run:
primegrid_genefer_3_3_0_3.12_windows_intelx86__OCLcudaGFNWR.exe -q "43322502^131072+1" -d 0
It will immediately crash, with the following error:
Error: OpenCL error detected: CL_OUT_OF_RESOURCES.

--------------------------------
i7-5960X CPU, 64 GB RAM
Windows 10 x64, Version 1607, Build 14393.693
GTX 980 Ti (2 in system - EVGA GTX 980 Ti FTW, Dell GTX 980 Ti Reference)
378.49 Drivers

Jacob Klein
Message 46304 - Posted: 25 Jan 2017 | 3:17:32 UTC
Last modified: 25 Jan 2017 | 3:17:47 UTC

Wicked. I have 2 systems, and I'm getting different results.

- Could this problem be specific to the GTX 980 Ti?
- Could it be specific to how much RAM is installed? :)

System where both GPUs fail to compute at all:
i7-5960X CPU, 64 GB RAM
Windows 10 x64, Version 1607, Build 14393.693
GTX 980 Ti (2 in system - EVGA GTX 980 Ti FTW, Dell GTX 980 Ti Reference)
378.49 Drivers

System where all GPUs compute just fine:
i7-965XE, 12 GB RAM
Windows 10 x64, Version 1607, Build 14393.693
EVGA GTX 970 FTW, EVGA GTX 660 Ti FTW, MSI GTX 660 Ti OC
378.49 Drivers

Jacob Klein
Message 46305 - Posted: 25 Jan 2017 | 7:03:25 UTC

I've reproduced the problem using just 1 GTX 980 Ti and just 16 GB RAM.
I suspect that this driver has completely hosed compute on any GTX 980 Ti.

Can anyone help test to confirm?

Jacob Klein
Message 46324 - Posted: 26 Jan 2017 | 3:41:09 UTC
Last modified: 26 Jan 2017 | 3:47:15 UTC

Note:

The new 378.49 setting "Optimize for Compute Performance" is a factor.

When it is set to ON, I'm able to crunch on my GTX 980 Ti. When it is set to OFF, I'm unable to crunch on it. Since I sometimes game on this PC, I'm going to revert to the 376.60 driver.

====================================
378.49-nvidia-control-panel-quick-start-guide.pdf
... says:
Feature: Optimize for Compute Performance
Values: Off (Default), On
Notes:
Windows 10, Maxwell GPUs and later.
Offers significant improvement for some Compute applications. Care should be taken when turning this setting ON, as there can be unpredictable effects with some applications and graphics features.

====================================
The following article:
http://nvidia.custhelp.com/app/answers/detail/a_id/4370
... says:
This setting is intended to provide additional performance to non-gaming applications that use large CUDA address spaces and large amounts of GPU memory when run on graphics cards based on second-generation Maxwell GPUs. Graphics cards based on other architectures do not utilize this setting.

Jacob Klein
Message 46509 - Posted: 15 Feb 2017 | 2:08:53 UTC

Today's 378.66 does NOT fix the compute problems with the GTX 980 Ti GPU. Compute is still broken by default on the GTX 980 Ti with 378.49, 378.57, and 378.66. It's possible that GTX 980 GPUs are also affected.

NVIDIA suspects that the next R378 driver may have a fix.

I'm still using 376.60, the last R375 driver, as a workaround on my PC that has GTX 980 Ti GPUs.

Regards,
Jacob

Jacob Klein
Message 46515 - Posted: 16 Feb 2017 | 12:05:27 UTC

I believe GTX 980 GPUs may also be affected.

So, if you have a GTX 980 or a GTX 980 Ti, stick with 376.60 until R378 is working properly :)

Jacob Klein
Message 46517 - Posted: 17 Feb 2017 | 1:17:35 UTC
Last modified: 17 Feb 2017 | 1:37:13 UTC

The issue has been fixed.

The 378.72 hotfix driver, released today, contains the fix.

Announcement: https://forums.geforce.com/default/topic/994534/geforce-drivers/announcing-geforce-hotfix-driver-378-72/
Hotfix page: http://nvidia.custhelp.com/app/answers/detail/a_id/4405

Also, I now believe it was specific to GTX 980 Ti GPUs only, and did not affect GTX 980.

Hurray - I can use R378 now :)

Steve Dodd
Joined: 26 Dec 08
Posts: 14
Credit: 166,864,311
RAC: 1,338
Message 46552 - Posted: 24 Feb 2017 | 4:43:41 UTC

Jacob,
Thank you for all your hard work identifying, tracking, and driving this issue to closure. Although it didn't affect me directly, I still appreciate someone willing to help with issues of this type. Good job!

Jacob Klein
Message 46553 - Posted: 24 Feb 2017 | 5:01:58 UTC

Thanks Steve! :)

My main rig consists of 2 GTX 980 Ti GPUs, so ... I had a good reason to hound NVIDIA. And I actually am now respected by their QA guys, for my work on this bug and previous ones. I even gained a friend.

Fun fun. I'm just happy to be using R378 without issue finally!

Killersocke
Joined: 18 Oct 13
Posts: 41
Credit: 134,973,970
RAC: 0
Message 46636 - Posted: 10 Mar 2017 | 20:47:38 UTC

There is a new driver out: 378.78.
It works now.

anon63262347
Joined: 21 Mar 16
Posts: 1
Credit: 88,992,300
RAC: 329,203
Message 46671 - Posted: 16 Mar 2017 | 21:28:53 UTC - in response to Message 46636.

I'm running 2x GTX 980 SC cards on driver 378.78, which I installed on March 10.

Since that date, a large chunk of my tasks have started either exiting with "zero status but no finished file" or reporting that the output file is absent.

I don't believe this issue is fixed. I also have "Optimize for Compute Performance" enabled.

Jacob Klein
Message 46688 - Posted: 18 Mar 2017 | 6:57:37 UTC

If you turn "Optimize for Compute Performance" OFF and then restart, does that fix the issue?

"Optimize for Compute Performance" is known to have some issues, and I don't trust it yet. Try turning it off and restarting to see whether that helps.

Erich56
Joined: 1 Jan 15
Posts: 371
Credit: 1,670,144,977
RAC: 2,991,705
Message 46694 - Posted: 20 Mar 2017 | 20:30:38 UTC - in response to Message 46688.

If you turn "Optimize for Compute Performance" OFF and then restart, does that fix the issue?

"Optimize for Compute Performance" is known to have some issues, and I don't trust it yet. Try turning it off and restarting to see whether that helps.

Jacob, could you please tell me where I can find this setting?
Maybe it has to do with the problems I am experiencing with my GTX 750 Ti and my GTX 970, both running on Windows 10 with relatively new drivers.
(I posted more details about this in the "Graphic Cards" thread.)

Jacob Klein
Message 46728 - Posted: 21 Mar 2017 | 18:20:28 UTC
Last modified: 21 Mar 2017 | 18:23:08 UTC

"Optimize for Compute Performance" is a new setting, in "Manage 3D settings", in the NVIDIA Control Panel. It is applicable to Maxwell or later, in Windows 10, per the "NVIDIA Control Panel Quick Start Guide" that is downloadable for each driver. It likely doesn't show if you don't have a Maxwell-or-later GPU in a Windows 10 system.

The "NVIDIA Control Panel Quick Start Guide" says:
Feature: Optimize for Compute Performance
Values: Off (Default), On
Notes: Windows 10, Maxwell GPUs and later. Offers significant improvement for some compute applications. Care should be taken when turning this setting ON, as there can be unpredictable effects with some applications and graphics features.

The UI says:
This setting allows you to significantly improve performance of some compute applications. Note that this setting may have a negative impact on some graphics features such as Sparse Texture.
Typical usage scenarios:
- Select On for higher potential performance in compute applications.
- Select Off when graphics features like Sparse Texture are used.

The first R378 driver, 378.49, had a problem where GTX 980 Ti GPUs would fail all computation unless the user specifically turned this feature ON. Subsequent drivers, starting with the 378.72 hotfix driver, have fixed that issue, so I can leave it Off, since I don't want to risk problems in games or other applications.

That is the full summary of the feature, the bug in this post, and the drivers that broke/fixed it.
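For anyone checking several machines, the summary above can be turned into a quick check. This is a minimal Python sketch, assuming you feed it lines from `nvidia-smi --query-gpu=name,driver_version --format=csv,noheader` (one "name, version" line per GPU). The helper names are hypothetical, and the affected list covers only what this thread reports as confirmed: the GTX 980 Ti on 378.49, 378.57, and 378.66. It does not try to account for the later reports from GTX 980 owners.

```python
# Sketch only: versions taken from the drivers reported broken in this thread
# for the GTX 980 Ti with "Optimize for Compute Performance" left Off (default).
AFFECTED_DRIVERS = {"378.49", "378.57", "378.66"}


def parse_smi_line(line):
    """Split one 'name, driver_version' line of nvidia-smi CSV output."""
    name, version = line.rsplit(",", 1)
    return name.strip(), version.strip()


def check_gpu(gpu_name, driver_version):
    """Classify a GPU/driver pair using only what this thread reported."""
    if "980 Ti" in gpu_name and driver_version in AFFECTED_DRIVERS:
        return ("affected: compute fails unless 'Optimize for Compute Performance' "
                "is On; use 378.72 or later, or fall back to 376.60")
    return "not reported as affected in this thread"


if __name__ == "__main__":
    sample = ["GeForce GTX 980 Ti, 378.49", "GeForce GTX 970, 378.49"]
    for line in sample:
        name, version = parse_smi_line(line)
        print(f"{name} ({version}): {check_gpu(name, version)}")
```

Matching on exact version strings rather than a numeric range keeps the check limited to the drivers actually tested here; an untested version is reported as "not reported", not as safe.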
