Nvidia OpenCL problem for 364.* drivers

robertmiles
Message 43215 - Posted: 17 Apr 2016, 2:09:06 UTC

The OpenCL section of the Nvidia 364.72 driver, and of earlier 364.* drivers, has a problem that can lock up an entire computer or cause a few dozen OpenCL tasks (often not all from the same BOINC project) to fail quickly with a Compute Error. The problem is not seen in the 362.00 driver.

Tasks from POEM@home seem the most likely to trigger this problem.

Threads on the problems:

https://www.primegrid.com/forum_thread.php?id=6769#94223

http://boinc.fzk.de/poem/forum_thread.php?id=1205#10896

I currently do not have hardware that can check whether GPUGRID has this problem, but you may want to watch for it.

MossyRock
Message 43219 - Posted: 18 Apr 2016, 11:47:20 UTC - in response to Message 43215.  
Last modified: 18 Apr 2016, 11:54:10 UTC

Great. I just applied the 364.72 Nvidia update yesterday and now all of my GPUGrid tasks are crashing. One failed after considerable time had elapsed and the last two crashed just after starting.

Betting Slip
Message 43220 - Posted: 18 Apr 2016, 12:15:40 UTC - in response to Message 43219.  
Last modified: 18 Apr 2016, 12:17:36 UTC

It could be something else entirely, because this board is not full of WU failures due to these drivers, and I've run them myself since they came out.

GPUGrid doesn't use OpenCL.

MossyRock
Message 43221 - Posted: 18 Apr 2016, 14:04:00 UTC - in response to Message 43220.  

I'll try a clean install of the drivers and see if that fixes the issue.

robertmiles
Message 43222 - Posted: 18 Apr 2016, 14:58:13 UTC

I have no information on whether this problem also affects CUDA tasks, but for OpenCL tasks, one task crashes after a few hours, then perhaps two dozen more (not necessarily from the same BOINC project) crash quickly. Restarting Windows appears to be required to make any more OpenCL tasks complete properly.

MossyRock
Message 43226 - Posted: 19 Apr 2016, 17:41:58 UTC - in response to Message 43222.  

I've reverted to ver. 362.00 to see if this fixes my GPUGrid WU problems. I'll be able to tell once more WUs are available.

It looks like my ASUS GTX650-E-1GD5 GeForce GPU didn't run ver. 364.xx very well. Yeah, I know it's an old card. There were multiple errors in Windows Event Viewer and my ASUS GPUTweak was also blowing up. Ver. 362.00 fixed that.


MossyRock
Message 43233 - Posted: 22 Apr 2016, 3:36:25 UTC - in response to Message 43226.  

Yeah, that fixed it. My WUs are completing normally now.

skgiven (Volunteer moderator, Volunteer tester)
Message 43254 - Posted: 26 Apr 2016, 14:00:09 UTC - in response to Message 43233.  
Last modified: 26 Apr 2016, 14:01:30 UTC

'Upgraded' to 364.72 WHQL (Clean install wouldn't work on W10x64) and found that it crashed all POEM tasks (OpenCL) [driver restarts].
Ran MW and Einstein tasks without problems and so far it's running a task here without difficulty.

Richard Haselgrove
Message 43255 - Posted: 26 Apr 2016, 14:51:30 UTC - in response to Message 43254.  

> 'Upgraded' to 364.72 WHQL (Clean install wouldn't work on W10x64) and found that it crashed all POEM tasks (OpenCL) [driver restarts].
> Ran MW and Einstein tasks without problems and so far it's running a task here without difficulty.

Jacob Klein has already reported that one to NVidia, and got David Anderson to add an option to disallow OpenCL tasks, wherever they might pop up from.

I think that's a sledgehammer to crack a very small nut, and I've told him so, but you might like to test the new v7.6.32 (you'll have to find the download yourself - it hasn't even gone into alpha testing yet).

Jacob Klein
Message 43261 - Posted: 27 Apr 2016, 17:59:58 UTC - in response to Message 43255.  
Last modified: 27 Apr 2016, 18:40:11 UTC

:) I see my name got mentioned. Yeah, it's nice to have an option to disable OpenCL at the client, in my opinion, for cases like this where you may want the latest drivers for gaming, but can't support running OpenCL tasks due to NVIDIA.

My ticket with them is regarding the OpenCL SDK examples failing on Maxwell, but I also mentioned to them that R364 drivers are failing Poem@Home tasks and causing TDRs, BSODs, restarts, and even making other tasks fail.

The BOINC 7.6.32+ cc_config option <no_opencl>1</no_opencl> works nicely as a workaround for this scenario.
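
As a rough illustration of that workaround, here is a minimal sketch of what the file might look like. It assumes the option sits in the <options> section of cc_config.xml in the BOINC data directory, alongside other client options; the file location and the surrounding layout are assumptions, not something stated in this thread.

```xml
<!-- cc_config.xml, placed in the BOINC data directory (location assumed; adjust for your install) -->
<cc_config>
  <options>
    <!-- 1 = tell the client not to use OpenCL; needs BOINC 7.6.32 or later per the post above -->
    <no_opencl>1</no_opencl>
  </options>
</cc_config>
```

Restarting the BOINC client after saving the file is probably the safest way to make sure the setting is picked up. Setting the value back to 0, or removing the line, should re-enable OpenCL once a fixed driver is installed.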

The R364 drivers are still trash, in my opinion. The main reason I run them is to help find problems to get them fixed. In addition to the horrible OpenCL woes, the R364 drivers also have a bug with brief full screen corruption any time a CUDA task starts on my eVGA GTX 980Ti FTW at 144 Hz. Junk.

The 362.00 drivers are the latest that have my solid recommendation.

Regards,
Jacob

Beyond
Message 43262 - Posted: 27 Apr 2016, 18:36:37 UTC

Thanks for posting the information about this problem and also for the recommendation concerning the latest relatively bug free drivers (362.00).

nanoprobe
Message 43271 - Posted: 28 Apr 2016, 15:21:17 UTC

I read somewhere that Primegrid will no longer send tasks to any computer that has these problematic drivers installed.

Jacob Klein
Message 43272 - Posted: 28 Apr 2016, 15:46:22 UTC
Last modified: 28 Apr 2016, 16:07:37 UTC

> I read somewhere that Primegrid will no longer send tasks to any computer that has these problematic drivers installed.

That would be wise, as the tasks supposedly gracefully complete with miscalculated results! :)

We're tracking the problem/solution here:
http://www.primegrid.com/forum_thread.php?id=6775
... where I have an NVIDIA dev looking into it.
So, look there for updates.

Edit: Made hyperlink clickable, sorry about that.

Richard Haselgrove
Message 43273 - Posted: 28 Apr 2016, 15:55:09 UTC - in response to Message 43272.  
Last modified: 28 Apr 2016, 16:07:31 UTC

http://www.primegrid.com/forum_thread.php?id=6775

(just making it clicky so I can follow without editing every time)

Edit - looks like you've got some experienced debuggers active there. Excellent news.

Jacob Klein
Message 43297 - Posted: 2 May 2016, 18:47:41 UTC

I have confirmed that today's 365.10 drivers do NOT fix the OpenCL problems -- PrimeGrid miscalculation and Poem@Home TDRs.

I'd recommend that users stick with 362.00, and that projects take action to prevent issuing OpenCL tasks to R364 users.

Jacob Klein
Message 43312 - Posted: 4 May 2016, 12:40:51 UTC

I have a small status update, regarding my NVIDIA bug (Bug ID 1754468) for these OpenCL issues:
- Status changed from "Open - pending review" to "Open - in progress"

Jacob Klein
Message 43313 - Posted: 5 May 2016, 12:20:18 UTC
Last modified: 5 May 2016, 12:35:53 UTC

Another small update -- basically, while NVIDIA fixes the problems, they're requesting additional info to potentially make "Poem@Home" and "PrimeGrid calculation" test cases that could be used in their checklist to release new drivers. That's a GREAT idea, in my opinion :)

Jacob Klein
Message 43371 - Posted: 12 May 2016, 5:43:14 UTC

Lots of updates in these 2 threads:

http://www.primegrid.com/forum_thread.php?id=6775
http://boinc.fzk.de/poem/forum_thread.php?id=1205

Basically, the problems have been solved, but only the fix for the POEM crashes will land in the upcoming (any day now) driver release. The fix for the PrimeGrid miscalcs will have to wait until a driver release sometime later this month.

Jacob Klein
Message 43400 - Posted: 13 May 2016, 14:50:56 UTC
Last modified: 13 May 2016, 14:51:04 UTC

I have confirmed that the new Doom 365.19 drivers:
- Do NOT fix the OpenCL/CUDA miscalculations (Internal NVIDIA Bug ID: 200197534)
- DO fix the Poem@Home TDR/crashes (NVIDIA Bug ID: 1754468)

So... If you do any distributed computing involving OpenCL/CUDA calculating, I recommend that you **stick with 362.00** for correct calculations, until the next driver release which should have the miscalculation fix.

Thanks,
Jacob

Retvari Zoltan
Message 43401 - Posted: 13 May 2016, 15:19:08 UTC - in response to Message 43400.  

> I have confirmed that the new Doom 365.19 drivers:
> - Do NOT fix the OpenCL/CUDA miscalculations (Internal NVIDIA Bug ID: 200197534)
> - DO fix the Poem@Home TDR/crashes (NVIDIA Bug ID: 1754468)
>
> So... If you do any distributed computing involving OpenCL/CUDA calculating, I recommend that you **stick with 362.00** for correct calculations, until the next driver release which should have the miscalculation fix.
>
> Thanks,
> Jacob

I have the 364.72 driver on 3 of my hosts, and my Einstein@home tasks are validating just fine.
So I'm not sure to what extent this issue affects CUDA tasks.