Update on titans and gtx780s

GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist

Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Level
Gly
Message 31651 - Posted: 19 Jul 2013, 10:45:18 UTC
Last modified: 19 Jul 2013, 10:51:29 UTC

Hi,
An update is due on Titans and GTX780s.
The current gpugrid application does not support them, and we have held off on updating it for a reason.

At the moment Titans and GTX780s do NOT work. After a short time the application crashes. Nvidia has now recognized the problem and it is working on a fix for Titans which should be out in a month or so. It could be a new driver or a bios update.

For the GTX780, the fix would either come together with the Titan fix, or it would not come for some time, or ever.

The best cards for crunching at the moment are GTX770s.

gdf
ID: 31651

HA-SOFT, s.r.o.
Joined: 3 Oct 11
Posts: 100
Credit: 5,879,292,399
RAC: 0
Level
Tyr
Message 31654 - Posted: 19 Jul 2013, 12:10:52 UTC - in response to Message 31651.  

Is it a CUDA 4.2 related problem, or is it general? I have a problem with my own CUDA app, which gets stuck when syncing with the CPU on Linux.

Thanks
Zdenek
ID: 31654

5pot
Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Level
Phe
Message 31655 - Posted: 19 Jul 2013, 13:11:17 UTC
Last modified: 19 Jul 2013, 13:12:57 UTC

Why would the fix for the Titan (possibly) not fix the 780?

Edit: HA: it's gotta be a problem with CUDA 4.5, because mine crunches fine on Einstein and is returning valid results. It does encounter a hiccup on occasion though, maybe once a week.
ID: 31655

HA-SOFT, s.r.o.
Joined: 3 Oct 11
Posts: 100
Credit: 5,879,292,399
RAC: 0
Level
Tyr
Message 31657 - Posted: 19 Jul 2013, 14:16:23 UTC - in response to Message 31655.  
Last modified: 19 Jul 2013, 14:21:04 UTC

I also have problems with my Titan on Linux with CUDA 5.0. No problems with 5xx and 6xx cards. I am trying to figure out where the problem is: the driver, the card, or the app (or maybe the motherboard or PCI).
ID: 31657

w6msu
Joined: 30 Nov 10
Posts: 4
Credit: 278,484,571
RAC: 0
Level
Asn
Message 31661 - Posted: 19 Jul 2013, 17:17:24 UTC

back to the gtx-670...
ID: 31661

Operator
Joined: 15 May 11
Posts: 108
Credit: 297,176,099
RAC: 0
Level
Asn
Message 31664 - Posted: 19 Jul 2013, 17:51:26 UTC - in response to Message 31651.  
Last modified: 19 Jul 2013, 17:52:02 UTC

Nvidia has now recognized the problem and it is working on a fix for Titans which should be out in a month or so. It could be a new driver or a bios update.


Can you provide any more info on this?

Was there some sort of report published or correspondence you can refer to?

How long have you known that there was a problem and that your group intentionally decided not to update your app to run on Titans? Was this decision taken recently?

I have had a trouble ticket open with Nvidia for months now, which they have not updated and have obviously chosen to ignore.

They initially seemed interested in the Titan's inability to crunch on GPUGrid and then suddenly went to 'radio silence'.

Einstein, Folding@Home, etc. have all benefited in the meantime.

Obviously I'm disappointed and expected to be able to support this project with components I purchased because I believe that what you are doing is worthwhile.

It's now the middle of July and still no fix.

Operator
ID: 31664

5pot
Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Level
Phe
Message 31668 - Posted: 19 Jul 2013, 20:23:18 UTC

It's not the fact that there's no fix that bothers me. It's that they didn't say anything about this topic the entire time. In the meantime, people, including myself, are purchasing parts. And why not? We haven't been told there were any serious issues.

We were told the dev was away. That is a serious breakdown in communication, and it is making me think about switching projects.

Cheers.
ID: 31668

dskagcommunity
Joined: 28 Apr 11
Posts: 462
Credit: 919,416,958
RAC: 2,149,676
Level
Glu
Message 31669 - Posted: 19 Jul 2013, 21:28:31 UTC

Hmm, I don't know. I don't want to sound rude... but buying hardware that's not supported is not the fault of the project admin, I would presume O.o

But I would think more positively now and hope that the update will work for both Titans and 780s.
DSKAG Austria Research Team: http://www.research.dskag.at



ID: 31669

5pot
Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Level
Phe
Message 31670 - Posted: 19 Jul 2013, 22:04:05 UTC

That I understand. What I don't like was the complete lack of communication on the topic. If they knew it wasn't functioning, and were able to contact NVIDIA and get a response, then that's a lot of time we were all left in the dark. Not a word was spoken.
ID: 31670

Stefan
Project administrator
Project developer
Project tester
Project scientist
Joined: 5 Mar 13
Posts: 348
Credit: 0
RAC: 0
Level

Message 31671 - Posted: 19 Jul 2013, 22:30:31 UTC - in response to Message 31668.  
Last modified: 19 Jul 2013, 22:34:52 UTC

It's not the fact that there's no fix that bothers me. It's that they didn't say anything about this topic the entire time. In the meantime, people, including myself, are purchasing parts. And why not? We haven't been told there were any serious issues.

We were told the dev was away. Serious breakdown in communication, and is making me think about switching projects.

Cheers.


Well, the dev was away. Also, over the last few months there have been lots of posts here about Titans not working yet, so there was decent warning for users. In any case, I think the problem was a combination of the app and the drivers. Now that the app actually works, as seen from the few WUs that were crunched correctly, we have to wait for the NVIDIA fix, from what I understand.

So I personally don't see a communication breakdown. These things happen, but hopefully it will be resolved soon so that the users' investment is not lost.
ID: 31671

5pot
Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Level
Phe
Message 31673 - Posted: 19 Jul 2013, 23:23:04 UTC

Thank you for responding. What I'm getting at is that WUs simply failing does not bother me. That isn't a sign the cards could *never* work.

I haven't lost an investment either :) I've been wanting to build a new rig for a while, since the next one won't be until Maxwell comes out, and I can hopefully pair those cards with an 8-core Haswell-E.

I'm looking forward to hearing back from you guys about what NVIDIA does. Just remember, all we saw was that very few WUs actually completed and most failed. But some did complete. There was no reason for us to expect this updated thread today.

Again, I enjoy crunching here. I just wish there was more communication at times.
ID: 31673

5pot
Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Level
Phe
Message 31674 - Posted: 19 Jul 2013, 23:50:48 UTC

Just to point out, these are the relevant dev responses, and why I'm bringing this up:

1) Update on Titan app:
First of all, Titans should only work on the Beta queue right now. Once we are sure it works we will extend it to the other queues. But we need to do some final testing which I think I am going to organize later today, since there is no one else actively working on it.

What we have: Supposedly the app works fine for Linux and Titans
What we lack: Even a single occurrence where a Windows Titan crunched a Beta app successfully.

So, later today I am going to tell someone in the lab to send some stable simulations to the Beta queue and if there are any successes from Titan users especially from Windows (Linux users can report too) we would have some very helpful results.

Also, MJH tells me that if you are using Titans and the WU's run for a while and then always crash before completion, then you might need to downgrade to driver 310.44 or later in the 310 series as it is supposed to help a bit in our experience.

I will keep you updated.

2) Hm didn't know that. Then maybe MJH meant that it worked locally on our Linux Titans? Sorry for the confusion. Well in any case, it's Windows we want to test right now.

3) @Zarck Yes I was watching your machine. Actually right now all the beta WUs seem to fail which would rather point to a general problem in the new app and not just the Titans. Once this batch is done I will pass on the information.

Thanks for the testing and info though!

4) Unfortunately our developer for the TITANs (and their kind) comes back in July so we will have to wait a bit more. But it is quite high on our priorities to fix the problem with the new GPU's considering that eventually everyone will switch to the new generation (and they provide great performance boost).
-------

Nowhere here does it even hint that there is a problem anywhere near the magnitude currently described to us. All that's really said is: Linux seems to work on ours, it appears to be a problem with our app (fixable on your end), and we need to do some final testing (final testing tends to be good).

This is all I'm getting at. There was nothing about, "At the moment Titans and GTX780s do NOT work. After a short time the application crashes". Meaning, NOTHING is working correctly, this could take some time. Best hold off on getting these parts, because we don't know what's going on or why :)

Hope my point finally gets across. Again, best of luck with NVIDIA. Hope they clear everything up in a short time frame.
ID: 31674

GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Level
Gly
Message 31676 - Posted: 20 Jul 2013, 8:49:40 UTC - in response to Message 31674.  
Last modified: 20 Jul 2013, 8:57:44 UTC

This is the history of the process.

First, we did not have titans. When we had some, we immediately noticed that one card seemed faulty.
This is strange as usually all cards from good manufacturers are fine.

We ordered another 4 which took a long time to arrive as there was no availability. Again one card had problems.

It started to look suspicious that two cards out of 8 could have problems.

We ran sufficiently long tests, and more cards eventually showed crashing problems.

At this point, we still believed that there was some sort of incompatibility with the application which could be worked around.

We then sent a reproducer to Nvidia. In the meantime, other similar applications worldwide started to see the problem. This brings us to a couple of weeks ago.

We bought 780s, same problem.

This week, we received information from Nvidia that they have isolated the problem and are working on a fix. However, they say that the fix will come first for Titans, in a matter of weeks, and only later for 780s.
I don't see why there should be any difference between the two, but this is what they say.

Sorry we could not inform you earlier, but we only received definite information from Nvidia a few days ago.

The way the problem occurs is that if you are crunching for 12h or so, some cards will never complete and some others will. Some will crash quickly, some later. Some might even work for a very long time.

I'll keep you updated on when we receive the fix for Titan to test. It should be in a couple of weeks. We will also test it on the GTX780.

Again, this is not a problem specific to gpugrid, but as our workunits are so long, we prefer to put out an application that is 100% stable. It is a unique situation which we have never encountered before.

PEOPLE with titans and 780s.
At the moment you cannot crunch in gpugrid, however the solution seems to be reasonably close now.
For titans, a few weeks.
For 780s, I would like to think that the first fix will also work. Wait a few weeks and we will let you know.

gdf
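
Editor's illustration: the post above says that some cards crash quickly and some later, and that the length of the workunits is why an unstable application hurts here. A minimal sketch of why run length matters, assuming crashes arrive at a constant random rate (an exponential failure model; this is an assumption made for illustration, not something reported by the project or by Nvidia). The MTBF values and the function name are hypothetical, not measurements.

```python
import math


def completion_probability(run_hours, mtbf_hours):
    """Chance of finishing a run without a crash.

    Assumes crashes arrive at a constant rate of 1 / mtbf_hours
    (exponential model, an illustrative assumption).
    """
    return math.exp(-run_hours / mtbf_hours)


# Hypothetical mean times between failures, in hours (not measured values).
for mtbf in (6.0, 12.0, 48.0):
    short_run = completion_probability(1.0, mtbf)
    long_run = completion_probability(12.0, mtbf)
    print(f"MTBF {mtbf:4.1f} h: P(finish 1 h) = {short_run:.2f}, "
          f"P(finish 12 h) = {long_run:.2f}")
```

Under this assumed model, a card that usually finishes a one-hour task can still fail most 12-hour workunits.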
ID: 31676

GoodFodder
Joined: 4 Oct 12
Posts: 53
Credit: 333,467,496
RAC: 0
Level
Asp
Message 31677 - Posted: 20 Jul 2013, 8:55:03 UTC

GDF: Out of curiosity, is it technically possible to break the WU into smaller units? Thanks
ID: 31677

GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Level
Gly
Message 31678 - Posted: 20 Jul 2013, 9:03:16 UTC - in response to Message 31677.  
Last modified: 20 Jul 2013, 9:03:45 UTC

Technically it is, although inconvenient for the science, but I tested on my Linux machine yesterday and the crash hung the machine.

We have to wait for nvidia to deliver the fix.

gdf
ID: 31678

Zarck
Joined: 16 Aug 08
Posts: 145
Credit: 328,473,995
RAC: 0
Level
Asp
Message 31679 - Posted: 20 Jul 2013, 9:05:05 UTC - in response to Message 31674.  
Last modified: 20 Jul 2013, 9:55:54 UTC

I do not know where the problem is.

I have a Titan, and a number of BOINC projects run without problems on it with any version of BOINC and of the nVidia driver, for example
Moo!, DistrRTgen, PrimeGrid, WCG HCC, Einstein, Seti.
Is it really a problem with the nVidia driver and BOINC?

Does GPUGRID use functions that other projects do not use, so that GPUGRID does not work with my Titan?

Folding@Home also runs without problems.

I have crunched some GPU units on BitCoin Utopia. It gets through many units in a row without problems, and from time to time I get a blue screen and Windows restarts.
The manager of the BitCoin Utopia project asked me to send him the "Windows event log". Do you need this file?

@+
*_*
ID: 31679

GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Level
Gly
Message 31681 - Posted: 20 Jul 2013, 9:11:25 UTC - in response to Message 31679.  

Good that they at least work on other applications.
They do not work for us or for other MD codes, so at least it is not specific to our application. I am surprised that they work on Folding@home, though.

gdf
ID: 31681

5pot
Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Level
Phe
Message 31684 - Posted: 20 Jul 2013, 14:30:31 UTC

Folding@home WUs did work, but were fairly slow. Their new beta tasks however were blazing fast.

Again, I am curious as to why the fix wouldn't be for both, I know you don't know the answer, nor understand yourself, just curious is all. :)

Best of luck to Titan owners in the coming weeks. Nvidia's recent drivers have been buggy as hell with these GPUs, so I'm actually wondering why these are so different compared to other cards, since it's the same architecture.
ID: 31684

Michael Goetz
Joined: 2 Mar 09
Posts: 124
Credit: 124,873,744
RAC: 6,739
Level
Cys
Message 31685 - Posted: 20 Jul 2013, 15:13:54 UTC - in response to Message 31679.  

I have a Titan, and a number of BOINC projects run without problems on it with any version of BOINC and of the nVidia driver, for example
Moo!, DistrRTgen, PrimeGrid, WCG HCC, Einstein, Seti.
Is it really a problem with the nVidia driver and BOINC?

Does GPUGRID use functions that other projects do not use, so that GPUGRID does not work with my Titan?


As both one of the software developers and one of the administrators of the PrimeGrid project, I can assure you that PrimeGrid uses GPUs in ways that other projects do not, and runs into problems that never occur with other projects. (We're very confident it's a hardware problem with the GPUs and not a software problem because increasing cooling to the GPUs, e.g. by increasing the fan speed, can fix the problem.)

Having gone through similar problems at PrimeGrid, it doesn't seem at all unusual to me that GPUGrid is having problems with some Nvidia cards that other projects don't.

GPUs are made for playing games, where a few errors won't be noticed. That's not the case with GPGPU computing, and errors will show up here that don't elsewhere. Different projects use the GPUs in different ways, and it's unfortunately not unusual for some programs to have problems on some (or even all) GPU models.
ID: 31685

MJH
Project administrator
Project developer
Project scientist
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Message 31686 - Posted: 20 Jul 2013, 17:25:09 UTC

As Gianni said, there is a problem that affects both Titan and GTX780 cards which causes ACEMD, the GPUGrid application, to crash. There is no way that we can work around this problem successfully in our software; we need a driver fix from Nvidia. Nvidia is taking it seriously, as the problem also affects other high-profile scientific codes. The latest news that we have is that a fix is being developed and will be public in "several weeks".

Until that happy day, if you have a Titan or 780, the configuration that will give you the best chance of successfully completing GPUGRID WUs is Linux with driver 310.44. Empirically, this config seems to make the 780s stable, and slightly increases the MTBF of Titans.


MJH
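
Editor's illustration: the advice above is to run driver 310.44, and an earlier developer reply quoted in this thread phrases it as "310.44 or later in the 310 series". A minimal sketch of that check on a version string, assuming you have already read the installed driver version from your system by whatever means you prefer. The function names and default arguments are hypothetical and are not part of any GPUGRID or Nvidia tool.

```python
def parse_driver_version(text):
    """Turn a version string such as '310.44' into a tuple of ints, (310, 44)."""
    return tuple(int(part) for part in text.strip().split("."))


def in_recommended_series(text, series=310, minimum=(310, 44)):
    """True if the driver is in the given series and not older than minimum.

    Tuples compare component by component, so (310, 90) >= (310, 44)
    while (310, 32) is rejected as too old.
    """
    version = parse_driver_version(text)
    return version[0] == series and version >= minimum


# Example: check a few version strings against the recommendation.
for candidate in ("310.32", "310.44", "319.17"):
    print(candidate, in_recommended_series(candidate))
```

Comparing parsed integers instead of raw strings avoids surprises with versions whose minor part has a different number of digits.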
ID: 31686