Recent problems for WUs on older GPUs
| Author | Message |
|---|---|
Bymark · Joined: 23 Feb 09 · Posts: 30 · Credit: 5,897,921 · RAC: 0
|
Yep, the best setup for a 260 is BOINC 6.4.7 with driver 178.28 and CUDA 2. Working fine... Today the error rate for KASHIF WUs is lower, so it could have been a problem with drivers. "Silakka" Hello from Turku > Åbo. |
|
Joined: 8 Sep 08 · Posts: 63 · Credit: 1,696,957,181 · RAC: 0
|
Ubuntu 9.04 comes standard with driver version 180.44, which so far has spared me from fiddling with manual interventions. Will they follow before or after GPUGRID decides to require the 185 version drivers? If the Linux community will need to update manually, please advise in advance. Many thanks. Kind regards, Alain |
K1atOdessa · Joined: 25 Feb 08 · Posts: 249 · Credit: 444,646,963 · RAC: 0
|
"That same GTX 260 has only 3 recent failed WUs, all of them KASHIF_HIVPR." It looks like you have a GTX 260 and an 8800GT. All three tasks failed while running on the 8800GT (device 1), not on the GTX 260. |
K1atOdessa · Joined: 25 Feb 08 · Posts: 249 · Credit: 444,646,963 · RAC: 0
|
In light of the issues with the older GPUs and the KASHIF_HIVPR WUs, what is the best version of the NVIDIA driver to use? I have been aborting them when I see them, to get them over to a 200-series card as quickly as possible. I don't think it is beneficial for the project for me to let one sit in my queue for 12 hours, then run for several more before failing anyway. I'd prefer not to babysit, so should I roll back my current 185.66 to the last WHQL-approved non-185.xx driver, which is 182.50? I guess I could just try this and report the results, but I wanted to know if anyone has already tried the 182.50 driver with an older (non-200-series) card. |
Beyond · Joined: 23 Nov 08 · Posts: 1112 · Credit: 6,162,416,256 · RAC: 0
|
"That same GTX 260 has only 3 recent failed WUs, all of them KASHIF_HIVPR." You're right. Not my machine and I didn't see the 2 cards. But OK, here's an example from a machine with only a GTX 260: http://www.gpugrid.net/result.php?resultid=663665 |
K1atOdessa · Joined: 25 Feb 08 · Posts: 249 · Credit: 444,646,963 · RAC: 0
|
"That same GTX 260 has only 3 recent failed WUs, all of them KASHIF_HIVPR." :-) That one reports as "Aborted by user". So I don't think it errored out under normal circumstances -- it was manually aborted. |
mike047 · Joined: 21 Dec 08 · Posts: 47 · Credit: 7,330,049 · RAC: 0
|
"CUDA 2.2 libs will be distributed with the application, but you will need to upgrade the driver to the latest 185 version." Is this query unworthy of an answer? mike |
|
Joined: 24 Dec 08 · Posts: 738 · Credit: 200,909,904 · RAC: 0 |
"CUDA 2.2 libs will be distributed with the application, but you will need to upgrade the driver to the latest 185 version." Thanks. I'd suggest a note in the news section on the home page. That way people can start organising things. I have already set GPUGRID to "no new work" so I can finish off what I have before doing the driver upgrades. I've got a few machines to do :) BOINC blog |
Aardvark · Joined: 27 Nov 08 · Posts: 28 · Credit: 82,362,324 · RAC: 0
|
Task ID 665546 had been running well along with another task. As I was about to run a program that would "use" the GPU, I decided to suspend all tasks and exit BOINC. Once I had completed my task I launched BOINC, and all tasks still appeared suspended. So far so good. I then resumed all tasks, and task 665546 immediately went to "compute error". I also had another task, 652947, that had been running for 29 out of about 30 hours and failed (different machine). When I get the time I will compile a list of the failures and successes over the past few days. |
|
Joined: 1 Feb 09 · Posts: 139 · Credit: 575,023 · RAC: 0
|
May I ask, mike047, which card/machine combinations cannot use the 185.85 version? |
Beyond · Joined: 23 Nov 08 · Posts: 1112 · Credit: 6,162,416,256 · RAC: 0
|
"That same GTX 260 has only 3 recent failed WUs, all of them KASHIF_HIVPR." The user is one of my team members and he reported it as being stuck. It had processed for over twice as long as his other WUs and showed no progress. He was using BOINC client v6.6.28, not v6.6.20 so that wasn't the problem. :-) |
Beyond · Joined: 23 Nov 08 · Posts: 1112 · Credit: 6,162,416,256 · RAC: 0
|
"That same GTX 260 has only 3 recent failed WUs, all of them KASHIF_HIVPR." Here's a bunch more for your viewing pleasure: http://www.gpugrid.net/result.php?resultid=659111 http://www.gpugrid.net/result.php?resultid=664645 http://www.gpugrid.net/result.php?resultid=666952 http://www.gpugrid.net/result.php?resultid=647270 http://www.gpugrid.net/result.php?resultid=660927 http://www.gpugrid.net/result.php?resultid=666863 Certainly not as common as with the slower cards, but not at all hard to find. The last 2 are test WUs... |
K1atOdessa · Joined: 25 Feb 08 · Posts: 249 · Credit: 444,646,963 · RAC: 0
|
@Beyond - I didn't doubt you. :-) @GDF/Admin: Given that these KASHIF_HIVPR WUs seem to error out a lot, especially with "older, slower" cards but occasionally with the new 200-series as well (as shown by Beyond), are no new ones going to be created? I can understand cleaning out the queue, but I have gotten several today, and with my cards I almost certainly expect them to error out. If I catch them in my queue, I try to abort them so they can move to a 200-series card with a better chance of finishing in a timely manner. Is there any analysis from the project on why these particular WUs are an issue? I've read comments about the drivers possibly being an issue, but given that the CUDA 2.2 software on the server will require these 185.xx drivers, I expect to continue having issues with these WUs if they are still in the queue. All others work fine. |
dataman · Joined: 18 Sep 08 · Posts: 36 · Credit: 100,352,867 · RAC: 0
|
As GPUGRID clearly does not want to put in much effort to support 8 and 9 series cards, I'm done here for now. I'd rather shut them down than waste time and electricity in an endless circle jerk of BOINC versions and drivers. But hey, 3.7 million credits was a good run for me here. There will be a new GPU project out soon. Sad really, as I think some of the science was worth doing here. :) Ciao.
|
mike047 · Joined: 21 Dec 08 · Posts: 47 · Credit: 7,330,049 · RAC: 0
|
"May I ask, mike047, which card/machine combinations cannot use the 185.85 version?" I don't have that information at hand presently. Basically I use Ubuntu 8.04 LTS. The 260 and 250 cards have no trouble using 180.22 and might be able to use a higher driver without issue. Some of my 8800/9600GSO/9800 cards will not accept any driver above 177.82. All motherboards are Gigabyte P35/45. I don't know what the issues are with this project, and I am willing "to do" a little work to be able to run it. BUT, I am unwilling to babysit and periodically change drivers to suit a project that is becoming unwilling to respond to my queries and the queries of others. Unfortunately I have invested in many NVIDIA cards that at present cannot be used elsewhere in BOINC. FAH is the only other place that can use my cards. I have one box working there now and it has run absolutely trouble free with NO intervention on my part. The + to FAH is that my internet is not shut down when it has to upload; the 50+ MB uploads from here shut my internet down... I know that is not a project fault, but it is an issue for me. This is a good project with good science, but it has gotten away from communicating with the participants in a timely manner. IMHO the project has slipped badly from where it was several months ago. mike |
JockMacMad TSBT · Joined: 26 Jan 09 · Posts: 31 · Credit: 3,877,912 · RAC: 0
|
I can confirm my BFG GTX-260 192 Shader card is also getting a lot of these errors with 185.81. One example |
GDF · Joined: 14 Mar 07 · Posts: 1958 · Credit: 629,356 · RAC: 0 |
We have tested drivers 185.xx on an 8800GT. All the WUs fail. With driver 180.xx all WUs are fine. So for now we can only suggest downgrading to the older drivers (180.xx), which seem to work. We have reported the issue to NVIDIA. gdf |
Paul D. Buck · Joined: 9 Jun 08 · Posts: 1050 · Credit: 37,321,185 · RAC: 0
|
"The user is one of my team members and he reported it as being stuck. It had processed for over twice as long as his other WUs and showed no progress. He was using BOINC client v6.6.28, not v6.6.20 so that wasn't the problem. :-)" Yes, and maybe no ... 6.6.20 stunk in this regard ... it really sucked swamp water ... 6.6.23 and later, *I* for one thought, fixed it ... now I am not so sure. What I ***THINK*** happened is that most of the causes have been cleaned up ... but sometimes something bad still happens. And THEN, you get a task that runs long. There are still issues with the way that the resource scheduling is done. I am banging my head on the wall about things that *I* think I can clearly demonstrate, only to be patted on the head and told to "go 'way, you bother me" ... I mean, just last night I had five tasks all start and die in less than a second. At the moment the answer is that this is not possible. My 2,200+ log file of those two seconds notwithstanding ... Anyway, ... I am far less sanguine about how "fixed" we are ... {edit} An example: 12-TONI_HIVPR_mon_ba20-7-100-RND1398_0, and that was run on a 6.6.25 client ... 182.50 drivers I think at the time. 115 ms step size ... |
GDF · Joined: 14 Mar 07 · Posts: 1958 · Credit: 629,356 · RAC: 0 |
We have managed to replicate the problem on one of our machines. This should lead to a solution soon. Be patient. gdf |
©2025 Universitat Pompeu Fabra