Author |
Message |
dataman
Joined: 18 Sep 08 Posts: 36 Credit: 100,352,867 RAC: 0
|
Everything has been running well, but I had 6 errors today across 3 different cards (9800GTs):
1 of these:
ERROR: c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu, line 84: cufftExecC2C (gridCalc2.2)
1 of these:
Cuda error: Kernel [shake_step_2] failed in file 'shake.cu' in line 128 : unknown error.
4 of these:
Cuda error: Kernel [PmeRealSpace_compute_forces] failed in file 'PmeRealSpace.cu' in line 172 : unknown error.
What's going on?
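When several hosts report errors like these, counting failures per kernel makes patterns easier to spot. The following is a minimal sketch in Python, assuming the error lines have been copied into a list of strings; the function name `tally_kernel_failures` and the sample list are illustrative and not part of any GPUGRID or BOINC tool.

```python
import re
from collections import Counter

# Matches lines such as:
# Cuda error: Kernel [shake_step_2] failed in file 'shake.cu' in line 128 : unknown error.
KERNEL_ERROR = re.compile(
    r"Cuda error: Kernel \[(?P<kernel>\w+)\] failed in file "
    r"'(?P<file>[^']+)' in line (?P<line>\d+)"
)

def tally_kernel_failures(log_lines):
    """Count failures per (kernel name, source line) across log lines."""
    counts = Counter()
    for text in log_lines:
        match = KERNEL_ERROR.search(text)
        if match:
            counts[(match.group("kernel"), int(match.group("line")))] += 1
    return counts

# Illustrative input: three of the messages quoted in the post above.
sample = [
    "Cuda error: Kernel [shake_step_2] failed in file 'shake.cu' in line 128 : unknown error.",
    "Cuda error: Kernel [PmeRealSpace_compute_forces] failed in file 'PmeRealSpace.cu' in line 172 : unknown error.",
    "Cuda error: Kernel [PmeRealSpace_compute_forces] failed in file 'PmeRealSpace.cu' in line 172 : unknown error.",
]

for (kernel, line), n in tally_kernel_failures(sample).most_common():
    print(f"{n} x {kernel} (line {line})")
```

The regular expression only covers the "Cuda error: Kernel [...]" form; the `cufftExecC2C` messages use a different layout and would need their own pattern.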
____________
|
|
|
palmss
Joined: 28 Aug 08 Posts: 7 Credit: 60,897,550 RAC: 0
|
I have a "PmeRealSpace" error too, with an 8800GT, here: http://www.gpugrid.net/result.php?resultid=631932 |
|
|
K1atOdessa
Joined: 25 Feb 08 Posts: 249 Credit: 392,549,512 RAC: 1,531,184
|
Same here, PmeRealSpace error, running an 8800GT. "IBUCH_KID" WUs. Do I see a pattern forming, or is it just a coincidence?
Error WU 634715
|
|
|
|
I have the same error on three WUs. The GPU is an 8800GT. |
|
|
dataman
Joined: 18 Sep 08 Posts: 36 Credit: 100,352,867 RAC: 0
|
Cuda error: Kernel [fft_data_swizzle_in] failed in file 'c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu' in line 44 : unknown error.
More errors ... :(
____________
|
|
|
Zydor
Joined: 8 Feb 09 Posts: 252 Credit: 1,309,451 RAC: 0
|
I had three go quickly one after the other in a 40-minute period today on a 9800GTX+. The errors were similar to the above:
Two were the same:
Cuda error: Kernel [shake_step_1] failed in file 'shake.cu' in line 79
The third was:
Cuda error: Kernel [PmeRealSpace_compute_forces] failed in file 'PmeRealSpace.cu' in line 172 : unknown error.
Had a replacement running for about three hours - no problems so far, see what we shall see in the morning :)
Regards
Zy |
|
|
|
I have a thread about failed jobs as well. One machine lost 5 jobs and I thought it was machine-specific, but then one of my other machines got the same error, and also had some results that were valid but listed warning messages that seem related to the actual errors. We only see this after a task has finished; a real-time alert system would be impossible, not to mention useless, unless you could sit and monitor your apps 24/7. They have come out with quite a few new software updates and problems can always arise, and not making it mandatory to use the new version would not work either. If we post the errors, the people who actually understand the software become aware of them. I have found this site to be about the best for getting help when you do encounter any type of problem. |
|
|
loki126
Joined: 18 Nov 08 Posts: 14 Credit: 30,687,791 RAC: 0
|
Same here. It's the new 7000-credit WUs, IBUCH_KID_shao.
Here are the failed tasks: 1 and 2
I guess they don't get along well with OC.
|
|
|
K1atOdessa
Joined: 25 Feb 08 Posts: 249 Credit: 392,549,512 RAC: 1,531,184
|
I really think there is some issue related to "IBUCH_KID" and "KASHIF_HIVPR" WU's. I have had 4 errors today and those have also errored out for other users.
My Tasks
Error tasks:
KASHIF_HIVPR
IBUCH_KID
IBUCH_KID
IBUCH_KID
<edit>
I've turned the clocks back to stock to see if that matters. I've had them OC'd for 8 months, but we'll see if the new WUs are more sensitive.
</edit> |
|
|
mike047
Joined: 21 Dec 08 Posts: 47 Credit: 7,330,049 RAC: 0
|
I really think there is some issue related to "IBUCH_KID" and "KASHIF_HIVPR" WU's. I have had 4 errors today and those have also errored out for other users.
My Tasks
Error tasks:
KASHIF_HIVPR
IBUCH_KID
IBUCH_KID
IBUCH_KID
<edit>
I've turned the clocks back to stock to see if that matters. I've had them OC'd for 8 months, but we'll see if the new WUs are more sensitive.
</edit>
I have had errors with this series [IBUCH_KID] of work units also. My cards run stock. The same cards seem to run the HIV ones OK.
____________
mike |
|
|
Zydor
Joined: 8 Feb 09 Posts: 252 Credit: 1,309,451 RAC: 0
|
Another one last night
ERROR: c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu, line 84: cufftExecC2C (gridCalc2.2)
There is an issue lurking somewhere with these WUs.
For me it started when the new ones with the Amber facility came out; shortly after that the failures started.
I am trying one more - if that fails, I stop until this is resolved
Zy
|
|
|
|
There can be bad "batches", or tasks within a batch, that are just plain bad. The good news, such as it is, is that here at GPU Grid the tasks tend to die fairly quickly. I will note that they have just changed and are using some new tool, and this may be part of the problem.
I have seen similar issues in other projects where a change in direction can lead to significant issues with tasks failing. When Rosetta started up the Mini-Rosetta effort, so many tasks failed that I stopped giving the project major support for a long time. Now they have most of the bugs out and I am back again.
Keep reporting the bad tasks and I am sure they will figure it out ... |
|
|
MarkJ (Volunteer moderator, Volunteer tester)
Joined: 24 Dec 08 Posts: 738 Credit: 200,909,904 RAC: 0
|
Same here. It's the new 7000-credit WUs, IBUCH_KID_shao.
Here are the failed tasks: 1 and 2
I guess they don't get along well with OC.
I had a similar issue. It went away when I went back to 182.50 drivers. You seem to be running beta drivers.
____________
BOINC blog |
|
|
|
I got a bunch of errors also and was wondering: if we add system specs (including driver version), would it help narrow down where the real issue is?
i7-920 HT, 4 GHz on P6T
Corsair Dominator 1600 2Gx3
EVGA GTX 295 (626/1496/1036) 185.81
Corsair TX750W, WD Caviar Black 1TB
Cool Master HAF 932
Xigmatek Dark Knight-S1283V
BOINC 6.6.20 for WCG + GPUGrid 24/7/365
Steve |
|
|
MarkJ (Volunteer moderator, Volunteer tester)
Joined: 24 Dec 08 Posts: 738 Credit: 200,909,904 RAC: 0
|
Cuda error: Kernel [fft_data_swizzle_in] failed in file 'c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu' in line 44 : unknown error.
More errors ... :(
If you have beta drivers installed (your computers are hidden so I can't look) try the 182.50 drivers.
____________
BOINC blog |
|
|
ignasi
Joined: 10 Apr 08 Posts: 254 Credit: 16,836,000 RAC: 0
|
On the new IBUCH_KID batch errors...
They don't fail completely, but the error rate is apparently higher.
We are stopping them for safety at the moment.
thanks for your patience,
ignasi |
|
|
|
Yes Steve WCG,
Posting the specs (driver version, BOINC version, GPU, GPU overclock, OS) helps to narrow down where your issue may be.
But 'un-hiding' your computers so the MODS can look at your output files also helps (they may ask for this sometimes), when you have a problem. That and enabling 'debugging' if you have a pesky problem...
____________
Consciousness: That annoying time between naps......
Experience is a wonderful thing: it enables you to recognize a mistake every time you repeat it. |
|
|
|
Specs including versions are in my sig. I will also try to provide more specifics when I post about errors, but it sounds like this round is semi-global so I doubt they need any more info at this time. If mods want details of my logs all they need to do is ask and I will "unhide". Interesting way to phrase that ... I prefer to think of it as "Public" or "Private" and in general I like to keep "Private" as much as possible.
____________
Thanks - Steve |
|
|
mike047
Joined: 21 Dec 08 Posts: 47 Credit: 7,330,049 RAC: 0
|
Specs including versions are in my sig. I will also try to provide more specifics when I post about errors, but it sounds like this round is semi-global so I doubt they need any more info at this time. If mods want details of my logs all they need to do is ask and I will "unhide". Interesting way to phrase that ... I prefer to think of it as "Public" or "Private" and in general I like to keep "Private" as much as possible.
I'll show mine if you'll show me yours:D
____________
mike |
|
|
Zydor
Joined: 8 Feb 09 Posts: 252 Credit: 1,309,451 RAC: 0
|
Keep reporting the bad tasks and I am sure they will figure it out ...
Absolutely - I am totally behind them in trying to find out what's wrong; it could be at my end, I don't know. It's no good just pumping out errored ones though; there are only so many they need to track an issue. Meanwhile, by stopping for a while I can put the hardware through proper testing, just to eliminate that side of the equation.
Having said all that, at present the one I started this morning is still running fine, 63% done, which, given the others that failed on mine, is illogical on the face of it.
Regards
Zy
|
|
|
Beyond
Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0
|
My first 2 errors ever AFAIK, the 1st a 76-KASHIF_HIVPR WU and the 2nd one of the infamous 76-IBUCH_KID WUs.
Two different cards, both 9600 GSO. Notice a similarity in the error messages?:
<core_client_version>6.6.24</core_client_version>
<![CDATA[
<message>
- exit code 98 (0x62)
</message>
<stderr_txt>
# Using CUDA device 0
# Device 0: "GeForce 9600 GSO"
# Clock rate: 1674000 kilohertz
# Total amount of global memory: 402325504 bytes
# Number of multiprocessors: 12
# Number of cores: 96
# Amber: readparm : Reading parm file parameters
# PARM file in AMBER 7 format
# Encounter 10-12 H-bond term
WARNING: parameters.cu, line 568: Found zero 10-12 H-bond term.
WARNING: parameters.cu, line 568: Found zero 10-12 H-bond term.
MDIO ERROR: cannot open file "restart.coor"
ERROR: c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu, line 50: cufftExecC2C (gridcalc2.1)
called boinc_finish
</stderr_txt>
]]>
<core_client_version>6.6.20</core_client_version>
<![CDATA[
<message>
- exit code 98 (0x62)
</message>
<stderr_txt>
# Using CUDA device 0
# Device 0: "GeForce 9600 GSO"
# Clock rate: 1458000 kilohertz
# Total amount of global memory: 804978688 bytes
# Number of multiprocessors: 12
# Number of cores: 96
# Amber: readparm : Reading parm file parameters
# PARM file in AMBER 7 format
# Encounter 10-12 H-bond term
WARNING: parameters.cu, line 568: Found zero 10-12 H-bond term.
WARNING: parameters.cu, line 568: Found zero 10-12 H-bond term.
MDIO ERROR: cannot open file "restart.coor"
ERROR: c:\cy
</stderr_txt>
]]>
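To compare logs like the two above side by side, the device header can be pulled out of the `<stderr_txt>` block. This is a small sketch in Python, assuming only the line formats visible in these two logs; `parse_device_header` and the field names in `DEVICE_FIELDS` are made up for illustration.

```python
import re

# One pattern per header line seen in the stderr output above.
DEVICE_FIELDS = {
    "name": re.compile(r'# Device \d+: "([^"]+)"'),
    "clock_khz": re.compile(r"# Clock rate: (\d+) kilohertz"),
    "memory_bytes": re.compile(r"# Total amount of global memory: (\d+) bytes"),
    "multiprocessors": re.compile(r"# Number of multiprocessors: (\d+)"),
    "cores": re.compile(r"# Number of cores: (\d+)"),
}

def parse_device_header(stderr_text):
    """Extract the GPU description from a task's stderr text."""
    info = {}
    for key, pattern in DEVICE_FIELDS.items():
        match = pattern.search(stderr_text)
        if match:
            value = match.group(1)
            info[key] = value if key == "name" else int(value)
    if "memory_bytes" in info:
        # Convert bytes to MiB so cards can be compared at a glance.
        info["memory_mib"] = info["memory_bytes"] / (1024 * 1024)
    return info

# Header lines copied from the first log above.
sample = """# Using CUDA device 0
# Device 0: "GeForce 9600 GSO"
# Clock rate: 1674000 kilohertz
# Total amount of global memory: 402325504 bytes
# Number of multiprocessors: 12
# Number of cores: 96
"""

info = parse_device_header(sample)
print(info["name"], f'{info["memory_mib"]:.0f} MiB', info["clock_khz"], "kHz")
```

Run on both logs, this shows the two 9600 GSO cards differ in reported memory (402325504 versus 804978688 bytes) and clock rate, while the failure itself looks the same.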
|
|
|
Zydor
Joined: 8 Feb 09 Posts: 252 Credit: 1,309,451 RAC: 0
|
Got one through ok, then the next went bang after 30 mins.
Successful one was:
http://www.gpugrid.net/result.php?resultid=636960 A GIANNI
The one that failed this time - a KASHIF_HIVPR
http://www.gpugrid.net/result.php?resultid=639025
ERROR: c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu, line 104: cufftExecC2R (gridcalc3)
With this one I was at the PC when it went. There was a system warning popup; I didn't get it word for word, as I only saw a flash before it disappeared: "something something could not be contacted, video driver restarted". Don't hang your hat on that wording, but essentially it looks as though the video driver lost connection and the system automatically restarted it; the moment it did that, instant computation error.
I will ferret in the log files, I have the PC logged to death, hopefully I can dig something up about it.
Two more downloaded, A GIANNI and a KASHIF, I suspended the GIANNI, and will try another KASHIF, see what happens.
Regards
Zy
|
|
|
Zydor
Joined: 8 Feb 09 Posts: 252 Credit: 1,309,451 RAC: 0
|
The KASHIF lasted 37 mins and went bang. A GIANNI is now running
The failed KASHIF: http://www.gpugrid.net/result.php?resultid=640997
Error was: Cuda error: Kernel [fft_data_swizzle_out] failed in file 'c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu' in line 94 : unknown error.
(Not seen a "swizzle_out" error before)
Started this one - a GIANNI - and on past performance it will probably go through ok:
http://www.gpugrid.net/result.php?resultid=641393
[Edit] If there is any debugging switch or log file - whatever - that I can enable at this end that will help, please let me know and I will. If you want me to run a series of suspect ones (etc.), let me know how and I will. [/Edit]
Regards
Zy |
|
|
|
I have gotten another error on a 2-KASHIF_HIVPR WU (result). The error appeared after more than 16 hours of computation on an 8800GT. Now I have three errors in a row. In my opinion this is unacceptable! |
|
|
|
boinc 6.6.24 x64
By KoDAkthebest
and some ERRORS (
http://www.gpugrid.net/results.php?hostid=31714
____________
|
|
|
ignasi
Joined: 10 Apr 08 Posts: 254 Credit: 16,836,000 RAC: 0
|
We are digging into these problems.
thanks,
ignasi |
|
|
Zydor
Joined: 8 Feb 09 Posts: 252 Credit: 1,309,451 RAC: 0
|
Hi Ignasi
I had a look at all my computation-error ones this morning, now that most have finally gone through. All the KASHIF ones crunched by a 9800GTX+ or below go bang. If the wingman is a 260 or above, they go through. I am aware this is a crude deduction on my part, as I have a very limited overview of the problems; however, it does now seem pretty solid that KASHIFs don't go through on cards rated 9800GTX+ and below.
If that's starting to be the case, do you still want cards of 9800GTX+ and below to run the KASHIFs? If you do, fine; I just hate running ones that will go bang, as it only delays their crunching by cards that can do it.
If you don't, I can just abort a KASHIF if I spot one coming through.
Regards
Zy |
|
|
GDF (Volunteer moderator, Project administrator, Project developer, Project tester, Volunteer developer, Volunteer tester, Project scientist)
Joined: 14 Mar 07 Posts: 1957 Credit: 629,356 RAC: 0
|
Am I right to say that all the problems are related to older cards, like the 8800, 9800 and so on?
Did anyone experience repeated failures on those workunits with a 260,275,295 or 285?
gdf |
|
|
Zydor
Joined: 8 Feb 09 Posts: 252 Credit: 1,309,451 RAC: 0
|
Additional to my post at 9444 above.
Just remembered, and it's only a part of it - it's really annoying that I only got a flash of it as it went away - the error message referred to a file "nv???????"; it may be a DLL reference, I can't remember. NV is probably no stunning revelation, but there it is for what it's worth. Whatever the final full name, the error message claimed it had "stopped" and the system had restarted it. Instantly I had the WU go bang. All CPU-based models for other projects I run have been unaffected by all this, whether during normal running or when the KASHIFs go bang.
I seem to remember another post about a week ago where a suspicion was voiced about the memory size possibly being too small for these, i.e. at present maybe it needs 1 GB cards and goes bang on 512 MB cards?
Regards
Zy |
|
|
Zydor
Joined: 8 Feb 09 Posts: 252 Credit: 1,309,451 RAC: 0
|
Just had another KASHIF go bang, it lasted 57 mins
http://www.gpugrid.net/result.php?resultid=643475
Error message:
Cuda error: Kernel [fft_data_swizzle_out] failed in file 'c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu' in line 94 : unknown error.
swizzle_out is starting to be a common one for me.
Got to go out now and meet a Client, wont be back until around 4pm UTC.
Regards
Zy |
|
|
mike047
Joined: 21 Dec 08 Posts: 47 Credit: 7,330,049 RAC: 0
|
I have had random failures on all my cards[8800gt/9600gso/9800gt/gts250] except the gtx260-192/216.
Some fail in a short period others linger much longer.
____________
mike |
|
|
|
Yup, similar issue here.
Yesterday I got a WU that got stuck at 18% on my 8800GT. No error messages though; the BOINC manager thought the process was still running, but it remained at the same progress for at least 12 hours...
Cancelled the WU manually and started another one 18 hours ago. WUs usually take a little less than 13 hours, and the current one hasn't reported yet (nor has a new WU been uploaded; I keep my queue very short...). Probably this evening I will see a similar issue. |
|
|
MarkJ (Volunteer moderator, Volunteer tester)
Joined: 24 Dec 08 Posts: 738 Credit: 200,909,904 RAC: 0
|
Additional to my post at 9444 above.
Just remembered, and it's only a part of it - it's really annoying that I only got a flash of it as it went away - the error message referred to a file "nv???????"; it may be a DLL reference, I can't remember. NV is probably no stunning revelation, but there it is for what it's worth. Whatever the final full name, the error message claimed it had "stopped" and the system had restarted it. Instantly I had the WU go bang. All CPU-based models for other projects I run have been unaffected by all this, whether during normal running or when the KASHIFs go bang.
I seem to remember another post about a week ago where a suspicion was voiced about the memory size possibly being too small for these, i.e. at present maybe it needs 1 GB cards and goes bang on 512 MB cards?
Regards
Zy
My GTS250s are only 512 MB and they seem to work with KASHIF WUs. I did suggest the driver version as a culprit. I was having problems last week on my GTX260s, and uninstalling the driver (a 185 variant) and going back to 182.50 seemed to cure them.
____________
BOINC blog |
|
|
MarkJ (Volunteer moderator, Volunteer tester)
Joined: 24 Dec 08 Posts: 738 Credit: 200,909,904 RAC: 0
|
Yup, similar issue here.
Yesterday I got a WU that got stuck at 18% on my 8800GT. No error messages though; the BOINC manager thought the process was still running, but it remained at the same progress for at least 12 hours...
Cancelled the WU manually and started another one 18 hours ago. WUs usually take a little less than 13 hours, and the current one hasn't reported yet (nor has a new WU been uploaded; I keep my queue very short...). Probably this evening I will see a similar issue.
Ahh the "never ending wu" bug. What version of BOINC are you running? It seems to have been fixed in 6.6.23 onwards.
____________
BOINC blog |
|
|
dyeman
Joined: 21 Mar 09 Posts: 35 Credit: 591,434,551 RAC: 0
|
See this thread also. I had hanging WUs using 6.6.17, and installing 6.6.23 didn't help. Installing Nvidia driver 185.85 fixed the hanging problem, but I haven't had a WU process successfully since (though that may not be a driver issue - I am currently running a GIANNI WU that is at 67% and looking OK) |
|
|
dataman
Joined: 18 Sep 08 Posts: 36 Credit: 100,352,867 RAC: 0
|
Am I right to say that all the problems are related to older cards, like the 8800, 9800 and so on?
Did anyone experience repeated failures on those workunits with a 260,275,295 or 285?
gdf
I have 7 9800GT's and one 8800GT. All have experienced failures. I'm on 6.6.20 and 185.85. I'm shutting them down until this problem is fixed. Good Luck!
____________
|
|
|
|
I have had random failures on all my cards[8800gt/9600gso/9800gt/gts250] except the gtx260-192/216.
All of these are GPUs below G200. Maybe this is a clue.
|
|
|
|
I hate to be a wet blanket.
But my 9800GT has five (5) total successful runs on just page one of my task list, so it is NOT the card, unless it is related to memory, as this card has 1 GB VRAM ...
I am using driver 182.50, so it may be THAT ... Win XP Pro, 32-bit is the other variant that may be an issue. BOINC Version 6.5.0 ...
The 6.6.x versions did have some scheduler problems, from something in the teens at least to 6.6.22 ... 6.6.23 and later seem to have cured that issue. |
|
|
Zydor
Joined: 8 Feb 09 Posts: 252 Credit: 1,309,451 RAC: 0
|
Above I mentioned a file that was "stopped" and restarted at the same moment the WU went bang. I found the error message for it. I have no idea whether it means anything to the current problem, or what it means in itself ...... however, posted for completeness as it did happen at the exact moment the WU went bang. "nvlddmkm" was what I was struggling to remember on the system error message at the time the WU went bang.
The error message reads:
"The description for Event ID 4101 from source Display cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.
If the event originated on another computer, the display information had to be saved with the event.
The following information was included with the event:
nvlddmkm "
It was located in:
Event Viewer/Custom Views/Administrative Events
Source: display.
At the time it said it was "restarted" presumably referring to nvlddmkm - whatever that is :)
Regards
Zy |
|
|
mike047
Joined: 21 Dec 08 Posts: 47 Credit: 7,330,049 RAC: 0
|
Am I right to say that all the problems are related to older cards, like the 8800, 9800 and so on?
Did anyone experience repeated failures on those workunits with a 260,275,295 or 285?
gdf
I have 7 9800GT's and one 8800GT. All have experienced failures. I'm on 6.6.20 and 185.85. I'm shutting them down until this problem is fixed. Good Luck!
I'll give it one more day, maybe two and I will do likewise.
I am very surprised at the admin/developers this time. Usually there is a little more input/concern shown.
Have I missed a thread from the project that explains what is happening and their concern??
____________
mike |
|
|
uBronan
Joined: 1 Feb 09 Posts: 139 Credit: 575,023 RAC: 0
|
I have had my fair share of those also, and installed all the latest drivers (Win7, 185.85, which includes CUDA 2.2) on this machine, plus BOINC 6.6.28.
To my surprise I now see in BOINC that my 9600 GT seems only to be able to do CUDA 1.0 instructions.
So maybe the errors created by these workunits are related to instructions which can only be performed by the newest 2x5 models,
since none of them seem to have many errors on these units.
But somehow I have had fewer problems with my machine since the latest drivers were installed; it runs kind of rock solid (only BF2 and GameGuard games are an issue).
BUT I'll remind you guys that everything I run is BETA, so problems can occur.
That it runs almost without a problem on my machine is no guarantee it will on yours.
I guess if you have a 2X5 card you will probably see a gain in processing speed if some of the CUDA 2.2 instructions can be and/or are implemented |
|
|
Zydor
Joined: 8 Feb 09 Posts: 252 Credit: 1,309,451 RAC: 0
|
Some positives for comparison as the KASHIFs are going bang with me, I've left the hardware/software setup alone so there is fair comparison.
GIANNIs seem to run fine. I am 7hrs into a TONI_HIVPR, so touch wood that seems like it will go through, will finish in about 5/6 hours. I have a IBUCH_HIVPR lined up as the next to go.
Regards
Zy |
|
|
naja002
Joined: 25 Sep 08 Posts: 111 Credit: 10,352,599 RAC: 0
|
I have aborted all:
KASHIF_HIVPR
and
IBUCH_KID
and will now continue to do so.
I have 5x 8800GS and 1x 8800GT--those WUs do not complete on my rigs and most of them hang. Yesterday I completed ONE WU instead of 9-11. 5K ppd instead of 50Kppd.
Was on 6.6.17, 3 rigs 185.26, 1 rig 182.50
As of last night all rigs are: 6.6.28 and 3x 185.26, 1x 185.85--seems to have helped some.
This is an "across the farm" thing for me now. Problems initially started on the dual gpu rigs, but now it's across the board....
My rigs are not hidden. The Phunam-PC is a new setup--the initial errors are from setup, OCing, etc. I understand those. The new ones are part of this mess.
Hoping it gets sorted out soon....
EDIT: I have kept 1 KASHIF_HIVPR that appears to be running ok on a single Gpu rig. However, 1st sign of trouble and it's history..... |
|
|
|
Likewise here, failures on
KASHIF_HIVPR
and
IBUCH_KID
Two different machines. One with 32 bit Vista, 8800GT (O/C), client 6.6.20 & 185.86 driver. The other with 64 bit Vista, 9800 GX2 (Not O/C), client 6.6.20 & 182.50 driver. I have now updated both drivers to 185.85, which is latest release. |
|
|
|
Ahh the "never ending wu" bug. What version of BOINC are you running? It seems to have been fixed in 6.6.23 onwards.
Indeed, nice description of what happened here. Installed Boinc 6.5.0 and WU picked up nicely where it blocked ...
Although it was KASHIF WU, it apparently was the scheduler to blame .... |
|
|
uBronan
Joined: 1 Feb 09 Posts: 139 Credit: 575,023 RAC: 0
|
Well, again I had a unit error out after 13 hours of work, and it looks like the big-gun machines run them all fine.
I can't go on like this; I have lost hundreds of hours of time and money for nothing.
For the time being I am also shutting down GPUGrid until this issue is solved. |
|
|
Zydor
Joined: 8 Feb 09 Posts: 252 Credit: 1,309,451 RAC: 0
|
I am aware that there is hard work going on to find the cause/fix. If someone could take 2 minutes out to advise us all whether you still want the KASHIFs run by lower-end cards, I suspect it would help enormously, as we could then abort them and leave them to the big guns, knowing it's not going to cause issues in the bug-finding, and carry on with the other WUs.
At present it seems lots are shutting down from doing anything in the absence of any advice, understandably, but the other WUs seem OK.
Just a gentle suggestion ...
Regards
Zy |
|
|
naja002
Joined: 25 Sep 08 Posts: 111 Credit: 10,352,599 RAC: 0
|
The last KASHIF_HIVPR did in fact error out.....No more for me. I'm just going to have to check my rigs 1-2x/day and send them back....
I am aware that there is hard work going on to find the cause/fix. If someone could take 2 minutes out to advise us all whether you still want the KASHIFs run by lower-end cards, I suspect it would help enormously, as we could then abort them and leave them to the big guns, knowing it's not going to cause issues in the bug-finding, and carry on with the other WUs.
At present it seems lots are shutting down from doing anything in the absence of any advice, understandably, but the other WUs seem OK.
Just a gentle suggestion ...
Regards
Zy
My guess would be that they are still releasing them because they run on the higher-end cards. They can still get the work completed. However, if that's the case, then I think the server needs to be set up to issue specific WUs to specific cards. The server gets plenty of info from our rigs---so I don't see why that can't be done.... |
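The idea of issuing specific WUs to specific cards can be sketched as a simple filter. Everything below is hypothetical: the names `SUSPECT_SERIES`, `PREFERRED_GPU_MARKERS` and `may_send` are invented for illustration and are not part of the BOINC server or GPUGRID, and the card list only reflects what posters in this thread have reported so far.

```python
# Hypothetical sketch: none of these names come from BOINC or GPUGRID.

# Work-unit series reported in this thread as failing on older cards.
SUSPECT_SERIES = {"KASHIF_HIVPR", "IBUCH_KID"}

# Cards the thread suggests handle the suspect series (an assumption,
# matched as substrings of the device name the client reports).
PREFERRED_GPU_MARKERS = ("GTX 260", "GTX 275", "GTX 285", "GTX 295")

def may_send(series, gpu_name):
    """Return True if a work unit of `series` may go to a host with `gpu_name`."""
    if series not in SUSPECT_SERIES:
        return True  # unaffected series go to every card
    return any(marker in gpu_name for marker in PREFERRED_GPU_MARKERS)

print(may_send("KASHIF_HIVPR", "GeForce 9600 GSO"))  # False
print(may_send("KASHIF_HIVPR", "GeForce GTX 260"))   # True
print(may_send("GIANNI", "GeForce 9600 GSO"))        # True
```

A real scheduler would need a more reliable signal than the device name, since the cause of the failures is not yet known.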
|
|
mike047
Joined: 21 Dec 08 Posts: 47 Credit: 7,330,049 RAC: 0
|
Nothing will likely be done until sometime Monday, I am also at No New Work until problem is resolved.
____________
mike |
|
|
GDF (Volunteer moderator, Project administrator, Project developer, Project tester, Volunteer developer, Volunteer tester, Project scientist)
Joined: 14 Mar 07 Posts: 1957 Credit: 629,356 RAC: 0
|
The real problem is that we do not understand why these WUs crash. There are several Kashif_XXX workunits and only a subset of them crashes on some machines.
We will stop the crashing WUs, as more testing did not really help.
gdf |
|
|
Bymark
Joined: 23 Feb 09 Posts: 30 Credit: 5,897,921 RAC: 0
|
I have a big problem with my new asus 260:
hostid=35303
I downgraded all drivers, and am now waiting to get more tasks.
"reached daily quota of 4 results" heh ;),
Any suggestion? Seti gpus working fine.......
____________
"Silakka"
Hello from Turku > Åbo. |
|
|
uBronan
Joined: 1 Feb 09 Posts: 139 Credit: 575,023 RAC: 0
|
Sadly yes, the famous units which we are discussing all over the forum |
|
|
Zydor
Joined: 8 Feb 09 Posts: 252 Credit: 1,309,451 RAC: 0
|
I have a big problem with my new asus 260:
hostid=35303
I downgraded all drivers, and am now waiting to get more tasks.
"reached daily quota of 4 results" heh ;),
Any suggestion? Seti gpus working fine.......
The ones crashing on that machine are not the suspect WUs that they have now stopped issuing; those crashing on that machine usually run fine. He also has a 260, which is outside the problems; it's the lower cards that did have issues in the past. Something else lurketh. No idea what, personally; over to the gurus for that.
Regards
Zy |
|
|
Sandro
Joined: 19 Aug 08 Posts: 22 Credit: 3,660,304 RAC: 0
|
Am I right to say that all the problems are related to older cards, like the 8800, 9800 and so on?
Did anyone experience repeated failures on those workunits with a 260,275,295 or 285?
gdf
Yes. My GTX 260 running under 64bit Ubuntu also crashes WUs
<core_client_version>6.4.5</core_client_version>
<![CDATA[
<message>
process got signal 11
</message>
<stderr_txt>
# Using CUDA device 0
# Device 0: "GeForce GTX 260"
# Clock rate: 1242000 kilohertz
# Total amount of global memory: 938803200 bytes
# Number of multiprocessors: 27
# Number of cores: 216
# Amber: readparm : Reading parm file parameters
# PARM file in AMBER 7 format
# Encounter 10-12 H-bond term
WARNING: parameters.cu, line 568: Found zero 10-12 H-bond term.
WARNING: parameters.cu, line 568: Found zero 10-12 H-bond term.
MDIO ERROR: cannot open file "restart.coor"
</stderr_txt>
]]>
exit status: 11 (0xb)
<core_client_version>6.4.5</core_client_version>
<![CDATA[
<message>
process got signal 11
</message>
<stderr_txt>
# Using CUDA device 0
# Device 0: "GeForce GTX 260"
# Clock rate: 1242000 kilohertz
# Total amount of global memory: 938803200 bytes
# Number of multiprocessors: 27
# Number of cores: 216
MDIO ERROR: cannot open file "restart.coor"
</stderr_txt>
]]>
|
|
|
|
Let's gather some of that information:
- all failures reported here affect G92 and G9x-class chips
- G200 usually runs them just fine
- there are some errors with G200 as well, but this could just be the normal error rate
- Paul's G92 runs fine (and hopefully others)
-> it's a bug which is triggered by a special client configuration
- BOINC 6.6.x, 6.5.0 and 6.4.7 are definitely affected -> the version likely doesn't matter
- drivers 185.8x, 185.6x and 182.50 are reported to be affected, but 182.50 for XP32 works for Paul
-> did anyone try older drivers? E.g. 182.08, which has a very solid track record
- Paul's card has 1 GB of memory, whereas most G92 cards have 512 MB or less
Do we have any other reports of G9x cards which run these tasks fine? Could anyone check the memory consumption of these WUs with RivaTuner?
EDIT: only certain WUs of the "IBUCH_KID" and "KASHIF_HIVPR" series are affected. Do we know which ones? Are the ones which work for Paul's card, by pure coincidence, all of the type which works?
For example my 9800GTX+ 512MB on Vista 64, 185.66 and 6.5.0 finished:
- 88-KASHIF_HIVPR_dim_ba2-2-100-RND8763_0
- 7-KASHIF_HIVPR_mon_ba5-6-100-RND3602_1
- 57-KASHIF_HIVPR_mon_ba4-4-100-RND1833_1
and failed
- 79-KASHIF_HIVPR_n1_for_ba1-4-100-RND9984_0
- 175-IBUCH_KID_shao_ba1-1-100-RND4198_2
- 93-IBUCH_KID_shao_ba2-0-100-RND9546_1
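To answer "which ones?", the WU names can be split into a series and a variant and then tallied by outcome. This is a rough sketch in Python; the split is an assumption based only on the six names listed above (leading number, two upper-case tokens as the series, then a variant before the numeric suffix), and the meaning of the individual fields is not documented here. `series_of` and `tally` are illustrative names.

```python
import re
from collections import Counter

# Assumed layout, inferred from the names above:
#   <number>-<SERIES_A>_<SERIES_B>_<variant>-<n>-<n>-RND<n>_<n>
WU_NAME = re.compile(
    r"^\d+-(?P<series>[A-Z]+_[A-Z]+)_(?P<variant>.+)-\d+-\d+-RND\d+_\d+$"
)

def series_of(name):
    """Return (series, variant) for a WU name, or None if it does not match."""
    match = WU_NAME.match(name)
    return (match.group("series"), match.group("variant")) if match else None

def tally(names):
    """Count WU names per (series, variant)."""
    return Counter(series_of(name) for name in names)

# The six tasks listed in the post above.
finished = [
    "88-KASHIF_HIVPR_dim_ba2-2-100-RND8763_0",
    "7-KASHIF_HIVPR_mon_ba5-6-100-RND3602_1",
    "57-KASHIF_HIVPR_mon_ba4-4-100-RND1833_1",
]
failed = [
    "79-KASHIF_HIVPR_n1_for_ba1-4-100-RND9984_0",
    "175-IBUCH_KID_shao_ba1-1-100-RND4198_2",
    "93-IBUCH_KID_shao_ba2-0-100-RND9546_1",
]

print("finished:", dict(tally(finished)))
print("failed:  ", dict(tally(failed)))
```

With more reports fed in, a variant that shows up only in the failed tally would be a candidate for the affected subset.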
MrS
____________
Scanning for our furry friends since Jan 2002
|
|
|
mike047
Joined: 21 Dec 08 Posts: 47 Credit: 7,330,049 RAC: 0
|
I am on 6.4.5 and use either 177.82 or 180.22 on Ubuntu 64.
I have had many failures on all cards except my 260s [192/216].
____________
mike |
|
|
MarkJ Volunteer moderator Volunteer tester Send message
Joined: 24 Dec 08 Posts: 738 Credit: 200,909,904 RAC: 0 Level
Scientific publications
|
Let's gather some of that information:
- all failures reported here affect G92 and G9x-class chips
- G200 usually runs them just fine
- there are some errors with G200 as well, but this could just be the normal error rate
- Paul's G92 runs fine (and hopefully others do too)
-> it's a bug which is triggered by a special client configuration
- BOINC 6.6.x, 6.5.0 and 6.4.7 are definitely affected -> the version likely doesn't matter
- drivers 185.8x, 185.6x and 182.50 are reported to be affected, but 182.50 for XP32 works for Paul
-> did anyone try older drivers? E.g. 182.08, which has a very solid track record
- Paul's card has 1 GB of memory, whereas most G92 cards have 512 MB or less
Do we have any other reports of G9x cards which run these tasks fine? Could anyone check the memory consumption of these WUs with RivaTuner?
EDIT: only certain WUs of the "IBUCH_KID" and "KASHIF_HIVPR" series are affected. Do we know which ones? Are the ones which work for Paul's card, by pure coincidence, all of the type which works?
For example my 9800GTX+ 512MB on Vista 64, 185.66 and 6.5.0 finished:
- 88-KASHIF_HIVPR_dim_ba2-2-100-RND8763_0
- 7-KASHIF_HIVPR_mon_ba5-6-100-RND3602_1
- 57-KASHIF_HIVPR_mon_ba4-4-100-RND1833_1
and failed
- 79-KASHIF_HIVPR_n1_for_ba1-4-100-RND9984_0
- 175-IBUCH_KID_shao_ba1-1-100-RND4198_2
- 93-IBUCH_KID_shao_ba2-0-100-RND9546_1
MrS
I have 4 machines with GTS 250s (512 MB). They are running under XP32 with 182.50 drivers and seem fine.
I have an i7 with dual GTX260's. It is running under XP32 with 182.50 drivers and also seems fine. I had problems a week ago with 185.xx (beta) drivers and uninstalled them before reinstalling 182.50 drivers. Problems seemed to go away after that.
All machines currently running BOINC 6.6.28.
I had one IBUCH_KID WU, which I aborted after seeing the post from GDF regarding them being in error. KASHIF_HIVPR seem fine.
____________
BOINC blog |
|
|
|
Oh, so it also affects Linux. Maybe there's not much point in searching for Windows and driver versions then.
I had one IBUCH_KID WU, which I aborted after seeing the post from GDF regarding them being in error. KASHIF_HIVPR seem fine.
Some WUs of both series are affected, but not on G200 based cards (GTX 2xx).
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|
|
Well, I just had a crash on the i7 with 67-KASHIF_HIVPR_n1_for_ba3-2-100-RND8737; this is a task that had died at least twice before.
The thing is, I was playing a game at the time. Low intensity turn based strategy game. But I cannot say if that had any effect. The game seemed to die and the graphics driver crashed. That said, the other tasks in progress seemed to stay OK ...
More interesting is that there were three different errors ...
Of course, the task was run on three different class cards.
And I am running BOINC 6.6.28 on that machine ... still 182.50 drivers though.
|
|
|
Zydor Send message
Joined: 8 Feb 09 Posts: 252 Credit: 1,309,451 RAC: 0 Level
Scientific publications
|
I have been having a closer look at my errors, and a few from others. This bears some checking, but on the face of it the crashed ones do have a common element: "signal 11". The "h-bond" message is a red herring here, as it refers to the "Amber" processes (is that right?). No matter the detail, it was cleared up in another thread as a non-issue: just a text message about the internal processes in the WU, not about its validity as a successful WU.
"Signal 11" does appear virtually every time in the ones I looked at. I am aware signal 11 is an issue way down in the Communication Layer, which in itself rings a bell considering the way the current problems affect some cards and not others, and some operating systems and not others. But I have no idea where to take that logic further, or even whether it has validity; I don't have that level of knowledge. I am aware signal 11 can appear for many, many reasons, and it can be difficult to work out what the reason is, but if that is the case this time, at least it's the start down the right road.
Regards
Zy |
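For anyone who wants to check what a signal number in a result's stderr stands for: on Linux and other POSIX systems, signal 11 is conventionally SIGSEGV, a segmentation fault (an invalid memory access). Below is a minimal Python sketch that pulls the number out of a "process got signal N" line and looks up its name with the standard `signal` module. The helper names are made up for this illustration, and whether the number carries the same meaning in results from Windows hosts is not something this thread confirms.

```python
import re
import signal


def signal_from_log(text):
    """Return the signal number from a 'process got signal N' line, or None."""
    m = re.search(r"process got signal (\d+)", text)
    return int(m.group(1)) if m else None


def signal_name(number):
    """Look up the platform's name for a signal number ('unknown' if none)."""
    try:
        return signal.Signals(number).name
    except ValueError:
        return "unknown"


if __name__ == "__main__":
    log = "<message>\nprocess got signal 11\n</message>"
    number = signal_from_log(log)
    print(number, signal_name(number))
```

A segmentation fault only says that the application touched memory it should not have; it does not by itself say whether the cause was the WU, the driver or the card.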
|
|
|
@Zydor: I don't see "signal 11", neither in my nor in your latest results.
@Paul: that's number 3 of these tasks which have failed on a G200 card. But the circumstances were slightly unusual.. not sure if it means anything.
@all: ouch, 2 more errors for me:
- "30-KASHIF_HIVPR_dim_ba3-4-100-RND0655_0" - seems "normal"
- "p2690000-IBUCH_pYIpYVkp01_0705-2-10-RND1281_1" - not normal
The second task registered only 3s cpu time, so it may have happened while the driver was still restarting.
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|
Bymark Send message
Joined: 23 Feb 09 Posts: 30 Credit: 5,897,921 RAC: 0 Level
Scientific publications
|
I have a big problem with my new Asus 260:
hostid=35303
I downgraded all drivers and am now waiting to get more tasks.
"reached daily quota of 4 results" heh ;),
Any suggestions? The SETI GPU tasks are working fine.......
The ones crashing on that machine are not the suspect WUs that they have now stopped issuing; those crashing on that machine usually run fine. He also has a 260, which is outside the problems; it's the lower cards that did have issues in the past. Something else lurketh. No idea what personally, over to the Gurus for that.
Regards
Zy
Now I have exactly the same drivers, BOINC etc. as my fine working ati 260. Still waiting for new WUs. SETI is working fine, and with the same 550 W power everything should be identical. Maybe it's a hardware problem, but then I don't understand why the SETI GPU tasks work without failure. Running one SETI GPU task:
SETI account for the same computer
Hardware monitor
-----------------------------------------------------
AMD Athlon 64 X2 5600+ hardware monitor
Temperature sensor 0 33°C (91°F) [0x149] (Core #0)
Temperature sensor 1 38°C (99°F) [0x15A] (Core #1)
Dump hardware monitor
Hardware monitor
-----------------------------------------------------
GeForce GTX 260 hardware monitor
Temperature sensor 0 71°C (159°F) [0x47] (GPU Core)
____________
"Silakka"
Hello from Turku > Åbo. |
|
|
|
Well, you also got >6 errors a day, but your problem is totally unrelated to what is being discussed in this thread. It might help to ask in a separate thread if you need further assistance. Do 3D Mark and/or Furmark run on your card? Seti stresses the hardware less than GPU-Grid.
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|
|
And for my pbs ? with driver other than 182.5. |
|
|
|
Success on 52-KASHIF_HIVPR_mon_ba3-7-100-RND3244_0. 64 bit Vista, 9800 GX2 (Not O/C), client 6.6.20 & 182.85 driver. |
|
|
|
Aardvark, so far the "KASHIF_HIVPR_mon" have also been fine for my machine. Thanks for the info.. seems like these are indeed not the trouble makers.
Profanateur, if I remember correctly you have a separate thread regarding your problem elsewhere. And since all WUs error on your machine, you are facing a different problem from what is discussed here. I think I wrote some suggestions in that other thread.. well, I hope. At least I wanted to write something ;)
What do you mean by pbs?
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|
|
pbs = problems = failures.
Sorry, but I'm French. |
|
|
|
I'm new here.
Have errors with this:
75-IBUCH_HIVPR_mon_ba8-4-100-RND5234 id: 451357
100-KASHIF_HIVPR_n1_for_ba4-4-100-RND3172 id: 448737
Shuttle XPC
Vista Enterprise 64 bit, 2 GB RAM
AMD Opteron 2.4 GHz model 180
GeForce 9400GT, 1 GB RAM, newly bought
BOINC 6.6.20
ComputerID: 35365
The Boincwoman
|
|
|
reflaSend message
Joined: 12 Feb 09 Posts: 9 Credit: 385,357 RAC: 0 Level
Scientific publications
|
xp/32 + 9600GT@181.20 + BOINC6.4.5 cannot survive! |
|
|
|
Refla, not sure what you mean. You only have successful WUs and others which are listed as "aborted by user". Sure, they can't survive if you abort them ;)
Boincwoman, your machine has not completed any WUs so far. So I'm not sure if we can attribute your failure of the "IBUCH_HIVPR" to the error discussed here. If your card is passively cooled it may be overheating (check with GPU-Z and report temperatures). Otherwise your setup should be fine.
However, the card is very slow: it has 16 shaders ("stream processors"), whereas at least 50 are officially recommended (FAQ). You'll have trouble meeting the GPU-Grid deadlines and you may want to take a look at seti for your GPU.
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|
|
Errors today:
10/05/2009 10:53:19 GPUGRID Output file p1760000-IBUCH_pYIpYVkp01_0705-4-10-RND5135_0_1 for task p1760000-IBUCH_pYIpYVkp01_0705-4-10-RND5135_0 absent
10/05/2009 16:56:28 GPUGRID Output file p2750000-IBUCH_pYIpYVkp01_0705-4-10-RND5064_1_1 for task p2750000-IBUCH_pYIpYVkp01_0705-4-10-RND5064_1 absent
|
|
|
reflaSend message
Joined: 12 Feb 09 Posts: 9 Credit: 385,357 RAC: 0 Level
Scientific publications
|
ETA:
I aborted them because the WUs' progress had not advanced for a long time (more than 1 hour at least). The situation did not change even after I rebooted my computer.
After 2 WUs, I concluded that if the last number in the task name is greater than zero, it is probably a bad WU.
Details in http://www.gpugrid.net/forum_thread.php?id=1041
My English is not good enough, I hope you can understand what I mean.
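The rule of thumb about the last number can be written down as a small Python sketch. It assumes, as is usual for BOINC, that the number after the final underscore counts copies of the same workunit, so a value above zero suggests the workunit had already been sent out before. The helper names `replica_suffix` and `looks_reissued` are made up for this illustration.

```python
def replica_suffix(task_name):
    """Return the number after the final underscore, or None if there is none."""
    head, sep, tail = task_name.rpartition("_")
    if not sep or not tail.isdigit():
        return None
    return int(tail)


def looks_reissued(task_name):
    """True if the trailing number is greater than zero."""
    suffix = replica_suffix(task_name)
    return suffix is not None and suffix > 0


if __name__ == "__main__":
    for name in (
        "88-KASHIF_HIVPR_dim_ba2-2-100-RND8763_0",
        "175-IBUCH_KID_shao_ba1-1-100-RND4198_2",
    ):
        print(name, replica_suffix(name), looks_reissued(name))
```

This is a hint rather than a verdict: earlier in this thread "7-KASHIF_HIVPR_mon_ba5-6-100-RND3602_1" was reported as finished even though its last number is 1.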
:) |
|
|
|
Profanateur,
your problem is not related to what is being discussed here. Very many of your WUs error, this is different from the "KASHIF_HIVPR" and "IBUCH_KID" issue. You actually completed some, so your software should be fine.
However, you are running a very new driver and two overclocked cards, which are very different. All of these or their combination could lead to problems. I suggest you start a new thread (instead of posting a little in different threads), write down your current config (software versions, clocks, GPU temperatures) and then change some parameters, document the changes and see if it helps. By that I mean
- run only 1 of the cards to see if one is broken
- reduce all clocks to standard values
- run other stability tests
- try well-tested drivers like 182.50 or 182.08
- maybe more
If you do that we (or you yourself ;) should be able to get you going.
Regards,
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|
|
refla,
that's strange. You're running 6.4.5, so you shouldn't be affected by the slow-6.6.20-bug. Also most of your canceled WUs may belong to the critical "KASHIF_HIVPR" and "IBUCH_KID" series, but some were also "IBUCH_pYIpYVkp01", which have not been reported to fail massively.
Furthermore your WUs are crunched just fine on G200-based cards, whereas no G9x returned any of them. Sorry, don't know what this means..
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|
reflaSend message
Joined: 12 Feb 09 Posts: 9 Credit: 385,357 RAC: 0 Level
Scientific publications
|
ETA,
please tell me how to avoid or recover from the case where a WU's progress freezes.
You can see I am not the only one who has met this case. Before I abandoned them, other GPUGriders had done the same.
|
|
MarkJ Volunteer moderator Volunteer tester Send message
Joined: 24 Dec 08 Posts: 738 Credit: 200,909,904 RAC: 0 Level
Scientific publications
|
ETA,
please tell me how to avoid or recover from the case where a WU's progress freezes.
You can see I am not the only one who has met this case. Before I abandoned them, other GPUGriders had done the same.
@refla:
I would suggest you switch to BOINC 6.6.23.
Your driver version is not shown, but as ETA has said above I would suggest 182.50 drivers as they seem to be reliable.
____________
BOINC blog |
|
|
palmssSend message
Joined: 28 Aug 08 Posts: 7 Credit: 60,897,550 RAC: 0 Level
Scientific publications
|
Hi
I have another error (Kernel [nb_k] failed in file 'nb.cu' in line 202 : unknown error) on a new type of WU: http://www.gpugrid.net/result.php?resultid=645509 |
|
|
MarkJ Volunteer moderator Volunteer tester Send message
Joined: 24 Dec 08 Posts: 738 Credit: 200,909,904 RAC: 0 Level
Scientific publications
|
Hi
I have another error (Kernel [nb_k] failed in file 'nb.cu' in line 202 : unknown error) on a new type of WU: http://www.gpugrid.net/result.php?resultid=645509
What driver version are you using?
____________
BOINC blog |
|
|
mike047Send message
Joined: 21 Dec 08 Posts: 47 Credit: 7,330,049 RAC: 0 Level
Scientific publications
|
Have the "EVIL" work units been disabled or deleted?
I have stopped work on 8 of my cards (250s and below). The two 260s are doing OK.
____________
mike |
|
|
Zydor Send message
Joined: 8 Feb 09 Posts: 252 Credit: 1,309,451 RAC: 0 Level
Scientific publications
|
Yes, they stopped issuing the suspect ones on Saturday. Not all KASHIFs are suspect: there are several types of KASHIF WUs, and only one particular type was giving grief.
See http://www.gpugrid.net/forum_thread.php?id=1034&nowrap=true#9506
Regards
Zy |
|
|
|
Profanateur,
your problem is not related to what is being discussed here. Very many of your WUs error, this is different from the "KASHIF_HIVPR" and "IBUCH_KID" issue. You actually completed some, so your software should be fine.
However, you are running a very new driver and two overclocked cards, which are very different. All of these or their combination could lead to problems. I suggest you start a new thread (instead of posting a little in different threads), write down your current config (software versions, clocks, GPU temperatures) and then change some parameters, document the changes and see if it helps. By that I mean
- run only 1 of the cards to see if one is broken
- reduce all clocks to standard values
- run other stability tests
- try well-tested drivers like 182.50 or 182.08
- maybe more
If you do that we (or you yourself ;) should be able to get you going.
Regards,
MrS
I have no errors with 182.50.
I said that from beginning. |
|
|
Bymark Send message
Joined: 23 Feb 09 Posts: 30 Credit: 5,897,921 RAC: 0 Level
Scientific publications
|
My solution on the 260 was BOINC 6.4.7 and driver 178.28; now it's working like a train.
Slow but getting faster, like a first mosquito this summer today in Turku Finland.
Thomas Bymark
____________
"Silakka"
Hello from Turku > Åbo. |
|
|
|
Profanateur wrote:
I have no errors with 182.50.
I said that from beginning.
Actually you said "And for my pbs ? with driver other than 182.5." Which I understand as "I'm not interested in my problems with 182.50, only in the problems with other drivers".
Well, no. Actually when I read that post I thought something like "Isn't that the guy with many errors and the exotic setup? What does he want to say?" Now that I know I understand you.
So if you know 182.50 works, why don't you use it?
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|
|
'Cause I want the latest release, to have ambient occlusion in games. |
|
|
|
Then you'll be glad to hear about this ;)
MrS
____________
Scanning for our furry friends since Jan 2002 |
|
|
AndrewSend message
Joined: 9 Dec 08 Posts: 29 Credit: 18,754,468 RAC: 0 Level
Scientific publications
|
I had 2 fail on my 8800GT, one on 5th May, and one right now. My screen actually went black for a few seconds and I briefly saw Windows Error Reporting in Process Explorer! Driver version 182.50, I believe. Card was at stock clocks at the time (fine).
5th May one was 159-IBUCH_KID_shao_ba1-0-100-RND5509_1:
and the one just now was 53-KASHIF_HIVPR_n1_for_ba1-2-100-RND0722_1:
which had the swizzle error others have described. |
|
|
palmssSend message
Joined: 28 Aug 08 Posts: 7 Credit: 60,897,550 RAC: 0 Level
Scientific publications
|
Hi
I have another error (Kernel [nb_k] failed in file 'nb.cu' in line 202 : unknown error) on a new type of WU: http://www.gpugrid.net/result.php?resultid=645509
What driver version are you using?
I have the version 181.22 driver |
|
|
reflaSend message
Joined: 12 Feb 09 Posts: 9 Credit: 385,357 RAC: 0 Level
Scientific publications
|
ETA,
please tell me how to avoid or recover from the case where a WU's progress freezes.
You can see I am not the only one who has met this case. Before I abandoned them, other GPUGriders had done the same.
@refla:
I would suggest you switch to BOINC 6.6.23.
Your driver version is not shown, but as ETA has said above I would suggest 182.50 drivers as they seem to be reliable.
MarkJ:
Thanks, I will test it. :) |
|
|
naja002 Send message
Joined: 25 Sep 08 Posts: 111 Credit: 10,352,599 RAC: 0 Level
Scientific publications
|
Well, no. Actually when I read that post I thought something like "Isn't that the guy with many errors and the exotic setup? What does he want to say?"
MrS
That may be me. If so, the many errors are from 2 sources: my fault and not my fault ;) Some of these WUs are a nightmare and I don't accept responsibility for that. However, I have had an issue or 3 on my end...those things I understand and accept responsibility for...;) The i7 upgrade produced a lot of initial errors, because of driver compatibility. I've produced 1 successful WU after another for long periods of time. When I start to develop issues--I try to sort it out and get it straight, but when the issues are really not on my end...there's not much that I can do except ride it out.
But I can say that I've used the 185.26 driver on 3 rigs (initially 4) for a month before all these issues arose. So, the issue is with the WUs being incompatible with the driver vs. the driver being incompatible with the WUs. In other words, any incompatibility change is in the WUs, not the driver. I cannot speak for any other version of 185.xx though...
Also, IBUCH_pYIpYVkp01_0705-4-10 seems to be another WU with issues, but I think that is already known....
HTH |
|
|
ignasiSend message
Joined: 10 Apr 08 Posts: 254 Credit: 16,836,000 RAC: 0 Level
Scientific publications
|
Also, IBUCH_pYIpYVkp01_0705-4-10 seems to be another WU with issues, but I think that is already known....
HTH
This runs fine actually. Are you referring to any error in particular?
ignasi |
|
|
Zydor Send message
Joined: 8 Feb 09 Posts: 252 Credit: 1,309,451 RAC: 0 Level
Scientific publications
|
Just had a KASHIF go bang. It appears to be the old hassles on the face of it; just highlighting it for the record due to recent hassles with some KASHIFs.
http://www.gpugrid.net/result.php?resultid=667592
It had been running for 11hrs15, so it was different from the others I had go: they were early, while this one was late in processing, almost finished when it went. "One of those things" I suspect.
The network connection was down at the time, due to a major BT Network fault that had been extant for nearly 24 hrs. That should have had no effect; I just mention it for completeness, as the connection was down when the WU went bang.
Regards
Zy |
|
|
naja002 Send message
Joined: 25 Sep 08 Posts: 111 Credit: 10,352,599 RAC: 0 Level
Scientific publications
|
Also, IBUCH_pYIpYVkp01_0705-4-10 seems to be another WU with issues, but I think that is already known....
HTH
This runs fine actually. Are you referring to any error in particular?
ignasi
p3400000-IBUCH_pYIpYVkp01_0705-4-10-RND9113
p1390000-IBUCH_pYIpYVkp01_0705-3-10-RND2928
p2200000-IBUCH_pYIpYVk52804-9-10-RND5157
I'm not sure what the quadro cards are equivalent to....8 series, 9, 200.....
|
|
|
GDFVolunteer moderator Project administrator Project developer Project tester Volunteer developer Volunteer tester Project scientist Send message
Joined: 14 Mar 07 Posts: 1957 Credit: 629,356 RAC: 0 Level
Scientific publications
|
Please look at the driver thread.
gdf |
|
|