ERROR: c:\cygwin\home\speechserver\gpumd2\src\pme....
| Author | Message |
|---|---|
|
Joined: 9 Oct 08 Posts: 50 Credit: 12,676,739 RAC: 0
|
Hello all, I have been struggling with quite a few GPU WU failures over the past weeks and am not sure what is causing them. There are several failure scenarios, but a common one is included below. I run a Q9550 quad core on an EVGA 790i Ultra motherboard with 2x GTX 260 Core 216s, with some overclocking applied. The overclock worked well for quite a while, with the core at around 650 MHz and the shader clock linked. What also hasn't worked for a while is the EVGA GPU Voltage Tuner, which they broke with the 182 series drivers and up. I used it to help stabilize the cards and GPU WUs in the past, while it was working. I am currently running a 185.68 driver. Any thoughts on what I can do or check would be appreciated. The C: drive file mentioned below is NOT on my hard drive, so it must have been compiled in with the GPU WU or the Nvidia driver? <core_client_version>6.6.23</core_client_version> <![CDATA[ <message> - exit code 98 (0x62) </message> <stderr_txt> # Using CUDA device 1 # Device 0: "GeForce GTX 260" # Clock rate: 1458000 kilohertz # Total amount of global memory: 939196416 bytes # Number of multiprocessors: 27 # Number of cores: 216 # Device 1: "GeForce GTX 260" # Clock rate: 799200 kilohertz # Total amount of global memory: 939261952 bytes # Number of multiprocessors: 27 # Number of cores: 216 MDIO ERROR: cannot open file "restart.coor" ERROR: c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu, line 104: cufftExecC2R (gridcalc3) called boinc_finish </stderr_txt> ]]> Thanks. Neil Crunching for the benefit of humanity and in memory of my dad and other family members. |
|
Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 |
Do I understand you correctly: you used the EVGA tool to increase your GPU voltage to stabilize your OC? In that case you'll probably lose stability without the voltage bump. The maximum stable clock frequency of a chip is approximately proportional to its voltage over small voltage ranges. Another possible factor is temperature: here, and I guess also in Canada, summer is coming. The higher the temperature, the lower the maximum stable frequency. So I suggest backing off your OC by a substantial margin to see if you're stable again; ~50 MHz on the core should do the trick. You could also run some stability tests, maybe a 1 h loop of 3DMark06 and/or FurMark. MrS Scanning for our furry friends since Jan 2002 |
|
Joined: 9 Oct 08 Posts: 50 Credit: 12,676,739 RAC: 0
|
Thanks ET. I have backed off to the 181.22 driver, started GPU Voltage Tuner, and bumped the voltage up about 50 mV from the default. I am waiting for GPU WUs to download; I'll track my progress and report back. What I am still curious about is the c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu part of the message: what is it? Neil Crunching for the benefit of humanity and in memory of my dad and other family members. |
Crunch3r Joined: 16 Mar 09 Posts: 3 Credit: 207,697,314 RAC: 0
|
That's just for debugging purposes. When the app was compiled, the compiler generated some debug info to make it easier for the developer to see where exactly in the source code it crashed. It simply says that the crash occurred in "CPME_cufft.cu" and that this file is located in "c:\cygwin\home\speechserver\gpumd2\src\pme\" on the developer's machine, NOT yours. Anyway, I'd be interested to hear why they use cygwin/gcc instead of VS to compile the app... |
Paul D. Buck Joined: 9 Jun 08 Posts: 1050 Credit: 37,321,185 RAC: 0
|
"Anyway, i'd be interested to hear why they use cygwin/gcc instead of VS to compile the app..." Cheaper license ... |
|
Joined: 9 Oct 08 Posts: 50 Credit: 12,676,739 RAC: 0
|
I've backed off to a 181.xx driver and EVGA GPU Voltage Tuner works. I've tweaked the voltage up a little and things are looking promising. The last 5 or so work units completed successfully between my 2 GTX 260's.... I'll report again by the weekend. Looks like it was probably a GPU voltage issue. Crunching for the benefit of humanity and in memory of my dad and other family members. |
|
Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 |
What's the temperature of your cards? MrS Scanning for our furry friends since Jan 2002 |
|
Joined: 24 Dec 08 Posts: 738 Credit: 200,909,904 RAC: 0 |
I got this error in this WU: ERROR: c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu, line 104: cufftExecC2R (gridcalc3) And this one got this error: Cuda error: Kernel [fft_data_swizzle_in] failed in file 'c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu' in line 44 : the launch timed out and was terminated. And a third got this error: ERROR: c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu, line 104: cufftExecC2R (gridcalc3) They were run on a GTX260+, which would have been running the 185.81 drivers. The cards aren't OC'ed. No idea about temperatures, as they seem to have gotten rid of the fan control option from vtune. The cards seem happy crunching SETI CUDA work. Probably stuffed (beta) drivers, so I'll go back to the 182.50 drivers. BOINC blog |
|
Joined: 9 Oct 08 Posts: 50 Credit: 12,676,739 RAC: 0
|
I've been very successful since backing off to the 181.xx drivers and running EVGA Voltage Tuner. As MarkJ reports above, I got these same errors before I downgraded my driver and upped my GPU voltage slightly (about 50 mV). Now I'm running very well on that box. Mark, you might try the same thing as an experiment. I'll check card temperatures next time (2x GTX 260 Core 216 Superclocked running at around 665 MHz), but they typically run in the high 60s to high 70s, depending on the room temperature, which can vary quite a bit. I have the fans set on auto using Precision 1.7.1. Crunching for the benefit of humanity and in memory of my dad and other family members. |
|
Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 |
Neil, with temperatures in the high 70s I wouldn't want to increase GPU voltage. In the mid 60s I could justify it (but wouldn't do so myself). However, there is no hard limit in this range: it's simply the lower the better. And my "threshold temperatures" are purely subjective, so your mileage can and likely will vary ;) Mark, recently you had a lot of errors with 0 s CPU time, i.e. the WU did not even start. This points to a software problem. Since yesterday you seem to be running fine; did you change anything, e.g. downgrade the driver? MrS Scanning for our furry friends since Jan 2002 |
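
As Crunch3r explains in the thread, the ERROR line only names a source file on the developer's machine, a line number, and the call that failed. A minimal Python sketch of pulling those pieces out of a stderr excerpt might look like the following. The names `ERROR_RE` and `parse_error` are hypothetical helpers written for this illustration, not part of BOINC or the GPUGRID app, and the pattern assumes the message has the same shape as the lines quoted above.

```python
import ntpath  # Windows-style path handling that works on any OS
import re

# Assumed shape, taken from the messages quoted in this thread:
#   ERROR: <source path>, line <N>: <failing call>
ERROR_RE = re.compile(
    r"ERROR:\s*(?P<path>.+?),\s*line\s+(?P<line>\d+):\s*(?P<call>.+)"
)


def parse_error(text):
    """Return a dict describing the first matching ERROR line, or None."""
    match = ERROR_RE.search(text)
    if match is None:
        return None
    # Split the developer-side path into its directory and file name.
    directory, filename = ntpath.split(match.group("path"))
    return {
        "directory": directory,
        "file": filename,
        "line": int(match.group("line")),
        "call": match.group("call").strip(),
    }


sample = (
    r"ERROR: c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu, "
    r"line 104: cufftExecC2R (gridcalc3)"
)
info = parse_error(sample)
print(info["file"], info["line"], info["call"])
```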
©2025 Universitat Pompeu Fabra