unspecified launch failure

Message boards : Graphics cards (GPUs) : unspecified launch failure

STE\/E

Joined: 18 Sep 08
Posts: 368
Credit: 4,174,624,885
RAC: 0
Message 3821 - Posted: 13 Nov 2008, 10:53:22 UTC

I get the following error every so often on this box. It's a BFG 8800GT OC, running at the clock speeds it shipped with:

Cuda error: Kernel [frc_sum_kernel_dihed] failed in file 'force.cu' in line 252 : unspecified launch failure.
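To tally reports like this one across hosts, the error line can be split into kernel, source file, line and CUDA error string. Here is a minimal Python sketch. It assumes every report follows exactly the format quoted above, and the function name `parse_cuda_error` is hypothetical, not part of the GPUGRID application:

```python
import re
from typing import NamedTuple, Optional


class CudaError(NamedTuple):
    kernel: str
    source_file: str
    line: int
    message: str


# Pattern modelled on the single log line quoted in this thread (an assumption:
# other builds of the application may word their errors differently).
_PATTERN = re.compile(
    r"Cuda error: Kernel \[(?P<kernel>[^\]]+)\] failed in file "
    r"'(?P<file>[^']+)' in line (?P<line>\d+) : (?P<msg>.+?)\.?$"
)


def parse_cuda_error(log_line: str) -> Optional[CudaError]:
    """Return the fields of a kernel-failure line, or None if it does not match."""
    m = _PATTERN.search(log_line.strip())
    if m is None:
        return None
    return CudaError(m["kernel"], m["file"], int(m["line"]), m["msg"])


report = ("Cuda error: Kernel [frc_sum_kernel_dihed] failed in file 'force.cu' "
          "in line 252 : unspecified launch failure.")
print(parse_cuda_error(report))
```

Grouping parsed results by kernel and line would show at a glance whether every failure hits the same spot in `force.cu`.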

K1atOdessa
Joined: 25 Feb 08
Posts: 249
Credit: 444,646,963
RAC: 0
Message 3824 - Posted: 13 Nov 2008, 14:35:20 UTC - in response to Message 3821.  

> Cuda error: Kernel [frc_sum_kernel_dihed] failed in file 'force.cu' in line 252 : unspecified launch failure.

I hit the same error on a single task recently and had never seen it before. Both my 8800GTs are overclocked somewhat, but I haven't changed that in well over a month, so I don't think it's related. I've since completed a couple of WUs fine, so I just chalked it up to a one-off glitch. If it happens again, I'll have more reason to be concerned.

http://www.gpugrid.net/result.php?resultid=115911
rebirther
Joined: 7 Jul 07
Posts: 53
Credit: 3,048,781
RAC: 0
Message 3870 - Posted: 17 Nov 2008, 18:32:13 UTC

My first error with this log, on an 8800GT 1GB:

http://www.ps3grid.net/result.php?resultid=124548

Any solution or info about this error yet?
DoctorNow
Joined: 18 Aug 07
Posts: 83
Credit: 135,208,752
RAC: 4
Message 3871 - Posted: 17 Nov 2008, 18:37:48 UTC
Last modified: 17 Nov 2008, 18:40:17 UTC

Just found out that my WU which crashed this morning (shortly before it was finished!) had the same error:

http://www.gpugrid.net/result.php?resultid=122585

My card is a 9600GT.
And it seems it crashed Windows too! When I came back some hours later, I found that my computer had rebooted into Linux (I dual-boot, and Linux is the default).
Member of BOINC@Heidelberg and ATA!
DoctorNow
Joined: 18 Aug 07
Posts: 83
Credit: 135,208,752
RAC: 4
Message 3916 - Posted: 21 Nov 2008, 13:04:25 UTC
Last modified: 21 Nov 2008, 13:05:30 UTC

Another one killed itself with the same message.

What's wrong with them?
It's getting really annoying; it costs me almost an entire day of crunching every time...
>:-\
Member of BOINC@Heidelberg and ATA!
GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1958
Credit: 629,356
RAC: 0
Message 3917 - Posted: 21 Nov 2008, 14:07:22 UTC - in response to Message 3916.  

These are the same WUs as before. Have you updated your drivers? Which drivers do you have?

gdf
DoctorNow
Joined: 18 Aug 07
Posts: 83
Credit: 135,208,752
RAC: 4
Message 3918 - Posted: 21 Nov 2008, 14:23:29 UTC
Last modified: 21 Nov 2008, 14:30:01 UTC

It's driver version 177.84, unchanged since I started crunching here.
I have no clue what could be wrong; I crunched two other WUs right before without any problems:
http://www.gpugrid.net/result.php?resultid=126657
http://www.gpugrid.net/result.php?resultid=125288

Edit:
I just found on the NVIDIA site that version 180.48 is now recommended for my card.
I'll install it and try it out; maybe it fixes the problem...
It will take a few days to find out. ;-)
Member of BOINC@Heidelberg and ATA!
ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 3924 - Posted: 22 Nov 2008, 12:09:51 UTC
Last modified: 22 Nov 2008, 12:11:30 UTC

2 observations:

1. WUs which give this error generally run fine on other machines

2. All who reported this error in this thread are running (factory) overclocked GPUs.

I think it's worth testing if there's a link between 1. and 2. DrNow, you seem to get the errors most frequently. Could you take the core and shader clock of your card back a bit?

On G92 the core can only be adjusted in 27 MHz steps and the shader in 54 MHz steps. GPU-Z and other tools do not show you the real clock speed, but RivaTuner's hardware monitor does. So I suggest you either check the clocks with RivaTuner or back off far enough that the real clocks are guaranteed to change: say 54 MHz for the core and 108 MHz for the shader (two steps each). Then let it run for some time; if the error happens again, we know clock speed was not the cause.
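The step argument can be sketched in a few lines. This Python sketch assumes the hardware simply rounds a requested clock down to the nearest multiple of the step size (27 MHz core, 54 MHz shader, as stated above). The real PLL logic may round differently, so treat the helper names `effective_clock` and `backed_off` and the example clocks as hypothetical:

```python
CORE_STEP_MHZ = 27    # G92 core clock granularity quoted above
SHADER_STEP_MHZ = 54  # G92 shader clock granularity quoted above


def effective_clock(requested_mhz: int, step_mhz: int) -> int:
    """Assumed model: the card runs at the largest multiple of step_mhz <= request."""
    return (requested_mhz // step_mhz) * step_mhz


def backed_off(requested_mhz: int, step_mhz: int, steps: int = 2) -> int:
    """Lower the request by whole steps so the effective clock really drops."""
    return requested_mhz - steps * step_mhz


# Hypothetical factory-OC settings, chosen only for illustration.
core, shader = 650, 1625
for name, clk, step in (("core", core, CORE_STEP_MHZ),
                        ("shader", shader, SHADER_STEP_MHZ)):
    new = backed_off(clk, step)
    print(f"{name}: {effective_clock(clk, step)} -> {effective_clock(new, step)} MHz")
```

Under this model, lowering the request by two whole steps always lowers the effective clock by exactly two steps, while two requests inside the same step (say 1630 and 1660 MHz on the shader) end up at the same real clock.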

Oh, and it might be a good idea to do a complete restart of your machine before the clock speed experiment. That means: switch it off, unplug the power cord from the power supply for more than 15 minutes, and switch it on again.

BTW: driver 177.84 was fine before, so I doubt it causes the errors. It's possible, though, since the application code has changed since most people ran 177.84.

MrS
Scanning for our furry friends since Jan 2002
DoctorNow
Joined: 18 Aug 07
Posts: 83
Credit: 135,208,752
RAC: 4
Message 3929 - Posted: 22 Nov 2008, 15:03:45 UTC - in response to Message 3924.  
Last modified: 22 Nov 2008, 15:04:52 UTC

> 2 observations:
>
> 1. WUs which give this error generally run fine on other machines
>
> 2. All who reported this error in this thread are running (factory) overclocked GPUs.
>
> I think it's worth testing if there's a link between 1. and 2. DrNow, you seem to get the errors most frequently. Could you take the core and shader clock of your card back a bit?

Well, you could be right.
The first WU after the driver change has run fine so far. I'll crunch two or three more WUs to see whether the error appears again.
If it does, I'll take the shader clock down a bit.

As you may have read in one of the other threads, RivaTuner accidentally lowered my shader clock without my knowledge; the WUs took much longer, but all finished without problems...
Member of BOINC@Heidelberg and ATA!
rebirther
Joined: 7 Jul 07
Posts: 53
Credit: 3,048,781
RAC: 0
Message 3935 - Posted: 23 Nov 2008, 9:30:14 UTC
Last modified: 23 Nov 2008, 9:38:24 UTC

Next one:
http://www.ps3grid.net/result.php?resultid=131339

driver 178.24 WinXP, RivaTuner 2.20

I can't explain why?! And after losing many hours of work, why doesn't this error show up right at the start? ^^

Before this I ran an older version of RivaTuner to lower the fan speed; the 8800GT sits around 57°C, so no problem there. In my first week all WUs were fine, but after 3-4 days I got the same error twice. Something must be wrong somewhere, but where?

Edit:
I checked all results: the error appeared with 6.3.21; before that, with 6.3.19, everything was OK.

If anyone is running >6.4 or <6.3.21, please let me know whether you get the same error or not!
[BOINC@Poland]AiDec
Joined: 2 Sep 08
Posts: 53
Credit: 9,213,937
RAC: 0
Message 3936 - Posted: 23 Nov 2008, 9:58:55 UTC
Last modified: 23 Nov 2008, 10:01:07 UTC

As I have read and heard many times, Riva is not the best software and can cause problems with NVIDIA graphics cards, especially with the newest GPUs. I had similar problems for as long as I used Riva. I'd suggest you use nTune, which gives you a better chance of a 'correct' OC. This software made my GPUs really stable after a hard OC (3x GTX 280, 600@702 MHz).
rebirther
Joined: 7 Jul 07
Posts: 53
Credit: 3,048,781
RAC: 0
Message 3937 - Posted: 23 Nov 2008, 10:19:03 UTC - in response to Message 3936.  

> As I have read and heard many times, Riva is not the best software and can cause problems with NVIDIA graphics cards, especially with the newest GPUs. I had similar problems for as long as I used Riva. I'd suggest you use nTune, which gives you a better chance of a 'correct' OC. This software made my GPUs really stable after a hard OC (3x GTX 280, 600@702 MHz).


I haven't OC'd my card. I'll try nTune next time and uninstall RivaTuner to see whether the issue is still present, but I have read that some people on Linux got this error too.
ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 3941 - Posted: 23 Nov 2008, 12:26:04 UTC

- people got the error before 6.3.21

- Rebirther, your card is factory overclocked (shader at 1.67 GHz instead of 1.50 GHz)

> I can't explain why?! And after losing many hours of work, why doesn't this error show up right at the start? ^^


It is a temporary error on your machine: normally your machine is fine, and the WUs are (normally) fine for others. That the error occurs after many hours of crunching tells you that something probably goes wrong during the calculation. It's not a permanent error; it's a "transient" one.

Such errors may be caused by really weird software combinations, bit flips in the chip due to cosmic rays, hardware design faults that only show up in rare, exceptional situations (e.g. for CPUs, several interrupts at the same time), or a chip that is right on the edge of instability given its balance of clock frequency, voltage and operating temperature.

Saying "but it was stable for ..." does not really help. It could be that a few transistors are worse than the others (or have degraded more over time) and fail every 10^15 cycles or so, leading to a "mean time between failures" of days.
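To put the "10^15 cycles" figure into wall-clock terms: dividing a number of cycles by the clock frequency gives the time until a failure. Here is a quick Python calculation. It assumes one fault per 10^15 cycles at the 1.67 GHz shader clock mentioned for Rebirther's card; the failure rate is the illustrative guess from the post, not a measurement:

```python
CYCLES_PER_FAILURE = 1e15   # illustrative figure from the post above
SHADER_CLOCK_HZ = 1.67e9    # Rebirther's factory-overclocked shader clock
STOCK_SHADER_HZ = 1.50e9    # stock shader clock mentioned above


def mtbf_days(cycles_per_failure: float, clock_hz: float) -> float:
    """Mean time between failures in days if one fault occurs every N cycles."""
    seconds = cycles_per_failure / clock_hz
    return seconds / 86_400


print(f"{mtbf_days(CYCLES_PER_FAILURE, SHADER_CLOCK_HZ):.1f} days at 1.67 GHz")
print(f"{mtbf_days(CYCLES_PER_FAILURE, STOCK_SHADER_HZ):.1f} days at 1.50 GHz")
```

Both results come out at roughly a week, which is consistent with an error that shows up only once every few WUs. Note that in reality the per-cycle fault rate itself would likely change with clock and voltage, which this sketch ignores.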

- And I don't think the mere presence of RivaTuner causes these errors. I mean, it's not even running all the time, is it? Also, Rebirther's GPU (G92) is *old* enough to be supported properly.

> I had similar problems for as long as I used Riva.


Which problems do you mean exactly? The "unspecified launch failure"?

> I'd suggest you use nTune, which gives you a better chance of a 'correct' OC


Well, RivaTuner and (I think) Everest are the only tools that can show you the real clock of your NVIDIA card; all others only show you the clock you requested from the system. The real clock is adjusted in steps. So if you can set a higher clock using nTune, it may be that you're just below the next step, where the card would become unstable. The internal clocks would be the same, but the number shown to you would be higher, so it looks like a higher OC.

MrS
Scanning for our furry friends since Jan 2002
rebirther
Joined: 7 Jul 07
Posts: 53
Credit: 3,048,781
RAC: 0
Message 3947 - Posted: 23 Nov 2008, 15:33:59 UTC - in response to Message 3941.  
Last modified: 23 Nov 2008, 16:17:55 UTC

Factory OC, yes, but this is not a problem. You also got this error, as many others did, on newer and older cards; I don't think this is a hardware failure in all models of cards?! I have asked on the alpha mailing list about this issue to narrow down the error and am still waiting for an answer: is it the hardware, the BOINC client or the project application? Drivers and other programs can be excluded.

The GPU waiting for the CPU could be an issue: the WU aborts by itself with this error because it cannot continue crunching from the last point.

Update:
Thanks to Nicolas for pointing out that it's not the BOINC client: app 6.48 had a 0% error rate, 6.52 a 20% error rate.

@GDF: can you check the application code to find out what's wrong? Or can you switch back to the old app?
ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 3973 - Posted: 23 Nov 2008, 23:25:13 UTC - in response to Message 3947.  

> Factory OC, yes, but this is not a problem


How can you be sure? Hardware errors can show up quite rarely. These are actually the hardest to detect, because you can never be sure whether
(i) your test software can reproduce the error at all, and
(ii) you tested long enough.
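Point (ii) can be quantified with a simple model. Assuming failures arrive independently at a constant rate (a Poisson process, which is an assumption about the hardware, not something measured in this thread), the chance that a test of length T shows at least one failure is 1 - exp(-T/MTBF). Here is a Python sketch with a hypothetical MTBF of three days:

```python
import math


def detection_probability(test_hours: float, mtbf_hours: float) -> float:
    """P(at least one failure during the test) for exponentially distributed failures."""
    return 1.0 - math.exp(-test_hours / mtbf_hours)


def hours_for_confidence(confidence: float, mtbf_hours: float) -> float:
    """Test length needed to see at least one failure with the given probability."""
    return -mtbf_hours * math.log(1.0 - confidence)


mtbf = 3 * 24  # hypothetical: one failure every three days on average
print(f"24 h test catches it with p = {detection_probability(24, mtbf):.2f}")
print(f"95% detection needs about {hours_for_confidence(0.95, mtbf):.0f} h")
```

With an MTBF of days, a clean overnight stress test says very little. You would need to run roughly three times the MTBF to be fairly sure of seeing the fault at least once.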

> You also got this error, as many others did


Yeah, I also noticed this one yesterday... and guess what, I'm also running OC'ed.

> I don't think this is a hardware failure in all models of cards?!


Not every OC'ed card produces these errors, does it?

> I have asked on the alpha mailing list about this issue to narrow down the error and am still waiting for an answer: is it the hardware, the BOINC client or the project application? Drivers and other programs can be excluded.


I agree that we can exclude drivers and other programs. However, I'd also suspect that the BOINC client has absolutely nothing to do with this: it just launches aecmd_.exe, and all further CUDA-related launches are done by the science app.

> The GPU waiting for the CPU could be an issue: the WU aborts by itself with this error because it cannot continue crunching from the last point.


Sounds somewhat improbable. The GPU cannot talk to BOINC, so if the CPU app stops working, "no one" would tell BOINC that an error happened. BOINC would likely detect after a short time that the app has quit and restart it. That is the point where trouble could arise, if the GPU / driver is in a strange state because the CUDA app was not terminated properly. Is this just a guess on your part, or do you have anything hinting at such a scenario?

> Update:
> Thanks to Nicolas for pointing out that it's not the BOINC client: app 6.48 had a 0% error rate, 6.52 a 20% error rate.
>
> @GDF: can you check the application code to find out what's wrong? Or can you switch back to the old app?


- Where do you get that 20% error rate from?
- I also had another one of these "unspecified launch failure" errors - with app 6.45.
- Switching back to the old app is probably not feasible, since there were changes in the science code.
- Oh, and who's Nicolas?

MrS
Scanning for our furry friends since Jan 2002
rebirther
Joined: 7 Jul 07
Posts: 53
Credit: 3,048,781
RAC: 0
Message 3977 - Posted: 23 Nov 2008, 23:42:30 UTC - in response to Message 3973.  



> - Where do you get that 20% error rate from?
> - I also had another one of these "unspecified launch failure" errors - with app 6.45.
> - Switching back to the old app is probably not feasible, since there were changes in the science code.
> - Oh, and who's Nicolas?


- 20% is my error rate, as estimated in my last calculation
- Nicolas Alvarez, also a developer of BOINC/Primegrid/IMP/Renderfarm
- we must sort out what was changed in the code that causes this error
- I can't find any scenario yet (removed RivaTuner, installed nTune); will see what happens... (2 cores running VMware with Ubuntu Linux 64-bit + ABC, the other 2 cores running BOINC on Windows with the GPU + Milkyway, RCN, yoyo evo)
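How much weight a "20% error rate" can carry depends on how many WUs it is based on. The thread doesn't give the counts, so the sketch below uses a hypothetical 1 failure in 5 results. The Wilson score interval is a standard way to put an approximate 95% range around such a proportion:

```python
import math


def wilson_interval(failures: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a failure proportion."""
    if total == 0:
        raise ValueError("need at least one result")
    p = failures / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


lo, hi = wilson_interval(1, 5)  # hypothetical: 1 error in 5 WUs = 20%
print(f"20% observed, plausible range {lo:.0%} to {hi:.0%}")
```

With so few results, the plausible range runs from a few percent to well over half. Applying the same function to a hypothetical 0 errors in a handful of WUs also gives a clearly non-zero upper bound, so "0% vs 20%" between two app versions needs many more results before it says much.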
ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 3981 - Posted: 23 Nov 2008, 23:52:20 UTC

Yeah, let's get some new hard facts. But by saying

> - we must sort out what was changed in the code that causes this error


you imply that you already know it's the science app's fault. We can't know that yet. I think it's not the app, because these errors happen with different clients and the WUs run fine on other machines.

.. gotta go to bed for today ;)

MrS
Scanning for our furry friends since Jan 2002
DoctorNow
Joined: 18 Aug 07
Posts: 83
Credit: 135,208,752
RAC: 4
Message 4022 - Posted: 25 Nov 2008, 6:46:32 UTC - in response to Message 3929.  

> Well, you could be right.
> The first WU after the driver change has run fine so far. I'll crunch two or three more WUs to see whether the error appears again.
> If it does, I'll take the shader clock down a bit.

Well, after finishing three WUs without a problem (see here, here and here), I now have the error again with this WU, fortunately very early in the crunching.

Looking at my host list, it seems the error recurs at regular intervals and is not caused by anything in particular.

Okay, I'll reduce my shader clock now to see whether that breaks the rule. ;-)
Member of BOINC@Heidelberg and ATA!
ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 4024 - Posted: 25 Nov 2008, 9:04:39 UTC

Well, the number of successful WUs between failures is anything between 2 and 6... I'd rather call that a guideline ;)
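MrS's point that gaps of 2 to 6 successful WUs don't form a rule can be illustrated with a simple model. If every WU fails independently with the same probability p (an assumption, not something established in this thread), the number of successes before the next failure follows a geometric distribution, which is wide. Here is a Python sketch using a hypothetical p = 0.2:

```python
def gap_probability(k: int, p: float) -> float:
    """P(exactly k successful WUs before the next failure), independent failures."""
    return (1 - p) ** k * p


def mean_gap(p: float) -> float:
    """Expected number of successes between failures."""
    return (1 - p) / p


p = 0.2  # hypothetical per-WU failure probability
for k in range(2, 7):
    print(k, f"{gap_probability(k, p):.3f}")
print(f"mean gap: {mean_gap(p):.1f}")
```

Under this assumed model with p = 0.2, a gap of 2 to 6 successes happens about 43% of the time, a gap of 0 or 1 about 36%, and 7 or more about 21%. A spread like "between 2 and 6" is therefore just what purely random, independent failures would produce.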

MrS
Scanning for our furry friends since Jan 2002
ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 4080 - Posted: 29 Nov 2008, 13:57:46 UTC

I had another one, luckily at the beginning of the WU. I've scaled back the OC and will see what I get.

MrS
Scanning for our furry friends since Jan 2002

©2025 Universitat Pompeu Fabra