The Simulation has become unstable. Terminating to avoid lock-up.
| Author | Message |
|---|---|
|
Joined: 5 Dec 12 Posts: 84 Credit: 1,663,883,415 RAC: 0
|
My information is below. Any suggestions would be welcomed.
Name: I4R48-SDOERR_BARNA5-43-100-RND4453_1
Workunit: 10224954
Created: 1 Nov 2014, 12:11:35 UTC
Sent: 1 Nov 2014, 13:54:44 UTC
Received: 1 Nov 2014, 19:04:15 UTC
Server state: Over
Outcome: Computation error
Client state: Compute error
Exit status: -97 (0xffffffffffffff9f) Unknown error number
Computer ID: 140554
Report deadline: 6 Nov 2014, 13:54:44 UTC
Run time: 15,370.11
CPU time: 991.51
Validate state: Invalid
Credit: 0.00
Application version: Long runs (8-12 hours on fastest card) v8.47 (cuda65)
Stderr output:
<core_client_version>7.2.42</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -97 (0xffffff9f)
</message>
<stderr_txt>
# GPU [GeForce GTX 970] Platform [Windows] Rev [3212] VERSION [65]
# SWAN Device 0 :
# Name : GeForce GTX 970
# ECC : Disabled
# Global mem : 4095MB
# Capability : 5.2
# PCI ID : 0000:01:00.0
# Device clock : 1342MHz
# Memory clock : 3505MHz
# Memory width : 256bit
# Driver version : r344_32 : 34448
# GPU 0 : 53C
# GPU 0 : 55C
# GPU 0 : 56C
# GPU 0 : 57C
# GPU 0 : 58C
# GPU 0 : 59C
# GPU 0 : 60C
# GPU 0 : 61C
# The simulation has become unstable. Terminating to avoid lock-up (1)
# Attempting restart (step 770000)
# GPU [GeForce GTX 970] Platform [Windows] Rev [3212] VERSION [65]
# SWAN Device 0 :
# Name : GeForce GTX 970
# ECC : Disabled
# Global mem : 4095MB
# Capability : 5.2
# PCI ID : 0000:01:00.0
# Device clock : 1342MHz
# Memory clock : 3505MHz
# Memory width : 256bit
# Driver version : r344_32 : 34448
# The simulation has become unstable. Terminating to avoid lock-up (1)
</stderr_txt>
]]> |
|
Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 |
This message shows you that something went horribly wrong. If it happens just once, the app can usually recover, but if it happens too often or too quickly, the app terminates. Normally your WUs seem fine, so it does not look like a fundamental issue. What I do notice, though, is that your GPU clock of 1342 MHz is rather high. Do you have a heavily factory-overclocked card? If so, the manufacturer may have chosen the clock speed too aggressively. Or, if it's your own overclock, you may have set the clock too high. I would let the system run for some more time and watch for errors. If you get more, lower the GPU clock by 26 MHz (one step is 13 MHz; you can't change it by less than that) and see if it helps. MrS Scanning for our furry friends since Jan 2002 |
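The step-down suggested above is simple arithmetic, and it can be written out as follows. This is only a sketch under the numbers quoted in this thread (a 1342 MHz reported clock and a 13 MHz step); the helper name `stepped_down_clocks` is made up for illustration, and nothing here talks to the GPU or to any overclocking tool.

```python
STEP_MHZ = 13  # step size quoted in the post above


def stepped_down_clocks(start_mhz, steps, step_mhz=STEP_MHZ):
    """Return the clock after each successive step down (hypothetical helper)."""
    return [start_mhz - step_mhz * n for n in range(1, steps + 1)]


# Two steps down from the clock reported in the stderr output.
clocks = stepped_down_clocks(1342, 2)
print(clocks)  # [1329, 1316]
```

Two steps of 13 MHz give the suggested 26 MHz reduction, so the first clock to try would be 1316 MHz.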
|
Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0
|
"Normally your WUs seem fine, so it does not look like a fundamental issue. What I do notice, though, is that your GPU clock of 1342 MHz is rather high. Do you have a heavily factory-overclocked card? If so, the manufacturer may have chosen the clock speed too aggressively. Or, if it's your own overclock, you may have set the clock too high." What I have noticed after reading many stderr output files is that the clock listed there is the one the manufacturer has "put in the card", that is, the speed printed on the box. For instance, one of my 780 Ti cards almost always runs higher than the value given in the stderr file. Greetings from TJ |
|
Joined: 5 Dec 12 Posts: 84 Credit: 1,663,883,415 RAC: 0
|
For the record, I've recently activated a pair of GTX 970s. They are EVGA's Super-Superclocked edition, and they are running without SLI mode (old motherboard). I haven't overclocked them beyond factory settings. |
|
Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 |
By now you seem to be running fine, mostly. I inspected about 5 WUs and only found 1 more error, from which the simulation recovered. What I also notice, though, is that your runtimes are really long for such cards and WUs. How is your CPU & GPU load? MrS Scanning for our furry friends since Jan 2002 |
|
Joined: 5 Dec 12 Posts: 84 Credit: 1,663,883,415 RAC: 0
|
Good evening Mr. S, and thank you for all of your feedback. My CPU load is kept between 99 and 100%. I'm running an AMD Phenom II X3 720 processor and two GTX 970s. There's a pretty gaping disparity: I'm upgrading parts as I can afford them and as awesome stuff gets onto the market. I'm pretty late to the game in learning about buying CPUs, but my understanding is that both AMD and Intel are at the end of the lifespans of their current sockets. Meanwhile the nicer solutions, like 6- and 8-core CPUs or a comparable dual-socket motherboard, will remain cost-prohibitive for quite a while. At least I can vouch for the CPU's stability when running CPU tasks like World Community Grid. This processor has done roughly 9,100 work units with that project, and my error rate there is well under 1%. |
Beyond Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0
|
"Good evening Mr. S, and thank you for all of your feedback." You must also be running CPU WUs. How many? "There's a pretty gaping disparity: I'm upgrading parts as I can afford them and as awesome stuff gets onto the market. I'm pretty late to the game in learning about buying CPUs, but my understanding is that both AMD and Intel are at the end of the lifespans of their current sockets. Meanwhile the nicer solutions, like 6- and 8-core CPUs or a comparable dual-socket motherboard, will remain cost-prohibitive for quite a while." The fastest CPU (for DC) currently available for your socket is the Phenom II X6, which comes in 95 W to 125 W models. They're available on eBay but usually go for about what they originally cost (sometimes more). My X6s handle 2 NVidia cards running GPUGRID and 5 CPU WUs from various projects. If you want more wiggle room you can limit it to 4 WUs. There's also the 83xx series, which runs 8 cores but is also quite a bit slower (for DC) per core. A good cooler is advised for any CPU. |
|
Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 |
I suspect your GPUs would appreciate more CPU support. Limit BOINC's CPU usage with "use at most 66% of CPUs" in the advanced settings. That should make BOINC run one CPU task fewer. Let's see if this lets your GPUs stretch their wings! For WUs like this one your cards should take about 22 ks instead of 28 ks. This won't help stability, though ;) Edit: I don't think a CPU upgrade would make all that much sense on that old socket. And you're right, buying 6- or 8-core CPUs is prohibitive and will remain so for some time. On the AMD side I don't see anything changing this anytime soon, whereas on the Intel side the current Haswells are nice but expensive, as always. There's probably going to be an upgrade for that socket to 14 nm Broadwell around summer next year, but around the same time the next CPU architecture step should also arrive. I'm sure I'd want the newer one, but we don't have all the facts yet. To summarize: if one needs or wants to buy now, that's fine; otherwise better stuff will come for those patient enough. MrS Scanning for our furry friends since Jan 2002 |
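To put the 22 ks versus 28 ks estimate above into more familiar units, here is a small sketch that converts kiloseconds to hours and works out the relative saving. The two figures come from the post; the helper name `ks_to_hours` is an assumption made for illustration, and the result is only as good as that estimate.

```python
def ks_to_hours(kiloseconds):
    """Convert kiloseconds to hours (1 ks = 1000 s, 1 h = 3600 s)."""
    return kiloseconds * 1000 / 3600


current_ks = 28   # observed run time per WU, from the post above
expected_ks = 22  # run time the post expects with a freed-up CPU core

saved = (current_ks - expected_ks) / current_ks
print(f"{ks_to_hours(current_ks):.1f} h -> {ks_to_hours(expected_ks):.1f} h")
print(f"runtime reduction: {saved:.0%}")
```

In other words, the expected gain is roughly 7.8 hours down to 6.1 hours per work unit, a bit over a fifth of the run time.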
Beyond Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0
|
"Edit: I don't think a CPU upgrade would make all that much sense on that old socket. And you're right, buying 6- or 8-core CPUs is prohibitive and will remain so for some time. On the AMD side I don't see anything changing this anytime soon, whereas on the Intel side the current Haswells are nice but expensive, as always. There's probably going to be an upgrade for that socket to 14 nm Broadwell around summer next year, but around the same time the next CPU architecture step should also arrive. I'm sure I'd want the newer one, but we don't have all the facts yet." Maybe, maybe not. If he wants to run CPU projects, then a simple CPU upgrade for that AM3 socket would give him another 3 cores to play with. As I said, the Phenom II X6 CPUs are going for near retail (often more), but there's a reason for that. They're good. Sure, Haswell is the latest thing. I've got 15 of them running in my in-home shop right now (Folding). For laptops they're GREAT due to low power usage in certain states. For performance desktops: yawn. No real improvement over Ivy Bridge. For DC a lot of it depends on your project. One of my favorite projects is Yoyo, and there the Phenom X6 is still king. In other projects Ivy Bridge is the best. We really have to ditch our brand loyalty and paid-performance-site hype and look at the numbers rationally. I've been building/OCing/modding boxes since the 8088/8086 to V20/V30 days. Heard a lot of hype. Been bored by a lot of fans. If I were building a new desktop box right now, I'd go with Haswell or Ivy Bridge. Would I ditch an AM3 platform for those? NOT. The desktop CPU landscape of late has been pretty boring. Nothing worth much on either the Intel or AMD fronts. Hopefully that will change, but I wouldn't hold my breath unless you like yourself in blue. |
|
Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0
|
"Edit: I don't think a CPU upgrade would make all that much sense on that old socket. And you're right, buying 6- or 8-core CPUs is prohibitive and will remain so for some time. On the AMD side I don't see anything changing this anytime soon, whereas on the Intel side the current Haswells are nice but expensive, as always. There's probably going to be an upgrade for that socket to 14 nm Broadwell around summer next year, but around the same time the next CPU architecture step should also arrive. I'm sure I'd want the newer one, but we don't have all the facts yet." The standard 85 W quad-core Haswell/Ivy desktop CPUs are readily available; the 25 W/35 W quad-core Haswell/Ivy parts are hard to find and carry a premium asking price. Lowering the CPU cooling requirement helps a multi-GPU setup. At a higher power rate, AMD runs integer code near or at Haswell/Ivy speeds. AMD AVX FP is gimped. AVX runs hotter on Intel (it runs at or above TDP with FMA/AVX). Intel Broadwell adds 512-bit instruction sets. AVX IPC for Haswell is slightly better than for Ivy due to more (2) execution ports. You're right about CPUs being stagnant. Haswell has been here for a year and a half. But look at the GP-GPU side: Nvidia has been on Kepler for 3 years, AMD on GCN for over two. Maxwell is not a true compute arch yet. The Big Maxwell should change this. One thing, though: hardware has been steady while software has opened up in the last few years. There are more tools being released for developers. |
Beyond Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0
|
"Maybe, maybe not. If he wants to run CPU projects, then a simple CPU upgrade for that AM3 socket would give him another 3 cores to play with. As I said, the Phenom II X6 CPUs are going for near retail (often more), but there's a reason for that. They're good. Sure, Haswell is the latest thing. I've got 15 of them running in my in-home shop right now (Folding). For laptops they're GREAT due to low power usage in certain states. For performance desktops: yawn. No real improvement over Ivy Bridge. For DC a lot of it depends on your project. One of my favorite projects is Yoyo, and there the Phenom X6 is still king. In other projects Ivy Bridge is the best. We really have to ditch our brand loyalty and paid-performance-site hype and look at the numbers rationally. I've been building/OCing/modding boxes since the 8088/8086 to V20/V30 days. Heard a lot of hype. Been bored by a lot of fans. If I were building a new desktop box right now, I'd go with Haswell or Ivy Bridge. Would I ditch an AM3 platform for those? NOT. The desktop CPU landscape of late has been pretty boring. Nothing worth much on either the Intel or AMD fronts. Hopefully that will change, but I wouldn't hold my breath unless you like yourself in blue." The Phenom X6 1035T/1045T/1065T and some 1055T CPUs are 95 W, and to tell the truth they're not that much slower for DC than the T1090 125 W. CPU-Z reports my newest Phenom X6 T1045 as 92.6 W TDP. The all-time highest-producing CPU at Yoyo is a Phenom X6 T1035 with a mild OC. Except for a gaggle of Github 40-core Xeons, it's also the highest in RAC, and it even surpasses a few of those (and all the 32-core Xeons). The Haswells are definitely much more efficient in an idle state, but for us DC junkies our machines are never at idle :-) |
|
Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0
|
All Project stats has you at #1 in the USA for total credit! (No ASIC projects are included in the rankings.) For YoYo, excluding the Ivy/Sandy Xeons, your AMD Phenom X6 T-1045 is above: 6c12t (Gulftown) Westmere > 6c12t Ivy-E > low-power Haswell > standard Haswell > standard Ivy > low-power Ivy. Haswell Xeons operate at a lower core clock for AVX code, along with lower quad-channel DDR4 IMC speeds. Core clocks are higher for non-AVX code. For the Ivy Xeons' three die set-ups, core and quad-channel DDR3 IMC speeds are higher than for Haswell's three different die configurations covering the 4-18C/8T-36T products. Running full bore, Haswell and Ivy are both efficient. An advanced motherboard BIOS can offer strong energy management. A lower-voltage saturated OP for any circuitry powered for years at a time will help with longevity. A caveat: Intel integrated GPU drivers can hinder a discrete board's power management with Nvidia on an Intel MB. Almost every SLI laptop with an Intel or Gigabyte MB has the Intel iGPU (including Iris Pro) disabled in the BIOS (V3 Gigabyte/Sager/Lenovo). With a non-OEM BIOS you can enable the iGPU at the expense of CPU performance and of overclocking or eco-tuning the Nvidia card(s). For a multi-GPU set-up on a desktop or server: the Intel iGPU is off-die on Xeons and on the X99 platform with Haswell-E, and can be disabled on Z97 multi-GPU MBs. Is OpenCL included for the X6? |
|
Joined: 18 Oct 13 Posts: 53 Credit: 406,647,419 RAC: 0
|
Same here: http://www.gpugrid.net/result.php?resultid=13447659
Stderr output:
<core_client_version>7.4.27</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -97 (0xffffff9f)
</message>
<stderr_txt>
# GPU [GeForce GTX 760] Platform [Windows] Rev [3212] VERSION [65]
# SWAN Device 0 :
# Name : GeForce GTX 760
# ECC : Disabled
# Global mem : 2048MB
# Capability : 3.0
# PCI ID : 0000:01:00.0
# Device clock : 1071MHz
# Memory clock : 3004MHz
# Memory width : 256bit
# Driver version : r343_00 : 34475
# GPU 0 : 51C
# GPU 0 : 54C
# GPU 0 : 56C
# GPU 0 : 58C
# GPU 0 : 61C
# GPU 0 : 62C
# GPU 0 : 64C
# GPU 0 : 65C
# GPU 0 : 67C
# GPU 0 : 68C
# GPU 0 : 69C
# GPU 0 : 70C
# GPU 0 : 71C
# GPU 0 : 72C
# GPU 0 : 73C
# GPU 0 : 74C
# The simulation has become unstable. Terminating to avoid lock-up (1)
# Attempting restart (step 152045000)
# GPU [GeForce GTX 760] Platform [Windows] Rev [3212] VERSION [65]
# SWAN Device 0 :
# Name : GeForce GTX 760
# ECC : Disabled
# Global mem : 2048MB
# Capability : 3.0
# PCI ID : 0000:01:00.0
# Device clock : 1071MHz
# Memory clock : 3004MHz
# Memory width : 256bit
# Driver version : r343_00 : 34475
# The simulation has become unstable. Terminating to avoid lock-up (1)
</stderr_txt>
]]> |
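Since both failed tasks in this thread show the same pattern in their stderr output, a quick way to compare such logs is to pull out the handful of values that matter. The following is a rough sketch, not a GPUGRID or BOINC tool: the function name `summarize_stderr` and the regular expressions are assumptions based only on the log format posted here, and the sample text is abridged from the GTX 760 log above.

```python
import re

# Abridged from the stderr output posted above (GTX 760 task).
SAMPLE_STDERR = """\
(unknown error) - exit code -97 (0xffffff9f)
# Device clock : 1071MHz
# GPU 0 : 51C
# GPU 0 : 74C
# The simulation has become unstable. Terminating to avoid lock-up (1)
# Attempting restart (step 152045000)
# Device clock : 1071MHz
# The simulation has become unstable. Terminating to avoid lock-up (1)
"""


def summarize_stderr(text):
    """Extract temperatures, clocks, restarts and exit code (hypothetical helper)."""
    temps = [int(t) for t in re.findall(r"# GPU \d+ : (\d+)C", text)]
    clocks = {int(c) for c in re.findall(r"Device clock : (\d+)MHz", text)}
    steps = [int(s) for s in re.findall(r"Attempting restart \(step (\d+)\)", text)]
    exit_match = re.search(r"exit code (-?\d+)", text)
    code = int(exit_match.group(1)) if exit_match else None
    return {
        "max_temp_c": max(temps) if temps else None,
        "device_clocks_mhz": sorted(clocks),
        "unstable_count": text.count("The simulation has become unstable"),
        "restart_steps": steps,
        "exit_code": code,
        # 32-bit two's-complement view of the exit code, as printed in the log
        "exit_code_hex32": hex(code & 0xFFFFFFFF) if code is not None else None,
    }


summary = summarize_stderr(SAMPLE_STDERR)
print(summary)
```

For the sample this reports two instability messages with one restart attempt in between, which matches the description earlier in the thread: the app can recover once, but terminates when the error comes back too quickly. The hex value is simply -97 written as an unsigned 32-bit number.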
©2025 Universitat Pompeu Fabra