Message boards : Number crunching : New SANTIs 100% Fail Rate

Author Message
Matt
Message 36804 - Posted: 7 May 2014 | 18:09:51 UTC

I've received several of the new SANTI tasks and they are all immediately failing. Same error on all of them:

ERROR: file mdioload.cpp line 119: Unable to read binvelfile

http://www.gpugrid.net/result.php?resultid=10221012

http://www.gpugrid.net/result.php?resultid=10216382

http://www.gpugrid.net/result.php?resultid=10219244
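For anyone curious what that error means in practice: it suggests the task's binary velocity input file is missing or truncated before the simulation even starts. A purely illustrative Python sketch of that kind of check follows — the function name `read_binvel` and the assumed layout (three little-endian doubles per atom) are guesses, not the actual ACEMD code (which is C++, per the `mdioload.cpp` reference):

```python
import os
import struct

def read_binvel(path, natoms):
    """Illustrative reader for a binary velocity file.

    Assumed layout: natoms * 3 velocity components stored as
    little-endian 8-byte doubles. A missing or truncated file
    is the kind of condition behind "Unable to read binvelfile".
    """
    expected = natoms * 3 * 8  # bytes: 3 components x 8 bytes each
    if not os.path.isfile(path) or os.path.getsize(path) < expected:
        raise IOError("Unable to read binvelfile: %s" % path)
    with open(path, "rb") as f:
        return struct.unpack("<%dd" % (natoms * 3), f.read(expected))
```

If the workunit generator ships a zero-byte or missing velocity file, every host fails the same way within seconds of starting, which matches what we're seeing.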

Retvari Zoltan
Message 36805 - Posted: 7 May 2014 | 19:01:23 UTC - in response to Message 36804.

I've got the same outcome.

Bedrich Hajek
Message 36809 - Posted: 7 May 2014 | 19:50:26 UTC

Same here.

Columns: task name, workunit ID, sent, returned, outcome, run time, CPU time, credit granted, application.

e1s230_198x_25_32_f75-SANTI_marsalWTbound-0-32-RND4205_1 7410253 7 May 2014 | 18:14:43 UTC 7 May 2014 | 18:19:03 UTC Error while computing 3.00 0.61 --- Long runs (8-12 hours on fastest card) v8.41 (cuda42)
e1s20_941x_21_32_f49-SANTI_marsalWTbound-0-32-RND2984_1 7410196 7 May 2014 | 17:47:50 UTC 7 May 2014 | 17:52:02 UTC Error while computing 3.27 0.63 --- Long runs (8-12 hours on fastest card) v8.41 (cuda42)
e1s249_783x_17_32_f242-SANTI_marsalWTbound-0-32-RND5936_1 7410297 7 May 2014 | 17:40:46 UTC 7 May 2014 | 17:47:50 UTC Error while computing 3.55 0.66 --- Long runs (8-12 hours on fastest card) v8.41 (cuda42)
e1s371_863x_25_32_f120-SANTI_marsalWTbound-0-32-RND2148_0 7410602 7 May 2014 | 18:11:05 UTC 7 May 2014 | 18:14:42 UTC Error while computing 3.00 0.67 --- Long runs (8-12 hours on fastest card) v8.41 (cuda42)
e1s351_863x_29_32_f288-SANTI_marsalWTbound-0-32-RND6470_0 7410548 7 May 2014 | 17:52:02 UTC 7 May 2014 | 18:11:05 UTC Error while computing 3.00 0.63 --- Long runs (8-12 hours on fastest card) v8.41 (cuda42)
e1s350_863x_31_32_f219-SANTI_marsalWTbound-0-32-RND9005_0 7410546 7 May 2014 | 17:41:24 UTC 7 May 2014 | 17:47:50 UTC Error while computing 3.00 0.59 --- Long runs (8-12 hours on fastest card) v8.41 (cuda42)
e1s320_74x_28_32_f81-SANTI_marsalWTbound-0-32-RND1836_0 7410475 7 May 2014 | 17:36:11 UTC 7 May 2014 | 17:40:46 UTC Error while computing 3.16 0.61 --- Long runs (8-12 hours on fastest card) v8.41 (cuda42)
e1s296_902x_12_32_f143-SANTI_marsalWTbound-0-32-RND7466_0 7410412 7 May 2014 | 16:59:58 UTC 7 May 2014 | 17:36:11 UTC Error while computing 3.00 0.59 --- Long runs (8-12 hours on fastest card) v8.41 (cuda42)



Stefan
Project administrator
Project developer
Project tester
Project scientist
Message 36810 - Posted: 7 May 2014 | 20:04:52 UTC - in response to Message 36809.
Last modified: 7 May 2014 | 20:06:32 UTC

Argh... sorry. It's my fault. We will cancel them.

Matt
Message 36811 - Posted: 7 May 2014 | 20:11:58 UTC

No worries. At least we caught it early.

Vagelis Giannadakis
Message 36831 - Posted: 14 May 2014 | 9:14:18 UTC
Last modified: 14 May 2014 | 9:14:53 UTC

I've had 3 SANTI_marsalWTbound2 WUs fail on me with "The simulation has become unstable...":

http://www.gpugrid.net/result.php?resultid=10270066
http://www.gpugrid.net/result.php?resultid=10264759
http://www.gpugrid.net/result.php?resultid=10264562

The first two of these completed successfully on other hosts. Because I just added a 750 Ti to my box (alongside my 650 Ti), I'm wondering whether these failures are expected or whether my setup has started producing errors.

I had to upgrade the NVIDIA driver to support the 750 Ti, and in-box temperatures have risen a few degrees with the addition of the second card. The 750 Ti runs noticeably hotter than the 650 Ti: 70+ °C versus 60+ °C.

skgiven
Volunteer moderator
Volunteer tester
Message 36832 - Posted: 14 May 2014 | 13:12:14 UTC - in response to Message 36831.

Did you use Afterburner or Precision to set a fan profile for the 750Ti?

Richard Haselgrove
Message 36833 - Posted: 14 May 2014 | 13:26:51 UTC - in response to Message 36831.

Can your power supply handle the extra load?

And, since most 750Ti cards don't have an independent power cable: can your motherboard power regulator handle it, too?
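For a rough sanity check: the PCIe specification allows a x16 slot to deliver up to 75 W, which is why many 750 Ti boards omit the auxiliary connector. A back-of-envelope system budget in Python — the GPU figures are nominal TDPs, and the CPU and "rest of system" numbers are assumptions, not measured draws:

```python
# Nominal TDPs in watts; CPU and "rest of system" are rough assumptions.
draws = {
    "GTX 750 Ti": 60,        # within the 75 W slot budget
    "GTX 650 Ti": 110,       # needs its 6-pin connector
    "CPU": 95,
    "rest of system": 60,    # board, drives, fans
}

psu_rating = 600             # a typical mid-range PSU, in watts
total = sum(draws.values())
headroom = psu_rating - total

print("total draw: %d W, headroom: %d W" % (total, headroom))
# prints: total draw: 325 W, headroom: 275 W
```

Even with generous margins for PSU aging and transient spikes, numbers like these suggest a ~600 W unit is comfortable for this pairing.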

Jim1348
Message 36834 - Posted: 14 May 2014 | 14:38:11 UTC - in response to Message 36831.
Last modified: 14 May 2014 | 15:14:18 UTC

> I had to upgrade the NVidia driver to support the 750Ti and in-box temperatures have risen a few degrees with the addition of the second card. The 750 Ti runs noticeably hotter than the 650 Ti at 70+ vs 60+ C.

The 750 Ti actually uses much less power than the 650 Ti: about 56 watts on a SDOERR_BARNA2, for example, versus about 90 watts for the 650 Ti. So the temperature increase is just due to the additional heat load of the second card, and the heatsinks may be correspondingly lighter on the 750 Ti. Differences in airflow, depending on the card's slot location, can make a lot of difference too.

I am running two 750 Tis on the Longs now, with temps of 63 to 65 °C. That is with a relatively slow (1000 RPM) 120 mm side fan blowing on them in a cool room, though. They are the Asus cards, which have good heatsinks but not the elaborate heat pipes of the higher-power cards, and they may get up into the 70s in the summer.

Vagelis Giannadakis
Message 36838 - Posted: 15 May 2014 | 10:04:19 UTC

Thanks for the interest and responses, my fellow crunchers! :)

skgiven, I run on Linux, so sadly no cool tweaking tools for me :(
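That said, on Linux you can at least watch temperatures from the command line with nvidia-smi, which ships with the driver. A small Python sketch — the query flags are standard nvidia-smi options, but the helper names are mine:

```python
import subprocess

# Standard nvidia-smi query: one CSV line per GPU.
QUERY = ["nvidia-smi",
         "--query-gpu=index,name,temperature.gpu",
         "--format=csv,noheader"]

def parse_temps(csv_text):
    """Parse 'index, name, temperature' CSV lines into a dict."""
    temps = {}
    for line in csv_text.strip().splitlines():
        idx, name, temp = [field.strip() for field in line.split(",")]
        temps[int(idx)] = (name, int(temp))
    return temps

def gpu_temps():
    """Poll the driver once; requires nvidia-smi on PATH."""
    out = subprocess.check_output(QUERY).decode()
    return parse_temps(out)
```

For actual fan control on the Linux driver, the usual route is enabling the `Coolbits` option in xorg.conf and then adjusting fan speed through `nvidia-settings`.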

Richard, yes, my PSU can handle the two cards: it's a ~600 W unit. My 750 Ti does have a power connector, so my motherboard shouldn't be power-stressed by it.

Jim, both my cards (the 750 Ti and a 650 Ti) are ASUS and in fact appear to use the exact same heatsink. I assume the 750 is a direct upgrade of the 650 and ASUS decided to just bolt the previous model's heatsink onto the new model, but I don't think that was a very good decision...

I just transplanted the guts of my cruncher into a wonderful new Corsair Obsidian 550D case, with a 1000 RPM fan blowing on the cards. The temps, while definitely lower than in the old case, are quite different for the two cards, with the 750 running 10 °C hotter than the 650! I admit I didn't expect the 750 to be that hot, especially with its power consumption being about half the 650's. Notably, my 750 Ti is an OC model running at 1150 MHz, but I honestly don't know how much difference that overclock makes.

I currently have the 750 on top of the 650, in the x16 slot, as my motherboard's second PCIe slot runs at x4. I guess swapping the cards (putting the 750 below) will lower the 750's temperature (and of course raise the 650's) and probably bring the two cards' temperatures closer together, which would be a good thing.

Jim1348
Message 36840 - Posted: 15 May 2014 | 11:42:20 UTC - in response to Message 36838.
Last modified: 15 May 2014 | 11:43:13 UTC

> I currently have the 750 on top of the 650, using the x16 slot, as the second PCIE slot my motherboard has runs at x4. I guess swapping the cards (putting the 750 below) will lower the 750's temperature (and of course raise the 650's) and probably bring the two cards' temperatures closer together, which would be a good thing.

Maybe it is the spacing of the slots. I always get motherboards that leave two slots free between the top and bottom cards, but some leave only one, which restricts airflow. And the side fan location is not always ideal: side fans usually don't cool the bottom card as well, though you have the opposite problem. I think swapping them is worth a try. But I don't think you will see errors from those temperatures or that overclock yet; it is more a question of what happens in the summer.

Stefan
Project administrator
Project developer
Project tester
Project scientist
Message 36845 - Posted: 16 May 2014 | 9:17:42 UTC - in response to Message 36843.

It would be nice to switch this discussion to another thread because the title is as off-topic as it can get :P

Vagelis Giannadakis
Message 36849 - Posted: 16 May 2014 | 10:13:31 UTC - in response to Message 36845.

> It would be nice to switch this discussion to another thread because the title is as off-topic as it can get :P

I created a new thread in Graphics Cards (GPUs): http://www.gpugrid.net/forum_thread.php?id=3754
