New Gianni tasks take loooong time... a warning (8-12-16)

Message boards : Graphics cards (GPUs) : New Gianni tasks take loooong time... a warning (8-12-16)

Michael
Joined: 29 Apr 16
Posts: 5
Credit: 79,699,134
RAC: 0
Message 44178 - Posted: 15 Aug 2016, 19:25:38 UTC

Just finished a Gianni that took almost two full days on a 960.

http://www.gpugrid.net/result.php?resultid=15235099

Took a long time, but it worked and didn't fail.
ID: 44178 · Rating: 0
caffeineyellow5
Joined: 30 Jul 14
Posts: 225
Credit: 2,658,976,345
RAC: 0
Message 44180 - Posted: 15 Aug 2016, 22:35:03 UTC - in response to Message 44178.  

Just finished a Gianni that took almost two full days on a 960.

http://www.gpugrid.net/result.php?resultid=15235099

Took a long time, but it worked and didn't fail.

It looks like 38 hours. Good job!

When I added that one had errored and that there was a high error rate, I was not taking into account that the error rate looks higher when tasks are first released, because at that point all the project has back are the fast errors and not the tasks that can actually complete. Zoltan pointed that out to me above. But as also mentioned, they are fragile, so any power glitch or anything similar has the potential to cause an error. I have errored out 2 so far, but that isn't even the majority of my errors recently. But when they do error, they cause the system to fail and need a reboot, and they also affect other tasks running on the same card or system. Since they are more fragile, I have clocked all the cards on my most problematic system down to zero overclocking above the factory boost and am hoping that helps. I had it that way for 2 days and turned it back up today and went 2 days without error. lol That will slow them down a bit, but they are already going to be over 24 hours, so what is an extra hour to 28-30 anyway?

Either way, I don't think the "error rate" on these is an issue UNLESS you have one. At that point, one is too much. The time is the issue and why I put out the warning.
ID: 44180 · Rating: 0
Retvari Zoltan
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 0
Message 44181 - Posted: 15 Aug 2016, 23:51:19 UTC - in response to Message 44177.  

...15h 6m 47s (54,407 s) 980Ti/XP
...14h 58m 29s (53,909 s) 980Ti/XP
...22h 18m 2s (80,282 s) 980/XP

Maybe all things being equal the card does have more to do with it as well. Though I am noticing that the longest time was on an i7 CPU 870 @ 2.93GHz,
Yes, but this host has a GTX 980, while the others have GTX 980 Tis.
the shortest is on an i7-4930K CPU @ 3.40GHz, and the one similar in length to the short one is on an i3-4160 CPU @ 3.60GHz. Is there a difference in settings, usage of other processes, or whatever else that is different between the i7-4930K and the i3-4160 that would make the 3.6Ghz slightly slower than the 3.4Ghz one both on 980TIs (like pcie speed on the mobo, etc)?
The i7-4930K is running at 4.4GHz, and 5 CPU tasks are running simultaneously, while on the i3-4160 no CPU tasks are running. But this is not a clean comparison, as I've booted the i3-4160 to Windows 10 to update it to version 1607, and this task was running under Windows 10 for a short period. You can see it in the task's stderr output, as there are different driver versions present.

ID: 44181 · Rating: 0
Bedrich Hajek
Joined: 28 Mar 09
Posts: 490
Credit: 11,731,645,728
RAC: 51
Message 44182 - Posted: 15 Aug 2016, 23:54:19 UTC - in response to Message 44180.  
Last modified: 15 Aug 2016, 23:57:28 UTC

Just finished a Gianni that took almost two full days on a 960.

http://www.gpugrid.net/result.php?resultid=15235099

Took a long time, but it worked and didn't fail.

It looks like 38 hours. Good job!

When I added that one had errored and that there was a high error rate, I was not taking into account that the error rate looks higher when tasks are first released, because at that point all the project has back are the fast errors and not the tasks that can actually complete. Zoltan pointed that out to me above. But as also mentioned, they are fragile, so any power glitch or anything similar has the potential to cause an error. I have errored out 2 so far, but that isn't even the majority of my errors recently. But when they do error, they cause the system to fail and need a reboot, and they also affect other tasks running on the same card or system. Since they are more fragile, I have clocked all the cards on my most problematic system down to zero overclocking above the factory boost and am hoping that helps. I had it that way for 2 days and turned it back up today and went 2 days without error. lol That will slow them down a bit, but they are already going to be over 24 hours, so what is an extra hour to 28-30 anyway?

Either way, I don't think the "error rate" on these is an issue UNLESS you have one. At that point, one is too much. The time is the issue and why I put out the warning.



The one thing I noticed is that your CPU time is a lot lower than the run time:

Run time 136,638.91
CPU time 20,414.34

That indicates to me that you are not running with SWAN_SYNC set to 1, which can reduce your run time.

Click on the link below; the instructions to set this up are at the bottom of the post:

http://www.gpugrid.net/forum_thread.php?id=4346&nowrap=true#44111
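
A quick way to check the comparison above is to divide CPU time by run time. The sketch below is only an illustration in Python: the function name and the 0.9 cutoff are assumptions made for this example, not anything defined by GPUGRID or BOINC, and a low ratio only hints that the CPU thread spent most of the run waiting rather than feeding the GPU.

```python
def cpu_to_run_ratio(run_time_s, cpu_time_s):
    """Return CPU time as a fraction of wall-clock run time."""
    if run_time_s <= 0:
        raise ValueError("run time must be positive")
    return cpu_time_s / run_time_s


# Figures quoted in the post above, in seconds.
RUN_TIME_S = 136_638.91
CPU_TIME_S = 20_414.34

# Hypothetical cutoff, chosen only for this illustration.
FULL_CORE_CUTOFF = 0.9

ratio = cpu_to_run_ratio(RUN_TIME_S, CPU_TIME_S)
cpu_kept_busy = ratio >= FULL_CORE_CUTOFF
print(f"CPU/run ratio: {ratio:.2f}, CPU core kept busy: {cpu_kept_busy}")
```

For these figures the ratio comes out at roughly 0.15, which is the gap the post is pointing at.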
ID: 44182 · Rating: 0
Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Message 44183 - Posted: 16 Aug 2016, 0:46:10 UTC - in response to Message 44177.  

Obviously too late for any bonuses

Even the 35 hour ones award 439,250 credit which is a bonus in relation to the credit that would be awarded to any other WU in current production for that time.

But that is including the 25% bonus. The credit may be ok for fast cards but it's poor for everyone else. On top of that there's more than double the chance of a failure due to power failure/BSOD and no completion at all.
ID: 44183 · Rating: 0
Logan Carr
Joined: 12 Aug 15
Posts: 240
Credit: 64,069,811
RAC: 0
Message 44184 - Posted: 16 Aug 2016, 0:54:56 UTC - in response to Message 44183.  

Does anyone know how long it takes for projects with these kinds of problems to be fixed? (the Gianni project)

I've been casually lurking and see a lot of people having problems with the Gianni project.

I have to say I'm at 20 hours with Windows XP, 90% GPU usage, and I still have quite a bit to go on the Gianni... (only at 53% complete)

Thanks.

p.s. I'll still post the results once it's done.


Cruncher/Learner in progress.
ID: 44184 · Rating: 0
Betting Slip
Joined: 5 Jan 09
Posts: 670
Credit: 2,498,095,550
RAC: 0
Message 44185 - Posted: 16 Aug 2016, 8:09:44 UTC - in response to Message 44184.  
Last modified: 16 Aug 2016, 8:12:29 UTC

Does anyone know how long it takes for projects with these kinds of problems to be fixed? (the Gianni project)

I've been casually lurking and see a lot of people having problems with the Gianni project.

I have to say I'm at 20 hours with Windows XP, 90% GPU usage, and I still have quite a bit to go on the Gianni... (only at 53% complete)

Thanks.

p.s. I'll still post the results once it's done.



There is no problem, Logan, just some complaining about the length of time to complete, and failures that are probably due to excessive overclocking.
ID: 44185 · Rating: 0
Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Message 44186 - Posted: 16 Aug 2016, 8:24:56 UTC - in response to Message 44185.  

failures due to excessive over clocking probably.

The failures have nothing at all to do with overclocking. They're due to an app that can't recover from outages such as power failures.
ID: 44186 · Rating: 0
Betting Slip
Joined: 5 Jan 09
Posts: 670
Credit: 2,498,095,550
RAC: 0
Message 44187 - Posted: 16 Aug 2016, 8:32:57 UTC - in response to Message 44186.  
Last modified: 16 Aug 2016, 8:34:39 UTC

failures due to excessive over clocking probably.

The failures have nothing at all to do with overclocking. They're due to an app that can't recover from outages such as power failures.


There is no proof of that. However, even if it was the case, how many power outages do you have?
ID: 44187 · Rating: 0
caffeineyellow5
Joined: 30 Jul 14
Posts: 225
Credit: 2,658,976,345
RAC: 0
Message 44189 - Posted: 16 Aug 2016, 8:54:15 UTC

Zoltan, that's why I asked. A slower CPU and GPU would make it significantly slower on tasks. OS changes may affect things too.

Bedrich, I have had issues with swan_sync on every system I have tried it on. It slows all the processes to the point that it makes the system unusable. Most of the systems I access remotely with TeamViewer, and I am not sure if remoting in is affected by the setting or if it is a program or setting I have on all the systems, but I have chosen not to use it. While I was using it for that short time, a few tasks completed and did not show improvement; in fact they were slower across the board. I was not willing to experiment or investigate at that time and just gave up. My memory being what it is, I can only conclude that the problems outweighed the potential benefit, since I did not take on the challenge. I usually like challenges when it comes to PCs.

Beyond, I am not sure what you are saying, but a 20% bonus would be on the less than 24 hour ones that award 527,100, not the ones over 24 that award 439,250. And if a Gerard or Adria took the same amount of time you would get around 200,000, so there is more awarded for these longer units.

Logan, I am not sure if this length issue is considered a problem. The error rate may actually be one though. I think it usually takes one of the forum volunteer moderators to contact someone on the inside to get an issue resolved, which is one reason why we have them to help us and the project and let the scientists and students keep their time on the work.
I ask the mods now, if you haven't already, please contact someone about the error issue with these and inquire about shortening the units as well for the sake of our cards and times, or take Beyond's idea of adding a new level of maybe "Very Long Tasks" for new tasks created for the series 10 NVIDIA cards. After I posted the comment about the error rate possibly not having had enough time to show whether they are erroring out more or not, I had 4 error out on me across 2 different systems, all GIANNI, totaling almost 45.75 hours of work before they errored out. Maybe I am noticing it more and brought undue attention to it too early, or maybe there is something to it, but I would like some feedback as well on whether there is a potential issue, whether it can be corrected, and whether it has possibly already been corrected and we are just erroring out the old broken ones. Thanks.
ID: 44189 · Rating: 0
Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Message 44192 - Posted: 16 Aug 2016, 15:30:27 UTC - in response to Message 44187.  

failures due to excessive over clocking probably.

The failures have nothing at all to do with overclocking. They're due to an app that can't recover from outages such as power failures.

There is no proof of that. However, even if it was the case, how many power outages do you have?

Frequent but usually only for a few seconds. Long enough to wreak havoc with computers. You should be thankful that you live in an area that's more reliable. The proof is that there's about a 50% failure rate when this happens. Zoltan has posted about the problem too. If you won't believe anyone else, maybe you'll believe him. BTW, other than some factory OCs, none of my cards are OCed. In fact some are down-clocked.
ID: 44192 · Rating: 0
Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Message 44193 - Posted: 16 Aug 2016, 15:44:28 UTC - in response to Message 44189.  
Last modified: 16 Aug 2016, 15:45:59 UTC

Beyond, I am not sure what you are saying, but a 20% bonus would be on the less than 24 hour ones that award 527,100, not the ones over 24 that award 439,250. And if a Gerard or Adria took the same amount of time you would get around 200,000, so there is more awarded for these longer units.

As I understand it there's a 50% bonus for completing a WU in under 24 hours (including UL/DL time) and a 25% bonus for under 48 hours. So for instance a 200 credit base rate unit would get 250 credits if completed in 47 hours and 300 credits in 23 hours. Someone please clue me in if I'm mistaken.
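
The bonus scheme as described in the paragraph above can be sketched as follows. This is a minimal Python illustration of that description only: the 24-hour and 48-hour cutoffs and the 50%/25% multipliers are taken from the post, which itself asks for correction, and the function name is made up for the example.

```python
def credit_with_bonus(base_credit, turnaround_hours):
    """Apply the time bonus as described in the post (unconfirmed rules)."""
    if turnaround_hours < 24:
        # Returned in under 24 hours: 50% bonus.
        return base_credit * 1.5
    if turnaround_hours < 48:
        # Returned in under 48 hours: 25% bonus.
        return base_credit * 1.25
    # Returned later than that: base credit only.
    return base_credit


# The example from the post: a 200-credit base unit.
print(credit_with_bonus(200, 47))
print(credit_with_bonus(200, 23))

# A 351,400-credit base unit under the same assumed rules.
print(credit_with_bonus(351_400, 47))
print(credit_with_bonus(351_400, 23))
```

With a base of 351,400 the same multipliers give 439,250 and 527,100, the two award figures quoted above. 527,100 is 20% more than 439,250 because 1.5 / 1.25 = 1.2, which would account for the 20% versus 25% confusion.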

I ask the mods now, if you haven't already, please contact someone about the error issue with these and inquire about shortening the units as well for the sake of our cards and times, or take Beyond's idea of adding a new level of maybe "Very Long Tasks" for new tasks created for the series 10 NVIDIA cards. After I posted the comment about the error rate possibly not being an accurate length of time to tell if they are erroring out more or not I had 4 error out on me across 2 different systems all GIANNI totaling almost 45.75 hours of work before they errored out.

Sorry to hear. It's no fun having large amounts of GPU time wasted. Hopefully the admins will improve the next app's fault tolerance, add a separate queue for super long WUs and also find a way to lower the WU error rate. The larger the WUs become, the more important it is to address these issues. Good for the project and good for their volunteers.
ID: 44193 · Rating: 0
Michael
Joined: 29 Apr 16
Posts: 5
Credit: 79,699,134
RAC: 0
Message 44195 - Posted: 16 Aug 2016, 19:07:42 UTC - in response to Message 44182.  

Will that influence other projects I'm running?
I've got a 4-core i5, with 3 cores (that is 75%) running CPU WCG tasks and the last one remaining for GPUGRID and POEM@Home (when no GPUGRID tasks are available).
ID: 44195 · Rating: 0
Logan Carr
Joined: 12 Aug 15
Posts: 240
Credit: 64,069,811
RAC: 0
Message 44196 - Posted: 16 Aug 2016, 19:16:59 UTC - in response to Message 44195.  
Last modified: 16 Aug 2016, 19:17:19 UTC

Alright all, thanks for clearing some things up for me.

Here's my results:

http://www.gpugrid.net/result.php?resultid=15236421


Took about 1 day and 14 hours, but hey, I got a decent amount of credit for how long it took.

Hope the result helps someone

Cheers,

LC
Cruncher/Learner in progress.
ID: 44196 · Rating: 0
Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Message 44197 - Posted: 16 Aug 2016, 19:50:14 UTC - in response to Message 44196.  

Took about 1 day and 14 hours, but hey, I got a decent amount of credit for how long it took

Not so much. Here's your last GERARD_FXCXCL12RX:
Time: 54,279.73 - 53,837.44 - Credits: 267,900.00

Here's the GIANNI_D3C36bCHL:
Time: 137,043.38 - 136,562.60 - Credits: 351,400.00

2.5x the time, 1.3x the credits. Add to that: 2.5x the chance for failure due to many unforeseen factors.
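
To put the comparison above in credits per hour, here is a small Python sketch. It assumes the first number on each "Time" line is the run time in seconds (the second presumably being CPU time); the helper name is made up for the example.

```python
def credits_per_hour(run_time_s, credits):
    """Credits earned per hour of run time."""
    if run_time_s <= 0:
        raise ValueError("run time must be positive")
    return credits / (run_time_s / 3600.0)


# Figures quoted in the post above (run time assumed to be in seconds).
GERARD = {"run_time_s": 54_279.73, "credits": 267_900.00}
GIANNI = {"run_time_s": 137_043.38, "credits": 351_400.00}

time_ratio = GIANNI["run_time_s"] / GERARD["run_time_s"]
credit_ratio = GIANNI["credits"] / GERARD["credits"]
gerard_rate = credits_per_hour(**GERARD)
gianni_rate = credits_per_hour(**GIANNI)

print(f"time ratio:   {time_ratio:.2f}x")
print(f"credit ratio: {credit_ratio:.2f}x")
print(f"GERARD: {gerard_rate:,.0f} credits/hour")
print(f"GIANNI: {gianni_rate:,.0f} credits/hour")
```

For these two tasks that works out to roughly 17,800 credits per hour for the GERARD against roughly 9,200 for the GIANNI.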
ID: 44197 · Rating: 0
Bedrich Hajek
Joined: 28 Mar 09
Posts: 490
Credit: 11,731,645,728
RAC: 51
Message 44202 - Posted: 16 Aug 2016, 23:17:01 UTC - in response to Message 44195.  

Will that influence other projects I'm running?
I got a 4 core i5, with 3 cores (that is 75%) running CPU WCG tasks and the last one remaining for GPUGRID and POEM@Home (when no GPUGRID are available).



You would be better off having 2 cores crunching your CPU project, one core supporting your GPU and one core free to run the operating system.



ID: 44202 · Rating: 0
Betting Slip
Joined: 5 Jan 09
Posts: 670
Credit: 2,498,095,550
RAC: 0
Message 44204 - Posted: 17 Aug 2016, 1:25:58 UTC - in response to Message 44192.  

failures due to excessive over clocking probably.

The failures have nothing at all to do with overclocking. They're due to an app that can't recover from outages such as power failures.

There is no proof of that. However, even if it was the case, how many power outages do you have?

Frequent but usually only for a few seconds. Long enough to wreak havoc with computers. You should be thankful that you live in an area that's more reliable. The proof is that there's about a 50% failure rate when this happens. Zoltan has posted about the problem too. If you won't believe anyone else, maybe you'll believe him. BTW, other than some factory OCs, none of my cards are OCed. In fact some are down-clocked.


I am sorry for your power outages, thought that the USA was beyond such things. In this part of the UK we count power outages in YEARS although there was a 2 day one last December due to flooding of a substation which is the longest power outage in my 64 year history, guess we're just lucky.
ID: 44204 · Rating: 0
Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Message 44206 - Posted: 17 Aug 2016, 2:37:17 UTC - in response to Message 44204.  
Last modified: 17 Aug 2016, 2:40:13 UTC

I am sorry for your power outages, thought that the USA was beyond such things. In this part of the UK we count power outages in YEARS although there was a 2 day one last December due to flooding of a substation which is the longest power outage in my 64 year history, guess we're just lucky.

Thanks. Even though some (most likely mentally challenged) claim climate change to be a myth, we've been having crazy storms and frequent torrential downpours (another one just today). Goes great with the neighborhood underground power lines. Animal species previously unknown here have been steadily moving in from the south. Actually the most frequent reason for outages seems to be lightning strikes on the further out above ground lines. It's improved from a couple years ago when there used to be a few seconds outage almost every day at 7am. If you think the USA power grid is suspect, you should get a load of our abysmal internet service (except in big cities and where Google has graced the population). The horrible broadband speeds make doing GPUGrid even more challenging. Yeah, greedy monopolies are great... :-(

I'm crossing my fingers as my next door neighbor is having a new sewer system installed. Last time that happened a ways down the block the idiot contractors cut through the power and phone lines even though they were marked on the ground with bright neon orange paint. Took 3 days to get it fixed.
ID: 44206 · Rating: 0
Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Message 44211 - Posted: 17 Aug 2016, 11:49:25 UTC - in response to Message 44206.  

Actually the most frequent reason for outages seems to be lightning strikes on the further out above ground lines. It's improved from a couple years ago when there used to be a few seconds outage almost every day at 7am.

I was forced to start using uninterruptible power supplies when I went with ramdisks and large write caches a few years ago. But the UPSs also take care of the brief (less than a second) power glitches we get here in the spring and summer due to switching loads around and lightning strikes. Otherwise, the power is very reliable where I am, but that varies a lot in the U.S. And our power company is now implementing a smart grid for automatically routing around downed power lines, to help isolate the problem.

I once had an expert on buried telephone lines tell me that they are just as susceptible to lightning strikes as the overhead lines, since the lightning has no problem finding the best conductor anyplace. However, optical fiber cables have largely solved that problem for the Internet, and it is good where I am, but that varies a lot too. The U.S. is a big country; Europeans don't always realize how different it is from one section to another. (Americans don't always realize it either.)

Global Warming will force a lot of investment in infrastructure upgrades though, assuming the affected areas still want access and power, etc.
ID: 44211 · Rating: 0
nanoprobe
Joined: 26 Feb 12
Posts: 184
Credit: 222,376,233
RAC: 0
Message 44212 - Posted: 17 Aug 2016, 12:53:13 UTC - in response to Message 44192.  


Frequent but usually only for a few seconds. Long enough to wreak havoc with computers. You should be thankful that you live in an area that's more reliable. The proof is that there's about a 50% failure rate when this happens. Zoltan has posted about the problem too. If you won't believe anyone else, maybe you'll believe him. BTW, other than some factory OCs, none of my cards are OCed. In fact some are down-clocked.

A quality UPS would solve that issue if you could do it. We have momentary glitches and surges where I live also. I bit the bullet and put UPSs on all 8 of my DC machines 1 at a time. Even put 1 on my fridge after a surge took out a $600 control board but that's another story.
ID: 44212 · Rating: 0

©2025 Universitat Pompeu Fabra