The problem with the size of sent jobs.

ElleSolomina
Joined: 22 Mar 14
Posts: 43
Credit: 625,577,901
RAC: 0
Message 44363 - Posted: 31 Aug 2016, 21:35:56 UTC

Good day!

Please explain how to fix a problem with the stupid project scheduler: on two different video cards in two different hosts it sends only long WUs, although both types are selected in the settings. There is also the problem that computation is now only allowed from 23:00 to 07:00, so even the faster machine does not have time to finish a task before the deadline. How can I set this up so that it works properly, so that when the scheduler sees that a host cannot finish the job in time, it sends short WUs instead? :(

P.S. I have now set the preferences to short WUs only, but the scheduler simply does not send any.
Greger
Joined: 6 Jan 15
Posts: 76
Credit: 25,499,534,331
RAC: 0
Message 44364 - Posted: 31 Aug 2016, 22:12:03 UTC - in response to Message 44363.  

Look at the server status page to see what types of workunits they have active.
https://www.gpugrid.net/server_status.php

As it is now there are only long WUs, and they have now also put in very long WUs. I believe it has been almost half a year since I saw short work units, so don't expect to get those now that they have released new long ones.

As others do, set it to run only short WUs in the app config and wait. Until they have WUs ready, you can put that host on another project.
ElleSolomina
Joined: 22 Mar 14
Posts: 43
Credit: 625,577,901
RAC: 0
Message 44365 - Posted: 31 Aug 2016, 22:24:53 UTC - in response to Message 44364.  

Thank you. Perhaps with my hardware it makes no sense to consider GPUGRID at all; it is too much fuss.
As far as I can tell, with 660 and 650 cards the right move is to go to other projects (see http://www.gpugrid.net/forum_thread.php?id=4301#43339 ).
caffeineyellow5
Joined: 30 Jul 14
Posts: 225
Credit: 2,658,976,345
RAC: 0
Message 44367 - Posted: 1 Sep 2016, 1:32:36 UTC
Last modified: 1 Sep 2016, 2:07:47 UTC

If the GPU can finish the non-GIANNI tasks within the 5 day time period, I think it is still worth running GPUGRID and just keeping an eye on it, if you are inclined to keep an eye on it anyway, to make sure it is running a working ADRIA or a GERRARD.

Also consider all the methods of completing tasks faster. My mobile NVIDIA K2100M does the work in just under 5 days with all the mods on: SWAN_SYNC, running no CPU tasks, setting the config files right to optimize performance and priority, and setting the NVIDIA Control Panel settings to optimize the virtual 3D rendering.
1 Corinthians 9:16 "For though I preach the gospel, I have nothing to glory of: for necessity is laid upon me; yea, woe is unto me, if I preach not the gospel!"
Ephesians 6:18-20, please ;-)
http://tbc-pa.org
[CSF] Aleksey Belkov
Joined: 26 Dec 13
Posts: 87
Credit: 1,292,358,731
RAC: 0
Message 44376 - Posted: 1 Sep 2016, 15:15:59 UTC - in response to Message 44365.  

Thank you. Perhaps with my hardware it makes no sense to consider GPUGRID at all; it is too much fuss.
As far as I can tell, with 660 and 650 cards the right move is to go to other projects (see http://www.gpugrid.net/forum_thread.php?id=4301#43339 ).

Not necessarily. Recently my GTX 660 finished 2 long GIANNI tasks in 2.3 days.
More than enough time to return task before deadline.
https://www.gpugrid.net/result.php?resultid=15244862
https://www.gpugrid.net/result.php?resultid=15258440
Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Message 44377 - Posted: 1 Sep 2016, 15:46:42 UTC - in response to Message 44376.  

Thank you. Perhaps with my hardware it makes no sense to consider GPUGRID at all; it is too much fuss.
As far as I can tell, with 660 and 650 cards the right move is to go to other projects (see http://www.gpugrid.net/forum_thread.php?id=4301#43339 ).

Not necessarily. Recently my GTX 660 finished 2 long GIANNI tasks in 2.3 days.
More than enough time to return task before deadline.
https://www.gpugrid.net/result.php?resultid=15244862
https://www.gpugrid.net/result.php?resultid=15258440

Now if these super long WUs were put in a separate queue where the GTX980/TI users could opt in they'd be done much more reliably and efficiently. Then those with normal GPUs could also more efficiently run the normal long WUs without having to try to avoid the crazy super long ones. Very easy to implement.
[CSF] Aleksey Belkov
Joined: 26 Dec 13
Posts: 87
Credit: 1,292,358,731
RAC: 0
Message 44386 - Posted: 1 Sep 2016, 22:01:00 UTC - in response to Message 44377.  

I agree that it would be nice to have an additional option to receive only the jobs recommended for your hardware.
caffeineyellow5
Joined: 30 Jul 14
Posts: 225
Credit: 2,658,976,345
RAC: 0
Message 44388 - Posted: 2 Sep 2016, 6:31:28 UTC - in response to Message 44377.  
Last modified: 2 Sep 2016, 6:31:56 UTC

Thank you. Perhaps with my hardware it makes no sense to consider GPUGRID at all; it is too much fuss.
As far as I can tell, with 660 and 650 cards the right move is to go to other projects (see http://www.gpugrid.net/forum_thread.php?id=4301#43339 ).

Not necessarily. Recently my GTX 660 finished 2 long GIANNI tasks in 2.3 days.
More than enough time to return task before deadline.
https://www.gpugrid.net/result.php?resultid=15244862
https://www.gpugrid.net/result.php?resultid=15258440

Now if these super long WUs were put in a separate queue where the GTX980/TI users could opt in they'd be done much more reliably and efficiently. Then those with normal GPUs could also more efficiently run the normal long WUs without having to try to avoid the crazy super long ones. Very easy to implement.

Or they could just change the description of "Long Runs" to "8-27 hours on fastest cards" and the "Short Runs" to "3-9 hours on fastest cards". Either way, with new classifications or new categories, none of them are in danger of the 5 day timeout except on the oldest cards, which according to the "Stats" page are already phased out and yet can still complete a task, so people continue to run them. Remember that 5 days is the threshold for a task, not a bonus period of credit. If you can return a task in 5 days including upload, download, actual running, paused periods, etc, then you don't have any problems, you just have a grudge about the credit. The longer running units do offer more credit (though maybe not completely proportional in all timed runs: the longer it is, the less credit there is per runtime hour, which is always true anyway), so whether it is a long run or a longer run doesn't matter. If you can't finish and return a WU within 5 days of real time, then it is time to consider a different project. Until then, it's not a problem; let them run.
caffeineyellow5
Joined: 30 Jul 14
Posts: 225
Credit: 2,658,976,345
RAC: 0
Message 44389 - Posted: 2 Sep 2016, 6:46:30 UTC

Also something to consider is that the "Time estimated to complete" may run much faster or slower than real time at some points, because it takes averages of different WUs while learning how fast your system can do them. For example, after doing several GIANNI WUs it will sometimes tell me that the next GERRARD will take 1 day 6 hours, but the time counts down at 2-3 seconds per second and the task completes in 8-12 hours. Then if it does a bunch of the GIANNIs and ADRIAs in a row and gets a GIANNI, it will say it has 15 hours to go and it takes a day and 6 hours, moving at 1 second every 2-3 seconds of real time. The estimated time is not an exact thing, but an estimate based on the other tasks you have been doing and the expected time of the task.

My laptop, for example, does a GIANNI in 4 days and 20 hours. That is just short of the timeout, and they do complete and upload in time and get credit. In actuality, the RAC for the laptop has been up by 15,000 since doing these (very close to) 5 day WUs. A GERRARD will take it 2 days and 10-15 hours. I do have to wait longer to see a credit jump and it does drop in the meantime, but the average goes up on the stats pages and such. My point is, the current one has run 1 day and 4 hours. It took maybe 5 minutes at most to download. It will take maybe 10 minutes to upload. It says the estimated time to complete is 4 days and 13.5 hours. I know it will complete at 4 days 20 hours or so, and the estimated time is running at 20 seconds of real time per 30 seconds of estimated time. It is 26% done at 1 day 4 hours. So in all actuality, this one may finish at about 4 days 16 hours or less! It is a K2100M, so it is in no way as powerful as many of even the older desktop cards.
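The arithmetic in the last paragraph can be checked with a simple linear extrapolation. This is only a sketch: it assumes progress stays proportional to run time for the whole task, which BOINC's own estimate does not necessarily assume, and the function names are made up for illustration.

```python
def projected_total_hours(elapsed_hours: float, fraction_done: float) -> float:
    """Linear extrapolation: assumes progress stays proportional to run time."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_hours / fraction_done


def as_days_hours(hours: float) -> str:
    """Format a number of hours as 'D d H.H h'."""
    days, rest = divmod(hours, 24.0)
    return f"{int(days)} d {rest:.1f} h"


elapsed = 1 * 24 + 4                           # 1 day 4 hours, from the post
total = projected_total_hours(elapsed, 0.26)   # 26% done, from the post
deadline_hours = 5 * 24                        # the 5 day deadline

print("projected total run time:", as_days_hours(total))
print("inside the 5 day deadline:", total < deadline_hours)
```

On these numbers the projection lands a little under the "4 days 16 hours or less" quoted above, so the task fits the deadline as long as the machine is not paused for long.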
Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Message 44400 - Posted: 2 Sep 2016, 14:45:58 UTC - in response to Message 44388.  

Remember that 5 days is the threshold for a task, not a bonus period of credit. If you can return a task in 5 days including upload, download, actual running, paused periods, etc, then you don't have any problems, you just have a grudge about the credit.

This would be true if they fix the app in the next version. Right now there's better than a 50/50 chance that a WU will fail if a power glitch occurs. A 5 day WU is 5 times more likely for this to happen than a 1 day WU. In addition look at the normal failure rate of the super long WUs right now. It's almost 50%. It doesn't make anyone very happy to lose 5 days worth of crunching due to a bad WU or any other reason for that matter.
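A rough way to put numbers on the exposure argument, as a sketch only: it assumes power glitches arrive independently at a constant rate (a Poisson assumption that is not from this thread) and uses a made-up glitch rate. The 0.5 chance of losing the WU per glitch is the "50/50" figure above. It ignores any dependence on GPU speed.

```python
import math


def p_wu_lost(glitches_per_day: float, run_days: float,
              p_fail_per_glitch: float = 0.5) -> float:
    """Chance that at least one glitch kills the WU during its run.

    Glitches that actually kill the WU form a thinned Poisson process
    with rate glitches_per_day * p_fail_per_glitch.
    """
    killing_rate = glitches_per_day * p_fail_per_glitch
    return 1.0 - math.exp(-killing_rate * run_days)


rate = 0.05                     # hypothetical: one glitch every 20 days
p_one_day = p_wu_lost(rate, 1.0)
p_five_day = p_wu_lost(rate, 5.0)
ratio = p_five_day / p_one_day

print(f"1 day WU lost: {p_one_day:.3f}")
print(f"5 day WU lost: {p_five_day:.3f}")
print(f"ratio:         {ratio:.2f}")
```

Under this model the ratio is a little below 5 and approaches 5 as glitches get rarer, so "5 times more likely" is a fair approximation when outages are infrequent.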
caffeineyellow5
Joined: 30 Jul 14
Posts: 225
Credit: 2,658,976,345
RAC: 0
Message 44426 - Posted: 5 Sep 2016, 14:05:35 UTC - in response to Message 44400.  

The GIANNI and ADRIA tasks aside, my point is valid. As Zoltan points out, when a new task batch is put out, the first few returned are usually failures, not because of the task, but because of the hosts that get them, fail them, and return them as failed much faster and at a higher rate than a normal working host can return them. So that does indeed make the % listings on the Performance page misleading to a point, and more so if the task batch is newer. And I suspect that the ADRIA and GIANNI failures have been making the other tasks fail by affecting them inside the BOINC software, inside the OS, or because of a reboot/crash of the system. Perhaps if the GIANNI and ADRIA tasks had not been failing as they do, the GERRARD tasks would have a slightly better success ratio right now.

But I do agree completely that a 5 day run WU has 5 times more potential time to fail than a 1 day (or 22 hour one over a 7 hour one depending on your card ability) and that is a problem that is surely compounded by power issues.
Retvari Zoltan
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 0
Message 44429 - Posted: 5 Sep 2016, 18:13:46 UTC - in response to Message 44400.  
Last modified: 5 Sep 2016, 18:15:24 UTC

Right now there's better than a 50/50 chance that a WU will fail if a power glitch occurs. A 5 day WU is 5 times more likely for this to happen than a 1 day WU.

This problem is more likely to happen on hosts with fast GPUs (GTX 970 and above), so the chance of such a failure is *not* in direct ratio with the runtime (a longer runtime means a slower GPU, so the OS has time to write the contents of the files needed for restart to the disk). I think the chance of the error is in inverse (non-linear) ratio with the GPU speed.

In addition look at the normal failure rate of the super long WUs right now. It's almost 50%.

This error rate includes user-aborted workunits too, as such workunits are counted as failures. So this error rate is distorted by users (including you) selectively aborting tasks based on their length / error rate / earned bonus. It's quite awkward that you refer to the error rate of these tasks while you are actively increasing it. However, they should have been put in another queue, but there won't be more than two. Brace yourself: workunits get longer every time a new GPU generation is released, and we're facing this right now.
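To illustrate how much aborts can move the published figure, here is a small sketch. The counts are invented for the example (they are not taken from the Performance page), and the function name is hypothetical.

```python
def error_rate(errors: int, aborts: int, successes: int,
               count_aborts: bool) -> float:
    """Share of returned tasks counted as failed.

    With count_aborts=True, user aborts are treated as failures, as described
    in the post above. With False, aborted tasks are left out entirely.
    """
    failed = errors + (aborts if count_aborts else 0)
    total = failed + successes
    if total == 0:
        raise ValueError("no tasks returned")
    return failed / total


# Hypothetical batch: 30 real errors, 20 user aborts, 50 successes.
with_aborts = error_rate(30, 20, 50, count_aborts=True)
without_aborts = error_rate(30, 20, 50, count_aborts=False)

print(f"error rate counting aborts:  {with_aborts:.1%}")
print(f"error rate excluding aborts: {without_aborts:.1%}")
```

With these made-up counts the headline rate is 50%, while the rate among tasks that were actually allowed to run is 37.5%.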
Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Message 44434 - Posted: 6 Sep 2016, 1:55:18 UTC - in response to Message 44429.  

Right now there's better than a 50/50 chance that a WU will fail if a power glitch occurs. A 5 day WU is 5 times more likely for this to happen than a 1 day WU.

This problem is more likely to happen on hosts with fast GPUs (GTX 970 and above), so the chance of such a failure is *not* in direct ratio with the runtime (a longer runtime means a slower GPU, so the OS has time to write the contents of the files needed for restart to the disk). I think the chance of the error is in inverse (non-linear) ratio with the GPU speed.

Yet another thunderstorm last night and only the 2 WUs on the 650Ti cards survived. All 15 of the 750Ti WUs either crashed or restarted from zero. I'm finding also that if a WU restarts from zero after a power outage it will almost always either error later or fail validation if it somehow finishes. If it happens to restart where it left off it is generally good to go. Therefore it's best to abort the restarts (from zero type).
Sure hope that the admins do something to make this app more fault tolerant.
Retvari Zoltan
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 0
Message 44437 - Posted: 6 Sep 2016, 8:03:47 UTC - in response to Message 44434.  

Yet another thunderstorm last night and only the 2 WUs on the 650Ti cards survived. All 15 of the 750Ti WUs either crashed or restarted from zero.

Have you tried to turn off write caching for your disks?

Windows key + R ->
Devmgmt.msc <ENTER> ->
Disk drives ->
select your BOINC disk (double click) ->
Policies tab ->
Un-check (both) write caching option(s) ->
OK ->
Close device manager
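
The idea behind this workaround is that restart files may still be sitting in a write cache when the power drops. For comparison, the usual application-side way to protect a checkpoint is to write it to a temporary file, force it to disk, and then swap it into place. The sketch below shows that general technique only; it is not taken from the GPUGRID app, and the file name and function name are made up.

```python
import os
import tempfile


def write_checkpoint(path: str, data: bytes) -> None:
    """Write data to path so a crash leaves either the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".ckpt-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()               # push Python's buffer to the OS
            os.fsync(tmp.fileno())    # ask the OS to push it to the disk
        os.replace(tmp_path, path)    # swap the new file in over the old one
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as workdir:
    checkpoint = os.path.join(workdir, "restart.chk")   # hypothetical name
    write_checkpoint(checkpoint, b"step=500")
    write_checkpoint(checkpoint, b"step=1000")
    with open(checkpoint, "rb") as fh:
        restored = fh.read()

print(restored)
```

Even with this pattern, a drive that acknowledges writes before they reach stable storage can still lose data, which is the case the Device Manager setting above is meant to address.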
Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Message 44442 - Posted: 6 Sep 2016, 14:48:15 UTC - in response to Message 44437.  
Last modified: 6 Sep 2016, 14:51:56 UTC

Yet another thunderstorm last night and only the 2 WUs on the 650Ti cards survived. All 15 of the 750Ti WUs either crashed or restarted from zero.

Have you tried to turn off write caching for your disks?

Windows key + R ->
Devmgmt.msc <ENTER> ->
Disk drives ->
select your BOINC disk (double click) ->
Policies tab ->
Un-check (both) write caching option(s) ->
OK ->
Close device manager

Zoltan, THANKS MUCH for this. I've now turned off write caching on the BOINC drive on all the machines. Hopefully that will help. Losing 15 WUs in an instant was irritating. Usually I lose over 50% in an outage, but 15 out of 17 is ridiculous. Last night we had another huge thunderstorm but luckily no power outage. I can hear another storm in the distance moving toward us right now, so this may get a test soon. Hopefully the admins will improve/fix the app when they do the next build so things like this won't be necessary. I don't think that the weather on this planet is going to get less violent at least in the near future. :-(
Retvari Zoltan
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 0
Message 44444 - Posted: 6 Sep 2016, 16:31:56 UTC - in response to Message 44442.  

I've now turned off write caching on the BOINC drive on all the machines. Hopefully that will help.

It should help. You're the best test subject to tell if my theory is right. I'm really curious to know.
Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Message 44446 - Posted: 6 Sep 2016, 17:30:19 UTC - in response to Message 44444.  

I've now turned off write caching on the BOINC drive on all the machines. Hopefully that will help.

It should help. You're the best test subject to tell if my theory is right. I'm really curious to know.

The current storm passed by to the north. I will certainly let you know when this happens again. Sometimes we'll go a long time with no outages, sometimes it's frequent. Usually they're only a few seconds. Occasionally longer but of course the damage is the same either way. In fact it's maybe better for the hardware to actually wind all the way down (drives) before restarting. Thanks again for posting this workaround, have my fingers crossed...
Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Message 44448 - Posted: 7 Sep 2016, 2:29:45 UTC - in response to Message 44444.  

I've now turned off write caching on the BOINC drive on all the machines. Hopefully that will help.

It should help. You're the best test subject to tell if my theory is right. I'm really curious to know.


You *could* flip the switch on a power strip, as a very similar test case. Just sayin'. Probably not nice to the PC or disks, but, is exactly the same test case I believe.
caffeineyellow5
Joined: 30 Jul 14
Posts: 225
Credit: 2,658,976,345
RAC: 0
Message 44449 - Posted: 7 Sep 2016, 4:49:14 UTC - in response to Message 44442.  

Also trying this on the 2 systems I get my errors on. The one with all the errors with the 3 TI Classies and the one at the bad power location. Let's see if the error rate goes down on these.
Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Message 44451 - Posted: 7 Sep 2016, 5:18:08 UTC - in response to Message 44448.  

I've now turned off write caching on the BOINC drive on all the machines. Hopefully that will help.

It should help. You're the best test subject to tell if my theory is right. I'm really curious to know.

You *could* flip the switch on a power strip, as a very similar test case. Just sayin'. Probably not nice to the PC or disks, but, is exactly the same test case I believe.

Great idea, and wonderful of you to volunteer! Even better, I'd suggest cycling the main house breaker on and off while the family watches a movie or plays online games!

©2026 Universitat Pompeu Fabra