Redundant results

Talknuser
Joined: 7 Apr 09
Posts: 4
Credit: 1,121,005
RAC: 0
Message 8678 - Posted: 21 Apr 2009, 7:13:20 UTC

How many people are receiving the same workunit and exactly what does the time-limit mean?

I've had two units cancelled now after working on them for like 10 hours: 552067 and 542428. Both units would have finished well within the time limit!

Is the cancellation an error or project policy?

In other words: am I wasting my time here?

Zydor
Joined: 8 Feb 09
Posts: 252
Credit: 1,309,451
RAC: 0
Message 8680 - Posted: 21 Apr 2009, 9:47:54 UTC - in response to Message 8678.  
Last modified: 21 Apr 2009, 9:51:00 UTC

I have had one cancelled in the last twenty, but that one was not running. I also note another post on this two days ago that remains unanswered. WUs should not be cancelled pre-emptively if they are running.

I would also be grateful for a response on this. Cancelling models that are already running is an abuse of donated free time and resources and should not happen. It's a good idea to cancel redundant WUs that have yet to run. I can understand how they can become redundant, and I applaud the existence of such a facility; it's win/win for all concerned.

However, it is not acceptable to cancel those already running without pro rata credit for effort already expended, or, at the very least, quietly kill them off on upload. There is a high level of Trust involved in crunching: what we allow to run automatically on our machines, as a free donated resource of personal time and effort, must be reliable, safe and of value. Pre-emptive action, in the way this appears to be implemented, is an abuse of that Trust and a matter of important Principle.

Regards
Zy

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 8688 - Posted: 21 Apr 2009, 21:32:41 UTC

Mhh, I was not aware that WUs are also canceled if they already started. This is good for the project, but it's unacceptable for the participant if 0 credits are awarded for x hours of GPU time.

Talknuser, could you provide a link to the WUs in question or (temporarily) unhide your computers?

MrS
Scanning for our furry friends since Jan 2002

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 8689 - Posted: 21 Apr 2009, 21:34:33 UTC

I am *NOT* a member of the project, but, there are all kinds of technical reasons for the project to issue work and then cancel tasks that have not been started.

The standard cancellation tool in the BOINC system will not cancel tasks that you have started to process so that you should get credit.

The point is that, whatever the reason, the project is in fact looking out for all of us by canceling work that is not needed.

BOINC has some automated mechanisms, but they don't always have perfect control, so we will sometimes see tasks downloaded that never need to run.

In truth, they HAVE canceled streams of tasks before they were issued too ... so ...

If you can't stand it then the alternative is to leave (sadly) because this is the nature of the beast here ... sometimes we get tasks that get "recalled" ... heck, I probably get more of them than just about anyone ... look in my account for computer w2 ...

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 8695 - Posted: 21 Apr 2009, 22:00:02 UTC - in response to Message 8689.  

Paul, the problem here is that he says these canceled WUs had already started and quite some computation time went into them.
For project speed it's still better to cancel them if they become redundant. But it's not fair to the user, so this should not happen, unless credits are paid via trickles or some similar system.

MrS
Scanning for our furry friends since Jan 2002

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 8697 - Posted: 21 Apr 2009, 22:03:30 UTC - in response to Message 8695.  

Paul, the problem here is that he says these canceled WUs had already started and quite some computation time went into them.
For project speed it's still better to cancel them if they become redundant. But it's not fair to the user, so this should not happen, unless credits are paid via trickles or some similar system.

MrS

That shows you how badly I am doing today ... I guess I should quit while I am behind ...

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 8702 - Posted: 21 Apr 2009, 22:09:36 UTC - in response to Message 8697.  

Well, I also just made a stupid mistake (in another thread) and refuse to recognize that it has been time for me to go to bed for half an hour already. See you tomorrow ;)

MrS
Scanning for our furry friends since Jan 2002

Talknuser
Joined: 7 Apr 09
Posts: 4
Credit: 1,121,005
RAC: 0
Message 8709 - Posted: 22 Apr 2009, 6:18:34 UTC - in response to Message 8702.  

Unhidden - only have 2 machines here so far :)

The Work unit ID shows that no work is done, but that is not the case!

Although I was not there to actually watch the cancellation, this smallish rig was well underway with both results at the time I left it...

Then, suddenly they were cancelled/redundant. Not the end of the world, but certainly a waste of time, and definitely a problem to smaller machines if this is a general issue...

Snow Crash
Joined: 4 Apr 09
Posts: 450
Credit: 539,316,349
RAC: 0
Message 8721 - Posted: 22 Apr 2009, 14:20:20 UTC - in response to Message 8709.  

I don't think this is a general issue. I keep pretty close track of my WUs, as I am sure many other people do. I hesitate to say this but ... is it possible you made a mistake when you looked at the tasks? You did have one error and one complete. Can your card actually crunch two WUs at the same time? The task list will say "In Process" as soon as a task gets sent to you; it does not mean they are all actively being crunched at the same time. Currently you have two tasks that say "In Process", but if you check BOINC Manager I think you will see one that is processing and another that is waiting to run.

Steve

Talknuser
Joined: 7 Apr 09
Posts: 4
Credit: 1,121,005
RAC: 0
Message 8734 - Posted: 22 Apr 2009, 18:32:11 UTC - in response to Message 8721.  

@Snow Crash

Like you I like to keep tabs on my units - at least when I start a project. When I'm sure things work I don't care ;)

And there's no chance this could be a mistake. The first one was an error for some reason - probably because I was still setting up the linux box at the time.
The second one got cancelled by the server with 10 hours or more completed. Same thing happened to #3.

Number 4 was actually allowed to finish and upload in time :)

Let's see what happens to #5 and #6 ;)

Anyway, this is really not worth wasting time on as no one from the project seems to bother. I only reported it because cycles were being wasted, and because I was not the first one to have this problem...

Have fun out there :)

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 8745 - Posted: 22 Apr 2009, 21:04:13 UTC - in response to Message 8734.  

I assume the host you're talking about has to be this one. Let's try to dissect what's happening:
(I'm assuming your machine runs 24/7 and that linux BOINC doesn't suspend running GPU-Grid tasks.)


- 1st WU received should have started 1st. It supposedly ran for 22.5h, until it was canceled at 9:48 on the 19th.

- 2nd WU sent to you ran supposedly for 10h after the 1st one was canceled. It stopped with an error and at that time lists 9h of CPU time.

Does GPU-Grid occupy an entire core of your linux machine? If it does, the above looks probable. If not, say it's "only" 30 - 50% of one core, the situation looks different: in this case the 2nd WU could not have accumulated that much cpu time within 10h and thus must have been running before the 1st WU was canceled. Which would likely mean that the 1st WU did not yet start when it was canceled.


- the next 2 WUs were sent at the same time, so we can't say which one started first. Let's call them S for success and F for fail.

- S may have run from 8:54 on the 20th until 11:16 on the 22nd. That's 50:22h of wall clock time. It registers a run time of 47:05h. So under perfect conditions (i.e. WU runtime = wall clock time) there was a maximum possible runtime interval of 3:17h for "F".

- F could in principle have run before or after S. However, after S is impossible because S finished on the 22nd, whereas F was canceled on the 21st. If we assume F ran before S there's another problem: it would have started on the 20th and was aborted on the 21st. So it would have run for far more than the maximum of 3:17h that it is allowed given the minimum runtime of S.

-> if the linux BOINC doesn't suspend running GPU-Grid tasks it is clear that F could not have been started when it was aborted.
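
The interval arithmetic for S and F can be checked with a few lines of Python. This is only a sketch of the reasoning above: the timestamps and the 47:05h run time are the ones quoted in this post, the variable names are made up for illustration, and it assumes (as the post does) that only one task occupies the GPU at a time.

```python
from datetime import datetime, timedelta

# Figures quoted in the post (UTC, April 2009); names are illustrative only.
s_start = datetime(2009, 4, 20, 8, 54)       # earliest S could have started
s_end = datetime(2009, 4, 22, 11, 16)        # S reported as finished
s_runtime = timedelta(hours=47, minutes=5)   # run time registered for S

wall_clock = s_end - s_start
# Assumption: one task on the GPU at a time, so whatever wall clock time
# S did not use is the most that F could possibly have run.
slack_for_f = wall_clock - s_runtime


def hhmm(delta):
    """Format a timedelta as H:MM, with hours allowed to exceed 24."""
    minutes = int(delta.total_seconds()) // 60
    return f"{minutes // 60}:{minutes % 60:02d}"


print("wall clock for S:", hhmm(wall_clock))
print("maximum runtime left for F:", hhmm(slack_for_f))
```

The two printed values correspond to the 50:22h and 3:17h figures used in the argument.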


Anyway, this is really not worth wasting time on as no one from the project seems to bother. I only reported it because cycles were being wasted, and because I was not the first one to have this problem...


Your report is greatly appreciated! The project can't fix what they don't know. And the project staff is quite busy, so they usually only reply if they have something worthwhile to say. No reply doesn't mean they're not watching :)

MrS
Scanning for our furry friends since Jan 2002

Talknuser
Joined: 7 Apr 09
Posts: 4
Credit: 1,121,005
RAC: 0
Message 8769 - Posted: 23 Apr 2009, 8:59:54 UTC - in response to Message 8745.  

@ ETA

Thanks for your in-depth breakdown, which made me think :)

A couple of comments/observations:

* The unit with the error actually ran first, as #1 got stuck in the download queue.

* I've been monitoring the box closely today, and BOINC, for some reason, seems to allocate only 0.05 CPU to GPUGRID, meaning that this particular box (running 24/7) in practice runs GPUGRID for only 40% of the time as opposed to the expected 100% :(

With the above in mind, unit #2 (the erroneous one that ran first) actually would not have stopped running until after about 22.5 hours (as opposed to 9 hours), which is AFTER the two units in question were cancelled. Meaning that neither of the cancelled units would have had time to start!
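
The 22.5 hour figure follows from scaling the listed 9 hours by the 40% duty fraction. A minimal sketch in Python, taking the 40% observation at face value; the function name is hypothetical and not part of BOINC:

```python
def wall_clock_hours(compute_hours, duty_fraction):
    """Wall clock time needed to accumulate compute_hours of run time
    when the task only runs for duty_fraction of the elapsed time."""
    if not 0 < duty_fraction <= 1:
        raise ValueError("duty_fraction must be in (0, 1]")
    return compute_hours / duty_fraction


# 9 h of listed time at an assumed 40% duty cycle
print(f"{wall_clock_hours(9, 0.4):.1f} h")
```

Whether the 40% really held for the whole period is exactly the assumption flagged in the post.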

So, provided the above observations/assumptions are correct for the whole period, nothing was in fact wasted - except your time and mine ;)

Sorry to miss that this box was not running full tilt guys :)

Next step is to find out why, but I won't bother you with that :D

Zydor
Joined: 8 Feb 09
Posts: 252
Credit: 1,309,451
RAC: 0
Message 8771 - Posted: 23 Apr 2009, 9:27:17 UTC - in response to Message 8769.  
Last modified: 23 Apr 2009, 9:34:17 UTC

The level of cpu use at 0.05% is a good thing. It indicates a low level of cpu involvement in the gpu application. In gpu crunching the cpu is there to load up the gpu with the initial data set (hence the pause when a gpu wu first starts: the data is being loaded by the cpu into the gpu), and it also passes "what to do next" instructions to the gpu. The gpu, in crude terms, is inherently stupid compared to the cpu, as it does not have integral instruction sets; it's a pure number cruncher, and relies on the cpu to drive it and tell it what to do next.

The lower the cpu number the better, as it indicates a more efficient gpu app. That in turn frees the cpu to get on with other things, such as more time to crunch a cpu based application, or lets you get on with the latest powerpoint presentation, etc etc, with minimal to zero lag/disruption.

Many BOINC gpu projects have much higher cpu assist percentages, the low number is a pat on the back to the gpu app devs, not a figure of concern or worry.

In SETI for example, their CUDA wu (non optimised) runs at 0.15% cpu assist, using an optimised SETI app it will run at 0.04%. The slightly higher figure of 0.05% in GPUGRID is an indicator of the complexity of the model being run compared to SETI's.

Regards
Zy

Michael Goetz
Joined: 2 Mar 09
Posts: 124
Credit: 124,873,744
RAC: 0
Message 8772 - Posted: 23 Apr 2009, 9:43:03 UTC - in response to Message 8771.  
Last modified: 23 Apr 2009, 9:43:24 UTC

The lower the cpu number the better, as it indicates a more efficient gpu app.


I never really thought about it much, but I suspect that the amount of CPU used will have a lot to do with the speed of the CPU vs. the speed of the GPU, not just the efficiency of the software.

Put a monster video card in a computer and you'll see a much higher CPU usage than you would with a mediocre video card. The CPU has to work that much harder to keep the GPU running. Same effect with using a slow CPU vs. a fast CPU.

Put an 8000 class video card in an i7 machine and I'm sure you'll see *very* low CPU utilization!
Want to find one of the largest known primes? Try PrimeGrid. Or help cure disease at WCG.


Zydor
Joined: 8 Feb 09
Posts: 252
Credit: 1,309,451
RAC: 0
Message 8774 - Posted: 23 Apr 2009, 9:59:22 UTC - in response to Message 8772.  
Last modified: 23 Apr 2009, 10:00:55 UTC

Valid point, and it does, to a degree, have just that effect. Inevitably there is a "floor" below which the cpu assist number will not go; it will never be zero, as the gpu has no inherent Instruction Set "Intelligence". As gpu applications become more refined, they will perform faster, as the gpu app is tweaked both to perform the maths in a more efficient way and to ask for less help from the cpu.

GPU crunching is still in its infancy; there is a lot of "sledgehammer to crack a nut" going on inside the beast, and there is a huge latent power lurking in there yet to be fully tapped. The MW WU explosion was due to a model written especially for the gpu, not just an "adapted" cpu model. Other factors were clearly involved, double precision/single precision yaddie yadda, that led to short term fanboyism re ATI/NVidia cards. In truth, in the long term it will even out in performance/card vendor terms as gpu apps become more refined and specially written for a gpu.

The low cpu involvement is why the lower power cpu based machines can still produce cracking results with a gpu app, the gpu is doing all the work. In those cases there are hardware issues such as can the "older" cpu run the card on its motherboard in terms of data throughput (x16 x8 channel PCI etc etc).

Regards
Zy

Michael Goetz
Joined: 2 Mar 09
Posts: 124
Credit: 124,873,744
RAC: 0
Message 8776 - Posted: 23 Apr 2009, 11:24:17 UTC - in response to Message 8774.  

... as the gpu has no inherent Instruction Set "Intelligence".


Are you SURE about that? I'm pretty sure the GPUs are actually full blown computer cores. (Probably not x86-ish type CPUs; I think they're custom RISC processors.)

Granted, it's been about six months since I read through the documentation that comes with the CUDA SDK, but my impression was that the multi-processors on the Nvidia cards are complete CPU's in and of themselves.

Yes, there are vector processors (aka "shaders") on the cards. But each group of 8 shaders is attached to one of these multiprocessors which have full instruction sets. A GTX280 or 285, for example, has 30 multiprocessors -- essentially it's a (somewhat slow) 30-core CPU. What makes it so powerful is that the arithmetic unit on each of those cores is a vector processing unit that can do 8 calculations in parallel. Not to mention that there are 30 of those cores, which, in aggregate, have a total of 240 shaders.

It's possible that I'm misremembering what I read, or perhaps I misunderstood it, but my impression was that the CUDA processors could handle arbitrarily complicated programs all by themselves. The only shortcomings would be if you needed more memory than was available on the video card, or if I/O was required. Then you needed some coordination with the CPU.

The tricky part (and this applies to any vector processing system, not just CUDA), is writing the code in such a way that the parallelism is exploited to its fullest. That is a quite complex topic. (For example, if you have a branch instruction (an IF statement in a high level language) on a vector processor, what happens if the 8 different shaders/vector-processors don't yield the same branch result?)
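
To make the branch question concrete, here is a toy model in Python of a group of lanes that must execute in lockstep. It is a sketch under stated assumptions, not how any particular card is implemented: the function name and the 8-lane group size are chosen only to match the numbers above (CUDA itself schedules threads in warps of 32). The idea it illustrates is the usual one for such hardware: when lanes disagree on a branch, the group steps through both paths in turn, with the lanes that did not take the current path masked off.

```python
def simd_if_else(values, predicate, then_op, else_op):
    """Toy model of one lockstep group of lanes hitting an if/else.

    Returns (results, passes), where passes counts how many branch
    paths the whole group had to step through.
    """
    mask = [predicate(v) for v in values]
    results = list(values)
    passes = 0
    if any(mask):        # at least one lane takes the 'then' path
        passes += 1
        results = [then_op(v) if m else r
                   for v, m, r in zip(values, mask, results)]
    if not all(mask):    # at least one lane takes the 'else' path
        passes += 1
        results = [else_op(v) if not m else r
                   for v, m, r in zip(values, mask, results)]
    return results, passes


lanes = list(range(8))  # 8 lanes, matching the "8 shaders" in the post

# All lanes agree on the branch: only one path is executed.
uniform, uniform_passes = simd_if_else(
    lanes, lambda v: v >= 0, lambda v: v * 2, lambda v: -v)

# Lanes disagree: both paths are executed, one after the other.
mixed, mixed_passes = simd_if_else(
    lanes, lambda v: v % 2 == 0, lambda v: v * 2, lambda v: -v)

print(uniform_passes, mixed_passes)
print(mixed)
```

In the divergent case the group pays for both sides of the IF, which is why code for such processors tries to keep neighbouring lanes on the same path.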

Mike

uBronan
Joined: 1 Feb 09
Posts: 139
Credit: 575,023
RAC: 0
Message 8778 - Posted: 23 Apr 2009, 11:57:50 UTC
Last modified: 23 Apr 2009, 12:09:10 UTC

I don't agree with you that GPUs are full blown processors. They are made to do some tasks, and to do them as fast as possible, and are not nearly as complex as a CPU.
Maybe in time we will see this change, because they can be made intelligent in some way, but for now they are basically raw data monsters.
They calculate some instructions and indeed, because the work is split up over 240 smaller units, do it lightning fast.
Still, a CPU tells the GPU what to do and feeds it a packet to work on, and then goes back to other work until it gets a signal from the GPU that the job is done.
So in every way the GPU is just a simple co-processor which can calculate fast.

PS Look at the MythBusters example about GPUs: the cpu robot is made to move in a circle and shoot some paint pellets, and then move to the middle to make the eyes and mouth, but the gpu simply shoots many colors at once, making it look like it did more.
But you can't compare them at all, because the cpu has to make very complex moves and extras to come to a result, while the gpu cannon just had to shoot all the pellets at once.
So in itself the GPU did more work, yes, but with very very simple instructions.

Michael Goetz
Joined: 2 Mar 09
Posts: 124
Credit: 124,873,744
RAC: 0
Message 8781 - Posted: 23 Apr 2009, 12:33:29 UTC - in response to Message 8778.  
Last modified: 23 Apr 2009, 12:54:47 UTC

EDIT: post greatly shortened; I'm not going to argue about this. Read for yourself:

Here's the CUDA SDK documentation: http://www.nvidia.com/object/cuda_develop.html. In particular, you might want to take a look at this document.

Mike

[AF>France] Thierry Cornet
Joined: 19 Apr 09
Posts: 1
Credit: 1,053,798
RAC: 0
Message 8861 - Posted: 24 Apr 2009, 19:30:06 UTC

Task ID | Work unit ID | Sent | Time reported or deadline | Server state | Outcome | Client state | CPU time (sec) | Claimed credit | Granted credit
569200 | 404560 | 23 Apr 2009 7:23:33 UTC | 24 Apr 2009 8:24:51 UTC | Over | Redundant result | Cancelled by server | 0.00 | --- | ---
569139 | 404534 | 23 Apr 2009 7:24:14 UTC | 24 Apr 2009 18:25:03 UTC | Over | Redundant result | Cancelled by server | 0.00 | --- | ---
554332 | 397382 | 20 Apr 2009 18:53:34 UTC | 21 Apr 2009 19:21:46 UTC | Over | Client error | Aborted by user | 0.00 | 0.00 | ---
549590 | 395293 | 19 Apr 2009 20:44:48 UTC | 21 Apr 2009 18:50:53 UTC | Over | Redundant result | Cancelled by server | 0.00 | --- | ---
549584 | 395292 | 19 Apr 2009 20:44:48 UTC | 21 Apr 2009 18:49:42 UTC | Over | Redundant result | Cancelled by server | 0.00 | --- | ---
549466 | 395240 | 19 Apr 2009 20:45:23 UTC | 20 Apr 2009 17:12:28 UTC | Over | Redundant result | Cancelled by server | 0.00 | --- | ---

A very strange BOINC project. Working, working, working without any credit.
I don't give computer time only for the credits (some projects I crunch for don't give much per hour), but this project really holds the world record !!! 0 credits per hour
Is it at least a useful project ???

I was happy that my GPU could help BOINC projects, but I'm going to leave this project without any regret if there's no way for me to get at least one credit...
Is it normal that when you are not the first to finish your WU (but still before the max time, of course) you get not even one credit ??? Please, at least one, so that I get more than 0 credits after hours of hard work ;-)
I'm new on this project.
Maybe it's a temporary bug. Any help ?



Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 8864 - Posted: 24 Apr 2009, 19:51:38 UTC

Note the zero compute time ... you lost nothing.