Cancelled by Server - Suggestion
| Author | Message |
|---|---|
Stefan Ledwina · Joined: 16 Jul 07 · Posts: 464 · Credit: 298,573,998 · RAC: 0
|
The task you linked wasn't cancelled by the server; it ended with a computation error... <message> pixelicious.at - my little photoblog |
skgiven · Joined: 23 Apr 09 · Posts: 3968 · Credit: 1,995,359,260 · RAC: 0 |
The error happened during scheduled server communications. I know the log file reports it as a computational error, but that is as vague an error message as you’re ever going to find! It only said that on the server too. On my system it made no mention of any error! I think the error occurred because the server called in the data before the job was finished (about 3 hours short and 2 days to spare) and viewed the data as erroneous. I don’t debug so I can’t interpret the data. |
Dingo · Joined: 1 Nov 07 · Posts: 20 · Credit: 128,376,317 · RAC: 0
|
So I think that if a user has crunched a WU and it is still within the deadline, they should get credit for it. Look at this WU: I had crunched it for 64,753.00 secs but got nothing, as the server canceled it. http://www.gpugrid.net/result.php?resultid=564654 |
Stefan Ledwina · Joined: 16 Jul 07 · Posts: 464 · Credit: 298,573,998 · RAC: 0
|
This one wasn't cancelled because of a redundant result, but for some other reason. You can see minimum quorum = 1 and initial replication = 1. Maybe it got cancelled because it was part of a bad batch of tasks... In that case it was better to cancel it server-side than to let it run even longer and error out (which would also give you 0 credits)... pixelicious.at - my little photoblog |
Hydropower · Joined: 3 Apr 09 · Posts: 70 · Credit: 6,003,024 · RAC: 0
|
Are you sure it is not a hardware issue? I get these "Incorrect function. (0x1) - exit code 1 (0x1)" errors quite often, but only on GPU3. I have now completely underclocked this unit. Join team Bletchley Park, the innovators. |
|
Joined: 7 Apr 09 · Posts: 2 · Credit: 1,614,790 · RAC: 0
|
Why have two cards compete against each other? Wouldn't it be possible to do one of the following:
1. Eliminate the competition altogether and have all cards work on separate work units.
2. Employ a SETI@Home resolution where they send out the same work unit to three crunchers and require a minimum of 2 comparable results to reach a 'quorum' and view the work unit as satisfactorily completed and granting credit to all who submitted the verified work unit based on the lowest credit granted. In other words, this would penalize those with faster cards because they get less credit than would otherwise be granted. |
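For readers unfamiliar with quorum validation, option 2 above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `validate_quorum`, the tuple layout, and the relative-tolerance test for "comparable" results are all hypothetical and are not taken from SETI@Home or the BOINC server code.

```python
def validate_quorum(results, min_quorum=2, rel_tol=1e-6):
    """Grant credit to hosts whose results agree, using the lowest claim.

    results: list of (host, value, claimed_credit) tuples (hypothetical layout).
    Returns a dict host -> granted credit, or {} if no quorum is reached.
    """
    for _, ref_value, _ in results:
        # Collect every result that is "comparable" to this reference value.
        # The tolerance test is an assumption made for this sketch.
        agreeing = [
            r for r in results
            if abs(r[1] - ref_value) <= rel_tol * max(abs(ref_value), 1.0)
        ]
        if len(agreeing) >= min_quorum:
            # Everyone in the quorum receives the lowest claimed credit.
            granted = min(claim for _, _, claim in agreeing)
            return {host: granted for host, _, _ in agreeing}
    return {}


# Three crunchers, two of which agree; both get the lower of their two claims.
example = [("a", 1.0, 50.0), ("b", 1.0, 40.0), ("c", 2.0, 60.0)]
granted = validate_quorum(example)
```

In this sketch hosts "a" and "b" are both granted the lower claim, which is the penalty for faster cards mentioned in the post.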
|
Joined: 7 Apr 09 · Posts: 2 · Credit: 1,614,790 · RAC: 0
|
Two things to consider: First and foremost, I don't care about competition, or about how many work units I can complete in a given day, or how many work units other users accumulate against me. I care about doing worthwhile research. If my work units are getting canceled because others out there are beating me to the punch, then I view my contribution as unnecessary and I will go donate my GPU to SETI@Home; they are always in need of crunchers. Granted, I prefer doing medical research because that may actually pay off, versus listening to a signal. Second, if my work units are canceled as redundant, then, wrapping back to the first reason, it is a waste for me to donate my computer's time in terms of electricity. It is a cost to me to keep my computer running full time, even while I am not using it. Maybe it's not my place to say this, but you should really get your system for sending work units out to people in line. Worry less about competition and more about just getting as many verified work units done as possible. Competition is a waste... |
skgiven · Joined: 23 Apr 09 · Posts: 3968 · Credit: 1,995,359,260 · RAC: 0 |
[quote]The available projects that use the Nvidia GPU now include: SaH, SaH Beta, The Lattice Project, Ramsey, Aqua, and soon we hope MilkyWay...[/quote] Thanks for the info. I like the look of Aqua, and I have signed up to that project. I don't find the others too appealing; maths for the sake of maths is not my thing! Perhaps I will opt into the MilkyWay project when they get their act together. |
Paul D. Buck · Joined: 9 Jun 08 · Posts: 1050 · Credit: 37,321,185 · RAC: 0
|
Thanks for the info. Aqua is looking at quantum computing ... pure math application ... :) Also, though they will be releasing information into the public domain, be aware that the project is being run by a commercial firm which is using the data to perfect their product line. In other words, it is not a university project that is fully in the public domain... Not trying to talk you out of the project, just so you have all the facts ... :) Also, they are having similar problems with tasks crashing, locking up, and running overly long ... one of the reasons I stopped contributing there ... too high a risk and they have not yet implemented trickles though they did start looking into them ... |
|
Joined: 10 Nov 08 · Posts: 8 · Credit: 876,616,559 · RAC: 0
|
Nvidia GPUs on Linux work only with the GPUGRID project. I recently tried D-Wave's Adiabatic QUantum Algorithms (AQUA) CUDA-enabled application for Linux on a 64-bit platform, without any success. However, they released version 3.23 today, and I look forward to adding AQUA to the short list of CUDA-enabled projects for Linux. The others are still only enabled for Windows/x86. Ramsey-GPU is NOT listed as enabled on the official Ramsey app list. |
|
Joined: 17 Aug 08 · Posts: 2705 · Credit: 1,311,122,549 · RAC: 0 |
SKGiven, I hope it's clear by now that your WUs are not being canceled while they run; they error out, and you therefore get no credit. If you suspect this happens because of server communication, I suggest the following: set a cache of >1 day and restrict network usage to ~1h a day. Let it run for some time and take a look: if your WUs error out at times when there couldn't have been any server communication, you know for sure that your assumption is wrong. And you likely won't see an error reported by the BOINC client, as the GPU-Grid app detects the error, logs it and shuts down gracefully.

Dingo, apparently you're using a very old BOINC client (5.x), which should not even be able to work with GPU-Grid! And it reports GPU-Grid app version 5.03, which must be wrong. Under these circumstances I wouldn't guarantee anything.

Jeff Harrington, sorry, but your suggestion sucks ;)

[quote]1. Eliminate the competition altogether and have all cards work on separate work units.[/quote]
That's the usual mode.

[quote]2. Employ a SETI@Home resolution where they send out the same work unit to three crunchers and require a minimum of 2 comparable results to reach a 'quorum' and view the work unit as satisfactorily completed and granting credit to all who submitted the verified work unit based on the lowest credit granted. In other words, this would penalize those with faster cards because they get less credit than would otherwise be granted.[/quote]
Here the credits per WU are determined by the amount of work they contain (i.e. the flops necessary to complete them). This is much more precise and fair than any time- or benchmark-based system could ever be. Furthermore, what you propose as the standard solution is the worst case of our current solution. If it works out well, we're more efficient than that.

From reading your 2nd post I get the feeling you completely miss the point of this thread and what's currently being done. Sorry, it's a long thread already. So let me just quickly state the core points again: if WUs are sent out to more than one GPU and one result is successfully returned, the server cancels the other results upon the next scheduler contact of those hosts if, and only if, the WUs had not been started yet. Otherwise the other hosts can finish the WUs regularly and receive credits just as usual. Canceling those WUs avoids wasting CPU time, not the other way around! It would actually be even more efficient to cancel WUs already in progress, but the credit system couldn't handle this, so it's a no-go.

And calling it a competition is somewhat misleading; you may want to read the initial posts again. If it's still not clear, I could probably summarize this issue.

MrS
Scanning for our furry friends since Jan 2002 |
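The cancellation rule described above can be written down as a small sketch. It is only an illustration of the rule as stated in the post, not the actual BOINC or GPUGRID scheduler code; the names `State`, `Replica` and `on_scheduler_contact` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    UNSTARTED = "unstarted"
    RUNNING = "running"
    RETURNED_OK = "returned_ok"
    CANCELLED = "cancelled"


@dataclass
class Replica:
    """One copy of a WU assigned to one host (hypothetical structure)."""
    host: str
    state: State


def on_scheduler_contact(replicas, contacting_host):
    """Apply the rule when a host contacts the scheduler.

    The contacting host's copy is cancelled if, and only if, another host
    has already returned a successful result AND this copy has not started.
    A copy that is already running is left alone so it can finish and earn
    credit as usual.
    """
    another_succeeded = any(
        r.state is State.RETURNED_OK and r.host != contacting_host
        for r in replicas
    )
    for r in replicas:
        if (r.host == contacting_host
                and another_succeeded
                and r.state is State.UNSTARTED):
            r.state = State.CANCELLED
    return replicas


# Host "a" has already returned a good result; "b" has not started, "c" is running.
wu = [
    Replica("a", State.RETURNED_OK),
    Replica("b", State.UNSTARTED),
    Replica("c", State.RUNNING),
]
on_scheduler_contact(wu, "b")  # b's copy is cancelled
on_scheduler_contact(wu, "c")  # c's copy keeps running
```

Note that nothing happens to a copy until its host next contacts the scheduler, which is why a host with restricted network access may start, and then finish, a WU that would otherwise have been cancelled.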
GDF · Joined: 14 Mar 07 · Posts: 1958 · Credit: 629,356 · RAC: 0 |
So, after all these posts, we are going to change a few things.
1) We will use target_nresults = 2 much less often, so everyone has their own result.
2) We will upload a new application compiled for CUDA compute capability (CC) 1.3 cards (216, 280, etc.). This lets us use some optimizations, and the code is faster. So there will be two apps: a CC 1.1 compliant one and a CC 1.3 one. The 1.3 WUs will be at least twice as long, and there will be a user preference to select only the 1.3 app if you wish.
gdf |
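As a rough illustration of how the two-app plan could play out per card, here is a minimal sketch. The function name `eligible_apps`, the app labels, and the `only_cc13` flag are hypothetical; the sketch also assumes that a CC 1.3 card may still be offered the CC 1.1 app unless the user opts out, which the post does not spell out.

```python
def eligible_apps(compute_capability, only_cc13=False):
    """Return the app variants a card could be sent work for.

    compute_capability: (major, minor) tuple, e.g. (1, 1) or (1, 3).
    only_cc13: hypothetical stand-in for the user preference
               "select only the 1.3 app".
    """
    apps = []
    # Tuples compare element by element, so (1, 3) >= (1, 1) is True.
    if compute_capability >= (1, 3):
        apps.append("cc1.3")
    if compute_capability >= (1, 1) and not only_cc13:
        apps.append("cc1.1")
    return apps


eligible_apps((1, 3))                  # both apps
eligible_apps((1, 3), only_cc13=True)  # only the faster, longer 1.3 WUs
eligible_apps((1, 1))                  # only the 1.1 app
```

With the preference set on a card below CC 1.3, the sketch returns an empty list, meaning such a host would receive no work.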
|
Joined: 17 Aug 08 · Posts: 2705 · Credit: 1,311,122,549 · RAC: 0 |
Sounds like a very good idea! Effective enough to (hopefully) make (some) people happy and simple enough so it can be handled. MrS Scanning for our furry friends since Jan 2002 |
©2025 Universitat Pompeu Fabra