What do "results" look like, why no independent validation?
| Author | Message |
|---|---|
JStateson (Joined: 31 Oct 08, Posts: 186, Credit: 3,578,903,157, RAC: 0)
|
I have never seen any "results" on this project, though not every other project returns visible results either. The only results I have ever seen are those showing info about the hardware: milliseconds per step, elapsed time, type of GPU, etc., including computation errors. I do not see any way to compare the results I return to GPUGRID with the results returned by other participants. How does one know that the results returned are actually valid? Unlike other projects, there appear to be no wingmen who process the same WU and thus perform a sanity check that the results match. The reason I ask is that it has become apparent on the SETI CUDA forum that once an Nvidia display error occurs, subsequent CUDA work units can be processed incorrectly without any computation error showing up. In addition, the same problem on a wingman's system can seemingly provide confirmation that an invalid result is actually good. Question: if a SETI CUDA work unit leaves the Nvidia board in some corrupted state that renders subsequent SETI CUDA work units invalid, how does one know that a GPUGRID WU processed afterwards does not also return a messed-up result? peace |
|
Joined: 17 Aug 08, Posts: 2705, Credit: 1,311,122,549, RAC: 0 |
That's surely a question worth asking. MrS Scanning for our furry friends since Jan 2002 |
GDF (Joined: 14 Mar 07, Posts: 1958, Credit: 629,356, RAC: 0) |
Hi, in molecular simulations there is no easy (automatic) way to check whether results are correct; it very much depends on the specific simulation. So far we have found only a very few bad results, for instance ones with truncated output. This high rate of good results comes from the fact that bad results are discarded because they generate errors in the following WUs (just 5 errors will abort the WU). So, because the output of one WU is used as the input of another WU (most likely delivered to another host), errors are prevented from propagating to the point where we analyze the results. Hope it helps. gdf |
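To make the chaining idea concrete, here is a minimal toy sketch, not the project's actual code. The function names, the input check, the three-value state, and the placeholder arithmetic are all assumptions made for illustration; only the limit of 5 errors comes from the post above.

```python
import math

MAX_ERRORS = 5      # from the post: "just 5 errors will abort the WU"
EXPECTED_LEN = 3    # assumed size of the toy state vector


def check_input(state):
    """Hypothetical input check: reject a truncated or non-finite state."""
    if len(state) != EXPECTED_LEN or not all(math.isfinite(x) for x in state):
        raise ValueError("unusable input state")


def good_host(state):
    """Stand-in for a healthy host; the arithmetic is a placeholder, not MD."""
    check_input(state)
    return [0.5 * x + 1.0 for x in state]


def truncating_host(state):
    """Stand-in for a faulty host that returns truncated output."""
    return good_host(state)[:-1]


def run_chain(initial_state, hosts):
    """Run one WU per entry in hosts, feeding each output into the next WU.

    Returns (final_state, None) on success, or (None, index) when the WU at
    that index is aborted after MAX_ERRORS failed attempts.
    """
    state = initial_state
    for index, host in enumerate(hosts):
        errors = 0
        while True:
            try:
                state = host(state)
                break
            except ValueError:
                # A real scheduler would resend the WU to a different host.
                # The toy retries the same one, which fails the same way
                # because the input itself is bad.
                errors += 1
                if errors >= MAX_ERRORS:
                    return None, index
    return state, None
```

With `hosts = [good_host, truncating_host, good_host]` the third WU cannot load its input, is aborted after 5 errors, and `run_chain` returns `(None, 2)`, so the truncated result never reaches the end of the chain. Note that a check like this only catches output that is unusable as input; it says nothing about values that are finite but slightly off.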
Paul D. Buck (Joined: 9 Jun 08, Posts: 1050, Credit: 37,321,185, RAC: 0)
|
Hi, this is one of those areas that sadly most projects neglect: explaining to the participants what the project is doing and how it is doing it. In the dark ages of history I used to try to capture nuggets like these and then flesh them out so that the participant base could better understand what the project is doing. My gut feeling is that one of the reasons we have so much difficulty attracting new and less committed participants is that almost no information about what the projects are doing actually makes it out in any organized fashion. That was why I had pushed so hard for a BOINC-wide wiki, so that we could develop the explanations of what each project was doing and how the experiments worked. Sadly the only project that took this task seriously (or was it just one guy on one project?) was CPDN, where the mechanics of each experiment and model were explained in non-technical ways so that you could understand the point of the work we are doing ... |
|
Joined: 17 Aug 08, Posts: 2705, Credit: 1,311,122,549, RAC: 0 |
I think the main question is: if a fault does not lead to an obvious computation error but rather to a slightly wrong number here and there, can this be detected without a wingman? Depending on how chaotic the system is, this could lead to big errors in the final results, or it could easily be corrected by the following WUs. MrS |
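The "depending on how chaotic the system is" point can be illustrated with a toy example. This is a minimal sketch using the logistic map, which is a generic textbook system and not a molecular simulation; the function name, starting value, and perturbation size are assumptions chosen for illustration.

```python
def logistic_gaps(r, x0=0.3, delta=1e-6, steps=100):
    """Iterate x -> r*x*(1-x) from x0 and from x0+delta.

    Returns the absolute gap between the two runs after each step, so the
    list shows whether a small initial error dies out or grows.
    """
    x, y = x0, x0 + delta
    gaps = []
    for _ in range(steps):
        x = r * x * (1.0 - x)
        y = r * y * (1.0 - y)
        gaps.append(abs(x - y))
    return gaps


stable = logistic_gaps(2.5)   # non-chaotic regime: both runs settle on one fixed point
chaotic = logistic_gaps(4.0)  # chaotic regime: nearby runs separate rapidly
```

In the stable case the gap shrinks toward zero, which corresponds to an error being "corrected" by later steps. In the chaotic case the gap grows until the two runs are unrelated, even though neither run ever raises a computation error.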
JStateson (Joined: 31 Oct 08, Posts: 186, Credit: 3,578,903,157, RAC: 0)
|
"I think the main question is: if a fault does not lead to an obvious computation error but rather to a slightly wrong number here and there, can this be detected without a wingman? Depending on how chaotic the system is, this could lead to big errors in the final results, or it could easily be corrected by the following WUs." Thanks for the observation, ETA. It is nice to know that not everyone smokes the same stuff here. |
Paul D. Buck (Joined: 9 Jun 08, Posts: 1050, Credit: 37,321,185, RAC: 0)
|
"I think the main question is: if a fault does not lead to an obvious computation error but rather to a slightly wrong number here and there, can this be detected without a wingman? Depending on how chaotic the system is, this could lead to big errors in the final results, or it could easily be corrected by the following WUs." Or the project could depend on the chaos in the result stream to "properly" allow the system to diverge along the potential paths, with only the statistical aggregation of all of the models being of interest. If I recall correctly, this is something like what CPDN is doing ... though they are not using the output of one model to feed the next ... The only other project I can think of that feeds the output of models forward is Milky Way ... |
GDF (Joined: 14 Mar 07, Posts: 1958, Credit: 629,356, RAC: 0) |
Either it recovers, as the system moves back towards the right sampling, or it fails. This applies to non-systematic errors. A card that produces continuous memory errors will simply fail its workunits. gdf |