WU: NOELIA_INS1P
Author | Message |
---|---|
Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 |
Windows can't catch these calculation errors because, frankly, it doesn't see them. The GPU-Grid app sends some commands to the GPU, the GPU processes something and returns results to the app. Unless the GPU misbehaves in some visible way (stops responding, etc.), there's no way for the OS to tell whether the data returned is correct or garbage. Not even GPU-Grid can know this, unless they already know the result. But they can check their results for sanity and, luckily for us, errors often have either no effect (on the long-term simulation result) or catastrophic effects. I suppose molecular dynamics is comparatively tolerant of single calculation errors. Imagine it this way: if a force is calculated too large in one time step and, as a result, an atom is moved further than it should be in time step n, then it will likely get too close to other atoms in time step n+1 and hence receive a greater repelling force than it would have gotten in the correct position. Thus small errors don't build up over time. Not sure it really works like this, but I think Matt once said something which sounded to me like this :) MrS Scanning for our furry friends since Jan 2002 |
Joined: 16 Apr 09 Posts: 503 Credit: 769,991,668 RAC: 0 |
The claim is that errors can be caused by "not having enough voltage" or by "having too high of a temperature". Something I've read that seems relevant to this explanation: Today's CPU chips are approaching the lower limit of the voltages at which the transistors work properly. Therefore, the power used by each CPU core can't get much lower. Instead, the companies are increasing the total speed by putting more CPU cores in each CPU package. Intel is also using a different method: hyperthreading. This method gives each CPU core two sets of registers, so that while the CPU is waiting for memory operations for the program running with one of these sets, it can use the other set to run the other program. This makes the CPU act as if it had twice as many CPU cores as it actually does. If a programmer wants to use more than one of these CPU cores at the same time for the same program, that programmer must study parallel programming first, in order to handle the communications between the different CPU cores properly. I used to be an electronic engineer, specializing in logic simulation, often including timing analysis. |
Joined: 18 Apr 14 Posts: 43 Credit: 1,192,135,172 RAC: 0 |
Just crunched my first one on a GTX 770 at 76° in 31,145.32. Nice 153,150.00 Points ;-) Regards, Josef |
Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0 |
[quote]Just crunched my first one on a GTX 770 at 76° in 31,145.32. Nice 153,150.00 Points ;-)[/quote] You did indeed finish a Noelia WU, though not this one but the new one: NOELIA_BI. But more importantly, your 770 can do better: mine finishes these new Noelias in about 27000 seconds, but the temperature is only 66-67°C. And the colder a GPU runs, the faster (and more error-free) it is. So perhaps you can experiment with some settings to get the temperature a few degrees lower. Greetings from TJ |
Joined: 18 Apr 14 Posts: 43 Credit: 1,192,135,172 RAC: 0 |
[quote]your 770 can do better: mine finishes these new Noelias in about 27000 seconds, but the temperature is only 66-67°C.[/quote] THX for your advice. To lower the temperature, I'm using NVIDIA Inspector with the following settings: I unchecked Auto-Fan and set the fan to 60%, which raises the fan speed from 1300 to 1770 1/min and is still ear-friendly. But that reduces the temperature by only 3 degrees. So I have to check the Prioritize Temperature box and put the slider at 68°, which slows down the GPU clock a little bit. Is there a better approach? Regards, Josef |
Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0 |
[quote]But more importantly, your 770 can do better: mine finishes these new Noelias in about 27000 seconds, but the temperature is only 66-67°C. And the colder a GPU runs, the faster (and more error-free) it is. So perhaps you can experiment with some settings to get the temperature a few degrees lower.[/quote] You might have accidentally been looking at your 780 Ti. Here are your 3 Noelia results from the 770 so far: # GPU [GeForce GTX 770] Platform [Windows] Rev [3301M] VERSION [42] # Approximate elapsed time for entire WU: 29643.715 s # GPU [GeForce GTX 770] Platform [Windows] Rev [3301M] VERSION [42] # Approximate elapsed time for entire WU: 29572.861 s # GPU [GeForce GTX 770] Platform [Windows] Rev [3301M] VERSION [42] # Approximate elapsed time for entire WU: 29676.489 s |
Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0 |
You are absolutely correct, Beyond, my mistake. Sorry for that, MrJo. Still a difference of 2000 seconds. I have never seen NVIDIA Inspector before; I use PrecisionX from EVGA or MSI's Afterburner. I have set a fan curve that goes to 100% at 70°C, but the card is allowed to go to 75°C before the program must throttle the GPU clock. Power target is set to 100%. Currently, with an ambient temperature of 32.6°C, the 770 runs at 68°C and 1149MHz. It sits in the second slot; the first is occupied by the 780Ti. Hope this helps a bit. Greetings from TJ |
Joined: 18 Apr 14 Posts: 43 Credit: 1,192,135,172 RAC: 0 |
Now I have tested MSI Afterburner. There you can set a custom fan curve. However, I have a problem with that: in order to lower the temperature by 3-4°C, the fan speed has to increase to 3300 1/min. This is unpleasant. With my GTX 680 I was able to reduce the temperature by 8 degrees by dismounting the cooler and renewing the thermal paste ;-) Unfortunately, the same procedure achieved nothing on the GTX 770, since its thermal paste was not dried out. Too new ;-) So I will reduce the GPU clock a little bit to remain below 70 degrees. Reducing it from 1150 MHz to 1080-1100 MHz lowers the temperature by 5 degrees. Regards, Josef |
Joined: 4 Oct 12 Posts: 53 Credit: 333,467,496 RAC: 0 |
Hi, potx1x225-NOELIA_INSP-5-13-RND8250_1: I have an odd error - the task failed within 3 secs. Hopefully it is a one-off and not a bad batch; however, in case it is not: ERROR: file mdioload.cpp line 81: Unable to read bincoordfile 23:01:01 (3684): called boinc_finish http://www.gpugrid.net/result.php?resultid=12864293 |
Joined: 28 Mar 09 Posts: 490 Credit: 11,731,645,728 RAC: 47,738 |
Hi, I had the same error in 4 units so far. Here is an example of one: potx1x492-NOELIA_INSP-3-13-RND4560_6 Workunit 9908013 Created 22 Jul 2014 | 19:40:28 UTC Sent 22 Jul 2014 | 21:46:12 UTC Received 22 Jul 2014 | 23:03:18 UTC Server state Over Outcome Computation error Client state Compute error Exit status -98 (0xffffffffffffff9e) Unknown error number Computer ID 127986 Report deadline 27 Jul 2014 | 21:46:12 UTC Run time 4.05 CPU time 2.06 Validate state Invalid Credit 0.00 Application version Long runs (8-12 hours on fastest card) v8.41 (cuda60) Stderr output <core_client_version>7.2.42</core_client_version> <![CDATA[ <message> (unknown error) - exit code -98 (0xffffff9e) </message> <stderr_txt> # GPU [GeForce GTX 690] Platform [Windows] Rev [3301M] VERSION [60] # SWAN Device 1 : # Name : GeForce GTX 690 # ECC : Disabled # Global mem : 2048MB # Capability : 3.0 # PCI ID : 0000:04:00.0 # Device clock : 1019MHz # Memory clock : 3004MHz # Memory width : 256bit # Driver version : r337_00 : 33788 ERROR: file mdioload.cpp line 81: Unable to read bincoordfile 19:03:38 (5576): called boinc_finish </stderr_txt> ]]> http://www.gpugrid.net/result.php?resultid=12864314 |
Joined: 26 Sep 08 Posts: 4 Credit: 321,147,075 RAC: 0 |
Same error here: ERROR: file mdioload.cpp line 81: Unable to read bincoordfile potx1x284-NOELIA_INSP-2-13-RND0923 : WU 9908067 potx1x225-NOELIA_INSP-5-13-RND8250 : WU 9907982 Bye, Grubix. |
Joined: 5 May 13 Posts: 187 Credit: 349,254,454 RAC: 0 |
This error does not affect only NOELIAs: I had a SANTI_p53final fail on me the other day with the exact same error: ERROR: file mdioload.cpp line 81: Unable to read bincoordfile |
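
MrS's argument near the top of this page (a single miscalculated force gets counteracted in the following time steps instead of building up) can be tried out on a toy model. The sketch below is only an illustration of that intuition, not GPUGRID's actual code and not a real molecular dynamics system: it assumes a single particle in a harmonic well, integrated with velocity Verlet, and the names `simulate`, `bad_step` and `error_factor` are made up for this example.

```python
def simulate(steps, dt=0.01, bad_step=None, error_factor=10.0):
    """Velocity-Verlet integration of one particle in a harmonic well (k = m = 1).

    Hypothetical toy model: if bad_step is given, the force computed at that
    step is multiplied by error_factor to mimic one faulty calculation.
    """
    def force(pos, step):
        f = -pos                      # restoring force pulls the particle back to 0
        if step == bad_step:
            f *= error_factor         # a single miscalculated force
        return f

    x, v = 1.0, 0.0
    a = force(x, 0)
    trajectory = []
    for step in range(1, steps + 1):
        x += v * dt + 0.5 * a * dt * dt
        a_new = force(x, step)
        v += 0.5 * (a + a_new) * dt
        a = a_new
        trajectory.append(x)
    return trajectory


reference = simulate(5000)
perturbed = simulate(5000, bad_step=100)

# Distance between the faulty run and the clean run at every step.
deviation = [abs(p - r) for p, r in zip(perturbed, reference)]
early = max(deviation[100:1100])   # the 1000 steps right after the error
late = max(deviation[-1000:])      # the last 1000 steps of the run

print(f"max deviation shortly after the error: {early:.4f}")
print(f"max deviation at the end of the run:   {late:.4f}")
```

In this toy setup the two printed numbers come out essentially equal: the error leaves a small, bounded wobble rather than a deviation that keeps growing. Whether real many-atom simulations behave the same way is exactly the part MrS was unsure about, so take it as a picture of the idea only.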
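
For anyone who wants to compare run times or decode exit codes without reading the stderr output by hand, here is a minimal Python sketch. It assumes only the line format visible in the posts above ("# Approximate elapsed time for entire WU: ... s"); the helper names `elapsed_times` and `exit_code_as_hex` are hypothetical and not part of BOINC or the GPUGRID app.

```python
import re

# The three GTX 770 results quoted earlier in this thread.
STDERR_SAMPLE = """\
# GPU [GeForce GTX 770] Platform [Windows] Rev [3301M] VERSION [42]
# Approximate elapsed time for entire WU: 29643.715 s
# GPU [GeForce GTX 770] Platform [Windows] Rev [3301M] VERSION [42]
# Approximate elapsed time for entire WU: 29572.861 s
# GPU [GeForce GTX 770] Platform [Windows] Rev [3301M] VERSION [42]
# Approximate elapsed time for entire WU: 29676.489 s
"""

ELAPSED_RE = re.compile(r"Approximate elapsed time for entire WU:\s*([\d.]+)\s*s")


def elapsed_times(text):
    """Return every elapsed time (in seconds) found in the stderr text."""
    return [float(value) for value in ELAPSED_RE.findall(text)]


def exit_code_as_hex(code, bits=32):
    """Show a signed exit code as the unsigned two's-complement hex value."""
    return f"0x{code & ((1 << bits) - 1):08x}"


times = elapsed_times(STDERR_SAMPLE)
mean = sum(times) / len(times)
print(f"{len(times)} results, mean elapsed time {mean:.1f} s")
print(exit_code_as_hex(-98))           # 0xffffff9e, as in the <message> line
print(exit_code_as_hex(-98, bits=64))  # 0xffffffffffffff9e, as in "Exit status"
```

The two hex values are the same exit code -98 printed at 32 and 64 bits, which is why the failed task's report shows both 0xffffff9e and 0xffffffffffffff9e.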
©2025 Universitat Pompeu Fabra