Error after 14 hours of calculations and no points

Message boards : Number crunching : Error after 14 hours of calculations and no points
Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Message 44004 - Posted: 17 Jul 2016, 23:23:26 UTC

Okay. Good luck.

Zarck
Joined: 16 Aug 08
Posts: 145
Credit: 328,473,995
RAC: 0
Message 44044 - Posted: 25 Jul 2016, 22:42:25 UTC - in response to Message 44004.  
Last modified: 25 Jul 2016, 22:45:12 UTC

@+
*_*

Zarck
Joined: 16 Aug 08
Posts: 145
Credit: 328,473,995
RAC: 0
Message 44045 - Posted: 25 Jul 2016, 22:43:49 UTC - in response to Message 44004.  

I also had a problem with Collatz GPU. The solution below fixed it for Collatz, and I hope it also fixes my problem with GPUGrid.

"When it crashes without displaying anything in the log, the problem is usually the C++ runtime. BOINC will send both 32 and 64 bit apps for reasons I won't go into right now, but because of that, you will need to have both the 32-bit and the 64-bit Microsoft C++ runtime installed."

@+
*_*

Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Message 44046 - Posted: 26 Jul 2016, 1:11:13 UTC - in response to Message 44045.  
Last modified: 26 Jul 2016, 1:13:24 UTC

When I set up a new partition, I usually install the current versions of all the runtimes.
My current list is below.

You can download the files from:
https://support.microsoft.com/en-us/kb/2977003
(a good site to bookmark)

Microsoft Visual C++ 2005 Redistributable (x64) - 8.0.61000 - Service Pack 1 MFC Security Update
Microsoft Visual C++ 2005 Redistributable (x86) - 8.0.61001 - Service Pack 1 MFC Security Update
Microsoft Visual C++ 2008 Redistributable (x64) - 9.0.30729.6161 - Service Pack 1 MFC Security Update
Microsoft Visual C++ 2008 Redistributable (x86) - 9.0.30729.6161 - Service Pack 1 MFC Security Update
Microsoft Visual C++ 2010 Redistributable (x64) - 10.0.40219 - Service Pack 1 MFC Security Update
Microsoft Visual C++ 2010 Redistributable (x86) - 10.0.40219 - Service Pack 1 MFC Security Update
Microsoft Visual C++ 2012 Redistributable (x64) - 11.0.61030 - Visual Studio 2012 Update 4
Microsoft Visual C++ 2012 Redistributable (x86) - 11.0.61030 - Visual Studio 2012 Update 4
Microsoft Visual C++ 2013 Redistributable (x64) - 12.0.30501
Microsoft Visual C++ 2013 Redistributable (x86) - 12.0.30501
Microsoft Visual C++ 2015 Redistributable (x64) - 14.0.23026
Microsoft Visual C++ 2015 Redistributable (x86) - 14.0.23026
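
A quick way to check the "both 32-bit and 64-bit" advice against a list like the one above is to pair the entries up by release year. The following is only a minimal sketch: it assumes you already have the display names as plain strings (for example copied from the installed-programs list), and the function name `missing_runtime_pairs` is made up for illustration.

```python
import re
from collections import defaultdict

# Matches display names such as
# "Microsoft Visual C++ 2013 Redistributable (x64) - 12.0.30501".
PATTERN = re.compile(
    r"Microsoft Visual C\+\+ (\d{4}) Redistributable \((x86|x64)\)"
)


def missing_runtime_pairs(display_names):
    """Return {year: [missing architectures]} for every release year
    that has only one of the x86 / x64 redistributables."""
    found = defaultdict(set)
    for name in display_names:
        match = PATTERN.search(name)
        if match:
            year, arch = match.groups()
            found[year].add(arch)
    return {
        year: sorted({"x86", "x64"} - archs)
        for year, archs in sorted(found.items())
        if archs != {"x86", "x64"}
    }


# Hypothetical input: 2013 is complete, 2015 lacks the 32-bit runtime.
installed = [
    "Microsoft Visual C++ 2013 Redistributable (x64) - 12.0.30501",
    "Microsoft Visual C++ 2013 Redistributable (x86) - 12.0.30501",
    "Microsoft Visual C++ 2015 Redistributable (x64) - 14.0.23026",
]
print(missing_runtime_pairs(installed))  # {'2015': ['x86']}
```

Years that do not appear in the input at all are not reported, so this only catches half-installed pairs, not runtimes that are missing entirely.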

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 44054 - Posted: 27 Jul 2016, 10:17:44 UTC - in response to Message 44046.  

Found both my GPUGrid tasks had failed overnight and the system had restarted. Had to force another restart as my first monitor stopped displaying anything - my first GPU stopped working properly (display went to VGA, despite being connected via a DVI cable).
A 368.22 driver crash report popped up.
When trying to upgrade to 3.0.2.196 (BETA) the Microsoft Visual C++ Runtime Library reported a Runtime Error! – terminated in an unusual way. The NV Beta update would not install. Will try the latest non-beta (368.81).
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Message 44055 - Posted: 27 Jul 2016, 11:26:33 UTC
Last modified: 27 Jul 2016, 11:26:58 UTC

GPUGrid still has problems when an "abrupt unexpected restart" (like a BSOD or a power outage) happens. Basically, after the unexpected restart, when Windows loads and BOINC loads, GPUGrid tasks will sometimes cause TDRs and BSODs, and can even make other projects' tasks fail as fallout.

On the rare occasions this happens to me, I hit Activity -> Suspend as soon as possible after BOINC loads. Then I suspend all projects except GPUGrid, suspend all tasks except one, and determine (on a per-GPU-task basis) which ones are good and which ones are bad. Abort the bad ones.
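
The same triage can be scripted with `boinccmd`, the command-line tool that ships with BOINC. The sketch below only builds the command lines and prints them; it does not run anything. The project URLs and task names are placeholders (they must match what your own client reports), and `isolation_commands` is a made-up helper name.

```python
def isolation_commands(gpugrid_url, other_project_urls, task_names, keep):
    """Build boinccmd invocations that leave only the task `keep` runnable.

    Other projects are suspended, every other GPUGrid task is suspended,
    and the chosen task is resumed so it can be tested on its own.
    """
    if keep not in task_names:
        raise ValueError(f"{keep!r} is not one of the known tasks")
    commands = [
        ["boinccmd", "--project", url, "suspend"]
        for url in other_project_urls
    ]
    commands += [
        ["boinccmd", "--task", gpugrid_url, name, "suspend"]
        for name in task_names
        if name != keep
    ]
    commands.append(["boinccmd", "--task", gpugrid_url, keep, "resume"])
    return commands


# Placeholder values for illustration only.
for cmd in isolation_commands(
    gpugrid_url="https://www.gpugrid.net/",
    other_project_urls=["https://example.org/other_project/"],
    task_names=["task_a", "task_b"],
    keep="task_a",
):
    print(" ".join(cmd))
```

If the isolated task turns out to be bad, `boinccmd --task URL task_name abort` gets rid of it; then repeat with the next task.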

If you believe your driver is somehow corrupted, DDU (Display Driver Uninstaller) works great! You can even run this tool in Normal mode (doesn't require safe mode). Just be sure to create a System Restore point beforehand.

I wish GPUGrid would fix their problems resuming from this scenario, or at least terminate the task gracefully without causing the system to crap itself!

Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Message 44056 - Posted: 27 Jul 2016, 16:16:28 UTC - in response to Message 44055.  

GPUGrid still has problems when an "abrupt unexpected restart" (like a BSOD or a power outage) happens. Basically, after the unexpected restart, when Windows loads and BOINC loads, GPUGrid tasks will sometimes cause TDRs and BSODs, and can even make other projects' tasks fail as fallout.

Have been meaning to start a thread about this problem. While I almost never see BSODs, we've been having storms and power glitches. Every time the power goes down 17 WUs have about a 50% chance of failing on restart. As you say, it will now and then take out other projects when the GPUGrid WU fails. What's even worse is that some WUs will appear to restart from the beginning but then will be marked invalid at completion. If you see that one has restarted, abort it, as it will most likely just be a waste of time.

I wish GPUGrid would fix their problems resuming from this scenario, or at least terminate the task gracefully without causing the system to crap itself!

If GPUGrid wants to increase their usability, this is the first issue they should fix. Making their application more fault tolerant should be at the top of their priority list. More important than Pascal, I think...

Zarck
Joined: 16 Aug 08
Posts: 145
Credit: 328,473,995
RAC: 0
Message 44057 - Posted: 27 Jul 2016, 23:14:14 UTC - in response to Message 44056.  

Driver 369.00 is available:

https://developer.nvidia.com/opengl-driver

@+
*_*

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 0
Message 44058 - Posted: 28 Jul 2016, 8:45:35 UTC - in response to Message 44056.  

GPUGrid still has problems when an "abrupt unexpected restart" (like a BSOD or a power outage) happens. Basically, after the unexpected restart, when Windows loads and BOINC loads, GPUGrid tasks will sometimes cause TDRs and BSODs, and can even make other projects' tasks fail as fallout.

Have been meaning to start a thread about this problem. While I almost never see BSODs, we've been having storms and power glitches. Every time the power goes down 17 WUs have about a 50% chance of failing on restart. As you say, it will now and then take out other projects when the GPUGrid WU fails. What's even worse is that some WUs will appear to restart from the beginning but then will be marked invalid at completion. If you see that one has restarted, abort it, as it will most likely just be a waste of time.

I've noticed that this error is more likely on faster GPUs. Some of the files used for restarting the calculation get corrupted (filled with zeroes) in the slot folder of the given GPUGrid task. I think the cause is that these files are written too frequently when the GPU is fast, so the OS never writes their content to the disk. To overcome this, the app should use the OS's non-cached write API call for these files. In the meantime, the user can disable write-behind caching: Device Manager -> Disk drives -> double-click your BOINC disk -> Policies tab -> uncheck (both) write caching options -> OK.
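
To illustrate the idea of forcing restart files onto the disk, here is a minimal Python sketch. It is not GPUGrid's code and it does not use the non-cached Windows write API mentioned above; it swaps in a portable stand-in: write to a temporary file, flush and `os.fsync` it, then rename it over the old file. The function names are made up, and `is_zero_filled` is just a helper for spotting the zero-filled files described above.

```python
import os
import tempfile


def write_checkpoint_durably(path, data):
    """Write `data` (bytes) to `path` so that a crash leaves either the
    old file or the complete new one, never a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".ckpt-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # push Python's buffer to the OS
            os.fsync(tmp.fileno())  # ask the OS to push it to the disk
        os.replace(tmp_path, path)  # swap the new file into place
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def is_zero_filled(path, chunk_size=1 << 20):
    """Return True if the file is non-empty and contains only zero bytes."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            if chunk.count(0) != len(chunk):
                return False
            total += len(chunk)
    return total > 0
```

Syncing on every checkpoint costs some write speed, which is presumably why write caching exists in the first place; the trade-off is the same one as disabling write-behind caching for the whole disk, just limited to the files that matter.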

I wish GPUGrid would fix their problems resuming from this scenario, or at least terminate the task gracefully without causing the system to crap itself!

If GPUGrid wants to increase their usability, this is the first issue they should fix. Making their application more fault tolerant should be at the top of their priority list. More important than Pascal, I think...

The need for a new app for the Pascal GPUs is a great opportunity to kill two birds with one stone.

Zarck
Joined: 16 Aug 08
Posts: 145
Credit: 328,473,995
RAC: 0
Message 44059 - Posted: 28 Jul 2016, 12:06:48 UTC - in response to Message 44058.  

I managed to complete two units in a row without a crash. Let's hope it lasts.

15218345 11681924 189775 27 Jul 2016 | 11:20:11 UTC 28 Jul 2016 | 9:21:49 UTC Completed and validated 53,029.11 18,985.91 267,900.00 Long runs (8-12 hours on fastest card) v8.48 (cuda65)
15216985 11676464 189775 26 Jul 2016 | 10:43:52 UTC 27 Jul 2016 | 11:50:13 UTC Completed and validated 56,439.90 17,339.16 203,375.00 Long runs (8-12 hours on fastest card) v8.48 (cuda65)

@+
*_*


Bedrich Hajek
Joined: 28 Mar 09
Posts: 490
Credit: 11,739,145,728
RAC: 95,752
Message 44063 - Posted: 28 Jul 2016, 23:05:36 UTC - in response to Message 44058.  

The need for a new app for the Pascal GPUs is a great opportunity to hit two birds with one stone.


Hopefully the new app will improve number-crunching performance as well. Maybe it can be less CPU-dependent.

Zarck
Joined: 16 Aug 08
Posts: 145
Credit: 328,473,995
RAC: 0
Message 44064 - Posted: 29 Jul 2016, 11:55:00 UTC - in response to Message 44063.  

Too bad there are not enough units for everyone. I have switched to POEM GPU.

@+
*_*

©2026 Universitat Pompeu Fabra