Lots of errors for two weeks
| Author | Message |
|---|---|
|
Joined: 25 Nov 13 Posts: 66 Credit: 282,724,028 RAC: 0
|
I crunch short runs on my notebook. It has an Nvidia GT 740M, which is not very fast but could finish long runs in two days. However, for about two weeks there have been lots of errors. Some tasks run for a short time and then error out, and some pass 90% and then error out. All the tasks that ended with errors are short-run Noelia tasks. When I check them, some also ended with errors for other users and some finished successfully, so I don't think the tasks themselves are faulty. It seems I have a problem with my system.

I use Kubuntu 15 with Nvidia driver 346.59, running via the optirun command, which is a way to use the Nvidia GPU on Optimus notebooks. The driver seems to be old, but it is the latest one in the Kubuntu repo. I may need to test the latest drivers from the xorg-edgers repo, but sometimes they break Bumblebee and CUDA, so I didn't hurry to install them. Also, it had been crunching fine ever since I installed this OS.

The dmesg errors I get are these:
    [ 7007.280414] NVRM: GPU at PCI:0000:07:00: GPU-932eafe5-81a1-4d7e-b0cb-1f8f59518f5c
    [ 7007.280427] NVRM: Xid (PCI:0000:07:00): 13, Graphics SM Warp Exception on (GPC 0, TPC 1): Out Of Range Address
    [ 7007.280437] NVRM: Xid (PCI:0000:07:00): 13, Graphics SM Global Exception on (GPC 0, TPC 1): Physical Multiple Warp Errors
    [ 7007.280444] NVRM: Xid (PCI:0000:07:00): 13, Graphics Exception: ESR 0x504e48=0x1000e 0x504e50=0x4 0x504e44=0x13eff2 0x504e4c=0x7f
    [ 7007.280459] NVRM: Xid (PCI:0000:07:00): 13, Graphics Exception: ChID 0005, Class 0000a1c0, Offset 00001b0c, Data 00000000

What problems can cause this? Is the notebook hardware worn out from running GPUGrid 24/7? Recently I drilled some holes under the laptop and put heatsinks on the GPU VRMs and some other parts. I'm also running a Notepal U3+ to push some air and cool down the VRMs; they're really hot, untouchable. Most of the time the GPU runs below 70 °C, with a maximum of 73 °C. It was crunching problem-free even at 90 degrees. So what can have caused this for two weeks? Other games and crunching projects run normally on the GPU.
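For anyone who wants to track how often these driver errors occur, the dmesg lines above can be tallied with a few lines of Python. This is only a rough sketch: the regular expression is inferred from the five log lines quoted in this post, not from any official format description, and the names `parse_xid_lines` and `count_by_xid` are made up for illustration.

```python
import re
from collections import Counter

# Pattern inferred from the lines quoted in the post, e.g.
# [ 7007.280427] NVRM: Xid (PCI:0000:07:00): 13, Graphics SM Warp Exception ...
# (assumption: other driver versions may format these lines differently)
XID_RE = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+NVRM: Xid \((?P<pci>PCI:[0-9a-fA-F:.]+)\): "
    r"(?P<xid>\d+), (?P<detail>.*)"
)

# The dmesg excerpt from the post, copied verbatim.
SAMPLE_DMESG = """\
[ 7007.280414] NVRM: GPU at PCI:0000:07:00: GPU-932eafe5-81a1-4d7e-b0cb-1f8f59518f5c
[ 7007.280427] NVRM: Xid (PCI:0000:07:00): 13, Graphics SM Warp Exception on (GPC 0, TPC 1): Out Of Range Address
[ 7007.280437] NVRM: Xid (PCI:0000:07:00): 13, Graphics SM Global Exception on (GPC 0, TPC 1): Physical Multiple Warp Errors
[ 7007.280444] NVRM: Xid (PCI:0000:07:00): 13, Graphics Exception: ESR 0x504e48=0x1000e 0x504e50=0x4 0x504e44=0x13eff2 0x504e4c=0x7f
[ 7007.280459] NVRM: Xid (PCI:0000:07:00): 13, Graphics Exception: ChID 0005, Class 0000a1c0, Offset 00001b0c, Data 00000000
"""


def parse_xid_lines(dmesg_text):
    """Return one dict per NVRM Xid line found in the given dmesg text."""
    events = []
    for line in dmesg_text.splitlines():
        match = XID_RE.search(line)
        if match:
            events.append({
                "time": float(match.group("time")),   # seconds since boot
                "pci": match.group("pci"),            # PCI address of the GPU
                "xid": int(match.group("xid")),       # Xid error number
                "detail": match.group("detail").strip(),
            })
    return events


def count_by_xid(events):
    """Count how many events were logged per Xid number."""
    return Counter(event["xid"] for event in events)


if __name__ == "__main__":
    for xid, count in sorted(count_by_xid(parse_xid_lines(SAMPLE_DMESG)).items()):
        print(f"Xid {xid}: {count} line(s)")
```

Feeding it the output of `dmesg` after each failed task should show whether the failures always line up with Xid 13 or whether other Xid numbers appear as well, which may help narrow down the cause.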
|
|
Joined: 13 Mar 12 Posts: 9 Credit: 112,964,646 RAC: 0
|
I have a huge ratio of errors to tasks, same as the previous poster. I updated the drivers today to 355; the BOINC version has been 7.6.6 for the last month or so. Could it be an issue with version 7.6.6, the driver, or something else? As tasks run for days, it's not good to lose that much completed work. Thanks |
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
Please post the computer ID that is having the problem, and also the details reported by GPU-Z. It seems possible that it is clocked too high. |
|
Joined: 13 Mar 12 Posts: 9 Credit: 112,964,646 RAC: 0
|
All GPUs are running at default clocks: https://www.gpugrid.net/show_host_detail.php?hostid=172749 https://www.gpugrid.net/show_host_detail.php?hostid=172862 |
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
For your GTS 450 computer, you may need to lower the clocks. You should try running the Heaven benchmark for several minutes, to see if it's stable at current clocks, and if not, you'll need to lower the GPU Offset using a tool like MSI Afterburner. |
|
Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 |
The same BOINC & driver combination is working fine on my main PC, and there have been no general problem reports. So something on your side seems to be wrong. The task logs of the 450 show lines like this:
    # Simulation unstable. Flag 9 value 8175
Which means serious garbage has been computed.

A few things I'd do, in addition to what Jacob said:
- check if GPU fan still works
- power off, remove power cord, wait 10+ minutes and try again
- also lower the GPU memory clock
- try a different project

MrS
Scanning for our furry friends since Jan 2002 |
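To check how many of a host's failed tasks contain that line, the task log text can be scanned with a small script. The sketch below assumes the log has already been copied into a string and that the message always looks like the one line quoted above; `find_unstable` is a hypothetical helper name, and nothing here is part of BOINC or the GPUGrid app.

```python
import re

# Pattern built from the single log line quoted in the post:
#   # Simulation unstable. Flag 9 value 8175
# (assumption: other app versions may word this differently)
UNSTABLE_RE = re.compile(
    r"#\s*Simulation unstable\.\s*Flag\s+(\d+)\s+value\s+(\d+)"
)


def find_unstable(log_text):
    """Return a list of (flag, value) pairs, one per 'Simulation unstable' line."""
    return [(int(flag), int(value)) for flag, value in UNSTABLE_RE.findall(log_text)]


if __name__ == "__main__":
    sample_log = "# Simulation unstable. Flag 9 value 8175\n"
    hits = find_unstable(sample_log)
    print(f"{len(hits)} unstable report(s): {hits}")
```

A log with no such lines returns an empty list, so a task that failed for some other reason is easy to tell apart from one where the simulation itself went unstable.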
|
Joined: 13 Mar 12 Posts: 9 Credit: 112,964,646 RAC: 0
|
Thanks. My question was about your end: possible driver issues or a new version of the project. I have 5 hosts with nVidia cards, and the 3 with older cards have stopped getting tasks or fail to execute them. Those hosts have been untouched for the last couple of years, doing only their designated BOINC tasks; nothing is wrong with the fans and nothing has been changed. Your advice is good, but meanwhile be aware of possible issues with drivers, BOINC shell updates, your project file updates, Windows updates, etc. |
|
Joined: 5 Jan 09 Posts: 670 Credit: 2,498,095,550 RAC: 0
|
"Thanks" The fact is, Costa, your hardware is too slow for this project now (at least for Long Runs). I'll bet that responders have thought it but daren't say it for fear of offending. |
|
Joined: 13 Mar 12 Posts: 9 Credit: 112,964,646 RAC: 0
|
So I just cut long runs then? Also, where is ATI support? I have a couple of cards. |
|
Joined: 5 Jan 09 Posts: 670 Credit: 2,498,095,550 RAC: 0
|
"so I just cut long runs then?" Yes, cut long runs for sure, and then only run your 560 for short runs. To put things into perspective: if you obtained a second-hand GTX 660 Ti and got rid of the rest, your RAC would go up by approximately 300%, you would be able to do Long Runs, and you would save a lot of money on electricity. You should be able to get a 660 Ti off eBay for no more than £60; I don't know how much that is in Australian dollars. As for ATI cards, unfortunately this project has decided not to develop an app for anything but CUDA, which means Nvidia only. I can't see that changing anytime soon. Good luck to you. Richard This might be a good buy: http://www.ebay.com.au/itm/Asus-Geforce-GTX-660-Ti-2GB-Graphics-Card-/141767298182?hash=item2101fd4c86 |
|
Joined: 13 Mar 12 Posts: 9 Credit: 112,964,646 RAC: 0
|
thanks mate |
|
Joined: 18 Oct 13 Posts: 53 Credit: 406,647,419 RAC: 0
|
I agree with him. I have also been seeing a lot of errors for 2 weeks. And no, it's not the GPU overclocking that is so often suggested here. All other programs run without errors on my system. System: Windows 10, CPU i7, NVIDIA GTX 760 |
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
What is the exact make and model of your GPU? And what does GPU-Z show for values "GPU Clock" and "Default clock"? Believe it or not, we are trying to help, but you have to be willing to provide details, and be willing to investigate/troubleshoot by doing things like downclocking when requested. GPUGrid puts a certain kind of stress on GPUs, that other distributed apps and games cannot replicate. I've seen firsthand where I've had to downclock in order to get stability in GPUGrid, whereas the same GPU can run other apps/games at higher clocks. So, downclocking to attain GPUGrid stability, is normal here. |
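As a rough illustration of the comparison being asked for, the two GPU-Z values can be turned into a percentage with a couple of lines of Python. This is only a sketch: the function names are made up, and the 1100/1000 MHz pair is an invented example; only the 783/783 MHz pair comes from a GTS 450 reported elsewhere in this thread.

```python
def overclock_percent(gpu_clock_mhz, default_clock_mhz):
    """Percent by which the running clock exceeds the default clock.

    Positive means overclocked, negative means downclocked, 0.0 means stock.
    """
    if default_clock_mhz <= 0:
        raise ValueError("default clock must be positive")
    return (gpu_clock_mhz - default_clock_mhz) * 100.0 / default_clock_mhz


def describe_clock(gpu_clock_mhz, default_clock_mhz):
    """Human-readable summary of the GPU-Z 'GPU Clock' vs 'Default clock' values."""
    pct = overclock_percent(gpu_clock_mhz, default_clock_mhz)
    if pct > 0:
        return f"{pct:.1f}% above default"
    if pct < 0:
        return f"{-pct:.1f}% below default"
    return "at default clock"


if __name__ == "__main__":
    # 783/783 MHz: values reported in this thread for a GTS 450.
    print("GTS 450:", describe_clock(783, 783))
    # 1100/1000 MHz: invented numbers, purely to show the overclocked case.
    print("example:", describe_clock(1100, 1000))
```

A card that reports "at default clock" can still be factory-overclocked relative to NVIDIA's reference design, so the result is a starting point for troubleshooting rather than proof that the clocks are safe.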
|
Joined: 18 Oct 13 Posts: 53 Credit: 406,647,419 RAC: 0
|
Jacob, I'm NOT interested in restarting an old discussion with you about overclocking. https://www.gpugrid.net/forum_thread.php?id=4097&nowrap=true#41152 |
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
So, you can't be bothered to provide the details, or to try things to fix the problem? Well, then you can't be helped. Yes, sometimes batches have problems. But if you're having problems with multiple batches, then it's time to start troubleshooting, if you want to fix it. If you want to have a bad attitude, and not troubleshoot, then you can't expect it to fix itself. |
|
Joined: 13 Mar 12 Posts: 9 Credit: 112,964,646 RAC: 0
|
Failing crunchers:
- nVidia GTX275, 633/633 MHz
- nVidia GTX450, 783/783 MHz

Drivers updated to latest, Win7-32 |
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
"failing crunchers"

- GTX 275, 633 reference clock, compute capability 1.3 (Tesla), fab 55nm
- GTS 450, 783 reference clock, compute capability 2.1 (Fermi)

GPUGrid relies on newer compute capabilities and newer driver/CUDA versions, and can't easily support older GPUs. https://www.gpugrid.net/forum_thread.php?id=2507

NVIDIA no longer develops drivers for the pre-Fermi types. They try to support them, but it's very limited support. http://nvidia.custhelp.com/app/answers/detail/a_id/3473

My recommendation would be:
- Have the GTX 275 work on some other project, like maybe Folding@Home or SETI@Home
- Continue to troubleshoot the GTS 450 if you want to do GPUGrid work, by doing things like cleaning the fans, using MSI Afterburner to set a custom fan curve to keep the temp below 70 °C, doing clean driver installs, lowering clocks even below reference, etc.

https://en.wikipedia.org/wiki/CUDA
https://en.wikipedia.org/wiki/GeForce_200_series
https://en.wikipedia.org/wiki/GeForce_400_series

Regards, Jacob |
|
Joined: 5 Jan 09 Posts: 670 Credit: 2,498,095,550 RAC: 0
|
"failing crunchers" My advice on this occasion only would be to shut up. |
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
"My advice on this occasion only would be to shut up." Wow. Have a better day. |
|
Joined: 18 Oct 13 Posts: 53 Credit: 406,647,419 RAC: 0
|
Very helpful answer :-( Do you really think I would post about errors if the same batches only ever failed for me? Ex: http://www.gpugrid.net/workunit.php?wuid=11166870 http://www.gpugrid.net/workunit.php?wuid=11175897 |
©2026 Universitat Pompeu Fabra