Too many errors!!
| Author | Message |
|---|---|
|
Joined: 18 Jun 12 Posts: 297 Credit: 3,572,627,986 RAC: 0
|
I suspect this is happening to everyone; something needs to be done right away. |
|
Joined: 2 Jul 16 Posts: 338 Credit: 7,987,341,558 RAC: 259
|
Maybe so but nothing happens at this project 'right away'. |
|
Joined: 16 Jun 12 Posts: 17 Credit: 292,288,806 RAC: 0
|
Getting many frozen units: usually at around 6% or so the elapsed time keeps running, but after 2-3 hours nothing is happening. |
Retvari Zoltan. Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0
|
The GPU 0 in your PC is quite hot (85°C=185°F), that may cause workunits to freeze. |
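For anyone who wants to keep an eye on this themselves, here is a minimal sketch of a temperature check. It assumes an NVIDIA card with `nvidia-smi` installed and on the PATH; the 80°C threshold and the function names are only illustrative choices, not anything the project prescribes.

```python
import subprocess

# Hypothetical alert threshold in °C; pick a value that suits your card.
HOT_THRESHOLD_C = 80


def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def parse_temps(csv_text):
    """Parse 'index, temperature' lines into a {gpu_index: temp_c} dict."""
    temps = {}
    for line in csv_text.strip().splitlines():
        index, temp = (field.strip() for field in line.split(","))
        temps[int(index)] = int(temp)
    return temps


def hot_gpus(temps, threshold_c=HOT_THRESHOLD_C):
    """Return the GPU indices whose temperature is at or above the threshold."""
    return sorted(i for i, t in temps.items() if t >= threshold_c)


def read_temps():
    """Ask nvidia-smi for the current temperature of every GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,temperature.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_temps(result.stdout)
```

Calling `hot_gpus(read_temps())` on the affected machine would list the cards running at or above the threshold, and `c_to_f` reproduces the 85°C = 185°F conversion quoted above.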
|
Joined: 16 Jun 12 Posts: 17 Credit: 292,288,806 RAC: 0
|
More errors and time wasted. I will wait until I have an answer or a correction is made. It's a shame, since I would like to crunch for medical science. |
|
Joined: 2 Jul 16 Posts: 338 Credit: 7,987,341,558 RAC: 259
|
Retvari Zoltan gave you the answer. No one else responded as it is probably correct and the 1st thing that should be fixed. |
|
Joined: 4 Mar 18 Posts: 53 Credit: 2,815,476,011 RAC: 0
|
Though temperature may have something to do with it, other things may be involved as well. GPUGrid long work units in particular may 'freeze' as you describe if they are interrupted frequently. This can happen if you have the BOINC computing preferences set to "Suspend when computer is in use", "Suspend GPU computing when computer is in use", or "Suspend when non-BOINC CPU usage is above ____%". I had this problem when I first started GPUGrid about a year ago, and got help via the forum to figure it out: http://www.gpugrid.net/forum_thread.php?id=4699#48749 I would recommend simply disabling the "Suspend..." BOINC features and letting the computer sort out how much to allocate when you are doing other things. Also, even when I had the problem, I found that if I selected "Snooze GPU" (and waited; GPUGrid WUs take a while to actually 'snooze' on Windows) and then unchecked "Snooze GPU", the WU would continue the majority of the time. Sure, some time was still wasted, but at least the WU was completed. |
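If you would rather set this in a file than click through the Manager, the sketch below writes out what I believe is the matching content for BOINC's `global_prefs_override.xml`. The tag names are as I recall them from the BOINC preference override format, so treat them as assumptions and check them against the BOINC documentation before relying on this; `build_override` and `NO_SUSPEND` are just illustrative names.

```python
import xml.etree.ElementTree as ET

# Assumed BOINC override tags (verify against the BOINC documentation).
NO_SUSPEND = {
    "run_if_user_active": 1,      # keep computing while the computer is in use
    "run_gpu_if_user_active": 1,  # keep GPU computing while the computer is in use
    "suspend_cpu_usage": 0,       # 0 = do not suspend on non-BOINC CPU usage
}


def build_override(settings):
    """Render a {tag: value} dict as a <global_preferences> document,
    one element per line."""
    lines = ["<global_preferences>"]
    for tag, value in settings.items():
        lines.append(f"   <{tag}>{value}</{tag}>")
    lines.append("</global_preferences>")
    return "\n".join(lines) + "\n"


xml_text = build_override(NO_SUSPEND)
ET.fromstring(xml_text)  # raises ParseError if the output is not well-formed
print(xml_text)
```

The printed text would go into `global_prefs_override.xml` in the BOINC data directory, after which the client needs to be told to re-read its local preferences. The same three settings can of course be changed in the Computing preferences dialog instead.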
|
Joined: 4 Mar 18 Posts: 53 Credit: 2,815,476,011 RAC: 0
|
"The GPU 0 in your PC is quite hot (85°C=185°F), that may cause workunits to freeze."
My personal experience is that this temperature doesn't cause problems. I have been crunching GPUGrid for over a year now (24/7) with a 1070 in a case with airflow problems that crunches GPUGrid WUs at between 82 and 85 degrees. Feel free to check my recent tasks to confirm. I might be decreasing the life of the GPU (time will tell), but at least in my case everything is working OK. (I am now looking at cooling options for this one. Because of my experience from my first build (Linux only), I have gained some confidence in trying modifications for the Dell machine with the 1070.) |
Retvari Zoltan. Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0
|
"GPUGrid long work units especially may 'freeze' as you describe if interrupted frequently."
That's correct. But his tasks (except for the last ones) haven't been suspended (interrupted) at all, which leaves the high GPU temperature as the most probable cause. The cause could also be:
- overcommitted CPU
- inadequate PSU (maybe the PSU got too hot too)
- overclocked GPU memory
- overclocked GPU
- overclocked CPU memory
- overclocked CPU
- SLI cable
- burnt PCIe power connector(s)
- burnt M/B power connector (the 12V pins)
- interfering applications that use GPU acceleration (for example: browsers, video playback software, games)

So all the usual stuff. Of course it could be something we can't think of, as we don't know his system. |
Retvari Zoltan. Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0
|
Recent GPUs tolerate high temperatures much better than GPUs did 5-6 years ago, but the physics hasn't changed since then. Thermal expansion and contraction are the same as they ever were, and so is the damage they cause. Capacitors are more advanced now than they were 5-6 years ago, but they still prefer lower temperatures.
"The GPU 0 in your PC is quite hot (85°C=185°F), that may cause workunits to freeze."
"My personal experience is that this temperature doesn't cause problems. I have been crunching GPUGrid for over a year now (24/7) with a 1070 in a case with airflow problems that crunches GPUGrid WUs at between 82 and 85 degrees. Feel free to check my recent tasks to confirm."
I did. Some of your tasks hit 90°C.
"I might be decreasing the life of the GPU (time will tell),"
You are.
"but at least in my case everything is working OK."
For now. |
|
Joined: 18 Jun 12 Posts: 297 Credit: 3,572,627,986 RAC: 0
|
85° is just too hot no matter which way you look at it; there's no margin for error. I would feel very uncomfortable running my GPUs that hot. I'm in the mid 70s now and keep a very close watch on them. |
|
Joined: 21 Mar 16 Posts: 513 Credit: 4,673,458,277 RAC: 0
|
Not only are GPUs prone to errors when very hot, they also run less efficiently. Heated metal has a higher resistivity than colder metal, so the GPU will use more power at 85-90°C than at 60-70°C. |
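To put a rough number on the resistivity point, here is a small sketch using the usual linear approximation R(T) = R_ref · (1 + α · (T − T_ref)) with the textbook temperature coefficient for copper (about 0.00393 per °C near 20°C). It only models the metal conductors; how much of a card's total power draw follows this figure is not something the sketch can tell you, and the function names are just illustrative.

```python
# Temperature coefficient of resistance for copper near 20 °C (per °C).
ALPHA_COPPER = 0.00393
T_REF_C = 20.0


def relative_resistance(temp_c, alpha=ALPHA_COPPER, t_ref_c=T_REF_C):
    """Resistance at temp_c relative to the resistance at t_ref_c (linear model)."""
    return 1.0 + alpha * (temp_c - t_ref_c)


def resistance_increase(cool_c, hot_c):
    """Fractional rise in resistance when the conductor warms from cool_c to hot_c."""
    return relative_resistance(hot_c) / relative_resistance(cool_c) - 1.0


increase = resistance_increase(65.0, 85.0)
print(f"Copper resistance at 85°C vs 65°C: +{increase * 100:.1f}%")
```

Under these assumptions, a 20°C rise adds several percent to the resistance of the copper, which is the direction of the effect described above.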
©2025 Universitat Pompeu Fabra