WU: OPM simulations

Jim1348
Message 43492 - Posted: 22 May 2016, 7:07:59 UTC - in response to Message 43491.  

The basic problem appears to be that there is a conflict between the X11VNC server and BOINC. I can do one or the other, but not both. I will just uninstall X11VNC and maybe I can make do with BoincTasks for monitoring this machine, which is a dedicated machine anyway. Hopefully, a future Linux version will fix it.
ID: 43492 · Rating: 0
skgiven
Volunteer moderator
Volunteer tester
Message 43493 - Posted: 22 May 2016, 9:20:30 UTC - in response to Message 43491.  
Last modified: 22 May 2016, 9:41:56 UTC

Got Ubuntu 16.04-x64 LTS up and running last night via a USB stick installation.
After 200MB of system updates, restarts and switching to the 361.42 binary drivers (which might not have been necessary - maybe a restart would have sufficed?) I configured Coolbits, restarted again and installed Boinc. Restarted again and then opened Boinc and attached to here. The work here is sparse, so I'm running POEM tasks. Configured Thermal Settings (GPU Fan speed to 80%).
For comparison/reference, most POEM tasks take ~775 sec (13 min) to complete (the range is 750 to 830 sec), but some longer runs take ~1975 sec (33 min). Credit is either 5500 or 9100. Temps range from 69C to 73C and the GPU clock is ~1278 MHz. It's an older AMD system with only PCIE2 x16 (5GT/s), but it works fine with 2 CPU tasks and one GPU task running. Seems faster than W10x64. Memory is mostly at 6008MHz but occasionally jumped to 7010MHz of its own accord (which I've never seen before on a 970). 30 valid GPU tasks since last night, no invalids or errors.

Overall I found 16.04 easier than previous distributions to set up for GPU crunching. Many previous distributions didn't have the GPU drivers in the repositories for ages. Hopefully, with this being an LTS version, the repository drivers will be maintained/updated reasonably frequently.
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help
ID: 43493 · Rating: 0
Retvari Zoltan
Message 43494 - Posted: 22 May 2016, 12:48:18 UTC

3d3lR4-SDOERR_opm996-0-1-RND0292: I received it after 10 days, 20 hours and 34 minutes.
1. death's (hell yeah, death is International) desktop with i7-3770K and GTX 670: it has 41 successive errors
2. Alen's desktop with Core2 Quad Q6700 and GTS 450: it has 30 errors, 1 valid and 1 too late task (at least the errors have been fixed)
3. Robert's desktop with i7-5820K and GTX 770: it has 1 not started by deadline, 1 error, 2 user aborts and 4 successful tasks (seems ok now)
4. Megacruncher TSBT's Xeon E5-2650v3 with GTX 780: it has 52 successive errors
5. ServicEnginIC's Pentium Dual-Core E6300 with GTX 750: it has 1 error and 5 successful tasks
6. Jonathan's desktop with i7-5820K and GTX 960: it has 7 successful and 3 timed out tasks
ID: 43494 · Rating: 0
Jim1348
Message 43495 - Posted: 22 May 2016, 20:58:09 UTC - in response to Message 43485.  

The POEM website does show one work unit completed under Linux at 2,757 seconds, which is faster than the 3,400 seconds that I get for that series (1vii) under Windows.

Not that it matters much, but I must have misread BOINCTasks and was comparing a 1vii to a 2k39, which always runs faster. So the Linux advantage is not quite that large. Comparing the same type of work unit (this time 2dx3d) shows about 20.5 minutes for Win7 and 17 minutes for Linux, or about a 20% improvement (all on GTX 960s). That may be about what we see here.
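
As a quick check of that arithmetic, here is a minimal Python sketch. The function names are only illustrative; the two run times are the ones quoted above.

```python
def time_saved_pct(slow_min: float, fast_min: float) -> float:
    """Percentage reduction in run time when going from slow to fast."""
    return (slow_min - fast_min) / slow_min * 100.0


def throughput_gain_pct(slow_min: float, fast_min: float) -> float:
    """Percentage increase in tasks completed per unit of time."""
    return (slow_min / fast_min - 1.0) * 100.0


win7_min = 20.5   # 2dx3d run time on Win7, from the post above
linux_min = 17.0  # 2dx3d run time on Linux, from the post above

print(f"time saved:      {time_saved_pct(win7_min, linux_min):.1f}%")
print(f"throughput gain: {throughput_gain_pct(win7_min, linux_min):.1f}%")
```

That works out to roughly 17% less run time per task, or roughly 21% more tasks per hour, so "about 20%" depends on which way round you count.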

By the way, BOINCTasks is working nicely on Win7 to monitor the Linux machine, though you have to jump through some hoops to set the permissions on the folders in order to copy the app_config, gui_rpc_auth.cfg and remote_hosts.cfg. And that is after you find where Linux puts them; they are a bit spread out as compared to the BOINC Data folder in Windows. It is a learning experience.
ID: 43495 · Rating: 0
Betting Slip
Message 43496 - Posted: 23 May 2016, 8:33:12 UTC

How does a host with 2 cards have 6 WUs in progress at one time? https://www.gpugrid.net/results.php?hostid=326161 (Monday 23 May, 8:36 UTC)
ID: 43496 · Rating: 0
Vagelis Giannadakis
Message 43498 - Posted: 23 May 2016, 10:49:15 UTC

On the topic of WU timeouts, while all the issues raised in this discussion can cause them, let me point to the most probable (IMO) cause, thoroughly reported by affected users, but not resolved as yet:

GPUGRID's network issues

Most of you know the discussions about these issues, with people reporting they can't upload results, downloads taking forever, etc. Another way these issues manifest themselves is "phantom" WU assignments: a host requests work and the server grants it, but the HTTP request times out on the host's side, so it never receives the positive response. The WU is assigned to the host, but the host has no knowledge of this and does not download it, so the WU just sits there, waiting to time out!

This has happened to me two or three times. I wanted to post the errored-out tasks, but they have been deleted.
ID: 43498 · Rating: 0
Beyond
Message 43500 - Posted: 23 May 2016, 13:58:33 UTC - in response to Message 43498.  

"GPUGRID's network issues"

The GPUGrid network issues are a problem and they never seem to be addressed. Just looked at the WUs supposedly assigned to my machines and there are 2 phantom WUs that the server thinks I have, but I don't:

https://www.gpugrid.net/workunit.php?wuid=11594782

https://www.gpugrid.net/workunit.php?wuid=11602422

As you allude, some of the timeout issues here are due to poor network setup/performance or perhaps BOINC misconfiguration. Maybe someone from one of the other projects could help them out. Haven't seen issues like this anywhere else and have been running BOINC extensively since its inception. Some of (and perhaps a lot of) the timeouts complained about in this thread are due to this poor BOINC/network setup/performance (take your pick).

ID: 43500 · Rating: 0
skgiven
Volunteer moderator
Volunteer tester
Message 43501 - Posted: 23 May 2016, 14:00:28 UTC - in response to Message 43496.  
Last modified: 23 May 2016, 14:04:52 UTC

GPUGrid issues 'up to' 2 tasks per GPU and that system has 3 GPUs, though only 2 are NVidia GPUs!

CPU type AuthenticAMD
AMD A10-7700K Radeon R7, 10 Compute Cores 4C+6G [Family 21 Model 48 Stepping 1]

Coprocessors [2] NVIDIA GeForce GTX 980 (4095MB) driver: 365.10, AMD Spectre (765MB)

It's losing the 50% credit bonus but at least it's a reliable system.
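
To make the arithmetic explicit, here is a small Python sketch. It assumes, as a guess, that the in-progress limit is simply the per-GPU limit multiplied by however many GPUs the server counts for the host. The function name and that rule are illustrative only, not GPUGrid's actual scheduler logic.

```python
def max_tasks_in_progress(gpus_counted: int, tasks_per_gpu: int = 2) -> int:
    """Upper bound on tasks in progress if the limit scales with the GPU count."""
    return gpus_counted * tasks_per_gpu


# Host 326161 as reported above: 2 NVIDIA GTX 980s plus 1 integrated AMD GPU.
print(max_tasks_in_progress(3))  # all three GPUs counted
print(max_tasks_in_progress(2))  # only the NVIDIA GPUs counted
```

Counting all three GPUs gives the 6 tasks seen on the host; counting only the two NVIDIA cards would give 4.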
ID: 43501 · Rating: 0
Betting Slip
Message 43511 - Posted: 24 May 2016, 1:05:38 UTC - in response to Message 43501.  
Last modified: 24 May 2016, 1:19:14 UTC

If it has only 2 CUDA GPUs it should only get 2 WUs per CUDA GPU, since this project does NOT send WUs to non-CUDA cards.

Card switching is the answer. Basically, you put 3 cards into one host and get 6 tasks, then you take a card out, put it into another host and get more tasks.

And if you are asking whether I am accusing Caffeine of doing that....YES. I am.
ID: 43511 · Rating: 0
skgiven
Volunteer moderator
Volunteer tester
Message 43516 - Posted: 24 May 2016, 8:42:59 UTC - in response to Message 43511.  
Last modified: 24 May 2016, 8:48:09 UTC

Work fetch is where Boinc Manager comes into play and confuses the matter. GPUGrid would need to put in more server side configurations and routines to try and better deal with that, or remove the AMD app (possibly), but this problem just happened upon GPUGrid.
Setting 1 WU per GPU would be simpler, and more fair (especially with so few tasks available), and go a long way to rectifying the situation.

While GPUGrid doesn't presently have an active AMD/ATI app, it sort-of does have an AMD/ATI app - the MT app for CPU's+AMD's:
https://www.gpugrid.net/apps.php
Maybe somewhere on the GPUGrid server they can set something up so as not to send so many tasks, but I don't keep up with all the development of Boinc these days.

It's not physical card switching (inserting and removing cards) because that's an integrated AMD/ATI GPU.
Ideally the GPUGrid server would recognise that there are only 2 NVidia GPUs and send out tasks going by that number, but it's a Boinc server that's used. While there is a way for the user/cruncher to exclude a GPU type against a project (Client_Configuration), it's very hands-on, and if AMD work did turn up here they wouldn't get any.
http://boinc.berkeley.edu/wiki/Client_configuration
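
For reference, a minimal sketch of what that exclusion could look like, written as a short Python script that generates the file contents. The `exclude_gpu` element with `url` and `type` children is described on the Client configuration wiki page linked above; the project URL and the `ATI` type value used here are assumptions to check against that page and against the URL your client is attached with. The function name is illustrative.

```python
import xml.etree.ElementTree as ET


def build_exclude_gpu_config(project_url: str, gpu_type: str) -> str:
    """Return cc_config.xml text that excludes one GPU type for one project."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    exclude = ET.SubElement(options, "exclude_gpu")
    ET.SubElement(exclude, "url").text = project_url
    ET.SubElement(exclude, "type").text = gpu_type
    if hasattr(ET, "indent"):  # pretty-printing is available from Python 3.9
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# Assumed values: adjust the URL and GPU type to match your own setup.
print(build_exclude_gpu_config("https://www.gpugrid.net/", "ATI"))
```

The output would be saved as cc_config.xml in the BOINC data directory, after which the client has to re-read its config files or be restarted.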

My guess is that having 3 GPUs (even though one isn't useful) inflates the status of the system; as the system returns positive results (no failures) it's rated highly, but more so because there are 3 GPUs in it. So it's even more likely to get work than a system with 2 GPUs with an identical yield.
Not sure anything is being done deliberately. It's likely the iATI is being used exclusively for display purposes, and why would you want to get 25% less credit for the same amount of work? From experience 'playing' with the use of various integrated and mixed GPU types, they are a pain to set up, and when you get it working you don't want to change anything. That might be the case here.
ID: 43516 · Rating: 0
Tomas Brada
Message 43517 - Posted: 24 May 2016, 10:37:23 UTC
Last modified: 24 May 2016, 10:39:24 UTC

x.
ID: 43517 · Rating: 0
Tomas Brada
Message 43518 - Posted: 24 May 2016, 10:37:29 UTC

I notice a lot of users have great trouble installing Linux+BOINC.
I am playing with the idea of writing a sort-of guide to setting up a basic Debian install and configuring BOINC. Would you appreciate it?
It should be online this week or next.

About that WU problem: the PrimeGrid project utilizes "tickles". Large tasks are sent out with a short deadline, and if the tickle is successful and your computer is actively working on the task, the deadline is extended. The GPUGrid project could benefit from this.
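
A rough sketch of the deadline-extension idea in Python, purely for illustration: the `Task` class, the `on_trickle` function and the one-day extension are hypothetical, not how PrimeGrid or the BOINC server actually implements it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Task:
    """Hypothetical server-side record of a task sent to a host."""
    deadline: datetime


def on_trickle(task: Task, received_at: datetime,
               extension: timedelta = timedelta(days=1)) -> bool:
    """Extend the deadline if a progress message arrives before it expires.

    Returns True when the deadline was extended, False when the message
    arrived too late (the task would then be reissued as usual).
    """
    if received_at > task.deadline:
        return False
    task.deadline = max(task.deadline, received_at + extension)
    return True


sent = datetime(2016, 5, 24, 12, 0)
task = Task(deadline=sent + timedelta(days=1))   # short initial deadline
on_trickle(task, sent + timedelta(hours=20))     # host is still working
print(task.deadline)                             # now one day after the message arrived
```

The point of the design is that a host which never starts the task loses it after the short initial deadline, while a host that keeps reporting progress keeps it.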

ID: 43518 · Rating: 0
Jacob Klein
Message 43519 - Posted: 24 May 2016, 12:20:12 UTC

"Trickles", not "tickles".

And I happen to think that GPUGrid's current deadlines are sufficient for most of its users to get done on time; I believe we don't need trickles. Interesting idea, though! NOTE: RNA World also uses trickles to auto-extend task deadlines on the server. Some of my tasks are approaching 300 days of compute time already :)
ID: 43519 · Rating: 0
skgiven
Volunteer moderator
Volunteer tester
Message 43520 - Posted: 24 May 2016, 12:48:59 UTC - in response to Message 43518.  
Last modified: 24 May 2016, 12:51:26 UTC

How to - install Ubuntu 16.04 x64 Linux & setup for GPUGrid

Discussion of Ubuntu 16.04-x64 LTS installation and configuration

ClimatePrediction uses trickle uploads too. Was suggested for here years ago but wasn't suitable then and probably still isn't.
ID: 43520 · Rating: 0
Beyond
Message 43527 - Posted: 24 May 2016, 18:02:14 UTC - in response to Message 43494.  

Here's the kind of thing that I find most mystifying: running a 980Ti GPU, then holding the WU for 5 days until it gets sent out again, negating the usefulness of the next user's contribution and missing all bonuses. Big waste of time and resources:

https://www.gpugrid.net/workunit.php?wuid=11602161
ID: 43527 · Rating: 0
skgiven
Volunteer moderator
Volunteer tester
Message 43528 - Posted: 24 May 2016, 18:16:01 UTC - in response to Message 43527.  
Last modified: 24 May 2016, 18:27:51 UTC

Doesn't bother cooling the $650 card either!

# GPU 0 : 55C
# GPU 0 : 59C
# GPU 0 : 64C
# GPU 0 : 68C
# GPU 0 : 71C
# GPU 0 : 73C
# GPU 0 : 75C
# GPU 0 : 76C
# GPU 0 : 77C
# GPU 0 : 78C
# GPU 0 : 79C
# GPU 0 : 80C
# GPU 0 : 81C
# BOINC suspending at user request (exit)
# GPU [GeForce GTX 980 Ti] Platform [Windows] Rev [3212] VERSION [65]
# SWAN Device 0 :
# Name : GeForce GTX 980 Ti
# ECC : Disabled
# Global mem : 4095MB
# Capability : 5.2
# PCI ID : 0000:02:00.0
# Device clock : 1076MHz
# Memory clock : 3805MHz
# Memory width : 384bit
# Driver version : r364_69 : 36510

https://www.gpugrid.net/result.php?resultid=15108452

Also clear that the user keeps a high cache level and frequently gets wonky credits:

https://www.gpugrid.net/results.php?hostid=331964

PS. Running a GERARD_FXCX... task on my Linux system (GTX970), and two on my W10 system. Looks like Linux is ~16% faster than W10, and that's with the W10 system being slightly overclocked and supported by a faster CPU. As observed before, the difference is likely higher for bigger cards. So with a GTX980Ti it's probably greater (maybe ~20%) and with a GTX750Ti it's probably less (maybe ~11%).
ID: 43528 · Rating: 0
Jacob Klein
Message 43529 - Posted: 24 May 2016, 18:23:34 UTC
Last modified: 24 May 2016, 18:24:14 UTC

I have 2 GTX 980 Ti's in my new rig. I have an aggressive MSI Afterburner fan profile that goes from 0% fan @ 50*C to 100% fan @ 90*C.

One of my GPUs sees temps up to 85*C, the other up to 75*C. I consider the cooling adequate, so long as the clocks are stable. I'm presently working on finding the max stable overclocks.
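
To put numbers on that fan profile, here is a small Python sketch. It assumes the curve is a straight line between the two points given above, which the post does not actually state; the function name is illustrative.

```python
def fan_percent(temp_c: float, low_c: float = 50.0, high_c: float = 90.0) -> float:
    """Fan duty (%) for a linear ramp: 0% at low_c, 100% at high_c, clamped."""
    if temp_c <= low_c:
        return 0.0
    if temp_c >= high_c:
        return 100.0
    return (temp_c - low_c) / (high_c - low_c) * 100.0


for t in (50, 75, 85, 90):
    print(f"{t} C -> {fan_percent(t):.1f}% fan")
```

Under that assumption the two cards would be sitting at about 62.5% fan (75*C) and 87.5% fan (85*C).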

Example result:
https://www.gpugrid.net/result.php?resultid=15106295

So ... sometimes, a system just runs hot. Stable, but hot. All my systems are hot, overclocked to max stable clocks, CPU and GPU. They refuse to take their shirts off, and deliver a rockstar performance every time.
ID: 43529 · Rating: 0
Beyond
Message 43530 - Posted: 24 May 2016, 18:24:08 UTC - in response to Message 43528.  

Possibly running other NV projects and not getting back to the GPUGrid WU until BOINC goes into panic mode.
ID: 43530 · Rating: 0
skgiven
Volunteer moderator
Volunteer tester
Message 43531 - Posted: 24 May 2016, 18:34:15 UTC - in response to Message 43529.  

# GPU 1 : 74C
# GPU 0 : 78C
# GPU 0 : 79C
# GPU 0 : 80C
# GPU 1 : 75C
# GPU 0 : 81C
# GPU 0 : 82C
# GPU 0 : 83C
# BOINC suspending at user request (exit)
# GPU [GeForce GTX 980 Ti] Platform [Windows] Rev [3212] VERSION [65]

That suggests to me that the GPU was running too hot, the task became unstable and the app suspended crunching for a bit and recovered (recoverable errors). Matt added that suspend-recover feature some time ago IIRC.

ID: 43531 · Rating: 0
Jacob Klein
Message 43532 - Posted: 24 May 2016, 18:42:16 UTC - in response to Message 43531.  
Last modified: 24 May 2016, 18:43:06 UTC

No. I believe you are incorrect.

The simulation will only terminate/retry when it says:
# The simulation has become unstable. Terminating to avoid lock-up (1)
# Attempting restart

Try not to jump to conclusions about hot machines :)

I routinely search my results for "stab", and if I find an instability message matching that text, I know to downclock my overclock a bit more.
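
That search can be sketched in a few lines of Python. This is only an illustration: the function name is made up, and the sample text is stitched together from log lines quoted in this thread rather than taken from a real result.

```python
from typing import List


def find_instability_lines(stderr_text: str, needle: str = "stab") -> List[str]:
    """Return the lines of a task's stderr output that contain needle (case-insensitive)."""
    needle = needle.lower()
    return [line for line in stderr_text.splitlines() if needle in line.lower()]


# Illustrative sample only, assembled from lines quoted earlier in the thread.
sample = """\
# GPU 0 : 82C
# GPU 0 : 83C
# The simulation has become unstable. Terminating to avoid lock-up (1)
# Attempting restart
# BOINC suspending at user request (exit)
"""

for hit in find_instability_lines(sample):
    print(hit)
```

A result whose output contains only temperature lines and the "BOINC suspending at user request" line produces no hits, which is the distinction being made above.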
ID: 43532 · Rating: 0
©2025 Universitat Pompeu Fabra