Message boards : News : WU: OPM simulations
Joined: 28 Jul 12 · Posts: 819 · Credit: 1,591,285,971 · RAC: 0
The basic problem appears to be a conflict between the X11VNC server and BOINC: I can run one or the other, but not both. I will just uninstall X11VNC and make do with BoincTasks for monitoring this machine, which is a dedicated machine anyway. Hopefully a future Linux version will fix it.
skgiven · Joined: 23 Apr 09 · Posts: 3968 · Credit: 1,995,359,260 · RAC: 0
Got Ubuntu 16.04-x64 LTS up and running last night via a USB stick installation. After 200MB of system updates, restarts and switching to the 361.42 binary drivers (which might not have been necessary - maybe a restart would have sufficed?) I configured Coolbits, restarted again and installed Boinc. Restarted again, then opened Boinc and attached to here. The work here is sparse, so I'm running POEM tasks. Configured Thermal Settings (GPU fan speed to 80%).

For comparison/reference, most POEM tasks take ~775 sec (13 min) to complete (range 750 to 830 sec), but some longer runs take ~1975 sec (33 min). Credit is either 5500 or 9100. Temps range from 69C to 73C, GPU clock is ~1278MHz. It's an older AMD system and only PCIE2 x16 (5GT/s), but it works fine with 2 CPU tasks and one GPU task running. Seems faster than W10 x64. Memory mostly @ 6008MHz, but it occasionally jumped to 7010MHz of its own accord (which I've never seen before on a 970). 30 valid GPU tasks since last night, no invalids or errors.

Overall I found 16.04 easier to set up for GPU crunching than previous distributions, many of which didn't have the GPU drivers in the repositories for ages. Hopefully, with this being an LTS version, the repository drivers will be maintained/updated reasonably frequently.

FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
Retvari Zoltan · Joined: 20 Jan 09 · Posts: 2380 · Credit: 16,897,957,044 · RAC: 0
3d3lR4-SDOERR_opm996-0-1-RND0292 - I've received it after 10 days, 20 hours and 34 minutes. Its previous hosts:
1. death's desktop (hell yeah, death is International) with i7-3770K and GTX 670: 41 successive errors
2. Alen's desktop with Core2 Quad Q6700 and GTS 450: 30 errors, 1 valid and 1 too-late task (at least the errors have stopped)
3. Robert's desktop with i7-5820K and GTX 770: 1 not started by deadline, 1 error, 2 user aborts and 4 successful tasks (seems OK now)
4. Megacruncher TSBT's Xeon E5-2650v3 with GTX 780: 52 successive errors
5. ServicEnginIC's Pentium Dual-Core E6300 with GTX 750: 1 error and 5 successful tasks
6. Jonathan's desktop with i7-5820K and GTX 960: 7 successful and 3 timed-out tasks
Joined: 28 Jul 12 · Posts: 819 · Credit: 1,591,285,971 · RAC: 0
The POEM website does show one work unit completed under Linux at 2,757 seconds, which is faster than the 3,400 seconds that I get for that series (1vii) under Windows. Not that it matters much, but I must have misread BOINCTasks and was comparing a 1vii to a 2k39, which always runs faster, so the Linux advantage is not quite that large. Comparing the same type of work units (this time 2dx3d) shows about 20.5 minutes for Win7 and 17 minutes for Linux, or about a 20% improvement (all on GTX 960s). That may be about what we see here.

By the way, BOINCTasks is working nicely on Win7 to monitor the Linux machine, though you have to jump through some hoops to set the permissions on the folders in order to copy the app_config, gui_rpc_auth.cfg and remote_hosts.cfg. And that is after you find where Linux puts them; they are a bit spread out compared to the BOINC Data folder in Windows. It is a learning experience.
Joined: 5 Jan 09 · Posts: 670 · Credit: 2,498,095,550 · RAC: 0
How does a host with 2 cards have 6 WUs in progress at one time? https://www.gpugrid.net/results.php?hostid=326161 (Monday 23 May, 8:36 UTC)
Joined: 5 May 13 · Posts: 187 · Credit: 349,254,454 · RAC: 0
On the topic of WU timeouts: while all the issues raised in this discussion can cause them, let me point to the most probable (IMO) cause, thoroughly reported by affected users but not resolved as yet: GPUGRID's network issues.

Most of you know the discussions about these issues, with people reporting they can't upload results, downloads taking forever, etc. One other way these issues manifest themselves is by "phantom" WU assignments, whereby a host requests work, the server grants it, but the HTTP response times out and the host never receives it. The WU is assigned to the host, but the host has no knowledge of this, never downloads it, and the WU remains there, waiting to time out! This has happened to me two or three times. I wanted to post the errored-out tasks, but they have been deleted.
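The phantom-WU race described above can be sketched in a few lines. This is an illustrative model only (the class and method names are invented, not BOINC's actual scheduler code): the server commits the assignment before the reply reaches the client, and a timed-out reply is not rolled back.

```python
# Hypothetical sketch of the "phantom WU" race: the server records the
# assignment first, then tries to deliver the scheduler reply. If the HTTP
# response times out, the assignment survives but the host never learns of it.
from dataclasses import dataclass, field


@dataclass
class Server:
    assignments: dict = field(default_factory=dict)  # wu_id -> host_id

    def handle_work_request(self, host_id, wu_id, reply_delivered):
        self.assignments[wu_id] = host_id          # commit happens first
        return wu_id if reply_delivered else None  # timeout -> host gets nothing


server = Server()
got = server.handle_work_request("host42", "wu_11594782", reply_delivered=False)

assert got is None                                     # host saw only a timeout
assert server.assignments["wu_11594782"] == "host42"   # server still counts it
# The WU now sits "in progress" until its deadline expires and it is reissued.
```

This is why the stuck WU only clears when the server-side deadline passes: nothing on the host side even knows there is anything to abort.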
Beyond · Joined: 23 Nov 08 · Posts: 1112 · Credit: 6,162,416,256 · RAC: 0
> On the topic of WU timeouts, while all the issues raised in this discussion can cause them, let me point to the most probable (IMO) cause, thoroughly reported by affected users, but not resolved as yet: "GPUGRID's network issues"

The GPUGrid network issues are a problem and they never seem to be addressed. I just looked at the WUs supposedly assigned to my machines and there are 2 phantom WUs that the server thinks I have, but I don't:

https://www.gpugrid.net/workunit.php?wuid=11594782
https://www.gpugrid.net/workunit.php?wuid=11602422

As you allude, some (and perhaps a lot) of the timeout issues complained about in this thread are due to poor network setup/performance, or perhaps BOINC misconfiguration (take your pick). Maybe someone from one of the other projects could help them out. I haven't seen issues like this anywhere else, and I have been running BOINC extensively since its inception.
skgiven · Joined: 23 Apr 09 · Posts: 3968 · Credit: 1,995,359,260 · RAC: 0
> How does a host with 2 cards have 6 WUs in progress at one time? https://www.gpugrid.net/results.php?hostid=326161

GPUGrid issues 'up to' 2 tasks per GPU, and that system has 3 GPUs, though only 2 are NVidia GPUs!

CPU type: AuthenticAMD AMD A10-7700K Radeon R7, 10 Compute Cores 4C+6G [Family 21 Model 48 Stepping 1]
Coprocessors: [2] NVIDIA GeForce GTX 980 (4095MB) driver: 365.10, AMD Spectre (765MB)

It's losing the 50% credit bonus, but at least it's a reliable system.
Joined: 5 Jan 09 · Posts: 670 · Credit: 2,498,095,550 · RAC: 0
> How does a host with 2 cards have 6 WUs in progress at one time? https://www.gpugrid.net/results.php?hostid=326161

If it has only 2 CUDA GPUs it should only get 2 WUs per CUDA GPU, since this project does NOT send WUs to non-CUDA cards. Card switching is the answer: you put 3 cards into one host and get 6 tasks, then you take a card out, put it into another host and get more tasks. And if you are asking whether I am accusing Caffeine of doing that... YES. I am.
skgiven · Joined: 23 Apr 09 · Posts: 3968 · Credit: 1,995,359,260 · RAC: 0
> How does a host with 2 cards have 6 WUs in progress at one time? https://www.gpugrid.net/results.php?hostid=326161

Work fetch is where Boinc Manager comes into play and confuses the matter. GPUGrid would need to put in more server-side configuration and routines to deal with that better, or possibly remove the AMD app, but this problem just happened upon GPUGrid. Setting 1 WU per GPU would be simpler and fairer (especially with so few tasks available), and would go a long way to rectifying the situation.

While GPUGrid doesn't presently have an active AMD/ATI app, it sort of does have one - the MT app for CPUs+AMDs: https://www.gpugrid.net/apps.php Maybe somewhere on the GPUGrid server they can set something up so as not to send so many tasks, but I don't keep up with all the development of Boinc these days.

It's not physical card switching (inserting and removing cards), because that's an integrated AMD/ATI GPU. Ideally GPUGrid's server would recognise that there are only 2 NVidia GPUs and send out tasks going by that number, but it's a standard Boinc server that's used. While there is a way for the user/cruncher to exclude a GPU type for a project (Client_configuration), it's very hands-on, and if AMD work did turn up here they wouldn't get any: http://boinc.berkeley.edu/wiki/Client_configuration

My guess is that having 3 GPUs (even though one isn't useful) inflates the status of the system: as the system returns positive results (no failures) it's rated highly, but more so because there are 3 GPUs in it. So it's even more likely to get work than a system with 2 GPUs and an identical yield. Not sure anything is being done deliberately. The integrated ATI GPU is likely being used exclusively for display purposes, and why would you want to get 25% less credit for the same amount of work?

From experience 'playing' with various integrated and mixed GPU types, they are a pain to set up, and when you get it working you don't want to change anything. That might be the case here.
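For reference, the per-project GPU-type exclusion lives in the BOINC client's cc_config.xml. A minimal fragment might look like the following (the `<exclude_gpu>` option is documented on the Client_configuration wiki page; the exact URL value must match the project URL shown in BOINC Manager):

```xml
<cc_config>
  <options>
    <!-- Don't use ATI/AMD GPUs for GPUGrid; NVidia GPUs remain available. -->
    <exclude_gpu>
      <url>http://www.gpugrid.net/</url>
      <type>ATI</type>
    </exclude_gpu>
  </options>
</cc_config>
```

As noted above, this is hands-on: the exclusion is permanent until the user edits the file again, so if an AMD app ever returned here, that host would get none of its work.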
Joined: 3 Nov 15 · Posts: 38 · Credit: 6,768,093 · RAC: 0
x.
Joined: 3 Nov 15 · Posts: 38 · Credit: 6,768,093 · RAC: 0
I notice a lot of users have great trouble installing Linux+BOINC. I am playing with the idea of writing a sort-of guide to setting up a basic Debian install and configuring BOINC. Would you appreciate it? It should be online this week or the next.

About that WU problem: the PrimeGrid project utilizes "tickles". Large tasks are sent out with a short deadline, and if the tickle is successful and your computer is actively working on the task, the deadline is extended. The GPUGRID project could benefit from this.
Joined: 11 Oct 08 · Posts: 1127 · Credit: 1,901,927,545 · RAC: 0
"Trickles", not "tickles". And I happen to think that GPUGrid's current deadlines are sufficient for most of its users to get done on time; I believe we don't need trickles. Interesting idea, though!

NOTE: RNA World also uses trickles to auto-extend task deadlines on the server. Some of my tasks are approaching 300 days of compute time already :)
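The server-side deadline extension that trickles enable can be sketched simply. This is an illustrative model under stated assumptions (function and class names are invented, and the 5-day extension window is an arbitrary example, not PrimeGrid's or RNA World's actual policy):

```python
# Hypothetical sketch of trickle-based deadline extension: when an active host
# sends a trickle-up message, the server pushes the deadline out so a
# long-running task is not reissued while it is still making progress.
from datetime import datetime, timedelta


class Task:
    def __init__(self, deadline):
        self.deadline = deadline


def on_trickle_up(task, received_at, extension=timedelta(days=5)):
    """Extend the deadline from the trickle time, never shortening it."""
    new_deadline = received_at + extension
    if new_deadline > task.deadline:
        task.deadline = new_deadline
    return task.deadline


t = Task(deadline=datetime(2016, 6, 1))
d = on_trickle_up(t, received_at=datetime(2016, 5, 30))
assert d == datetime(2016, 6, 4)  # deadline moved from 1 June to 4 June
```

The key property is that a silent host gets no extension, so abandoned tasks still time out and get reissued on the original schedule.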
skgiven · Joined: 23 Apr 09 · Posts: 3968 · Credit: 1,995,359,260 · RAC: 0
How to - install Ubuntu 16.04 x64 Linux & setup for GPUGrid
Discussion of Ubuntu 16.04-x64 LTS installation and configuration

ClimatePrediction uses trickle uploads too. It was suggested for here years ago, but wasn't suitable then and probably still isn't.
Beyond · Joined: 23 Nov 08 · Posts: 1112 · Credit: 6,162,416,256 · RAC: 0
> 3d3lR4-SDOERR_opm996-0-1-RND0292 - I've received it after 10 days, 20 hours and 34 minutes

Here's the kind of thing that I find most mystifying: running a 980Ti GPU, then holding the WU for 5 days until it gets sent out again, negating the usefulness of the next user's contribution and missing all bonuses. Big waste of time and resources: https://www.gpugrid.net/workunit.php?wuid=11602161
skgiven · Joined: 23 Apr 09 · Posts: 3968 · Credit: 1,995,359,260 · RAC: 0
Doesn't bother cooling the $650 card either!

# GPU 0 : 55C
# GPU 0 : 59C
# GPU 0 : 64C
# GPU 0 : 68C
# GPU 0 : 71C
# GPU 0 : 73C
# GPU 0 : 75C
# GPU 0 : 76C
# GPU 0 : 77C
# GPU 0 : 78C
# GPU 0 : 79C
# GPU 0 : 80C
# GPU 0 : 81C
# BOINC suspending at user request (exit)
# GPU [GeForce GTX 980 Ti] Platform [Windows] Rev [3212] VERSION [65]
# SWAN Device 0 :
# Name : GeForce GTX 980 Ti
# ECC : Disabled
# Global mem : 4095MB
# Capability : 5.2
# PCI ID : 0000:02:00.0
# Device clock : 1076MHz
# Memory clock : 3805MHz
# Memory width : 384bit
# Driver version : r364_69 : 36510

https://www.gpugrid.net/result.php?resultid=15108452

It's also clear that the user keeps a high cache level and frequently gets wonky credits: https://www.gpugrid.net/results.php?hostid=331964

PS. Running a GERARD_FXCX... task on my Linux system (GTX970), and two on my W10 system. It looks like Linux is about ~16% faster than W10, and that's with the W10 system being slightly overclocked and supported by a faster CPU. As observed before, the difference is likely higher for bigger cards, so with a GTX980Ti it's probably greater (maybe ~20%) and with a GTX750Ti probably less (maybe ~11%).
Joined: 11 Oct 08 · Posts: 1127 · Credit: 1,901,927,545 · RAC: 0
I have 2 GTX 980 Ti's in my new rig, with an aggressive MSI Afterburner fan profile that goes from 0% fan at 50°C to 100% fan at 90°C. One of my GPUs sees temps up to 85°C, the other up to 75°C. I consider the cooling adequate as long as the clocks are stable. I'm presently working on finding the max stable overclocks. Example result: https://www.gpugrid.net/result.php?resultid=15106295

So... sometimes a system just runs hot. Stable, but hot. All my systems are hot, overclocked to max stable clocks, CPU and GPU. They refuse to take their shirts off, and deliver a rockstar performance every time.
Beyond · Joined: 23 Nov 08 · Posts: 1112 · Credit: 6,162,416,256 · RAC: 0
Possibly running other NV projects and not getting back to the GPUGrid WU until BOINC goes into panic mode.
skgiven · Joined: 23 Apr 09 · Posts: 3968 · Credit: 1,995,359,260 · RAC: 0
> I have 2 GTX 980 Ti's in my new rig. I have an aggressive MSI Afterburner fan profile, that goes 0% fan @ 50*C, to 100% fan @ 90*C.

# GPU 1 : 74C
# GPU 0 : 78C
# GPU 0 : 79C
# GPU 0 : 80C
# GPU 1 : 75C
# GPU 0 : 81C
# GPU 0 : 82C
# GPU 0 : 83C
# BOINC suspending at user request (exit)
# GPU [GeForce GTX 980 Ti] Platform [Windows] Rev [3212] VERSION [65]

That suggests to me that the GPU was running too hot, the task became unstable, and the app suspended crunching for a bit and recovered (recoverable errors). Matt added that suspend-recover feature some time ago, IIRC.
Joined: 11 Oct 08 · Posts: 1127 · Credit: 1,901,927,545 · RAC: 0
> I have 2 GTX 980 Ti's in my new rig. I have an aggressive MSI Afterburner fan profile, that goes 0% fan @ 50*C, to 100% fan @ 90*C.

No, I believe you are incorrect. The simulation will only terminate/retry when it says:

# The simulation has become unstable. Terminating to avoid lock-up (1)
# Attempting restart

Try not to jump to conclusions about hot machines :) I routinely search my results for "stab", and if I find an instability message matching that text, I know to downclock my overclock a bit more.
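That manual "search for 'stab'" check is easy to script. A minimal sketch (the log text below is a made-up example, and the marker string is taken from the message quoted above):

```python
# Scan a saved result/stderr log for the ACEMD instability marker; any hit
# suggests the overclock should be backed off a notch.
log = """\
# GPU 0 : 83C
# The simulation has become unstable. Terminating to avoid lock-up (1)
# Attempting restart
"""

# Case-insensitive substring match, mirroring the manual search for "stab".
unstable = [line for line in log.splitlines() if "stab" in line.lower()]

for line in unstable:
    print(line)  # one instability event found in this example log
```

Run against a real stderr dump from the results page, an empty list means the clocks held; any output means the task hit the terminate/retry path at least once.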
©2025 Universitat Pompeu Fabra