Message boards : Number crunching : External GPU - BOINC's device number does not match acemd-922-80.exe's device number

Matt Falcon
Joined: 4 Nov 17
Posts: 9
Credit: 40,025,925
RAC: 0
Level
Val
Scientific publications
wat
Message 50993 - Posted: 3 Dec 2018 | 17:51:11 UTC

I've got a 1060 6GB in an Akitio Node Thunderbolt 3 enclosure, and during the night, I'd like it to help heat my room, so I use it for BOINC.

I need to prevent GPUGrid from trying to use my laptop's built-in 940MX, though.

So, I searched around, and came up with cc_config.xml options of <exclude_gpu> <device_num> 0. That worked to avoid the 940MX. However, like most modern laptops, the computer also has an Intel GPU... I knew that'd get a bit tricky.

Seems that GPU numbering in BOINC is not well documented nor consistent - but worse than that, GPUGrid doesn't even honor the command line settings it's given.

If I run acemd-922-80.exe with --device 0, it reports back the 1060 (that's not right in any case, except maybe if that means "give me default" and nVidia funnels it to the fastest GPU).
If I run it with --device 1, it reports the 940MX. Sadly, that's the command line that BOINC is giving it, while believing it's running on "device 1" (my 1060).
If I run it with --device 2, it runs again on the 1060.

BOINC thinks that device 0 is the 940MX and device 1 is the 1060, and the Intel GPU is its own special little thing.

So, BOINC is telling GPUGrid to run on device 1 (1060) but GPUGrid starts running on the 940MX instead. Basically, to use GPUGrid, I'll have to dedicate my machine to doing only that, because BOINC and GPUGrid will overlap WUs on a single GPU instead of running one WU on each.

Le halp?

mmonnin
Joined: 2 Jul 16
Posts: 244
Credit: 647,700,389
RAC: 847
Level
Lys
Scientific publications
wat
Message 50994 - Posted: 3 Dec 2018 | 21:00:22 UTC - in response to Message 50993.
Last modified: 3 Dec 2018 | 21:01:40 UTC

With multiple manufacturers, each one's GPUs start at device 0. You'll probably need the <type> option as well to exclude the Intel device at index 0.
https://boinc.berkeley.edu/wiki/Client_configuration

The event log at the start of the BOINC client will list the device ID per manufacturer.
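
As an illustration of the <type> suggestion, here is a minimal sketch that builds one <exclude_gpu> entry with Python's standard library. The tag names and the intel_gpu value come from the Client configuration wiki page linked above; the helper name make_exclude_gpu is made up for this example, and BOINC itself ships no such script. The entry would still have to be placed inside <cc_config><options> by hand.

```python
import xml.etree.ElementTree as ET


def make_exclude_gpu(url, device_num=None, gpu_type=None):
    """Build one <exclude_gpu> element (hypothetical helper, not a BOINC API)."""
    entry = ET.Element("exclude_gpu")
    ET.SubElement(entry, "url").text = url
    if device_num is not None:
        ET.SubElement(entry, "device_num").text = str(device_num)
    if gpu_type is not None:
        ET.SubElement(entry, "type").text = gpu_type
    return entry


# Exclude the Intel device at index 0 for GPUGrid only
entry = make_exclude_gpu("http://www.gpugrid.net/", device_num=0, gpu_type="intel_gpu")
xml_text = ET.tostring(entry, encoding="unicode")
print(xml_text)
```

As I read the wiki, leaving out <device_num> excludes every GPU of the given type for that project.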

Matt Falcon
Message 50997 - Posted: 4 Dec 2018 | 7:23:55 UTC - in response to Message 50994.

Hm, but the problem isn't that it's trying to run on the Intel card, but just that the index is different between BOINC (which calls GPUGrid), and GPUGrid itself. So BOINC thinks it's telling GPUGrid to run on one card, but it's really running on another.

Excluding a GPU from BOINC will just keep a WU from running on a device, but it won't change the indexing of cards.

Present example - a lucky break - is that two GPUGrid WUs are running right now - one "short run" and one "long run". It thinks that the "long" is running on device 0 (the 940MX) and the "short" is on device 1 (1060). The reality is, thankfully, the opposite. But this issue prevents me from keeping GPUGrid from running on the 940MX at all, which puts WUs at risk of being aborted because the 940MX is too slow for long run WUs... and I'd much rather have the 940MX crunching other projects instead.

Zalster
Joined: 26 Feb 14
Posts: 174
Credit: 4,013,368,076
RAC: 109,212
Level
Arg
Scientific publications
watwatwat
Message 50998 - Posted: 4 Dec 2018 | 8:22:55 UTC - in response to Message 50997.
Last modified: 4 Dec 2018 | 8:23:56 UTC

Just curious, what does the startup log say about all this, and what does your cc_config look like? Could you post both so we can look them over?

Matt Falcon
Message 50999 - Posted: 4 Dec 2018 | 9:27:44 UTC

Sure - here's my cc_config.xml, just enabling multiple GPUs:

<cc_config>
<options>
<use_all_gpus>1</use_all_gpus>
</options>
</cc_config>


And the startup log...

12/4/2018 1:20:42 AM | | Starting BOINC client version 7.14.2 for windows_x86_64
12/4/2018 1:20:42 AM | | log flags: file_xfer, sched_ops, task
12/4/2018 1:20:42 AM | | Libraries: libcurl/7.47.1 OpenSSL/1.0.2g zlib/1.2.8
12/4/2018 1:20:42 AM | | Data directory: C:\ProgramData\BOINC
12/4/2018 1:20:42 AM | | Running under account Falcon
12/4/2018 1:20:44 AM | | CUDA: NVIDIA GPU 0: GeForce GTX 1060 6GB (driver version 417.01, CUDA version 10.0, compute capability 6.1, 4096MB, 3564MB available, 4568 GFLOPS peak)
12/4/2018 1:20:44 AM | | CUDA: NVIDIA GPU 1: GeForce 940MX (driver version 417.01, CUDA version 10.0, compute capability 5.0, 2048MB, 1686MB available, 881 GFLOPS peak)
12/4/2018 1:20:44 AM | | OpenCL: NVIDIA GPU 0: GeForce GTX 1060 6GB (driver version 417.01, device version OpenCL 1.2 CUDA, 6144MB, 3564MB available, 4568 GFLOPS peak)
12/4/2018 1:20:44 AM | | OpenCL: NVIDIA GPU 0: GeForce GTX 1060 6GB (driver version 417.01, device version OpenCL 1.2 CUDA, 6144MB, 3564MB available, 4568 GFLOPS peak)
12/4/2018 1:20:44 AM | | OpenCL: NVIDIA GPU 1: GeForce 940MX (driver version 417.01, device version OpenCL 1.2 CUDA, 2048MB, 1686MB available, 881 GFLOPS peak)
12/4/2018 1:20:44 AM | | OpenCL: NVIDIA GPU 1: GeForce 940MX (driver version 417.01, device version OpenCL 1.2 CUDA, 2048MB, 1686MB available, 881 GFLOPS peak)
12/4/2018 1:20:44 AM | | OpenCL: Intel GPU 0: Intel(R) HD Graphics 630 (driver version 22.20.16.4799, device version OpenCL 2.1, 6489MB, 6489MB available, 211 GFLOPS peak)
12/4/2018 1:20:44 AM | | OpenCL CPU: Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz (OpenCL driver vendor: Intel(R) Corporation, driver version 7.2.0.10, device version OpenCL 2.1 (Build 10))
12/4/2018 1:20:44 AM | | Host name: DESKTOP-JKHBDQ2
12/4/2018 1:20:44 AM | | Processor: 8 GenuineIntel Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz [Family 6 Model 158 Stepping 9]
12/4/2018 1:20:44 AM | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 fma cx16 sse4_1 sse4_2 movebe popcnt aes f16c rdrandsyscall nx lm avx avx2 vmx smx tm2 pbe fsgsbase bmi1 hle smep bmi2
12/4/2018 1:20:44 AM | | OS: Microsoft Windows 10: Professional x64 Edition, (10.00.17763.00)
12/4/2018 1:20:44 AM | | Memory: 15.86 GB physical, 20.34 GB virtual
12/4/2018 1:20:44 AM | | Disk: 476.34 GB total, 177.30 GB free
12/4/2018 1:20:44 AM | | Local time is UTC -8 hours
12/4/2018 1:20:44 AM | | No WSL found.
12/4/2018 1:20:44 AM | GPUGRID | Found app_config.xml
12/4/2018 1:20:44 AM | GPUGRID | Missing <app_config> in app_config.xml
12/4/2018 1:20:44 AM | Milkyway@Home | Found app_config.xml
12/4/2018 1:20:44 AM | SETI@home | Found app_config.xml
12/4/2018 1:20:44 AM | | Config: use all coprocessors
12/4/2018 1:20:44 AM | climateprediction.net | URL http://climateprediction.net/; Computer ID 1385937; resource share 1500
12/4/2018 1:20:44 AM | Einstein@Home | URL http://einstein.phys.uwm.edu/; Computer ID 12162809; resource share 0
12/4/2018 1:20:44 AM | GPUGRID | URL http://www.gpugrid.net/; Computer ID 493460; resource share 750
12/4/2018 1:20:44 AM | Milkyway@Home | URL http://milkyway.cs.rpi.edu/milkyway/; Computer ID 715608; resource share 100
12/4/2018 1:20:44 AM | SETI@home | URL http://setiathome.berkeley.edu/; Computer ID 8365185; resource share 100
12/4/2018 1:20:44 AM | World Community Grid | URL http://www.worldcommunitygrid.org/; Computer ID 3461727; resource share 750
12/4/2018 1:20:44 AM | World Community Grid | General prefs: from World Community Grid (last modified 02-Jan-2018 18:29:34)
12/4/2018 1:20:44 AM | World Community Grid | Host location: none
12/4/2018 1:20:44 AM | World Community Grid | General prefs: using your defaults
12/4/2018 1:20:44 AM | | Reading preferences override file
12/4/2018 1:20:44 AM | | Preferences:
12/4/2018 1:20:44 AM | | max memory usage when active: 12181.12 MB
12/4/2018 1:20:44 AM | | max memory usage when idle: 14617.35 MB
12/4/2018 1:20:44 AM | | max disk usage: 180.25 GB
12/4/2018 1:20:44 AM | | max CPUs used: 2
12/4/2018 1:20:44 AM | | (to change preferences, visit a project web site or select Preferences in the Manager)
12/4/2018 1:20:44 AM | | Setting up project and slot directories
12/4/2018 1:20:44 AM | | Checking active tasks
12/4/2018 1:20:44 AM | | Setting up GUI RPC socket
12/4/2018 1:20:44 AM | | Checking presence of 696 project files
12/4/2018 1:20:44 AM | GPUGRID | Sending scheduler request: Requested by project.
12/4/2018 1:20:44 AM | GPUGRID | Requesting new tasks for Intel GPU


Now... this is definitely odd. If I exclude the 1060... (man, it takes forever to shut down GPUGrid ;) )

12/4/2018 1:23:25 AM | | Starting BOINC client version 7.14.2 for windows_x86_64
12/4/2018 1:23:25 AM | | log flags: file_xfer, sched_ops, task
12/4/2018 1:23:25 AM | | Libraries: libcurl/7.47.1 OpenSSL/1.0.2g zlib/1.2.8
12/4/2018 1:23:25 AM | | Data directory: C:\ProgramData\BOINC
12/4/2018 1:23:25 AM | | Running under account Falcon
12/4/2018 1:23:26 AM | | CUDA: NVIDIA GPU 0: GeForce 940MX (driver version 417.01, CUDA version 10.0, compute capability 5.0, 2048MB, 1686MB available, 881 GFLOPS peak)
12/4/2018 1:23:26 AM | | OpenCL: NVIDIA GPU 0: GeForce 940MX (driver version 417.01, device version OpenCL 1.2 CUDA, 2048MB, 1686MB available, 881 GFLOPS peak)
12/4/2018 1:23:26 AM | | OpenCL: Intel GPU 0: Intel(R) HD Graphics 630 (driver version 22.20.16.4799, device version OpenCL 2.1, 6489MB, 6489MB available, 211 GFLOPS peak)
12/4/2018 1:23:26 AM | | OpenCL CPU: Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz (OpenCL driver vendor: Intel(R) Corporation, driver version 7.2.0.10, device version OpenCL 2.1 (Build 10))
12/4/2018 1:23:26 AM | | Host name: DESKTOP-JKHBDQ2
12/4/2018 1:23:26 AM | | Processor: 8 GenuineIntel Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz [Family 6 Model 158 Stepping 9]
12/4/2018 1:23:26 AM | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 fma cx16 sse4_1 sse4_2 movebe popcnt aes f16c rdrandsyscall nx lm avx avx2 vmx smx tm2 pbe fsgsbase bmi1 hle smep bmi2
12/4/2018 1:23:26 AM | | OS: Microsoft Windows 10: Professional x64 Edition, (10.00.17763.00)
12/4/2018 1:23:26 AM | | Memory: 15.86 GB physical, 20.34 GB virtual
12/4/2018 1:23:26 AM | | Disk: 476.34 GB total, 177.30 GB free
12/4/2018 1:23:26 AM | | Local time is UTC -8 hours
12/4/2018 1:23:26 AM | | No WSL found.
12/4/2018 1:23:26 AM | GPUGRID | Found app_config.xml
12/4/2018 1:23:26 AM | GPUGRID | Missing <app_config> in app_config.xml
12/4/2018 1:23:26 AM | Milkyway@Home | Found app_config.xml
12/4/2018 1:23:26 AM | SETI@home | Found app_config.xml
12/4/2018 1:23:26 AM | | Config: use all coprocessors
12/4/2018 1:23:26 AM | climateprediction.net | URL http://climateprediction.net/; Computer ID 1385937; resource share 1500
12/4/2018 1:23:26 AM | Einstein@Home | URL http://einstein.phys.uwm.edu/; Computer ID 12162809; resource share 0
12/4/2018 1:23:26 AM | GPUGRID | URL http://www.gpugrid.net/; Computer ID 493460; resource share 750
12/4/2018 1:23:26 AM | Milkyway@Home | URL http://milkyway.cs.rpi.edu/milkyway/; Computer ID 715608; resource share 100
12/4/2018 1:23:26 AM | SETI@home | URL http://setiathome.berkeley.edu/; Computer ID 8365185; resource share 100
12/4/2018 1:23:26 AM | World Community Grid | URL http://www.worldcommunitygrid.org/; Computer ID 3461727; resource share 750
12/4/2018 1:23:26 AM | World Community Grid | General prefs: from World Community Grid (last modified 02-Jan-2018 18:29:34)
12/4/2018 1:23:26 AM | World Community Grid | Host location: none
12/4/2018 1:23:26 AM | World Community Grid | General prefs: using your defaults
12/4/2018 1:23:26 AM | | Reading preferences override file
12/4/2018 1:23:26 AM | | Preferences:
12/4/2018 1:23:26 AM | | max memory usage when active: 12181.12 MB
12/4/2018 1:23:26 AM | | max memory usage when idle: 14617.35 MB
12/4/2018 1:23:26 AM | | max disk usage: 180.25 GB
12/4/2018 1:23:26 AM | | max CPUs used: 2
12/4/2018 1:23:26 AM | | (to change preferences, visit a project web site or select Preferences in the Manager)
12/4/2018 1:23:26 AM | | Setting up project and slot directories
12/4/2018 1:23:26 AM | | Checking active tasks
12/4/2018 1:23:26 AM | | Setting up GUI RPC socket
12/4/2018 1:23:26 AM | | Checking presence of 696 project files
12/4/2018 1:23:26 AM | GPUGRID | Sending scheduler request: To fetch work.
12/4/2018 1:23:26 AM | GPUGRID | Requesting new tasks for Intel GPU
12/4/2018 1:23:28 AM | GPUGRID | Scheduler request completed: got 0 new tasks


... then of course GPU 0 becomes the 940MX. That's probably where I thought GPU 0 was the 940MX. Restarted it again, and they again shuffled so the 1060 is in "0" slot and 940mx in the "1" slot. But come to think of it, I hadn't really had concrete evidence of the relationship.

So, say I'm running without the 1060 connected. Is there a way in cc_config.xml to exclude a GPU by its name, not by ID (which seems to change to stay sequential, no matter what they end up being)?

So, sounds like there's no issue in the device ID assignment, but still leaves a hole in figuring out how to keep GPUGrid from trying to run on the 940mx, specifically...
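
BOINC has no name-based exclude that I know of, but the device number can be looked up by name from the startup lines before writing the config. A rough sketch, assuming the startup messages are available as text (the sample lines are trimmed from the two logs above, with "..." standing in for the rest of each line); cuda_device_numbers and find_device are hypothetical helper names:

```python
import re

# Matches startup lines such as
# "... | | CUDA: NVIDIA GPU 1: GeForce 940MX (driver version ..."
CUDA_LINE = re.compile(r"CUDA: NVIDIA GPU (\d+): (.+?) \(driver version")


def cuda_device_numbers(log_text):
    """Map GPU name -> device number from BOINC's startup log text."""
    return {name: int(num) for num, name in CUDA_LINE.findall(log_text)}


def find_device(log_text, name_fragment):
    """Return the device number whose name contains name_fragment, or None."""
    for name, num in cuda_device_numbers(log_text).items():
        if name_fragment in name:
            return num
    return None


with_egpu = (
    "12/4/2018 1:20:44 AM | | CUDA: NVIDIA GPU 0: GeForce GTX 1060 6GB (driver version 417.01, ...)\n"
    "12/4/2018 1:20:44 AM | | CUDA: NVIDIA GPU 1: GeForce 940MX (driver version 417.01, ...)\n"
)
without_egpu = (
    "12/4/2018 1:23:26 AM | | CUDA: NVIDIA GPU 0: GeForce 940MX (driver version 417.01, ...)\n"
)

print(find_device(with_egpu, "940MX"))     # 1
print(find_device(without_egpu, "940MX"))  # 0
```

The number found this way is only valid for the current client start, since the numbering shifts when the 1060 is attached or removed. Two cards with the same model name would also collide in this lookup.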

Zalster
Message 51000 - Posted: 4 Dec 2018 | 14:24:35 UTC - in response to Message 50999.

I see a couple of issues.

First, your app_config.xml is missing an <app_config> tag, so BOINC is ignoring it.

It looks like BOINC assigns the 1060 as device 0 when it's connected, and when it's not, the 940MX is device 0.

We can ignore the Intel built-in GPU, as it does not support CUDA, so none of the work units will run on it.

So that takes us back to only 2 usable GPUs, the external 1060 and the built-in 940MX.

You will probably have to exclude device 1 to use only the external GPU, but have the device attached prior to starting BOINC. If you wait to attach it until after you launch BOINC, then it's going to get confused.

Let's see if that straightens things out.

mmonnin
Message 51001 - Posted: 4 Dec 2018 | 14:34:52 UTC

BOINC is seeing 3 GPUs, two of which are device 0, one per device manufacturer. Set up cc_config per the wiki link, using device 1 for the 940MX together with the <type> option.

PappaLitto
Joined: 21 Mar 16
Posts: 495
Credit: 4,213,764,376
RAC: 3,192,902
Level
Arg
Scientific publications
watwatwat
Message 51002 - Posted: 4 Dec 2018 | 14:41:59 UTC

The Intel GPU is not supported by GPUGrid, so I don't think it will affect the GPU order.

Matt Falcon
Message 51003 - Posted: 4 Dec 2018 | 16:46:41 UTC
Last modified: 4 Dec 2018 | 16:56:22 UTC

Right, the Intel GPU was never part of the problem :P

My app_config is blank, intentionally, as I originally tried setting the options there (where I wanted it to be app-specific) and just left the blank file there instead of digging in to delete it. Shouldn't be affecting anything.
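
For what it's worth, the "Missing <app_config>" line in the startup log is what a blank file produces. A small sketch of an equivalent check, assuming a strict XML parser is a fair stand-in for BOINC's own (it may not be); has_app_config_root is a made-up name:

```python
import xml.etree.ElementTree as ET


def has_app_config_root(xml_text):
    """True if xml_text parses as XML whose root element is <app_config>."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # Empty or malformed content has no usable root element
        return False
    return root.tag == "app_config"


print(has_app_config_root(""))                           # False
print(has_app_config_root("<app_config></app_config>"))  # True
```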

My goal is to be able to use BOINC with or without the 1060 attached, without it trying to run GPUGrid on the built-in GPU. So, making sure the 1060 is attached first isn't much of an option (plus, BOINC can't hotplug GPUs, so it has to be shut down first to attach/remove the GPU).

I'll have to look into that "type option" real quick. That'd do the trick if it's a thing :)

edit: there doesn't appear to be any way to direct an ID assignment to a specific card name with cc_config.xml. :(

also overnight update: 8-1/2 hours into crunching, and the 1060 is 78.5% done with the Long Runs WU, and the 940MX is 35% done with a Short WU. Yeah, kinda expected that... imagine if it accidentally got a "long run" ;)

Zalster
Message 51004 - Posted: 4 Dec 2018 | 18:20:17 UTC - in response to Message 51003.

Ok, now that I have a better understanding of what you want to do, we should try to get BOINC to ignore the built-in GPU. Never tried this on GPUGrid, but...

On Einstein I did have success with the following

<cc_config>
<options>
<exclude_gpu>
<url>http://einstein.phys.uwm.edu/</url>
<device_num>2</device_num>
<app>einsteinbinary_BRP4G</app>
</exclude_gpu>
<exclude_gpu>
<url>http://einstein.phys.uwm.edu/</url>
<device_num>0</device_num>
<app>einsteinbinary_BRP5</app>
</exclude_gpu>
<exclude_gpu>
<url>http://einstein.phys.uwm.edu/</url>
<device_num>1</device_num>
<app>einsteinbinary_BRP5</app>
</exclude_gpu>
</options>
</cc_config>


so I think we should be able to use it here. Will need to trim it. I think it might work without the <app></app> section so that it ignores all work from this website. So maybe something like

<cc_config>
<options>
<exclude_gpu>
<url>http://www.gpugrid.net/</url>
<device_num>1</device_num>
</exclude_gpu>
</options>
</cc_config>


Not on a machine that is currently running GPUGrid so can't test it.

Z



Matt Falcon
Message 51005 - Posted: 5 Dec 2018 | 7:22:37 UTC
Last modified: 5 Dec 2018 | 7:23:36 UTC

That would just keep GPUGrid from using any device numbered "1", which would keep the 940MX from doing anything when the 1060 is connected (as 0), but with it disconnected (as now, writing this from bed :) ), it'd revert to using device 0, which would be the 940MX.

I guess there's just no way to do this with the current BOINC architecture, without a way to identify a specific GPU by name. Not a huge issue now that I know the device IDs are properly assigned, though I worry if a "long" WU gets stuck being computed by "1". During the cold winters, I'm almost always running BOINC with the GPU connected for heating, so it's OK.

Probably best to add that just as a safeguard, if nothing else. :)

Thanks for the input, everyone! I think it was a mix of my own brainfart (in the initial subject, here) and limitations of BOINC itself. I'll call this resolved!

mmonnin
Message 51006 - Posted: 5 Dec 2018 | 22:13:53 UTC - in response to Message 51005.

That's why the <type> option is needed: it ignores just the one manufacturer's device with a certain ID, which can only be a single card in any given system.

This is what I used to ignore the 1st NV GPU listed in BOINC startup in my own system.

<exclude_gpu>
<url>http://xansons4cod.com/xansons4cod/</url>
<device_num>0</device_num>
<type>NVIDIA</type>
</exclude_gpu>

<exclude_gpu>
<url>https://albertathome.org/</url>
<device_num>0</device_num>
<type>NVIDIA</type>
</exclude_gpu>


That is a 980Ti that I use for FAH, and for a while I had a 2nd NV card, a 970, that I used for BOINC. Adding this to cc_config is the only way to keep BOINC off the 980Ti and running only on the 970.

Change the URL, and set device_num to 1 when it's displayed as "CUDA: NVIDIA GPU 1: GeForce 940MX".

Matt Falcon
Message 51008 - Posted: 6 Dec 2018 | 8:12:31 UTC

That's the thing, there are two nVidia cards here when the 1060 is connected. It shifts so that 0 = 1060, 1 = 940MX (BOTH are nVidia!), but when the 1060 is unplugged, it becomes 0 = 940MX (again, nVidia). So all I'd be doing with that config is excluding "whatever device 1 happens to be", which would still leave it with 0 = 940MX being enabled when the 1060 is unplugged.

It's OK. I did that anyway, since I'm not really running BOINC with the GPU unplugged. It just needs to keep the 940MX from trying to crunch a "long" WU ;) And it's doing that just fine now. :)

Keith Myers
Joined: 13 Dec 17
Posts: 244
Credit: 234,046,463
RAC: 1,114
Level
Leu
Scientific publications
wat
Message 51009 - Posted: 6 Dec 2018 | 21:52:14 UTC

I'm going to go this route myself and exclude the RTX 2080 from both Einstein and GPUGrid. I want to keep the other 3 Nvidia cards running those projects, though.

Zalster
Message 51010 - Posted: 6 Dec 2018 | 23:47:43 UTC - in response to Message 51009.

Glad I could help Keith.

Keith Myers
Message 51011 - Posted: 7 Dec 2018 | 3:42:38 UTC

I'm not having any luck with excluding device 0 for both GPUGrid and Einstein. Doing so prevents all Seti cpu tasks from running and puts them into "waiting to run" status.

The excludes are recognized as correct in the Event Log startup. But they prevent Seti cpu tasks from running. Seti gpu tasks continue to run as well as Milkyway gpu tasks.

Don't know what's causing the problem. Only way to make things work correctly for Seti is to remove the excludes for a basic cc_config.

For now I will just have to suspend both projects on that host until someone can tell me what is going on and how to correctly exclude the 2080 from those projects but allow the other Nvidia cards to run those projects.

Zalster
Message 51012 - Posted: 7 Dec 2018 | 6:43:05 UTC - in response to Message 51011.

Keith PM your cc_config so I can see what it looks like.

Matt Falcon
Message 51172 - Posted: 1 Jan 2019 | 22:49:23 UTC
Last modified: 1 Jan 2019 | 22:58:59 UTC

Okay, after a while of crunching like this, I've determined that there IS definitely a bug of some kind in how GPUGrid and/or BOINC handles device ID assignments.

Take a look at this screenshot:
https://imgur.com/1BjG4ka
(can't seem to attach, and also can't seem to thumbnail-link, so this is the best I can give you. jeez, I even had to URL-tag wrap it to make it clickable)

Here, you can see BOINC is assigning GPUGrid to ID#0 (the 1060) and Einstein@Home to ID#1 (which should be the 940MX) - a perfect assignment.

However, if you look at the nVidia status window, it shows both projects competing for processing power on the 1060. It works, but it dramatically slows down the GPUGrid workunit. I discovered this just last night as a WU I was crunching had been running for over 1 full day on the 1060, something that should obviously never happen... as soon as I was able to rip other projects away from that GPU and let the same WU continue running, its GPU TDP shot up by about 30% (from 40% TDP to 70% TDP) and the GPUGrid WU started crunching way faster. Still not sure if it completed the WU in the deadline... :(

I tried just shutting off GPU computing for all other projects, as so:

<cc_config>
<options>
<use_all_gpus>1</use_all_gpus>
<exclude_gpu>
<url>http://www.gpugrid.net/</url>
<device_num>1</device_num>
</exclude_gpu>
<exclude_gpu>
<url>http://setiathome.berkeley.edu/</url>
<type>NVIDIA</type>
</exclude_gpu>
<exclude_gpu>
<url>http://milkyway.cs.rpi.edu/milkyway/</url>
<type>NVIDIA</type>
</exclude_gpu>
<exclude_gpu>
<url>http://einstein.phys.uwm.edu/</url>
<type>NVIDIA</type>
<type>intel_gpu</type>
</exclude_gpu>
</options>
</cc_config>


... But as you can see, the Einstein@Home listing is out here giving zero f^ks about what I told it.

Look at the log in that screenshot as well. The IDs are clearly assigned, and should not be overlapping on the same GPU.

What can we do with this information?
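
Side note on the config above, independent of what the screenshot shows: the Einstein entry carries two <type> tags in one <exclude_gpu>. The wiki documents a single <type> per entry, and I'm not sure how the client treats a second one, so one entry per type looks like the safer form. A sketch that flags such entries (multi_type_excludes is a hypothetical helper, and the sample is cut down from the config above):

```python
import xml.etree.ElementTree as ET


def multi_type_excludes(cc_config_text):
    """Return URLs of <exclude_gpu> entries that carry more than one <type>."""
    root = ET.fromstring(cc_config_text)
    flagged = []
    for entry in root.iter("exclude_gpu"):
        if len(entry.findall("type")) > 1:
            flagged.append(entry.findtext("url"))
    return flagged


sample = """
<cc_config>
<options>
<exclude_gpu>
<url>http://milkyway.cs.rpi.edu/milkyway/</url>
<type>NVIDIA</type>
</exclude_gpu>
<exclude_gpu>
<url>http://einstein.phys.uwm.edu/</url>
<type>NVIDIA</type>
<type>intel_gpu</type>
</exclude_gpu>
</options>
</cc_config>
"""

print(multi_type_excludes(sample))
```

Any URL it reports would get split into two <exclude_gpu> blocks, one with <type>NVIDIA</type> and one with <type>intel_gpu</type>.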

Matt Falcon
Message 51173 - Posted: 2 Jan 2019 | 1:36:05 UTC
Last modified: 2 Jan 2019 | 1:36:51 UTC

wow this forum's functionality is extremely limited (not being able to edit after a time period, not being able to post images inline without an external host, etc etc etc 2003 internet things)

anyway, solved my own problem, found that nVidia's own "what's running on what" indicator is actually faulty. Another dual-GPU system I had was reporting similar issues with BOINC, but I wasn't even using GPUGrid on that one. It's just the tool that's stupid. GPU-Z shows the real info, and all is well in the world.

Zalster
Message 51174 - Posted: 2 Jan 2019 | 2:22:47 UTC - in response to Message 51173.
Last modified: 2 Jan 2019 | 2:24:05 UTC

Since you are using Windows, you might want to look at BoincTasks and SIVx64. I like it as it gives a better idea of what my system is doing.

http://rh-software.com/
https://efmer.com/

Keith Myers
Message 51175 - Posted: 2 Jan 2019 | 2:32:08 UTC - in response to Message 51173.

Have you tried Nvidia's built-in tool that is available? nvidia-smi?? It shows what is running on each gpu, the amount of utilization, memory usage, power usage in watts and each application running on each particular gpu.

You can find it in C:\Program Files\NVIDIA Corporation\NVSMI

Just open a command window and type nvidia-smi.
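
If you'd rather script that than read the table, nvidia-smi also has a CSV query mode. A sketch, assuming nvidia-smi is on the PATH; the sample text is made up in the shape that query prints, and parse_gpu_csv / query_gpus are names invented here:

```python
import csv
import io
import subprocess


def parse_gpu_csv(text):
    """Parse 'index, name, utilization' CSV rows into a list of dicts."""
    rows = csv.reader(io.StringIO(text), skipinitialspace=True)
    # Utilization is kept as a string because the tool may report N/A
    return [
        {"index": int(r[0]), "name": r[1], "utilization": r[2]}
        for r in rows
        if r
    ]


def query_gpus():
    """Run nvidia-smi (must be on PATH) and return the parsed rows."""
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=index,name,utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_csv(out)


# Made-up sample in the shape nvidia-smi prints for this query
sample = "0, GeForce GTX 1060 6GB, 70\n1, GeForce 940MX, 35\n"
for gpu in parse_gpu_csv(sample):
    print(gpu)
```

One caveat: as far as I know, nvidia-smi numbers GPUs in PCI bus order, while CUDA applications default to a fastest-first order unless CUDA_DEVICE_ORDER=PCI_BUS_ID is set, so the indexes here need not match what BOINC or acemd report.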

Aurum
Joined: 12 Jul 17
Posts: 92
Credit: 7,219,554,643
RAC: 1,118,023
Level
Tyr
Scientific publications
wat
Message 51180 - Posted: 2 Jan 2019 | 13:23:27 UTC
Last modified: 2 Jan 2019 | 13:28:41 UTC

Have you tried???

<use_all_gpus>0</use_all_gpus>
A 1060 (6.1) is more capable than a 940MX (5.0).

<use_all_gpus>0|1</use_all_gpus>
If 1, use all GPUs (otherwise only the most capable ones are used). Requires a client restart.

https://developer.nvidia.com/cuda-gpus
