Message boards : Wish list : Correct reporting of coprocessors (GPUs)
**skgiven** | Joined: 23 Apr 09 | Posts: 3968 | Credit: 1,995,359,260 | RAC: 0
When you have more than one GPU in a system, but of mixed models (from the same vendor, ATI or NVidia), the computer information details are inaccurate. Only one GPU is listed; the second or third card is indicated only by a count, such as [2].
**MJH** | Joined: 12 Nov 07 | Posts: 696 | Credit: 27,266,655 | RAC: 0
Yeah, that's pretty annoying. I've not yet turned my attention to the website side of things. It's on the list, though.

Matt
**TJ** | Joined: 26 Jun 09 | Posts: 815 | Credit: 1,470,385,294 | RAC: 0
This "issue" however is also present at other projects, so it seems to me more a BOINC problem than on project level. Greetings from TJ |
**skgiven** | Joined: 23 Apr 09 | Posts: 3968 | Credit: 1,995,359,260 | RAC: 0
Yeah, ideally this would be done by Berkeley, but Matt's GPUGrid app reads the GPUs, knows which GPU a WU is using, and now reports this correctly in the stderr output file (on Windows). So in theory it could supply the GPU info for the Computer Information Details under Coprocessor, rather than using what BOINC reports. As long as we have the information, we might as well use it to correct what isn't read/reported accurately. Maybe some of the methods and code could be passed to BOINC central for incorporation at some stage?
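For context, here is a minimal sketch (not GPUGRID's actual code) of how an application can enumerate the GPUs itself with the CUDA runtime API and write per-device details to stderr, independent of what BOINC detected:

```cuda
// Minimal sketch: list every CUDA device with its memory and compute
// capability, writing to stderr the way a BOINC science app would.
// Build with: nvcc -o gpuinfo gpuinfo.cu
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int n = 0;
    if (cudaGetDeviceCount(&n) != cudaSuccess || n == 0) {
        fprintf(stderr, "No CUDA devices detected\n");
        return 1;
    }
    for (int i = 0; i < n; ++i) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, i) != cudaSuccess) continue;
        // totalGlobalMem is in bytes; major.minor is the compute capability
        fprintf(stderr, "Device %d: %s, %zu MB, compute capability %d.%d\n",
                i, prop.name, prop.totalGlobalMem / (1024 * 1024),
                prop.major, prop.minor);
    }
    return 0;
}
```

An app that logs this per task, together with the device index it was actually assigned, produces exactly the kind of stderr evidence described above.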
**Beyond** | Joined: 23 Nov 08 | Posts: 1112 | Credit: 6,162,416,256 | RAC: 0
> Matt's GPUGrid app reads the GPUs, knows which GPU a WU is using and now reports this correctly in the stderr output file

Then hopefully we can get a fix for sending WUs to GPUs with too little memory to handle them. Currently, Noelia WUs need a GPU with 1 GB of memory, while Santi and Nathan WUs run great on 768 MB GPUs. It gets old aborting stuck Noelias every day (which also slow other [non-NVidia] processes to a crawl).
**MJH** | Joined: 12 Nov 07 | Posts: 696 | Credit: 27,266,655 | RAC: 0
Yes - if you can persuade DA to fix it, that'd be great.

MJH
Joined: 11 Jul 09 | Posts: 1639 | Credit: 10,159,968,649 | RAC: 428
Errors, omissions and bugs excepted, BOINC already reports considerable details of each GPU in a system - have a look at the internals of a sched_request.xml file sometime (the most recent one for each attached project is kept in the root of the BOINC Data directory until overwritten).

If the 6 GB of the Titan is still not being detected properly, that might be a 32-bit glitch in the CUDA runtime support in the consumer-grade drivers. We had problems in some cases with Keplers too: BOINC can't afford to purchase samples of every high-end card, and Rom Walton solved that one by remoting in to my GTX 670 (a problem with the high word of a 64-bit variable passed 'by reference' not being initialised in the way Rom expected, IIRC). If you have test Titans in the lab that could be made available to Rom via remote access, that might help fix that bug.

The issue of multiple GPUs in a single host being shown as multiple instances of the best GPU is cosmetic: although full details are passed back, the BOINC database hasn't been adapted to store, and more importantly display, such fine-grained detail.

But I do think there's a fundamental design problem which is going to cause problems here. When a task is allocated to a volunteer's computer, the capability checking is done on the server: questions like "does the host have a >1 GB CUDA card?" or "does the host have a double-precision ATI card?" can be addressed. But - under the present design - those task requirements are not passed back to the host and re-tested locally when the task is scheduled. Thus, if a particular host has both a 1 GB card and a 768 MB card, there's no way of telling it (after allocation) not to run a 1 GB task on the 768 MB card. That whole problem of heterogeneous resources within a single host is going to require a major re-write, I think.
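For illustration, a hand-written sketch of the kind of <coproc_cuda> block a client puts inside sched_request.xml; the values here are invented and the exact field set varies by client version:

```xml
<coprocs>
    <coproc_cuda>
        <count>2</count>
        <name>GeForce GTX 670</name>
        <peak_flops>2459648000000.000000</peak_flops>
        <cudaVersion>5050</cudaVersion>
        <drvVersion>32723</drvVersion>
        <totalGlobalMem>2147483648</totalGlobalMem>
        <major>3</major>
        <minor>0</minor>
    </coproc_cuda>
</coprocs>
```

Note how one set of properties (the most capable card's) is reported with a <count> of 2, which is why the website shows a single model name followed by [2]; whatever extra per-card detail is passed back isn't stored by the server database in a displayable form, as noted above.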
**Beyond** | Joined: 23 Nov 08 | Posts: 1112 | Credit: 6,162,416,256 | RAC: 0
> When a task is allocated to a volunteer's computer, the capability checking is done on the server: questions like "does the host have a >1 GB CUDA card?" or "does the host have a double-precision ATI card?" can be addressed. But - under the present design - those task requirements are not passed back to the host and re-tested locally when the task is scheduled.

However, for hosts with one NVidia GPU, or multiple GPUs with similar amounts of memory, it is currently possible to allocate appropriately sized tasks. Is that correct?
Joined: 11 Jul 09 | Posts: 1639 | Credit: 10,159,968,649 | RAC: 428
> However, for hosts with one NVidia GPU, or multiple GPUs with similar amounts of memory, it is currently possible to allocate appropriately sized tasks. Is that correct?

I'm pretty confident it is. If every NVidia card in a system meets, for example, a cuda55_himem plan class requiring >= 1 GB, server and client should work in harmony.

But with mixed card memory sizes, the server would still allocate tasks, and there's nothing the project can do to make hosts pick the right card. Individual users can set up detailed cc_config options (they'd have to set <use_all_gpus> anyway, so that might be possible). But I fear that some users might not, or might get it wrong. It's not a 'set and forget' scenario, as far as the project is concerned.
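A minimal sketch of such a cc_config.xml, using the client's documented <use_all_gpus> and <exclude_gpu> options; the device number is an assumption for illustration (check the event log's GPU listing for the real numbering on a given host):

```xml
<cc_config>
    <options>
        <!-- detect and use every GPU, not just the most capable one -->
        <use_all_gpus>1</use_all_gpus>
        <!-- keep GPUGRID tasks off device 1, e.g. a 768 MB card
             (device number is illustrative; see the event log) -->
        <exclude_gpu>
            <url>http://www.gpugrid.net/</url>
            <device_num>1</device_num>
            <type>NVIDIA</type>
        </exclude_gpu>
    </options>
</cc_config>
```

The file lives in the BOINC Data directory and is applied via "Read config files" in BOINC Manager or a client restart. The catch is exactly the one raised above: every affected volunteer has to set this up by hand, and get it right.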
**skgiven** | Joined: 23 Apr 09 | Posts: 3968 | Credit: 1,995,359,260 | RAC: 0
> The issue of multiple GPUs in a single host being shown as multiple instances of the best GPU is cosmetic

Actually, it doesn't report multiple instances of the best GPU.