GPUs not being used?

caffeineyellow5
Joined: 30 Jul 14
Posts: 225
Credit: 2,658,976,345
RAC: 0
Message 38982 - Posted: 20 Nov 2014, 23:13:01 UTC

I built a box and put one EVGA GeForce GTX 780 in it. It worked fine with GPUGRID for 2 months. I just put 2 more in there, one EVGA and one PNY (yes, completely compatible and working in the system). Now that they are in, the first one is working exactly as it was, at 90-92%. The PNY is also working, at 90%. The BOINC Manager shows 2 tasks being run now. The third card shows as properly installed, drives my main monitor, and also shows up in NVidiaInspector (like the other 2), but it is not working on any task and is sitting at 0%. This third one will, on occasion, go up to 9% or somewhere below that as I switch windows, but I cannot get it to do a task with GPUGRID. I don't see any relevant settings for the project or in the BOINC Manager itself, so I need some advice or help on how to get this third graphics card working on its own task like the other 2 are. TY

Carlesa25
Joined: 13 Nov 10
Posts: 328
Credit: 72,619,453
RAC: 0
Message 38985 - Posted: 20 Nov 2014, 23:44:20 UTC - in response to Message 38982.  
Last modified: 20 Nov 2014, 23:45:06 UTC

Hi, you could try modifying cc_config.xml so that BOINC uses all GPUs in the system:

<cc_config>
<options>
<report_results_immediately>1</report_results_immediately>
<use_all_gpus>1</use_all_gpus>
</options>
</cc_config>

With this configuration, BOINC reports finished tasks without waiting and is forced to use all GPUs present.

I hope this is useful. Greetings.
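
For anyone who would rather generate the file than type it, here is a minimal Python sketch that builds the same cc_config.xml text. The function name `build_cc_config` is made up for illustration, and the folder mentioned in the comment is an assumption taken from the "Data directory" line that appears in the Event Log later in this thread; adjust it for your install.

```python
import xml.etree.ElementTree as ET


def build_cc_config(options):
    """Return cc_config.xml text with each entry of `options` placed under <options>."""
    root = ET.Element("cc_config")
    opts = ET.SubElement(root, "options")
    for name, value in options.items():
        ET.SubElement(opts, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


config_text = build_cc_config({
    "report_results_immediately": 1,
    "use_all_gpus": 1,
})
print(config_text)

# To use it, save config_text as cc_config.xml in the BOINC data directory
# (assumed here to be C:\ProgramData\BOINC) and restart the client.
```

The output is a single unindented line, which is still valid XML; the layout of the hand-written version above is only for readability.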

caffeineyellow5
Message 38986 - Posted: 21 Nov 2014, 0:20:18 UTC - in response to Message 38985.  

OK, when I do a search for that file, I can't find it. I found a reference to it in stdoutgpudetect.txt, which keeps repeating the message:

20-Nov-2014 05:10:51 [---] cc_config.xml not found - using defaults

Should I make a new file with that name and the text you gave, should I reinstall, or is there a way to force the program to create a new copy of it?

caffeineyellow5
Message 38987 - Posted: 21 Nov 2014, 1:16:09 UTC - in response to Message 38986.  

I went ahead and made that file and started BOINC again. It looks like it has been accepted so I will wait out a full cycle of tasks to see if the other GPU kicks in.

Carlesa25
Message 38990 - Posted: 21 Nov 2014, 12:08:31 UTC - in response to Message 38987.  

Hello: on Windows, "cc_config.xml" lives in the BOINC data folder. If you are using a BOINC version newer than 7.2.42, the file should already be there with many options (all set to 0); just find the same ones I listed and change them from 0 to 1.

If you have an older version, just paste the cc_config.xml file into the BOINC data folder and restart. The Event Log in BOINC Manager will show whether the configuration file was read and whether all GPUs were detected. Greetings.

Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Message 38999 - Posted: 21 Nov 2014, 23:47:00 UTC

It looks like this thread needs a link on how to set up cc_config.xml.
Here:
http://boinc.berkeley.edu/wiki/Client_configuration

caffeineyellow5
Message 39151 - Posted: 16 Dec 2014, 3:44:24 UTC - in response to Message 38999.  

Carlesa, thank you again.

Jacob, thank you very much for that page (and linked pages from that page)!

Sorry it took a while. After working for a short while, that PC stopped working. By putting in the second and third GPU and adding another 4TB HDD, I exceeded the PSU's capacity and killed it. Now that I have a new PSU in it, I have modified the cc_config.xml a little more. It is still only running 2 GPUs, and the third is doing nothing without help.

By "help" I mean that I have cheated on how BOINC normally runs tasks and manually added a slot "2" to the slots folder of BOINC in the ProgramData folder. Then I copy the contents of the older of the 2 slots (minus the lockfile) to the new slot folder. After that, I open cmd, cd to the slot 2 folder, and run the command:
C:\ProgramData\BOINC\slots\2>C:\ProgramData\BOINC\projects\www.gpugrid.net\acemd.847-65.exe projects/www.gpugrid.net/acemd.847-65.exe   --device 2
The output to this is:
# ACEMD Molecular Dynamics Version [3212]
# CUDA Synchronisation mode: BLOCKING
# CUDA Synchronisation mode: BLOCKING
# SWAN: Created context 0 on GPU 2
# SWAN Device 2 :
#       Name            : GeForce GTX 780
#       ECC             : Disabled
#       Global mem      : 3072MB
#       Capability      : 3.5
#       PCI ID          : 0000:03:00.0
#       Device clock    : 993MHz
#       Memory clock    : 3004MHz
#       Memory width    : 384bit
# SWAN Device 2 :
#       Name            : GeForce GTX 780
#       ECC             : Disabled
#       Global mem      : 3072MB
#       Capability      : 3.5
#       PCI ID          : 0000:03:00.0
#       Device clock    : 993MHz
#       Memory clock    : 3004MHz
#       Memory width    : 384bit
#       Driver version  : r343_00 : 34475
# SWAN: Configuring Peer Access:
# -
# SWAN NVAPI Version: NVidia Complete Version 1.10


Hopefully I am not corrupting the results, but it then goes and does the one task twice as fast taking the total GPU usage from 48% to 72.5%. Notwithstanding, it only works per task and once it is done (twice as fast) the cmd comes back to a command prompt and BOINC continues with only the 2 tasks running on the first two GPUs.

So I still need help. I am just not sure of anything right now when it comes to what is going wrong, but that may be my lack of knowledge about the program.
1 Corinthians 9:16 "For though I preach the gospel, I have nothing to glory of: for necessity is laid upon me; yea, woe is unto me, if I preach not the gospel!"
Ephesians 6:18-20, please ;-)
http://tbc-pa.org

caffeineyellow5
Message 39152 - Posted: 16 Dec 2014, 3:56:10 UTC - in response to Message 39151.  

I actually do see a few corrupted (errored out) tasks in my online task logs, so it looks like cheating does have its consequences. I need to find a solution that actually downloads and works 3 tasks as one per GPU the way the program is supposed to and not 'rigged to blow'.

caffeineyellow5
Message 39153 - Posted: 16 Dec 2014, 4:01:36 UTC - in response to Message 39152.  

In addition to any help that can be offered on these forums, is anyone here willing to directly check my installation, files, settings, etc. via something like TeamViewer? Direct help can save a lot of frustration, both for me trying to make it work and for those helping, who could just troubleshoot and apply the fix instead of going back and forth. TYYTYTYTYTYVM in advance for any help or suggestions that are given.

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 39195 - Posted: 18 Dec 2014, 19:47:32 UTC - in response to Message 39153.  
Last modified: 18 Dec 2014, 19:48:33 UTC

Your GPUs are too hot. You need to keep them reasonably cool. Use MSI Afterburner or similar to set fan speeds.

FAQ - Useful Tools
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

caffeineyellow5
Message 39221 - Posted: 19 Dec 2014, 23:42:24 UTC - in response to Message 39195.  
Last modified: 19 Dec 2014, 23:43:16 UTC

I am using nvidiaInspector's overclocking options to do nothing but raise the fan speed, but what makes you think they are too hot? Do hot GPUs cause the BOINC Manager to load only 2 slots and use 2 devices when 3 are recognized by the OS and nvidiaInspector, and a third can be loaded manually via the command line? I wouldn't think heat is the reason it only loads the top 2 (device 0 and device 1), even when there are 3 slots with units in them. That rarely happens, and only when I "Suspend" one unit of work and it loads a new third one to run on the GPU I freed up. But if I "Resume" any '3rd' unit while 2 are already running, it sits in "Waiting" mode until one of the other 2 finishes, and then it starts running again.

I even tried changing the third "init_data.xml" while the manager was off, setting
<gpu_device_num>2</gpu_device_num>
<gpu_opencl_dev_index>2</gpu_opencl_dev_index>
but as soon as the manager is started again, it changes those values back to 0 or 1 and the same result happens.

At this point I have to ask...
Does anyone run 3 different GPUs in one computer and all 3 GPUs load and run work units continuously? Is the program built to even allow that?

caffeineyellow5
Message 39222 - Posted: 19 Dec 2014, 23:48:01 UTC - in response to Message 39221.  
Last modified: 19 Dec 2014, 23:51:00 UTC

I mean I honestly bought 2 extra $400 GPUs to run THIS project and it is frustrating that one refuses to be used. I don't even game!

Which BTW, I have switched positions of the GPU cards and no matter which configuration, the top 2 are used by GPUGRID and the bottom one sits idle. So device 0 and device 1 run the project and device 2 will not.

mikey
Joined: 2 Jan 09
Posts: 303
Credit: 7,321,800,090
RAC: 270
Message 39223 - Posted: 20 Dec 2014, 11:42:34 UTC - in response to Message 39222.  
Last modified: 20 Dec 2014, 11:43:28 UTC

Are you leaving any CPU cores free for the GPUs to use? I'm guessing you DID set <use_all_gpus> in the cc_config.xml file too? Does BOINC itself see all 3 GPUs? Look at the Event Log on startup: it should list all 3 GPUs. If not, you may have to load the drivers again for the 3rd card. Windows sometimes requires that for each GPU in the system, other times it doesn't. After that, it may come down to the motherboard. What brand and model do you have?

caffeineyellow5
Message 39230 - Posted: 20 Dec 2014, 19:01:49 UTC - in response to Message 39223.  

The log does show all three GPUs, listed on 3 different lines and numbered 0, 1, and 2.
I did change <use_all_gpus> to a value of 1.
I have 1 CPU allocated at the value of 1%. The rest of my CPU cores and usage I have allocated to distributed.net and have for years. Does the amount of cores or the % of cores from the CPU affect the project's usage of GPUs to where it would only allow 2 GPUs to run tasks?

As far as the specs, I am running an Intel Core i7 4960X CPU @ 3.60GHz OCd to 4124.9 MHz (33.0 x 125.0 MHz), 64GB Quad Channel DDR3 RAM @ 833.4 MHz, on an ASUSTeK X79-DELUXE Rev 1.xx, with American Megatrends Inc. BIOS 0701 - 01/07/2014 ROM size 8192 KB.

Hope this helps you help me. (Again, I have TeamViewer running if any kind soul would like to pop in and take a look, I would be happy to allow that.)

mikey
Message 39241 - Posted: 21 Dec 2014, 12:39:01 UTC - in response to Message 39230.  
Last modified: 21 Dec 2014, 12:39:48 UTC

Try suspending your CPU project and see if the 3rd GPU starts crunching. If so, then yes, it's causing problems.

As for "I did change <use_all_gpus> to a value of 1", 1 means yes and zero means no, so yes you should be using all 3.

There IS a problem at some projects where BOINC won't use two Nvidia cards no matter what the settings are. I wonder if you have found a new problem with 3 cards? The only thing someone can do at those projects is use the <exclude_gpu> option to make one card crunch for a different project. To test that, do you happen to have an AMD card lying around? If so, can you take out the 3rd Nvidia GPU, put in the AMD one, and see whether it tries to get work?
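
As a rough illustration of the <exclude_gpu> option mentioned above, the Python sketch below builds a cc_config.xml fragment that keeps one project off one device. The element names (<url>, <device_num>, <type>) follow the BOINC client configuration page as I understand it. The helper name, the project URL, and device number 2 are placeholders for illustration only, so check them against the documentation before relying on this.

```python
import xml.etree.ElementTree as ET


def build_exclude_gpu(project_url, device_num, gpu_type="NVIDIA"):
    """Return cc_config.xml text with one <exclude_gpu> entry under <options>."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    exclude = ET.SubElement(options, "exclude_gpu")
    ET.SubElement(exclude, "url").text = project_url
    ET.SubElement(exclude, "device_num").text = str(device_num)
    ET.SubElement(exclude, "type").text = gpu_type
    return ET.tostring(root, encoding="unicode")


# Placeholder values: the URL is the one shown in the Event Log in this thread,
# and device 2 is simply the third card in BOINC's numbering.
print(build_exclude_gpu("http://www.gpugrid.net/", 2))
```

In a real setup this entry would be merged into the existing <options> section rather than written as a separate file.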

Have you tried using a 'dummy plug' on the cards that do NOT have a monitor plugged into them? Windows has a bad habit of disabling devices during startup if nothing is plugged into them. If a GPU is disabled that way, it won't be enabled again except through a restart.

The only other thing I can think of is have you looked on the Asus message boards to see if there is a problem using 3 cards on that model motherboard?

I do not use Team Viewer so would not feel comfortable using it, sorry.

caffeineyellow5
Message 39248 - Posted: 21 Dec 2014, 18:21:47 UTC - in response to Message 39241.  

OK, quick update.

After killing the CPU project, still no love for the GPUGRID project getting more units than 2 to work on at one time.

I never had any graphics cards and only ever used on-board graphics before building this rig. This is the first time I have had the money to build a rig rather than buy a cheap mass-produced one.

I have not tried using a dummy plug, but thanks for asking. Thinking about that question, I noticed that the one that won't get a task is the one that I have the monitor actually plugged in to. That may or may not make a difference since I rebooted with the monitor unplugged from the PC completely and it still won't load a third task.

I have been watching the Event Log as I try to Update for new tasks and start and stop the BOINC Manager and I noticed that even when I Update for a new task, it reads
12/21/2014 1:03:20 PM | GPUGRID | Sending scheduler request: Requested by user.
12/21/2014 1:03:20 PM | GPUGRID | Not requesting tasks
12/21/2014 1:03:22 PM | GPUGRID | Scheduler request completed
This leads me to believe that the issue is in the program itself and not with the hardware. This may be a false lead, but it is not a big leap to get to that conclusion either. If the program sees 3 GPUs
12/21/2014 12:56:04 PM |  | CUDA: NVIDIA GPU 0: GeForce GTX 780 (driver version 344.75, CUDA version 6.5, compute capability 3.5, 3072MB, 2779MB available, 4878 GFLOPS peak)
12/21/2014 12:56:04 PM |  | CUDA: NVIDIA GPU 1: GeForce GTX 780 (driver version 344.75, CUDA version 6.5, compute capability 3.5, 3072MB, 2809MB available, 4698 GFLOPS peak)
12/21/2014 12:56:04 PM |  | CUDA: NVIDIA GPU 2: GeForce GTX 780 (driver version 344.75, CUDA version 6.5, compute capability 3.5, 3072MB, 2809MB available, 4576 GFLOPS peak)
12/21/2014 12:56:04 PM |  | OpenCL: NVIDIA GPU 0: GeForce GTX 780 (driver version 344.75, device version OpenCL 1.1 CUDA, 3072MB, 2779MB available, 4878 GFLOPS peak)
12/21/2014 12:56:04 PM |  | OpenCL: NVIDIA GPU 1: GeForce GTX 780 (driver version 344.75, device version OpenCL 1.1 CUDA, 3072MB, 2809MB available, 4698 GFLOPS peak)
12/21/2014 12:56:04 PM |  | OpenCL: NVIDIA GPU 2: GeForce GTX 780 (driver version 344.75, device version OpenCL 1.1 CUDA, 3072MB, 2809MB available, 4576 GFLOPS peak)
12/21/2014 12:56:04 PM |  | Host name: BeastMode
12/21/2014 12:56:04 PM |  | Processor: 12 GenuineIntel        Intel(R) Core(TM) i7-4960X CPU @ 3.60GHz [Family 6 Model 62 Stepping 4]
12/21/2014 12:56:04 PM |  | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 popcnt aes f16c rdrandsyscall nx lm avx vmx tm2 dca pbe fsgsbase smep
12/21/2014 12:56:04 PM |  | OS: Microsoft Windows 8.1: Professional x64 Edition, (06.03.9600.00)
12/21/2014 12:56:04 PM |  | Memory: 63.94 GB physical, 107.43 GB virtual
12/21/2014 12:56:04 PM |  | Disk: 465.42 GB total, 337.59 GB free
12/21/2014 12:56:04 PM |  | Local time is UTC -5 hours
12/21/2014 12:56:04 PM |  | Config: report completed tasks immediately
12/21/2014 12:56:04 PM |  | Config: use all coprocessors
12/21/2014 12:56:04 PM |  | Config: fetch minimal work
12/21/2014 12:56:04 PM |  | Config: fetch on update
12/21/2014 12:56:04 PM | GPUGRID | URL http://www.gpugrid.net/; Computer ID xxxxxx; resource share 100
12/21/2014 12:56:04 PM | GPUGRID | General prefs: from GPUGRID (last modified 19-Dec-2014 18:57:09)
12/21/2014 12:56:04 PM | GPUGRID | Computer location: home
12/21/2014 12:56:04 PM | GPUGRID | General prefs: no separate prefs for home; using your defaults
12/21/2014 12:56:04 PM |  | Preferences:
12/21/2014 12:56:04 PM |  | max memory usage when active: 65470.82MB
12/21/2014 12:56:04 PM |  | max memory usage when idle: 65470.82MB
12/21/2014 12:56:04 PM |  | max disk usage: 232.71GB
12/21/2014 12:56:04 PM |  | max CPUs used: 1
12/21/2014 12:56:04 PM |  | (to change preferences, visit a project web site or select Preferences in the Manager)
12/21/2014 12:56:04 PM |  | Not using a proxy
but won't get tasks for them all, then a hardware issue seems less likely than something in the code itself or a setting I am just missing. I do have it set on the site to fetch work for 5 days, but I am not sure if that setting is only valid if you have other connection settings set?
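
The GPU detection lines in a startup log like the one above can also be checked mechanically. This is a small Python sketch, assuming the Event Log has been copied into a string. The regular expression is written against the line format shown in this post and may need adjusting for other BOINC versions; the function name is made up.

```python
import re

# Matches lines such as "... |  | CUDA: NVIDIA GPU 0: GeForce GTX 780 (driver version ..."
CUDA_GPU_LINE = re.compile(r"\|\s*CUDA: NVIDIA GPU (\d+): ([^(]+?) \(")


def detected_cuda_gpus(log_text):
    """Map each CUDA device number found in the log to its reported name."""
    return {int(m.group(1)): m.group(2) for m in CUDA_GPU_LINE.finditer(log_text)}


# Lines copied from the Event Log in this post.
SAMPLE_LOG = """\
12/21/2014 12:56:04 PM |  | CUDA: NVIDIA GPU 0: GeForce GTX 780 (driver version 344.75, CUDA version 6.5, compute capability 3.5, 3072MB, 2779MB available, 4878 GFLOPS peak)
12/21/2014 12:56:04 PM |  | CUDA: NVIDIA GPU 1: GeForce GTX 780 (driver version 344.75, CUDA version 6.5, compute capability 3.5, 3072MB, 2809MB available, 4698 GFLOPS peak)
12/21/2014 12:56:04 PM |  | CUDA: NVIDIA GPU 2: GeForce GTX 780 (driver version 344.75, CUDA version 6.5, compute capability 3.5, 3072MB, 2809MB available, 4576 GFLOPS peak)
12/21/2014 12:56:04 PM |  | OpenCL: NVIDIA GPU 0: GeForce GTX 780 (driver version 344.75, device version OpenCL 1.1 CUDA, 3072MB, 2779MB available, 4878 GFLOPS peak)
"""

gpus = detected_cuda_gpus(SAMPLE_LOG)
print(len(gpus), "CUDA devices detected:", gpus)
```

Detection only shows that the client sees the cards. It says nothing about whether work is being requested or scheduled for them.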

I will check the ASUS website for issues with 3 GPUs.

If I did find a new bug with using 3 GPUs, how/to whom would I report such a thing? Is this forum enough for them to see that and respond, test, or fix the issue?

Jacob Klein
Message 39250 - Posted: 21 Dec 2014, 21:11:52 UTC
Last modified: 21 Dec 2014, 21:29:24 UTC

There is a lot of confusion in this thread.

I run 3 GPUs, and have no work fetch issues with them while running BOINC 7.4.36.

Manipulating slots directories, or running executables directly, or editing projects/slots .xml files, is wrong wrong wrong. Don't do it.

I see you have BOINC showing 3 GPUs in the Event Log when BOINC starts up. That's good. So, it is finding them. Is the concern that you have downloaded tasks that won't run? Or is the concern that it won't even download 3 tasks?

Assuming it is a work fetch concern... Okay, edit your cc_config.xml and turn on <work_fetch_debug>, then restart BOINC, then show us what a work fetch iteration looks like.
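
Assuming a cc_config.xml like the one posted earlier in this thread, the flag goes inside a <log_flags> section of that file. Here is a hedged Python sketch that adds it to existing file contents; `enable_log_flag` is a hypothetical helper name, not part of BOINC.

```python
import xml.etree.ElementTree as ET


def enable_log_flag(cc_config_text, flag):
    """Return cc_config.xml text with <flag>1</flag> set inside <log_flags>."""
    root = ET.fromstring(cc_config_text)
    log_flags = root.find("log_flags")
    if log_flags is None:
        # Create the section if the file only has <options> so far.
        log_flags = ET.SubElement(root, "log_flags")
    node = log_flags.find(flag)
    if node is None:
        node = ET.SubElement(log_flags, flag)
    node.text = "1"
    return ET.tostring(root, encoding="unicode")


existing = "<cc_config><options><use_all_gpus>1</use_all_gpus></options></cc_config>"
print(enable_log_flag(existing, "work_fetch_debug"))
```

Existing options are left untouched, so the same call can be repeated for other flags such as sched_op_debug.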

Also, please note that, at the moment, GPUGrid is on fumes, and may not be able to provide GPU work for every request. If you are attached to other projects that have GPU apps, you should be able to get GPU work from them.

The best way for us to help you diagnose a work fetch behavior, is to turn on work_fetch_debug and give us some output to look at. I'm an expert at looking at work fetch output. If you'd like to try to translate it yourself, feel free to have a look at this post:
http://www.bitcoinutopia.net/bitcoinutopia/forum_thread.php?id=691&postid=7369

Once you give us some debug output, we can try to help you further.

PS: If you are still up for doing a TeamViewer session, I would be willing to connect and take a peek. I'm an excellent troubleshooter, usually. Send me a Private Message with details.

Regards,
Jacob

caffeineyellow5
Message 39252 - Posted: 22 Dec 2014, 2:52:17 UTC - in response to Message 39250.  
Last modified: 22 Dec 2014, 2:53:16 UTC

{{{Warning, long answer on its way.}}}

Thank you very much for your reply Jacob. I think I caused some of the confusion by being ignorant of what information to provide, some because of my ignorance of how BOINC works compared to other distributed projects I have worked on in the past, and some because of my fondness and willingness to troubleshoot and tinker to fix things in order to understand them rather than understand them in order to troubleshoot and tinker. Some of the confusion was also caused by replies to my issues with good answers that I just didn't understand or did not apply.

I think I understand your information better than much of the answers and help that I have gotten so far. Deciphering (not writing) code and reading the manual are two of my strong points because of my previous job working with the coders and doing customer support for a project I learned first for the company after our company acquired a different company that relied more on code than on Windows front end programs. I was pretty much in charge of helping rewrite the manual while simultaneously going through the manual with the product in hand making sure what the manual said is what the product did. Then I was tasked with teaching much of the rest of the staff on the product line that the company eventually made its main line for years. After that, I did customer service and troubleshooting along with working back and forth between the coders and the CS dept on bugs, new issues, upgrades, and old versions. So reading your information about logs and reading the linked 'man' pages on scheduling, I know much of why I confused people in my requests and why they were confused with my answers.

So I turned off the BOINC client. Then I turned on <work_fetch_debug> and thought that, while I was at it, I would turn on <sched_op_debug> too, to see the output of both. Here is the result:
12/21/2014 8:35:17 PM |  | Starting BOINC client version 7.4.27 for windows_x86_64
12/21/2014 8:35:17 PM |  | log flags: file_xfer, sched_ops, task, sched_op_debug, slot_debug, task_debug
12/21/2014 8:35:17 PM |  | log flags: work_fetch_debug
12/21/2014 8:35:17 PM |  | Libraries: libcurl/7.33.0 OpenSSL/1.0.1h zlib/1.2.8
12/21/2014 8:35:17 PM |  | Data directory: C:\ProgramData\BOINC
12/21/2014 8:35:17 PM |  | Running under account Mike
12/21/2014 8:35:17 PM |  | CUDA: NVIDIA GPU 0: GeForce GTX 780 (driver version 344.75, CUDA version 6.5, compute capability 3.5, 3072MB, 2665MB available, 4878 GFLOPS peak)
12/21/2014 8:35:17 PM |  | CUDA: NVIDIA GPU 1: GeForce GTX 780 (driver version 344.75, CUDA version 6.5, compute capability 3.5, 3072MB, 2809MB available, 4698 GFLOPS peak)
12/21/2014 8:35:17 PM |  | CUDA: NVIDIA GPU 2: GeForce GTX 780 (driver version 344.75, CUDA version 6.5, compute capability 3.5, 3072MB, 2809MB available, 4576 GFLOPS peak)
12/21/2014 8:35:17 PM |  | OpenCL: NVIDIA GPU 0: GeForce GTX 780 (driver version 344.75, device version OpenCL 1.1 CUDA, 3072MB, 2665MB available, 4878 GFLOPS peak)
12/21/2014 8:35:17 PM |  | OpenCL: NVIDIA GPU 1: GeForce GTX 780 (driver version 344.75, device version OpenCL 1.1 CUDA, 3072MB, 2809MB available, 4698 GFLOPS peak)
12/21/2014 8:35:17 PM |  | OpenCL: NVIDIA GPU 2: GeForce GTX 780 (driver version 344.75, device version OpenCL 1.1 CUDA, 3072MB, 2809MB available, 4576 GFLOPS peak)
12/21/2014 8:35:17 PM |  | Host name: BeastMode
12/21/2014 8:35:17 PM |  | Processor: 12 GenuineIntel        Intel(R) Core(TM) i7-4960X CPU @ 3.60GHz [Family 6 Model 62 Stepping 4]
12/21/2014 8:35:17 PM |  | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 popcnt aes f16c rdrandsyscall nx lm avx vmx tm2 dca pbe fsgsbase smep
12/21/2014 8:35:17 PM |  | OS: Microsoft Windows 8.1: Professional x64 Edition, (06.03.9600.00)
12/21/2014 8:35:17 PM |  | Memory: 63.94 GB physical, 107.43 GB virtual
12/21/2014 8:35:17 PM |  | Disk: 465.42 GB total, 337.70 GB free
12/21/2014 8:35:17 PM |  | Local time is UTC -5 hours
12/21/2014 8:35:17 PM |  | Config: report completed tasks immediately
12/21/2014 8:35:17 PM |  | Config: use all coprocessors
12/21/2014 8:35:17 PM |  | Config: fetch minimal work
12/21/2014 8:35:17 PM |  | Config: fetch on update
12/21/2014 8:35:17 PM | GPUGRID | URL http://www.gpugrid.net/; Computer ID 189656; resource share 100
12/21/2014 8:35:17 PM | GPUGRID | General prefs: from GPUGRID (last modified 21-Dec-2014 12:59:37)
12/21/2014 8:35:17 PM | GPUGRID | Computer location: home
12/21/2014 8:35:17 PM | GPUGRID | General prefs: no separate prefs for home; using your defaults
12/21/2014 8:35:17 PM |  | Preferences:
12/21/2014 8:35:17 PM |  | max memory usage when active: 65470.82MB
12/21/2014 8:35:17 PM |  | max memory usage when idle: 65470.82MB
12/21/2014 8:35:17 PM |  | max disk usage: 232.71GB
12/21/2014 8:35:17 PM |  | (to change preferences, visit a project web site or select Preferences in the Manager)
12/21/2014 8:35:17 PM |  | [work_fetch] Request work fetch: Prefs update
12/21/2014 8:35:17 PM |  | [work_fetch] Request work fetch: Startup
12/21/2014 8:35:17 PM |  | Not using a proxy
12/21/2014 8:35:18 PM |  | [work_fetch] ------- start work fetch state -------
12/21/2014 8:35:18 PM |  | [work_fetch] target work buffer: 180.00 + 432000.00 sec
12/21/2014 8:35:18 PM |  | [work_fetch] --- project states ---
12/21/2014 8:35:18 PM | GPUGRID | [work_fetch] REC 355175.655 prio -1.000 can request work
12/21/2014 8:35:18 PM |  | [work_fetch] --- state for CPU ---
12/21/2014 8:35:18 PM |  | [work_fetch] shortfall 5186160.00 nidle 12.00 saturated 0.00 busy 0.00
12/21/2014 8:35:18 PM | GPUGRID | [work_fetch] share 1.000
12/21/2014 8:35:18 PM |  | [work_fetch] --- state for NVIDIA GPU ---
12/21/2014 8:35:18 PM |  | [work_fetch] shortfall 1296540.00 nidle 3.00 saturated 0.00 busy 0.00
12/21/2014 8:35:18 PM | GPUGRID | [work_fetch] share 1.000
12/21/2014 8:35:18 PM |  | [work_fetch] ------- end work fetch state -------
12/21/2014 8:35:18 PM | GPUGRID | [sched_op] Starting scheduler request
12/21/2014 8:35:18 PM | GPUGRID | [work_fetch] request: CPU (1.00 sec, 12.00 inst) NVIDIA GPU (1.00 sec, 3.00 inst)
12/21/2014 8:35:18 PM | GPUGRID | Sending scheduler request: To fetch work.
12/21/2014 8:35:18 PM | GPUGRID | Requesting new tasks for CPU and NVIDIA GPU
12/21/2014 8:35:18 PM | GPUGRID | [sched_op] CPU work request: 1.00 seconds; 12.00 devices
12/21/2014 8:35:18 PM | GPUGRID | [sched_op] NVIDIA GPU work request: 1.00 seconds; 3.00 devices
12/21/2014 8:35:20 PM | GPUGRID | Scheduler request completed: got 0 new tasks
12/21/2014 8:35:20 PM | GPUGRID | [sched_op] Server version 613
12/21/2014 8:35:20 PM | GPUGRID | No tasks sent
12/21/2014 8:35:20 PM | GPUGRID | No tasks are available for Short runs (2-3 hours on fastest card)
12/21/2014 8:35:20 PM | GPUGRID | No tasks are available for ACEMD beta version
12/21/2014 8:35:20 PM | GPUGRID | No tasks are available for Long runs (8-12 hours on fastest card)
12/21/2014 8:35:20 PM | GPUGRID | No tasks are available for the applications you have selected.
12/21/2014 8:35:20 PM | GPUGRID | Project requested delay of 31 seconds
12/21/2014 8:35:20 PM | GPUGRID | [slot] linked projects/www.gpugrid.net/logogpugrid.png to projects/www.gpugrid.net/stat_icon
12/21/2014 8:35:20 PM | GPUGRID | [slot] linked projects/www.gpugrid.net/project_1.png to projects/www.gpugrid.net/slideshow_ga_00
12/21/2014 8:35:20 PM | GPUGRID | [slot] linked projects/www.gpugrid.net/project_1.png to projects/www.gpugrid.net/slideshow_cellmd_00
12/21/2014 8:35:20 PM | GPUGRID | [slot] linked projects/www.gpugrid.net/project_2.png to projects/www.gpugrid.net/slideshow_ga_01
12/21/2014 8:35:20 PM | GPUGRID | [slot] linked projects/www.gpugrid.net/project_2.png to projects/www.gpugrid.net/slideshow_cellmd_01
12/21/2014 8:35:20 PM | GPUGRID | [slot] linked projects/www.gpugrid.net/project_3.png to projects/www.gpugrid.net/slideshow_ga_02
12/21/2014 8:35:20 PM | GPUGRID | [slot] linked projects/www.gpugrid.net/project_3.png to projects/www.gpugrid.net/slideshow_cellmd_02
12/21/2014 8:35:20 PM | GPUGRID | [work_fetch] backing off CPU 580 sec
12/21/2014 8:35:20 PM | GPUGRID | [work_fetch] backing off NVIDIA GPU 312 sec
12/21/2014 8:35:20 PM | GPUGRID | [sched_op] Deferring communication for 00:00:31
12/21/2014 8:35:20 PM | GPUGRID | [sched_op] Reason: requested by project
12/21/2014 8:35:20 PM |  | [work_fetch] Request work fetch: RPC complete
12/21/2014 8:35:52 PM |  | [work_fetch] Request work fetch: Backoff ended for GPUGRID

As you can tell from the log, there is something I should have mentioned earlier: how I use BOINC. I am not sure I made it clear in everything I have written, or maybe I did but it was scattered across several posts:
The ONLY project I run with BOINC is GPUGRID. Since starting to troubleshoot, I have made all of my computer's resources available to the BOINC client, now knowing that the GPUGRID project makes very little use of my CPUs, my memory (virtual or physical), my network resources, or my drive space, and what little it does need, it has plenty to draw from without denting anything else at all. I do have a CPU-intensive distributed project running from distributed.net, and it does NOT use the BOINC client at all; it is a completely separate install. The only other distributed project I ever worked on before distributed.net was the United Devices project that ran under several names such as grid.org, Intel's Crunch for the Cure, and UD.com/uniteddevices.org. That was one of the earlier publicly accessible distributed projects, and it also was not BOINC but a stand-alone install. So, this being my first and only BOINC project, I was not aware of project priorities (which may add to your confusion about what is causing my issue, but your input still helped and hopefully will continue [and conclude] immensely), how work fetch even works, or why I was asked about my CPU availability when GPUGRID seems to use only about 1% for each running task anyway.

So looking at the logs and the results after both debug flags were turned on, it seems that it is asking for work for 3 GPUs and that it sees all 3 GPUs, with or without the debug flags. I have not tinkered enough to see how many tasks it has actually stored in memory or on the hard drive to work on, but as I mentioned earlier in this thread, I once was able to "Suspend" (pause) one task and another one started. In that one instance, I was not able to suspend any more tasks and get more to start. I assumed that was because it knew I only had 3 GPUs, so it would not allow 4 active tasks, even if some were "Suspended", but now I think it may be because it only collected 3 tasks when fetching, so it only had 3 to work with until they were done. But the question still remains: why will only 2 GPUs work on tasks at one time, even when it has 3 tasks to work on and knows I have 3 GPUs to work them on?

Additional: During the past 2 days I have also uninstalled and reinstalled the BOINC client completely, so as to undo any tinkering and troubleshooting I had done. I know some information gets passed to the servers and in turn passed back down to the client, but I think those things relate more to practical use than to my experimental troubleshooting.

Also, yeah, I did confirm that manually adding slots and copying files and running from the command line DOES return tasks that are either a complete error or cannot be validated. So yeah, learned my lesson there on that troubleshooting/tinkering escapade.

To answer one of your direct questions (and hopefully you have already figured out the answer):
"So, it is finding them. Is the concern that you have downloaded tasks that won't run? Or is the concern that it won't even download 3 tasks?"
The answer is no. lol My issue is not tasks that download and never run. My issue is not the client failing to download 3 tasks. The issue is that it may download 3 tasks, but it never runs more than 2 at one time. It will load 1 task on one GPU and then a second task on a second GPU, and then not run a third task while those other 2 are running. So most times (when not on holiday) it is running 2 tasks, one each on 2 different GPUs, and it never runs 3. Occasionally it is running only 1, because once the first 2 tasks finish, the third task has to complete before the client gets more tasks. So if it downloads 2, it will run those 2 until both are done. If it downloads 3, it will finish all 3 before getting any more. If it downloads 1, it will go get a second one, but will then not get more until both are done. I hope that covers all the variations I have witnessed. I realize now that it would seem my "min" is 1 and my "max" is 3 (based on the amount of resources available when work_fetch does its evaluations). I also may not have changed report_results_immediately before the holiday slowdown; that may or may not affect the work fetch process, or it may only concern how results are reported for 'scoring' purposes.

You say you are running BOINC 7.4.36, but according to http://boinc.berkeley.edu/download_all.php?xml=1 the recommended Windows 64-bit version is 7.4.27 and that is the version I am running. Should I find an update or run the 32bit version, which seems to have a higher version number in order to try to fix this? Or a Beta version that I don't have?

I realize that now, while work units from GPUGRID are tough to find, may not be the best time to get you in here to troubleshoot, as I am not working any tasks at all. When the holidays are over I will certainly be back all over this and ready to let someone take a personal look. If you think you can figure it out without actual tasks loaded, I am willing to have you take a look.

A question sort of off topic: when I was doing the work for UD/grid.org, they would set a minimum (with no max) on how many times any one work unit would be sent out. Many of them were probably run hundreds of times. The reasons for this were error reduction through getting consistent results from different end users (reducing "jitter"), the fact that some end users would take too long or not return results at all, and that during times (like this holiday) when they would all simply be away, they would let the servers give out copies of the same tasks over and over. Why doesn't GPUGRID do this? The first answer that comes to mind is that BOINC has so many projects running that a great majority of the users could get active tasks from so many other sources that some time off from GPUGRID won't even get noticed. But I would think that as long as somebody somewhere wants to work on your project, you should keep feeding them work, even if just for validation and jitter reduction reasons.

Mike
ID: 39252
Jacob Klein

Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Message 39253 - Posted: 22 Dec 2014, 3:45:05 UTC - in response to Message 39252.  
Last modified: 22 Dec 2014, 3:52:25 UTC

Ok, nice answer :) I see you have a technical background, that's good. I'll give you answers that are hopefully at the right "level" of making sense for you. PS: Because of the GPUGrid work shortage, I won't be able to conclusively tell you what the problem is. But I am willing to troubleshoot this as long as it takes to help you solve it.

Here goes.

BOINC is primarily meant to be "set it and forget it". Configure your settings, attach to some projects, and let it do its thing. The Advanced view has tabs for Projects and Tasks, and you should familiarize yourself with the buttons there, especially the Tasks grid. When a GPU task is running, the task's status will include something like "Running (0.667 CPUs + 1 NVIDIA GPU (device 0))", telling you how much CPU it is budgeting as "used" for the GPU task, and which GPU it is running on.
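
Since the status text names the device, it can be used to spot which GPU is sitting idle. The following is only a rough Python sketch, assuming the status strings look exactly like the example quoted above (the wording may differ between BOINC versions); the helper names and the "Ready to start" sample status are made up for illustration.

```python
import re

# Pattern for the status text quoted above. This is an assumption based on
# that single example; other BOINC versions may word it differently.
STATUS_RE = re.compile(
    r"Running \((?P<cpus>[\d.]+) CPUs \+ (?P<gpus>[\d.]+) NVIDIA GPUs? "
    r"\(device (?P<device>\d+)\)\)"
)

def parse_status(status):
    """Return (cpu_budget, gpu_count, device_index), or None if no match."""
    m = STATUS_RE.search(status)
    if m is None:
        return None
    return float(m.group("cpus")), float(m.group("gpus")), int(m.group("device"))

def idle_devices(statuses, device_count):
    """List the GPU device indexes that no running task reports using."""
    busy = set()
    for status in statuses:
        parsed = parse_status(status)
        if parsed is not None:
            busy.add(parsed[2])
    return [d for d in range(device_count) if d not in busy]

# Hypothetical Tasks grid contents for a 3-GPU host running only 2 tasks.
statuses = [
    "Running (0.667 CPUs + 1 NVIDIA GPU (device 0))",
    "Running (0.667 CPUs + 1 NVIDIA GPU (device 1))",
    "Ready to start",
]
print(idle_devices(statuses, 3))
```

With the sample statuses, device 2 is the one reported as idle, which matches the symptom described in this thread.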

I'd also recommend attaching to more projects than just GPUGrid, and setting up your "Use at most X% of the processors" computing preference to be equal to the number of threads you'd like to let BOINC manage.

Looking at your log posting, I see...

Starting BOINC client version 7.4.27
... If you'd like to upgrade to the latest release candidates (I do recommend 7.4.36), feel free to bookmark:
http://boinc.berkeley.edu/download_all.php

12/21/2014 8:35:17 PM | | CUDA: NVIDIA GPU 0: GeForce GTX 780 (driver version 344.75, CUDA version 6.5, compute capability 3.5, 3072MB, 2665MB available, 4878 GFLOPS peak)
12/21/2014 8:35:17 PM | | CUDA: NVIDIA GPU 1: GeForce GTX 780 (driver version 344.75, CUDA version 6.5, compute capability 3.5, 3072MB, 2809MB available, 4698 GFLOPS peak)
12/21/2014 8:35:17 PM | | CUDA: NVIDIA GPU 2: GeForce GTX 780 (driver version 344.75, CUDA version 6.5, compute capability 3.5, 3072MB, 2809MB available, 4576 GFLOPS peak)
12/21/2014 8:35:17 PM | | OpenCL: NVIDIA GPU 0: GeForce GTX 780 (driver version 344.75, device version OpenCL 1.1 CUDA, 3072MB, 2665MB available, 4878 GFLOPS peak)
12/21/2014 8:35:17 PM | | OpenCL: NVIDIA GPU 1: GeForce GTX 780 (driver version 344.75, device version OpenCL 1.1 CUDA, 3072MB, 2809MB available, 4698 GFLOPS peak)
12/21/2014 8:35:17 PM | | OpenCL: NVIDIA GPU 2: GeForce GTX 780 (driver version 344.75, device version OpenCL 1.1 CUDA, 3072MB, 2809MB available, 4576 GFLOPS peak)
... perfectly seeing your 3 GPUs (note: driver 347.09 beta is available now, and appears to work fine in BOINC)

Damn nice machine! 12-threads, 64GB RAM, 3 GTX 780 GPUs -- #Jealous

12/21/2014 8:35:18 PM | | [work_fetch] ------- start work fetch state -------
12/21/2014 8:35:18 PM | | [work_fetch] target work buffer: 180.00 + 432000.00 sec
... I see your buffer settings are "maintain at least 0 days" (BOINC enforces a 3-minute low-water-mark to allow some time to ask projects for work, which is why you see 180.00 there)
... and "allow an additional 432000 secs" = 5 days of cache

12/21/2014 8:35:18 PM | | [work_fetch] --- project states ---
12/21/2014 8:35:18 PM | GPUGRID | [work_fetch] REC 355175.655 prio -1.000 can request work
... "can request work" is good. BOINC would say "can't request work" and give a reason, if a reason applied. Some reasons are: Project set to No New Tasks, Project set to Suspended, or one of the Project's tasks is suspended (that's right, it will not request more work from a Project, if you have a suspended task for that project).

12/21/2014 8:35:18 PM | | [work_fetch] --- state for CPU ---
12/21/2014 8:35:18 PM | | [work_fetch] shortfall 5186160.00 nidle 12.00 saturated 0.00 busy 0.00
... 12 CPUs, times 432180 high-water-mark-per-resource, equals 5186160.00 instance-seconds of shortfall. nidle (number idle) shows 12 idle CPUs.

12/21/2014 8:35:18 PM | | [work_fetch] --- state for NVIDIA GPU ---
12/21/2014 8:35:18 PM | | [work_fetch] shortfall 1296540.00 nidle 3.00 saturated 0.00 busy 0.00
... 3 GPUs, times 432180 high-water-mark-per-resource, equals 1296540.00 instance-seconds of shortfall. nidle (number idle) shows 3 idle NVIDIA GPUs.
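
The shortfall numbers can be reproduced with a few lines of Python. This is just the arithmetic described above, assuming every instance is idle with an empty queue (as the nidle values show); the function name is made up and this is not BOINC's actual work fetch code.

```python
MIN_BUFFER_SEC = 180.0       # low-water mark shown in the log
EXTRA_BUFFER_SEC = 432000.0  # additional buffer shown in the log

def idle_shortfall(instances, min_buffer=MIN_BUFFER_SEC, extra_buffer=EXTRA_BUFFER_SEC):
    """Instance-seconds of shortfall when all instances are idle and nothing is queued."""
    high_water_mark = min_buffer + extra_buffer  # 432180 seconds per instance
    return instances * high_water_mark

print(EXTRA_BUFFER_SEC / 86400)  # additional buffer expressed in days
print(idle_shortfall(12))        # CPU: 12 idle threads
print(idle_shortfall(3))         # NVIDIA GPU: 3 idle cards
```

The two results match the 5186160.00 and 1296540.00 values in the log lines above.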

12/21/2014 8:35:18 PM | | [work_fetch] ------- end work fetch state -------
... time for work fetch to decide if it should ask a project for work

12/21/2014 8:35:18 PM | GPUGRID | [sched_op] Starting scheduler request
12/21/2014 8:35:18 PM | GPUGRID | [work_fetch] request: CPU (1.00 sec, 12.00 inst) NVIDIA GPU (1.00 sec, 3.00 inst)
12/21/2014 8:35:18 PM | GPUGRID | Sending scheduler request: To fetch work.
12/21/2014 8:35:18 PM | GPUGRID | Requesting new tasks for CPU and NVIDIA GPU
... it has decided to ask GPUGRID for 12 CPU instances, and 3 NVIDIA GPU instances. I'm a bit curious why the sec values are only 1.00. I would have expected the full shortfalls on each. But this is okay for now.

12/21/2014 8:35:18 PM | GPUGRID | [sched_op] CPU work request: 1.00 seconds; 12.00 devices
12/21/2014 8:35:18 PM | GPUGRID | [sched_op] NVIDIA GPU work request: 1.00 seconds; 3.00 devices
... The "scheduler" (a BOINC Project server-side-process) received the request successfully.

12/21/2014 8:35:20 PM | GPUGRID | Scheduler request completed: got 0 new tasks
12/21/2014 8:35:20 PM | GPUGRID | [sched_op] Server version 613
12/21/2014 8:35:20 PM | GPUGRID | No tasks sent
12/21/2014 8:35:20 PM | GPUGRID | No tasks are available for Short runs (2-3 hours on fastest card)
12/21/2014 8:35:20 PM | GPUGRID | No tasks are available for ACEMD beta version
12/21/2014 8:35:20 PM | GPUGRID | No tasks are available for Long runs (8-12 hours on fastest card)
12/21/2014 8:35:20 PM | GPUGRID | No tasks are available for the applications you have selected.
12/21/2014 8:35:20 PM | GPUGRID | Project requested delay of 31 seconds
... and has replied that it has no tasks for either of the resources, per your selections. NOTE: GPUGrid actually DOES have CPU tasks now, with their "Test application for CPU MD" multi-threaded application. You could edit your web preferences, to turn on that app (might also have to check the "Run test applications?" checkbox too), if you'd like to run CPU tasks from GPUGrid. I'm sure they'd appreciate your CPU support.

12/21/2014 8:35:20 PM | GPUGRID | [work_fetch] backing off CPU 580 sec
12/21/2014 8:35:20 PM | GPUGRID | [work_fetch] backing off NVIDIA GPU 312 sec
12/21/2014 8:35:20 PM | GPUGRID | [sched_op] Deferring communication for 00:00:31
12/21/2014 8:35:20 PM | GPUGRID | [sched_op] Reason: requested by project
12/21/2014 8:35:20 PM | | [work_fetch] Request work fetch: RPC complete
... Because you needed work for CPU and didn't get any from this Project, BOINC enforces a semi-random "resource backoff timer" for that Project. 580 secs, in this case.
... Because you needed work for NVIDIA GPU and didn't get any, it also enforced a backoff of 312 secs for that resource.
... And the server said "don't come back here for at least 31 seconds please" :)
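
If you want to pull these timers out of a saved event log instead of reading it by eye, something like the following Python sketch could work. It assumes the "timestamp | project | message" layout and the exact message wording shown in the lines above; the function names are made up, and other BOINC versions may format these messages differently.

```python
import re

# Message patterns copied from the log lines above (an assumption about format).
BACKOFF_RE = re.compile(r"\[work_fetch\] backing off (?P<resource>.+?) (?P<seconds>\d+) sec$")
DEFER_RE = re.compile(r"\[sched_op\] Deferring communication for (\d+):(\d+):(\d+)$")

def parse_line(line):
    """Split 'timestamp | project | message' into three stripped fields."""
    timestamp, project, message = (part.strip() for part in line.split("|", 2))
    return timestamp, project, message

def summarize(lines):
    """Collect per-resource backoffs and the project-requested deferral, in seconds."""
    backoffs = {}
    deferral = None
    for line in lines:
        _, project, message = parse_line(line)
        m = BACKOFF_RE.search(message)
        if m:
            backoffs[(project, m.group("resource"))] = int(m.group("seconds"))
            continue
        m = DEFER_RE.search(message)
        if m:
            hours, minutes, seconds = (int(x) for x in m.groups())
            deferral = hours * 3600 + minutes * 60 + seconds
    return backoffs, deferral

log_lines = [
    "12/21/2014 8:35:20 PM | GPUGRID | [work_fetch] backing off CPU 580 sec",
    "12/21/2014 8:35:20 PM | GPUGRID | [work_fetch] backing off NVIDIA GPU 312 sec",
    "12/21/2014 8:35:20 PM | GPUGRID | [sched_op] Deferring communication for 00:00:31",
    "12/21/2014 8:35:20 PM | | [work_fetch] Request work fetch: RPC complete",
]
backoffs, deferral = summarize(log_lines)
print(backoffs)
print(deferral)
```

For the four sample lines this reports a 580 second CPU backoff, a 312 second NVIDIA GPU backoff, and a 31 second deferral, the same three timers explained above.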

Sorry, but it is a bad time to be troubleshooting this with GPUGrid, since they are "on fumes" in terms of having GPU tasks available. See the Server Status "unsent" column here: http://www.gpugrid.net/server_status.php

Once you get 3 GPUGrid tasks in the Task grid, I'd like to see a screenshot of the behavior you saw. Are you sure you saw 3 GPUGrid GPU tasks listed in the Tasks grid, and BOINC was only running 2 of them? I don't know how to troubleshoot further without seeing the issue.

One other thing to note is that, I believe GPUGrid has a server-side rule that only allows "2 GPU tasks per GPU" to be on a client. For you, that means you should only ever see up-to-6 GPUGrid GPU tasks in your Task grid, regardless of your buffer settings.

Once you get 3-or-more tasks, host a screenshot and let us take a look. :) Then we might have to turn on more of those awesome debug flags to get geeky!

Oh, you asked about whether tasks get sent out multiple times. BOINC Projects set this up on a per-application basis; they can decide how many "instances" of a work unit are initially replicated, and can also set how many results must be in agreement before considering it completed, and how many error results should trigger abortion of the work unit. If you look at some tasks (here are mine: http://www.gpugrid.net/results.php?hostid=153764) and click the "Work Unit", you'll see that unit's values for "minimum quorum" (# results that must agree), "initial replication" (# tasks initially created/sent), and "max # of error/total/success tasks". GPUGrid does not do verification on their GPU apps, but most other projects choose to verify their results with at least 2 successful returns.
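
To make the quorum idea concrete, here is a toy Python sketch of how a work unit's state could be decided from returned results. It is NOT BOINC's validator: the function name, the return strings, the exact-match comparison, and the choice to abort only when errors exceed the limit are all assumptions made for illustration.

```python
from collections import Counter

def work_unit_state(results, min_quorum, max_error_results):
    """Toy decision for one work unit.

    results: list of returned outputs; None stands for an error or no return.
    min_quorum: number of identical outputs needed to accept a result.
    max_error_results: abort once more than this many results are errors.
    """
    errors = sum(1 for r in results if r is None)
    if errors > max_error_results:
        return "aborted"
    counts = Counter(r for r in results if r is not None)
    if counts:
        value, votes = counts.most_common(1)[0]
        if votes >= min_quorum:
            return "valid: " + str(value)
    return "needs more results"

# Quorum of 1 (no cross-checking): the first successful return is accepted.
print(work_unit_state(["a"], min_quorum=1, max_error_results=3))
# Quorum of 2: two hosts disagree, so another copy would have to be sent out.
print(work_unit_state(["a", "b"], min_quorum=2, max_error_results=3))
# A third result agrees with the first, so the quorum is reached.
print(work_unit_state(["a", "b", "a"], min_quorum=2, max_error_results=3))
```

A quorum of 1 corresponds to the "no verification" case described above, while a quorum of 2 corresponds to projects that want at least 2 agreeing returns.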

I too used to run Distributed.net back in the day, but I go for the science research now. You might consider attaching BOINC to more projects, like World Community Grid, Citizen Science Grid, etc.

Merry Christmas,
Jacob

PS: Here is my list of running projects (I give WCG 4x the Resource Share of my others, and I have a couple of 0-Resource-Share projects, which are "Backup Projects" to BOINC, meaning it will only get work from those if there is no work from other projects)... and here's what my Tasks grid looks like. I use an app_config.xml to tell BOINC to "consider 0.667 CPU budgeted per GPUGrid Task", and then I run with "Use at most 100% CPUs." That way, if BOINC runs 3 GPUGrid tasks, it automatically budgets ("frees up") 0.667*3=2.001 CPUs. Fun.

http://1drv.ms/13tRIpR
http://1drv.ms/1zTv9I5
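
For anyone wanting to try the app_config.xml budgeting mentioned in the PS, here is a rough Python sketch that builds such a file and repeats the CPU arithmetic. The element names follow the app_config.xml layout as I understand it from the BOINC client configuration documentation, so check them against that before relying on this; the app name "example_app" is a placeholder, and the real short name has to come from the project.

```python
import xml.etree.ElementTree as ET

def build_app_config(app_name, cpu_usage, gpu_usage=1.0):
    """Return app_config.xml text budgeting cpu_usage CPUs per GPU task."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name  # placeholder; use the project's app name
    gpu_versions = ET.SubElement(app, "gpu_versions")
    ET.SubElement(gpu_versions, "gpu_usage").text = str(gpu_usage)
    ET.SubElement(gpu_versions, "cpu_usage").text = str(cpu_usage)
    return ET.tostring(root, encoding="unicode")

def cpus_budgeted(running_gpu_tasks, cpu_usage):
    """CPUs that BOINC would treat as in use by the running GPU tasks."""
    return running_gpu_tasks * cpu_usage

xml_text = build_app_config("example_app", 0.667)
print(xml_text)
print(round(cpus_budgeted(3, 0.667), 3))  # 3 GPU tasks at 0.667 CPU each
```

The generated text would be saved as app_config.xml in the project's folder under the BOINC data directory; the client then has to re-read its config files (or be restarted) before the new budget applies.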
ID: 39253
skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 39254 - Posted: 22 Dec 2014, 10:57:04 UTC - in response to Message 39221.  

I am using nvidiaInspector's Overclocking options to do nothing but up the fan speed, but what would make you think they are too hot?


Just answering this question, if you click on a task you ran you can see the logs which include temperature:

http://www.gpugrid.net/result.php?resultid=13575551

Name I11R36-SDOERR_BARNA5-62-100-RND9235_0
Workunit 10453012
Created 21 Dec 2014 | 11:26:11 UTC
Sent 21 Dec 2014 | 11:26:33 UTC
Received 21 Dec 2014 | 17:17:11 UTC
Server state Over
Outcome Validate error
Client state Done
Exit status 0 (0x0)
Computer ID 189656
Report deadline 26 Dec 2014 | 11:26:33 UTC
Run time 20,692.87
CPU time 2,424.98
Validate state Invalid
Credit 0.00
Application version Long runs (8-12 hours on fastest card) v8.47 (cuda65)
Stderr output

<core_client_version>7.4.27</core_client_version>
<![CDATA[
<stderr_txt>
# GPU [GeForce GTX 780] Platform [Windows] Rev [3212] VERSION [65]
# SWAN Device 0 :
# Name : GeForce GTX 780
# ECC : Disabled
# Global mem : 3072MB
# Capability : 3.5
# PCI ID : 0000:01:00.0
# Device clock : 1058MHz
# Memory clock : 3104MHz
# Memory width : 384bit
# Driver version : r343_00 : 34475
# GPU 0 : 43C
# GPU 1 : 34C
# GPU 2 : 27C
# GPU 0 : 47C
# GPU 0 : 50C
# GPU 0 : 53C
# GPU 0 : 55C
# GPU 0 : 58C
# GPU 0 : 60C
# GPU 0 : 63C
# GPU 0 : 64C
# GPU 0 : 66C
# GPU 0 : 67C
# GPU 0 : 68C
# GPU 0 : 70C
# GPU 0 : 71C
# GPU 0 : 72C
# GPU 0 : 73C
# GPU 0 : 75C
# GPU 0 : 76C
# GPU 0 : 77C
# GPU 0 : 78C
# GPU 1 : 35C
# GPU 2 : 28C
# GPU 0 : 79C
# GPU 0 : 80C
# GPU 1 : 36C
# GPU 1 : 37C
# GPU 0 : 81C
# GPU 2 : 29C
# GPU 0 : 82C
# GPU 1 : 38C
# GPU 0 : 83C
# GPU 1 : 44C
# GPU 2 : 30C
# GPU 1 : 46C
# GPU 1 : 47C
# GPU 0 : 84C
# BOINC suspending at user request (exit)
# GPU [GeForce GTX 780] Platform [Windows] Rev [3212] VERSION [65]
# SWAN Device 0 :
# Name : GeForce GTX 780
# ECC : Disabled
# Global mem : 3072MB
# Capability : 3.5
# PCI ID : 0000:01:00.0
# Device clock : 1058MHz
# Memory clock : 3104MHz
# Memory width : 384bit
# Driver version : r343_00 : 34475
# GPU 0 : 66C
# GPU 1 : 42C
# GPU 2 : 30C
# GPU 0 : 68C
# GPU 0 : 70C
# GPU 0 : 71C
# GPU 0 : 72C
# GPU 0 : 73C
# GPU 0 : 74C
# GPU 0 : 75C
# GPU 0 : 76C
# GPU 1 : 43C
# GPU 0 : 77C
# GPU 0 : 78C
# GPU 2 : 31C
# GPU 0 : 79C
# GPU 0 : 80C
# GPU 1 : 44C
# GPU 2 : 32C
# GPU 1 : 45C
# GPU 2 : 33C
# GPU 2 : 34C
# GPU 1 : 46C
# GPU 2 : 38C
# GPU 2 : 43C
# GPU 2 : 46C
# GPU 2 : 49C
# GPU 2 : 51C
# GPU 2 : 54C
# GPU 2 : 56C
# GPU 2 : 58C
# GPU 2 : 60C
# GPU 2 : 61C
# GPU 2 : 63C
# GPU 1 : 49C
# GPU 2 : 64C
# GPU 0 : 81C
# GPU 1 : 57C
# GPU 2 : 66C
# GPU 1 : 61C
# GPU 2 : 67C
# GPU 1 : 64C
# GPU 2 : 68C
# GPU 1 : 68C
# GPU 2 : 69C
# GPU 1 : 71C
# GPU 2 : 70C
# GPU 0 : 82C
# GPU 1 : 74C
# GPU 2 : 71C
# GPU 1 : 76C
# GPU 2 : 72C
# GPU 1 : 77C
# GPU 0 : 83C
# GPU 1 : 78C
# GPU 1 : 79C
# GPU 2 : 73C
# GPU 0 : 84C

# GPU 1 : 80C
# GPU 0 : 85C
# GPU 0 : 86C
# GPU 0 : 87C
# GPU 0 : 88C
# GPU 0 : 89C
# GPU 0 : 90C
# GPU 0 : 91C
# GPU 0 : 92C
# GPU 0 : 93C
# GPU 0 : 94C
# GPU 0 : 95C

# GPU 1 : 81C
# GPU 1 : 82C
# GPU 0 : 96C
# GPU 1 : 83C
# GPU 2 : 74C
# GPU 1 : 84C
# GPU 2 : 75C
# GPU 1 : 85C
# GPU 2 : 76C
# GPU 1 : 86C
# GPU 2 : 77C
# GPU 1 : 87C
# GPU 1 : 88C
# Time per step (avg over 3675000 steps): 5.520 ms
# Approximate elapsed time for entire WU: 20700.631 s
# PERFORMANCE: 87466 Natoms 5.520 ns/day 0.000 ms/step 0.000 us/step/atom
12:15:44 (6544): called boinc_finish

Outcome Validate error


While BOINC might not be seeing the GPUs, the GPUGrid app clearly sees all 3 GPUs; GPU 0, 1 and 2 all appear in the output above.
Whatever the problem is there, you are not sufficiently cooling all the GPUs. 95C or 96C is dangerously high IMO, and my primary concern would be that you could damage your GPUs or other hardware. I suggest you start by working safely: hard drives don't like being cooked, and neither do motherboards, RAM modules...
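
Rather than scrolling through the stderr output by eye, the peak temperature per GPU can be pulled out with a short Python sketch like the one below. It assumes the "# GPU n : ttC" line format shown in the output above; the function names are made up, and the 85C limit in the example is an arbitrary illustration, not a manufacturer figure.

```python
import re

# Matches lines such as "# GPU 0 : 96C" but not "# GPU [GeForce GTX 780] ...".
TEMP_RE = re.compile(r"^#\s*GPU\s+(\d+)\s*:\s*(\d+)C\s*$")

def max_temps(stderr_text):
    """Return {gpu_index: highest temperature in C} found in the stderr text."""
    peaks = {}
    for line in stderr_text.splitlines():
        m = TEMP_RE.match(line.strip())
        if m:
            gpu, temp = int(m.group(1)), int(m.group(2))
            peaks[gpu] = max(temp, peaks.get(gpu, temp))
    return peaks

def over_limit(peaks, limit_c):
    """GPU indexes whose peak reached limit_c or higher."""
    return sorted(gpu for gpu, temp in peaks.items() if temp >= limit_c)

# A few lines copied from the output above.
sample = """\
# GPU [GeForce GTX 780] Platform [Windows] Rev [3212] VERSION [65]
# GPU 0 : 43C
# GPU 1 : 34C
# GPU 2 : 27C
# GPU 0 : 96C
# GPU 2 : 77C
# GPU 1 : 88C
"""
peaks = max_temps(sample)
print(peaks)
print(over_limit(peaks, 85))  # 85 is an example limit only
```

On the sample lines this reports peaks of 96C, 88C and 77C for GPU 0, 1 and 2, with GPU 0 and GPU 1 at or above the example limit.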
ID: 39254

©2025 Universitat Pompeu Fabra