Message boards : Graphics cards (GPUs) : GPU Task Performance (vs. CPU core usage, app_config, multiple GPU tasks on 1 GPU, etc.)
Joined: 21 Feb 09 · Posts: 497 · Credit: 700,690,702 · RAC: 0
I guess even you have yet to come to grips with measurable performance improvement. Note "measurable". What are the numbers? I do understand the problems of doing that, and I do commend you for your persistence, but until I see the numbers I won't be convinced. I'm hoping your results are positive.

Not looking good. The completed WU is uploading but no new WU to take its place. In fact I did not get a new WU till the completed one uploaded, like before.

But... it occurs to me: when a 2X WU completes and there is no third WU to take its place until the upload is done, does the remaining, active WU grab full control of the GPU even though it's been given only 50% to work with? I took a screenshot while the "third" WU was downloading and just one WU was running, and the GPU looks very busy!

If it is the case that the remaining WU grabs full control of the GPU, there's no contest - X2 probably wins!
Joined: 11 Oct 08 · Posts: 1127 · Credit: 1,901,927,545 · RAC: 0
Right. So, when you say <gpu_usage>0.5</gpu_usage>, all that is doing is telling BOINC how much to "consider allocated" for each task, for the purpose of deciding how many tasks to start, therefore allowing you to do 2-at-a-time. That app_config.xml setting does NOT limit the GPU usage or load in any way.

In fact, what you say is true: when one of the 2-at-a-time tasks gets done, the remaining task utilizes the GPU as if it were the only one ever running on it (to an extent; I believe I have evidence that a task gets started in a certain "mode" based on the GPU RAM available, which may possibly have an effect on GPU Load, but hopefully doesn't). You can test this by suspending certain tasks while watching GPU Load (though, I caution you, suspending a NOELIA task can crash the drivers and make GPU tasks error out, even tasks on other GPUs or tasks doing work for other projects).

So, as an example, if you had 3 GPUGrid tasks:
- Task A: normally gets 63% GPU Load
- Task B: normally gets 84% GPU Load
- Task C: normally gets 79% GPU Load; not downloaded yet

then the walkthrough of the scenario is: you are running Tasks A and B on the same GPU, getting 93% GPU Load. Task A gets done, so Task A uploads. During that upload, only Task B is running, at 84% GPU Load. Then, once Task C has downloaded, you can run both B and C together, at 98% GPU Load. Hope that makes sense. That's the behavior I'm used to seeing on other projects, and I would expect to see it here too.

Also, I see you consider 91% GPU Load "very busy". I would consider 98% to be "very busy" :)

You might be interested in running eVGA Precision X; although it was designed for overclocking, I use it as a very handy tool for showing monitoring history over time and putting GPU temps as icons in my system tray.
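For reference, a minimal app_config.xml of the kind being discussed would look something like this (the application name acemdlong is an assumption here; match it to the app names in your client_state.xml):

```xml
<app_config>
    <app>
        <!-- The name must match the <name> used in client_state.xml; -->
        <!-- acemdlong is assumed here for the long-run queue. -->
        <name>acemdlong</name>
        <gpu_versions>
            <!-- Tell BOINC each task "occupies" half a GPU, so two are started per GPU. -->
            <!-- This is only a scheduling hint; it does not cap the actual GPU load. -->
            <gpu_usage>0.5</gpu_usage>
            <!-- Reserve one CPU core per GPU task. -->
            <cpu_usage>1.0</cpu_usage>
        </gpu_versions>
    </app>
</app_config>
```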
Joined: 21 Feb 09 · Posts: 497 · Credit: 700,690,702 · RAC: 0
At 3:05am BOINC reported a failed NOELIA. No WU downloaded to replace it. It was not until 5:07am, when BOINC reported the completion of the other WU, that two NOELIAs were downloaded.
skgiven · Joined: 23 Apr 09 · Posts: 3968 · Credit: 1,995,359,260 · RAC: 0
What's your cache at?
Joined: 21 Feb 09 · Posts: 497 · Credit: 700,690,702 · RAC: 0
What's your cache at?

Tell me how I find that info... Ta.
Joined: 11 Oct 08 · Posts: 1127 · Credit: 1,901,927,545 · RAC: 0
I have concluded my test (where I had only 1 active GPU, processing 2 tasks at once with a 1.5-day minimum buffer, and wanted to see when the 3rd new task would get started). Note: I'm using BOINC v7.1.1 alpha, which includes major work-fetch changes compared to the v7.0.64 public release. Assuming I'm reading the logs below correctly, here is what I see:

- 26-May-2013 03:20:29: Computation for Task 2 finished.
- 26-May-2013 03:20:29: At this time, GPUGrid was in "resource backoff", meaning we couldn't ask it for NVIDIA work, because the last time we asked it for NVIDIA work we didn't get any (because of the maximum-2-in-progress-per-GPU server-side rule), and BOINC automatically creates an incrementing backoff timer when that happens. The resource backoff was still effective for 3314.63 secs (~55 minutes).
- 26-May-2013 03:20:43: Upload of Task 2 results started.
- 26-May-2013 03:32:59: GPUGrid was still in "resource backoff", for another 2564.79 secs (~43 minutes).
- 26-May-2013 03:33:42: Upload of Task 2 results finished.
- 26-May-2013 03:33:44: BOINC reported the GPUGrid task, and piggybacked a request for NVIDIA work (since there were no other contactable projects that supported NVIDIA work, and we still needed some). Note: this piggyback request should also happen on v7.0.64, but I cannot guarantee that.
- 26-May-2013 03:33:46: RPC completed; BOINC did get 1 new task from GPUGrid.
- 26-May-2013 03:33:48: Download of Task 3 started.
- 26-May-2013 03:34:09: Download of Task 3 finished; Task 3 started processing.

So, according to this, there was a ~14 minute "layover" where BOINC was only allowed to run 1 task on the GPU, due to GPUGrid's server-side limitation. But it did handle the scenario gracefully, did eventually get the 3rd task, and started it promptly. It worked as I expected it to, given the server-side limitation, but it's not optimal, because we should be able to keep the GPU continuously fully loaded with 2 tasks. :(

I wonder if we can convince GPUGrid to relax the limit to max-3-in-progress-per-GPU, instead of max-2-in-progress-per-GPU? In theory, that should close this gap, as it would allow the 3rd task to be downloaded and ready whenever the min-buffer says the client needs it (earlier than Task 2's completion).
Full log snippet:

26-May-2013 03:20:29 [---] [work_fetch] Request work fetch: application exited
26-May-2013 03:20:29 [GPUGRID] Computation for task I2HDQ_5R9-SDOERR_2HDQd-3-4-RND0408_1 finished
26-May-2013 03:20:29 [---] [work_fetch] work fetch start
26-May-2013 03:20:29 [---] [work_fetch] ------- start work fetch state -------
26-May-2013 03:20:29 [---] [work_fetch] target work buffer: 129600.00 + 8640.00 sec
26-May-2013 03:20:29 [---] [work_fetch] --- project states ---
26-May-2013 03:20:29 [GPUGRID] [work_fetch] REC 261498.750 prio -13.919910 can req work
26-May-2013 03:20:29 [---] [work_fetch] --- state for CPU ---
26-May-2013 03:20:29 [---] [work_fetch] shortfall 0.00 nidle 0.00 saturated 149576.75 busy 0.00
26-May-2013 03:20:29 [GPUGRID] [work_fetch] fetch share 0.000 (no apps)
26-May-2013 03:20:29 [---] [work_fetch] --- state for NVIDIA ---
26-May-2013 03:20:29 [---] [work_fetch] shortfall 122118.58 nidle 0.50 saturated 0.00 busy 0.00
26-May-2013 03:20:29 [GPUGRID] [work_fetch] fetch share 0.000 (resource backoff: 3314.63, inc 19200.00)
26-May-2013 03:20:29 [---] [work_fetch] ------- end work fetch state -------
26-May-2013 03:20:29 [---] [work_fetch] No project chosen for work fetch
26-May-2013 03:20:43 [GPUGRID] Started upload of I2HDQ_5R9-SDOERR_2HDQd-3-4-RND0408_1_0
26-May-2013 03:20:43 [GPUGRID] Started upload of I2HDQ_5R9-SDOERR_2HDQd-3-4-RND0408_1_1
26-May-2013 03:20:43 [GPUGRID] Started upload of I2HDQ_5R9-SDOERR_2HDQd-3-4-RND0408_1_2
26-May-2013 03:20:43 [GPUGRID] Started upload of I2HDQ_5R9-SDOERR_2HDQd-3-4-RND0408_1_3
26-May-2013 03:21:06 [GPUGRID] Finished upload of I2HDQ_5R9-SDOERR_2HDQd-3-4-RND0408_1_0
26-May-2013 03:21:06 [GPUGRID] Started upload of I2HDQ_5R9-SDOERR_2HDQd-3-4-RND0408_1_7
26-May-2013 03:21:07 [GPUGRID] Finished upload of I2HDQ_5R9-SDOERR_2HDQd-3-4-RND0408_1_7
26-May-2013 03:21:07 [GPUGRID] Started upload of I2HDQ_5R9-SDOERR_2HDQd-3-4-RND0408_1_9
26-May-2013 03:21:12 [GPUGRID] Finished upload of I2HDQ_5R9-SDOERR_2HDQd-3-4-RND0408_1_3
26-May-2013 03:21:12 [GPUGRID] Started upload of I2HDQ_5R9-SDOERR_2HDQd-3-4-RND0408_1_10
26-May-2013 03:21:13 [GPUGRID] Finished upload of I2HDQ_5R9-SDOERR_2HDQd-3-4-RND0408_1_10
26-May-2013 03:21:38 [GPUGRID] Finished upload of I2HDQ_5R9-SDOERR_2HDQd-3-4-RND0408_1_1
26-May-2013 03:21:38 [GPUGRID] Finished upload of I2HDQ_5R9-SDOERR_2HDQd-3-4-RND0408_1_2
26-May-2013 03:32:59 [---] [work_fetch] work fetch start
26-May-2013 03:32:59 [---] [work_fetch] ------- start work fetch state -------
26-May-2013 03:32:59 [---] [work_fetch] target work buffer: 129600.00 + 8640.00 sec
26-May-2013 03:32:59 [---] [work_fetch] --- project states ---
26-May-2013 03:32:59 [GPUGRID] [work_fetch] REC 261379.396 prio -13.917504 can req work
26-May-2013 03:32:59 [---] [work_fetch] --- state for CPU ---
26-May-2013 03:32:59 [---] [work_fetch] shortfall 0.00 nidle 0.00 saturated 148218.82 busy 0.00
26-May-2013 03:32:59 [GPUGRID] [work_fetch] fetch share 0.000 (no apps)
26-May-2013 03:32:59 [---] [work_fetch] --- state for NVIDIA ---
26-May-2013 03:32:59 [---] [work_fetch] shortfall 123046.19 nidle 0.50 saturated 0.00 busy 0.00
26-May-2013 03:32:59 [GPUGRID] [work_fetch] fetch share 0.000 (resource backoff: 2564.79, inc 19200.00)
26-May-2013 03:32:59 [---] [work_fetch] ------- end work fetch state -------
26-May-2013 03:32:59 [---] [work_fetch] No project chosen for work fetch
26-May-2013 03:33:42 [GPUGRID] Finished upload of I2HDQ_5R9-SDOERR_2HDQd-3-4-RND0408_1_9
26-May-2013 03:33:42 [---] [work_fetch] Request work fetch: project finished uploading
26-May-2013 03:33:44 [---] [work_fetch] ------- start work fetch state -------
26-May-2013 03:33:44 [---] [work_fetch] target work buffer: 129600.00 + 8640.00 sec
26-May-2013 03:33:44 [---] [work_fetch] --- project states ---
26-May-2013 03:33:44 [GPUGRID] [work_fetch] REC 261379.396 prio -13.917369 can req work
26-May-2013 03:33:44 [---] [work_fetch] --- state for CPU ---
26-May-2013 03:33:44 [---] [work_fetch] shortfall 0.00 nidle 0.00 saturated 148162.47 busy 0.00
26-May-2013 03:33:44 [GPUGRID] [work_fetch] fetch share 0.000 (no apps)
26-May-2013 03:33:44 [---] [work_fetch] --- state for NVIDIA ---
26-May-2013 03:33:44 [---] [work_fetch] shortfall 123101.89 nidle 0.50 saturated 0.00 busy 0.00
26-May-2013 03:33:44 [GPUGRID] [work_fetch] fetch share 1.000
26-May-2013 03:33:44 [---] [work_fetch] ------- end work fetch state -------
26-May-2013 03:33:44 [GPUGRID] [work_fetch] set_request() for NVIDIA: ninst 1 nused_total 0.500000 nidle_now 0.500000 fetch share 1.000000 req_inst 0.500000 req_secs 123101.894609
26-May-2013 03:33:44 [GPUGRID] [work_fetch] request: CPU (0.00 sec, 0.00 inst) NVIDIA (123101.89 sec, 0.50 inst)
26-May-2013 03:33:44 [GPUGRID] Sending scheduler request: To report completed tasks.
26-May-2013 03:33:44 [GPUGRID] Reporting 1 completed tasks
26-May-2013 03:33:44 [GPUGRID] Requesting new tasks for NVIDIA
26-May-2013 03:33:46 [GPUGRID] Scheduler request completed: got 1 new tasks
26-May-2013 03:33:46 [---] [work_fetch] Request work fetch: RPC complete
26-May-2013 03:33:48 [GPUGRID] Started download of I61R18-NATHAN_dhfr36_5-19-LICENSE
26-May-2013 03:33:48 [GPUGRID] Started download of I61R18-NATHAN_dhfr36_5-19-COPYRIGHT
26-May-2013 03:33:48 [GPUGRID] Started download of I61R18-NATHAN_dhfr36_5-19-I61R18-NATHAN_dhfr36_5-18-32-RND7448_1
26-May-2013 03:33:48 [GPUGRID] Started download of I61R18-NATHAN_dhfr36_5-19-I61R18-NATHAN_dhfr36_5-18-32-RND7448_2
26-May-2013 03:33:49 [GPUGRID] Finished download of I61R18-NATHAN_dhfr36_5-19-LICENSE
26-May-2013 03:33:49 [GPUGRID] Finished download of I61R18-NATHAN_dhfr36_5-19-COPYRIGHT
26-May-2013 03:33:49 [GPUGRID] Started download of I61R18-NATHAN_dhfr36_5-19-I61R18-NATHAN_dhfr36_5-18-32-RND7448_3
26-May-2013 03:33:49 [GPUGRID] Started download of I61R18-NATHAN_dhfr36_5-19-pdb_file
26-May-2013 03:33:52 [GPUGRID] Finished download of I61R18-NATHAN_dhfr36_5-19-I61R18-NATHAN_dhfr36_5-18-32-RND7448_1
26-May-2013 03:33:52 [GPUGRID] Finished download of I61R18-NATHAN_dhfr36_5-19-I61R18-NATHAN_dhfr36_5-18-32-RND7448_3
26-May-2013 03:33:52 [GPUGRID] Started download of I61R18-NATHAN_dhfr36_5-19-psf_file
26-May-2013 03:33:52 [GPUGRID] Started download of I61R18-NATHAN_dhfr36_5-19-par_file
26-May-2013 03:33:55 [GPUGRID] Finished download of I61R18-NATHAN_dhfr36_5-19-I61R18-NATHAN_dhfr36_5-18-32-RND7448_2
26-May-2013 03:33:55 [GPUGRID] Started download of I61R18-NATHAN_dhfr36_5-19-conf_file_enc
26-May-2013 03:33:56 [GPUGRID] Finished download of I61R18-NATHAN_dhfr36_5-19-par_file
26-May-2013 03:33:56 [GPUGRID] Finished download of I61R18-NATHAN_dhfr36_5-19-conf_file_enc
26-May-2013 03:33:56 [GPUGRID] Started download of I61R18-NATHAN_dhfr36_5-19-metainp_file
26-May-2013 03:33:56 [GPUGRID] Started download of I61R18-NATHAN_dhfr36_5-19-I61R18-NATHAN_dhfr36_5-18-32-RND7448_7
26-May-2013 03:33:57 [GPUGRID] Finished download of I61R18-NATHAN_dhfr36_5-19-metainp_file
26-May-2013 03:33:57 [GPUGRID] Finished download of I61R18-NATHAN_dhfr36_5-19-I61R18-NATHAN_dhfr36_5-18-32-RND7448_7
26-May-2013 03:33:57 [GPUGRID] Started download of I61R18-NATHAN_dhfr36_5-19-I61R18-NATHAN_dhfr36_5-18-32-RND7448_10
26-May-2013 03:33:58 [GPUGRID] Finished download of I61R18-NATHAN_dhfr36_5-19-I61R18-NATHAN_dhfr36_5-18-32-RND7448_10
26-May-2013 03:34:06 [GPUGRID] Finished download of I61R18-NATHAN_dhfr36_5-19-pdb_file
26-May-2013 03:34:09 [GPUGRID] Finished download of I61R18-NATHAN_dhfr36_5-19-psf_file
26-May-2013 03:34:09 [GPUGRID] Starting task I61R18-NATHAN_dhfr36_5-19-32-RND7448_0 using acemdlong version 618 (cuda42) in slot 11
skgiven · Joined: 23 Apr 09 · Posts: 3968 · Credit: 1,995,359,260 · RAC: 0
What's your cache at?

BOINC Manager (Advanced View) > Tools > Computing preferences > Network usage tab: minimum work buffer + maximum additional work buffer.

FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
Joined: 11 Oct 08 · Posts: 1127 · Credit: 1,901,927,545 · RAC: 0
At 3:05am BOINC reported a failed NOELIA. No WU downloaded to replace it.

Did you have the work_fetch_debug option on at that time? If so, can you provide the log for the work-fetch sequence where the failed task was reported? It will hopefully be able to tell us why a request for work was not also piggybacked.
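For anyone who doesn't have that flag on yet, it goes in cc_config.xml in the BOINC data directory (a minimal sketch; it takes effect after telling the client to re-read its config files, or after a restart):

```xml
<cc_config>
    <log_flags>
        <!-- Log the work-fetch decision state each time the client considers asking for work. -->
        <work_fetch_debug>1</work_fetch_debug>
    </log_flags>
</cc_config>
```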
Joined: 21 Feb 09 · Posts: 497 · Credit: 700,690,702 · RAC: 0
BOINC Manager (Advanced View) > Tools > Computing preferences > Network usage tab: minimum work buffer + maximum additional work buffer.

Ah. Didn't know the buffer was also called the cache. 2 days.
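For reference, a 2-day minimum buffer corresponds to these entries in global_prefs_override.xml in the BOINC data directory, if local preferences are in use (a sketch; the additional-buffer value shown is only an example):

```xml
<global_preferences>
    <!-- "Minimum work buffer" in the Manager UI -->
    <work_buf_min_days>2.0</work_buf_min_days>
    <!-- "Maximum additional work buffer"; 0.1 days here is just an example value -->
    <work_buf_additional_days>0.1</work_buf_additional_days>
</global_preferences>
```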
Joined: 21 Feb 09 · Posts: 497 · Credit: 700,690,702 · RAC: 0
Did you have the work_fetch_debug option on at that time? If so, can you provide the log for the work-fetch sequence where the failed task was reported? It will hopefully be able to tell us why a request for work was not also piggybacked.

I have the debug log. The failed task was reported at 03:05:40. I've scrolled endlessly through the log around that time, looking for the failure. It doesn't help that I don't know what I'm looking for!!
Joined: 11 Oct 08 · Posts: 1127 · Credit: 1,901,927,545 · RAC: 0
Did you have the work_fetch_debug option on at that time? If so, can you provide the log for the work-fetch sequence where the failed task was reported? It will hopefully be able to tell us why a request for work was not also piggybacked.

So, a "work fetch sequence" starts at the text "work fetch start" and ends a few lines after the text "end work fetch state". I say a few lines after because BOINC reports the result of the sequence after that text. For reference, a couple of posts up (where I posted the conclusion of my test and got that 3rd task), there are a few "work fetch sequences" within the log.

You can either use the Event Log to find the relevant lines, or (if you have closed BOINC) you can find a copy of the logs stored to file in your data directory (the location is shown as a log entry at BOINC startup; I think the default location is C:\ProgramData\BOINC). There are actually 2 files: stdoutdae.txt has the most recent log events, and stdoutdae.old has older log events from the prior BOINC session.

What I'm interested in is the 2 "work fetch sequences" around time 03:05:40... the sequence right before that task was reported, and the sequence that occurred at the same time that task was reported. Make sense?
skgiven · Joined: 23 Apr 09 · Posts: 3968 · Credit: 1,995,359,260 · RAC: 0
The lazy way around this is to have another GPU in the system, an ATI, or maybe to use cc_config to exclude a second NVidia from this project. While that should keep the work flowing, it's not a proper fix.

The project setting of no more than 2 WUs per GPU won't be changed. It's not the problem anyway. The problem is that the "resource backoff" remains after a resource becomes free; it needs to be reset/zeroed.

FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
Joined: 17 Aug 08 · Posts: 2705 · Credit: 1,311,122,549 · RAC: 0
I wonder if we can convince GPUGrid to relax the limit, to max-3-in-progress-per-GPU, instead of max-2-in-progress-per-GPU? In theory, that should close this gap, as it would allow the 3rd task to be downloaded/ready whenever the min-buffer says the client needed it (earlier than Task 2 completion).

I suppose extending the limit straight to 3 tasks per GPU would be detrimental overall. In this case anyone with a large work buffer setting would get 3 tasks. This increases WU turnaround time (bad for the project) and makes people miss the credit bonus (bad for crunchers). The limit is there in the first place to ensure quick turnaround of WUs.

You can argue "I know my system can handle them in time" and "I'm running 2 WUs in parallel, so I want a 3rd task"... which leads to the problem that the server can't differentiate between regular users and those running 2 WUs in parallel. A possible solution would be to introduce the number of WUs per GPU as a "possibly dangerous" parameter in the profile, like they did at Einstein, so that the server could allow up to 3 WUs only for such hosts. However, the project team seems rather busy right now. And this might introduce support issues, as every time some new error popped up we'd have to ask people to go back to running single WUs and replicate the issue. It could be done, but it makes things more complicated for little gain (I'm deliberately not saying "negligible").

MrS
Scanning for our furry friends since Jan 2002
Joined: 11 Oct 08 · Posts: 1127 · Credit: 1,901,927,545 · RAC: 0
The lazy way around this is to have another GPU in the system, an ATI, or maybe use cc_config to exclude a second NVidia for this project. While that should keep the work flowing, it's not a proper fix.

I believe you're wrong. The problem is that GPUGrid won't give a 3rd task until 1 of the 2 other tasks is reported. I have privately emailed some of the GPUGrid admins, requesting the change to max-3-in-progress-per-GPU.
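If GPUGrid runs the stock BOINC scheduler, that limit is a one-line option in the project's config.xml; a sketch under that assumption (the project's actual configuration is not public):

```xml
<boinc>
    <config>
        <!-- Stock BOINC scheduler option limiting in-progress GPU tasks per GPU (assumed). -->
        <max_wus_in_progress_gpu>3</max_wus_in_progress_gpu>
    </config>
</boinc>
```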
skgiven · Joined: 23 Apr 09 · Posts: 3968 · Credit: 1,995,359,260 · RAC: 0
I'm just thinking about the gap between when a WU is reported and a new one is downloaded - your ~14 minute "layover".

FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
Joined: 17 Aug 08 · Posts: 2705 · Credit: 1,311,122,549 · RAC: 0
BTW: running 2 WUs in parallel should be quite good for regular short-queue tasks on higher-end GPUs. Here GPU utilization was generally quite low, there's no problem with the bonus credit deadline, and throughput could be increased significantly. Although these GPUs should be running long-queue tasks anyway. Well, the app_config could be set up this way (see the sketch below). In fact, I might just change mine like this, just in case the long queue runs dry.

MrS
Scanning for our furry friends since Jan 2002
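A sketch of that kind of app_config.xml, doubling up only the short queue (the application name acemdshort is an assumption; check client_state.xml for the real names):

```xml
<app_config>
    <app>
        <!-- Short-run queue only; app name assumed, verify against client_state.xml. -->
        <name>acemdshort</name>
        <gpu_versions>
            <!-- Two short tasks share one GPU; long-queue apps keep the default of 1 task per GPU. -->
            <gpu_usage>0.5</gpu_usage>
            <cpu_usage>1.0</cpu_usage>
        </gpu_versions>
    </app>
</app_config>
```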
Joined: 11 Oct 08 · Posts: 1127 · Credit: 1,901,927,545 · RAC: 0
I suppose extending the limit straight to 3 tasks per GPU would be detrimental overall. In this case anyone with a large work buffer setting would get 3 tasks. This increases WU turnaround time (bad for the project) and makes people miss the credit bonus (bad for crunchers). The limit is there in the first place to ensure quick turnaround of WUs.

The BOINC server-side scheduler has mechanisms to allocate tasks appropriately. Won't it only increase WU turnaround time if a given application is out of tasks to allocate to the computers requesting work? I'm not sure how often that happens, but even then, the server-side scheduler can be set up to handle it gracefully, I believe (possibly sending tasks to additional hosts in case the new host completes them first). I don't see this limit-increase request as detrimental. I see it as logical and beneficial.

BTW: running 2 WUs in parallel should be quite good for regular short-queue tasks on higher-end GPUs. Here GPU utilization was generally quite low, there's no problem with the bonus credit deadline, and throughput could be increased significantly. Although these GPUs should be running long-queue tasks anyway.

My research indicates that long-run tasks usually get around 84-90% GPU Load on their own, but 98% GPU Load when run combined. Short-run tasks generally get much less GPU Load, as you stated; I think I saw 65% once. I don't currently have data on combining those yet, but I believe it would be very beneficial to combine them.

I don't think of this in terms of "getting bonus credit". I think of it in terms of "getting science done". I would hope that anyone using 0.5 gpu_usage thinks the same way, but if they are also concerned about bonus credits, obviously they'd have to do some research to see how quickly they can get tasks done.

I have set my app_config to 0.5 gpu_usage for all of the GPUGrid applications. My new goal is to keep the GPU Load of the 660 Ti as high as possible (running 2-at-a-time), even if it means I have to exclude work on the GTX 460 (which is now running SETI/Einstein, but not GPUGrid, since it only has 1 GB and cannot do 2-at-a-time). I wish I could specify "only do 2 at a time on THIS GPU", so that I could continue to do GPUGrid work on the GTX 460... and I will be asking the BOINC devs about that feature when they redesign BOINC to treat each GPU as its own resource, instead of just "NVIDIA" as a resource. That is on their to-do list, believe it or not, and the goal is to get rid of the need for GPU exclusions altogether. I may eventually switch back to 1-at-a-time, so that the GTX 460 can again crunch GPUGrid (a project I currently prefer over SETI/Einstein).

Despite lots of people being against change, I suppose I'm an instigator in promoting change. I see a problem, I go after the fix. I see something untried and untested, I push it hard to see what happens. And now I'm rambling. :)
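For reference, that kind of exclusion goes in cc_config.xml (a minimal sketch; the project URL and the device number for the GTX 460 are assumptions here, so check the device numbers BOINC lists at startup):

```xml
<cc_config>
    <options>
        <!-- Keep GPUGrid work off one card; device_num 1 is assumed to be the GTX 460. -->
        <exclude_gpu>
            <url>http://www.gpugrid.net/</url>
            <device_num>1</device_num>
        </exclude_gpu>
    </options>
</cc_config>
```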
skgiven · Joined: 23 Apr 09 · Posts: 3968 · Credit: 1,995,359,260 · RAC: 0
That would require more administration, put more strain on the server, and might require a server update. They are struggling to keep the work flowing at present, so fine-tuning to facilitate a handful of people who want to use app_config is very low priority.

FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
Joined: 21 Feb 09 · Posts: 497 · Credit: 700,690,702 · RAC: 0
26/05/2013 03:05:08 | | [work_fetch] --- state for CPU ---
26/05/2013 03:05:08 | | [work_fetch] shortfall 691200.00 nidle 4.00 saturated 0.00 busy 0.00
26/05/2013 03:05:08 | Poem@Home | [work_fetch] fetch share 0.000 (blocked by prefs)
26/05/2013 03:05:08 | GPUGRID | [work_fetch] fetch share 0.000 (no apps)
26/05/2013 03:05:08 | | [work_fetch] --- state for NVIDIA ---
26/05/2013 03:05:08 | | [work_fetch] shortfall 138786.29 nidle 0.00 saturated 15406.17 busy 0.00
26/05/2013 03:05:08 | Poem@Home | [work_fetch] fetch share 0.000
26/05/2013 03:05:08 | GPUGRID | [work_fetch] fetch share 0.000 (resource backoff: 14617.84, inc 19200.00)
26/05/2013 03:05:08 | | [work_fetch] ------- end work fetch state -------
26/05/2013 03:05:08 | | [work_fetch] No project chosen for work fetch
26/05/2013 03:06:08 | | [work_fetch] work fetch start
26/05/2013 03:06:08 | | [work_fetch] choose_project() for NVIDIA: buffer_low: yes; sim_excluded_instances 0
26/05/2013 03:06:08 | | [work_fetch] no eligible project for NVIDIA
26/05/2013 03:06:08 | | [work_fetch] choose_project() for CPU: buffer_low: yes; sim_excluded_instances 0
26/05/2013 03:06:08 | | [work_fetch] no eligible project for CPU
26/05/2013 03:06:08 | | [work_fetch] ------- start work fetch state -------
26/05/2013 03:06:08 | | [work_fetch] target work buffer: 172800.00 + 0.00 sec
26/05/2013 03:06:08 | | [work_fetch] --- project states ---
26/05/2013 03:06:08 | Poem@Home | [work_fetch] REC 236.219 prio 0.000000 can't req work: suspended via Manager
26/05/2013 03:06:08 | GPUGRID | [work_fetch] REC 40724.365 prio -1.103737 can req work
26/05/2013 03:06:08 | | [work_fetch] --- state for CPU ---
26/05/2013 03:06:08 | | [work_fetch] shortfall 691200.00 nidle 4.00 saturated 0.00 busy 0.00
26/05/2013 03:06:08 | Poem@Home | [work_fetch] fetch share 0.000 (blocked by prefs)
26/05/2013 03:06:08 | GPUGRID | [work_fetch] fetch share 0.000 (no apps)
26/05/2013 03:06:08 | | [work_fetch] --- state for NVIDIA ---
26/05/2013 03:06:08 | | [work_fetch] shortfall 138844.32 nidle 0.00 saturated 15346.68 busy 0.00
26/05/2013 03:06:08 | Poem@Home | [work_fetch] fetch share 0.000
26/05/2013 03:06:08 | GPUGRID | [work_fetch] fetch share 0.000 (resource backoff: 14557.83, inc 19200.00)
26/05/2013 03:06:08 | | [work_fetch] ------- end work fetch state -------
26/05/2013 03:06:08 | | [work_fetch] No project chosen for work fetch
26/05/2013 03:07:08 | | [work_fetch] work fetch start
26/05/2013 03:07:08 | | [work_fetch] choose_project() for NVIDIA: buffer_low: yes; sim_excluded_instances 0
26/05/2013 03:07:08 | | [work_fetch] no eligible project for NVIDIA
26/05/2013 03:07:08 | | [work_fetch] choose_project() for CPU: buffer_low: yes; sim_excluded_instances 0
26/05/2013 03:07:08 | | [work_fetch] no eligible project for CPU
Joined: 11 Oct 08 · Posts: 1127 · Credit: 1,901,927,545 · RAC: 0
That would require more administration,

For all you know, it could be as simple as updating an integer column in a database.

put more strain on the server,

Actually, wouldn't it put less strain on it? Right now, even 1-at-a-time crunchers are requesting work from GPUGrid and being denied, which uses network resources. BOINC has a resource backoff, sure, but increasing the max-per-GPU limit would actually help this scenario. Perhaps you were referring to the task scheduler, which may already be set up to resend tasks to additional hosts if needed. I suppose it's possible that increasing the limit may add strain there, but only if we were frequently running out of tasks completely, I believe.

and might require a server update.

If updating a limit has become that hard to implement, then they have implemented the limit incorrectly. I doubt that's the case.

They are struggling to keep the work flowing at present, so fine-tuning to facilitate a handful of people who want to use app_config is very low priority.

This, too, is fine. I'm used to getting the cold shoulder from the GPUGrid admins by now. I expect the possibility that my request will go completely ignored. But I believe it's a valid request, and so I privately asked them anyway.

- Jacob