Really low Run Times, but still Completed and Successful?
| Author | Message |
|---|---|
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
Toni / Nate / GDF: Just had another one of these problems. It was using an app_config.xml file that was set for 0.001 CPU and 1.000 GPU, with a cc_config.xml file set up such that GPUGrid could run on either the GTX 660 Ti or the GTX 460. So, again, I don't think the problem is with running multiple tasks at the same time on the same GPU. But it might still have something to do with using an app_config.xml file. I am still testing that.
HAVE YOU DONE ANYTHING SERVER SIDE TO CORRECT THE VALIDATOR AND STOP THE APPLICATION FROM ERRONEOUSLY COMPLETING LIKE THIS? It would also help if we could see, in the Stderr output, what Device the work unit was running on.
Regards, Jacob
Name: I14R3-NATHAN_dhfr36_5-9-32-RND6302_0
Workunit: 4380107
Created: 20 Apr 2013 | 2:43:43 UTC
Sent: 20 Apr 2013 | 4:01:07 UTC
Received: 20 Apr 2013 | 6:50:11 UTC
Server state: Over
Outcome: Success
Client state: Done
Exit status: 0 (0x0)
Computer ID: 149974
Report deadline: 25 Apr 2013 | 4:01:07 UTC
Run time: 3.28
CPU time: 0.83
Validate state: Valid
Credit: 70,800.00
Application version: Long runs (8-12 hours on fastest card) v6.18 (cuda42)
Stderr output: <core_client_version>7.0.64</core_client_version> |
|
Joined: 5 Dec 11 Posts: 147 Credit: 69,970,684 RAC: 0
|
Toni / Nate / GDF: This is going to sound narky, but here goes. This basically proves that there is something about running 2 GPUGrid tasks simultaneously on the same card that will cause them to fail regularly. Regardless of the fact that other projects manage to do it successfully, GPUGrid obviously does not, so it's probably time to stop until such time as the project managers give the OK. I can understand the efficiencies you are seeking, but at the moment, all you are doing is dumping erroneous results into the system. Whatever else they are working on at the moment is taking priority over this. |
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
Simba123: Did you read my post completely and carefully? The most recent task that I reported could only have run on 1 of the 2 GPUs, alone (i.e., not 2-at-once-on-same-GPU), based on my settings. Thus, again, I've been able to PROVE that these problems are happening even when NOT running 2-at-once-on-same-GPU. I'm still trying to figure out exactly what causes the problem. Want to help? |
|
Joined: 18 Jun 12 Posts: 297 Credit: 3,572,627,986 RAC: 0
|
"Did you read my post completely and carefully?" Are you saying that you are getting really short runs that are being validated while running 1 WU per GPU? If that's the case, we might all be turning in fubar results. I've been following this thread since you started it, and I thought this only happened when you ran 2 WUs simultaneously on 1 GPU: you would then get a task that finished in a few seconds, the validating server didn't catch it, and you got points for a valid run. I must not understand completely, because it would seem the cat's out of the bag now and others could duplicate your procedure to try and get some quick points. I realize from what you have written that it doesn't happen every time. I think it would be good if you could explain from the top exactly what's going on, so that I (and others) would know what to look for. I don't want to turn in fubar results; is there anything special we can look for? Is the whole project in jeopardy because of this? |
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
You must not have been following the thread very closely. I'm trying my best to be informative and provide accurate information. But relax - :) - The whole project is not in jeopardy. I wish the admins were much more proactive (sigh), but they make it sound like they can dig themselves out of this problem, even if results are quick-returned with this problem.
Here's a summary of what my testing has shown thus far:
- I have seen this issue only on Nathan Long-runs and Nathan Short-runs
- I have seen this issue running <gpu_usage> of 0.5 (aka: 2-at-once-on-same-GPU)
- I have seen this issue running <gpu_usage> of 1 (aka: Still using an app_config.xml file, with a <cpu_usage> of 0.001, but not doing 2-at-once-on-same-GPU)
- I have not yet seen this issue when not running an app_config.xml file; am testing this currently (I removed the GPUGrid project, re-added the GPUGrid project, verified it doesn't have an app_config.xml, got some tasks, verified they say something like "0.728 CPUs + 1 NVIDIA GPU", and am letting them complete without adding an app_config.xml file)
- I have not yet tested using <exclude_gpu> in the cc_config.xml to limit GPUGrid.net to just 1 of my 2 GPUs.
- Toni said (several days ago) that I'm the only one affected by the problem; but I sure wish someone could help to reproduce it with me, to better understand it.
Right now, my gut tells me that this is an application problem, likely related to:
a) Using an app_config.xml file even when it has <gpu_usage> set to 1.000... or
b) Having 2 homogenous GPUs in the system (maybe dependent on whether GPUGrid is allowed to run on both, but not necessarily dependent)
You mention "what to look for"... Well, I'm just looking at my results, and seeing if the "Run time" values look correct for the types of tasks that I've completed. That's all I'm doing.
Hope that helps to summarize. I'm still testing to resolve this issue. Any help would be welcomed.
If you want to help, then try to provide evidence that one of my 2 "gut feelings" is correct.
- If you only have 1 GPU, then try using an app_config.xml file (use the one below, with <cpu_usage> set to 0.001, and <gpu_usage> set to 1, with <max_concurrent> set to 9999)... to see if it causes a problem.
- If you have 2 homogenous GPUs, try letting GPUGrid run on both and see if it causes a problem.
Sorry for being miffed about this whole thing. I care a ton.
- Jacob Klein
Reference app_config.xml file:
<app_config>
  <app>
    <name>acemdbeta</name>
    <max_concurrent>9999</max_concurrent>
    <gpu_versions>
      <gpu_usage>1</gpu_usage>
      <cpu_usage>.001</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>acemdlong</name>
    <max_concurrent>9999</max_concurrent>
    <gpu_versions>
      <gpu_usage>1</gpu_usage>
      <cpu_usage>.001</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>acemdshort</name>
    <max_concurrent>9999</max_concurrent>
    <gpu_versions>
      <gpu_usage>1</gpu_usage>
      <cpu_usage>.001</cpu_usage>
    </gpu_versions>
  </app>
</app_config> |
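Before starting a test that may run for a week or more, it can help to confirm which values an app_config.xml file really sets for each app. The following is a minimal sketch using only Python's standard library. The helper name `summarize_app_config` and the derived `tasks_per_gpu` field are hypothetical and are not part of BOINC; the XML text is the reference file from the post above, and the "tasks per GPU" figure simply restates the thread's reading that a <gpu_usage> of 0.5 means 2-at-once-on-same-GPU.

```python
import xml.etree.ElementTree as ET

# The reference app_config.xml from the post above, copied verbatim.
REFERENCE_APP_CONFIG = """
<app_config>
  <app>
    <name>acemdbeta</name>
    <max_concurrent>9999</max_concurrent>
    <gpu_versions>
      <gpu_usage>1</gpu_usage>
      <cpu_usage>.001</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>acemdlong</name>
    <max_concurrent>9999</max_concurrent>
    <gpu_versions>
      <gpu_usage>1</gpu_usage>
      <cpu_usage>.001</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>acemdshort</name>
    <max_concurrent>9999</max_concurrent>
    <gpu_versions>
      <gpu_usage>1</gpu_usage>
      <cpu_usage>.001</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
"""


def summarize_app_config(xml_text):
    """Return {app name: settings} for every <app> element in the file.

    Hypothetical helper for illustration only; it reads just the four
    elements used in this thread and ignores anything else.
    """
    root = ET.fromstring(xml_text)
    summary = {}
    for app in root.findall("app"):
        gpu_usage = float(app.findtext("gpu_versions/gpu_usage"))
        summary[app.findtext("name")] = {
            "max_concurrent": int(app.findtext("max_concurrent")),
            "gpu_usage": gpu_usage,
            "cpu_usage": float(app.findtext("gpu_versions/cpu_usage")),
            # Assumption from the thread: gpu_usage 0.5 means 2 tasks per GPU.
            "tasks_per_gpu": int(1 / gpu_usage),
        }
    return summary


settings = summarize_app_config(REFERENCE_APP_CONFIG)
for name, values in settings.items():
    print(name, values)
```

Running the same helper on the file that is actually in the project directory makes it easy to record, for each test period, exactly which settings were in effect.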
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
I've used app_config to run two tasks on one GPU (GTX660Ti 2GB), but I did not observe any WU's getting credit for only running a few seconds. I've also been using app_config to run one task, and again no such problems. The difference is that I don't saturate the CPU; I always leave some free. Thus, it's your setup. Most likely the problem stems from completely saturating the CPU, which is a setup that has caused many problems at many projects for many years and is not recommended for crunching here. During the first few seconds of a WU loading, it uses a lot of the CPU: <cpu_usage> of 0.001 looks Bad. FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help |
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
skgiven: Thank you for reporting your results. When crunching 1 task, <cpu_usage> of 0.001 should be no different than the normal setup where GPUGrid tasks use 0.728 CPU by default and app_config.xml is not used. In both of those cases, my computer will allocate the same number of jobs to fill the available CPU resources. So, I do not think using <cpu_usage> of 0.001 can be the problem. It's possible that just USING the app_config.xml file could be a problem, but I also believe that to be very unlikely, and am still testing to prove it.
It sounds like you might be suggesting that using a BOINC setting of 100% CPU, or using "extreme" app_config settings, could be a cause. While I don't believe that to be the case... Let's try to get more evidence! Please help.
Are you running with a BOINC setting of 100% CPU (so that your CPU resources are slightly over-committed)? If not, would you please consider changing your BOINC setting to use 100% CPU? Also, are you running using the exact same app_config.xml file I posted 2 posts ago? If not, would you please consider changing your app_config.xml file to be exactly the same as the one I posted 2 posts ago?
The goal should be for you to attempt to reproduce the problem, so we can identify its cause. And, if you believe it's a "setup" issue, due to either the CPU % setting or the app_config.xml file setting, then TEST IT! Use my CPU % setting, and my app_config.xml file, on your machine -- and see what your results are!
Note: It may take a week to get the problem to surface, from my experience. So, even if you make the changes and restart BOINC, it's a waiting game to see the results.
I'm betting, and hoping, that:
- you change your settings to 100% CPU
- you change your app_config.xml file to use my exact same app_config.xml file
- you restart BOINC
- you run it for a week
- you still aren't able to reproduce the problem, even after that week.
.... I want this problem to be related to homogenous GPUs.
Will you please help, by performing those tests? Most importantly, I want GPUGrid.net admins to further acknowledge the problem, to put forth effort into identifying its cause (including useful statements in stderr.txt), to fix the application, and to fix the validator. One can only hope that these things become a priority. Thanks, Jacob |
|
Joined: 18 Jun 12 Posts: 297 Credit: 3,572,627,986 RAC: 0
|
Thanks, Jacob, for explaining everything for me all in one post. Sometimes I'm a little slow; it makes much more sense to me now. I appreciate your stubborn determination in getting to the bottom of these anomalies. I just hope some other contributors aren't taking advantage of this flaw in the software. |
|
Joined: 2 Jan 09 Posts: 303 Credit: 7,321,800,090 RAC: 245
|
skgiven: I have read this whole thread, and Jacob, you are one dedicated cruncher! I have ONE suggestion though: why not stop all CPU crunching for an hour and change your app_config file to allow 1.0 CPU usage, instead of 0.001, and just 'see what happens'? If no changes occur, then you will have confirmed that the 0.001 is NOT the problem; if everything works normally, then you MAY have found a touch of the problem. The REAL problem is the server validating bad units, but YOU can't fix that part. I KNOW you sort of did that when you ran with no app_config file, but if you would run two GPU units at once and up the CPU percentage, you will at least eliminate a 'perceived' and 'possible' problem. IF my idea works, then you can cut the numbers until you find the error point and then back off just a bit, kind of like overclocking: too much and you get errors, just right and the machine screams thru the units! |
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
Thanks for the suggestion. I highly doubt that adjusting the <cpu_usage> within the app_config.xml file would trigger the problem, but it is on my list of things to test. Right now, I'm trying to reproduce the issue without an app_config.xml file. For some reason, I haven't been able to get that to happen yet.
Other tests I will do include:
- Test with app_config set for 1.000 CPU
- Test with BOINC set to use less CPU %
- Test with only 1 GPU in the system
It's not good enough to test for an hour. Sometimes it takes several days for this issue to happen. So, I'd say each test should take about 2 weeks, to "be sure" of the results. I guess I have to be patient and diligent with my testing.
If anyone else can replicate the problem (or at least try very hard to), I would very much appreciate any information you find.
Thanks, Jacob |
|
Joined: 5 Dec 11 Posts: 147 Credit: 69,970,684 RAC: 0
|
It's more likely to happen when a workunit starts. CPU usage seems to peak at that point; with a setting limiting the CPU to 0.001, that's probably where it is occasionally choking. It may also depend on the type of GPUs. I run a 660Ti and a 560Ti. The 660Ti uses as much CPU time as GPU time, whereas the 560Ti uses about 1/10th. Even though the CPU is not saturated by any means, I have to keep 2 cores free to keep GPU usage steady instead of bouncing around, and that only works with nothing else using the rig. If I were to start web browsing etc., GPU usage starts jumping up and down again. |
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
Some of you still do not understand something very basic. All of these 3 scenarios result in the exact same CPU saturation/allocation under my system's setup:
- Not using an app_config.xml file (so GPUGrid tasks use their default CPU usage, usually around 0.73 CPU)
- Setting <cpu_usage> to 0.999 via app_config.xml file
- Setting <cpu_usage> to 0.001 via app_config.xml file
In all 3 of these scenarios, my system will run 1 GPUGrid task on the GTX 660 Ti, 2 WCG tasks on my GTX 460, and 6 additional CPU tasks. And in all of these scenarios, the GPUGrid task is granted the exact same "amount of CPU"; i.e., just because it says "0.001" CPU, that does not mean it is somehow "limiting" the CPU usage. It's just a number that's used to calculate how many total tasks to run right now. That's all. You can do research on how CPUs work, and how BOINC works, to confirm this.
Regarding CPU overloading, I disagree about the necessity to reserve cores. From my observations using ProcessExplorer (Google it and use it for yourself!), GPUGrid tasks are set up to run at a process priority of 6, with an active-thread priority of 6 or 7. These priorities are higher than regular CPU tasks, meaning that GPUGrid tasks are never starved of the CPU. Now, if you browse a webpage or move a window, then that requires GPU, and the GPU usage may fluctuate... but in terms of competing with other CPU tasks, GPUGrid tasks are never starved of CPU, because GPUGrid tasks have higher priority. You can do research into Windows priorities, and watch Task Manager's CPU Usage, and use ProcessExplorer, to confirm all of this information for yourself.
Regarding the differences between types of GPUs, the applications appear to be set up to treat different GPU architectures differently. A GTX 660 Ti appears to be set up to request a full core all of the time (I believe this is equivalent to the now-deprecated "Swan_Sync" "0" setting). A GTX 460, however, because it is a different architecture, appears to be set up to request only a small portion of a core. This is all normal behavior, and is in fact probably a reason that the "default CPU Usage" of the GPUGrid tasks is a value less than 1 (the GPUGrid admins do not want to reserve a whole core unnecessarily, which would waste CPU resources).
I have not ruled out that overloading the CPU could be a causal factor in my problem, but based on my understanding of CPUs and processes, I consider it quite unlikely. One of the tests to prove this would be to control it by decreasing BOINC's "use at most x% of the processors" setting, or to control it by setting <cpu_usage> to 1.000 in the app_config.xml file; both of these are tests on my to-do list from 2 posts ago. But currently I'm still trying to get the issue to occur without an app_config.xml file.
Finally, if you believe I'm wrong in any of this, or if you believe you have some other "answer", I encourage you to use the scientific method and PROVE it. Perform the setting changes necessary to recreate the problem. Seriously. Test, test, test! I challenge you to find the cause of the problem; I would appreciate any and all help testing. And, when you've tested for long enough to prove something, report your results! Guessing at the problem is no longer helpful to me.
Thank you, Jacob |
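The point that <cpu_usage> is "just a number that's used to calculate how many total tasks to run" can be illustrated with a small model. This is only a sketch under stated assumptions, not BOINC's real scheduler: it assumes the client reserves the whole-number part of the summed CPU claims of the running GPU tasks and fills the remaining logical CPUs with CPU tasks. The function name `cpu_tasks_started` is hypothetical, and the figure of 8 logical CPUs is an assumption that matches the task counts reported in the post above but is not stated there.

```python
import math


def cpu_tasks_started(ncpus, gpu_task_cpu_usages):
    """Simplified model (an assumption, not BOINC source code).

    Whole cores claimed by the running GPU tasks are reserved, and every
    remaining logical CPU gets one CPU task.
    """
    # The tiny epsilon guards against float sums landing just below an integer.
    reserved = math.floor(sum(gpu_task_cpu_usages) + 1e-9)
    return max(0, ncpus - reserved)


NCPUS = 8  # assumed; consistent with the counts in the post, not confirmed

# One GPUGrid task plus two WCG GPU tasks claiming 1.000 CPU each.
scenarios = {
    "no app_config.xml (default, about 0.73)": [0.73, 1.0, 1.0],
    "<cpu_usage> 0.999": [0.999, 1.0, 1.0],
    "<cpu_usage> 0.001": [0.001, 1.0, 1.0],
}

results = {label: cpu_tasks_started(NCPUS, usages)
           for label, usages in scenarios.items()}
for label, count in results.items():
    print(f"{label}: {count} additional CPU tasks")
# Under this model, all three scenarios give 6 additional CPU tasks.
```

In this model the fractional value only matters when the sum of the claims crosses a whole number, which is why 0.001, 0.73, and 0.999 all lead to the same allocation here.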
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
I just had a task fail, without an app_config.xml file... but it actually failed (instead of being instant-done completed successfully with credit). So, I'm a bit confused. Nathan, did you change anything?
The details are below... they include an Exit Status of -1, a Client state of Compute error, and no credit granted. (ie: If it had to fail, it at least failed gracefully, without being marked successful and without granting credit).
Note: I was having internet connectivity issues during the time that it started and promptly failed. Don't know if that is a causal factor or not.
Name: I9R28-NATHAN_dhfr36_3-31-32-RND4985_4
Workunit: 4368491
Created: 22 Apr 2013 | 18:29:13 UTC
Sent: 22 Apr 2013 | 20:20:40 UTC
Received: 23 Apr 2013 | 3:38:48 UTC
Server state: Over
Outcome: Computation error
Client state: Compute error
Exit status: -1 (0xffffffffffffffff) Unknown error number
Computer ID: 149974
Report deadline: 27 Apr 2013 | 20:20:40 UTC
Run time: 7.22
CPU time: 5.99
Validate state: Invalid
Credit: 0.00
Application version: Long runs (8-12 hours on fastest card) v6.18 (cuda42)
Stderr output: <core_client_version>7.0.64</core_client_version> <![CDATA[ <message> (unknown error) - exit code -1 (0xffffffff) </message> <stderr_txt> MDIO: cannot open file "output.restart.coor" </stderr_txt> ]]>
I'll continue to test. |
|
Joined: 5 Dec 11 Posts: 147 Credit: 69,970,684 RAC: 0
|
Looking at that task, http://www.gpugrid.net/workunit.php?wuid=4368491, it has failed 8 times. I had 1 like that yesterday too, so I'd say it was a 'normal' failed task, rather than anything else. |
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
Simba, I agree that the task that had the computational error, which failed for several other users, is unrelated to my research and testing. Thanks for pointing that out -- I should have spotted that earlier. Also, just a note for the thread, I am now testing with nVidia Beta v320.00 drivers. Jacob |
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
I've had another workunit that was Completed Successfully in a really short time. Looking at the logs, it ran on my GTX 660 Ti, while the GTX 460 was busy with 2 WCG tasks. It happened while I was using an app_config.xml file that had:
- 9999 max_concurrent
- 0.001 cpu_usage
- 1.000 gpu_usage
This frustrates me, because so far, I've only been able to reproduce the problem when I use an app_config file with those settings. But I still really believe that using an app_config.xml file shouldn't be causing the problem. So, again, to continue to test my theory, I've reset the project, and will be running without an app_config.xml file for a while. I'll chime in again, in about 2 more weeks, to report more results.
Task details:
Name: I62R17-NATHAN_dhfr36_5-25-32-RND8674_0
Workunit: 4399064
Created: 26 Apr 2013 | 23:37:52 UTC
Sent: 26 Apr 2013 | 23:38:46 UTC
Received: 27 Apr 2013 | 6:14:45 UTC
Server state: Over
Outcome: Success
Client state: Done
Exit status: 0 (0x0)
Computer ID: 149974
Report deadline: 1 May 2013 | 23:38:46 UTC
Run time: 4.27
CPU time: 0.87
Validate state: Valid
Credit: 70,800.00
Application version: Long runs (8-12 hours on fastest card) v6.18 (cuda42)
Stderr output: <core_client_version>7.0.64</core_client_version> |
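The check described earlier in the thread, looking at the results and seeing whether the "Run time" values look right for the type of task, can be written down as a simple filter. The sketch below is illustrative only: the `TaskResult` record, the `is_suspicious` helper, and the 600-second threshold are all hypothetical choices, and the three records are the task details quoted in this thread, with Run time taken to be in seconds.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Hypothetical record holding a few fields from a task details page."""
    name: str
    run_time_s: float
    outcome: str
    validate_state: str
    credit: float


# Assumed cutoff: a "Long runs (8-12 hours on fastest card)" task that
# finishes in under 10 minutes is treated as implausible.
MIN_PLAUSIBLE_RUN_TIME_S = 600.0


def is_suspicious(task, min_run_time_s=MIN_PLAUSIBLE_RUN_TIME_S):
    """True when a task was validated and credited despite a tiny run time."""
    return (
        task.outcome == "Success"
        and task.validate_state == "Valid"
        and task.credit > 0
        and task.run_time_s < min_run_time_s
    )


# The three tasks reported in this thread.
tasks = [
    TaskResult("I14R3-NATHAN_dhfr36_5-9-32-RND6302_0", 3.28,
               "Success", "Valid", 70800.0),
    TaskResult("I9R28-NATHAN_dhfr36_3-31-32-RND4985_4", 7.22,
               "Computation error", "Invalid", 0.0),
    TaskResult("I62R17-NATHAN_dhfr36_5-25-32-RND8674_0", 4.27,
               "Success", "Valid", 70800.0),
]

flagged = [task.name for task in tasks if is_suspicious(task)]
for name in flagged:
    print("suspicious:", name)
```

The task that failed with a computation error is not flagged, because it was marked Invalid and granted no credit; only the two results that were validated after a few seconds match the problem described here.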
|
Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 |
"Finally, if you believe I'm wrong in any of this" Nope, sounds all right! Going to test with your app_config now, although I'm only running GPU-Grid as backup for POEM nowadays, so even if I'd get the error it might take me a long time. "Good" that there's so little work at POEM ;) MrS Scanning for our furry friends since Jan 2002 |
|
Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 |
I've been running it for a few days by now, no anomalies. Except that the claimed CPU utilization of 0.001 results in my BOINC 7.0.44 starting 8 Einstein threads on my 8-threaded i7+HT. You said in your case you'd get 7 CPU tasks, whether you use the app_config or not. Running 8 Einsteins still makes my GTX660Ti use about a full core, but GPU utilization is fluctuating between 86 and 90%, whereas it stays at a nice and steady 92% without any CPU tasks. From quickly experimenting it seems like 3 or 4 CPU tasks would be enough to get full GPU-grid performance. MrS Scanning for our furry friends since Jan 2002 |
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
"I've been running it for a few days by now, no anomalies."
I didn't say what you think I said. I had said: "In all 3 of these scenarios, my system will run 1 GPUGrid task on the GTX 660 Ti, 2 WCG tasks on my GTX460, and 6 additional CPU tasks"
Each WCG GPU task was set to use 1.000 CPU, and the GPUGrid task was set to use <1.000 CPU.... so BOINC would schedule the 1 GPUGrid task, the 2 WCG GPU tasks, and then 6 additional CPU tasks.
Now that the WCG GPU app (Help Conquer Cancer) hcc1 is done, my computer now runs (without an app_config.xml file):
- 1 GPUGrid task on the GTX 660 Ti (0.729 CPU + 1 NVIDIA GPU)
- 1 GPUGrid task on the GTX 460 (0.729 CPU + 1 NVIDIA GPU)
- 7 CPU tasks
I still cannot recreate the "completed quickly yet gave credit" problem when not using an app_config.xml file. It's quite frustrating. I'll be switching back over to using the 0.001 CPU app_config.xml file soon, to see if I can recreate the issue while I don't have any WCG HCC tasks. Then, assuming I can recreate it, I'll switch over to using a 0.729 CPU app_config.xml file, which I have not yet tried.
I will find the exact causes of this bug, I swear this to you. Grrrrrr. |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
WCG's HCC on GPU WU's began by using a full CPU, then used the GPU (high GPU utilization), and then used a full CPU thread again. When the CPU was being used the GPU was not, and when the GPU was being used the CPU was not. In this situation, you could therefore be running two HCC tasks without using any CPU (depending on how far along the tasks were) or be fully using two CPU threads. In my opinion, when using the app_config and specifying that GPUGrid tasks use 0.001 CPUs, this meant that when some GPUGrid tasks started, two CPU threads would already be allocated to, and fully used by, HCC tasks. Along with 7 CPU tasks, this meant that at that critical time for a GPUGrid task (the start), the CPU was already saturated and BOINC was being told (by the app_config file) that GPUGrid apps hardly needed any CPU (0.001). They failed as a result of CPU starvation. They were incorrectly granted credit because the system didn't catch the error, but app_config is new and the server hasn't been equipped to catch such unknown errors. While this might be a problem for the researchers, perhaps future server updates will help protect against such problems. FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help |
©2025 Universitat Pompeu Fabra