WU: NOELIA_KLEBEs

Jim1348

Message 32537 - Posted: 30 Aug 2013, 13:25:39 UTC - in response to Message 32525.  
Last modified: 30 Aug 2013, 13:27:35 UTC

Watching two different third-party developers working on SETI (one specialising in CUDA, the other in OpenCL), we get the opposite outcome: OpenCL on ATI is inefficient unless a spare CPU core is available, but CUDA on Nvidia requires very little CPU.

I'm not a developer myself (at least, not at the level these guys program), but from the peanut gallery it looks as if CPU usage is very much down to the skill of the developer, and how well they know their platform and tools.

That is quite true in my own experience too (as a user only), but I think we are talking about two different things. Neither ATI on OpenCL nor Nvidia on CUDA needs a CPU core unless the project developer makes it so, and CUDA can usually be made more efficient in its CPU usage. Certainly that is the case with Folding and its separate OpenCL core_16 (for AMD cards only) and CUDA core_15 (for Nvidia cards only); the CUDA one is much better (less than 1 percent versus maybe 20 percent or more).

But I'm interested in the OpenCL-on-Nvidia point. That does seem to be a common observation - I wonder whether it necessarily has to be so? Or maybe NVidia hasn't ported some of their synch technology from CUDA to the OpenCL toolchain yet?

All I know is that on Folding, with their newest OpenCL core_17, which runs on both AMD and Nvidia, the situation is reversed. It requires only 1 or 2 percent of the CPU on AMD cards (e.g., my HD 7870 on an i7-3770), whereas on an Nvidia card it reserves a full core (e.g., on my GTX 660 Ti). The question has been asked on the Folding forum as to whether that is necessary, and the answer is that Nvidia has not implemented the option in OpenCL to use less than a full core. Apparently they could if they wanted to, but (so the speculation goes) they want their cards to perform as well as they can, so they just grab the whole core. It helps solve the problem you mentioned above, where users don't always know to leave a core free, I suppose.
Richard Haselgrove

Message 32539 - Posted: 30 Aug 2013, 15:30:19 UTC - in response to Message 32537.  

But I'm interested in the OpenCL-on-Nvidia point. That does seem to be a common observation - I wonder whether it necessarily has to be so? Or maybe NVidia hasn't ported some of their synch technology from CUDA to the OpenCL toolchain yet?

All I know is that on Folding, with their newest OpenCL core_17, which runs on both AMD and Nvidia, the situation is reversed. It requires only 1 or 2 percent of the CPU on AMD cards (e.g., my HD 7870 on an i7-3770), whereas on an Nvidia card it reserves a full core (e.g., on my GTX 660 Ti). The question has been asked on the Folding forum as to whether that is necessary, and the answer is that Nvidia has not implemented the option in OpenCL to use less than a full core. Apparently they could if they wanted to, but (so the speculation goes) they want their cards to perform as well as they can, so they just grab the whole core. It helps solve the problem you mentioned above, where users don't always know to leave a core free, I suppose.

That was my suspicion too. In trying to pass messages between the two developers - apparently the new CUDA way is to use 'callback' rather than 'spin' synch - I was invited to refer to the NVidia toolkit documentation to find examples for the OpenCL implementation. I couldn't find any.
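For anyone curious what the difference looks like from the host side, here is a minimal sketch using the standard CUDA runtime API (my own illustration, not anything taken from the ACEMD application):

// By default the runtime may spin-wait (cudaDeviceScheduleAuto/Spin) while the
// host waits for the GPU, which shows up as ~100% of one CPU core.
// Requesting blocking synchronisation parks the waiting thread instead.
#include <cuda_runtime.h>

int main() {
    cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync);  // must be set before the context is created

    cudaEvent_t done;
    cudaEventCreateWithFlags(&done, cudaEventBlockingSync);

    // ... queue kernels / async copies on the default stream here ...

    cudaEventRecord(done, 0);
    cudaEventSynchronize(done);   // sleeps until the GPU signals; near-zero CPU use

    cudaEventDestroy(done);
    return 0;
}

The 'spin' alternative is simply leaving the default scheduling flags (or setting cudaDeviceScheduleSpin) and calling cudaDeviceSynchronize(), which burns a core but reacts slightly faster. I haven't found an equivalent, documented knob for NVidia's OpenCL runtime, which would fit the behaviour described above.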

If there are any unbiased developer observers of this thread, it would be useful to hear if there is any factual basis for our observations - and for the rumour I've heard that NVidia might pull away from OpenCL support entirely. That would be a shame, if true - both NVidia and ATI (as it was then) were founder members of the Khronos Group in January 2000. It would be a pity if competition drove out collaboration, and we returned to the days of two incompatible native-code development environments.
Jacob Klein

Message 32540 - Posted: 30 Aug 2013, 15:49:38 UTC - in response to Message 32539.  
Last modified: 30 Aug 2013, 15:50:09 UTC

Perhaps you might create a new thread devoted to finding the OpenCL/CUDA information.
This thread is for "WU: NOELIA_KLEBEs" :)
TJ

Message 32541 - Posted: 30 Aug 2013, 15:58:22 UTC - in response to Message 32540.  
Last modified: 30 Aug 2013, 15:59:09 UTC

Perhaps you might create a new thread devoted to finding the OpenCL/CUDA information.
This thread is for "WU: NOELIA_KLEBEs" :)

And NOELIA_KLEBEbetas, which by the way run fine on my 660 and 770 with 8.02!
Noelia and MJH did a good job with this.
Greetings from TJ
The King's Own
Message 32544 - Posted: 30 Aug 2013, 17:08:01 UTC

http://www.gpugrid.net/result.php?resultid=7221215

The task would progress to 0.21% and then sit there while the elapsed time increased. Switched from the 660 Ti to a 580 with the same result. Aborted after 3 hrs 38 min on the 580.
MJH
Message 32545 - Posted: 30 Aug 2013, 17:17:35 UTC - in response to Message 32544.  

King's Own. 8.00 is deprecated - your problem is fixed in the current release.
The King's Own
Message 32546 - Posted: 30 Aug 2013, 17:31:23 UTC - in response to Message 32545.  

Thank you.

However:

i. I would have to be convinced that this is my problem.

ii. Why are deprecated WUs being dispatched? That is certainly not a problem caused by me.

Ascholten
Message 32551 - Posted: 30 Aug 2013, 21:01:37 UTC

I'm getting "zero output file" errors every few minutes; each one slams the fan on my card and resets the task.

It is also showing my video cards in slots 7 and 8. I believe they should be slots 0 and 1, or is that moot?

This has been going on for a few days. I reset the project and aborted a few tasks thinking they were the problem, only to find it's ongoing.

I see this is a known issue? Any ETR?

Thank you
Aaron
ExtraTerrestrial Apes
Volunteer moderator
Message 32553 - Posted: 30 Aug 2013, 21:04:22 UTC - in response to Message 32525.  

Richard wrote:
Or maybe NVidia hasn't ported some of their synch technology from CUDA to the OpenCL toolchain yet?

That's what I suppose as well, without being a GPU developer. Over a year ago nVidia's performance on the POEM OpenCL app was horrible, but it only used ~50% of one core. A driver update doubled performance, but since then it has been using a full CPU core.

To me it seems like "just use a full core" was a quick fix, and now they don't want to push OpenCL any further than they have to, so they just stick with this solution.

MrS
Scanning for our furry friends since Jan 2002
TJ

Message 32560 - Posted: 31 Aug 2013, 5:17:44 UTC

How can you all see that a full core is used with one GPUGRID WU?
Greetings from TJ
Retvari Zoltan
Message 32562 - Posted: 31 Aug 2013, 8:31:57 UTC - in response to Message 32560.  
Last modified: 31 Aug 2013, 8:46:15 UTC

How can you all see that a full core is used with one GPUGRID WU?

It can be checked in the Windows Task Manager: look for acemd.80x-55.exe (or acemd.80x-42.exe) on the "Processes" tab. If its CPU usage is 1-2%, it is not using a full core; otherwise its CPU usage will be roughly 100 divided by the number of your CPU's threads (12-13% on an 8-threaded CPU, 8% on a 12-threaded CPU). You can also check past workunits' CPU usage in your host's task list on the website: if the "CPU time" (almost) equals the "run time", the task used a full core; if the "CPU time" is significantly less than the "run time", it did not.
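If you want to turn the website's numbers into a quick answer, note that the ratio of CPU time to run time is the average number of CPU cores the task used. A trivial sketch of that arithmetic (my own illustration; the times are read off the task list by hand):

// Usage: corecheck <run_time_seconds> <cpu_time_seconds>
// e.g. corecheck 28000 27500  ->  average CPU usage: 0.98 cores (effectively a full core)
#include <cstdio>
#include <cstdlib>

int main(int argc, char **argv) {
    if (argc != 3) {
        std::fprintf(stderr, "usage: %s <run_time_seconds> <cpu_time_seconds>\n", argv[0]);
        return 1;
    }
    double run = std::atof(argv[1]);
    double cpu = std::atof(argv[2]);
    double cores = cpu / run;   // fraction of one core, averaged over the whole run
    std::printf("average CPU usage: %.2f cores (%s)\n", cores,
                cores > 0.9 ? "effectively a full core" : "less than a full core");
    return 0;
}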
skgiven
Volunteer moderator
Message 32564 - Posted: 31 Aug 2013, 12:18:12 UTC - in response to Message 32562.  

Just note that the NOELIA_KLEBE WUs don't use a full CPU core/thread - they never have.
My BOINC scheduler has them at 0.595 CPUs, but actual use is less than that (2 or 3% of the entire CPU, which works out to <=0.25 CPU threads).
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help
Jacob Klein

Message 32568 - Posted: 31 Aug 2013, 14:39:01 UTC - in response to Message 32564.  

Thanks for the info, skgiven.

I have overridden the CPU requirements, via app_config.xml. Because I have 2 GPUs that do GPUGrid (1 Fermi, 1 Kepler), I had set cpu_usage to 0.5 for all GPUGrid app types, so that when both cards are working on GPUGrid, BOINC reserves 1 total CPU core for them, keeping the CPU slightly above saturation. I've since changed my logic a bit so as to slightly undersaturate the CPU; I accomplished that by changing cpu_usage to 1.0 for all GPUGrid app types, so a logical CPU core is reserved for each, which I think is what you guys always recommended anyway.
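For reference, the override looks roughly like this in app_config.xml (a sketch only - the <name> values below are my assumption; check the application names your client actually reports, e.g. in client_state.xml or the event log):

<app_config>
   <app>
      <name>acemdlong</name>            <!-- assumed name of the long-run app -->
      <gpu_versions>
         <gpu_usage>1.0</gpu_usage>     <!-- one GPU per task -->
         <cpu_usage>1.0</cpu_usage>     <!-- reserve one logical CPU per task -->
      </gpu_versions>
   </app>
   <app>
      <name>acemdbeta</name>            <!-- assumed name of the beta app -->
      <gpu_versions>
         <gpu_usage>1.0</gpu_usage>
         <cpu_usage>1.0</cpu_usage>
      </gpu_versions>
   </app>
</app_config>

The file lives in the GPUGrid project folder under the BOINC data directory and is re-read on "Read config files" or a client restart. Keep in mind that cpu_usage only tells the BOINC scheduler how much CPU to set aside; it doesn't change how much CPU the science application itself actually consumes.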

Long story short: I used Process Explorer to confirm that NOELIA_KLEBE units strangely do not use a full CPU core on my Kepler card, whereas every other GPUGrid task does seem to use a full CPU core on that card. It matters to me because they are "mixed in" with other tasks in the "long" app, and my cpu_usage setting now applies to some tasks that won't use a full core. In a perfect world, and if I were an admin, I might consider placing "strange types" like this in a separate app queue.

Thank you very much for confirming this is "normal" for NOELIA_KLEBE on Kepler.
Jacob.
TJ

Message 32571 - Posted: 31 Aug 2013, 15:26:26 UTC - in response to Message 32562.  

How can you all see that a full core is used with one GPUGRID WU?

It can be checked in the Windows Task Manager: look for acemd.80x-55.exe (or acemd.80x-42.exe) on the "Processes" tab. If its CPU usage is 1-2%, it is not using a full core; otherwise its CPU usage will be roughly 100 divided by the number of your CPU's threads (12-13% on an 8-threaded CPU, 8% on a 12-threaded CPU). You can also check past workunits' CPU usage in your host's task list on the website: if the "CPU time" (almost) equals the "run time", the task used a full core; if the "CPU time" is significantly less than the "run time", it did not.

Thanks Zoltan,
This is what I thought, and that is how I use Task Manager. The Noelia WUs, both the current ones and earlier ones, use 1-3%. Rosetta is using 13% per core.
I have also seen Nathans never using less than 13%, and Santis that don't use 13% all the time - fluctuating from a steady 2% up to 11% for a few seconds and then back to 2% again. But I don't watch Task Manager a lot.
Greetings from TJ
Bedrich Hajek

Message 32575 - Posted: 31 Aug 2013, 20:09:27 UTC

I had a NOELIA_KLEBEbeta WU error out because of this:

8/31/2013 3:59:44 PM | GPUGRID | Aborting task 063px53-NOELIA_KLEBEbeta2-2-3-RND7138_0: exceeded elapsed time limit 4172.44 (250000000.00G/2062.63G)
8/31/2013 3:59:47 PM | GPUGRID | Computation for task 063px53-NOELIA_KLEBEbeta2-2-3-RND7138_0 finished


Here is the link:


http://www.gpugrid.net/result.php?resultid=7230639

The unit was running fine up to that point.


Bedrich Hajek

Message 32576 - Posted: 31 Aug 2013, 23:25:51 UTC - in response to Message 32575.  

I had another NOELIA_KLEBEbeta WU error out.


http://www.gpugrid.net/result.php?resultid=7231327


8/31/2013 7:16:37 PM | GPUGRID | Aborting task 063px63-NOELIA_KLEBEbeta2-2-3-RND4579_0: exceeded elapsed time limit 3857.75 (250000000.00G/64804.69G)
8/31/2013 7:16:41 PM | GPUGRID | Computation for task 063px63-NOELIA_KLEBEbeta2-2-3-RND4579_0 finished
8/31/2013 7:16:41 PM | GPUGRID | Output file 063px63-NOELIA_KLEBEbeta2-2-3-RND4579_0_1 for task 063px63-NOELIA_KLEBEbeta2-2-3-RND4579_0 absent
8/31/2013 7:16:41 PM | GPUGRID | Output file 063px63-NOELIA_KLEBEbeta2-2-3-RND4579_0_2 for task 063px63-NOELIA_KLEBEbeta2-2-3-RND4579_0 absent
8/31/2013 7:16:41 PM | GPUGRID | Output file 063px63-NOELIA_KLEBEbeta2-2-3-RND4579_0_3 for task 063px63-NOELIA_KLEBEbeta2-2-3-RND4579_0 absent
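For what it's worth (my reading of the client message, not anything official): the figure in brackets is the workunit's flops bound divided by the speed BOINC estimates for the task on this host, so here 250,000,000 GFLOP / 64,804.69 GFLOPS ≈ 3,858 seconds of allowed elapsed time, which matches the 3857.75 s limit that triggered the abort. If the flops bound on these beta workunits is set too low for the work they actually do, they will keep tripping this limit even on otherwise healthy hosts.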



nanoprobe

Message 32587 - Posted: 1 Sep 2013, 12:14:42 UTC

I updated to the 326.98 driver last night. The one NOELIA_KLEBEbeta WU I received failed with a 197 (0xc5) EXIT_TIME_LIMIT_EXCEEDED error.
http://www.gpugrid.net/result.php?resultid=7232695

The 20 or so MJHARVEY_TEST14 betas I received all finished and validated.
nanoprobe

Message 32598 - Posted: 1 Sep 2013, 16:42:00 UTC - in response to Message 32587.  

I updated to the 326.98 driver last night. The one NOELIA_KLEBEbeta WU I received failed with a 197 (0xc5) EXIT_TIME_LIMIT_EXCEEDED error.
http://www.gpugrid.net/result.php?resultid=7232695

The 20 or so MJHARVEY_TEST14 betas I received all finished and validated.


Spoke too soon. One of the MJHARVEY_TEST betas failed with the same 197 (0xc5) EXIT_TIME_LIMIT_EXCEEDED error message.

http://www.gpugrid.net/result.php?resultid=7233613
MJH
Message 32678 - Posted: 4 Sep 2013, 10:26:16 UTC

Just a note: there are also NOELIA_KLEBE WUs on the acemdbeta queue. Somewhat confusingly, those are test WUs for the beta app and aren't part of this batch. If you have problems, please check the application that was used, and report it over on the thread about the beta application if appropriate:

http://www.gpugrid.net/forum_thread.php?id=3465

Thanks!

MJH