Message boards : News : WU: NOELIA_KLEBEs
Author | Message |
---|---|
So far I've had | |
ID: 32423 | Rating: 0 | rate: / Reply Quote | |
Encountering issues with Noelia WUs: | |
ID: 32430 | Rating: 0 | rate: / Reply Quote | |
Since I saw a few error posts popping up about Noelia's new WUs and there was no official thread, I'm making this thread to collect them all. Once they all come into the office I will inform them. | |
ID: 32438 | Rating: 0 | rate: / Reply Quote | |
I've had 5 NOELIA's fail in the past 24 hours. | |
ID: 32443 | Rating: 0 | rate: / Reply Quote | |
Hi, for some reason some of you are having problems with these WUs in the new application, and we've moved them to the beta queue to have a proper look. I've also just sent 50 WUs under the name KLEBEbeta with a much simpler configuration file. These simulations are really important, and fixing this bug will also help for future similar projects in drug discovery. Please report any problems you might have on groups KLEBEs and KLEBEbeta. | |
ID: 32444 | Rating: 0 | rate: / Reply Quote | |
Noelia, | |
ID: 32446 | Rating: 0 | rate: / Reply Quote | |
I've probably fixed the fault. There'll be an updated acemdbeta app very soon. | |
ID: 32447 | Rating: 0 | rate: / Reply Quote | |
801 is now live. | |
ID: 32449 | Rating: 0 | rate: / Reply Quote | |
Lol What a "luck" ^^ | |
ID: 32450 | Rating: 0 | rate: / Reply Quote | |
Thanks for the prompt response. I got an 801 KLEBEbeta, and it's at least getting off the ground now. | |
ID: 32451 | Rating: 0 | rate: / Reply Quote | |
Would anyone with a cc 1.3 card - Geforce GTX 200 series - please try some of the current acemdbeta v801 Noelia-KLEBE WUs and report back here? | |
ID: 32452 | Rating: 0 | rate: / Reply Quote | |
I tested suspending one of these KLEBEbeta tasks, and it caused a driver reset. So, the problem still persists. | |
ID: 32453 | Rating: 0 | rate: / Reply Quote | |
On my Titan box I've gotten two of these NOELIAs. | |
ID: 32454 | Rating: 0 | rate: / Reply Quote | |
Jacob, | |
ID: 32455 | Rating: 0 | rate: / Reply Quote | |
Jacob, There's a whole thread about it, where I posted in as much detail as I could about the problem, on 4/4/2013 (4+ months ago), here: http://www.gpugrid.net/forum_thread.php?id=3333 It happens whenever a NOELIA task (especially KLEBE) is suspended for any reason, including: BOINC set to Snooze; BOINC set to Snooze GPU; BOINC set to Suspend; BOINC set to Suspend GPU; BOINC set to Suspend due to exclusive app running; BOINC set to Suspend GPU due to exclusive GPU app running; GPUGrid project set to Suspend; NOELIA KLEBE task set to Suspend; BOINC exited with "Stop running tasks" checked. Something in the KLEBE exit logic has been causing driver resets and watchdog timeouts, for several months, for many of your Windows users. I sure hope you guys can work together to get a handle on it! Note: I do use the "Leave application in memory when suspended" setting, but so far as I know, that is irrelevant to GPU tasks. When a GPU task is suspended, BOINC has to remove it from memory, regardless of that user setting. It treats GPU tasks differently because there's no PageFile backing the GPU RAM. Thanks for looking into this. It's my biggest problem across all of my 20 BOINC projects. | |
ID: 32456 | Rating: 0 | rate: / Reply Quote | |
I have two NOELIA KLEBEbeta tasks on my 770. They start, but at 0.021% complete they make no more progress yet keep running: 2h57m16s elapsed and 0h0m0s remaining. This was in app 8.00; I have now aborted these WUs and will try the new 8.01 app. | |
ID: 32457 | Rating: 0 | rate: / Reply Quote | |
I have two NOELIA KLEBEbeta tasks on my 770. They start, but at 0.021% complete they make no more progress yet keep running: 2h57m16s elapsed and 0h0m0s remaining. This was in app 8.00; I have now aborted these WUs and will try the new 8.01 app. Same thing here, but after 30 minutes I stopped it. http://www.gpugrid.net/result.php?resultid=7221521 ____________ | |
ID: 32459 | Rating: 0 | rate: / Reply Quote | |
Would anyone with a cc 1.3 card - Geforce GTX 200 series - please try some of the current acemdbeta v801 Noelia-KLEBE WUs and report back here? OK, I started one with 8.01. But this can take some time, even on my 670 MHz GTX 285. I normally don't run GPUGrid on this card anymore. It will need about 33 hours. The short run with 8.00 was OK on this card. I don't think anybody still uses a power-hungry 200 series card on long runs O.o ____________ DSKAG Austria Research Team: http://www.research.dskag.at | |
ID: 32464 | Rating: 0 | rate: / Reply Quote | |
With the new 8.01 app they run normally! | |
ID: 32467 | Rating: 0 | rate: / Reply Quote | |
Would anyone with a cc 1.3 card - Geforce GTX 200 series - please try some of the current acemdbeta v801 Noelia-KLEBE WUs and report back here? Oh, and if somebody else has started one on a 200 series card, please tell me. I got my energy bill today, so I would love to stop it in the next few hours if it's not needed :p ____________ DSKAG Austria Research Team: http://www.research.dskag.at | |
ID: 32468 | Rating: 0 | rate: / Reply Quote | |
063px6-NOELIA_KLEBEbeta-0-3-RND7897_0 Workunit stuck at 0.021% (8.00 app though). <core_client_version>7.0.64</core_client_version> <![CDATA[ <message> aborted by user </message> ]]> | |
ID: 32470 | Rating: 0 | rate: / Reply Quote | |
I have noticed that, using the 8.01 app on a NOELIA_KLEBEbeta task, on my GTX 660 Ti, the process does not utilize a full CPU core (like other GPUGrid tasks normally do for that GPU). It's like SWAN_SYNC is not set correctly. Though I'm still getting good (85-91%) GPU utilization for the task. | |
ID: 32471 | Rating: 0 | rate: / Reply Quote | |
If it is still running then that's plenty long enough to demonstrate that all is well, thanks. You can kill it off. Matt | |
ID: 32472 | Rating: 0 | rate: / Reply Quote | |
It should have exactly the same load profile as 8.00 did. MJH | |
ID: 32473 | Rating: 0 | rate: / Reply Quote | |
OK, it ran for one hour, was at 3.3% with 95% GPU load, used 515 MB VRAM, and the CPU was busy working on LHC and still computed normally. Thanks. Aborted it. ^^ ____________ DSKAG Austria Research Team: http://www.research.dskag.at | |
ID: 32475 | Rating: 0 | rate: / Reply Quote | |
Have revved the beta app to 8.02. This might also fix the driver-hang-on-suspend problem. | |
ID: 32476 | Rating: 0 | rate: / Reply Quote | |
I have noticed that, using the 8.01 app on a NOELIA_KLEBEbeta task, on my GTX 660 Ti, the process does not utilize a full CPU core (like other GPUGrid tasks normally do for that GPU). It's like SWAN_SYNC is not set correctly. Though I'm still getting good (85-91%) GPU utilization for the task. I thought NOELIA's never used a full CPU core, that's the way it's always been. We've talked about it before in different threads. | |
ID: 32481 | Rating: 0 | rate: / Reply Quote | |
That's fine if that's the case. I don't know, which is why I asked. I'm "used" to seeing tasks on my "Kepler" (GTX 660 Ti) taking a full CPU core (via SWAN_SYNC) automatically. Maybe NOELIA tasks work differently. | |
ID: 32482 | Rating: 0 | rate: / Reply Quote | |
ID 156955: i7-3770K (no HT, 4 CPU cores) GTX560Ti, W7 64bit driver 328.80. <active_task> <project_master_url>http://www.gpugrid.net/</project_master_url> <result_name>063px30-NOELIA_KLEBEbeta2-0-3-RND2325_0</result_name> <checkpoint_cpu_time>564.224400</checkpoint_cpu_time> <checkpoint_elapsed_time>2994.810513</checkpoint_elapsed_time> <fraction_done>0.048235</fraction_done> </active_task> Seems to run OK so far. Concurrently running: 4x CPU Asteroids SSE3, 1x GPU Einstein BPRS on Intel HD4000 Note: previous 6.18 CUDA 4.2 application could run on 875 MHz core clock. Factory OC of the GPU is 900 MHz. | |
ID: 32484 | Rating: 0 | rate: / Reply Quote | |
That's fine if that's the case. I don't know, which is why I asked. I'm "used" to seeing tasks on my "Kepler" (GTX 660 Ti) taking a full CPU core (via SWAN_SYNC) automatically. Maybe NOELIA tasks work differently. I understand. I know you're a very busy man; I thought I saw you debugging apps for other projects in some different forums somewhere else. I don't know how you manage to keep track of them all. | |
ID: 32487 | Rating: 0 | rate: / Reply Quote | |
:) Yeah, thanks. I've helped Einstein fix a bug, MindModeling fix a bug, GPUGrid fix a couple things, Test4Theory fix a bug, Rosetta fix their app, SETI fix a GPU estimate problem, got nVidia to fix a monitor-sleep issue, and more. And I also do alpha/beta testing of the actual BOINC software, and have worked directly with the BOINC devs. | |
ID: 32488 | Rating: 0 | rate: / Reply Quote | |
That's fine if that's the case. I don't know, which is why I asked. I'm "used" to seeing tasks on my "Kepler" (GTX 660 Ti) taking a full CPU core (via SWAN_SYNC) automatically. Maybe NOELIA tasks work differently. I still consider this different CPU load a malfunction. However, with this low CPU load the GPU load is still above 95%, so we can turn the question around: is it certain that the other tasks need a full CPU thread to feed a Kepler GPU? | |
ID: 32503 | Rating: 0 | rate: / Reply Quote | |
Have revved the beta app to 8.02. This might also fix the driver-hang-on-suspend problem. I think you could promote this 8.02 to the production queue at once, as it has proved to be better than 8.00. | |
ID: 32504 | Rating: 0 | rate: / Reply Quote | |
It's there now. | |
ID: 32505 | Rating: 0 | rate: / Reply Quote | |
That's fine if that's the case. I don't know, which is why I asked. I'm "used" to seeing tasks on my "Kepler" (GTX 660 Ti) taking a full CPU core (via SWAN_SYNC) automatically. Maybe NOELIA tasks work differently. On the Folding forum, there have been extended discussions of Nvidia CPU core usage under CUDA. This contrasts with the case of AMD cards running OpenCL, which typically require only a few percent of a CPU core. As I recall, Nvidia provides the option to the developers to reserve a full CPU core when running under CUDA using spin states, which I don't understand anyway. If the application developers want to ensure that they have enough CPU support, they can reserve it, even though typically not all of it is actually in use. So maybe the other tasks don't really require a full core, except that it may be useful to reserve it for stability or performance or whatever. EDIT: To further complicate matters, Nvidia cards running OpenCL always require a full CPU core; there is no option not to. | |
ID: 32507 | Rating: 0 | rate: / Reply Quote | |
8.02 beta tasks seem to work ok on 780s, but now all other tasks fail | |
ID: 32520 | Rating: 0 | rate: / Reply Quote | |
I still consider this different CPU load as a malfunction. Watching two different third-party developers working on SETI (one specialising in CUDA, the other in OpenCL), we get the opposite outcome: OpenCL on ATI is inefficient unless a spare CPU core is available, but CUDA on Nvidia requires very little CPU. I'm not a developer myself (at least, not at the level these guys program), but from the peanut gallery it looks as if CPU usage is very much down to the skill of the developer, and how well they know their platform and tools. But I'm interested by the OpenCL on Nvidia point. That does seem to be a common observation - I wonder if it has necessarily to be so? Or maybe Nvidia didn't port some of their synch technology from CUDA to the OpenCL toolchain yet? | |
ID: 32525 | Rating: 0 | rate: / Reply Quote | |
8.02 app running the Noelia Beta WUs (one on the CUDA4.2 and the other on the 5.5 app). But I'm interested by the OpenCL on Nvidia point. That does seem to be a common observation - I wonder if it has necessarily to be so? Or maybe Nvidia didn't port some of their synch technology from CUDA to the OpenCL toolchain yet? The GK104 cards are supposed to be OpenCL 1.2 but the drivers are only OpenCL 1.1, which means the toolkit can't be 1.2. AMD/ATI supports OpenCL 1.2, Intel supports OpenCL 1.2, NVidia says its GPUs are OpenCL 1.2 but their drivers prevent the cards from being used for OpenCL 1.2. ____________ FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help | |
ID: 32526 | Rating: 0 | rate: / Reply Quote | |
On my Linux systems I have the STABLE Repository drivers (304.88), supposedly only CUDA 5.0. | |
ID: 32531 | Rating: 0 | rate: / Reply Quote | |
The intent is that you'll get 55 only if the driver revision is >= 315.15. Alas, the scheduler has a will of its own. MJH | |
ID: 32535 | Rating: 0 | rate: / Reply Quote | |
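The rule MJH describes (the CUDA 5.5 build only when the driver revision is at least 315.15) can be sketched as a plain version comparison. This is only an illustration and not the actual scheduler code: the function names and the labels `cuda55` and `cuda42` are hypothetical, and the real scheduler evidently does not always follow the rule.

```python
# Threshold quoted in the post above; everything else here is a hypothetical sketch.
MIN_DRIVER_FOR_CUDA55 = (315, 15)


def parse_driver_version(text):
    """Turn a driver string such as '326.98' into a comparable tuple (326, 98)."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def pick_app_build(driver_version):
    """Return a made-up build label for the given driver version string."""
    if parse_driver_version(driver_version) >= MIN_DRIVER_FOR_CUDA55:
        return "cuda55"
    return "cuda42"


if __name__ == "__main__":
    for version in ("304.88", "315.15", "326.98"):
        print(version, "->", pick_app_build(version))
```

Comparing tuples of integers rather than raw strings or floats avoids surprises such as "315.9" sorting above "315.15".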
Watching two different third-party developers working on SETI (one specialising in CUDA, the other in OpenCL), we get the opposite outcome: OpenCL on ATI is inefficient unless a spare CPU core is available, but CUDA on Nvidia requires very little CPU. That is quite true from my own experience also (as a user only), but I think we are talking about two different things. Neither ATI on OpenCL nor Nvidia on CUDA requires a CPU core unless the project developer requires it. And usually CUDA can be made more efficient with CPU usage. Certainly that is the case with Folding with their separate OpenCL core_16 (for AMD cards only) and CUDA core_15 versions (obviously for Nvidia cards only); the CUDA one is much better (less than 1 percent versus maybe 20 percent or more). But I'm interested by the OpenCL on Nvidia point. That does seem to be a common observation - I wonder if it has necessarily to be so? Or maybe Nvidia didn't port some of their synch technology from CUDA to the OpenCL toolchain yet? All I know is that on Folding with their newest OpenCL core_17, which runs on both AMD and Nvidia, the situation is reversed. It requires only 1 or 2 percent on AMD cards (e.g., my HD 7870 on an i7-3770), whereas on an Nvidia card it reserves a full core (e.g., on my GTX 660 Ti). The question has been asked on the Folding forum as to whether that is necessary, and the answer is that Nvidia has not implemented the option in OpenCL to use less than a full core. Apparently they could if they wanted to, but maybe for performance reasons (so the speculation goes) they want their cards to perform the best they can, so they just grab the whole core. It helps solve the problem you mentioned above, where users don't always know to leave a core free, I suppose. | |
ID: 32537 | Rating: 0 | rate: / Reply Quote | |
But I'm interested by the OpenCL on Nvidia point. That does seem to be a common observation - I wonder if it has necessarily to be so? Or maybe Nvidia didn't port some of their synch technology from CUDA to the OpenCL toolchain yet? That was my suspicion too. In trying to pass messages between the two developers - apparently the new CUDA way is to use 'callback' rather than 'spin' synch - I was invited to refer to the NVidia toolkit documentation to find examples for the OpenCL implementation. I couldn't find any. If there are any unbiased developer observers of this thread, it would be useful to hear if there is any factual basis for our observations - and for the rumour I've heard that NVidia might pull away from OpenCL support entirely. That would be a shame, if true - both NVidia and ATI (as it was then) were founder members of the Khronos Group in January 2000. It would be a pity if competition drove out collaboration, and we returned to the days of two incompatible native-code development environments. | |
ID: 32539 | Rating: 0 | rate: / Reply Quote | |
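For readers wondering what 'spin' versus a blocking or callback-style synch means for CPU load, here is a rough analogy in plain Python. It is not CUDA or OpenCL code and says nothing about what acemd actually does: a timer thread stands in for the GPU finishing its work, and the two wait functions are illustrative names. The CUDA runtime exposes a comparable choice through device scheduling flags such as `cudaDeviceScheduleSpin` and `cudaDeviceScheduleBlockingSync`.

```python
import threading
import time


def cpu_time_while_waiting(wait_for):
    """Measure process CPU seconds consumed while waiting about 0.2 s for an event."""
    done = threading.Event()
    timer = threading.Timer(0.2, done.set)  # stands in for the GPU finishing work
    start = time.process_time()
    timer.start()
    wait_for(done)
    used = time.process_time() - start
    timer.join()
    return used


def spin_wait(done):
    # Poll in a tight loop: lowest latency, but it keeps one CPU core busy.
    while not done.is_set():
        pass


def blocking_wait(done):
    # Sleep until signalled: the core stays free for other work.
    done.wait()


if __name__ == "__main__":
    print("spin     CPU seconds:", round(cpu_time_while_waiting(spin_wait), 3))
    print("blocking CPU seconds:", round(cpu_time_while_waiting(blocking_wait), 3))
```

The spinning wait burns CPU time for the whole interval while the blocking wait uses almost none, which matches the pattern described in this thread: a task that spins shows up as a full core in use, one that blocks shows only a few percent.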
Perhaps you might create a new thread devoted toward finding the OpenCL/CUDA information. | |
ID: 32540 | Rating: 0 | rate: / Reply Quote | |
Perhaps you might create a new thread devoted toward finding the OpenCL/CUDA information. And back to the NOELIA_KLEBEbetas, which by the way run fine on my 660 and 770 with 8.02! Noelia and MJH did a good job with this. ____________ Greetings from TJ | |
ID: 32541 | Rating: 0 | rate: / Reply Quote | |
http://www.gpugrid.net/result.php?resultid=7221215 | |
ID: 32544 | Rating: 0 | rate: / Reply Quote | |
King's Own. 8.00 is deprecated - your problem is fixed in the current release. | |
ID: 32545 | Rating: 0 | rate: / Reply Quote | |
Thank you. | |
ID: 32546 | Rating: 0 | rate: / Reply Quote | |
I'm getting zero output file errors every few minutes, and it slams the fan on my card and resets the task. | |
ID: 32551 | Rating: 0 | rate: / Reply Quote | |
Richard wrote: Or maybe Nvidia didn't port some of their synch technology from CUDA to the OpenCL toolchain yet? That's what I suppose as well, without being a GPU developer. Over a year ago nVidia's performance at POEM OpenCL was horrible, but they only used ~50% of one core. A driver update doubled performance, but since then they're using a full CPU core. To me it seems like "just use a full core" was a quick fix. And now they don't want to push OpenCL any further than they have to and just stick with this solution. MrS ____________ Scanning for our furry friends since Jan 2002 | |
ID: 32553 | Rating: 0 | rate: / Reply Quote | |
How can you all see that a full core is used with one GPUGRID WU? | |
ID: 32560 | Rating: 0 | rate: / Reply Quote | |
How can you all see that a full core is used with one GPUGRID WU? It can be checked in the Windows Task Manager: look for acemd.80x-55.exe (or acemd.80x-42.exe) on the "Processes" tab. If its CPU usage is 1-2%, then it's not using a full core; if it is using a full core, the CPU usage is about 100% divided by the number of your CPU's threads (12-13% on an 8-threaded CPU, 8% on a 12-threaded CPU). You can also check past workunits' CPU usage in your hosts' task list: if the "CPU time" (almost) equals the "run time", then the task used a full core; if the "CPU time" is significantly less than the "run time", then it didn't. | |
ID: 32562 | Rating: 0 | rate: / Reply Quote | |
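Both checks Zoltan describes are simple arithmetic, so here is a small sketch of them. The function names and the 0.9 cut-off for "almost equals" are my own assumptions, not anything defined by BOINC or GPUGrid; the sample times are the checkpoint values from the `<active_task>` block posted earlier in this thread.

```python
def full_core_percent(cpu_threads):
    """Task Manager percentage that corresponds to one fully used CPU thread."""
    return 100.0 / cpu_threads


def used_full_core(cpu_time_s, run_time_s, threshold=0.9):
    """Guess whether a task kept a full core busy.

    threshold is an assumed cut-off for "CPU time (almost) equals run time".
    """
    if run_time_s <= 0:
        raise ValueError("run time must be positive")
    return cpu_time_s / run_time_s >= threshold


if __name__ == "__main__":
    print("8 threads :", full_core_percent(8), "% means a full core")
    print("12 threads:", round(full_core_percent(12), 1), "% means a full core")
    # Checkpoint values from the KLEBEbeta2 task quoted earlier in the thread.
    cpu_s, elapsed_s = 564.2244, 2994.810513
    print("share of one core:", round(cpu_s / elapsed_s, 2))
    print("full core used?  :", used_full_core(cpu_s, elapsed_s))
```

For that KLEBEbeta2 task the CPU time is less than a fifth of the elapsed time, which fits the observation that NOELIA tasks do not occupy a whole core.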
Just note that the NOELIA_KLEBE WU's don't use a full CPU core/thread - Never have. | |
ID: 32564 | Rating: 0 | rate: / Reply Quote | |
Thanks for the info, skgiven. | |
ID: 32568 | Rating: 0 | rate: / Reply Quote | |
How can you all see that a full core is used with one GPUGRID WU? Thanks Zoltan, this is what I thought, and this is how I look at Task Manager. The Noelia WUs, both the current ones and those in the past, use 1-3%. Rosetta is using 13% per core. I have also seen Nathans not using less than 13%, and Santi's that do not use 13% all the time: usage fluctuated from a steady 2% up to 11% for a few seconds and then back to 2% again. But I am not watching Task Manager a lot. ____________ Greetings from TJ | |
ID: 32571 | Rating: 0 | rate: / Reply Quote | |
I had a NOEL_KLEBEbeta WU error out because of this: | |
ID: 32575 | Rating: 0 | rate: / Reply Quote | |
I had another NOEL_KLEBEbeta WU error out. | |
ID: 32576 | Rating: 0 | rate: / Reply Quote | |
I updated to the 326.98 driver last night. The 1 NOEL_KLEBEbeta WU I received failed with 197 (0xc5) EXIT_TIME_LIMIT_EXCEEDED error message. | |
ID: 32587 | Rating: 0 | rate: / Reply Quote | |
I updated to the 326.98 driver last night. The 1 NOEL_KLEBEbeta WU I received failed with 197 (0xc5) EXIT_TIME_LIMIT_EXCEEDED error message. Spoke too soon. One of the MJHARVEY_TEST betas failed with the same 197 (0xc5) EXIT_TIME_LIMIT_EXCEEDED error message. http://www.gpugrid.net/result.php?resultid=7233613 | |
ID: 32598 | Rating: 0 | rate: / Reply Quote | |
Just a note: there are also NOELIA_KLEBE WUs on the acemdbeta queue. Somewhat confusingly, those are test WUs for the beta app and aren't part of this batch. If you have problems, please check the application that was used and report it over on the thread about the beta application if appropriate: | |
ID: 32678 | Rating: 0 | rate: / Reply Quote | |