NOELIA tasks - when suspended or exited, often crash drivers
| Author | Message |
|---|---|
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
Devs, if a NOELIA short-run task is running and I either exit BOINC or suspend the task, it often crashes the NVIDIA driver. That produces Computation Errors on the tasks running across all GPUs, so I lose work, even from other projects. This sounds very similar to what was happening when NOELIA tasks were in Beta. Could you please investigate and see whether you can reproduce the issue? Again, this is causing me to lose work for other projects. :( System: Windows 8 x64, BOINC 7.0.60 Beta, NVIDIA 314.22 WHQL, GTX 660 Ti (usually runs 2 GPUGrid tasks), GTX 460 (usually runs 2 World Community Grid HCC tasks) |
|
Joined: 18 Dec 11 Posts: 10 Credit: 172,348,621 RAC: 0
|
Thanks. You figured out why some of my machines are crashing. I'll make sure the task doesn't suspend. -- Craig |
|
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0
|
"...GTX 660 Ti (usually runs 2 GPUGrid tasks)" How do you get a GTX 660 TI to run TWO GPUGrid tasks?? |
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
"How do you get a GTX 660 TI to run TWO GPUGrid tasks??" You use an app_config.xml file. I'd recommend doing plenty of research beforehand, though, using the following links: http://boinc.berkeley.edu/wiki/Client_configuration#Application_configuration http://www.gpugrid.net/forum_thread.php?id=3319 http://www.gpugrid.net/forum_thread.php?id=3331 And if you notice any tasks completing immediately while still granting credit (a bug we're still tracking down), please stop using the app_config.xml file and post your results/info here: http://www.gpugrid.net/forum_thread.php?id=3332 Regards, Jacob |
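As a rough illustration of what such a file contains, here is a small Python sketch that writes an app_config.xml asking BOINC to give each GPUGrid task half a GPU, so that two run per card. The app names acemdlong and acemdshort are taken from the file tomba posts later in this thread; the cpu_usage value of 0.001 is copied from that same file and is a placeholder, not a recommendation. Per the BOINC wiki page linked above, app_config.xml goes in the project's own folder under the BOINC data directory. Treat this as a sketch and do the recommended reading first.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_names, gpu_usage=0.5, cpu_usage=0.001):
    """Return app_config.xml text giving each task `gpu_usage` GPUs and `cpu_usage` CPUs."""
    root = ET.Element("app_config")
    for name in app_names:
        app = ET.SubElement(root, "app")
        ET.SubElement(app, "name").text = name
        gpu_versions = ET.SubElement(app, "gpu_versions")
        # 0.5 GPUs per task lets BOINC schedule two tasks on each GPU.
        ET.SubElement(gpu_versions, "gpu_usage").text = str(gpu_usage)
        # Placeholder copied from the file posted in this thread.
        ET.SubElement(gpu_versions, "cpu_usage").text = str(cpu_usage)
    ET.indent(root)  # pretty-print; needs Python 3.9+
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_app_config(["acemdlong", "acemdshort"]))
```

Generating the file from a script mainly avoids hand-editing typos; writing the XML by hand works just as well.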
|
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0
|
"How do you get a GTX 660 TI to run TWO GPUGrid tasks??" Blimey!! I'm on the case!!! Tom |
|
Joined: 7 Jun 12 Posts: 112 Credit: 1,140,895,172 RAC: 0
|
"it often crashes the NVIDIA driver, and leads to Computation Errors on tasks that are running across all GPUs, causing me to lose work, even from other projects" I have exactly the same problem. A few times a week or month, the NVIDIA driver crashes with a blue screen on my Windows 8 64-bit machine, and for two months I have not been able to get above 620k RAC. Before that I was approaching 650k and climbing higher, on the same hardware configuration. I've tried everything, and the problem clearly lies with NVIDIA and GPUGRID. All the problems started about two months ago, when everyone began searching for why they were having problems with the NOELIA tasks and CUDA 4.2. The inability to make NVIDIA TITAN cards available for crunching probably just confirms the problems between NVIDIA and GPUGRID. Or perhaps they already have enough people crunching for the project.. |
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
GPUGrid.net Devs: This is still a big problem for me. NOELIA tasks get suspended a lot on my machine, because I have several <exclusive_app> settings configured in cc_config.xml, and the suspension often crashes the driver, which in turn crashes the game/program I'm about to run! The error is: "Display driver stopped responding and has recovered". You should be able to reproduce this easily: let a NOELIA task run for a bit, then hit the Suspend button. Do that over and over, 30 times, and see if you get any errors. My only workaround right now: if I know the GPU is going to be suspended because I want to run a certain application, I have to suspend it manually first, letting it crash the driver, before I run the application. You have made it so I cannot use <exclusive_app> or <exclusive_gpu_app> effectively anymore. PLEASE FIX THIS! - Jacob Klein |
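For readers unfamiliar with the setup Jacob describes, this Python sketch writes a cc_config.xml with exclusive-app entries. In the BOINC client, a program listed in <exclusive_app> suspends all computing while it runs, and one listed in <exclusive_gpu_app> suspends only GPU computing; each such suspension interrupts any running NOELIA task, which is why this configuration hits the crash so often. The executable names are hypothetical, and cc_config.xml lives in the BOINC data directory.

```python
import xml.etree.ElementTree as ET


def build_cc_config(exclusive_apps=(), exclusive_gpu_apps=()):
    """Return cc_config.xml text with <exclusive_app>/<exclusive_gpu_app> entries.

    While a listed program runs, BOINC suspends all work (<exclusive_app>)
    or only GPU work (<exclusive_gpu_app>).
    """
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")  # both tags live under <options>
    for exe in exclusive_apps:
        ET.SubElement(options, "exclusive_app").text = exe
    for exe in exclusive_gpu_apps:
        ET.SubElement(options, "exclusive_gpu_app").text = exe
    ET.indent(root)  # pretty-print; needs Python 3.9+
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical executable names; substitute your own programs.
    print(build_cc_config(exclusive_apps=["game.exe"],
                          exclusive_gpu_apps=["video_editor.exe"]))
```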
|
Joined: 16 Jul 12 Posts: 98 Credit: 386,043,752 RAC: 0
|
I have this exact same problem. My specs: Windows 8 x64, 2x GTX 670, 314.07 WHQL driver, BOINC 7.0.64 x64. My CPU is an Intel i7-3820 overclocked to 4.5 GHz, which runs 5 World Community Grid WUs, with the remaining cores left to feed the GPUs. |
|
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0
|
Today I replaced my GTX 460 with a GTX 660. My first WU is a Noelia, which looks like it will complete in 12 hours (25% done in three hours). Much better! I added the app_config.xml file posted here, which BOINC read at startup, but, disappointingly, I have not yet seen a second WU running. Any thoughts? BOINC log below. |
Beyond (Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0)
|
"Today I replaced my GTX 460 with a GTX 660. My first WU is a Noelia, which looks like it will complete in 12 hours; 25% done in three hours. Much better!" You didn't say what you have in your app config, just "posted here". I don't see it in this thread, at least. Apparently you're trying to run 2 WUs concurrently. If so, they won't make the 24 hour deadline. The new NATHANs are even longer. Are you trying to increase your credit? Even if they run without problems, you will end up with lower credit than running 1X on a GTX 660. |
|
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0
|
"You didn't say what you have in your app config, just 'posted here'. I don't see it in this thread at least." Thank you for responding. You're right: this thread only contains a pointer to another thread. Sorry for the confusion. "Apparently you're trying to run 2 WUs concurrently. If so, they won't make the 24 hour deadline. The new NATHANS are even longer. Are you trying to increase your credit? Even if they run without problem, you will end up with lower credit than running 1X on a GTX 660." Ah! That's not what I had understood: that 50% + 50% = 100%, but with no bonuses... I just wonder why there has been so much kerfuffle here about a 'feature' (2x) that benefits no one. Whatever; if only for the challenge, I'd like to give 2x a try. Can you tell me why it does not work with the .XML file below? Thanks.

```xml
<app_config>
  <app>
    <name>acemdlong</name>
    <max_concurrent>9999</max_concurrent>
    <gpu_versions>
      <gpu_usage>1</gpu_usage>
      <cpu_usage>0.001</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>acemd2</name>
    <max_concurrent>9999</max_concurrent>
    <gpu_versions>
      <gpu_usage>1</gpu_usage>
      <cpu_usage>0.001</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>acemdshort</name>
    <max_concurrent>9999</max_concurrent>
    <gpu_versions>
      <gpu_usage>1</gpu_usage>
      <cpu_usage>0.001</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```
|
Beyond (Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0)
|
"Apparently you're trying to run 2 WUs concurrently. If so, they won't make the 24 hour deadline. The new NATHANS are even longer. Are you trying to increase your credit? Even if they run without problem, you will end up with lower credit than running 1X on a GTX 660." Jacob was running 2X on his 660 Ti with 3GB on the MUCH shorter NATHAN WUs, which are now unfortunately gone. If your GPU won't make the 24hr deadline (including DL, UL & reporting time), you will miss the 24hr bonus and your credit will take a significant hit. And that assumes everything runs optimally; errors are also likely to be more frequent. Running 1X should be better for the project too, since the time from WU generation to WU completion will most likely be shorter, and turnaround is an issue here. |
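The deadline argument can be checked with simple arithmetic. This sketch extrapolates tomba's figures (25% done after 3 hours on a GTX 660) to a 12-hour single task, then estimates the per-task time at 2x. It assumes 2x gives no extra throughput (throughput_gain=1.0) and a hypothetical half hour for download, upload, and reporting; both are assumptions, not measurements.

```python
def projected_total_hours(elapsed_hours, fraction_done):
    """Linearly extrapolate a task's total run time from its progress so far."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_hours / fraction_done


def concurrent_run_hours(single_task_hours, tasks_per_gpu, throughput_gain=1.0):
    """Approximate run time of each task when `tasks_per_gpu` tasks share one GPU.

    throughput_gain is an assumption: 1.0 means the GPU does no more total work
    at 2x than at 1x, so each task simply takes twice as long.
    """
    return single_task_hours * tasks_per_gpu / throughput_gain


def makes_bonus_deadline(run_hours, overhead_hours, deadline_hours=24.0):
    """True if run time plus download/upload/reporting time fits in the deadline."""
    return run_hours + overhead_hours <= deadline_hours


single = projected_total_hours(3, 0.25)   # 25% done after 3 h
double = concurrent_run_hours(single, 2)  # 2x with no throughput gain
print(single, double, makes_bonus_deadline(double, overhead_hours=0.5))
# prints: 12.0 24.0 False
```

Raising throughput_gain models a card that genuinely does more total work at 2x; even then, each task's turnaround still roughly doubles.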
|
Joined: 9 May 13 Posts: 171 Credit: 4,594,296,466 RAC: 127
|
tomba, To answer your question about the app_config: the <gpu_usage>1</gpu_usage> statement tells BOINC how many GPUs each task uses. Currently it is set to 1 full GPU per task, so only 1 task will run on each GPU at a time. If you set it to <gpu_usage>0.5</gpu_usage>, BOINC will use half a GPU for each task, which allows 2 tasks to run on each GPU. Be sure to post your test results so we can see if it helped. |
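Put differently, gpu_usage is the fraction of a GPU each task claims. A rough model, assuming BOINC only co-schedules tasks on one GPU while their gpu_usage values sum to at most 1 (and ignoring other limits such as max_concurrent or CPU availability):

```python
import math


def tasks_per_gpu(gpu_usage):
    """Rough model of how many tasks fit on one GPU for a given <gpu_usage>.

    Assumes tasks can share a GPU while their gpu_usage values add up to
    at most 1, so the count is floor(1 / gpu_usage).
    """
    if not 0 < gpu_usage <= 1:
        raise ValueError("this sketch only handles 0 < gpu_usage <= 1")
    # The small epsilon absorbs floating-point rounding in 1 / gpu_usage.
    return math.floor(1 / gpu_usage + 1e-9)


for usage in (1, 0.5, 0.33):
    print(usage, tasks_per_gpu(usage))  # 1 -> 1, 0.5 -> 2, 0.33 -> 3
```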
|
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0
|
tomba, Wow! That works! Thank you!! I'm now running two Noelias: ...and below are the pre- and post-2x results from my GPU Monitor gadget. I will certainly report back on results! Many thanks. |
|
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0
|
"I will certainly report back on results!" It's very early days, but for both running Noelia WUs the "Remaining (estimated)" time is counting down much faster than one second per second... |
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
tomba, When comparing "before" and "after" to see how much faster tasks are processed (assuming the tasks contain the same amount of work), it's best to look at the "Run Time" value of the results after they're done. Also, this thread is about NOELIA tasks crashing. I'm interested in your results, but I have created a thread that documents performance testing/results of app_config changes; could you please post there instead? It's called "GPU Task Performance (vs. CPU core usage, app_config, multiple GPU tasks on 1 GPU, etc.)", and is located here: http://www.gpugrid.net/forum_thread.php?id=3331 I'd like to keep this thread focused on the NOELIA problems, which crash the drivers. It's an ongoing issue, and I'm hoping the admins will please look into it :( Regards, Jacob |
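Jacob's point about Run Time can be made concrete: when N tasks share a GPU, each task's Run Time grows, so the fair comparison is Run Time divided by N. A minimal sketch with made-up Run Time values, assuming the compared tasks contain the same amount of work:

```python
def seconds_per_task(run_time_seconds, concurrent_tasks):
    """Effective GPU time per task: a task's Run Time divided by how many shared the GPU."""
    return run_time_seconds / concurrent_tasks


def speedup(base_run_time, base_concurrency, new_run_time, new_concurrency):
    """Above 1.0 means the new setup finishes comparable tasks faster overall."""
    return (seconds_per_task(base_run_time, base_concurrency)
            / seconds_per_task(new_run_time, new_concurrency))


# Hypothetical Run Time values (seconds): 12 h per task at 1x, 22 h per task at 2x.
print(speedup(43200, 1, 79200, 2))
```

The estimated "Remaining" time is a poor substitute for this comparison, since BOINC's estimates drift while tasks run.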
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
Developers: Has anything been done about this? When suspended, NOELIA tasks are still sometimes: - crashing my drivers - yielding computation errors even for other GPU tasks - crashing the whole OS with DPC Watchdog Timeout errors. Please fix this obnoxious behavior! I originally posted this thread over 6 weeks ago. Where is the response? |
|
Joined: 5 Mar 13 Posts: 348 Credit: 0 RAC: 0
I will forward it. From a quick forum search, it seems to be Windows 7/8 and driver related, so there might not be much we can do. Are you certain it only happens with Noelias and no other WUs? Unfortunately, you need to keep in mind that at this point there are no dedicated GPUGrid (software) "developers". Our manpower is very limited and the user base is very big. With all the hardware/software combinations running GPUGrid, there are bound to be problems that we do not have the manpower to fix. We also prefer to dedicate more time to science, for obvious reasons. However, since this seems to happen to a few users, I will pass it along to MJH, who might know more. Let's hope we find a fix.
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
"I will forward it. From a quick forum search it seems to be W7/W8 and driver related. So there might not be much we can do. Are you certain it only happens with Noelias and no other WUs?" The issue sometimes happens when GPU tasks are suspended, so it will hopefully be easy for you to reproduce. I believe I've only seen the problem on NOELIA tasks. For reference, I'm using Windows 8 x64 with the new v320.18 WHQL drivers. It should be a matter of letting the task run for a while (about 15 seconds), then suspending it, resuming it, and repeating that several times; hopefully you'll see the problem after a few tries. I'd be curious to know whether you (or anyone at GPUGrid) can reproduce it. Thanks, Jacob
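Jacob's reproduction recipe can be scripted with boinccmd, the command-line tool that ships with BOINC, whose `--task URL task_name suspend|resume` operations correspond to the Manager's Suspend/Resume buttons. The Python loop below is a sketch under these assumptions: the project URL must match what `boinccmd --get_tasks` reports for the task, the task name is a placeholder you copy from that same output, and depending on your setup boinccmd may need to run from the BOINC data directory or be given `--passwd`. Since the goal is to provoke the crash, expect to lose any work on the GPUs while it runs.

```python
import subprocess
import time


def suspend_resume_cycles(project_url, task_name, cycles=30, run_seconds=15,
                          pause_seconds=5, run=subprocess.run, sleep=time.sleep):
    """Suspend and resume one BOINC task repeatedly through boinccmd.

    Each cycle lets the task compute for `run_seconds`, suspends it, waits
    `pause_seconds`, then resumes it. `run` and `sleep` can be swapped out
    to dry-run the loop without a BOINC client. Returns the commands issued.
    """
    issued = []
    for _ in range(cycles):
        sleep(run_seconds)  # let the task compute for a while
        for op, wait in (("suspend", pause_seconds), ("resume", 0)):
            cmd = ["boinccmd", "--task", project_url, task_name, op]
            run(cmd, check=True)  # raises CalledProcessError if boinccmd fails
            issued.append(cmd)
            if wait:
                sleep(wait)
    return issued
```

For example, `suspend_resume_cycles("http://www.gpugrid.net/", task_name)` runs the default 30 cycles, with `task_name` copied from `boinccmd --get_tasks`.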
MJH (Joined: 12 Nov 07 Posts: 696 Credit: 27,266,655 RAC: 0)
|
Guys, Is the crash on the suspend or the restart? Do you have "keep application in memory when suspended" set? What if you change to the alternative? Matt |
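To try Matt's suggestion outside the Manager, the same preference can be set through the leave_apps_in_memory element of global_prefs_override.xml in the BOINC data directory, then loaded with `boinccmd --read_global_prefs_override`. Below is a minimal Python sketch that writes such a file. If you already have one (BOINC Manager writes it when you set local preferences), add the element to it instead of replacing the file. Some BOINC 7 Managers label this option "Leave non-GPU tasks in memory while suspended", which suggests it may not affect GPU tasks at all, so treat the result as something to test rather than a known fix.

```python
import xml.etree.ElementTree as ET


def build_prefs_override(leave_apps_in_memory):
    """Return a minimal global_prefs_override.xml that sets leave_apps_in_memory."""
    root = ET.Element("global_preferences")
    ET.SubElement(root, "leave_apps_in_memory").text = "1" if leave_apps_in_memory else "0"
    ET.indent(root)  # pretty-print; needs Python 3.9+
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_prefs_override(False))  # try the opposite of your current setting
```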
©2025 Universitat Pompeu Fabra