NOELIA tasks - when suspended or exited, often crash drivers

Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Message 29318 - Posted: 4 Apr 2013, 23:31:13 UTC

Devs,

If a NOELIA short run task is running, then:
- when I exit BOINC, or
- I suspend the task
...
it often crashes the NVIDIA driver, and leads to Computation Errors on tasks that are running across all GPUs, causing me to lose work, even from other projects.

This sounds very similar to what was happening when NOELIA tasks were in Beta.
Could you please investigate, and see if you can reproduce the issue?
Again, this is causing me to lose work for other projects. :(

Windows 8 x64, BOINC 7.0.60 Beta, nVidia 314.22 WHQL, GTX 660 Ti (usually runs 2 GPUGrid tasks), GTX 460 (usually runs 2 World Community Grid HCC tasks)
nucleon
Joined: 18 Dec 11
Posts: 10
Credit: 172,348,621
RAC: 0
Message 29319 - Posted: 5 Apr 2013, 12:04:46 UTC - in response to Message 29318.  

Thanks.

You figured out why some of my machines are crashing. I'll make sure the task doesn't suspend.

-- Craig
tomba
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Message 29629 - Posted: 1 May 2013, 16:37:04 UTC - in response to Message 29318.  

... GTX 660 Ti (usually runs 2 GPUGrid tasks)

How do you get a GTX 660 TI to run TWO GPUGrid tasks??
Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Message 29630 - Posted: 1 May 2013, 16:44:38 UTC - in response to Message 29629.  
Last modified: 1 May 2013, 16:46:00 UTC

How do you get a GTX 660 TI to run TWO GPUGrid tasks??

You use an app_config.xml file.
I'd recommend doing plenty of research beforehand, though, using the following links:
http://boinc.berkeley.edu/wiki/Client_configuration#Application_configuration
http://www.gpugrid.net/forum_thread.php?id=3319
http://www.gpugrid.net/forum_thread.php?id=3331

And if you happen to notice any tasks completing immediately while still granting credit, which is a bug we're still tracking down, then please discontinue the use of the app_config.xml file, and post your results/info here:
http://www.gpugrid.net/forum_thread.php?id=3332

Regards,
Jacob
tomba
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Message 29631 - Posted: 1 May 2013, 16:52:32 UTC - in response to Message 29630.  

How do you get a GTX 660 TI to run TWO GPUGrid tasks??

You use an app_config.xml file.
Jacob

Blimey!! I'm on the case!!!
Tom
Jozef J
Joined: 7 Jun 12
Posts: 112
Credit: 1,140,895,172
RAC: 0
Message 29676 - Posted: 4 May 2013, 14:07:35 UTC

it often crashes the NVIDIA driver, and leads to Computation Errors on tasks that are running across all GPUs, causing me to lose work, even from other projects
I have exactly the same problem: a few times per week or month, the NVIDIA drivers crash with a blue screen on my Win 8 64-bit machine, and for two months I have not been able to get above 620k RAC. Before that I reached 650k and was climbing higher on the same hardware configuration.
I've tried everything, and the problem clearly lies with NVIDIA and GPUGRID. All the problems started about two months ago, when everyone began looking for the cause of their problems with the NOELIA tasks and CUDA 4.2. The fact that crunching still cannot be made available on NVIDIA TITAN cards probably just confirms the problems between NVIDIA and GPUGRID. Or else the project already has enough people crunching for it.
Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Message 30033 - Posted: 16 May 2013, 21:18:00 UTC - in response to Message 29676.  
Last modified: 16 May 2013, 21:18:48 UTC

GPUGrid.net Devs:

This is still a big problem for me.

If NOELIA tasks are suspended (which happens a lot for me, because I have several <exclusive_app> settings configured in cc_config.xml), it often crashes the driver, which also crashes the game/program I'm about to run!

The error is:
Display driver stopped responding and has recovered

You should be able to easily reproduce this by letting a NOELIA task run for a bit, then hit the Suspend button. Do that over and over, 30 times, and see if you get any errors.

My only workaround right now is, if I know I'm going to be suspending the GPU because I want to run a certain application, I have to suspend it manually, to let it crash the driver, before I run the application. You have made it so I cannot use <exclusive_app> or <exclusive_gpu_app> effectively anymore.
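
(For reference, a minimal sketch of what such cc_config.xml entries typically look like. The executable name is a hypothetical placeholder and is not taken from the actual file discussed here; <exclusive_app> suspends all computing while the named program runs, and <exclusive_gpu_app> suspends only GPU computing.)

```xml
<cc_config>
  <options>
    <!-- Hypothetical executable name: suspend all BOINC tasks while it runs -->
    <exclusive_app>mygame.exe</exclusive_app>
    <!-- Hypothetical executable name: suspend only GPU tasks while it runs -->
    <exclusive_gpu_app>mygame.exe</exclusive_gpu_app>
  </options>
</cc_config>
```

With either entry in place, BOINC suspends the affected tasks automatically when the program starts, which is exactly the moment the driver crash described above tends to occur.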

PLEASE FIX THIS!
- Jacob Klein
Dylan
Joined: 16 Jul 12
Posts: 98
Credit: 386,043,752
RAC: 0
Message 30035 - Posted: 17 May 2013, 0:49:08 UTC

I have this exact same problem. My specs are:
Windows 8 x64
2 GTX 670
314.07 WHQL driver
BOINC version 7.0.64 x64

My CPU is an Intel i7-3820 overclocked to 4.5 GHz, which runs 5 World Community Grid WUs, with the remaining cores left free to feed the GPUs.
tomba
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Message 30290 - Posted: 24 May 2013, 14:34:41 UTC

Today I replaced my GTX 460 with a GTX 660. My first WU is a Noelia, which looks like it will complete in 12 hours; 25% done in three hours. Much better!

I added the app_config.xml file posted here, which BOINC included in its startup.

I am disappointed not to have seen a second WU running yet. Any thoughts?

BOINC log below.


Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Message 30292 - Posted: 24 May 2013, 14:45:33 UTC - in response to Message 30290.  

Today I replaced my GTX 460 with a GTX 660. My first WU is a Noelia, which looks like it will complete in 12 hours; 25% done in three hours. Much better!
I added the app_config.xml file posted here, which BOINC included in its startup.
I am disappointed not to have seen a second WU running yet. Any thoughts?

You didn't say what you have in your app config, just "posted here". I don't see it in this thread at least. Apparently you're trying to run 2 WUs concurrently. If so, they won't make the 24 hour deadline. The new NATHANS are even longer. Are you trying to increase your credit? Even if they run without problem, you will end up with lower credit than running 1X on a GTX 660.
tomba
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Message 30299 - Posted: 24 May 2013, 15:33:06 UTC - in response to Message 30292.  

You didn't say what you have in your app config, just "posted here". I don't see it in this thread at least.

Thank you for responding. You're right. In this thread there is only a pointer to another thread. Sorry for the confusion.

Apparently you're trying to run 2 WUs concurrently. If so, they won't make the 24 hour deadline. The new NATHANS are even longer. Are you trying to increase your credit? Even if they run without problem, you will end up with lower credit than running 1X on a GTX 660.

Ah! That's not what I had understood: that 50% + 50% = 100% but no bonuses... I just wonder why there has been so much kerfuffle here on a 'feature' (2x) that benefits no-one.

Whatever, if only for a challenge I'd like to give 2x a try. Can you tell me why it does not work for the .XML file below? Thanks.

<app_config>
  <app>
    <name>acemdlong</name>
    <max_concurrent>9999</max_concurrent>
    <gpu_versions>
      <gpu_usage>1</gpu_usage>
      <cpu_usage>0.001</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>acemd2</name>
    <max_concurrent>9999</max_concurrent>
    <gpu_versions>
      <gpu_usage>1</gpu_usage>
      <cpu_usage>0.001</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>acemdshort</name>
    <max_concurrent>9999</max_concurrent>
    <gpu_versions>
      <gpu_usage>1</gpu_usage>
      <cpu_usage>0.001</cpu_usage>
    </gpu_versions>
  </app>
</app_config>

Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Message 30302 - Posted: 24 May 2013, 15:50:16 UTC - in response to Message 30299.  

Apparently you're trying to run 2 WUs concurrently. If so, they won't make the 24 hour deadline. The new NATHANS are even longer. Are you trying to increase your credit? Even if they run without problem, you will end up with lower credit than running 1X on a GTX 660.

Ah! That's not what I had understood: that 50% + 50% = 100% but no bonuses... I just wonder why there has been so much kerfuffle here on a 'feature' (2x) that benefits no-one.

Jacob was running 2X on his 660 Ti with 3GB on the MUCH shorter NATHAN WUs that are now unfortunately gone. If your GPU won't make the 24hr deadline (including DL, UL & reporting time), then you will miss the 24hr bonus and your credit will take a significant hit. That's even if everything runs optimally: errors are likely to be more frequent. Running 1X should be better for the project too as the time from WU generation to WU completion will most likely be less, an issue here.
captainjack
Joined: 9 May 13
Posts: 171
Credit: 4,594,296,466
RAC: 127
Message 30304 - Posted: 24 May 2013, 16:15:39 UTC

tomba,

To answer your question about the app_config, the
<gpu_usage>1</gpu_usage>

statement tells BOINC how many GPUs to use for each task. Currently it is set to use 1 full GPU for each task, so only 1 task will run on each GPU at a time. If you set it to
<gpu_usage>0.5</gpu_usage>

that would tell BOINC to use half of a GPU for each task, which would allow 2 tasks to run on each GPU.
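
Applied to the acemdlong entry from your file, the change would look roughly like the sketch below. Only the <gpu_usage> value differs; the other values are copied unchanged from what you posted, and the acemd2 and acemdshort entries would need the same edit if you want 2x for those apps too.

```xml
<app_config>
  <app>
    <name>acemdlong</name>
    <max_concurrent>9999</max_concurrent>
    <gpu_versions>
      <!-- 0.5 = each task claims half a GPU, so two tasks can share one card -->
      <gpu_usage>0.5</gpu_usage>
      <!-- CPU fraction budgeted per task; value kept from the posted file -->
      <cpu_usage>0.001</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```

BOINC needs to reload the file (for example, at client startup) before the new value takes effect.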

Be sure to post your test results so we can see if it helped.
tomba
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Message 30305 - Posted: 24 May 2013, 17:05:28 UTC - in response to Message 30304.  



Wow! That works! Thank you!! I'm now running two Noelias:



...and below are the pre- and post- 2x results from my GPU Monitor gadget.

I will certainly report back on results! Many thanks.



tomba
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Message 30307 - Posted: 24 May 2013, 17:31:52 UTC - in response to Message 30305.  

I will certainly report back on results!

It's very early days but, for both running Noelia WUs, the "Remaining (estimated)" time is counting down much faster than one per second...
Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Message 30310 - Posted: 24 May 2013, 18:30:03 UTC - in response to Message 30307.  

tomba,

When comparing "before" and "after" to see how much faster the tasks are processed, it's best to look at the "Run Time" value of the results after they're done (assuming the tasks contain the same amount of work).

Also, this thread is about NOELIA tasks crashing. I'm interested in your results, but I have created a thread that documents the performance testing/results of app_config changes; could you please post to it instead? It's called "GPU Task Performance (vs. CPU core usage, app_config, multiple GPU tasks on 1 GPU, etc.)", and is located here: http://www.gpugrid.net/forum_thread.php?id=3331

I'd like to keep this thread focused on the NOELIA problems, which crash the drivers. It's an ongoing issue, and I'm hoping the admins will please look into it :(

Regards,
Jacob
Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Message 30375 - Posted: 26 May 2013, 0:23:50 UTC - in response to Message 29318.  
Last modified: 26 May 2013, 0:26:26 UTC

Developers:

Has anything been done about this?

NOELIA tasks are still, when suspended, sometimes:
- crashing my drivers
- yielding computation errors even for other GPU tasks
- crashing the whole OS with DPC Watchdog Timeout errors

Please fix this obnoxious behavior!
I originally posted this thread over 6 weeks ago.
Where is the response?



Stefan
Project administrator
Project developer
Project tester
Project scientist
Joined: 5 Mar 13
Posts: 348
Credit: 0
RAC: 0
Message 30405 - Posted: 26 May 2013, 11:46:59 UTC - in response to Message 30375.  

I will forward it. From a quick forum search it seems to be W7/W8 and driver related. So there might not be much we can do. Are you certain it only happens with Noelias and no other WUs?
Unfortunately you need to keep in mind that at this point there are no dedicated GPUGrid (software) "developers". Our manpower is very limited and the user base is very big. This means that, with all the hardware/software combinations that run GPUGrid, there are bound to be problems that we do not have the manpower to fix. We also prefer to dedicate more time to science, for obvious reasons.
However since this seems to happen to a few users I will pass it along to MJH who might know something more. Let's hope we find a fix.
Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Message 30406 - Posted: 26 May 2013, 11:56:48 UTC - in response to Message 30405.  
Last modified: 26 May 2013, 11:57:16 UTC



The issue happens sometimes when GPU tasks are suspended. This means it will hopefully be easy for you guys to reproduce.

I believe I've only seen the problem on NOELIA tasks. For reference, I'm using Windows 8 x64, with the new v320.18 WHQL drivers.

It should be a matter of letting the task run for some time (say, 15 seconds), then suspending it, and repeating that several times; hopefully you'll see the problem after a few tries. I'd be curious to know whether you (or anyone at GPUGrid) can reproduce it.

Thanks,
Jacob
MJH
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Message 30418 - Posted: 26 May 2013, 16:43:49 UTC - in response to Message 30406.  

Guys,

Is the crash on the suspend or the restart? Do you have "keep application in memory when suspended" set? What if you change to the alternative?
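
(For anyone who wants to check this setting outside the BOINC Manager preferences dialog: to the best of my knowledge it corresponds to the leave_apps_in_memory preference, which can be overridden locally in global_prefs_override.xml in the BOINC data directory. The sketch below assumes no other local overrides are needed; 1 keeps suspended applications in memory, 0 removes them.)

```xml
<global_preferences>
  <!-- 1 = keep suspended applications in memory; 0 = unload them on suspend -->
  <leave_apps_in_memory>0</leave_apps_in_memory>
</global_preferences>
```

Trying both values and noting whether the driver crash still occurs on suspend would help answer the question above.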

Matt

©2025 Universitat Pompeu Fabra