Recent hard drive failure
| Author | Message |
|---|---|
|
Joined: 24 Mar 09 Posts: 37 Credit: 35,698,253 RAC: 0
|
I recently had a WD Velociraptor refuse to boot. Before the failure, I had noticed the hard drive making noise. Well, I thought, hardware failure, what are you going to do. So I RMAed the faulty drive, thinking there was some manufacturing fault. The new drive arrived, and I reloaded Windows, drivers, etc. Then I started up BOINC for the first time on the new drive. As soon as the GPUGRID WUs loaded, the new hard drive started making the same noises as the failed one. Does GPUGRID obey the disk drive preferences of normal BOINC applications? I am boycotting GPUGRID until something changes. |
GDF Joined: 14 Mar 07 Posts: 1958 Credit: 629,356 RAC: 0 |
That's decided by the client itself. Sorry, but you will have to blame something else. gdf |
|
Joined: 2 Mar 09 Posts: 159 Credit: 13,639,818 RAC: 0
|
Raptors have sometimes had a bad rap for coming out bad... The best thing I can suggest is a solid-state drive. In the BOINC settings, change the write-to-disk time... that may help. |
|
Joined: 24 Mar 09 Posts: 37 Credit: 35,698,253 RAC: 0
|
Sorry if I gave you a bad rap, but no other BOINC project has the same effect. |
|
Joined: 24 Mar 09 Posts: 37 Credit: 35,698,253 RAC: 0
|
"Raptors have sometimes had a bad rap for coming out bad..." I doubled the write-to-disk time from 30 sec. to 60 sec., seemingly to no effect. |
|
Joined: 24 Mar 09 Posts: 37 Credit: 35,698,253 RAC: 0
|
The rig in question has 12 GB of good RAM. I'm wondering why the disk is active at all. |
Nognlite Joined: 9 Nov 08 Posts: 69 Credit: 25,106,923 RAC: 0
|
I found that the indexing service in Vista accesses the drive quite frequently, so I shut it off. It helped the problem a bit, but I have the same drive access problems you describe. One drive is a 150 GB Raptor and the other is a 500 GB Seagate. I have also tried setting the disk access interval to 300 sec with no success. Looks like another feature that does not work, like remote access using the BOINC client, but that is for another thread. I understand that BOINC uses checkpoints and I expect the drive to be accessed, but not as often as it is. Pat |
|
Joined: 24 Dec 08 Posts: 738 Credit: 200,909,904 RAC: 0 |
Whether a task checkpoints, and how often, depends on the science app. BOINC has the setting that you have adjusted, but it's still the science app that determines how often to do a write. Usually there is little overhead on a disk write, as the files are fairly small but updated frequently, so they stay in the cache. On all my crunching rigs I have the print spooler and the indexing services disabled, so there is less competition for disk access. The drive LED is blinking every 2 seconds, but I have i7s, so typically 8-10 tasks are running at a time, all doing their checkpoints and result files. BOINC blog |
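For anyone who wants to set the write-to-disk interval outside the manager, here is a minimal Python sketch. It assumes the client reads a local override file (`global_prefs_override.xml` in the BOINC data directory) and that the setting is stored under a `disk_interval` element; neither name is stated in this thread, so treat both as assumptions. `build_override` is a hypothetical helper. The 300-second value is the one tried above.

```python
import xml.etree.ElementTree as ET


def build_override(disk_interval_seconds: int) -> str:
    """Return override-file text containing a single preference.

    Assumption: the element names below match what the BOINC client
    expects in its local preferences override file.
    """
    root = ET.Element("global_preferences")
    # "Write to disk at most every N seconds" (assumed element name).
    ET.SubElement(root, "disk_interval").text = str(disk_interval_seconds)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # 300 seconds is the interval mentioned in the thread.
    print(build_override(300))
```

Saving the printed text as the override file and having the client re-read its local preferences should apply it. As the post above points out, the science app still decides when it actually checkpoints, so this only sets the shortest interval the client asks for.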
|
Joined: 7 Jun 09 Posts: 24 Credit: 1,149,643,416 RAC: 0
|
I second what zpm said about the Velociraptors being faulty. There's a thread over on the StorageReview forum about it; maybe you should read it. http://forums.storagereview.net/index.php?showtopic=27303&st=50 |
|
Joined: 24 Mar 09 Posts: 37 Credit: 35,698,253 RAC: 0
|
"I second what zpm said about the Velociraptors being faulty." Excellent link, very informative. Thank you. |
|
Joined: 24 Mar 09 Posts: 37 Credit: 35,698,253 RAC: 0
|
Well, I RMAed the Velociraptor and flashed the drive with new firmware, but the hard drive still only calms down when I turn off GPUGRID. With GPUGRID running, the drive is working constantly. If I run a game, even with low graphics settings, the drive begins to rattle or clatter. I wonder if anyone else is experiencing this same issue. Perhaps there is a workaround. |
Michael Goetz Joined: 2 Mar 09 Posts: 124 Credit: 124,873,744 RAC: 0
|
While the Velociraptor firmware problem is a serious issue, it doesn't seem that it would explain the disk activity you're seeing while running GPUGRID. I'm also running GPUGRID under Vista SP2 and BOINC client 6.10.18, and looking at the process in the Task Manager shows exactly the behavior one would expect: a low memory footprint (around 60 MB), minimal I/O and minimal page faults. No untoward disk activity. So what's different? There are two significant differences between your machine and mine. The first is that you have two GPUs while I only have one, and the second is that you are running a later driver version (195 vs. my 191). If I had to guess, it's either a driver problem or an issue with dual GPUs that's causing the disk access. You said that the disk access only occurs with GPUGRID -- are you running any other CUDA projects? I doubt it's an issue with GPUGRID (*nobody* else has ever reported anything like this to my knowledge), but it might have something to do with any project that uses the GPU. Oh, one thing you could check: are you running BOINC as a service? My understanding is that when using CUDA, BOINC shouldn't be a service. I don't know what exactly breaks, but maybe this is what happens? Just a shot in the dark here; I could be barking up the wrong tree altogether. Mike P.S. You mentioned changing your checkpoint interval from 30 to 60 seconds. On this machine, I increased the interval to 300 seconds (5 minutes). I don't suspend tasks while the user is active, and the task-switch interval is 24 hours (allowing most tasks to complete in one shot), so I don't have a lot of tasks being preempted. Checkpointing slows the tasks down and keeps the disk busy (especially with multi-core and multi-GPU systems), and isn't really necessary if you're not preempting the tasks frequently. Want to find one of the largest known primes? Try PrimeGrid. Or help cure disease at WCG.
|
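The Task Manager check described above (looking at one process's I/O) can be approximated with a short script, which makes it easier to compare numbers between machines. This is only a sketch under assumptions: it relies on the third-party `psutil` package, which nobody in this thread mentions, and `IoSample`, `io_rate` and `sample_processes` are hypothetical names. The process-name fragment has to be whatever the science app's executable is called on your system.

```python
import time
from typing import NamedTuple


class IoSample(NamedTuple):
    """Cumulative byte counters for one process at one moment."""
    read_bytes: int
    write_bytes: int


def io_rate(before, after, seconds):
    """Return (read, write) throughput in bytes per second."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    read = (after.read_bytes - before.read_bytes) / seconds
    write = (after.write_bytes - before.write_bytes) / seconds
    return read, write


def sample_processes(name_fragment, seconds=10.0):
    """Measure disk throughput of processes whose name contains name_fragment.

    Requires psutil (third-party); per-process I/O counters are not
    available on every platform.
    """
    import psutil  # imported here so io_rate works without it

    first = {}
    for proc in psutil.process_iter(["name"]):
        name = proc.info["name"] or ""
        if name_fragment.lower() in name.lower():
            try:
                first[proc.pid] = (proc, proc.io_counters())
            except psutil.Error:
                continue  # process exited or access was denied

    time.sleep(seconds)

    rates = {}
    for pid, (proc, before) in first.items():
        try:
            rates[pid] = io_rate(before, proc.io_counters(), seconds)
        except psutil.Error:
            continue
    return rates
```

Calling `sample_processes` with a fragment of the executable name returns a dictionary mapping process IDs to read and write rates. Running it once with GPUGRID active and once with it suspended would show whether the science app itself is producing the disk traffic.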
|
Joined: 24 Mar 09 Posts: 37 Credit: 35,698,253 RAC: 0
|
BOINC is running as an application. I turned off Windows Search and updated my NVIDIA drivers. Still, the hard drive is working overtime and occasionally hiccups (the drive stops and becomes quiet), interrupting graphics applications such as games. Oddly, if there is a hiccup while I'm at the desktop, the clock's second hand keeps running. I guess the clock runs in memory. Still, if I suspend GPUGRID in BOINC Manager, the drive quiets down. I changed the checkpoint interval to 300 seconds. No effect. Checked local prefs, also no effect. No other CUDA applications are currently active. (More than not active, none have ever been loaded.) |
Michael Goetz Joined: 2 Mar 09 Posts: 124 Credit: 124,873,744 RAC: 0
|
As a diagnostic tool, you might want to try connecting to one of the other CUDA projects such as SETI or Milkyway just to see if you have the same problem with those. That will at least let you know if it's a generic problem with the CUDA installation on your system or something specific to GPUGRID. Those two projects have both CPU and GPU applications, so before you download any tasks, go to the preferences part of "your account" on their website to deselect the CPU tasks. For testing purposes, Milkyway is probably best -- its WUs are VERY short. Just set your cache to 0 so you don't download a gazillion WUs, and you should get only a single task for testing. Those only take about 15 minutes to run. Also, SETI is down until at least Tuesday morning PST, so Milkyway is your best bet for this test.
|
robertmiles Joined: 16 Apr 09 Posts: 503 Credit: 769,991,668 RAC: 0
|
"BOINC is running as an application. I turned off Windows Search and updated my NVIDIA drivers. Still, the hard drive is working overtime and occasionally hiccups (the drive stops and becomes quiet), interrupting graphics applications such as games. Oddly, if there is a hiccup while I'm at the desktop, the clock's second hand keeps running. I guess the clock runs in memory. Still, if I suspend GPUGRID in BOINC Manager, the drive quiets down. I changed the checkpoint interval to 300 seconds. No effect. Checked local prefs, also no effect. No other CUDA applications are currently active. (More than not active, none have ever been loaded.)" Do you have an antivirus program allowed to run when it chooses? Mine (Norton Internet Security 2010) keeps the disk active about half the time and does NOT seem to offer a way to pause the disk accesses when desired. |
|
Joined: 24 Mar 09 Posts: 37 Credit: 35,698,253 RAC: 0
|
Milkyway is keeping the hard drive fairly inactive. Unlike GPUGRID, I hear no hard drive noise. |
|
Joined: 24 Mar 09 Posts: 37 Credit: 35,698,253 RAC: 0
|
While I won't reveal publicly which antivirus/firewall I use, rest assured there is no issue there. |
|
Joined: 24 Mar 09 Posts: 37 Credit: 35,698,253 RAC: 0
|
Mr. Goetz, do you suspect the CUDA driver may be the issue? If so, can you help me roll back? Where is the download for previous versions of the NVIDIA drivers? |
Michael Goetz Joined: 2 Mar 09 Posts: 124 Credit: 124,873,744 RAC: 0
|
"Milkyway is keeping the hard drive fairly inactive. Unlike GPUGRID, I hear no hard drive noise." That's decidedly odd. At this point, I admit to being totally stumped. I'm out of ideas. I can't think of anything that would cause this to happen with one CUDA application but not another. |
Michael Goetz Joined: 2 Mar 09 Posts: 124 Credit: 124,873,744 RAC: 0
|
"Mr. Goetz," Go here: http://www.nvidia.com/Download/Find.aspx?lang=en-us Enter the correct info (GTX260, Vista 64, etc.), and you'll get a page that lists all the archived versions of the driver. I did notice another difference -- I'm running 32-bit and you're running 64-bit. Maybe one of the GPUGRID project guys has an idea what's going on. There's really no reason a project should be doing disk access like that. Mike |
©2026 Universitat Pompeu Fabra