Recent hard drive failure

Message boards : Graphics cards (GPUs) : Recent hard drive failure

Stephenish

Joined: 24 Mar 09
Posts: 37
Credit: 35,698,253
RAC: 0
Message 13175 - Posted: 14 Oct 2009, 23:40:55 UTC

I recently had a WD Velociraptor refuse to boot.

Before it failed, I noticed the hard drive was making noise. Well, I figured it was a hardware failure; what are you going to do? So I RMAed the faulty drive, thinking there was some manufacturing fault.

So, the new drive arrives. I reload Windows, drivers, etc. Then I start up BOINC for the first time on the new drive. As soon as the GPUGRID WUs load up, the hard drive (the new one) starts making the same noises as the previous failed drive.

Does GPUGRID obey the disk drive preferences of normal BOINC applications? I am boycotting GPUGRID until something changes.
GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1958
Credit: 629,356
RAC: 0
Message 13195 - Posted: 16 Oct 2009, 22:49:15 UTC - in response to Message 13175.  

That's decided by the BOINC client itself. Sorry, you'll have to blame something else.

gdf
zpm
Joined: 2 Mar 09
Posts: 159
Credit: 13,639,818
RAC: 0
Message 13196 - Posted: 16 Oct 2009, 23:43:07 UTC - in response to Message 13195.  
Last modified: 16 Oct 2009, 23:43:41 UTC

Raptors have sometimes had a bad rap for coming out bad...

Best thing I can suggest: a solid-state drive.

In the BOINC settings, change the write-to-disk time... that may help.
Stephenish

Message 13197 - Posted: 17 Oct 2009, 1:26:22 UTC - in response to Message 13196.  

Sorry if I gave you a bad rap, but no other BOINC project has the same effect.
Stephenish

Message 13198 - Posted: 17 Oct 2009, 1:32:02 UTC - in response to Message 13196.  

Raptors have sometimes had a bad rap for coming out bad...

Best thing I can suggest: a solid-state drive.

In the BOINC settings, change the write-to-disk time... that may help.


I doubled the write-to-disk time from 30 sec. to 60 sec., seemingly to no effect.
Stephenish

Message 13199 - Posted: 17 Oct 2009, 1:35:46 UTC - in response to Message 13198.  

The rig in question has 12 GB of good RAM.

I'm wondering why the disk is active at all.
Nognlite
Joined: 9 Nov 08
Posts: 69
Credit: 25,106,923
RAC: 0
Message 13201 - Posted: 17 Oct 2009, 3:18:26 UTC - in response to Message 13199.  
Last modified: 17 Oct 2009, 3:20:23 UTC

I found that the indexing service in Vista accesses the drive quite frequently, so I shut it off. It helped the problem a bit, but I have the same drive access problems you describe. One drive is a 150 GB Raptor and the other is a 500 GB Seagate.

I have also tried setting the disk access to 300 sec with no success.

Looks like another feature that does not work, like remote access using the BOINC client, but that is for another thread. I understand that BOINC uses checkpoints, and I expect the drive to be accessed, but not as often as it is.


Pat
MarkJ
Volunteer moderator
Volunteer tester

Joined: 24 Dec 08
Posts: 738
Credit: 200,909,904
RAC: 0
Message 13202 - Posted: 17 Oct 2009, 3:53:44 UTC

Whether a task checkpoints, and how often, depends on the science app. BOINC has the setting that you have adjusted, but it's still up to the science app how often it does a write. Usually there is little overhead on a disk write, as the files are fairly small but updated frequently, so they stay in the cache.

On all my crunching rigs I have the print spooler and the indexing services disabled, so there is less competition for disk access. The drive LED is blinking every 2 seconds, but I have i7s, so typically 8-10 tasks are running at a time, all doing their checkpoints and result files.
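
To illustrate the split described above between the client's setting and the app's own decisions, here is a toy model in Python. It is only a sketch under assumptions: the class and function names are made up for illustration, the numbers are arbitrary, and it is not BOINC's or GPUGRID's actual code. The client-side setting acts as a minimum gap between writes, while the app decides when it reaches a point where a checkpoint is possible.

```python
class CheckpointThrottle:
    """Toy model: the client sets a minimum interval, the app decides when to ask."""

    def __init__(self, min_interval_s):
        self.min_interval_s = min_interval_s
        self.last_write_s = None  # time of the last checkpoint actually written

    def time_to_checkpoint(self, now_s):
        """Return True if enough time has passed since the last written checkpoint."""
        if self.last_write_s is None:
            return True
        return now_s - self.last_write_s >= self.min_interval_s

    def checkpoint_completed(self, now_s):
        """Record that a checkpoint was written at now_s."""
        self.last_write_s = now_s


def simulate(app_ready_every_s, min_interval_s, run_time_s):
    """Count writes for an app that could checkpoint every app_ready_every_s seconds."""
    throttle = CheckpointThrottle(min_interval_s)
    writes = 0
    for now_s in range(app_ready_every_s, run_time_s + 1, app_ready_every_s):
        if throttle.time_to_checkpoint(now_s):
            writes += 1
            throttle.checkpoint_completed(now_s)
    return writes


# Hypothetical 10-minute run: the app is ready every 10 s, the setting is 60 s or 300 s.
writes_at_60 = simulate(10, 60, 600)
writes_at_300 = simulate(10, 300, 600)
```

In this model, raising the setting only helps if the app was ready to checkpoint more often than the old setting allowed. An app that writes other files outside this mechanism would not be affected by the setting at all.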
BOINC blog
cadbane

Joined: 7 Jun 09
Posts: 24
Credit: 1,149,643,416
RAC: 0
Message 13208 - Posted: 17 Oct 2009, 6:58:26 UTC

I second what zpm said about the VelociRaptors being faulty.

There's a thread over on the StorageReview forum about it; maybe you should read it.

http://forums.storagereview.net/index.php?showtopic=27303&st=50

Stephenish

Message 13224 - Posted: 18 Oct 2009, 20:55:28 UTC - in response to Message 13208.  

I second what zpm said about the VelociRaptors being faulty.

There's a thread over on the StorageReview forum about it; maybe you should read it.

http://forums.storagereview.net/index.php?showtopic=27303&st=50



Excellent link, very informative. Thank you.
Stephenish

Message 14104 - Posted: 3 Jan 2010, 16:26:50 UTC - in response to Message 13224.  

Well, I RMAed the VelociRaptor and flashed the drive with new firmware, but the drive still only calms down when I turn off GPUGRID. With GPUGRID running, the drive is working constantly. If I run a game, even with low graphics, the drive begins to rattle or clatter. I wonder if anyone else is experiencing this same issue. Perhaps there is a workaround.
Michael Goetz
Joined: 2 Mar 09
Posts: 124
Credit: 124,873,744
RAC: 0
Message 14119 - Posted: 4 Jan 2010, 15:14:58 UTC - in response to Message 14104.  

While the VelociRaptor firmware problem is a serious issue, it doesn't seem that it would explain the disk activity you're seeing while running GPUGRID.

I'm also running GPUGRID under Vista SP2 and BOINC client 6.10.18, and looking at the process in the task manager shows exactly the behavior one would expect: a low memory footprint (around 60 megs), minimal I/O and minimal page faults. No untoward disk activity.
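
One way to make that Task Manager check less subjective is to note each process's cumulative bytes-written counter twice, some time apart, and compare the differences. The Python sketch below shows only the arithmetic. The process names and byte counts are made-up placeholders, not measurements from either machine, and how the readings are collected (by hand from Task Manager's I/O columns, or with some monitoring tool) is left open.

```python
def rank_by_write_delta(before, after, elapsed_s):
    """Return (name, bytes_written_per_second) pairs, busiest writer first.

    before and after map a process name to its cumulative bytes written.
    """
    rates = []
    for name, written_after in after.items():
        written_before = before.get(name, 0)
        delta = max(written_after - written_before, 0)  # guard against counter resets
        rates.append((name, delta / elapsed_s))
    return sorted(rates, key=lambda item: item[1], reverse=True)


# Hypothetical readings taken 60 seconds apart (placeholder names and numbers).
before = {"gpu_app.exe": 1_000_000, "client.exe": 500_000, "indexer.exe": 2_000_000}
after = {"gpu_app.exe": 1_060_000, "client.exe": 530_000, "indexer.exe": 8_000_000}

ranking = rank_by_write_delta(before, after, 60)
for name, rate in ranking:
    print(f"{name}: {rate:.0f} bytes/s written")
```

If the GPU application is not near the top of such a ranking while the drive is noisy, that would point toward some other process as the source of the disk traffic.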

So what's different?

There are two significant differences between your machine and mine. The first is that you have two GPUs while I only have one, and the second is that you are running a later driver version (195 vs. my 191). If I had to guess, it's either a driver problem or an issue with dual GPUs that's causing the disk access.

You said that the disk access only occurs with GPUGRID -- are you running any other CUDA projects? I doubt it's an issue with GPUGRID (*nobody* else has ever reported anything like this to my knowledge), but it might have something to do with any project that uses the GPU.

Oh, one thing you could check: are you running BOINC as a service? My understanding is that when using CUDA, BOINC shouldn't be a service. I don't know what exactly breaks, but maybe this is what happens? Just a shot in the dark here; I could be barking up the wrong tree altogether.

Mike

P.S. You mentioned changing your checkpoint interval from 30 to 60 seconds. On this machine, I increased the interval to 300 seconds (5 minutes). I don't suspend tasks while the user is active, and the task-switch interval is 24 hours (allowing most tasks to complete in one shot), so I don't have a lot of tasks being preempted. Checkpointing slows the tasks down and keeps the disk busy (especially with multi-core & multi-GPU systems), and isn't really necessary if you're not preempting the tasks frequently.
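
As a rough illustration of why the interval matters more on multi-core systems, the following Python sketch computes an upper bound on checkpoint writes per hour. It assumes every running task writes a checkpoint at each interval, which real applications may not do, and the task count of 8 is only an example value.

```python
def checkpoint_writes_per_hour(tasks, interval_s):
    """Upper bound on checkpoint writes per hour if every task writes at each interval."""
    return tasks * 3600 / interval_s


# Example: 8 concurrent tasks (an assumed figure) at the intervals mentioned in this thread.
for interval_s in (30, 60, 300):
    writes = checkpoint_writes_per_hour(8, interval_s)
    print(f"{interval_s:>3} s interval: at most {writes:.0f} writes/hour")
```

Under these assumptions, moving from 30 to 300 seconds cuts the bound by a factor of ten, from 960 to 96 writes per hour.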
Want to find one of the largest known primes? Try PrimeGrid. Or help cure disease at WCG.

Stephenish

Message 14121 - Posted: 4 Jan 2010, 23:26:34 UTC - in response to Message 14104.  
Last modified: 4 Jan 2010, 23:28:04 UTC

BOINC is running as an application. I turned off Windows Search and updated my NVIDIA drivers. Still, the hard drive is working overtime and occasionally hiccups (the drive stops and becomes quiet), interrupting graphics applications such as games. Oddly, if there is a hiccup while I'm at the desktop, the clock's second hand keeps running. I guess the clock runs in memory. Still, if I suspend GPUGRID in BOINC Manager, the drive quiets down. I changed the checkpoint interval to 300 seconds. No effect. Checked local prefs; also no effect. No other CUDA applications are currently active. (More than not active, none have ever been loaded.)
Michael Goetz
Message 14123 - Posted: 5 Jan 2010, 0:15:48 UTC - in response to Message 14121.  

As a diagnostic tool, you might want to try connecting to one of the other CUDA projects such as SETI or Milkyway just to see if you have the same problem with those. That will at least let you know if it's a generic problem with the CUDA installation on your system or something specific to GPUGRID.

Those two projects have both CPU and GPU applications, so before you download any tasks, go to the preferences part of "your account" on their website to deselect the CPU tasks. For testing purposes, Milkyway is probably best -- its WUs are VERY short. Just set your cache to 0 so you don't download a gazillion WUs, and you should get only a single task for testing. Those only take about 15 minutes to run.

Also, SETI is down until at least Tuesday morning PST, so Milkyway is your best bet for this test.
robertmiles
Joined: 16 Apr 09
Posts: 503
Credit: 769,991,668
RAC: 0
Message 14133 - Posted: 6 Jan 2010, 2:41:53 UTC - in response to Message 14121.  

BOINC is running as an application. I turned off Windows Search and updated my NVIDIA drivers. Still, the hard drive is working overtime and occasionally hiccups (the drive stops and becomes quiet), interrupting graphics applications such as games. Oddly, if there is a hiccup while I'm at the desktop, the clock's second hand keeps running. I guess the clock runs in memory. Still, if I suspend GPUGRID in BOINC Manager, the drive quiets down. I changed the checkpoint interval to 300 seconds. No effect. Checked local prefs; also no effect. No other CUDA applications are currently active. (More than not active, none have ever been loaded.)


Do you have an antivirus program allowed to run when it chooses? Mine (Norton Internet Security 2010) keeps the disk active about half the time and does NOT seem to offer a way to pause the disk accesses when desired.
Stephenish

Message 14134 - Posted: 6 Jan 2010, 2:53:04 UTC - in response to Message 14123.  

Milkyway is keeping the hard drive fairly inactive. Unlike GPUGRID, I hear no hard drive noise.
Stephenish

Message 14135 - Posted: 6 Jan 2010, 2:54:46 UTC - in response to Message 14133.  

While I won't publicly reveal which antivirus/firewall I use, rest assured there is no issue there.
Stephenish

Message 14136 - Posted: 6 Jan 2010, 2:57:28 UTC - in response to Message 14135.  

Mr. Goetz,

Do you suspect the CUDA driver may be an issue? If so, can you help me to roll back? Where is the download for previous versions of nVidia drivers?
Michael Goetz
Message 14137 - Posted: 6 Jan 2010, 3:41:04 UTC - in response to Message 14134.  

Milkyway is keeping the hard drive fairly inactive. Unlike GPUGRID, I hear no hard drive noise.


That's decidedly odd. At this point, I admit to being totally stumped. I'm out of ideas. I can't think of anything that would cause this to happen with one CUDA application but not another.
Michael Goetz
Message 14138 - Posted: 6 Jan 2010, 3:49:16 UTC - in response to Message 14136.  

Mr. Goetz,

Do you suspect the CUDA driver may be an issue? If so, can you help me to roll back? Where is the download for previous versions of nVidia drivers?


Go here:

http://www.nvidia.com/Download/Find.aspx?lang=en-us

Enter the correct info (GTX260, Vista 64, etc.), and you'll get a page that lists all the archived versions of the driver.

I did notice another difference -- I'm running 32 bit and you're running 64 bit.

Maybe one of the GPUGRID project guys has an idea what's going on. There's really no reason a project should be doing disk access like that.

Mike

©2026 Universitat Pompeu Fabra