Server outage - What happened?!
- Yesterday I received an email with the following message:
the automated fsck for /dev/sdb started failing with drive access errors. It then prompted me to run a manual fsck, which resulted in the same. I shut down your server and attempted to reseat the drive, however the BIOS is no longer detecting your secondary hard drive. Device /dev/sdb (80GB SATA: WMAJ91097639) has completely failed and needs to be replaced.
I was not happy. There were also problems getting the web service (httpd) to start, caused by old user accounts on old domains that no longer exist. This kept the service from restarting after the new hard drive had been installed. All of that has been corrected, and the sites that I run are back up (about 50 somewhat active sites other than codewolf.com rely on my server). I still need to check the backup drives and the status of the files that were on the failed mirror drive, but we should be good to go for now.
The service from SoftLayer, where the server is hosted, was perfect. They replaced the failed drive in minutes and notified me of the problem as soon as it happened.