Smartmontools
From TrustixWiki
Introduction
Smartmontools consists of a daemon and a control program that monitor S.M.A.R.T. devices such as hard disk drives and report when a drive is predicted to fail. By default the daemon polls each drive once every 30 minutes.
For TSL 2.2, you can install with 'swup --install smartmontools'. For some reason the package is not included with the TSL 3 distribution. You can install the version from TSL 2.2 by overriding the default repository location. Here is the command:
swup --install smartmontools --repository-URI \
    http://http.trustix.org/pub/trustix/releases/trustix-2.2/i586/trustix/rdfs
You can also download the latest i386 binary RPM from http://smartmontools.sourceforge.net/. TSL 2.2 currently ships 5.33, which is the latest release (as of 21-Dec-2005). That Web site also has extensive information on configuring and using the package; this wiki entry is only intended to show how easy it is to install and use. The package includes two programs: smartd, a daemon that monitors S.M.A.R.T. devices, and smartctl, a command-line utility used to run tests and view results interactively.
Installation
To install with swup, use the commands above. To install the Sourceforge package, download the RPM and then run:
rpm -Uvh smartmontools-5.33-1.i386.rpm
chkconfig --add smartd
chkconfig smartd on
Try it
At this point you can try running smartctl. For example:
# smartctl --all /dev/sda
smartctl version 5.33 [i686-pc-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Device: IBM DDYS-T18350N Version: SA2A
Serial number: 4EYNJ170
Device type: disk
Transport protocol: Fibre channel (FCP-2)
Local Time is: Thu Jan 13 21:02:54 2005 PST
Device supports SMART and is Enabled
Temperature Warning Disabled or Not Supported
SMART Health Status: OK

Current Drive Temperature:     38 C
Drive Trip Temperature:        85 C
Manufactured in week 09 of year 2004
Current start stop count:      47 times
Recommended maximum start stop count:  10000 times

Error counter log:
          Errors Corrected by           Total   Correction     Gigabytes    Total
              EEC          rereads/    errors   algorithm      processed    uncorrected
          fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0          0          7.103           0
write:         0        0         0         0          0        154.953           0

Non-medium error count:        0
Error Events logging not supported

SMART Self-test log
Num  Test              Status       segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                    number   (hours)
# 1  Background long   Completed       -       4817         -        [-   -    -]
# 2  Background long   Completed       -       4648         -        [-   -    -]
# 3  Background long   Completed       -       4480         -        [-   -    -]
# 4  Background long   Completed       -       4312         -        [-   -    -]

Long (extended) Self Test duration: 680 seconds [11.3 minutes]
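If you want to use the health line from a report like the one above in a script, a small filter is enough. The following is a minimal sketch, not part of smartmontools: the function name health_status is made up for this example, and it only understands the SCSI-style "SMART Health Status:" line shown in the sample output (ATA drives report their health with different wording).

```shell
#!/bin/sh
# Hypothetical helper: read a smartctl report on stdin and print only the
# value that follows "SMART Health Status:" (SCSI-style output).
# Prints nothing if the line is absent.
health_status() {
    sed -n 's/^SMART Health Status:[[:space:]]*//p'
}
```

One possible use is `smartctl --all /dev/sda | health_status`, which prints OK for the drive shown above.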
Configuration
The configuration file is /etc/smartd.conf. smartd does a reasonable job with the defaults in this file, but it works better if you tailor the file to each system on which it is installed. The configuration file is extensively commented; just edit it and follow the instructions it contains. Also refer to the smartd.conf man page. Basically, you need to list the drives you want it to monitor, when to run tests, and who to notify (via email) of test results. Here is a sample configuration file for a system with one IDE and three SCSI drives:
# Do a short test every day and a long test on Sunday
/dev/hda -a -m root@somewhere.net -o on -S on -s (S/../.././02|L/../../7/03)
/dev/sda -d scsi -m root@somewhere.net -s (S/../.././02|L/../../7/03)
/dev/sdb -d scsi -m root@somewhere.net -s (S/../.././02|L/../../7/03)
/dev/sdc -d scsi -m root@somewhere.net -s (S/../.././02|L/../../7/03)
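The -s argument is a regular expression that smartd compares against strings of the form T/MM/DD/d/HH: test type (S for short, L for long), month, day of month, day of week (1 is Monday, 7 is Sunday) and hour. The sketch below lets you try a pattern against such a string before putting it in smartd.conf. The function name schedule_matches is invented for this example, and using grep -E with whole-line matching is an assumption made here to approximate what smartd does, not a copy of its code.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the schedule pattern $1 matches the
# candidate string $2, written as T/MM/DD/d/HH.
# Assumption: extended regex, whole-string match (grep -E -x).
schedule_matches() {
    printf '%s\n' "$2" | grep -Eqx -- "$1"
}

# The pattern from the sample configuration above.
pattern='(S/../.././02|L/../../7/03)'

schedule_matches "$pattern" 'S/01/13/4/02' && echo "short test at 02:00 on a Thursday"
schedule_matches "$pattern" 'L/01/16/7/03' && echo "long test at 03:00 on a Sunday"
schedule_matches "$pattern" 'L/01/13/4/03' || echo "no long test on a Thursday"
```

To check the current hour, the string for a short test can be built with `"S/$(date +%m/%d/%u/%H)"`.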
Starting smartd
The daemon portion of the program has to be set to start at boot time.
TSL 2.2 package
chkconfig smartd on     # start the daemon after reboots
service smartd start    # start the daemon right now
Sourceforge package
The distributed init script does not know about Trustix. The package maintainers say that Trustix will be supported in release 5.34; until then you have to edit one line of the script. Change
if [ -f /etc/redhat-release -o -f /etc/yellowdog-release -o -f /etc/mandrake-release -o -f /etc/whitebox-release ] ; then
to read
if [ -f /etc/redhat-release -o -f /etc/yellowdog-release -o -f /etc/mandrake-release -o -f /etc/whitebox-release -o -f /etc/trustix-release ] ; then
The command
service smartd start
should now work properly.
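The one-line edit to the init script described above could also be made with sed instead of a text editor. This is only a sketch: the function name add_trustix_check is invented here, and the location of the init script is not given on this page, so the filter works on stdin and leaves it to you to point it at the right file. Running it a second time changes nothing, because the pattern no longer matches once the Trustix test has been added.

```shell
#!/bin/sh
# Hypothetical helper: read the init script on stdin and write it to stdout
# with "-o -f /etc/trustix-release" added to the distribution test.
# All other lines pass through unchanged.
add_trustix_check() {
    sed 's|/etc/whitebox-release ] ; then|/etc/whitebox-release -o -f /etc/trustix-release ] ; then|'
}
```

Write the result to a new file, compare it with the original, and only then replace the script.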
Configuration tips
By default smartd only logs its test results. To have smartd send reports via email when it detects problems, you have to edit /etc/smartd.conf. If you have centralized syslog or have carefully configured logwatch to scan your logfiles, you don't strictly need to do this. You can do both, though: having two programs tell you a disk is about to fail is better than finding out, after the drive has failed, that your only monitoring system was not working.
On Linux systems you can generally add options to the DEVICESCAN line; that way you don't need a custom file for each server. DEVICESCAN will find your installed devices. For example, change
DEVICESCAN
to
DEVICESCAN -m root@yourhost.com
Read the smartd man page to learn what other options are available.
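If you maintain several servers, the DEVICESCAN change above can be scripted. The following is a sketch under stated assumptions: the function name set_devicescan_mail is made up for this example, it only rewrites a bare DEVICESCAN line (one with no options yet), and it expects a plain email address without characters that are special to sed, such as / or &.

```shell
#!/bin/sh
# Hypothetical helper: read smartd.conf on stdin and write it to stdout with
# "-m ADDRESS" appended to a bare DEVICESCAN line. $1 is the email address.
# Commented-out lines and DEVICESCAN lines that already carry options are
# left untouched.
set_devicescan_mail() {
    sed "s/^DEVICESCAN[[:space:]]*\$/DEVICESCAN -m $1/"
}
```

For example, `set_devicescan_mail root@yourhost.com < /etc/smartd.conf > /tmp/smartd.conf.new` writes a candidate file that you can review before copying it into place and restarting smartd.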
May you be in heaven an hour before the devil knows your hard drives are dying. -- old Irish proverb
