Smartmontools

From TrustixWiki

Introduction

Smartmontools consists of a daemon and a control program that monitor S.M.A.R.T. devices such as hard disk drives and report when a drive is predicted to fail. By default the daemon polls the devices once every 30 minutes.

For TSL 2.2, you can install the package with 'swup --install smartmontools'. For some reason the package is not included in the TSL 3 distribution, but you can install the TSL 2.2 version by overriding the default repository location. Here is the command:

swup --install smartmontools --repository-URI  \
http://http.trustix.org/pub/trustix/releases/trustix-2.2/i586/trustix/rdfs

You can also download the latest i386 binary RPM from http://smartmontools.sourceforge.net/. TSL 2.2 currently ships 5.33, which is the latest release (as of 21-Dec-2005). The Web site also has extensive information on how to configure and use the package; this wiki entry is only intended to show you how easy it is to install and use. The package includes two programs: smartd, a daemon that monitors S.M.A.R.T. devices, and smartctl, a command-line utility used to run tests interactively and view results.

Installation

To install with swup, see the commands above. To install the Sourceforge package, download it and then run:

rpm -Uvh smartmontools-5.33-1.i386.rpm
chkconfig --add smartd
chkconfig smartd on

Try it

At this point you can try running smartctl. For example:

# smartctl --all /dev/sda
smartctl version 5.33 [i686-pc-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
Device: IBM      DDYS-T18350N     Version: SA2A
Serial number:         4EYNJ170
Device type: disk
Transport protocol: Fibre channel (FCP-2)
Local Time is: Thu Jan 13 21:02:54 2005 PST
Device supports SMART and is Enabled
Temperature Warning Disabled or Not Supported
SMART Health Status: OK
Current Drive Temperature:     38 C
Drive Trip Temperature:        85 C
Manufactured in week 09 of year 2004
Current start stop count:      47 times
Recommended maximum start stop count:  10000 times
Error counter log:
          Errors Corrected by           Total   Correction     Gigabytes    Total
              EEC          rereads/    errors   algorithm      processed    uncorrected
          fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0          0          7.103           0
write:         0        0         0         0          0        154.953           0
Non-medium error count:        0
Error Events logging not supported
SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background long   Completed                   -  4817                   - [-   -    -]
# 2  Background long   Completed                   -  4648                   - [-   -    -]
# 3  Background long   Completed                   -  4480                   - [-   -    -]
# 4  Background long   Completed                   -  4312                   - [-   -    -]
Long (extended) Self Test duration: 680 seconds [11.3 minutes]
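
For scripted checks, the interesting fields can be pulled out of this output. The following is a minimal sketch, assuming the SCSI-style output format shown above. The function names smart_health_status and smart_temp_margin are hypothetical helpers, not part of smartmontools, and ATA drives print differently worded lines that these patterns will not match.

```shell
# Hypothetical helpers (not part of smartmontools). Both read smartctl
# output on standard input, in the SCSI format shown above.

# Print the text after "SMART Health Status:" (for example "OK").
smart_health_status() {
    sed -n 's/^SMART Health Status:[[:space:]]*//p'
}

# Print how many degrees C the drive is below its trip temperature.
# Returns non-zero if either temperature line is missing.
smart_temp_margin() {
    awk '/^Current Drive Temperature:/ { cur = $4 }
         /^Drive Trip Temperature:/    { trip = $4 }
         END {
             if (cur == "" || trip == "") exit 1
             print trip - cur
         }'
}
```

Run as root, a pipeline such as "smartctl --all /dev/sda | smart_health_status" would print only the status word, which is easier to test in a cron job than the full report.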

Configuration

The configuration file is /etc/smartd.conf. smartd will try to do good things with the defaults in this file but will work better if you tailor it to each system on which it is installed. The configuration file is extensively commented; just edit it and follow the instructions it contains. Also refer to the smartd.conf man page. Basically, you need to list the drives you want it to monitor, along with when to run tests and whom to notify (via email) of the results. Here is a sample configuration file for a system with one IDE and three SCSI drives:

# Do a short test every day (2 a.m.) and a long test on Sunday (3 a.m.)
/dev/hda -a -m root@somewhere.net -o on -S on -s (S/../.././02|L/../../7/03)
/dev/sda -d scsi -m root@somewhere.net -s (S/../.././02|L/../../7/03)
/dev/sdb -d scsi -m root@somewhere.net -s (S/../.././02|L/../../7/03)
/dev/sdc -d scsi -m root@somewhere.net -s (S/../.././02|L/../../7/03)
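
The -s argument in these lines is a regular expression that smartd compares against a string of the form T/MM/DD/d/HH: test type, month, day of month, day of week (1 for Monday through 7 for Sunday), and hour. As a rough way to check an expression before putting it in the file, the sketch below mimics that comparison with grep. Both function names are hypothetical helpers, and anchoring the whole expression is an assumption about how smartd applies it, so treat the smartd.conf man page as the authority.

```shell
# Hypothetical helper: succeed if the schedule string $2 (T/MM/DD/d/HH)
# is matched in full by the -s regular expression $1.
schedule_matches() {
    printf '%s\n' "$2" | grep -Eq "^($1)\$"
}

# Print the schedule string for a short test at the current hour.
# %u is the ISO day of week: 1 (Monday) through 7 (Sunday).
current_short_slot() {
    date '+S/%m/%d/%u/%H'
}
```

With the expression from the sample file, schedule_matches '(S/../.././02|L/../../7/03)' 'L/01/16/7/03' succeeds because the day-of-week field is 7 and the hour is 03, while the same call with 'L/01/13/4/03' fails.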

Starting smartd

The daemon portion of the program has to be set to start at boot time.

TSL 2.2 package

chkconfig smartd on   # start the daemon after reboots
service smartd start  # start the daemon right now

Sourceforge package

The distributed init script does not know about Trustix. The package maintainers say that Trustix will be supported in release 5.34; until then, you have to edit one line of the script. Change

if [ -f /etc/redhat-release -o -f /etc/yellowdog-release \
-o -f /etc/mandrake-release -o -f /etc/whitebox-release ] ; then

to read

if [ -f /etc/redhat-release -o -f /etc/yellowdog-release \
-o -f /etc/mandrake-release -o -f /etc/whitebox-release \
-o -f /etc/trustix-release ] ; then

The command

service smartd start

should now work properly.
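
If you maintain several machines, the one-line edit to the init script described above could be scripted rather than done by hand. This is only a sketch: add_trustix_check is a hypothetical helper, it assumes the script contains the text "-o -f /etc/whitebox-release" exactly as quoted above, and the script path in the usage note is an assumption. It puts the new test on the same line as the whitebox test, which is equivalent to the layout shown above.

```shell
# Hypothetical helper: read the init script on standard input and write a
# copy whose release-file test also checks /etc/trustix-release.
# Lines that already mention trustix-release are left untouched.
add_trustix_check() {
    sed '/trustix-release/!s|-o -f /etc/whitebox-release|& -o -f /etc/trustix-release|'
}
```

A cautious way to use it is to write the result to a new file, for example "add_trustix_check < /etc/init.d/smartd > /tmp/smartd.new", and compare the two files with diff before replacing the original.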

Configuration tips

By default smartd only logs its test results. To have smartd send reports via email when it detects problems, you have to edit /etc/smartd.conf. If you have centralized syslog and/or have carefully configured logwatch to scan your logfiles, you don't need to do this. You can do both: having two programs tell you a disk is about to fail is better than finding out, after the drive has failed, that your one monitoring system was not working.

On Linux systems generally you can add options to the DEVICESCAN line; that way you don't need a custom file for each server. DEVICESCAN will find your installed devices. For example, change

DEVICESCAN 

to

DEVICESCAN -m root@yourhost.com

Read the smartd and smartd.conf man pages to learn what other options are available.
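
The DEVICESCAN change above can also be applied with a small filter, which is handy when the same edit has to be made on many servers. This is a minimal sketch: devicescan_add_mail is a hypothetical helper, and it only rewrites a DEVICESCAN line that has no options yet, leaving commented-out lines and lines with existing options alone.

```shell
# Hypothetical helper: read smartd.conf on standard input and write a copy
# in which a bare DEVICESCAN line gets "-m ADDRESS" appended.
# $1 is the mail address; it must not contain "|", "&" or "\".
devicescan_add_mail() {
    sed "s|^DEVICESCAN[[:space:]]*\$|DEVICESCAN -m $1|"
}
```

As with any edit to /etc/smartd.conf, write the output to a new file, check it, and restart smartd afterwards so that it rereads the configuration.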


May you be in heaven an hour before the devil knows your hard drives are dying. -- old Irish proverb
