Problem Solving Guide and Reference

Periodic Diagnostics and Automatic Error Log Analysis

Periodic Diagnostics and Automatic Error Log Analysis are provided by the diagnostics.

Periodic Diagnostics

Periodic diagnosis of the disk drives and battery is enabled by default. The disk diagnostics perform error log analysis on all disks. The battery diagnostics test the real-time clock and NV-RAM battery.

Periodic Diagnostics are performed in different ways depending on the diagnostic version.

AIX Version 4

Periodic diagnostics in AIX Version 4 are controlled by the Periodic Diagnostic Service Aid. The Periodic Diagnostic Service Aid allows error log analysis to be run on hardware resources once a day. By default, the battery and all disk drives are enabled to run. The battery diagnostic is run at 4:00 a.m. each day, and error log analysis is performed on all the disk drives at 3:00 a.m. each day. Other devices can be added to the Periodic Diagnostic Device list, and error log analysis can be directed to run at different times.

Problems are reported by a message to the system console, and a mail message is sent to all members of the system group. The message contains the SRN.

Running diagnostics in this mode for base system devices is similar to using the diag -c -d device command. For all other devices, the command is invoked with the -e flag appended.

Automatic Error Log Analysis (diagela)

Automatic Error Log Analysis (diagela) provides the capability to do error log analysis whenever a permanent hardware error is logged. If the diagela program is enabled and a permanent hardware resource error is logged, the diagela program is started. Automatic Error Log Analysis is enabled by default on all platforms.

The diagela program determines whether the error should be analyzed by the diagnostics. If so, a diagnostic application is invoked to analyze the error. No testing is done. If the diagnostics determine that the error requires a service action, they send a message to your console and to all system groups. The message contains the SRN or a corrective action.

Running diagnostics in this mode is similar to using the diag -c -e -d device command.

Notification can also be customized by adding a stanza to the PDiagAtt object class. The following example illustrates how a customer's program can be invoked in place of the normal mail message:

PDiagAtt:
DClass = ""
DSClass = ""
DType = ""
attribute = "diag_notify"
value = "/usr/bin/customer_notify_program $1 $2 $3 $4 $5"
rep = "s"

If DClass, DSClass, and DType are blank, the customer_notify_program applies to all devices. Filling in DClass, DSClass, and DType with specific values causes the customer_notify_program to be invoked only for that device type.
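As an illustration of a device-specific stanza, the following shell sketch generates a diag_notify stanza for a given class, subclass, and type. The helper name notify_stanza and the values disk, scsi, and osdisk are assumptions chosen for the example; check the predefined class, subclass, and type of the device on the target system. The odmadd line is left commented out because it runs only on AIX and assumes that ODMDIR points at the repository that holds PDiagAtt.

```shell
#!/bin/sh
# Hypothetical helper: print a diag_notify stanza for one device type.
#   $1 = DClass, $2 = DSClass, $3 = DType
# Passing three empty strings yields the all-devices form of the stanza.
notify_stanza() {
    # \$1 .. \$5 are escaped so they reach the stanza literally; the
    # diagnostics expand them when the notify program is invoked.
    cat <<EOF
PDiagAtt:
        DClass = "$1"
        DSClass = "$2"
        DType = "$3"
        attribute = "diag_notify"
        value = "/usr/bin/customer_notify_program \$1 \$2 \$3 \$4 \$5"
        rep = "s"
EOF
}

# Example values only (assumed, not taken from this guide).
notify_stanza disk scsi osdisk

# To load the stanza on AIX, one possible approach (not run here):
#   notify_stanza disk scsi osdisk > /tmp/diag_notify.add
#   odmadd /tmp/diag_notify.add
```

Generating the stanza from a function keeps the quoting of the $1 through $5 placeholders in one place, which is the part most easily damaged when the stanza is typed by hand inside a shell.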

Once the diag_notify stanza is added to the ODM database, problems are displayed on the system console and the program specified in the value field of the diag_notify predefined attribute is invoked. The following keywords are expanded automatically as arguments to the notify program:

$1 the keyword "diag_notify"
$2 the resource name that reported the problem
$3 the Service Request Number
$4 the device type
$5 the error label from the error log entry
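A notify program receives these five values as ordinary positional arguments. The following is a minimal sketch of such a program, not an IBM-supplied tool: the function name format_notification, the NOTIFY_LOG variable, and the log path are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical /usr/bin/customer_notify_program (a sketch only).
# Arguments, as expanded by the diagnostics:
#   $1 keyword "diag_notify"   $2 resource name   $3 Service Request Number
#   $4 device type             $5 error label from the error log entry

# Build a one-line summary from the five arguments.
format_notification() {
    printf 'keyword=%s resource=%s srn=%s type=%s label=%s\n' \
        "$1" "$2" "$3" "$4" "$5"
}

# Assumed log location; override by setting NOTIFY_LOG in the environment.
NOTIFY_LOG="${NOTIFY_LOG:-/tmp/diag_notify.log}"

# Act only when all five arguments are present.
if [ "$#" -eq 5 ]; then
    format_notification "$@" >> "$NOTIFY_LOG"
fi
```

A real program might forward the summary to a paging or ticketing system instead of a file; the log file is used here only because it has no external dependencies.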

If no diagnostic program is found to analyze the error log entry, or if analysis is done but no error is reported, a separate program can be invoked. To do this, add a stanza to the PDiagAtt object class with attribute = "diag_analyze". The following example illustrates how a customer's program can be invoked for this condition:

PDiagAtt:
DClass = ""
DSClass = ""
DType = ""
attribute = "diag_analyze"
value = "/usr/bin/customer_analyzer_program $1 $2 $3 $4 $5"
rep = "s"

If DClass, DSClass, and DType are blank, the customer_analyzer_program applies to all devices. Filling in DClass, DSClass, and DType with specific values causes the customer_analyzer_program to be invoked only for that device type.

Once the diag_analyze stanza is added to the ODM database, the specified program is invoked if no diagnostic program is specified for the error, or if analysis was done but no error was found. The following keywords are expanded automatically as arguments to the analyzer program:

Keyword  Definition
$1 the keyword "diag_analyze"
$2 the resource name that reported the problem
$3 one of the following:
  • the error label from the error log entry, if invoked for ELA
  • the keyword "PERIODIC", if invoked for Periodic Diagnostics
  • the keyword "REMINDER", if invoked for providing a Diagnostic Reminder
$4 the device type
$5 either of the following keywords:
  • "no_trouble_found" if analyzer was run and no trouble was found
  • "no_analyzer" if analyzer was not available
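Because $3 and $5 each take one of a few values, an analyzer program will usually branch on them. The sketch below shows one way to do that with case statements. The function names, the ANALYZE_LOG variable, and the log path are assumptions for illustration, and the message wording is invented for the example.

```shell
#!/bin/sh
# Hypothetical /usr/bin/customer_analyzer_program (a sketch only).

# Describe why the program was invoked, based on $3.
describe_source() {
    case "$1" in
        PERIODIC) echo "periodic diagnostics" ;;
        REMINDER) echo "diagnostic reminder" ;;
        *)        echo "error log analysis of $1" ;;  # any other value is an error label
    esac
}

# Describe the analysis outcome, based on $5.
describe_outcome() {
    case "$1" in
        no_trouble_found) echo "analyzer ran and found no trouble" ;;
        no_analyzer)      echo "no analyzer was available" ;;
        *)                echo "unrecognized outcome: $1" ;;
    esac
}

# Combine resource name ($2), device type ($4), source ($3), and outcome ($5).
summarize() {
    printf '%s (%s): %s; %s\n' \
        "$2" "$4" "$(describe_source "$3")" "$(describe_outcome "$5")"
}

# Assumed log location; override by setting ANALYZE_LOG in the environment.
ANALYZE_LOG="${ANALYZE_LOG:-/tmp/diag_analyze.log}"

# Act only when all five arguments are present.
if [ "$#" -eq 5 ]; then
    summarize "$@" >> "$ANALYZE_LOG"
fi
```

The catch-all branch in describe_outcome keeps the program from failing silently if it is ever passed a value outside the two listed keywords.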

 

To activate the Automatic Error Log Analysis feature, log in as root and type the following command:

/usr/lpp/diagnostics/bin/diagela ENABLE

To disable the Automatic Error Log Analysis feature, log in as root and type the following command:

/usr/lpp/diagnostics/bin/diagela DISABLE

The diagela program can also be enabled and disabled using the Periodic Diagnostic Service Aid.

