Kernel Extensions and Device Support Programming Concepts

FCP Error Recovery

If the device is in initiator mode, the error-recovery process varies depending on whether the device supports command queuing. Also, some devices may support NACA=1 error recovery. FCP error recovery therefore needs to deal with the following two concepts.

autosense data

When an FCP device returns a check condition or command terminated (the scsi_buf.scsi_status will have the value of SC_CHECK_CONDITION or SC_COMMAND_TERMINATED, respectively), it will also return the request sense data.

NOTE: Subsequent commands to the FCP device will clear the request sense data.

If the FCP device driver has specified a valid autosense buffer (scsi_buf.autosense_length > 0 and the scsi_buf.autosense_buffer_ptr field is not NULL), then the FCP adapter device driver will copy the returned autosense data into the buffer referenced by scsi_buf.autosense_buffer_ptr. When this occurs the FCP adapter device driver will set the SC_AUTOSENSE_DATA_VALID flag in the scsi_buf.adap_set_flags.

When the FCP device driver receives the SCSI status of check condition or command terminated (the scsi_buf.scsi_status will have the value of SC_CHECK_CONDITION or SC_COMMAND_TERMINATED, respectively), it should determine whether the SC_AUTOSENSE_DATA_VALID flag is set in the scsi_buf.adap_set_flags field. If the flag is set, the driver should process the autosense data and not send a SCSI request sense command.
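The check described above can be sketched in C as follows. This is only an illustration under stated assumptions: mock_scsi_buf is a hypothetical stand-in that holds just the fields named in this section, and the numeric flag and status values are placeholders rather than the definitions from the AIX headers.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder values for illustration only; a real driver takes these
 * definitions from the AIX headers. */
enum {
    SC_CHECK_CONDITION      = 0x02,
    SC_COMMAND_TERMINATED   = 0x22,
    SC_AUTOSENSE_DATA_VALID = 0x01
};

/* Hypothetical stand-in for the scsi_buf fields used in this section. */
struct mock_scsi_buf {
    unsigned char  scsi_status;
    unsigned int   adap_set_flags;
    unsigned char *autosense_buffer_ptr;
    unsigned short autosense_length;
};

/* Returns 1 if the device driver still has to send a SCSI request sense
 * command, 0 if autosense data is already available or the status does
 * not call for sense data at all. */
int must_send_request_sense(const struct mock_scsi_buf *bp)
{
    assert(bp != NULL);

    if (bp->scsi_status != SC_CHECK_CONDITION &&
        bp->scsi_status != SC_COMMAND_TERMINATED)
        return 0;   /* no check condition, so there is no sense data to fetch */

    if (bp->adap_set_flags & SC_AUTOSENSE_DATA_VALID)
        return 0;   /* the adapter driver already copied the sense data */

    return 1;       /* sense data has to be requested explicitly */
}
```

A device driver that supplies no autosense buffer would always take the last branch after a check condition, which is the classic request-sense path.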

NACA=1 error recovery

Some FCP devices support setting the NACA (Normal Auto Contingent Allegiance) bit to a value of one (NACA=1) in the control byte of the SCSI command. If an FCP device returns a check condition or command terminated (the scsi_buf.scsi_status will have the value of SC_CHECK_CONDITION or SC_COMMAND_TERMINATED, respectively) for a command with NACA=1 set, then the FCP device will require a Clear ACA task management request to clear the error condition on the drive. The FCP device driver can issue a Clear ACA task management request by sending a transaction with the SC_CLEAR_ACA flag set in the scsi_buf.flags field. The SC_CLEAR_ACA flag can be used in conjunction with the SC_Q_CLR or SC_Q_RESUME flag in the scsi_buf.flags field to clear or resume the queue of transactions for this device, respectively (see FCP Initiator-Mode Recovery During Command Tag Queuing).
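As a rough sketch of the two steps involved, the C fragment below sets NACA=1 in a command and then builds the flag word for a Clear ACA transaction. The flag values are placeholders, not the AIX header definitions, and the helper names cdb_set_naca, clear_aca_flags, and queue_action are hypothetical. The only fact taken from outside this section is that NACA is bit 2 of the control byte, which is the last byte of a fixed-length CDB.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder values for illustration only; the real definitions come
 * from the AIX headers. */
enum {
    SC_CLEAR_ACA = 0x01,
    SC_Q_CLR     = 0x02,
    SC_Q_RESUME  = 0x04
};

/* NACA is bit 2 of the CDB control byte. */
#define CDB_CONTROL_NACA 0x04u

/* Hypothetical selector for what should happen to the halted queue. */
enum queue_action { QUEUE_LEAVE_HALTED, QUEUE_CLEAR, QUEUE_RESUME };

/* Sets NACA=1 in a fixed-length CDB (6, 10, 12, or 16 bytes), whose
 * control byte is its last byte. */
void cdb_set_naca(unsigned char *cdb, size_t cdb_len)
{
    assert(cdb != NULL && cdb_len > 0);
    cdb[cdb_len - 1] |= CDB_CONTROL_NACA;
}

/* Builds the flags value for a Clear ACA transaction, optionally
 * combined with a queue clear or queue resume request. */
unsigned int clear_aca_flags(enum queue_action action)
{
    unsigned int flags = SC_CLEAR_ACA;

    if (action == QUEUE_CLEAR)
        flags |= SC_Q_CLR;
    else if (action == QUEUE_RESUME)
        flags |= SC_Q_RESUME;

    return flags;
}
```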

FCP Initiator-Mode Recovery When Not Command Tag Queuing

If an error such as a check condition or hardware failure occurs, the transaction active during the error is returned with the scsi_buf.bufstruct.b_error field set to EIO. Other transactions in the queue may be returned with the scsi_buf.bufstruct.b_error field set to ENXIO. If the FCP adapter driver decides not to return the other outstanding commands queued to it, the failed transaction is returned to the FCP device driver with the SC_DID_NOT_CLEAR_Q flag set in the scsi_buf.adap_q_status field, indicating that the queue for this device has not been cleared. The FCP device driver should process or recover the condition, rerunning any mode selects or device reservations needed to recover properly. After this recovery, it should reschedule the transaction that had the error. In many cases, the FCP device driver only needs to retry the unsuccessful operation.

The FCP adapter device driver should never retry a SCSI command on error after the command has successfully been given to the adapter. The consequences of retrying an FCP command at this point range from minimal to catastrophic, depending upon the type of device. Commands for certain devices cannot be retried immediately after a failure (for example, tapes and other sequential access devices). If such an error occurs, the failed command returns an appropriate error status with an iodone call to the FCP device driver for error recovery. Only the FCP device driver that originally issued the command knows if the command can be retried on the device. The FCP adapter device driver must only retry commands that were never successfully transferred to the adapter. In this case, if retries are successful, the scsi_buf status should not reflect an error. However, the FCP adapter device driver should perform error logging on the retried condition.
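The retry rule above might look like the following on the adapter driver side. This is a hedged sketch, not actual driver code: the hand-off hook mock_transfer_fn, the result structure, the retry limit, and the example hook mock_fail_n_times are all hypothetical, and a counter stands in for real error logging.

```c
#include <assert.h>
#include <stddef.h>

enum { MOCK_MAX_TRANSFER_RETRIES = 3 };   /* hypothetical retry limit */

/* Hypothetical hook that tries to hand the command to the adapter.
 * Returns 1 once the adapter has accepted it, 0 if the hand-off failed. */
typedef int (*mock_transfer_fn)(void *ctx);

struct mock_send_result {
    int delivered;      /* 1 if the adapter accepted the command */
    int retries_used;   /* number of repeated hand-off attempts */
    int log_entries;    /* stand-in for error-log records written */
};

/* Retries only the hand-off to the adapter. Once the adapter has
 * accepted the command, this function never reissues it; any later
 * failure is reported to the device driver instead. */
struct mock_send_result mock_send_command(mock_transfer_fn transfer, void *ctx)
{
    struct mock_send_result r = {0, 0, 0};
    int attempt;

    assert(transfer != NULL);
    for (attempt = 0; attempt <= MOCK_MAX_TRANSFER_RETRIES; attempt++) {
        if (attempt > 0) {
            r.retries_used++;
            r.log_entries++;    /* log the retried condition */
        }
        if (transfer(ctx)) {
            r.delivered = 1;    /* accepted: no further retries allowed */
            break;
        }
    }
    return r;
}

/* Example hook: fails as many times as the int counter in ctx says,
 * then succeeds. */
int mock_fail_n_times(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return 0;
    }
    return 1;
}
```

If the hand-off eventually succeeds, the caller sees delivered set and reports no error in the scsi_buf, while the log counter still records that retries took place.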

The first transaction passed to the FCP adapter device driver during error recovery must include a special flag. This SC_RESUME flag in the scsi_buf.flags field must be set to inform the FCP adapter device driver that the FCP device driver has recognized the fatal error and is beginning recovery operations. Any transactions passed to the FCP adapter device driver, after the fatal error occurs and before the SC_RESUME transaction is issued, should be flushed; that is, returned with an error type of ENXIO through an iodone call.

Note: If an FCP device driver continues to pass transactions to the FCP adapter device driver after the FCP adapter device driver has flushed the queue, these transactions are also flushed with an error return of ENXIO through the iodone service. This gives the FCP device driver a positive indication of all transactions flushed.
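The SC_RESUME handshake can be modeled with a small state machine. The sketch below is an assumption-laden illustration of the adapter driver's side: mock_dev_queue and both functions are hypothetical, the SC_RESUME value is a placeholder, and the return value stands in for the error that the real driver would deliver through iodone.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Placeholder value for illustration only. */
enum { SC_RESUME = 0x01 };

/* Hypothetical per-device state kept by the adapter driver. */
struct mock_dev_queue {
    int halted;   /* 1 after a fatal error, until SC_RESUME is seen */
};

/* Records that a fatal error occurred for this device. */
void mock_fatal_error(struct mock_dev_queue *q)
{
    assert(q != NULL);
    q->halted = 1;
}

/* Returns 0 if the transaction is accepted, or ENXIO if it is flushed
 * because recovery has not yet been signaled with SC_RESUME. */
int mock_adapter_accept(struct mock_dev_queue *q, unsigned int flags)
{
    assert(q != NULL);
    if (q->halted) {
        if (!(flags & SC_RESUME))
            return ENXIO;   /* flushed: device driver has not resumed */
        q->halted = 0;      /* device driver acknowledged the error */
    }
    return 0;
}
```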

FCP Initiator-Mode Recovery During Command Tag Queuing

If the FCP device driver is queuing multiple transactions to the device and either a check condition error or a command terminated error occurs, the FCP adapter driver does not clear all transactions in its queues for the device. It returns the failed transaction to the FCP device driver with an indication that the queue for this device is not cleared by setting the SC_DID_NOT_CLEAR_Q flag in the scsi_buf.adap_q_status field. The FCP adapter driver halts the queue for this device awaiting error recovery notification from the FCP device driver. The FCP device driver then has three options to recover from this error:

When the FCP adapter driver's queue is halted, the FCP device driver can get sense data from a device by setting the SC_RESUME flag in the scsi_buf.flags field and the SC_NO_Q flag in the scsi_buf.q_tag_msg field of the request-sense scsi_buf. This action notifies the FCP adapter driver that this is an error-recovery transaction and should be sent to the device while the remainder of the queue for the device remains halted. When the request sense completes, the FCP device driver needs to either clear or resume the FCP adapter driver's queue for this device.

The FCP device driver can notify the FCP adapter driver to clear its halted queue by sending a transaction with the SC_Q_CLR flag set in the scsi_buf.flags field. This transaction must not contain an FCP command because it is cleared from the FCP adapter driver's queue without being sent to the adapter. However, this transaction must have the SCSI ID field (scsi_buf.scsi_id) and the LUN field (scsi_buf.lun_id) filled in with the device's SCSI ID and logical unit number (LUN), respectively. Upon receiving an SC_Q_CLR transaction, the FCP adapter driver flushes all transactions for this device and sets their scsi_buf.bufstruct.b_error fields to ENXIO. The FCP device driver must wait until the scsi_buf with the SC_Q_CLR flag set is returned before it resumes issuing transactions. The first transaction sent by the FCP device driver after it receives the returned SC_Q_CLR transaction must have the SC_RESUME flag set in the scsi_buf.flags field.

If the FCP device driver wants the FCP adapter driver to resume its halted queue, it must send a transaction with the SC_Q_RESUME flag set in the scsi_buf.flags field. This transaction can contain an actual FCP command, but it is not required. However, this transaction must have the SCSI ID field (scsi_buf.scsi_id) and the LUN field (scsi_buf.lun_id) filled in with the device's SCSI ID and logical unit number (LUN), respectively. If this is the first transaction issued by the FCP device driver after receiving the error (indicating that the adapter driver's queue is halted), then the SC_RESUME flag must be set as well as the SC_Q_RESUME flag.
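The three recovery options described above differ mainly in how the scsi_buf is filled in. A minimal sketch, assuming a hypothetical mock_scsi_buf with only the relevant fields and placeholder flag values (not the AIX header definitions), could look like this; the three prepare_ helpers are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder values for illustration only. */
enum {
    SC_RESUME   = 0x01,
    SC_Q_CLR    = 0x02,
    SC_Q_RESUME = 0x04
};
enum { SC_NO_Q = 0 };

/* Hypothetical stand-in for the scsi_buf fields used in this section. */
struct mock_scsi_buf {
    unsigned int  flags;
    unsigned char q_tag_msg;
    uint64_t      scsi_id;
    uint64_t      lun_id;
};

/* Option 1: mark a request-sense transaction as an error-recovery
 * transaction that bypasses the halted queue. */
void prepare_request_sense(struct mock_scsi_buf *bp)
{
    assert(bp != NULL);
    bp->flags |= SC_RESUME;
    bp->q_tag_msg = SC_NO_Q;
}

/* Option 2: ask the adapter driver to flush its halted queue. The
 * transaction carries no command, only the device address. */
void prepare_queue_clear(struct mock_scsi_buf *bp,
                         uint64_t scsi_id, uint64_t lun_id)
{
    assert(bp != NULL);
    bp->flags = SC_Q_CLR;
    bp->scsi_id = scsi_id;
    bp->lun_id = lun_id;
}

/* Option 3: ask the adapter driver to resume its halted queue. If this
 * is the first transaction after the error, SC_RESUME is set as well. */
void prepare_queue_resume(struct mock_scsi_buf *bp,
                          uint64_t scsi_id, uint64_t lun_id,
                          int first_after_error)
{
    assert(bp != NULL);
    bp->flags = SC_Q_RESUME;
    if (first_after_error)
        bp->flags |= SC_RESUME;
    bp->scsi_id = scsi_id;
    bp->lun_id = lun_id;
}
```

After a queue clear, the device driver would wait for the SC_Q_CLR transaction to come back and then set SC_RESUME on the next transaction it sends.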

Analyzing Returned Status

The following order of precedence should be followed by FCP device drivers when analyzing the returned status:

  1. If the scsi_buf.bufstruct.b_flags field has the B_ERROR flag set, then an error has occurred and the scsi_buf.bufstruct.b_error field contains a valid errno value.

    If the b_error field contains the ENXIO value, either the command needs to be restarted or it was canceled at the request of the FCP device driver.

    If the b_error field contains the EIO value, then either one or no flag is set in the scsi_buf.status_validity field. If a flag is set, an error in either the scsi_status or adapter_status field is the cause.

    If the status_validity field is 0, then the scsi_buf.bufstruct.b_resid field should be examined to see if the FCP command issued was in error. The b_resid field can have a value without an error having occurred. To decide whether an error has occurred, the FCP device driver must evaluate this field with regard to the FCP command being sent and the FCP device being driven.

    If SC_CHECK_CONDITION or SC_COMMAND_TERMINATED is set in scsi_status, then an FCP device driver must analyze the value of scsi_buf.adap_set_flags to determine if autosense data was returned from the FCP device.

    If the SC_AUTOSENSE_DATA_VALID flag is set in the scsi_buf.adap_set_flags field for an FCP device, then the FCP device returned autosense data in the buffer referenced by scsi_buf.autosense_buffer_ptr. In this situation the FCP device driver does not need to issue a SCSI request sense to determine the appropriate error recovery for the FCP device.

    If the FCP device driver is queuing multiple transactions to the device and if either SC_CHECK_CONDITION or SC_COMMAND_TERMINATED is set in scsi_status, then the value of scsi_buf.adap_q_status must be analyzed to determine if the adapter driver has cleared its queue for this device. If the FCP adapter driver has not cleared its queue after an error, then it holds that queue in a halted state.

    If scsi_buf.adap_q_status is set to 0, the FCP adapter driver has cleared its queue for this device and any transactions outstanding are flushed back to the FCP device driver with an error of ENXIO.

    If the SC_DID_NOT_CLEAR_Q flag is set in the scsi_buf.adap_q_status field, the adapter driver has not cleared its queue for this device. When this condition occurs, the FCP adapter driver allows the FCP device driver to send one error recovery transaction (request sense) that has the field scsi_buf.q_tag_msg set to SC_NO_Q and the SC_RESUME flag set in the field scsi_buf.flags. The FCP device driver can then notify the FCP adapter driver to clear or resume its queue for the device by sending an SC_Q_CLR or SC_Q_RESUME transaction.

    If the FCP device driver does not queue multiple transactions to the device (that is, SC_NO_Q is set in scsi_buf.q_tag_msg), then the FCP adapter clears its queue on error and sets scsi_buf.adap_q_status to 0.

  2. If the scsi_buf.bufstruct.b_flags field does not have the B_ERROR flag set, then no error is being reported. However, the FCP device driver should examine the b_resid field to check for cases where less data was transferred than expected. For some FCP commands, this occurrence may not represent an error. The FCP device driver must determine if an error has occurred.

    If a nonzero b_resid field does represent an error condition, then the device queue is not halted by the FCP adapter device driver. It is possible for one or more succeeding queued commands to be sent to the adapter (and possibly the device). Recovering from this situation is the responsibility of the FCP device driver.

  3. In any of the above cases, if the scsi_buf.bufstruct.b_flags field has the B_ERROR flag set, then the queue of the device in question has been halted. The first scsi_buf structure sent to recover the error (or continue operations) must have the SC_RESUME bit set in the scsi_buf.flags field.
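The order of precedence above can be condensed into a single classification routine. The following C sketch is illustrative only: mock_scsi_buf is a hypothetical reduction of scsi_buf, the flag and status values are placeholders rather than the AIX definitions, MOCK_SCSI_STATUS_VALID and MOCK_ADAPTER_STATUS_VALID are invented names for the two status_validity flags (this section does not name them), and the outcome enumeration is an assumption made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Placeholder values for illustration only. */
enum {
    B_ERROR                   = 0x01,
    SC_CHECK_CONDITION        = 0x02,
    SC_COMMAND_TERMINATED     = 0x22,
    SC_AUTOSENSE_DATA_VALID   = 0x01,
    SC_DID_NOT_CLEAR_Q        = 0x01,
    MOCK_SCSI_STATUS_VALID    = 0x01,   /* hypothetical name */
    MOCK_ADAPTER_STATUS_VALID = 0x02    /* hypothetical name */
};

/* Hypothetical stand-in for the scsi_buf fields used in this section. */
struct mock_scsi_buf {
    struct {
        int          b_flags;
        int          b_error;
        unsigned int b_resid;
    } bufstruct;
    unsigned char status_validity;
    unsigned char scsi_status;
    unsigned int  adap_set_flags;
    unsigned int  adap_q_status;
};

enum outcome {
    OUTCOME_OK,                   /* no error, all data transferred */
    OUTCOME_EXAMINE_RESID,        /* driver must judge b_resid itself */
    OUTCOME_RESTART_OR_CANCELED,  /* b_error is ENXIO */
    OUTCOME_USE_AUTOSENSE,        /* sense data already in the buffer */
    OUTCOME_NEED_REQUEST_SENSE,   /* driver must issue request sense */
    OUTCOME_OTHER_STATUS_ERROR,   /* other scsi_status or adapter_status */
    OUTCOME_OTHER_ERRNO           /* b_error is neither ENXIO nor EIO */
};

enum outcome classify_status(const struct mock_scsi_buf *bp)
{
    assert(bp != NULL);

    /* Step 2: no B_ERROR, but a residual count may still matter. */
    if (!(bp->bufstruct.b_flags & B_ERROR))
        return bp->bufstruct.b_resid ? OUTCOME_EXAMINE_RESID : OUTCOME_OK;

    /* Step 1: B_ERROR is set, so b_error holds a valid errno value. */
    if (bp->bufstruct.b_error == ENXIO)
        return OUTCOME_RESTART_OR_CANCELED;
    if (bp->bufstruct.b_error != EIO)
        return OUTCOME_OTHER_ERRNO;
    if (bp->status_validity == 0)
        return OUTCOME_EXAMINE_RESID;

    if ((bp->status_validity & MOCK_SCSI_STATUS_VALID) &&
        (bp->scsi_status == SC_CHECK_CONDITION ||
         bp->scsi_status == SC_COMMAND_TERMINATED))
        return (bp->adap_set_flags & SC_AUTOSENSE_DATA_VALID)
                   ? OUTCOME_USE_AUTOSENSE
                   : OUTCOME_NEED_REQUEST_SENSE;

    return OUTCOME_OTHER_STATUS_ERROR;
}

/* Returns 1 if the adapter driver kept its queue halted rather than
 * clearing it, so an SC_Q_CLR or SC_Q_RESUME transaction is needed. */
int queue_left_halted(const struct mock_scsi_buf *bp)
{
    assert(bp != NULL);
    return (bp->adap_q_status & SC_DID_NOT_CLEAR_Q) != 0;
}

/* Step 3: returns 1 if the next scsi_buf sent must carry SC_RESUME. */
int must_set_sc_resume(const struct mock_scsi_buf *bp)
{
    assert(bp != NULL);
    return (bp->bufstruct.b_flags & B_ERROR) != 0;
}
```

A device driver would typically call classify_status first and consult queue_left_halted only for the two check-condition outcomes when it queues multiple transactions to the device.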

Related Information

FCP Subsystem Overview

Understanding the scsi_buf Structure

Understanding the Execution of Initiator I/O Requests

