System Management Concepts: Operating System and Devices

Understanding Fragments and a Variable Number of I-Nodes

The journaled file system (JFS) fragment support allows disk space to be divided into allocation units that are smaller than the default size of 4096 bytes. Smaller allocation units or "fragments" minimize wasted disk space by more efficiently storing the data in a file or directory's partial logical blocks. The functional behavior of JFS fragment support is based on that provided by Berkeley Software Distribution (BSD) fragment support. Similar to BSD, JFS fragment support allows users to specify the number of i-nodes that a file system has.

Disk Utilization

Many UNIX file systems only allocate contiguous disk space in units equal in size to the logical blocks used for the logical division of files and directories. These allocation units are typically referred to as "disk blocks" and a single disk block is used exclusively to store the data contained within a single logical block of a file or directory.

Using a relatively large logical block size (4096 bytes for example) and maintaining disk block allocations that are equal in size to the logical block are advantageous for reducing the number of disk I/O operations that must be performed by a single file system operation, since a file or directory's data is stored on disk in a small number of large disk blocks rather than in a large number of small disk blocks. For example, a file with a size of 4096 bytes or less would be allocated a single 4096-byte disk block if the logical block size is 4096 bytes. A read or write operation would therefore only have to perform a single disk I/O operation to access the data on the disk. If the logical block size were smaller requiring more than one allocation for the same amount of data, then more than one disk I/O operation may be required to access the data. A large logical block and equal disk block size are also advantageous for reducing the amount of disk space allocation activity that must be performed as new data is added to files and directories, since large disk blocks hold more data.

Restricting the disk space allocation unit to the logical block size can, however, lead to wasted disk space in a file system containing numerous files and directories of a small size. Wasted disk space occurs when a logical block's worth of disk space is allocated to a partial logical block of a file or directory. Since partial logical blocks always contain less than a logical block's worth of data, a partial logical block will only consume a portion of the disk space allocated to it. The remaining portion remains unused since no other file or directory can write its contents to disk space that has already been allocated. The total amount of wasted disk space can be large for file systems containing a large number of small files and directories. A file system using 4096-byte allocation units may experience up to 45% wasted disk space. (Statistic taken from UNIX System Manager's Manual, Computer Systems Research Group, University of California at Berkeley, The Regents of the University of California and/or Bell Telephone Laboratories, 1988, SMM 14.)

Optimizing Disk Utilization

In the JFS, however, the disk space allocation unit, referred to as a fragment, can be smaller than the logical block size of 4096 bytes. With the use of fragments smaller than 4096 bytes, the data contained within a partial logical block can be stored more efficiently by using only as many fragments as are required to hold the data. For example, a partial logical block that only has 500 bytes could be allocated a fragment of 512 bytes (assuming a fragment size of 512 bytes), thus greatly reducing the amount of wasted disk space. If the storage requirements of a partial logical block increase, one or more additional fragments will be allocated.
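The arithmetic in this example can be sketched as follows. This is only an illustration of the rounding described above, not JFS code; the function name partial_block_allocation is hypothetical.

```python
LOGICAL_BLOCK = 4096  # JFS logical block size in bytes

def partial_block_allocation(data_bytes, fragment_size):
    """Return (fragments, allocated_bytes, wasted_bytes) for one partial logical block.

    Hypothetical helper: rounds the data up to a whole number of fragments.
    """
    if fragment_size not in (512, 1024, 2048, 4096):
        raise ValueError("fragment size must be 512, 1024, 2048, or 4096")
    if not 0 < data_bytes <= LOGICAL_BLOCK:
        raise ValueError("data must fit within one logical block")
    fragments = -(-data_bytes // fragment_size)  # ceiling division
    allocated = fragments * fragment_size
    return fragments, allocated, allocated - data_bytes

# Compare the default 4096-byte allocation unit with 512-byte fragments
# for the 500-byte partial logical block used in the text.
for frag in (4096, 512):
    print(frag, partial_block_allocation(500, frag))
```

With 4096-byte fragments the 500 bytes of data occupy 4096 bytes of disk space; with 512-byte fragments they occupy 512 bytes.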

Fragments

The fragment size for a file system is specified during its creation. The allowable fragment sizes for journaled file systems (JFS) are 512, 1024, 2048, and 4096 bytes. For consistency with previous versions of AIX Version 3, the default fragment size is 4096 bytes. Different file systems can have different fragment sizes, but only one fragment size can be used within a single file system. Different fragment sizes can also coexist on a single system (machine) so that users can select the fragment size most appropriate for each file system.

JFS fragment support provides a view of the file system as a contiguous series of fragments rather than as a contiguous series of disk blocks. To maintain the efficiency of disk operations, however, disk space is often allocated in units of 4096 bytes so that the disk blocks or allocation units remain equal in size to the logical blocks. A disk-block allocation in this case can be viewed as an allocation of 4096 bytes of contiguous fragments.

As the fragment size for a file system decreases, disk space utilization improves, but operational overhead (additional disk seeks, data transfers, and allocation activity) also increases. To maintain the optimum balance between increased overhead and increased usable disk space, the following factors apply to JFS fragment support:

Maintaining 4096-byte disk space allocations allows disk operations to be more efficient as described previously in "Disk Utilization."

As the files and directories within a file system grow beyond 32KB in size, the benefit of maintaining disk space allocations of less than 4096 bytes for partial logical blocks diminishes. The disk space savings as a percentage of total file system space grows small while the extra performance cost of maintaining small disk space allocations remains constant. Since disk space allocations of less than 4096 bytes provide the most effective disk space utilization when used with small files and directories, the logical blocks of files and directories equal to or greater than 32KB are always allocated 4096 bytes of fragments. Any partial logical block associated with such a large file or directory is also allocated 4096 bytes of fragments.
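The two factors above can be combined into a rough model of how much disk space a file occupies. This is a sketch under stated assumptions: every full logical block is assumed to receive 4096 bytes of fragments, only the trailing partial logical block is rounded to the fragment size, and indirect blocks and other metadata are ignored. The name allocated_bytes is hypothetical.

```python
LOGICAL_BLOCK = 4096              # bytes per logical block
LARGE_FILE_THRESHOLD = 32 * 1024  # files of 32KB or more always use 4096-byte allocations

def allocated_bytes(file_size, fragment_size):
    """Estimate the disk space allocated to a file's data (simplified model)."""
    if file_size == 0:
        return 0
    full_blocks, tail = divmod(file_size, LOGICAL_BLOCK)
    total = full_blocks * LOGICAL_BLOCK
    if tail:
        # Small files round the partial block up to whole fragments;
        # files of 32KB or more round it up to a full 4096 bytes.
        unit = LOGICAL_BLOCK if file_size >= LARGE_FILE_THRESHOLD else fragment_size
        total += -(-tail // unit) * unit
    return total

print(allocated_bytes(4096 + 100, 512))       # small file: partial block uses one fragment
print(allocated_bytes(32 * 1024 + 100, 512))  # large file: partial block uses 4096 bytes
```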

Variable Number of I-Nodes

Since fragment support optimizes disk space utilization, it increases the number of small files and directories that can be stored within a file system. However, disk space is only one of the file system resources required by files and directories: each file or directory also requires a disk i-node. The JFS allows the number of disk i-nodes created within a file system to be specified in case more or fewer than the default number of disk i-nodes is desired. The number of disk i-nodes can be specified at file system creation as the number of bytes per i-node (NBPI). For example, an NBPI value of 1024 causes a disk i-node to be created for every 1024 bytes of file system disk space. Another way to look at this is that a small NBPI value (512 for instance) results in a large number of i-nodes, while a large NBPI value (such as 16,384) results in a small number of i-nodes.
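The relationship between NBPI and the number of i-nodes can be illustrated with a short calculation. This sketch simply divides the file system size by the NBPI value, as the text describes; any rounding that the file system performs internally is not modeled, and the function name inode_count is hypothetical.

```python
def inode_count(fs_size_bytes, nbpi):
    """Approximate number of disk i-nodes: one per `nbpi` bytes of file system space."""
    return fs_size_bytes // nbpi

fs_size = 8 * 1024 * 1024  # an 8MB file system, chosen only for illustration
for nbpi in (512, 1024, 4096, 16384):
    print(nbpi, inode_count(fs_size, nbpi))
```

For the 8MB example, an NBPI of 512 yields 16384 i-nodes, while an NBPI of 16384 yields 512.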

The set of allowable NBPI values varies according to the allocation group size (agsize). The default agsize is 8MB. In AIX Version 4.1, agsize is fixed at 8MB. With an agsize of 8MB, the allowable NBPI values are 512, 1024, 2048, 4096, 8192, and 16,384.

In AIX Version 4.2 or later, a larger agsize may be used. The allowable values for agsize are 8, 16, 32, and 64MB. The range of allowable NBPI values scales up as agsize increases. If the agsize is doubled to 16MB, the range of NBPI values also doubles: 1024, 2048, 4096, 8192, 16384, and 32768.
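The scaling rule can be sketched as a small lookup. The text confirms the NBPI ranges only for agsize values of 8MB and 16MB; the ranges produced here for 32MB and 64MB assume the same proportional scaling continues. The name allowable_nbpi is hypothetical.

```python
BASE_AGSIZE_MB = 8
BASE_NBPI = (512, 1024, 2048, 4096, 8192, 16384)  # allowable NBPI values at agsize 8MB

def allowable_nbpi(agsize_mb):
    """Return the NBPI values allowed for an agsize, assuming proportional scaling."""
    if agsize_mb not in (8, 16, 32, 64):
        raise ValueError("agsize must be 8, 16, 32, or 64 (MB)")
    scale = agsize_mb // BASE_AGSIZE_MB
    return tuple(n * scale for n in BASE_NBPI)

print(allowable_nbpi(8))
print(allowable_nbpi(16))
```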

For consistency with previous versions of AIX Version 3, the default NBPI value is 4096, and the default agsize is 8MB. The desired NBPI and agsize values are specified during file system creation. If the file system size is increased, the NBPI and agsize values remain set to the values specified during the file system's creation.

Specifying Fragment Size and NBPI

Fragment size and the number-of-bytes-per-i-node (NBPI) value are specified during the file system's creation with the crfs and mkfs commands or by using the System Management Interface Tool (SMIT). The choice of fragment size and of how many i-nodes to create should be based on the projected number and size of the files that the file system will contain.

Identifying Fragment Size and NBPI

The file-system-fragment size and the number-of-bytes-per-i-node (NBPI) value can be identified through the lsfs command or the System Management Interface Tool (SMIT). For application programs, the statfs subroutine can be used to identify the file system fragment size.
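As an illustration of how an application might query the fragment size, the following sketch uses Python's os.statvfs, which wraps the POSIX statvfs call. This is a related interface, not the statfs subroutine named above, and the assumption that its f_frsize field reports the JFS fragment size on a given AIX release has not been verified here.

```python
import os

def fragment_size(path):
    """Return the fundamental allocation unit (f_frsize) of the file system holding `path`."""
    return os.statvfs(path).f_frsize

# Query the root file system; any mounted path may be used instead.
print(fragment_size("/"))
```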

Compatibility and Migration

Previous versions of AIX are compatible with the current JFS, although file systems with a nondefault fragment size, NBPI value, or allocation group size may require special attention if migrated to a previous version.

File System Images

The JFS fully supports JFS file system images created under previous versions of AIX. These file system images, and any JFS file system image created with the default fragment size and NBPI value of 4096 bytes and the default allocation group size (agsize) of 8MB, can be interchanged between the current and previous versions of AIX without requiring any special migration activities.

JFS file system images created with a fragment size, NBPI value, or agsize other than the default values may be incompatible with previous versions of AIX. Specifically, only file system images less than or equal to 2GB in size and created with the default parameters can be interchanged among AIX Versions 3.2, 4.1, and 4.2. File system images created with a fragment size of 512, 1024, 2048, or 4096, an NBPI value of 512, 1024, 2048, 4096, 8192, or 16384, and an agsize of 8MB can be interchanged between AIX Version 4.1 and AIX Version 4.2. Finally, creating a file system with an NBPI value greater than 16384 or with an agsize greater than 8MB results in a JFS file system that is recognized only on AIX Version 4.2.
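These compatibility rules can be summarized as a decision function. This is one reading of the rules, not an official compatibility matrix: in particular, an image larger than 2GB that uses the default parameters is assumed to fall under the AIX Version 4.1 and 4.2 rule. The function name compatible_versions is hypothetical, and validation of the parameter values is omitted.

```python
TWO_GB = 2 * 1024**3
DEFAULTS = (4096, 4096, 8)  # fragment size, NBPI, agsize in MB

def compatible_versions(size_bytes, frag, nbpi, agsize_mb):
    """Return the AIX versions assumed able to use a JFS image with these parameters."""
    if nbpi > 16384 or agsize_mb > 8:
        return ("4.2",)
    if (frag, nbpi, agsize_mb) == DEFAULTS and size_bytes <= TWO_GB:
        return ("3.2", "4.1", "4.2")
    return ("4.1", "4.2")

print(compatible_versions(1024**3, 4096, 4096, 8))   # default parameters, 1GB
print(compatible_versions(1024**3, 512, 1024, 8))    # nondefault fragment size and NBPI
print(compatible_versions(1024**3, 4096, 32768, 16)) # larger agsize
```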

The following procedure must be used to migrate incompatible file systems from one version of AIX to another:

  1. Back up the file system by file name on the source system.
  2. Create a file system on the destination system.
  3. Restore the backed-up files on the destination system.

Backup/Restore

Backup and restore sequences can be performed between file systems with different fragment sizes and NBPI values. However, a file system with a smaller fragment size or NBPI value uses disk space more densely and holds more i-nodes, so a restore operation may fail for lack of free fragments or disk i-nodes if the fragment size or NBPI value of the source file system is smaller than that of the target file system. This is of particular concern for full file system backup and restore sequences, and the failure may occur even when the target file system is larger than the source file system.

Device Driver Limitations

A device driver must provide disk block addressability that is the same as the file system fragment size. For example, if a JFS file system is made on a user-supplied RAM disk device driver, the driver must allow 512-byte blocks in order to contain a file system that has 512-byte fragments. If the driver allows only page-level addressability, only a JFS with a fragment size of 4096 bytes can be used.

Note: Any valid NBPI value can be specified for any device.

Performance Costs

Although file systems that use fragments smaller than 4096 bytes as their allocation unit may require substantially less disk space than those using the default allocation unit of 4096 bytes, the use of smaller fragments may incur performance costs.

Increased Allocation Activity

Since disk space is allocated in smaller units for a file system with a fragment size other than 4096 bytes, allocation activity may occur more often when files or directories are repeatedly extended in size. For example, a write operation that extends the size of a zero-length file by 512 bytes results in the allocation of one fragment to the file, assuming a fragment size of 512 bytes. If the file size is extended further by another write of 512 bytes, an additional fragment must be allocated to the file. Applying this example to a file system with 4096-byte fragments, disk space allocation would occur only once, as part of the first write operation. No additional allocation activity must be performed as part of the second write operation since the initial 4096-byte fragment allocation is large enough to hold the data added by the second write operation.

Allocation activity adds performance overhead to file system operations. However, allocation activity can be minimized for file systems with fragment sizes smaller than 4096 bytes if files are extended by 4096 bytes at a time when possible.
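The difference in allocation activity can be seen in a small simulation. This is a simplified model that counts only how many writes trigger a new allocation when a zero-length file is extended by equal-sized appends; it ignores the 32KB rule and the on-disk layout. The name allocations_for_appends is hypothetical.

```python
def allocations_for_appends(write_size, writes, fragment_size):
    """Count the writes that require new fragments as a zero-length file grows."""
    allocated = 0  # bytes of disk space currently allocated to the file
    file_size = 0
    events = 0
    for _ in range(writes):
        file_size += write_size
        if file_size > allocated:
            # Round the new size up to whole fragments and allocate the difference.
            allocated = -(-file_size // fragment_size) * fragment_size
            events += 1
    return events

print(allocations_for_appends(512, 2, 512))   # two 512-byte writes, 512-byte fragments
print(allocations_for_appends(512, 2, 4096))  # same writes, 4096-byte fragments
print(allocations_for_appends(4096, 1, 512))  # one 4096-byte write, 512-byte fragments
```

In this model, two 512-byte writes cause two allocations with 512-byte fragments but only one with 4096-byte fragments, and a single 4096-byte write needs only one allocation even with 512-byte fragments.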

Free Space Fragmentation

Using fragments smaller than 4096 bytes may cause greater fragmentation of the disk's free space. For example, consider an area of the disk that is divided into eight fragments of 512 bytes each. Suppose that different files, requiring 512 bytes each, have written to the first, fourth, fifth, and seventh fragments in this area of the disk, leaving the second, third, sixth, and eighth fragments free. Although four fragments representing 2048 bytes of disk space are free, no partial logical block requiring four fragments (or 2048 bytes) can be allocated from these free fragments, since the fragments in a single allocation must be contiguous.

Since the fragments allocated for a file or directory's logical blocks must be contiguous, free space fragmentation may cause a file system operation that requests new disk space to fail even though the total amount of available free space is large enough to satisfy the operation. For example, a write operation that extends a zero-length file by one logical block requires 4096 bytes of contiguous disk space to be allocated. If the file system free space is fragmented and consists of 32 noncontiguous 512-byte fragments or a total of 16KB of free disk space, the write operation will fail, since eight contiguous fragments (or 4096 bytes of contiguous disk space) are not available to satisfy the write operation.
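The contiguity requirement can be illustrated with a list of booleans standing in for an area of the disk. This is only a sketch of the idea, not the JFS allocator; the names longest_free_run and can_allocate are hypothetical, and any alignment constraints the real allocator may apply are not modeled.

```python
def longest_free_run(free_map):
    """Return the length of the longest run of contiguous free fragments."""
    best = run = 0
    for is_free in free_map:
        run = run + 1 if is_free else 0
        best = max(best, run)
    return best

def can_allocate(free_map, fragments_needed):
    """True if a single contiguous allocation of the requested size is possible."""
    return longest_free_run(free_map) >= fragments_needed

# Eight 512-byte fragments: the first, fourth, fifth, and seventh are in use.
area = [False, True, True, False, False, True, False, True]
print(sum(area))              # free fragments in total
print(can_allocate(area, 4))  # a four-fragment (2048-byte) request
print(can_allocate(area, 2))  # a two-fragment (1024-byte) request
```

Four fragments are free in total, but the longest contiguous run is two fragments, so the four-fragment request cannot be satisfied.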

A file system with an unmanageable amount of fragmented free space can be defragmented with the defragfs command. Running defragfs has an impact on performance.

Increased Fragment Allocation Map Size

More virtual memory and file system disk space may be required to hold fragment allocation maps for file systems with a fragment size smaller than 4096 bytes. Fragments serve as the basic unit of disk space allocation, and the allocation state of each fragment within a file system is recorded in the file system's fragment allocation map.
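The growth of the allocation map can be estimated from the number of fragments it must track. The sketch below assumes one bit of state per fragment, which is a hypothetical figure chosen only for illustration; the actual on-disk map format is not described here. The names fragment_count and map_bytes are also hypothetical.

```python
def fragment_count(fs_size_bytes, fragment_size):
    """Number of fragments whose allocation state must be recorded."""
    return fs_size_bytes // fragment_size

def map_bytes(fs_size_bytes, fragment_size, bits_per_fragment=1):
    """Estimated map size in bytes, assuming `bits_per_fragment` bits of state per fragment."""
    bits = fragment_count(fs_size_bytes, fragment_size) * bits_per_fragment
    return -(-bits // 8)  # round up to whole bytes

one_gb = 1024**3
for frag in (4096, 512):
    print(frag, fragment_count(one_gb, frag), map_bytes(one_gb, frag))
```

Whatever the per-fragment cost, a 512-byte fragment size requires tracking eight times as many fragments as a 4096-byte fragment size for a file system of the same size.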

