General Programming Concepts: Writing and Debugging Programs
Manipulating Strings with sed
The sed program performs its editing without interacting with the person requesting the editing. This method of operation allows sed to do the following:
- Edit very large files
- Perform complex editing operations many times without requiring extensive retyping and cursor positioning (as interactive editors do)
- Perform global changes in one pass through the input
The editor keeps only a few lines of the file being edited in memory at one time, and does not use temporary files. Therefore, the file to be edited can be any size as long as there is room for both the input file and the output file in the file system.
Starting the Editor
To use the editor, create a command file containing the editing commands to perform on the input file. The editing commands perform complex operations and require a small amount of typing in the command file. Each command in the command file must be on a separate line. Once the command file is created, enter the following command on the command line:
sed -fCommandFile >Output <Input
In this command the parameters mean the following:
CommandFile |
The name of the file containing editing commands. |
Output |
The name of the file to contain the edited output. |
Input |
The name of the file, or files, to be edited. |
The sed program then makes the changes and writes the changed information to the output file. The contents of the input file are not changed.
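As an illustration of the invocation above, the following sketch builds a small command file and runs it. The file contents and the scratch directory are invented for the example; only the form of the sed command comes from the text.

```shell
# Scratch directory so the example leaves the current directory untouched.
dir=$(mktemp -d)

# Hypothetical input file to edit.
printf 'the cat sat\nthe dog ran\n' > "$dir/Input"

# Hypothetical command file: one editing command per line.
cat > "$dir/CommandFile" <<'EOF'
s/cat/lion/
s/dog/wolf/
EOF

# Run the editor: Input is read, Output is written, Input is not changed.
sed -f "$dir/CommandFile" > "$dir/Output" < "$dir/Input"

cat "$dir/Output"
```

After the run, Output holds the edited lines and Input still holds the original text.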
How sed Works
The sed program is a stream editor that receives its input from standard input, changes that input as directed by commands in a command file, and writes the resulting stream to standard output. If you do not provide a command file and do not use any flags with the sed command, the sed program copies standard input to standard output without change. Input to the program comes from two sources:
Input stream |
A stream of ASCII characters either from one or more files or entered directly from the keyboard. This stream is the data to be edited. |
Commands |
A set of addresses and associated commands to be performed, in the following general form:
[Line1 [,Line2] ] command [argument]
The parameters Line1 and Line2 are called addresses. Addresses can be either patterns to match in the input stream, or line numbers in the input stream. |
You can also enter editing commands along with the sed command by using the -e flag.
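The two kinds of address can be tried directly from the command line with the -e flag. This is a minimal sketch; the sample input and the variable names are assumptions made for the example.

```shell
# Sample input: four numbered lines (made up for the example).
input=$(printf 'line 1\nline 2\nline 3\nline 4\n')

# Line-number addresses: delete lines 2 through 3.
by_number=$(printf '%s\n' "$input" | sed -e '2,3d')

# Pattern address: delete every line that contains the digit 4.
by_pattern=$(printf '%s\n' "$input" | sed -e '/4/d')

printf '%s\n---\n%s\n' "$by_number" "$by_pattern"
```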
When sed edits, it reads the input stream one line at a time into an area in memory called the pattern space. When a line of data is in the pattern space, sed reads the command file and tries to match the addresses in the command file with characters in the pattern space. If it finds an address that matches something in the pattern space, sed then performs the command associated with that address on the part of the pattern space that matched the address. The result of that command changes the contents of the pattern space, and thus becomes the input for all following commands.
When sed has tried to match all addresses in the command file with the contents of the pattern space, it writes the final contents of the pattern space to standard output. Then it reads a new input line from standard input and starts the process over at the start of the command file.
Some editing commands change the way the process operates.
Flags used with the sed command can also change the operation of the command.
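The effect of the pattern space being passed from one command to the next can be seen with two substitutions given in different orders. This is a small sketch with invented input.

```shell
# The first command rewrites the pattern space; the second command then sees
# the rewritten text, not the original input line.
chained=$(printf 'one\n' | sed -e 's/one/two/' -e 's/two/three/')

# With the order reversed, the first command finds nothing to change, so only
# the second substitution takes effect.
reversed=$(printf 'one\n' | sed -e 's/two/three/' -e 's/one/two/')

printf '%s\n%s\n' "$chained" "$reversed"
```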
Using Regular Expressions
A regular expression is a string that contains literal characters, pattern-matching characters, operators, or any combination of these, and that defines a set of one or more possible strings. The stream editor uses a set of pattern-matching characters that is different from the shell pattern-matching characters, but the same as those of the line editor, ed.
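The difference from shell pattern matching can be seen by applying the same characters in both contexts. The sketch below assumes the -n flag, which suppresses the default output so that only lines printed by the p command appear; the sample words are invented.

```shell
# In a sed regular expression, b* means "zero or more b characters".
regex_matches=$(printf 'ac\nabc\nabxc\n' | sed -n -e '/^ab*c$/p')

# In a shell pattern, ab*c means "ab, then any characters, then c".
glob_matches=""
for word in ac abc abxc; do
    case $word in
        ab*c) glob_matches="$glob_matches $word" ;;
    esac
done

printf 'regex:\n%s\nshell pattern:%s\n' "$regex_matches" "$glob_matches"
```

The regular expression selects ac and abc, while the shell pattern selects abc and abxc.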
Using the sed Command Summary
All sed commands are single letters plus some parameters, such as line numbers or text strings. The commands summarized below make changes to the lines in the pattern space.
The following symbols are used in the syntax diagrams:
Symbol |
Meaning |
[ ] |
Square brackets enclose optional parts of the commands |
italics |
Parameters in italics are placeholders for a value that you enter. For example, FileName represents a parameter that you replace with the name of an actual file. |
Line1 |
This symbol is a line number or regular expression to match that defines the starting point for applying the editing command. |
Line2 |
This symbol is a line number or regular expression to match that defines the ending point to stop applying the editing command. |
Line Manipulation
Function |
Syntax/Description |
append lines |
[Line1]a\\nText
Writes the lines contained in Text to the output stream after Line1. The a command must appear at the end of a line. |
change lines |
[Line1 [,Line2] ]c\\nText
Deletes the lines specified by Line1 and Line2 as the delete lines command does. Then it writes Text to the output stream in place of the deleted lines. |
delete lines |
[Line1 [,Line2] ]d
Removes lines from the input stream and does not copy them to the output stream. The lines not copied begin at line number Line1. The next line copied to the output stream is line number Line2 + 1. If you specify only one line number, then only that line is not copied. If you do not specify an address, the command applies to every input line, so no lines are copied. You cannot perform any other functions on lines that are not copied to the output. |
insert lines |
[Line1]i\\nText
Writes the lines contained in Text to the output stream before Line1. The i command must appear at the end of a line. |
next line |
[Line1 [,Line2] ]n
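Reads the next input line into the pattern space, replacing the current contents. If Line1 and Line2 are given, the command applies to each line in that range. Before they are replaced, the current contents of the pattern space are written to the output if they have not been deleted. |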
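The line manipulation commands can be combined in one command file. The following is a sketch only: the input lines, the file names, and the inserted text are invented for the example.

```shell
dir=$(mktemp -d)

# Hypothetical four-line input file.
printf 'alpha\nbeta\ngamma\ndelta\n' > "$dir/in"

# i, c and a each end their line with a backslash; Text starts on the next line.
cat > "$dir/lines.sed" <<'EOF'
1i\
== start ==
2d
3c\
GAMMA
$a\
== end ==
EOF

edited=$(sed -f "$dir/lines.sed" < "$dir/in")
printf '%s\n' "$edited"

# n writes the current line and reads the next one, which d then deletes,
# so only the odd-numbered lines reach the output.
odd=$(printf '1\n2\n3\n4\n' | sed -e 'n' -e 'd')
printf '%s\n' "$odd"
```

The address $ in the command file selects the last input line.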
Substitution
Function |
Syntax/Description |
substitution for pattern |
[Line1 [,Line2] ] s/Pattern/String/Flags
Searches the indicated line(s) for a set of characters that matches the regular expression defined in Pattern. When it finds a match, the command replaces that set of characters with the set of characters specified by String. |
Input and Output
Function |
Syntax/Description |
print lines |
[Line1 [,Line2] ] p
Writes the indicated lines to STDOUT at the point in the editing process that the p command occurs. |
write lines |
[Line1 [,Line2] ]w FileName
Writes the indicated lines to a FileName at the point in the editing process that the w command occurs.
If FileName exists, it is overwritten; otherwise, it is created. A maximum of 10 different files can be mentioned as input or output files in the entire editing process. Include exactly one space between w and FileName. |
read file |
[Line1]r FileName
Reads FileName and appends the contents after the line indicated by Line1.
Include exactly one space between r and FileName. If FileName cannot be opened, the command reads it as a null file without giving any indication of an error. |
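The print, write and read commands can be sketched as follows. The file names are hypothetical, and the example assumes the -n flag, which suppresses the default output so that only explicitly printed lines appear.

```shell
dir=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$dir/in"
printf 'INSERTED\n' > "$dir/extra"

# p: with -n, only lines 2 through 3 are written to standard output.
printed=$(sed -n -e '2,3p' "$dir/in")

# w: lines containing the letter t are written to the file named after w.
sed -n -e "/t/w $dir/matches" "$dir/in"

# r: the contents of the extra file are added after line 1.
merged=$(sed -e "1r $dir/extra" "$dir/in")

printf '%s\n---\n' "$printed"
cat "$dir/matches"
printf -- '---\n%s\n' "$merged"
```

Double quotes are used around the w and r commands only so that the shell expands the directory name before sed sees the command.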
Matching Across Lines
Function |
Syntax/Description |
join next line |
[Line1 [,Line2] ]N
Joins the indicated input lines together, separating them by an embedded new-line character. Pattern matches can extend across the embedded new-line(s). |
delete first line of pattern space |
[Line1 [,Line2] ]D
Deletes all text in the pattern space up to and including the first new-line character. If only one line is in the pattern space, it reads another line. Starts the list of editing commands again from the beginning. |
print first line of pattern space |
[Line1 [,Line2] ]P
Prints all text in the pattern space up to and including the first new-line character to STDOUT. |
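Two short sketches show how N, P and D work together. The input is invented, and the second command sequence is a commonly used idiom rather than something taken from this article. In both, \n in a pattern matches the embedded new-line character.

```shell
# N joins each line with the one after it; the substitution then replaces
# the embedded new-line with a space, so lines are merged in pairs.
joined=$(printf 'a\nb\nc\nd\n' | sed -e 'N' -e 's/\n/ /')

# Keep a two-line window in the pattern space. If both lines are equal,
# skip the P; D then drops the first line and restarts the commands.
# The address $!N means "run N on every line except the last".
deduped=$(printf 'x\nx\ny\nz\nz\n' |
    sed -e '$!N' -e '/^\(.*\)\n\1$/!P' -e 'D')

printf '%s\n---\n%s\n' "$joined" "$deduped"
```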
Pick up and Put down
Function |
Syntax/Description |
pick up copy |
[Line1 [,Line2] ]h
Copies the contents of the pattern space indicated by Line1 and Line2 if present, to the holding area. |
pick up copy, appended |
[Line1 [,Line2] ]H
Copies the contents of the pattern space indicated by Line1 and Line2 if present, to the holding area, and appends it to the end of the previous contents of the holding area. |
put down copy |
[Line1 [,Line2] ]g
Copies the contents of the holding area to the pattern space indicated by Line1 and Line2 if present. The previous contents of the pattern space are destroyed. |
put down copy, appended |
[Line1 [,Line2] ]G
Copies the contents of the holding area to the end of the pattern space indicated by Line1 and Line2 if present. The previous contents of the pattern space are not changed. A new-line character separates the previous contents from the appended text. |
exchange copies |
[Line1 [,Line2] ]x
Exchanges the contents of the holding area with the contents of the pattern space indicated by Line1 and Line2 if present. |
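The holding area is easiest to understand from a small experiment. The sketch below uses invented input and assumes the -n flag to suppress the default output; the line-reversing sequence is a well-known idiom, not part of this article.

```shell
# G on every line appends the (initially empty) holding area to the pattern
# space, which adds a blank line after each input line.
spaced=$(printf 'a\nb\n' | sed -e 'G')

# Reverse the order of lines:
#   1!G  on every line but the first, append the holding area to the line
#   h    copy the result back to the holding area
#   $p   on the last line, print the accumulated text
reversed=$(printf '1\n2\n3\n' | sed -n -e '1!G' -e 'h' -e '$p')

printf '%s\n---\n%s\n' "$spaced" "$reversed"
```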
Control
Function |
Syntax/Description |
negation |
[Line1 [,Line2] ]!
The ! (exclamation point) applies the command that follows it on the same line to the parts of the input file that are not selected by Line1 and Line2. |
command groups |
[Line1 [,Line2] ]{
grouped commands
}
The { (left brace) and the } (right brace) enclose a set of commands to be applied as a set to the input lines selected by Line1 and Line2. The first command in the set can be on the same line or on the line following the left brace. The right brace must be on a line by itself. You can nest groups within groups. |
labels |
:Label
Marks a place in the stream of editing commands to be used as the destination of a branch. The symbol Label is a string of up to 8 bytes. Each Label in the editing stream must be different from any other Label. |
branch to label, unconditional |
[Line1 [,Line2] ]bLabel
Branches to the point in the editing stream indicated by Label and continues processing the current input line with the commands following Label. If Label is null, branches to the end of the editing stream, which results in reading a new input line and starting the editing stream over. The string Label must appear as a Label in the editing stream. |
test and branch |
[Line1 [,Line2] ]tLabel
If any successful substitutions were made on the current input line, branches to Label. If no substitutions were made, does nothing. Clears the flag that indicates a substitution was made. This flag is cleared at the start of each new input line. |
quit |
[Line1 ]q
Stops editing in an orderly fashion by writing the current line to the output, writing any appended or read text to the output, and stopping the editor. |
find line number |
[Line1 ]=
Writes to standard output the line number of the line that matches Line1. |
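The control commands can be exercised with a few short runs. This is a sketch under stated assumptions: the input, file names and label name are invented, and -n is used where only explicitly written output is wanted.

```shell
dir=$(mktemp -d)
input=$(printf 'one\ntwo\nthree\nfour\n')

# Negation: delete every line that is NOT line 2.
only_second=$(printf '%s\n' "$input" | sed -e '2!d')

# Quit: stop after writing line 2.
first_two=$(printf '%s\n' "$input" | sed -e '2q')

# Line number: write the number of the last line ($) and nothing else.
count=$(printf '%s\n' "$input" | sed -n -e '$=')

# Group: apply both substitutions to lines 2 through 3 only.
cat > "$dir/group.sed" <<'EOF'
2,3{
s/t/T/
s/$/!/
}
EOF
grouped=$(printf '%s\n' "$input" | sed -f "$dir/group.sed")

# Unconditional branch with a null label: lines starting with # skip the
# rest of the commands.
cat > "$dir/skip.sed" <<'EOF'
/^#/b
s/o/0/g
EOF
skipped=$(printf '#foo\nfoo\n' | sed -f "$dir/skip.sed")

# Label with test and branch: repeat the substitution until it fails.
cat > "$dir/loop.sed" <<'EOF'
:loop
s/()//
tloop
EOF
stripped=$(printf 'a((()))b\n' | sed -f "$dir/loop.sed")

printf '%s\n' "$only_second" "$first_two" "$count" "$grouped" "$skipped" "$stripped"
```

The loop is needed in the last case because each substitution exposes a new pair of parentheses that a single pass would not rescan.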
Using Text in Commands
The append, insert and change lines commands all use a supplied text string to add to the output stream. This text string conforms to the following rules:
- Can be one or more lines long.
- Each \n (new-line character) inside Text must have an additional \ character before it (\\n).
- The Text string ends with a new-line that does not have an additional \ character before it (\n).
- Once the command inserts the Text string, the string:
  - Is always written to the output stream, regardless of what other commands do to the line that caused it to be inserted.
  - Is not scanned for address matches.
  - Is not affected by other editing commands.
  - Does not affect the line number counter.
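The rules above can be sketched with a two-line Text string followed by a substitution. The input and file names are invented for the example.

```shell
dir=$(mktemp -d)

# The first Text line ends with a backslash, so Text continues; the second
# line has no backslash, so Text ends there. The s command that follows
# changes the input lines but not the appended text.
cat > "$dir/append.sed" <<'EOF'
/two/a\
first added line\
second added line
s/line/LINE/
EOF

result=$(printf 'line one\nline two\nline three\n' | sed -f "$dir/append.sed")
printf '%s\n' "$result"
```

In the output, the three input lines read LINE, while the two appended lines keep the lowercase word.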
Using String Replacement
The s command performs string replacement in the indicated lines in the input file. If the command finds a set of characters in the input file that satisfies the regular expression Pattern, it replaces the set of characters with the set of characters specified in String.
The String parameter is a literal set of characters (digits, letters and symbols). Two special symbols can be used in String:
Symbol |
Use |
& |
This symbol in String is replaced by the set of characters in the input lines that matched Pattern. For example, the command: |
s/boy/&s/
tells sed to find a pattern boy in the input line, and copy that pattern to the output with an appended s. Therefore, it changes the input line:
From: |
The boy look at the game. |
To: |
The boys look at the game. |
Symbol |
Use |
\d |
d is a single digit. This symbol in String is replaced by the set of characters in the input lines that matches the dth substring in Pattern. Substrings begin with the characters \( and end with the characters \). For example, the command:
s/\(stu\)\(dy\)/\1r\2/
changes the input line:
From: |
The study chair |
To: |
The sturdy chair |
|
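Both special symbols can be checked from the command line. The first two commands are the ones shown above; the third, which swaps two words, is an extra illustration with invented input.

```shell
# & stands for the text that matched Pattern.
with_amp=$(printf 'The boy look at the game.\n' | sed -e 's/boy/&s/')

# \1 and \2 stand for the first and second \( ... \) substrings.
with_groups=$(printf 'The study chair\n' | sed -e 's/\(stu\)\(dy\)/\1r\2/')

# The substrings can also be written back in a different order.
swapped=$(printf 'hello world\n' | sed -e 's/\(.*\) \(.*\)/\2 \1/')

printf '%s\n%s\n%s\n' "$with_amp" "$with_groups" "$swapped"
```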
The letters that appear as flags change the replacement as follows:
Symbol |
Use |
g |
Substitutes String for all instances of Pattern in the indicated line(s). Characters in String are not scanned for a match of Pattern after they are inserted. For example, the command:
s/r/R/g
changes:
From: |
the red round rock |
To: |
the Red Round Rock |
|
p |
Prints (to STDOUT) the line that contains a successfully matched Pattern. |
w FileName |
Writes to FileName the line that contains a successfully matched Pattern. If FileName exists, it is overwritten; otherwise, it is created. A maximum of 10 different files can be mentioned as input or output files in the entire editing process. Include exactly one space between w and FileName. |
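The flags can be compared side by side on the same input. This sketch uses invented input and file names, and assumes the -n flag so that the p and w flags are the only source of output.

```shell
dir=$(mktemp -d)
input=$(printf 'the red round rock\nplain text\n')

# No flag: only the first match on each line is replaced.
first_only=$(printf '%s\n' "$input" | sed -e 's/ro/RO/')

# g: every match on each line is replaced.
all=$(printf '%s\n' "$input" | sed -e 's/ro/RO/g')

# p with -n: only lines on which a substitution succeeded are printed.
printed=$(printf '%s\n' "$input" | sed -n -e 's/ro/RO/gp')

# w: lines on which a substitution succeeded are written to the named file.
printf '%s\n' "$input" | sed -n -e "s/ro/RO/gw $dir/changed"

printf '%s\n---\n%s\n---\n%s\n---\n' "$first_only" "$all" "$printed"
cat "$dir/changed"
```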