This is ../../doc/sed.info, produced by makeinfo version 4.5 from
../../doc/sed.texi.

INFO-DIR-SECTION Text creation and manipulation
START-INFO-DIR-ENTRY
* sed: (sed).                   Stream EDitor.

END-INFO-DIR-ENTRY

   This file documents version 4.1.5 of GNU `sed', a stream editor.

   Copyright (C) 1998, 1999, 2001, 2002, 2003, 2004 Free Software
Foundation, Inc.

   This document is released under the terms of the GNU Free
Documentation License as published by the Free Software Foundation;
either version 1.1, or (at your option) any later version.

   You should have received a copy of the GNU Free Documentation
License along with GNU `sed'; see the file `COPYING.DOC'. If not,
write to the Free Software Foundation, 59 Temple Place - Suite 330,
Boston, MA 02110-1301, USA.

   There are no Cover Texts and no Invariant Sections; this text, along
with its equivalent in the printed manual, constitutes the Title Page.

File: sed.info, Node: Top, Next: Introduction, Up: (dir)

   This file documents version 4.1.5 of GNU `sed', a stream editor.

* Menu:

* Introduction::               Introduction
* Invoking sed::               Invocation
* sed Programs::               `sed' programs
* Examples::                   Some sample scripts
* Limitations::                Limitations and (non-)limitations of GNU `sed'
* Other Resources::            Other resources for learning about `sed'
* Reporting Bugs::             Reporting bugs

* Extended regexps::           `egrep'-style regular expressions

* Concept Index::              A menu with all the topics in this manual.
* Command and Option Index::   A menu with all `sed' commands and
                               command-line options.

--- The detailed node listing ---

sed Programs:
* Execution Cycle::            How `sed' works
* Addresses::                  Selecting lines with `sed'
* Regular Expressions::        Overview of regular expression syntax
* Common Commands::            Often used commands
* The "s" Command::            `sed''s Swiss Army Knife
* Other Commands::             Less frequently used commands
* Programming Commands::       Commands for `sed' gurus
* Extended Commands::          Commands specific to GNU `sed'
* Escapes::                    Specifying special characters

Examples:
* Centering lines::
* Increment a number::
* Rename files to lower case::
* Print bash environment::
* Reverse chars of lines::
* tac::                        Reverse lines of files
* cat -n::                     Numbering lines
* cat -b::                     Numbering non-blank lines
* wc -c::                      Counting chars
* wc -w::                      Counting words
* wc -l::                      Counting lines
* head::                       Printing the first lines
* tail::                       Printing the last lines
* uniq::                       Make duplicate lines unique
* uniq -d::                    Print duplicated lines of input
* uniq -u::                    Remove all duplicated lines
* cat -s::                     Squeezing blank lines


File: sed.info, Node: Introduction, Next: Invoking sed, Prev: Top, Up: Top

Introduction
************

   `sed' is a stream editor. A stream editor is used to perform basic
text transformations on an input stream (a file or input from a
pipeline). While in some ways similar to an editor which permits
scripted edits (such as `ed'), `sed' works by making only one pass over
the input(s), and is consequently more efficient. But it is `sed''s
ability to filter text in a pipeline which particularly distinguishes
it from other types of editors.


File: sed.info, Node: Invoking sed, Next: sed Programs, Prev: Introduction, Up: Top

Invocation
**********

   Normally `sed' is invoked like this:

     sed SCRIPT INPUTFILE...

   The full format for invoking `sed' is:

     sed OPTIONS... [SCRIPT] [INPUTFILE...]

   If you do not specify INPUTFILE, or if INPUTFILE is `-', `sed'
filters the contents of the standard input. The SCRIPT is actually the
first non-option parameter, which `sed' specially considers a script
and not an input file if (and only if) none of the other OPTIONS
specifies a script to be executed, that is, if neither the `-e' nor the
`-f' option is specified.
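
   The two invocation forms described above can be sketched as
follows. The sample text and the variable names are made up for the
illustration; only the `sed' invocations themselves come from the
description above.

```shell
# A minimal sketch: sample text and variable names are illustrative only.
# With no -e or -f option, the first non-option argument is the script,
# and with no INPUTFILE the standard input is filtered.
from_stdin=$(printf 'hello world\n' | sed 's/hello/goodbye/')
echo "$from_stdin"

# An INPUTFILE of `-' also names the standard input.
from_dash=$(printf 'hello world\n' | sed 's/hello/goodbye/' -)
echo "$from_dash"
```

   Both commands should print the same line, since `-' and an omitted
INPUTFILE select the same input.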

   `sed' may be invoked with the following command-line options:

`--version'
     Print out the version of `sed' that is being run and a copyright
     notice, then exit.

`--help'
     Print a usage message briefly summarizing these command-line
     options and the bug-reporting address, then exit.

`-n'
`--quiet'
`--silent'
     By default, `sed' prints out the pattern space at the end of each
     cycle through the script. These options disable this automatic
     printing, and `sed' only produces output when explicitly told to
     via the `p' command.

`-i[SUFFIX]'
`--in-place[=SUFFIX]'
     This option specifies that files are to be edited in-place. GNU
     `sed' does this by creating a temporary file and sending output to
     this file rather than to the standard output.(1)

     This option implies `-s'.

     When the end of the file is reached, the temporary file is renamed
     to the output file's original name. The extension, if supplied,
     is used to modify the name of the old file before renaming the
     temporary file, thereby making a backup copy.(2)

     This rule is followed: if the extension doesn't contain a `*',
     then it is appended to the end of the current filename as a
     suffix; if the extension does contain one or more `*' characters,
     then _each_ asterisk is replaced with the current filename. This
     allows you to add a prefix to the backup file, instead of (or in
     addition to) a suffix, or even to place backup copies of the
     original files into another directory (provided the directory
     already exists).

     If no extension is supplied, the original file is overwritten
     without making a backup.

`-l N'
`--line-length=N'
     Specify the default line-wrap length for the `l' command. A
     length of 0 (zero) means to never wrap long lines. If not
     specified, it is taken to be 70.

`--posix'
     GNU `sed' includes several extensions to POSIX sed. In order to
     simplify writing portable scripts, this option disables all the
     extensions that this manual documents, including additional
     commands. Most of the extensions accept `sed' programs that are
     outside the syntax mandated by POSIX, but some of them (such as
     the behavior of the `N' command described in *note Reporting
     Bugs::) actually violate the standard. If you want to disable
     only the latter kind of extension, you can set the
     `POSIXLY_CORRECT' variable to a non-empty value.

`-r'
`--regexp-extended'
     Use extended regular expressions rather than basic regular
     expressions. Extended regexps are those that `egrep' accepts;
     they can be clearer because they usually have fewer backslashes,
     but are a GNU extension and hence scripts that use them are not
     portable. *Note Extended regular expressions: Extended regexps.

`-s'
`--separate'
     By default, `sed' will consider the files specified on the command
     line as a single continuous long stream. This GNU `sed' extension
     allows the user to consider them as separate files: range
     addresses (such as `/abc/,/def/') are not allowed to span several
     files, line numbers are relative to the start of each file, `$'
     refers to the last line of each file, and files invoked from the
     `R' commands are rewound at the start of each file.

`-u'
`--unbuffered'
     Buffer both input and output as minimally as practical. (This is
     particularly useful if the input is coming from the likes of
     `tail -f', and you wish to see the transformed output as soon as
     possible.)

`-e SCRIPT'
`--expression=SCRIPT'
     Add the commands in SCRIPT to the set of commands to be run while
     processing the input.

`-f SCRIPT-FILE'
`--file=SCRIPT-FILE'
     Add the commands contained in the file SCRIPT-FILE to the set of
     commands to be run while processing the input.


   If no `-e', `-f', `--expression', or `--file' options are given on
the command-line, then the first non-option argument on the command
line is taken to be the SCRIPT to be executed.
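
   As a rough illustration of the `-n' and `-s' options described in
the list above, the following sketch compares how `$' behaves with and
without `-s'. The temporary directory, file names and file contents
are assumptions made up for the demonstration, and `-s' requires GNU
`sed'.

```shell
# Sketch only: file names and contents are made up for the demonstration.
tmpdir=$(mktemp -d)
printf 'one\ntwo\n' > "$tmpdir/a.txt"
printf 'three\nfour\n' > "$tmpdir/b.txt"

# -n suppresses automatic printing, so only the line given to `p' appears.
only_second=$(sed -n '2p' "$tmpdir/a.txt")
echo "$only_second"

# Without -s the inputs form one stream, so `$' is the last line overall.
last_overall=$(sed -n '$p' "$tmpdir/a.txt" "$tmpdir/b.txt")
echo "$last_overall"

# With -s (a GNU extension) `$' matches the last line of each file.
last_each=$(sed -s -n '$p' "$tmpdir/a.txt" "$tmpdir/b.txt")
echo "$last_each"

rm -rf "$tmpdir"
```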

   If any command-line parameters remain after processing the above,
these parameters are interpreted as the names of input files to be
processed. A file name of `-' refers to the standard input stream.
The standard input will be processed if no file names are specified.
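
   In-place editing with a backup suffix, as described for the `-i'
option, might look like the sketch below. The file name `notes.txt'
and the suffix `.bak' are arbitrary choices for the example, not names
used by `sed' itself.

```shell
# Sketch only: the file name and the `.bak' suffix are arbitrary choices.
tmpdir=$(mktemp -d)
printf 'color\n' > "$tmpdir/notes.txt"

# Edit the file in place; the original is kept as notes.txt.bak.
sed -i.bak 's/color/colour/' "$tmpdir/notes.txt"

edited=$(cat "$tmpdir/notes.txt")
backup=$(cat "$tmpdir/notes.txt.bak")
echo "edited: $edited"
echo "backup: $backup"

rm -rf "$tmpdir"
```

   Because the suffix contains no `*', it is simply appended to the
file name to form the name of the backup.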

   ---------- Footnotes ----------

   (1) This applies to commands such as `=', `a', `c', `i', `l', `p'.
You can still write to the standard output by using the `w' or `W'
commands together with the `/dev/stdout' special file.

   (2) Note that GNU `sed' creates the backup file whether or not
any output is actually changed.


File: sed.info, Node: sed Programs, Next: Examples, Prev: Invoking sed, Up: Top

`sed' Programs
**************

   A `sed' program consists of one or more `sed' commands, passed in by
one or more of the `-e', `-f', `--expression', and `--file' options, or
the first non-option argument if none of these options is used. This
document will refer to "the" `sed' script; this is understood to mean
the in-order catenation of all of the SCRIPTs and SCRIPT-FILEs passed
in.

   Each `sed' command consists of an optional address or address range,
followed by a one-character command name and any additional
command-specific code.
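
   The in-order catenation of SCRIPTs and SCRIPT-FILEs can be observed
with a small experiment. The script file name `second.sed' and its
contents are hypothetical, chosen only to make the ordering visible.

```shell
# Sketch only: the script file name and its contents are illustrative.
tmpdir=$(mktemp -d)
printf 's/b/c/\n' > "$tmpdir/second.sed"

# The -e script runs first and the -f script second, so `a' becomes `b'
# and then `c'.
in_order=$(printf 'a\n' | sed -e 's/a/b/' -f "$tmpdir/second.sed")
echo "$in_order"

# With the order reversed, `s/b/c/' runs first and finds nothing to change.
reversed=$(printf 'a\n' | sed -f "$tmpdir/second.sed" -e 's/a/b/')
echo "$reversed"

rm -rf "$tmpdir"
```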

* Menu:

* Execution Cycle::          How `sed' works
* Addresses::                Selecting lines with `sed'
* Regular Expressions::      Overview of regular expression syntax
* Common Commands::          Often used commands
* The "s" Command::          `sed''s Swiss Army Knife
* Other Commands::           Less frequently used commands
* Programming Commands::     Commands for `sed' gurus
* Extended Commands::        Commands specific to GNU `sed'
* Escapes::                  Specifying special characters


File: sed.info, Node: Execution Cycle, Next: Addresses, Up: sed Programs

How `sed' Works
===============

   `sed' maintains two data buffers: the active _pattern_ space, and
the auxiliary _hold_ space. Both are initially empty.

   `sed' operates by performing the following cycle on each line of
input: first, `sed' reads one line from the input stream, removes any
trailing newline, and places it in the pattern space. Then commands
are executed; each command can have an address associated with it:
addresses are a kind of condition code, and a command is only executed
if the condition holds at the moment the command is to be executed.

   When the end of the script is reached, unless the `-n' option is in
use, the contents of pattern space are printed out to the output
stream, adding back the trailing newline if it was removed.(1) Then the
next cycle starts for the next input line.

   Unless special commands (like `D') are used, the pattern space is
deleted between two cycles. The hold space, on the other hand, keeps
its data between cycles (see commands `h', `H', `x', `g', `G' to move
data between both buffers).
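
   One way to see that the hold space survives from cycle to cycle is
to collect every input line in it and print the result once, on the
last line. The following is only a sketch with made-up input; it
relies on the `H' and `x' commands named above and on the `s' command.

```shell
# Sketch only: joins the input lines with commas via the hold space.
#   H        append a newline and the current line to the hold space
#   $        on the last line only, run the group that follows
#   x        swap the pattern space and the hold space
#   s/\n/,/g turn the embedded newlines into commas
#   s/^,//   drop the leading comma left by the first H
joined=$(printf 'a\nb\nc\n' | sed -n 'H;${x;s/\n/,/g;s/^,//;p;}')
echo "$joined"
```

   The pattern space, by contrast, is replaced by the next input line
at the start of each cycle, which is why the lines have to be saved in
the hold space in order to be joined.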

   ---------- Footnotes ----------

   (1) Actually, if `sed' prints a line without the terminating
newline, it will nevertheless print the missing newline as soon as
more text is sent to the same output stream, which gives the "least
expected surprise" even though it does not make commands like `sed -n
p' exactly identical to `cat'.


File: sed.info, Node: Addresses, Next: Regular Expressions, Prev: Execution Cycle, Up: sed Programs

Selecting lines with `sed'
==========================

   Addresses in a `sed' script can be in any of the following forms:
`NUMBER'
     Specifying a line number will match only that line in the input.
     (Note that `sed' counts lines continuously across all input files
     unless `-i' or `-s' options are specified.)

`FIRST~STEP'
     This GNU extension matches every STEPth line starting with line
     FIRST. In particular, lines will be selected when there exists a
     non-negative N such that the current line-number equals FIRST + (N
     * STEP). Thus, to select the odd-numbered lines, one would use
     `1~2'; to pick every third line starting with the second, `2~3'
     would be used; to pick every fifth line starting with the tenth,
     use `10~5'; and `50~0' is just an obscure way of saying `50'.

`$'
     This address matches the last line of the last file of input, or
     the last line of each file when the `-i' or `-s' options are
     specified.

`/REGEXP/'
     This will select any line which matches the regular expression
     REGEXP. If REGEXP itself includes any `/' characters, each must
     be escaped by a backslash (`\').

     The empty regular expression `//' repeats the last regular
     expression match (the same holds if the empty regular expression is
     passed to the `s' command). Note that modifiers to regular
     expressions are evaluated when the regular expression is compiled,
     thus it is invalid to specify them together with the empty regular
     expression.

`\%REGEXP%'
     (The `%' may be replaced by any other single character.)

     This also matches the regular expression REGEXP, but allows one to
     use a different delimiter than `/'. This is particularly useful
     if the REGEXP itself contains a lot of slashes, since it avoids
     the tedious escaping of every `/'. If REGEXP itself includes any
     delimiter characters, each must be escaped by a backslash (`\').

`/REGEXP/I'
`\%REGEXP%I'
     The `I' modifier to regular-expression matching is a GNU extension
     which causes the REGEXP to be matched in a case-insensitive manner.

`/REGEXP/M'
`\%REGEXP%M'
     The `M' modifier to regular-expression matching is a GNU `sed'
     extension which causes `^' and `$' to match respectively (in
     addition to the normal behavior) the empty string after a newline,
     and the empty string before a newline. There are special character
     sequences (`\`' and `\'') which always match the beginning or the
     end of the buffer. `M' stands for `multi-line'.


   If no addresses are given, then all lines are matched; if one
address is given, then only lines matching that address are matched.
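
   The single-address forms listed above can be tried out as in the
following sketch. The five-line sample input and the variable names
are made up for the demonstration; `FIRST~STEP' and the `I' modifier
need GNU `sed'.

```shell
# Sketch only: the five-line sample input is made up for the demonstration.
input=$(printf 'alpha\nbeta\ngamma\ndelta\nepsilon\n')

# NUMBER: print only line 2.
by_number=$(printf '%s\n' "$input" | sed -n '2p')
# $: print only the last line.
last_line=$(printf '%s\n' "$input" | sed -n '$p')
# /REGEXP/: print the lines containing `ta'.
by_regexp=$(printf '%s\n' "$input" | sed -n '/ta/p')
# \%REGEXP%: the same match with `%' as the delimiter.
by_custom=$(printf '%s\n' "$input" | sed -n '\%ta%p')
# FIRST~STEP (GNU extension): print the odd-numbered lines.
odd_lines=$(printf '%s\n' "$input" | sed -n '1~2p')
# /REGEXP/I (GNU extension): case-insensitive match.
ignore_case=$(printf '%s\n' "$input" | sed -n '/GAMMA/Ip')

printf '%s\n' "$by_number" "$last_line" "$by_regexp" "$by_custom" \
  "$odd_lines" "$ignore_case"
```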

   An address range can be specified by specifying two addresses
separated by a comma (`,'). An address range matches lines starting
from where the first address matches, and continues until the second
address matches (inclusively).

   If the second address is a REGEXP, then checking for the ending
match will start with the line _following_ the line which matched the
first address: a range will always span at least two lines (except of
course if the input stream ends).

   If the second address is a NUMBER less than (or equal to) the line
matching the first address, then only the one line is matched.
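
   The range rules above can be checked with a few one-liners. This
is a sketch on made-up input; the variable names are illustrative
only.

```shell
# Sketch only: the sample input is made up for the demonstration.
input=$(printf 'a\nb\nc\nd\ne\n')

# Two line numbers: lines 2 through 4, inclusive.
numeric=$(printf '%s\n' "$input" | sed -n '2,4p')
# Two regexps: from the first line matching `b' through the next `d'.
by_regexp=$(printf '%s\n' "$input" | sed -n '/b/,/d/p')
# The end regexp is first tried on the line after the start, so the line
# that opens the range cannot also close it; with no later `b', the range
# runs to the end of input.
open_ended=$(printf '%s\n' "$input" | sed -n '/b/,/b/p')
# A NUMBER end address not greater than the start selects just one line.
one_line=$(printf '%s\n' "$input" | sed -n '3,1p')

printf '%s\n' "$numeric" "$by_regexp" "$open_ended" "$one_line"
```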

   GNU `sed' also supports some special two-address forms; all these
are GNU extensions:
`0,/REGEXP/'
     A line number of `0' can be used in an address specification like
     `0,/REGEXP/' so that `sed' will try to match REGEXP in the first
     input line too. In other words, `0,/REGEXP/' is similar to
     `1,/REGEXP/', except that if ADDR2 matches the very first line of
     input the `0,/REGEXP/' form will consider it to end the range,
     whereas the `1,/REGEXP/' form will match the beginning of its
     range and hence make the range span up to the _second_ occurrence
     of the regular expression.

     Note that this is the only place where the `0' address makes
     sense; there is no 0-th line and commands which are given the `0'
     address in any other way will give an error.

`ADDR1,+N'
     Matches ADDR1 and the N lines following ADDR1.

`ADDR1,~N'
     Matches ADDR1 and the lines following ADDR1 until the next line
     whose input line number is a multiple of N.

   Appending the `!' character to the end of an address specification
negates the sense of the match. That is, if the `!' character follows
an address range, then only lines which do _not_ match the address range
will be selected. This also works for singleton addresses, and,
perhaps perversely, for the null address.
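
   The difference between `0,/REGEXP/' and `1,/REGEXP/', together
with the other special forms and `!', can be sketched as follows. The
input is made up so that the regexp matches the very first line; all
of the two-address forms shown require GNU `sed'.

```shell
# Sketch only: the input is chosen so that `x' matches on line 1.
input=$(printf 'x\ny\nx\nz\n')

# 1,/x/: line 1 opens the range, so the end is sought from line 2 on
# and the range reaches the second `x'.
from_one=$(printf '%s\n' "$input" | sed -n '1,/x/p')
# 0,/x/: the regexp is tried on line 1 too, which ends the range there.
from_zero=$(printf '%s\n' "$input" | sed -n '0,/x/p')
# ADDR1,+N: line 2 and the one line after it.
plus_n=$(printf '%s\n' "$input" | sed -n '2,+1p')
# ADDR1,~N: from line 1 up to the next line number that is a multiple of 2.
tilde_n=$(printf '%s\n' "$input" | sed -n '1,~2p')
# !: every line that does NOT match /x/.
negated=$(printf '%s\n' "$input" | sed -n '/x/!p')

printf '%s\n' "$from_one" "$from_zero" "$plus_n" "$tilde_n" "$negated"
```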


File: sed.info, Node: Regular Expressions, Next: Common Commands, Prev: Addresses, Up: sed Programs

Overview of Regular Expression Syntax
=====================================

   To know how to use `sed', people should understand regular
expressions ("regexp" for short). A regular expression is a pattern
that is matched against a subject string from left to right. Most
characters are "ordinary": they stand for themselves in a pattern, and
match the corresponding characters in the subject. As a trivial
example, the pattern

     The quick brown fox

matches a portion of a subject string that is identical to itself. The
power of regular expressions comes from the ability to include
alternatives and repetitions in the pattern. These are encoded in the
pattern by the use of "special characters", which do not stand for
themselves but instead are interpreted in some special way. Here is a
brief description of regular expression syntax as used in `sed'.

`CHAR'
     A single ordinary character matches itself.

`*'
     Matches a sequence of zero or more instances of matches for the
     preceding regular expression, which must be an ordinary character,
     a special character preceded by `\', a `.', a grouped regexp (see
     below), or a bracket expression. As a GNU extension, a postfixed
     regular expression can also be followed by `*'; for example, `a**'
     is equivalent to `a*'. POSIX 1003.1-2001 says that `*' stands for
     itself when it appears at the start of a regular expression or
     subexpression, but many non-GNU implementations do not support this
     and portable scripts should instead use `\*' in these contexts.

`\+'
     As `*', but matches one or more. It is a GNU extension.

`\?'
     As `*', but only matches zero or one. It is a GNU extension.

`\{I\}'
     As `*', but matches exactly I sequences (I is a decimal integer;
     for portability, keep it between 0 and 255 inclusive).

`\{I,J\}'
     Matches between I and J, inclusive, sequences.

`\{I,\}'
     Matches more than or equal to I sequences.

`\(REGEXP\)'
     Groups the inner REGEXP as a whole; this is used to:

        * Apply postfix operators, like `\(abcd\)*': this will search
          for zero or more whole sequences of `abcd', while `abcd*'
          would search for `abc' followed by zero or more occurrences
          of `d'. Note that support for `\(abcd\)*' is required by
          POSIX 1003.1-2001, but many non-GNU implementations do not
          support it and hence it is not universally portable.

        * Use back references (see below).

`.'
     Matches any character, including newline.

`^'
     Matches the null string at beginning of line, i.e. what appears
     after the circumflex must appear at the beginning of line.
     `^#include' will match only lines where `#include' is the first
     thing on line--if there are spaces before, for example, the match
     fails. `^' acts as a special character only at the beginning of
     the regular expression or subexpression (that is, after `\(' or
     `\|'). Portable scripts should avoid `^' at the beginning of a
     subexpression, though, as POSIX allows implementations that treat
     `^' as an ordinary character in that context.

`$'
     It is the same as `^', but refers to end of line. `$' also acts
     as a special character only at the end of the regular expression
     or subexpression (that is, before `\)' or `\|'), and its use at
     the end of a subexpression is not portable.

`[LIST]'
`[^LIST]'
     Matches any single character in LIST: for example, `[aeiou]'
     matches all vowels. A list may include sequences like
     `CHAR1-CHAR2', which matches any character between (inclusive)
     CHAR1 and CHAR2.

     A leading `^' reverses the meaning of LIST, so that it matches any
     single character _not_ in LIST. To include `]' in the list, make
     it the first character (after the `^' if needed); to include `-'
     in the list, make it the first or last; to include `^' put it
     after the first character.

     The characters `$', `*', `.', `[', and `\' are normally not
     special within LIST. For example, `[\*]' matches either `\' or
     `*', because the `\' is not special here. However, strings like
     `[.ch.]', `[=a=]', and `[:space:]' are special within LIST and
     represent collating symbols, equivalence classes, and character
     classes, respectively, and `[' is therefore special within LIST
     when it is followed by `.', `=', or `:'. Also, when not in
     `POSIXLY_CORRECT' mode, special escapes like `\n' and `\t' are
     recognized within LIST. *Note Escapes::.

`REGEXP1\|REGEXP2'
     Matches either REGEXP1 or REGEXP2. Use parentheses to use complex
     alternative regular expressions. The matching process tries each
     alternative in turn, from left to right, and the first one that
     succeeds is used. It is a GNU extension.

`REGEXP1REGEXP2'
     Matches the concatenation of REGEXP1 and REGEXP2. Concatenation
     binds more tightly than `\|', `^', and `$', but less tightly than
     the other regular expression operators.

`\DIGIT'
     Matches the DIGIT-th `\(...\)' parenthesized subexpression in the
     regular expression. This is called a "back reference".
     Subexpressions are implicitly numbered by counting occurrences of
     `\(' left-to-right.

`\n'
     Matches the newline character.

`\CHAR'
     Matches CHAR, where CHAR is one of `$', `*', `.', `[', `\', or `^'.
     Note that the only C-like backslash sequences that you can
     portably assume to be interpreted are `\n' and `\\'; in particular
     `\t' is not portable, and matches a `t' under most implementations
     of `sed', rather than a tab character.


   Note that the regular expression matcher is greedy, i.e., matches
are attempted from left to right and, if two or more matches are
possible starting at the same character, it selects the longest.
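
   The effect of greedy matching, and of a few of the operators listed
above, can be seen by substituting the matched text with a marker.
This is a sketch on made-up input; the `s' command is used only to
make the extent of each match visible.

```shell
# Sketch only: `X' marks the text that each regexp matched.
# Greedy: `a.*c' takes the longest match, up to the last `c'.
greedy=$(printf 'abcabc\n' | sed 's/a.*c/X/')
# A bracket expression that excludes `c' stops at the first `c' instead.
limited=$(printf 'abcabc\n' | sed 's/a[^c]*c/X/')
# \{I\}: exactly three `a's are consumed, leaving the fourth.
counted=$(printf 'aaaa\n' | sed 's/a\{3\}/X/')
# Back reference: \(.\)\1 matches a doubled character, here `ll',
# and the replacement keeps a single copy of it.
doubled=$(printf 'hello\n' | sed 's/\(.\)\1/\1/')

printf '%s\n' "$greedy" "$limited" "$counted" "$doubled"
```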

   Examples:
`abcdef'
     Matches `abcdef'.

`a*b'
     Matches zero or more `a's followed by a single `b'. For example,
     `b' or `aaaaab'.

`a\?b'
     Matches `b' or `ab'.

`a\+b\+'
     Matches one or more `a's followed by one or more `b's: `ab' is the
     shortest possible match, but other examples are `aaaab' or
     `abbbbb' or `aaaaaabbbbbbb'.

`.*'
`.\+'
     These two both match all the characters in a string; however, the
     first matches every string (including the empty string), while the
     second matches only strings containing at least one character.

`^main.*(.*)'
     This matches a string starting with `main', followed by an opening
     and closing parenthesis. The `n', `(' and `)' need not be
     adjacent.

`^#'
     This matches a string beginning with `#'.

`\\$'
     This matches a string ending with a single backslash. The regexp
     contains two backslashes for escaping.

`\$'
     Instead, this matches a string consisting of a single dollar sign,
     because it is escaped.

`[a-zA-Z0-9]'
     In the C locale, this matches any ASCII letters or digits.

`[^ tab]\+'
     (Here `tab' stands for a single tab character.) This matches a
     string of one or more characters, none of which is a space or a
     tab. Usually this means a word.

`^\(.*\)\n\1$'
     This matches a string consisting of two equal substrings separated
     by a newline.

`.\{9\}A$'
     This matches nine characters followed by an `A' at the end of the
     line.

`^.\{15\}A'
     This matches the start of a string that contains 16 characters,
     the last of which is an `A'.
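
   Several of the example regexps above can be exercised by using them
as addresses with `sed -n' and the `p' command, so that only matching
lines are printed. The inputs below are made up for this sketch; `\+'
needs GNU `sed', and the `N' command (used here only to get a newline
into the pattern space) is assumed to behave as in GNU `sed'.

```shell
# Sketch only: each input is made up to contain matching and
# non-matching lines.
# a\+b\+ (GNU extension): at least one `a' followed by at least one `b'.
a_then_b=$(printf 'ab\naab\nb\n' | sed -n '/a\+b\+/p')
# ^#: lines beginning with `#'.
starts_hash=$(printf '#x\ny #\n' | sed -n '/^#/p')
# \\$: lines ending with a backslash.
ends_backslash=$(printf 'one\\\ntwo\n' | sed -n '/\\$/p')
# .\{9\}A$: nine characters followed by `A' at the end of the line.
nine_then_a=$(printf '123456789A\n12A\n' | sed -n '/.\{9\}A$/p')
# ^\(.*\)\n\1$: N joins two input lines with a newline, and the regexp
# then checks that both halves are equal.
twice=$(printf 'dup\ndup\n' | sed -n 'N;/^\(.*\)\n\1$/p')

printf '%s\n' "$a_then_b" "$starts_hash" "$ends_backslash" \
  "$nine_then_a" "$twice"
```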
609 |
|
---|
610 |
|
---|
611 |
|
---|
612 | File: sed.info, Node: Common Commands, Next: The "s" Command, Prev: Regular Expressions, Up: sed Programs
|
---|
613 |
|
---|
614 | Often-Used Commands
|
---|
615 | ===================
|
---|
616 |
|
---|
617 | If you use `sed' at all, you will quite likely want to know these
|
---|
618 | commands.
|
---|
619 |
|
---|
620 | `#'
|
---|
621 | [No addresses allowed.]
|
---|
622 |
|
---|
623 | The `#' character begins a comment; the comment continues until
|
---|
624 | the next newline.
|
---|
625 |
|
---|
626 | If you are concerned about portability, be aware that some
|
---|
627 | implementations of `sed' (which are not POSIX conformant) may only
|
---|
628 | support a single one-line comment, and then only when the very
|
---|
629 | first character of the script is a `#'.
|
---|
630 |
|
---|
631 | Warning: if the first two characters of the `sed' script are `#n',
|
---|
632 | then the `-n' (no-autoprint) option is forced. If you want to put
|
---|
633 | a comment in the first line of your script and that comment begins
|
---|
634 | with the letter `n' and you do not want this behavior, then be
|
---|
635 | sure to either use a capital `N', or place at least one space
|
---|
636 | before the `n'.
|
---|
637 |
|
---|
638 | `q [EXIT-CODE]'
|
---|
639 | This command only accepts a single address.
|
---|
640 |
|
---|
641 | Exit `sed' without processing any more commands or input. Note
|
---|
642 | that the current pattern space is printed if auto-print is not
|
---|
643 | disabled with the `-n' options. The ability to return an exit
|
---|
644 | code from the `sed' script is a GNU `sed' extension.
|
---|
645 |
|
---|
646 | `d'
|
---|
647 | Delete the pattern space; immediately start next cycle.
|
---|
648 |
|
---|
649 | `p'
|
---|
650 | Print out the pattern space (to the standard output). This
|
---|
651 | command is usually only used in conjunction with the `-n'
|
---|
652 | command-line option.
|
---|
653 |
|
---|
654 | `n'
|
---|
655 | If auto-print is not disabled, print the pattern space, then,
|
---|
656 | regardless, replace the pattern space with the next line of input.
|
---|
657 | If there is no more input then `sed' exits without processing any
|
---|
658 | more commands.
|
---|
659 |
|
---|
660 | `{ COMMANDS }'
|
---|
661 | A group of commands may be enclosed between `{' and `}' characters.
|
---|
662 | This is particularly useful when you want a group of commands to
|
---|
663 | be triggered by a single address (or address-range) match.
|
---|
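As a minimal sketch of the commands in this section, the following shell snippet runs each of them on a small made-up input. The `sample' helper and the variable names are hypothetical and exist only for illustration.

```shell
# Hypothetical four-line input, used only for illustration.
sample() { printf 'one\ntwo\nthree\nfour\n'; }

first_two=$(sample | sed '2q')      # q: print through line 2, then quit
without_two=$(sample | sed '2d')    # d: drop line 2, start the next cycle
only_t=$(sample | sed -n '/t/p')    # -n with p: print only matching lines
odd_lines=$(sample | sed 'n;d')     # n: autoprint, load next line; d deletes it

# { ... }: one address range triggers two commands.
grouped=$(sample | sed -n '2,3{
s/t/T/
p
}')

printf '%s\n' "$first_two" "$without_two" "$only_t" "$odd_lines" "$grouped"
```

Here `grouped' holds the lines `Two' and `Three', because both commands inside the braces run only for lines 2 through 3.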
664 |
|
---|
665 |
|
---|
666 |
|
---|
667 | File: sed.info, Node: The "s" Command, Next: Other Commands, Prev: Common Commands, Up: sed Programs
|
---|
668 |
|
---|
669 | The `s' Command
|
---|
670 | ===============
|
---|
671 |
|
---|
672 | The syntax of the `s' (as in substitute) command is
|
---|
673 | `s/REGEXP/REPLACEMENT/FLAGS'. The `/' characters may be uniformly
|
---|
674 | replaced by any other single character within any given `s' command.
|
---|
675 | The `/' character (or whatever other character is used in its stead)
|
---|
676 | can appear in the REGEXP or REPLACEMENT only if it is preceded by a `\'
|
---|
677 | character.
|
---|
678 |
|
---|
679 | The `s' command is probably the most important in `sed' and has a
|
---|
680 | lot of different options. Its basic concept is simple: the `s' command
|
---|
681 | attempts to match the pattern space against the supplied REGEXP; if the
|
---|
682 | match is successful, then that portion of the pattern space which was
|
---|
683 | matched is replaced with REPLACEMENT.
|
---|
684 |
|
---|
685 | The REPLACEMENT can contain `\N' (N being a number from 1 to 9,
|
---|
686 | inclusive) references, which refer to the portion of the match which is
|
---|
687 | contained between the Nth `\(' and its matching `\)'. Also, the
|
---|
688 | REPLACEMENT can contain unescaped `&' characters which reference the
|
---|
689 | whole matched portion of the pattern space. Finally, as a GNU `sed'
|
---|
690 | extension, you can include a special sequence made of a backslash and
|
---|
691 | one of the letters `L', `l', `U', `u', or `E'. The meaning is as
|
---|
692 | follows:
|
---|
693 |
|
---|
694 | `\L'
|
---|
695 | Turn the replacement to lowercase until a `\U' or `\E' is found,
|
---|
696 |
|
---|
697 | `\l'
|
---|
698 | Turn the next character to lowercase,
|
---|
699 |
|
---|
700 | `\U'
|
---|
701 | Turn the replacement to uppercase until a `\L' or `\E' is found,
|
---|
702 |
|
---|
703 | `\u'
|
---|
704 | Turn the next character to uppercase,
|
---|
705 |
|
---|
706 | `\E'
|
---|
707 | Stop case conversion started by `\L' or `\U'.
|
---|
708 |
|
---|
709 | To include a literal `\', `&', or newline in the final replacement,
|
---|
710 | be sure to precede the desired `\', `&', or newline in the REPLACEMENT
|
---|
711 | with a `\'.
|
---|
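The replacement syntax described above can be sketched as follows. The input strings and variable names are made up for illustration, and the last command relies on the GNU `sed' case-conversion extension.

```shell
# \1 and \2 refer to the first and second \( ... \) groups.
swapped=$(echo 'hello world' | sed 's/\(hello\) \(world\)/\2 \1/')

# An unescaped & stands for the whole matched text.
wrapped=$(echo 'hello world' | sed 's/world/[&]/')

# An escaped \& is a literal ampersand.
literal=$(echo 'hello world' | sed 's/world/\&/')

# GNU extension: \u upper-cases one character, \U ... \E a whole run.
cased=$(echo 'hello world' | sed 's/\(hello\) \(world\)/\u\1 \U\2\E!/')

printf '%s\n' "$swapped" "$wrapped" "$literal" "$cased"
```

The four results are `world hello', `hello [world]', `hello &' and `Hello WORLD!'.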
712 |
|
---|
713 | The `s' command can be followed by zero or more of the following
|
---|
714 | FLAGS:
|
---|
715 |
|
---|
716 | `g'
|
---|
717 | Apply the replacement to _all_ matches to the REGEXP, not just the
|
---|
718 | first.
|
---|
719 |
|
---|
720 | `NUMBER'
|
---|
721 | Only replace the NUMBERth match of the REGEXP.
|
---|
722 |
|
---|
723 | Note: the POSIX standard does not specify what should happen when
|
---|
724 | you mix the `g' and NUMBER modifiers, and currently there is no
|
---|
725 | widely agreed upon meaning across `sed' implementations. For GNU
|
---|
726 | `sed', the interaction is defined to be: ignore matches before the
|
---|
727 | NUMBERth, and then match and replace all matches from the NUMBERth
|
---|
728 | on.
|
---|
729 |
|
---|
730 | `p'
|
---|
731 | If the substitution was made, then print the new pattern space.
|
---|
732 |
|
---|
733 | Note: when both the `p' and `e' options are specified, the
|
---|
734 | relative ordering of the two produces very different results. In
|
---|
735 | general, `ep' (evaluate then print) is what you want, but
|
---|
736 | operating the other way round can be useful for debugging. For
|
---|
737 | this reason, the current version of GNU `sed' interprets specially
|
---|
738 | the presence of `p' options both before and after `e', printing
|
---|
739 | the pattern space before and after evaluation, while in general
|
---|
740 | flags for the `s' command show their effect just once. This
|
---|
741 | behavior, although documented, might change in future versions.
|
---|
742 |
|
---|
743 | `w FILE-NAME'
|
---|
744 | If the substitution was made, then write out the result to the
|
---|
745 | named file. As a GNU `sed' extension, two special values of
|
---|
746 | FILE-NAME are supported: `/dev/stderr', which writes the result to
|
---|
747 | the standard error, and `/dev/stdout', which writes to the standard
|
---|
748 | output.(1)
|
---|
749 |
|
---|
750 | `e'
|
---|
751 | This command allows one to pipe input from a shell command into
|
---|
752 | pattern space. If a substitution was made, the command that is
|
---|
753 | found in pattern space is executed and pattern space is replaced
|
---|
754 | with its output. A trailing newline is suppressed; results are
|
---|
755 | undefined if the command to be executed contains a NUL character.
|
---|
756 | This is a GNU `sed' extension.
|
---|
757 |
|
---|
758 | `I'
|
---|
759 | `i'
|
---|
760 | The `I' modifier to regular-expression matching is a GNU extension
|
---|
761 | which makes `sed' match REGEXP in a case-insensitive manner.
|
---|
762 |
|
---|
763 | `M'
|
---|
764 | `m'
|
---|
765 | The `M' modifier to regular-expression matching is a GNU `sed'
|
---|
766 | extension which causes `^' and `$' to match respectively (in
|
---|
767 | addition to the normal behavior) the empty string after a newline,
|
---|
768 | and the empty string before a newline. There are special character
|
---|
769 | sequences (`\`' and `\'') which always match the beginning or the
|
---|
770 | end of the buffer. `M' stands for `multi-line'.
|
---|
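A short sketch of how the flags above change the result of `s'. The inputs are invented for illustration; the `3g', `I' and `M' lines assume GNU `sed', since those behaviors are described above as GNU-specific.

```shell
all=$(echo 'aaaa' | sed 's/a/b/g')            # every match:        bbbb
third=$(echo 'aaaa' | sed 's/a/b/3')          # third match only:   aaba
from_third=$(echo 'aaaa' | sed 's/a/b/3g')    # third match onward: aabb (GNU)

# p prints the pattern space only when a substitution was made.
printed=$(printf 'x\ny\n' | sed -n 's/y/Y/p')

# I makes the match case-insensitive (GNU).
nocase=$(echo 'Hello' | sed 's/hello/bye/I')

# M lets ^ match after an embedded newline (GNU); N joins two lines first.
multi=$(printf 'a\nb\n' | sed 'N;s/^b/B/M')

printf '%s\n' "$all" "$third" "$from_third" "$printed" "$nocase" "$multi"
```

Without the `M' flag the last substitution would not match, because `b' is not at the start of the pattern space.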
771 |
|
---|
772 |
|
---|
773 | ---------- Footnotes ----------
|
---|
774 |
|
---|
775 | (1) This is equivalent to `p' unless the `-i' option is being used.
|
---|
776 |
|
---|
777 |
|
---|
778 | File: sed.info, Node: Other Commands, Next: Programming Commands, Prev: The "s" Command, Up: sed Programs
|
---|
779 |
|
---|
780 | Less Frequently-Used Commands
|
---|
781 | =============================
|
---|
782 |
|
---|
783 | Though perhaps less frequently used than those in the previous
|
---|
784 | section, some very small yet useful `sed' scripts can be built with
|
---|
785 | these commands.
|
---|
786 |
|
---|
787 | `y/SOURCE-CHARS/DEST-CHARS/'
|
---|
788 | (The `/' characters may be uniformly replaced by any other single
|
---|
789 | character within any given `y' command.)
|
---|
790 |
|
---|
791 | Transliterate any characters in the pattern space which match any
|
---|
792 | of the SOURCE-CHARS with the corresponding character in DEST-CHARS.
|
---|
793 |
|
---|
794 | Instances of the `/' (or whatever other character is used in its
|
---|
795 | stead), `\', or newlines can appear in the SOURCE-CHARS or
|
---|
796 | DEST-CHARS lists, provided that each instance is escaped by a `\'.
|
---|
797 | The SOURCE-CHARS and DEST-CHARS lists _must_ contain the same
|
---|
798 | number of characters (after de-escaping).
|
---|
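A minimal sketch of `y', using made-up input strings. It shows a plain transliteration, an escaped delimiter, and an alternative delimiter.

```shell
# Map each lower-case letter to the upper-case letter in the same position.
upper=$(echo 'hello' | sed 'y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/')

# The delimiter itself may appear in either list if escaped with a backslash.
escaped=$(echo 'a/b/c' | sed 'y/\//_/')

# Choosing another delimiter avoids the escaping altogether.
comma=$(echo 'a/b/c' | sed 'y,/,_,')

printf '%s\n' "$upper" "$escaped" "$comma"
```

Both lists in each command have the same length after de-escaping, as the command requires.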
799 |
|
---|
800 | `a\'
|
---|
801 | `TEXT'
|
---|
802 | As a GNU extension, this command accepts two addresses.
|
---|
803 |
|
---|
804 | Queue the lines of text which follow this command (each but the
|
---|
805 | last ending with a `\', which are removed from the output) to be
|
---|
806 | output at the end of the current cycle, or when the next input
|
---|
807 | line is read.
|
---|
808 |
|
---|
809 | Escape sequences in TEXT are processed, so you should use `\\' in
|
---|
810 | TEXT to print a single backslash.
|
---|
811 |
|
---|
812 | As a GNU extension, if between the `a' and the newline there is
|
---|
813 | other than a whitespace-`\' sequence, then the text of this line,
|
---|
814 | starting at the first non-whitespace character after the `a', is
|
---|
815 | taken as the first line of the TEXT block. (This enables a
|
---|
816 | simplification in scripting a one-line add.) This extension also
|
---|
817 | works with the `i' and `c' commands.
|
---|
818 |
|
---|
819 | `i\'
|
---|
820 | `TEXT'
|
---|
821 | As a GNU extension, this command accepts two addresses.
|
---|
822 |
|
---|
823 | Immediately output the lines of text which follow this command
|
---|
824 | (each but the last ending with a `\', which are removed from the
|
---|
825 | output).
|
---|
826 |
|
---|
827 | `c\'
|
---|
828 | `TEXT'
|
---|
829 | Delete the lines matching the address or address-range, and output
|
---|
830 | the lines of text which follow this command (each but the last
|
---|
831 | ending with a `\', which are removed from the output) in place of
|
---|
832 | the last line (or in place of each line, if no addresses were
|
---|
833 | specified). A new cycle is started after this command is done,
|
---|
834 | since the pattern space will have been deleted.
|
---|
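The three text commands above can be sketched as follows, on a made-up two-line input. The first three calls use the traditional form with `\' followed by a newline; the last one assumes the GNU one-line form described under `a'.

```shell
# a: queue text to be output after line 1.
appended=$(printf 'one\ntwo\n' | sed '1a\
after one')

# i: output text immediately, before line 2 is printed.
inserted=$(printf 'one\ntwo\n' | sed '2i\
before two')

# c: replace line 2 with the given text.
changed=$(printf 'one\ntwo\n' | sed '2c\
replacement')

# GNU one-line form: the text starts right after the command letter.
one_liner=$(printf 'one\ntwo\n' | sed '1a after one')

printf '%s\n' "$appended" "$inserted" "$changed" "$one_liner"
```

In this sketch `appended' and `one_liner' hold the same three lines: `one', `after one', `two'.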
835 |
|
---|
836 | `='
|
---|
837 | As a GNU extension, this command accepts two addresses.
|
---|
838 |
|
---|
839 | Print out the current input line number (with a trailing newline).
|
---|
840 |
|
---|
841 | `l N'
|
---|
842 | Print the pattern space in an unambiguous form: non-printable
|
---|
843 | characters (and the `\' character) are printed in C-style escaped
|
---|
844 | form; long lines are split, with a trailing `\' character to
|
---|
845 | indicate the split; the end of each line is marked with a `$'.
|
---|
846 |
|
---|
847 | N specifies the desired line-wrap length; a length of 0 (zero)
|
---|
848 | means to never wrap long lines. If omitted, the default as
|
---|
849 | specified on the command line is used. The N parameter is a GNU
|
---|
850 | `sed' extension.
|
---|
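As a small illustration of `=' and `l' (the inputs and variable names are made up):

```shell
# = on the last line only: a simple line counter.
count=$(printf 'a\nb\nc\n' | sed -n '$=')

# = on every line: each line number is printed before its line.
numbered=$(printf 'a\nb\n' | sed '=')

# l shows a tab as \t and marks the end of the line with $.
visible=$(printf 'a\tb\n' | sed -n 'l')

printf '%s\n' "$count" "$numbered" "$visible"
```

The `visible' result is the six characters `a\tb$', which makes the otherwise invisible tab easy to spot.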
851 |
|
---|
852 | `r FILENAME'
|
---|
853 | As a GNU extension, this command accepts two addresses.
|
---|
854 |
|
---|
855 | Queue the contents of FILENAME to be read and inserted into the
|
---|
856 | output stream at the end of the current cycle, or when the next
|
---|
857 | input line is read. Note that if FILENAME cannot be read, it is
|
---|
858 | treated as if it were an empty file, without any error indication.
|
---|
859 |
|
---|
860 | As a GNU `sed' extension, the special value `/dev/stdin' is
|
---|
861 | supported for the file name, which reads the contents of the
|
---|
862 | standard input.
|
---|
863 |
|
---|
864 | `w FILENAME'
|
---|
865 | Write the pattern space to FILENAME. As a GNU `sed' extension,
|
---|
866 | two special values of FILENAME are supported: `/dev/stderr',
|
---|
867 | which writes the result to the standard error, and `/dev/stdout',
|
---|
868 | which writes to the standard output.(1)
|
---|
869 |
|
---|
870 | The file will be created (or truncated) before the first input
|
---|
871 | line is read; all `w' commands (including instances of `w' flag on
|
---|
872 | successful `s' commands) which refer to the same FILENAME are
|
---|
873 | output without closing and reopening the file.
|
---|
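A minimal sketch of `r' and `w', assuming a scratch directory created with `mktemp -d' (widely available, though not required by POSIX). The file names are hypothetical.

```shell
tmp=$(mktemp -d)
printf 'inserted\n' > "$tmp/extra.txt"

# r: queue the contents of extra.txt for output after line 1.
with_file=$(printf 'one\ntwo\n' | sed "1r $tmp/extra.txt")

# w: write every line matching /two/ to matches.txt.
printf 'one\ntwo\n' | sed -n "/two/w $tmp/matches.txt"
saved=$(cat "$tmp/matches.txt")

rm -rf "$tmp"
printf '%s\n' "$with_file" "$saved"
```

The file name extends to the end of the command line, so each `r' or `w' here is the last command in its script.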
874 |
|
---|
875 | `D'
|
---|
876 | Delete text in the pattern space up to the first newline. If any
|
---|
877 | text is left, restart cycle with the resultant pattern space
|
---|
878 | (without reading a new line of input), otherwise start a normal
|
---|
879 | new cycle.
|
---|
880 |
|
---|
881 | `N'
|
---|
882 | Add a newline to the pattern space, then append the next line of
|
---|
883 | input to the pattern space. If there is no more input then `sed'
|
---|
884 | exits without processing any more commands.
|
---|
885 |
|
---|
886 | `P'
|
---|
887 | Print out the portion of the pattern space up to the first newline.
|
---|
888 |
|
---|
889 | `h'
|
---|
890 | Replace the contents of the hold space with the contents of the
|
---|
891 | pattern space.
|
---|
892 |
|
---|
893 | `H'
|
---|
894 | Append a newline to the contents of the hold space, and then
|
---|
895 | append the contents of the pattern space to that of the hold space.
|
---|
896 |
|
---|
897 | `g'
|
---|
898 | Replace the contents of the pattern space with the contents of the
|
---|
899 | hold space.
|
---|
900 |
|
---|
901 | `G'
|
---|
902 | Append a newline to the contents of the pattern space, and then
|
---|
903 | append the contents of the hold space to that of the pattern space.
|
---|
904 |
|
---|
905 | `x'
|
---|
906 | Exchange the contents of the hold and pattern spaces.
|
---|
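The hold-space and multi-line commands above are easiest to follow in small idioms. The following sketch uses made-up input; the variable names are for illustration only.

```shell
# Reverse the lines: on every line but the first, append the hold space (G);
# save the result (h); print the accumulated text on the last line.
reversed=$(printf 'a\nb\nc\n' | sed -n '1!G;h;$p')

# Double-space: the hold space is empty, so G appends just a newline.
spaced=$(printf 'a\nb\n' | sed 'G')

# Join pairs of lines: N appends the next line, then the newline is replaced.
joined=$(printf 'a\nb\nc\n' | sed '$!N;s/\n/ /')

printf '%s\n' "$reversed" "$spaced" "$joined"
```

In the last command, `$!N' skips `N' on the final line, so an unpaired last line is printed as-is rather than triggering the no-more-input exit.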
907 |
|
---|
908 |
|
---|
909 | ---------- Footnotes ----------
|
---|
910 |
|
---|
911 | (1) This is equivalent to `p' unless the `-i' option is being used.
|
---|
912 |
|
---|
913 |
|
---|
914 | File: sed.info, Node: Programming Commands, Next: Extended Commands, Prev: Other Commands, Up: sed Programs
|
---|
915 |
|
---|
916 | Commands for `sed' gurus
|
---|
917 | ========================
|
---|
918 |
|
---|
919 | In most cases, use of these commands indicates that you are probably
|
---|
920 | better off programming in something like `awk' or Perl. But
|
---|
921 | occasionally one is committed to sticking with `sed', and these
|
---|
922 | commands can enable one to write quite convoluted scripts.
|
---|
923 |
|
---|
924 | `: LABEL'
|
---|
925 | [No addresses allowed.]
|
---|
926 |
|
---|
927 | Specify the location of LABEL for branch commands. In all other
|
---|
928 | respects, a no-op.
|
---|
929 |
|
---|
930 | `b LABEL'
|
---|
931 | Unconditionally branch to LABEL. The LABEL may be omitted, in
|
---|
932 | which case the next cycle is started.
|
---|
933 |
|
---|
934 | `t LABEL'
|
---|
935 | Branch to LABEL only if there has been a successful `s'ubstitution
|
---|
936 | since the last input line was read or conditional branch was taken.
|
---|
937 | The LABEL may be omitted, in which case the next cycle is started.
|
---|
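Two small loops illustrate the branch commands above. This is only a sketch on made-up input; each label is given in its own `-e' fragment so that it ends at the end of that fragment.

```shell
# b: keep appending lines until the last one, then replace all newlines.
csv=$(printf 'a\nb\nc\n' | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/,/g')

# t: repeat the substitution for as long as it keeps succeeding.
squeezed=$(echo 'a    b' | sed -e ':again' -e 's/  / /' -e 't again')

printf '%s\n' "$csv" "$squeezed"
```

The first command yields `a,b,c'; the second collapses the run of spaces to a single space.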
938 |
|
---|
939 |
|
---|
940 |
|
---|
941 | File: sed.info, Node: Extended Commands, Next: Escapes, Prev: Programming Commands, Up: sed Programs
|
---|
942 |
|
---|
943 | Commands Specific to GNU `sed'
|
---|
944 | ==============================
|
---|
945 |
|
---|
946 | These commands are specific to GNU `sed', so you must use them with
|
---|
947 | care and only when you are sure that hindering portability is not evil.
|
---|
948 | They allow you to check for GNU `sed' extensions or to do tasks that
|
---|
949 | are required quite often, yet are unsupported by standard `sed's.
|
---|
950 |
|
---|
951 | `e [COMMAND]'
|
---|
952 | This command allows one to pipe input from a shell command into
|
---|
953 | pattern space. Without parameters, the `e' command executes the
|
---|
954 | command that is found in pattern space and replaces the pattern
|
---|
955 | space with the output; a trailing newline is suppressed.
|
---|
956 |
|
---|
957 | If a parameter is specified, instead, the `e' command interprets
|
---|
958 | it as a command and sends its output to the output stream (like
|
---|
959 | `r' does). The command can run across multiple lines, all but the
|
---|
960 | last ending with a back-slash.
|
---|
961 |
|
---|
962 | In both cases, the results are undefined if the command to be
|
---|
963 | executed contains a NUL character.
|
---|
964 |
|
---|
965 | `L N'
|
---|
966 | This GNU `sed' extension fills and joins lines in pattern space to
|
---|
967 | produce output lines of (at most) N characters, like `fmt' does;
|
---|
968 | if N is omitted, the default as specified on the command line is
|
---|
969 | used. This command is considered a failed experiment and unless
|
---|
970 | there is enough demand (which seems unlikely) will be removed in
|
---|
971 | future versions.
|
---|
972 |
|
---|
973 | `Q [EXIT-CODE]'
|
---|
974 | This command only accepts a single address.
|
---|
975 |
|
---|
976 | This command is the same as `q', but will not print the contents
|
---|
977 | of pattern space. Like `q', it provides the ability to return an
|
---|
978 | exit code to the caller.
|
---|
979 |
|
---|
980 | This command can be useful because the only alternative ways to
|
---|
981 | accomplish this apparently trivial function are to use the `-n'
|
---|
982 | option (which can unnecessarily complicate your script) or
|
---|
983 | resorting to the following snippet, which wastes time by reading
|
---|
984 | the whole file without any visible effect:
|
---|
985 |
|
---|
986 | :eat
|
---|
987 | $d Quit silently on the last line
|
---|
988 | N Read another line, silently
|
---|
989 | g Overwrite pattern space each time to save memory
|
---|
990 | b eat
|
---|
991 |
|
---|
992 | `R FILENAME'
|
---|
993 | Queue a line of FILENAME to be read and inserted into the output
|
---|
994 | stream at the end of the current cycle, or when the next input
|
---|
995 | line is read. Note that if FILENAME cannot be read, or if its end
|
---|
996 | is reached, no line is appended, without any error indication.
|
---|
997 |
|
---|
998 | As with the `r' command, the special value `/dev/stdin' is
|
---|
999 | supported for the file name, which reads a line from the standard
|
---|
1000 | input.
|
---|
1001 |
|
---|
1002 | `T LABEL'
|
---|
1003 | Branch to LABEL only if there have been no successful
|
---|
1004 | `s'ubstitutions since the last input line was read or conditional
|
---|
1005 | branch was taken. The LABEL may be omitted, in which case the next
|
---|
1006 | cycle is started.
|
---|
1007 |
|
---|
1008 | `v VERSION'
|
---|
1009 | This command does nothing, but makes `sed' fail if GNU `sed'
|
---|
1010 | extensions are not supported, simply because other versions of
|
---|
1011 | `sed' do not implement it. In addition, you can specify the
|
---|
1012 | version of `sed' that your script requires, such as `4.0.5'. The
|
---|
1013 | default is `4.0' because that is the first version that
|
---|
1014 | implemented this command.
|
---|
1015 |
|
---|
1016 | This command enables all GNU extensions even if `POSIXLY_CORRECT'
|
---|
1017 | is set in the environment.
|
---|
1018 |
|
---|
1019 | `W FILENAME'
|
---|
1020 | Write to the given filename the portion of the pattern space up to
|
---|
1021 | the first newline. Everything said under the `w' command about
|
---|
1022 | file handling holds here too.
|
---|
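A brief sketch of two of the commands above, `Q' and `T'. It assumes GNU `sed', since both are GNU-specific; the inputs are made up.

```shell
# Q: quit at line 2 without printing it, so only line 1 appears.
head_one=$(printf 'one\ntwo\nthree\n' | sed '2Q')

# Q with an exit code: stop at the first match and report status 3.
status=0
printf 'one\ntwo\n' | sed -n '/two/Q3' || status=$?

# T: when the substitution fails, branch to the end of the script,
# skipping the command that appends "!".
marked=$(printf 'foo\nbar\n' | sed -e 's/foo/FOO/' -e 'T' -e 's/$/!/')

printf '%s\n' "$head_one" "$status" "$marked"
```

Here `marked' holds `FOO!' and `bar': only the line on which the first `s' succeeded reaches the second one.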
1023 |
|
---|
1024 |
|
---|
1025 | File: sed.info, Node: Escapes, Prev: Extended Commands, Up: sed Programs
|
---|
1026 |
|
---|
1027 | GNU Extensions for Escapes in Regular Expressions
|
---|
1028 | =================================================
|
---|
1029 |
|
---|
1030 | Until this chapter, we have only encountered escapes of the form
|
---|
1031 | `\^', which tell `sed' not to interpret the circumflex as a special
|
---|
1032 | character, but rather to take it literally. For example, `\*' matches
|
---|
1033 | a single asterisk rather than zero or more backslashes.
|
---|
1034 |
|
---|
1035 | This chapter introduces another kind of escape(1)--that is, escapes
|
---|
1036 | that are applied to a character or sequence of characters that
|
---|
1037 | ordinarily are taken literally, and that `sed' replaces with a special
|
---|
1038 | character. This provides a way of encoding non-printable characters in
|
---|
1039 | patterns in a visible manner. There is no restriction on the
|
---|
1040 | appearance of non-printing characters in a `sed' script but when a
|
---|
1041 | script is being prepared in the shell or by text editing, it is usually
|
---|
1042 | easier to use one of the following escape sequences than the binary
|
---|
1043 | character it represents.
|
---|
1044 |
|
---|
1045 | The list of these escapes is:
|
---|
1046 |
|
---|
1047 | `\a'
|
---|
1048 | Produces or matches a BEL character, that is an "alert" (ASCII 7).
|
---|
1049 |
|
---|
1050 | `\f'
|
---|
1051 | Produces or matches a form feed (ASCII 12).
|
---|
1052 |
|
---|
1053 | `\n'
|
---|
1054 | Produces or matches a newline (ASCII 10).
|
---|
1055 |
|
---|
1056 | `\r'
|
---|
1057 | Produces or matches a carriage return (ASCII 13).
|
---|
1058 |
|
---|
1059 | `\t'
|
---|
1060 | Produces or matches a horizontal tab (ASCII 9).
|
---|
1061 |
|
---|
1062 | `\v'
|
---|
1063 | Produces or matches a so-called "vertical tab" (ASCII 11).
|
---|
1064 |
|
---|
1065 | `\cX'
|
---|
1066 | Produces or matches `CONTROL-X', where X is any character. The
|
---|
1067 | precise effect of `\cX' is as follows: if X is a lower case
|
---|
1068 | letter, it is converted to upper case. Then bit 6 of the
|
---|
1069 | character (hex 40) is inverted. Thus `\cz' becomes hex 1A, but
|
---|
1070 | `\c{' becomes hex 3B, while `\c;' becomes hex 7B.
|
---|
1071 |
|
---|
1072 | `\dXXX'
|
---|
1073 | Produces or matches a character whose decimal ASCII value is XXX.
|
---|
1074 |
|
---|
1075 | `\oXXX'
|
---|
1076 | Produces or matches a character whose octal ASCII value is XXX.
|
---|
1077 |
|
---|
1078 | `\xXX'
|
---|
1079 | Produces or matches a character whose hexadecimal ASCII value is
|
---|
1080 | XX.
|
---|
1081 |
|
---|
1082 | `\b' (backspace) was omitted because of the conflict with the
|
---|
1083 | existing "word boundary" meaning.
|
---|
1084 |
|
---|
1085 | Other escapes match a particular character class and are valid only
|
---|
1086 | in regular expressions:
|
---|
1087 |
|
---|
1088 | `\w'
|
---|
1089 | Matches any "word" character. A "word" character is any letter or
|
---|
1090 | digit or the underscore character.
|
---|
1091 |
|
---|
1092 | `\W'
|
---|
1093 | Matches any "non-word" character.
|
---|
1094 |
|
---|
1095 | `\b'
|
---|
1096 | Matches a word boundary; that is, it matches if the character to
|
---|
1097 | the left is a "word" character and the character to the right is a
|
---|
1098 | "non-word" character, or vice-versa.
|
---|
1099 |
|
---|
1100 | `\B'
|
---|
1101 | Matches everywhere but on a word boundary; that is, it matches if
|
---|
1102 | the character to the left and the character to the right are
|
---|
1103 | either both "word" characters or both "non-word" characters.
|
---|
1104 |
|
---|
1105 | `\`'
|
---|
1106 | Matches only at the start of pattern space. This is different
|
---|
1107 | from `^' in multi-line mode.
|
---|
1108 |
|
---|
1109 | `\''
|
---|
1110 | Matches only at the end of pattern space. This is different from
|
---|
1111 | `$' in multi-line mode.
|
---|
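A few of the escapes above in use, as a sketch that assumes GNU `sed' (see the footnote below about which escapes are extensions). The inputs are invented for illustration.

```shell
# \t in the replacement produces a horizontal tab.
tabbed=$(echo 'a b' | sed 's/ /\t/')

# Three spellings of ASCII 65, the letter A.
hex=$(echo 'x' | sed 's/x/\x41/')
octal=$(echo 'x' | sed 's/x/\o101/')
decimal=$(echo 'x' | sed 's/x/\d065/')

# \w matches a word character; \b matches at a word boundary.
words=$(echo 'foo bar' | sed 's/\w\+/<&>/g')
whole=$(echo 'cat concat' | sed 's/\bcat\b/dog/g')

printf '%s\n' "$tabbed" "$hex" "$octal" "$decimal" "$words" "$whole"
```

The last command leaves `concat' untouched, because the `cat' inside it is not preceded by a word boundary.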
1112 |
|
---|
1113 |
|
---|
1114 | ---------- Footnotes ----------
|
---|
1115 |
|
---|
1116 | (1) All the escapes introduced here are GNU extensions, with the
|
---|
1117 | exception of `\n'. In basic regular expression mode, setting
|
---|
1118 | `POSIXLY_CORRECT' disables them inside bracket expressions.
|
---|
1119 |
|
---|
1120 |
|
---|
1121 | File: sed.info, Node: Examples, Next: Limitations, Prev: sed Programs, Up: Top
|
---|
1122 |
|
---|
1123 | Some Sample Scripts
|
---|
1124 | *******************
|
---|
1125 |
|
---|
1126 | Here are some `sed' scripts to guide you in the art of mastering
|
---|
1127 | `sed'.
|
---|
1128 |
|
---|
1129 | * Menu:
|
---|
1130 |
|
---|
1131 | Some exotic examples:
|
---|
1132 | * Centering lines::
|
---|
1133 | * Increment a number::
|
---|
1134 | * Rename files to lower case::
|
---|
1135 | * Print bash environment::
|
---|
1136 | * Reverse chars of lines::
|
---|
1137 |
|
---|
1138 | Emulating standard utilities:
|
---|
1139 | * tac:: Reverse lines of files
|
---|
1140 | * cat -n:: Numbering lines
|
---|
1141 | * cat -b:: Numbering non-blank lines
|
---|
1142 | * wc -c:: Counting chars
|
---|
1143 | * wc -w:: Counting words
|
---|
1144 | * wc -l:: Counting lines
|
---|
1145 | * head:: Printing the first lines
|
---|
1146 | * tail:: Printing the last lines
|
---|
1147 | * uniq:: Make duplicate lines unique
|
---|
1148 | * uniq -d:: Print duplicated lines of input
|
---|
1149 | * uniq -u:: Remove all duplicated lines
|
---|
1150 | * cat -s:: Squeezing blank lines
|
---|
1151 |
|
---|
1152 |
|
---|
1153 | File: sed.info, Node: Centering lines, Next: Increment a number, Up: Examples
|
---|
1154 |
|
---|
1155 | Centering Lines
|
---|
1156 | ===============
|
---|
1157 |
|
---|
1158 | This script centers all lines of a file on an 80-column width. To
|
---|
1159 | change that width, the number in `\{...\}' must be replaced, and the
|
---|
1160 | number of added spaces also must be changed.
|
---|
1161 |
|
---|
1162 | Note how the buffer commands are used to separate parts in the
|
---|
1163 | regular expressions to be matched--this is a common technique.
|
---|
1164 |
|
---|
1165 | #!/usr/bin/sed -f
|
---|
1166 |
|
---|
1167 | # Put 80 spaces in the buffer
|
---|
1168 | 1 {
|
---|
1169 | x
|
---|
1170 | s/^$/          /
|
---|
1171 | s/^.*$/&&&&&&&&/
|
---|
1172 | x
|
---|
1173 | }
|
---|
1174 |
|
---|
1175 | # del leading and trailing spaces (`tab' stands for a literal TAB character)
|
---|
1176 | y/tab/ /
|
---|
1177 | s/^ *//
|
---|
1178 | s/ *$//
|
---|
1179 |
|
---|
1180 | # add a newline and 80 spaces to end of line
|
---|
1181 | G
|
---|
1182 |
|
---|
1183 | # keep first 81 chars (80 + a newline)
|
---|
1184 | s/^\(.\{81\}\).*$/\1/
|
---|
1185 |
|
---|
1186 | # \2 matches half of the spaces, which are moved to the beginning
|
---|
1187 | s/^\(.*\)\n\(.*\)\2/\2\1/
|
---|
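To illustrate how the width is changed, here is a sketch of the same technique for a 20-column width. The function name `center20' is hypothetical; the hold space is seeded with 5 spaces repeated 4 times, and the `\{...\}' count becomes 21 (20 plus the newline). The tab handling is left out for brevity.

```shell
center20() {
  sed '
1 {
  x
  s/^$/     /
  s/^.*$/&&&&/
  x
}
s/^ *//
s/ *$//
G
s/^\(.\{21\}\).*$/\1/
s/^\(.*\)\n\(.*\)\2/\2\1/
'
}

centered=$(echo 'hello' | center20)
printf '[%s]\n' "$centered"
```

For the five-letter input, 15 spaces remain after truncation; the back-reference `\2' takes 7 of them twice, so 7 spaces are moved to the front and one unmatched space stays at the end.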
1188 |
|
---|
1189 |
|
---|
1190 | File: sed.info, Node: Increment a number, Next: Rename files to lower case, Prev: Centering lines, Up: Examples
|
---|
1191 |
|
---|
1192 | Increment a Number
|
---|
1193 | ==================
|
---|
1194 |
|
---|
1195 | This script is one of a few that demonstrate how to do arithmetic in
|
---|
1196 | `sed'. This is indeed possible,(1) but must be done manually.
|
---|
1197 |
|
---|
1198 | To increment one number you just add 1 to the last digit, replacing it
|
---|
1199 | by the following digit. There is one exception: when the digit is a
|
---|
1200 | nine, the previous digits must also be incremented until you don't have
|
---|
1201 | a nine.
|
---|
1202 |
|
---|
1203 | This solution by Bruno Haible is very clever and smart because it
|
---|
1204 | uses a single buffer; if you don't have this limitation, the algorithm
|
---|
1205 | used in *Note Numbering lines: cat -n, is faster. It works by
|
---|
1206 | replacing trailing nines with an underscore, then using multiple `s'
|
---|
1207 | commands to increment the last digit, and then again substituting
|
---|
1208 | underscores with zeros.
|
---|
1209 |
|
---|
1210 | #!/usr/bin/sed -f
|
---|
1211 |
|
---|
1212 | /[^0-9]/ d
|
---|
1213 |
|
---|
1214 | # replace all trailing 9s by _ (any other character except digits could
|
---|
1215 | # be used)
|
---|
1216 | :d
|
---|
1217 | s/9\(_*\)$/_\1/
|
---|
1218 | td
|
---|
1219 |
|
---|
1220 | # incr last digit only. The first line adds a most-significant
|
---|
1221 | # digit of 1 if we have to add a digit.
|
---|
1222 | #
|
---|
1223 | # The `tn' commands are not necessary, but make the thing
|
---|
1224 | # faster
|
---|
1225 |
|
---|
1226 | s/^\(_*\)$/1\1/; tn
|
---|
1227 | s/8\(_*\)$/9\1/; tn
|
---|
1228 | s/7\(_*\)$/8\1/; tn
|
---|
1229 | s/6\(_*\)$/7\1/; tn
|
---|
1230 | s/5\(_*\)$/6\1/; tn
|
---|
1231 | s/4\(_*\)$/5\1/; tn
|
---|
1232 | s/3\(_*\)$/4\1/; tn
|
---|
1233 | s/2\(_*\)$/3\1/; tn
|
---|
1234 | s/1\(_*\)$/2\1/; tn
|
---|
1235 | s/0\(_*\)$/1\1/; tn
|
---|
1236 |
|
---|
1237 | :n
|
---|
1238 | y/_/0/
|
---|
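As a worked example, the script above can be wrapped in a shell function and run on a few sample values. The function name `increment' and the inputs are made up for illustration; the `sed' commands are those of the script, without its comments.

```shell
increment() {
  sed '
/[^0-9]/ d
:d
s/9\(_*\)$/_\1/
td
s/^\(_*\)$/1\1/; tn
s/8\(_*\)$/9\1/; tn
s/7\(_*\)$/8\1/; tn
s/6\(_*\)$/7\1/; tn
s/5\(_*\)$/6\1/; tn
s/4\(_*\)$/5\1/; tn
s/3\(_*\)$/4\1/; tn
s/2\(_*\)$/3\1/; tn
s/1\(_*\)$/2\1/; tn
s/0\(_*\)$/1\1/; tn
:n
y/_/0/
'
}

results=$(printf '41\n199\n999\nabc\n' | increment)
printf '%s\n' "$results"
```

For `199' the pattern space goes through `19_', `1__', `2__' and finally `200'; for `999' it becomes `___', then `1___', then `1000'. The non-numeric line `abc' is deleted by the first command.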
1239 |
|
---|
1240 | ---------- Footnotes ----------
|
---|
1241 |
|
---|
1242 | (1) `sed' guru Greg Ubben wrote an implementation of the `dc' RPN
|
---|
1243 | calculator! It is distributed together with sed.
|
---|
1244 |
|
---|
1245 |
|
---|
1246 | File: sed.info, Node: Rename files to lower case, Next: Print bash environment, Prev: Increment a number, Up: Examples
|
---|
1247 |
|
---|
1248 | Rename Files to Lower Case
|
---|
1249 | ==========================
|
---|
1250 |
|
---|
1251 | This is a pretty strange use of `sed'. We transform text, and
|
---|
1252 | transform it into shell commands, then just feed them to the shell. Don't
|
---|
1253 | worry, even worse hacks are done when using `sed'; I have seen a script
|
---|
1254 | converting the output of `date' into a `bc' program!
|
---|
1255 |
|
---|
1256 | The main body of this is the `sed' script, which remaps the name
|
---|
1257 | from lower to upper (or vice-versa) and even checks whether the remapped
|
---|
1258 | name is the same as the original name. Note how the script is
|
---|
1259 | parameterized using shell variables and proper quoting.
|
---|
1260 |
|
---|
1261 | #! /bin/sh
|
---|
1262 | # rename files to lower/upper case...
|
---|
1263 | #
|
---|
1264 | # usage:
|
---|
1265 | # move-to-lower *
|
---|
1266 | # move-to-upper *
|
---|
1267 | # or
|
---|
1268 | # move-to-lower -R .
|
---|
1269 | # move-to-upper -R .
|
---|
1270 | #
|
---|
1271 |
|
---|
1272 | help()
|
---|
1273 | {
|
---|
1274 | cat << eof
|
---|
1275 | Usage: $0 [-n] [-R] [-h] files...
|
---|
1276 |
|
---|
1277 | -n do nothing, only see what would be done
|
---|
1278 | -R recursive (use find)
|
---|
1279 | -h this message
|
---|
1280 | files files to remap to lower case
|
---|
1281 |
|
---|
1282 | Examples:
|
---|
1283 | $0 -n * (see if everything is ok, then...)
|
---|
1284 | $0 *
|
---|
1285 |
|
---|
1286 | $0 -R .
|
---|
1287 |
|
---|
1288 | eof
|
---|
1289 | }
|
---|
1290 |
|
---|
1291 | apply_cmd='sh'
|
---|
1292 | finder='echo "$@" | tr " " "\n"'
|
---|
1293 | files_only=
|
---|
1294 |
|
---|
1295 | while :
|
---|
1296 | do
|
---|
1297 | case "$1" in
|
---|
1298 | -n) apply_cmd='cat' ;;
|
---|
1299 | -R) finder='find "$@" -type f';;
|
---|
1300 | -h) help ; exit 1 ;;
|
---|
1301 | *) break ;;
|
---|
1302 | esac
|
---|
1303 | shift
|
---|
1304 | done
|
---|
1305 |
|
---|
1306 | if [ -z "$1" ]; then
|
---|
1307 | echo Usage: $0 [-h] [-n] [-R] files...
|
---|
1308 | exit 1
|
---|
1309 | fi
|
---|
1310 |
|
---|
1311 | LOWER='abcdefghijklmnopqrstuvwxyz'
|
---|
1312 | UPPER='ABCDEFGHIJKLMNOPQRSTUVWXYZ'
|
---|
1313 |
|
---|
1314 | case `basename $0` in
|
---|
1315 | *upper*) TO=$UPPER; FROM=$LOWER ;;
|
---|
1316 | *) FROM=$UPPER; TO=$LOWER ;;
|
---|
1317 | esac
|
---|
1318 |
|
---|
1319 | eval $finder | sed -n '
|
---|
1320 |
|
---|
1321 | # remove all trailing slashes
|
---|
1322 | s/\/*$//
|
---|
1323 |
|
---|
1324 | # add ./ if there is no path, only a filename
|
---|
1325 | /\//! s/^/.\//
|
---|
1326 |
|
---|
1327 | # save path+filename
|
---|
1328 | h
|
---|
1329 |
|
---|
1330 | # remove path
|
---|
1331 | s/.*\///
|
---|
1332 |
|
---|
1333 | # do conversion only on filename
|
---|
1334 | y/'$FROM'/'$TO'/
|
---|
1335 |
|
---|
1336 | # now line contains original path+file, while
|
---|
1337 | # hold space contains the new filename
|
---|
1338 | x
|
---|
1339 |
|
---|
1340 | # add converted file name to line, which now contains
|
---|
1341 | # path/file-name\nconverted-file-name
|
---|
1342 | G
|
---|
1343 |
|
---|
1344 | # check if converted file name is equal to original file name,
|
---|
1345 | # if it is, do not print anything
|
---|
1346 | /^.*\/\(.*\)\n\1/b
|
---|
1347 |
|
---|
1348 | # now, transform path/fromfile\ntofile into
|
---|
1349 | # mv path/fromfile path/tofile and print it
|
---|
1350 | s/^\(.*\/\)\(.*\)\n\(.*\)$/mv "\1\2" "\1\3"/p
|
---|
1351 |
|
---|
1352 | ' | $apply_cmd
|
---|
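To see what the `sed' part of this script generates, it can be run on its own with a fixed list of names, much as the `-n' option does by sending the commands to `cat'. This is only a sketch: the two path names are made up, and `FROM' and `TO' are set as in the lower-case branch of the script.

```shell
FROM='ABCDEFGHIJKLMNOPQRSTUVWXYZ'
TO='abcdefghijklmnopqrstuvwxyz'

# Feed two hypothetical names through the same sed commands.
cmds=$(printf 'docs/README\nsrc/main.c\n' | sed -n '
s/\/*$//
/\//! s/^/.\//
h
s/.*\///
y/'$FROM'/'$TO'/
x
G
/^.*\/\(.*\)\n\1/b
s/^\(.*\/\)\(.*\)\n\(.*\)$/mv "\1\2" "\1\3"/p
')

printf '%s\n' "$cmds"
```

Only one `mv' command is produced, for `docs/README'; `src/main.c' is already in lower case, so the `b' command skips it.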
1353 |
|
---|