sandra_henrystocker
Unix Dweeb

How to work with substrings on Linux

How-To
Aug 20, 20247 mins
Linux

The awk, cut, grep, expr, sed and xargs commands provide many useful options for manipulating text.

Loom, threading, textiles
Credit: Afonasey/Shutterstock

There are quite a few ways to extract words and substrings from lines of text on a Linux system, replace them with other strings, select delimiters, and even get rid of white space at the beginning and end of lines. These techniques can be extremely useful when you’re building scripts that may be used to process large amounts of data, cleaning up data files, or simply trying to grab a string for use in a subsequent command. This post describes many different commands that can make these tasks easier than you might imagine.

Full words or portions of strings

One important factor to keep in mind is whether you are trying to extract a complete word or a sequence of characters by position. The commands that you use will depend on what you want to grab or modify.

Full words and substrings

One of the easiest ways to grab a word from a segment of text based on its position (e.g., 3rd word) is to use the awk command. For example, to pull the third word from a phrase, you could use an awk command like this one:

$ echo "Focus on Peace" | awk '{print $3}'
Peace

The $3 represents the third word in the phrase since the space character is the default delimiter.

To do the same thing using the cut command, you would need to include the delimiter with the -d option in a command like this where -f3 represents the third word:

$ echo "Focus on Peace" | cut -d' ' -f3
Peace

You can also select multiple fields with the cut command as in these next examples:

$ echo "Focus on Peace on Earth" | cut -d' ' -f3,5
Peace Earth
$ echo "one two three 4 5 6" | cut -d' ' -f1-3,6
one two three 6

To use an alternate delimiter (in this case, a colon), use a command like this:

$ cut -d':' -f1-3,5,6 /etc/passwd | tail -n 5
justme:x:1004:JustMe:/home/justme
lola:x:1006::/home/lola
dumdum:x:1007::/home/dumdum

With awk, you can use more than one delimiter. In the following example, two delimiters are specified, so the awk command accepts either a colon or a blank to separate fields. The first two lines display the file, and the last two lines show the command and result.

$ cat file
Monday:1 Tuesday:2 Wednesday:3 Thursday:4 Friday:5
$ awk -F'[: ]' '{OFS=" ";print $1,$3,$4}' file
Monday Tuesday 2

Selecting substrings

To select an arbitrary sequence or characters from a string, you can use an awk command like the one below in which the $0 represents the entire phrase, 10 represents the first character position to be grabbed and 5 is the length of the string to be displayed.

$ echo "Focus on Peace" | awk '{print substr($0,10,5)}'
Peace

To do the same kind of thing with the cut command, you would use a command like this in which the 13th through 22nd characters are extracted from the phrase and displayed.

$ echo "Linux is an impressive OS" | cut -c 13-22
impressive

In this next command, the cut command displays the 7th-12th characters from the lines in a file. The head command simply limits the display to the first 4 lines of output.

$ cut -c 7-12 sayings | head -4
with 3
and ov
nd be
and be

Using grep

You can use the grep command to select multiple words from a file. In this example, only the selected words are displayed, not the entire lines. This is because the -o (display only the matched items) option is being used.

$ cat sayings | grep -o "quarrel\|empty"
quarrel
quarrel
empty

Without the -o option, you would see the complete lines.

$ cat sayings | grep "quarrel\|empty"
They do not quarrel,
So no one quarrels with them.
Is that an empty saying?

You can also select multiple-word phrases as shown in this example:

$ cat sayings | grep -o "never falter\|do not quarrel"
never falter
do not quarrel 

Using expr

The expr command can also be used to select a portion of a phrase by specifying its starting position and length.

$ str="Learning to use Linux is fun"
$ expr substr "$str" 13 9
use Linux

Using sed

The sed command provides a very convenient way to replace words in a string.

$ echo "They never falter" | sed 's/falter/forget/'
They never forget

You can also use this kind of command to replace multiple words as in this example:

$ echo "They never falter" | sed 's/never falter/always forget/'
They always forget

Using xargs

To remove blanks at the beginning and endings of phrases, use the xargs command.

$ echo "  Keep your nose to the grindstone " | xargs
Keep your nose to the grindstone

The xargs command also removes blank lines and tabs. In the example below, a file containing two lines with only tabs and blanks and one starting with four blanks and ending with a saying is cut down to just the saying.

$ cat eg


           Keep your nose to the grindstone
$ cat eg | xargs
Keep your nose to the grindstone

Using bash parameter expansion

When using bash parameter expansion, you can specify the starting and ending positions for the text that you want to extract. For example, you can create a variable by assigning it a value and then use syntax like that shown below to select a portion of it.

$ string="Happy days are here again"
$ echo ${string:1:10}
appy days
$ echo ${string:0:9}
Happy days

Note that the example above makes it clear that this technique starts position numbering at 0. So, in the next example, the 7 represents the eighth character in the string and the -2 means to drop the last 2 characters. As a result, the substring in the first example below has a single character and the second has all but the last two.

$ string="1234567890"
$ echo ${string:7:-2}
8
$ echo ${string:0:-2}
12345678

In this next example, we first create a variable using “set –” and then use echo to display the eighth and ninth characters. In other words, it starts with the eighth character (7) and then displays two characters.

$ set -- 01234567890abcdef
$ echo ${1:7:2}
78

NOTE: You could display the string created with the set command by simply using the command “echo $1”. This is what is referenced by the “1” in the example above.

$ set -- 01234567890abcdef
$ echo $1
01234567890abcdef

Wrap-up

Linux provides many commands to help you manipulate text. The awk, cut, grep, expr, sed and xargs commands along with bash parameter expansion provide you with many useful options.

sandra_henrystocker

Sandra Henry-Stocker has been administering Unix systems for more than 30 years. She describes herself as "USL" (Unix as a second language) but remembers enough English to write books and buy groceries. She lives in the mountains in Virginia where, when not working with or writing about Unix, she's chasing the bears away from her bird feeders.

The opinions expressed in this blog are those of Sandra Henry-Stocker and do not necessarily represent those of IDG Communications, Inc., its parent, subsidiary or affiliated companies.

More from this author

Exit mobile version