30 April 2021

Unix shells treat certain characters as wildcards. These so-called globbing characters resemble regular expressions but follow different rules. Globbing is used purely for file name expansion, while regular expressions have a much broader scope. Because the shell expands the pattern before the command runs, globbing works with any utility that takes file names as arguments, such as ls, find, mv and rm.
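To make the "file name expansion" part concrete, here is a minimal sketch that counts the arguments a command actually receives. The count_args helper, the scratch directory and the file names are made up for this illustration. It assumes a POSIX-style shell with mktemp available.

```shell
# count_args is a hypothetical helper: it prints how many arguments it was given.
count_args() { echo "$#"; }

# Work in a throwaway directory so no real files are touched.
start_dir=$PWD
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch notes.txt todo.txt

# Unquoted: the shell replaces the pattern with the two matching file names
# before count_args runs, so the helper sees two ordinary arguments.
expanded_count=$(count_args *.txt)
echo "unquoted pattern: $expanded_count arguments"

# Quoted: no expansion takes place and the helper sees the literal pattern.
quoted_count=$(count_args '*.txt')
echo "quoted pattern: $quoted_count argument"

cd "$start_dir" && rm -rf "$demo_dir"
```

The utility itself never sees the wildcard unless you quote it, which is why the same patterns behave the same way with ls, mv and rm.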

The asterisk and question mark

An asterisk matches zero or more characters. For instance, to list all .php files in a directory you can run ls *.php (including the dot, so that a name that merely ends in the letters "php" is not matched):

$ ls -1 *.php
config.php
feed.php
index.php
install.php

The question mark matches exactly one character. It is used less often than the asterisk, but it can come in handy:

$ ls -1 backup_?.zip
backup_1.zip
backup_2.zip
backup_3.zip
backup_4.zip
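The difference between "zero or more" and "exactly one" is easiest to see side by side. The sketch below uses a scratch directory and made-up file names (backup_10.zip and backup_.zip are not part of the example above), and assumes a POSIX-style shell with mktemp available.

```shell
# Work in a throwaway directory so no real files are touched.
start_dir=$PWD
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch backup_1.zip backup_2.zip backup_10.zip backup_.zip

# '?' must match exactly one character, so backup_10.zip (two characters)
# and backup_.zip (no character) are both skipped.
single=$(echo backup_?.zip)
echo "$single"

# '*' matches zero or more characters, so all four files qualify.
any_count=$(set -- backup_*.zip; echo "$#")
echo "$any_count files match backup_*.zip"

cd "$start_dir" && rm -rf "$demo_dir"
```

Using echo with a pattern is a cheap way to preview what a glob expands to before you pass it to something destructive.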

Ranges

You can define a range in square brackets. For instance, let’s imagine that you have a directory with four backup files and that you want to delete all but the most recent file:

$ ls -1 2020111*
20201110_2145_backup.zip
20201111_2145_backup.zip
20201112_2145_backup.zip
20201113_2145_backup.zip

You can use the range [0-2] in the pattern 2020111[0-2]* to match the first three files:

$ rm -f 2020111[0-2]*

$ ls -1 2020111*
20201113_2145_backup.zip

The command rm -f 2020111[0-2]* matches the first three files, but not the file that starts with 20201113. We are therefore left with just the most recent backup.

Ranges can be negated using an exclamation mark inside the square brackets. For instance, rm -f 2020111[!0-2]* removes only the files whose eighth character is not 0, 1 or 2 (in this case the file 20201113_2145_backup.zip). A range doesn’t have to be numeric; you can match a range of letters as well. For instance, the range [a-c] matches a, b and c.
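Since rm -f gives no second chance, it can help to preview a range with echo first. The following is a sketch only: it recreates the four backup files from above in a scratch directory, and the a.txt, b.txt and d.txt files are invented for the letter example. It assumes a POSIX-style shell with mktemp available.

```shell
# Work in a throwaway directory so no real files are touched.
start_dir=$PWD
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch 20201110_2145_backup.zip 20201111_2145_backup.zip \
      20201112_2145_backup.zip 20201113_2145_backup.zip
touch a.txt b.txt d.txt

# [0-2] matches one character that is 0, 1 or 2.
in_range=$(echo 2020111[0-2]*)
echo "$in_range"

# [!0-2] matches one character that is NOT 0, 1 or 2.
negated=$(echo 2020111[!0-2]*)
echo "$negated"

# Letter ranges work the same way; d.txt falls outside [a-c].
letters=$(echo [a-c].txt)
echo "$letters"

cd "$start_dir" && rm -rf "$demo_dir"
```

Once the echo output lists exactly the files you expect, the same pattern can be given to rm.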

Classes

There are a few special character classes you can use. The classes are mainly useful to avoid “ugly” ranges. For instance, you can use [a-zA-Z] to match any alphabetical character in either lower or upper case. A more elegant way to achieve the same result is to use the [[:alpha:]] class instead.

A class is always written as a keyword between colons, inside double square brackets. The table below shows the most common ones.

Class        Matches                             Equivalent
[[:alpha:]]  Alphabetical characters             [a-zA-Z]
[[:alnum:]]  Alphabetical characters and digits  [a-zA-Z0-9]
[[:blank:]]  Space or tab characters             [ \t]
[[:digit:]]  Digits                              [0-9]
[[:lower:]]  Lower case alphabetical characters  [a-z]
[[:upper:]]  Upper case alphabetical characters  [A-Z]
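A class is used just like a range: it stands for one character in the file name. Here is a small sketch with invented file names (report1.txt and friends) in a scratch directory, assuming a POSIX-style shell with mktemp available.

```shell
# Work in a throwaway directory so no real files are touched.
start_dir=$PWD
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch report1.txt reportA.txt reportb.txt report_.txt

# One digit after "report".
digits=$(echo report[[:digit:]].txt)
echo "$digits"

# One upper case letter after "report".
upper=$(echo report[[:upper:]].txt)
echo "$upper"

# One letter of either case; the digit and the underscore are left out.
alpha=$(echo report[[:alpha:]].txt)
echo "$alpha"

cd "$start_dir" && rm -rf "$demo_dir"
```

Note that the outer pair of square brackets is the ordinary bracket expression, so a class can be combined with other characters, as in [[:digit:]_].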

Globbing vs regular expressions

It is worth noting that the asterisk and question mark have a different meaning in regular expressions. In a “regex” the asterisk matches zero or more instances of the preceding character, and the question mark matches zero or one instance of the preceding character. The dot character (.) is used to match one instance of any character, which can then be combined with an asterisk (.*) to match any number of any characters (including zero characters).
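The contrast can be tried out on plain strings. In the sketch below, glob_match and regex_match are hypothetical helpers written for this illustration: the first relies on the shell's case statement, which uses the same wildcard rules as file name expansion, and the second uses grep -E with the expression anchored to the whole string. It assumes a POSIX-style shell and a grep that supports -E and -q.

```shell
# Hypothetical helper: does string $1 match shell pattern $2?
glob_match() {
    case $1 in
        $2) echo yes ;;
        *)  echo no ;;
    esac
}

# Hypothetical helper: does string $1 match extended regex $2 as a whole?
regex_match() {
    if printf '%s\n' "$1" | grep -Eq "^($2)\$"; then
        echo yes
    else
        echo no
    fi
}

glob_match  abc 'a*'    # yes: 'a' followed by zero or more of any character
regex_match abc 'a*'    # no:  only zero or more 'a' characters are allowed
regex_match aaa 'a*'    # yes
regex_match abc 'a.*'   # yes: '.*' is the regex counterpart of the glob '*'
glob_match  ab  'a?'    # yes: '?' stands for exactly one character
regex_match a   'ab?'   # yes: 'b?' means zero or one 'b'
```

The same two characters therefore give different answers for the same input, which is why a pattern copied from a glob into grep (or the other way round) rarely does what you expect.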