Previously, we worked with the Linux commands cat and grep to search through text files for specific strings. Today, we will talk about combining cat and grep with a new command, awk, that will enable you to do much more with text files, log files, and CSVs (or any text-based spreadsheet).
AWK is actually a programming language specifically designed for processing text. We’ll barely scratch the surface of all of awk’s capabilities, but I’ll show you some ways it can help you in a development environment.
Let’s start with a sample web server access log:
We learned how to pull only lines that contain certain strings (an IP address, for example) using grep in the previous blog post, but what if we only want a certain piece of information on the line, such as the date? We can do that using awk!
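Here is a minimal sketch of what that could look like. The access.log contents below are a hypothetical sample invented for illustration (IP, identity, user, timestamp, method, page, status), so the column numbers in your own log may differ.

```shell
# Hypothetical sample log, invented for illustration only.
# Columns: IP, identity, user, timestamp, method, page, status.
cat > access.log <<'EOF'
192.168.1.10 - alice 2023-10-10T13:55:36 GET /index.html 200
192.168.1.22 - bob 2023-10-10T13:56:01 GET /about.html 200
192.168.1.10 - alice 2023-10-10T13:57:12 POST /login 302
EOF

# Print only the 4th whitespace-separated column (the timestamp in this layout).
awk '{print $4}' access.log
# Output:
# 2023-10-10T13:55:36
# 2023-10-10T13:56:01
# 2023-10-10T13:57:12
```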
Which returns:
We’ve told awk to print the 4th column of text. By default, awk splits each line into columns on whitespace (runs of spaces or tabs). You can also print multiple columns at once. Let’s get the IP address and the page visited.
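As a rough sketch, assume a hypothetical access.log (recreated here so the snippet runs on its own) in which the IP address happens to be column 1 and the page column 6. Those positions are assumptions about this made-up layout, not a rule for every log format.

```shell
# Hypothetical sample log, invented for illustration only.
# Columns: IP, identity, user, timestamp, method, page, status.
cat > access.log <<'EOF'
192.168.1.10 - alice 2023-10-10T13:55:36 GET /index.html 200
192.168.1.22 - bob 2023-10-10T13:56:01 GET /about.html 200
192.168.1.10 - alice 2023-10-10T13:57:12 POST /login 302
EOF

# Print the IP (column 1) and the page (column 6), separated by a tab.
awk '{print $1 "\t" $6}' access.log
# Output (columns separated by a tab character):
# 192.168.1.10	/index.html
# 192.168.1.22	/about.html
# 192.168.1.10	/login
```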
Which returns:
Neat! We’ve printed two columns on each line. I’ve separated them with a tab by putting \t in double quotes between the two fields. The tab isn’t required: a comma between the fields would print them separated by a single space instead.
Now, what if we just needed the pages visited by a specific IP address? We can combine cat, grep, and awk to customize our output.
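One way this pipeline might look, again against a hypothetical access.log and a made-up IP address (both invented for illustration):

```shell
# Hypothetical sample log, invented for illustration only.
# Columns: IP, identity, user, timestamp, method, page, status.
cat > access.log <<'EOF'
192.168.1.10 - alice 2023-10-10T13:55:36 GET /index.html 200
192.168.1.22 - bob 2023-10-10T13:56:01 GET /about.html 200
192.168.1.10 - alice 2023-10-10T13:57:12 POST /login 302
EOF

# cat displays the log, grep keeps only lines containing the IP,
# and awk prints the page column (column 6 in this layout).
# grep -F treats the pattern as a fixed string, so the dots are not regex wildcards.
cat access.log | grep -F '192.168.1.10' | awk '{print $6}'
# Output:
# /index.html
# /login
```

Keep in mind that grep matches the string anywhere on the line, so this pattern would also match a longer address such as 192.168.1.100.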
Which returns:
We’re displaying the log with cat, piping it to grep to search for our IP, and then piping it to awk to get the desired columns of text. Cool, huh?
You can also separate columns of text by characters other than spaces. This need commonly arises when working with comma-separated files like CSVs. For example:
Let’s get a list of only the street addresses from this file.
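A minimal sketch, assuming a hypothetical contacts.csv with no header row and the columns name, street address, city, and state (the file name and rows are invented for illustration):

```shell
# Hypothetical sample CSV, invented for illustration only.
# Columns: full name, street address, city, state.
cat > contacts.csv <<'EOF'
Jane Doe,123 Main St,Springfield,IL
John Smith,456 Oak Ave,Portland,OR
EOF

# -F',' makes awk split each line on commas instead of whitespace,
# so column 2 is the whole street address, spaces included.
awk -F',' '{print $2}' contacts.csv
# Output:
# 123 Main St
# 456 Oak Ave
```

Note that this simple split does not understand quoted fields, so a value that itself contains a comma would be cut in two.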
Which returns:
Using the -F flag, we’ve told awk to treat the commas as the column separators.
Finally, let’s split the FirstName LastName column into two separate columns and write the result to a new file.
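One possible way to do this, sketched against a hypothetical contacts.csv whose first column holds the full name. The file names, the rows, and the choice of sub() are assumptions for illustration, not necessarily the original command.

```shell
# Hypothetical sample CSV, invented for illustration only.
# Columns: full name, street address, city, state.
cat > contacts.csv <<'EOF'
Jane Doe,123 Main St,Springfield,IL
John Smith,456 Oak Ave,Portland,OR
EOF

# Split input on commas and rejoin output with commas (OFS).
# sub() replaces the first space in column 1 with a comma,
# turning "Jane Doe" into "Jane,Doe"; the shell's > redirect writes the new file.
awk -F',' 'BEGIN { OFS = "," } { sub(/ /, ",", $1); print }' contacts.csv > contacts_split.csv

cat contacts_split.csv
# Output:
# Jane,Doe,123 Main St,Springfield,IL
# John,Smith,456 Oak Ave,Portland,OR
```

Only the first space in the name is replaced, so a name with more than two parts would need extra handling.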
There is much more you can do with awk, but hopefully this has shown you some of the possibilities of this powerful command.