Linux Text Processing: grep, awk, sed One-Liners (2026)

The most surprising thing about grep, awk, and sed is how much they accomplish in a single line, often eliminating the need for a full scripting language for common text manipulation tasks.

Let’s see them in action. Imagine you have a log file, app.log, with entries like this:

2023-10-27 10:05:15 INFO User 'alice' logged in from 192.168.1.100
2023-10-27 10:06:01 WARN Disk usage at 85%
2023-10-27 10:07:33 INFO User 'bob' accessed resource '/data/report.csv'
2023-10-27 10:08:00 ERROR Database connection failed: timeout
2023-10-27 10:09:10 INFO User 'alice' uploaded file 'image.jpg'

`grep`: The Filter

grep is your go-to for finding lines that match a specific pattern.

Problem: You want to see all the INFO messages.

Command:

grep "INFO" app.log

Output:

2023-10-27 10:05:15 INFO User 'alice' logged in from 192.168.1.100
2023-10-27 10:07:33 INFO User 'bob' accessed resource '/data/report.csv'
2023-10-27 10:09:10 INFO User 'alice' uploaded file 'image.jpg'

Problem: You want to find lines containing a specific username, say 'alice'.

Command:

grep "alice" app.log

Output:

2023-10-27 10:05:15 INFO User 'alice' logged in from 192.168.1.100
2023-10-27 10:09:10 INFO User 'alice' uploaded file 'image.jpg'

grep uses regular expressions, making it incredibly flexible. For instance, to find lines with either INFO or WARN messages:

Command:

grep -E "INFO|WARN" app.log

Output:

2023-10-27 10:05:15 INFO User 'alice' logged in from 192.168.1.100
2023-10-27 10:06:01 WARN Disk usage at 85%
2023-10-27 10:07:33 INFO User 'bob' accessed resource '/data/report.csv'
2023-10-27 10:09:10 INFO User 'alice' uploaded file 'image.jpg'

The -E flag enables extended regular expressions, allowing for the | (OR) operator.
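A few other grep options pair well with -E. The sketch below recreates the sample log in a temporary file (the mktemp setup and the variable names are illustrative, not part of the original example) and shows -c to count matches, -v to invert the match, and -i to ignore case.

```shell
# Recreate the sample log in a temporary file (illustrative setup).
log=$(mktemp)
cat > "$log" <<'EOF'
2023-10-27 10:05:15 INFO User 'alice' logged in from 192.168.1.100
2023-10-27 10:06:01 WARN Disk usage at 85%
2023-10-27 10:07:33 INFO User 'bob' accessed resource '/data/report.csv'
2023-10-27 10:08:00 ERROR Database connection failed: timeout
2023-10-27 10:09:10 INFO User 'alice' uploaded file 'image.jpg'
EOF

# -c prints the number of matching lines instead of the lines themselves.
info_count=$(grep -c "INFO" "$log")

# -v inverts the match: keep every line that does NOT contain INFO.
non_info=$(grep -v "INFO" "$log")

# -i ignores case, so the lowercase pattern "error" matches "ERROR".
errors=$(grep -i "error" "$log")

echo "INFO lines: $info_count"
echo "$non_info"
echo "$errors"

rm -f "$log"
```

With the five sample lines, this should report three INFO lines, print the WARN and ERROR lines for -v, and print the ERROR line for -i.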

`sed`: The Stream Editor

sed performs transformations on a stream of text, most commonly substitution. It writes the result to standard output and leaves the input file unchanged.

Problem: You want to replace all occurrences of 'alice' with 'charlie'.

Command:

sed 's/alice/charlie/g' app.log

Output:

2023-10-27 10:05:15 INFO User 'charlie' logged in from 192.168.1.100
2023-10-27 10:06:01 WARN Disk usage at 85%
2023-10-27 10:07:33 INFO User 'bob' accessed resource '/data/report.csv'
2023-10-27 10:08:00 ERROR Database connection failed: timeout
2023-10-27 10:09:10 INFO User 'charlie' uploaded file 'image.jpg'

The s/old/new/g command means "substitute (s) all occurrences (g) of old with new."
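To see what the g flag changes, it helps to run the same substitution with and without it on a line that contains the pattern twice. The input line below is made up for illustration. The last command also shows that a delimiter other than / (here |) can be used, which avoids escaping slashes in paths.

```shell
# Hypothetical input line containing "data" twice.
line="/data/report.csv copied to /data/backup"

# Without g, only the first match on each line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/data/archive/')

# With g, every match on the line is replaced.
all_matches=$(printf '%s\n' "$line" | sed 's/data/archive/g')

# Using | as the delimiter means the slashes in the paths need no escaping.
new_path=$(printf '%s\n' "$line" | sed 's|/data/|/srv/|g')

echo "$first_only"   # /archive/report.csv copied to /data/backup
echo "$all_matches"  # /archive/report.csv copied to /archive/backup
echo "$new_path"     # /srv/report.csv copied to /srv/backup
```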

Problem: You want to remove all lines containing WARN messages.

Command:

sed '/WARN/d' app.log

Output:

2023-10-27 10:05:15 INFO User 'alice' logged in from 192.168.1.100
2023-10-27 10:07:33 INFO User 'bob' accessed resource '/data/report.csv'
2023-10-27 10:08:00 ERROR Database connection failed: timeout
2023-10-27 10:09:10 INFO User 'alice' uploaded file 'image.jpg'

The /pattern/d command means "delete (d) lines matching pattern."
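The pattern before d is an address, and addresses work with other commands too. As a small sketch on an abbreviated copy of the sample lines (timestamps shortened for brevity), the -n option combined with p prints only the matching lines, which is the inverse of d, and a line number can stand in for the pattern.

```shell
# Abbreviated sample lines (illustrative).
sample=$(printf '%s\n' \
  "10:06:01 WARN Disk usage at 85%" \
  "10:07:33 INFO User 'bob' accessed resource" \
  "10:08:00 ERROR Database connection failed: timeout")

# -n suppresses automatic printing; p prints only lines matching the pattern.
only_errors=$(printf '%s\n' "$sample" | sed -n '/ERROR/p')

# An address can also be a line number: delete the first line.
without_first=$(printf '%s\n' "$sample" | sed '1d')

echo "$only_errors"
echo "$without_first"
```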

`awk`: The Pattern Scanner and Processor

awk is a powerful tool for processing text line by line, especially when dealing with structured data like columns. It works by splitting each line into fields, separated by whitespace by default.

Problem: You want to extract just the timestamp and the log level from each line.

Command:

awk '{print $1, $2, $3}' app.log

Output:

2023-10-27 10:05:15 INFO
2023-10-27 10:06:01 WARN
2023-10-27 10:07:33 INFO
2023-10-27 10:08:00 ERROR
2023-10-27 10:09:10 INFO

Here, $1, $2, and $3 refer to the first, second, and third fields on each line: the date, the time, and the log level.
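Whitespace is only the default separator. For data delimited by another character, the -F option sets the field separator. The colon-separated records below are hypothetical and not part of app.log. The built-in variable NF holds the number of fields on the current line.

```shell
# Hypothetical colon-separated records: name, uid, shell.
records=$(printf '%s\n' "alice:1000:/bin/bash" "bob:1001:/bin/zsh")

# -F: splits each line on colons instead of whitespace.
names=$(printf '%s\n' "$records" | awk -F: '{print $1}')

# NF is the number of fields awk found on the current line.
summary=$(printf '%s\n' "$records" | awk -F: '{print $1, "has", NF, "fields"}')

echo "$names"
echo "$summary"
```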

Problem: You want to list only the usernames from the INFO messages.

Command:

awk '/INFO/ {print $4}' app.log

Output:

User
User
User

This isn’t quite right. Field 4 is the literal word User; the username is in field 5, wrapped in single quotes. Let’s refine:

Command:

awk '/INFO/ {gsub(/'\''/, "", $5); print $5}' app.log

Output:

alice
bob
alice

This awk command first filters for lines containing "INFO" (/INFO/). Then, for those lines, it uses gsub (global substitution) to remove the single quotes from the 5th field ($5) before printing it. The '\'' sequence is a shell quoting idiom rather than awk syntax: it closes the single-quoted string, adds an escaped literal quote, and reopens the string, so awk receives a plain ' inside the regex.
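One way to sidestep the quote escaping altogether is to make the single quote itself the field separator. This is a sketch that assumes the username is always the first quoted string on the line, which holds for the sample log but may not for other formats.

```shell
# A subset of the sample log lines.
sample=$(printf '%s\n' \
  "2023-10-27 10:05:15 INFO User 'alice' logged in from 192.168.1.100" \
  "2023-10-27 10:06:01 WARN Disk usage at 85%" \
  "2023-10-27 10:07:33 INFO User 'bob' accessed resource '/data/report.csv'")

# With ' as the separator, the text between the first pair of quotes is $2.
users=$(printf '%s\n' "$sample" | awk -F"'" '/INFO/ {print $2}')

echo "$users"
```

Here the shell's double quotes protect the single quote in -F"'", so the awk program itself needs no special escaping.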

Problem: You want to count how many times each log level appears.

Command:

awk '{count[$3]++} END {for (level in count) print level, count[level]}' app.log

Output:

INFO 3
WARN 1
ERROR 1

This awk script builds an associative array called count. For every line, it increments the counter for the log level found in the 3rd field ($3). The END block executes after all lines are processed, iterating through the count array and printing each log level and its total count. awk does not guarantee the iteration order of for (level in count), so the three lines may appear in a different order on your system.
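When a stable order matters, one option is to pipe the counts through sort. The sketch below uses shortened stand-ins for the log messages (only the first three fields matter here) and orders the result by count, highest first, breaking ties by level name.

```shell
# Shortened stand-ins for the five sample lines; field 3 is the log level.
levels=$(printf '%s\n' \
  "2023-10-27 10:05:15 INFO login" \
  "2023-10-27 10:06:01 WARN disk" \
  "2023-10-27 10:07:33 INFO access" \
  "2023-10-27 10:08:00 ERROR db" \
  "2023-10-27 10:09:10 INFO upload")

# Count by level, then sort by count (field 2, numeric, descending),
# using the level name (field 1) to break ties.
counts=$(printf '%s\n' "$levels" |
  awk '{count[$3]++} END {for (level in count) print level, count[level]}' |
  sort -k2,2nr -k1,1)

echo "$counts"
```

For this input the sorted result is INFO 3, then ERROR 1, then WARN 1, regardless of how awk happens to walk the array.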

These tools are often chained together using pipes (|) to perform complex operations. For instance, to find the IP addresses from INFO messages related to user logins:

Command:

grep "User .* logged in" app.log | awk '{print $NF}'

Output:

192.168.1.100

Here, grep finds the specific log lines, and then awk '{print $NF}' prints the last field ($NF), which is the IP address in this case.
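Because awk can filter by pattern on its own, a grep-into-awk pipeline like this can often be collapsed into a single awk command. The sketch below runs both forms on two of the sample lines and stores the results in illustrative variables so they can be compared.

```shell
# Two of the sample log lines.
sample=$(printf '%s\n' \
  "2023-10-27 10:05:15 INFO User 'alice' logged in from 192.168.1.100" \
  "2023-10-27 10:07:33 INFO User 'bob' accessed resource '/data/report.csv'")

# Two-stage pipeline: grep selects the lines, awk prints the last field.
piped=$(printf '%s\n' "$sample" | grep "User .* logged in" | awk '{print $NF}')

# Single awk: the pattern does the filtering, the action does the printing.
single=$(printf '%s\n' "$sample" | awk '/User .* logged in/ {print $NF}')

echo "$piped"
echo "$single"
```

Both forms print 192.168.1.100 here. The pipeline version can still be the clearer choice when each stage is easier to read or test on its own.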

The real power of these tools lies in their ability to parse and manipulate text data efficiently, making them indispensable for system administration, log analysis, and general command-line text processing.

When you start dealing with complex, nested data structures or need to maintain state across many operations, you’ll likely find yourself reaching for more robust scripting languages like Python or Perl.
