Pipes
The Unix pipe, denoted by the vertical bar |, is a powerful feature of Unix and Unix-like operating systems that allows the output of one command (stdout) to be used as the input to another (stdin). This capability forms the basis of the Unix philosophy: build small, modular utilities that do one thing well and connect them together to perform complex tasks.
Fundamental Concept
The pipe sits between two commands and directs the standard output (stdout) of the command on its left into the standard input (stdin) of the command on its right.
Example
echo "Hello, World!" | wc -w sends the output of the echo command to wc, which then counts the words:
echo "Hello, World!" | wc -w
# 2
The output is 2, indicating there are two words in "Hello, World!".
Combining Pipes
Commands can be chained together using multiple pipes, allowing for the creation of command pipelines where data is processed in stages.
Example
ps
aux |
grep
httpd
lists all processes, filters those containing “httpd” (HTTPD = web server processes running):
ps aux | grep httpd
# mjfrigaard 23601 0.0 0.0 33597016 632 ?? S 6:44AM 0:00.00 grep httpd
# mjfrigaard 23599 0.0 0.0 33599596 928 ?? S 6:44AM 0:00.01 bash -c ps aux | grep httpd
# mjfrigaard 23598 0.0 0.0 33598572 932 ?? S 6:44AM 0:00.01 sh -c 'bash' -c 'ps aux | grep httpd' 2>&1
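Notice that grep itself appears in the output above: its own command line contains the string "httpd", so it matches its own pattern. A common workaround is to wrap one character of the pattern in brackets, e.g. ps aux | grep "[h]ttpd". The pattern [h]ttpd still matches the literal text "httpd", but the grep command line now contains "[h]ttpd", which the pattern does not match. A minimal sketch, using printf to stand in for ps output so the result is reproducible:

```shell
# Two simulated lines of ps output: a real httpd process, and the grep
# process itself (whose command line shows the bracketed pattern).
printf 'root 42 httpd -k start\nuser 99 grep [h]ttpd\n' | grep "[h]ttpd"
# root 42 httpd -k start
```

Only the real httpd line survives, because "grep [h]ttpd" does not contain the literal substring "httpd".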
Example
Adding wc -l counts the number of matching lines:
ps aux | grep httpd | wc -l
# 3
Filtering and Processing
Example 1
cat data/raw/roxanne.txt | grep "night" displays the lines from data/raw/roxanne.txt that contain the word "night".
cat data/raw/roxanne.txt | grep "night"
# You don't have to sell your body to the night
# You don't have to wear that dress tonight
Here, cat outputs the file's contents, which grep then filters.
Example 2
ls -l data/raw | sort -r lists the files in data/raw in long format, then sorts the listing in reverse order:
ls -l data/raw | sort -r
# total 128
# -rw-r--r--@ 1 mjfrigaard staff 12531 Apr 13 20:39 ajperlis_epigrams.txt
# -rw-r--r--@ 1 mjfrigaard staff 12057 Apr 22 14:11 pwrds.csv
# -rw-r--r--@ 1 mjfrigaard staff 6452 May 15 21:48 music_vids.csv
# -rw-r--r--@ 1 mjfrigaard staff 4828 May 15 21:49 vg_hof.csv
# -rw-r--r--@ 1 mjfrigaard staff 4462 May 15 21:48 trees.csv
# -rw-r--r--@ 1 mjfrigaard staff 1315 Apr 6 05:38 roxanne.txt
# -rw-r--r--@ 1 mjfrigaard staff 381 May 15 21:55 who_tb_data.csv
# -rw-r--r--@ 1 mjfrigaard staff 263 Apr 10 09:34 wu_tang.csv
This demonstrates reversing the order of a directory listing with a pipe.
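sort can also order a listing by a specific column. A minimal sketch using a hypothetical scratch directory (/tmp/pipe_sort_demo, standing in for data/raw) and sorting by the size column of ls -l:

```shell
# Hypothetical scratch directory with files of known sizes:
mkdir -p /tmp/pipe_sort_demo
printf 'aaaa' > /tmp/pipe_sort_demo/big.txt    # 4 bytes
printf 'a'    > /tmp/pipe_sort_demo/small.txt  # 1 byte
# -k5 selects the fifth column of ls -l (the size in bytes),
# -n compares it numerically, and -r reverses to largest-first:
ls -l /tmp/pipe_sort_demo | sort -k5 -rn
```

big.txt is listed first; the "total" line, having no fifth field, sorts last.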
Transformation and Reduction
Example
find data -type f | xargs du -sh | sort -h finds all files (-type f) in the data directory and its subdirectories, calculates their sizes (du -sh), and sorts them by human-readable size (sort -h):
find data -type f | xargs du -sh | sort -h
# 4.0K data/raw/roxanne.txt
# 4.0K data/raw/who_tb_data.csv
# 4.0K data/raw/wu_tang.csv
# 4.0K data/wu_tang.psv
# 4.0K data/wu_tang.tsv
# 8.0K data/music_vids.tsv
# 8.0K data/raw/music_vids.csv
# 8.0K data/raw/trees.csv
# 8.0K data/raw/vg_hof.csv
# 8.0K data/trees.tsv
# 8.0K data/vg_hof.tsv
# 12K data/README.md
# 12K data/pwrds.tsv
# 12K data/raw/pwrds.csv
# 16K data/raw/ajperlis_epigrams.txt
This pipeline not only identifies files but also sorts them by their disk usage, illustrating a complex operation made simple through pipes.
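One caveat: xargs splits its input on whitespace, so a filename containing a space breaks the plain find | xargs pipeline. find -print0 and xargs -0 switch to a NUL byte as the separator, which cannot appear in a filename. A sketch with a hypothetical directory and a filename containing a space (wc -c is used in place of du -sh here, since du's block sizes vary by system):

```shell
# Hypothetical directory with a filename containing a space:
mkdir -p /tmp/xargs_demo
printf 'hello' > "/tmp/xargs_demo/my file.txt"
# -print0 / -0 pass the path as a single NUL-terminated unit,
# so the space no longer splits it into two nonexistent paths:
find /tmp/xargs_demo -type f -print0 | xargs -0 wc -c
```

wc -c reports 5 bytes for the intact path /tmp/xargs_demo/my file.txt, where a plain find | xargs would have handed wc two broken path fragments.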
Real-time Streaming and Monitoring
Example
cat /var/log/system.log | grep DEAD_PROCESS prints the system.log file and filters for entries containing DEAD_PROCESS. Note that cat reads the file once; for continuous monitoring of new entries, use tail -f instead.1
cat /var/log/system.log | grep "DEAD_PROCESS"
## Apr 10 06:35:23 Users-MacBook-Pro login[3596]: DEAD_PROCESS: 3596 ttys000
## Apr 10 06:35:25 Users-MacBook-Pro sessionlogoutd[19895]: DEAD_PROCESS: 225 console
## Apr 10 10:20:25 Users-MacBook-Pro login[715]: DEAD_PROCESS: 715 ttys000
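When only the number of matching entries is needed, grep -c prints the count directly, with no extra pipeline stage. A sketch against a small synthetic log file (hypothetical path and entries):

```shell
# Build a small synthetic log (hypothetical entries, for illustration):
printf 'login[1]: DEAD_PROCESS: 1 ttys000\nlogin[2]: USER_PROCESS: 2 ttys001\nlogin[3]: DEAD_PROCESS: 3 ttys002\n' > /tmp/demo_system.log
# grep -c prints the number of matching lines instead of the lines:
grep -c "DEAD_PROCESS" /tmp/demo_system.log
# 2
```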
Data Manipulation
Example
cut -d':' -f1 data/raw/roxanne.txt | sort | uniq extracts the first colon-delimited field from each line of data/raw/roxanne.txt, sorts the lines alphabetically, and removes adjacent duplicates (which, after sorting, means all duplicates).
cut -d':' -f1 data/raw/roxanne.txt | sort | uniq
# I have to tell you just how I feel
# I know my mind is made up
# I loved you since I knew you
# I won't share you with another boy
# I wouldn't talk down to you
# It's a bad way
# Ro...
# Roxanne
# Roxanne (Put on the red light)
# Roxanne (You don't have to put on the red light)
# So put away your make up
# Those days are over
# Told you once I won't tell you again
# Walk the streets for money
# You don't care if it's wrong or if it's right
# You don't have to put on the red light
# You don't have to sell your body to the night
# You don't have to wear that dress tonight
This sequence is an example of performing data extraction and deduplication.
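A common extension of this idiom tallies how often each line occurs, by inserting uniq -c and a final numeric sort. A minimal sketch with inline sample lines supplied by printf:

```shell
# sort groups identical lines together, uniq -c collapses each group to
# one line prefixed with its count, and sort -rn orders the result
# most-frequent-first (a classic "frequency table" pipeline):
printf 'Roxanne\nRoxanne\nThose days are over\n' | sort | uniq -c | sort -rn
```

The first output line pairs the count 2 with "Roxanne"; the exact column padding of uniq -c varies between implementations.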
Pipes with Loops
The example below demonstrates how to combine a while loop with pipes, using find, echo, grep, and wc.
Filter with find
find data -name "*.tsv" starts in the data directory, looking for all files that end with the .tsv extension. The search is recursive, so it includes all subdirectories of data as well. It produces a list of paths to .tsv files, one per line, which is piped to the next command.
find data -name "*.tsv"
# data/pwrds.tsv
# data/music_vids.tsv
# data/vg_hof.tsv
# data/trees.tsv
# data/wu_tang.tsv
Iterate with while and do
| while read fname; do: The pipe (|) feeds the output of the find command into a while loop, which reads each line (file name) into the variable fname, one at a time. For each iteration of the loop (i.e., for each file name read into fname), the commands within the do ... done block are executed.
find data -name "*.tsv" | while read fname; do
# do this!
done
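A note of caution: a bare read strips leading whitespace and treats backslashes as escape characters, which can mangle unusual file names. The more defensive form while IFS= read -r fname avoids both. A minimal sketch, with printf standing in for find's output:

```shell
# IFS= preserves leading/trailing whitespace in each line, and -r stops
# read from interpreting backslashes as escapes:
printf 'data/a.tsv\ndata/b.tsv\n' | while IFS= read -r fname; do
  echo "processing: $fname"
done
# processing: data/a.tsv
# processing: data/b.tsv
```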
Print with echo
echo -n "$fname: ": Prints the name of the file currently being processed. echo -n outputs the value of fname (the path to the current .tsv file) followed by a colon and a space, without adding a newline at the end. This means the count returned by wc will be printed on the same line, right after the file name.
find data -name "*.tsv" | while read fname; do
echo -n "$fname: "
done
# data/pwrds.tsv: data/music_vids.tsv: data/vg_hof.tsv: data/trees.tsv: data/wu_tang.tsv:
Search with grep
grep "RZA" "$fname": Searches for a specific pattern within the file. grep looks through the contents of the file whose path is in fname for lines containing the string "RZA". Only the matching lines are printed to stdout, which is then piped to wc.
find data -name "*.tsv" | while read fname; do
echo -n "$fname: "
grep "RZA" "$fname"
done
# data/pwrds.tsv: data/music_vids.tsv: data/vg_hof.tsv: data/trees.tsv: data/wu_tang.tsv: RZA Robert Diggs
Count with wc
wc: For each file processed by the loop, wc outputs three numbers: the line, word, and character (byte) counts of the lines that grep found to contain "RZA". Since no option is given, wc defaults to displaying all three counts.
find data -name "*.tsv" | while read fname; do
echo -n "$fname: "
grep "RZA" "$fname" | wc
done
# data/pwrds.tsv: 0 0 0
# data/music_vids.tsv: 0 0 0
# data/vg_hof.tsv: 0 0 0
# data/trees.tsv: 0 0 0
# data/wu_tang.tsv: 1 3 17
This Bash command sequence combines find, a while loop, echo, grep, and wc to search .tsv (tab-separated values) files for lines containing a specific pattern ("RZA") and report the line, word, and character counts of the matches in each file. Combining pipelines with loops is an efficient way to sift through a potentially large set of files in a directory, aggregating results for a given condition across many files.
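As a variation on the loop above, grep -c prints only the number of matching lines, so the trailing wc stage can be dropped when the line count is all that's needed. A sketch using a hypothetical scratch directory (/tmp/tsv_demo, standing in for data):

```shell
# Hypothetical scratch directory with two small .tsv files:
mkdir -p /tmp/tsv_demo
printf 'RZA\tRobert Diggs\n' > /tmp/tsv_demo/wu_tang.tsv
printf 'Oak\tQuercus\n'      > /tmp/tsv_demo/trees.tsv
# grep -c replaces the "grep ... | wc -l" stage:
find /tmp/tsv_demo -name "*.tsv" | while read fname; do
  echo -n "$fname: "
  grep -c "RZA" "$fname"
done
```

find's output order is unspecified, but each output line pairs a path with its match count, e.g. /tmp/tsv_demo/wu_tang.tsv: 1.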
Recap
Pipes (|) allow the output (stdout) of one command to be used as the input (stdin) of another, enabling commands to be chained into pipelines that perform complex tasks. Unix pipes embody the concept of composability, enabling users to build complex workflows out of simple, single-purpose programs. They are a testament to the flexibility and power of the Unix command line, supporting a wide range of tasks from simple text processing to sophisticated data analysis and system monitoring.
This framework of commands, arguments, options, and the interplay of input (stdin), output (stdout), and pipes enables sophisticated data processing and manipulation directly from the terminal.
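One subtlety worth knowing when using pipelines in scripts: a pipeline's exit status is normally that of its last command, so a failure in an earlier stage can go unnoticed. In bash, the pipefail option makes the pipeline report such failures (a sketch, assuming bash is available):

```shell
# By default the pipeline's status comes from its LAST command (true):
bash -c 'false | true; echo "exit status: $?"'
# exit status: 0
# With pipefail, a failure in ANY stage makes the whole pipeline fail:
bash -c 'set -o pipefail; false | true; echo "exit status: $?"'
# exit status: 1
```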
1. tail -f /var/log/syslog | grep sshd is useful for real-time monitoring of SSH daemon logs.↩︎