Pipes

Published

2024-08-28

Caution

This section is being revised. Thank you for your patience.

The Unix pipe, denoted by the vertical bar |, is a powerful feature of Unix and Unix-like operating systems that allows the output of one command (stdout) to be used as the input to another (stdin). This capability forms the basis of the Unix philosophy of building small, modular utilities that do one thing well and connecting them together to perform complex tasks.

Fundamental Concept

The pipe is placed between two commands and directs the standard output (stdout) of the command to the left of the pipe to the standard input (stdin) of the command to the right.


  • stdin (standard input) is a text stream from which a command reads its input. By default, it’s the keyboard, but it can be redirected to read from a file or another command’s output.

  • stdout (standard output) is a text stream where a command writes its output. Typically, this is the terminal screen, but it can be redirected to a file or another command’s input.
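
As a minimal sketch of the redirections described above (the scratch file created by mktemp and the variable names are illustrative assumptions, not part of the original examples):

```shell
# Hypothetical scratch file used only for this illustration
tmpfile=$(mktemp)

# stdout redirected to a file instead of the terminal
printf 'one two three\n' > "$tmpfile"

# stdin redirected from the file instead of the keyboard
# (tr -d ' ' strips the padding some wc implementations add)
from_file=$(wc -w < "$tmpfile" | tr -d ' ')

# stdin supplied by another command's stdout through a pipe
from_pipe=$(printf 'one two three\n' | wc -w | tr -d ' ')

echo "from file: $from_file"
echo "from pipe: $from_pipe"
# from file: 3
# from pipe: 3

rm -f "$tmpfile"
```

In both cases wc reads the same three words; only the source of its stdin differs.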

Example

echo "Hello, World!" | wc -w sends the output of the echo command to wc, which then counts the words.

echo "Hello, World!" | wc -w
#        2

The output is 2, indicating there are two words in “Hello, World!”.

Combining Pipes

Commands can be chained together using multiple pipes, allowing for the creation of command pipelines where data is processed in stages.

Example

ps aux | grep httpd lists all processes, then filters for lines containing “httpd” (the web server daemon):

ps aux | grep httpd
# mjfrigaard       23601   0.0  0.0 33597016    632   ??  S     6:44AM   0:00.00 grep httpd
# mjfrigaard       23599   0.0  0.0 33599596    928   ??  S     6:44AM   0:00.01 bash -c ps aux | grep httpd
# mjfrigaard       23598   0.0  0.0 33598572    932   ??  S     6:44AM   0:00.01 sh -c 'bash'  -c 'ps aux | grep httpd' 2>&1
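
Note that the only matches in the output above are the grep command itself and the shells that launched it, because their command lines also contain “httpd”. One common workaround, sketched here on canned sample lines rather than live ps output (the sample lines are made up for illustration), is to wrap one character of the pattern in brackets. The regular expression [h]ttpd still matches the text “httpd”, but it does not match the literal text “[h]ttpd” that appears in grep's own command line.

```shell
# Made-up stand-in for `ps aux` output: one real server process and the
# grep process that is doing the searching.
sample='root  101 /usr/sbin/httpd -D FOREGROUND
user  202 grep [h]ttpd'

# [h]ttpd matches the text "httpd" but not the text "[h]ttpd"
matches=$(printf '%s\n' "$sample" | grep '[h]ttpd')
echo "$matches"
# root  101 /usr/sbin/httpd -D FOREGROUND
```

Against a live system, the same idea would be written as ps aux | grep '[h]ttpd'.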

Example

wc -l counts the number of lines that grep passes through:

ps aux | grep httpd | wc -l
#        3

Filtering and Processing

Example 1

cat data/raw/roxanne.txt | grep "night" displays the lines from data/raw/roxanne.txt that contain the string "night".

cat data/raw/roxanne.txt | grep "night"
# You don't have to sell your body to the night
# You don't have to wear that dress tonight

Here, cat outputs the file’s contents, which grep filters.

Example 2

ls -l data/raw | sort -r lists the files in data/raw in a detailed format, then sorts the lines in reverse order.

ls -l data/raw | sort -r
# total 128
# -rw-r--r--@ 1 mjfrigaard  staff  12531 Apr 13 20:39 ajperlis_epigrams.txt
# -rw-r--r--@ 1 mjfrigaard  staff  12057 Apr 22 14:11 pwrds.csv
# -rw-r--r--@ 1 mjfrigaard  staff   6452 May 15 21:48 music_vids.csv
# -rw-r--r--@ 1 mjfrigaard  staff   4828 May 15 21:49 vg_hof.csv
# -rw-r--r--@ 1 mjfrigaard  staff   4462 May 15 21:48 trees.csv
# -rw-r--r--@ 1 mjfrigaard  staff   1315 Apr  6 05:38 roxanne.txt
# -rw-r--r--@ 1 mjfrigaard  staff    381 May 15 21:55 who_tb_data.csv
# -rw-r--r--@ 1 mjfrigaard  staff    263 Apr 10 09:34 wu_tang.csv

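Here sort -r compares whole lines as text and reverses the order. Because the leading columns are identical and the sizes are right-aligned, the files in this listing end up ordered from largest to smallest, with the total line first.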

Transformation and Reduction

Example

find data -type f | xargs du -sh | sort -h finds files (-type f) in the data directory and its subdirectories, calculates their sizes (du -sh), and sorts them by size (sort -h):

find data -type f | xargs du -sh | sort -h
# 4.0K  data/raw/roxanne.txt
# 4.0K  data/raw/who_tb_data.csv
# 4.0K  data/raw/wu_tang.csv
# 4.0K  data/wu_tang.psv
# 4.0K  data/wu_tang.tsv
# 8.0K  data/music_vids.tsv
# 8.0K  data/raw/music_vids.csv
# 8.0K  data/raw/trees.csv
# 8.0K  data/raw/vg_hof.csv
# 8.0K  data/trees.tsv
# 8.0K  data/vg_hof.tsv
#  12K  data/README.md
#  12K  data/pwrds.tsv
#  12K  data/raw/pwrds.csv
#  16K  data/raw/ajperlis_epigrams.txt

This pipeline not only identifies files but also sorts them by their disk usage, illustrating a complex operation made simple through pipes.
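
One caveat: plain xargs splits its input on whitespace, so a file name containing a space would be passed to du as two separate arguments. A hedged sketch of the usual workaround follows, using the -print0 and -0 options available in the GNU and BSD/macOS versions of find and xargs. The scratch directory and file names are invented for the demonstration.

```shell
# Hypothetical scratch directory with one ordinary name and one with a space
workdir=$(mktemp -d)
printf 'small\n' > "$workdir/plain.txt"
printf 'also small\n' > "$workdir/with space.txt"

# -print0 ends each path with a NUL byte; xargs -0 splits on NUL only,
# so "with space.txt" reaches du as a single argument
sizes=$(find "$workdir" -type f -print0 | xargs -0 du -sh | sort -h)
echo "$sizes"

# One line of du output per file is expected
line_count=$(printf '%s\n' "$sizes" | wc -l | tr -d ' ')
echo "files reported: $line_count"

rm -rf "$workdir"
```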

Real-time Streaming and Monitoring

Example

cat /var/log/system.log | grep "DEAD_PROCESS" prints the system.log file and filters for lines containing DEAD_PROCESS. Because cat reads the file once and then exits, this shows only the entries present at that moment; to keep monitoring for new entries, replace cat with tail -f.[1]

cat /var/log/system.log | grep "DEAD_PROCESS"
# Apr 10 06:35:23 Users-MacBook-Pro login[3596]: DEAD_PROCESS: 3596 ttys000
# Apr 10 06:35:25 Users-MacBook-Pro sessionlogoutd[19895]: DEAD_PROCESS: 225 console
# Apr 10 10:20:25 Users-MacBook-Pro login[715]: DEAD_PROCESS: 715 ttys000

Data Manipulation

Example

cut -d':' -f1 data/raw/roxanne.txt | sort | uniq extracts the first colon-delimited field from each line in data/raw/roxanne.txt (a line with no colon is passed through whole), sorts the results alphabetically, and removes duplicates.

cut -d':' -f1 data/raw/roxanne.txt | sort | uniq
# I have to tell you just how I feel
# I know my mind is made up
# I loved you since I knew you
# I won't share you with another boy
# I wouldn't talk down to you
# It's a bad way
# Ro...
# Roxanne
# Roxanne (Put on the red light)
# Roxanne (You don't have to put on the red light)
# So put away your make up
# Those days are over
# Told you once I won't tell you again
# Walk the streets for money
# You don't care if it's wrong or if it's right
# You don't have to put on the red light
# You don't have to sell your body to the night
# You don't have to wear that dress tonight

This sequence is an example of performing data extraction and deduplication.
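
The lyric lines shown above appear to contain no colons, so cut passes them through whole. To see the field extraction itself, here is a small sketch on made-up colon-delimited records (the names and roles are illustrative sample data, not taken from the data directory):

```shell
# Illustrative sample records in name:role form
records='RZA:producer
GZA:rapper
RZA:rapper
Method Man:rapper'

# Field 1 of each colon-delimited record, sorted, with duplicates removed
names=$(printf '%s\n' "$records" | cut -d':' -f1 | sort | uniq)
echo "$names"
# GZA
# Method Man
# RZA
```

The sort step matters: uniq only collapses duplicate lines that are adjacent, so the input has to be sorted first.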

Pipes with Loops

The example below builds up, step by step, a pipeline that combines a while loop with find, echo, grep, and wc.

Filter with find

find data -name "*.tsv" starts in the data directory and looks for all files that end with the .tsv extension. The search is recursive, meaning it includes all subdirectories of data as well. It produces a list of paths to .tsv files, one path per line, and this list is piped to the next command.

find data -name "*.tsv" 
# data/pwrds.tsv
# data/music_vids.tsv
# data/vg_hof.tsv
# data/trees.tsv
# data/wu_tang.tsv

Iterate with while and do

| while read fname; do: The pipe (|) feeds the output from the find command into a while loop, which reads each line (file name) into the variable fname, one at a time. For each iteration of the loop (i.e., for each file name read into fname), the commands within the do ... done block are executed.

find data -name "*.tsv" | while read fname; do
  # do this!
done
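
A slightly more defensive variant of this loop, offered as a sketch: IFS= keeps read from trimming leading or trailing whitespace, and -r stops it from treating backslashes as escape characters. The scratch directory and file names are assumptions made for the demonstration.

```shell
# Hypothetical scratch directory: two .tsv files (one with a space) and one .txt
workdir=$(mktemp -d)
touch "$workdir/a.tsv" "$workdir/b c.tsv" "$workdir/notes.txt"

found=$(
  find "$workdir" -name "*.tsv" | while IFS= read -r fname; do
    # ${fname##*/} strips the directory part, leaving the file name
    printf 'found: %s\n' "${fname##*/}"
  done | sort
)
echo "$found"
# found: a.tsv
# found: b c.tsv

rm -rf "$workdir"
```

The trailing sort is only there to make the order predictable, since find does not guarantee one.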

Search with grep

grep "RZA" "$fname": Searches for a specific pattern within the file. grep looks through the contents of the file (whose path is in fname) for lines containing the string “RZA”. Only the lines that match this pattern are printed to stdout. Because echo -n suppresses the trailing newline and grep prints nothing for files without a match, the file names run together on one line in the output:

find data -name "*.tsv" | while read fname; do
  echo -n "$fname: "
  grep "RZA" "$fname"
done
# data/pwrds.tsv: data/music_vids.tsv: data/vg_hof.tsv: data/trees.tsv: data/wu_tang.tsv: RZA   Robert Diggs

Count with wc

wc: For each file processed by the loop, wc outputs three numbers: the line count, word count, and character/byte count of the lines that grep found to contain “RZA”. Since no specific option is given to wc, it defaults to displaying all three counts.

find data -name "*.tsv" | while read fname; do
  echo -n "$fname: "
  grep "RZA" "$fname" | wc 
done
# data/pwrds.tsv:        0       0       0
# data/music_vids.tsv:        0       0       0
# data/vg_hof.tsv:        0       0       0
# data/trees.tsv:        0       0       0
# data/wu_tang.tsv:        1       3      17

This Bash command sequence combines find, a while loop, echo, grep, and wc to search through .tsv (tab-separated values) files for lines containing a specific pattern (“RZA”), and it reports the count of matching lines, words, and characters for each file. Combining pipelines with loops is a convenient way to run the same check across a potentially large set of files in a directory and summarize the results per file.
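
If only the number of matching lines is needed, grep -c can stand in for the grep | wc pair. The sketch below assumes two throwaway .tsv files created in a scratch directory; they are sample data, not the files in the data directory.

```shell
# Hypothetical scratch directory with two small tab-separated files
workdir=$(mktemp -d)
printf 'RZA\tRobert Diggs\nGZA\tGary Grice\n' > "$workdir/members.tsv"
printf 'title\tyear\n' > "$workdir/other.tsv"

report=$(
  find "$workdir" -name "*.tsv" | sort | while IFS= read -r fname; do
    # grep -c prints the count of matching lines; it exits non-zero
    # when the count is 0, so `|| true` keeps that from looking like an error
    count=$(grep -c "RZA" "$fname" || true)
    printf '%s: %s\n' "${fname##*/}" "$count"
  done
)
echo "$report"
# members.tsv: 1
# other.tsv: 0

rm -rf "$workdir"
```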

Efficiency and Performance

While pipes are powerful, their use can affect performance, especially when processing large amounts of data. Each command in a pipeline runs as a separate process, and the data passing through each pipe is copied from one process to the next, which adds overhead.
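
One small illustration, under the assumption that the file can be read directly: many commands accept a file name (or a redirected stdin), so a leading cat and its extra process can be dropped without changing the result. The scratch file below is invented for the comparison.

```shell
# Hypothetical scratch file with three sample lines
tmpfile=$(mktemp)
printf 'to the night\nred light\ndress tonight\n' > "$tmpfile"

with_cat=$(cat "$tmpfile" | grep "night")   # two processes and a pipe
direct=$(grep "night" "$tmpfile")           # one process, same lines

if [ "$with_cat" = "$direct" ]; then
  same="yes"
else
  same="no"
fi
echo "identical output: $same"
# identical output: yes

rm -f "$tmpfile"
```

For small files the difference is negligible; it starts to matter mainly when large inputs pass through several stages.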

Error Handling

Error handling in pipes can be non-trivial, as each command in a pipeline executes independently. Users need to consider how each command handles errors and ensure that the pipeline as a whole behaves as expected even when errors occur.
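
As a hedged illustration: by default, the exit status of a pipeline is that of its last command, so an earlier failure can go unnoticed. Bash offers set -o pipefail to change this, and its PIPESTATUS array records the status of each stage. The sketch runs each case in its own bash -c so the option does not leak into the current shell; it assumes bash is installed and on the PATH.

```shell
# Default behavior: `false` fails, but the pipeline reports the status
# of the last command (`true`), which is 0.
default_status=$(bash -c 'false | true; echo $?')

# With pipefail, the pipeline fails if any command in it fails.
pipefail_status=$(bash -c 'set -o pipefail; false | true; echo $?')

# PIPESTATUS[0] holds the exit status of the first command in the pipeline.
first_stage=$(bash -c 'false | true; echo "${PIPESTATUS[0]}"')

echo "default: $default_status, pipefail: $pipefail_status, first stage: $first_stage"
# default: 0, pipefail: 1, first stage: 1
```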

Recap

Pipes (|) allow the output of one command (stdout) to be used as the input (stdin) to another, so commands can be chained to perform complex tasks. Unix pipes embody the concept of composability: users build complex workflows out of simple, single-purpose programs. They support a wide range of tasks, from simple text processing to sophisticated data analysis and system monitoring.

This framework of commands, arguments, options, and the interplay of input (stdin), output (stdout), and pipes enables sophisticated data processing and manipulation directly from the terminal.



  1. tail -f /var/log/syslog | grep sshd is useful for real-time monitoring of SSH daemon logs.