
Pipe (Unix) - Definition

In UNIX and other Unix-like operating systems, a pipeline is a set of processes chained by their standard streams, so that the output of each process ("stdout") feeds directly as input ("stdin") to the next one. Filter programs are often used in this way. The concept was named by analogy to a physical pipeline.

This feature of UNIX was borrowed by other operating systems, such as Taos and MS-DOS, and eventually became the pipes and filters design pattern of software engineering.

Unix pipelines should not be confused with other data processing pipelines found in modern computer systems, although the general concept is quite similar.


Creating pipelines from the shell

Most Unix shells have a special syntax construct for the creation of pipelines. Typically, one simply writes the filter commands in sequence, separated by the ASCII vertical bar character "|" (which, for this reason, Unix users often call the "pipe character"). The shell starts the processes and arranges for the necessary connections between their standard streams (including some amount of buffer storage).
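A minimal illustration of the syntax, with printf standing in for a real data source (the three fruit names are arbitrary sample data):

```sh
#!/bin/sh
# printf produces three lines; sort orders them; head -n 1 keeps the first.
# Each "|" connects the previous command's stdout to the next command's stdin.
printf 'pear\napple\nplum\n' | sort | head -n 1
# prints: apple
```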

Error stream

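By default, the standard error streams ("stderr") of the processes are not passed on through the pipe; instead, each process's error output typically goes straight to the console (terminal), where it is interleaved with that of the other processes. However, many shells have additional syntax for changing this behaviour. In the C shell, for instance, using "|&" instead of "|" signifies that the standard error stream too should be merged with the standard output and fed to the next process; Bash (version 4 and later) accepts the same shorthand. In the Bourne shell and other POSIX shells, the equivalent is to redirect standard error into standard output before the pipe, as in "2>&1 |".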
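A small sketch of the difference, assuming a POSIX sh. The function noisy is hypothetical; it exists only to write one line to each stream, and wc -l counts how many lines actually arrive through the pipe.

```sh
#!/bin/sh
# Hypothetical command: one line to stdout, one line to stderr.
noisy() {
    echo "to stdout"
    echo "to stderr" >&2
}

# Plain pipe: only stdout reaches wc; "to stderr" still goes to the terminal.
noisy | wc -l            # counts 1 line

# Portable POSIX form: point stderr at stdout first, then pipe both.
noisy 2>&1 | wc -l       # counts 2 lines

# csh, and bash 4 or later, also accept the shorthand:  noisy |& wc -l
```

The order matters: "2>&1" must come before "|" on the left-hand command, because it duplicates stderr onto whatever stdout refers to, which by then is the pipe.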

Example

Below is an example of a pipeline that implements a kind of spell checker for the web resource indicated by a URL (http://www.wordiq.com/definition/Pipeline).

curl http://www.wordiq.com/definition/Pipeline | \
sed 's/[^a-zA-Z ]//g' | \
tr 'A-Z ' 'a-z\n' | \
grep '[a-z]' | \
sort -u | \
comm -23 - /usr/dict/words

Here is an explanation of the pipeline:

  • First the curl program obtains the HTML contents of a web page.
  • The contents of this page are piped through sed, which removes all characters which are not spaces or letters.
  • tr then changes all of the uppercase letters into their corresponding lowercase counterparts and converts the spaces in the lines of text to newlines, so that each 'word' is now on a separate line.
  • grep '[a-z]' keeps only lines that contain at least one letter, which removes the empty lines left behind by runs of consecutive spaces.
  • sort sorts the list of 'words' into alphabetical order, and removes duplicates.
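  • Finally, comm -23 prints only the words that appear in the list but not in the given dictionary file (in this case, /usr/dict/words). The "-" argument tells comm to read the list from standard input, and comm expects both of its inputs to be sorted in the same order.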
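The same stages can be tried without network access. This sketch makes several assumptions: a fixed string replaces the curl download, a tiny hypothetical dictionary in a temporary file replaces /usr/dict/words, and the function name spellcheck is invented for illustration.

```sh
#!/bin/sh
# Hypothetical helper: list the words on stdin that are missing from the
# sorted dictionary file named by $1. Same filters as the pipeline above,
# minus curl.
spellcheck() {
    sed 's/[^a-zA-Z ]//g' | \
    tr 'A-Z ' 'a-z\n' | \
    grep '[a-z]' | \
    sort -u | \
    comm -23 - "$1"
}

# Stand-in dictionary; sorting it with the same sort keeps comm happy.
dict=$(mktemp)
printf 'a\npipe\nthe\n' | sort > "$dict"

echo 'The pipe, the PIPE: a pipline!' | spellcheck "$dict"
# prints: pipline

rm -f "$dict"
```

Because every stage reads stdin and writes stdout, wrapping the chain in a function changes nothing about how the data flows; the caller's pipe simply becomes sed's input.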

Creating pipelines by program

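Pipelines can also be created under program control. On Unix-like systems, a program calls pipe() to obtain a pair of connected file descriptors, one for reading and one for writing; after fork(), each child process can use dup2() to attach one end to its standard input or output, which is essentially what the shell does when it sees "|".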
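Within a shell script, a weaker form of program control is available: the script can assemble a pipeline whose stages are only known at run time. A minimal sketch, assuming a POSIX sh; run_pipeline is a hypothetical name, and each stage is passed as a string for sh -c to run.

```sh
#!/bin/sh
# Hypothetical helper: run_pipeline CMD1 CMD2 ... CMDn behaves like
#   CMD1 | CMD2 | ... | CMDn
# The body is a subshell "( ... )" so the variable "first" stays local.
run_pipeline() (
    [ "$#" -gt 0 ] || exit 1        # no stages: exits only this subshell
    if [ "$#" -eq 1 ]; then
        sh -c "$1"                  # last stage: reads what the others produced
    else
        first=$1
        shift
        # Connect the first stage's stdout to the stdin of the remaining stages.
        sh -c "$first" | run_pipeline "$@"
    fi
)

# The stages are plain data; a script could read them from a file instead.
run_pipeline "printf 'pear\napple\nplum\n'" "sort" "head -n 2"
# prints: apple
#         pear
```

The recursion peels off one stage per call, so n stages produce n processes joined by n - 1 pipes, the same shape the shell builds from "A | B | C".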

Implementation

In most Unix-like systems, all processes of a pipeline are started at the same time, with their streams appropriately connected, and managed by the scheduler together with all other processes running on the machine. An important aspect of this, setting Unix pipes apart from other pipe implementations, is the concept of buffering: a sending program may produce 1000 bytes per second, and a receiving program may only be able to accept 100 bytes per second, but the data is held in a buffer, or queue, by the operating system so that the receiving program need not worry about dropping data on account of it being too busy to receive it. The buffer has a fixed capacity; once it is full, the operating system blocks the sender until the receiver has read some data, so a fast sender is slowed to the receiver's pace rather than losing output. Buffers also collect data from their senders as soon as it is made available, so that a sender need not finish its job, or exit, before a receiver can start its work on the product.
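Both properties described above can be observed from the shell. The sample data is arbitrary; yes and head are standard Unix utilities.

```sh
#!/bin/sh
# The sender never finishes: yes repeats its argument forever.
# head starts working as soon as lines reach the pipe, prints three,
# and exits. yes's next write then fails (SIGPIPE, or an EPIPE error
# if that signal is ignored), so the whole pipeline ends promptly.
yes hello | head -n 3

# A slow receiver: the sender's output waits in the pipe buffer
# while the receiver sleeps, and none of it is lost.
printf '1\n2\n3\n' | { sleep 1; cat; }
```

The first line only terminates because the receiver can start before the sender exits; a scheme that waited for the sender to complete would never get past yes.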

Other implementations of pipes have provided pipe-like functionality without buffering. Under MS-DOS, for example, memory limitations made buffering impractical, so when pipes were executed, a sending process would write its entire output to a file, and the receiving process would then read this file as its input, in effect using the disk as a pseudo-buffer. This provided similar functionality while keeping usage of RAM to a minimum, but required processes to complete their work before handing their output off to the receiver.
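For contrast, the file-based approach described in the paragraph above can be imitated with an explicit temporary file. This is only a sketch of those semantics, not how MS-DOS itself implemented pipes; the function name dos_pipe and the use of mktemp are assumptions of the sketch.

```sh
#!/bin/sh
# Hypothetical helper: dos_pipe PRODUCER CONSUMER imitates
# "PRODUCER | CONSUMER" the file-based way: the producer runs to
# completion with its output on disk, and only then does the consumer
# start, reading that file as its standard input.
dos_pipe() {
    dos_tmp=$(mktemp) || return 1
    sh -c "$1" > "$dos_tmp"         # step 1: entire output lands in the file
    sh -c "$2" < "$dos_tmp"         # step 2: consumer reads the file
    dos_status=$?
    rm -f "$dos_tmp"
    return "$dos_status"
}

dos_pipe "printf 'pear\napple\n'" "sort"
# prints: apple
#         pear

# Unlike a real pipe, dos_pipe "yes" "head -n 3" would never finish:
# yes never completes, so step 2 is never reached and the file keeps growing.
```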

Tools like netcat can connect pipes to TCP/IP sockets, following the Unix philosophy of "everything is a file".

History

The pipeline concept and the vertical-bar notation were invented by Douglas McIlroy, one of the authors of the early command shells, after he noticed that much of the time the shells were processing the output of one program as the input to another. The idea was eventually ported to other operating systems, such as DOS, OS/2, Windows NT, BeOS, and Mac OS X, often with the same notation.

Copyright 2009 WordIQ.com
This article is licensed under the GNU Free Documentation License. It uses material from this Wikipedia article.