 Preprocessor - Definition 

A preprocessor is a program that takes text and performs lexical conversions on it. The conversions may include macro substitution, conditional inclusion, and inclusion of other files.

The C programming language has a preprocessor that performs the following transformations:

  1. Replaces trigraphs with equivalents.
  2. Splices each line that ends in a backslash onto the line that follows it.
  3. Replaces comments with whitespace.
  4. Reacts to lines starting with an octothorp (#), performing macro substitution, file inclusion, conditional inclusion, and other transformations.

The use of preprocessors has become less common as recent languages provide more abstract features rather than lexically oriented ones. Indeed, overuse of the preprocessor can yield quite chaotic code. In designing C++ as a new language based on C, Stroustrup introduced features such as inline functions and templates in an attempt to make the C preprocessor less relevant. Nevertheless, a large body of existing C code relies on the preprocessor.

Many recently designed languages have little or no preprocessing ability. Java has no preprocessor. D, designed as a replacement for C and C++, supports features such as imports, nested functions, versioning and debug statements that make it practical to eliminate the preprocessor entirely.

Other preprocessors include m4 and Oracle Pro*C. The m4 preprocessor is general-purpose; Oracle Pro*C converts embedded SQL (including PL/SQL blocks) into C.

Preprocessing can be quite cumbersome in incremental parsing or incremental lexical analysis because changes to preprocessing rules can affect the entire text to be preprocessed.

Most C compilers have a flag that writes out the preprocessed code (for example -E in GCC and Clang), so that the output can be inspected or run through static analysis if desired.

C Examples

This section goes into some detail about C preprocessor usage. Good programming practice when writing C macros is crucial, particularly in a collaborative setting, so notes on this have been included. Of course, it is possible to abuse these features, but this is not recommended in a production environment.

The most common use of the preprocessor is to include another file:

#include <stdio.h>

int main (void)
{
    printf("Hello, world!\n");
    return 0;
}

The preprocessor replaces the line #include <stdio.h> with the contents of the system header file of that name, which declares the printf() function.

More precisely, the entire text of the file 'stdio.h' is inserted into the file at that point.

This can also be written using double quotes, e.g. #include "stdio.h". Angle brackets were originally used to indicate 'system' include files and double quotes user-written include files, and it is good practice to retain this distinction; for the quoted form, most compilers search the directory of the including file first. C compilers and programming environments all provide a way for the programmer to specify where include files are found. This is typically a command-line flag (such as -I), which can be set in a makefile so that a different set of include files can be swapped in for different operating systems, for instance.

It is good programming practice to use the compiler parameter to define the include file paths, and poor practice to hard-code relative or absolute file names in #include directives. If your source is copied to another system with a different directory structure, such paths could 'break' your code, requiring numerous edits to get it to compile on the new system.

Conventionally, include files are given a .h extension, and the files they are included in are given the .c extension. However, there is no particular requirement that this be observed. Occasionally one will see files with other extensions included in a .c file, including other .c files.

The #if, #ifdef, #ifndef, #else, #elif and #endif directives can be used for conditional compilation.

#define WINDOWS

#ifdef WINDOWS
#include <windows.h>
#else
#include <unistd.h>
#endif

The first line defines a symbol WINDOWS. Note that the name avoids a leading double underscore: identifiers such as __WINDOWS__ are reserved for the compiler and standard library. The symbol could also be introduced from a compiler command-line parameter, so that the program could be parameterized in a makefile.

Subsequently, if WINDOWS is defined, the file <windows.h> is included; otherwise <unistd.h> is.

Note that when #define is given only a name, the macro is defined with an empty replacement list; that is enough for #ifdef to treat it as 'true'. A name defined on the compiler command line without a value (e.g. -DNAME in GCC and Clang) is conventionally given the value 1.

When #define is given a name followed by replacement text, that text is substituted for every later occurrence of the name. Good programming practice uses this to create symbolic names for constants, e.g.

#define PI 3.14159

instead of hard-coding those numbers throughout one's code.

A #define can be used to create a function-style macro:

#define RADTODEG(x) ((double)(x) * 57.29577951308232)

This defines a radians-to-degrees conversion that can then be written as, e.g., RADTODEG(34). The macro is expanded inline, so the caller does not need to repeat the multiplication constant throughout the code. C99 and C++ also provide the inline keyword, which asks the compiler to expand a function in this fashion, but this is only a 'suggestion': the compiler is free to ignore it, for example when the function is too complex.

Note that when every argument to the macro is a constant, compilers usually fold the expression into a single constant at compile time. This is a common optimization rather than a requirement of the language. When an argument is a variable, the multiplication is performed at run time.

The macro here is written as all uppercase to emphasize that it is a macro, not a compiled function.

One of the most subtle and easily abused features of the C preprocessor is token concatenation, also called token pasting. In a function-like macro, the ## operator 'glues' two tokens together into a single token, for example to build a new identifier from a macro argument. This can be used to construct elaborate macros that act much like C++ templates (without many of their benefits).

For instance:

#define MYCASE(_id,_item) \
   case _id: \
     _item##_##_id=_id;\
   break 

  switch(x) {
      MYCASE(23, widget);
  }

The line MYCASE(widget,23) gets expanded here into case 23: widget_23=23; break;.

Note that the _ between the two ## operators is a literal token, whereas _id and _item are the macro's parameters, replaced by the arguments of each invocation.

One stylistic note about this macro is that the final break has no semicolon, so each invocation is written with a trailing semicolon like an ordinary statement. The semicolon could be included in the macro definition, but then the invocations would appear without semicolons, which would throw off the casual reader.

The macro can be extended over as many lines as required by ending each line except the last with a backslash. The definition ends at the first line that does not end in a backslash.

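One caveat with multi-line macros concerns comments. Block comments (/* ... */) can be placed inside the definition, even on lines that end in a backslash, because line splicing and comment removal both happen before directives are processed. A // comment, however, extends to the end of the spliced line, so a continuation backslash after it pulls the next line into the comment and silently swallows that part of the definition. Properly used, multi-line macros can greatly reduce the size and complexity of a C program and enhance its readability and maintainability.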

The #error directive inserts an error message into the compiler output.

#error "Gosh!"

This prints the message in the compiler output and causes compilation to fail at that point. This is extremely useful if you aren't sure whether a given line is being compiled. It is also useful if you have a heavily parameterized body of code and want to make sure a particular #define has been introduced from the makefile, e.g.:

#ifdef WINDOWS
    /* Windows-specific code */
#elif defined(UNIX)
    /* Unix-specific code */
#else
    #error "Unknown operating system"
#endif

Then there is the #pragma directive, whose meaning is implementation-defined: each compiler vendor uses it for whatever purposes it wishes, and a compiler ignores pragmas it does not recognize. For instance, pragmas are used to suppress specific warning messages, manage heap and stack debugging, and so on.

Certain macros are predefined in ANSI C. Two useful ones are __FILE__ and __LINE__, which expand to the name of the current source file (as a string literal) and the current line number (as an integer constant). For instance:

// debugging macros so we can pin down message provenance at a glance
#ifndef WHERESTR
#define WHERESTR "[file %s, line %d] "
#endif
#ifndef WHEREARG
#define WHEREARG __FILE__,__LINE__
#endif

printf(WHERESTR ": hey, x=%d\n", WHEREARG,x);

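This prints the value of x, preceded by the file name and line number, making it easy to see which line produced the message. Note that the WHERESTR string literal is concatenated with the string literal that follows it: the compiler joins adjacent string literals into one, so printf receives a single format string.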


Copyright 2008 WordIQ.com
This article is licensed under the GNU Free Documentation License. It uses material from the Wikipedia article "Preprocessor".