In computer science, a union is a data structure that stores one of several types of data at a single location. There are only two safe ways of accessing a union object. One is to always read the field of a union most recently assigned; tagged unions enforce this restriction. The other is to only access functionality common to all types in the union. For example, if the fields are all subtypes of a common supertype, then it is always legal to perform operations on the union object that one can perform on the supertype.
The remainder of this article will refer strictly to primitive untagged unions, as opposed to tagged unions.
Because of the limitations of their use, untagged unions are generally only provided in untyped languages or in an unsafe way (as in C). They have the advantage over simple tagged unions of not requiring space to store the tag. Most type inference algorithms cannot cope with untagged union types.
The name "union" stems from the type's formal definition. If one sees a type as the set of all values that type can take on, a union type is simply the mathematical union of its constituent types, since it can take on any value that any of its elements can. Also, because a mathematical union discards duplicates, if more than one element of the union can take on a single common value, it is impossible to tell from the value alone which element was last written.
Unions in various programming languages
In C and C++, untagged unions are expressed nearly exactly like structures (structs), except that each data member begins at the same location in memory. The data members, as in structures, need not be primitive values, and in fact may be structures or even other unions. A union object occupies at least as much space as its largest member, whereas a structure requires space equal to at least the sum of the sizes of its members. This saving in space, while valuable in certain circumstances, comes at a great cost in safety: the program logic must ensure that it only reads the field most recently written along all possible execution paths.
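The size relationship described above can be checked with a minimal sketch. The type names as_struct and as_union and both helper functions are hypothetical, chosen only for illustration; the exact sizes depend on the platform's padding and alignment rules, so the code compares sizes rather than printing fixed numbers.

```c
#include <assert.h>

/* Hypothetical types: the same three members, once as a struct
   and once as a union. */
struct as_struct { char c; int i; double d; };
union  as_union  { char c; int i; double d; };

/* Returns 1 if the union is at least as large as its largest member
   and no larger than the struct that holds all three members. */
int union_is_compact(void)
{
    return sizeof(union as_union) >= sizeof(double)
        && sizeof(union as_union) <= sizeof(struct as_struct);
}

/* Returns 1 if the struct needs at least the sum of its members' sizes. */
int struct_holds_all(void)
{
    return sizeof(struct as_struct)
        >= sizeof(char) + sizeof(int) + sizeof(double);
}

/* Convenience check that aborts if either property fails. */
void check_sizes(void)
{
    assert(union_is_compact());
    assert(struct_holds_all());
}
```

Calling check_sizes() from a test or from main is enough to confirm both properties on a given compiler.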
One common C programming idiom uses unions to perform what C++ calls a reinterpret_cast, by assigning to one field of a union and reading from another, as is done in code which depends on the raw representation of the values. This is not, however, a safe use of unions in general.
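As a hedged illustration of this idiom, the sketch below reads the bit pattern of a float through a union and, for comparison, through memcpy, which copies the object representation without reading an unwritten member. The function names are hypothetical, and the code assumes that float and uint32_t have the same size, which is common but not guaranteed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write one member, read the other.
   Assumes sizeof(float) == sizeof(uint32_t). */
uint32_t bits_via_union(float f)
{
    union { float f; uint32_t u; } pun;
    assert(sizeof(float) == sizeof(uint32_t));
    pun.f = f;
    return pun.u;   /* reads a member other than the one last written */
}

/* Same result obtained by copying the bytes of the float. */
uint32_t bits_via_memcpy(float f)
{
    uint32_t u;
    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(&u, &f, sizeof u);
    return u;
}
```

On an implementation where the union read is accepted, both functions return the same value for the same argument; the memcpy version is the one that also stays valid when the code is compiled as C++.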
Note that the safer tagged unions can be constructed from untagged unions (see tagged union). The safe C dialect Cyclone encourages the use of tagged unions in preference to untagged ones.
- Structure and union specifiers have the same form. [ . . . ] The size of a union is sufficient to contain the largest of its members. The value of at most one of the members can be stored in a union object at any time. A pointer to a union object, suitably converted, points to each of its members (or if a member is a bit-field, then to the unit in which it resides), and vice versa.
- - ANSI/ISO 9899:1990 (the ANSI C standard), section 6.5.2.1
In the C programming language, a union is a data type that allows differently typed objects to be treated as a single object. In other words, a union object may hold a value of one of several types, depending on the context.
A union declaration consists of a list of fields, which usually have different types. The value of a union can be any one of the fields, unlike a C struct, which stores all of its fields at once. A union object therefore needs only enough space to store the largest of the individual types (although there might be extra "padding" that is essentially irrelevant to the programmer). All the fields of a union object have the same address as the union itself.
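The shared-address property can be observed directly. The following is a small sketch, with a hypothetical union named number and a hypothetical helper function; it compares the address of the union with the addresses of its members after converting them to a common pointer type.

```c
#include <assert.h>

/* Hypothetical union used only for this illustration. */
union number { int a; float b; };

/* Returns 1 if both members start at the address of the union itself. */
int members_share_address(void)
{
    union number n;
    return (void *)&n == (void *)&n.a
        && (void *)&n == (void *)&n.b;
}

/* Returns 1 if the union is large enough for each of its members. */
int union_fits_members(void)
{
    return sizeof(union number) >= sizeof(int)
        && sizeof(union number) >= sizeof(float);
}
```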
For example:
- union {
- int a;
- float b;
- } u;
defines a union variable called u, which has an integer component, accessed as u.a, and a floating-point component, accessed as u.b. Either field may be used at any time, but only one is likely to be meaningful. Because they occupy the same storage, writing u.a also changes the value of u.b. The meaning of the new value of u.b depends on the implementation: the 1990 C standard makes the result of reading u.b after u.a was last written implementation-defined, later revisions of the C standard specify that the stored bytes are reinterpreted as a float (which may yield a trap representation), and in C++ such a read is undefined behavior. In practice, programmers sometimes rely on this behavior to inspect or reinterpret the raw representation of a value.
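The overlap between u.a and u.b can be made visible without interpreting the overwritten float at all. The sketch below redeclares the union u from the example and uses a hypothetical helper that snapshots the bytes of u.b before and after a write to u.a; the values 1.0f and 42 are arbitrary choices for illustration.

```c
#include <assert.h>
#include <string.h>

/* The union from the example above. */
union { int a; float b; } u;

/* Returns 1 if writing u.a changed the bytes that make up u.b. */
int write_a_changes_b(void)
{
    unsigned char before[sizeof u.b];
    unsigned char after[sizeof u.b];

    u.b = 1.0f;
    memcpy(before, &u.b, sizeof u.b);  /* bytes of u.b while it is meaningful */

    u.a = 42;                          /* overwrites the shared storage */
    memcpy(after, &u.b, sizeof u.b);   /* same bytes, now holding part of u.a */

    return memcmp(before, after, sizeof u.b) != 0;
}

/* Convenience check that aborts if the two members did not overlap. */
void check_overlap(void)
{
    assert(write_a_changes_b());
}
```

Copying the bytes with memcpy sidesteps the question of what value u.b has after the write; it only shows that the storage is shared.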
The primary usefulness of a union is to conserve space, since it provides a way of letting many different types be stored in the same space. Unions also provide crude polymorphism. However, there is no checking of types, so it is up to the programmer to be sure that the proper fields are accessed in different contexts. The relevant field of a union variable is typically determined by the state of other variables.
Unions are often used in association with other types. For example:
- struct {
- int type;
- union {
- int a;
- float b;
- } u;
- } s;
wraps a struct around the union. The type field of s can be used to store the meaningful type of the union, e.g., the value might be 1 if u.a is meaningful, and 0 if u.b is meaningful. Most applications of unions do something similar.
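A minimal sketch of how such a type field might be used follows. The struct name tagged, the enum constants, and the helper functions are hypothetical; the tag values follow the convention in the text (1 when u.a is meaningful, 0 when u.b is meaningful).

```c
#include <assert.h>

/* Tag values following the convention in the text. */
enum kind { KIND_FLOAT = 0, KIND_INT = 1 };

struct tagged {
    int type;                      /* says which member of u is meaningful */
    union { int a; float b; } u;
};

/* Constructors set the tag and the matching member together. */
struct tagged make_int(int v)
{
    struct tagged s;
    s.type = KIND_INT;
    s.u.a = v;
    return s;
}

struct tagged make_float(float v)
{
    struct tagged s;
    s.type = KIND_FLOAT;
    s.u.b = v;
    return s;
}

/* Reads only the member named by the tag. */
double as_double(const struct tagged *s)
{
    switch (s->type) {
    case KIND_INT:   return (double)s->u.a;
    case KIND_FLOAT: return (double)s->u.b;
    default:         return 0.0;   /* unknown tag: nothing meaningful to read */
    }
}
```

Routing every write through a constructor and every read through a function that inspects the tag keeps the tag and the union in step, which is the discipline the untagged union alone cannot enforce.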
The name "union" sometimes confuses novice programmers. Indeed, it is something of a misnomer because a C union is different from a mathematical union. The latter collects objects together in such a way that all the objects remain intact, while a C union collects objects in such a way that only one remains intact.
See also:
Mathematical union - Set theoretic union.