In the study of vision, visual short-term memory (VSTM) is one of three broad memory systems, alongside iconic memory and long-term memory. VSTM is similar to short-term memory, but limited to information within the visual domain. It is also similar to the visuospatial sketchpad, a component of the working memory model proposed by Alan Baddeley. Whereas iconic memories are fragile, decay rapidly, and cannot be actively maintained, visual short-term memories are robust to subsequent stimuli and last over many seconds.
Overview
The introduction of stimuli that were hard to verbalize, and unlikely to be held in long-term memory, revolutionized the study of visual short-term memory (VSTM) in the early 1970s (Cermak, 1971; Phillips, 1974; Phillips & Baddeley, 1971). The basic experimental technique required observers to indicate whether two matrices (Phillips, 1974; Phillips & Baddeley, 1971) or figures (Cermak, 1971), separated by a short temporal interval, were the same. Observers could report that a change had occurred at levels significantly above chance. This indicated that they were able to encode some aspect of the first stimulus in a purely visual store, at least until the second stimulus was presented. However, because the stimuli used were complex and the nature of the change was relatively uncontrolled, these experiments left several questions open: (1) whether only a subset of the perceptual dimensions comprising a visual stimulus (e.g., spatial frequency, luminance, or contrast) is stored; (2) whether some perceptual dimensions are maintained in VSTM with greater fidelity than others; and (3) how these dimensions are encoded (i.e., are perceptual dimensions encoded within separate, parallel channels, or are all perceptual dimensions stored as a single bound entity within VSTM?).
Psychophysical approaches to VSTM
In a typical psychophysical VSTM experiment, observers' ability to discriminate between sequentially presented test and reference patterns is measured using a two-interval forced-choice (2-IFC) paradigm. For example, in a study involving spatial frequency, observers might be asked to judge whether the first or the second pattern had the higher (or lower) spatial frequency. Typically the test and reference patterns are separated by interstimulus intervals (ISIs) in the range of 0 s to 30 s. The properties of stimulus pairs can be controlled using a psychophysical staircase procedure or the method of constant stimuli (for details of these techniques, see Regan, 2000). When a staircase procedure is used, the properties of the stimulus pairs are adjusted until a criterion threshold level of performance is achieved (e.g., 75% correct).
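The staircase idea can be illustrated with a minimal Python sketch that targets the 75%-correct level mentioned above. It is not the procedure used in any of the studies cited here. It uses a weighted up-down rule: after an error the stimulus difference increases by one step, and after a correct response it decreases by one third of that step, so the track settles where P(correct) = 0.75. The sketch runs against a simulated 2-IFC observer. That observer's Weibull psychometric function, its parameters (alpha = 0.05, beta = 3), and all function names are hypothetical.

```python
import math
import random


def p_correct(delta, alpha=0.05, beta=3.0):
    """Hypothetical 2-IFC psychometric function (Weibull with a 50% guessing floor).

    `delta` is the stimulus difference, e.g. a Weber fraction for spatial frequency.
    """
    if delta <= 0:
        return 0.5
    return 1.0 - 0.5 * math.exp(-((delta / alpha) ** beta))


def weighted_staircase(n_trials=2000, start=0.1, step_up=0.004,
                       target=0.75, n_discard=4, seed=1):
    """Run a weighted up-down staircase against the simulated observer.

    step_down is chosen so that target * step_down == (1 - target) * step_up,
    which makes the expected change zero where P(correct) == target.
    Returns the mean stimulus level at reversals, skipping the first few.
    """
    rng = random.Random(seed)
    step_down = step_up * (1.0 - target) / target
    delta = start
    last_direction = 0
    reversals = []
    for _ in range(n_trials):
        correct = rng.random() < p_correct(delta)
        direction = -1 if correct else 1
        if last_direction != 0 and direction != last_direction:
            reversals.append(delta)              # the track changed direction here
        last_direction = direction
        if correct:
            delta = max(0.0, delta - step_down)  # correct: make the task harder
        else:
            delta += step_up                     # error: make the task easier
    kept = reversals[n_discard:]
    return sum(kept) / len(kept)


true_75 = 0.05 * math.log(2) ** (1 / 3)  # level where p_correct == 0.75, about 0.044
print(round(weighted_staircase(), 4), round(true_75, 4))
```

The staircase estimate should fall close to the true 75% point of the simulated observer. A simple 1-up/2-down rule would instead converge on about 70.7% correct.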
Fidelity of memory representations
A series of studies over the last decade and a half (for good reviews, see Magnussen, 2000; Magnussen & Greenlee, 1999) have demonstrated that observers can store various perceptual dimensions (e.g., spatial frequency, orientation, hue) within VSTM with a remarkable degree of fidelity and stability (Magnussen & Greenlee, 1992; Magnussen, Greenlee, Asplund, & Dyrnes, 1991; Magnussen, Idas, & Myhre, 1998; Regan, 1985).
For instance, Regan (1985) showed that with a reference frequency of 10 c/deg, spatial frequency thresholds (measured as Weber fractions, Δf/f) tested with ISIs of up to several seconds differ by only three to six percent from those recorded when the gratings are presented simultaneously. At 10 c/deg the grating period is 360 arc sec, so a threshold of 0.04 Δf/f implies that observers are able to distinguish period differences of 14.4 arc sec (Magnussen & Greenlee, 1999). Because this is approximately half the average cone spacing in the fovea, it implies that observers are able to store spatial frequency information within the hyperacuity range for upwards of 60 s (Bennett & Cortese, 1996).
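The arithmetic behind the 14.4 arc sec figure can be made explicit in a minimal Python sketch. It assumes the small-change approximation |Δp|/p ≈ Δf/f, where a grating's period p is the reciprocal of its spatial frequency f. The function names are illustrative only.

```python
def period_arcsec(freq_cpd):
    """Period of a grating in arc seconds, given its spatial frequency in cycles/deg."""
    return 3600.0 / freq_cpd  # 1 deg = 3600 arc sec


def period_threshold_arcsec(freq_cpd, weber_fraction):
    """Period difference implied by a spatial frequency threshold expressed as Δf/f.

    Uses the small-change approximation |Δp|/p ≈ Δf/f, since p = 1/f.
    """
    return weber_fraction * period_arcsec(freq_cpd)


print(period_arcsec(10))                            # 360.0
print(round(period_threshold_arcsec(10, 0.04), 2))  # 14.4
```

At a lower reference frequency the same Weber fraction corresponds to a larger period difference. At 5 c/deg, for example, it corresponds to 28.8 arc sec.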
A series of psychophysical studies have found that many perceptual dimensions
(e.g., spatial frequency, hue, orientation, speed) are stored with little or no loss in
VSTM. As already mentioned, spatial frequency can be stored for upwards of 60 s
with no increase in thresholds (Bennett & Cortese, 1996; Magnussen & Greenlee,
1997; Magnussen et al., 1991; Magnussen, Greenlee, & Thomas, 1996; Regan,
1985). Other studies have shown that colour (Nilsson & Nelson, 1981), speed
(Magnussen & Greenlee, 1992), and orientation (Magnussen et al., 1998), are also
stored in VSTM for upwards of 10 s with no significant decay.
The one notable exception to this rule is contrast. Several studies have shown that
thresholds for contrast discrimination grow rapidly as ISIs increase, with thresholds
doubling as ISIs are raised from 0 s to 10 s (Lee & Harris, 1996; Magnussen et al.,
1991; Magnussen et al., 1996). This appears to be due to a loss of information about contrast over the retention interval. As a result, a change in contrast becomes progressively less likely to be detected at longer ISIs. The decay in contrast information is likely to underlie the apparent decay of information in VSTM experiments using matrix patterns (e.g., Phillips, 1974; Phillips & Baddeley, 1971).
Structure of memory representations
With the exception of contrast, basic perceptual attributes are similar in terms of
both the accuracy and stability with which they are stored in VSTM. However, it is
unclear whether the information from each perceptual stream is encoded separately
within parallel channels, or whether information for different perceptual dimensions
is represented within VSTM as a single, bound set of features. Two different lines of
evidence – one derived from the experimental paradigm known as memory
masking, the other associated with the differential effects observed for decisions
made either within or between perceptual dimensions – suggest that VSTM stores
information within multiple parallel perceptual channels.
Memory masking
Memory masking refers to an experimental technique in which the addition of a
"masking" grating, placed between the reference and test stimuli in a
psychophysical VSTM experiment (Bennett & Cortese, 1996; Magnussen &
Greenlee, 1992; Magnussen et al., 1991), leads to an increase in psychophysical
thresholds. It is important to note, however, that the term "masking" is somewhat misleading here. The additional grating is placed so far in time from the test and reference stimuli that it acts as neither a pattern mask nor an energy mask for either of them (Breitmeyer, 1984).
If the masking grating matches the test or reference grating in the perceptual
dimension being discriminated, no increase in threshold is observed relative to a no-mask
control condition. However, the more the masking stimulus differs from the
reference grating on the dimension being discriminated, the more thresholds
increase, until thresholds are approximately double those recorded in the absence
of a mask (Bennett & Cortese, 1996; Magnussen & Greenlee, 1992; Magnussen et
al., 1991).
The short presentation times of the masking stimuli (e.g., 200 ms) and the relatively long intervals separating the mask from both the test and reference stimuli (e.g., Magnussen et al., 1991) argue against spatial adaptation (e.g., Blakemore & Campbell, 1969) as the explanation for the increase in thresholds caused by the mask.
Another feature of the memory-masking paradigm is that the effects of the mask are
specific to the type of discrimination being made. For instance, when performing a
spatial frequency judgment, the orientation of the masking grating has no effect on
threshold levels. Likewise, the spatial frequency of the masking grating does not
alter thresholds obtained when orientation is being discriminated (Magnussen et al.,
1991). This specificity of the masking effect on thresholds is evidence against its
being mediated either through distracting the observer, or by adding an additional
non-specific burden to memory. Since orientation and spatial frequency are conjointly coded early in the visual system (DeValois & DeValois, 1990), this result suggests that the neurophysiological site affected by memory masking lies beyond V1, at a locus where orientation and spatial frequency information are coded in independent perceptual channels. This argument is supported by the finding that masking by spatial frequency follows perceptual rather than retinal coordinates (Bennett & Cortese, 1996). Size constancy is also thought to arise beyond V1 in the visual processing hierarchy (Magnussen, 2000), perhaps in V4 (see, for instance, Schiller, 1995).
Dual discrimination costs
It is well established that observers are able to make independent decisions about
multiple stimulus dimensions (e.g., spatial frequency, contrast, orientation) with little
or no cost (e.g., Chua, 1990; Greenlee & Thomas, 1993; Vincent & Regan, 1995).
These studies support the view that spatial frequency, orientation, and contrast are
encoded within independent, parallel channels. Since individual neurons in striate
cortex conjointly code spatial frequency and orientation (DeValois & DeValois,
1990), these channels are likely to exist at a point later in the visual processing
hierarchy than V1.
The memory masking literature supports the view that different perceptual
properties are encoded independently within parallel channels. This view is further
supported by evidence from experiments examining the costs of making dual
decisions for attributes that are encoded either within the same or between different
perceptual channels (Greenlee & Thomas, 1993; Magnussen & Greenlee, 1997;
Magnussen et al., 1996; Thomas, Magnussen, & Greenlee, 2000).
The Greenlee-Thomas model assumes that different perceptual dimensions (e.g.,
spatial frequency, contrast, orientation, movement) are encoded within independent
channels (Greenlee & Thomas, 1993). According to this model, making dual
judgments about different perceptual dimensions will lead to only a moderate
increase in threshold, associated with the increased uncertainty of making two
independent judgments (i.e., decision-noise). However, if the two judgments made
are not independent – as might be expected if observers were required to make two
decisions which draw on the same limited resource – thresholds are predicted to
increase to a greater extent than can be explained on the basis of decision-noise
alone.
Magnussen and Greenlee (1997) performed a series of VSTM experiments in which
the relative costs associated with making dual discriminations within and between
stimulus dimensions were compared. Their results can be summarized as follows: (1)
when making discriminations regarding both contrast and spatial frequency,
observers' thresholds rise by an amount predicted by the additional uncertainty in
making two independent decisions; (2) when judgments are made within the same
perceptual dimension, there is a much greater increase in associated thresholds
than predicted on the basis of the increased uncertainty associated with making
multiple independent decisions, suggesting that these judgments are not made
independently (i.e., that they draw on the same limited resource).
Summary of results from psychophysical experiments
Psychophysical experiments in VSTM suggest that most perceptual dimensions
(e.g., spatial frequency, orientation, colour, speed) are stored with remarkable
fidelity over relatively long periods of time (Magnussen, 2000). The one exception to
this rule is contrast, which has been shown to decay rapidly in VSTM (Lee & Harris,
1996).
Converging evidence, drawn both from experiments using memory masking
(Magnussen & Greenlee, 1992), and from a comparison of single and dual-discrimination
costs (Magnussen & Greenlee, 1997), suggests that information is
encoded in VSTM in the form of multiple independent channels, each channel
representing a different perceptual dimension. Further evidence suggests that this
information is encoded at a level in the visual hierarchy later than V1 (e.g., Bennett
& Cortese, 1996).
Set-size effects in VSTM
In a typical set-size VSTM experiment, observers are presented with two arrays, each composed of a number of stimuli. The two arrays are separated by a short temporal interval. The observer's task is to decide whether both arrays contain identical stimuli or whether one item differs between the two displays (e.g., Luck & Vogel, 1997). Increasing the number of stimuli in the two arrays leads to a monotonic decrease in observers' sensitivity to differences between the two arrays (Luck & Vogel, 1997; Pashler, 1988).
There are a number of frameworks that attempt to explain the effect of increasing
set-size on performance in VSTM. These can be broadly grouped under three
categories: (1) psychophysical frameworks (e.g., Magnussen & Greenlee, 1997); (2)
sample size models (e.g., Palmer, 1990); and (3) urn models (e.g.,
Pashler, 1988).
Problems with psychophysical explanations
Psychophysical experiments suggest that information
is encoded in VSTM across multiple parallel channels, each channel associated
with a particular perceptual attribute (Magnussen, 2000). Within this framework, a
decrease in an observer's ability to detect a change with increasing set-size can be
attributed to two different processes: (1) if decisions are made across different
channels, decreases in performance are typically small, and consistent with
decreases expected when making multiple independent decisions (Greenlee &
Thomas, 1993; Vincent & Regan, 1995); (2) if multiple decisions are made within
the same channel, the decrease in performance is much greater than expected on
the basis of increased decision-noise alone, and is attributed to interference caused
by multiple decisions within the same perceptual channel (Magnussen & Greenlee,
1997).
However, the Greenlee-Thomas model (Greenlee & Thomas, 1993) suffers from
two failings as a model for the effects of set-size in VSTM. First, it has only been
empirically tested with displays composed of one or two elements. It has been
shown repeatedly in various experimental paradigms that set-size effects differ between displays composed of a relatively small number of elements (i.e., approximately ≤ 4 items) and larger displays (i.e., approximately > 4 items). The Greenlee-Thomas (1993) model offers
no explanation for why this might be so. Second, while Magnussen, Greenlee, and
Thomas (1997) are able to use this model to predict that greater interference will be
found when dual decisions are made within the same perceptual dimension, rather
than across different perceptual dimensions, this prediction lacks quantitative rigor,
and is unable to accurately anticipate the size of the threshold increase, or give a
detailed explanation of its underlying causes.
In addition to the Greenlee-Thomas model (Greenlee & Thomas, 1993), there are
two other prominent approaches for describing set-size effects in VSTM. These two
approaches can be referred to as sample size models (Palmer, 1990) and urn models (e.g., Pashler, 1988). They differ
from the Greenlee-Thomas (1993) model by: (1) ascribing the root cause of set-size
effects to a stage prior to decision making; and (2) making no theoretical distinction
between decisions made in the same, or across different, perceptual dimensions.
The two models will be discussed in greater depth in Chapter 4, and a more
technical examination regarding the predictions of both models will be deferred until
then.
Sample size models
Sample size models (Palmer, 1990) propose that the monotonic
decrease in performance with increasing set-size in VSTM experiments is a direct
outcome of a limit in the amount of information observers can extract from a visual
display.
In the sample size model, each perceptual attribute of a stimulus is associated with
an internal, unidimensional percept, formed by the collection of a finite number of
discrete samples. It is assumed that the total number of samples that can be
collected across the entire visual scene is fixed. Assuming that equal attention is
paid to each stimulus, it follows that the total number of samples taken from each
element in an array will be inversely proportional to the number of stimuli present,
N. Because the variance of the mean of independent samples is inversely proportional to the number of samples, the mean of the samples taken from an item, and therefore its internal percept, will have a variance proportional to N. By the central limit theorem, this percept will also be approximately normally distributed. Signal detection theory defines sensitivity (i.e., d') as being inversely proportional to the standard deviation of the underlying representation to be discriminated (Macmillan & Creelman, 1991). Therefore, according to the sample size model, an observer's sensitivity to a stimulus change in a VSTM experiment, d', will be inversely proportional to the square root of N.
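The sample size argument can be checked numerically with a minimal Python sketch. The total sample budget (240), the per-sample noise SD, and the change size are arbitrary, hypothetical values. The sketch also treats d' simply as the change size divided by the SD of a single item's percept. This ignores the noise contributed by the second display, which rescales d' but does not change how d' depends on N. All function names are illustrative.

```python
import math
import random
import statistics


def percept_sd(n_items, total_samples=240, sample_sd=1.0):
    """Analytic SD of one item's internal percept under the sample size model."""
    samples_per_item = total_samples / n_items      # equal attention to every item
    return sample_sd / math.sqrt(samples_per_item)  # SD of a mean of independent samples


def simulated_percept_sd(n_items, total_samples=240, sample_sd=1.0,
                         n_reps=2000, seed=0):
    """Monte Carlo check: average the samples allotted to one item, many times over."""
    rng = random.Random(seed)
    samples_per_item = total_samples // n_items
    means = [
        statistics.fmean(rng.gauss(0.0, sample_sd) for _ in range(samples_per_item))
        for _ in range(n_reps)
    ]
    return statistics.stdev(means)


def d_prime(change, n_items, **kwargs):
    """Sensitivity to a change of size `change`: change / SD of the internal percept."""
    return change / percept_sd(n_items, **kwargs)


def threshold(n_items, criterion_d=1.0, **kwargs):
    """Change size needed to reach a fixed criterion d' (grows as sqrt(N))."""
    return criterion_d * percept_sd(n_items, **kwargs)


for n in (1, 2, 4):
    print(n,
          round(percept_sd(n), 4),
          round(simulated_percept_sd(n), 4),
          round(d_prime(0.2, n), 3),
          round(threshold(n) / threshold(1), 3))
```

The analytic and simulated SDs should agree closely. The last column gives the threshold relative to a set size of one, which under these assumptions is about 1, 1.414 and 2 for set sizes of one, two and four.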
Unfortunately, few studies have directly tested this prediction of the sample size
model. Some evidence has been provided by Palmer (1990), who performed a
VSTM experiment using arrays composed of lines of varying length, and set-sizes
of one, two or four. The task of observers was to determine whether there had been
a change in the length of one of the lines. Observers' thresholds increased in proportion to the square root of N, as the sample size model predicts.
Urn models
Sample size models posit a limit on the total amount of information that can be
extracted from a visual scene. Another prominent class of model proposes that
observers are limited by the total number of items which can be encoded, either
because the capacity of VSTM itself is limited (e.g., Cowan, 2001; Luck & Vogel,
1997; Pashler, 1988), or because of a bottleneck in the number of items which can
be attended to prior to encoding. This type of model has obvious similarities to urn models used in probability theory (see, for example, Mendenhall,
1967). In essence, an urn model assumes that VSTM is restricted in storage
capacity to only a few items, k (often estimated to lie in the range of three-to-five).
The probability that a suprathreshold change will be detected is simply the probability that the changed element is encoded in VSTM: k/N when N exceeds k, and 1 when N ≤ k, since every item can then be stored. Although urn models are commonly used to describe performance limitations in
VSTM (e.g., Luck & Vogel, 1997; Pashler, 1988; Sperling, 1960), it is only recently
that the actual structure of items stored has been considered. Luck and colleagues
have reported a series of experiments designed specifically to elucidate the
structure of information held in VSTM (Luck & Vogel, 1997). This
work provides evidence that items stored in VSTM are coherent objects, and not the
more elementary features of which those objects are composed.
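The urn-model prediction can be sketched in Python. This is a minimal illustration under stated assumptions. Capacity is set to k = 4, a hypothetical value within the three-to-five range mentioned above. Items are assumed to be encoded at random, the change is always suprathreshold, and guessing and false alarms are ignored. The function names are illustrative.

```python
import random


def p_detect_urn(n_items, k=4):
    """Urn-model probability that the changed item is among the k items held in VSTM."""
    return min(k, n_items) / n_items


def simulate_urn(n_items, k=4, n_trials=20000, seed=0):
    """Monte Carlo version: store a random subset of items, then change one at random."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        changed = rng.randrange(n_items)                        # item that changes this trial
        stored = rng.sample(range(n_items), min(k, n_items))    # items that made it into VSTM
        hits += changed in stored                               # detected only if it was stored
    return hits / n_trials


for n in (1, 2, 4, 6, 8, 12):
    print(n, round(p_detect_urn(n), 3), round(simulate_urn(n), 3))
```

Detection is perfect up to a set size of k and then falls as k/N. This is the qualitative break at around four items that distinguishes urn models from the smooth 1/√N decline in sensitivity predicted by sample size models.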
References
- Bennett, P. J., & Cortese, F. (1996). Masking of spatial frequency in visual memory depends on distal, not retinal, frequency. Vision Research, 36(2), 233-238.
- Blakemore, C., & Campbell, F. W. (1969). On the existence of neurons in the human visual system selectively sensitive to the orientation and size of retinal images. Journal of Physiology, 203, 237-260.
- Breitmeyer, B. (1984). Visual masking: An integrative approach. Oxford: Oxford University Press.
- Cermak, G. W. (1971). Short-term recognition memory for complex free-form figures. Psychonomic Science, 25(4), 209-211.
- Chua, F. K. (1990). The processing of spatial frequency and orientation information. Perception & Psychophysics, 47(1), 79-86.
- Cowan, N. (2001). The magical number 4 in short-term memory: A reconsideration of mental storage capacity. Behavioral and Brain Sciences, 24(1).
- DeValois, R. L., & DeValois, K. K. (1990). Spatial vision. Oxford: Oxford University Press.
- Greenlee, M. W., & Thomas, J. P. (1993). Simultaneous discrimination of the spatial frequency and contrast of periodic stimuli. Journal of the Optical Society of America A, 10(3), 395-404.
- Lee, B., & Harris, J. (1996). Contrast transfer characteristics of visual short-term memory. Vision Research, 36(14), 2159-2166.
- Luck, S. J., & Vogel, E. K. (1997). The capacity of visual working memory for features and conjunctions. Nature, 390, 279-281.
- Magnussen, S. (2000). Low-level memory processes in vision. Trends in Neurosciences, 23(6), 247-251.
- Magnussen, S., & Greenlee, M. W. (1992). Retention and disruption of motion information in visual short-term memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18, 151-156.
- Magnussen, S., & Greenlee, M. W. (1997). Competition and sharing of processing resources in visual discrimination. Journal of Experimental Psychology: Human Perception and Performance, 23(6), 1603-1616.
- Magnussen, S., & Greenlee, M. W. (1999). The psychophysics of perceptual memory. Psychological Research, 62(2-3), 81-92.
- Magnussen, S., Greenlee, M. W., Asplund, R., & Dyrnes, S. (1991). Stimulus-specific mechanisms of visual short-term memory. Vision Research, 31(7-8), 1213-1219.
- Magnussen, S., Greenlee, M. W., & Thomas, J. P. (1996). Parallel processing in visual short-term memory. Journal of Experimental Psychology: Human Perception and Performance, 22(1), 202-212.
- Magnussen, S., Idas, E., & Myhre, S. H. (1998). Representation of orientation and spatial frequency in perception and memory: A choice reaction time analysis. Journal of Experimental Psychology: Human Perception and Performance, 24, 707-718.
- Nilsson, T. H., & Nelson, T. M. (1981). Delayed monochromatic hue matches indicate characteristics of visual memory. Journal of Experimental Psychology: Human Perception and Performance, 7, 141-150.
- Palmer, J. (1990). Attentional limits on the perception and memory of visual information. Journal of Experimental Psychology: Human Perception and Performance, 16(2), 332-350.
- Pashler, H. (1988). Familiarity and visual change detection. Perception & Psychophysics, 44(4), 369-378.
- Phillips, W. A. (1974). On the distinction between sensory storage and short-term visual memory. Perception & Psychophysics, 16(2), 283-290.
- Phillips, W. A., & Baddeley, A. D. (1971). Reaction time and short-term visual memory. Psychonomic Science, 22(2), 73-74.
- Regan, D. (1985). Storage of spatial-frequency information and spatial-frequency discrimination. Journal of the Optical Society of America A, 2(4), 619-621.
- Schiller, P. H. (1995). Effect of lesions in visual cortical area V4 on the recognition of transformed objects. Nature, 376, 342-344.
- Sperling, G. (1960). The information available in brief visual presentations. Psychological Monographs: General and Applied, 74(11), 1-30.
- Vincent, A., & Regan, D. (1995). Parallel independent encoding of orientation, spatial frequency, and contrast. Perception, 24(5), 491-499.