Ghost in the Box, Chapter 2 – Methodological and Conceptual Errors




The official doctrine has been injudiciously
applied to biofeedback training with
humans. The consequence of the category
mistakes is poorly designed research
because the conception of biofeedback
training is like Mesmer’s “Ghost in the Tree”
mythology in which biofeedback is viewed
as having an independent power that is
quite magical, namely, the Ghost in the Box.


Methodology Error #1: Insufficient number of training sessions.

Conceptual Error: The feedback “stimulus” has such magical power that its effectiveness can be measured after subjects have been briefly exposed to this power (Category Mistake #1).

In the double-blind study by Hatch et al. (1983), one training session was given. The authors conclude, “the second prediction that subjects given true biofeedback would show lower EMG levels compared to subjects given pseudofeedback was not supported. There were no significant differences in EMG levels among the four groups at any stage of training” (p.422). This type of study, using one to four sessions, is representative of many studies that have confused the field of biofeedback training by implying that biofeedback is so powerful that minimal training is needed. Of 167 voluntary heart rate studies, 75% gave only one to three sessions (Banderia, Bouchard, & Granger, 1982, p.323). Examples of minimal training are presented in Table 1, excluding heart rate studies.

Shellenberger and Green 14

Table 1

Studies with Minimal Training

Number of Sessions  Modality  Author
One                 EMG       Segreto-Bures & Kotses, 1984
Three               BP        Erbeck, Elfner & Driggs, 1983
One                 EMG       Hatch et al., 1983
Four                EMG       Morasky, Reynolds, & Sowell, 1983
Two                 Temp      Suter, Fredericson, & Portuesi, 1983
Two                 EMG       Shirley, Burish, & Rowe, 1982
One                 EMG       Kiffer, Fridlun, & Fowler, 1981
One                 BP        Lutz & Holmes, 1981
Three               EMG       O’Connell & Yeaton, 1981
One                 EMG       Uchiyama, Lutterjohann & Shah, 1981
Three               EMG       Davis, 1980
Two                 BV        Hoon, 1980
Four                EMG       Nielsen & Holmes, 1980
Three               GSR       Volow, Erwin, & Cipolat, 1979
Three               BP        Shannon, Goldman, & Lee, 1978
Two                 EEG       Plotkin, 1977
Three               EMG       Stern & Berrenberg, 1977
Three               Temp      Willerman, Skeen, & Simpson, 1977
Four                BP        Blanchard, Haynes, Kaliman, & Harkey, 1976
One                 EEG       Plotkin & Cohen, 1976
One                 EEG       Plotkin, Mazer, & Loewy, 1976
One                 Temp      Price & Tursky, 1976
Two                 FPV       Simpson & Nelson, 1976
Three               EMG       Alexander, 1975
Two                 EMG/EEG   DeCood & Chishoim, 1978
Three               BP        Fey & Lindholm, 1976
One                 EMG       Haynes, Mosley, & McGowan, 1975
One                 EEG       Lynch, Paskewitz, & Orne, 1974
One                 EEG       Beatty, 1972
One                 GSC       Klinge, 1972
One                 GSR       Shapiro & Watanabe, 1972
One                 BP        Shapiro, Tursky, Gershon, & Stern, 1971
One                 SPL       Crider, Shapiro, & Tursky, 1971
One                 SPL       Johnson & Schwartz, 1971
One                 EMG       Cleayes, 1970

Note. Abbreviations: electromyogram (EMG); electroencephalogram (EEG); blood pressure (BP); temperature (Temp); blood volume (BV); galvanic skin resistance (GSR); galvanic skin conductance (GSC); skin potential level (SPL); finger pulse volume (FPV).



Based on minimal training, the logical fallacy of “hasty generalization” is frequent. Manuck, Levenson, Hinrichsen, and Gryll (1975) state: “The present findings, while demonstrating significant bi-directional heart-rate changes, do not support the hypothesis that feedback facilitates voluntary heart-rate control . . . Thus it may be speculated that the case for feed-back assisted heart rate control has been somewhat overstated in the recent literature” (p. 300). On the basis of one training session, the authors speculate on the inefficacy of heart-rate feedback.

Plotkin’s research (1977 and 1976) has been widely cited for demonstrating that alpha feedback does not contribute to relaxation or tranquility. “Thus it appears that the major contribution that alpha feedback makes to the attainment of meditative-like experiences is the supply of a setting which is conducive to the natural self-inducement of such states” (Plotkin & Cohen, 1976, p.21). Plotkin arrives at this conclusion on the basis of one EEG session in each of two studies (Plotkin et al., 1976, and Plotkin & Cohen, 1976) and two EEG sessions in a third study (Plotkin, 1977).

Another experimenter writes, “The lack of a more convincing group learning effect across sessions was puzzling in view of the moderately extended practice” (Volow et al., 1979, p.139). Three sessions of skin resistance feedback were given. On the basis of four 20-minute forehead EMG treatment sessions, compared to relaxation and no-treatment control groups, Nielsen and Holmes (1980) conclude: “. . . it appears that the use of EMG biofeedback to teach normal persons how to control arousal in threatening situations may not be clearly warranted” (p.247).

Following a sports model, this paucity of training is like attempting to train an athlete to run a four-minute mile in four sessions and then concluding that human beings are unable to run the four-minute mile and, furthermore, that the stopwatch is not useful.



Methodology Error #2: Insufficient length of each training session.

Conceptual Error: The magical biofeedback box has such power that length of “exposure” to the box during a session can be of very short duration (Category Mistake #1).

Many biofeedback training sessions reported in the research literature are only 16, 10, or even 3 minutes in length. Representative studies using a minimum of 3 to 16 minutes are: Borgeat, Hade, Larouche, and Bedwani (1980); Davis (1980); Dahlström, Carlsson, Gale, and Jansson (1984); Herzfeld and Taub (1980); Kotses, Rapaport, and Glaus (1978); Lang (1977); McCanne (1983); Nielsen and Holmes (1980); Uchiyama, Lutterjohann, and Shah (1981); Williamson, Janell, Margueolot, and Hutchinson (1983); Wilson and Bird (1981). Often total minutes of training are divided into “trials,” a method borrowed from animal research. For example, Davis (1980) provided ten 70-second trials of EMG training with 20-second intervals. Nielsen and Holmes (1980) provided sixteen one-minute trials with 15-second intervals. Other researchers have required subjects to alternately increase or decrease the variable on successive trials, such as heart rate (Twentyman and Lang, 1980) or hand temperature (Suter, Fredericson, and Portuesi, 1983). Commenting on EEG methodology, Hardt (1975) argued that such training is like trying to get an airplane off the ground by taxiing first forward and then backward repeatedly.
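To make the brevity of these sessions concrete, the two trial structures quoted above can be converted into total feedback time per session. The following is only a minimal sketch: the function and variable names are illustrative, and it assumes that rest intervals between trials do not count as training time.

```python
# Trial structures as described in the text; only feedback time is counted,
# rest intervals between trials are excluded (an assumption of this sketch).
def training_minutes(n_trials, trial_seconds):
    """Total feedback time per session, in minutes."""
    return n_trials * trial_seconds / 60

davis_1980 = training_minutes(10, 70)           # ten 70-second trials
nielsen_holmes_1980 = training_minutes(16, 60)  # sixteen one-minute trials

print(f"Davis (1980): {davis_1980:.1f} min")
print(f"Nielsen and Holmes (1980): {nielsen_holmes_1980:.1f} min")
```

Under these assumptions, both sessions provide well under twenty minutes of feedback, consistent with the range of session lengths cited above.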

In a widely cited article, Lang (1977) concludes, “. . . I think the available data justify the conclusion that high density visceral feedback does not provide a uniquely powerful treatment for anxiety or other broad system stress responses” (p.329). Lang bases this conclusion on heart rate data from his laboratory and on EMG and heart rate data from studies done in his laboratory by Cuthbert (1976). Lang fails to note that in Cuthbert’s study subjects were given five 15-minute EMG sessions and trained with eyes open while looking at an oscilloscope. Besides failing to critique Cuthbert’s methodology, Lang generalizes from the subjects of Cuthbert’s study to a patient population that would be given EMG training under very different conditions and would not be trained to lower forehead EMG with eyes open looking at an oscilloscope.


Methodology Error #3: Homework exercises are not given.

Conceptual Error: Homework is not needed since the power is in the machine which is used only in the laboratory or clinic. (Category Mistakes #1 and #2).

Biofeedback researchers have rarely required trainees to do home biofeedback training. In over 300 research studies on EMG, thermal, heart rate, blood pressure and GSR feedback training only about 15% reported recommending homework practice. To date, few studies have reported home training data. In some cases the suggestion for homework is no more than “practice at home what you are doing in the laboratory.”

The failure to incorporate homework exercises into biofeedback research results from attributing power to the instrument. The drug model assumes a “specific effect” of the treatment being studied. Any variable that affects the outcome independently of the specific effect “confounds” the results and must be eliminated. Therefore, when the drug model is applied to biofeedback research, homework cannot be given because homework is a “confounding” variable and data on the “specific effect” of biofeedback are invalidated. “Blanchard and Young were forced to conclude that while the data looked promising, the unique contribution of EMG feedback had been consistently confounded with both the inclusion of other relaxation methods during training and regular home practice of nonfeedback relaxation” (Alexander & Smith, 1979, p. 125).

The failure to give homework in biofeedback research arises also from Category Mistake #2. In conditioning studies the animal does not practice the desired response when returned to the cage. Failure to give “home practice” arises from the animal’s inability to comprehend language and also from the operant conditioning paradigm in which the stimulus and reinforcement are thought to control behavior (Methodology Error #4). The animal is not expected to demonstrate the behavior in the home cage, where stimulus and reinforcers are absent.

In applying concepts from operant conditioning to biofeedback training, in which the stimulus and reinforcer ghosts in the box are thought to have power, home practice for humans is excluded. According to theory, training can proceed only while the trainee is connected to the box, and biofeedback instruments are typically not given to subjects for home use. The experiments of Budzynski, Stoyva, Adler, and Mullaney (1973), Sargent, Walters, and Green (1973), and Sterman and MacDonald (1978) are early exceptions to this practice.

Failure to provide the subject or patient with home practice exercises, with or without an instrument, has hindered the development of biofeedback training; we comment on this again in Error #8 (use of relaxation control groups). Skilled athletes cannot achieve competence without regular practice, nor can biofeedback trainees. In most cases, training in the laboratory or the clinic is not sufficient for learning self-regulation skills and for transfer of training to other situations. Homework is an integral part of successful training.



Methodology Error #4: Failure to maximize internal locus of control.

Conceptual Error: The response is under the control of the feedback characteristics: stimulus, reinforcer, and contingency (Category Mistakes #1 and #2).

In their book on behavioral medicine, under the heading “Operant Conditioning,” Olton and Noonberg (1980) state, “Biofeedback teaches a person to develop voluntary control over some biological process. In this respect, it is a form of learning, differing from other forms mainly in the types of responses that are controlled” (p.24). These authors are suggesting that human learning is like research animal learning: humans use the same strategies and are subject to the same situational and psychological variables as laboratory animals; the main difference in biofeedback training is that the human is not learning bar-pressing or disc-pecking. In accordance with this category mistake, and in spite of using the term “voluntary,” the authors continue, “To be effective in controlling behavior [emphasis added], reinforcers must be made contingent upon behavior” (p.24). As noted in Chapter 1, other authors suggest that the behavior comes under the control of the stimulus (Alexander, 1975; Engel, 1979). We also noted that Hatch (1982) and Furedy and Riley (1982) believe that the contingency between the stimulus and response determines behavior.

The language of “control” and “determines” is a statement of causality. In operant conditioning the stimulus, reinforcer, and contingency are external experimental variables, not subject variables, and, as noted, are thought to control or cause behavior. Thus the locus of control is thought to be external to the subject. It is important to note that in animal research the locus of control is external to the animal insofar as the researcher has the power to control stimuli, reinforcers, and the contingencies between stimuli, responses, and reinforcers. For this reason an external locus of control is assumed and promoted in biofeedback research with humans.

Applying the language of these models literally, it would be said, “Hand warming came under the control of the green light” when a green light signals hand warming. It would not be stated that hand warming came under the control of the hypothalamus, the limbic system, the cortex, or the trainee. The error here seems obvious: green lights do not have the power to increase hand temperature any more than red traffic lights have the power to make us apply the brakes. The human who is increasing hand temperature, or braking at a red light, has the power to control the response.

In biofeedback training neither the feedback instrument nor the information nor the contingency between the information and the response has the power to control the response, just as they do not have the power to control the response when we use a mirror. In biofeedback training the locus of control is internal, just as it is when we use a mirror. The trainee alone has the power to use the information, learn the desired response, and control that response. Furthermore, the human controls the contingency in the same way that the human controls the contingency between stimulus and response when using a mirror. The human controls the contingency because the human controls the response, which in turn produces the “feedback stimulus.” For example, if you put your right hand over your left ear while looking in a mirror, you will see exactly that movement reflected in the mirror. The reflection of that response acts as a stimulus for new behavior, as a reinforcer, and as information simultaneously, and all are under your control because you initiate the behavior. It would be superstitious to believe that the mirror caused you to put your hand over your ear. Similarly, in learning digital blood flow control with biofeedback, for example, we create an internal state, and we see the immediate response on the meter. The reflection is simultaneously the stimulus, the response, the information, and the reinforcer. At the same time, we also become aware of the contingency between the internal state that we voluntarily create and the physiological response that is indicated by the feedback.

This commonality of the stimulus, reinforcer, information, and response is unique to biofeedback training and sets it apart from operant conditioning; they are not in the same category of learning paradigms. If this discussion seems abstruse, simply stand in front of a mirror, perform any task, and ask yourself, “What is controlling what?”

The assumption that behavior change through biofeedback training is controlled by a power external to the trainee leads to serious methodological problems in research. When the feedback characteristics (stimulus, reinforcer, information, and contingency) are believed to control the response, or the feedback is believed to have a specific drug-like effect, the human in the situation is disempowered; learning is thwarted by the lack of clear goals, proper instruction, home practice, and coaching; and only the most rudimentary form of learning is studied.

In applying the myth of external locus of control to running, it would be assumed, for example, that the stopwatch has the power to control the runner’s behavior. If the coach believed this, training of the athlete would be minimal; the coach would focus on the characteristics of the stopwatch rather than on physical, emotional, and mental training techniques. This has happened repeatedly in biofeedback training research. Numerous research articles have been written on “stimulus characteristics,” “response characteristics,” “contingency characteristics,” and “reinforcement characteristics” (Katkin & Goldband, 1979, p. 186). A natural consequence of this mythology is a paucity of studies on the uniquely human parameters involved in successful learning. Furthermore, research based on the ghost in the box mythology fails, not because biofeedback training fails, or because humans cannot learn to self-regulate, but because there is no magic in the box. The “magic” is in the human using the box.



Methodology Error #5: Failure to provide adequate cognitive support (rationales, instructions and coaching).

Conceptual Error: The biofeedback machine has such power that minimal instruction and coaching are needed; trial-and-error learning is good enough (Category Mistake #1). Instruction is not needed in drug and animal studies, and therefore is not needed in biofeedback training with humans (Category Mistake #2).

In their article, “An Alternative Perspective on Biofeedback Efficacy Studies: A Reply to Steiner and Dince,” Kewman and Roberts (1983) defend the use of unskilled trainers in biofeedback training studies. They take this stance because they do not believe that high performance coaching is necessary for the “power” of biofeedback training to have its effect. The concept of the biofeedback “technician” implies that the power is in the box and that only a technician is needed to connect the subject or patient to the instrument and turn it on; it also implies that trial-and-error learning is good enough for the effect of biofeedback to be demonstrated.

When researchers attempt to study the biofeedback ghost in the box devoid of facilitating variables such as instructions, coaching, and appropriate rationales, we call this “bare bones biofeedback,” the same term adopted by Budzynski (1973a) to describe this research approach. As an example of bare bones biofeedback, consider the familiar research scenario: the thermistor is taped to the subject’s finger with the instruction, “Your task is to make the needle move to the right; I will return in ten minutes at the end of the experiment.” Twentyman and Lang (1980) describe their procedure: “Subjects were encouraged to work seriously at the task but were not advised how to accomplish it” (p.421). Stoffer, Jensen, and Nesset (1979) write that each subject was told that when the red “start” light came on he “was to attempt to raise his finger temperature by whatever mental means he could employ for a 13 minute period” (p. 555). In these situations the only learning strategy available to the subject is trial-and-error, not a high-powered strategy.

In human performance training such as sports or music, students are not expected to learn a skill solely by trial-and-error. Or imagine letting a teenager learn to drive a car by trial-and-error: “Here are the keys, just turn on the engine and see what happens.” At least it is clear that the car has no special power to instill driving skills in the learner, just as the biofeedback machine has no special power. Many biofeedback studies, however, give the impression that to provide instructions, coach the subject, or enhance performance in any way would bias the results, as though the power of the entity, biofeedback, could and should be studied independently of any skills of the user of the biofeedback information. Using bare bones methodology, the minimal potential of feedback to facilitate psychophysiological change and learning is studied. In fact, we argue in Error #7 (failure to establish training criteria) that in the majority of bare bones studies learning cannot be clearly demonstrated. Learning is not demonstrated by a few mmHg change in blood pressure (Elder and Eustis, 1975; Surwit, Shapiro, and Good, 1978); a few b.p.m. in heart rate (Bouchard and Granger, 1977; Schwartz, 1972); a two or three degree increase in hand temperature (Achterberg, McGraw, and Lawlis, 1981; Gamble and Elder, 1983; Guglielmi et al., 1982; Stoffer et al., 1979); or small changes in muscle tension (O’Connell and Yeaton, 1981; Phillips, 1977; Weinman et al., 1983).

When it became clear that bare bones biofeedback was likely to fail, some researchers began using instructions and found that instructions do improve learning (Bergman & Johnson, 1971; Bouchard & Granger, 1978; Herzfeld & Taub, 1980; Hoon, 1980; Lacroix & Roberts, 1978; Stephens, Harris, & Brady, 1972). If this seemed surprising, it was because biofeedback was thought to be so powerful that mere exposure to the machine could change behavior.

The research on the use of cognitive behavior modification in sports psychology and psychotherapy has demonstrated the importance of positive instructions, positive self-talk, and positive imagery for effective coaching, teaching and therapy with humans (Cox, 1985; Heil, 1984; Lazarus, 1975; Meichenbaum, 1976; Mickelson & Stevic, 1971; Shaw & Blanchard, 1983; Suinn, 1984; Weinberg, 1984). In addition, research in psychology, sports and education over the past 30 years has demonstrated the importance of a positive interaction between teacher and student, coach and athlete, therapist and client (Aspy, 1969; Carkhuff & Berenson, 1976; Cox, 1985; Kratochvil, Carkhuff, & Berenson, 1969; Smoll & Smith, 1984; Taub, 1977; Truax & Mitchell, 1971).

The importance of the positive or negative expectations of the coach or teacher on motivation and performance has been well documented (Brawley & Roberts, 1984; Cox, 1985; Horn, 1984; Martinek, 1981). Motivation and performance can be enhanced by the coach, teacher or therapist, and are relevant in biofeedback training research, just as they are in biofeedback training, but have rarely been recognized due to the category mistakes of the ghost in the box. In fact, these variables have been excluded in ghost in the box methodology because researchers believe that they contaminate the results, making the specific drug-like effect of biofeedback difficult to detect: “. . . rather, the evidence for informational biofeedback’s efficacy has to be in the form of control conditions that show that an appreciable amount of increased control can indeed be attributed to the information supplied and not to the placebo-related effects such as motivation, self-instruction, relaxation and subject selection” (Furedy, 1979, p. 206).

When a coaching model is employed, biofeedback training includes an appropriate rationale and instructions, a variety of psychophysiological training techniques, motivation enhancement, and home practice with or without a home biofeedback unit. With this methodology the maximum potential of biofeedback training to facilitate learning and psychophysiological change is studied.

As learning is facilitated by departing from bare bones trial-and-error methodology, studies show greater learning as described in Chapter 4, Successful Biofeedback Training. This is true in clinical practice as well.



Methodology Error #6: Double-blind designs.

Conceptual Error: The biofeedback box and signals coming from it are so powerful that the trainer and trainee can be “blind” to the goals, methods, and relevant feedback for successful training and symptom reduction (Category Mistake #1).

The double-blind design used in biofeedback training research is the quintessential example of Category Mistake #1 and its consequences. In this case a design appropriate to the study of the effect of chemicals on physiology is used to study the effect of biofeedback on physiology. The essence of the biofeedback mirror is to remove blindfolds and provide salient information for learning. The essence of the double-blind design as used by Furedy (1985), Guglielmi et al. (1982), Kewman and Roberts (1980), and Whitsett, Lubar, Holder, Pomplin, and Shabsin (1982) is to “blindfold” the trainer and trainee to the salient feedback of information. In these studies subjects are given signals from a biofeedback machine that reflect a physiological process, but are not informed of the contingency. For example, Guglielmi et al. (1982) attached a thermistor and EMG electrodes to each Raynaud’s patient, but to keep the subjects “blinded” they were not told which physiological process was creating the feedback. “They all received both auditory and visual feedback, but they were not told about the nature or direction of the physiological change upon which feedback was contingent. They were simply instructed to ‘drive’ the feedback meter and tone in one direction” (p.107). Why these researchers refer to this process as “training” is unclear.

Double-blind studies of this type cannot measure the effectiveness of biofeedback training, because they are not biofeedback studies. What is being studied? The ghost in the box that is not there. Using this type of double-blind design in biofeedback training research is like putting a blindfolded person in front of a mirror to determine whether or not the mirror has a specific effect. When the double-blind design produces little learning and symptom reduction, researchers inevitably conclude that biofeedback was not effective. “The results of the present investigation clearly indicate that the best treatment for Raynaud’s Disease is warm weather” (Guglielmi et al., 1982, p. 118). This is like concluding that because a blindfolded person has difficulty learning to braid her hair while standing in front of a mirror, the mirror is not effective. The conclusion is false, and so it is in biofeedback research.

With this type of double-blind design, direct comparisons are made between exposure to signals from a biofeedback instrument and symptoms, by-passing the essential ingredient, learning; the comparison of the training effect (the degree of learning) to the treatment effect (the degree of symptom reduction) is missed (Fahrion, 1978).

Because learning is irrelevant to the effect of drugs on symptoms, when the double-blind design is applied to biofeedback training two consequences may occur: (a) the design itself hinders learning by eliminating much that goes with learning: appropriate goals, appropriate feedback, coaching, adjunctive tools, appropriate homework instructions, and motivation that arises from success and knowledge, and (b) the researcher fails a priori to appreciate the role of learning (as distinct from mere physiological change) in symptom reduction. Double-blind methodology of the Guglielmi et al. variety necessarily implies that biofeedback training does not involve learning, that biofeedback has a special power independent of the user and provider, as do drugs. This implication and the research that followed from it have hindered the development of biofeedback training and falsely underestimated its potential.

The Kewman and Roberts double-blind study (1980) with migraine patients deserves mention here although it has been well critiqued by Steiner and Dince (1981). In this study migraineurs received feedback for either increases or decreases in hand temperature; subjects were of course not aware of the contingency. As in the later study by Guglielmi et al. (1982), this “blindness” is justified by the fact that rats can learn and they seem to be unconscious of the target response. The human data, however, indicate the contrary. Subjects failed to learn to increase or decrease hand temperature, contradicting the authors’ claim that learning occurred. “Learning” to increase hand temperature is defined as a temperature change from 87.2°F to 88.5°F (group means), and “learning” to decrease is defined as a temperature change from 88.8°F to 87.6°F (group means), all within normal variation (Error #10, failure to establish reliability). Subjects with less change were termed “non-learners.” Kewman and Roberts seem to be unconcerned by the fact that some subjects receiving feedback for decreases in temperature were later assigned to the “learned increase” group because their temperatures went up, and vice versa. This is not learning; this is quasi-random variation in hand temperature. But because Kewman and Roberts choose to call a slight variation “learning,” they can claim that learning to decrease hand temperature is as effective as learning to increase hand temperature in reducing symptomology, since these groups had somewhat similar treatment results. Phrases used by these authors such as “migraine patients who learned to raise finger temperature,” “. . . those trained to lower finger temperature,” and “learning criterion” are psychophysiologically and scientifically incorrect and misleading. This misuse of the term “learning” is discussed further in Error #7, failure to establish training criteria.
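The size of the changes that were labeled “learning” can be checked with simple arithmetic. The sketch below is illustrative only: it reads the reported group means as degrees Fahrenheit, and the function and variable names are assumptions introduced here, not part of the original study.

```python
# Group means as reported in the text, read as degrees Fahrenheit
# (an assumption; the function and variable names are illustrative only).
def mean_change(start_f, end_f):
    """Change in group-mean hand temperature, rounded to 0.1 degree F."""
    return round(end_f - start_f, 1)

increase_group = mean_change(87.2, 88.5)  # "learned increase" group
decrease_group = mean_change(88.8, 87.6)  # "learned decrease" group

print(increase_group, decrease_group)
```

On this reading, the two groups differ from their starting means by only a little more than one degree in either direction, and their ending means are less than one degree apart.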

The studies referenced above are actually not “double-blind” studies as used in drug research because no group was given the “active ingredient,” i.e., true, contingent feedback of information that could be used for learning. Other double-blind studies have included a feedback training group, but because researchers using the double-blind design have accepted the drug model, training in the feedback group is minimal. Hatch et al. (1983) included a contingent feedback group in their double-blind study on EMG training with normal subjects, but in this single-session study, subjects in the feedback, false feedback, and pseudofeedback groups were given one instruction, “. . . reach the deepest level of relaxation.” Other than this, trial-and-error learning was the only strategy available to the feedback group. It is not surprising that the contingent and noncontingent groups had similar results.

Furthermore, even in “correct” double-blind studies, the trainer is blind, and doesn’t know whether or not the trainee is receiving contingent or false feedback. Again, training is minimal because effective coaching is prevented.

The assumptions that biofeedback instruments have a special power to change behavior, and that learning is not essential for this power to have its effect, are also tacitly incorporated in the ABA’ design (treatment-no-treatment-treatment or baseline-treatment-baseline) advocated and used by many researchers (Blanchard & Young, 1974). This design suggests that when the subject is exposed to biofeedback in the A phase, behavior will change, and when the feedback is removed or falsified, phase B, the conditioned behavior will extinguish. To be sure, if the effectiveness of a medication such as an anticonvulsant or insulin is studied, the loss of the behavior in the B phase is expected. (The loss of behavior is not expected if the medication is administered to facilitate a cure, however.) A problem with the ABA’ design is described by Whitsett et al. (1982), in their report of a double-blind study on EEG and seizure activity:

An additional complication stemmed from the ABA design that was incorporated. Although this particular paradigm was utilized to strengthen the claim of operant control over the EEG, and to rule out placebo and other nonspecific effects, it may have made acquisition of the task too difficult for several patients. The reversal of contingencies during the B phase, in particular, appeared to cause considerable distress for some patients, despite the fact that the patients were not informed of the change (p.207).

(It is curious that while this type of design is used to rule out the positive placebo effect, the possibility that a negative placebo effect might seriously affect results is not considered.)

Sterman (1985) reported a situation similar to that of Whitsett et al. (1982) in an SMR study in which a subject knew when the B phase was initiated. In essence the subject’s attitude was: “I know exactly what I am doing and I am not going to produce a brain wave that is not effective.” In this case the subject had apparently become aware of the subjective correlates of SMR and had gained mastery of the rhythm, certainly the goal of biofeedback training.

These designs may be appropriate if the treatment being studied necessitates neither consciousness nor learning on the part of the subject. These conditions are not true of successful biofeedback training. In successful biofeedback training consciousness and learning are fundamental.

To a clinician, the ABA’ and similar designs are problematic. Ideally, if the clinician and the patient have done their jobs well,
there is no substantial decrement of behavior with termination of treatment (A’ or B phase, depending on the design); the behavior has been learned and brought under conscious control and should be independent of the training tool and trainer. Cessation of treatment is a test of treatment success; extinction is not expected, and further improvement at follow-up would not be surprising. By analogy, after the child has learned to say the alphabet in kindergarten, this behavior should be maintained during the withdrawal of treatment called summer vacation. If the child loses the behavior over the summer months, the teacher might be dismissed. If the child practices the alphabet over the summer months, improvement in alphabet-saying is expected. The double-blind and ABA’ designs illustrate the self-fulfilling prophecy: conditions are created in which learning cannot occur and “biofeedback” must fail, and indeed it does.

These designs are inappropriately applied to biofeedback training, and the results generated from them have hindered the development of the field and its acceptance as a treatment modality.



Methodology Error #7: Failure to establish training criteria.

Conceptual Error: Criteria for determining successful training are not needed. A training effect is not necessary since the power is in the machine (Category Mistakes #1 and #2).


Failure to establish performance goals arises naturally from Category Mistakes #1 and #2. Had researchers not been so mystified by the assumed drug-like power of the biofeedback machine, long ago they would have asked, “To what level should subjects and patients train?” As an example, many studies on the effectiveness of biofeedback training for remediation of tension or migraine headache failed to establish training goals. Furthermore, the authors do not report training results but simply report treatment results (Chesney & Shelton, 1976; Cox, Freundlich, & Meyer, 1975; Diamond, Medina, Diamond-Falk, & Deveno, 1979; Fried et al., 1977; Haynes, Giffin, Mooney, & Parise, 1975; McKenzie, Ehrisman, Montgomery, & Barnes, 1974; Medina, Diamond, & Franklin, 1976; Sturgis et al., 1978). Apparently learning data are not reported because learning is not thought to be the essential variable; “biofeedback” is the essential variable, meaning exposure to signals from the biofeedback instrument.

Failure to establish training criteria and to train patients to these criteria encompasses another error: failure to appreciate the essential link between training and treatment (and the need to study this link), currently referred to as the training effect (degree of learning) vs. the treatment effect (degree of symptom reduction) (Blanchard et al., 1980; Fahrion, 1978; Libo, 1983b; Steiner & Dince, 1981 and 1983). While successful training by the patient has not occurred in many studies, it is nonetheless expected that significant changes in symptomatology should result, as if the “treatment effect” were somehow independent of the “training effect.” In A Biofeedback Primer (1978) Blanchard and Epstein write:

One report (Kaplan) has failed to confirm the efficacy of SMR [sensorimotor rhythm] feedback training for the treatment of epilepsy [emphasis added]. Kaplan treated two epileptics for three months with feedback of the
SMR. Neither showed any improvement in seizure rate or any evidence of learning to produce SMR [emphasis added] although a technique similar to Sterman’s was used. Her systematic case studies thus throw some doubt on Sterman’s procedure (p. 143).

Although Kaplan’s subjects failed to learn to produce SMR, Blanchard and Epstein conclude that her study fails to confirm the efficacy of SMR feedback for seizure reduction, and suggest that this casts doubt on Sterman’s procedure. The only conclusion the study supports is that SMR is difficult to learn. Obviously Kaplan’s study says nothing about the efficacy of SMR feedback training. In fact, it is doubtful that this could be called an SMR feedback study. In order to receive SMR feedback, subjects must be learning to produce the brainwave rhythm (unlike other continuous physiological variables such as temperature or muscle tension). These authors, and others, make this type of error because they believe that training and treatment effects are not related. In this case, Blanchard and Epstein apparently believe that if the procedure is effective, then by merely being connected to the SMR feedback device epileptics should experience seizure reduction independently of learning: again, the ghost in the box mythology.

It was for good reasons that early pioneers in biofeedback training emphasized that criteria to demonstrate significant learning must be established before making claims about the treatment effect, or before correlating the treatment effect with biofeedback training (Budzynski et al., 1977; Fahrion, 1978). It is curious that this simple logic has been missed.

Due to the mythology of the ghost in the box, these early researchers were not heeded. The irony is that on this issue, if official doctrine researchers had been truer to the drug model they would have investigated the “dosage” necessary to achieve the desired treatment effect. Blanchard, Andrasik, and Silver (1980) argue against training to criterion while still justifying their conclusion that biofeedback is not effective in the treatment of muscle contraction headache: “Another criticism leveled by Belar is that no studies utilized a learning to criterion as part of the biofeedback training. While this is a valid observation, there is no evidence that this would be an effective strategy” (p.22). The circularity
of this reasoning is striking. The fact that these studies lacked training criteria is not evidence from which to conclude that criteria are not needed. This is a misuse of the category “no evidence.” In scientific investigation it is concluded that there is no evidence for the effect of the experimental variable only after having carefully studied the variable and not before. This type of equivocation with the term “no evidence” is used repeatedly by Kewman and Roberts (1983).

Many clinicians and researchers have found that when patients fail to achieve generalized low arousal states, minimal treatment results occur, and when patients achieve generalized low arousal states, maximal treatment results occur. Libo and Arnold (1983), in a 1 to 5 year follow-up study of 49 patients, found that all patients who achieved training criteria on both EMG (1 µV RMS) and finger temperature (95°F) reported long-term improvement (N = 12). Of the patients who did not improve (N = 11), eight had not achieved training criterion in either modality. It was found that of the 26 remaining patients who achieved criterion on one modality only, 23 showed long-term improvement. Similar data are reported by Fahrion, Norris, Green, and Green (1986) on EMG and temperature training for blood pressure reduction, and by Budzynski et al. (1973).
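
The counts reported for Libo and Arnold (1983) can be laid out as a small table to make the relation between criterion attainment and outcome easier to see. The following Python sketch is only an illustration. It assumes that the three groups (criterion met on both modalities, on one, on neither) partition the 49 patients, so the size of the “neither” group and its number of improved patients are derived by subtraction and are not figures stated in the report. All function and variable names are hypothetical.

```python
TOTAL_PATIENTS = 49

# Counts stated in the text for the groups that reached criterion.
REPORTED_GROUPS = {
    "both modalities": {"n": 12, "improved": 12},
    "one modality": {"n": 26, "improved": 23},
}
NOT_IMPROVED_TOTAL = 11      # patients who did not improve (stated)
NEITHER_NOT_IMPROVED = 8     # non-improvers who met neither criterion (stated)


def build_outcome_table():
    """Return per-group counts, deriving the 'neither' group by subtraction.

    Assumption: the three groups partition all 49 patients.
    """
    table = {name: dict(counts) for name, counts in REPORTED_GROUPS.items()}
    neither_n = TOTAL_PATIENTS - sum(g["n"] for g in table.values())
    table["neither modality"] = {
        "n": neither_n,
        "improved": neither_n - NEITHER_NOT_IMPROVED,
    }
    # Consistency check against the stated number of non-improvers.
    not_improved = sum(g["n"] - g["improved"] for g in table.values())
    if not_improved != NOT_IMPROVED_TOTAL:
        raise ValueError("derived counts disagree with the reported totals")
    return table


def improvement_rate(group):
    """Proportion of a group reporting long-term improvement."""
    return group["improved"] / group["n"]


if __name__ == "__main__":
    for name, group in build_outcome_table().items():
        print(f"{name}: {group['improved']}/{group['n']} improved "
              f"({improvement_rate(group):.0%})")
```

Under the stated assumption the derived counts agree with the reported total of 11 patients who did not improve, which is the only check the sketch performs.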

In spite of their earlier viewpoint, Blanchard, Andrasik, and associates have begun to examine the relation between the training and the treatment effect (Acerra, Andrasik, & Blanchard, 1984). A preliminary result from their work with essential hypertensives is: “Repeated measures ANOVA revealed that those patients who were able to raise their hand temperature to at least 97°F during biofeedback showed decreases in diastolic blood pressure from one week pre-treatment to the last week of treatment (p = .001).” They conclude: “The home blood pressure data supports the idea of a relationship between reaching a criterion and clinical outcome” (p. 5).

If the necessity to train to criterion were recognized, it would not be acceptable for grant writers and graduate students to propose that subjects be trained for a predetermined number of sessions rather than to a particular level of performance or symptom reduction. Typically grant proposals state the number of sessions to be given; when that number is reached the experiment ends,
whether or not the trainees have learned self regulation of the variable being studied, or symptoms are reduced. If symptoms are not reduced beyond the control group, conclusions are negative. A more scientific approach would be to train patients to a point of substantial symptom reduction and then through an analysis of the training data determine the necessary criteria for training, for particular symptoms. Or, adequately train subjects in self regulation skills, and only then draw conclusions about the efficacy of biofeedback training for symptom reduction.
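
The contrast between a fixed number of sessions and training to criterion can be sketched in Python. This is a minimal illustration, not a protocol from any study. The criterion values are the ones the text attributes to Libo and Arnold (1983); the direction of each criterion (EMG at or below, temperature at or above), the requirement of consecutive sessions, and every name in the code are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class SessionReading:
    emg_uv_rms: float      # forehead EMG, microvolts RMS
    finger_temp_f: float   # finger temperature, degrees Fahrenheit


# Criterion values as cited in the text; the inequality directions are assumed.
EMG_CRITERION_UV = 1.0
TEMP_CRITERION_F = 95.0


def meets_criterion(reading: SessionReading) -> bool:
    """True when both modalities reach the training criterion."""
    return (reading.emg_uv_rms <= EMG_CRITERION_UV
            and reading.finger_temp_f >= TEMP_CRITERION_F)


def sessions_to_criterion(readings: Sequence[SessionReading],
                          consecutive: int = 2) -> Optional[int]:
    """Return the 1-based session at which criterion has been met for
    `consecutive` sessions in a row, or None if that never happens.

    `consecutive` is a hypothetical safeguard against one-off readings.
    """
    streak = 0
    for number, reading in enumerate(readings, start=1):
        streak = streak + 1 if meets_criterion(reading) else 0
        if streak >= consecutive:
            return number
    return None


def criterion_met_under_fixed_plan(readings: Sequence[SessionReading],
                                   planned_sessions: int) -> bool:
    """Under a fixed-session design, training stops after `planned_sessions`
    regardless of learning; report whether the last session met criterion."""
    given = readings[:planned_sessions]
    return bool(given) and meets_criterion(given[-1])
```

With a fixed plan the experiment can end before `sessions_to_criterion` would have returned a session number, which is the situation the paragraph above objects to.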

Failure to establish criteria with which to determine learning has led to another serious problem: every researcher has a different definition of “learning.” In many reports “learning” is assumed if any change occurs in the variable being studied, as seen in the Kewman and Roberts double-blind migraine study described above. Onoda (1983) conducted a study in which one group of subjects was instructed to relax and warm their hands, and one group was instructed to relax and cool their hands; eight one-half hour sessions were given. Onoda concludes:

Since there was no significant difference in reported subjective relaxation between the WR [warm-relax] and CR [cool-relax] groups, a clear pattern between physiological change in hand temperature and subjective relaxation cannot be established. These findings suggest that the use of hand-warming with a “normal” population to enhance relaxation is largely placebo, or due to nonspecific effects (p. 113).

In examining the temperature data it is clear that no learning occurred. The mean decrease over eight sessions for the cool group was 1.7°F and appears to vary randomly; the mean increase for the warm group was 3.38°F, and temperature gains actually decreased in the last three sessions, ending with a mere 1.4°F increase in the final session. This is not learning. Why the author expected that subjective cues could be attached to these small variations in temperature is unclear (unfortunately, absolute measurements are not given). Naturally these groups have similar subjective experience, since neither learned to control the response and both were told to relax. This study does not contribute to our knowledge
of the subjective experiences of hand warming and cooling.

In the Stoffer et al. study (1979) an increase in temperature of at least 0.3°C was considered significant; no differences were found between the feedback, yoked control, and no-treatment groups. Unlike many studies, the authors did include a demonstration of “learning” task, and in this task the feedback group did exceed the control groups. The magnitude of increase during the demonstration of voluntary control, however, was “typically less than 0.5°C” (p.59). In spite of minimal learning (it is suggested that small changes may have been due to high baseline temperatures), results on a cold pressor test led the authors to conclude: “There is no indication that previous temperature training influenced blood pressure, heart rate, subjective pain, or immersion time during the cold pressor test given under no-feedback conditions. Stress modulation effects of training may not apply to temperature control” (p.59). As noted in the previous error, the use of phrases such as “learning” and “control” in studies of this type is both inaccurate and misleading.

Unfounded conclusions such as these occur because there are no well established criteria for learning among researchers. Were adequate criteria established, these studies would have been conducted differently, or would not have been published with faulty conclusions.

In Chapter Four, Successful Biofeedback Training, we discuss the relationship between training to criteria and successful outcomes.

Methodology Error #8: Using a relaxation control group for comparison to biofeedback training.

Conceptual Error: Relaxation training and biofeedback training are different; biofeedback has power independently of relaxation (Category Mistake #1).

Kewman and Roberts (1983) state: “There is uncertainty as to whether the efficacy of biofeedback exceeds that of relaxation training alone” (p.489). Chesney and Shelton (1976) write: “Relaxation training and practice rather than biofeedback are essential in the treatment of muscle contraction headaches” (p.225). Price states: “One primary defect is that biofeedback has generally not been found to be superior to training in relaxation only” (Price, 1979, p.146). “Unfortunately, these sustained improvements [in blood pressure] cannot be attributed to the effects of biofeedback training alone since general relaxation training was incorporated into the treatment procedures as well” (Yates, 1980, p.491). Searching for the specific effect of biofeedback and not finding it, Beatty writes: “Furthermore, detailed studies of the hemodynamic effects of hand-warming procedures suggest that any observed therapeutic effect cannot be attributed to specific effects on the pathophysiological processes, but rather are indicative of generalized relaxation.” Thus Beatty concludes “. . . these data speak quite clearly against the continued use of biofeedback procedures in the treatment of migraine . . .” (Beatty, 1982, p.220). In numerous studies the experimental and control groups are described as follows: the experimental group received biofeedback training and the control group received relaxation training (Alexander, 1975; Cox et al., 1975; Coursey, 1975; Haynes, Mosley, & McGowan, 1975). In many cases a specific type of relaxation training is compared to biofeedback training. What is meant by biofeedback training?

The belief in a specific drug-like power of biofeedback led to the methodological concern that any relaxation technique used in conjunction with the feedback would confound the results. “Blanchard and Young were forced to conclude that while the data looked promising, the unique contribution of EMG feedback had been consistently confounded with both the inclusion of other relaxation methods during training and regular home practice of nonfeedback relaxation” (Alexander & Smith, 1979, p.124). By analogy, this is to suggest that we could and should study the unique contribution of the mirror to behavior, for example hair-brushing, without the additional aid of a brush.

When trainees learn to increase blood flow in their hands or lower muscle tension they are learning to relax. This fact is not clear to many researchers who seem to believe that biofeedback has a specific drug-like effect that increases hand temperature or reduces EMG level or decelerates the heart, independently of relaxation. But there is nothing inherently relaxing about the feedback of information, and feedback is not a relaxation procedure any more than the reflection in the mirror is a procedure. Biofeedback information merely aids in the learning of relaxation.

To achieve the goal of increased blood flow in the hands or lower muscle tension, relaxation must be learned by whatever methods are effective. These methods can be either unsystematic or systematic. When researchers believe that biofeedback has a specific effect and that instructions and coaching should not be given, then the only method by which the trainee can learn relaxation (hand warming, lowering EMG, or heart rate reduction) is trial-and-error, an unsystematic and often ineffective approach. When it is understood that the feedback of information is merely an aid to learning, then the goal of increased blood flow in the hands or reduced muscle tension is taught through a variety of systematic relaxation techniques such as autogenic training, progressive relaxation, breathing techniques, and imagery techniques.

Studies that compare a “biofeedback” group to a relaxation control group are usually comparing trial-and-error learning with learning a systematic relaxation technique (or simply with the instruction to relax). This is not a useful comparison, and it is certainly misleading.

The extent to which the subject learns to voluntarily create the subjective experience and achieve the physiological parameters of relaxation, is the extent to which hand warming or muscle tension reduction will be learned; feedback of information aids in that process but is not a relaxation technique in itself. Because relaxation (low arousal) is the psychophysiological process that brings the body back to healthy homeostasis, it is no surprise that the relaxation group does better than the biofeedback group in
achieving low arousal and symptom reduction in these studies. Whatever the biofeedback group is doing, it apparently is not relaxation, other than that gained by trial-and-error attempts at changing the feedback. In addition, subjects in a relaxation control group, without feedback, undoubtedly initiate “passive volition.” In contrast, subjects given feedback and a task to perform may use “active volition” in attempts to succeed, a counterproductive strategy in learning psychophysiological control in most cases.

This misleading confusion of “biofeedback” and “relaxation” contributes also to the issue of homework. If biofeedback were considered to be a tool for enhancing relaxation skills, or relaxation skills useful for enhancing biofeedback training, the usefulness of homework exercises would be clear. Certainly biofeedback training subjects/patients could practice relaxation at home without a biofeedback instrument, and thus enhance their relaxation skills, probably beyond that of the control “relaxation” group, since the feedback group would have feedback in the laboratory to confirm the efficacy of their relaxation strategies. When the biofeedback group is given relaxation strategies and home training, the biofeedback group is found to be superior to a relaxation control group (Blanchard et al., 1982a; Blanchard et al., 1982b). When the power is thought to be in the instrument, which is generally not sent home (early exceptions are Budzynski et al., 1973; Sargent, Green, & Walters, 1973; and Sterman, 1977), and when biofeedback training is thought to be something different from relaxation training, the need for homework is not appreciated. Subjects and patients are denied an important aspect of training.

The early article by Stoyva and Budzynski (1974) emphasizing the need for generalized low arousal measured on several modalities (temperature, GSR, forearm EMG, and forehead EMG) has been essentially disregarded in research until recently. Stoyva and Budzynski noted that the crucial issue is self regulation of the low arousal states of deep relaxation, and that the initial goal of training is generalized low arousal by whatever method(s) the patient finds most efficacious. In clinical practice we do not attempt to treat the patient with only one training method or feedback modality. Several techniques and modalities are used, and the patient determines which is most beneficial (the locus of control is purposefully internal). In our clinical work and research (Shellenberger, Turner,
Green, and Cooney, 1986, and Shellenberger et al., 1983), we find that subjects and patients prefer a variety of systematic relaxation techniques and biofeedback modalities.

Failure to appreciate the value of generalized low arousal (Category Mistake #1: biofeedback has the power to create change independently of the internal state of the patient) has resulted in poor research results. Fahrion et al. (1986) argue that many blood pressure studies have produced minimal results because a state of generalized low arousal was not achieved by subjects. Furthermore, when generalized low arousal is the goal, the issue of specificity between feedback modality and symptom is irrelevant. We raise the issue of specificity here because it has a bearing on the issue of relaxation versus biofeedback. If it is assumed that a specific illness should be treated with a specific biofeedback modality or relaxation technique, and that biofeedback and relaxation are different, then the risk of minimal biofeedback training and poor results is great.

In summary, there are three options for understanding biofeedback and relaxation training:

(1) Feedback of information is an aid to unsystematic relaxation training: trial-and-error feedback learning;

(2) Feedback of information is an aid to systematic relaxation training: autogenic feedback training (Green, Green, & Walters, 1970), progressive feedback training (Budzynski, 1973a), open focus feedback training (Fritz, 1985), and quieting response feedback training (Ford, Stroebel, Strong, & Szarek, 1983);

(3) Feedback of information is not an aid to unsystematic or systematic training but has a specific effect of its own: the ghost in the box.


Methodology Error #9: Failure to incorporate mental/emotional variables in biofeedback training.

Conceptual Error: Biofeedback has a power of its own independent of the user; the conscious mind does not play a significant role in biofeedback (Category Mistakes #1 and #2).

This conceptual error is the fundamental presupposition for the category mistakes, and of the conceptual and methodological errors described above. By “mind” we mean that uniquely human complex of emotions, expectations, self-talk, visualizations, goals, volition, private agendas, perceptions, beliefs, attitudes, language and consciousness. Not of great importance in drug and animal research, these variables have been neglected in human research. Had these human variables been acknowledged, facilitated and employed in biofeedback research, the field might have advanced rapidly in the first decade.

Understanding the category mistakes enables us to understand the contention that consciousness is not necessary for successful biofeedback training. In spite of the failure of their research to achieve either a training effect or a treatment effect, Guglielmi, Roberts, and Patterson (1982) nevertheless argue this point: “Furthermore, in recent years a body of literature has accumulated indicating that what is true for rats also applies to humans [knowledge of the feedback-relevant response is not necessary]” (p.117). To support this contention they quote from a Biofeedback Society of America task force report: “. . . There appears to be no basis for the claim by many clinicians that awareness of the feedback-relevant response is necessary in order to achieve self-control over the response . . . In fact, the weight of the evidence to date indicates that nonawareness produces results equal to or better than awareness” (Carlson, 1978, p.7). The few studies that led to this conclusion contain so many methodological errors, including Error #10 (lack of reliability measures), and contradictions (the use of the term “self-control” when self-control was never achieved), that this puzzling claim is easily understood. It is possible to so befuddle the feedback group that knowledge of the response is of no benefit whatsoever. In fact, physiological change may be the opposite of that intended, indicating arousal or confusion.


Here is a fascinating paradox: on one hand, biofeedback research of the “official doctrine” type has failed to acknowledge and investigate the powerful impact of mind on body (to discover, for example, which particular cognitions facilitate physiological change and symptom reduction), while on the other hand, it has wholeheartedly attempted to eliminate this impact by calling it the “placebo effect.” This is inconsistent. Verbally and conceptually, researchers deny the existence or power of the conscious mind, and yet confirm the existence and power of the mind by insisting on rigorous research to control for the placebo effect, that is, the mind’s power to influence physiology through belief.

The double-blind design applied to “self-regulation” research is an excellent example of this paradox and the magical thinking about biofeedback. The mind is considered powerful but contaminating and is therefore “removed” from the treatment, or at least “controlled for” in the research design. It is hoped that by systematically controlling for all the cognitive and situational variables that might contaminate the results, at last the pure effect of “biofeedback” will be demonstrated. It is erroneous to assume that biofeedback has a drug-like power that when purified and dispensed in given doses may yield “pure” results. When this is attempted, the magical power seems to fail. This is because it was never there in the first place. Using a drug analogy, “biofeedback” fails because so many of the powerful ingredients are removed, ingredients that interact synergistically with the pure feedback of information to facilitate training.

The uniqueness of biofeedback training is that the user of the information, the trainee, produces the physiological information in the first place, in the same way that the user of the mirror produces the information that is reflected in the mirror. Therefore anything that affects the user affects the feedback. Because the information from the instrument is often research data, anything that affects the user affects the data. A continuous synergistic loop exists between the user’s cognitive processing, the impact of cognitions on the user’s physiology, and the feedback information, or data. Any variable that affects one of these components alters the entire process. Many biofeedback researchers have not considered this fact other than to control for the placebo effect. In general, researchers have neither used the impact of cognitions on
physiology to the subject’s benefit, nor considered the effect on the data that negative cognitions might have. Donald Meichenbaum (1976) states:

. . . It is proposed that the biofeedback literature to date could be compared to the verbal-conditioning literature prior to the active research on the role of awareness in the conditioning process. The research on awareness (e.g., Dulany, 1962; Spielberger & Denike, 1966) questioned whether the experimenter’s reinforcement acted in an automatic fashion and it highlighted the important role of the client’s knowledge of the reinforcement contingencies and his motivation to comply. The biofeedback literature requires as much similar attention to the client’s cognitive processes. Such attention to the client’s cognitive process at each phase of the biofeedback training should result in the training becoming more effective and will help elucidate the mechanisms that contribute to change (p.216).

Meichenbaum, in this article, and Lazarus (1975), have been unheeded. The cognitions of the subject or patient in biofeedback research have been viewed as confounding variables, and have not been elicited or regarded as meaningful data. Engel (1972) writes “. . . it is small wonder that I have not been able to find any consistency among the stories that patients have told me. I am certain that they do not know what they are doing, and that they are just making up stories to please me” (p.205). Undoubtedly this bias against subject reports arises from the belief in the ghost in the box and from the belief that such reports are unscientific, and from the methodologies of the operant conditioning model that disallow an appreciation or enhancement of individual learning styles. If subject reports had been elicited and used to enhance training, biofeedback research might have kept pace with its clinical applications.

Many researchers have believed that the effect of the feedback information can be isolated and studied separately from the human mind using the information. “Rather, the evidence for informational biofeedback’s efficacy has to be in the form of control conditions that show that an appreciable amount of increased control can indeed be attributed to the information supplied and not to other placebo-related effects such as motivation, self-instruction, relaxation and subject selection” (Furedy, 1979, p.206). This is an impossible task, as impossible as trying to study the characteristics of water by isolating and studying the characteristics of hydrogen. Like hydrogen and oxygen that combine to create water, information feedback and the conscious mind using the information combine in an interactive process called biofeedback training.


Methodology Error #10: Failure to establish reliability measures and confidence bands.

Conceptual Error: Psychophysiological parameters in humans are invariant; thus reliability studies are not needed (Category Mistake #1).

In 1982, we hired a researcher, John Cooney, Ph.D., not in the field of biofeedback, to help us assess the effectiveness of our biofeedback and stress management programs (Shellenberger et al., 1986). The first task he proposed was to examine the reliability of our measures. Before doing this we searched the literature for reliability studies on EDR, EMG, and thermal measures. To our surprise, we could not find a single article in the major journals on reliability. As a result, we found it necessary to examine reliability coefficients for two different groups (Shellenberger, Green, Cooney, & Turner, 1983). One group (N = 85) was a no-treatment control group that was given a stress profile using EMG, EDR, and thermal measures on day 1 and day 60. A second group (N = 149) was given the same stress profile before and after a ten week course on biofeedback training and stress management.

Simultaneously, the Stress Disorders Clinic at the State University of New York was conducting a reliability study with 15 subjects in which EMG, EDR, thermal, and heart rate measures were recorded on days 1, 2, 14, and 28 (Arena, Blanchard, Andrasik, Cotch, & Myers, 1983).

The conclusions of the two studies were similar: EMG was the only measure that was somewhat reliable. Arena et al. conclude:

The results of the present study contain a straight-forward message for the majority of research involving psychophysiological measurements: investigators must first ascertain how reliable are these measures on their respective subject population and then employ in their research only those measures which are found to be reliable on their populations (p. 458).

The authors of this important study make another point:

Traditionally behavioral assessment has emphasized situational specificity as opposed to stable trait-like characteristics of individuals (Goldfried and Kent, 1972). It seems unusual then that behaviorally-oriented researchers and clinicians would assume that psychophysiological measures are relatively stable over time. Unfortunately, this assumption seems to have been the case; with but one exception (Sturgis, 1980), behavioral investigators have in recent years not concerned themselves with issues of reliability of this class of measures (p.443).

The failure to determine the reliability of psychophysiological measures in humans arises primarily from an assumption implicit in Category Mistake #2: human physiology is as invariant as laboratory animal physiology. The physiology of the caged animal is assumed to be minimally affected by situational conditions and not affected at all by “cognitive” variables. The unstated assumption is that human physiology is the same and ticks away day after day, invariant unless manipulated by an external power such as biofeedback or drugs. It is implicitly assumed that reliability studies against which to measure the impact of the treatment are not necessary.

Human physiology, however, is affected by multiple situational and cognitive variables outside the experimenter’s control: interpersonal stressors, monetary stressors, diet, exercise habits, drugs, belief systems, expectations, volition. Because these variables affect psychophysiological measures, they affect reliability. As a result, comparing an individual’s baseline readings to training readings over several sessions becomes meaningless in the absence of the standard error of measurement.

The purpose of the standard error of measurement is to estimate the stability of measures over time. The standard error of measurement ($S_e$), given by the formula below, requires knowledge of the standard deviation ($S$) and reliability ($r_{xx}$) of the measurements.

$$S_e = S\sqrt{1 - r_{xx}}$$

Once the $S_e$ for each measurement is obtained, 68% or 95% confidence bands are placed about each data point on a person’s initial baseline session. To our knowledge, no standardization of psychophysiological data has included confidence bands. Without norms and confidence bands it becomes difficult to determine whether or not an individual’s change from baseline to training sessions is:

(1) a normal fluctuation of psychophysiological states, or

(2) adaptation to the experimental or clinical situation, or

(3) genuine learning, or

(4) clinically significant learning (i.e. the individual has mastered deep levels of relaxation).

To illustrate the problem, it is useful to examine confidence bands based on the most reliable measure, forehead EMG. Figure 1 shows the confidence bands (68%) of forehead EMG measures of a 38-year-old healthy male [no symptoms (Cornell Medical Index), normal Minnesota Multiphasic Personality Inventory, and normal State-Trait Anxiety Inventory (STAI)] who participated in the no-treatment control group (mean age for males was 35, SD 4.8).

These confidence bands show a tremendous variability in the subject’s physiology. The group standard deviation at baseline of 8.7 µV peak-to-peak and the reliability coefficient of .52 account for the large variability.

Figure 1: EMG Confidence Bands
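The arithmetic behind such bands can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: the standard deviation (8.7) and reliability (.52) are the values reported above, while the baseline reading of 12.0 microvolts, the function names, and the use of z = 1.0 and z = 1.96 for the 68% and 95% bands (which assumes roughly normal measurement error) are assumptions made for the example.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Return Se = S * sqrt(1 - rxx)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def confidence_band(score: float, se: float, z: float = 1.0) -> tuple[float, float]:
    """Return (lower, upper) limits of a band of +/- z standard errors.

    z = 1.0 gives roughly a 68% band and z = 1.96 roughly a 95% band,
    assuming approximately normal measurement error.
    """
    half_width = z * se
    return (score - half_width, score + half_width)


# Values reported in the text for forehead EMG (microvolts peak-to-peak).
se = standard_error_of_measurement(sd=8.7, reliability=0.52)

# Hypothetical baseline reading, chosen only for illustration.
baseline_uv = 12.0
band_68 = confidence_band(baseline_uv, se, z=1.0)
band_95 = confidence_band(baseline_uv, se, z=1.96)

print(f"Se = {se:.2f} microvolts")
print(f"68% band: {band_68[0]:.2f} to {band_68[1]:.2f}")
print(f"95% band: {band_95[0]:.2f} to {band_95[1]:.2f}")
```

With these inputs the standard error is about 6 microvolts, so a change of a few microvolts from baseline falls well inside even the 68% band. Raising the reliability coefficient to .80 with the same standard deviation would shrink the standard error to about 3.9 microvolts.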

In an attempt to improve our reliability measures and achieve the .80 reliability coefficient standard set by the American Psychological Association (1974), and to narrow the confidence bands, we replicated the reliability study of the Stress Disorders Clinic (Arena et al., 1983). In our replication study (Shellenberger and Lewis, 1986), physiologic measures on days 1, 2, 14, and 28 under stressed and relaxed conditions were recorded from 15 healthy subjects. No feedback or training was given. In addition, an attempt was made to regulate and control for many factors that might influence the reliability of the results: diet, drugs, smoking, medications, exercise, stress, humidity and temperature of the testing room, posture during testing, adaptation period, time of day, time of week, emotional state, illness, skin preparation on hands and forehead, electrode impedance, equipment calibration, eyes open/closed, and menstrual cycle. The results were disappointing. Baseline correlations between day 1 and day 28 were .14 for EDR, .16 for hand temperature, and .52 for forehead EMG.
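A test-retest coefficient of this kind is a correlation between the same subjects' readings on two occasions. The sketch below assumes a Pearson product-moment correlation, which the text does not specify, and uses invented readings for five subjects; the data, the function name `pearson_r`, and the variable names are all hypothetical and are not taken from the replication study.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation between paired readings."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 readings")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("readings must vary for a correlation to be defined")
    return covariation / math.sqrt(ss_x * ss_y)


# Hypothetical forehead EMG baselines (microvolts peak-to-peak), same
# five subjects measured on day 1 and day 28. Illustrative values only.
day_1 = [16.0, 9.5, 22.0, 12.0, 7.5]
day_28 = [8.9, 10.0, 15.0, 13.5, 6.0]

r_xx = pearson_r(day_1, day_28)
meets_standard = r_xx >= 0.80  # the .80 standard cited in the text

print(f"test-retest r = {r_xx:.2f}, meets .80 standard: {meets_standard}")
```

The resulting coefficient is the reliability value that enters the standard error of measurement, so a low test-retest correlation directly widens the confidence bands.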

One reason for the low correlations was the large adaptation effect that many individuals demonstrated (discussed in Error #11). The difficulties discussed above are also characteristic of research on other aspects of physiological functioning. For example, John Cohen (1985) in his article, “Stress and the Human Immune Response: A Critical Review,” points out that many research studies show a statistically significant difference between control and experimental groups but the differences are too small to be of biological significance. “‘Statistically, but not biologically significant’ is a phrase immunologists love to use” (p.168). Cohen states: “In most laboratories the day-to-day and subject-to-subject variation in mitogen response is such that the range of normal is very wide” (p.172). This wide range of normal is characteristic of biofeedback measures as well. Differences between control and experimental groups that are found to be significant may fall within the normal variation.

In an excellent methodology article, Banderia et al. (1982) make a similar criticism of heart rate biofeedback studies and point out that one study obtained a statistically significant heart rate change of .87 beats per minute. This is clearly within the range of normal heart rate variation. Along the same line, Fahrion, Norris, Green, and Green (1986) state, “Most biobehavioral research programs have focused on direct blood pressure feedback which produces statistically significant but not clinically significant effects . . .” (p.18). Other examples include changes of two microvolts or less on peak-to-peak EMG measures (Kinsman et al., 1975; O’Connell & Yeaton, 1981; Philips, 1977) and minimal changes in temperature as discussed in Errors #6 and #7. Too often researchers have made claims about the success or failure of biofeedback training on the basis of statistically significant changes in physiological measures without demonstrating the reliability of their measures or demonstrating that a physiologically significant change did or did not occur beyond the range of normal variation. “Statistically, but not physiologically significant” aptly describes many biofeedback studies.

Researchers need to establish reliability scores for psychophysiological measures. In addition, confidence bands and norms need to be established for many variables: sex, age, disease type, level of stress or relaxation, medication status, biofeedback modality, and instrumentation characteristics. This is a difficult task that has not been attempted until recently. In the discussion of Error #12 on mastery, we propose a procedure to resolve this problem effectively.


Methodology Error #11: Failure to control for adaptation.

Conceptual Error: psychophysiological parameters in humans are invariant (Category Mistake #2).


The adaptation effect has been well documented and discussed in psychophysiological research (Kamiya, 1977; Yates, 1980), and while some biofeedback studies have controlled for it, others have not. This error arises from the assumption of invariant measures and/or the assumption of the overwhelming power of biofeedback to create change beyond all other variables.

Data from our reliability studies (Shellenberger and Lewis, 1986; Shellenberger et al., 1983) indicate that adaptation in some individuals accounts for much of the variance. For example, seven subjects in the repeated profile study exhibited considerable EMG adaptation from Session 1 to Session 4, 28 days later. The mean EMG score for this group in Session 1 was 16.0 microvolts peak-to-peak, and in Session 4 was 8.9 microvolts peak-to-peak, an adaptation change of -7.1 microvolts, indicating relaxation. Seven subjects had a mean temperature increase from 80.5°F in Session 1 to 87.4°F in Session 4, an adaptation increase of 6.9°F, again indicating increased relaxation. These data show across-session adaptation.

Equally impressive data indicate within-session adaptation. In a normative study in our laboratory (Shellenberger et al., 1983) of 121 female subjects (mean age 30.9, SD 11.9, range 17-79), without a 20 minute adaptation period, hand temperature increased from a baseline of 78.7°F (SD 7.2) to 84.4°F (SD 8.7) in the first twenty minutes of a standard stress profile. Mean EMG scores for these subjects dropped from 14.5 (SD 8.6) microvolts peak-to-peak at the baseline reading to 9.8 (SD 6.4) microvolts peak-to-peak. In another normative study, Kappes and Morris (1982) found similar results for females (N = 34) on hand temperature. They observed an increase from an initial baseline mean of 84°F to an ending mean of 91°F twenty minutes later. It is clear from these studies that, through adaptation alone, EMG decreases of roughly 5 to 7 microvolts peak-to-peak and temperature increases of roughly 6 to 7°F may occur within a session or across sessions.
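The size of these adaptation-only changes can be tabulated directly from the reported means. The sketch below uses only figures given in the text; the dictionary layout and the final comparison against a two-microvolt training effect (the size of effect criticized under Error #10) are illustrative choices, not part of the original analyses.

```python
# (start, end) means reported in the text.
across_session = {
    "EMG (microvolts p-p)": (16.0, 8.9),     # Session 1 -> Session 4
    "Hand temperature (F)": (80.5, 87.4),
}
within_session = {
    "EMG (microvolts p-p)": (14.5, 9.8),     # baseline -> 20 minutes
    "Hand temperature (F)": (78.7, 84.4),
}


def change(pair: tuple[float, float]) -> float:
    """End minus start, rounded to one decimal as in the reported means."""
    start, end = pair
    return round(end - start, 1)


across_changes = {name: change(pair) for name, pair in across_session.items()}
within_changes = {name: change(pair) for name, pair in within_session.items()}

for label, changes in (("across", across_changes), ("within", within_changes)):
    for name, delta in changes.items():
        print(f"{label}-session {name}: {delta:+.1f}")

# A two-microvolt EMG reduction attributed to training is smaller than
# either adaptation-only EMG change, so it cannot be told apart from it.
training_effect_uv = -2.0
smallest_adaptation_uv = min(
    abs(across_changes["EMG (microvolts p-p)"]),
    abs(within_changes["EMG (microvolts p-p)"]),
)
distinguishable = abs(training_effect_uv) > smallest_adaptation_uv
print(f"2 microvolt effect exceeds adaptation alone: {distinguishable}")
```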

It is important to note that the adaptation effect discussed above is the effect of relaxation. It is now clear why studies that have not controlled for adaptation by giving all subjects an extended adaptation period of greater than 20 minutes (with electrodes attached in the experimental and control groups) may find that the relaxation or no-treatment control groups show as much or greater change than the biofeedback treatment group.

Studies using a no-treatment control group often require control subjects to sit quietly in the training room for the same period of time as the experimental subjects; physiological change within sessions and between sessions may result. Davis (1980) describes this procedure: “Subjects in the no-treatment condition were instructed to relax as deeply as they could, using any means that was helpful to them” (p.60). The experimental subjects, on the other hand, are immediately subjected to electrode or thermistor attachment and feedback, and are given a task to perform, in some cases with no clear idea of how to proceed. Except for the experimental variable, feedback, these groups are assumed to be equal. Clearly these groups are not equal. The no-treatment condition allows adaptation to occur rapidly, while the biofeedback treatment condition may stress the subject and delay both adaptation and learning. When group results are compared, the biofeedback treatment group appears to have done no better than the no-treatment group. Invariably the erroneous conclusion is that biofeedback is not a useful training technique (Davis, 1980; Nielson & Holmes, 1980; Stoffer et al., 1979).


The Mastery Model


Methodology Error #12: Failure to train to mastery.

Conceptual Error: Achieving and demonstrating mastery (true self regulation) of the physiological variable being studied is not necessary; the power is in the box (Category Mistake #1).

Few researchers have developed criteria for training, or advocated training to criteria, and even fewer have trained for psychophysiological self mastery. After many years of research and clinical practice, however, we believe that training for mastery is essential. By mastery we mean the ability to demonstrate the learned skill under adverse conditions, both in and out of the laboratory or clinic. Guidelines for the demonstration of learned skills will be discussed in Chapter 4, Successful Biofeedback Training.

Training to mastery and the demonstration of mastery are important for many reasons. Stoyva and Budzynski (1974) explicated the first reason by emphasizing that to ensure transfer of skills from laboratory to “real life,” stressful situations must be created for the trainee that will simulate real life situations. As a result, Budzynski, Stoyva and Peffer (1977) have done considerable research on determining stressors that will allow the individual to demonstrate self-mastery skills. The need for such training became apparent to Budzynski while working with a patient with elevator phobia (Budzynski, 1977). The patient was given desensitization training with deep relaxation but was unable to control the phobic response when he was unexpectedly confronted with an elevator full of conventioneers. The patient returned to therapy, with the added dimension of training for the worst possible situation, a technique called “flooding.” This added ingredient ensured mastery and enabled the patient to transfer the training to all situations. In another case, Miller (1976) trained a patient with hypertension to voluntarily create substantial decreases in diastolic pressure through extensive training (50 sessions). Miller writes, “This patient seemed to be cured because similar decreases were observed on the ward. However, under an unusual combination of emotional stresses, her baseline blood pressure rose, she lost voluntary control, and had to be restored to antihypertensive drugs. After the situational stresses were largely resolved, she returned to training approximately 2.5 years later and has rapidly regained a large measure of voluntary control” (Miller, 1976, p.372). The need for mastery training is clear, that is, the practice and demonstration of the learned skill under adverse conditions as an integral part of training.

Second, mastery reinforces a sense of control in the trainee. The importance of this to successful training cannot be overemphasized. In fact, one study with headache patients (Holroyd, Penzien, Hursey, Tobin, Rogers, Holm, Hall, Marcille, & Chila, 1984) indicates that a sense of control, even if false, may be associated with symptom reduction (subjects were told that they were successfully decreasing forehead EMG when in fact they were being trained to increase EMG).

Third, by gaining mastery based on successful learning, patients come to “know that they know” and can maintain and use their skills as needed. Arthur Gladman (1981), who began using biofeedback therapy in his psychiatric practice over fifteen years ago, writes:

At the moment that the patient becomes aware that he and he alone is changing his symptoms, his concept of himself changes . . . learned deep relaxation and control of a physiologic state in itself produces real change in the physical state and cannot be discounted but, the fact that the individual has developed a sense of mastery, a shift in locus of control, must also be considered in accounting for the remarkable changes that occur in biofeedback training (p. 15).

Mastery facilitates transfer of skills and brings freedom from the feedback equipment and from the therapist.

A fourth reason for demonstrating psychophysiological self mastery is related to the question “How do we, the researcher or clinician, know when the trainee has really learned to control a psychophysiological process?” The ability to determine the extent to which the trainee has learned is hindered by (a) the adaptation effect, (b) lack of reliability measures and confidence bands, (c) wide range of normal variation, (d) the clinical and experimental impracticability of controlling inter- and intrapersonal variables that influence physiology, and (e) dubious inference from group data to the individual (Banderia et al., 1982). All these problems prevent accurately assessing learning of the experimental or treatment group, and correctly assessing the relationship between psychophysiological training and symptom reduction.

As noted earlier, many researchers have either not attempted to establish a training effect at all, or they have attempted with improper research design to demonstrate learning by (a) comparing baseline scores to training scores without reliability measures or by (b) defining learning as a degree of change in the physiological process being studied that reaches statistical significance, or by (c) defining learning as any change in the physiological process.

The most eloquent and practical method for avoiding these difficulties is to incorporate the demonstration of psychophysiological mastery into research and clinical practice. This can be done by creating situations in which the trainee can demonstrate self-regulatory abilities, just as we use academic examinations for students to demonstrate learning or competitive events for athletes to demonstrate athletic skill. For example, mastery of blood flow and the concurrent relaxation response might be demonstrated by increasing hand temperature on command at a rate of one degree Fahrenheit per minute or greater, in a cold room.
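A criterion of this kind is simple to state operationally. The sketch below checks a recorded trial against the one-degree-per-minute example given above; the function names, the use of an average rate between the first and last readings, and the sample readings are all assumptions made for illustration, and a real protocol would also need to specify room temperature, trial length, and how the command is given.

```python
def warming_rate_f_per_min(readings: list[tuple[float, float]]) -> float:
    """Average rate of hand-temperature change in degrees F per minute.

    Each reading is (elapsed_minutes, temperature_f). The rate is taken
    between the first and last readings of the trial.
    """
    if len(readings) < 2:
        raise ValueError("need at least two readings")
    (t_start, temp_start), (t_end, temp_end) = readings[0], readings[-1]
    if t_end <= t_start:
        raise ValueError("readings must span a positive time interval")
    return (temp_end - temp_start) / (t_end - t_start)


def meets_criterion(readings: list[tuple[float, float]],
                    criterion_f_per_min: float = 1.0) -> bool:
    """True if the trial warms at the criterion rate or faster."""
    return warming_rate_f_per_min(readings) >= criterion_f_per_min


# Hypothetical five-minute trial in a cold room, after the command to warm.
trial = [(0.0, 78.0), (1.0, 79.2), (2.0, 80.1),
         (3.0, 81.5), (4.0, 82.6), (5.0, 84.0)]

rate = warming_rate_f_per_min(trial)
print(f"rate = {rate:.2f} F/min, criterion met: {meets_criterion(trial)}")
```

Because the criterion is a fixed performance standard for the individual, it does not depend on group means or on a statistically significant difference from baseline.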

Patients often do demonstrate mastery of psychophysiological variables and stress management skills by handling stressful life events in new ways without exacerbation of symptoms. But in laboratory situations the demonstration of mastery is even more important scientifically because only then can correct conclusions be drawn about the nature of biofeedback training, its potential and its limitations.

If demonstration of mastery is accepted as the sine qua non for successful training and for conclusions about biofeedback training, the preceding twelve errors will necessarily be eliminated.

In conclusion, a demonstration of mastery should be incorporated into research and clinical methodology because: (a) It is the only way to know with certainty that the trainee has learned self regulation, particularly when responses are within the normal range of variation (as might be true for successful treatment of a Raynaud’s disease patient, for example), and (b) training to mastery is good therapy.

Attention: The Ghost in the Box is “Shareware”!
Please Register Your Copy Today

This internet publication of the historic monograph “From the Ghost in the Box to Successful Biofeedback Training” is itself an historic event: the first known publication of a previously published masterpiece as “shareware”.

For years computer software has been published under this “honor system”.  You can download a program for free, try it out, and see if you like it. And, if you continue to use the program, you are honor-bound to send the modest registration fee to the author.

If you download and read this Internet Edition of GHOST, either the whole book or any one or more of its chapters, and if you allow its message to influence your thinking about Biofeedback, you are asked to submit the modest sum of FIVE DOLLARS directly to the book’s authors, Bob and Judy.  That’s even a bargain, since the original 1986 publication, which is herewith reproduced in its entirety, cost $9.95!  Unfortunately, it has been out-of-print for several years, but would surely cost more if reprinted today.

License.  Payment of the $5.00 registration fee entitles the reader to print one (1) copy of the entire text, or any part of the entire text, for personal use.  Up to ten additional copies may be printed, provided that this notice is always included (once) and each recipient understands that the shareware fee applies to each and every printed copy of the book or any chapter of the book.   Reproduction beyond the scope of this license is a violation of US and International Copyright Laws.

To Register your copy of “Ghost”, print this page and send it with your check or US$ 5.00 cash to:

Bob Shellenberger & Judy Green
c/o Psychology Department
Aims Community College
PO Box 69
Greeley, CO 80632 USA

Dear Bob and Judy:

Thanks for making “The Ghost in the Box” available again. 

Name: _________________________________________

Address: _______________________________________

City/State: ______________________________________

Email Address: __________________________________

Remember, send only one registration per person, regardless of how many chapters you have.

A revised edition of Ghost is in the planning stages; if you do include your name and address, you will be notified if and when it becomes available.

A brief description of your professional involvement in biofeedback would be of great interest to the authors.
