Fred L. Bookstein (1994) Partial Least Squares: A Dose-response Model for Measurement in the Behavioral and Brain Sciences. Psycoloquy: 5(23) Least Squares (1)

Target Article by Bookstein on Least Squares

Center for Human Growth and Development

The University of Michigan

Ann Arbor, Michigan 48109

(313) 764-2443

fred@brainmap.med.umich.edu

Partial Least Squares (PLS) is a relatively new multivariate statistical method for the analysis of indirectly measured cause and effect in complex behavioral systems. The core of the technique is a rearrangement of the singular-value decomposition (SVD) of the correlation matrix between two blocks of variables. In this setting, the SVD can be reinterpreted as dealing with two latent variable (LV) scores, one for each block, such that the coefficients of either are proportional to the predictive salience of the corresponding variable for the other LV. In the presence of a true causal nexus, subsequent statistical manipulation of these coefficients and scores can be very enlightening. The strengths of PLS are demonstrated using the Seattle study of the effects of prenatal alcohol exposure on offspring development. This longitudinal study is based on 13 diverse measures of prenatal exposure and hundreds of outcome scores that assay attentional behavior, neuromotor maturation, cognitive functioning, and socialization to school in a population-based sample of 500 children born in 1975. There is an enduring effect of prenatal exposure on outcomes in all of these channels. I argue that PLS is the best method for discovering and reporting the nature of the dose-response relationship and the characteristics of affected children in studies such as these.

1. This target article calls the attention of the PSYCOLOQUY readership to a relatively new statistical methodology for the analysis of cause and effect when neither can be measured directly but both can be measured redundantly. This familiar measurement design is particularly important at the nexus of the behavioral and brain sciences, where studies often combine multiple indirect measurements of important but brief causes (e.g., prenatal insults) having important consequences of relatively great duration (e.g., the lifespan). A shorter version of this article was published in PSYCOLOQUY in February 1991 (Bookstein, 1991), but elicited no commentary (PSYCOLOQUY was still very young then, and the biobehavioral science community not yet familiar with the concept of interactive electronic publication). Now, three years later, the Editor has invited me to broach the topic again, at greater length. Today it is easier to argue that in its combination of biometric and psychometric themes the approach is worth considering for a much broader range of studies than those in which it has been exploited so far.

2. The technique I am reviewing, "Partial Least Squares" (PLS), is a variant of a family of least-squares models of correlation matrices introduced in the 1920's by the biometrician Sewall Wright (1889-1988) to link path analysis with factor analysis. The technique was rediscovered, and the present name assigned, by the Swedish econometrician Herman Wold (1907-1992), in diverse sociological applications throughout the 1970's; but the explanation that follows is not Wold's, and does not correspond to the algorithm of a program package matching Wold's ideas that was distributed by Lohmuller in the late 1980's. Also, this version of PLS should not be confused with a quite different algorithm of the same name that applies to prediction and classification problems in chemometrics, an algorithm having another Wold for inventor (Svante Wold, Herman's son).

3. The present method was worked out in extensive analyses of neurobehavioral sequelae of prenatal exposure to alcohol in 500 children exposed at levels milder than those bringing on frank Fetal Alcohol Syndrome (FAS). Paul D. Sampson has been my principal statistical collaborator in this work. Ann P. Streissguth of the University of Washington, Seattle, has been Principal Investigator of this project since 1974. Our 1993 monograph (see the reading-list at the end of this paper) explains PLS methods in much greater detail than we have room for here, and also reviews the teratological context of this study, the design of all its phases of data collection, and its major findings through the early school years.

4. Rather than turning directly to these teratological matters, however, I will begin at the core of the technique, with a description of some unexpectedly fruitful matrix maneuvers. The scientist for whom PLS is designed is faced with two lists of ordinary statistical variables; call them the X's and the Y's. Later on, the X's will be measures of prenatal alcohol exposure, and the Y's a great variety of measures of child neurobehavioral functioning. Number the X's from X1 to Xm, the Y's from Y1 to Yn. In the application to come, m is 13 and n 474. Even though the X's and Y's are to have been measured on the same cases, usually they do not share any natural units of measurement; then it is convenient to normalize each of them to mean zero and variance one. Write R for the correlation matrix of the X's with the Y's, m rows by n columns. Its element Rij is the correlation of Xi with Yj. This matrix does not have 1's down the diagonal; usually it is not even square.
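As a minimal sketch of the setup in paragraph 4, the following Python/NumPy fragment builds the m by n matrix R from two blocks measured on the same cases. The simulated data, the block sizes, and the function name cross_correlation are illustrative assumptions and have nothing to do with the Seattle data.

```python
import numpy as np

def cross_correlation(X, Y):
    """Return the m-by-n matrix R with R[i, j] = corr(X[:, i], Y[:, j])."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Normalize every variable to mean zero and variance one.
    Zx = (X - X.mean(axis=0)) / X.std(axis=0)
    Zy = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    # Averaged cross products of standardized scores are correlations.
    return Zx.T @ Zy / X.shape[0]

# Hypothetical data: 200 cases, m = 3 X's, n = 5 Y's.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
Y = rng.normal(size=(200, 5)) + X[:, [0]]   # every Y depends on X1
R = cross_correlation(X, Y)
print(R.shape)   # rows index the X's, columns index the Y's
```

Note that R is rectangular here (3 by 5), with no diagonal of 1's, as the text says.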

5. A certain interesting computation can now be phrased in any of several different ways. All involve the production of a vector of m coefficients Ai, one for each X, together with a vector of n coefficients Bj, one for each Y. In the easiest approach to this nexus of overlapping interpretations, the name "Partial Least Squares" can be thought of as referring to "Least-Squares analysis of Part of a correlation matrix."

6. Suppose we want the vectors A and B for which the dyadic product AB (that is, the m by n matrix whose ij-th entry is Ai*Bj) comes closest to the matrix R in the least-squares sense. There are algorithms (specifically, the singular-value decomposition, SVD) for producing A and B at the same time: they are the first pair of left and right "singular vectors" of the matrix R. When R is a correlation matrix, however, they take on an additional meaning that it is useful to explain from first principles. Assume we already have the vector A somehow and merely wish to compute the corresponding vector B one entry Bk at a time. If Ai*Bj matches R in a least-squares sense, then its k-th column, Ai*Bk, has to do its share in fitting the k-th column of R: we want to choose Bk so that the difference between Rik and Ai*Bk is as small as possible in the least-squares sense -- to minimize the sum of the terms (Rik-Ai*Bk)**2 as i varies from 1 to m. Here Bk is unknown, and the Rik and Ai are already known (by assumption).
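The least-squares property of the first singular pair can be checked numerically. The sketch below is an illustration only: the matrix R is random stand-in data, and putting the whole scale of the dyad into A (rather than B) is a convention chosen here for convenience.

```python
import numpy as np

rng = np.random.default_rng(2)
R = rng.uniform(-0.5, 0.5, size=(4, 7))   # stand-in for a 4 x 7 matrix R

# First pair of left and right singular vectors of R.
U, s, Vt = np.linalg.svd(R, full_matrices=False)
A = s[0] * U[:, 0]        # scale of the dyad absorbed into A (a convention)
B = Vt[0]
err_best = np.sum((R - np.outer(A, B)) ** 2)

# Try many random dyads, each rescaled to its own best least-squares fit,
# and record the smallest margin by which any of them misses the SVD fit.
smallest_gap = np.inf
for _ in range(1000):
    a = rng.normal(size=4)
    b = rng.normal(size=7)
    D = np.outer(a, b)
    scale = np.sum(R * D) / np.sum(D * D)
    err = np.sum((R - scale * D) ** 2)
    smallest_gap = min(smallest_gap, err - err_best)

print(err_best, smallest_gap)
```

No random dyad should fit better than the SVD dyad, so smallest_gap stays nonnegative, and the residual sum of squares equals the sum of the squared remaining singular values.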

7. But this is an ordinary univariate regression (without constant term). So we know the formula for Bk: it is equal to the cross product of the A's by the R's divided by the cross product of the A's by the A's: Bk=sum(Ai*Rik)/sum(Ai**2), both sums taken over i. Given the A's, then, the values Bj of the B's that together guarantee a least-squares fit of Ai*Bj to Rij are proportional to the expression sum(Ai*Rij). The scientific import of PLS is concentrated in a peculiarly useful alternate interpretation of part of this same formula. The expression sum(Ai*Rik), numerator of the least-squares estimate of the value of Bk for the k-th column of R, is at the same time the covariance of the original variable Yk with a new linear combination of the X's, the "latent variable" LV.X=sum(Ai*Xi) that combines the X's using the coefficients of the vector A for weights. That is, sum(Ai*Rik)=cov(Yk,sum(Ai*Xi)) -- this is just an algebraic identity.
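Both claims of paragraph 7 are easy to verify on simulated data: the regression formula for Bk, and the identity between sum(Ai*Rik) and the covariance of Yk with LV.X. In the sketch below the data, the sizes, and the arbitrary starting vector A are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
N, m, n = 300, 4, 6
X = rng.normal(size=(N, m))
Y = X @ rng.normal(size=(m, n)) + rng.normal(size=(N, n))

# Standardize both blocks and form the cross-correlation matrix R.
Zx = (X - X.mean(axis=0)) / X.std(axis=0)
Zy = (Y - Y.mean(axis=0)) / Y.std(axis=0)
R = Zx.T @ Zy / N

A = rng.normal(size=m)            # "somehow" given vector of X coefficients

# Bk = sum(Ai*Rik) / sum(Ai**2), for all k at once.
B = (A @ R) / (A @ A)

# The same numbers from an explicit no-intercept regression, column by column.
B_lstsq = np.array([
    np.linalg.lstsq(A[:, None], R[:, k], rcond=None)[0][0] for k in range(n)
])

# The numerator sum(Ai*Rik) is cov(Yk, LV.X), with LV.X = sum(Ai*Xi).
lv_x = Zx @ A
numerator = A @ R
cov_with_lv_x = Zy.T @ lv_x / N   # both factors already have mean zero

print(np.allclose(B, B_lstsq), np.allclose(numerator, cov_with_lv_x))
```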

8. Similarly, given the values of the B's, we find that each Ak is proportional to the covariance sum(Bj*Rkj) of the original variable Xk with a new "latent variable" LV.Y=sum(Bj*Yj) combining the Y's using the coefficients of the vector B for weights.

9. A computation ostensibly referring only to a matrix of cross-block correlations, then, ends up interpretable in terms of covariances with "latent variables" summarizing the import of either block for predicting the other. Each coefficient, an Ak or a Bk, is proportional to the covariance of its original variable with the LV score of the OTHER block. These coefficients are called "saliences." The coefficients Ai are the saliences of the several Xi for predicting (or being predicted by) the summary score LV.Y combining the Y's, and similarly the coefficients Bj are the saliences of the several Yj for predicting (or being predicted by) the summary score LV.X combining the X's. The A's are the saliences of the X's for the Y's, and the B's are the saliences of the Y's for the X's, all at the same time. We call this a "consistency criterion"; in other fields, such as econometrics, it is called an alternating expression of a fix-point property.
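The consistency criterion suggests a simple alternating computation: get A from B, then B from A, and repeat. The sketch below is one way to carry this out under stated assumptions (simulated blocks driven by a single hypothetical latent cause, unit-length normalization at every step, a fixed count of 200 passes); it is not the algorithm of any particular PLS package.

```python
import numpy as np

rng = np.random.default_rng(3)
N, m, n = 400, 4, 8
latent = rng.normal(size=N)                      # hypothetical common cause
X = latent[:, None] + rng.normal(size=(N, m))
Y = 0.5 * latent[:, None] + rng.normal(size=(N, n))
Zx = (X - X.mean(axis=0)) / X.std(axis=0)
Zy = (Y - Y.mean(axis=0)) / Y.std(axis=0)
R = Zx.T @ Zy / N

# Alternate: each Ak proportional to sum(Bj*Rkj), each Bk to sum(Ai*Rik).
B = np.ones(n) / np.sqrt(n)                      # arbitrary starting vector
for _ in range(200):
    A = R @ B
    A /= np.linalg.norm(A)
    B = R.T @ A
    B /= np.linalg.norm(B)

# Compare with the first singular vectors (defined only up to a common sign).
U, s, Vt = np.linalg.svd(R, full_matrices=False)
print(abs(A @ U[:, 0]), abs(B @ Vt[0]))
```

When the first singular value is well separated from the second, as it is for these simulated data, the alternation settles on the first pair of singular vectors.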

10. Other characterizations of this same pairing of singular vectors A and B follow from other aspects of the singular-value decomposition reviewed in the papers listed in the bibliography below. There are several pairs of vectors A and B that satisfy the "consistency criterion" just reviewed (each element of either proportional to covariances with the latent variable score for the other). One of these pairs has the greatest possible covariance of any pair of linear combinations sum(Ai*Xi), sum(Bj*Yj) when the scales of the A's and B's are controlled separately (by setting sum(Ai**2)=sum(Bj**2)=1). This covariance is called a "singular value" of the original matrix R; the sum of squares of all these covariances is the sum of squares of all the entries of R. So we can talk about the "goodness of fit" of the model Ai*Bj for R in terms of the covariance of the two latent variables involved. The usual statistic from this least-squares fitting problem is the "fraction of summed squared correlation explained" by this pair of LV's. ("Explanation" here does not mean the way correlations explain case values, as in regression or analysis of variance, but the way that latent variables explain correlations.) The larger that fraction, the closer the pattern Ai*Bj comes to fitting all the entries of R -- the more the columns of R all look as if they have the same pattern, the pattern of the vector A of X-saliences, or, equivalently, the more the rows of R all look as if they have the same pattern, the pattern of the vector B of Y-saliences. From the basic formula, the salience Ak of the k-th X is the "amplitude" according to which that row of R matches the general vector B across columns, and vice-versa.
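The quantities named in paragraph 10 can be sketched as follows. The simulated data are an assumption; the computation itself uses only the SVD of R and the standardized blocks.

```python
import numpy as np

rng = np.random.default_rng(4)
N, m, n = 500, 5, 9
latent = rng.normal(size=N)                      # hypothetical common cause
X = latent[:, None] + rng.normal(size=(N, m))
Y = 0.5 * latent[:, None] + rng.normal(size=(N, n))
Zx = (X - X.mean(axis=0)) / X.std(axis=0)
Zy = (Y - Y.mean(axis=0)) / Y.std(axis=0)
R = Zx.T @ Zy / N

U, s, Vt = np.linalg.svd(R, full_matrices=False)
A, B = U[:, 0], Vt[0]          # sum(Ai**2) = sum(Bj**2) = 1

# Latent variable scores, one pair per case.
lv_x = Zx @ A
lv_y = Zy @ B

# Their covariance is the first singular value of R.
cov_lv = lv_x @ lv_y / N       # both scores have mean zero

# "Fraction of summed squared correlation explained" by the first pair.
fraction = s[0] ** 2 / np.sum(R ** 2)

print(cov_lv, s[0], fraction)
```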

11. Another way of thinking about this same pairing is to imagine a principal components analysis (PCA) of the matrix R as if it were raw data: "rows" that are "cases" Xi, and columns that are "variables" Rij. Then the vector B we're looking for is the ordinary first principal component of the matrix R interpreted in this way (a principal component computed "around zero," not around the means of the columns of R), and the vector A we are looking for is made up of the scores of the "cases" on this first principal component. Or we can imagine the "cases" to be the columns of R, not the rows, which are now the "variables." Now that the problem is transposed, the first principal component of the "rows" is the vector A, and the "scores" on this PC make up the vector B. But in either version, rows or columns, both vectors (A's or B's) actually combine with the original data (X's or Y's) to generate scores over the original sample of actual cases (no quotes this time). It is this shared, symmetric interpretation of what is usually an asymmetric computation that underlies the application to dose-response analysis concerning us here. All these interpretations are guaranteed consistent by the properties of the singular-value decomposition.
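The principal-components reading of paragraph 11 can be sketched in the same way. Here the rows of R are treated as "cases" and the component is computed around zero, that is, without subtracting column means. The simulated data and the use of an eigendecomposition of R'R are assumptions of this illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
N, m, n = 400, 4, 7
latent = rng.normal(size=N)                      # hypothetical common cause
X = latent[:, None] + rng.normal(size=(N, m))
Y = 0.5 * latent[:, None] + rng.normal(size=(N, n))
Zx = (X - X.mean(axis=0)) / X.std(axis=0)
Zy = (Y - Y.mean(axis=0)) / Y.std(axis=0)
R = Zx.T @ Zy / N

# PCA "around zero": eigenvectors of the uncentered cross-product matrix R'R.
eigvals, eigvecs = np.linalg.eigh(R.T @ R)       # eigenvalues in ascending order
B_pca = eigvecs[:, -1]                           # first principal component
scores = R @ B_pca                               # scores of the "cases" (rows of R)

# Compare with the SVD: B matches up to sign, and the scores are A times
# the first singular value.
U, s, Vt = np.linalg.svd(R, full_matrices=False)
print(abs(B_pca @ Vt[0]))
print(np.allclose(np.abs(scores), s[0] * np.abs(U[:, 0]), atol=1e-8))
```

Transposing R and repeating the computation would exchange the roles of A and B.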

12. These "latent variables" LV.X, LV.Y should not be thought of as "factors" or as the sort of LV's computed in other approaches to "structural equations modeling." For now, consider them simply as naked formulas: a latent variable of a block of variables X for predicting a variable Z is the new linear combination sum(cov(Xi,Z)*Xi). In this definition, Z could be any variate at all. The power of this new construction of LV's arises from the computation of LV.X and LV.Y in pairs. LV.Y serves the role of Z for the definition of LV.X -- we have LV.X=sum(cov(Xi,LV.Y)*Xi) -- and, simultaneously, LV.X serves that role for the definition of LV.Y: LV.Y=sum(cov(LV.X,Yj)*Yj).

13. So far I have been describing the "two-block" form of PLS. Our 1993 monograph explains its extension to multiple dimensions within blocks and to analysis of more than two blocks of variables at the same time. The basic algorithm is still least-squares fitting of some off-diagonal parts of a correlation matrix, and it still results in sets of multiple latent variable scores, one per "block," bearing pairwise covariances that are optimal in a certain scientifically useful sense. Lists of variables work well as blocks if they are measured at the same time, represent the same behavioral channel, or can otherwise be expected to have "something in common" in respect of their prediction of or prediction by other blocks in an explanatory scheme.

14. To understand what these formulas can mean in practice, we need to interpret them in the context of how the X's and the Y's were collected in the first place. The A's and B's tell us important things about variables; the latent variable scores LV.X and LV.Y tell us equally important things about dose and response case by case. The remainder of this article, then, is a description of one particular scientific application to which PLS appears exceptionally well matched.

15. PLS seems ideal for studies of dose and response (cause and effect) in systems under indirect observation. These are not studies of "normal variation." Instead, the investigator is typically trying to extend into the range of human observational studies a pure dose-response nexus known to lead to unipolar syndromes in high-dose cases. Fetal alcohol syndrome (FAS) is known to exist and to be caused by prenatal exposure to alcohol in sufficient quantity. The associated dose-response studies analyze the dependence of effect upon cause -- of response upon dose -- in the mildly abnormal case ("social drinking"). The fact of this prior knowledge means that the existence of the underlying statistical tie is not in question. The study is, rather, one of calibration: to find and rank the saliences of the dose measures, or the response measures; and to find and rank the subjects of the study according to their dose or their response scores. For the present study, a sample of 500 Seattle women pregnant in 1974-75 was drawn from 1500 interviews of women in prenatal care by the fifth month in two hospitals. The sample is typical of Seattle pregnancies of the mid-1970's except that in order to increase the precision of the dose-response calibration it was intentionally overweighted for high levels of social drinking. At the time, drinking was not correlated with other aspects of high-risk pregnancies; most of the statistical confounds bedeviling recent studies of this type were inoperative in this one.

16. PLS applies to studies in which cause and effect are each measured variously and redundantly. Our alcohol study includes multiple measures of amount and pattern: averaged dose ("drinks per day" -- a drink is a standardized dose of alcohol, about half an ounce), occasions per month, average drinks per occasion, maximum drinks per occasion, and "bingeing" according to two different categorization schemes. These were assessed at two times during pregnancy ("prior to awareness" and contemporaneously) in the course of one single interview during the fifth month, and were augmented by a summary rating of priority for inclusion in the "follow-up" sample when the child was actually born. The measures of "effect" include quite an assortment: nearly five hundred measures of neurobehavioral functions typically found to be altered in the full clinical expression of Fetal Alcohol Syndrome. These outcomes are gathered into "blocks" by child's age (from 1 day to 14 years), behavioral channel (attentional, motor, mental, and school-related), and modality (laboratory task, expert rating, parent rating, teacher rating, standardized psychometric exam). Analyses proceed both separately block by block and with the outcomes pooled into one list of 474 (through age 7 years only, as reviewed in our monograph). The study is prospective and longitudinal in design, with 82% sample continuity from birth right through 14 years. This continuity is as salutary for PLS analysis as for any other form of biometric investigation.

IV.1 Alcohol Saliences (A's)

17. We find that one single latent measure of dose reliably accounts for most of the profiles of correlation with outcomes whether pooled over seven years or separately block by block at up to fourteen years. (For this report, dose measures have been linearized with respect to this composite outcome. The transformations are in Figure 4.1 of our monograph.) A first pair of LVs accounts for 75% of the sum of squares of the 13 x 474 correlations up through seven years; the incorporation of a second dimension to account for variations in the general weighting of binges increases that fraction to 86%. Even without this refinement, the pattern of saliences of dose confirms the findings from animal studies. Salience for net deleterious outcome is higher for doses early in pregnancy ("prior to awareness") than in mid-pregnancy (when the interview took place), and saliences are higher for patterns of massed drinking ("binges") than for the comparable net intake of alcohol in a pattern of steady drinking. The two-dimensional analysis corresponds to a collection of simpler fits against alcohol, block by block, in which all the alcohol salience vectors A resemble the one that accounts for 75% of all 13 x 474 correlations and differ from it mainly by the weighting of the early/binge components with respect to the others. For the typical outcome, the most salient measure of alcohol dose is average number of drinks per occasion prior to awareness of pregnancy. The most frequently encountered measure of alcohol intake, net ethanol per day, is among the least useful for prediction of the ensuing neurobehavioral deficits.

IV.2 Outcome Saliences (B's)

18. The second set of saliences pertains to outcomes. The first PLS latent variable for the first seven years of outcomes most heavily weights our modification of a Brazelton neonatal measure, habituation to light at age 1 day (an age clearly too early for socioeconomic effects to have intervened); but the next most salient outcome variable is a rating of adjustment to the second-grade school setting. Other outcomes having high salience (high correlation with the alcohol LV) include several other neonatal manifestations of neurological maturity, standardized scores (especially the standard deviation of reaction time) on vigilance tests at ages four through fourteen years, teacher ratings of undesirable classroom behaviors related to learning disability, a sharply focused profile of IQ deficit at 7 years concentrating on arithmetic and digit span subtests, a trade-off of accuracy for speed in a geometric puzzle task at age 14, and a sharp, specific failure to pronounce unfamiliar English phonemics correctly.

IV.3 Case Scores

19. In spite of the long tail of the measure of latent dose, the joint distribution of dose and response LV scores for this sample is well-behaved, as shown in the following scatterplot summarizing the findings up through seven years (Streissguth, Bookstein et al., 1993, Figure 6.1). This pair of LV's incorporates 75% of all the squared correlation between the blocks; the correlation between the scores is 0.29.

-------------------------------------------------------------------------

FIGURE 1. Scatterplot of the First Outcome LV Score (vertical axis, ticks from -10 to 15) against the First Alcohol LV Score (horizontal axis, ticks from 0 to 12).

-------------------------------------------------------------------------

20. In the context of dose-response analysis, the abscissa of this plot is the best composite measure of dose for these responses, and the ordinate the best composite measure of response for these dose measures. The best single statistical realization of the "dose-response curve" of teratological theory would be the trail of a scatterplot smoother through this scatter. After parental education, the alcohol LV is a more significant predictor of this composite outcome than any other measurable factor, and the slope of the apparent dose-response curve is hardly altered at all after such statistical adjustments. Children can show neurobehavioral deficits at all levels of prenatal dose (notice, for instance, the large vertical scatter of the streak at the left, the children of abstainers), but the risk of a high deficit clearly increases with dose. Of the few dozen most heavily exposed subjects (those at the right of the horizontal axis), two were formally diagnosed FAS at birth. We have been able to diagnose 11 others as fetal-alcohol-affected on the basis of consistent deficits over the components of this summary score disaggregated by channel and wave of measurement.

21. It may be helpful to compare this two-block procedure with the more familiar approach of canonical correlations analysis (CCA), which is an optimization of the correlation between a similar pair of scores, likewise linear combinations of the two blocks. Recall that PLS optimizes covariance, not correlation; the distinction is much more important than it seems. Interpreting the coefficients of canonical variates requires the usual stringent assumptions underlying multiple regression of either canonical variate upon the variables of the other block. Such assumptions are unlikely to obtain when predictors or outcomes are intentionally redundant. (In a typical analysis from the Seattle study, alcohol versus 11 IQ subscores, the first three pairs of canonical variates have nearly the same high correlation; but each involves an uninterpretable contrast among the alcohol variables, and none bears much predictively usable covariance.) In contrast, the PLS procedure begins with the assignment of an interpretive meaning to each coefficient, as being a salience for the cross-block prediction problem driving the research: being proportional to covariance with the facing LV. From this follows the optimization of covariance of the normalized LV's. In this, PLS directly generalizes the meaning of the coefficients of a principal component (the linear combination LV.X satisfying the definition of LV with Z = LV.X itself).
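The difference between optimizing correlation and optimizing covariance can be made concrete with a small sketch. Everything here is an assumption for illustration: the data are simulated with an intentionally redundant X block, CCA is computed by whitening each block with an inverse square root of its own correlation matrix, and the helper inv_sqrt is hypothetical.

```python
import numpy as np

def inv_sqrt(S):
    """Inverse symmetric square root of a positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(w ** -0.5) @ V.T

rng = np.random.default_rng(6)
N, m, n = 400, 4, 6
latent = rng.normal(size=N)
X = latent[:, None] + 0.3 * rng.normal(size=(N, m))   # redundant predictors
Y = 0.5 * latent[:, None] + rng.normal(size=(N, n))
Zx = (X - X.mean(axis=0)) / X.std(axis=0)
Zy = (Y - Y.mean(axis=0)) / Y.std(axis=0)
Rxx, Ryy, Rxy = Zx.T @ Zx / N, Zy.T @ Zy / N, Zx.T @ Zy / N

# PLS: first singular pair of the cross-block matrix alone.
U, s, Vt = np.linalg.svd(Rxy, full_matrices=False)
A, B = U[:, 0], Vt[0]
corr_pls = np.corrcoef(Zx @ A, Zy @ B)[0, 1]

# CCA: first singular pair after whitening within each block.
Wx, Wy = inv_sqrt(Rxx), inv_sqrt(Ryy)
Uc, sc, Vct = np.linalg.svd(Wx @ Rxy @ Wy, full_matrices=False)
a_cca, b_cca = Wx @ Uc[:, 0], Wy @ Vct[0]

# Covariance achieved by the CCA coefficients once rescaled to unit length,
# the same scaling under which PLS maximizes covariance.
a_unit = a_cca / np.linalg.norm(a_cca)
b_unit = b_cca / np.linalg.norm(b_cca)
cov_cca = a_unit @ Rxy @ b_unit

print("correlation: CCA", sc[0], "PLS", corr_pls)
print("covariance:  CCA", cov_cca, "PLS", s[0])
```

By construction the canonical correlation can never fall below the correlation of the PLS scores, and the PLS covariance can never fall below the covariance of the rescaled canonical variates. The whitening matrices Wx and Wy are where within-block collinearity enters the CCA coefficients; PLS never inverts a within-block matrix.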

22. Because two-block PLS is effectively a principal-components analysis of either the rows or the columns of the cross-block correlation matrix, its pathologies are the milder ones characteristic of PCA (influential observations, clusters) rather than those of multiple regression or likelihood-based modeling of covariance structures. The "data" for the PCA are correlations rather than individual measurements, further ameliorating these difficulties. Another way of thinking about all this construes the alcohol LV score as an averaging of a large list of univariate predictions cov(Xi,LV.Y)*Xi of the same outcome LV: one prediction for each predictor in the "predictor block" of alcohol scores. The same averaging applies to the object of prediction, the outcome LV. The apparent regularization of all sorts of quirks of data so that a central tendency can emerge (in PLS, a central tendency of cross-block prediction) is just what one expects an average to do. (As early as 1892, F. Y. Edgeworth had noticed that covariances themselves were already weighted averages of simple slope measures.)

23. Computing the SVD of an arbitrary matrix, such as the cross-correlation matrix R underlying the PLS approach, is emerging as a standard capability in most statistical computing environments. The scatters, diagnostics, and further explorations we have described are not peculiar to the particular linear combinations that are PLS LV's, but can be carried out effectively in almost any statistics package. In our studies of outcome blocks showing alcohol teratogenesis, age after age and channel after channel, after each SVD we routinely produce scatters of the LV scores against each other (as in the example above) to check for outliers and nonlinearity; we scatter the outcome LV against the dose measures and rescale those measures by a gently nonlinear scatterplot smoother if necessary; we check the covariances between dose and response LV's for confounding by the obvious covariates that afflict human teratology studies (other prenatal exposures, social class, education, nutrition) -- so far these have never been a problem; and we verify that the children labeled as "alcohol-affected" at earlier ages remain characteristically in deficit at later ages. Beyond the basic SVD itself, all of these statistical maneuvers are elementary. Our monograph (see the bibliography) explains all of these tactics and lays out the tables and diagrams that lead us to conclude that the primary findings about the pattern of alcohol effects and the pattern of saliences of dose are valid.

24. PLS may be contrasted with diverse other approaches to the same sort of causal explanation. By maximizing covariance between the LV scores, PLS optimizes the usefulness of the analysis for subsequent studies of intervention. Unlike the coefficients of a canonical correlations analysis, the saliences that PLS computes have meaning individually even when (indeed, especially when) the predictor block or the outcome block is intentionally multicollinear. Along with the scores, the saliences can be computed in any statistical package that gives users access to eigenanalysis, so that PLS can be applied to much larger problems than can more sophisticated optimizations. PLS differs from structural equations models in its lack of most distributional assumptions and in that it invariably ignores the within-block factor structure of the dose measures and the response measures separately. In our experience, this structure is quite irrelevant to the assigned task of cross-block explanation. (For instance, alcohol does not affect the general factor of IQ as much as it affects a particular profile of arithmetic deficiency.)

25. As a fit to the cross-correlation matrix rather than the raw data, PLS avoids the difficulty of all likelihood-based structural equation modeling (including multiple regression), namely, that to be interpretable a fitted model must first be "true." PLS is a useful multivariate tool for those who believe, along with me, David Freedman, Clark Glymour, and many others, that structural-equations modeling as applied in the behavioral sciences has never taught us anything we did not already know -- that it has never arrived at any positive conclusions not already built into the hypotheses. While PLS is not designed for the "testing" of "hypotheses," the vectors of saliences A and B can be tested against a null model by bootstrap computations, and similar exploratory resampling data analyses can be applied to substantive aspects of the interpretations that result, such as covariates of LV scores or the reliable identification of types of dose or response measures as particularly salient for each other.
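One form such a resampling computation might take is sketched below. It is an assumption-laden illustration rather than the procedure of the monograph: it resamples cases with replacement, recomputes the first X-saliences each time, and reports bootstrap standard errors; it does not construct a null model. The data, the helper first_saliences, and the choice of 500 replicates are all hypothetical.

```python
import numpy as np

def first_saliences(X, Y):
    """First pair of singular vectors of the cross-correlation matrix of X and Y."""
    N = X.shape[0]
    Zx = (X - X.mean(axis=0)) / X.std(axis=0)
    Zy = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    R = Zx.T @ Zy / N
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    return U[:, 0], Vt[0]

rng = np.random.default_rng(7)
N, m, n = 300, 4, 10
latent = rng.normal(size=N)                      # hypothetical common cause
X = latent[:, None] + rng.normal(size=(N, m))
Y = 0.5 * latent[:, None] + rng.normal(size=(N, n))

A_hat, B_hat = first_saliences(X, Y)

n_boot = 500
A_boot = np.empty((n_boot, m))
for b in range(n_boot):
    idx = rng.integers(0, N, size=N)             # resample cases with replacement
    A_b, _ = first_saliences(X[idx], Y[idx])
    # Singular vectors are defined only up to sign; align with the full sample.
    if A_b @ A_hat < 0:
        A_b = -A_b
    A_boot[b] = A_b

se_A = A_boot.std(axis=0, ddof=1)                # bootstrap standard errors
print(np.round(A_hat, 3))
print(np.round(se_A, 3))
```

The same loop could collect the B's, the singular value, or any covariate of the LV scores.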

26. Through 1993, this biometric mode of PLS has been applied in diverse evolutionary and developmental studies as well as in the extensive study of alcohol effects to which I've been referring. Many more studies of behavioral/brain development could be cast into a framework for which these essentially simple computations, and the insights they support, might be similarly useful. Our version of PLS was designed to reward careful, conscientious measurement of multiple aspects of familiar but only indirectly observable phenomena and to discourage all modeling, including multiple regression and structural equations, that drifts farther than necessary from such data. Although PLS is not part of any common psychometric statistical package today, its saliences and scores can all be computed in any interactive statistical environment that includes a singular-value decomposition, such as S, Matlab, or SAS. I would welcome comments from readers, whatever their discipline, regarding precursors of this technique, other potential applications, or pitfalls.

27. This version of PLS has been a collective effort of the Pregnancy and Health Study, Department of Psychiatry, the University of Washington. I am grateful for the energies of my colleagues Paul Sampson, Ann Streissguth, and Helen Barr over the years during which these techniques and explanations were crafted. Support for this methodology has been obtained from NIAAA grant AA-01455 to A. P. Streissguth and NIA grant AG-11037 to Fred L. Bookstein. Paul Sampson produced the scatterplot in paragraph 19. [Note that this target article is a revised and updated version of an article that originally appeared in PSYCOLOQUY 2(3) 1991.]

On this particular dose-response form of Partial Least Squares, the best source is our recently published monograph:

Streissguth, A.P., Bookstein, F.L., Sampson, P.D. and Barr, H.M. The Enduring Effects of Prenatal Alcohol Exposure on Child Development. University of Michigan Press, 1993. xxxiv + 301 pp.

An appendix to that monograph lists earlier articles over a range of journals.

Streissguth, A.P., Barr, H.M., Bookstein, F.L. and Sampson, P.D. Neurobehavioral Effects of Prenatal Alcohol. Neurotoxicology and Teratology 11:461-507, 1989.

Bookstein, F.L., Sampson, P.D., Streissguth, A.P. and Barr, H.M. Measuring "Dose" and "Response" With Multivariate Data Using Partial Least Squares Techniques. Communications in Statistics: Theory and Methods 19:765-804, 1990.

Bookstein, F.L. (1991) Partial Least Squares: A Dose-response Model for Measurement in the Behavioral and Brain Sciences. PSYCOLOQUY 2(3). psyc.arch.2.3.91

Carmichael Olson, H., Sampson, P.D., Barr, H.M., Streissguth, A.P. and Bookstein, F.L. Prenatal Exposure to Alcohol and School Problems in Late Childhood: A Longitudinal Prospective Study. Development and Psychopathology 4:341-359, 1992.

Streissguth, A.P., Sampson, P.D., Carmichael Olson, H., Bookstein, F.L., Barr, H.M., Scott, M., Feldman, J. and Mirsky, A.F. Maternal Drinking During Pregnancy: Attention and Short-term Memory Performance in 14-year-old Offspring: A Longitudinal Prospective Study. Alcoholism: Clinical and Experimental Research, in press, 1994.

Two readers in earlier styles of PLS analysis:

Joreskog, K.G., and Wold, H. eds. Systems Under Indirect Observation: Causality, Structure, Prediction. Contributions to Economic Analysis, Volume 139, Part II. Amsterdam: North-Holland, 1982.

Wold, H., ed. Theoretical Empiricism: A General Rationale for Scientific Model Building. New York: Paragon House, 1989.