tiple regression equations, however, one is able to determine the optimum weight which should be given to each factor, that is, the weight which yields the highest correlation or predictive power. Therefore, the fact that a combination of the high-school English mark and the high-school general average resulted in a higher correlation with freshman rhetoric than did the former alone merely means that the equal weighting of English with other subjects in computing the general average is not high enough to yield the best prediction. If English is given only equal weighting with the other subjects in this average, it should therefore be introduced again with the relative weight indicated by the multiple regression coefficient. The direct method of securing the same result would be to use no averages of marks in different high-school subjects, but to consider each subject as a separate variable or criterion in the multiple correlation work. This was not done because it would have increased very greatly the amount of calculation necessary without yielding more helpful results than the method used. It would, of course, have shown exactly which of the subjects entering into the high-school average were useful for making the best prediction in each case and which were not, but there seems little advantage in knowing this, provided one knows how to make as good an estimate without this knowledge and with even less labor. Work was saved not only in computation but also in the use of results, since the multiple coefficients and regression equations secured involve, on the whole, fewer variables or criteria than would be the case if averages of high-school subjects had not been taken, and therefore require less computation when employed for predictive purposes.
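The point about reintroducing English with its own regression weight can be illustrated with a small sketch. Everything in it is an assumption made for illustration only: the marks are synthetic (randomly generated, not data from this study), the helper name `multiple_r` is hypothetical, and the weights used to generate the rhetoric marks are arbitrary. It shows only that a least-squares combination of the general average and the English mark correlates with the criterion at least as highly as the general average alone.

```python
import numpy as np

def multiple_r(X, y):
    """Return (R, coefficients) for the least-squares prediction of y from the columns of X."""
    A = np.column_stack([np.ones(len(y)), X])      # prepend an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # ordinary least-squares weights
    y_hat = A @ coef                               # predicted criterion scores
    return float(np.corrcoef(y, y_hat)[0, 1]), coef

# Synthetic marks, for illustration only (NOT data from the study).
rng = np.random.default_rng(0)
n = 500
english = rng.normal(80, 8, n)                     # high-school English mark
other = rng.normal(80, 8, (n, 3))                  # three other high-school subjects
general_avg = (english + other.sum(axis=1)) / 4    # English weighted equally with the rest

# Assumed criterion: rhetoric depends more heavily on English than on the other subjects.
rhetoric = 0.6 * english + 0.1 * other.sum(axis=1) + rng.normal(0, 5, n)

r_avg, _ = multiple_r(general_avg[:, None], rhetoric)
r_both, coef = multiple_r(np.column_stack([general_avg, english]), rhetoric)

print(f"general average alone:     R = {r_avg:.3f}")
print(f"general average + English: R = {r_both:.3f}")
print(f"additional weight on English: {coef[2]:.3f}")
```

Under these assumptions the extra regression weight on English comes out positive, which corresponds to the situation described above: the equal weighting inside the average understates the importance of English for this particular criterion.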
The objection can be raised that the general high-school average includes marks made in subjects which show much lower correlations with the freshman subject being considered than do those of certain other high-school subjects, and that the inclusion of these marks may have lowered the correlation between the freshman subject mark and the high-school average. This contention is true, but the writer believes that for all practical purposes any such results have been taken care of by including in the multiple correlations and regressions the subjects which appeared at all likely to make any contribution to them. Thus, for example, if the freshman French mark was best predicted by a combination of high-school marks in English, French and Latin and point score, rather than by including the general high-school average, the method of computation used eliminated the latter. In any event, in view of the practical limitations of time and money, it seemed wise, if not absolutely necessary, to follow the method described above.

The measures of accuracy of prediction obtained in this study. Finally, as a measure of the accuracy or reliability of predictions based upon coefficients of correlation and regression equations, the coefficients of alienation and the probable errors of estimate corresponding to each of the former expressions were determined. The first of these, the coefficient of alienation, is an expression which shows the relationship between the prediction based upon a given coefficient of correlation and a pure guess. For example, the coefficient of alienation which corresponds to a correlation coefficient of .65 is approximately .76.
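The figure of .76 can be checked with the usual definition of the coefficient of alienation, the square root of one minus the squared correlation. The sketch below assumes that definition, which is not written out in this passage; the function name is hypothetical.

```python
import math

def coefficient_of_alienation(r):
    """Coefficient of alienation k = sqrt(1 - r^2) for a correlation coefficient r."""
    return math.sqrt(1.0 - r * r)

k = coefficient_of_alienation(0.65)
print(round(k, 2))        # 0.76, the value quoted in the text
print(round(1.0 - k, 2))  # 0.24, the reduction in error relative to a pure guess
```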
This means that if two variables or series of scores correlate .65 with each other, the estimates of particular scores in one series based upon corresponding known scores in the other will on the average be in error by about .76 as much as if the errors resulted from pure guesses, or, subtracting .76 from 1.00, that the errors will be about 24 per cent smaller than those in pure guesses. The probable error of estimate describes the same situation by stating the limits within which half of the errors will fall. For example, if the probable error of estimate is found to be 4 points on a percentile scale, it means that half of the estimated scores will not vary from the true scores by more than 4 points, and, of course, that the other half will differ by more than this amount. These two indices, the coefficient of alienation and the probable error of estimate, give a more concrete and meaningful description of the accuracy of prediction than does the coefficient of correlation.

¹For a more complete discussion of the coefficient of alienation and the probable error of estimate, see Chapter VI. Also: ODELL, C. W. "The interpretation of the probable error and the coefficient of correlation." University of Illinois Bulletin, Vol. 23, No. 52, Bureau of Educational Research Bulletin No. 32. Urbana: University of Illinois, 1926, p. 28-32 and 41-45, and ODELL, C. W. Educational Statistics. New York: The Century Company, 1925, p. 173-74, 230-41, or some other text on the same subject.

CHAPTER IV

THE SIMPLE CORRELATIONS BETWEEN FRESHMAN

The simple correlations computed in this study. At the risk of repeating a portion of the outline of the study given in the last chapter, it seems worth while to state again what correlations were and were not found. The simple or zero-order coefficients obtained are shown in Table III, the first column of which gives the correlations of the freshman marks with age, the second those with point score, the third with I.
Q., and the fourth with the general high-school average. Following this are the coefficients found between the marks in various freshman subjects and those in high-school subjects or groups of subjects selected as being most similar to the freshman ones, or as most likely to exhibit significant correlations with them. Thus, for example, the first row of the table shows that freshman accountancy mark had a correlation of .18 with age, .28 with point score, .29 with I. Q., .47 with high-school average, .38 with high-school commercial average and .47 with high-school mathematics average. As was mentioned in Chapter III, correlation coefficients between certain possible criteria and college marks are not included in this table because, after computing quite a number of them, it appeared that they were of so little value for the purpose of this investigation as not to be worth further consideration. These were the coefficients of the freshman subject marks with the amounts of particular subjects carried in high school and with particular years' marks in high-school subjects, rather than with the average for all of each subject.

It will be noted that a number of the coefficients given in Table III are enclosed in parentheses. These are the ones which, because of the joint effect of their small size and the few cases concerned, are less than twice their standard errors or three times their probable errors and so can hardly be considered reliable. The chances are greater than twenty-one or twenty-two to one that all of the co

¹The formula for the standard error of a coefficient of correlation is $\frac{1 - r^2}{\sqrt{N}}$ and that for the probable error $.6745\,\frac{1 - r^2}{\sqrt{N}}$, in which r is the coefficient of correlation and N the number of cases. Thus the greater the coefficient and also the greater the number of cases the smaller is the error and the greater the reliability of the coefficient.
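The footnote formulas and the rule used for the parentheses in Table III can be sketched as follows. The function names are hypothetical, and the number of cases used in the example (N = 100) is an assumption chosen only for illustration; the actual numbers of cases are not given in this passage.

```python
import math

PROBABLE_ERROR_FACTOR = 0.6745  # probable error = .6745 x standard error

def standard_error_of_r(r, n):
    """Standard error of a correlation coefficient: (1 - r^2) / sqrt(N)."""
    return (1.0 - r * r) / math.sqrt(n)

def probable_error_of_r(r, n):
    """Probable error of a correlation coefficient: .6745 (1 - r^2) / sqrt(N)."""
    return PROBABLE_ERROR_FACTOR * standard_error_of_r(r, n)

def is_reliable(r, n):
    """True if r is at least twice its standard error, the criterion described in the text."""
    return abs(r) >= 2.0 * standard_error_of_r(r, n)

# r values are taken from the accountancy row quoted above; N = 100 is hypothetical.
for r in (0.18, 0.47):
    se = standard_error_of_r(r, 100)
    pe = probable_error_of_r(r, 100)
    print(f"r = {r:.2f}  SE = {se:.4f}  PE = {pe:.4f}  reliable: {is_reliable(r, 100)}")
```

With the assumed 100 cases, a coefficient of .18 falls short of twice its standard error and would be placed in parentheses, while a coefficient of .47 would not. Twice the standard error is very nearly three times the probable error, since 2 / .6745 is about 2.97, which is why the text treats the two criteria as equivalent.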