Understanding Simple Linear Regression
Statistical Technique in Review
In nursing practice, the ability to predict future events or outcomes is crucial, and researchers calculate and report linear regression results as a basis for making these predictions. Linear regression is a statistical method for estimating or predicting the expected value of a dependent variable, y, from the value of one or more independent variables, x. The regression equation is a mathematical expression of a causal proposition emerging from a theoretical framework, and the linkage between the theoretical statement and the equation is made before data collection and analysis. The focus of this exercise is simple linear regression, which uses one independent variable, x, to predict one dependent variable, y.
The regression line developed from simple linear regression is usually plotted on a graph, with the horizontal axis representing x (the independent or predictor variable) and the vertical axis representing y (the dependent or predicted variable; see Figure 14-1). In the regression equation, the value represented by the letter a is referred to as the y intercept, or the point where the regression line crosses or intercepts the y-axis. At this point on the regression line, x = 0. The value represented by the letter b is referred to as the slope, or the coefficient of x. The slope determines the direction and angle of the regression line within the graph and expresses the extent to which y changes for every one-unit change in x. The score on variable y (dependent variable) is predicted from the subject’s known score on variable x (independent variable). The predicted score or estimate is referred to as ŷ (expressed as y-hat) (Cohen, 1988; Grove, Burns, & Gray, 2013; Zar, 2010).
FIGURE 14-1 GRAPH OF A SIMPLE LINEAR REGRESSION LINE
Simple linear regression is an effort to explain the dynamics within a scatterplot (see Exercise 11) by drawing a straight line through the plotted scores. No single regression line can predict every y value from every x value with complete accuracy. The purpose of the regression equation is therefore to develop the line that allows the highest degree of prediction possible, the line of best fit. The procedure for developing the line of best fit is the method of least squares. If the data were perfectly correlated, all data points would fall along the line of best fit. In actual studies, not all data points fall on the line, but the line of best fit still provides the best equation for predicting y: for any given value of x, the predicted value of y is read from the corresponding point on the line.
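To make the method of least squares more concrete, the following Python sketch computes a slope and intercept from a small set of scores using the standard deviation-based formulas. The function name fit_least_squares and the data values are assumptions made for illustration; the scores are made up and do not come from any study.

```python
def fit_least_squares(x, y):
    """Return (b, a): slope and intercept of the least squares line y = bx + a."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, with at least two points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations, and sum of squared deviations of x
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # y intercept
    return b, a


# Made-up scores for illustration only (not research data)
x_scores = [1, 2, 3, 4, 5]
y_scores = [2, 4, 5, 4, 5]

b, a = fit_least_squares(x_scores, y_scores)
print(f"slope b = {b:.2f}, intercept a = {a:.2f}")
```

For these made-up scores, the sketch gives a slope of 0.60 and an intercept of 2.20. In practice, statistical software performs this calculation.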
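The algebraic equation for the regression line of best fit is y = bx + a, where:
y = dependent variable (predicted or outcome variable)
x = independent variable (predictor variable)
b = slope of the line
a = y intercept (the point where the regression line crosses the y-axis, where x = 0)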
In Figure 14-2, the x-axis represents Gestational Age in weeks and the y-axis represents Birth Weight in grams. As gestational age increases from 20 weeks to 34 weeks, birth weight also increases. In other words, the slope of the line is positive. This line of best fit can be used to predict the birth weight (dependent variable) for an infant based on his or her gestational age in weeks (independent variable). Figure 14-2 is an example of a line of best fit that was not developed from research data. In addition, the x-axis was started at 22 weeks rather than 0, which is the usual start in a regression figure. Using the formula y = bx + a, the birth weight of a baby born at 28 weeks of gestation is calculated below.
FIGURE 14-2 EXAMPLE LINE OF BEST FIT FOR GESTATIONAL AGE AND BIRTH WEIGHT
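As a rough illustration of how the equation y = bx + a turns a gestational age into a predicted birth weight, the Python sketch below applies the formula with hypothetical coefficients. The intercept of 500 grams and the slope of 20 grams per week are assumptions chosen only for illustration; they are not read from Figure 14-2 and are not research findings. The function name predict is likewise hypothetical.

```python
def predict(x, b, a):
    """Return the predicted value y-hat = b * x + a."""
    return b * x + a


# Hypothetical coefficients for illustration only; NOT taken from Figure 14-2
a_intercept = 500.0  # assumed y intercept, in grams
b_slope = 20.0       # assumed slope, in grams per week of gestation

y_hat_28 = predict(28, b_slope, a_intercept)
y_hat_29 = predict(29, b_slope, a_intercept)

print(f"Predicted birth weight at 28 weeks: {y_hat_28:.0f} g")
print(f"Change for one additional week: {y_hat_29 - y_hat_28:.0f} g")
```

Under these assumed values, the predicted birth weight at 28 weeks is 1060 grams, and each additional week of gestation adds 20 grams, which is the slope.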
The regression line represents the predicted value of y for any given value of x. As you can see, some data points fall above the line, and some fall below the line. If we substitute any x value in the regression equation and solve for y, we will obtain a ŷ that will usually differ somewhat from the actual value of y. The distance between ŷ and the actual value of y is called the residual, and this represents the degree of error in the regression line. The regression line, or line of best fit, for the data points is the unique line that minimizes this error by yielding the smallest sum of squared residuals (Zar, 2010). The step-by-step process for calculating simple linear regression in a study is presented in Exercise 29.
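The idea of residuals can be sketched in a few lines of Python. The example below computes the residual for each data point and compares the sum of squared residuals for two candidate lines. The data are made up for illustration, and the function names are hypothetical. The first line (slope 0.6, intercept 2.2) is the least squares line for these particular points; the second line is an arbitrary alternative chosen for comparison.

```python
def residuals(x, y, b, a):
    """Return the residual (actual y minus predicted y-hat) for each data point."""
    return [yi - (b * xi + a) for xi, yi in zip(x, y)]


def sum_squared_residuals(x, y, b, a):
    """Return the sum of squared residuals for the line y = bx + a."""
    return sum(r ** 2 for r in residuals(x, y, b, a))


# Made-up scores for illustration only (not research data)
x_scores = [1, 2, 3, 4, 5]
y_scores = [2, 4, 5, 4, 5]

# Least squares line for these points versus an arbitrary alternative line
best = sum_squared_residuals(x_scores, y_scores, b=0.6, a=2.2)
other = sum_squared_residuals(x_scores, y_scores, b=0.8, a=2.0)

print(f"Sum of squared residuals, least squares line: {best:.2f}")
print(f"Sum of squared residuals, alternative line: {other:.2f}")
```

For these points, the least squares line yields a sum of squared residuals of 2.40, compared with 3.60 for the alternative line. Any other straight line through the same points would also produce a larger sum than the least squares line.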