# Report: Selling Price and Area Analysis for D.M. Pan National Real Estate Company

## Introduction

Businesses today use data to gain a competitive advantage. The real estate industry relies heavily on linear regression to estimate home prices, since housing is currently the largest expense for most families. For this paper, I will act as the newly hired junior analyst for D.M. Pan Real Estate Company. After analyzing the data provided, I will prepare a report for the sales team.

## Representative Data Sample

The sample was drawn mainly from the West South Central region. Table 1 shows the calculated statistics.

| Statistic | Listing price ($) | Price per square foot ($) | Square feet |
|---|---|---|---|
| Mean | 345,063 | 122 | 2,864 |
| Median | 264,050 | 122 | 2,141 |
| Standard deviation | 163,371 | 16 | 1,396 |

The mean, median, and standard deviation of the listing price are $345,063, $264,050, and $163,371, respectively. The mean, median, and standard deviation of the price per square foot are $122, $122, and $16, respectively. Lastly, the mean, median, and standard deviation of square feet are 2,864, 2,141, and 1,396, respectively.

## Data Analysis

A sample of only 30 was extracted from the entire population of 1,000 counties. This is 3% of the population, a very small sample compared to the 10% convention. Consequently, the regional sample selected is not reflective of the national market. The mean listing price of the national market is $288,407, while that of the selected region is $345,063, a sizable difference of $56,656. The same holds for the price per square foot: the national market has a mean of $142, while the selected region has a mean of $122, a difference of $20.
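The comparison above rests on a few lines of arithmetic, which can be checked with a short sketch. The figures are the ones quoted in this report; the variable names are chosen here for illustration only.

```python
# Figures quoted in the report; variable names are illustrative.
sample_size = 30
population_size = 1000
regional_mean_price = 345_063
national_mean_price = 288_407
regional_price_per_sqft = 122
national_price_per_sqft = 142

# Share of the population that ended up in the sample.
sampling_fraction = sample_size / population_size

# Gaps between the regional sample and the national market.
price_gap = regional_mean_price - national_mean_price
sqft_price_gap = national_price_per_sqft - regional_price_per_sqft

print(f"Sampling fraction: {sampling_fraction:.0%}")
print(f"Listing price gap: ${price_gap:,}")
print(f"Price per square foot gap: ${sqft_price_gap}")
```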

The sample was drawn by simple random selection, so every county in the population had an equal chance of being selected. The method involved a single random draw and required little prior information about the population.
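A minimal sketch of this kind of selection is shown below. It assumes the 1,000 counties can be labeled 1 through 1,000; those identifiers, the function name `draw_sample`, and the seed are hypothetical and not part of the original data set.

```python
import random

POPULATION_SIZE = 1000  # counties in the population, per the report
SAMPLE_SIZE = 30        # counties drawn, per the report


def draw_sample(seed=None):
    """Return SAMPLE_SIZE distinct county IDs chosen uniformly at random."""
    rng = random.Random(seed)
    county_ids = range(1, POPULATION_SIZE + 1)  # hypothetical identifiers
    # random.sample draws without replacement, so each county has the
    # same chance of selection and no county appears twice.
    return rng.sample(county_ids, SAMPLE_SIZE)


sample_ids = draw_sample(seed=42)  # the seed only makes the draw repeatable
print(sorted(sample_ids))
```

Fixing the seed makes the draw reproducible for review. Omitting it gives a fresh random sample on each run.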

## Scatterplot

Chart 1 shows the scatterplot.

**The Pattern**

In the graph, the horizontal x-axis represents square feet and the vertical y-axis represents the listing price. The variable on the x-axis is the one used to make predictions, so here square feet is the predictor and listing price is the response.

There is an association between x and y. As square feet increases, the listing price also increases, which explains the rising trend line. The association is positive and the relationship is linear, as the trend line further supports.

The regression equation obtained from the trend line is y = 112.72x + 22234, where y is the listing price in dollars and x is the area in square feet. For a house of 1,800 square feet, the listing price follows from the equation:

y = 112.72x + 22234

y = 112.72(1800) + 22234

y = 202896 + 22234 = $225,130

Therefore, for a house of 1,800 square feet, I would choose a listing price of $225,130.
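The same calculation can be written as a small function, which makes it easy to repeat for other house sizes. This is only a sketch: the slope and intercept come from the trend line above, and the function name `predict_listing_price` is chosen here for illustration.

```python
SLOPE = 112.72      # dollars per additional square foot, from the trend line
INTERCEPT = 22_234  # dollars, from the trend line


def predict_listing_price(square_feet):
    """Estimate the listing price in dollars from the fitted line."""
    return SLOPE * square_feet + INTERCEPT


price = predict_listing_price(1800)
print(f"Estimated listing price: ${price:,.0f}")
```

Predictions of this kind are most trustworthy for areas within the range of square footage covered by the sample.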

The graph also shows potential outliers. Outliers are observations that lie at an abnormal distance from the other values in a random sample of a population. The points (5,777, 726,800), (5,284, 476,700), and (5,962, 727,900) are outliers: these three points lie farthest from the regression line and do not appear to fit the pattern of the graph.

Outliers are often removed from the data before plotting, but these points were plotted because they were part of the sample data. They arose because some counties had much higher listing prices and square footage than the others, which left them numerically distant from the rest of the data. Outliers matter because, as extreme values, they can have a large impact on the statistics derived from the sample.
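One way to quantify how far a point lies from the regression line is its residual, the actual listing price minus the price the line predicts. The sketch below computes residuals for the three points named above, using the trend-line equation y = 112.72x + 22234. The function name `residual` is illustrative, and the remaining sample points are not reproduced here, so this does not rank the three points against the rest of the sample.

```python
SLOPE = 112.72      # from the trend line
INTERCEPT = 22_234  # from the trend line

# (square feet, listing price) pairs flagged as outliers in the report.
flagged_points = [(5777, 726_800), (5284, 476_700), (5962, 727_900)]


def residual(square_feet, listing_price):
    """Actual listing price minus the price predicted by the trend line."""
    predicted = SLOPE * square_feet + INTERCEPT
    return listing_price - predicted


for sqft, price in flagged_points:
    print(f"{sqft:,} sq ft: residual = ${residual(sqft, price):,.0f}")
```

A positive residual means the point sits above the line and a negative one means it sits below. Of the three flagged points, two fall above the line and one falls below it.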