Scenario
Smart businesses in every industry analyze data to find a competitive advantage. The real estate industry relies heavily on linear regression to estimate home prices, because housing is currently the largest expense for most families. In addition, to help new homeowners and home sellers with important decisions, real estate professionals need to go beyond showing property inventory. They need to be well versed in the relationships among price, square footage, build year, location, and the many other factors that help predict the business environment, so that they can give their clients the best advice.
Prompt
You were recently hired as a junior analyst by D.M. Pan Real Estate Company. The sales team has tasked you with preparing a report that examines the relationship between the selling price of properties and their size in square feet. You have been provided with a Real Estate Data Spreadsheet that includes properties sold nationwide in recent years. The team has asked you to select a region, complete an initial analysis, and provide the report to the team.
Note: In the report you prepare for the sales team, the response variable (y) should be the listing price and the predictor variable (x) should be the size in square feet.
Specifically, you must address the following rubric criteria, using the Module Two Assignment Template:
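With this choice of variables, the regression equation takes the standard simple linear form sketched here. The symbols $b_0$ (intercept), $b_1$ (slope), $\bar{x}$, and $\bar{y}$ (sample means) are conventional notation assumed for illustration and do not come from the assignment template. The formulas are the usual least-squares estimates, which is what a spreadsheet linear trend line typically reports.

$$
\hat{y} = b_0 + b_1 x, \qquad b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad b_0 = \bar{y} - b_1 \bar{x}
$$

Here $\hat{y}$ is the predicted listing price for a property of $x$ square feet, and $n = 30$ is the sample size.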
- Generate a Representative Sample of the Data
  - Select a region and generate a simple random sample of 30 from the data.
  - Report the mean, median, and standard deviation of the listing price and the square foot variables.
- Analyze Your Sample
  - Discuss how the regional sample created is or is not reflective of the national market.
    - Compare and contrast your sample with the population using the National Summary Statistics and Graphs Real Estate Data PDF document.
  - Explain how you have made sure that the sample is random.
    - Explain your methods to get a truly random sample.
- Generate Scatterplot
  - Create a scatterplot of the x and y variables noted above. Include a trend line and the regression equation. Label the axes.
- Observe Patterns
  - Answer the following questions based on the scatterplot:
    - Define x and y. Which variable is useful for making predictions?
    - Is there an association between x and y? Describe the association you see in the scatterplot.
    - What do you see as the shape (linear or nonlinear)?
    - If you had a 1,800 square foot house, based on the regression equation in the graph, what price would you choose to list at?
    - Do you see any potential outliers in the scatterplot?
      - Why do you think the outliers appeared in the scatterplot you generated?
      - What do they represent?
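The sampling, summary, and prediction steps in the criteria above can be sketched in Python using only the standard library. This is a minimal illustration under stated assumptions, not the method the Module Two Assignment Template prescribes: the `region` list is synthetic stand-in data generated so the sketch runs (it is not taken from the Real Estate Data Spreadsheet), and the helper names `describe` and `fit_line` are hypothetical.

```python
import random
import statistics

# Hypothetical stand-in for one region's rows of the spreadsheet:
# (square_feet, listing_price) pairs. These values are synthetic,
# generated only so the sketch runs; they are NOT real listings.
rng = random.Random(42)  # fixed seed so the same draw can be reproduced
region = []
for _ in range(200):
    sqft = rng.randint(900, 4000)
    price = 50_000 + 120 * sqft + rng.gauss(0, 40_000)
    region.append((sqft, round(price)))

# Simple random sample of 30: every row has the same chance of being
# selected, and no row is selected twice.
sample = rng.sample(region, 30)

x = [sqft for sqft, _ in sample]    # predictor: square feet
y = [price for _, price in sample]  # response: listing price


def describe(values):
    """Return the mean, median, and sample standard deviation."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


slope, intercept = fit_line(x, y)
predicted_1800 = intercept + slope * 1800  # plug x = 1,800 into the line

print("square feet:", describe(x))
print("listing price:", describe(y))
print(f"regression equation: y = {slope:.2f}x + {intercept:.2f}")
print(f"predicted listing price at 1,800 sq ft: {predicted_1800:,.0f}")
```

With the real data, `region` would instead hold the rows for the chosen region. Recording the seed (or the spreadsheet's random-number column) is one way to document that the selection was random rather than hand-picked.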
You can use the following tutorial, which is specific to this assignment. Check the assignment prompt for the national statistics and square footage you should use, because the video may use different national statistics or solve for a different square footage.
Module Two Assignment Rubric
Criteria | Exemplary | Proficient | Needs Improvement | Not Evident | Value |
---|---|---|---|---|---|
Generate a Representative Sample of the Data | N/A | Includes a random sample of 30 from a region and descriptive statistics for the sample (100%) | Shows progress toward proficiency, but with errors or omissions; areas for improvement may include a sample that is not truly random or has incorrect descriptive statistics (55%) | Does not attempt criterion (0%) | 20 |
Analyze Your Sample | Exceeds proficiency in an exceptionally clear manner (100%) | Discusses how the regional sample created is or is not reflective of the national market and explains how the sample is random (85%) | Shows progress toward proficiency, but with errors or omissions; areas for improvement may include inaccurate descriptions of the extent to which the sample is reflective of the population and random (55%) | Does not attempt criterion (0%) | 25 |
Generate Scatterplot | Exceeds proficiency in an exceptionally clear manner (100%) | Creates a scatterplot of the x and y variables including a trend line and the regression equation (85%) | Shows progress toward proficiency, but with errors or omissions; areas for improvement may include inaccuracies within the scatterplot or definitions of x and y (55%) | Does not attempt criterion (0%) | 20 |
Observe Patterns | Exceeds proficiency in an exceptionally clear, insightful, or sophisticated manner (100%) | Defines x and y, provides descriptions of association and shape, makes cost projections based on the regression equation and discusses outliers (85%) | Shows progress toward proficiency, but with errors or omissions; areas for improvement may include definitions of x and y, inaccuracies in descriptions of association or shape, inaccuracies in cost projections, or discussion of outliers (55%) | Does not attempt criterion (0%) | 25 |
Articulation of Response | Exceeds proficiency in an exceptionally clear, insightful, sophisticated, or creative manner (100%) | Clearly conveys meaning with correct grammar, sentence structure, and spelling, demonstrating an understanding of audience and purpose (85%) | Shows progress toward proficiency, but with errors in grammar, sentence structure, and spelling, negatively impacting readability (55%) | Submission has critical errors in grammar, sentence structure, and spelling, preventing understanding of ideas (0%) | 10 |
Total | | | | | 100% |