For this assignment, collect data exhibiting a relatively linear trend, find the line of best fit, plot the data and the line, interpret the slope, and use the linear equation to make a prediction. Also, find r² (the coefficient of determination) and r (the correlation coefficient). Discuss your findings. Your topic may be related to sports, your work, a hobby, or something else you find interesting. If you choose, you may use the suggestions described below.
A Linear Model Example and Technology Tips are described in separate topics.
Tasks for Linear Regression Model (LR)
(LR-1) Describe your topic, provide your data, and cite your source. Collect at least 8 data points. Label appropriately. (Post this information as a main topic here in the Project conference as well as in your completed project. Include a brief informative description in the title of your posting. Each student must use different data.)
The purpose of the conference posting is twofold: (1) to share your interesting project idea with your classmates, and (2) to give me a chance to give you a brief thumbs-up or thumbs-down on your proposed topic and data. Sometimes students get off on the wrong foot or misunderstand the intent of the project, and your posting provides an opportunity for some feedback. Remark: Students may choose similar topics but must have different data sets. For example, several students may be interested in a particular Olympic sport, and that is fine, but they must collect different data, perhaps from different events or for a different gender.
(LR-2) Plot the points (x, y) to obtain a scatterplot. Use an appropriate scale on the horizontal and vertical axes and be sure to label carefully. Visually judge whether the data points exhibit a relatively linear trend. (If so, proceed. If not, try a different topic or data set.)
(LR-3) Find the line of best fit (regression line) and graph it on the scatterplot. State the equation of the line.
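If you would rather compute the line yourself than rely only on a calculator or spreadsheet, the least-squares slope and intercept follow from two sums of deviations from the means. The Python sketch below shows one way to do this; the function name `fit_line` and the eight data points are hypothetical, made up only for illustration, and are not real data for any topic.

```python
# Least-squares line of best fit computed from the standard formulas:
#   slope = Sxy / Sxx,  intercept = y_bar - slope * x_bar
# The data below are hypothetical, made-up values used only for illustration.

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y-hat = intercept + slope * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sxy: sum of products of deviations; Sxx: sum of squared x-deviations
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar  # the line passes through (x_bar, y_bar)
    return slope, intercept


xs = [1, 2, 3, 4, 5, 6, 7, 8]        # hypothetical x values
ys = [3, 5, 6, 8, 11, 12, 14, 17]    # hypothetical y values
slope, intercept = fit_line(xs, ys)
print(f"y-hat = {intercept:.4f} + {slope:.4f}x")
```

Running it prints `y-hat = 0.7143 + 1.9524x`. On Python 3.10 or later, `statistics.linear_regression(xs, ys)` from the standard library should give the same slope and intercept, which makes a handy cross-check.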
(LR-4) State the slope of the line of best fit. Carefully interpret the meaning of the slope in a sentence or two.
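A short calculation shows why the slope is read as a rate of change. Writing the fitted line as y-hat = a + bx (the letters a and b are only a notation choice here), raising x by one unit changes the predicted value by exactly b:

```latex
\hat{y}(x+1) - \hat{y}(x) = \bigl(a + b(x+1)\bigr) - \bigl(a + bx\bigr) = b
```

So an interpretation usually takes the form "for each additional unit of x, the predicted y increases (or, if b is negative, decreases) by about |b| units, on average," with the actual units of your two variables filled in.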
(LR-5) Find and state the value of r², the coefficient of determination, and r, the correlation coefficient. Discuss your findings in a few sentences. Is r positive or negative? Why? Is a line a good curve to fit to this data? Why or why not? Is the linear relationship very strong, moderately strong, weak, or nonexistent?
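As a check on your technology's output, r can be computed from the same sums of deviations used for the slope, and r² is simply its square. The sketch below uses hypothetical, made-up data for illustration; the function name `correlation` is likewise illustrative. Note that r always has the same sign as the slope, and r² can be read as the fraction of the variation in y that the line accounts for.

```python
import math

def correlation(xs, ys):
    """Return Pearson's correlation coefficient r for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when x or y has no variation")
    return s_xy / math.sqrt(s_xx * s_yy)


xs = [1, 2, 3, 4, 5, 6, 7, 8]        # hypothetical x values
ys = [3, 5, 6, 8, 11, 12, 14, 17]    # hypothetical y values
r = correlation(xs, ys)
r_squared = r ** 2                   # coefficient of determination
print(f"r = {r:.4f}, r^2 = {r_squared:.4f}")
```

For these made-up points it prints `r = 0.9941, r^2 = 0.9882`, a very strong positive linear relationship. Textbooks differ on the exact cutoffs for calling a relationship strong, moderate, or weak, so follow the guidance in your course materials. On Python 3.10 or later, `statistics.correlation(xs, ys)` offers a cross-check for r.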
(LR-6) Choose a value of interest and use the line of best fit to make an estimate, or a prediction of the future. Show your calculation work.
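The prediction itself is a substitution into the equation of the line, and writing it out step by step is a simple way to show your work. The sketch below uses a hypothetical line, y-hat = 2.5 + 1.5x, that does not come from any real data set, and an illustrative helper named `predict`; substitute your own intercept, slope, and chosen x value.

```python
def predict(intercept, slope, x):
    """Evaluate the fitted line y-hat = intercept + slope * x at the given x."""
    return intercept + slope * x


# Hypothetical line y-hat = 2.5 + 1.5x, not taken from any real data set.
intercept, slope = 2.5, 1.5
x_new = 10                           # the value of interest
y_hat = predict(intercept, slope, x_new)
# Print the substitution so the calculation work is visible.
print(f"y-hat = {intercept} + {slope} * {x_new} = {y_hat}")
```

It prints `y-hat = 2.5 + 1.5 * 10 = 17.5`. Keep in mind that a prediction for an x value far outside the range of your data (for example, many years past the last Olympics you collected) assumes the linear trend continues, so it deserves more caution than an estimate inside that range.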
(LR-7) Write a brief narrative of a paragraph or two. Summarize your findings and be sure to mention any aspect of the linear model project (topic, data, scatterplot, line, r, or estimate, etc.) that you found particularly important or interesting.
You may submit your project in one document or in a combination of documents, which may consist of word processing documents, spreadsheets, or scanned handwritten work, provided each task is clearly labeled so it can be found easily. Be sure to include your name. Projects are graded on the basis of completeness, correctness, ease of locating all of the checklist items, and strength of the narrative portions.
Here are some possible topics:
- Choose an Olympic sport, and an event in it that interests you. Go to http://www.databaseolympics.com/ and collect data for the winners of the event for at least 8 Olympic Games (dating back to at least 1980). (Example: winning times in the men’s 400 m dash.) Make a quick plot for yourself to “eyeball” whether the data points exhibit a relatively linear trend. (If so, proceed. If not, try a different event.) After you find the line of best fit, use your line to make a prediction for the next Olympics (2014 for a winter event, 2016 for a summer event).
- Choose a particular type of food. (Examples: fish sandwiches at fast-food chains, cheese pizza, breakfast cereal.) For at least 8 brands, look up the fat content and the associated calorie total per serving. Make a quick plot for yourself to “eyeball” whether the data exhibit a relatively linear trend. (If so, proceed. If not, try a different type of food.) After you find the line of best fit, use your line to make a prediction for a fat amount that does not occur in your data set. Alternative: look up the carbohydrate content and the associated calorie total per serving.
- Choose a sport that particularly interests you and find two variables that may exhibit a linear relationship. For instance, for each team in a particular baseball season, find the total runs scored and the number of wins. Excellent websites: http://www.databasesports.com/ and http://www.baseball-reference.com/