**Paper, Order, or Assignment Requirements**

Directions. All answers should be given in the form of complete sentences. You may use any notes from this course or any other resources you find useful. Within reason, all steps for each problem should be shown. For instance, you do need to give each line when simplifying the computation of an F statistic.

The data in the spreadsheet, Assignment Data.xls, that was provided will be used for multiple problems. It represents fake data from a study to understand the properties of a muscle quality index (MQI) that returns scores that are integers from 0 to 50, with higher scores representing higher quality muscle according to the index. Assume that MQI is a noninvasive, clinician-recorded inventory with 10 items, each on a scale from 0 to 5. Higher scores represent prediction of higher quality muscle tissue. The study is interested in its properties. The MQI was given twice, with the 2nd administration one month after the first administration (in the range of 28-33 days). The results of the 2nd administration are given by the variable MQI 2. The other variables in the dataset are a group indicator G for controls (G=0) and a resistance exercise treatment (G=1). BMI is the standard body mass index. Age is age in years. A diabetes indicator D is included, as the study suspects this disease may lead to reduction in muscle quality. Finally, the result of a muscle biopsy MB is given, with scores (made up for our convenience) as an interval variable with biologically plausible range from 0 to 100; high scores on the MB indicate higher quality muscle tissue in the biopsied sample.

**Problem 1.** This problem makes use of the dataset provided with the exam. Use the multiple regression results listed on the last page of this document to answer the following questions.

- Use appropriate F or partial F tests to determine if adding Age, BMI, or Diabetes individually improves the ability of MQI to predict MB outcomes. Do any significantly improve prediction? If so, which?

- Determine the R^{2} for the regressions with just MQI and with MQI and BMI using the table. What is the percentage improvement through the inclusion of BMI? (Use the equation below:)

$$\text{Improvement} = \frac{R^{2}_{\text{both}} - R^{2}_{\text{MQI}}}{R^{2}_{\text{MQI}}}$$

- Regardless of the results of the previous two parts, suppose you decide to include BMI as a covariate for prediction of MB with MQI. The regression equation from the data is

MB = 1.48 *MQI – 0.94 *BMI + 35.78.

One subject had an observed MB of 70.05. Using their data, what does the regression predict their MB should be? How large is the error in this estimate?

- By contrast, the regression equation using only MQI to predict MB is

MB = 1.48 *MQI + 9.13.

What is the predicted MB value for the subject with observed MB of 70.05? What is the error for this prediction?

- Which of the errors in the previous two parts of this problem is smaller? Does this contradict the results of the first two parts of this problem? Explain the reason(s) for any discrepancies between the conclusions made in terms of the strength of model fit.

- Suppose that from prior work the researchers decide that if MQI can predict MB with an error of at most 4 points on at least 80% of the sample, they will consider it to be valid. That is, after adjusting for the effects of covariates of Age, BMI, and Diabetes, do at least 80% of the predicted values have an error of 4 or less? The regression equation using all three covariates is

MB = 1.48 *MQI + 0.02 *Age − 0.9 *BMI − 1.02 *Diabetes + 34.14.

Use the equation and the dataset given to determine if the MQI is valid under this definition of validity by comparing the observed MB to prediction for all subjects and then comparing your results to their definition of validity.

- Whether or not your work showed the MQI to be valid in the previous part, suppose that the MQI is valid by their chosen definition of validity. Does this mean that 80% of all MB predictions using the MQI are correct? Why or why not?
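
The calculations that recur in Problem 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, every number in the demonstration is made up (none comes from Assignment Data.xls or the regression table), and the error is taken as observed minus predicted, which is one common convention. The regression coefficients are the ones quoted in the problem.

```python
def partial_f(sse_reduced, sse_full, n_added, df_error_full):
    """Partial F statistic for adding n_added predictors to a reduced model.

    sse_reduced / sse_full are the error sums of squares of the two models;
    df_error_full is the error degrees of freedom of the full model.
    """
    return ((sse_reduced - sse_full) / n_added) / (sse_full / df_error_full)


def relative_r2_improvement(r2_mqi, r2_both):
    """Improvement = (R^2_both - R^2_MQI) / R^2_MQI, as a proportion."""
    return (r2_both - r2_mqi) / r2_mqi


def predict_mb_with_bmi(mqi, bmi):
    """Prediction from the equation MB = 1.48*MQI - 0.94*BMI + 35.78."""
    return 1.48 * mqi - 0.94 * bmi + 35.78


def fraction_within(observed, predicted, tolerance):
    """Share of subjects whose absolute prediction error is <= tolerance."""
    hits = sum(1 for o, p in zip(observed, predicted) if abs(o - p) <= tolerance)
    return hits / len(observed)


if __name__ == "__main__":
    # All values below are hypothetical placeholders, not the exam data.
    print("partial F:", partial_f(1200.0, 1000.0, n_added=1, df_error_full=96))
    print("improvement:", relative_r2_improvement(0.50, 0.60))

    observed_mb = 60.0                      # made-up observed biopsy score
    predicted_mb = predict_mb_with_bmi(mqi=30, bmi=25)
    print("error (observed - predicted):", observed_mb - predicted_mb)

    obs = [60.0, 50.0, 70.0, 40.0, 55.0]    # made-up observed MB values
    pred = [59.0, 53.5, 66.0, 45.2, 55.3]   # made-up predictions
    share = fraction_within(obs, pred, tolerance=4)
    print("valid under the 80% rule:", share >= 0.80)
```

The partial F value still has to be compared against the critical value of the F distribution with `n_added` and `df_error_full` degrees of freedom; that lookup is not shown here.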

**Problem 2.** This problem makes use of the dataset provided with the exam.

- Make a scatterplot of the MB (x-axis) and MQI (y-axis) data. Do all of the assumptions for simple linear regression appear to be satisfied?

- Determine the simple linear regression of MQI on MB, even if you found issues with the assumptions in the previous question.

- Are there any outliers – points in the scatterplot that are much further from the regression line than any of the rest? Explain how you came to your conclusion.

- Determine the correlation between MB and MQI and the R^{2} for the regression line. Do these suggest the linear regression is a good fit to the data?
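
For Problem 2, the least-squares line, the correlation, and the residuals can be computed by hand as in the sketch below. It is a minimal illustration, not a required method: the function names are hypothetical and the sample values are made up. With MQI regressed on MB, MB is the predictor `x` and MQI is the response `y`. In simple linear regression, R^{2} equals the square of the Pearson correlation.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fit_line(x, y):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def residuals(x, y, slope, intercept):
    """Observed y minus fitted y for every point."""
    return [b - (slope * a + intercept) for a, b in zip(x, y)]


if __name__ == "__main__":
    # Made-up values standing in for the MB and MQI columns.
    mb = [42.0, 55.0, 61.0, 70.0, 78.0]
    mqi = [20, 28, 31, 40, 44]
    slope, intercept = fit_line(mb, mqi)
    r = pearson_r(mb, mqi)
    print("slope:", slope, "intercept:", intercept)
    print("r:", r, "R^2:", r ** 2)
    # Unusually large residuals are one way to look for outliers.
    print("residuals:", residuals(mb, mqi, slope, intercept))
```

Inspecting the residuals next to the scatterplot is one way to support a claim about outliers; any cutoff for "unusually large" would be a judgment call that should be stated and justified.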

**Problem 3.** This problem makes use of the dataset provided with the exam. Suppose that we define the test-retest reliability of MQI as the correlation between the MQI administered at two time points.

- Use the variables MQI and MQI 2 to determine the test-retest reliability of MQI without controlling for the effects of any covariates or the treatment groups.

- If you wanted to control for the effect of diabetes on test-retest reliability of the MQI, how would you do this? Specifically, define what it means to measure the test-retest reliability of MQI controlling for the effect of diabetes. Do not compute it, only explain how to do so.

- If you wanted to control for the effect of BMI on test-retest reliability of the MQI, how would you do this? Specifically, define what it means to measure the test-retest reliability of MQI controlling for the effect of BMI. Do not compute it, only explain how to do so. (Hint: the approach is not the same as for diabetes; instead, review types of correlation in multiple regression from your textbook and the lecture slides.)

- Use your definitions from the previous two parts of this problem to compute the test-retest reliability of MQI while controlling for diabetes and, separately, for BMI.
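
The correlations in Problem 3 can be sketched as below. This is a hedged illustration only: it shows two standard tools, a correlation computed separately within each level of a grouping variable and the first-order partial correlation, without claiming that either is the definition the problem intends. The function names are hypothetical and the sample values are made up.

```python
import math
from collections import defaultdict


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def within_group_correlations(x, y, group):
    """Correlation of x and y computed separately for each group label."""
    pairs = defaultdict(lambda: ([], []))
    for a, b, g in zip(x, y, group):
        pairs[g][0].append(a)
        pairs[g][1].append(b)
    return {g: pearson_r(xs, ys) for g, (xs, ys) in pairs.items()}


def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


if __name__ == "__main__":
    # Made-up values standing in for MQI, MQI 2, D and BMI.
    mqi_1 = [20, 28, 31, 40, 44, 25]
    mqi_2 = [22, 27, 33, 38, 45, 24]
    diabetes = [0, 0, 1, 1, 0, 1]
    bmi = [31.0, 27.5, 29.0, 22.0, 21.5, 30.0]

    print("unadjusted:", pearson_r(mqi_1, mqi_2))
    print("by diabetes status:", within_group_correlations(mqi_1, mqi_2, diabetes))
    print("partial, given BMI:", partial_correlation(
        pearson_r(mqi_1, mqi_2), pearson_r(mqi_1, bmi), pearson_r(mqi_2, bmi)))
```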

**Problem 4.** This problem makes use of the dataset provided with the exam.

- Make histograms for the distribution of MQI data for each treatment group. I recommend using 20 “bins” for each histogram, so that each bin has a width of 2.05 in the control group and 1.35 in the treatment group (due to differences in the range of data).

- Is there any evidence of non-Normality among the MQI data for either group? Explain your response.

- Given your response to the previous question, use an appropriate statistical test for difference of means (or medians, or…) between the two groups’ MQI scores.
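
The binning in Problem 4 follows from width = (max − min) / number of bins, so widths of 2.05 and 1.35 with 20 bins correspond to ranges of 41 and 27 points. The sketch below counts observations per bin and, as one example of a rank-based comparison, computes a Mann-Whitney U statistic. It is an illustration under assumptions: the function names are hypothetical, the scores are made up, and the choice of test still has to be justified from the histograms.

```python
def bin_width(lowest, highest, n_bins):
    """Width of each of n_bins equal bins spanning lowest..highest."""
    return (highest - lowest) / n_bins


def bin_counts(values, n_bins):
    """Counts per equal-width bin; the maximum value goes in the last bin."""
    lowest, highest = min(values), max(values)
    width = bin_width(lowest, highest, n_bins)
    counts = [0] * n_bins
    for v in values:
        # Guard against zero width when every value is identical.
        index = 0 if width == 0 else min(int((v - lowest) / width), n_bins - 1)
        counts[index] += 1
    return counts


def mann_whitney_u(sample_a, sample_b):
    """U statistic for sample_a: pairs where a > b, with ties counted as 0.5."""
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


if __name__ == "__main__":
    # Made-up MQI scores for the two groups (G=0 and G=1).
    control = [5, 12, 18, 22, 25, 27, 30, 33, 41, 46]
    treated = [20, 24, 28, 30, 31, 33, 35, 38, 42, 47]
    print("control bin width:", bin_width(min(control), max(control), 20))
    print("control counts:", bin_counts(control, 20))
    print("U for control:", mann_whitney_u(control, treated))
```

The U statistic alone is not a test result; it still needs a p-value from tables or statistical software.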