Work 12 of the 13 questions. (The lowest score is dropped.)

The algorithms for single, complete, and average linkage are identical except for one computation. Describe the general hierarchical clustering algorithm and discuss how it is modified to accommodate single, complete, and average linkage.
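As a minimal sketch of the point being made, base R's hclust() runs the same agglomerative algorithm for all three methods; only the between-cluster distance rule named in the `method` argument changes. (The toy data here are illustrative, not the course data.)

```r
## Same agglomerative algorithm, three linkage rules: only `method` differs.
set.seed(1)
X <- matrix(rnorm(20 * 3), nrow = 20)   # 20 toy observations, 3 variables
d <- dist(X)                            # Euclidean distance matrix

hc.single   <- hclust(d, method = "single")    # min inter-cluster distance
hc.complete <- hclust(d, method = "complete")  # max inter-cluster distance
hc.average  <- hclust(d, method = "average")   # mean inter-cluster distance

## Each run produces the same structure: a sequence of n - 1 merges
sapply(list(hc.single, hc.complete, hc.average),
       function(h) length(h$height))
```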

The Euclidean distance between the observed distances (D) and the fitted, or cophenetic, distances (D^) can be used to assess the fit of hierarchical clustering algorithms. Explain two ways this is done, and explain why average linkage will have a smaller overall distance than either single or complete linkage.
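A minimal sketch of one such comparison, using base R's cophenetic() to extract the fitted distances D^ from each dendrogram (toy data, not the course data):

```r
## Compare observed distances D with cophenetic (fitted) distances D-hat.
set.seed(2)
X <- matrix(rnorm(15 * 2), nrow = 15)
d <- dist(X)                           # observed distances D

fit.ss <- function(method) {
  dhat <- cophenetic(hclust(d, method = method))  # fitted distances D-hat
  sum((d - dhat)^2)        # squared Euclidean discrepancy between D and D-hat
}
c(single   = fit.ss("single"),
  complete = fit.ss("complete"),
  average  = fit.ss("average"))
```

Average linkage fits a "middle" distance rather than an extreme (the minimum or maximum), which is the intuition behind its typically smaller discrepancy.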

Explain how the k-means clustering algorithm works. Include a discussion of how the initial seed points are found.
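A minimal sketch with base R's kmeans(), seeding the algorithm with k rows sampled at random from the data (one common choice among several seeding schemes):

```r
## k-means with explicit initial seed points.
set.seed(3)
X <- rbind(matrix(rnorm(25 * 2, mean = 0), ncol = 2),  # two well-separated
           matrix(rnorm(25 * 2, mean = 4), ncol = 2))  # toy clusters
k <- 2
seeds <- X[sample(nrow(X), k), ]    # initial cluster centers: k random rows
km <- kmeans(X, centers = seeds)    # iterate: assign points, re-center, repeat
table(km$cluster)                   # cluster sizes after convergence
```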

What conditions must a correlation matrix satisfy? Explain.

Distinguish between using a=0 and a=1 in constructing the principal components biplot. Which is preferred? Why?
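In base R's biplot(), the `scale` argument plays the role of a: variables are scaled by lambda^scale and observations by lambda^(1 - scale). A minimal sketch on a stand-in data set:

```r
## The two standard biplot scalings via base R's biplot().
pc <- prcomp(USArrests, scale. = TRUE)   # stand-in data set
biplot(pc, scale = 0)  # a = 0: inter-point distances approximate D
biplot(pc, scale = 1)  # a = 1: angles between arrows approximate correlations
```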

This question concerns principal component analysis on the crime data.

Y <- crime[, 5:11]
crime.pca <- pca(Y)

Is the PCA above done on the correlation or covariance matrix? Explain.

Put R code here.
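A minimal sketch of the kind of code that could go here, using base R's prcomp() as a stand-in for the course pca() function and USArrests as a stand-in for crime[, 5:11] (which default pca() uses would need its help page):

```r
## Covariance-matrix PCA vs correlation-matrix PCA.
Y <- USArrests                        # stand-in for crime[, 5:11]
pc.cov <- prcomp(Y, scale. = FALSE)   # PCA of the covariance matrix
pc.cor <- prcomp(Y, scale. = TRUE)    # PCA of the correlation matrix
round(pc.cov$sdev^2, 2)   # eigenvalues depend on the variables' scales
round(pc.cor$sdev^2, 2)   # eigenvalues sum to p = ncol(Y)
```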

How many components would you keep? Are there any near singularities? Justify.

Put R code here.
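A minimal sketch of the kind of code that could go here (stand-in data; the course data would be substituted):

```r
## Scree plot and eigenvalues: how many components, and any near singularities?
pc <- prcomp(USArrests, scale. = TRUE)   # stand-in data set
eig <- pc$sdev^2                         # eigenvalues
screeplot(pc, type = "lines")            # look for the "elbow"
cumsum(eig) / sum(eig)    # cumulative proportion of variance explained
eig / max(eig)            # ratios near zero flag near singularities
```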

How are the first two components interpreted?

Put R code here.

The Chi-square test of independence in a two-way contingency table is a function of the count residuals. Explain how these residuals are defined and how they are used in correspondence analysis. How does the biplot explain the relationships between the two factors which define the contingency table?
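A minimal sketch of the residuals in question, on a stand-in two-way table: the Pearson residuals are (observed - expected)/sqrt(expected), and their squares sum to the chi-square statistic that correspondence analysis decomposes.

```r
## Pearson residuals of a two-way contingency table.
tab <- margin.table(HairEyeColor, c(1, 2))   # 4 x 4 hair-by-eye counts
ct <- chisq.test(tab)
res <- ct$residuals                  # (observed - expected)/sqrt(expected)
all.equal(sum(res^2), unname(ct$statistic))  # chi-square = sum of squares
```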

This question concerns discriminant analysis among normal, chemical, and overt diabetics (CClass):

X <- diabetes[, c(1, 2, 4, 5, 6)]
y <- diabetes[, 8]
diabetes.disc <- disc(X, y)
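disc() appears to be a course-provided function; as a rough stand-in, MASS::lda on the iris data shows the analogous pieces, assuming $A holds the raw discriminant coefficients:

```r
## Stand-in for the course disc() function: 3-group discriminant analysis.
library(MASS)
fit <- lda(Species ~ ., data = iris)
fit$scaling   # raw discriminant coefficients, analogous to diabetes.disc$A
## Raw coefficients depend on each variable's scale of measurement, which is
## why they alone are a poor guide to interpretation.
```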

Explain what diabetes.disc$A represents and why it is not adequate for interpreting the discriminant variables.


##                   Disc1         Disc2
## RelWeight -9.998330e-01  0.9999663983
## GluFast    1.546412e-02 -0.0078063413
## GluDiff   -9.229275e-03  0.0018740781
## InsTest    7.501817e-05  0.0016313383
## SSPG      -3.113892e-03 -0.0002996801

How does the following output assist in interpreting the discriminant variables?


##                Disc1      Disc2
## RelWeight -0.1930123  0.5365964
## GluFast   -0.8457606 -0.4396387
## GluDiff   -0.9646649 -0.2247237
## InsTest    0.1929697  0.8047175
## SSPG      -0.8592626  0.1028040
```
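Output of this second kind (correlations between the original variables and the discriminant scores) can be computed as in this minimal sketch, using MASS::lda on a stand-in data set:

```r
## Structure correlations: variables vs discriminant scores.
library(MASS)
fit <- lda(Species ~ ., data = iris)        # stand-in for disc(X, y)
scores <- predict(fit)$x                    # discriminant scores
structure.cor <- cor(iris[, 1:4], scores)   # scale-free, so interpretable
round(structure.cor, 3)
```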

What are the assumptions of the g-group discriminant problem? Explain how you would assess these assumptions using R.
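A minimal sketch of one way to assess the usual assumptions (multivariate normality within groups, equal group covariance matrices) with base R, on a stand-in data set:

```r
## Assessing g-group discriminant assumptions.
X <- iris[, 1:4]; g <- iris$Species          # stand-in data and groups
## Normality: within-group squared Mahalanobis distances should follow
## a chi-square distribution with df = p.
d2 <- unlist(lapply(split(X, g), function(Xi)
  mahalanobis(Xi, colMeans(Xi), cov(Xi))))
qqplot(qchisq(ppoints(length(d2)), df = 4), d2,
       xlab = "Chi-square quantiles", ylab = "Mahalanobis distances")
## Equal covariances: compare the group covariance matrices informally
## (a formal test such as Box's M could also be used).
lapply(split(X, g), function(Xi) round(cov(Xi), 2))
```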

Explain how testing is done to determine the number of significant discriminant dimensions.
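One such sequential test (Bartlett's chi-square approximation) can be computed from the eigenvalues of W^-1 B, as in this minimal sketch on a stand-in data set:

```r
## Bartlett's sequential test for the number of significant dimensions.
X <- as.matrix(iris[, 1:4]); g <- iris$Species   # stand-in data
n <- nrow(X); p <- ncol(X); k <- nlevels(g)
Xc <- scale(X, scale = FALSE)
Tot <- t(Xc) %*% Xc                               # total SSCP
W <- Reduce(`+`, lapply(split(as.data.frame(X), g), function(Xi) {
  Z <- scale(as.matrix(Xi), scale = FALSE)
  t(Z) %*% Z }))                                  # within-group SSCP
B <- Tot - W                                      # between-group SSCP
lam <- sort(Re(eigen(solve(W) %*% B)$values),
            decreasing = TRUE)[1:min(p, k - 1)]   # nonzero eigenvalues
s <- length(lam)
for (m in 0:(s - 1)) {                            # test dims m+1, ..., s
  stat <- (n - 1 - (p + k) / 2) * sum(log(1 + lam[(m + 1):s]))
  df <- (p - m) * (k - 1 - m)
  cat(sprintf("after %d dims: X2 = %.1f, df = %d, p = %.4g\n",
              m, stat, df, pchisq(stat, df, lower.tail = FALSE)))
}
```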

The (p-t) independent collinearities among a group of p variables are invariant with respect to scale changes of these variables. Explain and justify.
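A minimal numerical illustration of the invariance: the count of (near-)zero eigenvalues of the correlation matrix, which equals the number of collinearities, is unchanged when the variables are rescaled.

```r
## One exact collinearity survives arbitrary scale changes.
set.seed(4)
Z <- matrix(rnorm(50 * 3), ncol = 3)
X <- cbind(Z, Z[, 1] + Z[, 2])               # 4th column: exact collinearity
X2 <- sweep(X, 2, c(1, 10, 100, 0.5), `*`)   # arbitrary scale changes
sum(eigen(cor(X))$values < 1e-8)             # one zero eigenvalue
sum(eigen(cor(X2))$values < 1e-8)            # still one: invariant to scaling
```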

Both classical multidimensional scaling and principal component analysis attempt to represent the n observations in a low-dimensional space. Explain how these representations differ. Be sure to include a discussion of what is optimized in each case.
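A minimal sketch of the special case that links the two: when classical MDS is applied to Euclidean distances, its coordinates coincide with the PCA scores (up to sign), even though one method optimizes the fit to the distances and the other the variance captured.

```r
## Classical MDS on Euclidean distances recovers PCA scores (up to sign).
X <- scale(USArrests, scale = FALSE)     # centered stand-in data
mds <- cmdscale(dist(X), k = 2)          # fits inter-point distances
pcs <- prcomp(X)$x[, 1:2]                # maximizes variance captured
max(abs(abs(mds) - abs(pcs)))            # near zero in this special case
```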

Explain the algorithm for mapping n points into a low-dimensional space using non-metric MDS. Be sure to include a discussion of what is being optimized and how monotonicity is improved (ideally) at each step. Can you guarantee you have found the global minimum? Why or why not?
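A minimal sketch with MASS::isoMDS, which iteratively minimizes stress; comparing a default start with a random start illustrates why the global minimum is not guaranteed (stand-in data).

```r
## Non-metric MDS: stress minimization from two starting configurations.
library(MASS)
d <- dist(scale(USArrests))                       # stand-in dissimilarities
fit1 <- isoMDS(d, k = 2, trace = FALSE)           # default (cmdscale) start
set.seed(5)
fit2 <- isoMDS(d, y = matrix(rnorm(100), 50, 2),  # random start
               k = 2, trace = FALSE)
c(fit1$stress, fit2$stress)   # stress values can differ: local minima
```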