xxt                package:snpMatrix                R Documentation

_X._X-_t_r_a_n_s_p_o_s_e _f_o_r _a _s_t_a_n_d_a_r_d_i_z_e_d _s_n_p._m_a_t_r_i_x

_D_e_s_c_r_i_p_t_i_o_n:

     The input snp.matrix is first standardized by subtracting the mean
     (or stratum mean) from each call and dividing by the expected
     standard deviation under Hardy-Weinberg equilibrium. It is then
     post-multiplied by its transpose. This is a preliminary step in
     the computation of principal components.

_U_s_a_g_e:

     xxt(snps, strata = NULL, correct.for.missing = FALSE, lower.only = FALSE)

_A_r_g_u_m_e_n_t_s:

    snps: The input matrix, of type '"snp.matrix"'

  strata: A 'factor' (or an object which can be coerced into a 
          'factor') with length equal to the number of rows of 'snps'
          defining stratum membership

correct.for.missing: If 'TRUE', an attempt is made to correct for the
          effect of missing data by use of inverse probability weights.
          Otherwise, missing observations are scored zero in the
          standardized matrix

lower.only: If 'TRUE', only the lower triangle of the result is
          returned and the upper triangle is filled with zeros.
          Otherwise, the complete symmetric matrix is returned
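
     A minimal sketch of working with a lower-triangle result, using a
     hand-made 3 x 3 matrix 'lo' (hypothetical, standing in for the
     output of 'xxt(..., lower.only = TRUE)'). Base R's 'eigen' with
     'symmetric = TRUE' reads only the lower triangle, so such a result
     can be passed to it directly; the full matrix can also be rebuilt
     if it is needed elsewhere.

```r
# Lower triangle of a symmetric matrix, upper triangle zero-filled
# ('lo' is an illustrative stand-in, not real xxt output)
lo <- matrix(c(4, 0, 0,
               1, 5, 0,
               2, 3, 6), nrow = 3, byrow = TRUE)

# Rebuild the complete symmetric matrix from the lower triangle
full <- lo + t(lo)
diag(full) <- diag(lo)

# eigen() with symmetric = TRUE uses only the lower triangle,
# so both calls should give the same eigenvalues
ev_lo   <- eigen(lo,   symmetric = TRUE)$values
ev_full <- eigen(full, symmetric = TRUE)$values
same <- isTRUE(all.equal(ev_lo, ev_full))
```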

_D_e_t_a_i_l_s:

     This computation forms the first step of the calculation of
     principal components for genome-wide SNP data. As pointed out by
     Price et al. (2006), when the data matrix has more columns (SNPs)
     than rows (subjects) it is most efficient to calculate the
     eigenvectors of X.X-transpose, where X is a 'snp.matrix' whose
     columns have been standardized to zero mean and unit variance.
     For autosomes, genotypes are coded 0, 1 or 2 and, after
     subtraction of the mean, 2p, are divided by the standard
     deviation sqrt(2p(1-p)), where p is the estimated allele
     frequency. For SNPs on the X chromosome in male subjects,
     genotypes are coded 0 or 2; the mean is still 2p, but the
     standard deviation is 2sqrt(p(1-p)). If 'strata' is supplied, a
     stratum-specific estimate of p is used for standardization.
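
     The autosomal standardization described above can be sketched in
     base R on simulated data. This is an illustration only, not the
     package's implementation: the genotypes are random draws, all
     object names are invented for the example, there is no missing
     data, and no 'strata' are used.

```r
# Simulated autosomal genotypes coded 0, 1 or 2:
# 6 subjects (rows) by 50 SNPs (columns). Names are illustrative only.
set.seed(1)
n_subj <- 6
n_snp  <- 50
freq <- runif(n_snp, 0.2, 0.8)
geno <- sapply(freq, function(f) rbinom(n_subj, 2, f))

# Estimated allele frequency per SNP; drop monomorphic SNPs,
# for which the standard deviation would be zero
p <- colMeans(geno) / 2
keep <- p > 0 & p < 1
geno <- geno[, keep, drop = FALSE]
p <- p[keep]

# Subtract the mean 2p, then divide by the Hardy-Weinberg
# standard deviation sqrt(2p(1-p))
x <- sweep(geno, 2, 2 * p, "-")
x <- sweep(x, 2, sqrt(2 * p * (1 - p)), "/")

# X.X-transpose is n_subj by n_subj, however many SNPs there are
xx <- tcrossprod(x)
pc <- eigen(xx, symmetric = TRUE)$vectors
```

     Because 'xx' has one row and column per subject, its size does
     not grow with the number of SNPs.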

     Missing observations present some difficulty. Price et al. (2006)
     recommended replacing missing observations by their means, this
     being equivalent to replacement by zeros in the standardized
     matrix. However, this results in a biased estimate of the complete
     data result. Optionally, this bias can be corrected by inverse
     probability weighting. We assume that the probability that any one
     call is missing is small, and can be predicted by a multiplicative
     model with row (subject) and column (locus) effects. The estimated
     probability of a missing value in a given row and column is then
     given by m = RC/T, where R is the number of no-calls in the row, C
     is the number of no-calls in the column, and T is the overall
     total number of no-calls. Non-missing contributions to
     X.X-transpose are then weighted by w = 1/(1-m) for contributions
     to the diagonal elements, and by products of the relevant pairs
     of weights for contributions to off-diagonal elements.
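
     One reading of this weighting scheme is sketched below in base R.
     It is an assumption-laden illustration rather than the code used
     by 'xxt': the matrix 'x' is random noise standing in for a
     standardized matrix, the no-call pattern is fixed by hand, and
     all object names are invented for the example.

```r
set.seed(2)
n_subj <- 6
n_snp  <- 24
# Random values standing in for a standardized genotype matrix
x <- matrix(rnorm(n_subj * n_snp), n_subj, n_snp)

# Hand-placed no-calls: one per subject, each in a different SNP
miss <- matrix(FALSE, n_subj, n_snp)
miss[cbind(1:6, c(1, 5, 9, 13, 17, 21))] <- TRUE
x[miss] <- 0                      # missing calls are scored zero

# Multiplicative model for the probability of a no-call: m = RC/T
r_tot <- rowSums(miss)            # R: no-calls per row
c_tot <- colSums(miss)            # C: no-calls per column
tot   <- sum(miss)                # T: overall number of no-calls
m_hat <- outer(r_tot, c_tot) / tot
w <- 1 / (1 - m_hat)              # inverse probability weights

# Off-diagonal elements: each product x[i, j] * x[k, j] is
# weighted by w[i, j] * w[k, j]
xx <- tcrossprod(w * x)
# Diagonal elements: each square x[i, j]^2 is weighted by w[i, j] once
diag(xx) <- rowSums(w * x^2)
```

     With this pattern every estimated probability is either 0 or 1/6,
     so the weights are either 1 or 1.2.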

_V_a_l_u_e:

     A square matrix containing either the complete X.X-transpose
     matrix or, if 'lower.only' is 'TRUE', just its lower triangle
     with the upper triangle filled with zeros.

_W_a_r_n_i_n_g:

     The correction for missing observations can result in an output
     matrix which is not positive semi-definite. This should not matter
     in the application for which it is intended.
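
     As a rough illustration, the sketch below uses a small hand-made
     symmetric matrix that is not positive semi-definite (the values
     are invented and are not 'xxt' output). For a symmetric matrix,
     'eigen' returns eigenvalues in decreasing order, so any negative
     ones fall at the end and the leading components can still be
     extracted.

```r
# A symmetric matrix with one negative eigenvalue (illustrative values)
xx <- matrix(c(2.0, 1.9, 0.1,
               1.9, 2.0, 1.9,
               0.1, 1.9, 2.0), nrow = 3, byrow = TRUE)

ev <- eigen(xx, symmetric = TRUE)

# Count the negative eigenvalues; they sort after the positive ones
n_neg <- sum(ev$values < 0)

# Keep only the eigenvectors with positive eigenvalues
lead <- ev$vectors[, ev$values > 0, drop = FALSE]
```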

_N_o_t_e:

     In genome-wide studies, the SNP data will usually be held as a
     series of objects (of class '"snp.matrix"' or '"X.snp.matrix"'),
     one per chromosome. Note that the X.X-transpose matrices produced
     by applying the 'xxt' function to each object in turn can be added
     to yield the genome-wide result.
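
     The additivity follows because each SNP contributes its own term
     to X.X-transpose. A small base R check on random matrices (the
     names 'chr1' and 'chr2' are hypothetical stand-ins for two
     already standardized per-chromosome matrices on the same
     subjects):

```r
set.seed(3)
n_subj <- 5
# Two column blocks for the same subjects (random stand-ins)
chr1 <- matrix(rnorm(n_subj * 30), n_subj, 30)
chr2 <- matrix(rnorm(n_subj * 20), n_subj, 20)

# Sum of the per-chromosome X.X-transpose matrices ...
xx_sum <- tcrossprod(chr1) + tcrossprod(chr2)
# ... compared with X.X-transpose of the combined matrix
xx_all <- tcrossprod(cbind(chr1, chr2))

agree <- isTRUE(all.equal(xx_sum, xx_all))
```

     With real data held in a list of per-chromosome objects, the same
     idea could be written as 'Reduce("+", lapply(chrom.list, xxt))',
     where 'chrom.list' is an assumed name for that list.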

_A_u_t_h_o_r(_s):

     David Clayton <david.clayton@cimr.cam.ac.uk>

_R_e_f_e_r_e_n_c_e_s:

     Price et al. (2006) Principal components analysis corrects for
     stratification in genome-wide association studies. Nature
     Genetics, 38:904-9

_E_x_a_m_p_l_e_s:

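     # Load the package and its example data (assumes snpMatrix is installed)
     library(snpMatrix)
     data(testdata)
     # Make a snp.matrix with a small number of rows
     small <- Autosomes[1:100,]
     # Calculate the X.X-transpose matrix, correcting for missing calls
     xx <- xxt(small, correct.for.missing=TRUE)
     # Calculate the principal components (eigenvectors of X.X-transpose)
     pc <- eigen(xx, symmetric=TRUE)$vectors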

