svdImpute             package:pcaMethods             R Documentation

_S_V_D_i_m_p_u_t_e _a_l_g_o_r_i_t_h_m

_D_e_s_c_r_i_p_t_i_o_n:

     This implements the SVDimpute algorithm as proposed by Troyanskaya
     et al. (2001). The idea behind the algorithm is to estimate the
     missing values as a linear combination of the 'k' most significant
     eigengenes.

     Missing values are denoted as 'NA'.

_U_s_a_g_e:

     svdImpute(Matrix, nPcs = 2, center=TRUE, completeObs=TRUE, threshold = 0.01, 
       maxSteps = 100, verbose = interactive(), ...)

_A_r_g_u_m_e_n_t_s:

  Matrix: 'matrix' - Data containing the variables in columns and
          observations in rows. The data may contain missing values,
          denoted as 'NA'.

    nPcs: 'numeric' - Number of components to estimate. The accuracy
          of the missing value estimation depends on the number of
          components, which should reflect the internal structure of
          the data.

  center: Mean center the data if TRUE

completeObs: Return the estimated complete observations if TRUE. This
          is the input data with NA values replaced by the estimated
          values.

threshold: The iteration stops when the change in the matrix falls
          below this threshold. Default is 0.01, a value determined
          empirically by Troyanskaya et al.

maxSteps: Maximum number of iteration steps. Default is 100.

 verbose: Print some output if TRUE. Default is interactive().

     ...: Reserved for parameters used in future versions of the
          algorithm.

_D_e_t_a_i_l_s:

     As SVD can only be performed on complete matrices, all missing
     values are initially replaced by 0 (which is in fact the mean of
     centred data). The algorithm then iterates until the change in
     the estimated solution falls below a certain threshold. In each
     step, the eigengenes of the current estimate are calculated and
     used to determine a new estimate. Eigengenes denote the loadings
     if PCA is performed considering genes as observations.
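
     As a small illustration of this definition (a sketch in base R on
     made-up random data, not code taken from this package), the right
     singular vectors of the centred matrix agree with the 'prcomp'
     loadings up to sign:

         set.seed(42)
         X <- matrix(rnorm(30 * 4), 30, 4)   # hypothetical complete data
         Xc <- scale(X, center = TRUE, scale = FALSE)
         # Right singular vectors of the centred matrix
         V <- svd(Xc)$v
         # Loadings as returned by standard PCA
         R <- prcomp(X, center = TRUE, scale. = FALSE)$rotation
         # Largest difference once the arbitrary sign is ignored
         max_diff <- max(abs(abs(V) - abs(R)))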

     An optimal linear combination is found by regressing the
     incomplete gene against the 'k' most significant eigengenes. If
     the value at position 'j' is missing, the j^th value of the
     eigengenes is not used when determining the regression
     coefficients.
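
     The procedure described above can be sketched in base R as
     follows. This is a minimal illustration under stated assumptions,
     not the code used by 'svdImpute': the function name
     'svd_impute_sketch', the relative Frobenius-norm stopping rule,
     and the requirement that every incomplete row keeps at least 'k'
     observed values are all assumptions made for this sketch.

         svd_impute_sketch <- function(X, k = 2, threshold = 0.01,
                                       max_steps = 100) {
           miss <- is.na(X)
           mu <- colMeans(X, na.rm = TRUE)
           est <- sweep(X, 2, mu)     # centre the columns
           est[miss] <- 0             # 0 is the mean of centred data
           for (step in seq_len(max_steps)) {
             old <- est
             # The k most significant eigengenes (p x k matrix)
             V <- svd(est, nu = 0, nv = k)$v
             for (i in which(rowSums(miss) > 0)) {
               obs <- !miss[i, ]
               # Regress the observed part of row i on the eigengenes;
               # positions with missing values are left out
               beta <- qr.solve(V[obs, , drop = FALSE], est[i, obs])
               est[i, !obs] <- V[!obs, , drop = FALSE] %*% beta
             }
             # Assumed stopping rule: relative change of the estimate
             change <- sqrt(sum((est - old)^2) / sum(est^2))
             if (change < threshold) break
           }
           sweep(est, 2, mu, "+")     # add the column means back
         }

         ## Usage on made-up low-rank data with three removed values
         set.seed(1)
         A <- matrix(rnorm(40 * 2), 40, 2) %*% matrix(rnorm(2 * 6), 2, 6)
         X <- A
         X[cbind(c(3, 10, 25), c(1, 4, 6))] <- NA
         imp <- svd_impute_sketch(X, k = 2)

     Only the missing positions are changed; observed values are
     returned as they were supplied, up to rounding.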

     *Complexity:* In each iteration, standard PCA ('prcomp') needs to
     be done for each incomplete gene to get the eigengenes. This is
     usually fast for small data sets, but run time may grow
     considerably for very large data sets.

_V_a_l_u_e:

  pcaRes: Standard PCA result object used by all PCA-based methods of
          this package. Contains scores, loadings, data mean and more.
          See 'pcaRes' for details.

_A_u_t_h_o_r(_s):

     Wolfram Stacklies 
      Max Planck Institut fuer Molekulare Pflanzenphysiologie, Potsdam,
     Germany 
      wolfram.stacklies@gmail.com 

_R_e_f_e_r_e_n_c_e_s:

     Troyanskaya O. and Cantor M. and Sherlock G. and Brown P. and
     Hastie T. and Tibshirani R. and Botstein D. and Altman RB. -
     Missing value estimation methods for DNA microarrays.
     _Bioinformatics. 2001 Jun;17(6):520-5._

_S_e_e _A_l_s_o:

     'bpca, ppca, prcomp, nipalsPca, pca, pcaRes'.

_E_x_a_m_p_l_e_s:

     ## Load a sample metabolite dataset (metaboliteData)
     data(metaboliteData)

     ## Now remove 10% of the data
     rows <- nrow(metaboliteData)
     cols <- ncol(metaboliteData)
     cond <- matrix(runif(rows * cols), rows, cols) < 0.1
     metaboliteData[cond] <- NA

     ## Perform SVD-based imputation using the 3 largest components
     result <- pca(metaboliteData, method="svdImpute", nPcs=3, center = TRUE)

     ## Get the estimated principal axes (loadings)
     loadings <- result@loadings

     ## Get the estimated scores
     scores <- result@scores

     ## Get the estimated complete observations
     cObs <- result@completeObs

     ## Now plot the scores
     plotPcs(result, scoresLoadings=c(TRUE,FALSE))
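
     ## Illustrative addition, assuming the objects 'cond' and 'cObs'
     ## created above: compare the estimates with the values that were
     ## removed. Reloading the data restores the original matrix, which
     ## may itself contain NAs, hence na.rm = TRUE.
     data(metaboliteData)
     removed <- metaboliteData[cond]
     rmse <- sqrt(mean((cObs[cond] - removed)^2, na.rm = TRUE))
     rmse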

