kEstimateFast       package:pcaMethods       R Documentation

_E_s_t_i_m_a_t_e _b_e_s_t _n_u_m_b_e_r _o_f _C_o_m_p_o_n_e_n_t_s _f_o_r _m_i_s_s_i_n_g _v_a_l_u_e _e_s_t_i_m_a_t_i_o_n

_D_e_s_c_r_i_p_t_i_o_n:

     This is a simple estimator for the optimal number of components
     when applying PCA for missing value estimation. No cross
     validation is performed; instead, the estimation quality is
     measured on the observed entries only, as the difference
     Matrix[!missing] - Estimate[!missing] between the original values
     and their estimates. This gives a relatively rough estimate, but
     only one model is fitted per candidate, so the number of
     iterations equals the length of the parameter evalPcs.
     Note that this function does not work with llsImpute.
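     The procedure described above can be sketched in base R. This is
     a minimal sketch under stated assumptions, not the pcaMethods
     implementation: the hypothetical helpers lowRankEstimate() and
     fastKSketch() swap in a mean-fill plus truncated SVD
     reconstruction for the 'pca' methods, and score each candidate
     with a plain RMSD on the observed entries.

```r
## Hypothetical helper (not part of pcaMethods): rank-k reconstruction
## via truncated SVD after filling missing values with column means.
lowRankEstimate <- function(X, k) {
  centers <- colMeans(X, na.rm = TRUE)
  Xc <- sweep(X, 2, centers)       # centre each column; NAs stay NA
  Xc[is.na(Xc)] <- 0               # mean-fill: a centred mean is 0
  s <- svd(Xc, nu = k, nv = k)
  approx <- s$u %*% diag(s$d[seq_len(k)], k, k) %*% t(s$v)
  sweep(approx, 2, centers, "+")   # add the column means back
}

## Hypothetical helper: one fit per candidate k, error measured only on
## the observed entries, so no cross validation is involved.
fastKSketch <- function(X, evalPcs) {
  observed <- !is.na(X)
  eError <- vapply(evalPcs, function(k) {
    est <- lowRankEstimate(X, k)
    sqrt(mean((X[observed] - est[observed])^2))
  }, numeric(1))
  list(minNPcs = evalPcs[which.min(eError)],
       eError = eError,
       evalPcs = evalPcs)
}
```

     Because the error is measured on the same entries that informed
     the fit, it tends to shrink as k grows, which is one reason this
     kind of estimate is only rough.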

     As error measure, either the NRMSEP (see Feten et al., 2005) or
     the Q2 distance is used. The NRMSEP normalises the RMSD between
     the original data and the estimate by the variable-wise variance,
     because a higher variance generally leads to a higher estimation
     error. If the number of samples is small, the variable-wise
     variance may become an unstable criterion and the Q2 distance
     should be used instead. The same applies if variance
     normalisation was applied to the data beforehand.
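     The two error measures can be written as small base-R functions.
     These are hypothetical helpers, assuming the NRMSEP of a variable
     is the square root of its mean squared error on the observed
     entries divided by that variable's variance, averaged over the
     scored variables, and assuming the Q2 distance is PRESS/SS on the
     column-centred observed data (that is, 1 - Q2 under the usual
     definition). The exact pcaMethods formulas may differ.

```r
## Hypothetical helper: X holds the original data (with NAs),
## Xhat is a complete estimate of the same dimensions.
nrmsepSketch <- function(X, Xhat, allVariables = FALSE) {
  ## Score all columns, or only those with at least one NA
  cols <- if (allVariables) seq_len(ncol(X)) else which(colSums(is.na(X)) > 0)
  if (length(cols) == 0) return(NA_real_)
  perVar <- vapply(cols, function(j) {
    obs <- !is.na(X[, j])
    sqrt(mean((X[obs, j] - Xhat[obs, j])^2) / var(X[obs, j]))
  }, numeric(1))
  mean(perVar)
}

## Hypothetical helper: PRESS / SS over the observed entries,
## with SS taken on the column-centred data.
q2DistanceSketch <- function(X, Xhat) {
  obs <- !is.na(X)
  Xc <- sweep(X, 2, colMeans(X, na.rm = TRUE))
  sum((X[obs] - Xhat[obs])^2) / sum(Xc[obs]^2)
}
```

     The allVariables flag mirrors the argument of the same name
     documented below: with FALSE, complete variables do not enter the
     average.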

_U_s_a_g_e:

     kEstimateFast(Matrix, method = "ppca", evalPcs = 1:3, 
     em = "nrmsep", allVariables = FALSE, verbose = interactive(),...)

_A_r_g_u_m_e_n_t_s:

  Matrix: 'matrix' - numeric matrix containing observations in rows and
           variables in columns

  method: 'character' - One of ppca | bpca | svdImpute | nipals

 evalPcs: 'numeric' - The numbers of principal components to
          evaluate, or cluster sizes if used with llsImpute. Should be
          a vector of integer values, e.g. evalPcs = 1:10 or
          evalPcs = c(2, 5, 8). The error is calculated for each
          value.

      em: 'character' - The error measure. This can be 'nrmsep' or 'q2'.

allVariables: 'boolean' - If TRUE, the NRMSEP is calculated for all
          variables; if FALSE, only the incomplete ones are included.
          Setting it to TRUE can be useful when comparing several
          methods on a complete data set.

 verbose: 'boolean' - If TRUE, the NRMSEP and the variance are printed
          to the console each iteration.

     ...: Further arguments to 'pca'

_V_a_l_u_e:

    list: Returns a list with the elements:

             *  minNPcs - number of PCs for which the minimal average
                NRMSEP was obtained

             *  eError - an array of size length(evalPcs). Contains
                the estimation error for each number of components.

             *  evalPcs - The evaluated numbers of components or
                cluster sizes (the same as the evalPcs input
                parameter).
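     The link between eError and minNPcs can be sketched as follows,
     assuming minNPcs is simply the entry of evalPcs with the smallest
     error. The helper name pickMinNPcs() is hypothetical and the error
     values are made-up inputs. The point is that the result is a
     number of components, not a position in evalPcs.

```r
## Hypothetical helper: map the smallest error back to its number of PCs.
pickMinNPcs <- function(evalPcs, eError) {
  ## drop() tolerates a one-column matrix; which.min() returns the
  ## first minimum, so ties favour the earlier entry of evalPcs
  evalPcs[which.min(drop(eError))]
}

pickMinNPcs(evalPcs = c(2, 5, 8), eError = c(0.40, 0.31, 0.35))
## returns 5
```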

_A_u_t_h_o_r(_s):

     Wolfram Stacklies 
      CAS-MPG Partner Institute for Computational Biology, Shanghai,
     China 
              wolfram.stacklies@gmail.com 

_S_e_e _A_l_s_o:

     'kEstimate'.

_E_x_a_m_p_l_e_s:

     ## Load a sample metabolite dataset with 5% missing values (metaboliteData)
     data(metaboliteData)

     # Estimate the best number of PCs with ppca, evaluating 2 to 4 components
     esti <- kEstimateFast(t(metaboliteData), method = "ppca", evalPcs = 2:4, em="nrmsep")

     # Plot the result
     barplot(drop(esti$eError), names.arg = esti$evalPcs, xlab = "Components", ylab = "NRMSEP")

     # The best k value is:
     print(esti$minNPcs)

