lmPerGene               package:GSEAlm               R Documentation

_F_i_t _l_i_n_e_a_r _m_o_d_e_l _f_o_r _e_a_c_h _g_e_n_e

_D_e_s_c_r_i_p_t_i_o_n:

     For each gene, 'lmPerGene' fits the same, user-specified linear
     model. It returns the estimates of the model parameters and their
     variances for each fitted model. The function uses matrix algebra
     so it is much faster than repeated calls to 'lm'.

_U_s_a_g_e:

     lmPerGene(eSet, formula, na.rm=TRUE)

_A_r_g_u_m_e_n_t_s:

    eSet: An 'ExpressionSet' object.

 formula: an object of class 'formula' (or one that can be coerced to
          that class), specifying only the right-hand side starting
          with the '~' symbol. The LHS is automatically set as the
          expression levels provided in 'eSet'. The names of all
          predictors must exist in the phenotypic data of 'eSet'.

   na.rm: Whether to remove missing observations.

_D_e_t_a_i_l_s:

     This function efficiently computes the least squares fit of a
     linear regression to a set of gene expression values. We assume
     that there are 'G' genes, on 'n' samples, and that there are 'p'
     variables in the regression equation.  So the result is that 'G'
     different regressions are computed, and various summary statistics
     are returned.

     Since the independent variables are the same for every gene,
     'lmPerGene' avoids fitting a separate linear model per gene. It
     accelerates the fitting process by calculating the hat matrix
     'X(X'X)^(-1)X'' first; matrix multiplication and 'solve' are then
     used to compute the estimates of the model parameters.
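
     The matrix algebra described above can be sketched in base R as
     follows. This is a minimal illustration on simulated data, not the
     package's actual source: the objects 'Y', 'X', 'XtXinv', 'coefs'
     and 'resid' are hypothetical names chosen for this example, and
     only the result names 'Hmat', 'sigmaSqr' and 'coef.var' mirror the
     components listed under Value.

```r
## Illustrative sketch only: simulated data, base R, hypothetical names.
set.seed(1)
n <- 12; G <- 5                            # samples, genes
sex <- factor(rep(c("F", "M"), each = n / 2))
Y <- matrix(rnorm(G * n), nrow = G)        # genes in rows, samples in columns

X <- model.matrix(~ sex)                   # n x p design matrix, shared by all genes
p <- ncol(X)
XtXinv <- solve(crossprod(X))              # (X'X)^(-1), computed once
Hmat <- X %*% XtXinv %*% t(X)              # hat matrix X (X'X)^(-1) X'

coefs <- XtXinv %*% t(X) %*% t(Y)          # p x G coefficient estimates
resid <- t(Y) - Hmat %*% t(Y)              # n x G residuals
sigmaSqr <- colSums(resid^2) / (n - p)     # length-G mean square errors
coef.var <- outer(diag(XtXinv), sigmaSqr)  # p x G coefficient variances

## Check gene 1 against an ordinary lm() fit
fit1 <- lm(Y[1, ] ~ sex)
all.equal(unname(coefs[, 1]), unname(coef(fit1)))
all.equal(unname(coef.var[, 1]), unname(diag(vcov(fit1))))
```

     Because 'XtXinv' and 'Hmat' depend only on the design matrix, they
     are computed once and reused for all 'G' genes; only the products
     with the expression matrix scale with the number of genes.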

     Leaving the formula blank (the default) will calculate an 
     intercept-only model, useful for generic pattern and outlier
     identification.
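
     To see why an intercept-only model helps with outlier
     identification, consider the following base R sketch on simulated
     data. It does not call 'lmPerGene'; the objects 'Y', 'X', 'fit'
     and 'res' are hypothetical names used only for this illustration.
     With a single column of ones as the design matrix, each fitted
     value is the gene's mean, so the residuals are deviations from
     that mean.

```r
## Illustrative sketch only: simulated data, base R, hypothetical names.
set.seed(2)
n <- 10; G <- 4
Y <- matrix(rnorm(G * n), nrow = G)         # genes in rows, samples in columns
Y[3, 7] <- 8                                # plant one outlying value

X <- matrix(1, nrow = n, ncol = 1)          # intercept-only design matrix
Hmat <- X %*% solve(crossprod(X)) %*% t(X)  # every entry equals 1/n
fit <- Y %*% Hmat                           # each row repeats that gene's mean
res <- Y - fit                              # deviations from the gene mean

all.equal(fit[, 1], rowMeans(Y))
which(abs(res) == max(abs(res)), arr.ind = TRUE)  # locates the planted value
```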

_V_a_l_u_e:

     A list with components: 

    Hmat: The Hat matrix.

coefficients: A matrix of dimension 'p' by 'G' containing the estimated
          model parameters.

sigmaSqr: A vector of length 'G' containing the mean square error for
          each model: the sum of the squared residuals divided by 'n -
          p'.

coef.var: A matrix of dimension 'p' by 'G' containing the estimated
          variances for the model parameters, for each regression.

_A_u_t_h_o_r(_s):

     Robert Gentleman

_S_e_e _A_l_s_o:

     'getResidPerGene' to extract row-by-row residuals; 'gsealmPerm'
     for code that utilizes 'lmPerGene' for gene-set-enrichment
     analysis (GSEA); and 'CooksDPerGene' for diagnostic functions on
     an object produced by 'lmPerGene'.

_E_x_a_m_p_l_e_s:

     library(Biobase)
     library(GSEAlm)
     data(sample.ExpressionSet)
     layout(1)
     ## Sex as the only predictor; row 2 of the results is the sex effect
     lm1 = lmPerGene(sample.ExpressionSet,~sex)
     qqnorm(lm1$coefficients[2,]/sqrt(lm1$coef.var[2,]),
            main="Sample Dataset: Sex Effect by Gene",
            ylab="Individual Gene t-statistic",xlab="Normal Quantile")
     abline(0,1,col=2)
     ## Type adjusted for sex; row 2 of the results is the type effect
     lm2 = lmPerGene(sample.ExpressionSet,~type+sex)
     qqnorm(lm2$coefficients[2,]/sqrt(lm2$coef.var[2,]),
            main="Sample Dataset: Case vs. Control Effect by Gene, Adjusted for Sex",
            ylab="Individual Gene t-statistic",xlab="Normal Quantile")
     abline(0,1,col=2)

