lmFit                 package:limma                 R Documentation

_L_i_n_e_a_r _M_o_d_e_l _f_o_r _S_e_r_i_e_s _o_f _A_r_r_a_y_s

_D_e_s_c_r_i_p_t_i_o_n:

     Fit linear model for each gene given a series of arrays

_U_s_a_g_e:

     lmFit(object,design=NULL,ndups=1,spacing=1,block=NULL,correlation,weights=NULL,method="ls",...) 

_A_r_g_u_m_e_n_t_s:

  object: object of class 'numeric', 'matrix', 'MAList', 'marrayNorm',
          'exprSet', 'ExpressionSet' or 'PLMset' containing log-ratios
          or log-values of expression for a series of microarrays

  design: the design matrix of the microarray experiment, with rows
          corresponding to arrays and columns to coefficients to be
          estimated.  Defaults to the unit vector meaning that the
          arrays are treated as replicates.

   ndups: positive integer giving the number of times each gene is
          printed on an array

 spacing: positive integer giving the spacing between duplicate spots,
          'spacing=1' for consecutive spots

   block: vector or factor specifying a blocking variable on the
          arrays. Has length equal to the number of arrays. Must be
          'NULL' if 'ndups>2'.

correlation: the inter-duplicate or inter-technical replicate
          correlation

 weights: optional numeric matrix containing weights for each spot

  method: character string, '"ls"' for least squares or '"robust"' for
          robust regression

     ...: other optional arguments to be passed to 'lm.series',
          'gls.series' or 'mrlm'

_D_e_t_a_i_l_s:

     This function fits multiple linear models. It accepts data from a
     experiment involving a series of microarrays with the same set of
     probes. A linear model is fitted to the expression data for each
     probe. The expression data should be log-ratios for two-color
     array platforms or log-expression values for one-channel
     platforms. (To fit linear models to the individual channels of
     two-color array data, see 'lmscFit'.) The coefficients of the
     fitted models describe the differences between the RNA sources
     hybridized to the arrays. The probe-wise fitted model results are
     stored in a compact form suitable for further processing by other
     functions in the limma package.

     The function allows for missing values and accepts quantitative
     weights through the 'weights' argument. It also supports two
     different correlation structures. If 'block' is not 'NULL' then
     different arrays are assumed to be correlated. If 'block' is
     'NULL' and 'ndups' is greater than one then replicate spots on the
     same array are assumed to be correlated.   It is not possible at
     this time to fit models with both a block structure and a
     duplicate-spot correlation structure simultaneously.

     If 'object' is a matrix then it should contain log-ratios or
     log-expression data with rows corresponding to probes and columns
     to arrays. (A numeric vector is treated the same as a matrix with
     one column.) For objects of other classes, a matrix of expression
     values is taken from the appropriate component or slot of the
     object. If 'object' is of class 'MAList' or 'marrayNorm', then the
     matrix of log-ratios (M-values) is extracted. If 'object' is of
     class 'exprSet' then the 'exprs' slot is extracted. (This may
     contain log-expression or log-ratio values, depending on the
     platform.) If 'object' is of class 'PLMset' then the matrix of
     chip coefficients 'chip.coefs' is extracted.

     The arguments 'design', 'ndups', 'spacing' and 'weights' will be
     extracted from the data 'object' if available and do not normally
     need to set explicitly in the call. On the other hand, if any of
     these are set in the function call then they will over-ride the
     slots or components in the data 'object'. If 'object' is an
     'PLMset', then weights are computed as
     '1/pmax(object@se.chip.coefs, 1e-05)^2'. If 'object' is an
     'exprSet' object, then weights are not computed.

     If the argument 'block' is used, then it is assumed that
     'ndups=1'.

     The 'correlation' argument has a default value of '0.75', but in
     normal use this default value should not be relied on and the
     correlation value should be estimated using the function
     'duplicateCorrelation'. The default value is likely to be too high
     in particular if used with the 'block' argument.

     The actual linear model computations are done by passing the data
     to one the lower-level functions 'lm.series', 'gls.series' or
     'mrlm'. The function 'mrlm' is used if 'method="robust"'. If
     'method="ls"', then 'gls.series' is used if a correlation
     structure has been specified, i.e., if 'ndups>1' or 'block' is
     non-null and 'correlation' is different from zero. If
     'method="ls"' and there is no correlation structure, 'lm.series'
     is used.

_V_a_l_u_e:

     Object of class 'MArrayLM'

_A_u_t_h_o_r(_s):

     Gordon Smyth

_S_e_e _A_l_s_o:

     An overview of linear model functions in limma is given by
     06.LinearModels.

_E_x_a_m_p_l_e_s:

     # Simulate gene expression data for 100 probes and 6 microarrays
     # Microarray are in two groups
     # First two probes are differentially expressed in second group
     # Std deviations vary between genes with prior df=4
     sd <- 0.3*sqrt(4/rchisq(100,df=4))
     y <- matrix(rnorm(100*6,sd=sd),100,6)
     rownames(y) <- paste("Gene",1:100)
     y[1:2,4:6] <- y[1:2,4:6] + 2
     design <- cbind(Grp1=1,Grp2vs1=c(0,0,0,1,1,1))
     options(digit=3)

     # Ordinary fit
     fit <- lmFit(y,design)
     fit <- eBayes(fit)
     fit
     as.data.frame(fit[1:10,2])

     # Various ways of summarising or plotting the results
     topTable(fit,coef=2)
     qqt(fit$t[,2],df=fit$df.residual+fit$df.prior)
     abline(0,1)
     volcanoplot(fit,coef=2,highlight=2)

     # Various ways of writing results to file
     ## Not run: write.fit(fit,file="exampleresults.txt")
     ## Not run: write.table(fit,file="exampleresults2.txt")

     # Robust fit
     # (There may be some warning messages)
     fit2 <- lmFit(y,design,method="robust")

     # Fit with correlated arrays
     # Suppose each pair of arrays is a block
     block <- c(1,1,2,2,3,3)
     dupcor <- duplicateCorrelation(y,design,block=block)
     dupcor$consensus.correlation
     fit3 <- lmFit(y,design,block=block,correlation=dupcor$consensus)

     # Fit with duplicate probes
     # Suppose two side-by-side duplicates of each gene
     rownames(y) <- paste("Gene",rep(1:50,each=2))
     dupcor <- duplicateCorrelation(y,design,ndups=2)
     dupcor$consensus.correlation
     fit4 <- lmFit(y,design,ndups=2,correlation=dupcor$consensus)
     fit4 <- eBayes(fit3)
     dim(fit4)
     topTable(fit4,coef=2)

