sam                 package:siggenes                 R Documentation

_S_i_g_n_i_f_i_c_a_n_c_e _A_n_a_l_y_s_i_s _o_f _M_i_c_r_o_a_r_r_a_y_s

_D_e_s_c_r_i_p_t_i_o_n:

     Performs a Significance Analysis of Microarrays (SAM) for a set of
     positive thresholds. Either a one class or a two class SAM
     analysis can be done.

_U_s_a_g_e:

     sam(data, x, y=NULL, paired=FALSE, mat.samp=NULL, B=100,
         balanced=FALSE, na.rm=FALSE, s0=NA, alpha.s0=seq(0,1,.05),
         include.s0=TRUE, factor.s0=1.4826, p0=NA, lambda.p0=1,
         vec.lambda.p0=(0:95)/100, delta.fdr=(1:10)/5, med.fdr=TRUE,
         graphic.fdr=TRUE, thres.fdr=seq(0.5,2,.5), pty.fdr=TRUE,
         help.fdr=TRUE, ngenes=NA, iteration=3,
         initial.delta=c(0.1,seq(.2,2,.2),4), rand=NA)

_A_r_g_u_m_e_n_t_s:

    data: the data set that should be analyzed. Every row of this data
          set must correspond to a gene.

       x: vector of the columns of the data set that correspond to the
          treatment group (in the two class case) or to the biological
          samples that should be analyzed (in the one class case). In
          the paired (two class) case, (x[i], y[i]) form a pair. If,
          e.g., the first n1 columns contain the gene expression values
          of the treatment group, use 'x=1:n1'.

       y: vector of the columns of the data set that correspond to the
          control group (in the two class case). In a one class
          analysis, 'y' is 'NULL' (default). In the paired (two class)
          case, (x[i], y[i]) form an observation pair.

  paired: paired ('TRUE') or unpaired ('FALSE') data. Default is
          'FALSE'.

mat.samp: a permutation matrix. If specified, this matrix will be used,
          even if 'rand' and 'B' are specified.

       B: number of permutations used in the calculation of the null
          density. Default is 'B=100'.

balanced: if 'TRUE', balanced permutations will be used. Default is
          'FALSE'.

   na.rm: if 'FALSE' (default), the expression scores d of genes with
          one or more missing values will be set to 'NA'. If 'TRUE',
          the missing values will be replaced by the genewise mean of
          the non-missing values.

      s0: the fudge factor. If 'NA' (default), the fudge factor s0 will
          be computed automatically.

alpha.s0: the possible values of the fudge factor s0 in terms of
          quantiles of the standard deviations of the genes.

include.s0: if 'TRUE' (default), s0=0 is a possible choice for the
          fudge factor.

factor.s0: constant with which the MAD is multiplied in the computation
          of the fudge factor.

      p0: the probability that a gene is not differentially expressed.
          If not specified (default), it will be computed.

lambda.p0: number between 0 and 1 that is used to estimate p0. If set
          to '1' (default), p0 is selected automatically using a
          natural cubic spline fit.

vec.lambda.p0: vector of values for lambda used in the automatic
          computation of p0.

delta.fdr: a vector of values for the threshold Delta for which the SAM
          analysis is performed.

 med.fdr: if 'TRUE' (default), the median number, otherwise the
          expected number, of falsely called genes will be computed.

graphic.fdr: if 'TRUE' (default), both the SAM plot and the plots of
          Delta vs. FDR and Delta vs. number of significant genes will
          be generated.

thres.fdr: for each value contained in 'thres.fdr', two lines parallel
          to the 45-degree line are generated in the SAM plot.

 pty.fdr: if 'TRUE' (default), a square SAM plot will be generated.

help.fdr: if 'TRUE' (default), help-lines will be drawn in both Delta
          plots.

  ngenes: a number or proportion of genes for which the FDR is
          estimated.

iteration: the number of iterations used in the estimation of the FDR
          for a given number or proportion of genes.

initial.delta: a set of initial guesses for Delta in the computation of
          the FDR for a given number or proportion of genes.

    rand: if specified, the random number generator will be put in a 
          reproducible state.
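
     As an illustration of how 'x', 'y', 'na.rm' and 's0' interact,
     the following is a minimal sketch in base R, not the
     implementation used by 'sam'. It assumes the usual unpaired two
     class SAM score d = (mean_x - mean_y) / (s + s0) of Tusher et al.
     (2001) with a pooled standard error s and a fixed, user-supplied
     s0. The helper names 'impute.rowmean' and 'd.stat.sketch' and the
     toy matrix are hypothetical.

```r
# Toy data: 5 genes (rows), 6 samples (columns); purely illustrative.
set.seed(1)
toy <- matrix(rnorm(30), nrow = 5)
toy[2, 3] <- NA                      # one missing value in gene 2

x <- 1:3                             # columns of the treatment group
y <- 4:6                             # columns of the control group

# What na.rm=TRUE is described to do: replace each missing value by
# the mean of the non-missing values of the same gene (row).
impute.rowmean <- function(mat) {
  row.means <- rowMeans(mat, na.rm = TRUE)
  idx <- which(is.na(mat), arr.ind = TRUE)
  mat[idx] <- row.means[idx[, "row"]]
  mat
}

# Assumed unpaired two class score with a fixed fudge factor s0
# (hypothetical helper, not part of the package).
d.stat.sketch <- function(mat, x, y, s0 = 0) {
  mx <- mat[, x, drop = FALSE]
  my <- mat[, y, drop = FALSE]
  n1 <- ncol(mx)
  n2 <- ncol(my)
  r  <- rowMeans(mx) - rowMeans(my)              # difference in means
  ss <- rowSums((mx - rowMeans(mx))^2) +
        rowSums((my - rowMeans(my))^2)           # pooled sum of squares
  s  <- sqrt((1 / n1 + 1 / n2) * ss / (n1 + n2 - 2))
  r / (s + s0)
}

d.na  <- d.stat.sketch(toy, x, y)                  # na.rm=FALSE behaviour
d.imp <- d.stat.sketch(impute.rowmean(toy), x, y)  # na.rm=TRUE behaviour
```

     In this sketch the score of gene 2 is 'NA' without imputation and
     finite after the genewise mean has been filled in; with 's0 = 0'
     the score reduces to the ordinary two-sample t statistic.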

_V_a_l_u_e:

     a table of statistics (estimate of p0, number of significant
     genes, number of falsely called genes and FDR) for the specified
     set of Deltas, a SAM plot, a Delta vs. FDR plot, and a plot of
     Delta vs. the number of significant genes.
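
     To make the columns of this table concrete, here is a rough
     sketch of how such a table could be computed from observed and
     permuted scores. It follows the general SAM recipe of Tusher et
     al. (2001) under simplifying assumptions: p0 is supplied rather
     than estimated, and the cutoff rule is the textbook one. The
     function 'sam.table.sketch' and its arguments are hypothetical
     and may differ from what 'sam' does internally.

```r
# d:      observed scores, one per gene
# d.perm: matrix of permuted scores (genes in rows, B permutations in columns)
# p0:     assumed probability that a gene is not differentially expressed
# delta:  vector of thresholds Delta
# med:    TRUE = median, FALSE = mean number of falsely called genes
sam.table.sketch <- function(d, d.perm, p0, delta, med = TRUE) {
  d.sort <- sort(d)
  # Expected order statistics: average the sorted permuted scores.
  d.bar <- rowMeans(apply(d.perm, 2, sort))
  dev <- d.sort - d.bar
  out <- sapply(delta, function(D) {
    up  <- d.sort > 0 & dev >= D
    low <- d.sort < 0 & dev <= -D
    cut.up  <- if (any(up))  min(d.sort[up])  else Inf
    cut.low <- if (any(low)) max(d.sort[low]) else -Inf
    n.sig   <- sum(d >= cut.up | d <= cut.low)
    # Number of permuted scores beyond the cutoffs, per permutation.
    false.b <- colSums(d.perm >= cut.up | d.perm <= cut.low)
    n.false <- if (med) median(false.b) else mean(false.b)
    fdr <- if (n.sig > 0) min(1, p0 * n.false / n.sig) else NA
    c(Delta = D, p0 = p0, Significant = n.sig, False = n.false, FDR = fdr)
  })
  t(out)
}

# Simulated input: 95 null genes and 5 genes with large scores.
set.seed(2)
d      <- c(rnorm(95), rnorm(5, mean = 6))
d.perm <- matrix(rnorm(100 * 20), nrow = 100)
tab <- sam.table.sketch(d, d.perm, p0 = 0.95, delta = (1:10) / 5)
```

     Each row of 'tab' corresponds to one Delta. In this sketch the
     number of significant genes can only stay the same or decrease as
     Delta grows, which is the relationship shown in the plot of Delta
     vs. the number of significant genes.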

_W_a_r_n_i_n_g:

     In the one class case, the null distribution will only be computed
     correctly if the expression values are log ratios, so only log
     ratios should be used in this case. (It is not checked whether
     the expression values really are log ratios.)
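
     The reason can be illustrated with a small simulation. The sketch
     below assumes, as is common for one class SAM, that the null
     scores are obtained by randomly flipping the signs of the samples;
     whether 'sam' builds its permutations in exactly this way is an
     assumption here. Sign flipping only yields a sensible null if the
     values are symmetric around zero when nothing is differentially
     expressed, which holds for log ratios but not for raw ratios. The
     names 'one.class.d' and 'flip.once' are hypothetical.

```r
set.seed(3)
# Toy log ratios: 200 genes, 8 samples, no differential expression.
logratio <- matrix(rnorm(200 * 8), nrow = 200)
ratio    <- 2^logratio               # the same data on the raw ratio scale

# One-sample score (s0 = 0 for simplicity): mean / standard error.
one.class.d <- function(mat) {
  rowMeans(mat) / (apply(mat, 1, sd) / sqrt(ncol(mat)))
}

# One sign-flip "permutation": multiply every column by +1 or -1.
flip.once <- function(mat) {
  signs <- sample(c(-1, 1), ncol(mat), replace = TRUE)
  one.class.d(sweep(mat, 2, signs, "*"))
}

B <- 50
null.log   <- replicate(B, flip.once(logratio))   # genes x B matrix
null.ratio <- replicate(B, flip.once(ratio))

# Observed scores on both scales.
d.log   <- one.class.d(logratio)
d.ratio <- one.class.d(ratio)
```

     On the log ratio scale the observed scores and the sign-flipped
     scores are both centred near zero. On the raw ratio scale every
     observed score is positive, because all ratios are positive, so
     comparing them with a sign-flipped null would make genes look
     significant even though none is differentially expressed.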

_N_o_t_e:

     For further analyses with 'sam.plot', the results of 'sam' must be
     assigned to an object.

     SAM was developed by Tusher et al. (2001).

     !!! There is a patent pending for the SAM technology at Stanford
     University. !!!

_A_u_t_h_o_r(_s):

     Holger Schwender holger.schw@gmx.de

_R_e_f_e_r_e_n_c_e_s:

     Tusher, V.G., Tibshirani, R., and Chu, G. (2001). Significance
     analysis of microarrays applied to the ionizing radiation
     response, _PNAS_, 98, 5116-5121.

     Storey, J.D. (2002). A direct approach to the false discovery
     rate, _Journal of the Royal Statistical Society, Series B_, 64,
     479-498.

     Storey, J.D., and Tibshirani, R. (2003). Statistical significance
     for genome-wide experiments, _Technical Report_, Department of
     Statistics, Stanford University.

     Schwender, H. (2003). Assessing the false discovery rate in a
     statistical analysis of gene expression data, Chapter 5, _Diploma
     thesis_, Department of Statistics, University of Dortmund, <URL:
     http://de.geocities.com/holgerschw/thesis.pdf>.

_S_e_e _A_l_s_o:

     'sam.plot', 'sam.wilc', 'sam.lambda'

