EOC                  package:OCplus                  R Documentation

_E_s_t_i_m_a_t_e_d _o_r _e_m_p_i_r_i_c_a_l _F_D_R, _s_e_n_s_i_t_i_v_i_t_y, _e_t_c _a_s _a _f_u_n_c_t_i_o_n _o_f _c_u_t_o_f_f _l_e_v_e_l

_D_e_s_c_r_i_p_t_i_o_n:

     'EOC' computes and optionally plots the estimated operating
     characteristics for data from a microarray experiment with two
     groups of subjects. The false discovery rate (FDR) is estimated
     based on random permutations of the data and plotted against the
     cutoff level on the t-statistic; a curve for the classical
     sensitivity can be added. Different curves for different
     proportions of non-differentially expressed genes can be compared
     in the same plot, and the sample size per group can be varied
     between plots.

     'FDRp' is the function that does the underlying hard work and
     requires package 'multtest'.

_U_s_a_g_e:

     EOC(xdat, grp, p0, paired = FALSE, nperm = 25, seed = NULL, plot = TRUE, ...)

     FDRp(xdat, grp, test = "t.equalvar", p0, nperm, seed)

_A_r_g_u_m_e_n_t_s:

    xdat: the matrix of expression values, with genes as rows and
          samples as columns

     grp: a grouping variable giving the class membership of each
          sample, i.e. each column in 'xdat'; for 'EOC', this can be
          any type of variable, as long as it has exactly two distinct
          values, whereas 'FDRp' expects to see only 0s and 1s, see
          Details.

      p0: if supplied, an estimate for the proportion of
          non-differentially expressed genes; if not supplied, the
          routine will estimate it, see Details.

  paired: logical value indicating whether this is an independent-sample
          situation (default) or a paired-sample situation. Note that
          paired samples need to follow each other in the data matrix
          (as in 010101...).

   nperm: number of permutations for establishing the null distribution
          of the t-statistic

    test: the type of test to use, see 'mt.teststat'; when called from
          'EOC', this is always the default.

    seed: the random seed from which the permutations are started

    plot: logical value indicating whether to do the plot

     ...: graphical parameters, passed to 'plot.FDR.result'
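
     The column arrangement required for 'paired = TRUE' can be
     produced with base R alone. The following is a minimal sketch,
     assuming hypothetical toy data in which the samples are stored in
     two blocks and the i-th sample of the first block is paired with
     the i-th sample of the second block; the labels "before" and
     "after" and all object names are illustrative only.

```r
# Hypothetical toy data: 2 genes, 12 samples stored as two blocks of 6
xdat <- matrix(1:24, nrow = 2)
grp  <- rep(c("before", "after"), each = 6)

# Column indices of each block; the i-th entries are assumed to be a pair
first  <- which(grp == "before")
second <- which(grp == "after")

# Interleave the two blocks so that paired samples follow each other
ord <- as.vector(rbind(first, second))

xdat_paired <- xdat[, ord]
grp_paired  <- grp[ord]
```

     The reordered objects 'xdat_paired' and 'grp_paired' then have
     the alternating layout described for the 'paired' argument.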

_D_e_t_a_i_l_s:

     'EOC' is the empirical counterpart of the function 'TOC'. It
     estimates the FDR and sensitivity for a given data set of
     expression values measured on subjects in two groups. The FDR is
     estimated locally based on the empirical Bayes approach outlined
     by Efron et al., see References. 'FDRp' implements the details of
     this method; this requires among other things the permutation
     distribution of the t-statistic, which is calculated via a call to
     function 'mt.teststat' of package 'multtest'. This is why both
     functions fail if the expression data contain missing values.
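
     Because missing values are not handled and 'FDRp' expects a 0/1
     grouping vector, some preprocessing may be needed before a direct
     call. The following is a minimal sketch in base R, using
     hypothetical toy data; dropping incomplete genes is only one
     possible strategy, and which label is coded as 0 (here decided by
     the factor level order) is a choice of this sketch, not a
     requirement stated on this page.

```r
set.seed(1)
# Hypothetical toy data: 4 genes, 6 samples, one missing value
xdat <- matrix(rnorm(24), nrow = 4)
xdat[2, 3] <- NA
grp <- rep(c("Sample A", "Sample B"), each = 3)

# Keep only genes (rows) without any missing value
keep <- complete.cases(xdat)
xdat_clean <- xdat[keep, , drop = FALSE]

# Recode a grouping variable with exactly two distinct values as 0/1
stopifnot(length(unique(grp)) == 2)
grp01 <- as.integer(factor(grp)) - 1L
```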

     Note that 'p0' is by default estimated from the data, as
     originally suggested by Efron et al., so as to make the ratio
     between the densities of the observed distribution of
     t-statistics and the permutation distribution smaller than 1;
     alternatively, the user can supply their own estimate of the
     proportion of non-differentially expressed genes in the data.
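
     One reading of the paragraph above is that 'p0' is chosen as the
     largest value for which p0 * f0(t) / f(t) does not exceed 1,
     where f is the density of the observed statistics and f0 the
     density of the permutation statistics. The following sketch
     illustrates that idea with histogram densities on simulated
     statistics; it is not the code used by 'FDRp', and the bin count,
     the minimum count of 10 per bin, and all object names are
     assumptions made for illustration.

```r
set.seed(1)
# Hypothetical observed statistics: 900 null genes, 100 shifted genes
t_obs  <- c(rt(900, df = 48), rt(100, df = 48) + 4)
# Hypothetical permutation (null) statistics
t_null <- rt(25000, df = 48)

# Common bins covering both samples, slightly padded at the ends
rng    <- range(c(t_obs, t_null)) + c(-0.01, 0.01)
breaks <- seq(rng[1], rng[2], length.out = 41)
h_obs  <- hist(t_obs,  breaks = breaks, plot = FALSE)
h_null <- hist(t_null, breaks = breaks, plot = FALSE)

# Compare densities only in bins where both histograms are well supported
ok    <- h_obs$counts >= 10 & h_null$counts >= 10
ratio <- h_obs$density[ok] / h_null$density[ok]

# Largest p0 with p0 * f0 / f <= 1 in every retained bin, capped at 1
p0_hat <- min(1, ratio)
p0_hat
```

     Histogram-based ratios are noisy in sparse bins, which is why the
     sketch restricts the comparison to bins with enough observations.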

     Note also that 'FDRp' keeps all permutations in memory during
     computations. For a large number of genes, this will limit the
     number of possible permutations.
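
     A back-of-the-envelope calculation can help to judge whether a
     given number of permutations is feasible. The sketch below
     assumes that one double-precision value (8 bytes) is stored per
     gene and permutation; the actual memory use of 'FDRp' may be
     higher, and the helper 'perm_mem_mb' is hypothetical.

```r
# Approximate size in megabytes of a genes-by-permutations matrix of doubles
# (assumption: 8 bytes per stored statistic, no overhead counted)
perm_mem_mb <- function(ngenes, nperm) {
  ngenes * nperm * 8 / 2^20
}

perm_mem_mb(1000, 25)     # dimensions of the simulated example on this page
perm_mem_mb(50000, 1000)  # a larger, hypothetical setting
```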

_V_a_l_u_e:

     For 'EOC', an object of class 'FDR.result', which inherits from
     class 'data.frame'. The three columns list for each gene its
     t-statistic, the estimated FDR (two-sided), and the estimated
     sensitivity. Additionally, the object carries an attribute
     'param', which is a list with four entries: 'p0', the assumed
     proportion of non-differentially expressed genes used in
     calculating the FDR; 'p0.est', a logical value indicating whether
     'p0' was estimated or user-supplied; 'statistic', which indicates
     how the t-statistic was computed, i.e. how its sign should be
     interpreted in terms of relative over- or under-expression; and
     'paired', a logical flag indicating whether a paired t-statistic
     was used.

     'FDRp' returns a list with essentially the same elements, plus
     additionally the values of the observed and permuted distribution
     of the t-statistics for each gene.

_N_o_t_e:

     Both the curve labels and the legend may be squashed if the
     plotting device is too small. Increasing the size of the device
     and re-plotting should improve readability.

_A_u_t_h_o_r(_s):

     Y. Pawitan and A. Ploner

_R_e_f_e_r_e_n_c_e_s:

     Pawitan Y, Michiels S, Koscielny S, Gusnanto A, Ploner A (2005)
     False Discovery Rate, Sensitivity and Sample Size for Microarray
     Studies. _Bioinformatics_, 21, 3017-3024.

     Efron B, Tibshirani R, Storey JD, Tusher V (2001) Empirical Bayes
     Analysis of a Microarray Experiment. _JASA_, 96(456), 1151-1160.

_S_e_e _A_l_s_o:

     'plot.FDR.result', 'OCshow', 'mt.teststat'

_E_x_a_m_p_l_e_s:

     # We simulate a small example with 5 percent regulated genes and
     # a rather large effect size
     set.seed(2003)
     xdat = matrix(rnorm(50000), nrow=1000)
     xdat[1:25, 1:25] = xdat[1:25, 1:25] - 2
     xdat[26:50, 1:25] = xdat[26:50, 1:25] + 2
     grp = rep(c("Sample A","Sample B"), c(25,25))

     # The default, with legend
     ret = EOC(xdat, grp, legend=TRUE)
     # Look at the results: yes
     ret[1:10,]
     which(ret$FDR<0.05)
     # Extra information
     attr(ret,"param")

     # Run the same data with different permutations: fairly stable, but with
     # different p0
     ret = EOC(xdat, grp, seed=2000)
     which(ret$FDR<0.07)

     # Misspecify the p0: not too bad here
     ret = EOC(xdat, grp, p0=0.99)
     which(ret$FDR<0.01)

     # We simulate data in a paired setting
     # Note the arrangement of the columns
     set.seed(2004)
     xdat = matrix(rnorm(50000), nrow=1000)
     ndx1 = seq(1,50, by=2)
     xdat[1:25, ndx1] = xdat[1:25, ndx1] - 2
     xdat[26:50, ndx1] = xdat[26:50, ndx1] + 2
     grp = rep(c("Sample A","Sample B"), 25)

     ret = EOC(xdat, grp, paired=TRUE)
     which(ret$FDR<0.05)

