Hsets                package:multtest                R Documentation

_F_u_n_c_t_i_o_n_s _f_o_r _g_e_n_e_r_a_t_i_n_g _g_u_e_s_s_e_d _s_e_t_s _o_f _t_r_u_e _n_u_l_l _h_y_p_o_t_h_e_s_e_s _i_n _e_m_p_i_r_i_c_a_l _B_a_y_e_s _r_e_s_a_m_p_l_i_n_g-_b_a_s_e_d _m_u_l_t_i_p_l_e _h_y_p_o_t_h_e_s_i_s _t_e_s_t_i_n_g

_D_e_s_c_r_i_p_t_i_o_n:

     These functions are called internally by the main user-level
     function 'EBMTP'.  They are used for estimating local q-values,
     generating guessed sets of true null hypotheses, and applying
     these results to function closures defining the choice of type I
     error rate (FWER, gFWER, TPPFP, and FDR).

_U_s_a_g_e:

     Hsets(Tn, nullmat, bw, kernel, prior, B, rawp) 

     ABH.h0(rawp) 

     G.VS(V, S = NULL, tp = TRUE, bound)

_A_r_g_u_m_e_n_t_s:

      Tn: The vector of observed test statistics.

 nullmat: The matrix of null test statistics obtained either through
          null transformation of the bootstrap distribution or by
          sampling from an appropriate multivariate normal distribution
          (when 'nulldist='ic'').

      bw: A character string argument to 'density' indicating the
          smoothing bandwidth to be used during kernel density
          estimation. Default is 'nrd'.

  kernel: A character string argument to 'density' specifying the
          smoothing kernel to be used.  Default is 'gaussian'.

   prior: Character string indicating which choice of prior probability
          to use for estimating local q-values (i.e., the posterior
          probabilities of a null hypothesis being true given the value
          of its corresponding test statistic).  Default is
          'conservative', in which case the prior is set to its most
          conservative value of 1, meaning that all hypotheses are
          assumed to belong to the set of true null hypotheses.  Other
          options include 'ABH' for the adaptive Benjamini-Hochberg
          estimator of the number/proportion of true null hypotheses,
          and 'EBLQV' for the empirical Bayes local q-value estimator
          of the number/proportion of true null hypotheses. 
          If 'EBLQV', the estimator of the prior probability is taken
          to be the sum of the estimated local q-values divided by the
          number of tests.  Relaxing the prior may result in more
          rejections, albeit at a cost of type I error control under
          certain conditions.  See references.

       B: The number of bootstrap iterations (i.e. how many resampled
          data sets) or the number of samples from the multivariate
          normal distribution (if 'nulldist='ic''). Can be reduced to
          increase the speed of computation, at a cost to precision.
          Default is 1000.

    rawp: A vector of raw (unadjusted) p-values obtained from a
          bootstrap-based or influence curve null distribution.

       V: A matrix of the numbers of guessed false positives for each
          cut-off, i.e., observed value of a test statistic, within
          each sample in 'B'.

       S: A matrix of the numbers of guessed true positives for each
          cut-off, i.e., observed value of a test statistic, within
          each sample in 'B'.

      tp: Logical indicator which is TRUE if the type I error rate is
          a tail probability error rate and FALSE if it is an expected
          value error rate.

   bound: If a tail probability error rate, the bound to be placed on
          the function of guessed false positives and guessed true
          positives.  For 'fwer', equal to 0; for 'gfwer', equal to
          'k'; and for 'tppfp', equal to 'q'.

_D_e_t_a_i_l_s:

     The most important object to be returned from the function 'Hsets'
     is a matrix of indicators, i.e., Bernoulli realizations of the
     estimated local q-values, taking the value of 1 if the hypothesis
     is guessed as belonging to the set of true null hypotheses and 0
     otherwise (guessed true alternative).  Realizations of these
     probabilities are generated with a call to 'rbinom', meaning that
     this function advances the state of the random number generator
     by 'B' * (the number of hypotheses) draws.  This matrix, with
     rows equal to the number of hypotheses and columns equal to the
     number of (bootstrap or multivariate normal) samples, is used to
     subset the matrix of null test statistics and the vector of
     observed test statistics at each round of (re)sampling into
     samples of statistics guessed as belonging to the sets of true
     null and true alternative hypotheses, respectively.  Using the
     values of the observed test statistics themselves as cut-offs,
     the numbers of guessed false positives and (if applicable)
     guessed true positives can be counted and eventually used to
     estimate the distribution of a type I error rate characterized by
     the closure returned from 'G.VS'. 
     Counting of guessed false positives and guessed true positives is
     performed in C through the function 'VScount'.
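
     The steps described above can be sketched in plain R.  This is a
     minimal illustration, not the package's implementation: the helper
     'count_VS', the toy inputs, the row/column layout of 'V' and 'S'
     (cut-offs in rows, samples in columns), and the use of a strict
     upper-tail comparison are all assumptions made for the example.

```r
set.seed(42)
m <- 5                                    # number of hypotheses
B <- 4                                    # number of resampled data sets
lqv <- c(0.95, 0.90, 0.10, 0.80, 0.05)    # hypothetical local q-values
Tn <- c(0.3, -0.5, 3.1, 1.0, 2.7)         # hypothetical observed statistics
nullmat <- matrix(rnorm(m * B), nrow = m, ncol = B)

# One Bernoulli draw per hypothesis per sample; 'prob' is recycled down
# each column, so row i always uses lqv[i].  1 = guessed true null.
H0 <- matrix(rbinom(m * B, size = 1, prob = lqv), nrow = m, ncol = B)

# Hypothetical helper: for every cut-off Tn[j] and every sample b, count
# guessed false positives (null statistics of guessed nulls above the
# cut-off) and guessed true positives (observed statistics of guessed
# alternatives above the cut-off).
count_VS <- function(Tn, nullmat, H0) {
  m <- length(Tn)
  B <- ncol(nullmat)
  V <- S <- matrix(0L, nrow = m, ncol = B)
  for (b in seq_len(B)) {
    is_null <- H0[, b] == 1
    for (j in seq_len(m)) {
      V[j, b] <- sum(nullmat[is_null, b] > Tn[j])
      S[j, b] <- sum(Tn[!is_null] > Tn[j])
    }
  }
  list(V = V, S = S)
}

vs <- count_VS(Tn, nullmat, H0)
```

     Each call to 'rbinom' consumes random numbers, which is why
     results depend on the seed set before the guessed sets are drawn.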

_V_a_l_u_e:

     For the function 'Hsets', a list with the following elements: 

Hsets.mat: A matrix of numeric indicators with rows equal to the number
          of tests (hypotheses, typically 'nrow(X)') and columns the
          number of samples of null test statistics, 'B'.  Values of
          one indicate hypotheses guessed as belonging to the set of
          true null hypotheses based on the value of their
          corresponding test statistic.  Values of zero correspond to
          hypotheses guessed as belonging to the set of true
          alternative hypotheses.

  EB.h0M: The estimated proportion of true null hypotheses as
          determined by nonparametric density estimation.  This value
          is the sum of the estimated local q-values divided by the
          total number of tests (hypotheses).

   prior: The value of the prior applied to the local q-value function.
          If 'conservative', the prior is set to one.  Otherwise, the
          prior is the value obtained from the estimator of the
          adaptive Benjamini-Hochberg procedure (if 'prior' is 'ABH')
          or from density estimation (if 'prior' is 'EBLQV').

  pn.out: The vector of estimated local q-values.  This vector is
          returned in the 'lqv' slot of objects of class 'EBMTP'.
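
     As a rough illustration of how the elements 'pn.out', 'EB.h0M'
     and 'prior' relate, the sketch below estimates local q-values as
     prior * f0(t) / f(t), truncated at one, using 'density' for both
     the null density f0 and the marginal density f.  The helper
     'local_qvalues', the simulated inputs, and the choice to pool all
     null statistics into a single null density are assumptions of
     this example and may differ from what 'Hsets' does internally.

```r
set.seed(1)
# Simulated inputs: 150 null and 50 shifted observed statistics,
# plus a 200 x 100 matrix of null statistics.
Tn <- c(rnorm(150), rnorm(50, mean = 3))
nullmat <- matrix(rnorm(200 * 100), nrow = 200)

# Hypothetical helper: kernel density estimates of the null density f0
# and the marginal density f, both evaluated at the observed statistics.
local_qvalues <- function(Tn, nullmat, prior = 1,
                          bw = "nrd", kernel = "gaussian") {
  d0 <- density(as.vector(nullmat), bw = bw, kernel = kernel)
  d  <- density(Tn, bw = bw, kernel = kernel)
  f0 <- approx(d0$x, d0$y, xout = Tn, rule = 2)$y
  f  <- approx(d$x, d$y, xout = Tn, rule = 2)$y
  pmin(1, prior * f0 / f)          # posterior null probabilities, capped at 1
}

lqv <- local_qvalues(Tn, nullmat)            # conservative prior of 1
eb_h0 <- mean(lqv)                           # sum of local q-values / number of tests
lqv_eb <- local_qvalues(Tn, nullmat, prior = eb_h0)
```

     Because the relaxed prior is at most one, the local q-values in
     'lqv_eb' are never larger than those in 'lqv', which is the sense
     in which relaxing the prior can lead to more rejections.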


     For the function 'ABH.h0', the estimated number of true null
     hypotheses using the estimator from the linear step-up adaptive
     Benjamini-Hochberg procedure. 
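
     For orientation, one common statement of the estimator used in
     the adaptive linear step-up procedure (Benjamini and Hochberg,
     2000, as restated in Benjamini et al., 2006) can be sketched as
     follows.  The function name 'lowest_slope_m0', the rounding up to
     the next integer, and the fallback to the total number of tests
     when no stopping point is found are assumptions of this sketch;
     'ABH.h0' may handle these details differently.

```r
# Hypothetical helper: estimate the number of true null hypotheses from
# raw p-values by scanning the sorted p-values for the first increase
# in m0k = (m + 1 - k) / (1 - p_(k)), starting at k = 2.
lowest_slope_m0 <- function(rawp) {
  p <- sort(rawp)
  m <- length(p)
  m0k <- (m + 1 - seq_len(m)) / (1 - p)
  up <- which(diff(m0k) > 0) + 1       # indices k >= 2 with m0k[k] > m0k[k - 1]
  if (length(up) == 0) return(m)       # assumed fallback: all hypotheses null
  min(ceiling(m0k[up[1]]), m)
}

p <- c(0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.4, 0.5, 0.9, 0.95)
m0_hat <- lowest_slope_m0(p)
```

     Dividing the result by the number of tests gives the
     corresponding estimate of the proportion of true null hypotheses.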

     For the function 'G.VS', a closure which accepts as arguments the
     matrices of guessed false positives and guessed true positives
     (if applicable) and applies the appropriate function defining the
     desired type I error rate.
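
     The idea of a closure that encodes the error rate can be
     illustrated with the following sketch.  It is not the definition
     of 'G.VS': the factory 'make_rate', its 'use_S' argument, the
     layout of 'V' and 'S' (cut-offs in rows, samples in columns), and
     the convention that the proportion is zero when nothing is
     rejected are assumptions made for the example.

```r
# Hypothetical factory returning a function of (V, S) that estimates,
# for each cut-off (row), either a tail probability P(g(V, S) > bound)
# or an expected value E[g(V, S)] across samples (columns).
make_rate <- function(tp = TRUE, bound = 0, use_S = FALSE) {
  g <- if (use_S) {
    # proportion of false positives among rejections, 0 if no rejections
    function(V, S) ifelse(V + S > 0, V / (V + S), 0)
  } else {
    function(V, S = NULL) V
  }
  if (tp) {
    function(V, S = NULL) rowMeans(g(V, S) > bound)
  } else {
    function(V, S = NULL) rowMeans(g(V, S))
  }
}

V <- matrix(c(0, 1, 2,  0, 0, 1), nrow = 3)   # guessed false positives
S <- matrix(c(1, 1, 2,  1, 3, 3), nrow = 3)   # guessed true positives

fwer  <- make_rate(tp = TRUE, bound = 0)                    # P(V > 0)
tppfp <- make_rate(tp = TRUE, bound = 0.3, use_S = TRUE)    # P(V/(V+S) > q)
fdr   <- make_rate(tp = FALSE, use_S = TRUE)                # E[V/(V+S)]

fwer_hat  <- fwer(V)
tppfp_hat <- tppfp(V, S)
fdr_hat   <- fdr(V, S)
```

     A generalized family-wise error rate would follow the same
     pattern as 'fwer' with 'bound' set to the allowed number of false
     positives.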

_A_u_t_h_o_r(_s):

     Houston N. Gilbert

_R_e_f_e_r_e_n_c_e_s:

     H.N. Gilbert, K.S. Pollard, M.J. van der Laan, and S. Dudoit
     (2009). Resampling-based multiple hypothesis testing with
     applications to genomics: New developments in R/Bioconductor 
     package multtest. _Journal of Statistical Software_ (submitted).
     Temporary URL: <URL:
     http://www.stat.berkeley.edu/~houston/JSSNullDistEBMTP.pdf>.

     Y. Benjamini and Y. Hochberg (2000). On the adaptive control of
     the false discovery rate in multiple testing with independent
     statistics. _J. Educ. Behav. Statist_. Vol. 25: 60-83.

     Y. Benjamini, A.M. Krieger and D. Yekutieli (2006). Adaptive
     linear step-up procedures that control the false discovery rate.
     _Biometrika_.  Vol. 93: 491-507.

     M.J. van der Laan, M.D. Birkner, and A.E. Hubbard (2005). 
     Empirical Bayes and Resampling Based Multiple Testing Procedure
     Controlling the Tail Probability of the Proportion of False
     Positives. Statistical Applications in Genetics and Molecular
     Biology, 4(1). <URL:
     http://www.bepress.com/sagmb/vol4/iss1/art29/> 

     S. Dudoit and M.J. van der Laan.  Multiple Testing Procedures and
     Applications to Genomics.  Springer Series in Statistics.
     Springer, New York, 2008. 

     S. Dudoit, H.N. Gilbert, and M.J. van der Laan (2008). 
     Resampling-based empirical Bayes multiple testing procedures for
     controlling generalized tail probability and expected value error
     rates: Focus on the false discovery rate and simulation study.
     _Biometrical Journal_, 50(5):716-44. <URL:
     http://www.stat.berkeley.edu/~houston/BJMCPSupp/BJMCPSupp.html>. 

     H.N. Gilbert, M.J. van der Laan, and S. Dudoit. Joint multiple
     testing procedures for graphical model selection with
     applications to biological networks. Technical report, U.C.
     Berkeley Division of Biostatistics Working Paper Series, April
     2009. <URL: http://www.bepress.com/ucbbiostat/paper245>. 

_S_e_e _A_l_s_o:

     'EBMTP', 'EBMTP-class', 'EBMTP-methods'

_E_x_a_m_p_l_e_s:

     set.seed(99)
     # 9 hypotheses (rows) measured on 10 samples (columns)
     data<-matrix(rnorm(90),nrow=9)
     # two groups of 5 samples each
     group<-c(rep(1,5),rep(0,5))

     #EB fwer control with centered and scaled bootstrap null distribution 
     #(B=100 for speed)
     eb.m1<-EBMTP(X=data,Y=group,alternative="less",B=100,method="common.cutoff")
     print(eb.m1)
     summary(eb.m1)
     par(mfrow=c(2,2))
     plot(eb.m1,top=9)

     abh <- ABH.h0(eb.m1@rawp)
     abh

     eb.m2 <- EBupdate(eb.m1,prior="ABH")
     eb.m2@prior

