safe                  package:safe                  R Documentation

_S_i_g_n_i_f_i_c_a_n_c_e _A_n_a_l_y_s_i_s _o_f _F_u_n_c_t_i_o_n _a_n_d _E_x_p_r_e_s_s_i_o_n

_D_e_s_c_r_i_p_t_i_o_n:

     Performs a significance analysis of function and expression (SAFE)
     for a given gene expression experiment and a given set of
     functional categories. SAFE is a two-stage permutation-based
     method that can be applied to a 2-sample, multi-class, or simple
     linear regression. Other experimental designs can also be
     accommodated through user-defined functions.

_U_s_a_g_e:

     safe(X.mat, y.vec, C.mat, Pi.mat = 1000, local = "default", 
          global = "Wilcoxon", error = "none", write = NA, 
          alpha = NA, method = "permutation", args.local = NULL, 
          args.global = NULL)

_A_r_g_u_m_e_n_t_s:

   X.mat: A matrix or data.frame of expression data; each row
          corresponds to a gene and each column to a sample. Data can
          also be given as the Bioconductor class 'exprSet'. Data
          should be properly normalized and must not contain missing
          values.

   y.vec: A numeric, integer or character vector of length
          'ncol(X.mat)' containing the response of interest. If 'X.mat'
          is an 'exprSet', 'y.vec' can also be the name or column
          number of a covariate in the 'phenoData' slot. For examples
          of the acceptable forms 'y.vec' can take, see the vignette. 

   C.mat: A matrix or data.frame containing the gene category
          assignments. Each column represents a category and should be
          named accordingly. For each column, values of 1 ('TRUE') and
          0 ('FALSE') indicate whether the genes in the corresponding
          rows of 'X.mat' are contained in the category. 
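
          As an illustration of this layout, the following minimal
          sketch builds such an indicator matrix in base R from a list
          of gene sets. The gene names and the 'sets' list are
          hypothetical; the package also lists 'getCmatrix' under See
          Also, which is not used here.

```r
## Hypothetical gene identifiers and category memberships (illustration only)
genes <- paste("Gene", 1:6)
sets <- list(CatA = c("Gene 1", "Gene 2", "Gene 3"),
             CatB = c("Gene 3", "Gene 6"))

## One column per category; 1 if the gene in that row belongs to it, else 0
C.mat <- sapply(sets, function(s) as.numeric(genes %in% s))
rownames(C.mat) <- genes   # rows must follow the row order of X.mat
C.mat
colSums(C.mat)             # category sizes
```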

  Pi.mat: A matrix or data.frame containing the permutations, or an
          integer. See 'getPImatrix' for the acceptable form of a
          matrix or data.frame. If 'Pi.mat' is an integer, then 'safe'
          will automatically generate that many random permutations of
          'X.mat'. 

   local: Specifies the gene-specific statistic from the following
          options: "t.Student", "t.Welch" and "t.SAM" for 2-sample
          designs, "f.ANOVA" for 1-way ANOVAs, and "t.LM" for simple
          linear regressions. "default" will choose between "t.Student"
          and "f.ANOVA", based on the form of 'y.vec'. User-defined
          local statistics can also be used; details are provided in
          the vignette. 
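
          To make the idea of a gene-specific statistic concrete, the
          sketch below computes a pooled-variance (Student) t statistic
          for every row of an expression matrix in base R.
          'student.t.rows' is a hypothetical helper written for this
          illustration; it is not the package's implementation, and
          the sign convention used by 'safe' may differ.

```r
## Pooled-variance (Student) t statistic for every row of X, computed at once.
## 'student.t.rows' is a hypothetical helper, not a function of this package.
student.t.rows <- function(X, y) {
  g <- unique(y)
  stopifnot(length(g) == 2)
  a <- X[, y == g[1], drop = FALSE]
  b <- X[, y == g[2], drop = FALSE]
  na <- ncol(a); nb <- ncol(b)
  ssa <- rowSums((a - rowMeans(a))^2)   # within-group sums of squares
  ssb <- rowSums((b - rowMeans(b))^2)
  sp2 <- (ssa + ssb) / (na + nb - 2)    # pooled variance per gene
  (rowMeans(a) - rowMeans(b)) / sqrt(sp2 * (1 / na + 1 / nb))
}

set.seed(1)
X <- matrix(rnorm(5 * 8), 5, 8)
y <- rep(c("Trt", "Ctr"), each = 4)
t.loc <- student.t.rows(X, y)

## Cross-check the first gene against t.test()
t.ref <- t.test(X[1, y == "Trt"], X[1, y == "Ctr"], var.equal = TRUE)$statistic
all.equal(unname(t.ref), t.loc[1])
```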

  global: Specifies the global statistic for a gene category. By
          default, the Wilcoxon rank sum is used with global =
          "Wilcoxon". Alternatively, a Kolmogorov-Smirnov
          ("Kolmogorov") or hypergeometric ("genelist") statistic is
          available.
          User-defined global statistics can also be implemented. 
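
          A rank-sum global statistic can be sketched in base R as
          follows: rank the local statistics across all genes, then
          sum the ranks within each category. The local statistics and
          category matrix below are made up for illustration, and
          whether 'safe' ranks raw or transformed local statistics is
          not assumed here.

```r
## Hypothetical local statistics for 6 genes and a 6 x 2 category matrix
local.stat <- c(2.1, -0.3, 1.7, 0.2, -1.1, 0.9)
C.mat <- cbind(CatA = c(1, 1, 1, 0, 0, 0),
               CatB = c(0, 0, 1, 0, 0, 1))

## Rank the local statistics across all genes ...
r <- rank(local.stat)
## ... then sum the ranks of the member genes, one sum per category
W <- drop(crossprod(C.mat, r))
W
```

          Writing the rank sums as a single matrix product handles all
          categories at once, which matters when the computation is
          repeated for every permutation.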

   error: Specifies the method for computing error rate estimates.
          "FDR.YB" computes the Yekutieli-Benjamini FDR estimate,
          "FWER.WY" computes the Westfall-Young FWER estimate, and
          "none" will not compute any error rates. 

   write: An optional file path to which the permuted global
          statistics are written, if the user needs them. 

   alpha: Allows the user to define the criterion for significance. By
          default, alpha will be 0.05 for nominal p-values ('error' =
          "none"), and 0.1 otherwise. 

  method: Currently, 'safe' only assesses significance via
          "permutation". Future versions will allow other resampling
          schemes.

args.local: An optional list to be passed to user-defined local
          statistics that require additional arguments. For default
          statistics, 'args.local = NULL'. 

args.global: An optional list to be passed to global statistics that
          require additional arguments. By default 'args.global = NULL'. 

_D_e_t_a_i_l_s:

     'safe' utilizes a general framework for testing differential
     expression across gene categories that allows it to be used in
     various experimental designs. Through structured permutations of
     the data, 'safe' accounts for the unknown correlation among genes,
     and enables permutation-based estimation of error rates when
     testing multiple categories.  'safe' also provides statistics and
     empirical p-values for the gene-specific differential expression.
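
     The two-stage idea can be sketched in base R under simplifying
     assumptions. The code below is not the package's implementation:
     'local.fun' and 'global.fun' are hypothetical helpers, the local
     statistic is a Welch-type t, the global statistic is a rank sum,
     and the data are simulated. It shows why permuting whole arrays
     (sample labels) keeps the correlation among genes intact in every
     permuted data set.

```r
set.seed(42)
n.gene <- 200; n <- 12
X <- matrix(rnorm(n.gene * n), n.gene, n)
y <- rep(c(1, 0), each = n / 2)
X[1:20, y == 1] <- X[1:20, y == 1] + 2   # first 20 genes shifted in group 1
C.mat <- cbind(Alt  = rep(c(1, 0), c(20, 180)),
               Null = rep(c(0, 1, 0), c(100, 20, 80)))

## Stage 1: a local statistic per gene (Welch-type t, assumed for illustration)
local.fun <- function(X, y) {
  a <- X[, y == 1, drop = FALSE]; b <- X[, y == 0, drop = FALSE]
  (rowMeans(a) - rowMeans(b)) /
    sqrt(apply(a, 1, var) / ncol(a) + apply(b, 1, var) / ncol(b))
}
## Stage 2: a global statistic per category (rank sum of the local statistics)
global.fun <- function(loc, C) drop(crossprod(C, rank(loc)))

obs <- global.fun(local.fun(X, y), C.mat)

## Permute the sample labels only; each array (column) is left untouched
n.perm <- 200
perm <- replicate(n.perm, global.fun(local.fun(X, sample(y)), C.mat))

## One-sided empirical p-values, counting the observed statistic itself
p.emp <- (rowSums(perm >= obs) + 1) / (n.perm + 1)
p.emp
```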

_V_a_l_u_e:

     The function returns an object of class 'SAFE'. See help for
     'SAFE-class' for more details.

_A_u_t_h_o_r(_s):

     William T. Barry: wbarry@bios.unc.edu

_R_e_f_e_r_e_n_c_e_s:

     W. T. Barry, A. B. Nobel and F. A. Wright, 2004, _Significance
     analysis of functional categories in gene expression studies: a
     structured permutation approach_, _Bioinformatics_, in press. 

     See also the vignette included with this package.

_S_e_e _A_l_s_o:

     'safeplot', 'getCmatrix', 'getPImatrix'.

_E_x_a_m_p_l_e_s:

     ## Consider a dataset with 1000 genes and 20 arrays in a 2-sample design.
     ## The first 100 genes will be differentially expressed at varying levels.

     g.alt <- 100
     g.null <- 900
     n <- 20

     data <- matrix(rnorm(n * (g.alt + g.null)), g.alt + g.null, n)
     data[1:g.alt, 1:(n/2)] <- data[1:g.alt, 1:(n/2)] +
                               seq(2, 2/g.alt, length.out = g.alt)
     dimnames(data) <- list(c(paste("Alt", 1:g.alt),
                              paste("Null", 1:g.null)),
                            paste("Array", 1:n))

     ## A treatment vector is also made
     trt <- rep(c("Trt","Ctr"),each=n/2)
     trt

     ## 2 alternative categories (50 differentially expressed genes each)
     ## and 18 null categories (50 null genes each) will be made.

     C.matrix <- kronecker(diag(20),rep(1,50))
     dimnames(C.matrix) <- list(dimnames(data)[[1]],
         c(paste("TrueCat",1:2),paste("NullCat",1:18)))
     dim(C.matrix)

     results <- safe(data,trt,C.matrix,Pi.mat = 100)
     results

     ## SAFE-plot made for the first category
     if (interactive()) {
       safeplot(results, "TrueCat 1")
     }

