adSplit               package:adSplit               R Documentation

_A_n_n_o_t_a_t_i_o_n-_D_r_i_v_e_n _S_p_l_i_t_s

_D_e_s_c_r_i_p_t_i_o_n:

     This function searches for annotation-driven splits of patients in
     microarray data. A split is a partitioning of patients into two
     groups. To find such splits, the function refers to GO terms and
     KEGG pathways. DLD-scores are used to judge the quality of a
     split. In addition, a significance measure can be computed by
     simulating a random distribution of scores.

_U_s_a_g_e:

     adSplit(mydata, annotation.ids, chip.name, 
             min.probes = 20, max.probes = NULL, 
             B = NULL, min.group.size = 5, ngenes = 50, 
             ignore.genes = 5)

_A_r_g_u_m_e_n_t_s:

  mydata: either an expression set as defined by the package 'Biobase'
          or a matrix of expression levels (rows=genes,
          columns=samples).

annotation.ids: a vector of GO or KEGG identifiers in the form "GO:..."
          or "KEGG:..." respectively. The prefix "KEGG:" is removed
          from the KEGG-identifiers before accessing the chip's
          "...PATH2PROBES" hash.

chip.name: the name of the chip on which the expression data were
          measured. 'adSplit' attempts to load a package of the same
          name and expects to find a hash called
          "<chip-name>GO2ALLPROBES" and one called
          "<chip-name>PATH2PROBES" there.

min.probes: annotation identifiers with fewer than this many associated
          genes are skipped.

max.probes: annotation identifiers with more than this many associated
          genes are skipped. The default is ten percent of the genes
          on the chip.

       B: the number of random gene set samplings to be performed to 
          compute empirical p-values.

min.group.size: filter criterion to avoid splits suggesting tiny groups.
          Splits where one of the two suggested groups is smaller than
          this number are removed from the split set.

  ngenes: number of genes used to compute DLD scores.

ignore.genes: number of best scoring genes to be ignored when computing
          DLD scores.

_D_e_t_a_i_l_s:

     This function applies the same splitting procedure to all
     annotation identifiers provided. Firstly, the associated genes for
     one identifier are determined and extracted from the expression
     data. Then the 'diana2means' function is applied to the restricted
     data and the different splits generated are collected into a
     single 'splitSet' object.
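
     The procedure above can be sketched in base R as follows. This is
     only an illustration under stated assumptions: 'toy_map' is a
     made-up stand-in for the chip's "...GO2ALLPROBES" hash,
     'two_means_split' is a hypothetical helper that swaps in 'kmeans'
     for 'diana2means', and the result is a plain matrix rather than a
     'splitSet' object.

```r
set.seed(1)

# Toy expression matrix: 30 genes (rows) x 12 samples (columns).
expr <- matrix(rnorm(30 * 12), nrow = 30,
               dimnames = list(paste0("g", 1:30), paste0("s", 1:12)))
# Give the first six samples a clear shift so that a two-group split exists.
expr[1:25, 1:6] <- expr[1:25, 1:6] + 3

# Hypothetical identifier-to-probe mapping (stand-in for the chip hash).
toy_map <- list("GO:A" = paste0("g", 1:10),
                "GO:B" = paste0("g", 11:25),
                "GO:C" = paste0("g", 26:27))   # fewer genes than min.probes

# Hypothetical helper: split the samples into two groups with 2-means.
# adSplit itself uses diana2means; kmeans is used here for illustration only.
two_means_split <- function(m) {
  kmeans(t(m), centers = 2, nstart = 5)$cluster - 1L
}

min.probes <- 5
min.group.size <- 2

cuts <- list()
for (id in names(toy_map)) {
  # Restrict the data to the genes associated with this identifier.
  probes <- intersect(toy_map[[id]], rownames(expr))
  if (length(probes) < min.probes) next        # skip small gene sets
  grp <- two_means_split(expr[probes, , drop = FALSE])
  if (min(table(grp)) < min.group.size) next   # skip splits with tiny groups
  cuts[[id]] <- grp
}

# One row per identifier that produced a split, one column per sample.
cuts <- do.call(rbind, cuts)
```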

     Valid annotation identifiers are vectors of identifiers of the
     form 'KEGG:nnnnn' and 'GO:nnnnnn'. In addition, the keywords
     "KEGG", "GO" and "all" are allowed, representing all terms in the
     corresponding ontology. 
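
     As a rough illustration of the identifier conventions, the
     following base R sketch separates a mixed vector by prefix and
     strips "KEGG:" as described for 'annotation.ids'. The variable
     names and the helper 'is.keyword' are hypothetical and not part of
     the package.

```r
ids <- c("GO:0006915", "KEGG:00251", "GO:0007165")

# Separate the two identifier types by their prefix.
go.ids   <- ids[grepl("^GO:", ids)]
kegg.ids <- ids[grepl("^KEGG:", ids)]

# KEGG identifiers lose their prefix before the pathway lookup;
# GO identifiers are assumed here to be used unchanged.
kegg.keys <- sub("^KEGG:", "", kegg.ids)

# Keywords select a whole ontology instead of individual terms.
keywords <- c("KEGG", "GO", "all")
is.keyword <- function(x) length(x) == 1 && x %in% keywords
```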

     If 'B' is set to an integer, that number of samplings is used to
     generate a null distribution of DLD-scores. This distribution is
     used to compute empirical p-values for each split. If more than
     one valid split is found, multiple testing is corrected for by
     applying the Benjamini-Hochberg correction from the 'multtest'
     package.
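
     The idea of empirical p-values followed by a multiple testing
     correction can be sketched as below. The scores are invented, the
     sketch assumes that larger DLD-scores indicate better splits, and
     it uses base R's 'p.adjust' as a stand-in for the 'multtest'
     routine, so it shows the principle rather than the package's
     exact computation.

```r
set.seed(42)

# Hypothetical observed DLD-scores for three splits.
observed <- c(a = 4.1, b = 2.0, c = 0.3)

# Hypothetical null distribution from B random gene set samplings.
B <- 1000
null.scores <- rnorm(B, mean = 1, sd = 1)

# Empirical p-value: share of null scores at least as large as the observed
# score. The +1 terms keep the estimate away from zero; whether adSplit
# applies this adjustment is an assumption, not something this page states.
pvalue <- sapply(observed, function(s) (sum(null.scores >= s) + 1) / (B + 1))

# Benjamini-Hochberg correction via base R's p.adjust
# (adSplit is documented to use the multtest package instead).
qvalue <- p.adjust(pvalue, method = "BH")
```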

_V_a_l_u_e:

     Returns an object of class 'splitSet' with the following list
     elements:  

    cuts: a matrix of split attributions. One row per annotation
          identifier (GO term or KEGG pathway) for which a split has
          been generated. One column per object in the dataset.

   score: one score per generated split.

  pvalue: one empirical p-value per generated split, or 'NULL' if no
          samplings were requested ('B = NULL').

  qvalue: one q-value per generated split, computed according to
          Benjamini-Hochberg's correction for multiple testing, or
          'NULL'.

_A_u_t_h_o_r(_s):

     Claudio Lottaz, Joern Toedling

_S_e_e _A_l_s_o:

     'diana2means', 'randomDiana2means',  'image.splitSet'

_E_x_a_m_p_l_e_s:

      
     # prepare data
     library(golubEsets) 
     data(Golub_Merge) 

     # generate annotation-driven splits for apoptosis and signal transduction
     x <- adSplit(Golub_Merge, "GO:0006915", "hu6800")
     x <- adSplit(Golub_Merge, c("GO:0007165","GO:0006915"), "hu6800", max.probes=7000)

     # generate a split for glutamate metabolism including 
     # an empirical p-value
     x <- adSplit(Golub_Merge, "KEGG:00251", "hu6800", B=100)

     ## Not run: 
     # generate splits for all KEGG pathways.
     x <- adSplit(Golub_Merge, "KEGG", "hu6800")
     image(x)
     ## End(Not run)

