evalClusterHyper          package:goCluster          R Documentation

_E_v_a_l_u_a_t_e_s _a _c_l_u_s_t_e_r_i_n_g _r_e_s_u_l_t _w_i_t_h _r_e_g_a_r_d _t_o _a_n _e_n_r_i_c_h_m_e_n_t _o_f
_a_n_n_o_t_a_t_i_o_n _t_e_r_m_s _i_n _s_p_e_c_i_f_i_c _c_l_u_s_t_e_r_s.

_D_e_s_c_r_i_p_t_i_o_n:

     The function 'evalClusterHyper' runs through a tree of gene groups
     and calls the function 'evalAnnosetHyper' for each of them. This
     second function employs the hypergeometric distribution to
     calculate a p-value for each of the annotation terms annotated to
     the genes in the group.
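
     The traversal described above can be pictured with a minimal
     sketch. The helper 'walk_clusters' is hypothetical and not part
     of goCluster; it only assumes that inner nodes of the tree are
     lists and that each leaf is a vector of gene positions. How the
     package itself walks the tree is not shown in this page.

```r
# Hypothetical helper (not a goCluster function): apply 'f' to every
# leaf of a nested list, keeping the shape of the tree.
walk_clusters <- function(node, f) {
  if (is.list(node)) {
    # Inner node: descend into each child.
    lapply(node, walk_clusters, f = f)
  } else {
    # Leaf: a vector of gene positions forming one cluster.
    f(node)
  }
}

# Same tree shape as the test clusters in the Examples section.
testclusters <- list(list(c(68, 78), c(32, 7, 72)), list(c(31, 78)))

# Stand-in for the per-cluster evaluation: report the cluster size.
sizes <- walk_clusters(testclusters, length)
```

     In the package, the per-cluster function would be
     'evalAnnosetHyper' rather than 'length'.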

_U_s_a_g_e:

     evalClusterHyper(X, uniqueid, Annoset)
     evalAnnosetHyper(Selection, uniqueid, Annoset)

_A_r_g_u_m_e_n_t_s:

       X: The tree (list of lists) of clusters.

 Annoset: A list in which each element holds a different annotation
          dataset. Each of these datasets is composed of two columns,
          with the second column holding the gene ids and the first
          column holding the corresponding annotation terms.

uniqueid: The unique id of the elements in the dataset. 

Selection: A list of genes that comprises one cluster. The gene ids
          given have to match ids from the first column of the
          annotation datasets ('Annoset'). 

_D_e_t_a_i_l_s:

     The function 'evalClusterHyper' analyses a "tree" (list of lists)
     of gene clusters. It uses the hypergeometric distribution to
     determine how probable the observed frequency of each annotation
     term within each cluster is. The function 'evalAnnosetHyper'
     performs the statistical evaluation for a single gene cluster. It
     first determines all annotation terms that are associated with
     the genes in the cluster. For each of these annotation terms it
     then counts the matching genes over the whole list of genes (not
     only the cluster). Finally, for each annotation term, the ratio of
     matching genes within the cluster to the total number of genes in
     the cluster is compared with the ratio of matching genes over the
     whole list to the total number of genes in the list. From these
     counts the probabilities follow according to the hypergeometric
     distribution.
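
     The calculation can be illustrated on toy data with base R's
     'phyper'. This is a sketch under stated assumptions, not the
     package's implementation: the function 'hyper_sketch', the toy
     table and the gene names are hypothetical, counts are taken per
     gene, and the p-value is taken as the upper tail (the chance of
     seeing at least the observed number of matching genes). Whether
     goCluster makes the same choices is not stated in this page.

```r
# Toy annotation table (hypothetical): terms in the first column,
# gene ids in the second.
anno <- data.frame(
  term = c("GO:A", "GO:A", "GO:A", "GO:B", "GO:B", "GO:C"),
  gene = c("g1",   "g2",   "g3",   "g1",   "g4",   "g5"),
  stringsAsFactors = FALSE
)
all_genes <- paste0("g", 1:8)      # the whole list of genes
cluster   <- c("g1", "g2", "g4")   # one gene cluster

hyper_sketch <- function(cluster, all_genes, anno) {
  # Terms associated with at least one gene of the cluster.
  in_cluster <- anno[anno$gene %in% cluster, ]
  terms <- unique(in_cluster$term)

  # Matching genes per term inside the cluster ...
  selected <- sapply(terms, function(tm)
    length(unique(in_cluster$gene[in_cluster$term == tm])))
  # ... and over the whole list of genes.
  elements <- sapply(terms, function(tm)
    length(unique(anno$gene[anno$term == tm & anno$gene %in% all_genes])))

  n_all     <- length(all_genes)
  n_cluster <- length(cluster)

  # Assumed upper tail: P(X >= selected) when drawing n_cluster genes
  # from n_all genes, of which 'elements' carry the term.
  p <- phyper(selected - 1, elements, n_all - elements, n_cluster,
              lower.tail = FALSE)

  list(pvalues = p,
       selectedPerAnnotation = selected,
       elementsPerAnnotation = elements,
       selectedTotal = n_cluster,
       elementsTotal = n_all)
}

res <- hyper_sketch(cluster, all_genes, anno)
```

     The element names of the returned list mirror those in the Value
     section only for readability; here the totals count genes, which
     is an assumption of the sketch.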

_V_a_l_u_e:

 pvalues: p-values according to the hypergeometric distribution.

selectedPerAnnotation: A vector that holds the number of times the
          annotation was found in the given selection. 

elementsPerAnnotation: A vector that holds the number of times the
          annotation was found over all elements.

selectedTotal: Total number of annotation terms in the given selection.

elementsTotal: Total number of annotation terms over all elements.

_A_u_t_h_o_r(_s):

     Gunnar Wrobel, <URL: work@gunnarwrobel.de>, <URL:
     http://www.gunnarwrobel.de>.

_S_e_e _A_l_s_o:

     'clusterStatisticHyper-class'

_E_x_a_m_p_l_e_s:

     ## We will first create a goCluster object to get the gene ontology
     ## annotation from it
     data(benomylsetupsmall)
     test <- new("goCluster")
     setup(test) <- benomylsetupsmall
     ## Executing the data object will also execute the annotation
     ## object associated with it. The "test" object is passed to
     ## "execute" a second time because a parent object has to be
     ## given when executing a goCluster subobject.
     annotation <- execute(test@data, test)
     ## Extract the annotation datasets and the unique ids
     Annoset  <- annotation@anno@annoset
     Uniqueid <- annotation@uniqueid

     ## Test clusters (the genes are specified by their position in
     ## the dataset)
     testclusters <- list(
                          list(
                               c(68, 78),
                               c(32,  7, 72)
                               ),
                          list(c(31, 78)
                          ))

     evalClusterHyper(testclusters, Uniqueid, Annoset)

