benhur              package:clusterStab              R Documentation

_A _F_u_n_c_t_i_o_n _t_o _E_s_t_i_m_a_t_e _t_h_e _N_u_m_b_e_r _o_f _C_l_u_s_t_e_r_s _i_n _M_i_c_r_o_a_r_r_a_y _D_a_t_a

_D_e_s_c_r_i_p_t_i_o_n:

     This function estimates the number of clusters in data such as
     microarray data, using an iterative subsampling procedure proposed
     by Ben-Hur, Elisseeff and Guyon (2002).

_U_s_a_g_e:

     ## S4 method for signature 'exprSet':
     benhur(object, freq, upper, seednum = NULL,
            linkmeth = "average", iterations = 100)
     ## S4 method for signature 'matrix':
     benhur(object, freq, upper, seednum = NULL,
            linkmeth = "average", iterations = 100)

_A_r_g_u_m_e_n_t_s:

  object: Either a matrix or an 'exprSet'.

    freq: The proportion of samples to include in each subsample.
          Values between 0.6 and 0.8 usually give the best results.

   upper: The maximum number of clusters to consider.

 seednum: A value to pass to 'set.seed', which will allow for exact
          reproducibility at a later date.

linkmeth: Linkage method to pass to 'hclust'. Valid values include
          "average", "centroid", "ward", "single", "mcquitty", or
          "median".

iterations: The number of subsampling iterations. The default of 100
          is usually adequate.
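     The arguments above can be read as the parameters of a subsampling
     loop. The following is a minimal base-R sketch of the procedure of
     Ben-Hur et al., not clusterStab's actual code. It assumes that
     samples are the columns of a numeric matrix, that each iteration
     draws two independent subsamples, and that agreement on the shared
     samples is scored with the Jaccard coefficient of pairwise
     co-membership (one common choice; clusterStab's measure may
     differ). The names 'stability_sims' and 'jaccard_labels' are
     hypothetical.

```r
## Hypothetical sketch of a Ben-Hur style subsampling loop; this is NOT
## clusterStab's implementation. Assumes samples are the columns of `x`.

# Jaccard similarity between two labelings of the same samples:
# pairs co-clustered in both labelings / pairs co-clustered in either.
jaccard_labels <- function(a, b) {
  same_a <- outer(a, a, "==")
  same_b <- outer(b, b, "==")
  ut <- upper.tri(same_a)                 # count each unordered pair once
  either <- sum((same_a | same_b) & ut)
  if (either == 0) return(1)              # no co-clustered pairs anywhere
  sum(same_a & same_b & ut) / either
}

# Returns an iterations x (upper - 1) matrix of similarities, one column
# per number of clusters k = 2..upper.
stability_sims <- function(x, freq, upper, seednum = NULL,
                           linkmeth = "average", iterations = 100) {
  if (!is.null(seednum)) set.seed(seednum)   # reproducible subsamples
  n <- ncol(x)
  m <- ceiling(freq * n)                     # samples per subsample
  stopifnot(upper >= 2, m >= upper)
  ks <- 2:upper
  sims <- matrix(NA_real_, nrow = iterations, ncol = length(ks),
                 dimnames = list(NULL, paste0("k", ks)))
  for (i in seq_len(iterations)) {
    s1 <- sample(n, m)                       # two independent subsamples
    s2 <- sample(n, m)
    common <- intersect(s1, s2)              # samples present in both
    # Cluster the samples (columns) of each subsample
    h1 <- hclust(dist(t(x[, s1, drop = FALSE])), method = linkmeth)
    h2 <- hclust(dist(t(x[, s2, drop = FALSE])), method = linkmeth)
    for (j in seq_along(ks)) {
      # cutree labels follow subsample order, so align them by position
      a <- cutree(h1, k = ks[j])[match(common, s1)]
      b <- cutree(h2, k = ks[j])[match(common, s2)]
      sims[i, j] <- jaccard_labels(a, b)
    }
  }
  sims
}
```

     With freq = 0.7 and 30 samples, each subsample holds 21 samples,
     and two independent subsamples share on average 21 * 21 / 30 = 14.7
     samples; the similarity for each k is computed on that shared set.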

_D_e_t_a_i_l_s:

     This function may be used to estimate the number of true clusters
     present in a set of microarray data. The estimate can then be used
     as input to 'clusterComp' to assess the stability of the clusters.

     The primary output from this function is a set of histograms
     (obtained by calling 'hist' on the returned object) that show, for
     each number of clusters, how similar the clusterings obtained from
     different subsets of the data are. As the number of clusters
     increases, the pairwise similarity of cluster membership tends to
     decrease. The basic idea is to choose the largest number of
     clusters whose histogram has most of its similarity values
     concentrated at or near 1.
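     The selection rule above is a visual judgement. As a hedged
     illustration only, it can be approximated numerically: take the
     largest k for which more than half of the similarity values are at
     or above some cut-off. The function 'choose_k', the 0.9 cut-off and
     the "more than half" threshold are all assumptions, not part of
     clusterStab, and the similarity values below are made up.

```r
# Hypothetical numeric stand-in for reading the histograms: the largest k
# for which more than `majority` of the similarity values are at or above
# `near_one`. Both thresholds are assumptions, not clusterStab defaults.
choose_k <- function(sims, near_one = 0.9, majority = 0.5) {
  ks <- as.integer(names(sims))
  frac_high <- vapply(sims, function(s) mean(s >= near_one), numeric(1))
  stable <- ks[frac_high > majority]
  if (length(stable) == 0) return(NA_integer_)
  max(stable)
}

# Toy similarity values (made up for illustration, not real output)
sims <- list(`2` = c(1, 1, 0.98, 0.95, 1),
             `3` = c(1, 0.97, 0.99, 0.92, 0.6),
             `4` = c(0.7, 0.55, 0.93, 0.6, 0.4),
             `5` = c(0.5, 0.45, 0.6, 0.3, 0.52))
choose_k(sims)   # returns 3
```

     The answer depends on the cut-off: with near_one = 0.99 the same
     values give k = 2, which is one reason to inspect the histograms
     rather than rely on a single threshold.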

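     Calling 'ecdf' on the returned object produces an additional plot
     of the empirical cumulative distribution function (CDF) of the
     similarity values for each number of clusters. This plot can be
     used in conjunction with the histograms to determine at which
     number of clusters the similarity values are no longer
     concentrated at or near 1.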
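     A CDF that stays near 0 until close to 1 indicates that the mass of
     the similarity values sits near 1. The sketch below uses base R's
     'stats::ecdf' to evaluate each CDF at a chosen point and to overlay
     one CDF per number of clusters; it is an illustration under
     assumptions, not the 'ecdf' method of the 'benhur' class. The
     helper names 'cdf_at' and 'plot_cdfs' and the toy values are
     hypothetical.

```r
# Hypothetical helpers built on stats::ecdf (not clusterStab's method).
# Fraction of similarity values <= `at`, for each number of clusters k.
cdf_at <- function(sims, at = 0.9) {
  vapply(sims, function(s) ecdf(s)(at), numeric(1))
}

# Overlay one empirical CDF per number of clusters on a single plot.
plot_cdfs <- function(sims) {
  cols <- seq_along(sims)
  plot(ecdf(sims[[1]]), col = cols[1], xlim = c(0, 1),
       main = "Similarity CDFs", xlab = "similarity")
  for (j in seq_along(sims)[-1]) {
    plot(ecdf(sims[[j]]), col = cols[j], add = TRUE)
  }
  legend("topleft", legend = paste("k =", names(sims)), col = cols, lty = 1)
}

# Toy similarity values (made up for illustration, not real output)
sims <- list(`2` = c(1, 1, 0.98, 0.95, 1),
             `3` = c(1, 0.97, 0.99, 0.92, 0.6),
             `4` = c(0.7, 0.55, 0.93, 0.6, 0.4),
             `5` = c(0.5, 0.45, 0.6, 0.3, 0.52))
cdf_at(sims, 0.9)   # 2: 0, 3: 0.2, 4: 0.8, 5: 1
```

     Here the CDF at 0.9 jumps from 0.2 (k = 3) to 0.8 (k = 4), marking
     the point where the values stop being concentrated near 1.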

_V_a_l_u_e:

     The output from this function is an object of class 'benhur'. See
     the 'benhur-class' man page for more information.

_A_u_t_h_o_r(_s):

     Originally written by Mark Smolkin <marksmolkin@hotmail.com>;
     further modifications by James W. MacDonald
     <jmacdon@med.umich.edu>.

_R_e_f_e_r_e_n_c_e_s:

     Ben-Hur, A., Elisseeff, A. and Guyon, I. (2002). A stability based
     method for discovering structure in clustered data. Pacific
     Symposium on Biocomputing.

     Smolkin, M. and Ghosh, D. (2003). Cluster stability scores for
     microarray data in cancer studies. BMC Bioinformatics 4, 36-42.

_E_x_a_m_p_l_e_s:

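     library(clusterStab)
     data(eset)
     ## use 70% of the samples per subsample; consider up to 5 clusters
     tmp <- benhur(eset, freq = 0.7, upper = 5)
     hist(tmp)   # histograms of similarity values, one per number of clusters
     ecdf(tmp)   # CDF plot of the same similarity values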

