macluster              package:maanova              R Documentation

_C_l_u_s_t_e_r_i_n_g _a_n_a_l_y_s_i_s _f_o_r _M_i_c_r_o_a_r_r_a_y _e_x_p_e_r_i_m_e_n_t

_D_e_s_c_r_i_p_t_i_o_n:

     This function bootstraps K-means or hierarchical clusters and
     builds a consensus tree (or consensus groups for K-means) from
     the bootstrap results.

_U_s_a_g_e:

     macluster(anovaobj, term, idx.gene, what = c("gene", "sample"), 
         method = c("hc", "kmean"), dist.method = "correlation",
         hc.method = "ward", kmean.ngroups, n.perm = 100)

_A_r_g_u_m_e_n_t_s:

anovaobj: The result object from fitting the ANOVA model.

    term: The factor (in the formula) used in clustering. The expression
          level for this term will be used in clustering. This term
          must correspond to the gene list, i.e., idx.gene in this
          function. The gene list should contain the significant hits
          from testing this term.

idx.gene: A vector indicating the list of differentially expressed
          genes. The expression level of these genes will be used to
          construct the cluster.

    what: What to cluster, either "gene" or "sample".

  method: The clustering method. Right now hierarchical clustering
          ("hc") and K-means ("kmean") are available.

dist.method: Distance measure to be used in hierarchical clustering.
          Besides the methods listed in 'dist', there is an additional
          method, "correlation" (default). The "correlation" distance
          equals 1 - r^2, where r is the sample correlation between
          observations.

hc.method: The agglomeration method to be used in hierarchical
          clustering. See 'hclust' for detail.

kmean.ngroups: The number of groups for K-means clustering.

  n.perm: Number of bootstraps. If it is 1, this function clusters
          the observed data only. If it is greater than 1, a bootstrap
          is performed.
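
     The "correlation" distance described under dist.method can be
     illustrated with base R alone. The following is a minimal sketch
     under stated assumptions, not the code that macluster runs: the
     matrix 'expr' is made-up data, and "average" linkage is chosen
     only for the illustration.

```r
# Sketch only: base-R illustration of a "correlation" distance, 1 - r^2.
# 'expr' is made-up data (rows = genes, columns = samples); it does not
# come from maanova, and this is not macluster's internal code.
set.seed(1)
expr <- matrix(rnorm(6 * 8), nrow = 6,
               dimnames = list(paste0("gene", 1:6), paste0("s", 1:8)))

# cor() correlates columns, so transpose to correlate genes (rows)
r <- cor(t(expr))

# distance is 1 - r^2, so it always lies between 0 and 1
d <- as.dist(1 - r^2)

# feed the distance into hierarchical clustering and cut into 2 groups
hc <- hclust(d, method = "average")
groups <- cutree(hc, k = 2)
```

     Because r is squared, two observations with a strong negative
     correlation are as close under this distance as two with a strong
     positive correlation.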

_D_e_t_a_i_l_s:

     After the F test, the user typically selects a list of
     differentially expressed genes. The next step is to investigate
     the relationships among these genes. Using their expression
     levels, the user can cluster either the genes or the samples with
     hierarchical or K-means clustering. To evaluate the stability of
     the clusters, this function bootstraps the data, re-fits the
     model and re-clusters the genes/samples. After a given number of
     bootstrap iterations, say 1000, there are 1000 clustering results,
     and 'consensus' can be used to build the consensus tree (or
     consensus groups for K-means) from them.
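
     The bootstrap-and-recluster idea can be sketched with base R. This
     is a simplified illustration under stated assumptions, not the
     implementation of macluster or 'consensus': 'expr' is made-up
     data, the columns are resampled directly instead of re-fitting an
     ANOVA model, and stability is summarized by how often two genes
     fall into the same K-means group. The names 'n.boot', 'together',
     'freq' and 'stable' are hypothetical.

```r
# Sketch only: simplified bootstrap of K-means with base R.
# Assumptions: 'expr' is made-up data (rows = genes, columns = samples);
# columns are resampled instead of re-fitting an ANOVA model.
set.seed(42)
n.gene <- 20

# two well-separated groups of 10 genes each, measured on 10 samples
expr <- rbind(matrix(rnorm(10 * 10, mean = 0), nrow = 10),
              matrix(rnorm(10 * 10, mean = 5), nrow = 10))
rownames(expr) <- paste0("gene", seq_len(n.gene))

n.boot <- 50
together <- matrix(0, n.gene, n.gene,
                   dimnames = list(rownames(expr), rownames(expr)))

for (b in seq_len(n.boot)) {
  # resample the samples (columns) with replacement
  cols <- sample(ncol(expr), replace = TRUE)
  km <- kmeans(expr[, cols], centers = 2, nstart = 5)
  # count the gene pairs assigned to the same group in this replicate
  together <- together + outer(km$cluster, km$cluster, "==")
}

# fraction of replicates in which each pair of genes clustered together
freq <- together / n.boot

# pairs that co-cluster in more than half of the replicates
stable <- freq > 0.5
```

     The 0.5 cutoff here plays a role similar in spirit to the second
     argument passed to 'consensus' in the examples, although the exact
     rule 'consensus' applies is not shown in this sketch.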

     Note that if you have a large number (say, more than 100) of
     genes/samples to cluster, hierarchical clustering can be very
     unstable: a slight change in the data can result in a big change
     in the tree structure. In that case, K-means is likely to give
     more stable results.

_V_a_l_u_e:

     An object of class 'macluster'.

_A_u_t_h_o_r(_s):

     Hao Wu

_S_e_e _A_l_s_o:

     'hclust', 'kmeans', 'consensus'

_E_x_a_m_p_l_e_s:

     # load the data
     data(abf1)
     ## Not run: 
     # fit the ANOVA model
     fit.fix <- fitmaanova(abf1, formula = ~Strain)
     # test the Strain effect
     test.fix <- matest(abf1, fit.fix, term = "Strain", n.perm = 1000)
     # pick significant genes: those selected by the Fs test
     idx <- volcano(test.fix)$idx.Fs
     # K-means clustering of genes
     gene.cluster <- macluster(fit.fix, term = "Strain", idx, what = "gene",
         method = "kmean", kmean.ngroups = 5, n.perm = 100)
     # get the consensus groups
     consensus(gene.cluster, 0.5)

     # hierarchical clustering of samples
     sample.cluster <- macluster(fit.fix, term = "Strain", idx, what = "sample",
         method = "hc")
     # get the consensus tree
     consensus(sample.cluster, 0.5)
     ## End(Not run)

