edd                   package:edd                   R Documentation

_n_e_w _e_x_p_r_e_s_s_i_o_n _d_e_n_s_i_t_y _d_i_a_g_n_o_s_t_i_c_s _i_n_t_e_r_f_a_c_e

_D_e_s_c_r_i_p_t_i_o_n:

     this will replace edd.unsupervised; has more sensible parameters

_U_s_a_g_e:

     edd(eset, distList=eddDistList, tx=c(sort,flatQQNormY)[[1]],
             refDist=c("multiSim", "theoretical")[1], 
             method=c("knn", "nnet", "test")[1], nRowPerCand=100, ...)

_A_r_g_u_m_e_n_t_s:

    eset: eset - instance of Biobase 'exprSet' class

distList: distList - list comprised of eddDist objects

      tx: tx - transformation of data and reference prior to
          classification 

 refDist: refDist - type of reference distribution system to use

  method: method - type of classifier to use.  knn is k-nearest
          neighbors, nnet is neural net, test is max p-value from
          ks.test

nRowPerCand: nRowPerCand - number of realizations for a multiSim
          reference system

     ...: ... - parameters to classifiers

_D_e_t_a_i_l_s:

     Classifies genes according to distributional shape, by comparing
     observed expression distributions to a collection of references,
     which may be simulated or evaluated theoretically.
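
     To make the comparison concrete, the following is a minimal base-R
     sketch of the default combination (a multiSim reference, tx = sort,
     and a k-nearest-neighbor vote). It is not edd's implementation:
     the three-entry catalog, the helper names buildMultiSim and
     knnShape, k = 5, Euclidean distance, and the assumption that genes
     are already on the catalog's scale are all illustrative choices.

```r
# Hypothetical catalog: random generators for three candidate shapes.
catalog <- list(normal = rnorm, exponential = rexp, uniform = runif)

# Simulate nRowPerCand sorted realizations of length n for each catalog
# entry (the 'multiSim' idea) and keep a parallel vector of labels.
buildMultiSim <- function(catalog, n, nRowPerCand = 100) {
  ref <- do.call(rbind, lapply(catalog, function(r)
    t(replicate(nRowPerCand, sort(r(n))))))
  list(ref = ref, labels = rep(names(catalog), each = nRowPerCand))
}

# Classify one gene: sort its values (tx = sort), find the k reference
# rows at the smallest squared Euclidean distance, take a majority vote.
knnShape <- function(v, ms, k = 5) {
  d <- colSums((t(ms$ref) - sort(v))^2)
  votes <- ms$labels[order(d)[seq_len(k)]]
  names(which.max(table(votes)))
}

set.seed(1)
n <- 30
ms <- buildMultiSim(catalog, n)
genes <- rbind(g1 = rnorm(n), g2 = rexp(n), g3 = runif(n))
apply(genes, 1, knnShape, ms = ms)
```

     The result is one shape label per gene (row), which is the kind of
     vector that table() summarizes in the examples below.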

     The distList argument is important.  It enumerates the catalog of
     distributions for classification of gene expression vectors by
     distributional shape.  See the HOWTO-edd vignette for information
     on how this list is constructed and how it can be extended.
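
     The eddDist class itself is described in the vignette. Purely to
     illustrate the idea of an extensible catalog, the sketch below
     models one as a plain R list in which each entry pairs a random
     generator (usable for simulated references) with a quantile
     function (usable for theoretical references). makeEntry and its
     field names are hypothetical and are not the eddDist interface.

```r
# Hypothetical stand-in for a catalog entry (NOT the eddDist class):
# a label, a random generator, and a quantile function.
makeEntry <- function(name, rfun, qfun) {
  list(name = name, rfun = rfun, qfun = qfun)
}

catalog <- list(
  makeEntry("N(0,1)", function(n) rnorm(n), function(p) qnorm(p)),
  makeEntry("EXP(1)", function(n) rexp(n),  function(p) qexp(p))
)

# Extending the catalog amounts to appending another entry,
# here a t distribution with 3 degrees of freedom.
catalog <- c(catalog, list(
  makeEntry("T(3)", function(n) rt(n, df = 3), function(p) qt(p, df = 3))
))

vapply(catalog, function(e) e$name, character(1))
# "N(0,1)" "EXP(1)" "T(3)"
```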

     The tx argument specifies how the data are processed for
     comparison to the reference catalog. It is a function that takes
     a vector and returns a vector; the input and the output need not
     have the same length. The default value of tx is sort, so the
     order statistics are treated as multivariate data for
     classification.
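
     As a sketch of the contract described above, the default sort
     preserves length, while a hypothetical alternative such as
     txDeciles (not part of edd) maps a vector of any length to a fixed
     number of summaries. Because the same tx is applied to the data and
     the reference, both sides end up with matching lengths either way.

```r
# Default: order statistics; output length equals input length.
txSort <- sort

# Hypothetical alternative: reduce a vector of any length to its nine
# deciles, so the output always has length 9.
txDeciles <- function(v) {
  unname(quantile(v, probs = seq(0.1, 0.9, by = 0.1)))
}

v <- c(5, 1, 4, 2, 3)
txSort(v)                      # 1 2 3 4 5
length(txDeciles(rnorm(37)))   # 9
```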

     The refDist argument selects the type of reference catalog. 
     Options are 'multiSim', for which the reference consists of
     nRowPerCand realizations of each catalog entry, and 'theoretical',
     for which the reference consists of one vector of quantiles for
     each catalog entry.
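
     The difference in shape between the two reference types can be
     sketched in base R. The two-entry catalog, n = 20 samples per gene,
     and the use of ppoints() as quantile positions are assumptions for
     illustration; edd's own choices may differ.

```r
# Hypothetical two-entry catalog.
rgen <- list(normal = rnorm, exponential = rexp)
qfun <- list(normal = qnorm, exponential = qexp)

n <- 20            # samples per gene (columns of the expression matrix)
nRowPerCand <- 100 # realizations per entry for the 'multiSim' reference

# 'theoretical': one vector of quantiles per catalog entry,
# evaluated at the plotting positions ppoints(n).
theoretical <- t(sapply(qfun, function(q) q(ppoints(n))))

# 'multiSim': nRowPerCand sorted simulated vectors per entry, stacked
# into one matrix with a parallel vector of class labels.
set.seed(42)
multiSim <- do.call(rbind, lapply(rgen, function(r)
  t(replicate(nRowPerCand, sort(r(n))))))
labels <- rep(names(rgen), each = nRowPerCand)

dim(theoretical)  # 2 20
dim(multiSim)     # 200 20
```

     A theoretical reference gives a classifier one exemplar per shape,
     whereas a multiSim reference also conveys the sampling variability
     of each shape at the observed sample size.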

     The method argument selects the type of classifier. It would be
     desirable to allow this to be a function, but classifiers do not
     yet share enough structure in their arguments and return values
     to permit this at present; see the e1071 package for some work on
     handling various classifiers programmatically (e.g., 'tune').
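
     The "test" rule (the catalog entry with the largest ks.test
     p-value) can be sketched in base R. The catalog of fully specified
     CDFs and the helper classifyByKS are hypothetical; the CDFs use
     their default parameters, so this sketch assumes the data are
     already on the catalog's location and scale.

```r
# Hypothetical catalog of fully specified CDFs, named by their stats
# function; parameters are left at their defaults.
cdfs <- c(normal = "pnorm", exponential = "pexp", uniform = "punif")

# 'test'-style rule: run a one-sample Kolmogorov-Smirnov test of the
# gene's values against each catalog CDF and return the entry with the
# largest p-value.
classifyByKS <- function(v, cdfs) {
  p <- vapply(cdfs, function(f) ks.test(v, f)$p.value, numeric(1))
  names(which.max(p))
}

set.seed(7)
classifyByKS(rexp(40), cdfs)
```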

_V_a_l_u_e:

     a character vector or factor depending on the classifier

_A_u_t_h_o_r(_s):

     Vince Carey <stvjc@channing.harvard.edu>

_S_e_e _A_l_s_o:

     'exprSet'

_E_x_a_m_p_l_e_s:

     require(Biobase)
     data(eset)
     # should filter to genes with reasonable variation
     table( edd(eset, meth="nnet", size=10, decay=.2) )
     library(golubEsets)
     data(golubMerge)
     madvec <- apply(exprs(golubMerge),1,mad)
     minvec <- apply(exprs(golubMerge),1,min)
     keep <- (madvec > median(madvec)) & (minvec > 300)
     gmfilt <- golubMerge[keep==TRUE,]
     ALL <- gmfilt$ALL.AML=="ALL"
     gall <- gmfilt[,ALL==TRUE]
     gaml <- gmfilt[,ALL==FALSE]
     alldists <- edd(gall, meth="nnet", size=10, decay=.2)
     amldists <- edd(gaml, meth="nnet", size=10, decay=.2)
     table(alldists,amldists)

