MLearn-methods         package:MLInterfaces         R Documentation

_u_n_i_f_i_e_d _i_n_t_e_r_f_a_c_e _t_o _m_a_c_h_i_n_e _l_e_a_r_n_i_n_g _m_e_t_h_o_d_s

_D_e_s_c_r_i_p_t_i_o_n:

     Unified interface to machine learning methods; new approach
     (August 2005).

_I_n_t_r_o_d_u_c_t_i_o_n:

     Use of 'MLInterfaces' methods to date (version 1.1.3) involves a
     large number of generics with names indicating the method to be
     employed.  For example, 'knnB()' is used to apply k-nearest
     neighbors analysis to an instance of the 'exprSet' class.  In this
     design, the generic has to "know" about the parameters to the
     underlying R function implementing the method of interest, and set
     defaults.  This is a somewhat fragile design, in that changes to
     the calling sequences to underlying R functions can break the
     interfaces defined here.

     A new, fully backwards-compatible design is now being introduced.
     Here there is one generic, 'MLearn'.  Its parameters are 'formula',
     'data', 'method', and 'trainInd'; additional parameters to the
     underlying implementations of machine learning algorithms are
     passed through '...'.  This new design allows use of ordinary
     formulas and data frames as well as 'exprSet' instances.
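
     The pass-through design described above can be sketched in base R.
     The function 'fitOnTraining' below is hypothetical and is not the
     'MLInterfaces' implementation; it only illustrates how a single
     entry point can forward '...' to a learner named by a string, so
     that the entry point need not know the learner's parameters.
     'loess' and its 'span' argument are used purely as a convenient
     base R stand-in for a learning method.

```r
# Hypothetical wrapper (an assumption for illustration, not MLearn itself):
# look up the fitting function by name and forward extra arguments via '...'.
fitOnTraining <- function(formula, data, method, trainInd, ...) {
  learner <- match.fun(method)                 # resolve the method name to a function
  training <- data[trainInd, , drop = FALSE]   # fit on the training records only
  learner(formula, data = training, ...)
}

set.seed(1)                                    # arbitrary seed, for reproducibility
tr <- sample(seq_len(nrow(iris)), 100)

# 'span' is unknown to the wrapper; it reaches loess() through '...'
fit <- fitOnTraining(Sepal.Length ~ Petal.Length, iris, "loess", tr, span = 0.9)
```

     Because the wrapper never names 'span', a change to the calling
     sequence of the underlying function does not require a change to
     the wrapper.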

     The machine learning methods accommodated in the new design are
     described before the examples below.

_M_e_t_h_o_d_s:

     _f_o_r_m_u_l_a = "_f_o_r_m_u_l_a", _d_a_t_a = "_d_a_t_a._f_r_a_m_e", _m_e_t_h_o_d = "_c_h_a_r_a_c_t_e_r", _t_r_a_i_n_I_n_d = "_n_u_m_e_r_i_c" The
          behavior with this signature is comparable to that of the
          standard R modeling tools, with the exception of the handling
           of the common 'subset' parameter.  Because 'MLInterfaces'
          wishes to inhibit the use of resubstitution estimates of
          generalization error, all 'MLInterfaces' procedures impose
          the requirement of the decomposition of input data into
          training and test subsets.  If you want the behavior of a
          'subset' parameter setting, please form the subset manually
          prior to invoking 'MLearn'.

          Possible values for 'method' are described below, under
          "Machine learning resources available".

          Parameter 'trainInd' defines the indices of the records in
          the input dataset that are used for training; remaining
          records are used as a test dataset for evaluation of the
          fitted learner.
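
          The roles of 'trainInd' and of a manually formed subset can
          be sketched in base R.  This is an illustration only: the
          object names and the seed are assumptions, and the way
          'MLearn' performs the split internally is not shown here.

```r
data(iris)
set.seed(123)                                  # arbitrary seed, for reproducibility

# Indices of the records used for training; all remaining records are test data.
trainInd <- sample(seq_len(nrow(iris)), 45)
train <- iris[trainInd, ]
test  <- iris[-trainInd, ]

# In place of a 'subset' argument, form the subset first, then choose
# training indices relative to the subsetted data.
sub <- subset(iris, Species != "setosa")
sub$Species <- droplevels(sub$Species)         # drop the now-empty factor level
subTrainInd <- sample(seq_len(nrow(sub)), 30)
```

          The subsetted data frame 'sub' and 'subTrainInd' would then
          take the place of 'iris' and 'trainInd' as the 'data' and
          'trainInd' arguments.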


     _f_o_r_m_u_l_a = "_c_h_a_r_a_c_t_e_r", _d_a_t_a = "_e_x_p_r_S_e_t", _m_e_t_h_o_d = "_c_h_a_r_a_c_t_e_r", _t_r_a_i_n_I_n_d = "_n_u_m_e_r_i_c" This
          method works for instances of the 'exprSet' class.

          Parameter 'formula' is to be the name of a variable in the
          'pData' slot of the exprSet's 'phenoData'.  In general this
          will be a factor encoding a categorical variable.

          Parameter 'data' is to be an instance of class 'exprSet'.

          Possible values for 'method' are described below, under
          "Machine learning resources available".

          Parameter 'trainInd' defines the indices of the records in
          the input dataset that are used for training; remaining
          records are used as a test dataset for evaluation of the
          fitted learner.

          Any additional parameters to be set for 'method' can be
          passed in after 'trainInd'.  For example, if "nnet" is
          supplied as 'method', the parameter 'size' must be set and
          passed in.
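
          As a conceptual analogue in base R, the sketch below pairs a
          features-by-samples matrix with a phenotype table and uses a
          character string to name the response, as the 'formula'
          argument does for this signature.  The toy objects 'exprs',
          'pheno' and 'samples' are assumptions made for illustration;
          they are not 'exprSet' instances and this is not the code the
          package runs.

```r
set.seed(2)                                    # arbitrary seed, for reproducibility

# Toy stand-ins: rows are features, columns are samples.
exprs <- matrix(rnorm(5 * 8), nrow = 5,
                dimnames = list(paste0("g", 1:5), paste0("s", 1:8)))
pheno <- data.frame(ALL.AML = factor(rep(c("ALL", "AML"), each = 4)),
                    row.names = colnames(exprs))

# The response is identified by name, and is expected to be a factor.
responseName <- "ALL.AML"
stopifnot(is.factor(pheno[[responseName]]))

# One row per sample: the features plus the named response.
samples <- data.frame(t(exprs), pheno[responseName])

# Training records by index; the remaining samples form the test set.
trainInd <- c(1:3, 5:7)
train <- samples[trainInd, ]
test  <- samples[-trainInd, ]
```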

_V_a_l_u_e:

     An instance of class 'MLOutput-class'.

_M_a_c_h_i_n_e _l_e_a_r_n_i_n_g _r_e_s_o_u_r_c_e_s _a_v_a_i_l_a_b_l_e:

     Here we provide links to tools that may be identified in the
     'method' parameter.  Just use a string naming the method.  For
     each method, we may have a "Do not pass parameters" clause,
     because the interface constructs values of these parameters on the
     basis of parameters set in the call to 'MLearn'.  You may (and in
     some cases must) set and pass parameters not listed in the "Do
     not pass" list.

_E_x_a_m_p_l_e_s:

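     library(MLInterfaces)
     data(iris)
     # 45 randomly chosen training rows; the remaining 105 form the test set
     tinds <- sample(1:150, 45)
     # 'size' is required by nnet and is passed through '...'
     MLearn(Species~., data=iris, method="nnet", trainInd=tinds, size=4, decay=.01 )
     MLearn(Species~., data=iris, method="knn", trainInd=tinds )
     rfdemo <- MLearn(Species~., data=iris, method="randomForest", trainInd=tinds, importance=TRUE )
     plot(getVarImp(rfdemo))
     # genomics examples: first 50 features, samples 1 to 36 used for training
     library(golubEsets)
     MLearn("ALL.AML", golubMerge[1:50,], "rpart", 1:36 )
     MLearn("ALL.AML", golubMerge[1:50,], "knn", 1:36 )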

