xvalLoop            package:MLInterfaces            R Documentation

_C_r_o_s_s-_v_a_l_i_d_a_t_i_o_n _i_n _c_l_u_s_t_e_r_e_d _c_o_m_p_u_t_i_n_g _e_n_v_i_r_o_n_m_e_n_t_s

_D_e_s_c_r_i_p_t_i_o_n:

     Control how 'xval' runs its cross-validation loop, so that the
     loop can be distributed across a clustered computing environment.

_U_s_a_g_e:

     xvalLoop( cluster, ... )

_A_r_g_u_m_e_n_t_s:

 cluster: Any S4-class object, used to indicate how to perform
          clustered computations.

     ...: Additional arguments used to inform the clustered
          computation.

_D_e_t_a_i_l_s:

     Cross-validation usually involves repeated calls to the same
     function, each with different arguments. This makes it a natural
     place to use clustered computers to speed up execution. The
     method 'xval' is structured to exploit this; 'xvalLoop' provides
     an easy mechanism to change how 'xval' performs cross-validation.

     The idea is to write an 'xvalLoop' method that returns a function.
     'xval' then uses that function to execute the cross-validation.
     For instance, the default method returns the function 'lapply',
     so the cross-validation is performed by using 'lapply'. A
     different method might return an lapply-like function that sends
     different parts of the computation to different computer nodes.
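
     The dispatch pattern described above can be sketched without a
     real cluster. The following is only an illustration under stated
     assumptions, not the MLInterfaces implementation: 'toyLoop',
     'ToyCluster' and 'nChunks' are hypothetical names, and the
     "nodes" are simulated by processing chunks of the input one
     after another on the local machine.

```r
library(methods)

## Hypothetical stand-in for the real generic; it is named 'toyLoop' so
## that it does not mask MLInterfaces::xvalLoop.
setGeneric("toyLoop", function(cluster, ...) standardGeneric("toyLoop"))

## Default method: no special cluster object, so return plain lapply.
setMethod("toyLoop", signature(cluster = "ANY"),
          function(cluster, ...) lapply)

## Hypothetical cluster-like class; 'nChunks' plays the role of a node count.
setClass("ToyCluster", representation(nChunks = "integer"))

## Cluster method: return an lapply-like function that splits the work
## into contiguous chunks, one per pretend node, and recombines the results.
setMethod("toyLoop", signature(cluster = "ToyCluster"),
          function(cluster, ...) {
              function(X, FUN, ...) {
                  groups <- sort(rep_len(seq_len(cluster@nChunks), length(X)))
                  chunks <- split(X, groups)
                  ## A real method would send each chunk to a node here.
                  parts <- lapply(chunks, function(chunk) lapply(chunk, FUN, ...))
                  do.call(c, unname(parts))
              }
          })

## The caller only asks for a loop function and then uses it like lapply.
square <- function(i) i^2
serialLoop  <- toyLoop(NULL)
chunkedLoop <- toyLoop(new("ToyCluster", nChunks = 2L))
serialResult  <- serialLoop(1:5, square)
chunkedResult <- chunkedLoop(1:5, square)
```

     Because both methods return a function with the same calling
     convention as 'lapply', the calling code does not need to know
     which one it received; only the object passed as 'cluster'
     changes.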

     An accompanying vignette illustrates the technique in greater
     detail. An effective division of labor is for experienced cluster
     programmers to write lapply-like methods for their favored
     clustering environment. The user then only has to add the cluster
     object to the list of arguments to 'xval' to get clustered
     calculations.

_V_a_l_u_e:

     A function taking arguments like those for 'lapply'; 'xval' uses
     it to execute the cross-validation.

_E_x_a_m_p_l_e_s:

     ## Not run: 
     library(golubEsets)
     data(golubMerge)
     smallG <- golubMerge[200:250,]

     # Evaluation on one node

     lk1 <- xval(smallG, "ALL.AML", knnB, xvalMethod="LOO", group=as.integer(0))
     table(lk1,smallG$ALL.AML)

     # Evaluation on several nodes -- a cluster programmer might write the following...

     library(snow)
     setOldClass("spawnedMPIcluster")

     setMethod("xvalLoop", signature( cluster = "spawnedMPIcluster"),
     ## use the function returned below to evaluate
     ## the central cross-validation loop in xval
     function( cluster, ... ) {
         clusterExportEnv <- function (cl, env = .GlobalEnv)
         {
             unpackEnv <- function(env) {
                 for ( name in ls(env) ) assign(name, get(name, env), .GlobalEnv )
                 NULL
             }
             clusterCall(cl, unpackEnv, env)
         }
         function(X, FUN, ...) { # this gets returned to xval
             ## send all visible variables from the parent (i.e., xval) frame
             clusterExportEnv( cluster, parent.frame(1) )
             parLapply( cluster, X, FUN, ... )
         }
     })

     # ... and use the cluster like this...

     cl <- makeCluster(2, "MPI")
     clusterEvalQ(cl, library(MLInterfaces))

     lk1 <- xval(smallG, "ALL.AML", knnB, xvalMethod="LOO", group=as.integer(0), cluster = cl)
     table(lk1,smallG$ALL.AML)
     ## End(Not run)

