iterateBMAglm.wrapper      package:iterativeBMA      R Documentation

_I_t_e_r_a_t_i_v_e _B_a_y_e_s_i_a_n _M_o_d_e_l _A_v_e_r_a_g_i_n_g

_D_e_s_c_r_i_p_t_i_o_n:

     This function repeatedly calls 'bic.glm' from the 'BMA' package
     until all variables are exhausted. The data is assumed to consist
     of two classes. Logistic regression is used for classification.

_U_s_a_g_e:

     iterateBMAglm.wrapper (sortedA, y, nbest=10, maxNvar=30, maxIter=20000, thresProbne0=1) 

_A_r_g_u_m_e_n_t_s:

 sortedA: data matrix where columns are variables and rows are 
          observations.  The variables (columns) are assumed to be
          sorted using a univariate measure. In the case of gene
          expression data, the columns (variables)  represent genes,
          while the rows (observations) represent  samples or
          experiments.

       y: class vector for the observations (samples or  experiments)
          in the training data.  Class numbers are assumed to start
          from 0, and the length of this class vector should be equal
          to the number of rows in sortedA. Since we assume 2-class
          data, we expect the class vector consists of zero's and
          one's.

   nbest: a number specifying the number of models of each size 
          returned to 'bic.glm' in the 'BMA' package.  The default is
          10.

 maxNvar: a number indicating the maximum number of variables used in
          each iteration of 'bic.glm' from the 'BMA' package. The
          default is 30.

 maxIter: a number indicating the maximum of iterations of  'bic.glm'.
          The default is 20000.

thresProbne0: a number specifying the threshold for the posterior
          probability that each variable (gene) is non-zero (in
          percent).  Variables (genes) with such posterior  probability
          less than this threshold are dropped in the iterative
          application of 'bic.glm'.  The default is 1 percent.

_D_e_t_a_i_l_s:

     In this function, the variables are assumed to be sorted, and
     'bic.glm' is called repeatedly.  In the first application of the
     'bic.glm' algorithm, the top 'maxNvar' univariate ranked genes are
     used.  After each application of the 'bic.glm' algorithm, the
     genes with 'probne0' < 'thresProbne0' are dropped, and the next
     univariate ordered genes are added to the BMA window. The function
     'iterateBMAglm.train' calls 'BssWssFast' before calling this
     function. Using this function, users can experiment with
     alternative  univariate measures.

_V_a_l_u_e:

     If all variables are exhausted, an object of class  'bic.glm'
     returned by the last iteration of 'bic.glm'. Otherwise, -1 is
     returned.  The object of class 'bic.glm' is a list consisting  of
     the following components: 

  namesx: the names of the variables in the last iteration of 
          'bic.glm'.

postprob: the posterior probabilities of the models selected.

deviance: the estimated model deviances.

   label: labels identifying the models selected.

     bic: values of BIC for the models.

    size: the number of independent variables in each of the models.

   which: a logical matrix with one row per model and one column per 
          variable indicating whether that variable is in the model.

 probne0: the posterior probability that each variable is non-zero  (in
          percent).

postmean: the posterior mean of each coefficient (from model
          averaging).

  postsd: the posterior standard deviation of each coefficient  (from
          model averaging).

condpostmean: the posterior mean of each coefficient conditional on 
          the variable being included in the model.

condpostsd: the posterior standard deviation of each coefficient 
          conditional on the variable being included in the model.

     mle: matrix with one row per model and one column per variable
          giving  the maximum likelihood estimate of each coefficient
          for each model.

      se: matrix with one row per model and one column per variable
          giving  the standard error of each coefficient for each
          model.

 reduced: a logical indicating whether any variables were dropped 
          before model averaging.

 dropped: a vector containing the names of those variables dropped 
          before model averaging.

    call: the matched call that created the bma.lm object.

_N_o_t_e:

     The 'BMA' and 'Biobase' packages are required.

_R_e_f_e_r_e_n_c_e_s:

     Raftery, A.E. (1995).  Bayesian model selection in social research
     (with Discussion). Sociological Methodology 1995 (Peter V.
     Marsden, ed.), pp. 111-196, Cambridge, Mass.: Blackwells.

     Yeung, K.Y., Bumgarner, R.E. and Raftery, A.E. (2005)  Bayesian
     Model Averaging: Development of an improved multi-class, gene
     selection and classification tool for microarray data. 
     Bioinformatics 21: 2394-2402.

_S_e_e _A_l_s_o:

     'iterateBMAglm.train',   'iterateBMAglm.train.predict',
     'iterateBMAglm.train.predict.test', 'BssWssFast'

_E_x_a_m_p_l_e_s:

     library (Biobase)
     library (BMA)
     library (iterativeBMA)
     data(trainData)
     data(trainClass)

     ## Use the BSS/WSS ratio to rank all genes in the training data
     sorted.vec <- BssWssFast (t(exprs(trainData)), trainClass, numClass = 2)
     ## get the top ranked 50 genes
     sorted.train.dat <- t(exprs(trainData[sorted.vec$ix[1:50], ]))
      
     ## run iterative bic.glm
     ret.bic.glm <- iterateBMAglm.wrapper (sorted.train.dat, y=trainClass)

     ## The above commands are equivalent to the following 
     ret.bic.glm <- iterateBMAglm.train (train.expr.set=trainData, trainClass, p=50)

