nudge1                 package:nudge                 R Documentation

_F_u_n_c_t_i_o_n _f_o_r _n_o_r_m_a_l_i_z_i_n_g _d_a_t_a, _f_i_t_t_i_n_g _a _n_o_r_m_a_l-_u_n_i_f_o_r_m _m_i_x_t_u_r_e _a_n_d _e_s_t_i_m_a_t_i_n_g _p_r_o_b_a_b_i_l_i_t_i_e_s _o_f _d_i_f_f_e_r_e_n_t_i_a_l _e_x_p_r_e_s_s_i_o_n _i_n _t_h_e _c_a_s_e _w_h_e_r_e _t_h_e _t_w_o _s_a_m_p_l_e_s _a_r_e _b_e_i_n_g _c_o_m_p_a_r_e_d _d_i_r_e_c_t_l_y

_D_e_s_c_r_i_p_t_i_o_n:

     After a mean and variance normalization, a two-component mixture
     model is fitted to the data. The normal component represents the
     genes that are not differentially expressed and the uniform
     component represents the genes that are differentially expressed.
     Posterior probabilities for differential expression are computed
     from the fitted model.
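
     The fitting step described above can be illustrated with a minimal
     EM sketch in base R. This is not the package's source code: the
     simulated data, the starting values and the names 'p0', 's' and
     'dunif0' are assumptions made for illustration, and the stopping
     rule is only one plausible reading of a relative convergence check
     on the log likelihood.

```r
# Illustrative EM for a normal-uniform mixture (not the nudge implementation).
set.seed(1)
# Simulated "normalized log ratios": 900 null genes, 100 spread-out genes.
x <- c(rnorm(900, mean = 0, sd = 1), runif(100, min = -8, max = 8))

a <- min(x); b <- max(x)        # uniform component spans the data range
dunif0 <- 1 / (b - a)           # constant uniform density on [a, b]

# Hypothetical starting values.
p0 <- 0.5                       # mixing probability of the normal component
mu <- mean(x); s <- sd(x)       # mean and standard deviation of the normal component
tol <- 1e-5; iterlim <- 500
loglik_old <- -Inf

for (iter in seq_len(iterlim)) {
  # E-step: posterior probability of the normal (not differentially expressed) component.
  f_norm <- p0 * dnorm(x, mu, s)
  z0 <- f_norm / (f_norm + (1 - p0) * dunif0)

  # M-step: weighted updates of the mixing probability, mean and standard deviation.
  p0 <- mean(z0)
  mu <- sum(z0 * x) / sum(z0)
  s  <- sqrt(sum(z0 * (x - mu)^2) / sum(z0))

  # Stop when the relative change in the log likelihood is below tol.
  loglik <- sum(log(p0 * dnorm(x, mu, s) + (1 - p0) * dunif0))
  if (abs(loglik - loglik_old) < tol * abs(loglik)) break
  loglik_old <- loglik
}

# Posterior probability of differential expression under the fitted parameters.
f_norm <- p0 * dnorm(x, mu, s)
pdiff <- 1 - f_norm / (f_norm + (1 - p0) * dunif0)
```

     In this sketch, genes far from the bulk of the data receive a
     posterior probability close to 1, while genes near the fitted mean
     receive a value close to 0.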

_U_s_a_g_e:

     nudge1(logratio, logintensity, dye.swap = FALSE, span1 = 0.6, span2 = 0.2,
            quant = 0.99, z = NULL, tol = 0.00001, iterlim = 500)

_A_r_g_u_m_e_n_t_s:

logratio: A matrix or vector of log (base 2) ratios of intensity
          expressions in 2 samples, with rows indexing genes and
          columns (if necessary) indexing replicates.

logintensity: A matrix or vector of log (base 2) total intensities
          (total intensity being defined as the product of the
          intensities) in 2 samples, with rows indexing genes and
          columns (if necessary) indexing replicates.

dye.swap: A logical value indicating whether the data come from a
          balanced dye-swap experiment. Only used for multiple
          replicate experiments.

   span1: Proportion of data used to fit the loess regression of the
          (average-across-replicates) log ratios on the
          (average-across-replicates) log total intensities for the
          mean normalization.

   span2: Proportion of data used to fit the loess regression of the
          absolute (mean normalized) log ratios on the log total
          intensities for the variance normalization. Only used for
          single replicate experiments.

   quant: Quantile to be used from the distribution of standard
          deviations of log ratios across replicates for all genes
          whose standard deviation was smaller than their absolute
          (mean normalized) average-across-replicates log ratio. Only
          used for multiple replicate experiments.

       z: An optional 2-column matrix with each row giving a starting
          estimate for the probability of the gene (in the
          corresponding row of the log ratio matrix/vector) not being
          differentially expressed and a starting estimate for the
          probability of the gene being differentially expressed. Each
          row should add up to 1.

     tol: A scalar tolerance for relative convergence of the log
          likelihood.

 iterlim: The maximum number of iterations of the EM algorithm.
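
     As a rough illustration of the shapes these arguments expect, the
     sketch below builds 'logratio', 'logintensity' and a starting 'z'
     matrix from simulated intensities. The data, the names 'sample_i',
     'sample_j' and 'p_de', and the starting probability of 0.1 are all
     hypothetical. The column order of 'z' (not differentially
     expressed first) follows the wording of the argument description.

```r
# Hypothetical intensities for 6 genes in 2 replicates of each sample.
set.seed(42)
n <- 6
sample_i <- matrix(2^runif(n * 2, 6, 12), nrow = n)
sample_j <- matrix(2^runif(n * 2, 6, 12), nrow = n)

# Rows index genes, columns index replicates.
logratio     <- log2(sample_i) - log2(sample_j)   # log2(sample_i / sample_j)
logintensity <- log2(sample_i) + log2(sample_j)   # log2(sample_i * sample_j)

# Starting estimates: column 1 = P(not DE), column 2 = P(DE); rows sum to 1.
p_de <- rep(0.1, n)
z <- cbind(1 - p_de, p_de)
```

     These objects could then be passed on as 'nudge1(logratio,
     logintensity, z = z)', assuming the package is installed.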

_D_e_t_a_i_l_s:

     A balanced dye swap is where a certain number of replicates have a
     particular dye-to-sample assignment and the same number of other
     replicates have the reversed assignment. Note that in this case
     log ratios should be taken with numerators being the same sample
     and denominators the other sample, i.e. ratios should always be
     sample i/sample j rather than red dye/green dye for all replicates.
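
     The orientation rule above can be sketched in base R as follows.
     The intensities, the design vector 'i_on_red' and the other
     variable names are hypothetical. The sketch assumes that sample i
     is on the red dye in replicates 1-2 and on the green dye in
     replicates 3-4.

```r
# Hypothetical red and green channel intensities: 5 genes, 4 replicates.
set.seed(7)
n <- 5
red   <- matrix(2^runif(n * 4, 6, 12), nrow = n)
green <- matrix(2^runif(n * 4, 6, 12), nrow = n)

# Assumed design: TRUE where sample i was labelled with the red dye.
i_on_red <- c(TRUE, TRUE, FALSE, FALSE)

# Red/green log ratios, then flip the sign of the swapped replicates so that
# every column is log2(sample i / sample j).
lR_dye <- log2(red) - log2(green)
lR <- sweep(lR_dye, 2, ifelse(i_on_red, 1, -1), `*`)

# The log total intensity is a sum, so it is unaffected by the swap.
lI <- log2(red) + log2(green)
```

     With the ratios oriented this way, a call such as 'nudge1(lR, lI,
     dye.swap = TRUE)' would see sample i in the numerator for every
     replicate.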

_V_a_l_u_e:

     A list including the following components:

  pdiff : A vector with the estimated posterior probabilities of being
          in the group of differentially expressed genes.

 lRnorm : A vector with the normalized (average-across-replicates) log
          ratios.

     mu : The estimated mean of the group of genes that are not
          differentially expressed.

  sigma : The estimated variance of the group of genes that are not
          differentially expressed.

 mixprob: The prior/mixing probability of a gene being in the group of
          genes that are not differentially expressed.

      a : The minimum value of the normalized data.

      b : The maximum value of the normalized data.

loglike : The log likelihood for the fitted mixture model.

   iter : The number of iterations run by the EM algorithm until either
          convergence or the iteration limit was reached.
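
     One possible way of using the returned list is to flag genes whose
     posterior probability exceeds a cutoff and rank them. The 'result'
     object below is a hand-made stand-in with invented values, not
     actual 'nudge1' output, and the 0.5 cutoff is an arbitrary choice
     for illustration.

```r
# Hypothetical stand-in for two components of the list returned by nudge1().
result <- list(pdiff  = c(0.02, 0.91, 0.40, 0.99, 0.10),
               lRnorm = c(0.1, 2.3, -0.8, -3.1, 0.2))

# Genes whose posterior probability of differential expression exceeds 0.5.
de <- which(result$pdiff > 0.5)

# Rank the flagged genes from most to least probable.
ranked <- de[order(result$pdiff[de], decreasing = TRUE)]

# Direction of change for the flagged genes, from the normalized log ratios.
direction <- sign(result$lRnorm[ranked])
```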

_A_u_t_h_o_r(_s):

     N. Dean and A. E. Raftery

_R_e_f_e_r_e_n_c_e_s:

     N. Dean and A. E. Raftery (2005). Normal uniform mixture
     differential gene expression detection for cDNA microarrays.  BMC
     Bioinformatics. 6, 173-186. 

     <URL: http://www.biomedcentral.com/1471-2105/6/173>

     S. Dudoit, Y. H. Yang, M. Callow and T. Speed (2002). Statistical
     methods for identifying differentially expressed genes in
     replicated cDNA microarray experiments. Stat. Sin. 12, 111-139.

_S_e_e _A_l_s_o:

     'nudge2', 'norm1a', 'norm1b', 'norm1c', 'norm1d', 'norm2c', 'norm2d'

_E_x_a_m_p_l_e_s:

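     library(nudge)

     # Single-replicate comparison: columns 1 and 2 hold the two samples.
     data(like)
     lR <- log(like[, 1], 2) - log(like[, 2], 2)   # log (base 2) ratios
     lI <- log(like[, 1], 2) + log(like[, 2], 2)   # log (base 2) total intensities

     result <- nudge1(lR, lI)

     # Multiple replicates: columns 1-4 are compared with columns 5-8.
     data(hiv)
     lR <- log(hiv[, 1:4], 2) - log(hiv[, 5:8], 2)
     lI <- log(hiv[, 1:4], 2) + log(hiv[, 5:8], 2)

     # dye.swap = TRUE treats the replicates as a balanced dye swap.
     result <- nudge1(lR, lI, dye.swap = TRUE)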

