glad                  package:GLAD                  R Documentation

_A_n_a_l_y_s_i_s _o_f _a_r_r_a_y _C_G_H _d_a_t_a

_D_e_s_c_r_i_p_t_i_o_n:

     This function detects breakpoints in genomic profiles obtained
     by array CGH technology and assigns a status (gain, normal or
     loss) to each clone.

_U_s_a_g_e:

     glad.profileCGH(profileCGH, mediancenter=FALSE,
                     smoothfunc="lawsglad", bandwidth=10, round=1.5,
                     model="Gaussian", lkern="Exponential", qlambda=0.999,
                     base=FALSE, sigma,
                     lambdabreak=8, lambdacluster=8, lambdaclusterGen=40,
                     type="tricubic", param=c(d=6),
                     alpha=0.001, msize=5,
                     method="centroid", nmax=8,
                     verbose=FALSE, ...)

_A_r_g_u_m_e_n_t_s:

profileCGH: Object of class 'profileCGH'

mediancenter: If 'TRUE', LogRatio values are centered on their median.

smoothfunc: Type of algorithm used to smooth 'LogRatio' by a piecewise
          constant function. Choose either 'lawsglad', 'aws' or 'laws'.

bandwidth: Set the maximal bandwidth 'hmax' in the 'aws' or 'laws'
          function. For example, if 'bandwidth=10' then the 'hmax'
          value is set to 10*X_N where X_N is the position of the last
          clone.

   round: The smoothing results are rounded or not depending on the
          'round' argument. The 'round' value is passed to the argument
          'digits' of the 'round' function.

   model: Determines the distribution type of the LogRatio. Always keep
          the model as "Gaussian" (see 'laws').

   lkern: Determines the location kernel to be used (see 'aws' or
          'laws').

 qlambda: Determines the scale parameter for the stochastic penalty
          (see 'aws' or 'laws').

    base: If 'TRUE', the position of a clone is its physical position
          on the chromosome; otherwise the rank position is used.

   sigma: Value to be passed to either the 'sigma2' argument of the
          'aws' function or the 'shape' argument of 'laws'. If 'NULL',
          sigma is calculated from the data.

lambdabreak: Penalty term (lambda') used during the *Optimization of
          the number of breakpoints* step.

lambdacluster: Penalty term (lambda*) used during the *MSHR clustering
          by chromosome* step.

lambdaclusterGen: Penalty term (lambda*) used during the *HCSR
          clustering throughout the genome* step.

    type: Type of kernel function used in the penalty term during the
          *Optimization of the number of breakpoints* step, the *MSHR
          clustering by chromosome* step and the *HCSR clustering
          throughout the genome* step.

   param: Parameter of kernel used in the penalty term.

   alpha: Risk alpha used for the *Outlier detection* step.

   msize: MAD outliers are calculated on regions with a cardinality
          greater than or equal to 'msize'.

  method: The agglomeration method to be used during the *MSHR
          clustering by chromosome* and the *HCSR clustering throughout
          the genome* clustering steps.

    nmax: Maximum number of clusters (N*max) allowed during the
          *MSHR clustering by chromosome* and the *HCSR clustering
          throughout the genome* clustering steps.

 verbose: If 'TRUE', some information is printed.

     ...: 
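
     As a rough illustration of what 'mediancenter' and 'base' mean, the
     sketch below uses base R on a small made-up data frame. The values
     and the column names 'Centered' and 'PosOrder' in this toy object
     are assumptions for illustration only; this is not the code used
     inside 'glad'.

```r
# Toy data: hypothetical values, not taken from the snijders dataset.
toy <- data.frame(
  PosBase  = c(1200, 5400, 9100, 20500, 48000),
  LogRatio = c(0.10, 0.30, 0.20, 0.90, 0.25)
)

# mediancenter = TRUE: subtract the median so the profile is centered on 0.
toy$Centered <- toy$LogRatio - median(toy$LogRatio)

# base = FALSE: work with the rank of each clone rather than its
# physical position on the chromosome.
toy$PosOrder <- rank(toy$PosBase, ties.method = "first")

print(toy)
```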

_D_e_t_a_i_l_s:

     The function 'glad' implements the methodology described in the
     article: Analysis of array CGH data: from signal ratio to gain
     and loss of DNA regions (Hupé et al., Bioinformatics 2004
     20(18):3413-3422).

     The principle of the GLAD algorithm: First, the detection of
     breakpoints is based on the estimation of a piecewise constant
     function with the Adaptive Weights Smoothing (AWS) procedure
     (Polzehl and Spokoiny, 2002). Then, a procedure based on penalized
     maximum likelihood optimizes the number of breakpoints, which
     allows undesirable breakpoints to be removed. Finally, based on
     the regions previously identified, a two-step unsupervised
     classification (*MSHR clustering by chromosome* and the *HCSR
     clustering throughout the genome*) with model selection criteria
     allows a status to be assigned for each region (gain, loss or
     normal).
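
     To make the link between a piecewise constant function and
     breakpoints concrete, here is a minimal base R sketch. It assumes
     the smoothed values are already available for a single chromosome
     (the vector 'smoothing' below is made up) and does not reproduce
     the AWS procedure or the penalized likelihood step.

```r
# Hypothetical smoothed profile: piecewise constant over 8 clones.
smoothing <- c(0, 0, 0, 0.5, 0.5, -0.4, -0.4, -0.4)

# A breakpoint is flagged at the last position of each constant run,
# i.e. where the next value differs. The final clone is never flagged here.
breakpoints <- c(as.integer(diff(smoothing) != 0), 0L)

# Region labels start at 1 and increase by one after each breakpoint.
region <- cumsum(c(1L, head(breakpoints, -1)))

print(data.frame(smoothing, breakpoints, region))
```

     With several chromosomes, the same labelling would have to restart
     the run detection at each chromosome boundary.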

     Main parameters to be tuned:

       'qlambda'           If you want the smoothing to fit some very local effect, choose a smaller 'qlambda'.
       'bandwidth'         Choose a bandwidth that is not too small, otherwise you will get many small discontinuities.
       'lambdabreak'       The higher the parameter, the higher the number of undesirable breakpoints.
       'lambdacluster'     The higher the parameter, the more the regions within a chromosome are assumed to belong to the same cluster.
       'lambdaclusterGen'  The higher the parameter, the more the regions over the whole genome are assumed to belong to the same cluster.

_V_a_l_u_e:

     An object of class "profileCGH" with the following attributes:

profileValues: a data.frame with the following added information:


        *_S_m_o_o_t_h_i_n_g* The smoothing values correspond to the median of
             each *MSHR (i.e. 'Region').*

        *_B_r_e_a_k_p_o_i_n_t_s* The last position of a region with identical
             amount of DNA is flagged by 1 otherwise it is 0. Note that
             during the "Optimization of the number of breakpoints"
             step, removed breakpoints are flagged by -1.

        *_R_e_g_i_o_n* Each position between two breakpoints are labelled the
             same way with an integer value starting from one. The
             label is incremented by one when a new breakpoints occurs
             or when moving to the next chromosome. The variable
             'region' is what we call MSHR.

        *_L_e_v_e_l* Each position with equal smoothing value are labelled
             the same way with an integer value starting from one. The
             label is incremented by one when a new level occurs or
             when moving to the next chromosome.

        *_O_u_t_l_i_e_r_s_A_w_s* Each AWS outliers are flagged by -1 or 1 
             otherwise  it is 0.

        *_O_u_t_l_i_e_r_s_M_a_d* Each MAD outliers are flagged by -1 (if it is in
             the alpha/2 lower tail of the distribution) or 1 (if it is
             in the alpha/2 upper tail of the distribution) otherwise 
             it is 0.

        *_O_u_t_l_i_e_r_s_T_o_t* OutliersAws + OutliersMad.

        *_Z_o_n_e_C_h_r* Clusters identified after *MSHR (i.e. 'Region')
             clustering by chromosome*.

        *_Z_o_n_e_G_e_n* Clusters identified after *HCSR clustering throughout
             the genome*.

        *_Z_o_n_e_G_N_L* Status of each clone : Gain is coded by 1, Loss by -1
             and Normal by 0.
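
     The -1/0/1 coding of MAD outliers can be sketched as follows. This
     is only an illustration under stated assumptions: the helper
     'flag_mad_outliers' is hypothetical, and the choice of normal
     quantiles around the region median scaled by 'mad()' is one
     plausible reading of the alpha/2 tails, not necessarily the rule
     applied inside 'glad'.

```r
# Hypothetical helper: flag values in the lower/upper alpha/2 tails of a
# normal distribution centered on the region median with spread mad(x).
flag_mad_outliers <- function(x, alpha = 0.001, msize = 5) {
  flags <- integer(length(x))
  # Regions with fewer than msize clones are not tested.
  if (length(x) < msize) return(flags)
  centre <- median(x)
  spread <- mad(x)              # scaled MAD, comparable to sd under normality
  z <- qnorm(1 - alpha / 2)     # two-sided normal quantile
  flags[x < centre - z * spread] <- -1L
  flags[x > centre + z * spread] <- 1L
  flags
}

# Made-up LogRatio values for one region, with two extreme clones.
region_values <- c(0.02, -0.01, 0.03, 0.00, -0.02, 0.01, 1.50, -1.20)
flag_mad_outliers(region_values)
```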

BkpInfo: the data.frame attribute 'BkpInfo', which gives the list of
          breakpoints:

        *_P_o_s_O_r_d_e_r* The rank position of each clone on the genome.

        *_P_o_s_B_a_s_e* The base position of each clone on the genome.

        *_C_h_r_o_m_o_s_o_m_e* Chromosome name.

SigmaC: the data.frame attribute 'SigmaC', which gives the estimation of
          the LogRatio standard deviation for each chromosome:

        *_C_h_r_o_m_o_s_o_m_e* Chromosome name.

        *_V_a_l_u_e* The estimation is based on the Inter Quartile Range.

_N_o_t_e:

     People interested in tools dealing with array CGH analysis can
     visit our web-page <URL: http://bioinfo.curie.fr>.

_A_u_t_h_o_r(_s):

     Philippe Hupé, glad@curie.fr.

_S_e_e _A_l_s_o:

     'profileCGH', 'as.profileCGH', 'plotProfile'.

_E_x_a_m_p_l_e_s:

     data(snijders)

     ### Creation of "profileCGH" object
     gm13330$Clone <- gm13330$BAC
     profileCGH <- as.profileCGH(gm13330)


     ###########################################################
     ###
     ###  glad function as described in Hupé et al. (2004)
     ###
     ###########################################################

     res <- glad(profileCGH, mediancenter=FALSE,
                     smoothfunc="lawsglad", bandwidth=10, round=1.5,
                     model="Gaussian", lkern="Exponential", qlambda=0.999,
                     base=FALSE,
                     lambdabreak=8, lambdacluster=8, lambdaclusterGen=40,
                     type="tricubic", param=c(d=6),
                     alpha=0.001, msize=5,
                     method="centroid", nmax=8,
                     verbose=FALSE)

     ### Genomic profile on the whole genome
     plotProfile(res, unit=3, Bkp=TRUE, labels=FALSE, Smoothing="Smoothing",
     main="Breakpoints detection: GLAD analysis")

     ### Genomic profile for chromosome 1
     plotProfile(res, unit=3, Bkp=TRUE, labels=TRUE, Chromosome=1,
     Smoothing="Smoothing", main="Chromosome 1: GLAD analysis")

     ### The standard deviations of LogRatio are:
     res$SigmaC

     ### The list of breakpoints is:
     res$BkpInfo

