plotOnGene          package:GeneRegionScan          R Documentation

_P_l_o_t _p_r_o_b_e _l_e_v_e_l _d_a_t_a _o_n _a _g_e_n_e

_D_e_s_c_r_i_p_t_i_o_n:

     Function that will investigate the probe level intensity of probes
     as a function of their location in a gene.

_U_s_a_g_e:

         plotOnGene(object, gene, probeData=NULL, label=NULL, genename=NULL, summaryType="median",
         interval=NULL, yMax=NULL, testType=NULL, forcePValue=FALSE, verbose=TRUE, cutoff=0.2, directions="all")

_A_r_g_u_m_e_n_t_s:

  object: A ProbeLevelSet object or a regular ExpressionSet object (in
          which case a probeData argument is required). See
          getLocalProbeIntensities and related functions on how to
          create a ProbeLevelSet.

    gene: One or more gene sequences, given as a DNAString, a vector of
          DNAStrings, a character vector or readFASTA output.

probeData: Optional if a ProbeLevelSet is submitted as the object
          argument. Otherwise, it must be a data frame with rownames
          corresponding to the featureNames of the ExpressionSet and a
          column named "sequence" with the probe sequences as character
          strings.

   label: An optional character string specifying a column name in the
          pData of the object. If this argument is given, the gene plot
          will be colour coded based on the different groups (factors)
          in the pData entry. If a summaryType other than 'dots' is
          selected the summarisation is done stratified by the
          different groups in the pData. It can be a numeric or integer
          entry, but it will be coerced to factors.

genename: Optional character string specifying a gene name to include
          in the plot. If not included and a FASTA sequence is given,
          it will default to the name in the FASTA sequence. Otherwise,
          it will default to 'Unknown genename'.

summaryType: Character string specifying one of the following summary
          methods: 'median', 'mean', 'quartiles' or 'dots' (i.e. no
          summary). Specifies how the values of all samples, or of all
          samples in each group if 'label' is given, should be
          summarised. Defaults to 'median'.

interval: Optional vector of two integers giving bp positions. If given,
          the plot will only include the part of the gene sequence in
          the given interval. The x-axis annotation is preserved from
          the original, so this is useful for zooming in on specific
          regions.

    yMax: Optional integer. If given, this value will be the maximal
          value on the y-axis. This is useful if a few outlier probes
          have very high intensity values, as the default is to set the
          yMax to the maximal intensity value.

testType: Optional character string defining a statistical procedure to
          identify especially interesting probes. Can be one of 'linear
          model', 'students' or 'wilcoxons'. If given, a label must
          also be specified. In this case the plotStatistics function
          will be called and probes that are significantly changed
          between the groups in label at the P-value set in cutoff (see
          cutoff argument) will be circled.

forcePValue: Logical. Only used if the testType argument is given. If
          TRUE, all significantly changed probes have their P-value
          written on the plot. If FALSE, P-values are only written on
          plots with fewer than 10 significant probes. Plots can become
          very cluttered if set to TRUE.

 verbose: Logical (TRUE or FALSE). Defaults to TRUE.

  cutoff: Numeric value specifying the P-value below which probes should
          be circled when using the 'testType' argument. Defaults to
          0.2. For cutoffs higher than 0.05, probes with P > 0.05 (but
          below the cutoff) will be circled in grey instead of black.

directions: A character vector of the matching-directions that should
          be scanned (which combinations of complementary and reverse).
          Defaults to "all" which is shorthand for all possible
          directions, but can take anything from:
          c("matchForwardSense", "matchForwardAntisense",
          "matchReverseSense", "matchReverseAntisense")

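     The layout expected for probeData and the idea of scanning several
     matching directions can be sketched in base R. This is only an
     illustration under stated assumptions: the sequences, the probe
     IDs and the helper functions reverse_seq, complement_seq and
     find_probe are invented here and are not part of GeneRegionScan,
     and no claim is made about which of the four 'directions' names
     corresponds to which orientation below.

```r
# Hypothetical probeData-style data frame: rownames are probe IDs and the
# "sequence" column holds the probe sequences (all values are made up).
probeData <- data.frame(
  sequence = c("ACGTTG", "CAACGT", "GGGGGG"),
  row.names = c("probe1", "probe2", "probe3"),
  stringsAsFactors = FALSE
)

# Made-up gene sequence used only for this illustration.
gene <- "TTACGTTGCC"

# Helper functions (assumptions for this sketch, not package functions).
reverse_seq <- function(x) paste(rev(strsplit(x, "")[[1]]), collapse = "")
complement_seq <- function(x) chartr("ACGT", "TGCA", x)

# Return the 1-based start position of each orientation of a probe in the
# gene, or NA when that orientation does not occur.
find_probe <- function(probe, gene) {
  orientations <- c(
    as_given           = probe,
    reversed           = reverse_seq(probe),
    complemented       = complement_seq(probe),
    reverse_complement = reverse_seq(complement_seq(probe))
  )
  sapply(orientations, function(s) {
    pos <- regexpr(s, gene, fixed = TRUE)
    if (pos > 0) as.integer(pos) else NA_integer_
  })
}

# One row per probe, one column per orientation.
hits <- t(sapply(probeData$sequence, find_probe, gene = gene))
rownames(hits) <- rownames(probeData)
print(hits)
```

     In this toy input, probe1 occurs in the gene as given, probe2 only
     as its reverse complement, and probe3 not at all, so probe3 would
     have no location to plot.
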
_D_e_t_a_i_l_s:

     At the very least, this function takes a ProbeLevelSet or an
     ExpressionSet + probeData and the sequence of a gene. It then
     compares the probe sequences given in the ProbeLevelSet or the
     probeData variable with the sequence of the gene given. Any probes
     with sequences found in the gene will be plotted, with their
     intensity level on the y-axis and their location in the gene on
     the x-axis. If no further arguments are given this gives a view of
     relative expression levels along the length of the gene, and can
     be used to investigate which exons are actively transcribed in the
     sample and which are not. An important argument that can be used
     for further investigation is the 'label' argument which specifies
     a column in the pData of the ExpressionSet. In this case the plots
     will be stratified by the factors specified in this column (so
     giving labels with numerical or Date class data will not work).
     This can be very useful when investigating how different sample
     conditions affect various regions of a gene. For example, a
     transcript isoform that is relatively upregulated in a diseased
     state may go undetected if a probeset or metaprobeset covering
     the entire gene is used to summarize the data, since the average
     expression intensity for the gene can remain nearly constant.
     Using the plotOnGene function, however, and specifying a case /
     control label will reveal tendencies for probes at certain exon
     locations to be related to this label. The testType argument
     further supports this functionality by providing statistical
     testing and highlighting of probes that correlate significantly
     with the given label. In the case / control example, a Student's
     t-test would highlight the probes matching the exons that are
     only found in the disease-specific transcript isoform. When
     interpreting the data it is suggested that specific attention is
     paid to the pattern of probes in the same exon. A single probe
     with a P-value < 0.05 might be a false positive caused by chance
     or by cross-hybridization of the probe sequence to something
     else. A range of probes in the same exon that all show P-values
     below or close to 0.05, however, is much more likely to reflect
     a transcript isoform in which this particular exon (or exons) is
     regulated between the groups in the label. Exon structure can
     easily be added to the plot using the exonStructure function.
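
     The per-probe testing idea described above can be sketched in base
     R. This is a minimal illustration on simulated data, not the
     package's plotStatistics code: the matrix 'intensity', the factor
     'group' and the black/grey labelling are assumptions that follow
     the description of the 'testType' and 'cutoff' arguments.

```r
set.seed(1)

# Simulated log-intensities: 4 probes (rows) x 12 samples (columns).
# All names and values here are invented for illustration.
group <- factor(rep(c("control", "case"), each = 6))
intensity <- matrix(rnorm(4 * 12, mean = 8, sd = 0.3), nrow = 4,
                    dimnames = list(paste0("probe", 1:4), NULL))

# Shift probes 2 and 3 (imagined to lie in the same exon) upwards in cases.
intensity[2:3, group == "case"] <- intensity[2:3, group == "case"] + 2

# One Student's t-test (equal variances assumed) per probe.
p_values <- apply(intensity, 1, function(y) {
  t.test(y ~ group, var.equal = TRUE)$p.value
})

# Flag probes below the cutoff, separating P <= 0.05 ("black") from
# P between 0.05 and the cutoff ("grey").
cutoff <- 0.2
circle <- ifelse(p_values >= cutoff, "none",
                 ifelse(p_values <= 0.05, "black", "grey"))
print(data.frame(p_value = signif(p_values, 3), circle = circle))
```

     Two neighbouring probes flagged together, as probes 2 and 3 are
     here, is the kind of pattern the text above suggests trusting
     more than a single isolated hit.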

     A special case is the search for SNPs that affect expression
     levels or variable splicing. The testType argument 'linear model'
     is designed for this. It calls the internal function
     doProbeLinear, which assigns each of the levels in the 'label'
     column of the pData a value between 1 and the number of levels,
     in the order in which they are sorted. For genotypes given as
     "AA", "AB", "BB" character strings this will give "AA" = 1, "AB"
     = 2 and "BB" = 3. doProbeLinear then fits a linear model between
     the intensity values and these numbers, and returns the P-value.
     In this case, low P-values can be interpreted as a case where the
     heterozygote samples have expression levels intermediate between
     those of the two homozygotes. This is the pattern that can be
     expected if the nucleotide at the SNP position does in fact have
     an effect on the mRNA concentration levels in the sample.
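
     The genotype coding and linear fit described above can be sketched
     in base R for a single probe. This is an illustration on simulated
     data and not the doProbeLinear source: the variable names and the
     simulated effect size are assumptions.

```r
set.seed(2)

# Invented genotypes for 15 samples; factor levels sort as AA, AB, BB.
genotype <- factor(rep(c("AA", "AB", "BB"), each = 5))

# Map each level to its rank in the sorted level order:
# AA = 1, AB = 2, BB = 3.
genotype_code <- as.integer(genotype)

# Simulated intensities for one probe with an additive allele effect.
intensity <- 7 + 0.8 * genotype_code + rnorm(15, sd = 0.3)

# Regress intensity on the numeric genotype code and extract the P-value
# of the slope.
fit <- lm(intensity ~ genotype_code)
p_value <- summary(fit)$coefficients["genotype_code", "Pr(>|t|)"]
print(p_value)
```

     Because the levels are coded by sort order, a low P-value here
     reflects a trend across AA, AB and BB. A pattern in which only the
     heterozygotes differ would not be captured well by this coding.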

_V_a_l_u_e:

     No value is returned. The function plots the expression level of
     each probe found in the submitted gene sequence as a function of
     its location along this sequence. Various statistics and
     summarizations based on the pData can be employed, as described in
     Details.

_A_u_t_h_o_r(_s):

     Lasse Folkersen

_S_e_e _A_l_s_o:

     'geneRegionScan', 'plotCoexpression'

_E_x_a_m_p_l_e_s:

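             # Load the example data (the calls below also use the
             # objects 'mrna' and 'genomic')
             data(exampleProbeLevelSet)
             
             # Plot every sample value, zoomed in on bp 500 to 1000
             plotOnGene(exampleProbeLevelSet, mrna, summaryType="dots", interval=c(500,1000))
             
             # Stratify by genotype, circle significant probes, then add exon structure
             plotOnGene(exampleProbeLevelSet, mrna, label="genotype3", testType="linear model")
             exonStructure(mrna, genomic)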

