matchDNAPattern          package:Biostrings          R Documentation

_G_e_n_e_r_i_c _t_o _f_i_n_d _a_l_l _m_a_t_c_h_e_s _o_f _a _p_a_t_t_e_r_n _i_n _a _D_N_A _s_t_r_i_n_g

_D_e_s_c_r_i_p_t_i_o_n:

     Generic function that finds all matches of a pattern in a DNA
     string. Three algorithms are currently implemented (see
     'algorithm' for which one is the default). The first is an
     extension of the Boyer-Moore algorithm that allows some wildcards
     in addition to the symbols for the bases and the gap. The second
     is a simple forward search that examines, from the beginning of
     the string to its end, every substring with the same length as
     the pattern. The third is the shift-or algorithm.
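
     The first two strategies can be sketched in base R. This is an
     illustration under stated assumptions, not the Biostrings
     implementation: it substitutes the Boyer-Moore-Horspool
     simplification (bad-character shifts only) for the extended
     Boyer-Moore algorithm, the names 'iupac', 'compatible',
     'forward_search' and 'horspool_search' are hypothetical, pattern
     letters are assumed to be IUPAC codes, and the text is assumed to
     contain only A, C, G and T. Both functions return 1-based start
     positions.

```r
# Illustrative sketch only, not the Biostrings implementation.
# Assumes IUPAC pattern letters and a text made of A, C, G and T only.

# Bases accepted by each IUPAC nucleotide code.
iupac <- list(
  A = "A", C = "C", G = "G", T = "T",
  R = c("A", "G"), Y = c("C", "T"), S = c("C", "G"), W = c("A", "T"),
  K = c("G", "T"), M = c("A", "C"),
  B = c("C", "G", "T"), D = c("A", "G", "T"),
  H = c("A", "C", "T"), V = c("A", "C", "G"),
  N = c("A", "C", "G", "T")
)

# TRUE if pattern letter `p` accepts text base `b`.
compatible <- function(p, b) b %in% iupac[[p]]

# Forward search: test every window of length m, left to right.
forward_search <- function(pattern, x) {
  pat <- strsplit(pattern, "")[[1]]
  txt <- strsplit(x, "")[[1]]
  m <- length(pat)
  n <- length(txt)
  stopifnot(m >= 1L)
  starts <- integer(0)
  if (m > n) return(starts)
  for (s in seq_len(n - m + 1L)) {
    j <- 1L
    while (j <= m && compatible(pat[j], txt[s + j - 1L])) j <- j + 1L
    if (j > m) starts <- c(starts, s)   # all m letters accepted
  }
  starts
}

# Boyer-Moore-Horspool (bad-character rule only): compare right to left,
# then shift by a table entry keyed on the text base under the last slot.
horspool_search <- function(pattern, x) {
  pat <- strsplit(pattern, "")[[1]]
  txt <- strsplit(x, "")[[1]]
  m <- length(pat)
  n <- length(txt)
  stopifnot(m >= 1L)
  starts <- integer(0)
  if (m > n) return(starts)
  # shift[b] = m - j for the last position j < m whose letter accepts b;
  # a base accepted by none of those positions allows a full shift of m.
  shift <- c(A = m, C = m, G = m, T = m)
  for (j in seq_len(m - 1L)) {
    for (b in iupac[[pat[j]]]) shift[[b]] <- m - j
  }
  s <- 1L
  while (s <= n - m + 1L) {
    j <- m
    while (j >= 1L && compatible(pat[j], txt[s + j - 1L])) j <- j - 1L
    if (j == 0L) starts <- c(starts, s)
    s <- s + shift[[txt[s + m - 1L]]]
  }
  starts
}
```

     For the pattern "GCNNNAT" and the string "AAGCGCGATATG" from the
     Examples, both functions return 'c(3L, 5L)'. Because the
     wildcards in that pattern sit close to its end, every entry of the
     shift table is at most 2, so the Horspool skips are short for
     this pattern.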

_U_s_a_g_e:

     matchDNAPattern(pattern, x, algorithm, mismatch)

_A_r_g_u_m_e_n_t_s:

 pattern: An object representing the pattern string. The string in
          'pattern' can use any of the standard DNA pattern letters.
          See 'DNAPatternAlphabet' for all valid letters.

       x: An object representing a DNA string. 

algorithm: One of '"boyer-moore"', '"forward-search"' or '"shift-or"'.
          When the patterns being matched are very simple, the forward
          search is often as fast as the more sophisticated
          Boyer-Moore algorithm, and the shift-or algorithm is faster
          still. However, shift-or can only be used for patterns of at
          most 32 or 64 letters, depending on the number of bits in a
          machine word. Shift-or can also find inexact matches with a
          given number of mismatches. The default is '"shift-or"'
          where it is valid and '"boyer-moore"' otherwise.

mismatch: An integer giving the number of mismatches allowed. The
          default is 0. If 'mismatch' is non-zero, an inexact matching
          algorithm is used.
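
     The shift-or search, with an optional mismatch budget, can be
     sketched in base R by keeping one state word for each mismatch
     count from 0 to 'k'. This is an illustration under stated
     assumptions, not the Biostrings code: 'iupac' and
     'shift_or_mismatch' are hypothetical names, pattern letters are
     assumed to be IUPAC codes (in this sketch a wildcard that accepts
     the text base is not a mismatch), only substitutions are counted,
     the text is assumed to contain only A, C, G and T, and the pattern
     is capped at 30 letters because R's 'bitwAnd()' and
     'bitwShiftL()' operate on signed 32-bit integers. With 'k = 0' it
     reduces to plain exact shift-or.

```r
# Illustrative shift-or sketch with up to k substitutions; not Biostrings' code.
# Assumes IUPAC pattern letters and a text made of A, C, G and T only.

# Bases accepted by each IUPAC nucleotide code.
iupac <- list(
  A = "A", C = "C", G = "G", T = "T",
  R = c("A", "G"), Y = c("C", "T"), S = c("C", "G"), W = c("A", "T"),
  K = c("G", "T"), M = c("A", "C"),
  B = c("C", "G", "T"), D = c("A", "G", "T"),
  H = c("A", "C", "T"), V = c("A", "C", "G"),
  N = c("A", "C", "G", "T")
)

shift_or_mismatch <- function(pattern, x, k = 0L) {
  pat <- strsplit(pattern, "")[[1]]
  txt <- strsplit(x, "")[[1]]
  m <- length(pat)
  # 30-letter cap keeps every shifted value below 2^31 (R integer range).
  stopifnot(m >= 1L, m <= 30L, k >= 0L)
  k <- as.integer(k)
  mask <- bitwShiftL(1L, m) - 1L   # the m low bits set
  hit <- bitwShiftL(1L, m - 1L)    # bit meaning "whole pattern matched"

  # B[[b]] has bit j-1 cleared where pattern letter j accepts base b.
  B <- list()
  for (b in c("A", "C", "G", "T")) {
    bits <- 0L
    for (j in seq_len(m)) {
      if (!(b %in% iupac[[pat[j]]])) bits <- bitwOr(bits, bitwShiftL(1L, j - 1L))
    }
    B[[b]] <- bits
  }

  # state[[e + 1]]: bit j-1 is 0 iff the first j pattern letters match the
  # text ending at the current base with at most e mismatches.
  state <- rep(list(mask), k + 1L)   # all ones: nothing matched yet
  starts <- integer(0)
  for (i in seq_along(txt)) {
    old <- state
    state[[1]] <- bitwAnd(bitwOr(bitwShiftL(old[[1]], 1L), B[[txt[i]]]), mask)
    for (e in seq_len(k)) {
      same <- bitwOr(bitwShiftL(old[[e + 1L]], 1L), B[[txt[i]]])  # extend with an accepted base
      subst <- bitwShiftL(old[[e]], 1L)                            # extend by spending a mismatch
      state[[e + 1L]] <- bitwAnd(bitwAnd(same, subst), mask)
    }
    if (bitwAnd(state[[k + 1L]], hit) == 0L) starts <- c(starts, i - m + 1L)
  }
  starts
}
```

     For example, 'shift_or_mismatch("GCNNNAT", "AAGCGCGATATG")'
     returns 'c(3L, 5L)', and 'shift_or_mismatch("ACGT",
     "ACGAACTTACGT", k = 1)' returns 'c(1L, 5L, 9L)'. Each text base
     costs a fixed number of word operations per state word, whatever
     the pattern, which is why the method is limited to patterns that
     fit in one machine word.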

_V_a_l_u_e:

     An object of class "BioString" whose length equals the number of
     matches; each element of the object is one match. To obtain the
     start and end positions of the matches, call 'as.matrix' on the
     return value. See the documentation for the "BioString" class
     for more details.

_A_u_t_h_o_r(_s):

     Saikat DebRoy

_R_e_f_e_r_e_n_c_e_s:

     Gusfield, D. (1997) Algorithms on Strings, Trees, and Sequences:
     Computer Science and Computational Biology. Cambridge University
     Press.

_S_e_e _A_l_s_o:

     'BioString-class' for the type of the return value.

_E_x_a_m_p_l_e_s:

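     # Match a pattern containing N wildcards with the default algorithm
     x <- DNAString("AAGCGCGATATG")
     m1 <- matchDNAPattern("GCNNNAT", x)
     m1
     as.matrix(m1)  # start and end positions of each match
     # Same search with the forward-search algorithm
     m2 <- matchDNAPattern("GCNNNAT", x, algorithm="forward-search")
     m2
     as.matrix(m2)
     # Search yeast chromosome I for a restriction enzyme site
     data('yeastSEQCHR1')
     yeast1 <- DNAString(yeastSEQCHR1)
     PpiI <- "GAACNNNNNCTC" # a restriction enzyme pattern
     match1.PpiI <- matchDNAPattern(PpiI, yeast1)
     match2.PpiI <- matchDNAPattern(PpiI, yeast1, algorithm="forward-search")
     match1.PpiI
     match2.PpiI
     # Allow one mismatch (inexact matching)
     match3.PpiI <- matchDNAPattern(PpiI, yeast1, mismatch=1)
     match3.PpiI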

