BioString-class          package:Biostrings          R Documentation

_C_l_a_s_s "_B_i_o_S_t_r_i_n_g", _r_e_p_r_e_s_e_n_t_s _a _b_i_o_l_o_g_i_c_a_l _s_e_q_u_e_n_c_e

_D_e_s_c_r_i_p_t_i_o_n:

     Class "BioString" holds an encoded string representing a
     biological sequence over a particular alphabet (RNA, DNA or amino
     acid). An object of this class represents zero or more substrings
     of that full string.

_O_b_j_e_c_t_s _f_r_o_m _t_h_e _C_l_a_s_s:

     Objects can be created by calls of the form 'new("BioString",
     alphabet, end, start, values, initialized, ...)'. However, users
     are advised not to call this directly. For now, use the function
     'NucleotideString' to create "BioString" objects with a
     nucleotide alphabet (RNA or DNA), and the function 'DNAString' to
     create objects with the DNA alphabet.

_S_l_o_t_s:

     '_a_l_p_h_a_b_e_t': Object of class '"BioAlphabet"', the alphabet used in
          the sequence. 

     '_i_n_i_t_i_a_l_i_z_e_d': Object of class '"logical"', 'TRUE' if the sequence
          initialized with values. Users should not modify this slot
          directly. 

     '_o_f_f_s_e_t_s': Object of class '"matrix"' and storage mode "integer",
          this stores (in two columns) the start and end points of the
          substrings represented in 'x'. Rows with the first value '1'
          and the second value{0} represent empty substrings.
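     The offsets layout can be illustrated in base R. This is a minimal
     sketch that assumes the full sequence is available as a plain
     character string (the class itself keeps it encoded in the
     'values' slot); 'full' and 'offsets' are illustrative names only.

```r
## Illustration only: 'full' stands in for the decoded full sequence.
full <- "AAGCTANA"

## One row per substring: column 1 = start, column 2 = end.
## The row (1, 0) encodes an empty substring.
offsets <- matrix(c(1L, 3L,
                    2L, 4L,
                    1L, 0L),
                  ncol = 2, byrow = TRUE)
storage.mode(offsets)  # "integer"

## Each substring and its length follow from the offsets alone.
subs <- substring(full, offsets[, 1], offsets[, 2])
subs                               # "AAG" "AGC" ""
offsets[, 2] - offsets[, 1] + 1L   # 3 3 0, the same as nchar(subs)
```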

     '_v_a_l_u_e_s': Object of class '"externalptr"', this internally stores
          the actual encoded sequence as a vector. As objects of class
          "externalptr" are passed by value in R, this saves copying of
          long sequences. 

_M_e_t_h_o_d_s:

     _i_n_i_t_i_a_l_i_z_e(._O_b_j_e_c_t, _a_l_p_h_a_b_e_t, _o_f_f_s_e_t_s=_c_b_i_n_d(_1, _0), _v_a_l_u_e_s=_B_i_o_S_t_r_i_n_g_N_e_w_V_a_l_u_e_s(_a_l_p_h_a_b_e_t, _e_n_d), _i_n_i_t_i_a_l_i_z_e_d=!_m_i_s_s_i_n_g(_v_a_l_u_e_s)) Const
          ruct an object of class "BioString". Usually not called
          directly by users. 

     _l_e_n_g_t_h(_x) Return the number of substrings represented by 'x'.

     _x[_i] Return the substrings in 'x' corresponding to index 'i'.

     _x[[_i]] Return the substring in 'x' corresponding to the index 'i'.
          The index 'i' must be of length '1'.
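     The following sketch models 'length', '[' and '[[' on a
     hypothetical list holding a full string and an offsets matrix. It
     is not the Biostrings implementation: 'toy', 'toy_length',
     'toy_subset' and 'toy_element' are made-up names, and
     'toy_element' returns a plain character string for simplicity.

```r
## Hypothetical model (not the Biostrings implementation): a sequence
## object as the full string plus a two-column offsets matrix.
toy <- list(full = "AAGCTANA",
            offsets = cbind(c(1L, 2L, 3L), c(3L, 4L, 5L)))

## length(x): one substring per row of the offsets matrix.
toy_length <- function(x) nrow(x$offsets)

## x[i]: keep the selected rows; the full string is left as is.
toy_subset <- function(x, i) {
  x$offsets <- x$offsets[i, , drop = FALSE]
  x
}

## x[[i]]: exactly one substring, extracted from the full string.
toy_element <- function(x, i) {
  stopifnot(length(i) == 1L)
  substring(x$full, x$offsets[i, 1], x$offsets[i, 2])
}

toy_length(toy)                  # 3
toy_length(toy_subset(toy, -3))  # 2, same as toy_subset(toy, 1:2)
toy_element(toy, 3)              # "GCT"
```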

     _n_c_h_a_r(_x, _t_y_p_e) Return the number of characters in each substring
          represented in 'x'. 'type' is not used for now.

     _s_h_o_w(_o_b_j_e_c_t) Display 'object' of class "BioString".

     _a_s._c_h_a_r_a_c_t_e_r(_x) Convert a "BioString" object to a character vector
          using its native alphabet.

     _a_s._m_a_t_r_i_x(_x) Return a two-column matrix of integers, the first
          column representing the start index and the scond column
          representing the end index of the substrings in the full
          string.

     _s_u_b_s_t_r(_x, _s_t_a_r_t, _s_t_o_p) Return another BioString object with value
          equivalent to 'substr(as.character(x), start, stop)'.

     _s_u_b_s_t_r_i_n_g(_t_e_x_t, _f_i_r_s_t, _l_a_s_t) Return another BioString object with
          value equivalent to 'substring(as.character(text), first,
          last)'.
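     One way to read these two methods is that a substring only needs
     new start and end positions in the same full string. The helper
     below, 'sub_offsets', is hypothetical and handles a
     single-substring object only; this page does not confirm that
     'substring' is implemented this way.

```r
## Hypothetical helper: substring() on a single substring, expressed as
## new offsets into the same full string. Empty results are not
## normalized to the (1, 0) convention here.
sub_offsets <- function(offsets, first, last) {
  stopifnot(nrow(offsets) == 1L)
  start <- offsets[1, 1] + first - 1L
  end <- pmin(offsets[1, 1] + last - 1L, offsets[1, 2])  # clamp at the end
  cbind(start, end)
}

full <- "AAGCTANA"
whole <- cbind(1L, nchar(full))  # offsets covering the whole string

new_off <- sub_offsets(whole, 1:3, 3:5)
new_off                                      # rows (1, 3), (2, 4), (3, 5)
substring(full, new_off[, 1], new_off[, 2])  # "AAG" "AGC" "GCT"

## Same result as substring() on the plain character string:
substring(full, 1:3, 3:5)
```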

     _m_a_t_c_h_D_N_A_P_a_t_t_e_r_n(_p_a_t_t_e_r_n, _x, _a_l_g_o_r_i_t_h_m, _m_i_s_m_a_t_c_h) Match the DNA
          string 'x' against 'pattern' using 'algorithm'. The pattern
          can use the letters A,C,G,T,- (the last being the gap
          character) and also the wildcards N (matching A,C,G,T), V
          (matching A,G,C), R (matching A,G) and Y (matching C,T).
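     The wildcard rules above can be checked with a naive matcher in
     base R. 'naive_match' is a hypothetical helper that tries every
     start position; it ignores the 'algorithm' and 'mismatch'
     arguments and is not the implementation of 'matchDNAPattern'.

```r
## Letters each pattern symbol may match, following the list above.
wildcards <- list(A = "A", C = "C", G = "G", T = "T", "-" = "-",
                  N = c("A", "C", "G", "T"), V = c("A", "C", "G"),
                  R = c("A", "G"), Y = c("C", "T"))

## Hypothetical helper: start positions of exact (wildcard-aware) matches.
naive_match <- function(pattern, subject) {
  p <- strsplit(pattern, "")[[1]]
  s <- strsplit(subject, "")[[1]]
  m <- length(p)
  if (m == 0L || m > length(s)) return(integer(0))
  starts <- seq_len(length(s) - m + 1L)
  hit <- vapply(starts, function(i) {
    # every subject letter must be in the set allowed by its pattern letter
    all(mapply(function(pl, sl) sl %in% wildcards[[pl]],
               p, s[i:(i + m - 1L)]))
  }, logical(1))
  starts[hit]
}

naive_match("GC", "AAGCTANA")  # 3
naive_match("RC", "AAGCTANA")  # 3 (R matches A or G)
naive_match("AN", "AAGCTANA")  # 1 2 (an N in the subject is not A, C, G or T)
```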

     _a_l_l_S_a_m_e_L_e_t_t_e_r(_x, _l_e_t_t_e_r) Return a logical vetor indicating which
          of the elements of 'x' are entirely made up of the letter
          'letter'.
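     A base-R analogue of 'allSameLetter' for plain character vectors
     might look like the following. 'all_same_letter' is a hypothetical
     name, and returning FALSE for empty strings is a choice made here,
     not documented behavior of 'allSameLetter'.

```r
## Hypothetical analogue: TRUE where a string consists only of 'letter'.
all_same_letter <- function(x, letter) {
  vapply(strsplit(x, ""),
         function(ch) length(ch) > 0L && all(ch == letter),  # empty -> FALSE
         logical(1))
}

all_same_letter(c("AAAA", "AANA", "---"), "A")  # TRUE FALSE FALSE
all_same_letter(c("AAAA", "AANA", "---"), "-")  # FALSE FALSE TRUE
```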

_T_h_e _s_t_r_u_c_t_u_r_e _o_f _t_h_e _v_a_l_u_e_s _s_l_o_t:

     The 'values' slot of the "BioString" class is of class
     "externalptr". Its tag field always contains an R vector object;
     the other fields are not used at present. The vector in the tag
     field is either a 'CHARSXP' or an 'INTSXP', depending on the
     number of letters in the alphabet: 'INTSXP' is used if the
     alphabet has more letters than there are bits in a C 'char', and
     'CHARSXP' is used otherwise.
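     The storage rule can be written as a small function. This sketch
     assumes an 8-bit C 'char' and a 32-bit C 'int', as on standard
     platforms; 'values_storage_type' is a hypothetical name, and the
     21-letter alphabet in the usage lines is only an example size.

```r
## Sketch of the rule above, assuming CHAR_BIT == 8 and a 32-bit int.
values_storage_type <- function(n_letters, char_bits = 8L, int_bits = 32L) {
  if (n_letters > int_bits)
    stop("an alphabet can have at most ", int_bits, " letters")
  if (n_letters > char_bits) "INTSXP" else "CHARSXP"
}

values_storage_type(5)   # A, C, G, T plus gap: "CHARSXP"
values_storage_type(21)  # e.g. 20 amino acids plus gap: "INTSXP"
```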

     The 'i'-th letter of the alphabet is represented by the 'i'-th bit
     of the 'char' or 'int' element (depending on whether the vector is
     of type CHARSXP or INTSXP), counting from 'i=0' for the first bit.
     Because a C 'int' has 32 bits on all standard computer
     architectures, an alphabet can have at most '32' letters
     (including the gap).
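     The bit layout can be sketched in base R with 'bitwShiftL'. The
     sketch assumes a DNA alphabet ordered A, C, G, T, '-' (the real
     "BioAlphabet" order and gap symbol may differ). The last lines are
     only an illustration of why one bit per letter is convenient: a
     wildcard such as N can be the OR of its letters' bits, so a
     compatibility test is a bitwise AND. This page does not say that
     'matchDNAPattern' works this way.

```r
## Assumed alphabet order; letter i (0-based) gets bit i.
alphabet <- c("A", "C", "G", "T", "-")
codes <- setNames(bitwShiftL(1L, seq_along(alphabet) - 1L), alphabet)
codes  # A = 1, C = 2, G = 4, T = 8, - = 16

## Letters outside the alphabet encode to NA.
encode <- function(s) unname(codes[strsplit(s, "")[[1]]])
decode <- function(v) paste(names(codes)[match(v, codes)], collapse = "")

enc <- encode("AAGCT")
enc          # 1 1 4 2 8
decode(enc)  # "AAGCT"

## Illustration only: wildcard N as the OR of the bits for A, C, G, T.
N <- Reduce(bitwOr, codes[c("A", "C", "G", "T")])  # 15
bitwAnd(enc, N) != 0L  # TRUE for every letter of "AAGCT"
```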

_A_u_t_h_o_r(_s):

     Saikat DebRoy

_S_e_e _A_l_s_o:

     'BioAlphabet-class' and its subclasses for valid alphabet objects.
     'DNAString' for creating objects of class "BioString" representing
     DNA sequences. 'NucleotideString' for creating objects of class
     "BioString" representing DNA or RNA sequences.

_E_x_a_m_p_l_e_s:

     new("BioString", DNAAlphabet()) # creates an empty DNA string
     x <- DNAString("AAGCTANA", gap="N")
     x
     as.character(x)
     substr(x, 2, 4)
     substring(x, 1, seq_len(nchar(x))) # all prefixes of x
     substring(x, seq_len(nchar(x)), nchar(x)) # all suffixes of x
     matchDNAPattern("GC", x)
     x <- substring(x, 1:3, 3:5)
     x[1:2]
     x[-3] # same as x[1:2]
     x[[3]]

