getMipsInfo              package:ScISI              R Documentation

_A _f_u_n_c_t_i_o_n _t_h_a_t _r_e_a_d_s _t_h_e _d_o_w_n_l_o_a_d_e_d _t_e_x_t _f_i_l_e _f_r_o_m _t_h_e _M_I_P_S
_r_e_p_o_s_i_t_o_r_y _a_n_d _g_e_n_e_r_a_t_e_s _a _n_a_m_e_d _l_i_s_t _o_f _p_r_o_t_e_i_n _c_o_m_p_l_e_x_e_s.

_D_e_s_c_r_i_p_t_i_o_n:

     This function reads the downloaded text file from the MIPS
     database and parses it for those collections of proteins that are
     referred to as a "complex", an "-ase" (e.g. RNA polymerase), or a
     "-some" (e.g. ribosome), and/or that match user-supplied terms
     describing the protein complexes of interest. It returns a list
     containing two items: a named list of protein complexes and a
     character vector (of the same length as the named list) describing
     each protein complex.

_U_s_a_g_e:

     getMipsInfo(wantDefault = TRUE, toGrep = NULL,
     parseType = NULL, eCode = c("901.01.03", "901.01.03.01", "901.01.03.02",
                      "901.01.04", "901.01.04.01", "901.01.04.02",
                      "901.01.05", "901.01.05.01", "901.01.05.02",
                      "902.01.09.02", "902.01.01.02.01.01",
                      "902.01.01.02.01.01.01", "902.01.01.02.01.01.02",
                      "902.01.01.02.01.02", "902.01.01.02.01.02.01",
                      "902.01.01.02.01.02.02", "902.01.01.04",
                      "902.01.01.04.01", "902.01.01.04.01.01",
                      "902.01.01.04.01.02", "902.01.01.04.01.03",
                      "902.01.01.04.02", "901.01.09.02"), wantAllComplexes=TRUE)

_A_r_g_u_m_e_n_t_s:

wantDefault: A logical. If TRUE, the default patterns "complex",
          "\Base\b" and "\Bsome\b" are grepped.

  toGrep: A character vector. Each entry is a term, possibly containing
          Perl regular expressions, to be searched for in the MIPS
          text file.

parseType: A character vector. Each entry tells how the corresponding
          entry of toGrep should be searched, e.g. "grep" or "agrep".

   eCode: A character vector of MIPS evidence codes. The codes are
          listed in the file evidence.scheme found in the inst/extdata
          section of the package.

wantAllComplexes: A logical. If TRUE, the function only returns
          aggregate protein complexes. If FALSE, the function will also
          return subcomplexes.
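
     As a rough illustration of what the default patterns match, the
     base-R sketch below applies them to a few made-up descriptions.
     The description strings are hypothetical and this is not the
     package's internal code. Note that inside an R string literal
     each backslash of the pattern has to be doubled.

```r
# Hypothetical complex descriptions; not taken from the MIPS file.
desc <- c("RNA polymerase II", "cytoplasmic ribosome", "Arp2/3 complex",
          "some unrelated entry", "kinase")

# The default patterns, written as R string literals.
# \B requires that the match does NOT start at a word boundary,
# \b requires that it ends at one, so only word suffixes match.
patterns <- c("complex", "\\Base\\b", "\\Bsome\\b")

# Indices of the descriptions hit by each pattern.
hits <- lapply(patterns, function(p) grep(p, desc, perl = TRUE))
names(hits) <- patterns

# Union of all hits, in the original order.
matched <- desc[sort(unique(unlist(hits)))]
print(matched)
```

     The standalone word "some" is not picked up, because "\Bsome\b"
     only matches "some" as the ending of a longer word.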

_D_e_t_a_i_l_s:

     This function's generic operation is to parse the MIPS protein
     complex database (as given by the downloaded text file) and search
     for pre-determined or user-chosen terms. It returns a named list of
     character vectors where the names are MIPS IDs from the protein
     complex sub-category and the vectors consist of the proteins
     corresponding to that particular MIPS ID. The arguments can be
     combined in several ways:

     1. If the wantDefault parameter is TRUE, the function will grep
     for "complex", "\Base\b", and "\Bsome\b".

     2. If toGrep is not NULL, it must be a character vector of terms
     and Perl regular expressions that are to be searched for in the
     MIPS database. NB - if toGrep is not NULL, then parseType must
     also not be NULL, as parseType indicates how each term should be
     searched.

     3. parseType needs to be supplied if toGrep is not NULL. It is a
     character vector, either a single entry or of length equal to the
     length of toGrep, detailing how each term in toGrep will be parsed
     in the MIPS database. If only one term is supplied for parseType,
     then all the terms in toGrep will be parsed identically.
     Otherwise, the i-th term in parseType determines the parsing of
     the i-th term in toGrep.

     4. The eCode argument is a character vector consisting of MIPS
     evidence codes. A protein will be removed from the protein complex
     if ALL the evidence codes used to annotate the protein are
     supplied in the eCode argument; otherwise, it is left in the
     complex.

     5. If the wantAllComplexes parameter is TRUE, the function will
     return the sub-groupings (sub-complexes or sub-structures) as
     given by the clusterings in the MIPS protein complex database.
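
     Items 3 and 4 can be sketched in base R as follows. This is a
     minimal illustration of the described rules, not the package's
     implementation; the text lines, protein names, and the evidence
     code "901.01.01" are hypothetical.

```r
## Item 3: one parseType is reused for every term in toGrep.
toGrep    <- c("\\Bsomal\\b", "synthetase")
parseType <- "grep"
lines     <- c("ribosomal subunit", "tRNA synthetase", "synthase", "somal")

# Recycle a single entry; otherwise the lengths must agree.
if (length(parseType) == 1L) parseType <- rep(parseType, length(toGrep))
stopifnot(length(parseType) == length(toGrep))

# Search the i-th term with the i-th method.
found <- mapply(function(term, type) {
  switch(type,
         grep  = grep(term, lines, perl = TRUE),
         agrep = agrep(term, lines),
         stop("unknown parseType: ", type))
}, toGrep, parseType, SIMPLIFY = FALSE)

## Item 4: drop a protein only if ALL of its evidence codes are in eCode.
evidence <- list(YAL001C = c("901.01.03", "902.01.01.04"),
                 YBR002W = "901.01.03",
                 YCL003C = "901.01.01",
                 YDR004W = c("901.01.03", "901.01.01"))
eCode <- c("901.01.03", "902.01.01.04")

dropped <- vapply(evidence, function(codes) all(codes %in% eCode),
                  logical(1))
kept <- names(evidence)[!dropped]
print(kept)
```

     In this sketch YDR004W stays in the complex because one of its
     two codes is not listed in eCode, while YAL001C and YBR002W are
     removed.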

_V_a_l_u_e:

     The return value is a list with two components:

    Mips: A named list of the protein complexes. Each list entry is
          named by its MIPS ID (with the prefix "MIPS-" attached) and
          points to a character vector containing the members of that
          protein complex.

    DESC: A named character vector describing each protein complex
          parsed by the function. (The names are the MIPS IDs.)
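
     To make the shape of the return value concrete, the sketch below
     builds a hand-made stand-in with the same two components and
     shows how they line up by name. The IDs, protein names, and
     descriptions are hypothetical; a real result comes from
     getMipsInfo itself.

```r
# Hypothetical stand-in for the value returned by getMipsInfo().
res <- list(
  Mips = list("MIPS-000.01" = c("YAL001C", "YBR002W"),
              "MIPS-000.02" = c("YCL003C", "YDR004W", "YEL005C")),
  DESC = c("MIPS-000.01" = "example complex A",
           "MIPS-000.02" = "example complex B")
)

# Both components carry the same MIPS IDs as names.
stopifnot(identical(names(res$Mips), names(res$DESC)))

# Number of member proteins per complex.
sizes <- lengths(res$Mips)
print(sizes)

# Members and description of a single complex, looked up by ID.
id <- "MIPS-000.02"
members <- res$Mips[[id]]
description <- res$DESC[[id]]
```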

_A_u_t_h_o_r(_s):

     Tony Chiang

_R_e_f_e_r_e_n_c_e_s:

     mips.gsf.de

_E_x_a_m_p_l_e_s:

     mips = getMipsInfo(wantAllComplexes = FALSE)
     # Backslashes must be doubled inside an R string literal.
     mipsPhrase = getMipsInfo(wantDefault = FALSE, toGrep = "\\Bsomal\\b",
                              parseType = "grep", wantAllComplexes = FALSE)

