getGOInfo               package:ScISI               R Documentation

_A _f_u_n_c_t_i_o_n _t_h_a_t _p_a_r_s_e_s _t_h_r_o_u_g_h _t_h_e _G_O _d_a_t_a_b_a_s_e; _i_t _a_g_r_e_p_s _f_o_r _t_h_e
_t_e_r_m "_c_o_m_p_l_e_x" _a_n_d _g_r_e_p_s _s_u_f_f_i_x_e_s "-_a_s_e" _a_n_d "-_s_o_m_e" _a_n_d _r_e_t_u_r_n_s _n_o_d_e_s
_w_h_o_s_e _d_e_s_c_r_i_p_t_i_o_n _c_o_n_t_a_i_n_s _s_u_c_h _t_e_r_m_s.

_D_e_s_c_r_i_p_t_i_o_n:

     This function parses through the Cellular Component ontology for
     the GO nodes and searches for the term "complex", the suffix
     "-ase" (e.g. RNA polymerase) or "-some" (e.g. ribosome), and/or
     other user-defined phrases in the description of these nodes.
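
     The default search can be illustrated on a few toy descriptions.
     This is only a sketch of the matching step, assuming base R's
     agrepl() and grepl(); the descriptions are made up, and the
     internals of getGOInfo may differ.

```r
# Toy term descriptions (hypothetical, not taken from the GO database)
desc <- c("ribosome", "proteasome complex", "RNA polymerase",
          "cell wall", "chromosomal region")

# Approximate match for "complex" (agrepl tolerates small differences)
isComplex <- agrepl("complex", desc)

# Perl-style suffix matches: "\\B" requires the suffix to continue a
# word, "\\b" requires the word to end immediately after the suffix
isAse  <- grepl("\\Base\\b",  desc, perl = TRUE)
isSome <- grepl("\\Bsome\\b", desc, perl = TRUE)

# Descriptions matched by at least one of the three searches
hits <- desc[isComplex | isAse | isSome]
```

     Note that "chromosomal region" is not matched by "\\Bsome\\b",
     because "some" does not occur at the end of a word there.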

_U_s_a_g_e:

     getGOInfo(wantDefault = TRUE, toGrep = NULL,
     parseType=NULL, eCode = NULL, wantAllComplexes = TRUE,
     includedGOTerms=NULL, not2BeIncluded=NULL)

_A_r_g_u_m_e_n_t_s:

wantDefault: A logical. If TRUE, the default parameters ("complex",
          "\Base\b" and "\Bsome\b") are used.

  toGrep: A character vector of the phrases (with Perl regular
          expressions) for which the function will parse through the GO
          database and search.

parseType: A character vector. This vector is in one to one
          correspondence with toGrep; it takes in the parse type such
          as "grep", "agrep", etc.

   eCode: A character vector of evidence codes (see
          "http://www.geneontology.org/GO.evidence.shtml" for details).
          A protein is excluded from a protein complex if every
          evidence code that indexes it is found in eCode.

wantAllComplexes: A logical. If TRUE, the function will incorporate all
          GO children of the nodes found by term searches. In addition,
          children of node GO:0043234 (the protein complex node) will
          also be incorporated.

includedGOTerms: A character vector of GO terms that will be parsed
          regardless of the default and parseType parameters. In
          essence, these GO terms are forced to be included.

not2BeIncluded: A character vector of GO terms that should not be
          parsed nor included in the output.
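
     The correspondence between toGrep and parseType can be sketched
     as follows. searchTerms is a hypothetical helper written for
     illustration only; it is not part of ScISI, and it assumes that
     "grep" and "agrep" are the only parse types in use.

```r
# Hypothetical helper: applies the i-th parse type to the i-th pattern
searchTerms <- function(descriptions, toGrep, parseType) {
  # A single parse type is reused for every pattern
  if (length(parseType) == 1L) {
    parseType <- rep(parseType, length(toGrep))
  }
  stopifnot(length(parseType) == length(toGrep))

  hits <- mapply(function(pattern, type) {
    switch(type,
           grep  = grep(pattern, descriptions, perl = TRUE),
           agrep = agrep(pattern, descriptions),
           stop("unsupported parse type: ", type))
  }, toGrep, parseType, SIMPLIFY = FALSE)

  # Indices of the descriptions matched by at least one pattern
  sort(unique(unlist(hits)))
}

# Toy descriptions (made up for this example)
desc <- c("ribosome", "signal recognition particle", "exosome complex")
matched <- searchTerms(desc, toGrep = c("\\Bsome\\b", "particle"),
                       parseType = "grep")
```

     Here the single parse type "grep" is applied to both patterns,
     and matched holds the indices of the matching descriptions.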

_D_e_t_a_i_l_s:

     This function's generic operation is to parse the GO database and
     search for pre-determined or chosen terms. It returns a named list
     of character vectors where the names are GO IDs from the CC
     ontology and the vectors consist of proteins corresponding to that
     particular GO ID. The arguments can be combined in several ways:

     1. If the wantDefault parameter is TRUE, the function will agrep
     for "complex" and grep for "\Base\b" and "\Bsome\b".

     2. If toGrep is not NULL, it must be a character vector of terms
     and Perl regular expressions to search for in the GO database.
     NB: if toGrep is not NULL, then parseType must also not be NULL,
     as parseType indicates how each term should be searched.

     3. parseType needs to be supplied if toGrep is not NULL. It is a
     character vector, either a single entry or of length equal to the
     length of toGrep, detailing how each term in toGrep will be parsed
     in the GO database. If only one term is supplied for parseType,
     then all the terms in toGrep will be parsed identically.
     Otherwise, the i-th term in parseType will reflect the parsing of
     the i-th term in toGrep.

     4. The eCode argument is a user-determined refining mechanism. It
     takes a vector of evidence codes (as detailed by the GO
     website). The function will disallow a protein if and only if
     that protein is indexed only by evidence codes found within
     eCode.

     5. If the wantAllComplexes parameter is TRUE, the function will
     also return the children of nodes found by parsing terms. In
     addition, the children of GO ID GO:0043234 (the protein complex
     ID) will be returned. The result is the union of all these
     complexes.
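
     The evidence code rule in item 4 can be sketched on toy data. The
     protein names and their annotations below are made up, and the
     code only illustrates the stated rule; it is not the package's
     implementation.

```r
# Hypothetical annotations for one GO node: protein -> evidence codes
annotations <- list(
  protA = c("IDA", "IPI"),
  protB = c("ND"),
  protC = c("IEA", "ND"),
  protD = c("IEA", "TAS")
)
eCode <- c("IEA", "ND")

# A protein is dropped only when every one of its codes is in eCode
dropped <- vapply(annotations,
                  function(codes) all(codes %in% eCode),
                  logical(1))
kept <- names(annotations)[!dropped]
```

     protD is kept even though it carries "IEA", because it is also
     indexed by "TAS", which is not in eCode.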

_V_a_l_u_e:

     The return value is a list of size n (n depends on the current
     status of the GO database) where the name of each list element is
     a GO ID and each list element itself is a character vector
     consisting of the proteins corresponding to a particular GO ID:

"GO:XXXXXXX": A character vector containing proteins (not indexed by
          only eCode evidence codes) that make up protein complex
          "GO:XXXXXXX".

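     The returned list can be handled like any named list. The sketch
     below uses a hypothetical stand-in for the result, with
     placeholder GO IDs and protein names, to show typical queries.

```r
# Hypothetical stand-in for the list returned by getGOInfo()
go <- list("GO:9999991" = c("protA", "protB", "protC"),
           "GO:9999992" = c("protB", "protD"))

ids   <- names(go)     # GO IDs of the complexes
sizes <- lengths(go)   # number of proteins in each complex

# GO IDs of every complex that contains protD
withD <- ids[vapply(go, function(p) "protD" %in% p, logical(1))]
```
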
_A_u_t_h_o_r(_s):

     Tony Chiang

_R_e_f_e_r_e_n_c_e_s:

     www.geneontology.org

_E_x_a_m_p_l_e_s:

     #go = getGOInfo(wantAllComplexes = FALSE)
     #goCoded = getGOInfo(eCode = c("IPI","ND","IDA"))
     #goPhrase = getGOInfo(wantDefault = FALSE, toGrep = "\\Bsomal\\b",
     #parseType = "grep", wantAllComplexes = FALSE)
     #nam1 = names(go)
     #nam2 = names(goCoded)
     ## proteins removed from each complex by the evidence code filter
     #if(length(nam1) == length(nam2) && all(nam1 == nam2)){
     #sapply(nam1, function(x) setdiff(go[[x]], goCoded[[x]]))
     #}

