getGENEONTOLOGY       package:annotationTools       R Documentation

_F_i_n_d _G_e_n_e _O_n_t_o_l_o_g_y (_G_O) _a_n_n_o_t_a_t_i_o_n

_D_e_s_c_r_i_p_t_i_o_n:

     Takes a vector of probe set identifiers and an annotation table
     and retrieves the corresponding GO annotation.

_U_s_a_g_e:

     getGENEONTOLOGY(ps, annot, diagnose = FALSE, specifics = 0, GOcol = 31, noGOsymbol = NA, noGOprovidedSymbol = "---", sep = " /// ")

_A_r_g_u_m_e_n_t_s:

      ps: character vector containing the probe set identifiers.

   annot: annotation table (data frame) where each row is a record and
          each column is an annotation field.

diagnose: logical. If TRUE, 3 logical vectors used for diagnostic
          purposes are returned in addition to the annotation. If FALSE
          (default), only the annotation is returned.

specifics: non-negative integer (0, 1, 2, 3, ...). If specifics=i with
          i>0, the GO biological process annotation is parsed (using
          " // " as separator) and the i-th part of the expression is
          returned. If specifics=0 (default), the GO biological process
          annotation is not parsed.

   GOcol: column in annotation table containing the GO biological
          processes.

noGOsymbol: character string to be used in output list 'go' if no GO
          biological process is found or provided in the annotation
          table.

noGOprovidedSymbol: character string used in annotation table and
          indicating missing GO biological process.

     sep: character string used in annotation table to separate
          multiple GO biological processes.

_D_e_t_a_i_l_s:

     This function can be used with Affymetrix annotation files (e.g.
     'HG-U133_Plus_2_annot.csv'). It retrieves GO annotation
     corresponding to particular probe set identifiers. GO biological
     processes are returned by default ('GOcol'=31) but GO cellular
     components ('GOcol'=32) or GO molecular functions ('GOcol'=33) can
     be returned by setting 'GOcol' appropriately.

     GO biological processes are returned as elements of list 'go'. If
     multiple GO biological processes are provided for 'ps[i]' (with
     'sep' separating GO biological processes in the annotation table),
     a vector containing all GO biological processes is returned as the
     'i'-th element of list 'go'.
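
     As a minimal sketch of the splitting described above (base R only,
     not the package's actual implementation), a single annotation
     field can be broken into its GO entries with 'strsplit'. The
     example string is made up for illustration, and 'entry' and 'go_i'
     are hypothetical names.

```r
# Made-up annotation field holding two GO biological processes,
# separated by the default 'sep' (" /// ")
entry <- "0001 // process A // evidence X /// 0002 // process B // evidence Y"

# fixed = TRUE treats the separator as a literal string, not a regex
go_i <- strsplit(entry, " /// ", fixed = TRUE)[[1]]

# 'go_i' is a character vector with one element per GO entry
go_i
```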

     The default values for 'GOcol', 'noGOsymbol', 'noGOprovidedSymbol'
     and 'sep' are chosen to suit the format of Affymetrix annotation
     files. However, options can be set to look up any annotation
     table, provided the probe set identifiers are in the first column
     and occur only once.

     Note that each GO annotation in Affymetrix annotation files
     contains 3 attributes: the GO biological process ID, term and
     quality, separated by " // ". Setting the option 'specifics' to 1,
     2, or 3 retrieves the corresponding attribute on its own.
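
     The effect of 'specifics' can be illustrated with a short base R
     sketch. It assumes a single GO entry with the three attributes in
     the order given above; the entry itself is made up, 'go_entry',
     'parts' and 'i' are hypothetical names, and the code is not taken
     from the package.

```r
# Made-up GO entry with the three attributes: ID, term and quality
go_entry <- "0001 // process A // evidence X"

# Split on the literal attribute separator " // "
parts <- strsplit(go_entry, " // ", fixed = TRUE)[[1]]

# The i-th attribute corresponds to specifics = i; here i = 2 (the term)
i <- 2
parts[i]
```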

_V_a_l_u_e:

      go: list of length 'length(ps)' the 'i'-th element of which
          contains the GO annotation for 'ps[i]'.

   empty: logical vector of length 'length(ps)'. 'empty[i]' is TRUE if
          'ps[i]' is empty or NA.

 noentry: logical vector of length 'length(ps)'. 'noentry[i]' is TRUE if
          'ps[i]' cannot be found in the first column of the annotation
          table.

    nogo: logical vector of length 'length(ps)'. 'nogo[i]' is TRUE if
          the annotation table entry for 'ps[i]' equals
          'noGOprovidedSymbol'.
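
     One plausible way to derive such diagnostic vectors in base R is
     sketched below. This is an assumption about the logic, not the
     package's code: the toy table, its column names and all values are
     made up, and whether the package also flags empty identifiers as
     'noentry' is not stated here (this sketch does not).

```r
# Toy annotation table: probe set IDs in column 1, GO field in column 2
annot <- data.frame(
  id = c("a_at", "b_at"),
  go = c("0001 // process A // evidence X", "---"),
  stringsAsFactors = FALSE
)
ps <- c("a_at", "b_at", NA, "", "zzz_at")

# Identifier is missing or an empty string
empty <- is.na(ps) | ps == ""

# Row of each identifier in the first column (NA if absent)
row <- match(ps, annot[, 1])

# Non-empty identifier that is absent from the table
noentry <- !empty & is.na(row)

# Identifier found, but the table holds the "no GO provided" symbol
nogo <- !is.na(row) & annot[row, 2] == "---"
```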

_N_o_t_e:

     'getANNOTATION' provides a more flexible solution to be used with
     arbitrary annotation tables.

_A_u_t_h_o_r(_s):

     Alexandre Kuhn, alexandre.kuhn@isb-sib.ch

_S_e_e _A_l_s_o:

     'getANNOTATION'

_E_x_a_m_p_l_e_s:

     ##example Affymetrix annotation file and its location
     annotationFile<-'HG-U133_Plus_2_annot_part.csv'
     dataDirectory<-system.file('data',package='annotationTools')

     ##load annotation file
     annotation<-read.csv(file.path(dataDirectory,annotationFile),colClasses='character')

     ##get gene GO biological process (full information)
     myPS<-c('117_at','1007_s_at','1552288_at',NA,'xyz_at')
     getGENEONTOLOGY(myPS,annotation)

     ##get gene GO biological process terms only
     getGENEONTOLOGY(myPS,annotation,specifics=2)

     ##track origin of annotation failure for the last 3 probe set IDs
     getGENEONTOLOGY(myPS,annotation,diagnose=TRUE)

     ##GO molecular functions are contained in column 33 of the annotation
     colnames(annotation)

     ##get gene GO molecular functions
     getGENEONTOLOGY(myPS,annotation,GOcol=33)

