fileMuncher            package:AnnBuilder            R Documentation

_D_y_n_a_m_i_c_a_l_l_y _c_r_e_a_t_e _a _P_e_r_l _s_c_r_i_p_t _t_o _p_a_r_s_e _a _s_o_u_r_c_e _f_i_l_e _b_a_s_e _o_n
_u_s_e_r _s_p_e_c_i_f_i_c_a_t_i_o_n_s

_D_e_s_c_r_i_p_t_i_o_n:

     This function takes a base file, a source file, and a segment of
     Perl script specifying how the source file will be parsed, and
     then generates a fully executable Perl script that is called to
     parse the source file.

_U_s_a_g_e:

     fileMuncher(outName, baseFile, dataFile, parser, isDir = FALSE)
     mergeRowByKey(mergeMe, keyCol = 1, sep = ";")

_A_r_g_u_m_e_n_t_s:

 outName: 'outName' a character string for the name of the file where
          the parsed data will be stored

baseFile: 'baseFile' a character string for the name of the file that
          is going to be used as the base to process the source file.
          Only data corresponding to the ids defined in the base file
          will be processed and mapped

dataFile: 'dataFile' a character string for the name of the source data
          file

  parser: 'parser' a character string for the name of the file
          containing a segment of a Perl script for parsing the
          source file. An output connection to OUT for storing
          parsed data, an input connection to BASE for importing the
          base file, and an input connection to DATA for reading the
          source data file are assumed to be open. 'parser' should
          define how BASE and DATA will be used to extract data and
          store them in OUT

   isDir: 'isDir' a boolean indicating whether dataFile is the name of
          a directory (TRUE) or not (FALSE)

 mergeMe: 'mergeMe' a data matrix that is going to be processed to
          merge rows with duplicate keys

  keyCol: 'keyCol' an integer for the index of the column containing
          the keys based on which entries will be merged

     sep: 'sep' a character string for the separator used to separate
          multiple values

_D_e_t_a_i_l_s:

     The system is assumed to be able to run Perl. The dynamically
     generated Perl script is removed after execution.
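
     As a rough illustration of these assumptions, the sketch below
     checks whether a Perl interpreter can be found and writes a small
     parser segment to a temporary file. The Perl content of the
     segment, its assumed tab-delimited input layout, and the names
     'hasPerl' and 'parserFile' are hypothetical; only the handle
     names BASE, DATA and OUT come from the description of 'parser'.

```r
# Sys.which() returns "" when the executable cannot be found.
perlPath <- Sys.which("perl")
hasPerl <- nzchar(perlPath)

# Hypothetical parser segment: it only uses the three handles that
# fileMuncher is documented to have opened (BASE, DATA, OUT).
# Assumption: both files are tab-delimited with the id in column one.
segment <- c(
  "my %ids;",
  "while (<BASE>) { chomp; my @f = split(/\\t/); $ids{$f[0]} = 1; }",
  "while (<DATA>) {",
  "    chomp;",
  "    my @f = split(/\\t/);",
  "    print OUT \"$_\\n\" if exists $ids{$f[0]};",
  "}"
)

# Write the segment to a temporary file whose name could then be
# supplied as the 'parser' argument.
parserFile <- tempfile("parserSegment")
writeLines(segment, parserFile)
```

     The file named by 'parserFile' could then be passed as 'parser',
     provided 'hasPerl' is TRUE. A real segment would have to match
     the actual layout of the base and source files.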

     'mergeRowByKey' merges data based on common keys. Multiple values
     for a given key will be separated by 'sep'.
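
     The merging behaviour described above can be approximated in base
     R as follows. This is a minimal sketch and not the package's own
     code: the function name 'mergeRowsSketch' is hypothetical, and
     dropping repeated values within a key is an assumption that this
     page does not confirm.

```r
# Hypothetical stand-in for illustration; not AnnBuilder's own code.
mergeRowsSketch <- function(mergeMe, keyCol = 1, sep = ";") {
  keys <- unique(mergeMe[, keyCol])
  do.call(rbind, lapply(keys, function(k) {
    # All rows sharing the current key, kept as a matrix.
    rows <- mergeMe[mergeMe[, keyCol] == k, , drop = FALSE]
    # Collapse each column's distinct values into one sep-delimited string.
    apply(rows, 2, function(col) paste(unique(col), collapse = sep))
  }))
}

# Toy input: the key "k1" appears in two rows.
m <- matrix(c("k1", "a",
              "k2", "b",
              "k1", "c"), ncol = 2, byrow = TRUE)
res <- mergeRowsSketch(m, keyCol = 1, sep = ";")
```

     With this toy matrix, 'res' has one row per key, and the two "k1"
     rows collapse into a single row whose second column is "a;c".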

_V_a_l_u_e:

     'fileMuncher' returns a character string for the name of the
     output file.

     'mergeRowByKey' returns a matrix with merged data.

_N_o_t_e:

     This function is part of the Bioconductor project at Dana-Farber
     Cancer Institute to provide Bioinformatics functionalities through
     R

_A_u_t_h_o_r(_s):

     Jianhua Zhang

_S_e_e _A_l_s_o:

     'resolveMaps'

_E_x_a_m_p_l_e_s:

     if(interactive()){
         # Locate the parser segments shipped with the package
         # (path.package replaces the defunct .path.package)
         path <- file.path(path.package("AnnBuilder"), "data")
         # Write a small base file of probe id / accession pairs
         temp <- matrix(c("32469_f_at", "D90278", "32469_at", "L00693",
             "33825_at", "X68733", "35730_at", "X03350", "38912_at",
             "D90042", "38936_at", "M16652"), ncol = 2, byrow = TRUE)
         write.table(temp, "tempBase", sep = "\t", quote = FALSE,
             row.names = FALSE, col.names = FALSE)
         # Parse a truncated version of LL_tmpl.gz from Bioconductor
         srcFile <-
             loadFromUrl("http://www.bioconductor.org/datafiles/wwwsources/Tll_tmpl.gz")
         fileMuncher(outName = "temp", baseFile = "tempBase",
             dataFile = srcFile, parser = file.path(path, "gbLLParser"),
             isDir = FALSE)
         # Show the parsed data
         read.table(file = "temp", sep = "\t", header = FALSE)
         # Remove the temporary files
         unlink("tempBase")
         unlink("temp")
     }

