\name{ABPkgBuilder}
\alias{ABPkgBuilder}
\alias{getBaseParsers}
\alias{createEmptyDPkg}
\alias{getDirContent}
\alias{getMultiColNames}
\alias{getUniColNames}
\alias{getTypeColNames}
\alias{splitEntry}
\alias{twoStepSplit}

\title{Functions that support a single API for building data packages}
\description{
  These functions support a single interface, ABPkgBuilder, that lets
  users build annotation data packages by supplying a small number of
  parameters. The remaining parameters are determined by the
  supporting functions.
}
\usage{
ABPkgBuilder(baseName, srcUrls, baseMapType = c("gb", "ug", "ll"),
otherSrc = NULL, pkgName, pkgPath, organism = c("human", "mouse",
"rat"), version = "1.1.0", makeXML = TRUE, author = list(author = "who",
maintainer = "who@email.com"), fromWeb = TRUE)
getBaseParsers(baseMapType = c("gb", "ug"))
createEmptyDPkg(pkgName, pkgPath, folders, force = TRUE)
getDirContent(dirName, exclude = NULL)
getMultiColNames()
getUniColNames()
getTypeColNames()
splitEntry(dataRow, sep = ";", asNumeric = FALSE)
twoStepSplit(dataRow, entrySep = ";", eleSep = "@", asNumeric = FALSE)
}

\arguments{
  \item{baseName}{\code{baseName} a character string for the name of the
    base file to which the source data are mapped. The file is assumed
    to have two tab-separated columns, the first holding the names of
    the genes (probes) to be annotated and the second the GenBank
    accession numbers, UniGene ids, or LocusLink ids they map to} 
  \item{srcUrls}{\code{srcUrls} a named vector of character strings for
    the URLs from which the source data files will be retrieved. Valid
    sources are LocusLink, UniGene, Golden Path, Gene Ontology, and
    KEGG, and the corresponding names should be LL, UG, GP, GO, and
    KEGG, respectively. LL and UG are required} 
  \item{baseMapType}{\code{baseMapType} a character string that is
    either "gb", "ug", or "ll" to indicate whether the probe ids in
    baseName are mapped to GenBank accession numbers, UniGene ids, or
    LocusLink ids}
  \item{otherSrc}{\code{otherSrc} a named vector of character strings
    for the names of files that map the probe ids of baseName to
    LocusLink ids. These mappings are combined to obtain a unified
    mapping between the probe ids of baseName and LocusLink ids based on
    all the sources. The names should not contain any digits, and the
    files must have the same structure as baseName}
  \item{pkgName}{\code{pkgName} a character string for the name of the
    data package to be built (e.g. hgu95a, rgu34a)}
  \item{pkgPath}{\code{pkgPath} a character string for the full path of
    an existing directory where the built package will be stored}
  \item{organism}{\code{organism} a character string for the name of the
    organism of concern (currently only "human", "mouse", or "rat")}
  \item{version}{\code{version} a character string for the version number}
  \item{makeXML}{\code{makeXML} a boolean to indicate whether an XML
    version will also be generated}
  \item{author}{\code{author} a list of character strings with an author
    element for the name of the author and a maintainer element for the
    email address of the author}
  \item{force}{\code{force} a boolean that is set to TRUE if the package
    to be created will replace an existing package with the same name}
  \item{dirName}{\code{dirName} a character string for the name of a
    directory whose contents are of interest}
  \item{exclude}{\code{exclude} a character string for a pattern used
    to exclude the contents of a directory that match it}
  \item{dataRow}{\code{dataRow} a character string containing data
    elements separated by \code{sep} or \code{entrySep}, with a
    descriptive string attached to each element after \code{eleSep}}
  \item{sep}{\code{sep} a character string for the separator between
    elements (used by \code{splitEntry})}
  \item{entrySep}{\code{entrySep} a character string for the separator
    between entries (used by \code{twoStepSplit})}
  \item{eleSep}{\code{eleSep} a character string for the separator
    between an element and its descriptive string (used by
    \code{twoStepSplit})}
  \item{asNumeric}{\code{asNumeric} a boolean that is TRUE when the
    split values are to be returned as numeric values}
  \item{fromWeb}{\code{fromWeb} a boolean to indicate whether the source
    data will be downloaded from the web or read from a local file}
  \item{folders}{\code{folders} a vector of character strings for the
    names of folders to be created within a package that is going to be
    created} 
}
\details{
  These functions are the result of an effort to make building data
  packages easier for users. As a result, users have limited control
  over the process and its inputs. Additionally, some of the built-in
  functions that determine the URLs of the source data may fail when
  the maintainers of the source web sites change the name, structure,
  etc. of the source data. When this happens, users may have to follow
  the instructions in the AnnBuilder vignette to build data packages.

  \code{\link{getBaseParsers}} determines which of the built-in parsers
  to use for the source data, based on the type of mapping used for the
  probes.

  \code{\link{createEmptyDPkg}} creates an empty package with the
  required subdirectories for data to be stored.

  \code{\link{getMultiColNames}} determines which annotation data
  elements have a many-to-one relationship with a probe. In the parsed
  annotation data, the multiple values are separated by a separator.

  \code{\link{getUniColNames}} determines which annotation data
  elements have a one-to-one relationship with a probe.

  \code{\link{getTypeColNames}} determines which annotation data
  elements have a many-to-one relationship with a probe and carry
  additional information appended to the end of each element after a
  separator. In the parsed annotation data, the multiple values are also
  separated by a separator.

  \code{splitEntry} splits entries by a separator.
  
  \code{twoStepSplit} splits entries by \code{entrySep} and then
  separates each element from its descriptive information by
  \code{eleSep}.
}
\value{
  \code{\link{getBaseParsers}} returns a named vector for the names of
  the parsers to use to parse the source data.

  \code{\link{getDirContent}} returns a vector of character strings for
  the contents of the directory of interest.

  \code{\link{getMultiColNames}} returns a vector of character strings.
  
  \code{\link{getUniColNames}} returns a vector of character strings.
  
  \code{\link{getTypeColNames}} returns a vector of character strings.

  \code{splitEntry} returns a vector of character strings.

  \code{twoStepSplit} returns a named vector of character strings. The
  names are the descriptive information appended to each element after
  \code{eleSep}
}
\references{HowTo and AnnBuilder vignettes}
\author{Jianhua Zhang}
\note{The functions are part of the Bioconductor project at Dana-Farber
  Cancer Institute to provide Bioinformatics functionalities through R}

\seealso{\code{\link{GOPkgBuilder}},\code{\link{KEGGPkgBuilder}}}
\examples{
# Create a temporary directory for the data
myDir <- tempdir()
# Create a temp base data file
geneNMap <- matrix(c("32468_f_at", "D90278", "32469_at", "L00693",
                   "32481_at", "AL031663", "33825_at", "X68733",
                   "35730_at", "X03350", "36512_at", "L32179",
                   "38912_at", "D90042", "38936_at", "M16652",
                   "39368_at", "AL031668"), ncol = 2, byrow = TRUE)
write.table(geneNMap, file = file.path(myDir, "geneNMap"),
sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)
# Urls for truncated versions of source data
mySrcUrls <- c(
    LL = "http://www.bioconductor.org/datafiles/wwwsources/Tll_tmpl.gz",
    UG = "http://www.bioconductor.org/datafiles/wwwsources/Ths.data.gz",
    GO = "http://www.bioconductor.org/datafiles/wwwsources/Tgo.xml")
# Create temp files for other sources
temp <- matrix(c("32468_f_at", NA, "32469_at", "2",
                   "32481_at", NA, "33825_at", "9",
                   "35730_at", "1576", "36512_at", NA,
                   "38912_at", "10", "38936_at", NA,
                   "39368_at", NA), ncol = 2, byrow = TRUE)
write.table(temp, file = file.path(myDir, "srcone"), sep = "\t",
quote = FALSE, row.names = FALSE, col.names = FALSE)
temp <- matrix(c("32468_f_at", NA, "32469_at", NA,
                   "32481_at", "7051", "33825_at", NA,
                   "35730_at", NA, "36512_at", "1084",
                   "38912_at", NA, "38936_at", NA,
                   "39368_at", "89"), ncol = 2, byrow = TRUE)
write.table(temp, file = file.path(myDir, "srctwo"), sep = "\t",
quote = FALSE, row.names = FALSE, col.names = FALSE)
otherMapping <- c(srcone = file.path(myDir, "srcone"),
srctwo = file.path(myDir, "srctwo"))
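# A minimal sketch of the filtering described for getDirContent().
# sketchDirContent() is a hypothetical base-R helper, not the AnnBuilder
# function. It assumes that exclude is a regular expression matched
# against the entry names.
sketchDirContent <- function(dirName, exclude = NULL) {
    contents <- list.files(dirName)
    if (!is.null(exclude)) {
        # Drop every entry whose name matches the exclusion pattern
        contents <- contents[!grepl(exclude, contents)]
    }
    contents
}
# Lists the temporary directory without the srcone and srctwo files
sketchDirContent(tempdir(), exclude = "^src")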
# Runs only upon user's request
if(interactive()){
ABPkgBuilder(baseName = file.path(myDir, "geneNMap"),
srcUrls = mySrcUrls, baseMapType = "gb", otherSrc = otherMapping,
pkgName = "myPkg", pkgPath = myDir, organism = "human", version = "1.1.0",
makeXML = TRUE, author = c(author = "myname",
maintainer = "myname@myemail.com"))
# Output files
list.files(myDir)
# Content of the data package
list.files(file.path(myDir, "myPkg"))
list.files(file.path(myDir, "myPkg", "data"))
list.files(file.path(myDir, "myPkg", "man"))
list.files(file.path(myDir, "myPkg", "R"))
unlink(file.path(myDir, "myPkg"), TRUE)
unlink(file.path(myDir, "myPkg.xml"))
unlink(file.path(myDir, "myPkgByNum.xml")) 
}
unlink(c(file.path(myDir, "geneNMap"), file.path(myDir, "srcone"),
file.path(myDir, "srctwo")))
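# A minimal sketch of the splitting that splitEntry() and twoStepSplit()
# are described as performing. sketchSplitEntry() and sketchTwoStepSplit()
# are hypothetical base-R helpers for illustration only, not the
# AnnBuilder implementations. Separators are matched literally.
sketchSplitEntry <- function(dataRow, sep = ";", asNumeric = FALSE) {
    values <- unlist(strsplit(dataRow, sep, fixed = TRUE))
    if (asNumeric) as.numeric(values) else values
}
sketchTwoStepSplit <- function(dataRow, entrySep = ";", eleSep = "@",
                               asNumeric = FALSE) {
    # Step 1: split the row into entries such as GO:0003674@IEA
    entries <- unlist(strsplit(dataRow, entrySep, fixed = TRUE))
    # Step 2: split each entry into a value and its descriptive string
    parts <- strsplit(entries, eleSep, fixed = TRUE)
    values <- vapply(parts, function(p) p[1], character(1))
    if (asNumeric) values <- as.numeric(values)
    # The descriptive strings become the names of the returned vector
    names(values) <- vapply(parts, function(p) p[2], character(1))
    values
}
sketchSplitEntry("1576;10;7051", asNumeric = TRUE)
sketchTwoStepSplit("GO:0003674@IEA;GO:0005575@TAS")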
}
\keyword{manip}


\eof
\name{GEO-class}
\docType{class}
\alias{GEO-class}
\alias{GEO}
\alias{readData,GEO-method}
\title{Class "GEO" represents a GEO object that reads/downloads data
  from the GEO web site}
\description{The GEO web site contains data files represented by GEO
  accession numbers. Class GEO reads/downloads data files from the site
  if the correct URL and GEO accession numbers are provided}
\section{Objects from the Class}{
Objects can be created by calls of the form \code{new("GEO", ...)}.
    A constructor (GEO) is available and should be used to instantiate
    objects of this class  
}
\section{Slots}{
  \describe{
    \item{\code{srcUrl}:}{Object of class \code{"character", from class
	"pubRepo"} - a character string for the url of a CGI script that
    handles data requests, which is:
    \url{http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?} at the time of
    writing} 
  }
}
\section{Extends}{
Class \code{"pubRepo"}, directly.
}
\section{Methods}{
  \describe{
    \item{readData}{\code{signature(object = "GEO")}: reads data from
      GEO and then parses the data to a matrix}
  }
}
\references{Programming with data}
\author{Jianhua Zhang}
\note{This class is part of the Bioconductor project at Dana-Farber
  Cancer Institute to provide Bioinformatics functionalities through R}

\seealso{\code{\link{queryGEO}},\code{\link{pubRepo-class}}}

\examples{
    if(interactive()){
        geo <- GEO("http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?")
        # The GEOAccNum may be invalid due to changes at GEO site
        data <- readData(geo, GEOAccNum = "GPL16" )
    }
}
\keyword{classes}


\eof
\name{GO-class}
\docType{class}
\alias{GO-class}
\alias{GO}
\alias{readData,GO-method}
\title{Class "GO" a class to handle data from Gene Ontology}
\description{This class is a subclass of pubRepo that is implemented
  specifically to parse data from Gene Ontology. \code{\link{readData}}
  has been overridden to process Gene Ontology data}
\section{Objects from the Class}{
Objects can be created by calls of the form \code{new("GO", ...)}.
    A constructor (\code{\link{GO}}) is available and should be used to
    instantiate objects of class GO} 
}
\section{Slots}{
  \describe{
    \item{\code{srcUrl}:}{Object of class \code{"character", from class
	"pubRepo"} a character string for the url of the source data
      from Gene Ontology}
    \item{\code{parser}:}{Object of class \code{"character", from class
	"pubRepo"} not in use}
    \item{\code{baseFile}:}{Object of class \code{"character", from
	class "pubRepo"} not in use}
  }
}
\section{Extends}{
Class \code{"pubRepo"}, directly.
}
\section{Methods}{
  \describe{
    \item{readData}{\code{signature(object = "GO")}: Downloads/processes
      go\_xxx-termdb from Gene Ontology, where xxx is a date. If the
      argument xml is TRUE, the data file is parsed and a matrix with
      three columns is returned: the first column holds the GO ids, the
      second the GO ids of their direct parents, and the third the
      ontology terms defined by Gene Ontology. Otherwise, the data (not
      in XML form) are read in using \code{\link{readLines}}}
  }
}
\references{\url{http://www.godatabase.org}}
\author{Jianhua Zhang}
\note{This class is part of the Bioconductor project at Dana-Farber
  Cancer Institute to provide Bioinformatics functionalities through R}

\seealso{\code{\link{pubRepo-class}}}

\examples{
# Read a truncated version of GO.xml from Bioconductor
go <- GO(srcUrl =
"http://www.bioconductor.org/datafiles/wwwsources/Tgo.xml")
goxml <- readData(go, xml = TRUE)
}
\keyword{classes}

\eof
\name{GOPkgBuilder}
\alias{GOPkgBuilder}

\title{A function to build a data package using GO data}
\description{
  This function creates the data, documentation, and other supporting
  files that make up a normal R package, using data from GO. 
}
\usage{
GOPkgBuilder(pkgName = "GO", pkgPath, version = "1.2.1", srcUrl = getSrcUrl("GO", xml = TRUE), author = c(name = "who", address = "who@email.com"))
}
\arguments{
  \item{pkgName}{\code{pkgName} a character string for the name of the
    data package to be built}
  \item{pkgPath}{\code{pkgPath} a character string for the path to
    which the data package to be built will be stored}
  \item{version}{\code{version} a character string for the version
    number of the data package}
  \item{srcUrl}{\code{srcUrl} a character string for the url where the
    source data that will be used to build the data package are stored}
  \item{author}{\code{author} a named vector of character strings with a
    name element for the name of the author and an address element for
    the email address of the author}
}
\details{
  This function relies on the XML data file from
  \url{http://www.godatabase.org/dev/database/archive/2003-04-01/go_200304-termdb.xml.gz}
  to obtain the data. The URL changes when the data are updated, so the
  function has built-in code to find the location of the latest data and
  uses those data to build the data package. 
}
\value{
  This function does not return any value
}
\references{\url{http://www.godatabase.org}}
\author{Jianhua Zhang}
\note{This function is part of the Bioconductor project at Dana-Farber
  Cancer Institute to provide Bioinformatics functionalities through R}

\seealso{\code{\link{ABPkgBuilder}}, \code{\link{KEGGPkgBuilder}}}
\examples{
if(interactive()){
GOPkgBuilder(pkgName = "GO", pkgPath = tempdir(), version = "1.2.1",
srcUrl = "http://www.bioconductor.org/datafiles/wwwsources/Tgo.xml",
author = c(author = "who", maintainer = "who@email.com"))
list.files(file.path(tempdir(), "GO"))
unlink(file.path(tempdir(), "GO"), TRUE)
}
}
\keyword{manip}


\eof
\name{GOXMLParser}
\alias{GOXMLParser}
\alias{getChildNodes}
\alias{getOffspringNodes}
\alias{getParentNodes}
\alias{getAncestors}
\alias{getTopGOid}
\alias{mapGO2Category}
\alias{getGOGroupIDs}
\alias{mapGO2AllProbe}
\title{Functions to read/parse the XML document of Gene Ontology data}
\description{
  These functions are used by \code{\link{GO-class}} to read/parse the
  Gene Ontology data file (in XML format) and determine the
  parent-child relations.
}
\usage{
GOXMLParser(fileName)
getChildNodes(goid, goData)
getOffspringNodes(goid, goData, keepTree = FALSE)
getParentNodes(goid, goData, sep = ";")
getAncestors(goid, goData, sep = ";", keepTree = FALSE, top = "GO:0003673")
getTopGOid(what = c("mf", "bp", "cc", "go"))
mapGO2Category(goData)
getGOGroupIDs(onto = FALSE)
mapGO2AllProbe(go2Probe, goData, goid = "", sep = ";", all = TRUE)
}

\arguments{
  \item{fileName}{\code{fileName} a character string for the name of the
    locally stored file of Gene Ontology XML data}
  \item{goData}{\code{goData} a matrix with three columns for GO ids,
    parent GO ids, and the ontology terms}
  \item{goid}{\code{goid} a character string for the id of a Gene
    Ontology term (e.g. GO:0003674)}
  \item{keepTree}{\code{keepTree} a boolean indicating whether the tree
    structure showing parent-child relations will be preserved}
  \item{sep}{\code{sep} a character string for the separator used to
    separate multiple entries}
  \item{top}{\code{top} a character string for the GO id that is the
    root for all the other GO ids along parent-child relation tree}
  \item{what}{\code{what} a character string that has to be one of "mf",
    "bp", "cc", "go"}
  \item{onto}{\code{onto} a boolean that is set to TRUE if the GO id for
    the topmost node is to be returned, or FALSE if the GO ids for the
    three categories (BP, MF, and CC) are to be returned}
  \item{go2Probe}{\code{go2Probe} a matrix that maps GO ids to probe
    ids}
  \item{all}{\code{all} a boolean to indicate whether to map all the GO
    ids contained in goData to probe ids (TRUE) or just the GO ids
    specified by goid (FALSE)}
}
\details{
  The GO site provides an XML document for the molecular function,
 biological process, and cellular component of genes. The basic XML
 structure looks something like:
 \preformatted{
 <go:term>
   <go:accession>GO:000xxxx</go:accession>
   <go:name>a string for the function, process, or component</go:name>
   <go:isa rdf:resource="http://www.geneontology.org/go#GO:000xxxx" />
   <go:part-of rdf:resource="http://www.geneontology.org/go#GO:000xxxx" />
   .
   .
 </go:term>}

 The XML document read from the Gene Ontology site does not
 differentiate among the molecular function, biological process, and
 cellular component of genes, because the same go:name tag is used for
 all three. To determine whether a go:name tag describes the function,
 process, or component of a gene identified by a GO accession number,
 the go:isa and go:part-of tags that record the parent-child
 relationships must be retained, so that the tree can later be
 traversed upward to find the correct category. As a result, the matrix
 returned by \code{\link{GOXMLParser}} has three columns: one for the
 GO ids, one for the GO ids of the direct parents (a ";" is used to
 separate multiple GO ids), and one for the ontology term defined.

 \code{\link{getChildNodes}} finds the direct children of a given GO id
 based on a matrix containing the parent-child relationships (e. g. the
 one returned by \code{\link{GOXMLParser}}). 

 \code{\link{getOffspringNodes}} finds all the direct and indirect
 children of a given GO id based on a matrix containing the parent-child
 relationships (e.g. the one returned by \code{\link{GOXMLParser}}).
   
 \code{\link{getParentNodes}} finds the direct parent of a given GO id
 based on a matrix containing the parent-child relationships (e. g. the
 one returned by \code{\link{GOXMLParser}}).

 \code{\link{getAncestors}} finds all the direct and indirect parents
 of a given GO id based on a matrix containing the parent-child
 relationships (e.g. the one returned by \code{\link{GOXMLParser}}).

 \code{\link{getTopGOid}} returns the root GO id for "mf" (molecular
 function), "bp" (biological process), "cc" (cellular component), or
 "go" (the whole Gene Ontology tree).

 \code{\link{mapGO2Category}} maps GO ids to the three categories (MF,
 BP, CC) they belong to. 

 \code{\link{getGOGroupIDs}} returns the GO id(s) for the topmost or the
 three nodes corresponding to the three categories (MF, BP, and CC).

 \code{\link{mapGO2AllProbe}} maps GO ids to the probe ids that are
 related to each GO id and all its offspring.
}
\value{
  \code{\link{GOXMLParser}} returns a matrix with three columns.

  \code{\link{getChildNodes}} returns a vector of character strings.

  \code{\link{getOffspringNodes}} returns a vector or a list of vectors,
  depending on whether the parent-child tree structure is to be
  preserved.

  \code{\link{getParentNodes}} returns a vector of character strings.

  \code{\link{getAncestors}} returns a vector or a list of vectors,
  depending on whether the parent-child tree structure is to be
  preserved.

  \code{\link{mapGO2Category}} returns a matrix with two columns
  containing GO ids and letters representing one of the three categories
  (MF, BP, and CC).

  \code{\link{getGOGroupIDs}} returns a vector of string(s) for GO
  id(s).

  \code{\link{mapGO2AllProbe}} returns a matrix with GO ids as one
  column and, as the other column, the probe ids related to each GO id
  and all its offspring.

  \code{\link{getTopGOid}} returns a character string for a GO id.
}
\references{\url{http://www.geneontology.org}}
\author{Jianhua (John) Zhang }
\note{These functions are part of the Bioconductor project within a
  package at the Dana-Farber Cancer Institute to provide Bioinformatics
  functionalities through R}

\seealso{ \code{\link{GO-class}}}

\examples{

# Create the XML doc
  cat(paste("<?xml version='1.0'?>",
         "<!-- A test file for the examples in GOXMLParser.R Doc -->",
         "<go>",            
             "<go:term>",
                 "<go:accession>GO:0003674</go:accession>",
                 "<go:name>molecular_function</go:name>",
                 "<go:is_a rdf='http://wwww.myurl.org/go#GO:0003673' />",
                 "<go:part_of rdf = 'http://wwww.myurl.org/go#GO:0003672' />",
             "</go:term>",
             "<go:term>",
                 "<go:accession>GO:0005575</go:accession>",
                 "<go:name>cellular_component</go:name>",
                 "<go:is_a rdf= 'http://wwww.myurl.org/go#GO:0003673'/>",
                 "<go:part_of rdf = 'http://wwww.myurl.org/go#GO:0003674' />",
             "</go:term>",
          "</go>"), file = "testDoc")

  # Parse the dummy file using GOXMLParser 
  goData <- GOXMLParser("testDoc")
  # Get the child nodes for a GO id
  getChildNodes("GO:0003674", goData)
  getOffspringNodes("GO:0003673", goData, FALSE)
  getParentNodes("GO:0005575", goData)
  getAncestors("GO:0005575", goData, ";", FALSE, "GO:0003674")
  getTopGOid("go")
  unlink("testDoc")
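  # A minimal sketch of the upward traversal that getAncestors() is
  # described as performing. sketchParents() and sketchAncestors() are
  # hypothetical base-R helpers, not the AnnBuilder functions. They assume
  # a goData-style matrix whose first column holds GO ids and whose
  # second column holds semicolon-separated parent ids, as in Details.
  # toyGO is hand-built toy data mirroring the test document above.
  toyGO <- matrix(c("GO:0003673", "", "Gene_Ontology",
                    "GO:0003674", "GO:0003673", "molecular_function",
                    "GO:0005575", "GO:0003673;GO:0003674",
                    "cellular_component"), ncol = 3, byrow = TRUE)
  sketchParents <- function(goid, goData, sep = ";") {
      idx <- which(goData[, 1] == goid)
      if (length(idx) == 0) return(character(0))
      parents <- goData[idx[1], 2]
      if (is.na(parents) || parents == "") return(character(0))
      unlist(strsplit(parents, sep, fixed = TRUE))
  }
  sketchAncestors <- function(goid, goData, sep = ";") {
      found <- character(0)
      toVisit <- sketchParents(goid, goData, sep)
      # Breadth-first walk up the tree, visiting each GO id only once
      while (length(toVisit) > 0) {
          current <- toVisit[1]
          toVisit <- toVisit[-1]
          if (!is.element(current, found)) {
              found <- c(found, current)
              toVisit <- c(toVisit, sketchParents(current, goData, sep))
          }
      }
      found
  }
  sketchAncestors("GO:0005575", toyGO)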
}

\keyword{manip}


\eof
\name{GP-class}
\docType{class}
\alias{GP-class}
\alias{GP}
\alias{getStrand}
\alias{getStrand,GP-method}
\title{Class "GP" a sub-class of pubRepo to get/process data from GoldenPath}
\description{This class is a sub-class of pubRepo with source specific
  functions to get/process data from GoldenPath
  {\url{http://www.genome.ucsc.edu/goldenPath}} to obtain gene
  location and orientation data} 
\section{Objects from the Class}{
Objects can be created by calls of the form \code{new("GP", ...)}.
    A constructor (GP) is available and should be used to instantiate
    objects of this class 
}
\section{Slots}{
  \describe{
    \item{\code{organism}:}{Object of class \code{"character", from
	class "UG"} a character string for the organism of concern}
    \item{\code{srcUrl}:}{Object of class \code{"character", from class
	"UG"} a character string for the url where the source data
      are. As multiple data sources are used, srcUrl in this case is
      the location where the source data are
      (e.g. \url{http://www.genome.ucsc.edu/goldenPath/14nov2002/database/})}  
    \item{\code{parser}:}{Object of class \code{"character", from class
	"UG"} not in use}
    \item{\code{baseFile}:}{Object of class \code{"character", from
	class "UG"} not in use}
  }
}
\section{Extends}{
Class \code{"UG"}, directly.
Class \code{"pubRepo"}, by class "UG".
}
\section{Methods}{
  \describe{
    \item{getStrand}{\code{signature(object = "GP")}: Processes the
      refLink and refGene data files and returns a matrix with gene
      location and orientation data}
  }
}
\references{\url{http://www.genome.ucsc.edu}}
\author{Jianhua Zhang}
\note{This class is part of the Bioconductor project at Dana-Farber
  Cancer Institute to provide Bioinformatics functionalities through R}

\seealso{\code{\link{pubRepo-class}}}

\examples{
# The example may take a few seconds to finish
if(interactive()){
  # The URL (http://www.genome.ucsc.edu/goldenPath/14nov2002/database/)
  # was correct at the time of coding. Replace it with a correct one if
  # it is no longer valid
  url <- getSrcUrl("GP", organism = "human")
  gp <- GP(srcUrl = url, organism = "human")
  strand <- getStrand(gp)
}
}
\keyword{classes}

\eof
\name{KEGG-class}
\docType{class}
\alias{KEGG-class}
\alias{KEGG}
\alias{findIDNPath}
\alias{mapLL2ECNPName}
\alias{KEGG,KEGG-method}
\alias{findIDNPath,KEGG-method}
\alias{mapLL2ECNPName,KEGG-method}
\title{Class "KEGG" a sub-class of pubRepo to get/process pathway and
  enzyme information}
\description{This class is a sub-class of pubRepo with source specific
  functions to get/process data from KEGG
  {\url{ftp://ftp.genome.ad.jp/pub/kegg/pathways}} to obtain pathway and
  enzyme information for genes}
\section{Objects from the Class}{
Objects can be created by calls of the form \code{new("KEGG", ...)}.
    A constructor (KEGG) is available and should be used to instantiate
    objects of this class 
}
\section{Slots}{
  \describe{
    \item{\code{organism}:}{Object of class \code{"character", from
	class "UG"} a character string for the organism of concern}
    \item{\code{srcUrl}:}{Object of class \code{"character", from class
	"UG"} a character string for the url where source data are
      stored (\url{ftp://ftp.genome.ad.jp/pub/kegg/pathways} at the
      time of coding)}
    \item{\code{parser}:}{Object of class \code{"character", from class
	"UG"} not in use}
    \item{\code{baseFile}:}{Object of class \code{"character", from
	class "UG"} not in use}
  }
}
\section{Extends}{
Class \code{"UG"}, directly.
Class \code{"pubRepo"}, by class "UG".
}
\section{Methods}{
  \describe{
    \item{findIDNPath}{\code{signature(object = "KEGG")}: Finds the
      mappings between KEGG ids and pathway names}
    \item{mapLL2ECNPName}{\code{signature(object = "KEGG")}: Maps
      LocusLink ids to enzyme ids and pathway names}
  }
}
\references{\url{www.genome.ad.jp/kegg/}}
\author{Jianhua Zhang}
\note{This class is part of the Bioconductor project at Dana-Farber
  Cancer Institute to provide Bioinformatics functionalities through R}

\seealso{\code{\link{pubRepo-class}}, \code{\link{UG-class}} }

\examples{
# The URL (ftp://ftp.genome.ad.jp/pub/kegg/pathways) may change but
# was correct at the time of coding
url <-  getSrcUrl("KEGG")
kegg <- KEGG(srcUrl = url, organism = "human")
## This part takes a while to finish (due to a large number of files to
## process) and is thus commented out. Try it only if you are really
## patient
# pathNEnzyme <- mapLL2ECNPName(kegg)
}
\keyword{classes}

\eof
\name{KEGGPkgBuilder}
\alias{KEGGPkgBuilder}
\alias{getEIdNName}
\alias{getKEGGFile}
\title{A function to make the data package for KEGG}
\description{
  This function generates a data package with rda files mapping KEGG
  pathway or enzyme names to ids and vice versa. The source files for
  making the mapping are from the Internet.
}
\usage{
KEGGPkgBuilder(pkgPath, pkgName = "KEGG", version = "1.0.1",
pathwayURL = getKEGGFile("path"), enzymeURL = getKEGGFile("enzyme"),
force = TRUE, author = c(name = "who", address = "who@email.com"))
getEIdNName(enzymeURL)
getKEGGFile(whichOne)
}

\arguments{
  \item{pkgPath}{A character string for the name of path to which the
    data package will be stored.}
  \item{pkgName}{A character string for the name of the data package.}
  \item{version}{A character string for the version number of the system
    by which the data package is generated.}
  \item{pathwayURL}{A character string for the URL where the source file
    for pathway data will be downloaded.}
  \item{enzymeURL}{A character string for the URL from which the source
    file for enzyme data will be downloaded.}
  \item{force}{A boolean to indicate whether the existing data package
    will be overwritten.}
  \item{whichOne}{A character string for the type of file. Valid
    values are "path" and "enzyme".}
  \item{author}{A named vector of character strings with a name element
    for the name of the author and an address element for the email
    address of the author.}
}
\details{
  The data package produced will have the normal structure of an R
  package (e.g. with R, man, data, and src directories) under a
  directory defined by pkgName under pkgPath.
}
\value{
  This function does not return any value.
}
\references{An Introduction to R; Writing R Extensions.}
\author{Jianhua Zhang}
\note{This function is part of the Bioconductor project at Dana-Farber
  Cancer Institute to provide Bioinformatics functionalities through R.}

\seealso{\code{\link{package.skeleton}}}

\examples{
# To be added
}
\keyword{manip}


\eof
\name{LL-class}   
\docType{class}
\alias{LL-class}
\alias{LL}
\title{Class "LL" a sub-class of pubRepo to handle data from LocusLink}
\description{This class is a sub-class of pubRepo that is implemented
  specifically to parse data from LocusLink (LL\_tmpl.gz)}
\section{Objects from the Class}{
Objects can be created by calls of the form \code{new("LL", ...)}.
    A constructor (LL) is available and should be used to instantiate
    objects of \code{\link{LL}} 
}
\section{Slots}{
  \describe{
    \item{\code{srcUrl}:}{Object of class \code{"character", from class
	"pubRepo"} a character string for the source url where data
      will be downloaded/processed}
    \item{\code{parser}:}{Object of class \code{"character", from class
	"pubRepo"}  a character string for the name of the file
      containing a segment of perl code with instructions on how the
      source data will be processed and output be generated}
    \item{\code{baseFile}:}{Object of class \code{"character", from
	class "pubRepo"} a character string for the name of the file
      that contains data that will be used as the base to process the
      source data.  Data from the source that are related to elements in
      the base file will be extracted. baseFile is assumed to be a two
      column file with the first column being some type of arbitrary ids
      (e.g. Affymetrix probe ids) and the second column being the
      corresponding ids of a given public repository (e.g. GenBank
      accession numbers or UniGene ids)} 
  }
}
\section{Extends}{
Class \code{"pubRepo"}, directly.
}
\section{Methods}{
No methods defined with class "LL" in the signature.
}
\references{\url{www.ncbi.nlm.nih.gov/LocusLink}}
\author{Jianhua Zhang}
\note{This class is part of the Bioconductor project at Dana-Farber
  Cancer Institute to provide Bioinformatics functionalities through R}

\seealso{\code{\link{pubRepo-class}}}

\examples{
if(interactive()){
# Parse a truncated version of LL_tmpl.gz from Bioconductor
path <- file.path(.path.package("pubRepo"), "data")
temp <- matrix(c("32469_f_at", "D90278", "32469_at", "L00693", "33825_at",
"X68733", "35730_at", "X03350", "38912_at", "D90042", "38936_at",
"M16652"), ncol = 2, byrow = TRUE)
write.table(temp, "tempfile", sep = "\t", quote = FALSE,
row.names = FALSE, col.names = FALSE)  
ll <- LL(srcUrl =
"http://www.bioconductor.org/datafiles/wwwsources/Tll_tmpl.gz",
parser = file.path(path, "basedLLParser"), baseFile = "tempfile")
data <- parseData(ll)
unlink("tempfile")
}
}
\keyword{classes}

\eof
\name{SPPkgBuilder}
\alias{SPPkgBuilder}
\alias{key}
\alias{getDetailV}
\alias{getEnvNames}
\alias{isOneToOne}
\title{A function to build a data package using Swiss-Prot protein data}
\description{
  Given the URL to Swiss-Prot protein data, this function creates a data
  package with the data stored as R environment objects in the data
  directory 
}
\usage{
SPPkgBuilder(pkgPath, version, author, fromWeb = TRUE, url =
"ftp://ftp.ebi.ac.uk/pub/databases/swissprot/release/sprot41.dat")
getDetailV(key)
getEnvNames()
isOneToOne(envName)
}
\arguments{
  \item{pkgPath}{\code{pkgPath} a character string for the path where
    the data package created will be stored}
  \item{version}{\code{version} a character string for the version
    number of the data package to be created}
  \item{author}{\code{author} a list with an author element for the name
    of the author of the data package and a maintainer element for the
    name and email address of the maintainer of the data package to be
    created} 
  \item{fromWeb}{\code{fromWeb} a boolean indicating whether the data
    will be read from the internet (TRUE) or from a local file (FALSE)}
  \item{url}{\code{url} a URL or file name to read the data from}
  \item{key}{\code{key} a character string for the name of a Swiss-Prot
    annotation element, e.g. "Swiss-Prot accession number"}
  \item{envName}{\code{envName} a character string for the name of an
    environment object} 
}
\details{
  If \code{fromWeb} is FALSE, \code{url} is treated as the name of a local file.
}
\value{
  This function returns NULL
}
\references{\url{ftp://ftp.ebi.ac.uk/pub/databases/swissprot/release/sprot41.dat}} 
\author{Jianhua Zhang}
\note{This function is part of the Bioconductor project at Dana-Farber
  Cancer Institute to provide bioinformatics functionalities through R}

\seealso{\code{\link{ABPkgBuilder}}}
\examples{
   # No example is given because processing the source data
   # takes a long time
}
\keyword{manip}



\eof
\name{UG-class}
\docType{class}
\alias{UG-class}
\alias{organism}
\alias{organism<-}
\alias{organism,UG-method}
\alias{organism<-,UG-method}
\alias{UG}
\title{Class "UG" a sub-class of pubRepo to handle data from UniGene}
\description{This class is a sub-class of pubRepo that is implemented
  specifically to parse data from UniGene (XX.data.gz, where XX is an
  abbreviation for a given organism)}
\section{Objects from the Class}{
Objects can be created by calls of the form \code{new("UG", ...)}.
    A constructor (UG) is available and should be used to instantiate
    objects of this class
}
\section{Slots}{
  \describe{
    \item{\code{organism}:}{Object of class \code{"character"} a
      character string for the name of the organism of concern}
    \item{\code{srcUrl}:}{Object of class \code{"character", from class
	"pubRepo"} a character string for the url of the source data}
    \item{\code{parser}:}{Object of class \code{"character", from class
	"pubRepo"}  a character string for the name of the file
      containing a segment of perl code with instructions on how the
      source data will be processed and output be generated} 
    \item{\code{baseFile}:}{Object of class \code{"character", from
	class "pubRepo"} a character string for the name of the file
      that contains data that will be used as the base to process the
      source data.  Data from the source that are related to elements in
      the base file will be extracted. baseFile is assumed to be a two
      column file with the first column being some type of arbitrary ids
      (e.g. Affymetrix probe ids) and the second column being the
      corresponding ids of a given public repository (e.g. GenBank
      accession numbers or UniGene ids)}
  }
}
\section{Extends}{
Class \code{"pubRepo"}, directly.
}
\section{Methods}{
  \describe{
    \item{organism<-}{\code{signature(object = "UG")}: Sets the value
      for the organism slot}
    \item{organism}{\code{signature(object = "UG")}: Gets the value for
      the organism slot}
  }
}
\references{\url{www.ncbi.nlm.nih.gov/UniGene}}
\author{Jianhua Zhang}
\note{This class is part of the Bioconductor project at Dana-Farber Cancer
  Institute to provide Bioinformatics functionalities through R}

\seealso{\code{\link{pubRepo-class}}}

\examples{
if(interactive()){
# Parse a truncated version of Hs.data.gz from Bioconductor
path <- file.path(.path.package("AnnBuilder"), "data")
temp <- matrix(c("32469_f_at", "D90278", "32469_at", "L00693", "33825_at",
"X68733", "35730_at", "X03350", "38912_at", "D90042", "38936_at",
"M16652"), ncol = 2, byrow = TRUE)
write.table(temp, "tempfile", sep = "\t", quote = FALSE,
row.names = FALSE, col.names = FALSE)  
ug <- UG(srcUrl =
"http://www.bioconductor.org/datafiles/wwwsources/Ths.data.gz",
parser = file.path(path, "basedUGParser"), baseFile = "tempfile",
organism = "human")
data <- parseData(ug)
unlink("tempfile")
}
}
\keyword{classes}

\eof
\name{YG-class}
\docType{class}
\alias{YG-class}
\alias{YG}
\alias{readData,YG-method}
\title{Class "YG" a sub-class of pubRepo that reads/downloads data from
  the Yeast Genome}
\description{This class is a sub-class of pubRepo that has source
  specific functions to extract data from Yeast Genome ftp site
  (\url{ftp://genome-ftp.stanford.edu/pub/yeast/data_download/})}
\section{Objects from the Class}{
Objects can be created by calls of the form \code{new("YG", ...)}.
    A constructor (YG) is available and should be used to instantiate
    objects of this class
}
\section{Slots}{
  \describe{
    \item{\code{srcUrl}:}{Object of class \code{"character", from class
	"pubRepo"} a character string for the url where source data are
      available
      (\url{ftp://genome-ftp.stanford.edu/pub/yeast/data_download/} at
      the time of coding)}
    \item{\code{parser}:}{Object of class \code{"character", from class
	"pubRepo"} not in use}
    \item{\code{baseFile}:}{Object of class \code{"character", from
	class "pubRepo"} not in use}
  }
}
\section{Extends}{
Class \code{"pubRepo"}, directly.
}
\section{Methods}{
  \describe{
    \item{readData}{\code{signature(object = "YG")}: Reads source data
      defined by argument extenName from the ftp site}
  }
}
\references{\url{ftp://genome-ftp.stanford.edu/pub/yeast/data_download/}}
\author{Jianhua Zhang}
\note{This class is part of the Bioconductor project at Dana-Farber
  Cancer Institute to provide Bioinformatics functionalities through R}

\seealso{\code{\link{pubRepo-class}}}

\examples{
    # Url may change but was correct at the time of coding
    url <- "ftp://genome-ftp.stanford.edu/pub/yeast/data_download/"
    # Create a YG object
    ygeno <- YG(srcUrl = url)
    if(interactive()){
        # Read the file named "chromosomal_feature.tab". Takes a few
        # seconds to finish
        data <- readData(ygeno,
                         "chromosomal_feature/chromosomal_feature.tab",
                         cols2Keep =c(6, 1), sep = "\t")
    }
}
\keyword{classes}

\eof
\name{chrLocPkgBuilder}
\alias{chrLocPkgBuilder}
\alias{getChrNum}
\title{A function to build a data package containing mappings between
  LocusLink ids and the chromosomal locations of genes represented by
  the LocusLink ids}
\description{
  This function uses data provided by UCSC to build a data package that
  contains mappings between LocusLink ids and chromosome numbers and the
  chromosomal location of genes represented by LocusLink ids on each
  chromosome
}
\usage{
chrLocPkgBuilder(pkgName = "humanCHRLOC", pkgPath, version, author,
organism = "human", url = getSrcUrl("gp", organism))
getChrNum(chr)
}
\arguments{
  \item{pkgName}{\code{pkgName} a character string for the name of the
    data package to be created}
  \item{pkgPath}{\code{pkgPath} a character string for the directory
    where the created data package will be stored}
  \item{version}{\code{version} a character string for the version
    number of the data package to be created}
  \item{author}{\code{author} a list with an author element for the name
    of the creator of the data package and a maintainer element for the
    email address of the creator} 
  \item{organism}{\code{organism} a character string for the organism
    that will be the target of the mapping}
  \item{url}{\code{url} a character string for the url of the UCSC ftp
    site where the files refLink.txt.gz and refGene.txt.gz are stored.
    The files will be used to produce the data package}
  \item{chr}{\code{chr} a character string for the chromosome number
    extracted from the source data}
}
\details{
  The created data package maps LocusLink ids to chromosomal
  locations. Mappings of other public data repository ids including Gene
  Ontology, RefSeq, and UniGene to LocusLink ids can be made available
  using \code{\link{map2LL}} 
}
\value{
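  The function returns invisibly; it is called for its side effect of
  creating the data package in \code{pkgPath}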
}
\author{Jianhua Zhang}
\note{This function is part of the Bioconductor project at Dana-Farber
  Cancer Institute to provide bioinformatics functionalities through R}
\seealso{\code{\link{map2LL}}}
\examples{
# Please note that the example will take a while to finish
if(interactive()){
chrLocPkgBuilder(pkgName = "humanCHRLOC", pkgPath = tempdir(),
version = "1.0.1", author = list(author = "who", maintainer =
"who@email.com"), organism = "human", url = getSrcUrl("gp", "human"))
}
}
\keyword{manip}

\eof
\name{cols2Env}
\alias{cols2Env}
\alias{matchAll}
\alias{matchOneRow}

\title{Creates an environment object using data from two columns of a matrix}
\description{
  Given a matrix with two columns, this function creates an environment
  object with values in one of the specified columns as keys and those in the
  other column as values.
}
\usage{
cols2Env(cols, colNames, keyColName = colNames[1], sep)
matchAll(cols, keyColName)
matchOneRow(cols, keyColName, sep = ";")
}

\arguments{
  \item{cols}{\code{cols} a matrix with two columns}
  \item{colNames}{\code{colNames} a vector of two character strings for
    the names of the two columns of \code{cols}}
  \item{keyColName}{\code{keyColName} a character string for the name of
    the column whose values will be used as the keys of the environment
    object to be created. Values in the other column become the values
    for those keys}
  \item{sep}{\code{sep} a character string for the separator used to
    separate entries that have multiple values}
}
\details{
  The matrix (or object convertible to a matrix) passed to cols2Env must
  have two columns, one intended to be used as the key and the other as
  the value.

  Cells in either or both columns may have multiple values separated by a
  separator (e.g. "a;b", "1;2;3"), which makes the mapping between keys
  and the corresponding values not a straightforward operation. cols2Env
  gets all the unique values from the key column by splitting them and
  maps values to each of them.

  \code{\link{cols2Env}} calls \code{\link{matchAll}}, which in turn calls
  \code{\link{matchOneRow}} to first split entries and then map entries
  in the two columns on a one-to-one basis. Each unique key in the key
  column will be assigned a vector containing all the values that
  correspond to that key in the returned environment. 
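
  The splitting and mapping described above can be illustrated with a
  minimal base-R sketch. It is not the AnnBuilder implementation:
  \code{sketchCols2Env} is a hypothetical name, and the sketch assumes
  that the key and value columns are given by position and that
  duplicate values for a key should be dropped.
  \preformatted{
# Hypothetical helper (not part of AnnBuilder): map each individual key
# to the unique values that appear with it in any row.
sketchCols2Env <- function(cols, keyCol = 1, sep = ";") {
    env <- new.env(hash = TRUE)
    valCol <- if (keyCol == 1) 2 else 1
    for (i in seq_len(nrow(cols))) {
        keys <- strsplit(cols[i, keyCol], sep, fixed = TRUE)[[1]]
        vals <- strsplit(cols[i, valCol], sep, fixed = TRUE)[[1]]
        for (k in keys) {
            # append the new values to any already stored for this key
            old <- if (exists(k, envir = env, inherits = FALSE))
                get(k, envir = env) else character(0)
            assign(k, unique(c(old, vals)), envir = env)
        }
    }
    env
}
dataM <- matrix(c("a;b", "1;2;3", "a;b", "4;5", "c", "6;7", "b;a",
                  "6;7;8"), ncol = 2, byrow = TRUE)
env <- sketchCols2Env(dataM)
get("c", envir = env)
}
  With the matrix used in the examples, key "c" maps to "6" and "7",
  while keys "a" and "b" each collect "1" through "8".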
}
\value{
  This function returns an environment object with key and value pairs
}

\author{Jianhua Zhang}
\note{This function is part of the Bioconductor project at Dana-Farber
  Cancer Institute to provide Bioinformatics functionalities through R}

\seealso{\code{\link{ABPkgBuilder}}}

\examples{
dataM <- matrix(c("a;b", "1;2;3", "a;b", "4;5", "c", "6;7", "b;a",
"6;7;8"), ncol = 2, byrow = TRUE)

temp <- cols2Env(dataM, c("key", "value"), keyColName = "key")

dataM
mget(ls(temp), envir = temp)

}
\keyword{manip}


\eof
\name{fileMuncher}
\alias{fileMuncher}
\alias{mergeRowByKey}
\title{Dynamically create a Perl script to parse a source file based on
  user specifications}
\description{
  This function takes a base file, a source file, and a segment of Perl
  script specifying how the source file will be parsed, and then
  generates a fully executable Perl script that is called to parse the
  source file. 
}
\usage{
fileMuncher(outName, baseFile, dataFile, parser, isDir = FALSE)
mergeRowByKey(mergeMe, keyCol = 1, sep = ";")
}

\arguments{
  \item{outName}{\code{outName} a character string for the name of the file
    where the parsed data will be stored}
  \item{baseFile}{\code{baseFile} a character string for the name of the
    file that is going to be used as the base to process the source
    file. Only data that are corresponding to the ids defined in the
    base file will be processed and mapped}
  \item{dataFile}{\code{dataFile} a character string for the name of the
    source data file}
  \item{parser}{\code{parser} a character string for the name of the
    file containing a segment of a Perl script for parsing the
    source file. An output connection to OUT for storing parsed
    data, an input connection to BASE for importing the base file, and an
    input connection to DATA for reading the source data file are
    assumed to be open. The Perl segment should define how BASE and DATA
    will be used to extract data and then store them in OUT}
  \item{isDir}{\code{isDir} a boolean indicating whether dataFile is a
    name of a directory (TRUE) or not (FALSE)}
  \item{mergeMe}{\code{mergeMe} a data matrix that is going to be
    processed to merge rows with duplicating keys}
  \item{keyCol}{\code{keyCol} an integer for the index of the column
    containing keys based on which entries will be merged}
  \item{sep}{\code{sep} a character string for the separator used to
    separate multiple values}
}
\details{
  The system is assumed to be able to run Perl. Dynamically generated
  Perl scripts are removed after execution.

  \code{\link{mergeRowByKey}} merges rows that share a common key.
  Multiple values for a given key are separated by \code{sep}.
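
  The merge can be sketched in base R as follows. This is an
  illustration only: \code{sketchMergeRowByKey} is a hypothetical name,
  and the sketch assumes that every column, including the key column,
  holds character data.
  \preformatted{
# Hypothetical helper (not part of AnnBuilder): collapse rows sharing a
# key, joining the unique values of every column with sep.
sketchMergeRowByKey <- function(mergeMe, keyCol = 1, sep = ";") {
    keys <- unique(mergeMe[, keyCol])
    merged <- lapply(keys, function(k) {
        rows <- mergeMe[mergeMe[, keyCol] == k, , drop = FALSE]
        apply(rows, 2, function(x) paste(unique(x), collapse = sep))
    })
    do.call(rbind, merged)
}
m <- matrix(c("k1", "x", "k2", "y", "k1", "z"), ncol = 2, byrow = TRUE)
out <- sketchMergeRowByKey(m)
out
}
  Here the two "k1" rows become one row whose second column is "x;z".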
}
\value{
  \code{\link{fileMuncher}} returns a character string for the name of
  the output file

  \code{\link{mergeRowByKey}} returns a matrix with merged data.
}

\author{Jianhua Zhang}
\note{This function is part of the Bioconductor project at Dana-Farber
  Cancer Institute to provide Bioinformatics functionalities through R}

\seealso{\code{\link{resolveMaps}}}

\examples{
if(interactive()){
path <- file.path(.path.package("AnnBuilder"), "data")
temp <- matrix(c("32469_f_at", "D90278", "32469_at", "L00693", "33825_at",
"X68733", "35730_at", "X03350", "38912_at", "D90042", "38936_at",
"M16652"), ncol = 2, byrow = TRUE)
write.table(temp, "tempBase", sep = "\t", quote = FALSE,
row.names = FALSE, col.names = FALSE)
# Parse a truncated version of LL\_tmpl.gz from Bioconductor
srcFile <-
loadFromUrl("http://www.bioconductor.org/datafiles/wwwsources/Tll_tmpl.gz")  
fileMuncher(outName = "temp", baseFile = "tempBase", dataFile = srcFile,
parser =  file.path(path, "gbLLParser"), isDir = FALSE)
# Show the parsed data
read.table(file = "temp", sep = "\t", header = FALSE)
unlink("tempBase")
unlink("temp")
}
}
\keyword{manip}


\eof
\name{fileToXML}
\alias{fileToXML}

\title{A function to convert a text file to XML.}
\description{
  This function takes a text file and converts the data it contains
  into an XML file. The XML file contains an Attr and a Data node. The
  Attr node contains meta-data and the Data node contains the actual
  data from the original file. 
}
\usage{
fileToXML(targetName, outName, inName, idColName, colNames,
multColNames, typeColNames,  multSep = ";", typeSep = ";", fileSep =
"\t", header = FALSE, isFile = TRUE, organism = "human",version = "1.0.0")
}

\arguments{
  \item{outName}{\code{outName} A character string for the name of xml file to be
    produced. If the name does not contain a full path, the current
    working directory will be the default}
  \item{inName}{\code{inName} A character string for the name of the input file to be
    written to an XML document}
  \item{idColName}{\code{idColName} A character string for the name of
    the column in the input file where ids of the target of annotation are}
  \item{colNames}{\code{colNames} A vector of character strings for the name of data
    columns in the original file.}
  \item{targetName}{\code{targetName} A character string that will be used as an internal name
    for the meta-data to show the target of the annotation (e.g. U95, U6800).}
  \item{version}{\code{version} A character string or number indicating the version of
    the system used to build the xml file.}
  \item{multColNames}{\code{multColNames} A vector of character strings for the name of data
    columns that may contain multiple items separated by a separator
    specified by parameter multSep. }
  \item{typeColNames}{\code{typeColNames} A vector of character strings for data columns in the
    original data that may contain type information appended to the actual
    data with a separator defined by parameter typeSep
    (e.g. "aGeneName;Official").} 
  \item{multSep}{\code{multSep} A character string for the separator used to separate
    multiple data items within a data column of the original file.}
  \item{typeSep}{\code{typeSep} A character string for the separator used to separate
    the real data and type information within a column of the original data.}
  \item{fileSep}{\code{fileSep} A character string specifying how data columns are
    separated in the original file (e.g. \code{fileSep = "\t"} for tab-delimited files).}
  \item{organism}{\code{organism} A character string for the name of the organism of
    interest} 
  \item{header}{\code{header} A boolean that is set to TRUE if the original file has a
  header row or FALSE otherwise.}
  \item{isFile}{\code{isFile} A boolean that is set to TRUE if parameter inName is
    the name of an existing file and FALSE if inName is an R object
    containing the data}
}
\details{
  The original text file is assumed to have rows with columns separated
  by a separator defined by parameter fileSep. multColNames defines the
  data columns that capture one-to-many relationships between
  data. For example, a given Affymetrix id may be associated with several
  GenBank accession numbers. In a data set with Affymetrix ids as one of
  the data columns, the accession number column will be an element of
  multColNames, with a separator separating individual accession numbers
  (e.g. X00001,X00002,U0003... if the separator is a ",").

  As gene name and gene symbol can be "Official" or "Preferred", type
  information is attached to a gene name or symbol and becomes
  the value of attribute type in the resulting XML file
  (e.g. XXXX;Official if the separator is ";").
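
  A minimal base-R sketch of how one cell might be split, first into
  items with \code{multSep} and then into value and type with
  \code{typeSep}. \code{sketchSplitTyped} is a hypothetical helper, it
  assumes the two separators differ, and it does not produce the XML
  output itself.
  \preformatted{
# Hypothetical helper (not part of AnnBuilder): split a cell into items,
# then split each item into a value and an optional type.
sketchSplitTyped <- function(cell, multSep = ",", typeSep = ";") {
    items <- strsplit(cell, multSep, fixed = TRUE)[[1]]
    parts <- strsplit(items, typeSep, fixed = TRUE)
    data.frame(
        value = vapply(parts, function(p) p[1], character(1)),
        # items without a type get NA
        type = vapply(parts, function(p)
            if (length(p) > 1) p[2] else NA_character_, character(1)),
        stringsAsFactors = FALSE)
}
sketchSplitTyped("X00001,X00002,U0003")
sketchSplitTyped("XXXX;Official")
}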
}
\value{
  This function does not return any value. The XML file will be stored
  as a file.
}
\references{\url{http://www.bioconductor.org/datafiles/dtds/annotate.dtd}}
\author{Jianhua (John) Zhang}
\note{This function is part of the Bioconductor project at Dana-Farber
  Cancer Institute to provide Bioinformatics functionalities through R.}

\seealso{\code{\link{ABPkgBuilder}}}

\examples{
# Create a data frame to be converted to XML
aFile <- as.data.frame(matrix(c(1:9), ncol = 3))

#Write to an XML file
if(interactive()){
    fileToXML("notReal", outName = "try.xml", inName = aFile, idColName =
    "AFFY", colNames = c("AFFY", "LOCUSID", "UNIGENE"), multColNames = NULL,
    typeColNames = NULL,  multSep = ";", isFile = FALSE)

    #Show the XML file
    readLines("try.xml")

    # Clean up
    unlink("try.xml")
}
}
\keyword{manip}











\eof
\name{getChroLocation}
\alias{getChroLocation}
\alias{getGPData}
\alias{gpLinkNGene}
\title{Functions to extract data from golden path}
\description{
  These functions are used by objects of class GP to extract chromosomal location
  and orientation data for genes.
}
\usage{
getChroLocation(srcUrl, exten = gpLinkNGene(), sep = "\t", fromWeb =
TRUE, raw = FALSE)
getGPData(srcUrl, sep = "\t", ncol = 8, keep = c(3,7))
gpLinkNGene(test = FALSE)
}
\arguments{
  \item{srcUrl}{\code{srcUrl} a character string for the url where
    source data are available}
  \item{exten}{\code{exten} a character string for the name of the file
    to be used for the extraction}
  \item{sep}{\code{sep} a character string for the separator used by the
    source file}
  \item{ncol}{\code{ncol} an integer for the total number of columns a
    source data set has}
  \item{keep}{\code{keep} a numeric vector for the columns (defined by
    column number) to be kept}
  \item{test}{\code{test} a boolean to indicate whether the process is
    in a testing mode}
  \item{fromWeb}{\code{fromWeb} a boolean to indicate whether the source
    data should be downloaded from the web or is a local file}
  \item{raw}{\code{raw} a boolean indicating whether chromosomal location
    data will be returned as a five-column data frame with ID,
    chromosome, strand, start, and end (TRUE) or as a two-column data
    frame with ID and processed chromosomal location data (FALSE)}
}
\details{
  \code{\link{getChroLocation}} extracts chromosomal location data from
  a data file named refGene.

  \code{\link{getGPData}} reads data from a source data file defined by
  srcUrl and returns them as a matrix.
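
  The column selection performed by \code{getGPData} can be sketched
  with base R for a local file. \code{sketchGPData} is a hypothetical
  name. The sketch assumes tab-separated lines and drops lines that do
  not split into exactly \code{ncol} fields.
  \preformatted{
# Hypothetical helper (not part of AnnBuilder): keep selected columns
# of a tab-separated file with a fixed number of columns.
sketchGPData <- function(file, ncol = 8, keep = c(3, 7)) {
    tab <- intToUtf8(9L)  # the tab character
    fields <- strsplit(readLines(file), tab, fixed = TRUE)
    # drop malformed lines (strsplit also drops a trailing empty field)
    fields <- fields[lengths(fields) == ncol]
    if (length(fields) == 0)
        return(matrix(character(0), nrow = 0, ncol = length(keep)))
    do.call(rbind, fields)[, keep, drop = FALSE]
}
tab <- intToUtf8(9L)
tmp <- tempfile()
writeLines(c(paste(paste0("r1c", 1:8), collapse = tab),
             paste(paste0("r2c", 1:8), collapse = tab)), tmp)
sketchGPData(tmp)
unlink(tmp)
}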

  \code{\link{gpLinkNGene}} returns the names of the link and gene data
  files that will be used to get chromosomal location data.
}
\value{
  \code{\link{getChroLocation}} returns a matrix with the first column
  being LocusLink ids, the second being the chromosomal location, and
  third being the orientation for that locus.

  \code{\link{getGPData}} returns a matrix.

  \code{\link{gpLinkNGene}} returns a named vector.
}
\references{\url{http://www.genome.ucsc.edu}}
\author{Jianhua Zhang}
\note{The functions are part of the Bioconductor project at Dana-Farber
  Cancer Institute to provide Bioinformatics functionalities through R}

\seealso{\code{\link{GP}}}
\examples{
# Truncated versions of files stored in Bioconductor site are used
gpLinkNGene(test = FALSE)
temp <- getGPData(
"http://www.bioconductor.org/datafiles/wwwsources/Tlink.txt",
sep = "\t", ncol = 8, keep = c(3,7))
temp <- getChroLocation(
"http://www.bioconductor.org/datafiles/wwwsources/",
exten = gpLinkNGene(TRUE), sep = "\t")
}
\keyword{manip}


\eof
\name{getDPStats}
\alias{getDPStats}
\alias{getDate}
\alias{getProbeNum}
\alias{matchProbes}
\alias{getPBased}
\alias{formatABQCList}

\title{A function to read in the statistics about a data package}
\description{
  This function generates a list showing the name, date of creation,
  number of genes for each rda file, and the actual number of genes that
  get mapped for each rda file.
}
\usage{
getDPStats(baseF, pkgName, pkgPath, saveList = FALSE, isFile = TRUE)
getDate(pkgName, pkgPath, fromDesc)
getProbeNum(pkgName, pkgPath, noneNA = FALSE)
matchProbes(baseF, pkgName, pkgPath, toMatch, isFile = TRUE)
getPBased()
formatABQCList(x)
}

\arguments{
  \item{baseF}{\code{baseF} a character string for the name of a file
    that is going to be used as the base file to calculate the total
    number of probes and matched probes by a data package}
  \item{pkgName}{\code{pkgName} a character string for the name of the
    data package of concern} 
  \item{pkgPath}{\code{pkgPath} a character string for name of the path
    to which the data package is stored.}
  \item{noneNA}{\code{noneNA} a boolean to indicate whether counts will
    exclude entries with NA as the value.}
  \item{saveList}{\code{saveList} a boolean indicating whether the
    results will be returned as a list only (FALSE) or saved to a file as
    well (TRUE)} 
  \item{toMatch}{\code{toMatch} a vector of character strings for the
    names of the rda files whose keys will be matched against the probe
    ids of a base file (baseF)}
  \item{x}{\code{x} a list object produced by function
    \code{\link{getDPStats}}} 
  \item{fromDesc}{\code{fromDesc} a boolean that will get a date from a
    DESCRIPTION file if set TRUE or the current date if FALSE}
  \item{isFile}{\code{isFile} a boolean that will be TRUE if
    \code{baseF} is the name of a file}
}
\details{
  Date of creation is the date when the package was created using
  AnnBuilder and in most cases is not the date when the source file
  AnnBuilder used to create the rda files was created. The date when the
  source data were built is listed in the man page for the package
  (\code{?pkgName}, where pkgName is the name of the package).

  The number of genes and number of genes mapped normally differ because
  not all genes in a given set can be mapped to annotation data. For
  probe based rda files (maps Affymetrix ids to annotation data), the
  number of mapped genes out of the total is given. For non-probe based
  rda files, only the total number of mapped items is given.

  The total number of probes of each rda file will be checked against
  the total of the base file and the names of the rda files whose total
  is off will be listed.
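
  The counts described above can be sketched as follows for a single
  probe-based rda file. \code{sketchCountMapped} is a hypothetical name;
  \code{maps} stands in for the key and value pairs of the rda file and
  \code{probes} for the probe ids of the base file.
  \preformatted{
# Hypothetical helper (not part of AnnBuilder): count base-file probes
# with at least one non-NA value and flag a mismatch in totals.
sketchCountMapped <- function(maps, probes) {
    isMapped <- vapply(maps, function(v) !all(is.na(v)), logical(1))
    hit <- match(probes, names(maps), nomatch = 0)
    list(total = length(probes),
         mapped = sum(isMapped[hit]),
         offTotal = length(maps) != length(probes))
}
maps <- list(p1 = "1234", p2 = NA, p3 = c("5", "6"))
sketchCountMapped(maps, c("p1", "p2", "p3", "p4"))
}
  Here two of the four probes are mapped and, because the rda file has
  three keys against four probes, its total is flagged as off.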
}
\value{
  \item{list}{A list with name and value pairs}
}
\author{Jianhua Zhang}
\note{This function is part of the Bioconductor project at Dana-Farber
  Cancer Institute to provide Bioinformatics functionalities through R}

\seealso{\code{\link{ABPkgBuilder}}}

\examples{
# Change the settings below to match your data package before
# running the code
baseF <- "path/to/your/baseFile"
pkgName <- "hgu95a"
pkgPath <- "where/your/data/package/is"
# Call getDPStats
# getDPStats(baseF, pkgName, pkgPath)
}
\keyword{misc}





\eof
\name{getKEGGIDNName}
\alias{getKEGGIDNName}
\alias{getKEGGOrgName}
\alias{getLLPathMap}
\alias{mapll2EC}
\alias{parseEC}
\title{Functions to get/process pathway and enzyme data from KEGG}
\description{
  These functions extract pathway and enzyme data from KEGG
  \url{ftp://ftp.genome.ad.jp/pub/kegg/pathways}. The functions are used
  by \code{\link{KEGG-class}}.
}
\usage{
getKEGGIDNName(srcUrl, exten = "/map\_title.tab")
getKEGGOrgName(name)
getLLPathMap(srcUrl, idNName, organism)
mapll2EC(id, srcUrl, organism, sep = "\t")
parseEC(llNEC)
}
\arguments{
  \item{srcUrl}{\code{srcUrl} a character string for the url where
    source data are available}
  \item{exten}{\code{exten} a character string for data file name as an
    extension}
  \item{name}{\code{name} a character string for the name of the
    organism of concern. "human", "mouse", and "rat" are the valid
    values for now}
  \item{organism}{\code{organism} same as name}
  \item{idNName}{\code{idNName} a named vector normally obtained
    by using function \code{\link{getKEGGIDNName}}}
  \item{sep}{\code{sep} a character string for the separator used to
    separate entries in a file}
  \item{llNEC}{\code{llNEC} a line of tab separated character strings
    with the first character string being a LocusLink id and second
    being the mapping enzyme (EC) names}
  \item{id}{\code{id} a character string for the KEGG id used for
    different pathway files}
}
\details{
  \code{\link{getKEGGIDNName}} reads the data file "map\_title.tab" from
  KEGG to obtain the mappings between KEGG ids and pathway
  names.

  \code{\link{getKEGGOrgName}} takes the name of an organism and
  returns a short version of the name used by KEGG for that organism.
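
  A lookup of this kind can be sketched with a named vector.
  \code{sketchKEGGOrgName} is hypothetical. The codes shown are the KEGG
  organism codes for the three organisms listed under \code{name}, but
  the exact values returned by \code{getKEGGOrgName} are an assumption.
  \preformatted{
# Hypothetical helper (not part of AnnBuilder): map an organism name to
# its three-letter KEGG organism code.
sketchKEGGOrgName <- function(name) {
    codes <- c(human = "hsa", mouse = "mmu", rat = "rno")
    code <- codes[tolower(name)]
    if (is.na(code)) stop("unsupported organism: ", name)
    unname(code)
}
sketchKEGGOrgName("human")
}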

  \code{\link{getLLPathMap}} maps LocusLink ids to pathway and enzyme
  names for an organism using various data files from KEGG.

  \code{\link{mapll2EC}} maps LocusLink ids to enzyme (EC) names for a
  given pathway.

  \code{\link{parseEC}} extracts enzyme data from a line of tab
  separated character strings to map a LocusLink id to enzyme (EC) names.
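
  A minimal sketch of this extraction, assuming the line holds a
  LocusLink id followed by one or more tab-separated EC names that
  should be joined with ";". \code{sketchParseEC} is a hypothetical name
  and the sample values are placeholders.
  \preformatted{
# Hypothetical helper (not part of AnnBuilder): split a tab-separated
# line into a LocusLink id and its EC names.
sketchParseEC <- function(llNEC, sep = intToUtf8(9L), collapse = ";") {
    fields <- strsplit(llNEC, sep, fixed = TRUE)[[1]]
    fields <- fields[nzchar(fields)]  # ignore empty fields
    c(fields[1], paste(fields[-1], collapse = collapse))
}
line <- paste("1234", "1.1.1.1", "2.2.2.2", sep = intToUtf8(9L))
sketchParseEC(line)
}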
}
\value{
  \code{\link{getKEGGIDNName}} returns a named vector with KEGG ids
  being the names and pathway names being values.

  \code{\link{getKEGGOrgName}} returns a character string.

  \code{\link{getLLPathMap}} returns a list of two elements named "llec"
  and "llpathname". Each element is a matrix with mappings between
  LocusLink ids and enzyme or pathway names.

  \code{\link{mapll2EC}} returns a matrix with the first column being
  LocusLink ids and second enzyme (EC) names.

  \code{\link{parseEC}} returns a two-element vector with the first
  element being a LocusLink id and second being the mapping enzyme (EC)
  names. 
}
\references{\url{www.genome.ad.jp/kegg/}}
\author{Jianhua Zhang}
\note{The functions are part of the Bioconductor project at Dana-Farber
  Cancer Institute to provide Bioinformatics functionalities through R}

\seealso{\code{\link{KEGG-class}}}
\examples{
getKEGGOrgName("human")
# This group of code needs a while to finish
if(interactive()){
# Url may change but was correct at the time of coding
idNPath <- getKEGGIDNName("ftp://ftp.genome.ad.jp/pub/kegg/pathways")
temp <- getLLPathMap("ftp://ftp.genome.ad.jp/pub/kegg/pathways",
idNPath, "human")
temp <- mapll2EC("00010", "ftp://ftp.genome.ad.jp/pub/kegg/pathways",
"human", sep = "\t")
}
}
\keyword{manip}

\eof
\name{getSrcBuilt}
\alias{getSrcBuilt}
\alias{getLLBuilt}
\alias{getUGBuilt}
\alias{getUCSCBuilt}
\alias{getGOBuilt}
\alias{getKEGGBuilt}
\alias{getYGBuilt}
\alias{getHGBuilt}
\title{Functions that get the built date or number of the source data
  used for annotation}
\description{
  Given a data source name and organism, the built date or number of the
  annotation source data will be returned. The built date or number is
  provided by the data source through its web site. 
}
\usage{
getSrcBuilt(src = "LL", organism = "human")
getLLBuilt(url = "http://www.ncbi.nlm.nih.gov/LocusLink/statistics.html")
getUGBuilt(organism, url = "ftp://ftp.ncbi.nih.gov/repository/UniGene")
getUCSCBuilt(organism)
getGOBuilt(url = "http://www.godatabase.org/dev/database/archive/latest")
getKEGGBuilt(url = "http://www.genome.ad.jp/kegg/kegg2.html")
getYGBuilt()
getHGBuilt()
}

\arguments{
  \item{src}{A character string for name of the data source. See details
    for valid names} 
  \item{organism}{A character string for the name of the organism of
    interest. See details for valid names}
  \item{url}{A character string for the url from which built information
    can be obtained}
}
\details{
  \code{getLLBuilt} finds the built information for LocusLink from the
  statistics page.
  
  \code{getUGBuilt} finds the built information for UniGene from the
  Xx.info file, where Xx is the short organism name (e.g. Hs for human).
  
  \code{getUCSCBuilt} finds the built information for the Human Genome
  Project from the folder for the latest release.
  
  \code{getGOBuilt} finds the built information for Gene Ontology from
  the timestamp of the -ont.xml.gz file.
  
  \code{getKEGGBuilt} finds the built information for KEGG from the
  kegg2.html page (release version and date).

  \code{getYGBuilt} gets built information for Yeast Genome data.

  Valid data source names include LL - LocusLink, UG - UniGene, UCSC -
  the Human Genome Project, GO - Gene Ontology, KEGG - KEGG, YG - Yeast
  Genome.

  Valid organism names include human, mouse, rat, and yeast at this time.
}
\value{
  All functions return a string for the built information
}
\references{\url{http://www.ncbi.nlm.nih.gov/LocusLink/statistics.html},
  \url{ftp://ftp.ncbi.nih.gov/repository/UniGene},
  \url{http://www.godatabase.org/dev/database/archive/latest},
  \url{http://www.genome.ad.jp/kegg/kegg2.html},
  \url{ftp://ftp.ncbi.nih.gov/refseq/LocusLink/},
  \url{http://www.yeastgenome.org}}
\author{Jianhua Zhang}
\note{The functions are part of the Bioconductor project at Dana-Farber
  Cancer Institute to provide Bioinformatics functionalities through R}

\seealso{\code{\link{getSrcUrl}}}

\examples{
# Get built information for LocusLink
ll <- getSrcBuilt(src = "LL")
ug <- getSrcBuilt(src = "UG", organism = "mouse")
yg <- getYGBuilt()
ll
ug
yg
}
\keyword{datasets}


\eof
\name{getSrcUrl}
\alias{getSrcUrl}
\alias{getAllUrl}
\alias{getLLUrl}
\alias{getUGUrl}
\alias{getUCSCUrl}
\alias{getGOUrl}
\alias{getKEGGUrl}
\alias{readURL}
\alias{getGEOUrl}
\alias{getYGUrl}
\alias{getHGUrl}
\title{Functions that find the correct url for downloading annotation data}
\description{
  Given a source data name and organism name, the url from which the
  source annotation data can be downloaded will be returned. 
}
\usage{
getSrcUrl(src = "LL", organism = "human", xml = TRUE, dateOnly = FALSE)
getAllUrl(organism)
getLLUrl()
getUCSCUrl(organism)
getUGUrl(organism)
getGOUrl(xml = TRUE, dateOnly = FALSE) 
getKEGGUrl()
readURL(url)
getGEOUrl()
getYGUrl()
getHGUrl()
}

\arguments{
  \item{src}{A character string for the name of the data source. See
    details for valid names}
  \item{organism}{A character string for the name of the organism of
    interest}
  \item{url}{A character string for the url where the source data can be
    downloaded}
  \item{dateOnly}{A boolean; if TRUE, only the build date of the data
    source is returned; if FALSE, the source url is returned}
  \item{xml}{A boolean indicating whether the XML format data file will
    be downloaded/processed}
}
\details{
  \code{getAllUrl} finds the urls for all the data sources, including
  LocusLink, UniGene, the Human Genome Project, Gene Ontology, and
  KEGG.
  
  \code{getLLUrl} finds the url for LocusLink.
  
  \code{getUCSCUrl} finds the url for the Human Genome Project.
  
  \code{getUGUrl} finds the url for UniGene.
  
  \code{getGOUrl} finds the url for Gene Ontology.
  
  \code{getKEGGUrl} finds the url for KEGG.

  \code{getGEOUrl} finds the url for GEO (the CGI script).

  \code{getYGUrl} gets the url to the ftp site where Yeast Genome data
  can be downloaded.

  Valid data source names include LL - LocusLink, UG - UniGene, UCSC -
  the Human Genome Project, GO - Gene Ontology, KEGG - KEGG, and YG -
  Yeast Genome.

  Valid organism names include human, mouse, rat, and yeast at this time.
}
\value{
  \code{getAllUrl} returns a vector of character strings; all the other
  functions return a single character string for the url.
}
\references{\url{http://www.ncbi.nlm.nih.gov/LocusLink/statistics.html},
  \url{ftp://ftp.ncbi.nih.gov/repository/UniGene},
  \url{http://www.godatabase.org/dev/database/archive/latest},
  \url{http://www.genome.ad.jp/kegg/kegg2.html},
  \url{ftp://ftp.ncbi.nih.gov/refseq/LocusLink/},
  \url{http://www.yeastgenome.org}}
\author{Jianhua Zhang}
\note{The functions are part of the Bioconductor project at Dana-Farber
  Cancer Institute to provide Bioinformatics functionalities through R}

\seealso{\code{\link{getSrcBuilt}}}

\examples{
# The source URLs may have changed or the server may be down
urls <- getSrcUrl("ALL")
}
\keyword{datasets}


\eof
\name{getYeastData}
\alias{getYeastData}
\alias{readBadData}
\alias{findNumCol}
\title{Functions to get/process yeast genome data}
\description{
  These functions download and extract data from files on the yeast
  genome web site.
}
\usage{
getYeastData(url, extenName, cols2Keep, sep)
readBadData(url, sep)
findNumCol(fewLines, sep)
}
\arguments{
  \item{url}{\code{url} a character string for the url where yeast data
    are stored}
  \item{extenName}{\code{extenName} a character string for the name of
    the data file of interest. The name can be a bare file name or a
    path that includes subdirectory names under "url"}
  \item{cols2Keep}{\code{cols2Keep} a vector of indices of the columns
    to be extracted from the data file}
  \item{sep}{\code{sep} a character string for the separator used to
    separate data columns in the data file}
  \item{fewLines}{\code{fewLines} a set of character strings separated
    by new lines that is used to determine how many data columns each
    line has}
}
\details{
  The yeast genome web site stores downloadable files in, or in
  subdirectories of,
  \url{ftp://genome-ftp.stanford.edu/pub/yeast/data_download/}.
  \code{\link{getYeastData}} extracts data from a given file. Objects of
  \code{\link{YG-class}} use these functions to extract data.

  Some of the data files on the web site may not be well formatted
  (e.g. with missing columns). \code{\link{readBadData}} deals with this
  type of data file.

  \code{\link{findNumCol}} figures out how many data columns a file
  contains based on a few entries from that file.
  
}
\value{
  \code{\link{getYeastData}} returns a matrix containing data.

  \code{\link{readBadData}} returns a matrix.

  \code{\link{findNumCol}} returns an integer.
}
\references{\url{ftp://genome-ftp.stanford.edu/pub/yeast/data_download/}}
\author{Jianhua Zhang}
\note{The functions are part of the Bioconductor project at Dana-Farber
  Cancer Institute to provide Bioinformatics functionalities through R}

\seealso{\code{\link{YG-class}}}
\examples{
# Url may change but was correct at the time of coding
url <- "ftp://genome-ftp.stanford.edu/pub/yeast/data_download/"
temp <- getYeastData(url, "chromosomal_feature/chromosomal_feature.tab",
                         cols2Keep = c(6, 1), sep = "\t")
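# A base-R sketch of the idea described for findNumCol (an
# illustration only, not the package's implementation): split a few
# sample lines on the separator and take the largest number of fields.
# The sample lines below are hypothetical.
fewLines <- c("id1;a;1", "id2;b", "id3;c;3")
numCol <- max(sapply(strsplit(fewLines, ";", fixed = TRUE), length))
numCol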
}
\keyword{manip}


\eof
\name{homoPS-class}
\docType{class}
\alias{homoPS-class}
\alias{ps}
\alias{psLL}
\alias{psOrg}
\alias{psType}
\alias{psURL}
\alias{ps,homoPS-method}
\alias{psLL,homoPS-method}
\alias{psOrg,homoPS-method}
\alias{psType,homoPS-method}
\alias{psURL,homoPS-method}
\alias{orgNameNCode}
\title{Class "homoPS"}
\description{A class to represent HomoloGene percent similarity
  data} 
\section{Objects from the Class}{
Objects can be created by calls of the form \code{new("homoPS", ...)}. 
}
\section{Slots}{
  \describe{
    \item{\code{psOrg}:}{Object of class \code{"character"} the
      scientific name of the organism of interest}
    \item{\code{psLL}:}{Object of class \code{"character"} the LocusLink
      id of the gene of interest}
    \item{\code{psType}:}{Object of class \code{"character"} the type of
      similarity. Valid values include B - a reciprocal best match
      between 3 or more organisms, b - a reciprocal best match, and c -
      a curated homology relationship} 
    \item{\code{ps}:}{Object of class \code{"numeric"} percent
      similarity value}
    \item{\code{psURL}:}{Object of class \code{"character"} the URL for
      curated homology relationship}
  }
}
\section{Methods}{
  \describe{
    \item{ps}{\code{signature(object = "homoPS")}: the get function for
      slot \code{ps}}
    \item{psLL}{\code{signature(object = "homoPS")}: the get function
      for slot \code{psLL}}
    \item{psOrg}{\code{signature(object = "homoPS")}: the get function
      for slot \code{psOrg}}
    \item{psType}{\code{signature(object = "homoPS")}: the get function
      for slot \code{psType}}
    \item{psURL}{\code{signature(object = "homoPS")}: the get function
      for slot \code{psURL}}
  }
}
\references{\url{ftp://ftp.ncbi.nih.gov/pub/HomoloGene/README}}
\author{Jianhua Zhang}
\note{This class is part of the Bioconductor project at Dana-Farber
  Cancer Institute to provide Bioinformatics functionalities through R}

\seealso{
  \code{\link{homoPkgBuilder}}
}
\examples{
new("homoPS", ps = 82.3, psLL = "2324853", psOrg = "Homo sapiens",
    psType = "B", psURL = "")
}
\keyword{classes}

\eof
\name{homoPkgBuilder}
\alias{homoPkgBuilder}
\alias{procHomoData}
\alias{getLL2IntID}
\alias{getIntIDMapping}
\alias{mapIntID}
\alias{mapOrgs}
\alias{writeRdaNMan}
\alias{mapPS}
\alias{getHomoPS}
\title{Functions to build a homology data package using data from NCBI}
\description{
  These functions build a data package that maps the internal HomoloGene
  ids of an organism to LocusLink ids, UniGene ids, the percent identity
  of the alignment, the type of similarity, and the url to the source of
  a curated orthology, for all pairwise best matches between organisms,
  based on data from
  \url{ftp://ftp.ncbi.nih.gov/pub/HomoloGene/hmlg.ftp}
}
\usage{
homoPkgBuilder(pkgName = "homology", pkgPath, version, author, url =
getSrcUrl("HG"))
procHomoData(url = getSrcUrl("HG"))
getLL2IntID(homoData, organism = "")
getIntIDMapping(homoData)
mapIntID(homoData)
mapOrgs(vect)
writeRdaNMan(homoData, pkgName, pkgPath, what)
mapPS(homoData, pkgName, pkgPath)
getHomoPS(entries)
}
\arguments{
  \item{pkgName}{\code{pkgName} a character string for the name of data
    package to be built}
  \item{pkgPath}{\code{pkgPath} a character string for the name of the
    directory where the created package will be stored}
  \item{version}{\code{version} a character string for the version
    number of the package to be built}
  \item{author}{\code{author} a list with an author element for the name
    of the author and a maintainer element for the name and e-mail
    address of the maintainer of the package} 
  \item{url}{\code{url} the url to the ftp site from which the source data
    file can be obtained. The default value is
    \url{ftp://ftp.ncbi.nih.gov/pub/HomoloGene/hmlg.ftp}}
  \item{homoData}{\code{homoData} a data frame that contains the homology
    data from the source}
  \item{organism}{\code{organism} a character string for the name of the
    organism of interest}
  \item{vect}{\code{vect} a vector of character strings}
  \item{what}{\code{what} a character string for the data environment to
    be created, e.g. HGID2LL, HGID2GB, ...}
  \item{entries}{\code{entries} a vector of character strings}
}
\details{
  \code{procHomoData} processes the source data and puts the data into
  a data frame that is used later.

  \code{getLL2IntID} maps LocusLink ids to HomoloGene internal ids.

  \code{getIntIDMapping} maps HomoloGene internal ids to other data,
  including LocusLink ids, GenBank accession numbers, percent similarity
  values, types of similarity, and the url to the curated orthology.

  \code{mapIntID} captures the reverse mapping between reciprocal
  homologous genes.

  \code{mapOrgs} converts organism codes to scientific names.

  \code{writeRdaNMan} creates an rda file and the corresponding man page
  for a data environment.

  \code{mapPS} maps HomoloGene internal ids to homoPS objects generated
  using data from the source.

  \code{getHomoPS} creates a homoPS object using data passed as a vector.
}
\value{
  procHomoData, mapIntID, and getLL2IntID return a matrix.

  getIntIDMapping returns an R environment with mappings between
  HomoloGene internal ids and mapped data.

  getHomoPS returns a homoPS object with slots filled with data passed.

  mapOrgs returns a vector of character strings.
}
\references{\url{ftp://ftp.ncbi.nih.gov/pub/HomoloGene/README}}
\author{Jianhua Zhang}
\note{The functions are part of the Bioconductor project at Dana-Farber
  Cancer Institute to provide Bioinformatics functionalities using R}

\seealso{\code{\link{ABPkgBuilder}}}
\examples{
# Examples are provided for only a few functions to avoid lengthy
# execution time
load(file.path(.path.package("AnnBuilder"), "data", "orgNameNCode.rda"),
     envir = .GlobalEnv)
mapOrgs(c("10116", "10090", "9606"))
getHomoPS(c("9606", "10116", "B", "12345", "324322", "78.1"))
}
\keyword{manip}

\eof
\name{loadFromUrl}
\alias{loadFromUrl}
\alias{validateUrl}
\alias{unzipFile}
\title{Functions to load files from a web site}
\description{
  Given a url, these functions download a file from a web site and
  decompress the file if it is compressed.
}
\usage{
loadFromUrl(srcUrl, destDir = "")
validateUrl(srcUrl)
unzipFile(fileName, where = file.path(.path.package("SAGElyzer"),
"data"), isgz = FALSE)
}

\arguments{
  \item{srcUrl}{\code{srcUrl} a character string for the url of the file
    to be downloaded}
  \item{destDir}{\code{destDir} a character string for a local
    directory where the file to be downloaded will be saved}
  \item{where}{\code{where} same as destDir}
  \item{isgz}{\code{isgz} a boolean indicating whether the downloaded
    file is a gz file}
  \item{fileName}{\code{fileName} a character string for the name of a
    file} 
}
\details{
  These functions are used by objects of class pubRepo (and classes
  extending it) to download data files from a web site. If the file is
  compressed, it is decompressed and the path to the decompressed file
  is returned.

  \code{\link{validateUrl}} will terminate the process if an invalid url
  is passed.

  \code{\link{unzipFile}} decompresses the file passed as fileName.
}
\value{
  \code{\link{loadFromUrl}} returns a character string for the name of
  the file saved locally.
}
\author{Jianhua Zhang}
\note{The functions are part of the Bioconductor project at Dana-Farber
  Cancer Institute to provide Bioinformatics functionalities through R}

\seealso{\code{\link{pubRepo-class}}}
\examples{
# Get a dummy data file from Bioconductor web site
data <- loadFromUrl(
    "http://www.bioconductor.org/datafiles/wwwsources/Tll_tmpl.gz",
    destDir = "")
unlink(data)
}
\keyword{manip}


\eof
\name{makeSrcInfo}
\alias{makeSrcInfo}
\alias{getAllSrc}

\title{Functions to make source information available for later use}
\description{
  These functions read a text file (AnnInfo) stored in the data
  directory and create an environment object, also called AnnInfo, that
  is available for later access.
}
\usage{
makeSrcInfo(srcFile = "")
getAllSrc()
}

\arguments{
  \item{srcFile}{\code{srcFile} a character string for the name of the
    source file that contains source data information}
}
\details{
  Each entry of the environment object created (AnnInfo) is a list with
  four elements:
  \describe{
    \item{short}{a character string for the description that will be
      used to describe an annotation element in an XML file to be
      generated}
    \item{long}{a character string that will be used to describe an
      annotation element in the help file for a given data environment
      that will be contained in a data package to be created}
    \item{src}{a character string for the shorthand name of the source
      (e.g. ll for LocusLink)}
    \item{pbased}{a boolean that is TRUE if the annotation element is for
      a probe or FALSE otherwise}
  }
}
\value{
  \code{\link{getAllSrc}} returns a vector of character strings for the
  shorthand names of data sources.
}
\author{Jianhua Zhang}
\note{The functions are part of the Bioconductor project at Dana-Farber
  Cancer Institute to provide Bioinformatics functionalities through R}

\seealso{\code{\link{ABPkgBuilder}}, \code{\link{GOPkgBuilder}},
  \code{\link{KEGGPkgBuilder}}} 
\examples{
  makeSrcInfo()
  ls(AnnInfo)
}
\keyword{file}


\eof
\name{map2LL}
\alias{map2LL}
\alias{getExten}
\alias{saveColSepData}
\alias{getOrgName}
\alias{getReverseMapping}
\alias{getUrl4Org}
\alias{getFullIDName}
\alias{saveData2Env}
\alias{reverseMap4GO}
\title{A function that maps LocusLink ids to other public repository ids
and vice versa} 
\description{
  This function uses data files provided by NCBI to create a data
  package that contains mappings between LocusLink ids and GO, RefSeq,
  and UniGene ids, and vice versa.
}
\usage{
map2LL(pkgPath, organism, version, author, url =
"ftp://ftp.ncbi.nih.gov/refseq/LocusLink/")
getExten(what)
getOrgName(organism, what = c("scientific", "short"))
getReverseMapping(data, sep = ";")
getUrl4Org(organism)
getFullIDName(ID)
saveData2Env(data, fun = splitEntry, pkgName, pkgPath, envName)
reverseMap4GO(data, sep = ";")
}
\arguments{
  \item{organism}{\code{organism} a character string for the name of the
    organism of interest}
  \item{pkgPath}{\code{pkgPath} a character string for the name of the
    directory where the created data package will be stored}
  \item{version}{\code{version} a character string for the version
    number of the data package to be created}
  \item{author}{\code{author} a list with an author element for the
    name of the creator of the data package and a maintainer element for
    the email address of the creator} 
  \item{url}{\code{url} a character string for the url of NCBI's ftp
    site where source data are stored. Current value is
    \url{ftp://ftp.ncbi.nih.gov/refseq/LocusLink/}}
  \item{what}{\code{what} a character string for the type of mapping
    source data (e.g. "go", "ug" ...) or description of organism
    name ("scientific" or "short")}
  \item{data}{\code{data} a matrix to be processed}
  \item{sep}{\code{sep} a character string for the separator used to
    separate data elements for a given entry}
  \item{ID}{\code{ID} a character string for the short name of a data
    source (e.g. LL for LocusLink)}
  \item{envName}{\code{envName} a character string for the name of the
    environment object to be stored in the data package to be created}
  \item{fun}{\code{fun} the name of an R function to be called to
    process a data set before storing the data to an environment object}
  \item{pkgName}{\code{pkgName} a character string for the name of data
    package to be created}
}
\details{
  Three files, namely loc2go, loc2ref, and loc2UG, are used to create
  the mappings. The files were in
  \url{ftp://ftp.ncbi.nih.gov/refseq/LocusLink/} at the time of
  writing. \code{\link{getExten}} maintains the names of the three
  files. Should any of the names be changed on the server,
  \code{\link{getExten}} has to be modified.

  \code{\link{getExten}} and \code{\link{saveColSepData}} are supporting
  functions to \code{\link{map2LL}}.
}
\value{
  invisible
}
\references{\url{http://www.ncbi.nlm.nih.gov/LocusLink/}}
\author{Jianhua Zhang}
\note{These functions are part of the Bioconductor project at Dana-Farber
  Cancer Institute to provide bioinformatics functionalities through R}

\examples{
# Please note that the example will take a while to finish
if(interactive()){
  map2LL(pkgPath = tempdir(), version = "1.0.0", organism = "human",
author = list(author = "who", maintainer = "who@email.com"))
}
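# A base-R sketch of reversing a mapping whose entries hold several
# ";"-separated values, as the name getReverseMapping suggests (an
# illustration with hypothetical data, not the package's code)
mapData <- matrix(c("1", "GO:0001;GO:0002", "2", "GO:0002"), ncol = 2,
                  byrow = TRUE)
values <- strsplit(mapData[, 2], ";", fixed = TRUE)
# Repeat each key once per value it maps to, then group keys by value
keys <- rep(mapData[, 1], sapply(values, length))
reversed <- split(keys, unlist(values))
reversed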
}
\keyword{manip}


\eof
\name{print.ABQCList}
\alias{print.ABQCList}

\title{Prints the quality control results for a given data package in a
  nice format}
\description{
  AnnBuilder has a function (\code{getDPStats}) that generates
  some statistical data (a list) for a given data package for quality
  control purposes. print.ABQCList prints the results in a more readable
  format. 
}
\usage{
print.ABQCList(x, ...)
}

\arguments{
  \item{x}{\code{x} A list object of class ABQCList that is generated by
    function \code{getDPStats}} 
  \item{\dots}{\code{\dots} Other data to be included (not implemented
    currently)}
}
\details{
  The list object contains the following elements:
  \describe{
    \item{name}{A character string for the name of an rda file}
    \item{built}{A character string for a date}
    \item{probeNum}{An integer for the total number of probes in a given
      base file} 
    \item{numMissMatch}{A vector of character strings for names of rda
      files whose total number of probes does not match that of a given
      base file}  
    \item{probeMissMatch}{A vector of character strings for names of rda
      files whose probes do not match what is in a given base file}
    \item{probeMapped}{A vector of named integers for the total number of
      probes in a probe based rda file that have been mapped to data from
      public data sources. Names of the integers are the names of the rda
      files}
    \item{otherMapped}{A vector of named integers for the total number of
      probes in a non-probe based rda file that have been mapped to data
      from public data sources. Names of the integers are the names of
      the rda files} 
  }
}
  
\value{
  No value is returned; the results are printed.
}

\author{Jianhua Zhang}
\note{This function is only used for building data packages}

\seealso{\code{getDPStats}}

\examples{
# Create an ABQCList
x <- c(12250, 7800)
names(x) <- c("file1", "file2")
y <- c(2300, 3456)
names(y) <- c("file3", "file4")
aList <- list(name = "a test", built = date(), probeNum = 12250,
numMissMatch = c("file3", "file4"), probeMissMatch = "file2", probeMapped = x,
otherMapped = y)
class(aList) <- "ABQCList"
aList
}
\keyword{misc}

\eof
\name{pubRepo-class} 
\docType{class}
\alias{pubRepo-class}
\alias{baseFile<-}
\alias{baseFile}
\alias{downloadData}
\alias{parseData}
\alias{parser<-}
\alias{parser}
\alias{readData}
\alias{srcUrl<-}
\alias{srcUrl}
\alias{baseFile<-,pubRepo-method}
\alias{baseFile,pubRepo-method}
\alias{downloadData,pubRepo-method}
\alias{parseData,pubRepo-method}
\alias{parser<-,pubRepo-method}
\alias{parser,pubRepo-method}
\alias{readData,pubRepo-method}
\alias{srcUrl<-,pubRepo-method}
\alias{srcUrl,pubRepo-method}
\alias{pubRepo}
\title{Class "pubRepo" a generic class for downloading/parsing data
  provided by various public data repositories}
\description{This class provides the basic functions to download/parse
  data from different public data repositories. More specific functions
  can be provided by extending this class to include source-specific
  features}
\section{Objects from the Class}{
Objects can be created by calls of the form \code{new("pubRepo", ...)}.
    A constructor (\code{\link{pubRepo}}) is provided and should be used
    to create objects of this class. 
}
\section{Slots}{
  \describe{
    \item{\code{srcUrl}:}{Object of class \code{"character"} a character
      string for the url of a data source from a public repository}
    \item{\code{parser}:}{Object of class \code{"character"} a character
      string for the name of a file that will be used as part of a perl
      script to parse the source data. Parser is a segment of perl code
      containing instructions on how the source data will be processed
      and the content and format of the output}
    \item{\code{baseFile}:}{Object of class \code{"character"} a
      character string for the name of a file that will be used as the 
      base to process the source data. Data from the source that are
      related to elements in the base file will be extracted. baseFile
      is assumed to be a two column file with the first column being
      some type of arbitrary ids (e.g. Affymetrix probe ids) and the
      second column being the corresponding ids of a given public
      repository (e.g. GenBank accession numbers or UniGene ids)}
  }
}

\section{Methods}{
  \describe{
    \item{baseFile<-}{\code{signature(object = "pubRepo")}: Sets the
      value for baseFile}
    \item{baseFile}{\code{signature(object = "pubRepo")}: Gets the value
      for baseFile}
    \item{downloadData}{\code{signature(object = "pubRepo")}: Downloads
      data from a data source defined by srcUrl}
    \item{parseData}{\code{signature(object = "pubRepo")}:
      Downloads and parses data from a data source defined by srcUrl}
    \item{parser<-}{\code{signature(object = "pubRepo")}: Sets the value
      for parser}
    \item{parser}{\code{signature(object = "pubRepo")}: Gets the value
      for parser}
    \item{readData}{\code{signature(object = "pubRepo")}: Reads data
      using \code{\link{readLines}} from a data source defined by srcUrl}
    \item{srcUrl<-}{\code{signature(object = "pubRepo")}: Sets the value
      for srcUrl}
    \item{srcUrl}{\code{signature(object = "pubRepo")}: Gets the value
      for srcUrl}
  }
}
\author{Jianhua Zhang}
\note{This class is part of the Bioconductor project at Dana-Farber
  Cancer Institute to provide Bioinformatics functionalities through R}

\seealso{\code{\link{GO-class}}, \code{\link{KEGG-class}},
  \code{\link{LL-class}}, \code{\link{UG-class}}, \code{\link{GEO-class}}}

\examples{
# Read a short test file from Bioconductor
test <- pubRepo(srcUrl =
"http://www.bioconductor.org/datafiles/wwwsources/TGene.txt")
data <- readData(test)
}
\keyword{classes}

\eof
\name{queryGEO}
\alias{queryGEO}
\title{Function to extract a data file from the GEO web site}
\description{
  Data files that are available at the GEO web site are identified by
  GEO accession numbers. Given a GEO object with the url for a common
  CGI script and a GEO accession number, this function extracts data
  from the web site and returns a matrix containing the data portion of
  the file.
}
\usage{
queryGEO(GEOObj, GEOAccNum)
}
\arguments{
  \item{GEOObj}{\code{GEOObj} a GEO object}
  \item{GEOAccNum}{\code{GEOAccNum} a character string for the GEO
    accession number of a desired file}
}
\details{
  The GEO object contains the url for a CGI script that processes user's
  request. \code{\link{queryGEO}} invokes the CGI by passing a GEO
  accession number and then processes the data file obtained.
}
\value{
  \code{\link{queryGEO}} returns a matrix containing data obtained.
}
\references{\url{http://www.ncbi.nlm.nih.gov/geo}}
\author{Jianhua Zhang}
\note{This function is part of the Bioconductor project at Dana-Farber
  Cancer Institute to provide Bioinformatics functionalities through R}

\seealso{\code{\link{GEO-class}}}
\examples{
geo <- GEO()
temp <- queryGEO(geo, "GPL49")
}
\keyword{manip}


\eof
\name{resolveMaps}
\alias{resolveMaps}
\alias{getVote}
\alias{hasDelimit}
\alias{getUnified}
\alias{getNoDup}
\title{Functions to obtain unified mappings between ids}
\description{
  These functions are used to obtain unified mappings between two sets
  of ids based on the mappings available from different sources. Each
  source provides mappings between the two sets of ids.
}
\usage{
resolveMaps(maps, trusted, srcs, colNames = NULL, outName = "", asFile = TRUE)
getVote(voters, sep = "\t")
getUnified(voters)
getNoDup(voters)
hasDelimit(entry, deli = ";") 
}
\arguments{
  \item{maps}{\code{maps} a matrix with mappings for a set of key ids to
    another set of ids provided by different sources. The first column
    is assumed to contain the key ids and the rest are mappings to
    another set of ids provided by different sources}
  \item{trusted}{\code{trusted} a vector of character strings that
    identify the columns of "maps" whose mappings are more reliable and
    should be used when there are conflicts}
  \item{srcs}{\code{srcs} a vector of character strings for the names of
    columns that contain mappings from different sources}
  \item{colNames}{\code{colNames} a vector of character strings for the
    names of columns in "maps"}
  \item{outName}{\code{outName} a character string for the name of the
    file to contain the unified mappings}
  \item{asFile}{\code{asFile} a boolean to indicate whether the unified
    mappings will be saved as a file}
  \item{voters}{\code{voters} a vector containing mappings from
    different sources}
  \item{entry}{\code{entry} a character string to be checked for the
    existence of a separator}
  \item{deli}{\code{deli} a character string for a separator}
  \item{sep}{\code{sep} same as deli}
}
\details{
  Each source may have different mappings from the key ids to another
  set of ids. \code{\link{resolveMaps}} resolves the conflicts and
  derives a set of unified mappings based on the mappings provided from
  several sources.

  \code{\link{getVote}} resolves the mappings for a given key id and
  returns a vector with unified mapping and the number of sources that
  agree with the unified mapping.

  \code{\link{getUnified}} finds agreement among values in a vector
  passed. If some values agree, it returns the value agreed upon by the
  most sources.

  \code{\link{getNoDup}} gets a value based on predefined rules when
  values from different sources do not agree.
  
  \code{\link{hasDelimit}} checks whether a delimiter exists in a
  character string.
}
\value{
  \code{\link{resolveMaps}} returns a matrix with the first column being
  the key id set, second being the unified mappings to another id set,
  and third the total number of agreements found among sources.

  \code{\link{getVote}} returns a two element vector.

  \code{\link{getUnified}} returns a character string.

  \code{\link{getNoDup}} returns a character string.

  \code{\link{hasDelimit}} returns TRUE or FALSE.
}
\author{Jianhua Zhang}
\note{The functions are part of the Bioconductor project at Dana-Farber
  Cancer Institute to provide Bioinformatics functionalities through R}

\seealso{\code{\link{LL-class}}, \code{\link{UG-class}}}
\examples{
maps <- matrix(c("id1", "a", "a", "b", "id2", "c","d", "c",
"id3", "e","e", "e", "id4", NA, "f", NA, "id5", "g", NA, "h", "id6", NA,
"NA", "i", "id7", NA, NA, NA), ncol = 4, byrow = TRUE)
unified <- resolveMaps(maps, c("srcll", "srcug"),
c("srcll", "srcug", "srcgeo"),
colNames = c("key1", "srcll", "srcug", "srcgeo"), outName = "",
asFile = FALSE)
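# A base-R sketch of a simple majority vote among sources for one key
# id (an illustration of the voting idea only; getVote and getUnified
# may apply additional rules, e.g. trusted sources and tie breaking).
# The votes below are hypothetical.
voters <- c(srcll = "a", srcug = "a", srcgeo = "b")
# Count the non-missing votes and keep the most frequent value
counts <- table(voters[!is.na(voters)])
winner <- names(counts)[which.max(counts)]
c(winner, max(counts))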
}
\keyword{manip}


\eof
\name{sourceURLs}
\alias{sourceURLs}
\title{A data file containing urls for data available from various public
  repositories} 
\description{
  This data file is used by various objects (through
  \code{\link{getSrcUrl}}) to get the correct urls for various data
  sources to be processed. 
}
\details{
  sourceURLs[[XX]] returns the url for data source XX, where XX is a
  short name for a particular public data repository. Valid names
  include "LL" - LocusLink, "UG" - UniGene, "GP" - GoldenPath, "GO" -
  Gene Ontology, "KEGG" - Kyoto Encyclopedia of Genes and Genomes, "GEO"
  - Gene Expression Omnibus, and "YG" - Yeast Genome.
}
\author{Jianhua Zhang}
\note{The data file is part of the Bioconductor project at Dana-Farber
  Cancer Institute to provide Bioinformatics functionalities through R}

\seealso{\code{\link{pubRepo-class}}}
\examples{
load(file.path(.path.package("AnnBuilder"), "data", "sourceURLs.rda"),
.GlobalEnv)
sourceURLs[["KEGG"]]
}
\keyword{file}


\eof
\name{unifyMappings}
\alias{unifyMappings}

\title{A function to unify mapping results from different sources}
\description{
  Given a base file and mappings from different sources, this function
  resolves the differences among sources in mapping results using a
  voting scheme and derives unified mapping results for targets in the
  base file.
}
\usage{
unifyMappings(base, ll, ug, otherSrc, fromWeb)
}

\arguments{
  \item{base}{\code{base} a matrix with two columns. The first column
    contains the target items (genes) to be mapped and the second the
    known mappings of the target to GenBank accession numbers or UniGene ids}
  \item{ll}{\code{ll} an object of class LL}
  \item{ug}{\code{ug} an object of class UG}
  \item{otherSrc}{\code{otherSrc} a vector of character strings for
    names of files that also contain mappings of the target genes in
    base. The files are assumed to have two columns with the first one
    being target genes and second one being the desired mappings}
  \item{fromWeb}{\code{fromWeb} a boolean to indicate whether the source
    data will be read from the web or a local file}
}
\details{
  ll and ug have methods to parse the data from LocusLink and UniGene to
  obtain the desired mappings to target genes in base. Correct source
  urls and parsers are needed to obtain the desired mappings.
}
\value{
  The function returns a matrix with four columns. The first two are the
  same as the columns of base, the third contains the unified mappings,
  and the fourth contains statistics on the agreement among sources.
}

\author{Jianhua Zhang}
\note{This function is part of the Bioconductor project at Dana-Farber
  Cancer Institute to provide Bioinformatics functionalities through R}

\seealso{\code{\link{LL}}, \code{\link{UG}}}
\examples{
myDir <- file.path(.path.package("AnnBuilder"), "temp")
geneNMap <- matrix(c("32468_f_at", "D90278", "32469_at", "L00693",
                   "32481_at", "AL031663", "33825_at", " X68733",
                   "35730_at", "X03350", "36512_at", "L32179",
                   "38912_at", "D90042", "38936_at", "M16652",
                   "39368_at", "AL031668"), ncol = 2, byrow = TRUE)
colnames(geneNMap) <- c("PROBE", "ACCNUM")
write.table(geneNMap, file = file.path(myDir, "geneNMap"), sep = "\t",
quote = FALSE, row.names = FALSE, col.names = FALSE)

temp <- matrix(c("32468_f_at", NA, "32469_at", "2",
                   "32481_at", NA, "33825_at", " 9",
                   "35730_at", "1576", "36512_at", NA,
                   "38912_at", "10", "38936_at", NA,
                   "39368_at", NA), ncol = 2, byrow = TRUE)
temp
write.table(temp, file = file.path(myDir, "srcone"), sep = "\t",
quote = FALSE, row.names = FALSE, col.names = FALSE)
temp <- matrix(c("32468_f_at", NA, "32469_at", NA,
                   "32481_at", "7051", "33825_at", NA,
                   "35730_at", NA, "36512_at", "1084",
                   "38912_at", NA, "38936_at", NA,
                   "39368_at", "89"), ncol = 2, byrow = TRUE)
temp
write.table(temp, file = file.path(myDir, "srctwo"), sep = "\t",
quote = FALSE, row.names = FALSE, col.names = FALSE)
otherMapping <- c(srcone = file.path(myDir, "srcone"),
srctwo = file.path(myDir, "srctwo"))

baseFile <-  file.path(myDir, "geneNMap")
llParser <- file.path(.path.package("AnnBuilder"), "data", "gbLLParser")
ugParser <- file.path(.path.package("AnnBuilder"), "data", "gbUGParser")
if(.Platform$OS.type == "unix"){
    llUrl <-  "http://www.bioconductor.org/datafiles/wwwsources/Tll_tmpl.gz"
    ugUrl <-  "http://www.bioconductor.org/datafiles/wwwsources/Ths.data.gz"
    fromWeb <- TRUE
}else{
    llUrl <- file.path(.path.package("AnnBuilder"), "data", "Tll_tmpl")
    ugUrl <- file.path(.path.package("AnnBuilder"), "data", "Ths.data")
    fromWeb = FALSE
}
ll <- LL(srcUrl = llUrl, parser = llParser, baseFile = baseFile)
ug <- UG(srcUrl = ugUrl, parser = ugParser, baseFile = baseFile,
organism = "human") 
# Only works interactively
if(interactive()){
    unified <- unifyMappings(base = geneNMap, ll = ll, ug = ug,
               otherSrc = otherMapping, fromWeb = fromWeb)
    read.table(unified, sep = "\t", header = FALSE)
    unlink(unified)
}
# Remove the temporary input files whether or not the example ran
unlink(c(file.path(myDir, "geneNMap"), file.path(myDir, "srcone"),
         file.path(myDir, "srctwo")))
}
\keyword{manip}

\eof
\name{writeChrLength}
\alias{writeChrLength}
\alias{findChrLength}
\alias{writeOrganism}
\title{Functions that create binary files for chromosome length and organism}
\description{
  These functions figure out the chromosome lengths and write the
  chromosome length and organism binary files to the data directory of
  the package.
}
\usage{
writeChrLength(pkgName, pkgPath, chrLengths)
findChrLength(organism, srcUrl = getSrcUrl("GP", organism))
writeOrganism(pkgName, pkgPath, organism)
}

\arguments{
  \item{pkgName}{\code{pkgName} a character string for the name of a
    data package or R library}
  \item{pkgPath}{\code{pkgPath} a character string for the path where
    pkgName resides} 
  \item{organism}{\code{organism} a character string for the name of the
    organism of interest}
  \item{srcUrl}{\code{srcUrl} a character string for the url of the data
    source used to create the binary file for chromosome length}   
  \item{chrLengths}{\code{chrLengths} a named vector of integers with
    the names being the chromosome numbers and the values of the vector
    being the total lengths of chromosomes}
}
\details{
  \code{\link{findChrLength}} extracts data from the source and figures
  out the total length for each chromosome. The total length for a
  chromosome is determined as the maximum chromosome location plus 1000.
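
  A minimal sketch of this rule, assuming the source data have already
  been reduced to a vector of chromosome names and a vector of
  locations whose sign may encode the strand. \code{chrLengthSketch} is
  a hypothetical helper written for illustration; it is not the code
  used by \code{findChrLength}.
\preformatted{
## Hypothetical helper, not part of AnnBuilder.
## chr: chromosome names; loc: chromosomal locations (sign = strand).
chrLengthSketch <- function(chr, loc) {
    ## Assumption: strand signs are dropped, so only positions matter
    loc <- abs(as.numeric(loc))
    ## Largest location on each chromosome plus a 1000 base pad
    lens <- tapply(loc, chr, max, na.rm = TRUE) + 1000
    ## Return a plain named vector of integers
    out <- as.integer(lens)
    names(out) <- names(lens)
    out
}
chrLengthSketch(c("1", "1", "2"), c(5000, -12000, 700))
## returns c("1" = 13000L, "2" = 1700L)
}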

  \code{\link{writeChrLength}} writes the chromosome length data to the
  data directory as a binary file.

  \code{\link{writeOrganism}} writes the name of the organism to the
  data directory as a binary file.
}
\value{
  \code{\link{findChrLength}} returns a named vector of integers. 
}
\references{}
\author{Jianhua Zhang}
\note{The functions are part of the Bioconductor project at Dana-Farber
  Cancer Institute to provide BioInformatics functionalities through R}

\seealso{\code{\link{ABPkgBuilder}}} 

\examples{
if(interactive()){
    path <- file.path(.path.package("AnnBuilder"), "temp")
    dir.create(file.path(path, "test"))
    dir.create(file.path(path, "test", "data"))
    chrLength <- findChrLength("human")
    writeChrLength("test", path, chrLength)
    writeOrganism("test", path, "human")
    list.files(file.path(path, "test", "data"))
    unlink(file.path(path, "test"), recursive = TRUE)
}
}
\keyword{manip}

\eof
\name{writeManPage}
\alias{writeManPage}
\alias{writeMan4Fun}
\alias{formatName}
\alias{writeREADME}
\alias{writeDescription}
\alias{getDsrc}
\alias{getItem}
\alias{getSrcNBuilt}
\alias{getUrlNBuilt}
\alias{getDSrc}
\alias{writeAccessory}
\alias{writeZZZ}
\alias{getAllRdaName}
\alias{writeFun}
\alias{escapeLatexChr}
\alias{writeMan4QC}
\alias{getExample}
\alias{getSrcBuiltNRef}
\alias{getBuild4Yeast}
\title{Functions that write supporting files needed by a data package}
\description{
  The functions are mainly used to write man pages and supporting
  functions that are needed for a data package.
}
\usage{
writeManPage(pkgName, pkgPath, manName, organism = "human", src, isEnv =
TRUE)
writeMan4Fun(pkgName, pkgPath, organism, QCList)
formatName(toFormat)
writeREADME(pkgPath, pkgName, urls)
writeDescription(pkgName, pkgPath, version, author, dataSrc, license)
getDSrc(organism)
getSrcNBuilt(dSrc, organism)
getUrlNBuilt(src, organism)
writeAccessory(pkgName, pkgPath, organism, version, author = c(name =
"who", address = "who@email.net"), dataSrc, license)
writeFun(pkgPath, pkgName, organism = "human")
writeZZZ(pkgPath, pkgName)
getAllRdaName(pkgName, pkgPath)
escapeLatexChr(item)
writeMan4QC(pkgName, pkgPath)
getExample(pkgName, manName, isEnv = TRUE)
getSrcBuiltNRef(src, organism)
getBuild4Yeast(src, manName)
}

\arguments{
  \item{pkgName}{A character string for the name of a data package or R
    library}
  \item{pkgPath}{A character string for the path where pkgName resides}
  \item{organism}{A character string for the name of the organism of
    interest} 
  \item{toFormat}{A character string from which any underscores will be
    removed}
  \item{urls}{A vector of character strings for the URLs of the data
    sources used to create the rda files}   
  \item{name}{A character string to be used for the name of a latex item
    tag} 
  \item{dSrc}{A vector of character strings containing the short names
    of public data sources (e. g. LL for LocusLink)}
  \item{src}{A character string for the short name of a public data
    source}
  \item{version}{A character string for the version number}
  \item{author}{A named vector of strings with two elements named name
    and address, respectively. Name is a character string for the name
    of the person who maintains the data package and address is the email
    address of the person}
  \item{item}{A character string to be escaped if it is a LaTeX special
    character}
  \item{QCList}{A list with statistical data derived from
    \code{\link{getDPStats}}}
  \item{manName}{\code{manName} a character string for the name of the
    man page to be created}
  \item{isEnv}{\code{isEnv} a boolean to indicate whether the object a
    man page concerns is an R environment or not}
  \item{dataSrc}{\code{dataSrc} a vector of character strings for the
    data sources used to create a package}
  \item{license}{\code{license} a character string for the license the
    package is under}
}
\details{
  If \code{pkgName} = "XX" and the element name is "yy", the Rd file
  will be "XXyy.Rd" appended to the path if \code{short} is FALSE;
  otherwise, it will be "yy.Rd" appended to the path.

  \code{\link{writeManPage}} writes a man page for a given object that
  is stored in the data directory.

  \code{\link{getExample}} creates a set of example code that is going
  to be used in a man page depending on whether the man page is for an
  environment object or not.
  
  \code{\link{getSrcBuiltNRef}} creates the text that is going to be
  used for built and reference information in a man page.

  \code{\link{getBuild4Yeast}} creates the text that is going to be used
  for built and reference information for the man page for yeast.
}
\value{
  All functions return a character string.
}
\references{An Introduction to R - Writing R Extensions}
\author{Jianhua Zhang}
\note{The functions are part of the Bioconductor project at Dana-Farber
  Cancer Institute to provide BioInformatics functionalities through R}

\seealso{\code{\link{ABPkgBuilder}}} 

\examples{
makeSrcInfo()
dir.create(file.path(".", "pkg"))
dir.create(file.path(".", "pkg", "data"))
dir.create(file.path(".", "pkg", "man"))
writeManPage("pkg", getwd(), "CHR")
list.files(file.path(getwd(), "pkg", "man"))
unlink("pkg", recursive = TRUE)
}
\keyword{manip}

\eof
\name{writeXMLHeader}
\alias{writeXMLHeader}

\title{A function to write header information to an XML file.}
\description{
  This function writes to the Attr node of an annotate XML file. 
}
\usage{
writeXMLHeader(outName, fileCol, name, version, organism="human")
}

\arguments{
  \item{outName}{A character string for the name of the XML file to store
    the generated metadata.}
  \item{fileCol}{A vector of character strings for the names of data
    columns in the original file that is going to be used to produce the
    Data node of the XML file.}
  \item{name}{A character string for an internal name that is normally
    the target of the annotation (e. g. U95 for the u95 chip).}
  \item{version}{A character string or number for the version of the
    system that produces the XML file.}
  \item{organism}{A character string for the name of the organism of
    interest} 
}
\details{
  The XML file produced has an Attr node to hold the header
  information. The Attr node contains a Target node for the internal
  name, a DataMade node to date the file when it is made, one to many
  SourceFile nodes for names of the source files used for annotation,
  and one to many Element nodes for names of the data items the Data
  node of the XML will contain.
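
  The nesting described above can be pictured with the following
  sketch. It assumes one child node per line and uses only the node
  names listed in this section; \code{attrNodeSketch} is hypothetical,
  and the exact markup written by \code{writeXMLHeader} (and required
  by annotate.dtd) may differ.
\preformatted{
## Hypothetical helper that returns the lines of an Attr node
attrNodeSketch <- function(name, srcFiles, elements) {
    c("<Attr>",
      paste("  <Target>", name, "</Target>", sep = ""),
      ## Date stamp for when the file is made
      paste("  <DataMade>", format(Sys.Date()), "</DataMade>", sep = ""),
      ## One SourceFile node per source file
      paste("  <SourceFile>", srcFiles, "</SourceFile>", sep = ""),
      ## One Element node per data column
      paste("  <Element>", elements, "</Element>", sep = ""),
      "</Attr>")
}
writeLines(attrNodeSketch("U95", "source1.txt", c("AFFY", "LOCUSID")))
}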
}
\value{
  This function does not return any value.
}
\references{\url{http://www.bioconductor.org/datafiles/dtds/annotate.dtd}}
\author{Jianhua (John) Zhang}
\note{This function is part of the Bioconductor project at Dana-Farber
  Cancer Institute to provide Bioinformatics functionalities through R.}

\seealso{\code{\link{fileToXML}}}

\examples{
makeSrcInfo()
#Write the header to a temp file
if(interactive()){
writeXMLHeader(outName = "try.xml", fileCol = c("AFFY", "LOCUSID",
"ACCNUM"), name = "Not Real", version = "0.5", organism = "human")
# View the header
readLines("try.xml")
# Clean up
unlink("try.xml")
}
}
\keyword{manip}








\eof
\name{yeastAnn}
\alias{yeastAnn}
\alias{getProbe2SGD}
\alias{procYeastGeno}
\alias{formatGO}
\alias{formatChrLoc}
\alias{getGEOYeast}
\title{Functions to annotate yeast genome data}
\description{
  Given a GEO accession number for a yeast data set and the extensions
  for annotation data file names that are available from the Yeast
  Genome web site, these functions generate a data package containing
  annotation data for the yeast genes in the GEO data set.
}
\usage{
yeastAnn(base = "", yGenoUrl =
"ftp://genome-ftp.stanford.edu/pub/yeast/data_download/", yGenoNames = c("literature_curation/gene_literature.tab",
"chromosomal_feature/chromosomal_feature.tab",
"literature_curation/go_annotation.tab"), toKeep = list(c(6, 1), c(9, 5,
8, 11, 15), c(3, 2, 6)), colNames = list(c("sgdid", "pubmed"),
c("sgdid", "chrom", "strand", "desc", "enzyme"), c("sgdid", "gene",
"go")), seps = c("\t", "\t", "\t"), by = "sgdid")
getProbe2SGD(probe2ORF, yGenoUrl, fileName
="chromosomal_feature/external_id.tab", toKeep = c(3, 4), colNames =
c("orf","sgdid"), sep = "\t", by = "orf")
procYeastGeno(baseURL =
"ftp://genome-ftp.stanford.edu/pub/yeast/data_download/", fileName,
toKeep, colNames, seps = "\t")
getGEOYeast(GEOAccNum, GEOUrl =
"http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?", geoCols = c(1, 8),
yGenoUrl = "ftp://genome-ftp.stanford.edu/pub/yeast/data_download/") 
formatGO(gos, evis)
formatChrLoc(chr, chrloc, chrori)  
}
\arguments{
  \item{base}{\code{base} a matrix with two columns. The first column
    contains probe ids and the second contains their mappings to the SGD
    ids used by all the Yeast Genome data files. If \code{base} = "", the
    whole genome will be mapped based on a data file that contains
    mappings between all the ORFs and SGD ids} 
  \item{GEOAccNum}{\code{GEOAccNum} a character string for the accession
    number given by GEO for a yeast data set}
  \item{GEOUrl}{\code{GEOUrl} a character string for the url that
    contains a common CGI for all the GEO data. Currently it is
    \url{http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?}}
  \item{geoCols}{\code{geoCols} a vector of integers for the column
    numbers of the source file from GEO that maps yeast probe ids to ORF
    ids} 
  \item{yGenoUrl}{\code{yGenoUrl} a character string for the url that is
    a directory on the Yeast Genome web site that contains directories for
    yeast annotation data. Currently it is
    \url{ftp://genome-ftp.stanford.edu/pub/yeast/data_download/}}
  \item{baseURL}{see yGenoUrl}
  \item{yGenoNames}{\code{yGenoNames} a vector of character strings for
    the names of yeast annotation data. Each of the strings can be
    appended to yGenoUrl to make a complete url for a data file}
  \item{fileName}{a character string for the extension part of the
    source data file that can be used to map target genes to SGD ids}
  \item{toKeep}{\code{toKeep} a list of vectors of integers with numbers
    corresponding to the column numbers of the yeast genome data files
    that will be kept when the data files are processed. The length of
    toKeep must be the same as that of yGenoNames (a vector for each file)}
  \item{colNames}{\code{colNames} a list of vectors of character strings
    for the names to be given to the columns to keep when processing the
    data. Again, the length of colNames must be the same as yGenoNames}
  \item{seps}{\code{seps} a vector of characters for the separators used
    by the data files included in yGenoNames}
  \item{sep}{singular version of seps}
  \item{by}{\code{by} a character string for the column that is common
    in all data files to be processed. The column will be used to merge
    separate data files}
  \item{probe2ORF}{\code{probe2ORF} a matrix with mappings of yeast
    target genes to ORF ids that in turn can be mapped to SGD ids}
  \item{gos}{\code{gos} a vector of character strings for GO ids
    retrieved from Yeast Genome Project}
  \item{evis}{\code{evis} a vector of character strings for the evidence
    codes associated with GO ids}
  \item{chr}{\code{chr} a vector of character strings for chromosome
    numbers} 
  \item{chrloc}{\code{chrloc} a vector of integers for chromosomal
    locations}
  \item{chrori}{\code{chrori} a vector of characters that can either be
    w or c that are used for strand of yeast chromosomes}
}
\details{
  To merge files, the system has to map the target genes in the base
  file to SGD ids and then use the SGD ids to map the target genes to
  annotation data from different sources.
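
  The merge step can be illustrated with base R's \code{merge}, using
  made-up probe ids, SGD ids, and a single annotation column; the data
  and column names below are hypothetical and do not come from the
  Yeast Genome files.
\preformatted{
## Hypothetical probe-to-SGD mappings and one annotation table
probe2sgd <- data.frame(probe = c("p1", "p2", "p3"),
                        sgdid = c("S0001", "S0002", "S0009"),
                        stringsAsFactors = FALSE)
annot <- data.frame(sgdid = c("S0001", "S0002"), chrom = c("1", "4"),
                    stringsAsFactors = FALSE)
## all.x = TRUE keeps every probe, with NA where no annotation exists
merged <- merge(probe2sgd, annot, by = "sgdid", all.x = TRUE)
merged[order(merged$probe), c("probe", "chrom")]
}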

  \code{\link{formatGO}} adds leading 0s to GO ids when needed and then
  appends the evidence code to the end of a GO id following a "@".
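
  A minimal sketch of this formatting, assuming GO ids are padded to
  seven digits and carry a "GO:" prefix; \code{formatGOSketch} is a
  hypothetical helper, not the AnnBuilder code.
\preformatted{
## Hypothetical helper illustrating the formatting described above
formatGOSketch <- function(gos, evis) {
    ## Strip any existing "GO:" prefix, then left-pad with zeros
    ids <- formatC(as.integer(sub("^GO:", "", gos)), width = 7, flag = "0")
    ## Append the evidence code after an "@"
    paste("GO:", ids, "@", evis, sep = "")
}
formatGOSketch(c("5737", "GO:0003674"), c("IDA", "ND"))
## returns "GO:0005737@IDA" "GO:0003674@ND"
}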

  \code{\link{formatChrLoc}} assigns a + or - sign to \code{chrloc}
  depending on whether the corresponding \code{chrori} is w or c and
  then appends \code{chr} to the end of \code{chrloc} following a "@".
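
  A minimal sketch of this rule; \code{formatChrLocSketch} is a
  hypothetical helper, and treating any orientation other than w as c
  is an assumption.
\preformatted{
## Hypothetical helper illustrating the rule described above
formatChrLocSketch <- function(chr, chrloc, chrori) {
    ## w (Watson strand) gives "+"; anything else is treated as c, "-"
    strand <- ifelse(chrori == "w", "+", "-")
    ## Print locations as plain integers, never in scientific notation
    loc <- formatC(as.integer(chrloc), format = "d")
    paste(strand, loc, "@", chr, sep = "")
}
formatChrLocSketch(c("1", "4"), c(2480, 1500), c("w", "c"))
## returns "+2480@1" "-1500@4"
}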

  \code{\link{getGEOYeast}} gets yeast data from GEO for the columns
  specified. 
}
\value{
  \code{\link{yeastAnn}} returns a matrix with target genes annotated by
  data from selected data columns in different data sources.

  \code{\link{getProbe2SGD}} returns a matrix with mappings between
  target genes and SGD ids.

  \code{\link{procYeastGeno}} returns a data matrix.

  \code{\link{formatGO}} returns a vector of character strings.

  \code{\link{formatChrLoc}} returns a vector of character strings.

  \code{\link{getGEOYeast}} returns a matrix with the number of columns
  specified. 
}
\references{\url{ftp://genome-ftp.stanford.edu}}
\author{Jianhua Zhang}
\note{The functions are part of the Bioconductor project at Dana-Farber
  Cancer Institute to provide Bioinformatics functionalities through R}

\seealso{}
\examples{
# The following code will take a while to run and is turned off 
if(FALSE){
yeastData <- yeastAnn(GEOAccNum = "GPL90")
}
}
\keyword{manip}

\eof
\name{yeastPkgBuilder}
\alias{yeastPkgBuilder}
\alias{findYGChrLength}
\title{Functions to build a data package for the yeast genome}
\description{
  These functions build a data package for the yeast genome using data
  from the Yeast Genome web site of Stanford University, KEGG, and Gene
  Ontology.
}
\usage{
yeastPkgBuilder(pkgName, pkgPath, base = "", srcUrls = c(KEGG =
getSrcUrl("KEGG", organism = "yeast"), GO = getSrcUrl("GO")), version =
"1.1.0", makeXML = TRUE, author = c(name = "who", address =
"who@email.com"), fromWeb = TRUE)
findYGChrLength(yGenoUrl =
"ftp://genome-ftp.stanford.edu/pub/yeast/data_download/", yGenoName =
"chromosomal_feature/chromosomal_feature.tab", toKeep = c(5, 7), sep = "\t")
}

\arguments{
  \item{pkgName}{\code{pkgName} a character string for the name of the
    data package to be built}
  \item{base}{\code{base} a matrix with two columns with the first one
    being probe ids and the second one being their mappings to ORF (Open
    Reading Frame) ids. The columns are named "probe" and "orf"}
  \item{pkgPath}{\code{pkgPath} a character string for the directory
    where the data package to be built will be stored}
  \item{srcUrls}{\code{srcUrls} a named vector of strings for the urls
    for KEGG (\url{ftp://ftp.genome.ad.jp/pub/kegg/pathways}) and GO
    (\url{http://www.godatabase.org/dev/database/archive/2003-05-01/go_200305-termdb.xml.gz}). The URLs may change over time} 
  \item{version}{\code{version} a character string for the version
    number of the data package to be built}
  \item{makeXML}{\code{makeXML} a boolean to indicate whether an XML
    version of the data will be generated}
  \item{author}{\code{author} a named vector of two character strings
    with a name element for the name and an address element of email
    address of the maintainer of the data package}
  \item{fromWeb}{\code{fromWeb} a boolean to indicate whether the data
    from GO should be downloaded from the web or read locally. The url
    for GO should be the file name of a local file if fromWeb is
    FALSE. Windows users should download and unzip the data file from GO
    manually and set the url for GO to the name of the local file}
  \item{yGenoUrl}{\code{yGenoUrl} a character string for the url of the
    FTP site where the download files are stored. Defaults to a URL that
    was correct at the time of writing}
  \item{yGenoName}{\code{yGenoName} a character string for the name of
    the data file that contains chromosome information
    ("chromosomal\_feature/chromosomal\_feature.tab")}
  \item{toKeep}{\code{toKeep} a vector of integers for the numbers of the
    columns that will be kept when the file is read}
  \item{sep}{\code{sep} a character string for the delimiter used by
    the file to separate columns}
}
\details{
  Annotation elements are limited to those provided by Yeast Genome
  (gene name, chromosome number, chromosomal location, GO id, and
  evidence code), KEGG (path and enzyme data), and GO (GO mappings).
}
\value{
  \code{findYGChrLength} returns a named vector of integers with
  chromosome numbers as names and chromosome lengths as values. 
}
\references{\url{http://www.yeastgenome.org}}
\author{Jianhua Zhang}
\note{The functions are part of the Bioconductor project at Dana-Farber
  Cancer Institute to provide Bioinformatics functionalities through R}
\examples{
# The code runs only when invoked by a user. 
if(interactive()){
  findYGChrLength()
}
}
\keyword{manip}

\eof
