affxparser-package        package:affxparser        R Documentation

_P_a_c_k_a_g_e _a_f_f_x_p_a_r_s_e_r

_D_e_s_c_r_i_p_t_i_o_n:

     The 'affxparser' package provides methods for fast and memory
     efficient parsing of Affymetrix files [1] using the Affymetrix
     Fusion SDK [2].  Both traditional ASCII- and binary (XDA)-based
     files are supported, as well as the future Affymetrix binary
     format "Calvin".  The efficiency of the parsing depends on whether
     a specific file is binary or ASCII.

     Currently, there are methods for reading chip definition files
     (CDF) and cell intensity files (CEL).  These files can be read
     either in full or in part.  For example, probe signals from a few
     probesets can be extracted very quickly from a set of CEL files
     into a convenient list structure.
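
     As a rough sketch of such a partial read, the following assumes
     that the CEL files in a directory share one chip type and that the
     matching CDF file can be located automatically.  The helper name
     'readFewUnits' and the choice of the first five units are
     illustrative only.

```r
# Sketch: read probe signals for a few probesets from all CEL files in
# a directory. Returns NULL if affxparser or CEL files are unavailable.
readFewUnits <- function(path = ".", units = 1:5) {
  if (!requireNamespace("affxparser", quietly = TRUE)) return(NULL)
  celFiles <- list.files(path, pattern = "[.]cel$", ignore.case = TRUE,
                         full.names = TRUE)
  if (length(celFiles) == 0L) return(NULL)
  # 'units' are unit (probeset) indices; this assumes the CDF file for
  # the chip type can be found on the CDF search path.
  affxparser::readCelUnits(celFiles, units = units)
}

signals <- readFewUnits()
if (!is.null(signals)) str(signals[1])
```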

_R_e_q_u_i_r_e_m_e_n_t_s:

     This package requires only a standard R installation, that is, it
     works independently of other CRAN and Bioconductor packages.

_T_o _g_e_t _s_t_a_r_t_e_d:

     To get started, see:

        1.  'readCelUnits'() - reads one or several Affymetrix CEL
           files probeset by probeset.

        2.  'readCel'() - reads an Affymetrix CEL file probe by probe.

        3.  'readCdf'() - reads an Affymetrix CDF file probe by probe.

        4.  'readCdfUnits'() - reads an Affymetrix CDF file unit by
           unit. 

        5.  'readCdfCellIndices'() - Like 'readCdfUnits()', but returns
           cell indices only, which is often enough to read CEL files
           unit by unit.

        6.  'applyCdfGroups'() - Re-arranges a CDF structure.

        7.  'findCdf'() - Locates an Affymetrix CDF file by chip type.
           This page also describes how to set up the default search
           path for CDF files.

_S_e_t_t_i_n_g _u_p _t_h_e _C_D_F _s_e_a_r_c_h _p_a_t_h:

     Some of the functions in this package search for CDF files
     automatically by scanning certain directories.  To add directories
     to the default search path, see instructions in 'findCdf'().
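
     A minimal sketch of such a lookup is given below.  The chip type
     name "HG-U133A" and the helper name 'locateCdf' are illustrative
     assumptions, and the lookup is expected to give NULL when no
     matching CDF file is found.

```r
# Sketch: look up the CDF file for a chip type on the search path.
# Returns NULL if affxparser is unavailable or no CDF file is found.
locateCdf <- function(chipType) {
  if (!requireNamespace("affxparser", quietly = TRUE)) return(NULL)
  affxparser::findCdf(chipType)
}

cdfFile <- locateCdf("HG-U133A")  # illustrative chip type name
if (is.null(cdfFile)) {
  message("No CDF file found (or affxparser is not installed)")
} else {
  print(cdfFile)
}
```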

_F_u_t_u_r_e _W_o_r_k:

     Other Affymetrix file types, e.g. DAT files (image files), can be
     parsed using the Fusion SDK.  Given sufficient interest, we will
     implement support for them.

_R_u_n_n_i_n_g _e_x_a_m_p_l_e_s:

     In order to run the examples, data files must exist in the
     current directory.  Otherwise, the example scripts will do
     nothing.  Most of the examples require a CDF file or a CEL file,
     or both.  Make sure the CDF file is of the same chip type as the
     CEL file.
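
     One possible way to check the chip types before running the
     examples is sketched below.  It assumes that both file headers
     expose a 'chiptype' element; the helper name 'sameChipType' and
     the file names are hypothetical.

```r
# Sketch: check that a CDF file and a CEL file describe the same chip
# type. Returns NA if affxparser is unavailable or a file is missing.
sameChipType <- function(cdfFile, celFile) {
  if (!requireNamespace("affxparser", quietly = TRUE)) return(NA)
  if (!file.exists(cdfFile) || !file.exists(celFile)) return(NA)
  # Assumption: both headers carry the chip type as 'chiptype'
  cdfType <- affxparser::readCdfHeader(cdfFile)$chiptype
  celType <- affxparser::readCelHeader(celFile)$chiptype
  identical(cdfType, celType)
}

# Hypothetical file names in the current directory
sameChipType("example.cdf", "example.CEL")
```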

     Affymetrix provides data sets of different types at <URL:
     http://www.affymetrix.com/support/datasets.affx> that can be used.
     There are both small and very large data sets available.

_T_e_c_n_i_c_a_l _d_e_t_a_i_l_s:

     This package implements an interface to the Fusion SDK from
     Affymetrix.com. This SDK (software development kit) is an open
     source library used for parsing the various file formats used by
     the Affymetrix platform.

     The intention is to provide interfaces to most if not all file
     formats which may be parsed using Fusion.

     The SDK supports parsing of all the different versions of a
     specific file format. This means that ASCII, binary, as well as
     the new binary format (codename Calvin) used by Affymetrix are
     supported through a single API. We also expect any future changes
     to the file formats to be reflected in the SDK, and subsequently
     in this package.

     However, as the current Fusion SDK does not support compressed
     files, neither does 'affxparser'. This is in contrast to some of
     the existing code in 'affy' and relatives (see below for links).

     In general we aim to provide functions returning all information
     in the respective files. Currently it seems that future Affymetrix
     chip designs may consist of so many features that returning all
     information will lead to unnecessary overhead when a user only
     wants access to a subset. We have therefore tried to make it
     possible to read only the parts of a file that are needed.
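
     As an illustration of reading only a subset, the sketch below
     requests intensities for a limited set of cells and skips the
     other fields.  The helper name 'readCelSubset', the file name and
     the choice of the first 100 cell indices are illustrative
     assumptions.

```r
# Sketch: read intensities for a subset of cells in a CEL file.
# Returns NULL if affxparser is unavailable or the file is missing.
readCelSubset <- function(celFile, indices = 1:100) {
  if (!requireNamespace("affxparser", quietly = TRUE)) return(NULL)
  if (!file.exists(celFile)) return(NULL)
  # Only intensities are requested; the remaining fields are skipped
  affxparser::readCel(celFile, indices = indices,
                      readIntensities = TRUE, readStdvs = FALSE,
                      readPixels = FALSE, readOutliers = FALSE,
                      readMasked = FALSE)
}

cel <- readCelSubset("example.CEL")  # hypothetical file name
if (!is.null(cel)) str(cel$intensities)
```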

     For older files, certain entries have been removed from newer
     specifications, and the SDK does not provide utilities for
     reading these entries. This includes, e.g., the FEAT column of
     CDF files.

     Currently the package as well as the Fusion SDK are in beta stage.
     Bugs may be related to either codebase. We are very interested in
     hearing from users who are unable to compile the package or parse
     files using this library; this includes users with custom chip
     designs.

     In addition, since we aim to return all information stored in the
     file (and accessible using the Fusion SDK), we would like reports
     from users who are unable to retrieve it.

     The efficiency of the underlying code may vary with the version of
     the file being parsed. For example, we currently report the number
     of outliers present in a CEL file when reading the header of the
     file using 'readCelHeader'. In order to obtain this information
     from text based CEL files (version 2), the entire file needs to be
     read into memory. With version 3 of the file format, this
     information is stored in the header.
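
     A small sketch of querying the outlier count from the header
     follows.  It assumes that the header list exposes the count as
     'noutliers'; the helper name 'countOutliers' and the file name are
     hypothetical.

```r
# Sketch: report the number of outliers recorded for a CEL file.
# Returns NA if affxparser is unavailable or the file is missing.
countOutliers <- function(celFile) {
  if (!requireNamespace("affxparser", quietly = TRUE)) return(NA_integer_)
  if (!file.exists(celFile)) return(NA_integer_)
  hdr <- affxparser::readCelHeader(celFile)
  # Assumption: the outlier count is stored as 'noutliers'. For text
  # based CEL files this call may need to scan the whole file.
  hdr$noutliers
}

countOutliers("example.CEL")  # hypothetical file name
```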

     With the introduction of the Fusion SDK (and the next version of
     their file formats) Affymetrix has made it possible to use
     multibyte character sets. This implies that character information
     may be inaccessible if the compiler used to compile the C++ code
     does not support multibyte character sets (specifically we require
     that the R installation has defined the macro 'SUPPORT_MBCS' in
     the 'Rconfig.h' header file). For example, GCC needs to be version
     3.4 or greater on Solaris.
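
     One way to inspect the local R installation is sketched below,
     using base R only.  It assumes the macro is spelled 'SUPPORT_MBCS',
     as in R's own headers; the helper name 'hasMbcsMacro' is
     illustrative.

```r
# Sketch: check whether this R installation's 'Rconfig.h' defines the
# SUPPORT_MBCS macro. Returns NA if the header file cannot be found.
hasMbcsMacro <- function() {
  header <- file.path(R.home("include"), "Rconfig.h")
  if (!file.exists(header)) return(NA)
  lines <- readLines(header, warn = FALSE)
  pattern <- "^[[:space:]]*#[[:space:]]*define[[:space:]]+SUPPORT_MBCS"
  any(grepl(pattern, lines))
}

hasMbcsMacro()
```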

     In the 'info' subdirectory of the package installation,
     information regarding changes to the Fusion SDK is stored, e.g.


         pathname <- system.file("info/README", package="affxparser")
         file.show(pathname)


_A_c_k_n_o_w_l_e_d_g_m_e_n_t_s:

     We would like to thank Ken Simpson (WEHI, Melbourne) and Seth
     Falcon (FHCRC, Seattle) for feedback and code contributions.

_L_i_c_e_n_s_e:

     The releases of this package are licensed under LGPL version 2.1
     or newer. This also applies to the Fusion SDK.

_A_u_t_h_o_r(_s):

     Henrik Bengtsson, hb@stat.berkeley.edu, James Bullard,
     bullard@stat.berkeley.edu and Kasper Daniel Hansen,
     khansen@stat.berkeley.edu.

_R_e_f_e_r_e_n_c_e_s:

     [1] Affymetrix Inc, Affymetrix GCOS 1.x compatible file formats,
     April, 2006. <URL: http://www.affymetrix.com/support/developer/>

     [2] Affymetrix Inc, Fusion Software Developers Kit (SDK), 2006.
     <URL: http://www.affymetrix.com/support/developer/fusion/>

