Using Bioconductor for Microarray Analysis

Bioconductor Maintainer1*

1Roswell Park Cancer Institute, Elm and Carlton St, Buffalo, NY 14263

*maintainer@bioconductor.org

Abstract

Bioconductor has advanced facilities for analysis of microarray platforms including Affymetrix, Illumina, Nimblegen, Agilent, and other one- and two-color technologies. Bioconductor includes extensive support for analysis of expression arrays, and well-developed support for exon, copy number, SNP, methylation, and other assays. Major workflows in Bioconductor include pre-processing, quality assessment, differential expression, clustering and classification, gene set enrichment analysis, and genetical genomics. Bioconductor offers extensive interfaces to community resources, including GEO, ArrayExpress, Biomart, genome browsers, GO, KEGG, and diverse annotation sources.

1 Version Info

R version: 3.5.0 (2018-04-23)
Bioconductor version: 3.7
Package version: 1.4.0

2 Sample Workflow

The following code illustrates a typical R / Bioconductor session. It uses RMA from the affy package to pre-process Affymetrix arrays, and the limma package for assessing differential expression.

## Load packages
library(affy)   # Affymetrix pre-processing
library(limma)  # two-color pre-processing; differential expression

## import "phenotype" data, describing the experimental design
phenoData <- 
    read.AnnotatedDataFrame(system.file("extdata", "pdata.txt",
    package="arrays"))

## RMA normalization
celfiles <- system.file("extdata", package="arrays")
eset <- justRMA(phenoData=phenoData,
    celfile.path=celfiles)
## Warning: replacing previous import 'AnnotationDbi::tail' by 'utils::tail' when
## loading 'hgfocuscdf'
## Warning: replacing previous import 'AnnotationDbi::head' by 'utils::head' when
## loading 'hgfocuscdf'

## differential expression
combn <- factor(paste(pData(phenoData)[,1],
    pData(phenoData)[,2], sep = "_"))
design <- model.matrix(~combn) # describe model to be fit

fit <- lmFit(eset, design)  # fit each probeset to model
efit <- eBayes(fit)        # empirical Bayes adjustment
topTable(efit, coef=2)      # table of differentially expressed probesets
##                 logFC   AveExpr         t      P.Value    adj.P.Val        B
## 204582_s_at  3.468416 10.150533  39.03471 1.969915e-14 1.732146e-10 19.86082
## 211548_s_at -2.325670  7.178610 -22.73165 1.541158e-11 6.775701e-08 15.88709
## 216598_s_at  1.936306  7.692822  21.73818 2.658881e-11 7.793180e-08 15.48223
## 211110_s_at  3.157766  7.909391  21.19204 3.625216e-11 7.969130e-08 15.24728
## 206001_at   -1.590732 12.402722 -18.64398 1.715422e-10 3.016740e-07 14.01955
## 202409_at    3.274118  6.704989  17.72512 3.156709e-10 4.626157e-07 13.51659
## 221019_s_at  2.251730  7.104012  16.34552 8.353283e-10 1.049292e-06 12.69145
## 204688_at    1.813001  7.125307  14.75281 2.834343e-09 3.115297e-06 11.61959
## 205489_at    1.240713  7.552260  13.62265 7.264649e-09 7.097562e-06 10.76948
## 209288_s_at -1.226421  7.603917 -13.32681 9.401074e-09 7.784531e-06 10.53327
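To make the design matrix and the coef=2 argument above more concrete, here is a minimal sketch using base R only. The phenotype table and its factor levels (wt, mut, ctl, trt) are made up for illustration and are not taken from the pdata.txt file in the arrays package.

```r
## Hypothetical phenotype table: two experimental factors, six samples
pheno <- data.frame(
    genotype  = c("wt", "wt", "wt", "mut", "mut", "mut"),
    treatment = c("ctl", "trt", "ctl", "trt", "ctl", "trt")
)

## Combine the two factors into a single grouping factor,
## in the same way as the workflow code
group <- factor(paste(pheno$genotype, pheno$treatment, sep = "_"))
levels(group)  # levels are sorted; the first one is the reference level

## Treatment-contrast design: an intercept plus one column
## for each non-reference level
design <- model.matrix(~group)
colnames(design)
```

With this parameterization, the second column of the design matrix compares the second factor level with the reference level, and that is the coefficient selected by coef=2 in topTable().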

A top table resulting from a more complete analysis, described in Chapter 7 of Bioconductor Case Studies, is shown below. For each Affymetrix probe set (ID), the table gives the log-fold difference between two experimental groups (logFC), the average expression across all samples (AveExpr), the moderated t-statistic for differential expression (t), the unadjusted and adjusted p-values (P.Value and adj.P.Val; the adjustment here controls the false discovery rate), and the log-odds that the probe set is differentially expressed (B). These results can be used in further analysis and annotation.

      ID logFC AveExpr    t  P.Value adj.P.Val     B
636_g_at  1.10    9.20 9.03 4.88e-14  1.23e-10 21.29
39730_at  1.15    9.00 8.59 3.88e-13  4.89e-10 19.34
 1635_at  1.20    7.90 7.34 1.23e-10  1.03e-07 13.91
 1674_at  1.43    5.00 7.05 4.55e-10  2.87e-07 12.67
40504_at  1.18    4.24 6.66 2.57e-09  1.30e-06 11.03
40202_at  1.78    8.62 6.39 8.62e-09  3.63e-06  9.89
37015_at  1.03    4.33 6.24 1.66e-08  6.00e-06  9.27
32434_at  1.68    4.47 5.97 5.38e-08  1.70e-05  8.16
37027_at  1.35    8.44 5.81 1.10e-07  3.08e-05  7.49
37403_at  1.12    5.09 5.48 4.27e-07  1.08e-04  6.21
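The relationship between the P.Value and adj.P.Val columns can be illustrated with base R. The sketch below applies the Benjamini-Hochberg adjustment through p.adjust() to a handful of made-up p-values. topTable() uses the same method by default (adjust.method = "BH"), but applies it across all probe sets on the array, so the numbers here are not comparable to those in the table.

```r
## Hypothetical unadjusted p-values for five probe sets (illustration only)
p <- c(0.0001, 0.004, 0.019, 0.03, 0.5)

## Benjamini-Hochberg adjustment, which controls the false discovery rate
adj <- p.adjust(p, method = "BH")

## Side-by-side view, using the column names from the top table
data.frame(P.Value = p, adj.P.Val = adj)
```

An adjusted value is never smaller than the unadjusted one, which is why a probe set can look significant on P.Value alone and still fail a false discovery rate threshold.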

3 Installation and Use

Follow the Bioconductor installation instructions to start using these packages. You can install affy and limma as follows:

## try http:// if https:// URLs are not supported
source("https://bioconductor.org/biocLite.R")
biocLite(c("affy", "limma"))

To install additional packages, such as the annotation package for the Affymetrix Human Genome U95Av2 array, use:

## try http:// if https:// URLs are not supported
source("https://bioconductor.org/biocLite.R")
biocLite("hgu95av2.db")

Package installation is required only once per R installation. View a full list of available packages at https://bioconductor.org/packages.
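Because installation is needed only once, a session can first check whether the packages are already present. The following sketch uses base R's requireNamespace() for that check. The commented-out line mentions BiocManager::install(), which replaced the biocLite.R script from Bioconductor 3.8 onwards; it is shown as an assumption about newer setups and is not part of the Bioconductor 3.7 installation described here.

```r
## Packages used in this workflow
pkgs <- c("affy", "limma")

## TRUE for each package that is installed and can be loaded
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

## Report anything that still needs to be installed
missing_pkgs <- pkgs[!is_installed]
if (length(missing_pkgs) > 0) {
    message("Not yet installed: ", paste(missing_pkgs, collapse = ", "))
    ## Assumption: on Bioconductor 3.8 or later one would run
    ## BiocManager::install(missing_pkgs)
} else {
    message("All packages are already installed")
}
```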

To use the affy and limma packages, evaluate the commands

library("affy")
library("limma")

These commands are required once in each R session.

4 Exploring Package Content

Packages have extensive help pages, and include vignettes highlighting common use cases. The help pages and vignettes are available from within R. After loading a package, use syntax like

help(package="limma")
?topTable

to obtain an overview of help on the limma package, and the topTable function, and

browseVignettes(package="limma")

to view vignettes (providing a more comprehensive introduction to package functionality) in the limma package. Use

help.start()

to open a web page containing comprehensive help resources.
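Beyond the help system, a couple of base R calls are handy for seeing what an attached package provides. The sketch below uses the stats package, which ships with R, as a stand-in so that it runs without Bioconductor installed; the intended use is to substitute "package:limma" after calling library(limma).

```r
## Names of all objects exported by an attached package
exported <- ls("package:stats")
head(exported)

## Exported names matching a pattern, here everything mentioning "test"
grep("test", exported, value = TRUE)

## Argument list of one exported function
args(p.adjust)
```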

5 Pre-Processing Resources

The following sections give a brief overview of packages useful for pre-processing. More comprehensive workflows can be found in the package documentation (available from the package descriptions) and in Bioconductor books and monographs.

5.1 Affymetrix 3’-biased Arrays

affy, gcrma, affyPLM

xps

5.2 Affymetrix Exon ST Arrays

oligo

exonmap

xps

5.3 Affymetrix Gene ST Arrays

oligo

xps

5.4 Affymetrix SNP Arrays

oligo

5.5 Affymetrix Tiling Arrays

oligo

5.6 Nimblegen Arrays

oligo

5.7 Illumina Expression Microarrays

lumi

beadarray

sessionInfo()
## R version 3.5.0 (2018-04-23)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.4 LTS
## 
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.7-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.7-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] hgfocuscdf_2.18.0   affy_1.58.0         Biobase_2.40.0     
## [4] BiocGenerics_0.26.0 limma_3.36.1        arrays_1.4.0       
## [7] BiocStyle_2.8.1    
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.16          AnnotationDbi_1.42.1  knitr_1.20           
##  [4] magrittr_1.5          IRanges_2.14.10       zlibbioc_1.26.0      
##  [7] bit_1.1-13            blob_1.1.1            stringr_1.3.1        
## [10] tools_3.5.0           xfun_0.1              DBI_1.0.0            
## [13] htmltools_0.3.6       bit64_0.9-7           yaml_2.1.19          
## [16] rprojroot_1.3-2       digest_0.6.15         preprocessCore_1.42.0
## [19] bookdown_0.7          affyio_1.50.0         S4Vectors_0.18.2     
## [22] memoise_1.1.0         RSQLite_2.1.1         evaluate_0.10.1      
## [25] rmarkdown_1.9         stringi_1.2.2         compiler_3.5.0       
## [28] BiocInstaller_1.30.0  backports_1.1.2       stats4_3.5.0
