How to Download Microarray Data
- The NCBI Gene Expression Omnibus (GEO) serves as a public repository for a wide range of high-throughput experimental data.
- We can download microarray data from GEO with the “getGEO” function in the GEOquery package.
# install the core Bioconductor packages (biocLite is the installer for Bioconductor 3.7 and earlier)
source("http://bioconductor.org/biocLite.R")
## Bioconductor version 3.7 (BiocInstaller 1.30.0), ?biocLite for help
biocLite()
## BioC_mirror: https://bioconductor.org
## Using Bioconductor 3.7 (BiocInstaller 1.30.0), R 3.5.1 (2018-07-02).
## installation path not writeable, unable to update packages: foreign,
## survival
# install additional Bioconductor packages
biocLite("GEOquery")
## BioC_mirror: https://bioconductor.org
## Using Bioconductor 3.7 (BiocInstaller 1.30.0), R 3.5.1 (2018-07-02).
## Installing package(s) 'GEOquery'
## package 'GEOquery' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\sypark\AppData\Local\Temp\RtmpMt8yxK\downloaded_packages
## installation path not writeable, unable to update packages: foreign,
## survival
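The biocLite installer used in this session was retired after Bioconductor 3.7. On newer R and Bioconductor releases, the same installation would typically go through the BiocManager package from CRAN. A minimal sketch, assuming a current R installation with internet access:

```r
# Install BiocManager from CRAN only if it is not already available.
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# Install GEOquery from the Bioconductor release that matches this R version.
BiocManager::install("GEOquery")
```

The rest of the session (library(GEOquery), getGEO(...)) should work the same way regardless of which installer was used.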
library(GEOquery)
## Loading required package: Biobase
## Loading required package: BiocGenerics
## Loading required package: parallel
##
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:parallel':
##
## clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
## clusterExport, clusterMap, parApply, parCapply, parLapply,
## parLapplyLB, parRapply, parSapply, parSapplyLB
## The following objects are masked from 'package:stats':
##
## IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
##
## anyDuplicated, append, as.data.frame, basename, cbind,
## colMeans, colnames, colSums, dirname, do.call, duplicated,
## eval, evalq, Filter, Find, get, grep, grepl, intersect,
## is.unsorted, lapply, lengths, Map, mapply, match, mget, order,
## paste, pmax, pmax.int, pmin, pmin.int, Position, rank, rbind,
## Reduce, rowMeans, rownames, rowSums, sapply, setdiff, sort,
## table, tapply, union, unique, unsplit, which, which.max,
## which.min
## Welcome to Bioconductor
##
## Vignettes contain introductory material; view with
## 'browseVignettes()'. To cite Bioconductor, see
## 'citation("Biobase")', and for packages 'citation("pkgname")'.
## Setting options('download.file.method.GEOquery'='auto')
## Setting options('GEOquery.inmemory.gpl'=FALSE)
# Now we can download any GEO accession
gds <- getGEO("GDS1962")
## File stored at:
## C:\Users\sypark\AppData\Local\Temp\RtmpMt8yxK/GDS1962.soft.gz
## Parsed with column specification:
## cols(
## .default = col_double(),
## ID_REF = col_character(),
## IDENTIFIER = col_character()
## )
## See spec(...) for full column specifications.
# Basic Information of GEO Microarray Data
# You can inspect the basic information of the dataset by typing "gds"
# gds
# (not run here because the output is long)
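When the full printout of gds is too long, the metadata header alone can be inspected with GEOquery's Meta() accessor, which returns a named list. The sketch below reuses the gds object from this session and downloads it only if it is missing; the exact set of fields depends on the record, so the title field is an assumption based on typical GDS records.

```r
library(GEOquery)

# Reuse the object from the session if present; otherwise download it again.
if (!exists("gds")) {
  gds <- getGEO("GDS1962")
}

meta <- Meta(gds)

# List the available metadata fields without printing their contents.
print(names(meta))

# Print one short field (assumed to exist for this dataset).
print(meta$title)
```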
# GEO Data Structure
# Look at the microarray data
head(Table(gds)[1:10, 1:10])
## ID_REF IDENTIFIER GSM97800 GSM97803 GSM97804 GSM97805 GSM97807
## 1 1007_s_at MIR4640 4701.5 4735.0 2863.9 5350.2 4789.4
## 2 1053_at RFC2 282.7 347.9 355.0 319.9 294.2
## 3 117_at HSPA6 769.6 287.9 199.0 182.8 204.3
## 4 121_at PAX8 1616.3 1527.2 1793.8 1880.0 1012.0
## 5 1255_g_at GUCA1A 232.7 204.8 119.3 180.2 156.7
## 6 1294_at MIR5193 357.7 336.5 328.7 304.7 190.1
## GSM97809 GSM97811 GSM97812
## 1 5837.8 4446.7 4264.1
## 2 257.5 321.0 317.9
## 3 184.9 107.5 196.9
## 4 1024.4 1133.8 1295.0
## 5 155.1 236.2 235.9
## 6 253.3 342.5 284.1
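The table above mixes two annotation columns (ID_REF, IDENTIFIER) with one numeric column per sample. For downstream analysis it is often convenient to have a plain numeric matrix with probe IDs as row names. The sketch below uses a hypothetical helper, table_to_matrix (not part of GEOquery), and demonstrates it on a small stand-in data frame built from two rows and two samples of the output above, so it runs without a download.

```r
# Hypothetical helper (not part of GEOquery): turn a data frame laid out like
# Table(gds) into a numeric matrix with probe IDs as row names.
table_to_matrix <- function(tbl) {
  # Every column except the two annotation columns holds one sample (GSM...).
  sample_cols <- setdiff(colnames(tbl), c("ID_REF", "IDENTIFIER"))
  mat <- as.matrix(tbl[, sample_cols, drop = FALSE])
  storage.mode(mat) <- "numeric"
  rownames(mat) <- tbl$ID_REF
  mat
}

# Small stand-in for Table(gds), copied from the output shown above.
toy <- data.frame(
  ID_REF     = c("1007_s_at", "1053_at"),
  IDENTIFIER = c("MIR4640", "RFC2"),
  GSM97800   = c(4701.5, 282.7),
  GSM97803   = c(4735.0, 347.9),
  stringsAsFactors = FALSE
)

expr <- table_to_matrix(toy)
print(expr)
```

With the real dataset, the same helper would be called as table_to_matrix(Table(gds)).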
# Sample Information
# Look at the sample information
head(Columns(gds)[,1:3])
## sample disease.state tissue
## 1 GSM97800 non-tumor not applicable
## 2 GSM97803 non-tumor not applicable
## 3 GSM97804 non-tumor not applicable
## 4 GSM97805 non-tumor not applicable
## 5 GSM97807 non-tumor not applicable
## 6 GSM97809 non-tumor not applicable
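The sample annotation can be used to pick out the samples that belong to one group. The sketch below uses a hypothetical helper, samples_with_state (not part of GEOquery), on a stand-in data frame copied from the first three rows of the output above, so it runs without a download.

```r
# Hypothetical helper (not part of GEOquery): return the sample IDs whose
# disease.state column equals `state`.
samples_with_state <- function(cols, state) {
  as.character(cols$sample[cols$disease.state == state])
}

# Small stand-in for Columns(gds)[, 1:3], copied from the output shown above.
toy_cols <- data.frame(
  sample        = c("GSM97800", "GSM97803", "GSM97804"),
  disease.state = c("non-tumor", "non-tumor", "non-tumor"),
  tissue        = c("not applicable", "not applicable", "not applicable"),
  stringsAsFactors = FALSE
)

non_tumor <- samples_with_state(toy_cols, "non-tumor")
print(non_tumor)

# Count how many samples fall into each disease state.
print(table(toy_cols$disease.state))
```

With the real dataset, samples_with_state(Columns(gds), "non-tumor") would return the matching GSM IDs, which can then be used to select columns of Table(gds).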
More detailed information is available on the GEO website: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE4290