Contents

Objectives

  • Introduce the R programming language and the basic R operations used for statistical analysis in the rest of the course
  • Introduce Bioconductor, an R-based project, and show how to handle high-throughput biomedical data using its basic packages and operations

First part: R Basics

  • R introduction and installation
  • Data types and basic operations

Second part: Bioconductor & High Throughput Biomedical Data

  • Bioconductor introduction and installation
  • Microarray data as an example of High throughput biomedical data
  • Short introduction of R functions and data sets we will study in this course

Tutorial

Setup working environment

R

  • Official website: http://www.r-project.org/
    • Statistical programming language
    • Open source, development-flexible, extensible
    • Large number of statistical and numerical packages
    • High quality visualization and graphical tools

Select mirror site



Download R for your OS



Select base for beginner

Start downloading (Windows user case)

RStudio

  • Official website: https://www.rstudio.com/
    • integrated development environment (IDE) for R
    • console, syntax-highlighting editor
    • tools for plotting, history, debugging and workspace management

RStudio open source edition



Download RStudio Desktop



Select installer or Zip/Tarballs for your OS



R Programming Language Resources

Help

An example of help functions

# help.start()          # Show general help
# help.search("print")  # List all help pages with topics or title matching "print"
# ??utils::help         # List all the topics matching "help" in the utils package
# find("sin")             # Find the packages that contain an object named "sin"
apropos("sin")          # List all objects containing string "sin"
##  [1] "as.single"           "as.single.default"   "asin"               
##  [4] "asinh"               "deviceIsInteractive" "is.single"          
##  [7] "isIncomplete"        "missing"             "missingArg"         
## [10] "sin"                 "single"              "sinh"               
## [13] "sink"                "sink.number"         "sinpi"
example("sqrt")       # Run all code from the Examples part of R's online help
## 
## sqrt> require(stats) # for spline
## 
## sqrt> require(graphics)
## 
## sqrt> xx <- -9:9
## 
## sqrt> plot(xx, sqrt(abs(xx)),  col = "red")

## 
## sqrt> lines(spline(xx, sqrt(abs(xx)), n=101), col = "pink")
# RSiteSearch("regression") # Search for regression at http://search.r-project.org

Data Types

Modes
  • logical (Boolean TRUE/FALSE)
  • numeric (integers and reals)
  • complex (real + imaginary numbers)
  • character (strings)

Data structures

         Homogeneous   Heterogeneous
1 dim    vector        list
2 dim    matrix        data frame
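
As a small illustration of the modes and structures listed above, the sketch below checks the mode of a few made-up values and shows that a vector coerces mixed inputs to one common mode while a list keeps each component's own mode. All object names are hypothetical and chosen only for this example.

```r
# Hypothetical example values, one per mode
modes <- c(mode(TRUE), mode(3.14), mode(2 + 1i), mode("text"))
modes

# A vector is homogeneous: mixed inputs are coerced to a common mode
mixed_vec <- c(1, "a", TRUE)
mode(mixed_vec)

# A list is heterogeneous: each component keeps its own mode
mixed_list <- list(1, "a", TRUE)
list_modes <- sapply(mixed_list, mode)
list_modes
```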

Assignment & Read Scalar Variables

var1 <- TRUE    # Assign a Boolean using "<-" assignment
var1            # Read variable "var1" (by just typing the variable name)
## [1] TRUE
var2 = "string!"    # Assign a string using "=" assignment
var2
## [1] "string!"
var3 <- var4 <- (3 + 1i)    # Assign a complex number to two variables using chained assignment
var3
## [1] 3+1i
var4
## [1] 3+1i
# Difference between "=" and "<-" inside a function call
abs(x = -1)   # "=" passes -1 as the argument named x (abs() returns the absolute value)
## [1] 1
x             # no variable x was created in the workspace
## Error in eval(expr, envir, enclos): object 'x' not found
abs(x <- -1)    # "<-" first assigns -1 to x in the workspace, then passes it to abs()
## [1] 1
x             # x now exists in the workspace
## [1] -1

Vector

  • Define vectors
v1 <- c("one", "two", "three")  # Generate a character vector
v1
## [1] "one"   "two"   "three"
v2 <- seq(1, 5, 0.5)              # Generate a numeric sequence 
v2                              # from "1" to "5", spaced by "0.5"
## [1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0
v3 <- rep(TRUE, 5)              # Generate a logical vector
v3                              # consisting of "TRUE" for "5" times
## [1] TRUE TRUE TRUE TRUE TRUE
  • Read elements from a vector
v4 <- c(1, 2, 5.3, 6, -2, 4)    # define a vector
v4[3]                           # read 3rd value of v4
## [1] 5.3
v4[c(2,4)]                    # read 2nd and 4th value of v4
## [1] 2 6
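
Beyond positive positions, vectors can also be indexed with negative positions and logical conditions. The following is a minimal sketch using the same values as v4 above; the variable names are made up for this example.

```r
v <- c(1, 2, 5.3, 6, -2, 4)   # same values as v4 above

without_first <- v[-1]        # a negative index drops that position
above_two     <- v[v > 2]     # a logical index keeps elements where the test is TRUE
first_three   <- v[1:3]       # a range of positions

without_first
above_two
first_three
```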

List

  • List is an ordered collection of objects (components). A list allows you to gather a variety of (possibly unrelated) objects under one name.
  • Example of a list with 3 components: a string, a numeric vector, and a scalar.
w <- list(name = "Fred", mynum = c(1, 2), age = 5.3)
w
## $name
## [1] "Fred"
## 
## $mynum
## [1] 1 2
## 
## $age
## [1] 5.3
# Read elements of a list using the [[]] or $ convention.
w[[2]]  # 2nd component of the list
## [1] 1 2
w$mynum # mynum in the list
## [1] 1 2
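
It may help to see how single brackets differ from double brackets on a list. This sketch rebuilds the list w from above; the added component name passed is hypothetical.

```r
w <- list(name = "Fred", mynum = c(1, 2), age = 5.3)

sub_list  <- w["mynum"]     # single brackets return a list of length 1
component <- w[["mynum"]]   # double brackets return the component itself

class(sub_list)
class(component)

w$passed <- TRUE            # assigning to a new name adds a component
length(w)
```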

Matrix

All columns in a matrix must have the same mode (numeric, character, etc.) and the same length. The general format is shown below.

  • pseudocode
    • mymatrix <- matrix(vector, nrow = r, ncol = c, byrow = FALSE, dimnames = list(char_vector_rownames, char_vector_colnames))
  • byrow=TRUE indicates that the matrix should be filled by rows.
  • byrow=FALSE indicates that the matrix should be filled by columns. (default)
  • dimnames provides optional labels for the columns and rows.
# Generate 5 x 4 numeric matrix
x <- matrix(1:20, nrow = 5, ncol = 4)
x
##      [,1] [,2] [,3] [,4]
## [1,]    1    6   11   16
## [2,]    2    7   12   17
## [3,]    3    8   13   18
## [4,]    4    9   14   19
## [5,]    5   10   15   20
# Generate 2 x 2 numeric matrix with row/column names
cells <- c(1,26,24,68)
rnames <- c("R1", "R2")
cnames <- c("C1", "C2") 
mymatrix <- matrix(cells, nrow = 2, ncol = 2, byrow = TRUE, dimnames = list(rnames, cnames))
mymatrix
##    C1 C2
## R1  1 26
## R2 24 68
# Read rows, columns, or elements using subscripts. 
x[,4]   # 4th column of matrix
## [1] 16 17 18 19 20
x[3,]   # 3rd row of matrix 
## [1]  3  8 13 18
x[2:4,1:3]  # rows 2,3,4 of columns 1,2,3 
##      [,1] [,2] [,3]
## [1,]    2    7   12
## [2,]    3    8   13
## [3,]    4    9   14
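
To make the effect of byrow concrete, the sketch below fills the same six values both ways. The object names are made up for this example.

```r
by_col <- matrix(1:6, nrow = 2, ncol = 3)                 # default: fill column by column
by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)   # fill row by row

by_col
by_row

by_col[1, ]   # first row when filled by columns
by_row[1, ]   # first row when filled by rows
```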

Data Frame

A data frame is more general than a matrix, in that different columns can have different types (numeric, character, factor, etc.).

d <- c(1, 2, 3, 4)
e <- c("red", "white", "red", NA)
f <- c(TRUE, TRUE, TRUE, FALSE)
mydata <- data.frame(d, e, f)
mydata
##   d     e     f
## 1 1   red  TRUE
## 2 2 white  TRUE
## 3 3   red  TRUE
## 4 4  <NA> FALSE
names(mydata) <- c("ID","Color","Passed")   #variable names
mydata
##   ID Color Passed
## 1  1   red   TRUE
## 2  2 white   TRUE
## 3  3   red   TRUE
## 4  4  <NA>  FALSE
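
Columns and rows of a data frame can be selected much like list components and matrix elements. The following sketch rebuilds mydata from above; stringsAsFactors = FALSE is set explicitly here as an assumption so that Color stays a character column in every R version.

```r
mydata <- data.frame(ID = c(1, 2, 3, 4),
                     Color = c("red", "white", "red", NA),
                     Passed = c(TRUE, TRUE, TRUE, FALSE),
                     stringsAsFactors = FALSE)

mydata$Color                                  # one column as a vector
passed_rows <- mydata[mydata$Passed, ]        # rows where Passed is TRUE
red_ids <- mydata$ID[which(mydata$Color == "red")]  # which() ignores the NA comparison
col_classes <- sapply(mydata, class)          # class of each column

passed_rows
red_ids
col_classes
```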

Variable Information

library(knitr)
vec <- c(1, 26, 24, 68)
rname <- c("R1", "R2")
cname <- c("C1", "C2") 
matrix <- matrix(vec, nrow = 2, ncol = 2, byrow = TRUE, dimnames = list(rname, cname))
kable(matrix)
     C1   C2
R1    1   26
R2   24   68
length(matrix)  # get the number of elements
## [1] 4
dim(matrix) # retrieve the dimension
## [1] 2 2
str(matrix) # list the structure of an object
##  num [1:2, 1:2] 1 24 26 68
##  - attr(*, "dimnames")=List of 2
##   ..$ : chr [1:2] "R1" "R2"
##   ..$ : chr [1:2] "C1" "C2"
class(matrix)   # get the class (type) of an object
## [1] "matrix"

Data Manipulation

  • Rearranging & converting data
    • Sorting
    • Merging
    • Data conversion

Sorting

C1 <- c(15, 11, 18)
C2 <- c(20, 24, 68)
C3 <- c(23, 73, 23)
data <- data.frame(C1, C2, C3)
data
##   C1 C2 C3
## 1 15 20 23
## 2 11 24 73
## 3 18 68 23
# Sort by C1 (ascending)
newdata1 = data[order(data$C1),]
newdata1
##   C1 C2 C3
## 2 11 24 73
## 1 15 20 23
## 3 18 68 23
# Sort by C3 (ascending) and C2 (descending)
newdata2 = data[order(data$C3, -data$C2),]
newdata2
##   C1 C2 C3
## 3 18 68 23
## 1 15 20 23
## 2 11 24 73
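
The sorting above works because order() returns row positions rather than sorted values; indexing with those positions rearranges the rows. A minimal sketch with the C1 column (idx and idx_desc are names made up for this example):

```r
C1 <- c(15, 11, 18)

idx <- order(C1)                           # positions of the smallest, 2nd smallest, ... value
idx
C1[idx]                                    # indexing by idx gives the sorted values

idx_desc <- order(C1, decreasing = TRUE)   # positions for descending order
C1[idx_desc]
```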

Merging

To merge two matrices horizontally or vertically, use cbind() (combine by columns) or rbind() (combine by rows).

vec <- c(1, 26, 24, 68)
rname <- c("R1", "R2")
cname <- c("C1", "C2") 
matrix <- matrix(vec, nrow = 2, ncol = 2, byrow = TRUE, dimnames = list(rname, cname))

matrix1 <- cbind(matrix, matrix)    # combine objects as columns
matrix1
##    C1 C2 C1 C2
## R1  1 26  1 26
## R2 24 68 24 68
matrix2 <- rbind(matrix, matrix)    # combine objects as rows
matrix2
##    C1 C2
## R1  1 26
## R2 24 68
## R1  1 26
## R2 24 68

To merge two data frames (datasets) horizontally, use the merge function. In most cases, you join two data frames by one or more common key variables; by default, merge() keeps only the rows whose key values occur in both data frames (i.e., an inner join).

d1 <- c(1, 2, 3, 4)
d2 <- c("red", "white", "red", NA)
d3 <- c(3, 4, 5)
d4 <- c("red", "red", "white")

df1 <- data.frame(d1, d2)
names(df1) <- c("ID","Color")

df2 <- data.frame(d3, d4)
names(df2) <- c("ID","Color")

m1 <- merge(df1, df2, by = "ID")
m2 <- merge(df1, df2, by = c("ID", "Color"))

df1
##   ID Color
## 1  1   red
## 2  2 white
## 3  3   red
## 4  4  <NA>
df2
##   ID Color
## 1  3   red
## 2  4   red
## 3  5 white
m1
##   ID Color.x Color.y
## 1  3     red     red
## 2  4    <NA>     red
m2
##   ID Color
## 1  3   red
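
merge() can also keep unmatched rows through its all, all.x and all.y arguments. The sketch below rebuilds df1 and df2 from above and compares an inner, a left, and a full outer join; the result names are made up for this example.

```r
df1 <- data.frame(ID = c(1, 2, 3, 4), Color = c("red", "white", "red", NA))
df2 <- data.frame(ID = c(3, 4, 5), Color = c("red", "red", "white"))

inner <- merge(df1, df2, by = "ID")                 # only IDs present in both
left  <- merge(df1, df2, by = "ID", all.x = TRUE)   # every row of df1
full  <- merge(df1, df2, by = "ID", all = TRUE)     # every row of both

inner
left
full   # unmatched cells are filled with NA
```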

Other Set Algebra

vec1 <- seq(1, 10, 2)
vec2 <- seq(1, 5, 1)
vec1
## [1] 1 3 5 7 9
vec2
## [1] 1 2 3 4 5
union(vec1, vec2)   # union of vec1 and vec2
## [1] 1 3 5 7 9 2 4
intersect(vec1, vec2)   # intersection of vec1 and vec2
## [1] 1 3 5
setdiff(vec1, vec2) # vec1 - vec2
## [1] 7 9
setdiff(vec2, vec1) # vec2 - vec1
## [1] 2 4

Data Type Check & Conversion

Use is.foo() to test for data type foo. It returns TRUE or FALSE: is.numeric(), is.character(), is.vector(), is.matrix(), is.data.frame().

Use as.foo() to explicitly convert the data type: as.numeric(), as.character(), as.vector(), as.matrix(), as.data.frame().

# Convert date info in format 'mm/dd/yyyy'
strDates <- c("01/05/1965", "08/16/1975")
strDates
## [1] "01/05/1965" "08/16/1975"
dates <- as.Date(strDates, "%m/%d/%Y")
dates
## [1] "1965-01-05" "1975-08-16"
# Convert dates to character data
strDates <- as.character(dates)
strDates
## [1] "1965-01-05" "1975-08-16"
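
The same is.foo() / as.foo() pattern applies to numbers stored as text. A short sketch with made-up values; note that a failed conversion yields NA with a warning, which is suppressed here only to keep the output clean.

```r
x <- "3.14"

is.numeric(x)     # FALSE: x is a character string
is.character(x)   # TRUE

y <- as.numeric(x)                           # explicit conversion to numeric
bad <- suppressWarnings(as.numeric("abc"))   # cannot be converted: result is NA
flag <- as.integer(TRUE)                     # logical to integer

y
bad
flag
```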

Workspace Management

The workspace is your current R working environment and includes any user-defined objects (vectors, matrices, data frames, lists, functions).

a <- 1
b <- 2

ls()    # Return the names of R objects
##  [1] "a"        "b"        "C1"       "C2"       "C3"       "cells"   
##  [7] "cname"    "cnames"   "d"        "d1"       "d2"       "d3"      
## [13] "d4"       "data"     "dates"    "df1"      "df2"      "dt"      
## [19] "e"        "f"        "m1"       "m2"       "matrix"   "matrix1" 
## [25] "matrix2"  "mydata"   "mymatrix" "newdata1" "newdata2" "rname"   
## [31] "rnames"   "strDates" "v1"       "v2"       "v3"       "v4"      
## [37] "var1"     "var2"     "var3"     "var4"     "vec"      "vec1"    
## [43] "vec2"     "w"        "x"        "xx"
exists("a") # Look for an R object named "a"
## [1] TRUE
rm(list = ls()) # Remove all objects from the workspace


getwd() # Return the path of current working directory
## [1] "Y:/Lecture/1-BIS335BioStat/2-Rtintro/Tutorial"
setwd("Y:/Lecture/1-BIS335BioStat/2-Rtintro/Tutorial")  # Set the working directory
source("http://bioconductor.org/biocLite.R")    # Read and run an R script from a file or URL
install.packages("e1071", repos = "http://cran.us.r-project.org")   # Download and install packages
library("e1071")    # Load add-on packages

Basic Operations

Arithmetic Operators

1 + 2   # addition
## [1] 3
4 - 3   # subtraction
## [1] 1
2 * 5   # multiplication
## [1] 10
7 / 3   # division
## [1] 2.333333
2 ^ 3   # exponentiation
## [1] 8
3 ** 2  # exponentiation
## [1] 9
4 %% 3  # modulus (x mod y)
## [1] 1

Logical Operators

n1 <- 1
n2 <- 2

n1 < n2   # less than 
## [1] TRUE
n1 <= n2    # less than or equal to
## [1] TRUE
n1 > n2   # greater than 
## [1] FALSE
n1 >= n2    # greater than or equal to
## [1] FALSE
n1 == n2    # exactly equal to
## [1] FALSE
n1 != n2    # not equal to
## [1] TRUE
b1 <- TRUE
b2 <- FALSE

!b1        # NOT b1 
## [1] FALSE
b1 | b2    # b1 OR b2
## [1] TRUE
b1 & b2    # b1 AND b2 
## [1] FALSE
isTRUE(b2) # test if b2 is TRUE
## [1] FALSE
is.na(b2)    # test if b2 is NA (missing)
## [1] FALSE
setequal(c(2, 4, 6), seq(2, 7, 2)) # Test if two sets are equal
## [1] TRUE
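
The operators & and | are vectorized, while && and || are meant for single values such as the condition of an if statement. The sketch below also shows why == can be misleading for floating-point numbers; all names are made up for this example.

```r
a <- c(TRUE, FALSE, TRUE)
b <- c(TRUE, TRUE, FALSE)

elementwise <- a & b     # compares element by element
elementwise
any(a)                   # is at least one element TRUE?
all(a)                   # are all elements TRUE?

scalar <- a[1] && b[1]   # && works on single logical values
scalar

naive    <- (0.1 + 0.2 == 0.3)                    # FALSE because of rounding error
tolerant <- isTRUE(all.equal(0.1 + 0.2, 0.3))     # TRUE: comparison with a tolerance
naive
tolerant
```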

Numeric Functions

# round-off
ceiling(3.475)            # smallest integer not less than the input
## [1] 4
floor(3.475)                # largest integer not greater than the input
## [1] 3
trunc(-3.475)             # integer formed by truncating the input toward 0
## [1] -3
round(3.475, digits = 1)    # round to the specified number of decimal places
## [1] 3.5
# functions
abs(-5)             # absolute value
## [1] 5
sqrt(4)             # square root
## [1] 2
log(3)              # natural logarithm
## [1] 1.098612
log(1024, base = 2) # logarithm with specified base
## [1] 10
log10(100)          # common logarithm
## [1] 2
exp(1)              # exponential
## [1] 2.718282
# Trigonometric functions 
sin(0)  # sine
## [1] 0
cos(0)  # cosine
## [1] 1
tan(0)  # tangent
## [1] 0
asin(0) # arc-sine
## [1] 0
acos(1) # arc-cosine
## [1] 0
atan(0) # arc-tangent
## [1] 0
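
The rounding functions differ mainly for negative inputs and for values exactly halfway between two integers. A short sketch with made-up variable names:

```r
f <- floor(-3.475)     # rounds toward minus infinity
t <- trunc(-3.475)     # rounds toward zero
c_ <- ceiling(-3.475)  # rounds toward plus infinity
c(f, t, c_)

half_down <- round(2.5)   # exact halves go to the even neighbour
half_up   <- round(3.5)
c(half_down, half_up)

sig <- signif(123456, digits = 2)   # keep 2 significant digits
sig
```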

Basic Statistic Functions

vec = c(1, 3, 4, 6, 8, 12)

max(vec)    # maximum value of vec
## [1] 12
min(vec)    # minimum value of vec
## [1] 1
mean(vec)   # arithmetic mean of vec
## [1] 5.666667
sd(vec)   # sample standard deviation of vec
## [1] 3.932768
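
A few more summary functions follow the same pattern, and most of them return NA when the input contains missing values unless na.rm = TRUE is set. A minimal sketch using the same vec; with_na is a name made up for this example.

```r
vec <- c(1, 3, 4, 6, 8, 12)

median(vec)    # middle value
var(vec)       # sample variance (the square of sd)
range(vec)     # minimum and maximum

with_na <- c(vec, NA)
mean(with_na)                  # NA: missing values propagate
mean(with_na, na.rm = TRUE)    # missing values are dropped first
```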

Character Functions (String Manipulation)

str = "hello, world"
dictionary = c("hi", "hello", "hey")

substr(str, start = 1, stop = 5)    # Extract substring
## [1] "hello"
grep("HELLO", dictionary, ignore.case = TRUE)   # Search for pattern.
## [1] 2
sub("hello", "hi", str) # Replace with replacement text
## [1] "hi, world"
strsplit(str, ", ") # Split elements as character vector
## [[1]]
## [1] "hello" "world"
paste(str, "!", sep = "")   # Concatenate strings using a separator
## [1] "hello, world!"
str <- toupper(str) # Change to uppercase
str
## [1] "HELLO, WORLD"
tolower(str)    # Change to lowercase
## [1] "hello, world"
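
Note that sub() replaces only the first match, whereas gsub() replaces every match. The sketch below shows the difference along with a few other common string helpers; the variable names are made up for this example.

```r
s <- "hello, world"

first_only <- sub("l", "L", s)     # replaces the first "l" only
every      <- gsub("l", "L", s)    # replaces every "l"
first_only
every

n <- nchar(s)                                     # number of characters
hits <- grepl("^h", c("hi", "hello", "bye"))      # TRUE where the pattern matches
msg <- sprintf("%s has %d characters", s, n)      # formatted string
n
hits
msg
```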

Control Structures

  • if & else
    • if (cond) expr
    • if (cond) cons.expr else alt.expr
  • ifelse
    • ifelse(test, yes, no)
  • for
    • for (var in seq) expr
  • while
    • while (cond) expr
x <- -1
if (x < 0) {    # if & else
    print(-x)
} else {
    print(x)
}
## [1] 1
x <- c(6:-4)
ifelse(x >= 0, x, NA)   # ifelse
##  [1]  6  5  4  3  2  1  0 NA NA NA NA
for (i in 1:3) print(1:i)   # for
## [1] 1
## [1] 1 2
## [1] 1 2 3
x <- 1
while (x < 3) { # while
    x <- x + 1
    print(x)
}
## [1] 2
## [1] 3
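
Loops can also skip an iteration with next or stop early with break. The following sketch sums the even numbers up to 8, and then shows why seq_along() is a safer loop range than 1:length(x) when a vector may be empty; all names are made up for this example.

```r
total <- 0
for (i in 1:10) {
    if (i %% 2 == 1) next    # skip odd numbers
    if (i > 8) break         # leave the loop once i exceeds 8
    total <- total + i
}
total

empty <- numeric(0)
n_iter <- 0
for (i in seq_along(empty)) {   # seq_along() of an empty vector is empty
    n_iter <- n_iter + 1        # so this body never runs
}
n_iter
```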

Functions

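# Transpose a matrix element by element (for illustration; base R provides t())
mytrans <- function(x) {
    if (!is.matrix(x)) {
        warning("not a matrix: returning NA")
        return(NA_real_)
    }
    y <- matrix(1, nrow = ncol(x), ncol = nrow(x))  # result has swapped dimensions
    for (i in seq_len(nrow(x))) {
        for (j in seq_len(ncol(x))) {
            y[j, i] <- x[i, j]  # element (i, j) moves to (j, i)
        }
    }
    return(y)
}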

mat <- matrix(1:20, nrow = 5, ncol = 4)
matr <- mytrans(mat)
mat
##      [,1] [,2] [,3] [,4]
## [1,]    1    6   11   16
## [2,]    2    7   12   17
## [3,]    3    8   13   18
## [4,]    4    9   14   19
## [5,]    5   10   15   20
matr
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    2    3    4    5
## [2,]    6    7    8    9   10
## [3,]   11   12   13   14   15
## [4,]   16   17   18   19   20
# Apply Function over a list or vector
sapply(seq(1, 5), sqrt) # Return a result of applying function to elements of vector
## [1] 1.000000 1.414214 1.732051 2.000000 2.236068
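
Functions can declare default argument values, and the apply family can pass extra arguments through to the function. A minimal sketch; scale_by is a hypothetical helper written for this example.

```r
# Multiply x by a factor; the factor defaults to 2
scale_by <- function(x, factor = 2) {
    x * factor
}

doubled <- sapply(1:3, scale_by)                 # uses the default factor
tenfold <- sapply(1:3, scale_by, factor = 10)    # extra arguments are passed on
as_list <- lapply(1:3, scale_by)                 # lapply always returns a list
checked <- vapply(1:3, scale_by, FUN.VALUE = numeric(1))  # result type is checked

doubled
tenfold
```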

Input & Output

d <- c(1.55, 2.28, 3.54, 4.79)
e <- c("red", "white", "red", NA)
f <- c(TRUE, TRUE, TRUE, FALSE)
data <- data.frame(d, e, f)
names(data) <- c("ID","Color","Passed")
data
##     ID Color Passed
## 1 1.55   red   TRUE
## 2 2.28 white   TRUE
## 3 3.54   red   TRUE
## 4 4.79  <NA>  FALSE
format(data, justify = "left", digits = 1)  # Format an R object for pretty printing
##   ID Color Passed
## 1  2 red     TRUE
## 2  2 white   TRUE
## 3  4 red     TRUE
## 4  5 NA     FALSE
write.table(data, file = "Y:/Lecture/1-BIS335BioStat/2-Rtintro/Tutorial/test.txt")  # Write a data frame to a text file
data.read = read.table(file = "Y:/Lecture/1-BIS335BioStat/2-Rtintro/Tutorial/test.txt", header = TRUE)    # Read a file
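
The write and read steps can be tried without depending on a particular drive or folder. The sketch below is one possible round trip through a temporary file; the tab separator and the other options are choices made for this example, not requirements.

```r
df <- data.frame(ID = c(1.55, 2.28), Color = c("red", "white"),
                 stringsAsFactors = FALSE)

tmp <- tempfile(fileext = ".txt")   # temporary file; any writable path works
write.table(df, file = tmp, sep = "\t", row.names = FALSE, quote = FALSE)

back <- read.table(tmp, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
same <- isTRUE(all.equal(df, back))   # did the data survive the round trip?
same

unlink(tmp)   # remove the temporary file
```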

Bioconductor

  • Official website: http://www.bioconductor.org/

  • R programming language-based project
  • Open source, open development software project
  • An object-oriented framework for addressing the diversity and complexity of computational biology and bioinformatics problems.
  • Support for
    • Rich statistical simulation and modeling activities.
    • Cutting edge data and model visualization capabilities.
    • Powerful statistical and graphical methods for the analysis of genomic data
    • Associating microarray and other genomic data with biological metadata from public web databases
    • More than 1,000 packages and an active user community

Main Features of Bioconductor

  • Statistical and graphical methods
    • for oligonucleotide arrays, sequence analysis, flow cytometry and other high-throughput genomic data.
    • linear and non-linear modeling, cluster analysis, prediction, resampling, survival analysis, and time-series analysis.
  • Annotation
    • associating microarray and other genomic data in real time with biological metadata from web databases such as GenBank, Entrez genes and PubMed
    • assembling and processing genomic annotation data, from databases such as GenBank, the Gene Ontology Consortium, Entrez genes, UniGene, the UCSC Human Genome Project
    • mapping between different probe identifiers (e.g. Affy IDs, Entrez genes)

Installation of Bioconductor Packages

  • The current release is version 3.7 (works with R version ≥ 3.5.0).
  • You can use the biocLite.R script to install Bioconductor packages. To install core packages, type the following in an R command window:
source("https://bioconductor.org/biocLite.R")
biocLite(c("GenomicFeatures", "AnnotationDbi"))
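
The biocLite.R script belongs to the Bioconductor release described here (3.7). Later Bioconductor releases replaced it with the BiocManager package from CRAN, so on a recent R installation the equivalent step would look roughly like the sketch below. The wrapper name install_bioc is made up for this example, and the call itself needs internet access.

```r
# Sketch: install Bioconductor packages through BiocManager (recent releases)
install_bioc <- function(pkgs) {
    # Install BiocManager from CRAN first if it is not available yet
    if (!requireNamespace("BiocManager", quietly = TRUE)) {
        install.packages("BiocManager")
    }
    BiocManager::install(pkgs)
}

# Not run here because it downloads packages:
# install_bioc(c("GenomicFeatures", "AnnotationDbi"))
```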

Bioconductor Packages

High Throughput Biomedical Data

  • A microarray is a hybridization test of the total mRNA in a cell under a specified biological condition against a pre-deposited array of cDNA (100 bp to 2,000 bp) or oligonucleotides (25 bp to 80 bp).

  • This allows the simultaneous measurement of the expression levels of thousands of genes under different experimental conditions.

  • Microarray-based gene expression experiments are at the core of functional genomics, the large-scale analysis of the genome-wide function of genes.

Representative Packages for Microarray Analysis

  • Pre-processing
    • affy, oligo, lumi, beadarray, limma, genefilter, etc.
  • Differential expression
    • limma, etc.
  • Gene set enrichment
    • topGO, GOstats, GSEABase, etc.
  • Annotation
    • AnnotationDbi, chip, org, BSgenome, etc.

Microarray Analysis Workflow with Bioconductor

  • Prior to analysis
    • Biological experimental design (treatments, replication,…)
    • Microarray preparation (especially two-channel)
  • Analysis
    • Pre-processing
    • Quality assessment
    • Normalization
    • Differential expression
    • Clustering
    • Classification
    • Annotation
    • Gene set enrichment
## Load packages
# source("https://bioconductor.org/biocLite.R")
# biocLite(c("affy", "limma"))
library(affy)  # Affymetrix pre-processing
library(limma) # two-color pre-processing; differential expression

## import "phenotype" data, describing the experimental design
phenoData <- read.AnnotatedDataFrame(system.file("extdata", "pdta.txt", package = "arrays"))

## RMA normalization
celfiles <- system.file("extdata", package = "arrays")
eset <- justRMA(phenoData = phenoData, celfile.path = celfiles)

## differential expression
combn <- factor(paste(pData(phenoData)[,1], pData(phenoData)[,2], sep = "_"))
design <- model.matrix(~combn) # describe model to be fit

fit <- lmFit(eset, design) # fit each probeset to model
efit <- eBayes(fit)        # empirical Bayes adjustment
topTable(efit, coef = 2)   # table of differentially expressed probesets
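
To see what the design matrix in the workflow above contains, the following base-R sketch builds one from a made-up phenotype table (two hypothetical factors, genotype and treatment, with two replicates per combination). It needs neither affy nor limma. The first factor level becomes the reference group, so each remaining column, such as the one selected by coef = 2, compares one group against that reference.

```r
# Hypothetical phenotype table: 2 genotypes x 2 treatments x 2 replicates
pheno <- data.frame(
    genotype  = rep(c("wt", "ko"), each = 4),
    treatment = rep(c("ctl", "trt"), times = 4)
)

# Combine the two factors into one grouping factor, as in the workflow above
group <- factor(paste(pheno[, 1], pheno[, 2], sep = "_"))
levels(group)                    # sorted alphabetically; the first level is the reference

design <- model.matrix(~group)   # intercept plus one indicator column per other level
colnames(design)
dim(design)                      # one row per sample, one column per coefficient
```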

How to Download a Microarray Data

  • The NCBI Gene Expression Omnibus (GEO) serves as a public repository for a wide range of high-throughput experimental data.
  • We can download microarray data from GEO with the getGEO function in the GEOquery package.
# install the core bioconductor packages
source("http://bioconductor.org/biocLite.R")
## Bioconductor version 3.7 (BiocInstaller 1.30.0), ?biocLite for help
biocLite()
## BioC_mirror: https://bioconductor.org
## Using Bioconductor 3.7 (BiocInstaller 1.30.0), R 3.5.1 (2018-07-02).
## installation path not writeable, unable to update packages: foreign,
##   survival
# install additional bioconductor libraries
biocLite("GEOquery")
## BioC_mirror: https://bioconductor.org
## Using Bioconductor 3.7 (BiocInstaller 1.30.0), R 3.5.1 (2018-07-02).
## Installing package(s) 'GEOquery'
## package 'GEOquery' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\sypark\AppData\Local\Temp\RtmpMt8yxK\downloaded_packages
## installation path not writeable, unable to update packages: foreign,
##   survival
library(GEOquery)
## Loading required package: Biobase
## Loading required package: BiocGenerics
## Loading required package: parallel
## 
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:parallel':
## 
##     clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
##     clusterExport, clusterMap, parApply, parCapply, parLapply,
##     parLapplyLB, parRapply, parSapply, parSapplyLB
## The following objects are masked from 'package:stats':
## 
##     IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
## 
##     anyDuplicated, append, as.data.frame, basename, cbind,
##     colMeans, colnames, colSums, dirname, do.call, duplicated,
##     eval, evalq, Filter, Find, get, grep, grepl, intersect,
##     is.unsorted, lapply, lengths, Map, mapply, match, mget, order,
##     paste, pmax, pmax.int, pmin, pmin.int, Position, rank, rbind,
##     Reduce, rowMeans, rownames, rowSums, sapply, setdiff, sort,
##     table, tapply, union, unique, unsplit, which, which.max,
##     which.min
## Welcome to Bioconductor
## 
##     Vignettes contain introductory material; view with
##     'browseVignettes()'. To cite Bioconductor, see
##     'citation("Biobase")', and for packages 'citation("pkgname")'.
## Setting options('download.file.method.GEOquery'='auto')
## Setting options('GEOquery.inmemory.gpl'=FALSE)
# Now, we are free to access any GEO accession
gds <- getGEO("GDS1962")
## File stored at:
## C:\Users\sypark\AppData\Local\Temp\RtmpMt8yxK/GDS1962.soft.gz
## Parsed with column specification:
## cols(
##   .default = col_double(),
##   ID_REF = col_character(),
##   IDENTIFIER = col_character()
## )
## See spec(...) for full column specifications.
# Basic Information of GEO Microarray Data
# You can check basic information of the microarray data by typing "gds"
# gds
# (not run here because the output is long)

# GEO Data Structure
# Look at the microarray data
head(Table(gds)[1:10, 1:10])
##      ID_REF IDENTIFIER GSM97800 GSM97803 GSM97804 GSM97805 GSM97807
## 1 1007_s_at    MIR4640   4701.5   4735.0   2863.9   5350.2   4789.4
## 2   1053_at       RFC2    282.7    347.9    355.0    319.9    294.2
## 3    117_at      HSPA6    769.6    287.9    199.0    182.8    204.3
## 4    121_at       PAX8   1616.3   1527.2   1793.8   1880.0   1012.0
## 5 1255_g_at     GUCA1A    232.7    204.8    119.3    180.2    156.7
## 6   1294_at    MIR5193    357.7    336.5    328.7    304.7    190.1
##   GSM97809 GSM97811 GSM97812
## 1   5837.8   4446.7   4264.1
## 2    257.5    321.0    317.9
## 3    184.9    107.5    196.9
## 4   1024.4   1133.8   1295.0
## 5    155.1    236.2    235.9
## 6    253.3    342.5    284.1
# Sample Information
# Look at the sample information
head(Columns(gds)[,1:3])
##     sample disease.state         tissue
## 1 GSM97800     non-tumor not applicable
## 2 GSM97803     non-tumor not applicable
## 3 GSM97804     non-tumor not applicable
## 4 GSM97805     non-tumor not applicable
## 5 GSM97807     non-tumor not applicable
## 6 GSM97809     non-tumor not applicable
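
For downstream analysis, the expression columns of such a table are often turned into a numeric matrix with probe IDs as row names. The sketch below rebuilds by hand a three-probe, three-sample excerpt of the table shown above, so it runs without GEOquery or a network connection. The log2 step assumes the values are on a linear scale, which should be checked for each dataset.

```r
# Hand-copied excerpt of the table printed above (3 probes, 3 samples)
tab <- data.frame(
    ID_REF     = c("1007_s_at", "1053_at", "117_at"),
    IDENTIFIER = c("MIR4640", "RFC2", "HSPA6"),
    GSM97800   = c(4701.5, 282.7, 769.6),
    GSM97803   = c(4735.0, 347.9, 287.9),
    GSM97804   = c(2863.9, 355.0, 199.0),
    stringsAsFactors = FALSE
)

expr <- as.matrix(tab[, -(1:2)])   # drop the two annotation columns
rownames(expr) <- tab$ID_REF       # label rows with the probe IDs

log_expr <- log2(expr)             # assumption: values are on a linear scale
probe_means <- rowMeans(log_expr)  # mean log2 expression per probe
probe_means
```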

More detailed information is available on the GEO website: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE4290

R Function List & Data Used In This Course

  • CourseRfunctions.pdf
  • CourseDataset.pdf