R-tutorial

Objectives

Introduce the R programming language and the basic R operations used to implement the statistical analyses in the rest of the course
Introduce Bioconductor and show how to handle high-throughput biomedical data with its basic packages and operations

First part: R Basics

R introduction and installation
Data types and basic operations

Second part: Bioconductor & High Throughput Biomedical Data

Bioconductor introduction and installation
Microarray data as an example of High throughput biomedical data
A short introduction to the R functions and data sets used in this course

Tutorial

Setup working environment

R

Official website: http://www.r-project.org/
- Statistical programming language
- Open source, development-flexible, extensible
- Large number of statistical and numerical packages
- High quality visualization and graphical tools

Select mirror site

Download R for your OS

Select base for beginner

Start downloading (Windows user case)

Rstudio

Official website: https://www.rstudio.com/
- integrated development environment (IDE) for R
- console, syntax-highlighting editor
- tools for plotting, history, debugging and workspace management

Rstudio

RStudio Open Source Edition

Download RStudio Desktop

Select the installer or Zip/Tarball for your OS

R Programming Language Resources

An Introduction to R (https://cran.r-project.org/doc/manuals/R-intro.pdf) - the official manual for beginners
R Programming in Coursera (https://www.coursera.org/learn/r-programming)

Datacamp (https://www.datacamp.com/home) - Online interactive learning for R

Pluralsight (https://www.pluralsight.com/) - Another online interactive learning for R

Help

An example of help functions

# help.start()          # Show general help
# help.search("print")  # List all help pages with topics or title matching "print"
# ??utils::help         # List all the topics matching "help" in the utils package
# find("sin")           # List the attached packages that contain an object named "sin"
apropos("sin")          # List all objects containing string "sin"

##  [1] "as.single"           "as.single.default"   "asin"               
##  [4] "asinh"               "deviceIsInteractive" "is.single"          
##  [7] "isIncomplete"        "missing"             "missingArg"         
## [10] "sin"                 "single"              "sinh"               
## [13] "sink"                "sink.number"         "sinpi"

example("sqrt")       # Run all code from the Examples part of R's online help

## 
## sqrt> require(stats) # for spline
## 
## sqrt> require(graphics)
## 
## sqrt> xx <- -9:9
## 
## sqrt> plot(xx, sqrt(abs(xx)),  col = "red")

## 
## sqrt> lines(spline(xx, sqrt(abs(xx)), n=101), col = "pink")

# RSiteSearch("regression") # Search for regression at http://search.r-project.org

Data Types

Modes
- logical (Boolean TRUE/FALSE)
- numeric (integers and reals)
- complex (real + imaginary numbers)
- character (strings)

Data structures

	Homogeneous	Heterogeneous
1 dim	vector	list
2 dim	matrix	dataframe
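A quick way to see these modes and structures is to create one small object of each kind and ask R about it. This is a minimal sketch; the object names (v, l, m, df) are arbitrary.

```r
# One small object for each cell of the table
v  <- c(1.5, 2, 3)                          # vector: homogeneous, 1 dim
l  <- list(1, "a", TRUE)                    # list: heterogeneous, 1 dim
m  <- matrix(1:6, nrow = 2)                 # matrix: homogeneous, 2 dim
df <- data.frame(n = 1:2, s = c("a", "b"))  # data frame: heterogeneous, 2 dim

# mode() reports the basic mode of the stored values
mode(TRUE)     # "logical"
mode(v)        # "numeric"
mode(2 + 3i)   # "complex"
mode("text")   # "character"

# A vector must be homogeneous, so mixed inputs are coerced to a common mode
c(1, "a", TRUE)   # "1" "a" "TRUE"

# A data frame is a list of equal-length columns
is.list(df)    # TRUE
dim(m)         # 2 3
```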

Assignment & Read Scalar Variables

var1 <- TRUE    # Assign a Boolean using "<-" assignment
var1            # Read variable "var1" (by just typing the variable name)

## [1] TRUE

var2 = "string!"    # Assign a string using "=" assignment
var2

## [1] "string!"

var3 <- var4 <- (3 + 1i)    # Assign a complex number to two variables at once (chained assignment)
var3

## [1] 3+1i

var4

## [1] 3+1i

# Difference between "=" and "<-" inside a function call
abs(x = -1)   # "=" here only matches the argument x of abs() (absolute value)

## [1] 1

x             # so no variable x was created in the workspace

## Error in eval(expr, envir, enclos): object 'x' not found

abs(x <- -1)    # "<-" assigns -1 to x, then passes that value to abs()

## [1] 1

x             # x now exists in the workspace

## [1] -1

Vector

Define vectors

v1 <- c("one", "two", "three")  # Generate a character vector
v1

## [1] "one"   "two"   "three"

v2 <- seq(1, 5, 0.5)              # Generate a numeric sequence 
v2                              # from "1" to "5", spaced by "0.5"

## [1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0

v3 <- rep(TRUE, 5)              # Generate a logical vector
v3                              # consisting of "TRUE" for "5" times

## [1] TRUE TRUE TRUE TRUE TRUE

Read elements from a vector

v4 <- c(1, 2, 5.3, 6, -2, 4)    # define a vector
v4[3]                           # read 3rd value of v4

## [1] 5.3

v4[c(2,4)]                    # read 2nd and 4th value of v4

## [1] 2 6
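Besides positive positions, a vector can be indexed with negative positions (to drop elements) or with a logical condition (to keep matching elements). A short sketch reusing v4:

```r
v4 <- c(1, 2, 5.3, 6, -2, 4)

v4[-1]          # drop the 1st element: 2.0 5.3 6.0 -2.0 4.0
v4[-c(2, 4)]    # drop the 2nd and 4th elements: 1.0 5.3 -2.0 4.0
v4[v4 > 3]      # keep elements greater than 3: 5.3 6.0 4.0
which(v4 > 3)   # positions of those elements: 3 4 6
```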

List

A list is an ordered collection of objects (components). A list allows you to gather a variety of (possibly unrelated) objects under one name.
For example, here is a list with 3 components: a string, a numeric vector, and a scalar.

w <- list(name = "Fred", mynum = c(1, 2), age = 5.3)
w

## $name
## [1] "Fred"
## 
## $mynum
## [1] 1 2
## 
## $age
## [1] 5.3

# Read elements of a list using the [[]] or $ convention.
w[[2]]  # 2nd component of the list

## [1] 1 2

w$mynum # mynum in the list

## [1] 1 2
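The single-bracket form w[2] behaves differently from w[[2]]: it returns a smaller list rather than the component itself. Components can also be replaced or added with $. A sketch, where the new component `height` is a made-up example:

```r
w <- list(name = "Fred", mynum = c(1, 2), age = 5.3)

class(w[2])     # "list": single brackets return a sub-list
class(w[[2]])   # "numeric": double brackets return the component itself

w$age <- 6          # replace an existing component
w$height <- 1.8     # add a new (hypothetical) component
names(w)            # "name" "mynum" "age" "height"
```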

Matrix

All elements of a matrix must have the same mode (numeric, character, etc.), and all columns must have the same length. The general format is shown below (r, c, and the name vectors are placeholders).

mymatrix <- matrix(vector, nrow = r, ncol = c, byrow = FALSE, dimnames = list(char_vector_rownames, char_vector_colnames))
- byrow = TRUE fills the matrix by rows.
- byrow = FALSE fills the matrix by columns (the default).
- dimnames provides optional labels for the rows and columns.
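To see what byrow changes, this sketch fills the same six values both ways; only the order in which the cells are filled differs. The names by_col and by_row are arbitrary.

```r
by_col <- matrix(1:6, nrow = 2)                # byrow = FALSE (default): fill column by column
by_row <- matrix(1:6, nrow = 2, byrow = TRUE)  # fill row by row
by_col[1, ]   # first row: 1 3 5
by_row[1, ]   # first row: 1 2 3

# dimnames can also be set after the matrix exists
dimnames(by_row) <- list(c("r1", "r2"), c("a", "b", "c"))
by_row["r2", "b"]   # 5
```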

# Generate 5 x 4 numeric matrix
x <- matrix(1:20, nrow = 5, ncol = 4)
x

##      [,1] [,2] [,3] [,4]
## [1,]    1    6   11   16
## [2,]    2    7   12   17
## [3,]    3    8   13   18
## [4,]    4    9   14   19
## [5,]    5   10   15   20

# Generate 2 x 2 numeric matrix with row/column names
cells <- c(1,26,24,68)
rnames <- c("R1", "R2")
cnames <- c("C1", "C2") 
mymatrix <- matrix(cells, nrow = 2, ncol = 2, byrow = TRUE, dimnames = list(rnames, cnames))
mymatrix

##    C1 C2
## R1  1 26
## R2 24 68

# Read rows, columns, or elements using subscripts. 
x[,4]   # 4th column of matrix

## [1] 16 17 18 19 20

x[3,]   # 3rd row of matrix

## [1]  3  8 13 18

x[2:4,1:3]  # rows 2,3,4 of columns 1,2,3

##      [,1] [,2] [,3]
## [1,]    2    7   12
## [2,]    3    8   13
## [3,]    4    9   14

Data Frame

A data frame is more general than a matrix, in that different columns can have different types (numeric, character, factor, etc.).

d <- c(1, 2, 3, 4)
e <- c("red", "white", "red", NA)
f <- c(TRUE, TRUE, TRUE, FALSE)
mydata <- data.frame(d, e, f)
mydata

##   d     e     f
## 1 1   red  TRUE
## 2 2 white  TRUE
## 3 3   red  TRUE
## 4 4  <NA> FALSE

names(mydata) <- c("ID","Color","Passed")   # rename the variables (columns)
mydata

##   ID Color Passed
## 1  1   red   TRUE
## 2  2 white   TRUE
## 3  3   red   TRUE
## 4  4  <NA>  FALSE
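Columns of a data frame can be pulled out with $, and rows can be filtered with a logical condition. A sketch on the same data; note that any comparison with NA yields NA, which is why subset() is used for the Color filter:

```r
mydata <- data.frame(ID = c(1, 2, 3, 4),
                     Color = c("red", "white", "red", NA),
                     Passed = c(TRUE, TRUE, TRUE, FALSE))

mydata$ID                 # one column as a vector: 1 2 3 4
mydata[mydata$Passed, ]   # rows where Passed is TRUE (IDs 1, 2, 3)

# mydata[mydata$Color == "red", ] would add an all-NA row for ID 4,
# because NA == "red" is NA; subset() treats NA as FALSE and drops it
red <- subset(mydata, Color == "red", select = c(ID, Passed))
red                       # IDs 1 and 3
```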

Variable Information

library(knitr)
vec <- c(1, 26, 24, 68)
rname <- c("R1", "R2")
cname <- c("C1", "C2") 
matrix <- matrix(vec, nrow = 2, ncol = 2, byrow = TRUE, dimnames = list(rname, cname))
kable(matrix)

	C1	C2
R1	1	26
R2	24	68

length(matrix)  # get the number of elements

## [1] 4

dim(matrix) # retrieve the dimensions (rows, columns)

## [1] 2 2

str(matrix) # list the structure of an object

##  num [1:2, 1:2] 1 24 26 68
##  - attr(*, "dimnames")=List of 2
##   ..$ : chr [1:2] "R1" "R2"
##   ..$ : chr [1:2] "C1" "C2"

class(matrix)   # get the class (type) of an object; R >= 4.0 returns c("matrix", "array")

## [1] "matrix"
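A few more inspection functions are useful alongside length(), dim(), str(), and class(). This sketch rebuilds the same 2 x 2 matrix under the arbitrary name m, which also avoids reusing the function name matrix:

```r
m <- matrix(c(1, 26, 24, 68), nrow = 2, ncol = 2, byrow = TRUE,
            dimnames = list(c("R1", "R2"), c("C1", "C2")))

nrow(m)                # 2
ncol(m)                # 2
typeof(m)              # "double": how the values are stored
rownames(m)            # "R1" "R2"
colnames(m)            # "C1" "C2"
inherits(m, "matrix")  # TRUE, whatever the R version
```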

Data Manipulation

Rearranging & converting data
- Sorting
- Merging
- Data conversion

Sorting

C1 <- c(15, 11, 18)
C2 <- c(20, 24, 68)
C3 <- c(23, 73, 23)
data <- data.frame(C1, C2, C3)
data

##   C1 C2 C3
## 1 15 20 23
## 2 11 24 73
## 3 18 68 23

# Sort by C1 (ascending)
newdata1 = data[order(data$C1),]
newdata1

##   C1 C2 C3
## 2 11 24 73
## 1 15 20 23
## 3 18 68 23

# Sort by C3 (ascending) and C2 (descending)
newdata2 = data[order(data$C3, -data$C2),]
newdata2

##   C1 C2 C3
## 3 18 68 23
## 1 15 20 23
## 2 11 24 73
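order() can also sort in descending order directly, and sort() works on a plain vector. The minus-sign trick only works for numeric keys; for a character key, one option is to negate xtfrm(), which converts the values to sortable numbers. A sketch (the vector label is a made-up example):

```r
C1 <- c(15, 11, 18)
C2 <- c(20, 24, 68)
C3 <- c(23, 73, 23)
data <- data.frame(C1, C2, C3)

data[order(data$C1, decreasing = TRUE), ]   # C1 descending: 18, 15, 11
sort(data$C1)                                # 11 15 18
sort(data$C1, decreasing = TRUE)             # 18 15 11

# Descending order on a character key: -x fails on characters, -xtfrm(x) works
label <- c("b", "a", "c")
order(-xtfrm(label))   # 3 1 2, i.e. "c", "b", "a"
```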

Merging

To combine two matrices horizontally or vertically, use cbind() (side by side, as columns) or rbind() (stacked, as rows).

vec <- c(1, 26, 24, 68)
rname <- c("R1", "R2")
cname <- c("C1", "C2") 
matrix <- matrix(vec, nrow = 2, ncol = 2, byrow = TRUE, dimnames = list(rname, cname))

matrix1 <- cbind(matrix, matrix)    # combine objects as columns
matrix1

##    C1 C2 C1 C2
## R1  1 26  1 26
## R2 24 68 24 68

matrix2 <- rbind(matrix, matrix)    # combine objects as rows
matrix2

##    C1 C2
## R1  1 26
## R2 24 68
## R1  1 26
## R2 24 68

To merge two data frames (datasets) horizontally, use the merge() function. In most cases you join them by one or more common key variables; by default, merge() keeps only the rows whose key values appear in both data frames (an inner join).

d1 <- c(1, 2, 3, 4)
d2 <- c("red", "white", "red", NA)
d3 <- c(3, 4, 5)
d4 <- c("red", "red", "white")

df1 <- data.frame(d1, d2)
names(df1) <- c("ID","Color")

df2 <- data.frame(d3, d4)
names(df2) <- c("ID","Color")

m1 <- merge(df1, df2, by = "ID")
m2 <- merge(df1, df2, by = c("ID", "Color"))

df1

##   ID Color
## 1  1   red
## 2  2 white
## 3  3   red
## 4  4  <NA>

df2

##   ID Color
## 1  3   red
## 2  4   red
## 3  5 white

m1

##   ID Color.x Color.y
## 1  3     red     red
## 2  4    <NA>     red

m2

##   ID Color
## 1  3   red
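merge() can also keep unmatched rows: all.x = TRUE keeps every row of the first data frame (a left join), and all = TRUE keeps rows from both (a full outer join). Missing partners are filled with NA. A sketch with the same df1 and df2:

```r
df1 <- data.frame(ID = c(1, 2, 3, 4), Color = c("red", "white", "red", NA))
df2 <- data.frame(ID = c(3, 4, 5), Color = c("red", "red", "white"))

left <- merge(df1, df2, by = "ID", all.x = TRUE)  # left join
full <- merge(df1, df2, by = "ID", all = TRUE)    # full outer join
left   # IDs 1-4; Color.y is NA for IDs 1 and 2
full   # IDs 1-5; Color.x is NA for ID 5
```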

Other Set Algebra

vec1 <- seq(1, 10, 2)
vec2 <- seq(1, 5, 1)
vec1

## [1] 1 3 5 7 9

vec2

## [1] 1 2 3 4 5

union(vec1, vec2)   # union of vec1 and vec2

## [1] 1 3 5 7 9 2 4

intersect(vec1, vec2)   # Intersect of vec1 and vec2

## [1] 1 3 5

setdiff(vec1, vec2) # vec1 - vec2

## [1] 7 9

setdiff(vec2, vec1) # vec2 - vec1

## [1] 2 4
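The %in% operator answers a related, element-wise question: is each element of one vector found in another? It is often used to filter one vector by another:

```r
vec1 <- seq(1, 10, 2)   # 1 3 5 7 9
vec2 <- seq(1, 5, 1)    # 1 2 3 4 5

vec2 %in% vec1          # TRUE FALSE TRUE FALSE TRUE
vec2[vec2 %in% vec1]    # 1 3 5
vec2[!(vec2 %in% vec1)] # 2 4, the same values as setdiff(vec2, vec1)
```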

Data Type Check & Conversion

Use is.foo() to test whether an object is of type foo; it returns TRUE or FALSE:
- is.numeric(), is.character(), is.vector(), is.matrix(), is.data.frame()
Use as.foo() to explicitly convert an object to type foo:
- as.numeric(), as.character(), as.vector(), as.matrix(), as.data.frame()
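A short sketch of the is./as. pair in action, including what happens when a conversion is impossible:

```r
x <- c("1", "2.5", "abc")
is.character(x)   # TRUE
is.numeric(x)     # FALSE

# "abc" cannot be parsed as a number, so as.numeric() returns NA (with a warning)
y <- suppressWarnings(as.numeric(x))
y                 # 1.0 2.5 NA
is.na(y)          # FALSE FALSE TRUE

as.character(3.14)             # "3.14"
as.logical(c("TRUE", "yes"))   # TRUE NA: only forms like "TRUE", "true", "T" are recognised
as.integer(3.9)                # 3: truncates toward zero
```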

# Convert date info in format 'mm/dd/yyyy'
strDates <- c("01/05/1965", "08/16/1975")
strDates

## [1] "01/05/1965" "08/16/1975"

dates <- as.Date(strDates, "%m/%d/%Y")
dates

## [1] "1965-01-05" "1975-08-16"

# Convert dates to character data
strDates <- as.character(dates)
strDates

## [1] "1965-01-05" "1975-08-16"

Workspace Management

The workspace is your current R working environment and includes any user-defined objects (vectors, matrices, data frames, lists, functions).

a <- 1
b <- 2

ls()    # Return the names of R objects

##  [1] "a"        "b"        "C1"       "C2"       "C3"       "cells"   
##  [7] "cname"    "cnames"   "d"        "d1"       "d2"       "d3"      
## [13] "d4"       "data"     "dates"    "df1"      "df2"      "dt"      
## [19] "e"        "f"        "m1"       "m2"       "matrix"   "matrix1" 
## [25] "matrix2"  "mydata"   "mymatrix" "newdata1" "newdata2" "rname"   
## [31] "rnames"   "strDates" "v1"       "v2"       "v3"       "v4"      
## [37] "var1"     "var2"     "var3"     "var4"     "vec"      "vec1"    
## [43] "vec2"     "w"        "x"        "xx"

exists("a") # Look for an R object named "a"

## [1] TRUE

rm(list = ls()) # Remove all objects from the workspace


getwd() # Return the path of current working directory

## [1] "Y:/Lecture/1-BIS335BioStat/2-Rtintro/Tutorial"

setwd("Y:/Lecture/1-BIS335BioStat/2-Rtintro/Tutorial")  # Set the working directory

source("http://bioconductor.org/biocLite.R")    # Run the R code in a file or URL (this biocLite.R script is obsolete since Bioconductor 3.8)
install.packages("e1071", repos = "http://cran.us.r-project.org")   # Download and install packages
library("e1071")    # Load add-on packages
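When a script may run on machines where a package is not yet installed, a common pattern is to check with requireNamespace() before calling install.packages(). A sketch; ensure_package() is a hypothetical helper name, and the cloud.r-project.org mirror is an assumption (any CRAN mirror works):

```r
# Hypothetical helper: install a CRAN package only if it is missing, then attach it
ensure_package <- function(pkg, repos = "https://cloud.r-project.org") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repos)
  }
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

ensure_package("stats")      # ships with R, so nothing is downloaded
# ensure_package("e1071")    # would download e1071 only the first time
```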

Basic Operations

Arithmetic Operators

1 + 2   # addition

## [1] 3

4 - 3   # subtraction

## [1] 1

2 * 5   # multiplication

## [1] 10

7 / 3   # division

## [1] 2.333333

2 ^ 3   # exponentiation

## [1] 8

3 ** 2  # exponentiation

## [1] 9

4 %% 3  # modulus (x mod y)

## [1] 1
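R also has an integer-division operator %/%. Together with %% it always satisfies x == (x %/% y) * y + x %% y; for negative numbers the quotient is rounded down, so the remainder takes the sign of the divisor. A sketch:

```r
7 %/% 2      # integer division: 3
7 %% 2       # remainder: 1
(-7) %/% 3   # -3 (rounded down, not toward zero)
(-7) %% 3    # 2

x <- -7
y <- 3
(x %/% y) * y + x %% y == x   # TRUE
```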

Logical Operators

n1 <- 1
n2 <- 2

n1 < n2   # less than

## [1] TRUE

n1 <= n2    # less than or equal to

## [1] TRUE

n1 > n2   # greater than

## [1] FALSE

n1 >= n2    # greater than or equal to

## [1] FALSE

n1 == n2    # exactly equal to

## [1] FALSE

n1 != n2    # not equal to

## [1] TRUE

b1 <- TRUE
b2 <- FALSE

!b1        # NOT b1

## [1] FALSE

b1 | b2    # b1 OR b2

## [1] TRUE

b1 & b2    # b1 AND b2

## [1] FALSE

isTRUE(b2) # test if b2 is TRUE

## [1] FALSE

is.na(b2)    # test if b2 is missing (NA)

## [1] FALSE

setequal(c(2, 4, 6), seq(2, 7, 2)) # Test if two sets are equal

## [1] TRUE
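The operators !, | and & work element-wise on whole logical vectors, while && and || look at single values and stop as soon as the result is known, which makes them the usual choice inside if(). any() and all() summarise a logical vector. A sketch:

```r
v <- c(TRUE, FALSE, TRUE)
u <- c(TRUE, TRUE, FALSE)
v & u    # element-wise AND: TRUE FALSE FALSE
v | u    # element-wise OR:  TRUE TRUE TRUE
any(v)   # TRUE: at least one element is TRUE
all(v)   # FALSE: not every element is TRUE
xor(TRUE, FALSE)   # TRUE: exactly one of the two is TRUE

# && evaluates the right side only if the left side is TRUE
n <- 5
if (n > 0 && sqrt(n) > 2) print("both conditions hold")
```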

Numeric Functions

# round-off
ceiling(3.475)            # smallest integers not less than the input

## [1] 4

floor(3.475)                # largest integers not greater than the input

## [1] 3

trunc(-3.475)             # integers formed by truncating the values in input toward 0

## [1] -3

round(3.475, digits = 1)    # rounded value to the specified digits of decimal place

## [1] 3.5
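The differences between these functions show up with negative inputs, and round() has one more subtlety: an exact half is rounded to the even digit (following the IEC 60559 standard) rather than always up. A sketch:

```r
x <- -3.475
ceiling(x)             # -3
floor(x)               # -4
trunc(x)               # -3 (toward 0)
round(x, digits = 1)   # -3.5

round(c(0.5, 1.5, 2.5))   # 0 2 2: halves go to the even digit
signif(3.475, 2)          # 3.5: keep 2 significant digits
```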

# functions
abs(-5)             # absolute value

## [1] 5

sqrt(4)             # square root

## [1] 2

log(3)              # natural logarithm

## [1] 1.098612

log(1024, base = 2) # logarithm with specified base

## [1] 10

log10(100)          # common logarithm

## [1] 2

exp(1)              # exponential

## [1] 2.718282

# Trigonometric functions 
sin(0)  # sine

## [1] 0

cos(0)  # cosine

## [1] 1

tan(0)  # tangent

## [1] 0

asin(0) # arc-sine

## [1] 0

acos(1) # arc-cosine

## [1] 0

atan(0) # arc-tangent

## [1] 0

Basic Statistic Functions

vec = c(1, 3, 4, 6, 8, 12)

max(vec)    # maximum value of vec

## [1] 12

min(vec)    # minimum value of vec

## [1] 1

mean(vec)   # arithmetic mean

## [1] 5.666667

sd(vec)   # sample standard deviation

## [1] 3.932768
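Other common descriptive statistics follow the same pattern. Note that sd() and var() use the sample (n - 1) denominator. A sketch on the same vector:

```r
vec <- c(1, 3, 4, 6, 8, 12)
median(vec)           # 5: mean of the two middle values, 4 and 6
var(vec)              # 15.46667, the square of sd(vec)
range(vec)            # 1 12
quantile(vec, 0.25)   # 3.25, the 25th percentile (default type 7)
summary(vec)          # min, quartiles, median, mean and max in one call
```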

Character Functions (String Manipulation)

str = "hello, world"
dictionary = c("hi", "hello", "hey")

substr(str, start = 1, stop = 5)    # Extract substring

## [1] "hello"

grep("HELLO", dictionary, ignore.case = TRUE)   # Search for pattern.

## [1] 2

sub("hello", "hi", str) # Replace with replacement text

## [1] "hi, world"

strsplit(str, ", ") # Split elements as character vector

## [[1]]
## [1] "hello" "world"

paste(str, "!", sep = "")   # Concatenate strings using a separator (here an empty one)

## [1] "hello, world!"

str <- toupper(str) # Change to uppercase
str

## [1] "HELLO, WORLD"

tolower(str)    # Change to lowercase

## [1] "hello, world"
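A few related string functions: sub() replaces only the first match, whereas gsub() replaces every match; paste0() is paste() with sep = ""; and sprintf() builds strings from a C-style format. A sketch:

```r
s <- "hello, world"
nchar(s)               # 12 characters
sub("o", "0", s)       # "hell0, world": first match only
gsub("o", "0", s)      # "hell0, w0rld": every match
paste0("gene", 1:3)    # "gene1" "gene2" "gene3"
sprintf("%s has %d letters", "world", nchar("world"))  # "world has 5 letters"
```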

Control Structures

if & else
- if (cond) expr
- if (cond) cons.expr else alt.expr
ifelse
- ifelse(test, yes, no)
for
- for (var in seq) expr
while
- while (cond) expr

x <- -1
if (x < 0) {    # if & else
    print(-x)
} else {
    print(x)
}

## [1] 1

x <- c(6:-4)
ifelse(x >= 0, x, NA)   # ifelse

##  [1]  6  5  4  3  2  1  0 NA NA NA NA

for (i in 1:3) print(1:i)   # for

## [1] 1
## [1] 1 2
## [1] 1 2 3

x <- 1
while (x < 3) { # while
    x <- x + 1
    print(x)
}

## [1] 2
## [1] 3
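Two details help when writing loops: seq_along() is safer than 1:length(x), because 1:0 counts down to c(1, 0) and would run a loop twice over an empty vector; and next and break skip an iteration or leave the loop early. A sketch (kept is an arbitrary name):

```r
x <- c()          # an empty vector (NULL)
1:length(x)       # 1 0: a loop over this would run twice
seq_along(x)      # integer(0): a loop over this never runs

# next skips the rest of an iteration; break leaves the loop
kept <- c()
for (i in 1:10) {
  if (i %% 2 == 0) next   # skip even numbers
  if (i > 5) break        # stop once i exceeds 5
  kept <- c(kept, i)
}
kept              # 1 3 5
```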

Functions

mytrans <- function(x) { 
    if (!is.matrix(x)) {
        warning("not a matrix: returning NA")
        return(NA_real_)
    }
    y <- matrix(1, nrow = ncol(x), ncol = nrow(x)) 
    for (i in 1:nrow(x)) {
        for (j in 1:ncol(x)) {
            y[j,i] <- x[i,j] 
        }
    }
    return(y)
}

mat <- matrix(1:20, nrow = 5, ncol = 4)
matr <- mytrans(mat)
mat

##      [,1] [,2] [,3] [,4]
## [1,]    1    6   11   16
## [2,]    2    7   12   17
## [3,]    3    8   13   18
## [4,]    4    9   14   19
## [5,]    5   10   15   20

matr

##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    2    3    4    5
## [2,]    6    7    8    9   10
## [3,]   11   12   13   14   15
## [4,]   16   17   18   19   20
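Function arguments can have default values that callers may override by name, and base R already provides t(), which gives the same values as mytrans(). A sketch; scale_by() is a hypothetical example function:

```r
# Hypothetical example: multiply every element of x by k (k defaults to 2)
scale_by <- function(x, k = 2) {
  x * k
}
scale_by(1:3)            # default k = 2: 2 4 6
scale_by(1:3, k = 10)    # override the default: 10 20 30

# Built-in transpose
mat <- matrix(1:20, nrow = 5, ncol = 4)
tm <- t(mat)
dim(tm)                  # 4 5
tm[2, 1] == mat[1, 2]    # TRUE: element (i, j) moves to (j, i)
```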

# Apply Function over a list or vector
sapply(seq(1, 5), sqrt) # Return a result of applying function to elements of vector

## [1] 1.000000 1.414214 1.732051 2.000000 2.236068
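sapply() has relatives: lapply() always returns a list, vapply() is like sapply() but checks each result against a declared type, and apply() runs a function over the rows (MARGIN = 1) or columns (MARGIN = 2) of a matrix. A sketch:

```r
mat <- matrix(1:20, nrow = 5, ncol = 4)

apply(mat, 1, sum)   # row sums:    34 38 42 46 50
apply(mat, 2, sum)   # column sums: 15 40 65 90

lapply(1:2, sqrt)    # list of 2 results: 1 and 1.414214
vapply(1:3, function(i) i^2, numeric(1))   # 1 4 9
```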

Input & Output

d <- c(1.55, 2.28, 3.54, 4.79)
e <- c("red", "white", "red", NA)
f <- c(TRUE, TRUE, TRUE, FALSE)
data <- data.frame(d, e, f)
names(data) <- c("ID","Color","Passed")
data

##     ID Color Passed
## 1 1.55   red   TRUE
## 2 2.28 white   TRUE
## 3 3.54   red   TRUE
## 4 4.79  <NA>  FALSE

format(data, justify = "left", digits = 1)  # Format an R object for pretty printing

##   ID Color Passed
## 1  2 red     TRUE
## 2  2 white   TRUE
## 3  4 red     TRUE
## 4  5 NA     FALSE

write.table(data, file = "Y:/Lecture/1-BIS335BioStat/2-Rtintro/Tutorial/test.txt")  # Write a data frame to a text file
data.read = read.table(file = "Y:/Lecture/1-BIS335BioStat/2-Rtintro/Tutorial/test.txt", header = TRUE)    # Read the file back into a data frame
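A hard-coded path such as the one above only exists on the author's machine. A portable sketch writes to a temporary file from tempfile() instead, and uses write.csv()/read.csv() with row.names = FALSE so the row numbers are not saved as an extra column:

```r
d <- c(1.55, 2.28, 3.54, 4.79)
e <- c("red", "white", "red", NA)
f <- c(TRUE, TRUE, TRUE, FALSE)
data <- data.frame(ID = d, Color = e, Passed = f)

path <- tempfile(fileext = ".csv")          # a temporary file that works on any OS
write.csv(data, file = path, row.names = FALSE)
data.read <- read.csv(path)                 # the header row is read by default
str(data.read)                              # 4 obs. of 3 variables
unlink(path)                                # delete the temporary file
```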

Bioconductor

Official website: http://www.bioconductor.org/
R programming language-based project
Open source, open development software project
An object-oriented framework for addressing the diversity and complexity of computational biology and bioinformatics problems.
Support for
- Rich statistical simulation and modeling activities.
- Cutting edge data and model visualization capabilities.
- Powerful statistical and graphical methods for the analysis of genomic data
- Associating microarray and other genomic data with biological metadata from public web databases
- More than 1,000 packages and an active user community

Bioconductor

Main Features of Bioconductor

Statistical and graphical methods
- for oligonucleotide arrays, sequence analysis, flow cytometry and other high-throughput genomic data.
- linear and non-linear modeling, cluster analysis, prediction, resampling, survival analysis, and time-series analysis.
Annotation
- associating microarray and other genomic data in real time with biological metadata from web databases such as GenBank, Entrez genes and PubMed
- assembling and processing genomic annotation data, from databases such as GenBank, the Gene Ontology Consortium, Entrez genes, UniGene, the UCSC Human Genome Project
- mapping between different probe identifiers (e.g. Affy IDs, Entrez genes)

Installation of Bioconductor Packages

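At the time this tutorial was written, the current release was version 3.7 (which works with R version 3.5.0 or later), and packages were installed with the biocLite.R script. Since Bioconductor 3.8, biocLite() has been replaced by the BiocManager package from CRAN. To install core packages with a current R, type the following in an R command window:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(c("GenomicFeatures", "AnnotationDbi"))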

Bioconductor Packages

http://bioconductor.org/packages/

High Throughput Biomedical Data

A microarray experiment hybridizes the total mRNA of cells under a specified biological condition to a pre-deposited array of cDNAs (100 bp to 2,000 bp) or oligonucleotides (25 bp to 80 bp).
This allows the expression levels of thousands of genes to be measured simultaneously under different experimental conditions.
Microarray-based gene expression experiments are a core technique of functional genomics, the large-scale analysis of the genome-wide function of genes.

High Throughput Biomedical Data

Representative Packages for Microarray Analysis

Pre-processing
- affy, oligo, lumi, beadarray, limma, genefilter, etc.
Differential expression
- limma, etc.
Gene set enrichment
- topGO, GOstats, GSEABase, etc.
Annotation
- AnnotationDbi, chip-specific and organism-level (org.*) annotation packages, BSgenome, etc.

Microarray Analysis Workflow with Bioconductor

Prior to analysis
- Biological experimental design (treatments, replication,…)
- Microarray preparation (especially two-channel)
Analysis
- Pre-processing
- Quality assessment
- Normalization
- Differential expression
- Clustering
- Classification
- Annotation
- Gene set enrichment

## Load packages
# source("https://bioconductor.org/biocLite.R")
# biocLite(c("affy", "limma"))
# (with current R, use BiocManager::install(c("affy", "limma")) instead)
library(affy)  # Affymetrix pre-processing
library(limma) # two-color pre-processing; differential expression

## import "phenotype" data, describing the experimental design
phenoData <- read.AnnotatedDataFrame(system.file("extdata", "pdata.txt", package = "arrays"))

## RMA normalization
celfiles <- system.file("extdata", package = "arrays")
eset <- justRMA(phenoData = phenoData, celfile.path = celfiles)

## differential expression
combn <- factor(paste(pData(phenoData)[,1], pData(phenoData)[,2], sep = "_"))
design <- model.matrix(~combn) # describe model to be fit

fit <- lmFit(eset, design) # fit each probeset to model
efit <- eBayes(fit)        # empirical Bayes adjustment
topTable(efit, coef = 2)   # table of differentially expressed probesets

How to Download a Microarray Data

The NCBI Gene Expression Omnibus (GEO) is a public repository for a wide range of high-throughput experimental data.
Microarray data can be downloaded from GEO with the getGEO() function in the GEOquery package.

# install the core bioconductor packages
source("http://bioconductor.org/biocLite.R")

## Bioconductor version 3.7 (BiocInstaller 1.30.0), ?biocLite for help

biocLite()

## BioC_mirror: https://bioconductor.org

## Using Bioconductor 3.7 (BiocInstaller 1.30.0), R 3.5.1 (2018-07-02).

## installation path not writeable, unable to update packages: foreign,
##   survival

# install additional bioconductor libraries
biocLite("GEOquery")

## BioC_mirror: https://bioconductor.org

## Using Bioconductor 3.7 (BiocInstaller 1.30.0), R 3.5.1 (2018-07-02).

## Installing package(s) 'GEOquery'

## package 'GEOquery' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\sypark\AppData\Local\Temp\RtmpMt8yxK\downloaded_packages

## installation path not writeable, unable to update packages: foreign,
##   survival

library(GEOquery)

## Loading required package: Biobase

## Loading required package: BiocGenerics

## Loading required package: parallel

## 
## Attaching package: 'BiocGenerics'

## The following objects are masked from 'package:parallel':
## 
##     clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
##     clusterExport, clusterMap, parApply, parCapply, parLapply,
##     parLapplyLB, parRapply, parSapply, parSapplyLB

## The following objects are masked from 'package:stats':
## 
##     IQR, mad, sd, var, xtabs

## The following objects are masked from 'package:base':
## 
##     anyDuplicated, append, as.data.frame, basename, cbind,
##     colMeans, colnames, colSums, dirname, do.call, duplicated,
##     eval, evalq, Filter, Find, get, grep, grepl, intersect,
##     is.unsorted, lapply, lengths, Map, mapply, match, mget, order,
##     paste, pmax, pmax.int, pmin, pmin.int, Position, rank, rbind,
##     Reduce, rowMeans, rownames, rowSums, sapply, setdiff, sort,
##     table, tapply, union, unique, unsplit, which, which.max,
##     which.min

## Welcome to Bioconductor
## 
##     Vignettes contain introductory material; view with
##     'browseVignettes()'. To cite Bioconductor, see
##     'citation("Biobase")', and for packages 'citation("pkgname")'.

## Setting options('download.file.method.GEOquery'='auto')

## Setting options('GEOquery.inmemory.gpl'=FALSE)

# Now, we are free to access any GEO accession
gds <- getGEO("GDS1962")

## File stored at:

## C:\Users\sypark\AppData\Local\Temp\RtmpMt8yxK/GDS1962.soft.gz

## Parsed with column specification:
## cols(
##   .default = col_double(),
##   ID_REF = col_character(),
##   IDENTIFIER = col_character()
## )

## See spec(...) for full column specifications.

# Basic Information of GEO Microarray Data
# You can check basic information of the microarray data by typing "gds"
# gds
# since it generates long output, I skipped this 

# GEO Data Structure
# Look at the microarray data
head(Table(gds)[1:10, 1:10])

##      ID_REF IDENTIFIER GSM97800 GSM97803 GSM97804 GSM97805 GSM97807
## 1 1007_s_at    MIR4640   4701.5   4735.0   2863.9   5350.2   4789.4
## 2   1053_at       RFC2    282.7    347.9    355.0    319.9    294.2
## 3    117_at      HSPA6    769.6    287.9    199.0    182.8    204.3
## 4    121_at       PAX8   1616.3   1527.2   1793.8   1880.0   1012.0
## 5 1255_g_at     GUCA1A    232.7    204.8    119.3    180.2    156.7
## 6   1294_at    MIR5193    357.7    336.5    328.7    304.7    190.1
##   GSM97809 GSM97811 GSM97812
## 1   5837.8   4446.7   4264.1
## 2    257.5    321.0    317.9
## 3    184.9    107.5    196.9
## 4   1024.4   1133.8   1295.0
## 5    155.1    236.2    235.9
## 6    253.3    342.5    284.1

# Sample Information
# Look at the sample information
head(Columns(gds)[,1:3])

##     sample disease.state         tissue
## 1 GSM97800     non-tumor not applicable
## 2 GSM97803     non-tumor not applicable
## 3 GSM97804     non-tumor not applicable
## 4 GSM97805     non-tumor not applicable
## 5 GSM97807     non-tumor not applicable
## 6 GSM97809     non-tumor not applicable

More detailed information is available on the GEO homepage: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE4290

R Function List & Data Used In This Course

CourseRfunctions.pdf
CourseDataset.pdf.

R-tutorial

Bis335

Contents

Objectives

First part: R Basics

Second part: Bioconductor & High Throughput Biomedical Data

Tutorial

Setup working environment

R

Rstudio

R Programming Language Resources

Help

Data Types

Assignment & Read Scalar Variables

Vector

List

Matrix

Data Frame

Variable Information

Data Manipulation

Sorting

Merging

Other Set Algebra

Data Type Check & Conversion

Workspace Management

Basic Operations

Arithmetic Operators

Logical Operators

Numeric Functions

Basic Statistic Functions

Character Functions (String Manipulation)

Control Structures

Functions

Input & Output

Bioconductor

Main Features of Bioconductor

Installation of Bioconductor Packages

Bioconductor Packages

High Throughput Biomedical Data

Representative Packages for Microarray Analysis

Microarray Analysis Workflow with Bioconductor

How to Download a Microarray Data

R Function List & Data Used In This Course