Data Structures
R provides several data structures beyond vectors. Understanding these is crucial for effective data manipulation.
Matrices
A matrix is a two-dimensional array of elements of the same type.
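The same-type rule means R silently coerces mixed input to one common type. A minimal sketch of that behavior (the variable names `mixed` and `num` are illustrative only):

```r
# Mixing numbers and strings: every element is coerced to character
mixed <- matrix(c(1, "a", 2, "b"), nrow = 2)
typeof(mixed)   # "character"
mixed[1, 1]     # "1" (a string, no longer a number)

# Logical values mixed with numbers are coerced to numeric
num <- matrix(c(TRUE, 2, FALSE, 4), nrow = 2)
typeof(num)     # "double"
num[1, 1]       # 1
```

If columns need different types, a data frame is usually the better fit.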
Creating Matrices
# From a vector (filled column by column by default)
matrix(1:12, nrow = 3, ncol = 4)
# Same values, filled row by row
matrix(1:12, nrow = 3, ncol = 4, byrow = TRUE)
# From explicit values, stored in a variable
mat <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
# Using cbind() and rbind()
col1 <- c(1, 2, 3)
col2 <- c(4, 5, 6)
mat <- cbind(col1, col2) # Combine as columns
mat <- rbind(col1, col2) # Combine as rows
Accessing Matrix Elements
mat <- matrix(1:12, nrow = 3, ncol = 4)
mat[1, 2] # Element at row 1, column 2
mat[1, ] # First row
mat[, 2] # Second column
mat[1:2, 2:3] # Submatrix
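Selecting a single row or column returns a plain vector, because R drops dimensions of length one by default. The sketch below shows how `drop = FALSE` keeps the result a matrix (`row_vec` and `row_mat` are illustrative names):

```r
mat <- matrix(1:12, nrow = 3, ncol = 4)

row_vec <- mat[1, ]                # plain vector: the dim attribute is dropped
row_mat <- mat[1, , drop = FALSE]  # still a matrix with 1 row and 4 columns

is.matrix(row_vec)  # FALSE
dim(row_mat)        # 1 4
```

Keeping the matrix shape matters when the result is passed on to code that expects two dimensions, such as `%*%`.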
Matrix Operations
mat1 <- matrix(1:4, nrow = 2)
mat2 <- matrix(5:8, nrow = 2)
mat1 + mat2 # Element-wise addition
mat1 * mat2 # Element-wise multiplication
mat1 %*% mat2 # Matrix multiplication
t(mat1) # Transpose
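To make the difference between `*` and `%*%` concrete, here is a small worked example using the same two matrices (the names `elementwise` and `product` are illustrative):

```r
mat1 <- matrix(1:4, nrow = 2)  # columns are (1, 2) and (3, 4)
mat2 <- matrix(5:8, nrow = 2)  # columns are (5, 6) and (7, 8)

elementwise <- mat1 * mat2    # pairs up entries in the same position
product     <- mat1 %*% mat2  # rows of mat1 times columns of mat2

elementwise[1, 2]  # 3 * 7 = 21
product[1, 2]      # 1 * 7 + 3 * 8 = 31
```

For `%*%` to work, the number of columns of the left matrix must equal the number of rows of the right one.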
Arrays
Arrays generalize matrices to any number of dimensions. Like matrices, they hold elements of a single type.
# 3D array
arr <- array(1:24, dim = c(2, 3, 4))
arr[1, 2, 3] # Access element
dim(arr) # Dimensions
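An array is stored as one long vector filled column by column, with the first index varying fastest. As a rough sketch, the position of `arr[1, 2, 3]` in that vector can be computed by hand (`pos` is an illustrative name):

```r
arr <- array(1:24, dim = c(2, 3, 4))

# Position of [i, j, k] in the underlying vector, for dim = c(2, 3, 4):
# i + (j - 1) * 2 + (k - 1) * 2 * 3
pos <- 1 + (2 - 1) * 2 + (3 - 1) * 2 * 3

pos            # 15
arr[1, 2, 3]   # 15, because the array was filled with 1:24
arr[, , 1]     # the first 2 x 3 slice, which is a matrix
```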
Lists
Lists can contain elements of different types, including other lists.
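As a quick illustration of nesting, the sketch below stores one list inside another and reaches into it by chaining accessors. The names and values (`nested`, `address`, `zip`) are made up for the example:

```r
nested <- list(
  id = 1,
  address = list(city = "NYC", zip = "10001")
)

nested$address$city            # "NYC"
nested[["address"]][["zip"]]   # "10001"
length(nested)                 # 2: the inner list counts as one element
```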
Creating Lists
# Basic list
my_list <- list(1, "hello", TRUE, c(1, 2, 3))
# Named list
person <- list(
  name = "Alice",
  age = 30,
  scores = c(85, 90, 88)
)
Accessing List Elements
person <- list(name = "Alice", age = 30, scores = c(85, 90, 88))
# By position or by name (returns the element itself)
person[[1]] # "Alice"
person[["name"]] # "Alice"
person$name # "Alice"
# Multiple elements (returns a list)
person[c("name", "age")]
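The difference between `[` and `[[` is easy to miss: `[` always returns a list, while `[[` extracts the element itself. A minimal sketch (`sub` and `item` are illustrative names):

```r
person <- list(name = "Alice", age = 30, scores = c(85, 90, 88))

sub  <- person["age"]     # a list of length 1 that contains age
item <- person[["age"]]   # the number 30 itself

class(sub)    # "list"
class(item)   # "numeric"
item + 1      # 31, whereas sub + 1 would be an error

person$scores[2]  # 90: extract the vector, then index into it
```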
Modifying Lists
person$city <- "New York"
person[["email"]] <- "alice@example.com"
person[[length(person) + 1]] <- "New element" # Append by position; person[[4]] would overwrite city
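Assigning to a position that already exists replaces that element rather than adding one, and assigning `NULL` removes an element. A short sketch of both cases, using a fresh list so it runs on its own:

```r
person <- list(name = "Alice", age = 30, city = "New York")

person[[2]] <- 31        # position 2 already exists, so age is overwritten
person$city <- NULL      # assigning NULL removes the element

person$age       # 31
names(person)    # "name" "age"
length(person)   # 2
```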
Data Frames
Data frames are the central data structure for data analysis in R. They are tables in which each column is a vector of a single type, while different columns can hold different types.
Creating Data Frames
# From vectors
df <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(25, 30, 35),
  city = c("NYC", "LA", "Chicago")
)
# Columns of different types (integer, character, logical)
df2 <- data.frame(
  x = 1:5,
  y = letters[1:5],
  z = c(TRUE, FALSE, TRUE, FALSE, TRUE)
)
Accessing Data Frame Elements
df <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(25, 30, 35),
  city = c("NYC", "LA", "Chicago")
)
# By column name
df$name
df[["age"]]
df[, "city"]
# By position
df[1, ] # First row
df[, 2] # Second column
df[1, 2] # Element at row 1, col 2
# Subsetting
df[df$age > 27, ]
df[, c("name", "age")]
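Row conditions and column selections can be combined in a single call. The sketch below assumes R 4.0 or later, where strings in `data.frame()` stay character vectors by default. The name `older` is illustrative:

```r
df <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(25, 30, 35),
  city = c("NYC", "LA", "Chicago")
)

# Rows where both conditions hold, keeping two columns
older <- df[df$age > 27 & df$city != "LA", c("name", "age")]
older$name   # "Charlie"

# A single column selected with [ , ] comes back as a vector ...
class(df[, "age"])                  # "numeric"
# ... unless drop = FALSE is given
class(df[, "age", drop = FALSE])    # "data.frame"
```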
Data Frame Functions
df <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(25, 30, 35),
  city = c("NYC", "LA", "Chicago")
)
nrow(df) # Number of rows
ncol(df) # Number of columns
dim(df) # Dimensions
names(df) # Column names
str(df) # Structure
summary(df) # Summary statistics
head(df) # First few rows
tail(df) # Last few rows
Modifying Data Frames
# Add new column
df$salary <- c(50000, 60000, 70000)
# Add new row
df <- rbind(df, data.frame(
  name = "David",
  age = 28,
  city = "Boston",
  salary = 55000
))
# Rename a single column
names(df)[1] <- "full_name"
# Rename all columns at once (the vector must have one name per column)
colnames(df) <- c("full_name", "age", "city", "salary")
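When `rbind()` combines data frames, it matches columns by name rather than by position, so the new row must use the same column names as the existing data. A small sketch with a deliberately reordered row (`new_row` is an illustrative name):

```r
df <- data.frame(name = c("Alice", "Bob"), age = c(25, 30))

# The new row lists its columns in a different order
new_row <- data.frame(age = 28, name = "David")
df <- rbind(df, new_row)

nrow(df)       # 3
df$age[3]      # 28
names(df)      # "name" "age": the order of the first data frame is kept
```

A row whose column names do not match those of `df` makes `rbind()` fail with an error.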
Factors
Factors represent categorical data. Each value is stored as an integer code that refers to one of a fixed set of levels.
# Creating factors
gender <- factor(c("M", "F", "M", "F", "M"))
levels(gender) # "F" "M"
# Ordered factors
size <- factor(c("S", "M", "L", "M", "S"),
               levels = c("S", "M", "L"),
               ordered = TRUE)
# Converting to/from factors
as.factor(c(1, 2, 3)) # Levels are "1" "2" "3"
as.numeric(factor(c("A", "B", "C"))) # 1 2 3: the integer level codes, not the labels
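Because `as.numeric()` on a factor returns level codes, converting a factor whose labels are numbers needs an extra step through `as.character()`. The sketch below shows that pitfall, along with comparisons on an ordered factor (the values are made up for illustration):

```r
f <- factor(c("10", "20", "10"))

as.numeric(f)                 # 1 2 1 (the internal level codes)
as.numeric(as.character(f))   # 10 20 10 (the original values)

# Ordered factors support comparisons that follow the level order
size <- factor(c("S", "M", "L"), levels = c("S", "M", "L"), ordered = TRUE)
size < "L"                    # TRUE TRUE FALSE
```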
Checking Data Structures
# Example object to inspect
x <- c(1, 2, 3)
# Type checking
is.vector(x)
is.matrix(x)
is.array(x)
is.list(x)
is.data.frame(x)
is.factor(x)
# Class information
class(x)
typeof(x)
str(x)
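`class()` and `typeof()` answer different questions: `class()` reports how R treats an object, while `typeof()` reports how it is stored. A minimal sketch across the structures covered here (`m`, `df`, and `f` are illustrative names):

```r
m  <- matrix(1:4, nrow = 2)
df <- data.frame(x = 1:2)
f  <- factor(c("a", "b"))

class(df);  typeof(df)   # "data.frame", "list"
class(f);   typeof(f)    # "factor", "integer"
typeof(m)                # "integer"

is.list(df)     # TRUE: a data frame is a list of columns
is.vector(f)    # FALSE: attributes other than names make is.vector() FALSE
is.array(m)     # TRUE: a matrix is a 2-dimensional array
```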
Next Steps
Learn about Control Flow to add logic and loops to your R programs.