Collections

Collections in R
Author

Mugabi Trevor .L

Published

2024-09-04

Keywords

Lists, Vectors, DataFrames, Arrays, Matrices

Introduction to Data Structures

A data structure is a fundamental building block in computer science. It serves as a way to organize, store, and manipulate data within a program.
Think of data structures as the blueprints for how data is arranged and accessed efficiently. They not only store actual data values but also maintain information about relationships among those values.
Data structures determine how data is laid out in memory or stored. They also define the functions or operations that can be applied to the data (e.g., adding, searching, deleting).

Types of Data structures in R

This tutorial covers four data structures in R: vectors, lists, matrices, and data frames.

Vectors

A vector in R is a one-dimensional data structure that holds elements of the same type. Think of it as an ordered collection of values grouped together into a single container.
Each value in a vector is called a component, and vectors can store various types of data, including numeric values, logical values, or character strings.
Importantly, vectors are not recursive, meaning they can’t contain other vectors as elements. They’re simple and straightforward—one type of data, one dimension.

To create a vector in R, we often use the c() function (which stands for “combine” or “concatenate”). Here are some examples:

# Creating a character vector of employee names.
employees <- c("Sabby", "Cathy", "Lucy")
print(employees) # Output the entire employees vector
[1] "Sabby" "Cathy" "Lucy"
# Creating a vector of numeric type.
numbers <- c(1, 1, 2, 3, 4, 7, 9, 3)
print(numbers) # Output the entire numbers vector
[1] 1 1 2 3 4 7 9 3
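The "same type" and "not recursive" rules described above can be seen directly with c(). The following is a minimal sketch; the variable names mixed and nested are made up for illustration.

```r
# Mixing types: every element is coerced to the most flexible type present.
mixed <- c(1, "two", TRUE)
print(mixed)          # all three elements are now character strings
print(class(mixed))   # "character"

# Nesting c() calls does not put a vector inside a vector; the result is flat.
nested <- c(c(1, 2), c(3, 4))
print(nested)
print(length(nested)) # 4
```

If you need to keep values of different types side by side without coercion, a list (covered later in this tutorial) is the better fit.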

You can access vector elements by indexing. Each element in a vector has an index, starting from 1 (yes, R uses 1-based indexing), and you can use that index to retrieve the element.

For example, let's assume we have a vector of fruits and we would like to access the element “Apples” in it. See the code below:

# Define the fruit vector.
fruit_vec <- c("Oranges", "Pears", "Lemons", "Apples", "Strawberries")
# If you count, "Apples" is in the fourth position, so let's index the fourth position.
fruit_vec[4] # Output is "Apples" because it is in the fourth position.
'Apples'
# Say we are now interested in "Lemons" instead.
# Let's index the third position to pull it out of the vector.

fruit_vec[3] # Output: "Lemons"
'Lemons'
# Say we now want two or more items, for example from "Pears" to "Strawberries".
# We index the vector the same way; the only change is that we pass a range of positions.
# "Pears" is in the second position and "Strawberries" in the fifth, so we
# tell R we want the range 2 to 5, written with a colon ":" as 2:5
fruit_vec[2:5]
  1. 'Pears'
  2. 'Lemons'
  3. 'Apples'
  4. 'Strawberries'
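Positions do not have to be a single number or a continuous range. As a small sketch (the names first_and_fourth and without_first are made up for illustration), you can also pass a vector of positions or a negative position:

```r
fruit_vec <- c("Oranges", "Pears", "Lemons", "Apples", "Strawberries")

# Pick positions that are not next to each other by passing a vector of indices.
first_and_fourth <- fruit_vec[c(1, 4)]
print(first_and_fourth)

# A negative index drops that position instead of selecting it.
without_first <- fruit_vec[-1]
print(without_first)

# length() reports how many elements the vector holds.
print(length(fruit_vec))
```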

Lists

A list in R is an ordered collection of objects, known as its components. Unlike vectors or matrices, lists can contain elements of different modes or types. In other words, a list can hold a mix of numeric vectors, logical values, matrices, character arrays, functions, and more.
Think of a list as a flexible container where each component can be anything—an atomic value, a vector, or even another list.

To create a list in R, you use the list() function. You can include any number of components separated by commas within the function call. Here are some examples:

# Creating list with different data types
items <- list("Dog", 24, 32, 56)
print(items) # Output Entire list
[[1]]
[1] "Dog"

[[2]]
[1] 24

[[3]]
[1] 32

[[4]]
[1] 56
# Creating a list with similar data types (numeric)
list_1 <- list(24, 29, 32, 34)
print(list_1) # Output Entire list
[[1]]
[1] 24

[[2]]
[1] 29

[[3]]
[1] 32

[[4]]
[1] 34
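The examples above store single values only. To illustrate the point that a component can be a vector, a matrix, or even another list, here is a minimal sketch; the name sample_list and its contents are made up for illustration.

```r
# One list holding a numeric vector, a string, a matrix, and another list.
sample_list <- list(
  c(1.5, 2.5),              # a numeric vector
  "label",                  # a single character string
  matrix(1:4, nrow = 2),    # a 2x2 matrix
  list("inner", FALSE)      # a list nested inside the list
)

print(length(sample_list))  # number of top-level components
str(sample_list)            # compact summary of each component's type
```

str() is a convenient way to inspect a list without printing every value in full.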

You can access components of a list in two ways:

  1. By Names: Naming list components makes it easier to access them. Use the dollar sign ($) followed by the component name.

# By Name.
my_named_list <- list(name = "Chris",
                    age = 25, city = "NewYork")
print(my_named_list$name)  # Output: "Chris"
[1] "Chris"

NB: Naming components not only helps with readability but also simplifies access. You can use descriptive names for each component.

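  2. By Indices: As with vectors, you can access list components by position. Use double square brackets [[ ]] to get the component itself; single brackets [ ] return a smaller list that contains it.
# Assuming we have a list of employee names
emp_names <- list("Mike", "Bob", "John", "Annet")
print(emp_names[[2]])  # Output: "Bob"
[1] "Bob"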
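With lists, single and double brackets return different things, which is a common source of confusion. The sketch below uses a made-up list called person to show the difference:

```r
person <- list(name = "Chris", age = 25)

# Single brackets return a smaller list that still wraps the component.
as_sublist <- person["age"]
print(class(as_sublist))   # "list"

# Double brackets (or $) return the component itself.
as_value <- person[["age"]]
print(class(as_value))     # "numeric"

# Only the extracted value can be used directly in arithmetic.
print(as_value + 1)
```

As a rule of thumb, reach for [[ ]] or $ when you want one value out of a list, and [ ] when you want a shorter list.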

Use Cases:

  1. Storing complex data structures (e.g., nested lists for hierarchical data).
  2. Representing results from statistical models.
  3. Holding mixed data types when a data frame isn’t suitable.

NB: Lists are your Swiss Army knife in R. They’re flexible, adaptable, and essential for handling diverse data.

Matrices

A matrix in R is a rectangular arrangement of data with rows and columns. It’s similar to a vector but has an additional dimension attribute. Each element in a matrix is associated with both a row index and a column index.
Unlike lists, matrices are homogeneous, meaning they can only contain elements of the same data type (e.g., all numeric, all character strings, etc.).
Think of a matrix as a grid where each cell holds a value. Rows run horizontally, and columns run vertically.

To create a matrix, you use the matrix() function.

# Creating a 3x3 matrix with specified elements
my_matrix <- matrix(
  c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  nrow = 3,  # Number of rows
  ncol = 3,  # Number of columns
  byrow = TRUE  # Arrange elements by rows (optional)
)
print(my_matrix)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
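The byrow argument matters more than the word "optional" suggests: by default, matrix() fills column by column. A minimal sketch of the difference, using made-up names by_column and by_row:

```r
# Default (byrow = FALSE): values fill the first column, then the next.
by_column <- matrix(1:6, nrow = 2, ncol = 3)
print(by_column)

# byrow = TRUE: values fill the first row, then the next.
by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
print(by_row)

# Same values, different layout: compare row 1, column 2 in each.
print(by_column[1, 2])  # 3
print(by_row[1, 2])     # 2
```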

Accessing Matrix Elements: You can access individual elements using row and column indices, written as my_matrix[row, column].

# Accessing the element in the second row, third column
my_value <- my_matrix[2, 3]  # Value: 6
print(my_value)
[1] 6
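Leaving one of the two indices empty selects a whole row or column. The sketch below rebuilds the same 3x3 matrix with 1:9 for brevity; the names second_row, third_column, and top_left are made up for illustration.

```r
my_matrix <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE)

# Empty column index: take every column of row 2.
second_row <- my_matrix[2, ]
print(second_row)

# Empty row index: take every row of column 3.
third_column <- my_matrix[, 3]
print(third_column)

# Ranges work on both indices and return a smaller matrix.
top_left <- my_matrix[1:2, 1:2]
print(top_left)
```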

You can query information about a matrix using the functions below:

  1. Number of rows: nrow(my_matrix)
  2. Number of columns: ncol(my_matrix)
  3. Dimensions: dim(my_matrix)

# Finding Number of Rows of my_matrix.
nrow(my_matrix)
3
# Finding Number of Columns of my_matrix.
ncol(my_matrix)
3
# Finding the Dimension/Order of my_matrix.
dim(my_matrix)
  1. 3
  2. 3

Use case: Matrices are handy for holding tabular data of a single type, for mathematical operations (e.g., linear algebra), and for image processing.
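As a small illustration of the linear algebra use case, the sketch below uses base R operators on a made-up 2x2 matrix called a. Note that * multiplies cell by cell, while %*% performs matrix multiplication.

```r
a <- matrix(c(1, 2, 3, 4), nrow = 2, byrow = TRUE)

elementwise <- a * a    # multiplies matching cells
product <- a %*% a      # true matrix multiplication
transposed <- t(a)      # swaps rows and columns

print(elementwise)
print(product)
print(transposed)

# Multiplying by the identity matrix leaves a unchanged.
print(a %*% diag(2))
```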

Data Frames

A data frame in R is like a spreadsheet or a table—a two-dimensional structure where rows represent observations (cases) and columns represent variables (attributes).
Technically, data frames are lists of vectors of equal length. Each column can contain different types of variables (numeric, character, logical, etc.).
Think of a data frame as a versatile container for structured data, commonly used for data analysis, statistical modeling, and data visualization.

You can create a data frame using the data.frame() function. Here’s an example:

# Creating a simple data frame
# In this example, let's create a data frame with three columns: "Name", "Age", and "Score".

my_data_frame <- data.frame(
  Name = c("Alice", "Bob", "Charlie"), # notice this is a vector.
  Age = c(25, 30, 22), # notice this is a vector.
  Score = c(85, 92, 78) # notice this is a vector.
)
my_data_frame # Auto-prints at the console; inside a function or a script run with source(), use print(my_data_frame)
A data.frame: 3 × 3
Name Age Score
<chr> <dbl> <dbl>
Alice 25 85
Bob 30 92
Charlie 22 78
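The statement that a data frame is a list of equal-length vectors can be checked directly. This is a minimal sketch that assumes R 4.0 or later (where strings stay as character columns by default); the names scores and bad are made up for illustration.

```r
scores <- data.frame(
  Name = c("Alice", "Bob"),
  Score = c(85, 92)
)

print(is.list(scores))        # TRUE: a data frame is a list underneath
print(length(scores))         # 2: one list component per column
print(sapply(scores, class))  # the class of each column

# Columns must have compatible lengths; 3 values against 2 raises an error.
bad <- tryCatch(
  data.frame(a = 1:3, b = 1:2),
  error = function(e) "error"
)
print(bad)
```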

You can access columns (variables) from a data frame using different methods:

  1. By column name (using $ notation)
  2. By column index (using single brackets [ ])

# Using column name (using $ notation)
my_data_frame$Name
  1. 'Alice'
  2. 'Bob'
  3. 'Charlie'
# Using column index (using single brackets [ ])
my_data_frame[,1]
  1. 'Alice'
  2. 'Bob'
  3. 'Charlie'
# You can also access them using brackets and their column name
my_data_frame["Name"] # If you are coming from Python, this will look very familiar. It returns a one-column data frame.
A data.frame: 3 × 1
Name
<chr>
Alice
Bob
Charlie
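Notice that the outputs above differ in shape: some are plain vectors and one is a data frame. The sketch below rebuilds the same data frame and shows which form each bracket style returns; the names as_frame and as_vector are made up for illustration.

```r
my_data_frame <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 22),
  Score = c(85, 92, 78)
)

# Single brackets with a name keep the data frame structure.
as_frame <- my_data_frame["Age"]
print(class(as_frame))   # "data.frame"

# Double brackets (like $) pull out the underlying vector.
as_vector <- my_data_frame[["Age"]]
print(class(as_vector))  # "numeric"

# Functions that expect a vector, such as mean(), need the vector form.
print(mean(as_vector))
```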

Adding Rows and Columns

  1. To add new rows, use the rbind() function.

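# Adding a new row to my_data_frame
# Build the row as a one-row data frame so each column keeps its type.
# A plain c("David", 28, 91) would coerce every value to character.
new_row <- data.frame(Name = "David", Age = 28, Score = 91) # New row to add
updated_data_frame <- rbind(my_data_frame, new_row)
updated_data_frame # Remember you may need to wrap that in print().
A data.frame: 4 × 3
Name Age Score
<chr> <dbl> <dbl>
Alice 25 85
Bob 30 92
Charlie 22 78
David 28 91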
  2. To add new columns, use the cbind() function.
# Add a new column
new_column <- c(88, 76, 95) # New column to add
updated_data_frame <- cbind(my_data_frame, Exam2 = new_column)
updated_data_frame
A data.frame: 3 × 4
Name Age Score Exam2
<chr> <dbl> <dbl> <dbl>
Alice 25 85 88
Bob 30 92 76
Charlie 22 78 95

Removing Rows and Columns

To remove rows or columns, you can use indexing with a negative position, which drops that row or column.

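# Remove the second row (a negative index drops that position)
reduced_data_frame <- my_data_frame[-2, ]

# Remove the "Score" column (the third column). This starts again from
# my_data_frame, so it overwrites the result above and all three rows remain.
reduced_data_frame <- my_data_frame[, -3]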

reduced_data_frame
A data.frame: 3 × 2
Name Age
<chr> <dbl>
Alice 25
Bob 30
Charlie 22

NB: Data Frames have a lot more methods than we can cover in this simple tutorial. Find more about them in our R programming Course.

Bonus tip: The matrix functions dim(), nrow(), and ncol() also work on data frames.
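A quick sketch of the tip above, using a small made-up data frame called staff:

```r
staff <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 22)
)

print(dim(staff))   # rows first, then columns
print(nrow(staff))  # 3
print(ncol(staff))  # 2
```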

Exercise

  1. Lists: Create a list containing the following components
    • A vector of your favorite colors (e.g., “red,” “green,” “blue”).
    • Your age (as a numeric value).
    • A logical value indicating whether you like coffee (e.g., TRUE or FALSE).
  2. Vectors: Create a numeric vector representing the temperatures (in Celsius) for a week
    • Monday: 25°C
    • Tuesday: 28°C
    • Wednesday: 24°C
  3. Matrices: Create a 3x3 matrix representing a simple multiplication table
    • Each cell (i, j) contains the product of i and j.
  4. Data Frames: Create a data frame with information about fictional employees
    • Columns: “Name,” “Age,” “Salary.”
    • Add at least three rows of data