Mugabi Trevor .L
2024-09-04
Lists, Vectors, DataFrames, Arrays, Matrices
A data structure is a fundamental building block in computer science. It serves as a way to organize, store, and manipulate data within a program.
Think of data structures as the blueprints for how data is arranged and accessed efficiently. They not only store actual data values but also maintain information about relationships among those values.
Data structures determine how data is laid out in memory or stored. They also define the operations that can be applied to the data (e.g., adding, searching, deleting).
The main data structures in R are described below.
A vector in R is a one-dimensional data structure that holds elements of the same type. Think of it as an ordered collection of values grouped together into a single container.
Each value in a vector is called an element. A vector can be of various types, such as numeric, logical, or character, but any single vector holds only one type.
Importantly, atomic vectors are not recursive: they cannot contain other vectors as elements, and combining vectors with c() simply flattens them into one longer vector. They're simple and straightforward: one type of data, one dimension.
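A minimal sketch of both points, assuming R's standard coercion rules: mixing types in c() converts everything to the most flexible type present, and nesting c() calls produces one flat vector. The names `mixed` and `flat` are just illustrative.

```r
# Mixing types: R coerces every element to the most flexible type (character here)
mixed <- c(1, TRUE, "three")
print(mixed)         # "1" "TRUE" "three"
print(class(mixed))  # "character"

# Nesting c() calls does not create a vector of vectors; the result is flat
flat <- c(c(1, 2), c(3, 4))
print(flat)          # 1 2 3 4
print(length(flat))  # 4
```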
To create a vector in R, we often use the c() function (which stands for “combine” or “concatenate”). Here are some examples:
# Creating a character vector holding employee names
employees <- c("Sabby", "Cathy", "Lucy")
print(employees)  # Output: the entire employees vector
[1] "Sabby" "Cathy" "Lucy"
# Creating a vector of numeric type.
numbers <- c(1, 1, 2, 3, 4, 7, 9, 3)
print(numbers)  # Output: the entire numbers vector
[1] 1 1 2 3 4 7 9 3
You can access vector elements by indexing. Each element in a vector is associated with an index starting at 1 (yes, R uses 1-based indexing), and you access an element by putting its index in square brackets.
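To see what 1-based indexing means in practice, here is a small sketch with a hypothetical three-element vector `letters_vec`: index 1 is the first element, index 0 selects nothing, and an index past the end gives NA rather than an error.

```r
# Hypothetical three-element character vector
letters_vec <- c("a", "b", "c")

first  <- letters_vec[1]  # "a": the first element lives at index 1
none   <- letters_vec[0]  # character(0): index 0 selects nothing
beyond <- letters_vec[4]  # NA: there is no fourth element

print(first)
print(none)
print(beyond)
print(length(letters_vec))  # 3, which is also the last valid index
```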
For example, let's assume we have a vector of fruits and we would like to access the element "Apples" in it:
# Defining the fruit vector
fruit_vec <- c("Oranges", "Pears", "Lemons", "Apples", "Strawberries")
# Counting along, "Apples" is in the fourth position, so we index position 4.
fruit_vec[4]  # Output: "Apples"
# Say we are now interested in "Lemons" instead, which sits in the third position.
fruit_vec[3]  # Output: "Lemons"
# To get two or more items, for example from "Pears" to "Strawberries", index with a range.
# "Pears" is in the second position and "Strawberries" in the fifth, so we
# ask R for positions 2 to 5 with the colon operator: 2:5
fruit_vec[2:5]  # Output: "Pears" "Lemons" "Apples" "Strawberries"
A list in R is an ordered collection of objects, known as its components. Unlike vectors or matrices, lists can contain elements of different modes or types. In other words, a list can hold a mix of numeric vectors, logical values, matrices, character vectors, functions, and more.
Think of a list as a flexible container where each component can be anything—an atomic value, a vector, or even another list.
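As a sketch of that flexibility, the hypothetical list below holds a numeric vector, a logical value, a matrix, a function, and a nested list; `typeof()` reports the underlying type of each component.

```r
# Hypothetical list mixing very different kinds of components
mixed_list <- list(
  scores  = c(85, 92, 78),          # numeric vector
  passed  = TRUE,                   # single logical value
  grid    = matrix(1:4, nrow = 2),  # integer matrix
  square  = function(x) x^2,        # function
  details = list(course = "R 101")  # nested list
)

print(length(mixed_list))          # 5 components
print(sapply(mixed_list, typeof))  # "double" "logical" "integer" "closure" "list"
print(mixed_list$square(4))        # 16: the stored function can be called
print(mixed_list$details$course)   # "R 101"
```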
To create a list in R, you use the list() function. You can include any number of components separated by commas within the function call. Here are some examples:
# Creating list with different data types
items <- list("Dog", 24, 32, 56)
print(items)  # Output: the entire list
[[1]]
[1] "Dog"
[[2]]
[1] 24
[[3]]
[1] 32
[[4]]
[1] 56
# Creating a list with the same data type (numeric)
list_1 <- list(24, 29, 32, 34)
print(list_1)  # Output: the entire list
[[1]]
[1] 24
[[2]]
[1] 29
[[3]]
[1] 32
[[4]]
[1] 34
You can access components of a list in two ways:
1. By name: Naming list components makes it easier to access them. Use the dollar sign ($) followed by the component name.
# By Name.
my_named_list <- list(name = "Chris",
age = 25, city = "NewYork")
print(my_named_list$name)  # Output: "Chris"
[1] "Chris"
NB: Naming components not only helps with readability but also simplifies access. You can use descriptive names for each component.
2. By index: Use double square brackets [[ ]] with the component's position.
# Assuming we have a list of employees
emp_names <- list("Mike", "Bob", "John", "Annet")
print(emp_names[[2]])  # Output: "Bob"
[1] "Bob"
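One detail worth knowing when indexing lists: single brackets return a smaller list, while double brackets return the component itself. A short sketch, using a hypothetical `person` list:

```r
# Hypothetical named list
person <- list(name = "Chris", age = 25)

sub_list <- person["age"]    # single brackets: a list that contains the "age" component
value    <- person[["age"]]  # double brackets: the component itself

print(class(sub_list))  # "list"
print(class(value))     # "numeric"
print(person[[1]])      # "Chris": [[ ]] also accepts a position
```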
Use cases:
1. Storing complex data structures (e.g., nested lists for hierarchical data).
2. Representing results from statistical models.
3. Holding mixed data types when a data frame isn't suitable.
NB: Lists are the Swiss Army knife of R: they're flexible, adaptable, and essential for handling diverse data.
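To make the first two use cases concrete, here is a sketch: a hypothetical `settings` list nested two levels deep, and a linear model fitted with `lm()` on R's built-in `mtcars` data, whose result is itself a list.

```r
# Hypothetical hierarchical settings stored as nested lists
settings <- list(
  database = list(host = "localhost", port = 5432),
  logging  = list(level = "INFO")
)
print(settings$database$port)            # 5432
print(settings[["logging"]][["level"]])  # "INFO"

# Model results: lm() on the built-in mtcars data returns a list with class "lm"
fit <- lm(mpg ~ wt, data = mtcars)
print(is.list(fit))      # TRUE
print(class(fit))        # "lm"
print(fit$coefficients)  # intercept and slope, accessed like any list component
```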
A matrix in R is a rectangular arrangement of data with rows and columns. It’s similar to a vector but has an additional dimension attribute. Each element in a matrix is associated with both a row index and a column index.
Unlike lists, matrices are homogeneous, meaning they can only contain elements of the same data type (e.g., all numeric, all character strings, etc.).
Think of a matrix as a grid where each cell holds a value. Rows run horizontally, and columns run vertically.
To create a matrix, you use the matrix() function.
# Creating a 3x3 matrix with specified elements
my_matrix <- matrix(
c(1, 2, 3, 4, 5, 6, 7, 8, 9),
nrow = 3, # Number of rows
ncol = 3, # Number of columns
byrow = TRUE # Fill row by row (optional; the default, FALSE, fills column by column)
)
print(my_matrix)
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
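Because a matrix is a vector with a `dim` attribute, you can inspect both views. This sketch rebuilds the same 3x3 matrix (using `1:9` for brevity) and shows that R stores it column by column, even when it was filled with `byrow = TRUE`. The last lines show the homogeneity rule: mixed inputs are coerced to a single type.

```r
# Same values as my_matrix above, written as 1:9 (an integer vector) for brevity
my_matrix <- matrix(1:9, nrow = 3, byrow = TRUE)

print(dim(my_matrix))        # 3 3: the dim attribute that makes it a matrix
print(as.vector(my_matrix))  # 1 4 7 2 5 8 3 6 9: storage is column by column

# Homogeneity: mixing numbers, text and logicals coerces everything to character
char_matrix <- matrix(c(1, "a", TRUE, 2), nrow = 2)
print(typeof(char_matrix))   # "character"
```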
Accessing matrix elements: you can access individual elements using row and column indices in the form [row, column].
# Accessing the element in the second row, third column
my_value <- my_matrix[2, 3] # Value: 6
print(my_value)
[1] 6
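Leaving one index empty selects a whole row or column. A minimal sketch, again rebuilding the 3x3 matrix:

```r
my_matrix <- matrix(1:9, nrow = 3, byrow = TRUE)

second_row <- my_matrix[2, ]  # empty column index: the whole second row
third_col  <- my_matrix[, 3]  # empty row index: the whole third column

print(second_row)  # 4 5 6
print(third_col)   # 3 6 9
```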
You can query information about a matrix using the functions below:
1. Number of rows: nrow(my_matrix)
2. Number of columns: ncol(my_matrix)
3. Dimensions: dim(my_matrix)
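A quick check of these functions on a non-square matrix (2 rows, 3 columns), so the row and column counts are easy to tell apart:

```r
# A 2 x 3 matrix, so rows and columns are easy to tell apart
my_matrix <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, byrow = TRUE)

print(nrow(my_matrix))  # 2
print(ncol(my_matrix))  # 3
print(dim(my_matrix))   # 2 3 (rows first, then columns)
```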
Use case: Matrices are handy for representing data tables, mathematical operations (e.g., linear algebra), and image processing.
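For the linear-algebra use case, note that `*` multiplies element by element, while `%*%` performs true matrix multiplication and `t()` transposes. A small sketch with a hypothetical 2x2 matrix `a`:

```r
a <- matrix(c(1, 2, 3, 4), nrow = 2, byrow = TRUE)  # rows: (1, 2) and (3, 4)
identity_2 <- diag(2)                               # 2 x 2 identity matrix

product     <- a %*% identity_2  # matrix product: equal to a itself
elementwise <- a * a             # element-wise: rows (1, 4) and (9, 16)
transposed  <- t(a)              # transpose: rows (1, 3) and (2, 4)

print(product)
print(elementwise)
print(transposed)
```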
A data frame in R is like a spreadsheet or a table: a two-dimensional structure where rows represent observations (cases) and columns represent variables (attributes).
Technically, a data frame is a list of vectors of equal length, one vector per column. Each column holds a single type, but different columns can hold different types (numeric, character, logical, etc.).
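The sketch below checks these claims on a small hypothetical data frame, assuming R 4.0 or later (where text columns stay character by default): the data frame is a list with one element per column, each column has its own type, and columns of unequal length (3 and 2 here) are rejected with an error.

```r
# Hypothetical data frame; assumes R >= 4.0 (text columns stay character)
df <- data.frame(
  Name   = c("Alice", "Bob"),
  Age    = c(25, 30),
  Passed = c(TRUE, FALSE)
)

print(is.list(df))        # TRUE: a data frame is a list underneath
print(length(df))         # 3: one list element (vector) per column
print(sapply(df, class))  # Name "character", Age "numeric", Passed "logical"

# Columns must line up: lengths 3 and 2 cannot form a rectangular table
bad <- tryCatch(data.frame(a = 1:3, b = 1:2), error = function(e) e)
print(inherits(bad, "error"))  # TRUE
```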
Think of a data frame as a versatile container for structured data, commonly used for data analysis, statistical modeling, and data visualization.
You can create a data frame using the data.frame() function. Here’s an example:
# Creating a simple data frame
# In this example, let's create a data frame with three columns: "Name", "Age", and "Score".
my_data_frame <- data.frame(
Name = c("Alice", "Bob", "Charlie"), # notice this is a vector.
Age = c(25, 30, 22), # notice this is a vector.
Score = c(85, 92, 78) # notice this is a vector.
)
my_data_frame # Prints automatically at the console; inside a script or function, use print(my_data_frame)
| Name | Age | Score |
|---|---|---|
| <chr> | <dbl> | <dbl> |
| Alice | 25 | 85 |
| Bob | 30 | 92 |
| Charlie | 22 | 78 |
You can access columns (variables) from a data frame using different methods:
1. By column name (using $ notation)
2. By column index (using single brackets [ ])
# You can also access a column by putting its name in single brackets
my_data_frame["Name"] # If you come from Python's pandas, this syntax will look familiar.
| Name |
|---|
| <chr> |
| Alice |
| Bob |
| Charlie |
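The two methods listed above, plus double brackets, return slightly different things. A sketch that rebuilds `my_data_frame` so it runs on its own:

```r
my_data_frame <- data.frame(
  Name  = c("Alice", "Bob", "Charlie"),
  Age   = c(25, 30, 22),
  Score = c(85, 92, 78)
)

ages       <- my_data_frame$Age   # 1. by name with $: the column as a plain vector
second_df  <- my_data_frame[2]    # 2. by index with [ ]: a one-column data frame
second_vec <- my_data_frame[[2]]  # [[ ]] by index: the column as a plain vector

print(ages)                         # 25 30 22
print(class(second_df))             # "data.frame"
print(names(second_df))             # "Age"
print(identical(ages, second_vec))  # TRUE
```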
Adding Rows and Columns
1. To add new rows, use the rbind() function.
# Adding a new row to my_data_frame
# Build the row as a one-row data frame so each column keeps its type
new_row <- data.frame(Name = "David", Age = 28, Score = 91)
updated_data_frame <- rbind(my_data_frame, new_row)
updated_data_frame # Remember you may need to wrap this in print().
| Name | Age | Score |
|---|---|---|
| <chr> | <dbl> | <dbl> |
| Alice | 25 | 85 |
| Bob | 30 | 92 |
| Charlie | 22 | 78 |
| David | 28 | 91 |
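A common pitfall when adding rows: a vector made with c() holds only one type, so `c("David", 28)` turns 28 into the string "28", and binding such a vector into a data frame can convert its numeric columns to character. A sketch comparing that with a one-row data frame, using a hypothetical two-column data frame `people`:

```r
# Hypothetical two-column data frame
people <- data.frame(Name = c("Alice", "Bob"), Age = c(25, 30))

# c() holds a single type, so the number 28 becomes the string "28"
print(class(c("David", 28)))  # "character"

# Binding that vector converts the numeric Age column to character
risky <- rbind(people, c("David", 28))
print(class(risky$Age))  # "character"

# A one-row data frame keeps each value's own type
safe <- rbind(people, data.frame(Name = "David", Age = 28))
print(class(safe$Age))   # "numeric"
```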
2. To add new columns, use the cbind() function.
# Adding a new column to my_data_frame
new_column <- c(88, 76, 95) # New column to add
updated_data_frame <- cbind(my_data_frame, Exam2 = new_column)
updated_data_frame
| Name | Age | Score | Exam2 |
|---|---|---|---|
| <chr> | <dbl> | <dbl> | <dbl> |
| Alice | 25 | 85 | 88 |
| Bob | 30 | 92 | 76 |
| Charlie | 22 | 78 | 95 |
Removing Rows and Columns
1. To remove rows or columns, use negative indexing: a minus sign in front of a row or column position drops it.
# Remove the second row
no_second_row <- my_data_frame[-2, ]
# Remove the "Score" column (the third column)
reduced_data_frame <- my_data_frame[, -3]
reduced_data_frame
| Name | Age |
|---|---|
| <chr> | <dbl> |
| Alice | 25 |
| Bob | 30 |
| Charlie | 22 |
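The row removal above is never printed, so here is a sketch of its effect, together with dropping a column by name instead of by position (handy when you don't want to count columns):

```r
my_data_frame <- data.frame(
  Name  = c("Alice", "Bob", "Charlie"),
  Age   = c(25, 30, 22),
  Score = c(85, 92, 78)
)

# Negative row index: drop the second row (Bob)
no_bob <- my_data_frame[-2, ]
print(no_bob$Name)      # "Alice" "Charlie"

# Drop a column by name: keep every column whose name is not "Score"
no_score <- my_data_frame[, names(my_data_frame) != "Score"]
print(names(no_score))  # "Name" "Age"
```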
NB: Data Frames have a lot more methods than we can cover in this simple tutorial. Find more about them in our R programming Course.
Bonus tip: The matrix functions dim(), nrow(), and ncol() also work on data frames.
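A quick sketch confirming the tip on the three-column data frame from earlier (rebuilt here so the block runs on its own):

```r
my_data_frame <- data.frame(
  Name  = c("Alice", "Bob", "Charlie"),
  Age   = c(25, 30, 22),
  Score = c(85, 92, 78)
)

print(dim(my_data_frame))   # 3 3 (rows, columns)
print(nrow(my_data_frame))  # 3
print(ncol(my_data_frame))  # 3
```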