There are lots of great references out there to help orient you to R and R data structures. One of the best is the section on data structures in the Advanced R book by Hadley Wickham. Hadley provides numerous details on the differences among the structures and a lot of the nitty-gritty on each of them. This lesson is supposed to provide a “working knowledge” of data structures in R, but I strongly encourage you to dive in deeper.
Vectors
The basic form of data structure in R is the vector.
Vectors have the following structure:
sequence of numbers, characters, factors, strings
all elements in the sequence must be the same type
vectors are 1 dimensional
We create objects in R using = or <-. In the example below, we create a sequence of numerical values as object a. The c below means combine.
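A minimal sketch of creating a vector is shown here. The values and the second object b are made up for illustration.

```r
# Create a numeric vector with c() ("combine") and assign it to a
a <- c(1, 2, 3, 4, 5)

# The = operator assigns in the same way; b is a character vector
b = c("red", "green", "blue")

# Typing the object name prints its contents
a
b
```

Both operators work for assignment at the top level. Most R code you will read uses <-.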
There are a couple of ways we can look at the structure of objects that we create in R.
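Three base R functions that are commonly used for this are str(), class(), and length(). The vector a is recreated here so the example runs on its own.

```r
a <- c(1, 2, 3, 4, 5)

str(a)     # compact summary: type, length, and the first values
class(a)   # "numeric"
length(a)  # 5
```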
Once we create an object, we can use functions on that object.
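For example, a few base R functions applied to a small numeric vector (the values are illustrative):

```r
a <- c(1, 2, 3, 4, 5)

mean(a)  # 3
sum(a)   # 15
a * 2    # arithmetic is applied to each element: 2 4 6 8 10
```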
There are many classes of data that we can use in R.
Numeric: numbers
Characters: character
Strings: sequence of characters, whitespace is important
Factors: characters with an order
the order is set with the levels argument of factor() and can be viewed with levels()
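As a small sketch of a factor with an explicit order, assume a made-up vector of sizes called size:

```r
size <- c("small", "large", "medium", "small")

# Without the levels argument, the levels are sorted alphabetically
levels(factor(size))  # "large" "medium" "small"

# Supplying levels sets the order we actually want
size_f <- factor(size, levels = c("small", "medium", "large"))
levels(size_f)      # "small" "medium" "large"

# Each value is stored as the position of its level
as.integer(size_f)  # 1 3 2 1
```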
We can convert among classes
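The as.*() family of functions handles most conversions. The object names below are illustrative only.

```r
a <- c(1, 2, 3)

a_chr <- as.character(a)    # "1" "2" "3"
a_num <- as.numeric(a_chr)  # 1 2 3
a_fac <- as.factor(a_chr)   # factor with levels "1" "2" "3"

# Text that cannot be read as a number becomes NA (R also issues a warning,
# which is suppressed here to keep the output clean)
bad <- suppressWarnings(as.numeric(c("4", "five")))
bad  # 4 NA
```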
We can run functions that return a logical value (TRUE or FALSE) describing the type of data.
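The is.*() functions are the usual way to do this. A short sketch with two made-up vectors:

```r
a <- c(1, 2, 3)
b <- c("x", "y")

is.numeric(a)    # TRUE
is.character(a)  # FALSE
is.character(b)  # TRUE
is.factor(b)     # FALSE
```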
Matrices
Matrices have the following structure:
rows and columns of numbers, characters, or strings
like vectors, all the data in a matrix (every row and column) must be the same class
matrices are 2 dimensional (rows by columns)
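A minimal sketch of building a matrix with matrix(). The values are illustrative, and the second matrix shows what happens when types are mixed.

```r
# Fill a 2 x 3 matrix column by column with the values 1 to 6
m <- matrix(1:6, nrow = 2, ncol = 3)
m
dim(m)        # 2 3 (rows, columns)
is.matrix(m)  # TRUE

# Mixing numbers and text coerces everything to character
m2 <- matrix(c(1, "a", 3, 4), nrow = 2)
typeof(m2)    # "character"
```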
Like vectors, we can run functions on matrices. Several functions calculate column or row statistics, and there are also ways to run functions on the individual elements of a matrix.
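As a sketch, colMeans(), rowSums(), and apply() cover the row and column cases, while ordinary arithmetic works element by element. The matrix is made up for illustration.

```r
m <- matrix(1:6, nrow = 2, ncol = 3)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

colMeans(m)       # 1.5 3.5 5.5
rowSums(m)        # 9 12

# apply() runs any function over rows (1) or columns (2)
apply(m, 1, max)  # 5 6

# Arithmetic and many math functions act on every element
m * 10
sqrt(m)
```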
Data.frames
Data.frames are the most common type of data that you will likely work with in R. They are a lot like matrices, but they offer another level of flexibility. This is the type of data that is most often read in from other types of files (.csv, .tab, .txt).
Data.frames have the following structure:
rows by columns of numbers, characters, factors, strings
all values must be the same type WITHIN a column, but columns can differ from one another
data.frames are 2 dimensional (row by column)
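A small sketch of a data.frame built by hand. The column names and values are hypothetical, and stringsAsFactors = FALSE is set explicitly so the text column stays character regardless of R version.

```r
df <- data.frame(
  site  = c("A", "B", "C"),     # character column
  count = c(10, 25, 7),         # numeric column
  wet   = c(TRUE, FALSE, TRUE), # logical column
  stringsAsFactors = FALSE
)

str(df)          # one line per column, showing each column's type
dim(df)          # 3 3 (rows, columns)
class(df$site)   # "character"
class(df$count)  # "numeric"
```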
Indexing
There are a lot of newer packages (e.g., tidyr, dplyr) that make working with certain rows and columns a lot easier. However, they cannot do everything, and learning to work with indices (manipulating the rows and columns directly) will pay dividends when you work with your own data.
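A brief sketch of base R indexing with square brackets, using a made-up vector and data.frame. Vectors take a single index, while data.frames and matrices take [row, column].

```r
a  <- c(10, 20, 30, 40)
df <- data.frame(site = c("A", "B", "C"), count = c(10, 25, 7),
                 stringsAsFactors = FALSE)

a[2]         # 20 (second element)
a[c(1, 3)]   # 10 30 (first and third elements)
a[a > 15]    # 20 30 40 (elements that meet a condition)

df[1, ]             # first row, all columns
df[, "count"]       # the count column: 10 25 7
df$count[2]         # 25
df[df$count > 8, ]  # rows where count is greater than 8 (sites A and B)
```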
Lists
Lists are the most flexible data structure and can handle pretty much whatever you throw at them. While there may be some out there who have a better idea of what lists are for and how to use them, I tend to use them as a flexible structure to store data.
As you might expect, they are the loosest, or the wild west, when it comes to data types and storage. I tend to use lists when I am running simulations or loops.
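A minimal sketch of both uses follows. The names are hypothetical, and i^2 is only a stand-in for whatever a real simulation step would return.

```r
# A list can hold objects of different types and sizes
my_list <- list(
  numbers = c(1, 2, 3),
  label   = "run 1",
  table   = data.frame(x = 1:2)
)

my_list$label         # "run 1"
my_list[["numbers"]]  # 1 2 3

# Storing the output of each pass through a loop
results <- vector("list", 3)  # empty list with 3 slots
for (i in 1:3) {
  results[[i]] <- i^2         # placeholder for a simulation result
}
results[[3]]  # 9
```

Creating the empty list before the loop and filling it by position avoids growing the object on every iteration.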