Not all methods for loading data in R are made equal.
The data frame and its newer variants, the data table and the tibble, sit at the heart of most R analysis.
You can kind of think of data tables as data frames version 2 and tibbles as data frames version 3.
By the end of this video you'll know the key benefits of using tibbles over their earlier counterparts.
When R first came out with data frames, they were the most amazing structure for data analysis. A data frame is similar to a database table, except it works entirely in memory, so it performs significantly faster. Unlike Excel, which needs to perform calculations within each cell of a column, a data frame performs a calculation on the entire column at once, which can make it run hundreds of times faster.
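To make the whole-column idea concrete, here's a minimal sketch in base R. The `sales` data frame and its values are made up purely for illustration.

```r
# Small illustrative data frame (hypothetical values, not from a real dataset)
sales <- data.frame(
  units = c(10, 20, 30),
  price = c(2.5, 4.0, 1.5)
)

# One expression computes revenue for every row at once,
# with no cell-by-cell formula and no explicit loop
sales$revenue <- sales$units * sales$price

print(sales)
```

The multiplication is applied to both columns as whole vectors, which is the behaviour described above.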
Most books and courses still teach base R data frames since they are built directly into R without requiring any additional packages.
Data tables require you to install the package data.table, and tibbles require you to install the tidyverse (or just the tibble package on its own).
Each of these libraries was developed to address the shortcomings of the ones that came before it.
In short, tibbles:
- Are faster to load
- Do not distort column names or column types
- Automatically output a summary view, which stops you from killing your computer by accidentally printing a large dataset
- Provide a rich set of functions for working with your data
- Provide a powerful and intuitive syntax that is easier to understand, especially for complex logic
If you're using the tidyverse, you're actually automatically using tibbles for everything anyway.
This is because most tidyverse functions use a tibble as an input and output.
Since tibbles are basically an enhanced, extended version of a data frame, they are often converted back and forth between the two formats without you ever knowing.
This does, however, come at a loss of efficiency, as the data structure is converted back and forth, sometimes losing some details along the way.
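If you want to see that conversion happen explicitly, here's a small sketch. It assumes the tibble package is installed, and the `df` object is just a made-up example.

```r
library(tibble)

# A plain base R data frame (hypothetical example data)
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

# Convert to a tibble, then back to a plain data frame
tbl  <- as_tibble(df)
back <- as.data.frame(tbl)

class(tbl)          # includes "tbl_df" as well as "data.frame"
class(back)         # "data.frame"
is.data.frame(tbl)  # TRUE, because a tibble is still a data frame underneath
```

Because a tibble carries the data frame class too, functions written for data frames will usually accept one, which is why the switching can go unnoticed.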
The instant telltale sign that someone is using data frames instead of tibbles is the built-in R function read.csv, which loads data into a data frame.
You can tell by running the class function on your object, as you can see here.
Aside from that though:
- Notice how the column names are changed, with spaces replaced by dots
- Character columns get converted to factors instead of staying as characters (this was the default behaviour before R 4.0)
- All of the data is displayed when you output the results, which is unhelpful and slow when you have a very large dataset
- And load time, as we'll see soon, is relatively slow
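Here's a rough sketch of those points using a tiny throwaway CSV. The file contents and column names are invented for the example, and `stringsAsFactors = TRUE` is set explicitly to reproduce the pre-4.0 default on newer versions of R.

```r
# Write a tiny CSV to a temporary file (hypothetical sample data)
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "First Name,Favourite Colour",
  "Ada,blue",
  "Grace,green"
), csv_path)

# stringsAsFactors = TRUE mimics the default of R versions before 4.0
df <- read.csv(csv_path, stringsAsFactors = TRUE)

class(df)                   # "data.frame"
names(df)                   # "First.Name" "Favourite.Colour"
class(df$Favourite.Colour)  # "factor"
```

If you print `df` on a real dataset, every row is sent to the console (up to R's print limit), which is where the slowdown on large files comes from.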
The difference when loading a tibble is very subtle. Instead of using read.csv, which loads into a data frame, you use the function read_csv from the readr package (part of the tidyverse), which loads data into a tibble.
Notice here how the class type is different.
- Column names are left as is which makes reporting and matching back to source data much easier
- Data is displayed as a summary showing only the head, which makes it much quicker and easier to preview your data. If you want to see the full dataset, you can simply run it through the function View, which works much better for displaying all of the data than outputting it to the console.
- And, as you can see using the function system.time, this function runs significantly faster.
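As a minimal sketch of the tibble side, assuming the readr package is installed, the snippet below loads a tiny made-up CSV with read_csv and then times both loaders. The file and its column names are hypothetical.

```r
library(readr)

# Write a tiny CSV to a temporary file (hypothetical sample data)
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "First Name,Favourite Colour",
  "Ada,blue",
  "Grace,green"
), csv_path)

# suppressMessages() hides readr's column specification message
tbl <- suppressMessages(read_csv(csv_path))

class(tbl)                     # includes "tbl_df"
names(tbl)                     # "First Name" "Favourite Colour", left as is
class(tbl$`Favourite Colour`)  # "character"

# Compare load times; on a file this small the numbers are not meaningful
time_base  <- system.time(read.csv(csv_path))
time_readr <- system.time(suppressMessages(read_csv(csv_path)))
```

To see a real difference, point `csv_path` at a large file of your own; on a two-row file the timings tell you very little.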
So far we've covered benefits 1 to 3.
Although subtle, these differences alone will make a huge difference to your work.
The most exciting benefits, though, are points 4 and 5. These two benefits will effortlessly transform you into a data wrangling ninja. You'll be able to simply express complex logic that's much harder to write with base R data frames. There's way more than can be covered here, so we'll need to cover that in some other videos.
If you're keen to find out more, check out my R course at www.datastrategywithjonathan.com. Everything in my course focuses on the tidyverse, tibbles and all of the other best and latest standards not covered by older R training material.
I'll get you quickly into examples that you can instantly apply to get results, without forcing you through all of the theory first.
Sign up for your free sample course today and I look forward to seeing you on the inside.