Free R (programming language) software

Dplyr

One of the core packages of the tidyverse in the R programming language, dplyr is primarily a set of functions designed to enable dataframe manipulation in an intuitive, user-friendly way. Data analysts typically use dplyr in order to transform existing datasets into a format better suited for some particular type of analysis, or data visualization. For instance, someone seeking to analyze an enormous dataset may wish to only view a smaller subset of the data. Alternatively, a user may wish to rearrange the data in order to see the rows ranked by some numerical value, or even based on a combination of values from the original dataset. Authored primarily by Hadley Wickham, dplyr was launched in 2014. On the dplyr web page, the package is described as "a grammar of data manipulation, providing a consistent set of verbs that help you solve the most common data manipulation challenges." (Wikipedia).
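As a minimal sketch of the two tasks mentioned above (viewing a smaller subset of the data and ranking rows by a numerical value), the following R snippet uses dplyr's filter and arrange functions. The data frame `scores` and its columns are hypothetical and invented purely for illustration; the example assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical data, invented for illustration only
scores <- data.frame(
  name  = c("Ana", "Ben", "Cy", "Dee"),
  group = c("a", "b", "a", "b"),
  score = c(72, 91, 85, 64)
)

# Keep a smaller subset of rows: only scores of at least 70
passing <- filter(scores, score >= 70)

# Rank the remaining rows from highest to lowest score
ranked <- arrange(passing, desc(score))

print(ranked)
```

Each verb takes a data frame as its first argument and returns a new data frame, so the two steps could also be chained with a pipe.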

dplyr: Grouping

The group_by function in dplyr lets you use other functions like summarize and mutate on subgroups within a data frame rather than operating on entire columns all at once. This is useful for generating statistics and creating new variables according to the levels of categorical variables. Lin

From playlist dplyr in R

dplyr: Getting Started

This is the first video in a tutorial series covering the basics of the dplyr library in R for data manipulation and cleaning. I learned many of the practical coding skills used to make my videos taking courses on DataCamp: ► https://www.datacamp.com?tap_a=5644-dce66f&tap_s=777784-8ccc64&u

From playlist dplyr in R

Feature Engineering | Introduction to dplyr Part 4

In the final tutorial of the dplyr series, we will cover ways to do feature engineering both with dplyr (“mutate” and “transmute”) and base R (“ifelse”). You’ll learn how to impute missing values as well as create new values based on existing columns. In addition, we’ll go over four differ

From playlist Introduction to dplyr

Hands-on dplyr tutorial for faster data manipulation in R

dplyr is a new R package for data manipulation. Using a series of examples on a dataset you can download, this tutorial covers the five basic dplyr "verbs" as well as a dozen other dplyr functions. Watch the follow-up tutorial: http://youtu.be/2mh1PqfsXVI View the R Markdown document: htt

From playlist Data manipulation in R with dplyr

dplyr: summarize

The summarize (aka summarise) function in dplyr lets you create summary statistics from the columns of a data frame. When run on an ungrouped data frame, a summary of a column should be a single value like the mean, median, mode, etc. We will learn how to generate summary statistics on sub

From playlist dplyr in R

dplyr: mutate

The mutate function in dplyr lets you add new variables to an existing data frame. Documentation for mutate: https://dplyr.tidyverse.org/reference/mutate.html Link to the Kaggle Notebook code used for this video series: https://www.kaggle.com/hamelg/dplyr-in-r View the whole dplyr i

From playlist dplyr in R

dplyr: rename and arrange

The rename function in dplyr lets you change the names of the columns of a data frame using a simple name-based syntax. Once your variables have the desired names, you can sort the rows by a column of interest using the arrange function. Link to the Kaggle Notebook code used for this video se

From playlist dplyr in R

dplyr: gather and spread

The gather function (provided by the companion tidyr package rather than dplyr itself) lets you turn wide data into a long format, while the spread function lets you turn long data into a wide format. Data in R generally likes to be in a long format, so knowing how to gather messy data into a tall format if necessary is an important part of the da

From playlist dplyr in R

dplyr: Joins

Joins let you combine two data tables together based on a shared column that uniquely identifies the records, also known as a key column. When your data is spread out across multiple tables, you may need to perform one or more joins to get it all into one big table before doing other data

From playlist dplyr in R

Setup and Data Preparation | Introduction to dplyr Part 1

dplyr is a great tool to perform data manipulation. It makes your data analysis process a lot more efficient. Even better, it's fairly simple to learn and start applying immediately to your work! Oftentimes, with just a few elegant lines of code, your data becomes that much easier to dis

From playlist Introduction to dplyr

Reshape, Subset, and Summarize Data | Introduction to dplyr Part 2

We cover some basic functions of dplyr including the mighty group_by and summarize combo that makes dividing up datasets a breeze, as well as arrange, select, and filter that help get the data in a cleaner and more organized format. Group-by aggregation is one of the most powerful, yet sim

From playlist Introduction to dplyr

dplyr: filter

The filter function in dplyr lets you subset the rows of a data frame to get records that conform to logical criteria you specify. Filtering is a basic data cleaning and manipulation task that lets you cut your data down to view records of interest. Link to the Kaggle Notebook code used f

From playlist dplyr in R

dplyr: separate and unite

The separate function (provided by the companion tidyr package rather than dplyr itself) lets you split a character column into two or more new columns. The unite function performs the opposite operation, letting you collapse two or more existing columns into one character column. Link to the Kaggle Notebook code used for this video series: http

From playlist dplyr in R

dplyr: select

The select function in dplyr lets you subset the columns of a data frame to get variables of interest. Selecting is a basic data cleaning and manipulation task that, combined with filter, allows you to extract specific data of interest from a data set. Link to the Kaggle Notebook code use

From playlist dplyr in R

Data Manipulation with dplyr

dplyr is a great tool to perform data manipulation. It makes your data analysis process a lot more efficient. Even better, it’s fairly simple to learn and start applying immediately to your work! Oftentimes, with just a few elegant lines of code, your data becomes that much easier to dis

From playlist Short Crash Courses for Data Science & Data Engineering

Related pages

Tidyverse | R (programming language)