Wrangling Data in the Tidyverse

This course is part of the Tidyverse Skills for Data Science in R Specialization

Taught in English

Instructors: Carrie Wright, PhD

1,962 already enrolled

Included with Coursera Plus

Course

Gain insight into a topic and learn the fundamentals

4.5

(31 reviews)

14 hours (approximately)

Flexible schedule

Learn at your own pace

What you'll learn

Apply Tidyverse functions to transform non-tidy data to tidy data
Conduct basic exploratory data analysis
Conduct analyses of text data

Details to know

Shareable certificate

Add to your LinkedIn profile

Assessments

7 quizzes

Build your subject-matter expertise

This course is part of the Tidyverse Skills for Data Science in R Specialization

When you enroll in this course, you'll also be enrolled in this Specialization.

Learn new concepts from industry experts
Gain a foundational understanding of a subject or tool
Develop job-relevant skills with hands-on projects
Earn a shareable career certificate

Earn a career certificate

Add this credential to your LinkedIn profile, resume, or CV

Share it on social media and in your performance review

There are 6 modules in this course

Data never arrive in the condition you need for effective analysis. They need to be reshaped, rearranged, and reformatted so that they can be visualized or fed into a machine learning algorithm. This course addresses the problem of wrangling your data so that you can bring them under control and analyze them effectively. The key goal in data wrangling is transforming non-tidy data into tidy data.

This course covers many of the critical details about handling tidy and non-tidy data in R, such as converting from wide to long formats, manipulating tables with the dplyr package, understanding different R data types, processing text data with regular expressions, and conducting basic exploratory data analyses. Investing the time to learn these data wrangling techniques will make your analyses more efficient, more reproducible, and more understandable to your data science team. In this specialization, we assume familiarity with the R programming language. If you are not yet familiar with R, we suggest you complete R Programming first and then return to this course.

Data never arrive in the condition you need for effective analysis. They need to be reshaped, rearranged, and reformatted so that they can be visualized or fed into a machine learning algorithm. This module addresses the problem of wrangling your data so that you can bring them under control and analyze them effectively. The key goal in data wrangling is transforming non-tidy data into tidy data.
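As a rough illustration of what moving from non-tidy to tidy data can look like, the sketch below reshapes a small wide table into long format with pivot_longer() from the tidyr package. The table, its column names, and its values are made up for this example and are not taken from the course materials.

```r
library(tidyr)

# Hypothetical wide table: one row per state, one column per year.
wide <- data.frame(
  state = c("A", "B"),
  `2019` = c(10, 20),
  `2020` = c(11, 21),
  check.names = FALSE  # keep "2019" and "2020" as column names
)

# Reshape to long (tidy) format: one row per state-year observation.
long <- pivot_longer(
  wide,
  cols = c(`2019`, `2020`),
  names_to = "year",
  values_to = "value"
)

print(long)
```

In the long version, each row holds a single observation and each column holds a single variable, which is the layout that most tidyverse functions expect.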

What's included

19 readings, 2 quizzes

19 readings (total 155 minutes)

About This Course (3 minutes)
Tidy Data Review (2 minutes)
Reshaping Data (2 minutes)
Wide Data (5 minutes)
Long Data (5 minutes)
Reshaping Data (30 minutes)
Data Wrangling (0 minutes)
R Packages (15 minutes)
The Pipe Operator (15 minutes)
Filtering Data (20 minutes)
Reordering (15 minutes)
Creating New Columns (5 minutes)
Separating Columns (5 minutes)
Merging Columns (5 minutes)
Cleaning Column Names (5 minutes)
Combining Data Across Data Frames (5 minutes)
Grouping Data (5 minutes)
Summarizing Data (3 minutes)
Operations Across Columns (10 minutes)

2 quizzes (total 60 minutes)

Reshaping Data Quiz (30 minutes)
Data Wrangling Quiz (30 minutes)

In R, categorical data are handled as factors. By definition, categorical data can take only a fixed set of possible values. For example, there are 12 months in a calendar year, so in a month variable each observation is limited to one of these twelve values. With this limited set of possible values, month is a categorical variable. Categorical data, which will be referred to as factors for the rest of this lesson, appear regularly in real datasets. Learning how to work with this type of variable effectively will be incredibly helpful.
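The month example above can be sketched in base R as follows. This is a minimal illustration with made-up observations; it uses the built-in month.abb constant for the twelve allowed values and is not drawn from the course readings.

```r
# Month abbreviations observed in a small, made-up dataset.
months_observed <- c("Mar", "Jan", "Dec", "Jan", "Mar", "Mar")

# Declare all twelve allowed values so the factor knows the full set
# and orders the levels by calendar position rather than alphabetically.
month_factor <- factor(months_observed, levels = month.abb)

# The twelve possible values, in calendar order.
print(levels(month_factor))

# Counts per level, including months that never occur in the data.
month_counts <- table(month_factor)
print(month_counts)

# A value outside the allowed set cannot be represented and becomes NA.
invalid <- factor("Smarch", levels = month.abb)
print(is.na(invalid))
```

Declaring the levels up front is a design choice: it keeps unobserved months visible in summaries and surfaces unexpected values as NA instead of silently creating a new category.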

What's included

14 readings, 2 quizzes

14 readings (total 75 minutes)

Working with Factors (5 minutes)
Factor Review (5 minutes)
Manually Changing the Labels of Factor Levels: fct_relevel() (5 minutes)
Keeping the Order of the Factor Levels: fct_inorder() (5 minutes)
Advanced Factoring (5 minutes)
Re-ordering Factor Levels by Frequency: fct_infreq() (5 minutes)
Reversing Order Levels: fct_rev() (5 minutes)
Re-ordering Factor Levels by Another Variable: fct_reorder() (5 minutes)
Combining Several Levels into One: fct_recode() (5 minutes)
Converting Numeric Levels to Factors: ifelse() + factor() (5 minutes)
Dates and Times Basics (5 minutes)
Creating Dates and Date-Time Objects (10 minutes)
Working with Dates (5 minutes)
Time Spans (5 minutes)

2 quizzes (total 60 minutes)

Working With Factors Quiz (30 minutes)
Working With Dates Quiz (30 minutes)

Working with text data is increasingly common in data science projects. Text manipulation is often needed to clean up messy datasets and to create numerical measurements out of text input. In addition, the text itself is often the data, and this module covers tools to extract information from it.
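To give a sense of what creating numerical measurements out of text can mean, here is a small sketch using the stringr package. The example strings are invented for illustration, and the two measurements (word count and a keyword flag) are just one possible choice.

```r
library(stringr)

# Made-up free-text entries.
complaints <- c(
  "Late fee charged twice",
  "Account closed without notice",
  "fee dispute"
)

# Number of words: count runs of non-whitespace characters.
n_words <- str_count(complaints, "\\S+")

# Flag entries that mention the word "fee", ignoring case.
mentions_fee <- str_detect(str_to_lower(complaints), "\\bfee\\b")

print(data.frame(complaints, n_words, mentions_fee))
```

Both results are ordinary vectors, so they can be added as new columns to a data frame and summarized like any other variable.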

What's included

13 readings, 2 quizzes

13 readings (total 135 minutes)

Working with Strings (5 minutes)
stringr (5 minutes)
String Basics (15 minutes)
Regular Expressions (3 minutes)
glue (15 minutes)
Tidy Text Format (15 minutes)
Sentiment Analysis (15 minutes)
Word and Document Frequency (30 minutes)
Functional Programming (5 minutes)
For Loops vs. Functionals (2 minutes)
map Functions (5 minutes)
Multiple Vectors (15 minutes)
Anonymous Functions (5 minutes)

2 quizzes (total 60 minutes)

Working With Strings Quiz (30 minutes)
Functional Programming Quiz (30 minutes)

The goal of an exploratory analysis is to examine, or explore, the data and find relationships that weren’t previously known. Exploratory analyses explore how different measures might be related to each other but do not confirm that relationship as causal, i.e., one variable causing another. You’ve probably heard the phrase “Correlation does not imply causation,” and exploratory analyses lie at the root of this saying. Just because you observe a relationship between two variables during exploratory analysis does not mean that one necessarily causes the other.

What's included

2 readings

Now we will demonstrate how to import data using our case study examples. When working through the steps of the case studies, you can use either RStudio on your own computer or Coursera lab spaces provided for each case study.

What's included

11 readings, 2 ungraded labs

11 readings (total 180 minutes)

Case Studies (10 minutes)
Healthcare Coverage Data (20 minutes)
Healthcare Spending Data (20 minutes)
Join the Data (30 minutes)
Census Data (15 minutes)
Violent Crime (15 minutes)
Brady Scores (15 minutes)
The Counted Fatal Shootings (15 minutes)
Unemployment Data (15 minutes)
Population Density: 2015 (15 minutes)
Firearm Ownership (10 minutes)

2 ungraded labs (total 20 minutes)

Case Study #1: Health Expenditures (10 minutes)
Case Study #2: Firearms (10 minutes)

In this project, you will practice data exploration and data wrangling with the tidyverse using consumer complaint data from the Consumer Financial Protection Bureau (CFPB).

What's included

1 reading, 1 quiz

Instructors

Instructor ratings

4.6 (9 ratings)

Carrie Wright, PhD

Johns Hopkins University

7 Courses, 7,179 learners

Shannon Ellis, PhD

Johns Hopkins University

5 Courses, 5,606 learners

Stephanie Hicks, PhD

Johns Hopkins University

5 Courses, 5,606 learners

Offered by

Johns Hopkins University

Recommended if you're interested in Data Analysis

Visualizing Data in the Tidyverse (Course, Johns Hopkins University)
Importing Data in the Tidyverse (Course, Johns Hopkins University)
Modeling Data in the Tidyverse (Course, Johns Hopkins University)
Programming Reactive Systems (Scala 2 version) (Course, École Polytechnique Fédérale de Lausanne)

Learner reviews

4.5 (31 reviews)

5 stars: 68.75%
4 stars: 18.75%
3 stars: 9.37%
2 stars: 3.12%
1 star: 0%

Frequently asked questions

When will I have access to the lectures and assignments?

Access to lectures and assignments depends on your type of enrollment. If you take a course in audit mode, you will be able to see most course materials for free. To access graded assignments and to earn a Certificate, you will need to purchase the Certificate experience, during or after your audit. If you don't see the audit option:

The course may not offer an audit option. You can try a Free Trial instead, or apply for Financial Aid.
The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. This also means that you will not be able to purchase a Certificate experience.

What will I get if I subscribe to this Specialization?

When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile. If you only want to read and view the course content, you can audit the course for free.

What is the refund policy?

If you subscribed, you get a 7-day free trial during which you can cancel at no penalty. After that, we don’t give refunds, but you can cancel your subscription at any time. See our full refund policy.

Modules in this course

Wrangling Data in the Tidyverse
Working With Factors, Dates, and Times
Working With Strings and Text and Functional Programming
Exploratory Data Analysis
Case Studies
Project: Wrangling data in the Tidyverse
