Principles, Statistical and Computational Tools for Reproducible Data Science

Learn skills and tools that support data science and reproducible research, to ensure you can trust your own research results, reproduce them yourself, and communicate them to others.


Choose your session:

94,995 already enrolled!

Starts Nov 21

Ends Dec 14


Starts Dec 14




8 weeks

3–8 hours per week

Self-paced

Progress at your own speed

Free

Optional upgrade available

After a course session ends, it will be archived.

About this course


Today the principles and techniques of reproducible research are more important than ever, across disciplines as diverse as astrophysics and political science. No one wants to do research that can't be reproduced, so this course is for anyone doing data-intensive research. While many of us come from a biomedical background, the course is aimed at a broad audience of data scientists.

To meet the needs of the scientific community, this course examines the fundamentals of methods and tools for reproducible research. Led by experienced faculty from the Harvard T.H. Chan School of Public Health, the course is organized into six modules that include several case studies illustrating the significant impact of reproducible research methods on scientific discovery.

This course will appeal to students and professionals in biostatistics, computational biology, bioinformatics, and data science. The content blends video lectures, case studies, peer-to-peer engagement, and the use of computational tools and platforms (such as R/RStudio and Git/GitHub), culminating in the presentation of a final reproducible research project.

We’ll cover Fundamentals of Reproducible Science; Case Studies; Data Provenance; Statistical Methods for Reproducible Science; Computational Tools for Reproducible Science; and Reproducible Reporting Science. These concepts are intended to translate to fields throughout the data sciences: physical and life sciences, applied mathematics and statistics, and computing.

Consider this course a survey of best practices: we'd like to make you aware of pitfalls in reproducible data science, of past failure and success stories, and of tools and design patterns that can make it all easier. Ultimately, though, it will be up to you to take the skills you learn in this course, build your own environment in which you can easily carry out reproducible research, and encourage and integrate with similar environments among your collaborators and colleagues. We look forward to seeing you in this course and the research you do in the future!

At a glance

Institution: HarvardX
Subject: Data Analysis & Statistics
Level: Intermediate
Prerequisites:
- Basic knowledge of R and Git
- A computer capable of downloading and running software

Language: English
Video Transcript: English

What you'll learn


- Concepts, thought patterns, analysis paradigms, and computational and statistical tools that together support data science and reproducible research
- Fundamentals of reproducible science, using case studies that illustrate various practices
- Key elements for ensuring data provenance and reproducible experimental design
- Statistical methods for reproducible data analysis
- Computational tools for reproducible data analysis and version control (Git/GitHub, Emacs/RStudio/Spyder), reproducible data (data repositories/Dataverse), reproducible dynamic report generation (R Markdown/R Notebook/Jupyter/Pandoc), and workflows
- How to develop new methods and tools for reproducible research and reporting
- How to write your own reproducible paper

Syllabus


Module 1: Introduction to Reproducible Science

Module 2: Fundamentals of Reproducible Science

Definitions and Concepts
Factors affecting reproducibility

Module 3: Case Studies in Reproducible Research

Module 4: Data Provenance

Project Design
Journal Requirements
Repositories
Privacy and Security

Module 5: Computational Tools for Reproducible Science

R and RStudio
Python, Git, and GitHub
Creating a repository
Data sources
Dynamic report generation
Workflows

Module 6: An optional deeper dive into Statistical Methods for Reproducible Science

Prediction Models
Coefficient of determination
Brier score
Area Under the Curve (AUC)
Concordance in survival analysis
Cross-validation
Bootstrap
Simulations
Clustering
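The Module 6 topics are only listed here, so the following is a hedged illustration, not course material: a standard-library Python sketch of three of the named prediction-model metrics (coefficient of determination, Brier score, AUC) plus a percentile bootstrap interval. The function names `r_squared`, `brier_score`, `auc`, and `bootstrap_ci`, the seed value, and the toy data are all hypothetical, chosen only for this example. The pairwise AUC definition is also the idea behind concordance in survival analysis, which additionally has to handle censored times (not shown). Passing an explicit seed to a private random generator is what makes the bootstrap interval come out identical on every run.

```python
import random

def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probabilities and 0/1 outcomes (lower is better)."""
    if not y_true or len(y_true) != len(p_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)

def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative outcome")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def bootstrap_ci(y_true, p_pred, metric, n_boot=1000, alpha=0.05, seed=2022):
    """Percentile bootstrap interval for metric(y_true, p_pred).

    A fixed seed makes the interval identical on every run.
    """
    rng = random.Random(seed)  # private generator: no hidden global random state
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        try:
            stats.append(metric([y_true[i] for i in idx], [p_pred[i] for i in idx]))
        except ValueError:
            continue  # e.g. a resample with only one class, where AUC is undefined
    stats.sort()
    lo = stats[int(alpha / 2 * len(stats))]
    hi = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return lo, hi

# Toy data, invented purely for illustration.
y = [0, 0, 1, 1, 0, 1, 0, 1]
p = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.7]

print(brier_score(y, p))
print(auc(y, p))  # 14 of 16 (positive, negative) pairs ranked correctly -> 0.875
print(r_squared([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]))
print(bootstrap_ci(y, p, auc))
```

Skipping single-class resamples, rather than resampling within each class, is a simplifying choice for the sketch. In R, which the course uses, the analogous habit is calling `set.seed()` before any resampling or simulation.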

About the instructors

Who can take this course?

Unfortunately, learners residing in one or more of the following countries or regions will not be able to register for this course: Iran, Cuba and the Crimea region of Ukraine. While edX has sought licenses from the U.S. Office of Foreign Assets Control (OFAC) to offer our courses to learners in these countries and regions, the licenses we have received are not broad enough to allow us to offer this course in all locations. edX truly regrets that U.S. sanctions prevent us from offering all of our courses to everyone, no matter where they live.

Ways to take this course

Choose your path when you enroll.


	Verified Track	Audit Track
Price	$99 USD	Free
Access to course materials	Unlimited	Limited (expires Jan 16, 2023)
World class institutions and universities
edX support
Shareable certificate upon completion
Graded assignments and exams

Read our FAQs about these tracks.

Interested in this course for your business or team?

Train your employees in the most in-demand topics, with edX For Business.

