DataPrep V0.4.4 is out now!


Data Preparation


Collect, clean, and visualize your data in Python with a few lines of code.
    # Explore: generate an EDA report for a sample dataset
    from dataprep.datasets import load_dataset
    from dataprep.eda import create_report
    df = load_dataset("titanic")
    create_report(df).show()

    # Collect: query a web API (client_id and client_secret are your own API credentials)
    from dataprep.connector import connect
    dc = connect("twitter", _auth={"client_id": client_id, "client_secret": client_secret})
    df = await dc.query("twitter", q="covid-19", _count=1000)

    # Clean: standardize the addresses in a column
    from dataprep.datasets import load_dataset
    from dataprep.clean import clean_address
    df = load_dataset("waste_hauler")
    clean_address(df, "LOCAL ADDRESS")

Think different.

"We were disappointed, if not surprised, to see that data wrangling still takes the lion’s share of time in a typical data professional’s day. Data preparation and cleansing takes valuable time away from real data science work and has a negative impact on overall job satisfaction."

2020 State of Data Science: Moving From Hype Toward Maturity, Anaconda


Designed for Notebook Users

DataPrep is designed for computational notebooks, the most popular environment among data scientists.


Integrate Seamlessly with the Python Ecosystem

DataPrep is built on Pandas and Dask DataFrames and integrates seamlessly with other Python libraries.
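
As a rough illustration of that interoperability, the sketch below builds an ordinary Pandas DataFrame, filters it with plain Pandas, and only then hands it to DataPrep. The sample data, the column names, and the helper explore are made up for this example. plot is imported inside the helper so that the Pandas part runs even where DataPrep is not installed.

```python
import pandas as pd

# Hypothetical sample data; any Pandas DataFrame would do.
df = pd.DataFrame(
    {
        "age": [22, 38, 26, 35, None],
        "fare": [7.25, 71.28, 7.92, 53.10, 8.05],
        "survived": [0, 1, 1, 1, 0],
    }
)

# Ordinary Pandas preprocessing before handing the data to DataPrep.
# Rows with a missing age fail the comparison and are dropped.
adults = df[df["age"] >= 18].reset_index(drop=True)


def explore(frame: pd.DataFrame):
    """Return DataPrep's overview of every column in a Pandas DataFrame."""
    from dataprep.eda import plot  # requires `pip install dataprep`

    return plot(frame)


# In a notebook, the returned object renders inline:
# explore(adults)
```

Because the input is a regular DataFrame, any Pandas step (filtering, joining, renaming) can run before the DataPrep call.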


Embrace Open Source

DataPrep is free, open-source software released under the MIT license. Anyone can reuse DataPrep code for any purpose.


DataPrep.EDA is the fastest and easiest EDA tool in Python. It lets data scientists understand a Pandas/Dask DataFrame in seconds with a few lines of code.