The easiest way to prepare data in Python
  from dataprep.datasets import load_dataset
  from dataprep.eda import create_report
  df = load_dataset("titanic")
  create_report(df).show()

  from dataprep.connector import connect
  dc = connect("twitter", _auth={"client_id": client_id, "client_secret": client_secret})
  df = await dc.query("twitter", q="covid-19", _count=1000)
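
The connector snippet uses a bare await, which works in a notebook cell (IPython supports top-level await) but not in a plain Python script. A minimal sketch of the same query wrapped for script use follows. The helper name fetch_tweets and its parameters are hypothetical, introduced here for illustration only; the connect and query calls are copied from the snippet above.

```python
import asyncio


async def fetch_tweets(client_id, client_secret, query="covid-19", count=1000):
    """Run the connector query from the snippet above inside a coroutine.

    fetch_tweets is a hypothetical helper, not part of DataPrep.
    """
    # Imported lazily so this module can be loaded before dataprep is installed.
    from dataprep.connector import connect

    dc = connect(
        "twitter",
        _auth={"client_id": client_id, "client_secret": client_secret},
    )
    # Same call as in the snippet above, with the search term and row count
    # passed in as arguments.
    return await dc.query("twitter", q=query, _count=count)


# In a script, drive the coroutine with asyncio.run (credentials are yours to supply):
# df = asyncio.run(fetch_tweets(client_id, client_secret))
```

In a notebook the wrapper is unnecessary: the original three lines can be run as they are.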

Think different.

"We were disappointed, if not surprised, to see that data wrangling still takes the lion’s share of time in a typical data professional’s day. Data preparation and cleansing takes valuable time away from real data science work and has a negative impact on overall job satisfaction."

2020 State of Data Science: Moving From Hype Toward Maturity, Anaconda

Designed for Notebook Users

DataPrep is designed for computational notebooks, the most popular environment among data scientists.


Integrate Seamlessly with the Python Ecosystem

DataPrep is built on Pandas and Dask DataFrames and integrates seamlessly with other Python libraries.

Embrace Open Source

DataPrep is free, open-source software released under the MIT license. Anyone can reuse DataPrep code for any purpose.


DataPrep.EDA is the fastest and easiest EDA tool in Python. With a few lines of code, data scientists can understand a Pandas/Dask DataFrame in seconds.

  from dataprep.datasets import load_dataset
  from dataprep.eda import plot
  df = load_dataset("titanic")
  plot(df, "Age")
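
The plot call above looks at one column. dataprep.eda's plot can also be called with no column (a dataset overview) or with two columns (their relationship), and the module provides plot_correlation and plot_missing as well. The sketch below groups these calls in one place. The helper name quick_eda and the dictionary it returns are hypothetical, and the "Fare" column used in the usage comment is an assumption about the Titanic dataset.

```python
def quick_eda(df, x=None, y=None):
    """Collect a few DataPrep.EDA reports for a Pandas/Dask DataFrame.

    quick_eda is a hypothetical convenience wrapper, not part of DataPrep.
    """
    # Imported lazily so this module can be loaded before dataprep is installed.
    from dataprep.eda import plot, plot_correlation, plot_missing

    if x is None:
        main = plot(df)        # overview of every column
    elif y is None:
        main = plot(df, x)     # one column, as in plot(df, "Age")
    else:
        main = plot(df, x, y)  # relationship between two columns

    return {
        "plot": main,
        "correlation": plot_correlation(df),  # correlations between columns
        "missing": plot_missing(df),          # where values are missing
    }


# Usage, assuming the Titanic dataset has a "Fare" column:
# from dataprep.datasets import load_dataset
# reports = quick_eda(load_dataset("titanic"), "Age", "Fare")
```

In a notebook, evaluating one of the returned objects as the last expression in a cell should render it inline.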