The easiest way to prepare data in Python
    # Generate an EDA report for the bundled Titanic dataset
    from dataprep.datasets import load_dataset
    from dataprep.eda import create_report
    df = load_dataset("titanic")
    create_report(df).show()

    # Query the YouTube API through DataPrep.Connector (top-level await works in notebooks)
    from dataprep.connector import connect
    auth_token = "<your_access_token>"
    dc = connect("youtube", _auth={"access_token": auth_token})
    df = await dc.query("videos", q="Data Science", part="snippet", type="videos", _count=40)

Designed for Notebook Users

DataPrep is designed for computational notebooks, the most popular environment among data scientists.
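
The connector examples on this page use a top-level `await`, which Jupyter and IPython accept but a plain Python script does not. Outside a notebook, the same call can be wrapped in a coroutine and driven by `asyncio.run`. The sketch below shows only that pattern: `fetch_rows` is a hypothetical stand-in for `dc.query(...)` and returns made-up rows, so it runs without DataPrep or network access.

```python
import asyncio


async def fetch_rows():
    # Hypothetical stand-in for `await dc.query(...)`; it only simulates
    # an asynchronous call and returns made-up rows.
    await asyncio.sleep(0)
    return [{"title": "example row"}]


async def main():
    # Inside a coroutine, `await` is valid in any Python environment.
    return await fetch_rows()


# asyncio.run starts an event loop, runs main() to completion, then closes the loop.
rows = asyncio.run(main())
print(len(rows))
```

In a notebook this wrapper is unnecessary, because the notebook kernel already runs an event loop.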


Integrate Seamlessly with the Python Ecosystem

DataPrep is built using Pandas/Dask DataFrame and can be seamlessly integrated with other Python libraries.


Embrace Open Source

DataPrep is free, open-source software released under the MIT license. Anyone can reuse DataPrep code for any purpose.


According to the 2020 State of Data Science survey by Anaconda, data preparation still takes the majority of time in a typical data professional’s day. To solve this issue in the next decade, we have to THINK DIFFERENT.

    from dataprep.connector import connect
    dc = connect("dblp")
    df = await dc.query("publication", q="lee")
    df.head()
    0 | [Dong-Uk Lee, Ho Sung Cho, Jihwan Kim, Young J... | 22.3 A 128Gb 8-High 512GB/s HBM2E DRAM with a ... | [ISSCC] | 334-336 | 2020 | Conference and Workshop Papers | 10.1109/ISSCC19947.2020.9062977
    1 | [Heon-Cheol Lee, Seung-Hee Lee, Seung-Hwan Lee... | Comparison and analysis of scan matching techn... | [URAI] | 165-168 | 2011 | Conference and Workshop Papers | 10.1109/URAI.2011.6145953
    2 | [Hyun-Woo Lee, Won-Joo Yun, Young-Kyoung Choi,... | A 1.6V 3.3Gb/s GDDR3 DRAM with dual-mode phase... | [ISSCC] | 140-141 | 2009 | Conference and Workshop Papers | 10.1109/ISSCC.2009.4977347
    3 | [Dae-Won Lee, Kwang-Sik Chung, Hwa-Min Lee, Su... | Managing Fault Tolerance Information in Multi-... | [IDEAL] | 104-108 | 2003 | Conference and Workshop Papers | 10.1007/978-3-540-45080-1_15
    4 | [Jang-Woo Lee, Dae-Hoon Na, Anil Kavala, Hwasu... | A 1.8 Gb/s/pin 16Tb NAND Flash Memory Multi-Ch... | [VLSI Circuits] | 1-2 | 2020 | Conference and Workshop Papers | 10.1109/VLSICIRCUITS18222.2020.9163052


DataPrep.Connector is an intuitive, open-source API wrapper that speeds up development by standardizing calls to multiple web APIs into one simple workflow, so a single library covers all of them.
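
To make "standardizing calls" concrete, here is a minimal sketch of the general idea, not DataPrep.Connector's actual implementation. A per-API configuration maps one uniform call signature onto each service's own URL and parameter names. The `API_CONFIGS` layout, the `build_url` helper, and the `example_api` service are all hypothetical; the dblp search URL is included only as an assumed illustration.

```python
from urllib.parse import urlencode

# Hypothetical per-API configuration: endpoint URL, a mapping from the
# wrapper's parameter names to the API's own names, and fixed parameters.
API_CONFIGS = {
    "dblp": {
        "publication": {
            "url": "https://dblp.org/search/publ/api",
            "params": {"q": "q"},
            "fixed": {"format": "json"},
        }
    },
    "example_api": {  # made-up service, for illustration only
        "items": {
            "url": "https://api.example.com/v1/items",
            "params": {"q": "search"},
            "fixed": {},
        }
    },
}


def build_url(api, table, **query):
    """Translate one uniform call into the URL a specific API expects."""
    config = API_CONFIGS[api][table]
    params = dict(config["fixed"])
    for name, value in query.items():
        # Rename the uniform parameter to the API-specific one.
        params[config["params"][name]] = value
    return f"{config['url']}?{urlencode(params)}"


print(build_url("dblp", "publication", q="lee"))
print(build_url("example_api", "items", q="lee"))
```

The caller writes `q="lee"` in both cases; only the configuration knows that the second service calls the same parameter `search`.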


DataPrep.EDA is the fastest and easiest exploratory data analysis (EDA) tool in Python. It lets data scientists understand a Pandas/Dask DataFrame in seconds with a few lines of code.

    from dataprep.datasets import load_dataset
    from dataprep.eda import plot
    df = load_dataset("titanic")
    plot(df, "Age")
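
For readers new to EDA, the kind of information a single-column analysis surfaces can be sketched with the standard library alone. This is an illustration of the concept, not what `plot` computes internally: `summarize_column` is a hypothetical helper, and the `ages` values are made up rather than taken from the Titanic dataset.

```python
import statistics


def summarize_column(values):
    """Return basic univariate statistics, skipping missing (None) entries."""
    present = [v for v in values if v is not None]
    q1, median, q3 = statistics.quantiles(present, n=4)
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "min": min(present),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(present),
    }


# Made-up sample with two missing entries, for illustration only.
ages = [22, 38, None, 26, 35, None, 54, 2]
summary = summarize_column(ages)
print(summary)
```

A dedicated EDA tool adds visualizations such as histograms and box plots on top of numbers like these.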