DataPrep V0.4.4 is out now!

Low-Code

Data Preparation

Collect, clean, and visualize your data in Python with a few lines of code

dataprep_eda.py
  from dataprep.datasets import load_dataset
  from dataprep.eda import create_report
  # Load a bundled sample dataset and render a full EDA report.
  df = load_dataset("titanic")
  create_report(df).show()
dataprep_connector.py
  from dataprep.connector import connect
  # client_id and client_secret are placeholders for your own API credentials.
  dc = connect("twitter", _auth={"client_id": client_id, "client_secret": client_secret})
  # Top-level await works in a notebook; in a script, run this inside an async function.
  df = await dc.query("twitter", q="covid-19", _count=1000)
dataprep_clean.py
  from dataprep.datasets import load_dataset
  from dataprep.clean import clean_address
  # Standardize the addresses stored in the "LOCAL ADDRESS" column.
  df = load_dataset("waste_hauler")
  clean_address(df, "LOCAL ADDRESS")

Think different.

"We were disappointed, if not surprised, to see that data wrangling still takes the lion’s share of time in a typical data professional’s day. Data preparation and cleansing takes valuable time away from real data science work and has a negative impact on overall job satisfaction."

2020 State of Data Science: Moving From Hype Toward Maturity, Anaconda

Notebooks

Designed for Notebook Users

DataPrep is designed for computational notebooks, the most popular environment among data scientists.

Libraries

Integrate Seamlessly with the Python Ecosystem

DataPrep is built using Pandas/Dask DataFrame and can be seamlessly integrated with other Python libraries.

Embrace Open Source

DataPrep is free, open-source software released under the MIT license. Anyone can reuse DataPrep code for any purpose.

DataPrep.EDA

DataPrep.EDA is designed to be the fastest and easiest EDA tool in Python. It lets data scientists understand a Pandas/Dask DataFrame in seconds with a few lines of code.
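As a rough illustration of the "few lines of code" idea, the sketch below routes a DataFrame to one of three `dataprep.eda` functions (`plot`, `plot_correlation`, `plot_missing`). The `EDA_VIEWS` mapping, the view labels, and the `run_view` helper are hypothetical names introduced here for illustration; they are not part of DataPrep.

```python
import importlib

# Hypothetical view labels mapped to function names in dataprep.eda.
EDA_VIEWS = {
    "overview": "plot",
    "correlation": "plot_correlation",
    "missing": "plot_missing",
}


def run_view(df, view="overview"):
    """Call the dataprep.eda function registered for `view` on `df`."""
    if view not in EDA_VIEWS:
        raise ValueError(f"unknown view {view!r}; expected one of {sorted(EDA_VIEWS)}")
    # Imported lazily so the module loads even where dataprep is not installed.
    eda = importlib.import_module("dataprep.eda")
    return getattr(eda, EDA_VIEWS[view])(df)
```

In a notebook with DataPrep installed, `run_view(df, "missing")` would be equivalent to calling `plot_missing(df)` directly.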

DataPrep.Clean

DataPrep.Clean aims to provide a large number of functions with a unified interface for cleaning and standardizing data of various semantic types in a Pandas or Dask DataFrame.
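One way to picture the unified interface: the cleaning functions share a `clean_<type>(df, column)` shape, so a small loop can apply several of them in turn. The sketch below assumes that shape for `clean_address`, `clean_country`, `clean_email`, and `clean_phone`, and assumes each call returns a DataFrame that can be passed to the next. `SUPPORTED_TYPES`, `clean_columns`, and the `plan` argument are hypothetical helpers, not DataPrep API.

```python
import importlib

# Semantic types assumed for this sketch; each maps to dataprep.clean.clean_<type>.
SUPPORTED_TYPES = {"address", "country", "email", "phone"}


def clean_columns(df, plan):
    """Apply clean_<type>(df, column) for every column -> type pair in `plan`."""
    unknown = sorted(set(plan.values()) - SUPPORTED_TYPES)
    if unknown:
        raise ValueError(f"unsupported semantic types: {unknown}")
    if not plan:
        return df  # nothing to clean, so dataprep is never imported
    clean = importlib.import_module("dataprep.clean")
    for column, semantic_type in plan.items():
        cleaner = getattr(clean, f"clean_{semantic_type}")
        df = cleaner(df, column)
    return df
```

With the `waste_hauler` sample dataset, `clean_columns(df, {"LOCAL ADDRESS": "address"})` would amount to a single `clean_address(df, "LOCAL ADDRESS")` call.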

DataPrep.Connector

DataPrep.Connector provides an intuitive, open-source API wrapper that speeds up development by standardizing calls to multiple APIs into one simple workflow.


DataPrep.Connector also supports loading data from databases through SQL queries. With one line of code, you can speed up pandas.read_sql by 10X while using 3X less memory!
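A minimal sketch of the database path, assuming the `read_sql(conn_url, query, partition_on=..., partition_num=...)` function exposed by `dataprep.connector`. The `postgres_url` and `load_query` helpers, and any table or column names passed to them, are hypothetical and only illustrate the call pattern.

```python
from urllib.parse import quote


def postgres_url(user, password, host, port, database):
    """Build a PostgreSQL connection URL, percent-encoding the credentials."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


def load_query(conn_url, query, partition_on=None, partition_num=None):
    """Run `query` through dataprep.connector.read_sql and return the result."""
    # Imported lazily so this module loads even where dataprep is not installed.
    from dataprep.connector import read_sql

    # Only pass the partitioning options the caller actually set.
    options = {}
    if partition_on is not None:
        options["partition_on"] = partition_on
    if partition_num is not None:
        options["partition_num"] = partition_num
    return read_sql(conn_url, query, **options)
```

A call such as `load_query(postgres_url("user", "secret", "localhost", 5432, "shop"), "SELECT * FROM orders", partition_on="order_id", partition_num=4)` would ask the connector to split the read on a numeric column. Actual speed and memory gains depend on the database and the data.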

DataPrep Components

DataPrep.Connector: Available to use

DataPrep.EDA: Available to use

DataPrep.Clean: Available to use

DataPrep.Feature: Planning

DataPrep.Integrate: Planning

... and more

Get started instantly

pip install -U dataprep

Then check out the documentation and examples!

Contribution

There are many ways to contribute to DataPrep:

Getting Started

Learning DataPrep is easy whether you are a data scientist or a beginner in Python:

© 2022 SFU Database System Lab. MIT Licensed.