Pandas AI: The Generative AI Python Library

Image courtesy of the editor

Pandas is an open-source Python toolkit that gives data scientists and analysts data manipulation and analysis capabilities. The library is widely used in the preprocessing stage of machine learning and deep learning projects. But now you can do more with it…

Enter a new data science library: Pandas AI, a Python library that integrates generative artificial intelligence capabilities into pandas, making data frames conversational.

What does it mean to make data frames conversational?

It means exactly what it says: you can talk to your data. Yes, you heard that right: you can ask your data questions and get quick answers. As a data scientist or analyst, you no longer need to spend endless hours staring at your data, going through rows and columns. Pandas AI doesn’t replace pandas; it just gives it a big boost.

Data scientists and analysts spend a lot of time cleaning data during the analysis phase. Data professionals have been exploring different methods and processes to minimize the time spent on data preparation, and Pandas AI now gives them one more option for taking their data analysis to the next level.

Pandas AI should be used hand in hand with pandas; it does not replace it. Instead of going through the dataset and answering questions about it yourself, you can put those questions to Pandas AI, and it will provide answers in the form of pandas DataFrames.

This means that people no longer need to be as proficient in Python to perform data analysis with tools such as the pandas library.

With the help of the OpenAI API, Pandas AI aims to let you virtually talk to the machine to get the results you want, rather than programming the task yourself. The machine then outputs the result in its own language: machine-interpretable code (a DataFrame).

Installing Pandas AI via pip

pip install pandasai

Importing PandasAI with OpenAI

You need an OpenAI key to use the new Pandas AI library. When you start in your notebook, you should enter the following:

import pandas as pd
from pandasai import PandasAI
from pandasai.llm.openai import OpenAI

llm = OpenAI(api_token="YOUR_API_KEY")  # replace the placeholder with your own OpenAI API key

If you do not have an OpenAI API key, you can create an account on the OpenAI platform and generate one there. At the time of writing, new accounts receive a $5 credit that can be used to explore and test the API.
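Rather than pasting the key directly into a notebook, one common pattern is to read it from an environment variable. The sketch below assumes the key is stored in a variable named OPENAI_API_KEY; the helper load_api_key is a hypothetical name used for illustration and is not part of Pandas AI.

```python
import os


def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    `load_api_key` is a hypothetical helper, not part of Pandas AI.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key


# Usage (assumes the variable has been exported in your shell):
# llm = OpenAI(api_token=load_api_key())
```

This keeps the key out of the notebook itself, which matters if you share or publish it.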

Once you have everything set up, you are ready to start using Pandas AI.

Running a model on your data frame

First, pass your OpenAI model to Pandas AI.

pandas_ai = PandasAI(llm)

Then run the model on the data frame. The call takes two parameters: the data frame you are working with and the question you want to ask.

pandas_ai.run(df, prompt="the question you would like to ask?")

For example, you may be looking at your data frame and be interested in the rows where a column's value is greater than 5. You can ask Pandas AI for exactly that. The complete example below asks a different question of a small sample data frame:

import pandas as pd
from pandasai import PandasAI

# Sample DataFrame
df = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"],
    "gdp": [19294482071552, 2891615567872, 2411255037952, 3435817336832, 1745433788416, 1181205135360, 1607402389504, 1490967855104, 4380756541440, 14631844184064],
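    "happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12]
})

# Instantiate an LLM (pass api_token=... as shown earlier if your key is not configured elsewhere)
from pandasai.llm.openai import OpenAI
llm = OpenAI()

# Wrap the LLM, then ask a question about the DataFrame
pandas_ai = PandasAI(llm)
pandas_ai.run(df, prompt="Which are the 5 happiest countries?")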

It returns the following output (here a pandas Series of country names, as the last line shows):

6            Canada
7         Australia
1    United Kingdom
3           Germany
0     United States
Name: country, dtype: object
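Because the library can make mistakes, it helps to cross-check an answer against plain pandas. The sketch below is not what Pandas AI runs internally; it is one hand-written equivalent of the question, assuming "happiest" means the highest happiness_index values.

```python
import pandas as pd

# Same sample data as in the example above (only the columns needed here).
df = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy",
                "Spain", "Canada", "Australia", "Japan", "China"],
    "happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12],
})

# Take the 5 rows with the largest happiness_index, then keep the country names.
top5 = df.nlargest(5, "happiness_index")["country"]
print(top5)
```

On this sample data the result lists the same five countries, in the same order, as the output shown above.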

It also has the ability to perform more complex queries, such as mathematical calculations and data visualization.

Example of a data visualization request.

pandas_ai.run(
    df,
    "Plot the histogram of countries showing for each the gdp, using different colors for each bar",
)

Data visualization result.

Image courtesy of Pandas AI
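For comparison, a chart along these lines can also be drawn by hand. The following is a minimal sketch with matplotlib, not the code Pandas AI generates; the colormap, labels, and figure size are choices made for this illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Same sample data as in the example above (only the columns needed here).
df = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy",
                "Spain", "Canada", "Australia", "Japan", "China"],
    "gdp": [19294482071552, 2891615567872, 2411255037952, 3435817336832,
            1745433788416, 1181205135360, 1607402389504, 1490967855104,
            4380756541440, 14631844184064],
})

# One distinct color per bar, taken from the built-in "tab10" colormap.
colors = plt.cm.tab10(range(len(df)))

fig, ax = plt.subplots(figsize=(10, 5))
bars = ax.bar(df["country"], df["gdp"], color=colors)
ax.set_xlabel("Country")
ax.set_ylabel("GDP")
ax.set_title("GDP by country")
ax.tick_params(axis="x", rotation=45)
fig.tight_layout()
# Call fig.savefig("gdp_by_country.png") or plt.show() to view the chart.
```

Writing the plot yourself takes more lines than a one-sentence prompt, but it gives you a reference to compare against when the generated chart looks wrong.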

Pandas AI is very new and the team is still looking for ways to improve the library. As of May 10, they still have the following on their to-do list:

  • Add support for more LLMs
  • Make PandasAI available from the CLI
  • Create a web interface for PandasAI
  • Add unit tests

The team welcomes suggestions and contributions. If you are interested in contributing to the growth of Pandas AI, please see the project's contributing guidelines.

Pandas AI does not replace pandas, but it is a good tool for boosting your workflow. And although you can ask Pandas AI questions about your data, you still need to be proficient in programming to correct and direct the library when it makes errors.

If you’ve had a chance to play with Pandas AI, let us know what you think about it in the comments below.

Nisha Arya is a data scientist, freelance technical writer, and community lead at KDnuggets. She is particularly interested in providing data science career advice, tutorials, and theory-based knowledge of data science. She also wants to explore the different ways artificial intelligence can contribute to human longevity. An enthusiastic learner, she is looking to expand her technology knowledge and writing skills while helping to guide others.