How to import Excel data into a Python script using Pandas

Microsoft Excel is the most widely used spreadsheet software in the world, and there’s good reason for it. Excel has a user-friendly interface and powerful built-in tools that make working with data simpler.

But if you want more advanced data processing, you’ll need to go beyond the capabilities of Excel and start using a programming language like Python. Instead of manually copying data to the database, here’s a quick guide on how to load Excel data into Python using Pandas.

Note: If you’ve never used Python before, this tutorial can be difficult. You should start with websites to learn Python and basic Python examples that Quantrimang.com suggested.

What are Pandas?

Python Data Analysis Library (Pandas) is an open source library for the Python programming language used for data analysis and manipulation.

Pandas loads data into Python objects called Dataframes, which store data in rows and columns just like a traditional database. Once a Dataframe is created, it can be manipulated in Python, opening up a multitude of possibilities.

Install Pandas

Note: You must have Python 2.7 or higher to install Pandas.

To start working with Pandas on your computer, you will need to import the Pandas library. If looking for an advanced solution, you can download it Anaconda Python Distribution, which has Pandas integration. If you are not using Anaconda, Pandas is very simple to install in the terminal.

See more:  How to turn on sharing buttons on personal Facebook, Facebook groups

Pandas is a PyPI package, which means you can install using PIP for Python via the command line. Modern Mac systems come with PIP. For Windows, Linux and many other legacy systems, it is very easy to learn how to install PIP for Python.

Once you have opened the terminal, the latest version of Pandas can be installed with the command:

>> pip install pandas

Pandas also requires the NumPy library, install it on the command line:

>> pip install numpy

Now you have Pandas installed and ready to create your first DataFrame!

Prepare Excel data

For this example, let’s use a sample dataset: An Excel workbook named Cars.xlsx.

An Excel workbook named Cars.xlsx

This dataset shows the make, model, color, and year of the cars entered in the table. The table is displayed as an Excel range. Pandas is smart enough to read the data appropriately.

This workbook is saved to the folder Desktops, Here is the file path used:

/Users/grant/Desktop/Cars.xlsx

You will need to know the file path of the workbook to use Pandas. Let’s start by opening Visual Studio Code to write the script. If you do not have a text editor, the article recommends Visual Studio Code or Atom Editor.

Writing Python scripts

You now have the text editor you need. Next, the article will combine Python and the Cars workbook to create a Pandas DataFrame.

Import Python library

Open a text editor and create a new Python file. Let’s call it Script.py.

To work with Pandas in your script, you’ll need to import it into your code. This is done with one line of code:

import pandas as pd

Here, the example is loading the Pandas library and attaching it to a variable “pd”. You can use any name you want, but the example is currently in use “pd”, stands for Pandas.

See more:  What is the standard LinkedIn image size?

To work with Excel using Pandas, you need an additional object named ExcelFile. ExcelFile is integrated into the Pandas ecosystem, so you can import directly from Pandas:

from pandas import ExcelFile

Working with file paths

To give Pandas access to the workbook, you need to direct your script to the file location. The easiest way to do this is to give the script the full path to the workbook.

Recall the path in this example: /Users/grant/Desktop/Cars.xlsx

You will need this file path referenced in the script to extract the data. Instead of referencing the path inside the function Read_Excel, keep the code “clean” by storing the path in a variable:

Cars_Path="/Users/grant/Desktop/Cars.xlsx"

Now you are ready to extract data with Pandas function!

Extract Excel data using Pandas.Read_Excel()

With Pandas imported and the path variable set, you can now use the functions in the Pandas object to complete the task.

The function you will need to use is called Read_Excel. The Read_Excel function takes the file path of the Excel workbook and returns a DataFrame object with the contents of the workbook. Pandas puts this function as:

pandas.read_excel(path)

Argument “path” is the path to the Cars.xlsx workbook and the article has set the path string to a variable Cars_Path.

You are ready to create the DataFrame object! Let’s put it all together and set the DataFrame object to a variable named DF:

DF = pd.read_excel(Cars_Path)

In the end you’ll want to see the DataFrame, so print the results. Add a print statement at the end of the script, using the DataFrame variable as an argument:

print(DF)

Time to run the script in the terminal!

See more:  How to see Instagram Story translation

Run Python script

Open a terminal or command line and navigate to the directory where your script is located. In this case, we have “Script.py” on the desktop. To execute the script, use the python command followed by the script file:

Run Python script

Python will pull data from “Cars.xlsx” into new DataFrame and print DataFrame to terminal!

Python will pull data from “Cars.xlsx” into new DataFrame and print DataFrame to terminal

Let’s take a closer look at the DataFrame object!

At first glance, the DataFrame looks very similar to a regular Excel table. This makes Pandas DataFrame very easy to understand.

The headers were labeled at the top of the dataset, and Python filled the rows with all the information read from the Cars.xlsx workbook.

Notice the leftmost column, an index starting at 0 and numbering the columns. By default, Pandas will apply this index to the DataFrame, which can be useful in some cases. If you don’t want this index to be created, you can add an additional argument to the code:

DF = pd.read_excel(Cars_Path, index=False)

Set arguments “index” Fort False will remove the index column, leaving only the Excel data.

Now you have the ability to read data from Excel spreadsheets. You can apply Python programming in any way you choose. Working with Pandas is a simple way for experienced Python programmers to work with data stored in Excel workbooks.

The ease of using Python for data analysis and manipulation is one of the many reasons why Python is the programming language of the future.

Hope you are succesful.

Source link: How to import Excel data into a Python script using Pandas
– https://techtipsnreview.com/

, , , , ,

Leave a Reply

Your email address will not be published. Required fields are marked *