Data has become the lifeline of almost every business, and every business needs some application or mechanism to organize it in a proper format. Tabular data is commonly stored in Excel files such as .xlsx and .xls. Reading data from Excel, manipulating it, and visualizing it are fundamental steps in data science work.
A few short Python scripts make it easy to work with such tabular data. Two or three lines of code are enough to load it, but they rely on specific packages that are installed separately from Python. This article covers some of the modules used to read an Excel file in Python.
Different Ways of Reading an Excel File:
Several Python packages can read Excel files. This article uses three of them: pandas, xlrd, and openpyxl.
With these three modules, data can be read in different forms, such as a data frame or a workbook.
Reading an Excel file using the pandas package in Python:
pandas is a widely used third-party package for manipulating data; it is not part of the Python standard library and is installed with pip install pandas. With pandas, data can be read from an Excel file into a data frame by calling the "read_excel" function with the location of the file. Reading .xlsx files also requires the openpyxl package.
DataFrame = pandas.read_excel("path")
import pandas as pd

# Read the first sheet of the workbook into a data frame
df = pd.read_excel('table.xlsx')
print(df)
First, we import the pandas package as pd. Then we call pd.read_excel(), passing the location of the file, table.xlsx. The data is read as a data frame and stored in the variable df. Finally, we print df.
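The read step above can be tried without an existing workbook. The following is a minimal sketch, assuming pandas and openpyxl are installed; the sample data and the temporary file location are made up for illustration and are not the article's table.xlsx.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample data standing in for the article's table.xlsx.
sample = pd.DataFrame({"name": ["Ann", "Bob", "Cy"], "score": [90, 85, 77]})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "table.xlsx"
    # Writing .xlsx needs an Excel writer such as openpyxl to be installed.
    sample.to_excel(path, index=False)
    # Read the workbook back; the first sheet is used by default.
    df = pd.read_excel(path)

print(df)
```

Writing the file with index=False keeps the row labels out of the sheet, so the data frame read back has the same two columns as the one written.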
Access the head of the file:
The data frame's built-in "head()" method returns the first five rows by default. It gives a quick look at the data stored in the Excel file.
import pandas as pd

df = pd.read_excel('table.xlsx')
# head() returns the first five rows; print them explicitly in a script
print(df.head())
First, import the pandas package as pd and read the Excel file. Then print the head of the data frame by calling the head method.
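The number of rows shown can be adjusted. The sketch below, which assumes pandas and openpyxl are installed and uses a made-up ten-row table, passes a count to head() and tail() and uses the nrows parameter of read_excel to limit how many rows are parsed in the first place.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical ten-row table used only for this demonstration.
sample = pd.DataFrame({
    "id": list(range(1, 11)),
    "value": [i * i for i in range(1, 11)],
})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "table.xlsx"
    sample.to_excel(path, index=False)  # needs openpyxl installed
    full = pd.read_excel(path)
    # nrows limits how many data rows are parsed from the file.
    preview = pd.read_excel(path, nrows=3)

first_three = full.head(3)  # first 3 rows instead of the default 5
last_two = full.tail(2)     # last 2 rows

print(first_three)
print(last_two)
print(preview)
```

For large workbooks, nrows avoids loading the whole sheet when only a preview is needed, whereas head() trims a data frame that has already been read.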
Access the values of specific columns:
By passing a list of columns as the "usecols" parameter of the read_excel method, only the selected columns are read. This makes it possible to work with a subset of the columns in a table. The list can contain positions such as 0, 1, 2, which select the first, second, and third columns. It can instead contain column names; pandas documents the list as either all integers or all strings, so avoid mixing the two in one list.
import pandas as pd

# Positions of the columns to read: first, second, and third
cols = [0, 1, 2]
df = pd.read_excel('table.xlsx', usecols=cols)
print(df.head())
First import the pandas package as pd. Store the list of selected columns, 0, 1, 2 (the first, second, and third columns), in cols. Call the read_excel function, passing the location of the file and the usecols parameter set to that list. Finally, print the head of the data frame.
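Columns can also be selected by name or by Excel column letters. The following sketch assumes pandas and openpyxl are installed; the column names and data are hypothetical and chosen only to make the selection visible.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical four-column table; with index=False these land in columns A to D.
sample = pd.DataFrame({
    "name": ["Ann", "Bob"],
    "age": [31, 45],
    "city": ["Oslo", "Lima"],
    "score": [90, 85],
})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "table.xlsx"
    sample.to_excel(path, index=False)  # needs openpyxl installed
    # Select columns by their header names.
    by_name = pd.read_excel(path, usecols=["name", "score"])
    # Select columns by an Excel-style letter range (columns A and B).
    by_letter = pd.read_excel(path, usecols="A:B")

print(by_name)
print(by_letter)
```

Selecting by name tends to be more robust than selecting by position when the column order of the workbook may change.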
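By passing the "sheet_name" parameter to read_excel, selected sheets can be read. By default, only the first sheet is read. When a list is passed, read_excel returns a dictionary of data frames keyed by the entries of that list.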
import pandas as pd

# A list for sheet_name returns a dict keyed by the list entries
sheets = pd.read_excel('table.xlsx', sheet_name=[0, 'table2'])
print(sheets['table2'].head())
First import the pandas package as pd. Read the Excel file with the read_excel method, passing the location of the file and the sheet_name parameter, which holds the list of sheets to read (here the first sheet by position and the sheet named table2). Finally, print the head of table2.
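To read every sheet at once, sheet_name can be set to None, in which case read_excel returns a dictionary keyed by sheet name. The sketch below assumes pandas and openpyxl are installed; the sheet names and data are invented for the demonstration.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Two hypothetical sheets written to one workbook.
sales = pd.DataFrame({"month": ["Jan", "Feb"], "sales": [100, 120]})
costs = pd.DataFrame({"month": ["Jan", "Feb"], "cost": [70, 80]})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "table.xlsx"
    with pd.ExcelWriter(path) as writer:
        sales.to_excel(writer, sheet_name="table1", index=False)
        costs.to_excel(writer, sheet_name="table2", index=False)
    # sheet_name=None loads all sheets into a dict of data frames.
    sheets = pd.read_excel(path, sheet_name=None)

for name, frame in sheets.items():
    print(name, frame.shape)
```

Each value in the dictionary is an ordinary data frame, so sheets["table2"].head() works the same way as for a single sheet.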
Reading an Excel file using the xlrd module in Python:
Excel files can be read with the xlrd module in the form of a workbook. Passing the location of the file to the module's "open_workbook" method opens it. Note that xlrd 2.0 and later read only the legacy .xls format, so .xlsx files need another library such as openpyxl.
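import xlrd

# xlrd 2.0+ reads only .xls files; a bare "\t" in a path would be a tab
path = "table2.xls"
workbook = xlrd.open_workbook(path)
sheet = workbook.sheet_by_index(0)  # first sheet of the workbook
print(sheet.cell_value(0, 0))       # value of the first cell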
Here, first import the xlrd module. Set a variable path that contains the location of the Excel file as a string. Next, open the workbook with the open_workbook method of the xlrd module by passing the path. Select the first sheet of the file with "sheet_by_index(0)". Finally, print the value of the first cell of the sheet.
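Because xlrd 2.0 and later handle only .xls files, a script that accepts both formats has to choose a reader by file extension. The sketch below shows one way to do that using only the standard library; pick_engine and ENGINE_BY_SUFFIX are hypothetical names introduced here, and the strings returned match engine names accepted by the engine parameter of pandas.read_excel.

```python
from pathlib import Path

# Hypothetical mapping from file suffix to a reader/engine name.
ENGINE_BY_SUFFIX = {
    ".xls": "xlrd",       # legacy binary format
    ".xlsx": "openpyxl",  # Office Open XML workbook
    ".xlsm": "openpyxl",  # macro-enabled workbook
}


def pick_engine(path):
    """Return the engine name suited to the file's extension."""
    suffix = Path(path).suffix.lower()
    try:
        return ENGINE_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"unsupported Excel extension: {suffix!r}") from None


print(pick_engine("table2.xls"))  # xlrd
print(pick_engine("table.xlsx"))  # openpyxl
```

The result could then be passed along, for example as pd.read_excel(path, engine=pick_engine(path)), assuming the corresponding package is installed.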
Access the Number of Rows and Columns:
The sheet attributes "nrows" and "ncols" give the number of rows and columns in a table.
import xlrd

path = "table2.xls"  # xlrd 2.0+ reads only .xls files
workbook = xlrd.open_workbook(path)
sheet = workbook.sheet_by_index(0)
print(sheet.nrows)  # number of rows in the sheet
print(sheet.ncols)  # number of columns in the sheet
Here we import the xlrd module and open the workbook. Next, we print the number of rows and columns by accessing the nrows and ncols attributes of the sheet.
Traverse any Row or Column:
With the help of a for loop, the table can be traversed. Each cell can be accessed with the "cell_value" method by passing the row number and column number of the cell.
import xlrd

path = "table2.xls"  # xlrd 2.0+ reads only .xls files
workbook = xlrd.open_workbook(path)
sheet = workbook.sheet_by_index(0)

print("First Column:")
for i in range(sheet.nrows):
    print(sheet.cell_value(i, 0))  # row i, column 0

print("First Row:")
for i in range(sheet.ncols):
    print(sheet.cell_value(0, i))  # row 0, column i
Here, the first for loop traverses the first column of the table. sheet.cell_value(x, y) gives the value of the cell located at row number x and column number y, both counted from 0. The second loop traverses the first row.
Reading an Excel file using the openpyxl module in Python:
Excel files can also be read with the openpyxl module in the form of a workbook. The "load_workbook" method loads an Excel file as a workbook.
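import openpyxl

w = openpyxl.load_workbook("table.xlsx")
worksheet = w.active  # the active sheet, usually the first one

# Outer loop walks the rows, inner loop walks the columns
for i in range(0, worksheet.max_row):
    for j in worksheet.iter_cols(1, worksheet.max_column):
        # j holds the cells of one column; j[i] is the cell in row i
        print(j[i].value, end="\t\t")
    print(" ")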
Import the openpyxl module and load the workbook by passing the location of the Excel file as a parameter. Then set the variable worksheet to the active sheet of the workbook, which is usually the first one. Finally, use nested for loops to iterate over the rows and print each cell.
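openpyxl also offers iter_rows, which yields one row at a time and, with values_only=True, returns plain cell values instead of cell objects. The following is a minimal sketch, assuming openpyxl is installed; the workbook contents are made up so that the example runs without an existing file.

```python
import tempfile
from pathlib import Path

import openpyxl

with tempfile.TemporaryDirectory() as tmp:
    path = str(Path(tmp) / "table.xlsx")

    # Build a small hypothetical workbook so the example is self-contained.
    wb = openpyxl.Workbook()
    ws = wb.active
    ws.append(["name", "score"])
    ws.append(["Ann", 90])
    ws.append(["Bob", 85])
    wb.save(path)

    # Load the file back and collect each row as a tuple of values.
    loaded = openpyxl.load_workbook(path)
    sheet = loaded.active
    rows = list(sheet.iter_rows(values_only=True))

for row in rows:
    print(row)
```

Iterating by row avoids indexing into every column on each pass, which keeps the loop shorter than the nested version.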
In this article, we learned how to read Excel files in Python in different ways: with the pandas package, the xlrd module, and the openpyxl module. Working with a large collection of tabular data stored in Excel files is difficult to do by hand.
Data science professionals often use these modules to organize their datasets in Excel sheets so that machine learning engineers can leverage them. With the help of these Python modules, handling large amounts of data becomes much easier and more efficient.