Multiple CSV Files into a Single XLS Workbook using Python 3

Excel workbooks and CSV files are two of the most popular formats for handling large amounts of data, which raises two questions: why would we need to convert between them, and is it even doable? This article answers both questions and walks through the entire process.

Significance of CSV Format

CSV is one of the most important formats for tabular data storage because of its ease of use, interoperability, and adaptability. Thanks to its human-readable structure, with comma-separated values and one record per row, a CSV file can be interpreted without specialized tools.

Because CSV files are lightweight, they are efficient to store and easy to import and export across different software applications. The format can also be read and modified programmatically, which makes it a popular option for data analysis in languages like Python. CSV is also widely used across many sectors as an intermediary for database connections and as a dependable format for data transfer.

Dealing with Multiple CSVs and Needing to Combine Them

Managing multiple CSV files is a common data challenge with a variety of applications. Combining these files becomes essential in situations like data aggregation, where various data sources are brought together for thorough analysis. Periodic updates, such as financial reports, also call for merging so that a historical record is kept for trend assessment.

When dealing with several dimensions, such as sales data segmented by different characteristics, combining files helps reveal cross-dimensional insights. Consolidation is also necessary on projects that involve many different CSV files so that all participants have access to consistent, current information.

A single dataset makes data extraction easier for effective reporting and visualization. Combining files also supports overall data administration, including synchronization, cleaning, and migration. With the basic introduction out of the way, here is what you can expect from the upcoming sections.

We will begin by getting to know the two major tools used in this article. We will then go through the Python libraries that help with the operation we want to perform, and move on to actually combining and converting the CSV files into an Excel workbook.

Lastly, we will see how to save what we have done so far, and then conclude the article.

Understanding CSV

Structure of CSV Files

CSV files are a popular option for storing tabular data because of their simple structure. Each row of data sits on its own line, and the values within a row are separated by commas or another specified delimiter. Data elements like names or sales numbers are organized in columns, which are frequently labeled by headers in the first row.

This simplicity makes it easy to work with CSV data across different applications and keeps the data readable. By understanding the format, users can effectively handle and manipulate CSV data, whether for analysis, reporting, or integration into other systems.
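To make the structure described above concrete, here is a minimal sketch that parses a tiny CSV snippet with Python's built-in csv module. The sample text and the column names ‘name’ and ‘sales’ are made up purely for illustration.

```python
import csv
import io

# Hypothetical sample data: a header row followed by two records.
raw_text = "name,sales\nAlice,120\nBob,95\n"

# DictReader treats the first row as the column headers.
reader = csv.DictReader(io.StringIO(raw_text))
records = list(reader)

print(reader.fieldnames)   # the column headers
print(records[0])          # the first record, keyed by header
```

Note that the csv module returns every value as a string; converting ‘sales’ to a number is left to the caller, which is one reason Pandas is handy for this kind of work.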

Merging CSV Files: Reasons

Merging multiple CSV files has several benefits for efficient data management. First, it gathers diverse data sources in one place, allowing thorough analysis and decision-making. Connections and patterns that could easily be missed when working with separate files become easier to spot.

Merging also helps maintain data integrity, which matters for evolving datasets, by avoiding the problems that arise when handling many separate files. Reporting is simplified because a single dataset improves accuracy and speeds up report creation. Merging also suits programming and automation: whether for analysis, machine learning, or database maintenance, working on a single dataset makes the code simpler.

It also makes it easier to clean, standardize, and transform the data. Merged CSV files load easily into databases, which improves storage, retrieval, and querying. In short, combining CSV files makes data more useful in a variety of scenarios, including analysis, reporting, automation, and integration.

Python Libraries for Data Manipulation

Role of Pandas for Data Manipulation

Pandas is a powerful Python library for data manipulation that has transformed data analysis. It simplifies data loading, cleaning, transformation, and analysis. Its routines for reshaping and combining datasets and for handling missing data help preserve data integrity. Pandas also works with well-known data visualization libraries, which opens the door to further insights.

To learn more about the Pandas Library you can read the linked article.

The command below installs the Pandas library on your machine from your terminal or command prompt.

pip install pandas

‘openpyxl’: a Library for Working with Excel Files

‘openpyxl’ is an important library for working with Excel files in Python. It supports both reading and writing Excel files, along with a variety of Excel features such as formulas, formatting, and charts.

With ‘openpyxl’, Excel workbooks, worksheets, and cell values are under programmatic control. This makes it easier to automate data entry, create new Excel files, and change existing ones. The module also integrates smoothly with Python’s data processing tools like Pandas, which allows data to be converted between CSV and Excel.

In short, ‘openpyxl’ is a good choice when we want to bring Python’s adaptability and effectiveness to Excel-related tasks, with improved data processing and reporting capabilities.

Loading CSV Files with Pandas

The Pandas library’s ‘read_csv()’ function is a vital tool for quickly importing data from CSV (Comma-Separated Values) files into Pandas DataFrames. It can load an entire CSV dataset with a single line of code, which simplifies the data loading procedure.

In addition to its simplicity of use, the ‘read_csv()’ function provides customization options that let users define delimiters, handle missing values, skip rows, and specify data types. Column headings are recognized automatically, and memory-saving options are available for managing huge datasets. This flexibility and efficiency make ‘read_csv()’ a key entry point for data analysis and modification, and a foundation for more in-depth data exploration.
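The customization options mentioned above can be sketched roughly as follows. The data here is invented for illustration (a semicolon-delimited snippet with a stray first line and the word ‘missing’ standing in for an empty value), and it is read from an in-memory string rather than a real file so the snippet runs on its own.

```python
import io

import pandas as pd

# Hypothetical data: a junk line on top, ';' as delimiter, 'missing' as a gap marker.
raw_text = (
    "exported by some tool\n"
    "model;price;sold\n"
    "Civic;22000;3\n"
    "Corolla;missing;5\n"
)

data = pd.read_csv(
    io.StringIO(raw_text),
    sep=";",                  # custom delimiter
    skiprows=1,               # skip the first line before the header
    na_values=["missing"],    # treat this marker as a missing value
    dtype={"sold": "int64"},  # force the type of one column
)

print(data)
print(data.dtypes)
```

Because the ‘price’ column now contains a missing value, Pandas stores it as a floating-point column, while ‘sold’ keeps the integer type that was requested.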

Let us see how we can use the ‘read_csv()’ method with the help of a simple example.

import pandas as pd

data = pd.read_csv('./car-sales-data.csv')
print(data)

In the above example, I first imported pandas as pd and then got straight to the point, calling the ‘read_csv()’ function on line 3.

The ‘read_csv()’ function takes one important parameter: the path to the CSV file that we want to load. Last but not least, I printed the result, which is the CSV data loaded into a DataFrame by the ‘read_csv()’ function.

If you feel like reading more about the ‘read_csv()’ method, please go through the article in the link.

You can see the output of the above code in the image below:

Now that we have gained knowledge about loading a CSV file, we can move on to more complex tasks.

Combining DataFrames

Concept of Concatenation and Merging DataFrame

Combining dataframes is a key component of data manipulation, because it lets users bring several data sources together into a single entity for analysis.

Concatenation and merging are the two main methods for combining dataframes. Dataframes are arranged either vertically (along rows) or horizontally (along columns) during concatenation to create a larger dataframe.

On the other hand, merging includes joining dataframes based on shared columns.
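As a rough illustration of merging on a shared column, the sketch below joins two small dataframes on a ‘model’ column using ‘pd.merge()’. The dataframes and their column names are made up for this example.

```python
import pandas as pd

# Hypothetical data: prices for three models, sales figures for only two of them.
cars = pd.DataFrame({"model": ["Civic", "Corolla", "Focus"],
                     "price": [22000, 21000, 19000]})
sales = pd.DataFrame({"model": ["Civic", "Corolla"],
                      "units_sold": [3, 5]})

# An inner join keeps only the models present in both dataframes.
merged = pd.merge(cars, sales, on="model", how="inner")
print(merged)
```

With ‘how="inner"’, the ‘Focus’ row is dropped because it has no match in ‘sales’; other values of ‘how’, such as ‘"left"’ or ‘"outer"’, would keep it and fill the gap with a missing value.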

Concatenation Methods Provided by Pandas

concat()

This function allows both vertical and horizontal concatenation. Stacking dataframes vertically, one on top of another, increases the number of rows, while placing dataframes side by side horizontally increases the number of columns.

Pandas aligns the dataframes on their index or column labels during concatenation, which helps preserve the integrity of the combined data.
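The difference between the two directions, and the effect of index alignment, can be sketched as below. The small dataframes are invented for illustration only.

```python
import pandas as pd

names = pd.DataFrame({"Name": ["Alice", "Bob"]})
ages = pd.DataFrame({"Age": [25, 30]})

# Vertical (default axis=0): rows are stacked, so 2 + 2 rows and 1 column.
stacked = pd.concat([names, names], ignore_index=True)

# Horizontal (axis=1): columns are placed side by side, so 2 rows and 2 columns.
side_by_side = pd.concat([names, ages], axis=1)

# Index alignment: rows are matched by index label, not by position.
shifted_ages = pd.DataFrame({"Age": [25, 30]}, index=[1, 2])
aligned = pd.concat([names, shifted_ages], axis=1)

print(stacked.shape, side_by_side.shape, aligned.shape)
print(aligned)
```

In the last case the index labels only partly overlap, so the result has one row per label and the cells without a match are filled with missing values.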

append()

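This method was a practical way to vertically concatenate two dataframes, with a syntax that was easier to read than ‘concat()’ when joining dataframes row-wise. However, ‘DataFrame.append()’ was deprecated in Pandas 1.4 and removed in Pandas 2.0, so ‘concat()’ should be used in current code.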

Let us understand the concept with the help of a simple example:

import pandas as pd

data1 = {'Name': ['Alice', 'Bob', 'Charlie'],
         'Age': [25, 30, 28]}
data2 = {'Name': ['David', 'Eva', 'Frank'],
         'Age': [22, 27, 35]}

df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

concatenated_df = pd.concat([df1, df2], ignore_index=True)

print("Concatenated DataFrame:")
print(concatenated_df)

Rather than using an actual CSV file, I simply used a Python dictionary for demonstrating the process of concatenation.

The code is quite simple. I began by creating two dictionaries and then converted them into the dataframes ‘df1’ and ‘df2’ using the ‘DataFrame()’ constructor of the Pandas library.

Later on, I created a new dataframe called concatenated_df, wherein I used the ‘concat()’ method to combine the two previously created dataframes, and lastly, I printed the newly created dataframe, which you can see in the image below.

After this example, we have learned how to combine multiple dataframes.

The actual problem that we were solving seems to be half-solved at this point.

Let’s see if we can use this knowledge to combine multiple CSVs into an Excel workbook.

Creating a Workbook Using Openpyxl

Introduction to Excel Workbooks

An Excel workbook can contain multiple worksheets and offers a structured setting for organising and presenting data. Each worksheet appears as its own tab, which lets users categorize and manage data effectively. Creating and manipulating workbooks are crucial tasks since workbooks are essential to data analysis, reporting, and visualization.

Introduction to Openpyxl

Openpyxl is a Python package that makes it easier to interact with Excel files by allowing users to create, edit, and extract data from Excel workbooks. The library supports formatting, formulas, charts, and other Excel capabilities. It improves data management and flexibility by letting users automate Excel-related processes with Python.

To install openpyxl on your machine, run the command below in your terminal:

pip install openpyxl

Let us see an example that gives a brief idea of how the openpyxl package works.

import openpyxl

workbook = openpyxl.Workbook()

worksheet = workbook.active
worksheet.title = 'Sample Sheet'

worksheet['A1'] = 'Name'
worksheet['B1'] = 'Age'

worksheet['A2'] = 'Ayush'
worksheet['B2'] = 25

worksheet['A3'] = 'Babban'
worksheet['B3'] = 30

workbook.save('sample_workbook.xlsx')

print("Excel workbook created and data written successfully.")

In the above example, I started by importing the openpyxl module. Then I created a new workbook called ‘workbook’ using the ‘Workbook()’ constructor. On lines 5 and 6, I fetched the workbook’s active worksheet, which is created automatically along with the workbook, and set a title for it.

Then, from lines 8 to 15, I populated the worksheet with data by addressing cells through their coordinates, such as ‘A1’. Lastly, I saved the workbook as ‘sample_workbook.xlsx’ and printed the success message. On successful execution, the code produces a new ‘.xlsx’ file with that name in the current working directory, as you can see in the image below:
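Since openpyxl can also apply formatting, store formulas, and read files back, here is a small sketch that touches all three. It reuses the sample data from the example above; the bold header, the ‘AVERAGE’ formula in cell ‘B4’, and the temporary output folder are my own additions for illustration.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook
from openpyxl.styles import Font

workbook = Workbook()
worksheet = workbook.active
worksheet.title = "Sample Sheet"

# append() writes one full row at a time, starting from the first empty row.
worksheet.append(["Name", "Age"])
worksheet.append(["Ayush", 25])
worksheet.append(["Babban", 30])

# Formatting: make every cell in the header row bold.
for cell in worksheet[1]:
    cell.font = Font(bold=True)

# Formula: stored as text; Excel computes the value when the file is opened.
worksheet["B4"] = "=AVERAGE(B2:B3)"

# Save to a temporary folder (an assumption, so the sketch leaves no stray files).
output_path = os.path.join(tempfile.mkdtemp(), "sample_workbook.xlsx")
workbook.save(output_path)

# Read the file back and collect the cell values row by row.
reloaded = load_workbook(output_path)
rows = list(reloaded["Sample Sheet"].iter_rows(values_only=True))
print(rows)
```

Reading the file back like this is a quick way to confirm what was written without opening Excel. By default ‘load_workbook()’ returns the formula text itself rather than a computed result.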

Converting CSV to XLS Workbook

At this stage, we have covered everything required for converting CSV files to an Excel workbook. Note that openpyxl writes the modern ‘.xlsx’ format rather than the legacy ‘.xls’ format, so the workbook in this example is saved as ‘.xlsx’. With all the tools we need at hand, let’s jump to an example and see how things actually work in practice.

import pandas as pd
import openpyxl
from openpyxl.utils.dataframe import dataframe_to_rows

# Load each CSV file into its own DataFrame
df1 = pd.read_csv('./car-sales-data.csv')
df2 = pd.read_csv('./car-sale2.csv')

# Stack the two DataFrames vertically into one
concatenate_df = pd.concat([df1, df2], ignore_index=True)

# Create a workbook and rename its default worksheet
workbook = openpyxl.Workbook()
worksheet = workbook.active
worksheet.title = 'Car Sales'

# Write the header followed by every data row
for row in dataframe_to_rows(concatenate_df, index=False, header=True):
    worksheet.append(row)

workbook.save('car_sales.xlsx')

The above code might seem long, but if you look carefully, almost nothing in it is new. The only addition is ‘dataframe_to_rows()’, which yields the header and then each row of the DataFrame so that ‘worksheet.append()’ can write them one at a time; everything else was covered in the previous few examples.

If the program runs without errors, you will have a new Excel file containing all the data from your CSV files, as you can see in the image below:
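The example above handles exactly two files. As a possible generalization, the sketch below collects every CSV file in a folder and writes each one to its own worksheet, plus one combined sheet, using Pandas’ ‘ExcelWriter’ with the openpyxl engine. This is an alternative to the ‘dataframe_to_rows()’ approach rather than a replacement for it. The file names, the folder, and the sheet name ‘All’ are all made up, and the two CSV files are generated on the fly so the snippet is self-contained.

```python
import glob
import os
import tempfile

import pandas as pd

# Create two small hypothetical CSV files so the sketch runs on its own.
work_dir = tempfile.mkdtemp()
pd.DataFrame({"model": ["Civic"], "price": [22000]}).to_csv(
    os.path.join(work_dir, "sales_2022.csv"), index=False)
pd.DataFrame({"model": ["Corolla"], "price": [21000]}).to_csv(
    os.path.join(work_dir, "sales_2023.csv"), index=False)

# Find every CSV file in the folder, in a predictable order.
csv_paths = sorted(glob.glob(os.path.join(work_dir, "*.csv")))
output_path = os.path.join(work_dir, "combined.xlsx")

with pd.ExcelWriter(output_path, engine="openpyxl") as writer:
    frames = []
    for path in csv_paths:
        frame = pd.read_csv(path)
        frames.append(frame)
        # Name the sheet after the file; Excel limits sheet names to 31 characters.
        sheet_name = os.path.splitext(os.path.basename(path))[0][:31]
        frame.to_excel(writer, sheet_name=sheet_name, index=False)
    # One extra sheet holding the rows of all files stacked together.
    combined = pd.concat(frames, ignore_index=True)
    combined.to_excel(writer, sheet_name="All", index=False)

# Read every sheet back to check the result.
sheets = pd.read_excel(output_path, sheet_name=None)
print(list(sheets))
```

Whether one sheet per file or a single combined sheet is more useful depends on how the workbook will be read afterwards, so this sketch simply produces both.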

Summary

The ‘openpyxl’ and Pandas packages are used to convert CSV files to Excel workbooks in Python. The method starts by reading CSV data into Pandas DataFrames, where it can be manipulated and analyzed effectively. After that, the ‘openpyxl’ library makes it easy to create an Excel workbook and provides the framework for handling worksheets.

The final example used ‘dataframe_to_rows()’ from ‘openpyxl.utils.dataframe’ to append the rows of the combined DataFrame to the worksheet. This article showed how smoothly ‘openpyxl’ works together with Pandas and how capable Python is at transforming data across file formats. With this method in their data management toolkit, data professionals can convert CSV data to Excel workbooks with confidence.