How to clean up data with Index Match on Python?
- DragonViz
- May 9, 2021
- 1 min read
Updated: May 31
Original data format:
RegionName Date1 Date2 Date3 ...
x 4 2 3
y 2 8 4
z 4 5 1
Final Output format:
RegionName Date Amount($)
x 1 4
x 2 2
x 3 3
y 1 2
y 2 8
y 3 4
z 1 4
z 2 5
z 3 1
In this sample, I will only need RegionName, Date, and Amount ($) columns.

After cleaning up the data, it should look like this (below). I saved this file as 'df3.xlsx'

You will need the following packages installed on to your python.
import pandas as pd
import csv
import numpy as np
Then you have to import your excel (ie. 'df3.xlsx') onto your python with pd.read_excel() function.
df3 = pd.read_excel('df3.xlsx')
Set index on the desired column, in my sample I will set RegionName as index.
data = df3.set_index('RegionName').stack().reset_index()
Create the desired columns: RegionName, Date, Amount.
Then we print the data, and export it into csv using to_csv function.
data.columns = "RegionName","Date","Amount"
print(data)
data.to_csv('city_toast.csv', index =False)
The expected output should look like the following image (below) on python console:

For more details, check this video out. Let us know other topics you would like us to cover on our next tutorial.
Comments