Data visualization using Matplotlib and Pandas with Django (Beginner’s level)

Vikashvar Rajan
Published in AWS Tip
7 min read · Aug 26, 2023


In this data-driven world, data plays a vital role in analysis, decision making and solving real-world problems. Presenting data in pictorial form reaches more people and is easier to understand than presenting bare numbers. How can we display numbers graphically so that the information is conveyed more efficiently? This is where charts come into the picture.

Speaking of charts, how can we integrate them with Django so that we can process data and display it on a website for the benefit of end users? There are a variety of libraries that help us picture the data. Chart.js is one of them, and I have a separate blog on integrating Chart.js with a Django app:

https://medium.com/aws-tip/data-visualisation-django-with-django-chartjs-e2334cf1ce6c

Speaking of data, do you think that all the information you get is fit for data analysis? The answer is no. Some data has to be removed and some has to be modified. Hence, we are going to process the data with pandas, a popular Python library, before representing it.

In this blog, we will integrate matplotlib, a Python plotting library, with our Django app to present the data to users as charts.

What are we going to do:

- creating a Django app

- requesting an API and fetching data

- processing the data using pandas

- visualizing the data

Prerequisite:

- basic knowledge of Python and charts

Let’s get started.

First of all, create a new Django project:

>>django-admin startproject Mathplotlib

Inside the main Django project, create an app named lobby and register it in INSTALLED_APPS in settings.py:

>>python manage.py startapp lobby

NOTE: For people new to Django, we create separate apps for separate modules inside our main project.

Next, create a templates folder to store all the HTML files. This is the standard way of organizing the templates of an app. Inside the templates folder, create an HTML file in which the chart will be displayed.

Create a urls.py to map each view to its URL. Whenever the URL is requested, the corresponding view is triggered and its body is executed. This is the routing process of the Django framework.

In the settings.py file, add the templates folder to the DIRS entry of the TEMPLATES setting.

This joins the templates folder to the base directory so that the Django project can find the HTML file.
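
Here is a minimal sketch of that setting. I am assuming the folder is called templates and sits in the project root next to manage.py; both the name and the location are assumptions, so adjust them to your layout.

```python
# Mathplotlib/settings.py (excerpt)
from pathlib import Path

# startproject already defines BASE_DIR like this near the top of settings.py.
BASE_DIR = Path(__file__).resolve().parent.parent

TEMPLATES = [
    {
        "BACKEND": "django.template.backends.django.DjangoTemplates",
        # Assumed location: a "templates" folder in the project root.
        "DIRS": [BASE_DIR / "templates"],
        "APP_DIRS": True,
        # Keep the "OPTIONS" entry that startproject generated for you here.
    },
]
```

Older projects often write the same thing as os.path.join(BASE_DIR, "templates"); either form should work.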

Now, in the views.py file, create a view that renders the lobby page in which the charts will be displayed. This view must be mapped to a URL.

This is just a skeleton for now. In this view, we are going to write the code that processes the data and sends it to the HTML file where the chart is displayed.

In the urls.py file of the app, map a URL to the view.

In the main project’s urls.py file, include the app’s urls.py file so that the main project hands the routing over to the app.
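
The two files could be sketched as follows. The empty route, the URL name and the view name lobby are assumptions on my side; only the project name Mathplotlib and the app name lobby come from the steps above.

```python
# lobby/urls.py
from django.urls import path

from . import views

urlpatterns = [
    # Assumed route: the lobby view answers at the root of the app.
    path("", views.lobby, name="lobby"),
]
```

```python
# Mathplotlib/urls.py
from django.contrib import admin
from django.urls import include, path

urlpatterns = [
    path("admin/", admin.site.urls),
    # Hand every other URL over to the lobby app.
    path("", include("lobby.urls")),
]
```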

Now, let’s dive into the actual process. The process is simple:

  1. read an API
  2. retrieve the necessary information
  3. plot and display it

Required libraries:

StringIO is used to store the image data in a file like object

render is used to render the html where the chart will be displayed

pandas is used to clean the data

requests is used to read the API

Finally, matplotlib is used to plot the data

Reading an API:

Here, we use the .get() method of the requests library to call the API. It returns a response object, and the .json() method parses the JSON body of that response into Python objects so that we can retrieve the necessary information.
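
A small sketch of this step is given below. The URL is a placeholder and the function name fetch_data is hypothetical, so put the address of the API you are actually reading there. The timeout and the status check are additions of mine to keep the view from hanging or parsing an error page.

```python
import requests

# Hypothetical endpoint: replace it with the URL of the API you are reading.
API_URL = "https://example.com/api/cities/population"


def fetch_data(url=API_URL):
    """Call the API and return the parsed JSON body."""
    response = requests.get(url, timeout=10)
    # Raise an exception for 4xx/5xx answers instead of parsing an error page.
    response.raise_for_status()
    # .json() turns the JSON text into Python dicts and lists.
    return response.json()
```

Inside the view you would call it as data = fetch_data() and then work with the parsed result.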

Cleaning the data using Pandas:

The JSON data is converted into a dataframe, which simplifies processing and cleaning it. OK, now what is data cleaning? Data cleaning is the process in which incorrect data, incorrectly formatted data and duplicate data in the data table are either fixed or removed. This step is essential for analyzing the data with higher accuracy.

A few important data cleaning methods:

  • What if a city’s population count is missing? In this case, we can either eliminate the entire city from the process or fill the missing data with the mean population of the country. This can be done using the pandas library:

>>df.dropna() is used to remove the rows that contain missing values

>>df.fillna() is used to fill the missing data
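
To make the two options concrete, here is a tiny sketch with made-up numbers. The city names and populations are invented for illustration and do not come from the API.

```python
import pandas as pd

# Made-up numbers: the second city has no population value.
cities = pd.DataFrame(
    {
        "city": ["City A", "City B", "City C"],
        "population": [100.0, None, 300.0],
    }
)

# Option 1: drop every row that contains a missing value.
dropped = cities.dropna()

# Option 2: fill the gap with the mean of the known populations.
mean_population = cities["population"].mean()
filled = cities.fillna({"population": mean_population})
```

After this, dropped keeps only City A and City C, while filled keeps all three cities and gives City B the mean of the other two.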

NOTE: This data can be processed without converting it into a dataframe as well.

Now that we know the importance of a dataframe, let’s create our own.

>>df=pandas.DataFrame(data['data'])

The output of the above line is a dataframe, so the data is now in tabular form. Next we have to process it to get the list of cities and their populations to plot them.
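
If you want to try this step without calling the API, a stand-in payload is enough. The sketch below uses invented values, and the key names city, country and populationCounts are my assumptions about the shape of the response, so check them against the real API.

```python
import pandas as pd

# Illustrative payload with made-up values; the key names are assumptions.
data = {
    "data": [
        {
            "city": "City A",
            "country": "France",
            "populationCounts": [
                {"year": "2013", "value": "1000", "sex": "Both Sexes"}
            ],
        },
        {
            "city": "City B",
            "country": "Spain",
            "populationCounts": [
                {"year": "2013", "value": "2000", "sex": "Both Sexes"}
            ],
        },
    ]
}

# Each dictionary becomes one row, each key becomes one column.
df = pd.DataFrame(data["data"])
print(df.columns.to_list())  # ['city', 'country', 'populationCounts']
print(df.shape)  # (2, 3)
```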

NOTE: Here, the data is clean enough as there is no improper or missing data. In your use case, if you find any, just use the .dropna() and .fillna() methods. These methods return a new dataframe with the changes. If you want the changes to be applied to the existing dataframe, pass the parameter inplace=True. This modifies the existing dataframe without creating a new one.

If you find that any data is missing, try to fill it with the mean value (a commonly preferred method). In our use case, if a city’s population value is null, calculate the mean population of the country and fill the empty cell with it. Now, how can we find the mean along a specific axis? Pandas has a method mean(axis=n) for this. With axis=0 it returns the mean of each column, and with axis=1 it returns the mean of each row.
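
Here is a short sketch of the axis parameter on a small numeric table. The column names and numbers are invented just to make the two directions visible.

```python
import pandas as pd

# Made-up counts for two cities in two years.
counts = pd.DataFrame(
    {"year_2010": [100.0, 200.0], "year_2020": [300.0, 400.0]},
    index=["City A", "City B"],
)

# axis=0 works down the rows and gives one mean per column.
column_means = counts.mean(axis=0)

# axis=1 works across the columns and gives one mean per row.
row_means = counts.mean(axis=1)
```

Here column_means holds 150.0 for year_2010 and 350.0 for year_2020, and row_means holds 200.0 for City A and 300.0 for City B.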

First of all, we split the data into several parts.

NOTE: The first two lines of the code are samples and have nothing to do with the project. You can jump straight to the third line.

df will store the names of all the cities. df2 will have the population counts. NOTE: each population count holds the value, sex and other details as well, and we want the value alone to plot the graph.

df3 will hold the cities whose country matches the value specified. In our case, it filters the cities whose country is France.

df.loc[] is used to return the rows whose values match a condition. Here, it returns the dataframe with cities whose country name is ‘France’. I have hardcoded it for easy understanding. To make this project better, you can pass the user’s input instead.

.head() is used for slicing. By default, head() returns the first five rows of the dataframe, and if a number of rows is specified, it returns that many rows. Here, we are going to get the first 10 rows.

.to_list() is used to convert a dataframe column into a list so that the list can be passed to the plotting function.
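
The filtering steps described above can be sketched like this. It is not the original code: the rows are invented, the column names are assumptions about the API response, and the variable names are my own.

```python
import pandas as pd

# Made-up rows in the assumed shape of the API response.
records = [
    {"city": "City A", "country": "France",
     "populationCounts": [{"year": "2013", "value": "1000"}]},
    {"city": "City B", "country": "Spain",
     "populationCounts": [{"year": "2013", "value": "2000"}]},
    {"city": "City C", "country": "France",
     "populationCounts": [{"year": "2013", "value": "3000"}]},
]
df = pd.DataFrame(records)

# Keep only the rows whose country is France (hardcoded, as in the text).
french_cities = df.loc[df["country"] == "France"]

# Take at most the first 10 matching rows.
top_rows = french_cities.head(10)

# Turn the two columns we need into plain Python lists.
cities = top_rows["city"].to_list()
population = top_rows["populationCounts"].to_list()
```

Note that .to_list() is called on a single column here. Each entry of population is still a list of dictionaries at this point.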

Now, as I mentioned earlier, this API has a populationCount key whose entries hold the year, value and sex, and we want the value only. To get it, we use a list comprehension.

>>populationList=[float(population[pp][0]['value']) for pp in range(len(population))]

Now, this population list holds the population counts as plain numbers.
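
The same extraction can also be written by looping over the entries directly instead of over indexes. This is only a sketch with invented values; the nested shape is assumed to match what the API returns.

```python
# Made-up entries: one list of count dictionaries per city.
population = [
    [{"year": "2013", "value": "1000", "sex": "Both Sexes"}],
    [{"year": "2013", "value": "3000", "sex": "Both Sexes"}],
]

# Take the first count of every city and convert its value to a number.
population_list = [float(counts[0]["value"]) for counts in population]
print(population_list)  # [1000.0, 3000.0]
```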

Plotting the data:

Here,we are going to create a bar chart.

.figure() is used to create a new figure. The size of the figure can be set by passing the figsize parameter.

.bar() is used to create a bar chart. It takes two lists as the parameters,one for the x axis and one for the y axis.

.xlabel() and .ylabel() are used to create labels for the respective axis.

.title() is used to give a title to the graph.

.tight_layout() is used to make sure that no part of the graph is trimmed or cut.
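
Put together, the plotting calls could look like the sketch below. The data, the labels and the figure size are placeholders of mine. I also select the Agg backend, which is my assumption for a web server where no display is available.

```python
import matplotlib

matplotlib.use("Agg")  # draw without a display, as on a web server
import matplotlib.pyplot as plt

# Made-up values standing in for the lists built in the previous steps.
cities = ["City A", "City B", "City C"]
population_list = [1000.0, 3000.0, 2000.0]

fig = plt.figure(figsize=(8, 5))  # width and height in inches
plt.bar(cities, population_list)  # x values first, bar heights second
plt.xlabel("City")
plt.ylabel("Population")
plt.title("Population by city")
plt.tight_layout()  # keep labels from being cut off
```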

Now, we have to temporarily store the data and pass it to the HTML page.

First of all, why this segment? It saves the figure to an in-memory file-like object instead of writing a file to disk. This lets us send the chart data over the network without creating a file.

StringIO() creates a new instance of the StringIO class, which will hold the graph data.

fig.savefig() is used to save the figure. It takes the StringIO instance and the format of the output.

.seek(0) moves the file pointer back to the start of the buffer, so that a later read starts from the beginning and no data is missed.

.getvalue() returns the entire contents of the buffer. The output is stored in a variable so that it can be passed to the HTML file.
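
Here is a minimal sketch of this segment. I am assuming the SVG format, because StringIO holds text and SVG is a text format; a binary format such as PNG would need BytesIO instead. The chart data and the variable names are placeholders.

```python
from io import StringIO

import matplotlib

matplotlib.use("Agg")  # draw without a display, as on a web server
import matplotlib.pyplot as plt

# A small made-up chart standing in for the real one.
fig = plt.figure(figsize=(4, 3))
plt.bar(["City A", "City B"], [1000.0, 3000.0])
plt.tight_layout()

buffer = StringIO()  # in-memory text file
fig.savefig(buffer, format="svg")  # write the chart as SVG markup
buffer.seek(0)  # go back to the start of the buffer
svg_markup = buffer.getvalue()  # the whole SVG document as one string
plt.close(fig)  # free the figure once it has been saved
```

The string can then go into the context passed to render, for example under a hypothetical key such as graph. In the template it would typically be printed with the safe filter so that Django does not escape the markup.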

Finally, the view returns the rendered HTML page.

In the HTML page, we just display the chart.

The output of the code is the bar chart displayed on the lobby page.

We have successfully integrated pandas and matplotlib with Django.

Takeaways of this blog:

→usage of the requests library

→pandas for data cleaning

→matplotlib for plotting a graph

Thanks for reading. Happy coding!
