This blog will show you the potential of spatial visualization using the Folium library with Python. It will give you an insight into the most important visualization tools, which can be extremely useful while analyzing spatial data.
For several data exploration tasks, spatial visualization can be extremely important. For problems related to crime mapping, housing prices or travel route optimization, spatial visualization can be one of the most useful tools for getting a glimpse of how the instances are geographically located. This is beneficial because we are getting massive amounts of data from several sources such as cellphones, smartwatches, trackers, etc. In this manner, patterns and correlations that might otherwise go unrecognized can be extracted visually.
Introduction to Folium
Folium is an incredible library that allows you to build Leaflet maps. Using latitude and longitude points, Folium allows you to create a map of any location in the world. Furthermore, the maps it creates are interactive, so you can zoom in and out after the map is rendered.
We’ll get some hands-on practice with building a few maps using the Seattle Real-time Fire 911 calls dataset. This dataset provides Seattle Fire Department 911 dispatches and every instance of this dataset provides information about the address, location, date/time and type of emergency of a particular incident. It's extensive and we’ll limit the dataset to a few emergency types for the purpose of explanation.
Folium can be installed using either of the following commands.
$ pip install folium
$ conda install -c conda-forge folium
Start by importing the required libraries.
import pandas as pd
import numpy as np
import folium
Let us now create an object named seattle_map, defined as a folium.Map object. We can add other Folium objects on top of the folium.Map to enrich the rendered map. The map is centered on the latitude and longitude given in the location parameter. The zoom_start parameter sets the initial magnification level of the rendered map. We have also set the tiles parameter to 'OpenStreetMap', which is the default. You can explore other tiles, such as 'Stamen Terrain' or 'Mapbox Control Room', in Folium's documentation; which built-in tiles are available depends on your Folium version.
seattle_map = folium.Map(
    location = [47.6062, -122.3321],
    tiles = 'OpenStreetMap',
    zoom_start = 11
)
seattle_map
We can observe the map rendered above. Let's create another map object with a different tile and zoom_start value. With the 'Stamen Terrain' tile, we can visualize terrain data, which is useful for several important applications.
We've also added a folium.Marker to our seattle_map2 map object below. The marker is placed at the [latitude, longitude] location specified in the square brackets. The string passed to the popup parameter is displayed when the marker is clicked.
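# 'Stamen Terrain' shipped with older Folium releases;
# newer releases may require a different tile name
seattle_map2 = folium.Map(
    location = [47.6062, -122.3321],
    tiles = 'Stamen Terrain',
    zoom_start = 10
)

# inserting marker
folium.Marker(
    [47.6740, -122.1215],
    popup = 'Redmond'
).add_to(seattle_map2)

seattle_map2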
We want to use the Seattle 911 calls dataset to visualize the 911 calls made in 2019 only. We are also limiting the emergency types to 3 specific emergencies that took place during this time.
We will now import our dataset, which is available through this link (in CSV format). The dataset is huge, so we’ll import only the first 10,000 rows using the pandas read_csv function. We'll use the head method to display the first 5 rows.
(This process will take some time because the dataset is huge. Alternatively, you can download it to your local machine and then insert the file path below.)
path = "https://data.seattle.gov/api/views/kzjm-xkqj/rows.csv?accessType=DOWNLOAD"
seattle911 = pd.read_csv(path, nrows = 10000)
seattle911.head()
Using the code below, we'll convert our Datetime variable to a datetime type and extract the year, then keep only the instances that occurred in 2019.
seattle911['Datetime'] = pd.to_datetime(
    seattle911['Datetime'], format='%m/%d/%Y %H:%M', utc=True
)
seattle911['Year'] = pd.DatetimeIndex(seattle911['Datetime']).year
seattle911 = seattle911[seattle911.Year == 2019]
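The format string passed to pd.to_datetime uses the same directives as Python's standard datetime module. To see how those directives map onto a timestamp, here is a small sketch that parses one made-up value (the sample string is an assumption for illustration, not a row from the dataset):

```python
from datetime import datetime

# Hypothetical timestamp laid out as month/day/year hour:minute.
sample = "03/14/2019 16:45"

# %m = month, %d = day, %Y = four-digit year, %H = 24-hour hour, %M = minute
parsed = datetime.strptime(sample, "%m/%d/%Y %H:%M")

print(parsed.year, parsed.month, parsed.day, parsed.hour, parsed.minute)
```

If the timestamps in your copy of the data are laid out differently, the format string would need to change to match.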
We'll now limit the Emergency type to 'Aid Response Yellow', 'Auto Fire Alarm' and 'MVI - Motor Vehicle Incident'. The remaining instances will be removed from the seattle911 dataframe.
seattle911 = seattle911[seattle911.Type.isin([
    'Aid Response Yellow',
    'Auto Fire Alarm',
    'MVI - Motor Vehicle Incident'
])]
We'll remove any instance that has a missing longitude or latitude coordinate. Without these values, the particular instance cannot be visualized and will cause an error while rendering.
# drop rows with missing latitude/longitude values
# (reassigning avoids modifying a filtered slice in place)
seattle911 = seattle911.dropna(subset = ['Longitude', 'Latitude'])
seattle911.head()
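To make the two filtering steps concrete, here is a small sketch on a toy dataframe. The rows are invented for illustration and are not taken from the Seattle dataset; only the column names mirror the ones used above.

```python
import pandas as pd

# Toy data: four made-up incidents, two of them missing a coordinate.
toy = pd.DataFrame({
    "Type": ["Auto Fire Alarm", "Medic Response",
             "Aid Response Yellow", "Auto Fire Alarm"],
    "Latitude": [47.61, 47.62, None, 47.60],
    "Longitude": [-122.33, -122.35, -122.30, None],
})

# Step 1: keep only the emergency types of interest.
wanted = ["Aid Response Yellow", "Auto Fire Alarm"]
filtered = toy[toy["Type"].isin(wanted)]

# Step 2: drop rows where either coordinate is missing.
clean = filtered.dropna(subset=["Latitude", "Longitude"])

print(len(toy), len(filtered), len(clean))  # 4 3 1
```

Only the first row survives both steps: it has a wanted type and both coordinates.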
Now let's step into the most interesting part. We'll plot all the instances on the map object we created above, seattle_map. Using the code below, we'll loop over every row of the dataframe and create a folium.CircleMarker for it (similar to the folium.Marker we added above). We'll assign the latitude and longitude coordinates of each instance to the location parameter. The radius of the circle is set to 3, and the popup displays the address of the particular instance.
As you can notice, the color of the circle depends on the emergency type. We will now render our map.
for i in range(len(seattle911)):
    folium.CircleMarker(
        location = [seattle911.Latitude.iloc[i], seattle911.Longitude.iloc[i]],
        radius = 3,
        popup = seattle911.Address.iloc[i],
        # blue for aid responses, green for fire alarms, purple for everything else
        color = '#3186cc' if seattle911.Type.iloc[i] == 'Aid Response Yellow'
                else '#6ccc31' if seattle911.Type.iloc[i] == 'Auto Fire Alarm'
                else '#ac31cc',
    ).add_to(seattle_map)

seattle_map
Voila! The map above gives us insight into where each type of emergency took place across Seattle during 2019. This can be extremely helpful for the local government in placing its emergency response resources more efficiently.
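The chained conditional that picks the marker color in the loop above can get hard to read as more emergency types are added. One alternative is a dictionary lookup with a default. This is only a sketch: the names TYPE_COLORS, DEFAULT_COLOR and color_for are illustrative helpers, not part of Folium.

```python
# Map each emergency type to the same hex colors used in the loop above.
TYPE_COLORS = {
    "Aid Response Yellow": "#3186cc",  # blue
    "Auto Fire Alarm": "#6ccc31",      # green
}

# Any type not listed falls back to purple, like the final else branch.
DEFAULT_COLOR = "#ac31cc"


def color_for(emergency_type):
    """Return the marker color for an emergency type (hypothetical helper)."""
    return TYPE_COLORS.get(emergency_type, DEFAULT_COLOR)


print(color_for("Auto Fire Alarm"))               # #6ccc31
print(color_for("MVI - Motor Vehicle Incident"))  # #ac31cc
```

Inside the loop, the color argument could then be written as color_for(seattle911.Type.iloc[i]).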
Advanced Features Provided by Folium
Let us now move on to slightly more advanced features provided by Folium. For this, we will use the National Obesity by State dataset, which is also hosted on data.gov. We'll be using two files: a CSV file containing the list of all states and the percentage of obesity in each state, and a GeoJSON file (based on JSON) that contains geographical features in the form of polygons.
Before using our dataset, we'll create a new folium.Map object whose location parameter holds coordinates that center the US on the map, and we'll set zoom_start to 4 so that all the states are visible.
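# 'Mapbox Bright' may not be available in recent Folium releases;
# 'OpenStreetMap' is a safe fallback if this tile name is rejected
usa_map = folium.Map(
    location = [37.0902, -95.7129],
    tiles = 'Mapbox Bright',
    zoom_start = 4
)
usa_map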
We will assign the URLs of our datasets to obesity_link and state_boundaries variables, respectively.
obesity_link = 'http://data-lakecountyil.opendata.arcgis.com/datasets/3e0c1eb04e5c48b3be9040b0589d3ccf_8.csv'
state_boundaries = 'http://data-lakecountyil.opendata.arcgis.com/datasets/3e0c1eb04e5c48b3be9040b0589d3ccf_8.geojson'
We will use the state_boundaries file to visualize the boundaries and areas covered by each state on our folium.Map object. This is an overlay on our original map, and in the same way we can visualize multiple layers on one map. This overlay will assist us in creating the choropleth map discussed below.
The obesity_data dataframe can be viewed below. It contains 5 variables. However, for the purpose of this demonstration, we are only concerned with the NAME and Obesity attributes.
obesity_data = pd.read_csv(obesity_link)
obesity_data.head()
Now comes the most interesting part: creating a choropleth map. We'll bind the obesity_data data frame to our state_boundaries geojson file. We pass these two to the data and geo_data parameters, respectively. The columns parameter indicates which DataFrame columns to use (the first holds the key, the second the values), whereas the key_on parameter indicates the property in the GeoJSON features on which to key the data.
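The binding only works for rows whose key matches a feature property exactly, so it can help to check for mismatches before plotting. The sketch below imitates that matching with plain dictionaries. The tiny GeoJSON, the values and the resolve helper are all made up for illustration and do not reflect Folium's internals.

```python
# A tiny stand-in for the GeoJSON file: two features with a NAME property.
geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"NAME": "Texas"}, "geometry": None},
        {"type": "Feature", "properties": {"NAME": "Ohio"}, "geometry": None},
    ],
}

# Stand-in for the NAME and Obesity columns (placeholder values, not real data).
obesity_by_state = {"Texas": 1.0, "Ohio": 2.0, "Utah": 3.0}


def resolve(feature, key_on):
    """Follow a dotted path such as 'feature.properties.NAME' (hypothetical helper)."""
    value = feature
    for part in key_on.split(".")[1:]:  # skip the leading 'feature'
        value = value[part]
    return value


geo_names = {resolve(f, "feature.properties.NAME") for f in geojson["features"]}

# Keys present in the data but absent from the GeoJSON cannot be matched to a shape.
unmatched = sorted(set(obesity_by_state) - geo_names)
print(unmatched)  # ['Utah']
```

An empty list here would mean every key in the data has a shape to color.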
We have additionally specified several other parameters that will define the color scheme we're going to use. Colors are generated from Color Brewer's sequential palettes.
By default, linear binning is used between the min and the max of the values. Custom binning can be achieved with the bins parameter.
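To illustrate what linear binning means, the sketch below computes equally spaced bin edges between the minimum and maximum of a few made-up values. The linear_bin_edges helper is hypothetical and only imitates the idea; a list of edges like the one it returns is the kind of value you could pass to the bins parameter for custom binning.

```python
def linear_bin_edges(values, n_bins):
    """Return n_bins + 1 equally spaced edges from min(values) to max(values)."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_bins
    return [lo + i * step for i in range(n_bins + 1)]


# Placeholder percentages, not taken from the obesity dataset.
sample_values = [20.0, 26.0, 32.0, 38.0]

edges = linear_bin_edges(sample_values, 3)
print(edges)  # [20.0, 26.0, 32.0, 38.0]
```

Custom edges should span the full range of the data so that every value falls into a bin.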
folium.Choropleth(
    geo_data = state_boundaries,
    name = 'choropleth',
    data = obesity_data,
    columns = ['NAME', 'Obesity'],
    key_on = 'feature.properties.NAME',
    fill_color = 'YlOrRd',
    fill_opacity = 0.9,
    line_opacity = 0.5,
    legend_name = 'Obesity Percentage'
).add_to(usa_map)

folium.LayerControl().add_to(usa_map)
usa_map
Awesome! We've been able to create a choropleth map using a simple set of functions offered by Folium. We can now see the obesity pattern geographically and uncover patterns that were not visible before. It also helped us gain clarity about the data, rather than merely simplifying it.
You should now feel equipped to visualize spatial data effectively. Go ahead and explore Folium's documentation to discover the incredible capabilities that this open-source library has to offer.
Thanks for reading! If you want more datasets to play with, check out this blog post. It consists of 30 free datasets with questions for you to solve.
This is a companion discussion topic for the original entry at https://blog.datasciencedojo.com/p/d989ee6b-c519-48bd-b9ef-5af0bfdea840/