12 Years of Fires in Sardinia
This summer I was looking for some data visualization challenges and I came across this cool project by Mauro Melis. Mauro created it for a contest organized by Open Data Sardegna, and the jury found it so cool that he won the first prize in the data visualization category.
It’s basically a scrollytelling visualization, a type of visualization popularized - among others - by The New York Times and the guys at The Pudding.
There were no links to the data that Mauro used, but it was pretty easy to find the datasets from 2005 to 2016, namely 12 years of wildfires in Sardinia.
I like scrollytelling, but I wanted to do something quick this time. I also wanted to try an online tool (it’s also a library, but I used the online tool) developed by Uber: Kepler.gl.
Shapefiles? GeoPandas!
The datasets from 2005 to 2016 contain shapefiles, a popular geospatial vector data format. I know that there are several geospatial libraries in JavaScript, and of course D3 is awesome for creating maps, but I think that Python is so much better at data wrangling than JavaScript, so I decided to go with it.
In Python, if you need to work with data, you pick Pandas.
If you need to work with geospatial data, you pick GeoPandas.
It’s that simple!
Not much Data Wrangling
Turns out that these datasets were actually pretty good, so I didn’t have to do too much data wrangling. Of course there were differences from year to year, but nothing major. As an example, this is what I did to clean the 2016 dataset:
import os

import geopandas as gpd

# data_dir: folder holding the downloaded datasets (defined elsewhere)
gdf2016 = gpd.read_file(os.path.join(data_dir, 'areeIncendiatePerim2016', 'Perimetri_Superfici_Bruciate_2016.shp'))

# Drop the unused columns and rename the remaining ones to English
gdf2016 = (
    gdf2016
    .reset_index(drop=True)
    .drop(columns=['BASE_FID', 'ID_INCE', 'ISTAT', 'ID_PROV', 'STIR', 'STAZIONE',
                   'COMUNE', 'TIPOLOGIE', 'M2_BOSCO', 'M2_PASCOLO', 'M2_ALTRO', 'SUP_TOT_M2',
                   'TIPO_INCE', 'dist_ins', 'ID_RILIEVO', 'MODIFICHE'])
    .rename(columns={'TOPONIMO': 'toponym', 'DATA_INCE': 'date', 'N_INCE': 'num_fires', 'SUP_TOT_HA': 'hectars'})
)

# Keep the remaining columns in a fixed order
cols = ['toponym', 'hectars', 'date', 'num_fires', 'geometry']
gdf2016 = gdf2016[cols]
Basically I harmonized the datasets from 2005 to 2016, so they had the same structure.
gdf2016.head()
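The harmonization step can be sketched as a small helper that renames each year's columns to the shared names and keeps only the common ones. This is a minimal sketch, not the code from the notebook: `harmonize`, `combine`, `TARGET_COLS` and `COLUMN_MAP_2016` are hypothetical names, and only the 2016 mapping comes from the snippet above (the other years would need their own mappings, which are not shown in this post). It uses plain Pandas calls, which should also work on a GeoDataFrame because a GeoDataFrame is a Pandas DataFrame subclass.

```python
import pandas as pd

# Shared schema, taken from the 2016 cleaning step above.
TARGET_COLS = ['toponym', 'hectars', 'date', 'num_fires', 'geometry']

# Source-to-target names for 2016. Other years would need their own mapping
# (hypothetical here, since their column names are not shown in this post).
COLUMN_MAP_2016 = {
    'TOPONIMO': 'toponym',
    'DATA_INCE': 'date',
    'N_INCE': 'num_fires',
    'SUP_TOT_HA': 'hectars',
}


def harmonize(frame, column_map):
    """Rename columns to the shared names and keep only TARGET_COLS, in order."""
    renamed = frame.rename(columns=column_map)
    missing = [col for col in TARGET_COLS if col not in renamed.columns]
    if missing:
        raise KeyError(f'missing columns after renaming: {missing}')
    return renamed[TARGET_COLS].reset_index(drop=True)


def combine(frames):
    """Stack already-harmonized yearly frames into a single table."""
    return pd.concat(frames, ignore_index=True)
```

Raising on missing columns makes the year-to-year differences show up immediately instead of silently producing a table with holes.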
I posted the visualization on the DataIsBeautiful subreddit and it was quite successful.
Someone commented that I should have added a legend, and I agree, but apparently I was too lazy to find out how to add it in Kepler.gl.
Other projects where I had to do much more data wrangling were completely ignored.
Lessons learned:
- Nobody cares about how much you struggled with data wrangling (but you still have to do it).
- Always include a GIF in a README (well, I already knew that…).
- Sardinian cities keep their Italian names in English.
Code
You can find the repository on GitHub.
A Note on Reproducibility
I recently tried to reproduce the notebook and I had to exclude the dataset from 2010. I think this is due to some dependency issues with fiona, which is used by GeoPandas.
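One way to keep the notebook running when a single year fails to load is to read the datasets in a loop and record the failures instead of stopping. The sketch below is an illustration under assumptions, not what the notebook does: `load_years` is a hypothetical helper, the reader function is passed in as a parameter, and the exception is caught broadly because the exact error type is not known.

```python
def load_years(paths_by_year, reader):
    """Read each year's file with `reader`, collecting failures instead of stopping."""
    loaded = {}
    skipped = {}
    for year in sorted(paths_by_year):
        try:
            loaded[year] = reader(paths_by_year[year])
        except Exception as exc:  # broad on purpose: the exact error type is unknown
            skipped[year] = repr(exc)
    return loaded, skipped
```

With GeoPandas, `reader` would be `gpd.read_file` and `paths_by_year` a dictionary mapping each year to the path of its shapefile. Printing `skipped` afterwards shows which years were left out and why.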