Timely information can save lives. Aid organisations must recognize that accurate, timely information is a form of disaster response in its own right.
Niskala, Secretary-General of the IFRC (2003-2008)
Humanitarian aid improves with accurate and timely information
When a natural disaster strikes, the local government, NGOs and Red Cross and Red Crescent National Societies quickly need information on the damage (affected population, casualties, road blocks, flood extent, damaged houses) in the areas that were hit by the disaster. The information that is presented to decision-makers in the wake of a disaster needs to be accurate, appropriate, timely and valid.
One of the challenges in disaster response is the scarcity of resources: not every affected family can be helped. It is therefore essential to identify priority areas by assessing damage and finding the vulnerable people who are most affected. Currently, damage assessment and identification of the most vulnerable are time-consuming processes that can take weeks to complete, due to logistics, safety constraints or workload.
Assessment teams need to go into the affected area, interview affected people and review damage to houses. Because of time constraints or limited information sharing, there is a risk that decisions on priority areas are not based on complete and accurate information. This leaves room for organizational or political preferences to play a role. It also allows media influence, favoring areas that receive more media attention than others.
In a study in the Philippines (Sense-making of the Netherlands Red Cross Priority Index Model – case typhoon Haiyan, Philippines, November 2016), 60% of 32 interviewed decision-makers (government, NGOs and UN) indicated that a faster, more complete and more objective analysis of priority areas (a Priority Index) could help identify areas with high damage and many people affected. Such an analysis would help decision-makers prioritize and distribute aid, and reach the most vulnerable people in the worst-affected areas more efficiently.
Building the priority index for typhoons
Our aim is to develop a methodology to identify high-priority areas for humanitarian response. It is based on (open) secondary data on the affected areas, combined with disaster impact data (such as wind speeds and rainfall), and on learning from past disasters. It is important that we invest in data preparedness, so that these pre-crisis secondary datasets are available and up to date ([1. ACAPS & CDAC Network. (2014). Assessing information & communication needs: A quick and easy guide for those working in humanitarian response (p. 10).], [2. Inter-Agency Standing Committee. (2010). IASC Guidelines Common Operational Datasets (CODs) in Disaster Preparedness and Response. Paper presented at the 77th IASC Working Group meeting.]).
Applied research on this objective is ongoing for typhoons (Philippines), earthquakes (Nepal) and floods (Malawi). Our objective is to develop machine learning methodologies that can be applied to different countries using local data. With minor modifications, these methodologies should produce fast and sufficiently accurate damage predictions. In this blog post we describe initial results for the Philippines during Typhoon Haima (October 19th, 2016).
The prediction model uses country-wide baseline data (administrative boundaries, population, poverty, house wall and roof types) and geographical features per municipality (ruggedness, slope, coastline length, distance to coast). These are combined with impact data (wind speed, rainfall, typhoon path) and with a number of derived features created from these data.
Official counts of damaged houses by DSWD and NDRRMC (Philippine government) are used to validate the model. For this we used data from four past typhoons: Haiyan, Melor, Hagupit and Rammasun. More details on the data and its sources are available.
All data was aggregated to the municipality level. Unfortunately, barangay-level damage counts are not available in the datasets published by the government. All information per municipality was integrated using the PCODE system, which assigns a unique identifier to each administrative area in the Philippines. To ease this task, an efficient PCoder was developed.
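The PCoder itself is not described in this post. The sketch below gives a rough idea of the kind of matching it has to do. It maps municipality names as they appear in a report to PCODEs from a reference table, and it tolerates accents, "City of" prefixes and small spelling differences. The function names, the normalization rules and the fuzzy-matching cutoff are our own assumptions. The codes in the example are placeholders, not real PCODEs.

```python
import difflib
import unicodedata


def normalize(name):
    """Lower-case, strip accents, 'City of'/'City' and punctuation so spelling variants compare equal."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    cleaned = ascii_name.lower().replace("city of ", "").replace(" city", "")
    return "".join(ch for ch in cleaned if ch.isalnum())


def pcode_lookup(name, province, reference, cutoff=0.8):
    """Return the PCODE of the closest-matching municipality within `province`, or None.

    `reference` maps (province, municipality name) -> PCODE. Hypothetical helper.
    """
    candidates = {
        normalize(muni): code
        for (prov, muni), code in reference.items()
        if normalize(prov) == normalize(province)
    }
    match = difflib.get_close_matches(normalize(name), list(candidates), n=1, cutoff=cutoff)
    return candidates[match[0]] if match else None


# Reference table (province, municipality) -> PCODE. Codes are placeholders, not real PCODEs.
reference = {
    ("Cagayan", "Peñablanca"): "PH-XX-001",
    ("Cagayan", "Tuguegarao City"): "PH-XX-002",
    ("Cagayan", "Aparri"): "PH-XX-003",
}
print(pcode_lookup("Penablanca", "Cagayan", reference))          # → PH-XX-001
print(pcode_lookup("City of Tuguegarao", "Cagayan", reference))  # → PH-XX-002
```

Restricting candidates to the same province avoids matching municipalities that share a name across provinces. Names that fall below the cutoff return None and should be checked by hand.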
Explained: damage counts. Damage counts, people affected counts and casualties are collected by the barangay council and reported by the barangay captain to the local government units (LGUs). The LGUs report the data aggregated per municipality to the regional Office of Civil Defense (OCD), which then reports it to the national OCD. Casualty counts are collected per person and double-checked, so they are accurate and valid. Damage counts are collected without a centralized methodology, and therefore methods of data collection can differ. ‘People affected’ is a very broad term, and there is no central definition of when a person is affected, so the counts can differ widely between municipalities. An analysis of Typhoon Haiyan shows that many people affected reports are estimates. In these reports, either 0%, 50% or 100% of the total population of an area is listed as affected. A learning algorithm cannot produce reliable predictions from a target distributed like this.
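One way to check whether a people affected column suffers from this problem is to measure how many reports sit at almost exactly 0%, 50% or 100% of the population. The sketch below is a hypothetical check, not the analysis that was done for Typhoon Haiyan. The tolerance and the example numbers are illustrative assumptions.

```python
def share_of_round_estimates(affected, population, levels=(0.0, 0.5, 1.0), tol=0.01):
    """Fraction of municipalities whose reported 'people affected' is ~0%, 50% or 100% of the population."""
    flagged = 0
    for a, p in zip(affected, population):
        ratio = a / p
        # A report counts as a rounded estimate if its ratio is within `tol` of one of the levels
        if any(abs(ratio - level) <= tol for level in levels):
            flagged += 1
    return flagged / len(population)


# Hypothetical reports for four municipalities (illustrative numbers, not real data)
affected = [0, 5000, 12000, 3421]
population = [8000, 10000, 12000, 9000]
print(share_of_round_estimates(affected, population))  # → 0.75
```

A high share of flagged municipalities suggests the column reflects rough estimates rather than counts, and that it should not be used as a training target.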
The prediction model
In the risk management domain, probabilistic models are being developed to determine the likelihood of losses from a disaster (usually economic losses). These models create impact scenarios that decision-makers can use to mitigate risk. However, they are not designed to predict the impact on people of a disaster that has just happened. Our approach is not to develop sophisticated hydrological, seismic or wind speed models. Instead, we use machine learning to find the best predictors of typhoon impact in existing baseline data. We have tried different machine learning methods (including neural networks); currently we are using a method called Random Forest Regressor.
Explained: Random Forest Regressor. Its power comes from building multiple predictors (decision trees) and averaging their outputs. Each tree is built in a slightly different way. It is trained on a different random subset of the historical data, and only a random selection of the variables is considered at each split while the tree is being built. This strategy produces a model that handles high-dimensional data well and can estimate the importance of each input variable. The method is highly configurable, so we ran several experiments to select the parameters that produce the best results on the training data.
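Below is a minimal sketch of this setup using scikit-learn's RandomForestRegressor. It uses synthetic stand-in data and an arbitrary parameter grid, because the post does not list the exact feature matrix or the parameters that were tuned. Parameters are selected with cross-validation on the training data. The last lines confirm that the forest's prediction is the average of its individual trees.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the municipality table: 200 rows x 5 features (hypothetical)
rng = np.random.default_rng(0)
X = rng.random((200, 5))
y = 1000 * (1 - X[:, 0]) + 500 * X[:, 1] + rng.normal(0, 50, 200)

# Each tree sees a bootstrap sample of the rows (bootstrap=True) and, at every split,
# only a random fraction of the features (max_features).
param_grid = {
    "n_estimators": [50, 200],
    "max_features": [0.5, 1.0],
    "min_samples_leaf": [1, 5],
}
search = GridSearchCV(
    RandomForestRegressor(bootstrap=True, random_state=0),
    param_grid,
    cv=5,
    scoring="r2",
)
search.fit(X, y)
model = search.best_estimator_
print("best parameters:", search.best_params_)

# The forest's prediction is the mean of its trees' predictions
tree_mean = np.mean([tree.predict(X) for tree in model.estimators_], axis=0)
print(np.allclose(tree_mean, model.predict(X)))  # → True
```

Cross-validating within the training data, rather than scoring on the rows the model was fitted on, helps avoid choosing parameters that merely memorize past typhoons.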
The chart below shows the importance of the different features in the dataset. The distance to the typhoon path is the most important feature, followed by building materials and weather features. The full log of feature importances has been made available.
The most important features in the dataset are: distance to the typhoon path, building materials and weather
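Importances like those in the chart can be read from the feature_importances_ attribute of a fitted scikit-learn forest. These are impurity-based and sum to 1. The sketch below uses synthetic data in which damage is, by construction, driven mostly by distance to the typhoon track. The feature names and the data-generating formula are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
n = 300
feature_names = ["dist_to_track_km", "share_light_roof", "rainfall_mm", "ruggedness", "poverty_rate"]
X = np.column_stack([
    rng.uniform(0, 300, n),   # distance to typhoon track (km)
    rng.uniform(0, 1, n),     # share of houses with a light roof
    rng.uniform(0, 400, n),   # rainfall (mm)
    rng.uniform(0, 1, n),     # ruggedness index
    rng.uniform(0, 0.8, n),   # poverty rate
])
# Synthetic target: damage falls off with distance to the track and rises with light roofs
y = 2000 * np.exp(-X[:, 0] / 50) + 300 * X[:, 1] + rng.normal(0, 50, n)

model = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)
ranking = sorted(zip(feature_names, model.feature_importances_), key=lambda p: p[1], reverse=True)
for name, importance in ranking:
    print(f"{name:18s} {importance:.3f}")
```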
Testing the model
We had the unique opportunity to test the model during the recent Typhoon Haima. We dropped all other work to fast-track the development of the model and to collect and clean the impact data. This allowed us to release a first Priority Index within 24 hours after landfall. The first official damage counts, covering only parts of the affected area, were released more than four days later. The results were shared with humanitarian organizations and the government, and through social media. We produced two types of analysis.
Priority areas within 24 hours
The predicted numbers of damaged houses were used to rank municipalities on a scale from 1 to 5 (1 for the lowest predicted number of damaged houses, 5 for the highest). The map and data (HDX [4. HDX. (n.d.). The Humanitarian Data Exchange. Retrieved June 9, 2016, from //data.hdx.rwlabs.org/]) were shared with the humanitarian community and reviewed by organizations such as UN OCHA and the Shelter Cluster.
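The post does not say how the predicted numbers were binned into the five classes. One simple option, assumed here, is rank-based quintiles. The 20% of municipalities with the lowest predicted damage get class 1, the next 20% get class 2, and so on.

```python
def priority_classes(predicted, n_classes=5):
    """Assign each municipality a priority class 1..n_classes from its predicted damage.

    Classes are rank-based quantile bins: class 1 holds the lowest predictions,
    class n_classes the highest. Hypothetical helper, not the published method.
    """
    n = len(predicted)
    order = sorted(range(n), key=lambda i: predicted[i])  # indices from lowest to highest
    classes = [0] * n
    for rank, i in enumerate(order):
        classes[i] = rank * n_classes // n + 1
    return classes


# Hypothetical predicted numbers of damaged houses for ten municipalities
predicted = [50, 3000, 800, 10, 12000, 450, 2200, 90, 6000, 1500]
print(priority_classes(predicted))  # → [1, 4, 3, 1, 5, 2, 4, 2, 5, 3]
```

Equal-width bins on the predicted numbers are an alternative. With a skewed damage distribution, however, they would put most municipalities in class 1. With the rank-based scheme, tied predictions can end up in adjacent classes.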
Absolute damage to fill gaps in government counts
We used the model to fill gaps in the official counts of DSWD and NDRRMC. For this, we included the available official counts in the model and ran it again to predict values for the municipalities missing from the official data. The resulting map was used by the Shelter Cluster to get a better overview of total house damage in the affected areas.
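Below is a sketch of this gap-filling step. It assumes official counts are stored as an array with NaN for municipalities that have not reported, optionally stacked with training data from earlier typhoons. The function name and its arguments are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def fill_gaps(X, official, X_past=None, y_past=None):
    """Return official damage counts with missing entries (NaN) replaced by model predictions.

    X              -- features of the current event's municipalities, shape (n, k)
    official       -- reported damaged houses per municipality, NaN where not reported
    X_past, y_past -- optional training data from earlier typhoons
    """
    official = np.asarray(official, dtype=float)
    known = ~np.isnan(official)
    # Train on every municipality that already has an official count
    X_train, y_train = X[known], official[known]
    if X_past is not None:
        X_train = np.vstack([X_past, X_train])
        y_train = np.concatenate([y_past, y_train])
    model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
    filled = official.copy()
    if (~known).any():
        filled[~known] = model.predict(X[~known])  # predict only the gaps
    return filled, known


# Synthetic example: 60 municipalities, 15 of which have no official count
rng = np.random.default_rng(7)
X = rng.random((60, 3))
official = 1000 * X[:, 0] + 200 * X[:, 1]
official[rng.choice(60, size=15, replace=False)] = np.nan
filled, known = fill_gaps(X, official)
```

Official counts are kept untouched wherever they exist, so the resulting map only adds model estimates where the government data is silent.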
Validation: performance and error in the prediction
By its nature, the regression model has difficulty predicting very low and very high damage. A random forest's prediction is an average over many trees, which pulls extreme values towards the middle of the range. We know little about the methodology used to count damage in the Philippines. As a result, we cannot tell whether the model has a high error on these outliers, or whether it actually predicts them fairly accurately.
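Below is a small synthetic illustration of this effect, assuming a target that rises linearly from 0 to 10,000 damaged houses. Each tree predicts the mean of the training values in a leaf, and the forest averages the trees. As a result, predictions at the extremes are pulled inward and can never leave the range seen in training.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data: damage rises linearly from 0 to 10,000 houses (hypothetical)
x = np.linspace(0, 1, 100).reshape(-1, 1)
y = x.ravel() * 10000

forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(x, y)
low, high, beyond = forest.predict([[0.0], [1.0], [2.0]])
print(f"lowest:  true 0,     predicted {low:.0f}")
print(f"highest: true 10000, predicted {high:.0f}")
print(f"outside training range (true 20000): predicted {beyond:.0f}")
```

Trees whose bootstrap sample misses the most extreme municipality place it in a leaf with less extreme neighbors, which is why both ends are compressed.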
Explained for the data analysts among you
– The R² score is 0.58 ± 0.11 (a fairly good score)
– The mean error is 1,290 damaged buildings per municipality
– The standard deviation of the error ranges from 1,900 to 2,100 damaged buildings per municipality
– The prediction error on Typhoon Haima alone is 850 damaged buildings per municipality
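The post does not state how the R² spread was obtained. One plausible setup, assumed here, is leave-one-typhoon-out cross-validation: the model is trained on three typhoons and tested on the fourth. We also assume that the mean error is the mean absolute error. The data below are synthetic, so the numbers it prints will not match those above.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

# Synthetic stand-in data; the typhoon labels define the cross-validation groups
rng = np.random.default_rng(1)
n = 400
X = rng.random((n, 4))
y = 3000 * np.exp(-5 * X[:, 0]) + 800 * X[:, 1] + rng.normal(0, 200, n)
typhoon = rng.choice(["Haiyan", "Melor", "Hagupit", "Rammasun"], size=n)

model = RandomForestRegressor(n_estimators=200, random_state=0)
cv = LeaveOneGroupOut()  # train on three typhoons, test on the held-out one

r2 = cross_val_score(model, X, y, groups=typhoon, cv=cv, scoring="r2")
mae = -cross_val_score(model, X, y, groups=typhoon, cv=cv,
                       scoring="neg_mean_absolute_error")

print(f"R^2 = {r2.mean():.2f} +- {r2.std():.2f}")
print(f"mean absolute error = {mae.mean():.0f} damaged houses per municipality")
```

Holding out a whole typhoon mimics the real use case more closely than random splits, because the model always has to predict an event it has not seen.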
From our work so far we conclude that, when data preparedness is done right and disaster impact data are collected systematically after each event, machine learning techniques can be used to build reliable damage predictions.
Although data-driven damage predictions are not perfect, they are far more transparent than other prioritization methods, because the underlying data, assumptions and methodologies are shared openly.
While running and improving the model, we made a few ‘discoveries’ that are worth mentioning:
- The importance of poverty data seems to be overestimated in other priority and severity models. In our initial model, poverty had a feature importance of 10% across all four typhoons. After we added wall and roof type data to the model, the importance of poverty dropped to 3%. This is one piece of evidence that poverty is related to how people build.
- It is essential to use features that are expressed relative to the population (for example, shares or per-capita values rather than absolute counts). Otherwise, population is by far the most important feature in any model.
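Below is a minimal sketch of such population-relative features. It assumes hypothetical field names and treats the number of households as a proxy for the number of houses. Two municipalities with the same composition but a tenfold difference in size end up with identical features.

```python
def to_relative_features(muni):
    """Turn absolute counts for one municipality into population-relative features.

    Field names are hypothetical; households are used as a proxy for the number of houses.
    """
    households = muni["households"]
    return {
        "share_light_roof": muni["houses_light_roof"] / households,
        "share_light_wall": muni["houses_light_wall"] / households,
        "poverty_rate": muni["people_in_poverty"] / muni["population"],
    }


small = {"population": 10000, "households": 2000, "houses_light_roof": 600,
         "houses_light_wall": 400, "people_in_poverty": 3000}
large = {key: 10 * value for key, value in small.items()}

print(to_relative_features(small))
print(to_relative_features(small) == to_relative_features(large))  # → True
```

With absolute counts, the larger municipality would have ten times higher values on every feature. A model would then mostly learn "big municipality means more damage", which is exactly the population effect described above.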
A complete roadmap for improving the prediction is available on our GitHub page. A few highlights are listed below.
To improve the performance of the prediction model and reduce its error, we will try the following:
- Add new baseline data, especially on vulnerability and coping capacity, through a community-level INFORM risk model [4. INFORM. (2016). INFORM Partners Severity Workshop: Towards a global severity index for humanitarian crises and disasters workshop. JRC Ispra, Italy. 21 and 22 April, 2016.]
- Work more closely with in-country actors to get more complete data on building damage and people affected, and to better understand the data collection methodology.
To reduce the time needed to release a prediction on damage after a new typhoon:
- End-to-end scripting of all data collection, cleaning, aggregation and analysis steps
- Reach agreements with data providers to get timely access to high-resolution wind speed, rainfall, earthquake intensity and flood extent data.
To scale up this work to other countries:
- Data on impact (people affected, houses damaged and destroyed, casualties) should be available for a number of recent disasters and collected through the same methodology (standardized).
- Baseline data on population, poverty and building materials should be available
- The above should be collected at the same administrative level and using the same administrative divisions (combining data from before and after a significant change in administrative divisions is hardly possible)
Early warning is organized differently from country to country, and people build differently, so the impact of similar events can vary widely between countries. It is therefore not advisable to run the Philippines model on another country without any historical data to validate it on.
A special thanks to Andrej Verity (UN OCHA) and Simon Johnson (British Red Cross) for the ideation and initial work on Priority Index models, and to Mark Saunders (University College London) for providing the wind speed data.