Identifying aid priority areas by learning from past events
Written by 510 Geographic Information Specialist Evelien Bulte
In the first few days following a severe earthquake, there is typically a scarcity of reliable information about the impact on local communities. In this situation, decision makers in the humanitarian aid sector struggle to get an overview of how the severity of the impact is distributed spatially. Time and relief resources are limited, so they look for an optimal schedule to target the most affected communities effectively. By learning from data on past events, priority index models can rapidly produce an estimate of a disaster’s impact, which can help decision makers identify aid priority areas.
You might have read our blog posts about the initial and improved priority index models for typhoons. Meanwhile, we have started to develop a similar model for seismic hazards. In the first blog post we highlighted how important impact information covering the full affected area is for successfully identifying priority areas. During the first days following a natural disaster, before the impact has been assessed in the field, a rapidly produced impact estimate can serve as a stopgap to support important aid coordination decisions.
Additionally, we believe that maps in general are a very powerful medium. Estimated impact maps can help aid workers form a mental picture of the situation they work in, which can help them distribute relief resources more effectively.
The weighting applied in current earthquake severity index models is often subjective, which can reduce the accuracy of their output. By learning from past events, priority index models enable empirically based decision support.
A Case Study for Nepal
On 25 April 2015, a magnitude 7.8 (Mw) earthquake struck central-northern Nepal. Because of the extensive assessment data produced after this event, it was selected as the first case on which to build an earthquake priority index model.
While searching for relevant model input data, an important requirement was availability at a low administrative level, so that the model could distinguish different levels of severity within the most affected area, and not only between heavily and moderately affected areas. This would preferably be administrative level 4, the lowest level available. At this level there are around 3,150 VDCs (Village Development Committees) in Nepal (see figure below).1 Output at this fine resolution was considered far preferable to output at administrative level 3 (districts).
Open data on 27 potential predictor variables was collected, mainly from the 2011 national population census and through platforms such as the Humanitarian Data Exchange. The predictor variables were divided into four categories: hazard predictors (mean macroseismic intensity derived from USGS ShakeMaps), exposure predictors (total population and population density), physical vulnerability predictors (slope and various foundation, wall and roof building materials) and socio-economic vulnerability predictors (literacy rate, household size, school attendance, drinking water source quality and toilet presence).
For the response variable, we looked for data that could accurately distinguish levels of need for aid between geographic units. Damage to residential buildings was selected as the most suitable response variable, because it is a relatively objective measure and it is relevant to multiple aid clusters (Shelter, WASH, Health and Food Security). An Initial Rapid Assessment performed by volunteers of the Nepal Red Cross Society provided damage numbers for 517 VDCs in the most affected area.2 In this type of assessment, volunteers assess and report the impact soon after a rapid-onset crisis to provide an overview of how the population has been affected.
Model Fitting and Validating
Different statistical models with different response variables were compared in order to provide a solid basis for modelling decisions. Because of its earlier success in the typhoon priority index model, we applied random forest regression; our typhoon Haima priority index blog post explains this machine learning method in more detail. Additionally, a multivariate linear regression model was fitted to the data to gain insight into how individual indicators relate to the response variables. Two response variables were tested: the absolute number of completely damaged houses and a composite damage variable, referred to as the house damage factor:
house damage factor = (0.75 ∙ completely damaged houses) + (0.25 ∙ partially damaged houses)
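As a minimal sketch (Python, with made-up counts; the function name is hypothetical), the house damage factor is simply a weighted sum of the two damage counts. The example also shows why a composite can be harder to read: very different damage profiles can map to the same factor.

```python
def house_damage_factor(completely_damaged: float, partially_damaged: float) -> float:
    """Composite damage score: 0.75 x completely damaged + 0.25 x partially damaged houses."""
    return 0.75 * completely_damaged + 0.25 * partially_damaged


# Two hypothetical VDCs with very different damage profiles
vdc_a = house_damage_factor(completely_damaged=100, partially_damaged=0)
vdc_b = house_damage_factor(completely_damaged=0, partially_damaged=300)
print(vdc_a, vdc_b)  # 75.0 75.0: same score, different situations on the ground
```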
Statistically, the random forest model predicting the composite variable of partially and completely damaged houses performed best, with an R-squared of 0.63 on an independent test dataset. However, despite its slightly lower accuracy, we favour the random forest model predicting only completely damaged houses, because its output is more intuitive: a non-composite measure is easier for aid workers to interpret. Another reason to favour this model is that its output can be extended; for example, dividing it by the total number of houses in a VDC gives the relative damage per VDC. This cannot be done with a composite measure. The R-squared of this model was only slightly lower, at 0.60. Two-thirds of the highest priority areas were identified correctly (VDCs were divided into five equal-sized priority levels). The root mean squared error, which approximates the standard deviation of the prediction errors, was 626 houses per VDC.3
By interpreting the regression coefficients of the multivariate linear regression model, we found that some relationships in the model might be unique to this particular case or area. More heavily damaged VDCs tended to have better-quality building foundations, higher toilet presence and higher school attendance.
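The evaluation measures mentioned above (R-squared, root mean squared error, five equal-sized priority levels and the share of highest-priority VDCs identified correctly) can be sketched in plain Python. This is an illustration under assumptions, not the study's code: the function names and the ten observed/predicted values are invented, and the priority levels are assumed to be formed by ranking VDCs and splitting them into equal-sized groups (quintiles).

```python
import math


def r_squared(observed, predicted):
    """Share of the variance in the observed values explained by the predictions."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean squared error, in the units of the response (houses per VDC)."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def priority_levels(values, n_levels=5):
    """Rank VDCs and split them into n_levels equal-sized groups: 1 = lowest, n_levels = highest."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    levels = [0] * len(values)
    for rank, idx in enumerate(order):
        levels[idx] = rank * n_levels // len(values) + 1
    return levels


def top_level_hit_rate(observed, predicted, n_levels=5):
    """Fraction of VDCs in the observed top level that the predictions also place in the top level."""
    obs_levels = priority_levels(observed, n_levels)
    pred_levels = priority_levels(predicted, n_levels)
    top = [i for i, lv in enumerate(obs_levels) if lv == n_levels]
    return sum(1 for i in top if pred_levels[i] == n_levels) / len(top)


# Made-up completely damaged house counts for ten VDCs (not study data)
observed = [10, 50, 120, 300, 80, 900, 40, 600, 200, 1500]
predicted = [30, 60, 100, 250, 150, 700, 20, 800, 180, 1200]
total_houses = [400, 900, 1100, 1500, 700, 2000, 500, 1600, 1000, 2500]

print(round(r_squared(observed, predicted), 2), round(rmse(observed, predicted), 1))
print(top_level_hit_rate(observed, predicted))  # 0.5 for these made-up numbers
# Relative damage: share of each VDC's houses predicted to be completely damaged
print([round(d / h, 2) for d, h in zip(predicted, total_houses)])
```

The last line illustrates the extension argument: a count of completely damaged houses can be divided by the housing stock, whereas a weighted composite has no matching denominator.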
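To illustrate how the direction of such relationships is read from a multivariate linear regression, here is a stdlib-only sketch (the function name `fit_ols` is hypothetical) that fits ordinary least squares by solving the normal equations and returns an intercept plus one coefficient per predictor. A positive coefficient means higher predicted damage as that predictor increases, holding the others fixed. The data are made up so that the true coefficients are known; in practice one would use an established statistics library and standardize predictors before comparing coefficient sizes.

```python
def fit_ols(X, y):
    """Ordinary least squares with an intercept.

    Solves the normal equations (A^T A) b = A^T y, where A is X with a leading
    column of ones, using Gauss-Jordan elimination with partial pivoting.
    Returns [intercept, coef_1, ..., coef_p].
    """
    A = [[1.0] + [float(v) for v in row] for row in X]
    k = len(A[0])
    # Augmented matrix [A^T A | A^T y]
    m = [
        [sum(r[i] * r[j] for r in A) for j in range(k)] + [sum(r[i] * t for r, t in zip(A, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(k):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]


# Made-up data generated from damage = 2 + 3*x1 - 1*x2 (no noise)
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [2, 5, 1, 4, 7]
intercept, b1, b2 = fit_ols(X, y)
print(round(intercept, 6), round(b1, 6), round(b2, 6))
```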
The graph below shows the increase in mean squared error, reported on a 0% to 100% scale, when each predictor variable is permuted (randomly shuffled).
Relative importance plot of the favoured random forest regression model. (Explanation of variable codes: thatch_roof = number of households with thatch/straw roofs, mi = macroseismic intensity, mud_found = number of households with mud-bonded brick/stone foundations, mud_wall = number of households with mud-bonded brick/stone outer walls, pop = population, tap_water = percentage of households with tap water as their main source of drinking water, tile_roof = number of households with tile/slate roofs, galv_roof = number of households with galvanized iron roofs, bamboo_wall = number of households with bamboo outer walls, slope = mean slope (%), cem_wall = number of households with cement-bonded brick/stone outer walls, school = school attendance of 5-25-year-olds (%), wood_found = number of households with wooden pillar foundations, rcc_roof = number of households with RCC roofs, literacy_rate = literacy rate, no_toilet = percentage of households without a toilet facility, pop_dens = population density, cem_found = number of households with cement-bonded brick/stone foundations, wood_wall = number of households with wood/plank outer walls, rcc_found = number of households with RCC foundations with pillars, wood_roof = number of households with wood/plank roofs, hhsize = average household size, unbaked_wall = number of households with unbaked brick outer walls, mud_roof = number of households with mud roofs.)
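The permutation idea behind this plot can be sketched as follows: measure a model's mean squared error, shuffle one predictor column at a time, and record how much the error grows. This is a simplified stand-in, not the study's procedure: it works with any prediction function on a held-out dataset (random forest implementations such as R's randomForest package compute a similar measure on out-of-bag samples instead), the toy model and data are invented, and rescaling so that the most important predictor equals 100% is only an assumption about how the 0% to 100% scale was produced.

```python
import random


def mse(predict, X, y):
    """Mean squared error of a prediction function over rows X and targets y."""
    return sum((predict(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Average increase in MSE when each predictor column is randomly shuffled."""
    rng = random.Random(seed)
    baseline = mse(predict, X, y)
    increases = []
    for j in range(len(X[0])):
        total = 0.0
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            total += mse(predict, X_perm, y) - baseline
        increases.append(total / n_repeats)
    return increases


def to_percent_scale(increases):
    """Assumed rescaling: largest increase = 100%, negative increases clipped to 0%."""
    top = max(increases)
    if top <= 0:
        return [0.0 for _ in increases]
    return [100.0 * (max(v, 0.0) / top) for v in increases]


def toy_model(row):
    """Toy predictor: damage depends on the first variable only."""
    return 5 * row[0]


# Toy data: the second column is irrelevant to the target
X = [[i, (7 * i) % 5] for i in range(20)]
y = [5 * row[0] for row in X]

scores = to_percent_scale(permutation_importance(toy_model, X, y))
print(scores)  # [100.0, 0.0]
```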
The mean macroseismic intensity (mi) and total population (pop) were among the most important predictors in all models and are therefore considered indispensable model components. The importance of the population variable was expected, since the model predicts the absolute number of damaged houses per VDC, which is closely related to the total number of inhabitants per VDC. The high importance of the macroseismic intensity variable indicates that it is a good quantitative indicator of seismic hazard. Individual building material variables also scored relatively high importance, which underlines the value of collecting and preparing datasets on building quality in earthquake-prone areas.
Would the model apply to other events and areas?
At this stage, with the model trained on a single event, it is unlikely to produce useful output for a future event in another country. This is mostly due to case- and country-specific relationships in the model, such as the positive relationships between school attendance and damage, and between toilet presence and damage. By training the model on more events in different environments, such relationships can be eliminated over time. Besides that, we found that universality could be improved by excluding secondary-hazard susceptibility variables, finding an alternative uniform socio-economic vulnerability variable and using composite building quality variables.
Keeping the model input data simple seems to be the only way to create a model that produces useful output for seismic events in different areas around the world. Apart from that, the successful application of priority index models in general stands or falls by data preparedness. Data collection and pre-processing are time-consuming tasks. Therefore, to be able to run the model within the first hours after the initial impact, data on as many predictor variables as possible must be prepared in advance in a structured data matrix. Data on the earthquake parameters can, of course, only be added after the initial impact.
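A minimal sketch of such a prepared data matrix, assuming static predictors are stored per VDC ahead of time and the hazard variable is joined in once a ShakeMap-derived intensity is available. The VDC codes, values and the function name `build_matrix` are hypothetical; the column names reuse the variable codes from the importance plot.

```python
# Predictors that can be prepared before any event (codes as in the importance plot)
STATIC_COLUMNS = ["pop", "pop_dens", "mud_wall", "thatch_roof", "slope"]
# Hazard predictor that only becomes available after the earthquake
HAZARD_COLUMNS = ["mi"]


def build_matrix(static_data, hazard_data, columns=STATIC_COLUMNS + HAZARD_COLUMNS):
    """Join pre-prepared static predictors with post-event hazard data by VDC code.

    Returns (vdc_codes, rows, incomplete): rows follow `columns` order, and
    VDCs missing any column are listed in `incomplete` instead of being modelled.
    """
    vdc_codes, rows, incomplete = [], [], []
    for code in sorted(static_data):
        record = {**static_data[code], **hazard_data.get(code, {})}
        if any(col not in record for col in columns):
            incomplete.append(code)
            continue
        vdc_codes.append(code)
        rows.append([record[col] for col in columns])
    return vdc_codes, rows, incomplete


# Hypothetical data: static predictors prepared in advance, intensity added after the event
static_data = {
    "VDC-001": {"pop": 4200, "pop_dens": 310.5, "mud_wall": 610, "thatch_roof": 45, "slope": 22.0},
    "VDC-002": {"pop": 2900, "pop_dens": 120.0, "mud_wall": 480, "thatch_roof": 120, "slope": 35.5},
}
hazard_data = {"VDC-001": {"mi": 8.2}}  # e.g. mean intensity per VDC

codes, rows, incomplete = build_matrix(static_data, hazard_data)
print(codes, rows, incomplete)
```

Listing incomplete VDCs rather than silently dropping them keeps gaps in the prepared data visible at the moment the model has to run.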
What is New?
An important change compared with our typhoon models is that we no longer use a composite variable as the response variable. Instead, we favour the number of completely damaged houses. Besides the reasons mentioned above, this also avoids the unreliability of reported numbers of ‘partially damaged’ houses, which can vary with an assessor’s judgement of what counts as partially damaged.
What lies ahead?
The model results are quite promising, and we expect that a more universally applicable model can be built on them. In fact, the model is already operationally useful for the study area in Nepal. We are currently training the model on events in other parts of the world, which will show whether it can indeed produce useful estimates for aid decision makers in a post-earthquake situation.
We believe that rapid impact estimation models trained on data from past events can support the humanitarian aid sector in the near future, and we will continue to research and develop them. For details on the earthquake priority index model and the associated research, please see the thesis.
2 Nepalese Red Cross Society, 2015a. Initial Rapid Assessment.
3 For more details on the predictive accuracy of the model, see the thesis.