WHAT IS IT ABOUT
Annelies’s thesis explores text mining techniques to recognise impact data in humanitarian documents. Annelies used over 700 documents from the Disaster Relief Emergency Fund (DREF) as examples of humanitarian data on which to test modelling techniques. She worked with a team to label sentences in those documents as containing impact data. Annelies then used these labels to train a machine learning model to recognise impact data. She tested several candidate models to determine which was most effective and explored ways to make the human labelling step more efficient.
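To give a flavour of the kind of pipeline described above, the sketch below trains a binary sentence classifier with scikit-learn. This is purely illustrative: the toy sentences, the TF-IDF features, and the logistic regression model are assumptions for the example, not the actual data or models from the thesis.

```python
# Illustrative sketch: classify sentences as containing impact data (1) or not (0).
# Toy data and model choice are assumptions, not the thesis pipeline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hand-labelled example sentences: 1 = contains impact data, 0 = does not.
sentences = [
    "Over 3,000 houses were destroyed by the floods.",
    "An estimated 12,000 people were displaced.",
    "At least 45 people were injured in the landslide.",
    "The assessment team arrived on Tuesday.",
    "Coordination meetings were held with local authorities.",
    "The report was submitted to headquarters.",
]
labels = [1, 1, 1, 0, 0, 0]

# TF-IDF turns each sentence into a weighted word-count vector;
# logistic regression then learns to separate the two classes.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(sentences, labels)

# Score an unseen sentence (returns 0 or 1).
prediction = model.predict(["Around 500 families lost their homes."])[0]
```

In a real setting the labelled DREF sentences would replace the toy list, and the chosen model would be whichever performed best in evaluation.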
WHO IS THE AUTHOR
Annelies was awarded a cum laude distinction for this thesis as part of her MSc in Statistical Science for the Life and Behavioural Sciences with a specialisation in Data Science at Leiden University. She previously studied International Relations and Economics.
WHY IS IT NEEDED
Funds are often distributed only after a disaster occurs and its impact has been assessed. If the impact of a disaster can be predicted, funding and humanitarian aid can be provided in advance to the people most in need. Annelies’s research contributes to the prediction of disaster impact by helping recognise and quantify the impact of a disaster using historical examples, and specifically by increasing the speed at which this analysis can be done. More historical data analysis helps us to build better prediction models for future disasters and therefore provide aid more efficiently to those impacted.
HOW WE WORKED TOGETHER
Annelies’s thesis contributes to 510’s Predictive Impact Analytics products and was a collaboration between ORTEC and 510. Via ORTEC, EffectAI’s EffectForce assisted in the initial sentence labelling phase of the project. Annelies also visited IFRC headquarters in Geneva to learn about their current Data Entry & Exploration Platform (DEEP) and how her research could contribute to this initiative.
WHAT ARE THE MAIN FINDINGS
- It is possible to use text mining to identify impact data from humanitarian documents, as her most effective model recognised about 80% of sentences containing impact data with few false positives.
- This work can be made more efficient and cost-effective in the future by reducing the number of sentences that require human labelling. Annelies’s analysis suggests that the model can be equally effective with 80% fewer labelled sentences. Future projects should investigate whether a model trained on DREF documents generalises to other text sources.
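One common way to cut down on human labelling is to have the model itself pick which unlabelled sentences are most informative to label next, for example by uncertainty sampling. The sketch below illustrates that idea; the data and the specific selection strategy are assumptions for the example, not necessarily the approach taken in the thesis.

```python
# Illustrative sketch of uncertainty sampling: ask a human to label the
# sentence the current model is least sure about. Toy data; assumed strategy.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

labelled = ["Over 3,000 houses were destroyed.", "The meeting was held on Monday."]
labels = [1, 0]
unlabelled = [
    "Roughly 200 people were evacuated.",
    "The delegation thanked the volunteers.",
    "Crops on 1,500 hectares were lost.",
]

# Fit the vectoriser on all text so unlabelled sentences share the vocabulary.
vec = TfidfVectorizer().fit(labelled + unlabelled)
clf = LogisticRegression().fit(vec.transform(labelled), labels)

# Predicted probability of "contains impact data" for each unlabelled sentence;
# the sentence with probability closest to 0.5 is the most uncertain one.
probs = clf.predict_proba(vec.transform(unlabelled))[:, 1]
query_idx = int(np.argmin(np.abs(probs - 0.5)))
next_to_label = unlabelled[query_idx]  # send this sentence to a human annotator
```

Repeating this loop, labelling only the most uncertain sentences each round, is one plausible route to the kind of 80% reduction in labelling effort the analysis suggests.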
The next step to integrate this work into 510 products is to extract the numeric impact data identified using Annelies’s techniques and add it directly to forecast-based financing models.
Annelies presents her research findings to 510 staff and volunteers in November 2019.