Projects
Dive into a world where big spatial data comes to life! My projects center on manipulating, automating, and visualizing large datasets to uncover fascinating insights. Leveraging advanced large language models, I've developed workflows to pinpoint authors' locations and distinguish between past and future movements. By analyzing GPS data, I've unraveled intricate human movement patterns and behaviors. My expertise with GIS and ESRI products enables me to create dynamic maps that reveal the hidden stories within spatial data. Here, I showcase some of the most exciting projects I've been a part of, each one a journey through the power of data analysis.
GeoSHARING Project: Enhancing location inference models
Serere, H. N., Resch, B., & Havas, C. R. (2023). Enhanced geocoding precision for location inference of tweet text using spaCy, Nominatim and Google Maps. A comparative analysis of the influence of data selection. PLOS ONE, 18(3), e0282942.
Geo-social media have become an established data source for the spatial analysis of geographic and social processes in various fields. However, only a small share of geo-social media data are explicitly georeferenced, which often compromises the reliability of analysis results because large volumes of data are excluded. In this project, I develop methods for inferring locations from implicitly georeferenced posts. More generally, I work to enhance the process of location extraction, geocoding, and validation. I also examine the authors’ position relative to the mentioned or inferred location.
Human movement patterns
April 2022 - November 2022
The aim of the project was to explore and map out user movement and duration-of-stay patterns. In particular, I was tasked with analysing the volume of people arriving at and departing from the venue, the main points of interest, and the effect of the weather on stay duration and visitation rates, amongst others.
Overview of software and tools used within this project:
- Dataset: NEAR BigQuery public dataset
- Data format: GZ; TSV; CSV
- Programming language: Python
- Data exploration and visualisations: GeoPandas, Kepler.gl, MovingPandas, Tableau, ArcGIS Desktop
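As an illustration of the duration-of-stay analysis, the following is a minimal sketch using only the Python standard library, not the project's actual pipeline. It assumes the GPS pings have already been filtered to the venue and reduced to (device ID, ISO timestamp) pairs. The function name `stay_durations` and the 30-minute gap used to split visits are hypothetical choices made for this example.

```python
from collections import defaultdict
from datetime import datetime


def stay_durations(pings, max_gap_minutes=30):
    """Return {device_id: [visit duration in minutes, ...]}.

    pings: iterable of (device_id, iso_timestamp) pairs recorded inside the venue.
    Assumption: a gap longer than max_gap_minutes between two consecutive
    pings of the same device marks the start of a new visit.
    """
    by_device = defaultdict(list)
    for device_id, timestamp in pings:
        by_device[device_id].append(datetime.fromisoformat(timestamp))

    visits = {}
    for device_id, times in by_device.items():
        times.sort()
        durations = []
        start = prev = times[0]
        for current in times[1:]:
            if (current - prev).total_seconds() > max_gap_minutes * 60:
                # Close the running visit and open a new one.
                durations.append((prev - start).total_seconds() / 60)
                start = current
            prev = current
        # Close the last open visit.
        durations.append((prev - start).total_seconds() / 60)
        visits[device_id] = durations
    return visits
```

A device seen only once gets a visit of zero minutes, so in practice such single-ping visits would probably be filtered out before computing averages.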
NER model to distinguish authors' in-situ and remote locations
Helen Serere & Bernd Resch 2023 - 2024
Twitter authors refer either to their current location at the time of sending the tweet (in-situ location) or to general locations that do not coincide with their current location (remote locations). However, distinguishing between in-situ and remote location entities poses significant challenges due to the ambiguity of natural language, informal and unstructured tweets, and dynamic location entities. To address these challenges, we developed a Dynamic Named Entity Recognition model that can differentiate between the two types of location entities in Twitter data. We investigated three annotation approaches and set up an annotation guideline based on strict grammatical rules arranged in a decision tree. We then custom-trained the model and tested its effectiveness on a validation dataset representative of the most prominent Twitter sources. Our results show that the model can distinguish between in-situ and remote locations, although only to a marginal extent.
Forecasting a worst case flooding scenario from 2017 Hurricane Maria
Master Thesis, ITC University of Twente 2018
Freshwater flooding as a result of Tropical Cyclones (TC) stands as one of the most destructive and complicated disasters to prepare for. Unlike seasonal rainfall events, which follow similar structural rainfall patterns, TC rainfall is particularly erratic in occurrence. By knowing the likely magnitude and intensity of a TC, countries can plan accordingly to reduce mortality rates and property damage. In this project, I analysed the trajectory and total rainfall amounts of the 2017 Atlantic basin TC Maria. At the time, TC Maria was the most destructive TC within the Caribbean Islands. While working hand in hand with NASA, I modelled what would have been considered a worst-case flooding scenario for Dominica (one of the impacted Caribbean islands). The worst-case scenario was simulated using the highest-magnitude and highest-intensity observed cells. The aim of the project was to raise awareness in Dominica and the Caribbean Islands at large of probable rainfall amounts simulated from an actual TC.
Comparative analysis of Geocoders
Serere, H. N., Kanilmaz, U. N., Ketineni, S., & Resch, B. (2023). A Comparative Study of Geocoder Performance on Unstructured Tweet Locations.
Geocoding is the process of converting human-readable addresses into latitude and longitude points. Whilst most geocoders tend to perform well on structured addresses, their performance drops significantly in the presence of unstructured addresses, such as locations written in informal language. In this paper, we make an extensive comparison of geocoder performance on unstructured location mentions within tweets. Using nine geocoders and a worldwide English-language Twitter dataset, we compare the geocoders’ recall, precision, consensus and bias values. As in previous similar studies, Google Maps showed the highest overall performance. However, with the exception of Google Maps, we found that geocoders which use open data perform better than those which do not. The open-data geocoders showed the least per-continent bias and the highest consensus with Google Maps. These results suggest the possibility of improving geocoder performance on unstructured locations by extending or enhancing the quality of openly available datasets.
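To make the evaluation metrics more concrete, here is a minimal sketch of how a geocoder's precision and recall could be scored against reference coordinates. The definitions are assumptions made for illustration and may differ from those in the paper: recall is taken as the share of location mentions for which the geocoder returned any coordinate, and precision as the share of returned coordinates that fall within a distance threshold of the reference point. The function names and the 50 km threshold are hypothetical.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))  # 6371 km: mean Earth radius


def evaluate_geocoder(predictions, reference, threshold_km=50.0):
    """Return (precision, recall) for one geocoder.

    predictions: list of (lat, lon) tuples, or None where the geocoder gave no result.
    reference:   list of (lat, lon) ground-truth tuples, same order and length.
    """
    answered = [(p, r) for p, r in zip(predictions, reference) if p is not None]
    correct = sum(1 for p, r in answered if haversine_km(p, r) <= threshold_km)
    recall = len(answered) / len(reference) if reference else 0.0
    precision = correct / len(answered) if answered else 0.0
    return precision, recall
```

Consensus between two geocoders could be sketched in the same way, by passing one geocoder's output as `reference` for the other.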
Food and water security
Spatial Analysis of the fencing situation in Maasai Mara: A case of Pardamat conservancy
February 2019 - May 2019
Food and water security is one of the biggest problems faced by societies to date. In this project, we assessed the food and water security issues within the Greater Mara region. The Greater Mara, also known as the Maasai Mara by reference to its people, is located in the southwest part of Kenya, bordering the Serengeti National Park of Tanzania. The region consists mainly of wildlife and wildlife conservancies, livestock farmers, small and large-scale crop farmers, tourist groups, local women, etc. With the decrease in annual rainfall and the increase in both human and livestock populations, the region's inhabitants face a growing need to secure yields for themselves and their livestock, which has resulted in an exponential increase in fenced areas. In this project, we focused on the consequences of this increase in fencing, in particular the effects of fences on wildlife migration, access to water, and grazing pastures.
Climate resilient cities
Towards enhancing the infiltration capacity of the Murchison bay catchment of Kampala, Uganda
August 2018 - November 2018
Urban flooding is a serious problem across the world. Without effective flood control interventions, these floods can result in the degradation of water quality, property damage, and the potential loss of human life. For this project, we looked at flooding within Uganda’s largest and most important urban area, Kampala. The main driver of the enhanced flooding in Kampala has been the increase in urbanisation, which has led to a loss of wetland vegetation and cultivated areas. Our focus in this project was to evaluate interventions that enhance the infiltration capacity of the Murchison Bay catchment, one of the city’s largest catchments.
Spatial Electric Load Forecasting
A case of Zimbabwe Electricity Transmission and Distribution Company (ZETDC), Mutare
Bachelor Thesis - 2017
Electricity is a critical source of energy worldwide. Unexpected electrical interruptions can have serious repercussions for electrical utilities and human life, amongst others. There is therefore a need for electric distribution companies to continuously re-evaluate infrastructure in line with spatial developments so as to ensure the availability and reliability of electricity for all users. In this thesis, I incorporated GIS into electrical transmission and distribution utilities in an effort to produce more accurate electrical load forecasts (timing, location, and magnitude of the expected load).