Masoumeh Vahedi defends her PhD thesis

Masoumeh Vahedi defends her PhD thesis 'Learned Indexes and Queries for Spatial Data'.
Friday 20 June
Start: 13:00
End: 16:00
Place: Building 46, room 46.1-049 (Auditorium 46), Roskilde University, Universitetsvej 1, 4000 Roskilde

The defence is public, and everybody is welcome.

You can also follow the defence online via Zoom.

The Department of People and Technology will host a small reception afterwards.

Supervisors and assessment

PhD Evaluation Committee:

  • Gloria Bordignas, Associate Senior Researcher, Institute for Electromagnetic Sensing of the Environment, National Research Council of Italy, Italy
  • Panagiotis Tampakis, Associate Professor, Data Science, Syddansk University, Denmark
  • Troels Andreasen, Associate Professor, Department of People and Technology, Roskilde University, Denmark (Chair)

Supervisor:

  • Henning Christiansen, Professor, Department of People and Technology, Roskilde University, Denmark
     

Abstract

Efficient indexing is crucial for search in very large datasets, and here we address the special case of spatial polygon and point data, as used in GIS, location-based services, and elsewhere. Traditional spatial indexes such as the R-tree play a central role in efficiently retrieving spatial data such as points and polygons. Recently, traditional indexes have been disrupted by the idea of learned indexes, which are machine learning models that predict the data address for a given query. However, existing learned indexes can only handle point data. In response, this dissertation presents critical and in-depth studies of learned and adaptive indexing approaches to better manage and query disk-resident, complex spatial datasets. Dimension reduction by the Z-order curve is essential for our approach, and we have dedicated a chapter to a closer analysis of this topic, leading to results and insights that we find useful for future work. These indexing techniques offer a promising direction for scalable big data applications, allowing systems to process spatial queries more efficiently.

Throughout this dissertation, we aim to develop an in-depth understanding of learned index structures for efficient search in large polygon sets stored on disk. Specifically, we introduce SPLindex, a novel learned index structure that organizes polygons into a hierarchical tree of clusters, integrating linear regression models for efficient query branching and a disk storage layout that minimizes disk accesses. We enhance SPLindex by optimizing its clustering hyperparameters through gradient descent, improving query execution efficiency. Finally, we introduce interval cracking, an adaptive technique that refines the search tree based on query history, revealing dataset-dependent performance variations. Through extensive experiments across real and synthetic datasets, this dissertation provides a comprehensive analysis of learned and adaptive indexing for spatial data, addressing the limitations of traditional methods and identifying critical factors for effective disk-based indexing. We encourage future research to explore updates, complex polygons with intersections and holes, and other query types such as spatial joins and kNN searches.

Directions

Directions to Roskilde University