Parallel Trajectory Joins in Spatial Networks
The matching of similar pairs of objects, called similarity join, is fundamental functionality in data management. We consider two cases of trajectory similarity joins (TS-Joins), including a threshold-based join (Tb-TS-Join) and a top-$k$ TS-Join ($k$-TS-Join), where the objects are trajectories of vehicles moving in road networks. Given two sets of trajectories and a threshold $\theta$, the Tb-TS-Join returns all pairs of trajectories from the two sets with similarity above $\theta$. In contrast, the $k$-TS-Join does not take a threshold as a parameter, and it returns the top-$k$ most similar trajectory pairs from the two sets. The TS-Joins target diverse applications such as trajectory near-duplicate detection, data cleaning, ridesharing recommendation, and traffic congestion prediction.
With these applications in mind, we provide purposeful definitions of similarity. To enable efficient processing of the TS-Joins on large sets of trajectories, we develop search space pruning techniques and enable use of the parallel processing capabilities of modern processors. Specifically, we present a two-phase divide-and-conquer search framework that lays the foundation for the algorithms for the Tb-TS-Join and the $k$-TS-Join that rely on different pruning techniques to achieve efficiency. For each trajectory, the algorithms first find similar trajectories. Then they merge the results to obtain the final result. The algorithms for the two joins exploit different upper and lower bounds on the spatiotemporal trajectory similarity and different heuristic scheduling strategies for search space pruning. Their per-trajectory searches are independent of each other and can be performed in parallel, and the mergings have constant cost. An empirical study with real data offers insight in the performance of the algorithms and demonstrates that they are capable of outperforming well-designed baseline algorithms by an order of magnitude.
商烁博士现任阿联酋起源人工智能研究院资深研究员，沙特阿卜杜拉国王科技大学研究科学家，中国计算机学会数据库专委会委员、中国GIS协会理论与方法委员会委员。他在2008年本科毕业于北京大学、2012年博士毕业于澳大利亚昆士兰大学。他的研究主要集中在时空大数据管理与分析、智能交通、城市计算等方向。他曾担任《Geoinformatica》、《World Wide Web》的客座编委，担任数据库三大会议SIGMOD、VLDB、ICDE以及AAAI、CIKM等会议的程序委员会委员。发表论文60余篇，其中CCF A类会议26篇，谷歌学术引用1000余次。