Abstract:
Three-dimensional point cloud datasets are becoming ubiquitous due to the availability of consumer-grade 3D sensors such as Light Detection and Ranging (LIDAR) sensors and RGB-D cameras. Recent advancements in 3D deep learning have dramatically improved the ability to recognize physical objects and interpret indoor and outdoor scenes using point clouds acquired through different sensors. This thesis focuses on deep learning-based techniques for point cloud processing. We propose novel architectures leveraging graph attention networks for point cloud-based object detection, classification, and segmentation. The proposed architectures operate directly on point cloud scans by constructing a connected graph over the points. For object detection, we use the concatenation of the relative geometric difference and the feature difference between each pair of neighbouring points in the graph. To further improve detection performance, we introduce a distance-aware down-sampling scheme for the detection space. For point cloud segmentation and classification, we employ a global-aware attention module that combines global, local, and self-feature information. Experiments on different datasets (KITTI, ShapeNet, ModelNet, and Semantic3D) show that our methods yield results comparable to existing approaches for object detection, part segmentation, semantic segmentation, and classification.
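To make the abstract's description of the pairwise edge inputs concrete, the following is a minimal illustrative sketch (not the thesis implementation): it builds a k-nearest-neighbour graph and, for each point and each of its neighbours, concatenates the relative geometric difference with the feature difference. All names and parameters here (knn_edge_features, k, points, features) are assumptions chosen for illustration.

```python
import numpy as np

def knn_edge_features(points, features, k=16):
    """points: (N, 3) xyz coordinates; features: (N, C) per-point features.
    Returns (N, k, 3 + C) edge inputs [p_j - p_i, f_j - f_i] for each neighbour j of point i."""
    # Pairwise squared distances (brute force; fine for a small illustrative N).
    d2 = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)                          # exclude self-loops
    nbr_idx = np.argsort(d2, axis=1)[:, :k]               # (N, k) neighbour indices

    rel_geom = points[nbr_idx] - points[:, None, :]       # (N, k, 3) relative geometric difference
    rel_feat = features[nbr_idx] - features[:, None, :]   # (N, k, C) feature difference
    return np.concatenate([rel_geom, rel_feat], axis=-1)  # (N, k, 3 + C) concatenated edge input

if __name__ == "__main__":
    pts = np.random.rand(1024, 3).astype(np.float32)
    fts = np.random.rand(1024, 64).astype(np.float32)
    print(knn_edge_features(pts, fts, k=16).shape)        # (1024, 16, 67)
```

In a graph attention network, edge inputs of this form would typically be fed to a shared MLP whose outputs are normalized into attention weights over each point's neighbourhood; the exact attention formulation used in the thesis is described in the later chapters rather than here.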