HyperPose: Hypernetwork-Infused Camera Pose Localization and an Extended Cambridge Landmarks Dataset
We advocate for incorporating hypernetworks into single-scene and multiscene camera pose regression models.
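To make the idea concrete, here is a minimal PyTorch-style sketch of a hypernetwork that generates the weights of a small pose-regression head from a global image descriptor, so the head adapts to each input. The class name `HyperPoseHead` and all layer sizes are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class HyperPoseHead(nn.Module):
    """Hypothetical sketch: a hypernetwork maps an image descriptor to the
    weights of a small two-layer pose-regression head, applied per sample."""
    def __init__(self, feat_dim=512, hidden=256, out_dim=7):
        super().__init__()
        self.feat_dim, self.hidden, self.out_dim = feat_dim, hidden, out_dim
        # Total parameter count of the generated head (two linear layers).
        n_params = feat_dim * hidden + hidden + hidden * out_dim + out_dim
        self.hypernet = nn.Sequential(
            nn.Linear(feat_dim, 512), nn.ReLU(),
            nn.Linear(512, n_params),
        )

    def forward(self, feat):  # feat: (B, feat_dim) global image descriptor
        B = feat.size(0)
        p = self.hypernet(feat)  # per-sample head parameters
        i = 0
        w1 = p[:, i:i + self.feat_dim * self.hidden].view(B, self.hidden, self.feat_dim)
        i += self.feat_dim * self.hidden
        b1 = p[:, i:i + self.hidden]; i += self.hidden
        w2 = p[:, i:i + self.hidden * self.out_dim].view(B, self.out_dim, self.hidden)
        i += self.hidden * self.out_dim
        b2 = p[:, i:i + self.out_dim]
        # Run the generated head: descriptor -> hidden -> pose.
        h = torch.relu(torch.bmm(w1, feat.unsqueeze(-1)).squeeze(-1) + b1)
        return torch.bmm(w2, h.unsqueeze(-1)).squeeze(-1) + b2  # (B, 7)
```

Calling `HyperPoseHead()(torch.randn(4, 512))` returns a `(4, 7)` batch of pose vectors (3D translation plus a 4D orientation quaternion).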
I am an algorithm engineer specializing in machine-learning and deep-learning applications for computer vision and image processing. My work spans 2D and 3D domains, addressing needs such as hand and body tracking, expression recognition, large-scale visual localization, and eye and gaze tracking. My academic focus is absolute camera pose estimation, where I explore end-to-end learning methods. I am passionate about leveraging algorithms to solve real-world problems and about contributing to the evolving landscape of computer vision.
We extend our previous MS-Transformer approach by introducing a mixed classification-regression architecture that improves localization accuracy.
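One way to read the mixed classification-regression design: first classify a coarse position bin, then regress a fine residual from that bin's center. The sketch below, with an assumed `CoarseFineHead` module and an arbitrary 64-bin layout, illustrates the mechanism rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class CoarseFineHead(nn.Module):
    """Hypothetical sketch: classify a coarse position bin, then regress a
    fine 3D offset relative to the chosen bin's center."""
    def __init__(self, feat_dim=256, n_bins=64):
        super().__init__()
        self.cls = nn.Linear(feat_dim, n_bins)      # coarse bin logits
        self.reg = nn.Linear(feat_dim, n_bins * 3)  # per-bin fine offsets
        # 3D centers of the coarse bins (learned here; could be precomputed).
        self.centers = nn.Parameter(torch.randn(n_bins, 3))

    def forward(self, feat):  # feat: (B, feat_dim)
        logits = self.cls(feat)                               # (B, n_bins)
        offsets = self.reg(feat).view(-1, logits.size(1), 3)  # (B, n_bins, 3)
        idx = logits.argmax(dim=1)                            # hard choice at inference
        rows = torch.arange(feat.size(0), device=feat.device)
        position = self.centers[idx] + offsets[rows, idx]     # coarse + fine
        return logits, position
```

Training would typically combine a cross-entropy loss on the bin logits with an L1 or L2 loss on the refined position; the same scheme extends to orientation bins.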
We propose to learn multi-scene absolute camera pose regression with Transformers, where encoders aggregate activation maps with self-attention and decoders transform latent features and scene encodings into candidate pose predictions.
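A hedged sketch of this encoder-decoder scheme, assuming the flattened activation maps are already available as a token sequence: learned per-scene query embeddings play the decoder's role of turning latent features into one candidate pose per scene. Names and dimensions are assumptions, not the published configuration.

```python
import torch
import torch.nn as nn

class MultiScenePoseTransformer(nn.Module):
    """Hypothetical sketch: the encoder self-attends over activation-map tokens;
    learned per-scene queries are decoded into one candidate pose per scene."""
    def __init__(self, d_model=256, n_scenes=7):
        super().__init__()
        self.transformer = nn.Transformer(d_model=d_model, nhead=8,
                                          num_encoder_layers=4,
                                          num_decoder_layers=4,
                                          batch_first=True)
        self.scene_queries = nn.Embedding(n_scenes, d_model)  # one query per scene
        self.pose_head = nn.Linear(d_model, 7)                # x, y, z + quaternion

    def forward(self, tokens):  # tokens: (B, H*W, d_model) flattened CNN features
        B = tokens.size(0)
        queries = self.scene_queries.weight.unsqueeze(0).expand(B, -1, -1)
        latents = self.transformer(tokens, queries)  # (B, n_scenes, d_model)
        return self.pose_head(latents)               # (B, n_scenes, 7) candidates
```

At inference, a scene classifier (not shown here) would select which candidate row to return as the final pose.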
We propose an attention-based approach for pose regression, where the convolutional activation maps are used as sequential inputs.
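The core step is turning a convolutional activation map into a token sequence that a Transformer encoder can self-attend over. The sketch below assumes a 1x1 convolution for channel projection and a learned positional embedding; both are common choices, not necessarily the paper's.

```python
import torch
import torch.nn as nn

class ActivationMapEncoder(nn.Module):
    """Hypothetical sketch: flatten a convolutional activation map into a
    token sequence and self-attend over it with a Transformer encoder."""
    def __init__(self, c_in=512, d_model=256, max_tokens=196):
        super().__init__()
        self.proj = nn.Conv2d(c_in, d_model, kernel_size=1)  # channel projection
        self.pos = nn.Parameter(torch.zeros(1, max_tokens, d_model))  # positions
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)

    def forward(self, fmap):  # fmap: (B, C, H, W) CNN activation map
        x = self.proj(fmap).flatten(2).transpose(1, 2)  # (B, H*W, d_model) tokens
        x = x + self.pos[:, :x.size(1)]                 # add positional encoding
        return self.encoder(x)                          # self-attended latents
```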
We propose that scene-specific pose encoders are not required for pose regression and that encodings trained for visual similarity can be used instead.
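Under this view, the encoder is a frozen network trained for visual similarity, and only a shallow regression head is learned. In the sketch below a generic ImageNet-pretrained torchvision backbone stands in for a retrieval-trained encoder; the backbone choice and head sizes are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torchvision

# A generic pretrained backbone stands in for an encoder trained for
# visual similarity; only the small head below would be trained.
backbone = torchvision.models.resnet34(weights="IMAGENET1K_V1")
backbone.fc = nn.Identity()           # expose the 512-D global descriptor
backbone.eval()                       # freeze batch-norm statistics
for p in backbone.parameters():
    p.requires_grad = False           # the encoder stays frozen

pose_head = nn.Sequential(            # the only trainable component
    nn.Linear(512, 256), nn.ReLU(),
    nn.Linear(256, 7),                # 3-D translation + 4-D quaternion
)

img = torch.randn(1, 3, 224, 224)     # dummy input image
with torch.no_grad():
    desc = backbone(img)              # similarity-oriented encoding
pose = pose_head(desc)                # (1, 7) predicted camera pose
```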
We review deep learning approaches for camera pose estimation. We describe key methods in the field and identify trends aiming at improving the original deep pose regression solution. We further provide an extensive cross-comparison of existing learning-based pose estimators, together with practical notes on their execution for reproducibility purposes. Finally, we discuss emerging solutions and potential future research directions.