|
|
Sparse Image and Dual-IMU Localization for AR Glasses
Rutika Moharir, Dishani Lahiri
[poster]
[code]
Developing a transformer-based multimodal fusion architecture for visual-inertial odometry that combines IMU measurements with sparse camera images to improve state-estimation accuracy.
|
|
|
Hand Puppets: 3D Hand Pose Prediction from Shadow Puppet Images
[poster]
[presentation]
[pdf]
[code]
Developed a ResNet-based network that regresses MANO model parameters for 3D two-hand pose estimation from a single shadow puppet image. Used optimization to model the sequence of hand pose transformations from the rest pose to the final pose. Additionally, implemented geometry-based losses to regularize the hand pose estimation model.
|
|
|
RoboNotes: Reinforcement Learning for Music Composition
[pdf]
[YouTube]
[code]
In this work, we explore PPO, DQN, and CEM and compare these methods with the performance of a random agent for solving the problem of music composition. We limit our problem space by only considering two octaves of notes on a single track of composition, for a fixed length. The reward function is hand-crafted from a set of common music theory rules.
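As a rough illustration of what a hand-crafted, rule-based reward and a random-agent baseline can look like, here is a minimal sketch. Everything specific in it is an assumption made for the example and is not taken from the project: the MIDI range, the episode length, the C major key, the three rules, and their weights are all hypothetical.

```python
import random

# Assumed setup (hypothetical, not from the project): notes are MIDI pitches
# covering two octaves from middle C, and every composition has a fixed length.
LOW, HIGH = 60, 84                 # C4..C6 inclusive
LENGTH = 16                        # fixed number of notes per composition
C_MAJOR = {0, 2, 4, 5, 7, 9, 11}   # pitch classes of the C major scale


def reward(melody):
    """Score a melody with three illustrative music-theory rules."""
    score = 0.0
    for note in melody:
        # Rule 1: reward in-key notes, penalise out-of-key notes.
        score += 1.0 if note % 12 in C_MAJOR else -1.0
    for prev, cur in zip(melody, melody[1:]):
        leap = abs(cur - prev)
        if leap > 7:
            # Rule 2a: penalise leaps larger than a perfect fifth.
            score -= 1.0
        elif leap == 0:
            # Rule 2b: mildly penalise repeated notes.
            score -= 0.5
    # Rule 3: reward ending on the tonic.
    if melody[-1] % 12 == 0:
        score += 2.0
    return score


def random_agent(rng):
    """Baseline: sample every note uniformly from the two-octave range."""
    return [rng.randint(LOW, HIGH) for _ in range(LENGTH)]


rng = random.Random(0)
baseline = sum(reward(random_agent(rng)) for _ in range(200)) / 200

# A stepwise C major line that satisfies every rule above.
scale_run = [60, 62, 64, 65, 67, 69, 71, 72, 71, 69, 67, 65, 64, 62, 64, 60]
print(f"random baseline: {baseline:.2f}, scale run: {reward(scale_run):.1f}")
```

A learned policy such as PPO or DQN would be trained to maximise this kind of score, and the random-agent average gives the floor it has to beat.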
|
|
|
TeLCoS: On-Device Text Localization with Clustering of Script
2021 International Joint Conference on Neural Networks (IJCNN), Shenzhen, China, 2021
[pdf]
We propose a lightweight, multi-task, dual-branch CNN that performs real-time, on-device text localization and high-level script clustering simultaneously. We also introduce a novel structural-similarity-based channel pruning mechanism to build an efficient network with only 1.15M parameters.
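The summary above does not spell out the pruning criterion, so the following is only a sketch of one way structural similarity could drive channel pruning, not the paper's method. It assumes that a channel is redundant when its activation map is structurally similar to a channel already kept. The function names, the single-window SSIM, the greedy selection, and the 0.9 threshold are all hypothetical choices for illustration.

```python
def global_ssim(x, y, c1=1e-4, c2=9e-4):
    """SSIM over two flattened maps, using one window covering the whole map.

    c1 and c2 are the usual stabilising constants for values in [0, 1].
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    )


def select_channels(feature_maps, threshold=0.9):
    """Greedily keep a channel only if it is dissimilar to every kept one."""
    kept = []
    for idx, fmap in enumerate(feature_maps):
        if all(global_ssim(fmap, feature_maps[k]) < threshold for k in kept):
            kept.append(idx)
    return kept


# Toy activations for four channels (flattened 2x2 maps, made-up values).
maps = [
    [0.10, 0.50, 0.90, 0.30],  # channel 0
    [0.10, 0.50, 0.90, 0.30],  # channel 1: exact duplicate of channel 0
    [0.90, 0.50, 0.10, 0.70],  # channel 2: inverted pattern
    [0.11, 0.52, 0.88, 0.31],  # channel 3: near duplicate of channel 0
]
print(select_channels(maps))  # indices of the channels that survive pruning
```

In a real network the maps would be activations averaged over a calibration batch, and the surviving indices would be used to slice the layer's filters before fine-tuning.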
|
|
|
On-Device Spatial Attention based Sequence Learning Approach for Scene Text Script Identification
6th IAPR International Conference on Computer Vision & Image Processing, 2021
[pdf]
We propose a CNN equipped with a spatial attention module that reduces the spatial distortions present in natural images, allowing the feature extractor to generate rich image representations while ignoring deformities. The CNN learns text feature representations by identifying the script each character belongs to, and LSTM layers capture the long-term spatial dependencies within the text.
|
|