Publications

|

Precise Event Spotting in Sports Videos: Solving Long-Range Dependency and Class Imbalance

Authors: Sanchayan Santra, Vishal Chudasama, Pankaj W., Vineeth N. Balasubramanian
CVPR | 2025

|

Faster Machine Translation Ensembling with Reinforcement Learning and Competitive Correction

Authors: Kritarth Prasad, Mohammadi Zaki, Pratik Rakesh Singh, Pankaj W
NAACL | 2025

|

Enhancing Whisper's Accuracy and Speed for Indian Languages through Prompt-Tuning and Tokenization

Authors: Kumud Tripathi and Pankaj W.
ICASSP | 2025

|

Ready for You When You Are Back: Content-driven Session-based Recommendation for Continuity of Experience

Authors: Brijraj Singh, Sonal Dabral, Niranjan Pedanekar
AAAI | 2025

|

Enhancing Entertainment Translation for Indian Languages using Adaptive Context, Style and LLMs

Authors: Pratik Rakesh Singh, Mohammadi Zaki, Pankaj Wasnik
AAAI | 2025

|

EmoReg: Directional Latent Vector Modeling for Emotional Intensity Regularization in Diffusion-based Voice Conversion

Authors: Ashish Gudmalwar, Ishan Biyani, Nirmesh Shah, Pankaj S. Wasnik, Rajiv R. Shah
AAAI | 2025

|

LLM-BRec: Personalizing Session-based Social Recommendation with LLM-BERT Fusion Framework

Authors: Raksha Jalan, Tushar Prakash and Niranjan Pedanekar

Generative Information Retrieval (Gen-IR) workshop at the SIGIR 2024 conference | July 2024

|

DubWise: Video-Guided Speech Duration Control in Multimodal LLM-based Text-to-Speech for Dubbing

Authors: Neha Sahipjohn, Ashishkumar Gudmalwar, Nirmesh Shah, Pankaj Wasnik, Rajiv Ratn Shah (IIIT Delhi)
INTERSPEECH | September 2024

|

VECL-TTS: Voice Identity and Emotional Style Aware Cross-Lingual TTS

Authors: Ashishkumar Gudmalwar, Nirmesh Shah, Sai Akarsh, Pankaj Wasnik, Rajiv Ratn Shah (IIIT Delhi)
INTERSPEECH | September 2024

|

Cross-Modal Fusion and Attention Mechanism for Weakly Supervised Video Anomaly Detection

Authors: Ayush Ghadiya, Purbayan Kar, Vishal Chudasama, Pankaj Wasnik
Computer Vision and Pattern Recognition (CVPR) 7th MULA Workshop | June 2024

|

Isometric Neural Machine Translation using Phoneme Count Ratio Reward-based Reinforcement Learning

Authors: Shivam R Mhaskar, Nirmesh Shah, Mohammadi Zaki, Ashishkumar Gudmalwar, Pankaj Wasnik, Rajiv Ratn Shah (IIIT Delhi)
Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL): Findings | June 2024

|

Efficacy of Large Language Models in Predicting Hindi Movies' Attributes: A Comprehensive Survey and Content-Based Analysis

Authors: Prabir Mondal (IIT Patna), Siddharth Singh (IIT Patna), Kushum (IIT Patna), Sriparna Saha (IIT Patna), Jyoti Prakash Singh (IIT Patna), Brijraj Singh, Niranjan Pedanekar
WebConf 2024 (WWW) | May 2024

|

Optimizing Movie Selections: A Multi-Task, Multi-Modal Framework with Strategies for Missing Modality Challenges

Authors: Subham Raj (IIT Patna), Pawan Agrawal (IIT Patna), Sriparna Saha (IIT Patna), Brijraj Singh, Niranjan Pedanekar
ACM Symposium on Applied Computing (SAC) | April 2024

|

Estimation of individual causal effects in network setup for multiple treatments

Authors: Abhinav Thorat, Ravi Kolla, Niranjan Pedanekar, Naoyuki Onoe
38th AAAI Conference on Artificial Intelligence, Graphs and Complex Structure for Learning and Reasoning (GCLR) Workshop | February 2024

|

Open-set Object Detection By Aligning Known Class Representations

Authors: Hiran Sarkar, Vishal Chudasama, Naoyuki Onoe, Pankaj Wasnik, Vineeth Balasubramanian (IIT Hyderabad)
Winter Conference on Applications of Computer Vision (WACV) | January 2024

|

Efficient infusion of self-supervised representations in Automatic Speech Recognition

Authors: Darshan Prabhu, Saiganesh Mirishkar, Pankaj Wasnik
Poster presentation at the Neural Information Processing Systems (NeurIPS) 3rd Workshop | December 2023

|

Enhancing Social Recommendation with Multi-View BERT Network

Authors: Tushar Prakash, Raksha Jalan, Naoyuki Onoe
IEEE International Conference on Data Mining (ICDM) | December 2023

|

Fiducial Focus Augmentation for Facial Landmark Detection

Authors: Purbayan Kar, Vishal Chudasama, Naoyuki Onoe, Pankaj Wasnik, Vineeth Balasubramanian
British Machine Vision Conference (BMVC) | November 2023

|

Impulsion of Movie's Content-Based Factors in Multi-Modal Movie Recommendation System

Authors: Prabir Mondal, Pulkit Kapoor, Siddharth Singh, Sriparna Saha, Naoyuki Onoe, Brijraj Singh
International Conference on Neural Information Processing (ICONIP) | November 2023

|

LLM Based Generation of Item-Description for Recommendation System

Authors: Arkadeep Acharya, Brijraj Singh and Naoyuki Onoe
Recommender Systems Conference (RECSYS) | September 2023

|

CR-SoRec: BERT driven Consistency Regularization for Social Recommendation

Authors: Tushar Prakash, Raksha Jalan, Brijraj Singh and Naoyuki Onoe
Recommender Systems Conference (RECSYS) | September 2023

|

Iteratively Improving Speech Recognition and Voice Conversion

Authors: Mayank Kumar Singh, Naoya Takahashi, Naoyuki Onoe
INTERSPEECH 2023 | August 2023

|

Cd-HRNN: Content-Driven HRNN to Improve Session-Based Recommendation System

Authors: Sonal Dabral, Brijraj Singh and Naoyuki Onoe
IJCNN Main Conference 2023 | April 2023

|

A Multi-Modal Multi-Task Based Approach for Movie Recommendation

Authors: Sriparna Saha (IIT Patna) and Naoyuki Onoe
IJCNN Main Conference 2023 | April 2023

|

A Meta-Learning Based Generative Model with Graph Attention Network for Multi-Modal Recommender Systems

Authors: Sriparna Saha (IIT Patna) and Naoyuki Onoe
INNS DLIA Workshop / IJCNN 2023 | April 2023

|

Task-Specific and Graph Convolutional Network Based Multi-Modal Movie Recommendation System in Indian Setting

Authors: Sriparna Saha (IIT Patna) and Naoyuki Onoe
INNS DLIA Workshop / IJCNN 2023 | April 2023

|

Revisiting Class Imbalance for End-to-end Semi-Supervised Object Detection

Authors: Purbayan Kar, Vishal Chudasama, Pankaj Wasnik and Naoyuki Onoe
Efficient Deep Learning for Computer Vision (ECV) Workshop in CVPR 2023 | April 2023

|

Nonparallel Emotional Voice Conversion For Unseen Speaker-Emotion Pairs Using Dual Domain Adversarial Network & Virtual Domain Pairing

Authors: Nirmesh Shah, Mayank Kumar Singh, Naoya Takahashi, Naoyuki Onoe
ICASSP, the International Conference on Acoustics, Speech, and Signal Processing | February 2023

|

Hierarchical disentangled representation learning for singing voice conversion

Authors: Naoya Takahashi, Mayank Kumar Singh, Yuki Mitsufuji
ICASSP, the International Conference on Acoustics, Speech, and Signal Processing | February 2023

|

Graph Network based Approaches for Multi-modal Movie Recommendation System

Authors: Daipayan Chakder (IIT Patna), Prabir Mondal (IIT Patna), Subham Raj (IIT Patna), Sriparna Saha (IIT Patna), Angshuman Ghosh, Naoyuki Onoe
IEEE International Conference on Systems, Man, and Cybernetics (SMC 2022) | November 2022

|

Semi-supervised Acoustic and Language Modeling for Hindi ASR

Authors: Tarun Sai Bandarupalli (IISc Bangalore), Shakti Rath (IISc Bangalore), Nirmesh Shah, Naoyuki Onoe, Sriram Ganapathy (IISc Bangalore)
INTERSPEECH 2022 | September 2022

|

Towards Developing a Multi-Modal Video Recommendation System

Authors: Sriram Pingali (IIT Patna), Prabir Mondal (IIT Patna), Daipayan Chakder (IIT Patna), Sriparna Saha (IIT Patna), Angshuman Ghosh
International Joint Conference on Neural Networks (IJCNN 2022) | September 2022

|

Leveraging Symmetrical Convolutional Transformer Networks for Speech to Singing Voice Style Transfer

Authors: Shrutina Agarwal (IISc Bangalore), Sriram Ganapathy (IISc Bangalore), Naoya Takahashi
INTERSPEECH 2022 | September 2022

|

M2FNet: Multi-modal Fusion Network for Emotion Recognition in Conversation

Authors: Vishal Chudasama, Purbayan Kar, Ashish Gudmalwar, Nirmesh Shah, Pankaj Wasnik, Naoyuki Onoe
Conference on Computer Vision and Pattern Recognition (CVPR 2022) | June 2022

|

A Unified Model for Fingerprint Authentication and Presentation Attack Detection

Authors: Additya Popli (IIIT Hyderabad), Saraansh Tandon (IIIT Hyderabad), Joshua J. Engelsma (Michigan State University), Naoyuki Onoe, Atsushi Okubo, Anoop Namboodiri (IIIT Hyderabad)
International Joint Conference on Biometrics (IJCB 2021) | April 2021

|

End-to-end lyrics Recognition with Voice to Singing Style Transfer

Authors: Sakya Basak (IISc Bangalore), Shrutina Agarwal (IISc Bangalore), Sriram Ganapathy (IISc Bangalore), Naoya Takahashi
International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2021) | February 2021

Affiliation abbreviations: IIIT Hyderabad is the International Institute of Information Technology Hyderabad; IIT Patna is the Indian Institute of Technology Patna; IISc Bangalore is the Indian Institute of Science, Bangalore.
