Publications

|

Face Time Traveller: Travel Through Ages Without Losing Identity

Authors: Purbayan Kar, Ayush Ghadiya, Vishal Chudasama, Pankaj Wasnik, and Prof. C.V. Jawahar
CVPR Findings Track


|

EW-DETR: Evolving World Object Detection via Incremental Low-Rank DEtection Transformer

Authors: Munish Monga, Vishal Chudasama, Pankaj Wasnik, C.V. Jawahar
CVPR Conference | 2026


|

Windowed Summary Mixing: An Efficient Fine-Tuning of Self-Supervised Learning Models for Low-Resource Speech Recognition

Authors: Aditya Menon, Kumud Tripathi, Raj Gohil, Pankaj Wasnik
ICASSP, Barcelona, Spain | 2026

|

Can Hierarchical Cross-Modal Fusion Predict Human Perception of AI Dubbed Content?

Authors: Ashwini Dasare, Nirmesh Shah, Ashish Gudmalwar, Pankaj Wasnik
ICASSP, Barcelona, Spain | 2026

|

Listen Like a Teacher: Mitigating Whisper Hallucinations using Adaptive Layer Attention and Knowledge Distillation

Authors: Kumud Tripathi, Aditya Menon, Aman Gaurav, Raj Gohil, and Pankaj Wasnik
AAAI Conference Main Technical Track | 2026

|

Large Language Model-based Recommendation System Agents

Authors: Tommaso Carraro, Brijraj Singh, Niranjan Pedanekar
Demo paper at RecSys | 2025

|

Summarizing ‘KANITE: Kolmogorov-Arnold Networks for ITE Estimation’

Authors: Abhinav Thorat, Ravi Kolla, Niranjan Pedanekar
ECML PKDD | 2025


|

Graph-Assisted Culturally Adaptable Idiomatic Translation for Indic languages

Authors: Pratik Rakesh Singh, Kritarth Prasad, Mohammadi Zaki, Pankaj Wasnik
ACL Findings | 2025

|

In-Domain African Languages Translation Using LLMs and Multi-armed Bandits

Authors: Pratik Rakesh Singh, Kritarth Prasad, Mohammadi Zaki, Pankaj Wasnik
ACL AfricaNLP Workshop | 2025

|

LASPA: Language Agnostic Speaker Disentanglement with Prefix-Tuned Cross-Attention

Authors: Aditya Menon*, Raj Prakash Gohil*, Kumud Tripathi, Pankaj Wasnik
INTERSPEECH | 2025


|

Attention Is Not Always the Answer: Optimizing Voice Activity Detection with Simple Feature Fusion

Authors: Kumud Tripathi, Chowdam Venkata Thirumala Kumar, and Pankaj Wasnik
INTERSPEECH | 2025


|

REWIND: Speech Time Reversal for Enhancing Speaker Representations in Diffusion-based Voice Conversion

Authors: Ishan Biyani, Nirmesh Shah, Ashishkumar Gudmalwar, Pankaj Wasnik, and Prof. Rajiv R. Shah
INTERSPEECH | 2025


|

KANITE: Kolmogorov–Arnold Networks for ITE estimation

Authors: Eshan Mehendale, Abhinav Thorat, Ravi Kolla, Niranjan Pedanekar
ECML PKDD | 2025


|

DuET: Dual Incremental Object Detection via Exemplar-Free Task Arithmetic

Authors: Munish Monga, Vishal Chudasama, Pankaj Wasnik, Prof. Biplab Banerjee
ICCV | 2025

|

Dynamic Task-adaptive Meta Optimization for Cold-Start Recommendation

Authors: Tushar Prakash, Raksha Jalan, Brijraj Singh, Niranjan Pedanekar
ECAI | 2025

|

Precise Event Spotting in Sports Videos: Solving Long-Range Dependency and Class Imbalance

Authors: Sanchayan Santra, Vishal Chudasama, Pankaj Wasnik, Vineeth N. Balasubramanian
CVPR | 2025


|

Faster Machine Translation Ensembling with Reinforcement Learning and Competitive Correction

Authors: Kritarth Prasad, Mohammadi Zaki, Pratik Rakesh Singh, Pankaj Wasnik
NAACL | 2025


|

Enhancing Whisper's Accuracy and Speed for Indian Languages through Prompt-Tuning and Tokenization

Authors: Kumud Tripathi and Pankaj Wasnik
ICASSP | 2025


|

Ready for You When You Are Back: Content-driven Session-based Recommendation for Continuity of Experience

Authors: Brijraj Singh, Sonal Dabral, Niranjan Pedanekar
AAAI | 2025

|

Enhancing Entertainment Translation for Indian Languages using Adaptive Context, Style and LLMs

Authors: Pratik Rakesh Singh, Mohammadi Zaki, Pankaj Wasnik
AAAI | 2025


|

EmoReg: Directional Latent Vector Modeling for Emotional Intensity Regularization in Diffusion-based Voice Conversion

Authors: Ashish Gudmalwar, Ishan Biyani, Nirmesh Shah, Pankaj S. Wasnik, Rajiv R. Shah
AAAI | 2025


|

LLM-BRec: Personalizing Session-based Social Recommendation with LLM-BERT Fusion Framework

Authors: Raksha Jalan, Tushar Prakash and Niranjan Pedanekar
Generative Information Retrieval (Gen-IR) workshop at the SIGIR 2024 conference | July 2024


|

DubWise: Video-Guided Speech Duration Control in Multimodal LLM-based Text-to-Speech for Dubbing

Authors: Neha Sahipjohn, Ashishkumar Gudmalwar, Nirmesh Shah, Pankaj Wasnik, Rajiv Ratn Shah (IIIT Delhi)
INTERSPEECH | September 2024


|

VECL-TTS: Voice Identity and Emotional Style Aware Cross-Lingual TTS

Authors: Ashishkumar Gudmalwar, Nirmesh Shah, Sai Akarsh, Pankaj Wasnik, Rajiv Ratn Shah (IIIT Delhi)
INTERSPEECH | September 2024


|

Cross-Modal Fusion and Attention Mechanism for Weakly Supervised Video Anomaly Detection

Authors: Ayush Ghadiya, Purbayan Kar, Vishal Chudasama, Pankaj Wasnik
Computer Vision and Pattern Recognition (CVPR) 7th MULA Workshop | June 2024


|

Isometric Neural Machine Translation using Phoneme Count Ratio Reward-based Reinforcement Learning

Authors: Shivam R Mhaskar, Nirmesh Shah, Mohammadi Zaki, Ashishkumar Gudmalwar, Pankaj Wasnik, Rajiv Ratn Shah (IIIT Delhi)
Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL): Findings | June 2024


|

Efficacy of Large Language Models in Predicting Hindi Movies' Attributes: A Comprehensive Survey and Content-Based Analysis

Authors: Prabir Mondal (IIT Patna), Siddharth Singh (IIT Patna), Kushum (IIT Patna), Sriparna Saha (IIT Patna), Jyoti Prakash Singh (IIT Patna), Brijraj Singh, Niranjan Pedanekar
WebConf 2024 (WWW) | May 2024


|

Optimizing Movie Selections: A Multi-Task, Multi-Modal Framework with Strategies for Missing Modality Challenges

Authors: Subham Raj (IIT Patna), Pawan Agrawal (IIT Patna), Sriparna Saha (IIT Patna), Brijraj Singh, Niranjan Pedanekar
ACM Symposium on Applied Computing (SAC) | April 2024


|

Estimation of individual causal effects in network setup for multiple treatments

Authors: Abhinav Thorat, Ravi Kolla, Niranjan Pedanekar, Naoyuki Onoe
38th Annual Association for the Advancement of Artificial Intelligence (AAAI) Conference on Artificial Intelligence [Graphs and Complex Structure for Learning and Reasoning (GCLR) Workshop] | February 2024


|

Open-set Object Detection By Aligning Known Class Representations

Authors: Hiran Sarkar, Vishal Chudasama, Naoyuki Onoe, Pankaj Wasnik, Vineeth Balasubramanian (IIT Hyderabad)
Winter Conference on Applications of Computer Vision (WACV) | January 2024


|

Efficient infusion of self-supervised representations in Automatic Speech Recognition

Authors: Darshan Prabhu, Saiganesh Mirishkar, Pankaj Wasnik
Poster presentation at the Neural Information Processing Systems (NeurIPS) 3rd Workshop | December 2023


|

Enhancing Social Recommendation with Multi-View BERT Network

Authors: Tushar Prakash, Raksha Jalan, Naoyuki Onoe
IEEE International Conference on Data Mining (ICDM) | December 2023


|

Fiducial Focus Augmentation for Facial Landmark Detection

Authors: Purbayan Kar, Vishal Chudasama, Naoyuki Onoe, Pankaj Wasnik, Vineeth Balasubramanian
British Machine Vision Conference (BMVC) | November 2023


|

Impulsion of Movie's Content-Based Factors in Multi-Modal Movie Recommendation System

Authors: Prabir Mondal, Pulkit Kapoor, Siddharth Singh, Sriparna Saha, Naoyuki Onoe, Brijraj Singh
International Conference on Neural Information Processing (ICONIP) | November 2023


|

LLM Based Generation of Item-Description for Recommendation System

Authors: Arkadeep Acharya, Brijraj Singh and Naoyuki Onoe
Recommender Systems Conference (RECSYS) | September 2023


|

CR-SoRec: BERT driven Consistency Regularization for Social Recommendation

Authors: Tushar Prakash, Raksha Jalan, Brijraj Singh and Naoyuki Onoe
Recommender Systems Conference (RECSYS) | September 2023


|

Iteratively Improving Speech Recognition and Voice Conversion

Authors: Mayank Kumar Singh, Naoya Takahashi, Naoyuki Onoe
INTERSPEECH 2023 | August 2023


|

Cd-HRNN: Content-Driven HRNN to Improve Session-Based Recommendation System

Authors: Sonal Dabral, Brijraj Singh and Naoyuki Onoe
IJCNN Main Conference 2023 | April 2023

|

A Multi-Modal Multi-Task Based Approach for Movie Recommendation

Authors: Sriparna Saha (IIT Patna) and Naoyuki Onoe
IJCNN Main Conference 2023 | April 2023

|

A Meta-Learning Based Generative Model with Graph Attention Network for Multi-Modal Recommender Systems

Authors: Sriparna Saha (IIT Patna) and Naoyuki Onoe
INNS DLIA Workshop / IJCNN 2023 | April 2023

|

Task-Specific and Graph Convolutional Network Based Multi-Modal Movie Recommendation System in Indian Setting

Authors: Sriparna Saha (IIT Patna) and Naoyuki Onoe
INNS DLIA Workshop / IJCNN 2023 | April 2023

|

Revisiting Class Imbalance for End-to-end Semi-Supervised Object Detection

Authors: Purbayan Kar, Vishal Chudasama, Pankaj Wasnik and Naoyuki Onoe
Efficient Deep Learning for Computer Vision (ECV) Workshop in CVPR 2023 | April 2023


|

Nonparallel Emotional Voice Conversion For Unseen Speaker-Emotion Pairs Using Dual Domain Adversarial Network & Virtual Domain Pairing

Authors: Nirmesh Shah, Mayank Kumar Singh, Naoya Takahashi, Naoyuki Onoe
ICASSP, the International Conference on Acoustics, Speech, and Signal Processing | February 2023


|

Hierarchical disentangled representation learning for singing voice conversion

Authors: Naoya Takahashi, Mayank Kumar Singh, Yuki Mitsufuji
ICASSP, the International Conference on Acoustics, Speech, and Signal Processing | February 2023


|

Graph Network based Approaches for Multi-modal Movie Recommendation System

Authors: Daipayan Chakder (IIT Patna), Prabir Mondal (IIT Patna), Subham Raj (IIT Patna), Sriparna Saha (IIT Patna), Angshuman Ghosh, Naoyuki Onoe
IEEE International Conference on Systems, Man, and Cybernetics (SMC 2022) | November 2022


|

Semi-supervised Acoustic and Language Modeling for Hindi ASR

Authors: Tarun Sai Bandarupalli (IISc Bangalore), Shakti Rath (IISc Bangalore), Nirmesh Shah, Naoyuki Onoe, Sriram Ganapathy (IISc Bangalore)
INTERSPEECH 2022 | September 2022


|

Towards Developing a Multi-Modal Video Recommendation System

Authors: Sriram Pingali (IIT Patna), Prabir Mondal (IIT Patna), Daipayan Chakder (IIT Patna), Sriparna Saha (IIT Patna), Angshuman Ghosh
International Joint Conference on Neural Networks (IJCNN 2022) | September 2022


|

Leveraging Symmetrical Convolutional Transformer Networks for Speech to Singing Voice Style Transfer

Authors: Shrutina Agarwal (IISc Bangalore), Sriram Ganapathy (IISc Bangalore), Naoya Takahashi
INTERSPEECH 2022 | September 2022


|

M2FNet: Multi-modal Fusion Network for Emotion Recognition in Conversation

Authors: Vishal Chudasama, Purbayan Kar, Ashish Gudmalwar, Nirmesh Shah, Pankaj Wasnik, Naoyuki Onoe
Conference on Computer Vision and Pattern Recognition (CVPR 2022) | June 2022


|

A Unified Model for Fingerprint Authentication and Presentation Attack Detection

Authors: Additya Popli (IIIT Hyderabad), Saraansh Tandon (IIIT Hyderabad), Joshua J. Engelsma (Michigan State University), Naoyuki Onoe, Atsushi Okubo, Anoop Namboodiri (IIIT Hyderabad)
International Joint Conference on Biometrics (IJCB 2021) | April 2021


|

End-to-end lyrics Recognition with Voice to Singing Style Transfer

Authors: Sakya Basak (IISc Bangalore), Shrutina Agarwal (IISc Bangalore), Sriram Ganapathy (IISc Bangalore), Naoya Takahashi
International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2021) | February 2021


***International Institute of Information Technology Hyderabad **Indian Institute of Technology Patna *Indian Institute of Science, Bangalore #Michigan State University
