{"id":13142,"date":"2025-05-12T11:16:44","date_gmt":"2025-05-12T11:16:44","guid":{"rendered":"https:\/\/whiteriversmediasolutions.com\/Sony\/ready-for-you-when-you-are-back-content-driven-session-based-recommendation-for-continuity-of-experience-copy\/"},"modified":"2025-05-13T04:48:08","modified_gmt":"2025-05-13T04:48:08","slug":"summarizing-emoreg-directional-latent-vector-modeling-for-emotional-intensity-regularization-in-diffusion-based-voice-conversion","status":"publish","type":"post","link":"https:\/\/whiteriversmediasolutions.com\/Sony\/summarizing-emoreg-directional-latent-vector-modeling-for-emotional-intensity-regularization-in-diffusion-based-voice-conversion\/","title":{"rendered":"Summarizing \u2018EmoReg: Directional Latent Vector Modeling for Emotional Intensity Regularization in Diffusion-based Voice Conversion\u2019"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"13142\" class=\"elementor elementor-13142\" data-elementor-post-type=\"post\">\n\t\t\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-cd44eb5 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"cd44eb5\" data-element_type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-9f11b70\" data-id=\"9f11b70\" data-element_type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-215a70e elementor-widget elementor-widget-heading\" data-id=\"215a70e\" data-element_type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title 
elementor-size-default\">BLOGS<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-28dc161 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"28dc161\" data-element_type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-63cf269\" data-id=\"63cf269\" data-element_type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-6837436 elementor-widget elementor-widget-heading\" data-id=\"6837436\" data-element_type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Summarizing \u2018EmoReg: Directional Latent Vector Modeling for Emotional Intensity Regularization in Diffusion-based Voice Conversion\u2019<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-9bd1630 elementor-widget elementor-widget-text-editor\" data-id=\"9bd1630\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tAshishkumar Gudmalwar, Ishan D. 
Biyani, Nirmesh Shah, Pankaj W, Rajiv Ratn Shah\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7a034cb elementor-hidden-desktop elementor-hidden-tablet elementor-hidden-mobile elementor-widget elementor-widget-text-editor\" data-id=\"7a034cb\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>30<sup>th<\/sup> September 2024<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a7d1e72 elementor-hidden-desktop elementor-hidden-tablet elementor-hidden-mobile elementor-widget elementor-widget-image\" data-id=\"a7d1e72\" data-element_type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img fetchpriority=\"high\" decoding=\"async\" width=\"380\" height=\"190\" src=\"https:\/\/whiteriversmediasolutions.com\/Sony\/uvaftoap\/2025\/05\/Blog-Cover-image.png\" class=\"attachment-medium_large size-medium_large wp-image-13108\" alt=\"\" srcset=\"https:\/\/whiteriversmediasolutions.com\/Sony\/uvaftoap\/2025\/05\/Blog-Cover-image.png 380w, https:\/\/whiteriversmediasolutions.com\/Sony\/uvaftoap\/2025\/05\/Blog-Cover-image-300x150.png 300w\" sizes=\"(max-width: 380px) 100vw, 380px\" style=\"width:100%;height:50%;max-width:380px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-9202657 elementor-widget elementor-widget-text-editor\" data-id=\"9202657\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Ashish Gudmalwar summarises the paper titled <a href=\"https:\/\/arxiv.org\/pdf\/2412.20359\">EmoReg: Directional Latent Vector Modeling for Emotional Intensity Regularization in Diffusion-based Voice Conversion<\/a>, co-authored by Ashishkumar Gudmalwar, 
Ishan Biyani, Nirmesh Shah, Pankaj W and Rajiv Ratn Shah, accepted in the Main Track at <a href=\"https:\/\/aaai.org\/conference\/aaai\/aaai-25\/\">The 39th Annual AAAI Conference on Artificial Intelligence (AAAI 2025), Feb-March 2025<\/a>.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f0a3e28 elementor-widget elementor-widget-text-editor\" data-id=\"f0a3e28\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<h4><strong>Introduction: <\/strong><\/h4>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-d95d9a3 elementor-widget elementor-widget-text-editor\" data-id=\"d95d9a3\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Despite significant progress in the field of Generative AI, speech synthesis models still encounter several challenges when it comes to AI-based dubbing of entertainment content such as movies and serials. AI-based dubbing involves replicating the emotion of the input speech and controlling its intensity depending on the context and emotion of the scene. Most of today\u2019s text-to-speech (TTS) systems can produce high-quality, high-fidelity, natural speech output, but they still lack expressiveness and fine control over emotional states. Emotional Voice Conversion (EVC) aims to convert the discrete emotional state of a given speech utterance from the source emotion to the target emotion while preserving its linguistic content. In this paper, we propose regularizing emotion intensity in a diffusion-based EVC framework to generate speech that precisely conveys the target emotion. 
Traditional approaches control the intensity of an emotional state in an utterance via emotion class probabilities or intensity labels, which often leads to inept style manipulation and degraded quality.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-1246231 elementor-widget elementor-widget-text-editor\" data-id=\"1246231\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p><strong>To address these issues, we introduce the following components:<\/strong><\/p><ul><li><strong>Directional Latent Vector Modeling (DVM):<\/strong> We propose a novel DVM for obtaining fine control over intensity while transitioning across different emotional states.<\/li><li><strong>SSL Framework:<\/strong> The proposed EmoReg uses self-supervised learning (SSL)-based audio feature representations, obtained by fine-tuning the SSL framework on a downstream emotion classification task.<\/li><li><strong>Diffusion-based VC:<\/strong> These emotion embeddings can be modified based on the given target emotion intensity and the corresponding direction vector. 
Finally, the updated embeddings can be fused in the reverse diffusion process to generate the speech with the desired emotion and intensity value.<\/li><\/ul>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a880aa7 elementor-widget elementor-widget-image\" data-id=\"a880aa7\" data-element_type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" width=\"750\" height=\"388\" data-src=\"https:\/\/whiteriversmediasolutions.com\/Sony\/uvaftoap\/2025\/05\/image1-768x397.png\" class=\"attachment-medium_large size-medium_large wp-image-13145 lazyload\" alt=\"\" data-srcset=\"https:\/\/whiteriversmediasolutions.com\/Sony\/uvaftoap\/2025\/05\/image1-768x397.png 768w, https:\/\/whiteriversmediasolutions.com\/Sony\/uvaftoap\/2025\/05\/image1-300x155.png 300w, https:\/\/whiteriversmediasolutions.com\/Sony\/uvaftoap\/2025\/05\/image1.png 940w\" data-sizes=\"(max-width: 750px) 100vw, 750px\" style=\"--smush-placeholder-width: 750px; --smush-placeholder-aspect-ratio: 750\/388;width:100%;height:51.7%;max-width:940px\" src=\"data:image\/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-580c6ea elementor-widget elementor-widget-text-editor\" data-id=\"580c6ea\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p style=\"text-align: center;\">Fig. 1: Block diagram of the proposed DVM-based Emotion Intensity Regularized EVC architecture. Dotted arrows represent operations performed only during training. 
Also, GT X\u0304 are derived by replacing each phoneme Mel-spectrogram feature in the input with its corresponding pre-calculated average feature.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-424bb7e elementor-widget elementor-widget-text-editor\" data-id=\"424bb7e\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<h4><strong>Results:<\/strong><\/h4><p>Demo samples are available on our <a href=\"https:\/\/nirmesh-sony.github.io\/EmoReg\/\">Demo Page<\/a>.<\/p><p>The emotion similarity score is calculated for the Neutral-to-Angry, Neutral-to-Sad, and Neutral-to-Happy conversion scenarios, for both the proposed approach and the baseline methods. Table 1 gives the full comparison: the proposed EmoReg with DVM achieves the highest average emotion similarity score (0.96) among all compared approaches and matches or exceeds every other method on each conversion pair.<\/p><p><strong>Table 1: <\/strong>Analysis of emotion similarity scores along with margin of error corresponding to the 95% CI.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-ed2de4b elementor-widget elementor-widget-text-editor\" data-id=\"ed2de4b\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<table width=\"576\"><tbody><tr><td width=\"122\"><strong>Methods<\/strong><\/td><td width=\"95\"><strong>Neu-Ang \u2191<\/strong><\/td><td width=\"104\"><strong>Neu-Sad \u2191<\/strong><\/td><td width=\"123\"><strong>Neu-Hap \u2191<\/strong><\/td><td width=\"132\"><strong>Average \u2191<\/strong><\/td><\/tr><tr><td width=\"122\"><strong>Emovox<\/strong><\/td><td width=\"95\">0.94 \u00b1 0.004<\/td><td width=\"104\">0.94 \u00b1 0.004<\/td><td width=\"123\">0.95 \u00b1 0.004<\/td><td width=\"132\">0.94 \u00b1 0.004<\/td><\/tr><tr><td width=\"122\"><strong>Mixed 
Emotion<\/strong><\/td><td width=\"95\">0.94 \u00b1 0.004<\/td><td width=\"104\">0.92 \u00b1 0.004<\/td><td width=\"123\">0.90 \u00b1 0.004<\/td><td width=\"132\">0.92 \u00b1 0.004<\/td><\/tr><tr><td width=\"122\"><strong>CycleGAN-EVC<\/strong><\/td><td width=\"95\">0.96 \u00b1 0.004<\/td><td width=\"104\">0.92 \u00b1 0.004<\/td><td width=\"123\">0.91 \u00b1 0.004<\/td><td width=\"132\">0.93 \u00b1 0.004<\/td><\/tr><tr><td width=\"122\"><strong>StarGAN-EVC<\/strong><\/td><td width=\"95\">0.95 \u00b1 0.004<\/td><td width=\"104\">0.91 \u00b1 0.004<\/td><td width=\"123\">0.91 \u00b1 0.004<\/td><td width=\"132\">0.93 \u00b1 0.004<\/td><\/tr><tr><td width=\"122\"><strong>Seq2Seq-EVC<\/strong><\/td><td width=\"95\">0.96 \u00b1 0.004<\/td><td width=\"104\">0.93 \u00b1 0.004<\/td><td width=\"123\">0.87 \u00b1 0.004<\/td><td width=\"132\">0.92 \u00b1 0.004<\/td><\/tr><tr><td width=\"122\"><strong>StyleVC<\/strong><\/td><td width=\"95\">0.96 \u00b1 0.004<\/td><td width=\"104\">0.92 \u00b1 0.004<\/td><td width=\"123\">0.91 \u00b1 0.004<\/td><td width=\"132\">0.93 \u00b1 0.004<\/td><\/tr><tr><td width=\"122\"><strong>DISSC<\/strong><\/td><td width=\"95\">0.88 \u00b1 0.004<\/td><td width=\"104\">0.91 \u00b1 0.004<\/td><td width=\"123\">0.87 \u00b1 0.004<\/td><td width=\"132\">0.89 \u00b1 0.004<\/td><\/tr><tr><td width=\"122\"><strong>Ablation<\/strong><\/td><td width=\"95\">0.96 \u00b1 0.004<\/td><td width=\"104\">0.93 \u00b1 0.004<\/td><td width=\"123\">0.95 \u00b1 0.004<\/td><td width=\"132\">0.94 \u00b1 0.004<\/td><\/tr><tr><td width=\"122\"><strong>Proposed<\/strong><\/td><td width=\"95\"><strong>0.97 \u00b1 0.003<\/strong><\/td><td width=\"104\"><strong>0.96 \u00b1 0.003<\/strong><\/td><td width=\"123\"><strong>0.95 \u00b1 0.003<\/strong><\/td><td width=\"132\"><strong>0.96 \u00b1 0.003<\/strong><\/td><\/tr><\/tbody><\/table>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-5466e0a elementor-widget 
elementor-widget-text-editor\" data-id=\"5466e0a\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>The effectiveness of the proposed approach is evaluated across different databases using similar objective and subjective assessments for both English and Hindi. Table 2 shows the emotion similarity scores for Neutral-to-Angry, Neutral-to-Sad, and Neutral-to-Happy emotion voice conversion for the ablation and the proposed EmoReg approach in both languages. Table 2 indicates that the proposed approach also carries over to Hindi: it improves over the ablation on Neutral-to-Angry and Neutral-to-Sad and matches it on average (0.88), although it trails on Neutral-to-Happy.<\/p><p><strong>Table 2<\/strong>: Emotion similarity scores across languages along with the margin of error corresponding to the 95% confidence interval.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-8140b59 elementor-widget elementor-widget-text-editor\" data-id=\"8140b59\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<table width=\"671\">\n<tbody>\n<tr>\n<td><strong>Methods<\/strong><\/td>\n<td><strong>Neu-Ang \u2191<\/strong><\/td>\n<td><strong>Neu-Sad \u2191<\/strong><\/td>\n<td><strong>Neu-Hap \u2191<\/strong><\/td>\n<td><strong>Avg \u2191<\/strong><\/td>\n<\/tr>\n<tr>\n<td colspan=\"5\" style=\"text-align:center;\"><strong>English<\/strong><\/td>\n<\/tr>\n<tr>\n<td><strong>Ablation<\/strong><\/td>\n<td>0.96 \u00b1 0.004<\/td>\n<td>0.93 \u00b1 0.004<\/td>\n<td>0.95 \u00b1 0.004<\/td>\n<td>0.94 \u00b1 0.004<\/td>\n<\/tr>\n<tr>\n<td><strong>Proposed<\/strong><\/td>\n<td>0.97 \u00b1 0.003<\/td>\n<td>0.96 \u00b1 0.003<\/td>\n<td>0.95 \u00b1 0.003<\/td>\n<td>0.96 \u00b1 0.003<\/td>\n<\/tr>\n<tr>\n<td colspan=\"5\" style=\"text-align:center;\"><strong>Hindi<\/strong><\/td>\n<\/tr>\n<tr>\n<td><strong>Ablation<\/strong><\/td>\n<td>0.89 \u00b1 0.003<\/td>\n<td>0.86 \u00b1 
0.003<\/td>\n<td>0.89 \u00b1 0.003<\/td>\n<td>0.88 \u00b1 0.003<\/td>\n<\/tr>\n<tr>\n<td><strong>Proposed<\/strong><\/td>\n<td>0.91 \u00b1 0.003<\/td>\n<td>0.87 \u00b1 0.003<\/td>\n<td>0.87 \u00b1 0.003<\/td>\n<td>0.88 \u00b1 0.003<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-b9f75c6 elementor-widget elementor-widget-text-editor\" data-id=\"b9f75c6\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>The performance of the EmoReg approach is further investigated at different intensity levels using the emotion similarity score, as shown in Figure 2. We used an intensity scale from 0 to 1 with a step size of 0.2. Among the baselines, only the EmoVox and MixedEmotion models could be included, because the others, i.e., CycleGAN-EVC, StarGAN-EVC and Seq2Seq-EVC, lack support for emotion intensity regularization due to their architectural limitations. Figure 2 shows that the emotion similarity score increases with the emotion intensity scale, which indicates that the proposed EmoReg with DVM can achieve fine control over emotion intensity. 
In contrast, the emotion similarity scores of the baselines and the ablation do not vary as the intensity scale increases, so they fail to achieve fine control over emotion intensity.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-13cce26 elementor-widget elementor-widget-image\" data-id=\"13cce26\" data-element_type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" width=\"693\" height=\"350\" data-src=\"https:\/\/whiteriversmediasolutions.com\/Sony\/uvaftoap\/2025\/05\/image2.png\" class=\"attachment-medium_large size-medium_large wp-image-13146 lazyload\" alt=\"\" data-srcset=\"https:\/\/whiteriversmediasolutions.com\/Sony\/uvaftoap\/2025\/05\/image2.png 693w, https:\/\/whiteriversmediasolutions.com\/Sony\/uvaftoap\/2025\/05\/image2-300x152.png 300w\" data-sizes=\"(max-width: 693px) 100vw, 693px\" style=\"--smush-placeholder-width: 693px; --smush-placeholder-aspect-ratio: 693\/350;width:100%;height:50.51%;max-width:693px\" src=\"data:image\/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-c16a1bb elementor-widget elementor-widget-text-editor\" data-id=\"c16a1bb\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p style=\"text-align: center;\">Fig. 
2: Analysis of emotion similarity score with respect to incremental emotion intensity scale<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-d5416e1 elementor-widget elementor-widget-text-editor\" data-id=\"d5416e1\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<h4><strong>Conclusion<\/strong>:<\/h4>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-9aff2cc elementor-widget elementor-widget-text-editor\" data-id=\"9aff2cc\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>We introduced the EmoReg model for emotion voice conversion with emotion intensity regularization. By leveraging SSL-based emotion embeddings, we achieved effective emotion representation from speech. We proposed a DVM to transition between emotional states while controlling emotion intensity. We evaluated our approach against the SOTA architectures for both English and Hindi languages. 
In summary, the proposed EmoReg model outperformed existing methods in various objective evaluations.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-0f723f4 elementor-widget elementor-widget-text-editor\" data-id=\"0f723f4\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<h4><strong>Citation<\/strong>:<\/h4>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-ec362a6 elementor-widget elementor-widget-text-editor\" data-id=\"ec362a6\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>@inproceedings{emoreg2025,<br \/>title={EmoReg: Directional Latent Vector Modeling for Emotional Intensity Regularization in Diffusion-based Voice Conversion},<br \/>author={Gudmalwar, Ashishkumar and Biyani, Ishan and Shah, Nirmesh and W, Pankaj and Shah, Rajiv Ratn},<br \/>booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},<br \/>year={2025}<br \/>}<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-67d4f0b elementor-widget elementor-widget-text-editor\" data-id=\"67d4f0b\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>To know more about Sony Research India\u2019s Research Publications, visit the \u2018Publications\u2019 section on our \u2018Open Innovation\u2019 page:\u00a0<a href=\"https:\/\/www.sonyresearchindia.com\/open-innovation\/\">Open Innovation with Sony R&amp;D \u2013 Sony Research India<\/a><\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-0362925 elementor-hidden-desktop elementor-hidden-tablet elementor-hidden-mobile elementor-widget 
elementor-widget-text-editor\" data-id=\"0362925\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>Ashish Gudmalwar summarises paper titled 
EmoReg:&#8230;<\/p>\n","protected":false},"author":1,"featured_media":13153,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"elementor_header_footer","format":"standard","meta":{"footnotes":""},"categories":[22,17],"tags":[],"class_list":["post-13142","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-all-blogs","category-technology","entry"],"yoast_head":"\n<title>Summarizing \u2018EmoReg: Directional Latent Vector Modeling for Emotional Intensity Regularization in Diffusion-based Voice Conversion\u2019 - Sony Research India<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/whiteriversmediasolutions.com\/Sony\/summarizing-emoreg-directional-latent-vector-modeling-for-emotional-intensity-regularization-in-diffusion-based-voice-conversion\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Summarizing \u2018EmoReg: Directional Latent Vector Modeling for Emotional Intensity Regularization in Diffusion-based Voice Conversion\u2019 - Sony Research India\" \/>\n<meta property=\"og:description\" content=\"Ashish Gudmalwar summarises paper titled EmoReg:...\" \/>\n<meta property=\"og:url\" content=\"https:\/\/whiteriversmediasolutions.com\/Sony\/summarizing-emoreg-directional-latent-vector-modeling-for-emotional-intensity-regularization-in-diffusion-based-voice-conversion\/\" \/>\n<meta property=\"og:site_name\" content=\"Sony Research India\" \/>\n<meta property=\"article:published_time\" content=\"2025-05-12T11:16:44+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-05-13T04:48:08+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/whiteriversmediasolutions.com\/Sony\/uvaftoap\/2025\/05\/Ashish-Gudmalwar_EmoReg.png\" \/>\n\t<meta property=\"og:image:width\" 
content=\"380\" \/>\n\t<meta property=\"og:image:height\" content=\"190\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"sri_user@2021\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"sri_user@2021\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"6 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/summarizing-emoreg-directional-latent-vector-modeling-for-emotional-intensity-regularization-in-diffusion-based-voice-conversion\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/summarizing-emoreg-directional-latent-vector-modeling-for-emotional-intensity-regularization-in-diffusion-based-voice-conversion\/\"},\"author\":{\"name\":\"sri_user@2021\",\"@id\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/#\/schema\/person\/c3cd529e8ce8f5b822c5efaf92fc96cb\"},\"headline\":\"Summarizing \u2018EmoReg: Directional Latent Vector Modeling for Emotional Intensity Regularization in Diffusion-based Voice 
Conversion\u2019\",\"datePublished\":\"2025-05-12T11:16:44+00:00\",\"dateModified\":\"2025-05-13T04:48:08+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/summarizing-emoreg-directional-latent-vector-modeling-for-emotional-intensity-regularization-in-diffusion-based-voice-conversion\/\"},\"wordCount\":978,\"publisher\":{\"@id\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/#organization\"},\"image\":{\"@id\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/summarizing-emoreg-directional-latent-vector-modeling-for-emotional-intensity-regularization-in-diffusion-based-voice-conversion\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/uvaftoap\/2025\/05\/Ashish-Gudmalwar_EmoReg.png\",\"articleSection\":[\"All Blogs\",\"Technology\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/summarizing-emoreg-directional-latent-vector-modeling-for-emotional-intensity-regularization-in-diffusion-based-voice-conversion\/\",\"url\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/summarizing-emoreg-directional-latent-vector-modeling-for-emotional-intensity-regularization-in-diffusion-based-voice-conversion\/\",\"name\":\"Summarizing \u2018EmoReg: Directional Latent Vector Modeling for Emotional Intensity Regularization in Diffusion-based Voice Conversion\u2019 - Sony Research 
India\",\"isPartOf\":{\"@id\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/summarizing-emoreg-directional-latent-vector-modeling-for-emotional-intensity-regularization-in-diffusion-based-voice-conversion\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/summarizing-emoreg-directional-latent-vector-modeling-for-emotional-intensity-regularization-in-diffusion-based-voice-conversion\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/uvaftoap\/2025\/05\/Ashish-Gudmalwar_EmoReg.png\",\"datePublished\":\"2025-05-12T11:16:44+00:00\",\"dateModified\":\"2025-05-13T04:48:08+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/summarizing-emoreg-directional-latent-vector-modeling-for-emotional-intensity-regularization-in-diffusion-based-voice-conversion\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/whiteriversmediasolutions.com\/Sony\/summarizing-emoreg-directional-latent-vector-modeling-for-emotional-intensity-regularization-in-diffusion-based-voice-conversion\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/summarizing-emoreg-directional-latent-vector-modeling-for-emotional-intensity-regularization-in-diffusion-based-voice-conversion\/#primaryimage\",\"url\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/uvaftoap\/2025\/05\/Ashish-Gudmalwar_EmoReg.png\",\"contentUrl\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/uvaftoap\/2025\/05\/Ashish-Gudmalwar_EmoReg.png\",\"width\":380,\"height\":190},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/summarizing-emoreg-directional-latent-vector-modeling-for-emotional-intensity-regularization-in-diffusion-based-voice-conversion\/#breadcrumb\",\"itemListElement\":[{\"@type\
":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Summarizing \u2018EmoReg: Directional Latent Vector Modeling for Emotional Intensity Regularization in Diffusion-based Voice Conversion\u2019\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/#website\",\"url\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/\",\"name\":\"Sony Research India\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/#organization\",\"name\":\"sonyresearchindia\",\"url\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/uvaftoap\/2023\/03\/Sony_Logo.png\",\"contentUrl\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/uvaftoap\/2023\/03\/Sony_Logo.png\",\"width\":168,\"height\":31,\"caption\":\"sonyresearchindia\"},\"image\":{\"@id\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/#\/schema\/logo\/image\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/#\/schema\/person\/c3cd529e8ce8f5b822c5efaf92fc96cb\",\"name\":\"sri_user@2021\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/e0c9edcfb42567c720cc449d4b1
e0812298e8172a5a7e4296127a0adba7e705b?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/e0c9edcfb42567c720cc449d4b1e0812298e8172a5a7e4296127a0adba7e705b?s=96&d=mm&r=g\",\"caption\":\"sri_user@2021\"},\"sameAs\":[\"http:\/\/whiteriversmediasolutions.com\/staging\/SRI\"]}]}<\/script>\n","yoast_head_json":{"title":"Summarizing \u2018EmoReg: Directional Latent Vector Modeling for Emotional Intensity Regularization in Diffusion-based Voice Conversion\u2019 - Sony Research India","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/whiteriversmediasolutions.com\/Sony\/summarizing-emoreg-directional-latent-vector-modeling-for-emotional-intensity-regularization-in-diffusion-based-voice-conversion\/","og_locale":"en_US","og_type":"article","og_title":"Summarizing \u2018EmoReg: Directional Latent Vector Modeling for Emotional Intensity Regularization in Diffusion-based Voice Conversion\u2019 - Sony Research India","og_description":"Ashish Gudmalwar summarises paper titled EmoReg:...","og_url":"https:\/\/whiteriversmediasolutions.com\/Sony\/summarizing-emoreg-directional-latent-vector-modeling-for-emotional-intensity-regularization-in-diffusion-based-voice-conversion\/","og_site_name":"Sony Research India","article_published_time":"2025-05-12T11:16:44+00:00","article_modified_time":"2025-05-13T04:48:08+00:00","og_image":[{"width":380,"height":190,"url":"https:\/\/whiteriversmediasolutions.com\/Sony\/uvaftoap\/2025\/05\/Ashish-Gudmalwar_EmoReg.png","type":"image\/png"}],"author":"sri_user@2021","twitter_card":"summary_large_image","twitter_misc":{"Written by":"sri_user@2021","Est. 
reading time":"6 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/whiteriversmediasolutions.com\/Sony\/summarizing-emoreg-directional-latent-vector-modeling-for-emotional-intensity-regularization-in-diffusion-based-voice-conversion\/#article","isPartOf":{"@id":"https:\/\/whiteriversmediasolutions.com\/Sony\/summarizing-emoreg-directional-latent-vector-modeling-for-emotional-intensity-regularization-in-diffusion-based-voice-conversion\/"},"author":{"name":"sri_user@2021","@id":"https:\/\/whiteriversmediasolutions.com\/Sony\/#\/schema\/person\/c3cd529e8ce8f5b822c5efaf92fc96cb"},"headline":"Summarizing \u2018EmoReg: Directional Latent Vector Modeling for Emotional Intensity Regularization in Diffusion-based Voice Conversion\u2019","datePublished":"2025-05-12T11:16:44+00:00","dateModified":"2025-05-13T04:48:08+00:00","mainEntityOfPage":{"@id":"https:\/\/whiteriversmediasolutions.com\/Sony\/summarizing-emoreg-directional-latent-vector-modeling-for-emotional-intensity-regularization-in-diffusion-based-voice-conversion\/"},"wordCount":978,"publisher":{"@id":"https:\/\/whiteriversmediasolutions.com\/Sony\/#organization"},"image":{"@id":"https:\/\/whiteriversmediasolutions.com\/Sony\/summarizing-emoreg-directional-latent-vector-modeling-for-emotional-intensity-regularization-in-diffusion-based-voice-conversion\/#primaryimage"},"thumbnailUrl":"https:\/\/whiteriversmediasolutions.com\/Sony\/uvaftoap\/2025\/05\/Ashish-Gudmalwar_EmoReg.png","articleSection":["All Blogs","Technology"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/whiteriversmediasolutions.com\/Sony\/summarizing-emoreg-directional-latent-vector-modeling-for-emotional-intensity-regularization-in-diffusion-based-voice-conversion\/","url":"https:\/\/whiteriversmediasolutions.com\/Sony\/summarizing-emoreg-directional-latent-vector-modeling-for-emotional-intensity-regularization-in-diffusion-based-voice-conversion\/","name":"Summarizing \u2018EmoReg: 
Directional Latent Vector Modeling for Emotional Intensity Regularization in Diffusion-based Voice Conversion\u2019 - Sony Research India","isPartOf":{"@id":"https:\/\/whiteriversmediasolutions.com\/Sony\/#website"},"primaryImageOfPage":{"@id":"https:\/\/whiteriversmediasolutions.com\/Sony\/summarizing-emoreg-directional-latent-vector-modeling-for-emotional-intensity-regularization-in-diffusion-based-voice-conversion\/#primaryimage"},"image":{"@id":"https:\/\/whiteriversmediasolutions.com\/Sony\/summarizing-emoreg-directional-latent-vector-modeling-for-emotional-intensity-regularization-in-diffusion-based-voice-conversion\/#primaryimage"},"thumbnailUrl":"https:\/\/whiteriversmediasolutions.com\/Sony\/uvaftoap\/2025\/05\/Ashish-Gudmalwar_EmoReg.png","datePublished":"2025-05-12T11:16:44+00:00","dateModified":"2025-05-13T04:48:08+00:00","breadcrumb":{"@id":"https:\/\/whiteriversmediasolutions.com\/Sony\/summarizing-emoreg-directional-latent-vector-modeling-for-emotional-intensity-regularization-in-diffusion-based-voice-conversion\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/whiteriversmediasolutions.com\/Sony\/summarizing-emoreg-directional-latent-vector-modeling-for-emotional-intensity-regularization-in-diffusion-based-voice-conversion\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/whiteriversmediasolutions.com\/Sony\/summarizing-emoreg-directional-latent-vector-modeling-for-emotional-intensity-regularization-in-diffusion-based-voice-conversion\/#primaryimage","url":"https:\/\/whiteriversmediasolutions.com\/Sony\/uvaftoap\/2025\/05\/Ashish-Gudmalwar_EmoReg.png","contentUrl":"https:\/\/whiteriversmediasolutions.com\/Sony\/uvaftoap\/2025\/05\/Ashish-Gudmalwar_EmoReg.png","width":380,"height":190},{"@type":"BreadcrumbList","@id":"https:\/\/whiteriversmediasolutions.com\/Sony\/summarizing-emoreg-directional-latent-vector-modeling-for-emotional-intensity-regularization-in-diffusion-based-voice-conver
sion\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/whiteriversmediasolutions.com\/Sony\/"},{"@type":"ListItem","position":2,"name":"Summarizing \u2018EmoReg: Directional Latent Vector Modeling for Emotional Intensity Regularization in Diffusion-based Voice Conversion\u2019"}]},{"@type":"WebSite","@id":"https:\/\/whiteriversmediasolutions.com\/Sony\/#website","url":"https:\/\/whiteriversmediasolutions.com\/Sony\/","name":"Sony Research India","description":"","publisher":{"@id":"https:\/\/whiteriversmediasolutions.com\/Sony\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/whiteriversmediasolutions.com\/Sony\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/whiteriversmediasolutions.com\/Sony\/#organization","name":"sonyresearchindia","url":"https:\/\/whiteriversmediasolutions.com\/Sony\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/whiteriversmediasolutions.com\/Sony\/#\/schema\/logo\/image\/","url":"https:\/\/whiteriversmediasolutions.com\/Sony\/uvaftoap\/2023\/03\/Sony_Logo.png","contentUrl":"https:\/\/whiteriversmediasolutions.com\/Sony\/uvaftoap\/2023\/03\/Sony_Logo.png","width":168,"height":31,"caption":"sonyresearchindia"},"image":{"@id":"https:\/\/whiteriversmediasolutions.com\/Sony\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/whiteriversmediasolutions.com\/Sony\/#\/schema\/person\/c3cd529e8ce8f5b822c5efaf92fc96cb","name":"sri_user@2021","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/whiteriversmediasolutions.com\/Sony\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/e0c9edcfb42567c720cc449d4b1e0812298e8172a5a7e4296127a0adba7e705b?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/e0c9edcfb4
2567c720cc449d4b1e0812298e8172a5a7e4296127a0adba7e705b?s=96&d=mm&r=g","caption":"sri_user@2021"},"sameAs":["http:\/\/whiteriversmediasolutions.com\/staging\/SRI"]}]}},"_links":{"self":[{"href":"https:\/\/whiteriversmediasolutions.com\/Sony\/wp-json\/wp\/v2\/posts\/13142","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/whiteriversmediasolutions.com\/Sony\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/whiteriversmediasolutions.com\/Sony\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/whiteriversmediasolutions.com\/Sony\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/whiteriversmediasolutions.com\/Sony\/wp-json\/wp\/v2\/comments?post=13142"}],"version-history":[{"count":12,"href":"https:\/\/whiteriversmediasolutions.com\/Sony\/wp-json\/wp\/v2\/posts\/13142\/revisions"}],"predecessor-version":[{"id":13172,"href":"https:\/\/whiteriversmediasolutions.com\/Sony\/wp-json\/wp\/v2\/posts\/13142\/revisions\/13172"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/whiteriversmediasolutions.com\/Sony\/wp-json\/wp\/v2\/media\/13153"}],"wp:attachment":[{"href":"https:\/\/whiteriversmediasolutions.com\/Sony\/wp-json\/wp\/v2\/media?parent=13142"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/whiteriversmediasolutions.com\/Sony\/wp-json\/wp\/v2\/categories?post=13142"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/whiteriversmediasolutions.com\/Sony\/wp-json\/wp\/v2\/tags?post=13142"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}