{"id":9660,"date":"2023-06-15T04:29:59","date_gmt":"2023-06-15T04:29:59","guid":{"rendered":"https:\/\/www.sonyresearchindia.com\/causal-inference-the-question-of-why-in-machine-learning-and-business-analytics-copy\/"},"modified":"2023-11-30T13:11:37","modified_gmt":"2023-11-30T13:11:37","slug":"iteratively-improving-speech-recognition-and-voice-conversion","status":"publish","type":"post","link":"https:\/\/whiteriversmediasolutions.com\/Sony\/iteratively-improving-speech-recognition-and-voice-conversion\/","title":{"rendered":"Iteratively Improving Speech Recognition and Voice Conversion"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"9660\" class=\"elementor elementor-9660\" data-elementor-post-type=\"post\">\n\t\t\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-cd44eb5 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"cd44eb5\" data-element_type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-9f11b70\" data-id=\"9f11b70\" data-element_type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-215a70e elementor-widget elementor-widget-heading\" data-id=\"215a70e\" data-element_type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">BLOGS<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-28dc161 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" 
data-id=\"28dc161\" data-element_type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-63cf269\" data-id=\"63cf269\" data-element_type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-6837436 elementor-widget elementor-widget-heading\" data-id=\"6837436\" data-element_type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Iteratively Improving Speech Recognition\n<br>and Voice Conversion<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-9bd1630 elementor-widget elementor-widget-text-editor\" data-id=\"9bd1630\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tBy Mayank Kumar Singh, Senior Engineer at Sony Research India\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7a034cb elementor-widget elementor-widget-text-editor\" data-id=\"7a034cb\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t15<sup>th<\/sup> June 2023\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-acbeaeb elementor-hidden-desktop elementor-hidden-tablet elementor-hidden-mobile elementor-widget elementor-widget-text-editor\" data-id=\"acbeaeb\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>In this blog, Mayank Kumar Singh breaks down the paper titled \u2018<a 
href=\"https:\/\/arxiv.org\/pdf\/2305.15055.pdf\">Iteratively Improving Speech Recognition and Voice Conversion<\/a>\u2019, co-authored by Mayank Kumar Singh, Naoya Takahashi, and Onoe Naoyuki, which has been accepted at the <a href=\"https:\/\/interspeech2023.org\/\">INTERSPEECH Conference 2023<\/a>.<\/p><p>\u00a0<\/p><p>For demo samples, please refer to the website: <a href=\"https:\/\/demosamplesites.github.io\/IterativeASR_VC\/\">https:\/\/demosamplesites.github.io\/IterativeASR_VC\/<\/a><\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-1b9bf5c elementor-widget elementor-widget-text-editor\" data-id=\"1b9bf5c\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>In this blog, Mayank Kumar Singh breaks down the paper titled <a href=\"https:\/\/arxiv.org\/pdf\/2305.15055.pdf\" target=\"_blank\" rel=\"noopener\">\u2018Iteratively Improving Speech Recognition and Voice Conversion\u2019<\/a>, co-authored with Naoya Takahashi (Sony Research) and Onoe Naoyuki, which has been accepted at the <a href=\"https:\/\/interspeech2023.org\/\" target=\"_blank\" rel=\"noopener\">INTERSPEECH Conference 2023<\/a>, held in Dublin, Ireland, from 20th to 24th August 2023.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-9b69060 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"9b69060\" data-element_type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-cfbe302\" data-id=\"cfbe302\" data-element_type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap 
elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-fa4789b elementor-widget elementor-widget-text-editor\" data-id=\"fa4789b\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<h4>Introduction<\/h4>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7132bf0 elementor-widget elementor-widget-text-editor\" data-id=\"7132bf0\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Speech processing technologies such as voice conversion (VC) and automatic speech recognition (ASR) have improved dramatically over the past decade owing to advances in deep learning. However, training these models remains challenging in <b>low-resource domains<\/b>, where they tend to overfit and do not generalize well to practical applications.<\/p><p>\u00a0<\/p><p>In this paper, we propose to iteratively improve a voice conversion model along with an automatic speech recognition model.<\/p><p>\u00a0<\/p><p>VC models often rely on an ASR model for extracting content features or imposing a content consistency loss, so any degradation of the ASR model directly affects the quality of the VC model. On the other hand, a variety of data augmentation techniques have been proposed to improve the generalization capability of ASR models, voice conversion being one of them.<\/p><p>\u00a0<\/p><p>This creates a causality dilemma: a poor ASR model degrades VC model training, which in turn leads to low-quality data augmentation for training ASR models. Conversely, improving the ASR model should lead to better VC models, which should produce better data augmentation samples for improving the ASR model. 
Motivated by this, we propose to iteratively improve the ASR model by using the VC model as a data augmentation method for ASR training, and to simultaneously improve the VC model by using the ASR model for linguistic content preservation.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-85bbfff elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"85bbfff\" data-element_type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-8881599\" data-id=\"8881599\" data-element_type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-3818f26 elementor-widget elementor-widget-text-editor\" data-id=\"3818f26\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<h4>Proposed Method<\/h4>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-c0518a1 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"c0518a1\" data-element_type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-33 elementor-top-column elementor-element elementor-element-b15be70\" data-id=\"b15be70\" data-element_type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap\">\n\t\t\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t<div class=\"elementor-column 
elementor-col-33 elementor-top-column elementor-element elementor-element-7c22261\" data-id=\"7c22261\" data-element_type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-5946472 elementor-widget elementor-widget-image\" data-id=\"5946472\" data-element_type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img fetchpriority=\"high\" decoding=\"async\" width=\"962\" height=\"680\" src=\"https:\/\/whiteriversmediasolutions.com\/Sony\/uvaftoap\/2023\/06\/Cover-Image.png\" class=\"attachment-full size-full wp-image-9662\" alt=\"\" srcset=\"https:\/\/whiteriversmediasolutions.com\/Sony\/uvaftoap\/2023\/06\/Cover-Image.png 962w, https:\/\/whiteriversmediasolutions.com\/Sony\/uvaftoap\/2023\/06\/Cover-Image-300x212.png 300w, https:\/\/whiteriversmediasolutions.com\/Sony\/uvaftoap\/2023\/06\/Cover-Image-768x543.png 768w\" sizes=\"(max-width: 962px) 100vw, 962px\" style=\"width:100%;height:70.69%;max-width:962px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t<div class=\"elementor-column elementor-col-33 elementor-top-column elementor-element elementor-element-cd801eb\" data-id=\"cd801eb\" data-element_type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap\">\n\t\t\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-a8149c3 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"a8149c3\" data-element_type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-80dcf9e\" data-id=\"80dcf9e\" data-element_type=\"column\">\n\t\t\t<div 
class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-1833c2e elementor-widget elementor-widget-text-editor\" data-id=\"1833c2e\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tFigure 1: Overview of our proposed method. We first train an ASR model and a VC model with a default training regime. Then using the trained VC model as data augmentation for the ASR model, we further improve the ASR model. The updated ASR model is further used in the training of an improved VC model. This step is repeated until convergence of both the ASR and VC models.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-ee1a6d9 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"ee1a6d9\" data-element_type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-5f204ad\" data-id=\"5f204ad\" data-element_type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-1c88955 elementor-widget elementor-widget-text-editor\" data-id=\"1c88955\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tThe proposed iterative training framework is illustrated in Figure 1 and its pseudo code is shown in Algorithm 1.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element 
elementor-element-13b9cab elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"13b9cab\" data-element_type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-33 elementor-top-column elementor-element elementor-element-0e437d8\" data-id=\"0e437d8\" data-element_type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap\">\n\t\t\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t<div class=\"elementor-column elementor-col-33 elementor-top-column elementor-element elementor-element-ab28673\" data-id=\"ab28673\" data-element_type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-6e1d9d9 elementor-widget elementor-widget-image\" data-id=\"6e1d9d9\" data-element_type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" width=\"460\" height=\"279\" data-src=\"https:\/\/whiteriversmediasolutions.com\/Sony\/uvaftoap\/2023\/06\/image1.png\" class=\"attachment-full size-full wp-image-9663 lazyload\" alt=\"\" data-srcset=\"https:\/\/whiteriversmediasolutions.com\/Sony\/uvaftoap\/2023\/06\/image1.png 460w, https:\/\/whiteriversmediasolutions.com\/Sony\/uvaftoap\/2023\/06\/image1-300x182.png 300w\" data-sizes=\"(max-width: 460px) 100vw, 460px\" style=\"--smush-placeholder-width: 460px; --smush-placeholder-aspect-ratio: 460\/279;width:100%;height:60.65%;max-width:460px\" src=\"data:image\/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t<div class=\"elementor-column elementor-col-33 elementor-top-column elementor-element elementor-element-f2acee6\" data-id=\"f2acee6\" data-element_type=\"column\">\n\t\t\t<div 
class=\"elementor-widget-wrap\">\n\t\t\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-7d32cc5 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"7d32cc5\" data-element_type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-ba0a914\" data-id=\"ba0a914\" data-element_type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-c07bf80 elementor-widget elementor-widget-text-editor\" data-id=\"c07bf80\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tAlgorithm 1: Iterative training of ASR and VC models\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-3c1f6a4 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"3c1f6a4\" data-element_type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-d7b49f7\" data-id=\"d7b49f7\" data-element_type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-9e57ece elementor-widget elementor-widget-text-editor\" data-id=\"9e57ece\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<h4>Results 
and Experiments<\/h4>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7be4a46 elementor-widget elementor-widget-text-editor\" data-id=\"7be4a46\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tFor the results and experiments, we refer the reader to our paper, which contains the experimental details along with the insights we drew from them: <a href=\"https:\/\/arxiv.org\/pdf\/2305.15055.pdf\">https:\/\/arxiv.org\/pdf\/2305.15055.pdf<\/a>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-fa5b76a elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"fa5b76a\" data-element_type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-ed8ee78\" data-id=\"ed8ee78\" data-element_type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-4c359ab elementor-widget elementor-widget-text-editor\" data-id=\"4c359ab\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<h4>Conclusions<\/h4>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-30ab312 elementor-widget elementor-widget-text-editor\" data-id=\"30ab312\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tWe present a novel iterative framework for improving voice conversion models and 
automatic speech recognition models in low-resource domains and verify its applicability on the Hindi speech domain and the English singing domain. We show improved speech preservation and MOS quality of the converted samples on voice conversion tasks, as well as improved word error rate on ASR tasks, using this framework.\n\nFuture work includes further improving the content preservation of one-shot VC models so as to bring the WER of the VC-converted samples closer to the WER of the ground-truth samples, which would also lead to better MOS quality. We would also like to investigate combining the ASR and VC training in an end-to-end system.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-68a3517 elementor-widget elementor-widget-text-editor\" data-id=\"68a3517\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>To learn more about Sony Research India\u2019s research publications, visit the \u2018Publications\u2019 section on our \u2018Open Innovation\u2019 page:<\/p><p><a href=\"https:\/\/www.sonyresearchindia.com\/open-innovation\/\">Open Innovation with Sony R&amp;D \u2013 Sony Research India<\/a><\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>In this blog, Mayank Kumar Singh breaks down the paper titled&#8230;<\/p>\n","protected":false},"author":1,"featured_media":11331,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"elementor_header_footer","format":"standard","meta":{"footnotes":""},"categories":[22,17],"tags":[],"class_list":["post-9660","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-all-blogs","category-technology","entry"],"yoast_head":"\n<title>Iteratively Improving Speech Recognition and Voice Conversion - 
Sony Research India<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/whiteriversmediasolutions.com\/Sony\/iteratively-improving-speech-recognition-and-voice-conversion\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Iteratively Improving Speech Recognition and Voice Conversion - Sony Research India\" \/>\n<meta property=\"og:description\" content=\"In this blog, Mayank Kumar Singh breaks down the paper titled...\" \/>\n<meta property=\"og:url\" content=\"https:\/\/whiteriversmediasolutions.com\/Sony\/iteratively-improving-speech-recognition-and-voice-conversion\/\" \/>\n<meta property=\"og:site_name\" content=\"Sony Research India\" \/>\n<meta property=\"article:published_time\" content=\"2023-06-15T04:29:59+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2023-11-30T13:11:37+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/whiteriversmediasolutions.com\/Sony\/uvaftoap\/2023\/06\/unnxcfxvamed.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"380\" \/>\n\t<meta property=\"og:image:height\" content=\"190\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"sri_user@2021\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"sri_user@2021\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/iteratively-improving-speech-recognition-and-voice-conversion\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/iteratively-improving-speech-recognition-and-voice-conversion\/\"},\"author\":{\"name\":\"sri_user@2021\",\"@id\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/#\/schema\/person\/589cf1e285a7c37cf0cb9feba7ae4338\"},\"headline\":\"Iteratively Improving Speech Recognition and Voice Conversion\",\"datePublished\":\"2023-06-15T04:29:59+00:00\",\"dateModified\":\"2023-11-30T13:11:37+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/iteratively-improving-speech-recognition-and-voice-conversion\/\"},\"wordCount\":623,\"publisher\":{\"@id\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/#organization\"},\"image\":{\"@id\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/iteratively-improving-speech-recognition-and-voice-conversion\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/uvaftoap\/2023\/06\/unnxcfxvamed.jpg\",\"articleSection\":[\"All Blogs\",\"Technology\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/iteratively-improving-speech-recognition-and-voice-conversion\/\",\"url\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/iteratively-improving-speech-recognition-and-voice-conversion\/\",\"name\":\"Iteratively Improving Speech Recognition and Voice Conversion - Sony Research 
India\",\"isPartOf\":{\"@id\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/iteratively-improving-speech-recognition-and-voice-conversion\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/iteratively-improving-speech-recognition-and-voice-conversion\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/uvaftoap\/2023\/06\/unnxcfxvamed.jpg\",\"datePublished\":\"2023-06-15T04:29:59+00:00\",\"dateModified\":\"2023-11-30T13:11:37+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/iteratively-improving-speech-recognition-and-voice-conversion\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/whiteriversmediasolutions.com\/Sony\/iteratively-improving-speech-recognition-and-voice-conversion\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/iteratively-improving-speech-recognition-and-voice-conversion\/#primaryimage\",\"url\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/uvaftoap\/2023\/06\/unnxcfxvamed.jpg\",\"contentUrl\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/uvaftoap\/2023\/06\/unnxcfxvamed.jpg\",\"width\":380,\"height\":190},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/iteratively-improving-speech-recognition-and-voice-conversion\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Iteratively Improving Speech Recognition and Voice Conversion\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/#website\",\"url\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/\",\"name\":\"Sony Research 
India\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/#organization\",\"name\":\"sonyresearchindia\",\"url\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/uvaftoap\/2023\/03\/Sony_Logo.png\",\"contentUrl\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/uvaftoap\/2023\/03\/Sony_Logo.png\",\"width\":168,\"height\":31,\"caption\":\"sonyresearchindia\"},\"image\":{\"@id\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/#\/schema\/logo\/image\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/#\/schema\/person\/589cf1e285a7c37cf0cb9feba7ae4338\",\"name\":\"sri_user@2021\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/whiteriversmediasolutions.com\/Sony\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/e0c9edcfb42567c720cc449d4b1e0812298e8172a5a7e4296127a0adba7e705b?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/e0c9edcfb42567c720cc449d4b1e0812298e8172a5a7e4296127a0adba7e705b?s=96&d=mm&r=g\",\"caption\":\"sri_user@2021\"},\"sameAs\":[\"http:\/\/whiteriversmediasolutions.com\/staging\/SRI\"]}]}<\/script>\n","yoast_head_json":{"title":"Iteratively Improving Speech Recognition and Voice Conversion - Sony Research 
India","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/whiteriversmediasolutions.com\/Sony\/iteratively-improving-speech-recognition-and-voice-conversion\/","og_locale":"en_US","og_type":"article","og_title":"Iteratively Improving Speech Recognition and Voice Conversion - Sony Research India","og_description":"In this blog, Mayank Kumar Singh breaks down the paper titled...","og_url":"https:\/\/whiteriversmediasolutions.com\/Sony\/iteratively-improving-speech-recognition-and-voice-conversion\/","og_site_name":"Sony Research India","article_published_time":"2023-06-15T04:29:59+00:00","article_modified_time":"2023-11-30T13:11:37+00:00","og_image":[{"width":380,"height":190,"url":"https:\/\/whiteriversmediasolutions.com\/Sony\/uvaftoap\/2023\/06\/unnxcfxvamed.jpg","type":"image\/jpeg"}],"author":"sri_user@2021","twitter_card":"summary_large_image","twitter_misc":{"Written by":"sri_user@2021","Est. 
reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/whiteriversmediasolutions.com\/Sony\/iteratively-improving-speech-recognition-and-voice-conversion\/#article","isPartOf":{"@id":"https:\/\/whiteriversmediasolutions.com\/Sony\/iteratively-improving-speech-recognition-and-voice-conversion\/"},"author":{"name":"sri_user@2021","@id":"https:\/\/whiteriversmediasolutions.com\/Sony\/#\/schema\/person\/589cf1e285a7c37cf0cb9feba7ae4338"},"headline":"Iteratively Improving Speech Recognition and Voice Conversion","datePublished":"2023-06-15T04:29:59+00:00","dateModified":"2023-11-30T13:11:37+00:00","mainEntityOfPage":{"@id":"https:\/\/whiteriversmediasolutions.com\/Sony\/iteratively-improving-speech-recognition-and-voice-conversion\/"},"wordCount":623,"publisher":{"@id":"https:\/\/whiteriversmediasolutions.com\/Sony\/#organization"},"image":{"@id":"https:\/\/whiteriversmediasolutions.com\/Sony\/iteratively-improving-speech-recognition-and-voice-conversion\/#primaryimage"},"thumbnailUrl":"https:\/\/whiteriversmediasolutions.com\/Sony\/uvaftoap\/2023\/06\/unnxcfxvamed.jpg","articleSection":["All Blogs","Technology"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/whiteriversmediasolutions.com\/Sony\/iteratively-improving-speech-recognition-and-voice-conversion\/","url":"https:\/\/whiteriversmediasolutions.com\/Sony\/iteratively-improving-speech-recognition-and-voice-conversion\/","name":"Iteratively Improving Speech Recognition and Voice Conversion - Sony Research 
India","isPartOf":{"@id":"https:\/\/whiteriversmediasolutions.com\/Sony\/#website"},"primaryImageOfPage":{"@id":"https:\/\/whiteriversmediasolutions.com\/Sony\/iteratively-improving-speech-recognition-and-voice-conversion\/#primaryimage"},"image":{"@id":"https:\/\/whiteriversmediasolutions.com\/Sony\/iteratively-improving-speech-recognition-and-voice-conversion\/#primaryimage"},"thumbnailUrl":"https:\/\/whiteriversmediasolutions.com\/Sony\/uvaftoap\/2023\/06\/unnxcfxvamed.jpg","datePublished":"2023-06-15T04:29:59+00:00","dateModified":"2023-11-30T13:11:37+00:00","breadcrumb":{"@id":"https:\/\/whiteriversmediasolutions.com\/Sony\/iteratively-improving-speech-recognition-and-voice-conversion\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/whiteriversmediasolutions.com\/Sony\/iteratively-improving-speech-recognition-and-voice-conversion\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/whiteriversmediasolutions.com\/Sony\/iteratively-improving-speech-recognition-and-voice-conversion\/#primaryimage","url":"https:\/\/whiteriversmediasolutions.com\/Sony\/uvaftoap\/2023\/06\/unnxcfxvamed.jpg","contentUrl":"https:\/\/whiteriversmediasolutions.com\/Sony\/uvaftoap\/2023\/06\/unnxcfxvamed.jpg","width":380,"height":190},{"@type":"BreadcrumbList","@id":"https:\/\/whiteriversmediasolutions.com\/Sony\/iteratively-improving-speech-recognition-and-voice-conversion\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/whiteriversmediasolutions.com\/Sony\/"},{"@type":"ListItem","position":2,"name":"Iteratively Improving Speech Recognition and Voice Conversion"}]},{"@type":"WebSite","@id":"https:\/\/whiteriversmediasolutions.com\/Sony\/#website","url":"https:\/\/whiteriversmediasolutions.com\/Sony\/","name":"Sony Research 
India","description":"","publisher":{"@id":"https:\/\/whiteriversmediasolutions.com\/Sony\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/whiteriversmediasolutions.com\/Sony\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/whiteriversmediasolutions.com\/Sony\/#organization","name":"sonyresearchindia","url":"https:\/\/whiteriversmediasolutions.com\/Sony\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/whiteriversmediasolutions.com\/Sony\/#\/schema\/logo\/image\/","url":"https:\/\/whiteriversmediasolutions.com\/Sony\/uvaftoap\/2023\/03\/Sony_Logo.png","contentUrl":"https:\/\/whiteriversmediasolutions.com\/Sony\/uvaftoap\/2023\/03\/Sony_Logo.png","width":168,"height":31,"caption":"sonyresearchindia"},"image":{"@id":"https:\/\/whiteriversmediasolutions.com\/Sony\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/whiteriversmediasolutions.com\/Sony\/#\/schema\/person\/589cf1e285a7c37cf0cb9feba7ae4338","name":"sri_user@2021","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/whiteriversmediasolutions.com\/Sony\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/e0c9edcfb42567c720cc449d4b1e0812298e8172a5a7e4296127a0adba7e705b?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/e0c9edcfb42567c720cc449d4b1e0812298e8172a5a7e4296127a0adba7e705b?s=96&d=mm&r=g","caption":"sri_user@2021"},"sameAs":["http:\/\/whiteriversmediasolutions.com\/staging\/SRI"]}]}},"_links":{"self":[{"href":"https:\/\/whiteriversmediasolutions.com\/Sony\/wp-json\/wp\/v2\/posts\/9660","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/whiteriversmediasolutions.com\/Sony\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/whiteriversmediasolutions.com\/Sony\/wp-json\/wp\/v2\/types\/p
ost"}],"author":[{"embeddable":true,"href":"https:\/\/whiteriversmediasolutions.com\/Sony\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/whiteriversmediasolutions.com\/Sony\/wp-json\/wp\/v2\/comments?post=9660"}],"version-history":[{"count":34,"href":"https:\/\/whiteriversmediasolutions.com\/Sony\/wp-json\/wp\/v2\/posts\/9660\/revisions"}],"predecessor-version":[{"id":11340,"href":"https:\/\/whiteriversmediasolutions.com\/Sony\/wp-json\/wp\/v2\/posts\/9660\/revisions\/11340"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/whiteriversmediasolutions.com\/Sony\/wp-json\/wp\/v2\/media\/11331"}],"wp:attachment":[{"href":"https:\/\/whiteriversmediasolutions.com\/Sony\/wp-json\/wp\/v2\/media?parent=9660"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/whiteriversmediasolutions.com\/Sony\/wp-json\/wp\/v2\/categories?post=9660"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/whiteriversmediasolutions.com\/Sony\/wp-json\/wp\/v2\/tags?post=9660"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}