Basic Usage Guide is Now Available!

VoiceMelody - Kawaii Voice Changer

High-performance voice changer at a low price! We offer a voice changer that is available for commercial use and can be used for many purposes, such as video production, VTuber activities, singing voice conversion, and game mod production.

[h1]Usage[/h1]
Thank you for your interest in our application! We would first like to apologize for the lack of explanation regarding the application. This guide covers the basic usage.
[h2]File Inference[/h2]
[olist]
[*]Click "Open..." under "File Inference" in the upper right corner and select the file to convert.
[*]Click "Inference" in the lower right corner.
[/olist]
[h2]Real-Time Inference[/h2]
[list]
[*]Click "Start/Resume Voice Changer" to start the voice changer.
[*]Click "Stop" to stop the voice changer.
[*]To apply changed parameters, click "Start/Stop Voice Changer" again. Changes are not applied on the fly because the voice changer needs a few seconds of optimization at startup.
[*]Selecting devices with different protocols for "Input Device" and "Output Device" causes an error (the default settings do not do this). Here, "protocol" refers to the text in parentheses to the right of the device name, such as (MME), (Windows DirectSound), or (ASIO).
[*]When using Microsoft Sound Mapper, both the input and output devices must be set to Microsoft Sound Mapper.
[/list]
[h2]Tips[/h2]
Speak up clearly and focus the sound toward the front of your mouth, and your voice will convert well!

[h1]Parameter Description[/h1]
[h2]Important parts[/h2]
[h3]Common[/h3]
[list]
[*]Utterance Judgment Threshold: The boundary value (dB) that decides whether the input is converted or not.
[*]Pitch (semitone): Shifts the pitch, in semitones, to bridge the difference between your voice and the target voice.
[*]Use Basic Frequency Prediction Model: Uses a model to predict the pitch (fundamental frequency) of the output voice instead of using the pitch of the input voice. This brings the output closer to the target voice, but it cannot be used for singing voice conversion because the pitch becomes unstable.
[*]Percentage of use of nearest neighbor searcher output: Larger values bring the input phonemes closer to the target voice, but reduce the intelligibility of the input phonemes.
[*]Maximum and minimum of base frequency: Prior information on the minimum and maximum pitch of the input voice. Adjust these in the rare case where the pitch jumps by an octave.
[/list]
[h3]Real-time inference[/h3]
[list]
[*]Processing unit length: The unit in which inference is performed. The smaller the unit, the lower the delay, but the higher the load.
[*]Additional Inference Length (before): A buffer that stabilizes the voice. The larger it is, the higher the load.
[*]Additional Inference Length (after): A buffer that stabilizes the voice. The larger it is, the higher the load and the higher the delay.
[*]The delay is (processing unit length) + (additional inference length (after)) + (device delay).
[*]The load is ((processing unit length) + (additional inference length (before)) + (additional inference length (after)) + (crossfade length) + (crossfade length for waveform matching (fixed value))) / (processing unit length).
[*]Algorithm: When the algorithm is set to "2", conversion starts only after you finish speaking. This gives the same quality as file inference, but with increased delay. With this setting, some of the parameters above are ignored. This is a somewhat old-fashioned mode of operation.
[/list]
[h3]Preset[/h3]
[list]
[*]Presets can be created to save and switch between all of the settings above.
[/list]
[h2]Details[/h2]
[h3]Common[/h3]
[list]
[*]The method to determine the base frequency: Use dio when a low load is required, and use crepe when stability is required.
[*]Amount of noise added to the latent variable: Has little effect.
[*]Minimum/maximum speech judgement length: These parameters relate to the length of a speech segment. The minimum speech judgement length also affects how finely speech is detected.
[*]Length of blank time add to before and after: Has little effect.
[*]Do not relatively determine the threshold for utterance judgement: Controls whether or not the voice is normalized by its maximum value before inference.
[/list]

[h1]How do I send the output of the voice changer to other applications?[/h1]
Please install a virtual cable of your choice, such as VB-Cable (https://vb-audio.com/Cable/). Then set the output of the voice changer to the input of the virtual cable, and set the input of the other application to the output of the virtual cable. We have not been able to integrate a virtual cable into the software due to a lack of information on how to implement this function. We apologize for the inconvenience.
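
As a rough aid for tuning the real-time inference parameters, the delay and load formulas from the "Real-time inference" list can be written out as a small calculation. This is only a sketch of those two formulas, not code from the application: the function name, argument names, and the example values are hypothetical, and all lengths are assumed to be in the same unit (seconds here).

```python
def estimate_delay_and_load(unit, extra_before, extra_after,
                            crossfade, matching_crossfade, device_delay=0.0):
    """Evaluate the guide's delay and load formulas.

    All arguments are lengths in one consistent unit (assumed: seconds).
    The names are illustrative and do not come from the application.
    """
    if unit <= 0:
        raise ValueError("processing unit length must be positive")

    # delay = processing unit + additional inference (after) + device delay
    delay = unit + extra_after + device_delay

    # load = total audio handled per step / processing unit length
    load = (unit + extra_before + extra_after
            + crossfade + matching_crossfade) / unit
    return delay, load


if __name__ == "__main__":
    # Hypothetical settings, chosen only to illustrate the trade-off.
    for unit in (0.25, 0.5, 1.0):
        delay, load = estimate_delay_and_load(
            unit, extra_before=0.25, extra_after=0.25,
            crossfade=0.0, matching_crossfade=0.0)
        print(f"unit={unit:.2f}  delay={delay:.2f}  load={load:.2f}")
```

With the other lengths held fixed, a smaller processing unit gives a lower delay but a higher load ratio, which matches the description of "Processing unit length" above.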