Making the Voice of the AI


Today I'm here to share with you the script I wrote to create the voice of our AI character! 


This script goes through a text file line by line, turning the words of each line into hiragana and then into romaji. Each romaji word is split into syllables, which are checked against a folder of wav files named after Japanese syllables (ru.wav, tsu.wav, a.wav). The matching files are then stitched together into one .wav file for that voiced line. For example, the line "Wake up, Master!" becomes "wake upu masuteru", and the script combines wa.wav, ke.wav, u.wav, pu.wav, ma.wav, su.wav, te.wav, and ru.wav to create 1.wav, which plays "wa ke u pu ma su te ru" when opened.


Let's break the code down into chunks. First things first, let's cover the libraries I use and why I chose them:

  • romajitable - used to turn English text into similar-sounding hiragana. "My name is Mikhail" -> むゆ・なめ・いす・みくはいる
  • pykakasi - as far as I know, romajitable doesn't expose the romaji it generates on the way to the hiragana above, so pykakasi turns Japanese text like that hiragana back into romaji
  • re - used in the lambda function that checks whether text is made up of valid romaji syllables
  • pydub - used to build the wav files. Requires installation of ffmpeg
  • os.path - used to check whether a wav file exists for each syllable that passes the lambda function's check


If you've never worked with Python before, you need to import libraries like so:

import romajitable 
import pykakasi 
import re 
from pydub import AudioSegment 
import os.path
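Before running anything, it can help to confirm that those libraries, and the ffmpeg program pydub needs, are actually installed. This is a minimal sketch that uses only the standard library. missing_dependencies is a hypothetical helper of my own, not part of any of these libraries.

```python
import importlib.util
import shutil

# Third-party modules this script imports (re and os.path ship with Python).
REQUIRED_MODULES = ("romajitable", "pykakasi", "pydub")


def missing_dependencies(modules=REQUIRED_MODULES):
    """Return the modules Python can't find, plus 'ffmpeg' if it isn't on PATH."""
    # find_spec returns None when a top-level module isn't installed.
    missing = [name for name in modules if importlib.util.find_spec(name) is None]
    # pydub relies on the ffmpeg executable, so check for it as well.
    if shutil.which("ffmpeg") is None:
        missing.append("ffmpeg")
    return missing


if __name__ == "__main__":
    problems = missing_dependencies()
    if problems:
        print("Missing:", ", ".join(problems))
    else:
        print("All dependencies found.")
```

If something is reported missing, install it with pip (or, for ffmpeg, your system's package manager) and run the check again.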


Next, I initialize a few variables, but the only one I think needs discussion is the lambda function:

L = lambda x:re.sub('[bghkmnpr]~([auoei]|y[auo])|[sz]~[auoe]|[dt]~[aeo]|w~[ao]|([fv]~|ts)u|(j~|[cs]h)(i|y[auo])|y~[auo]|[auoien]'.replace('~','{1,2}'),'',x)=='' 


I found this function here. It checks whether a given string is made up entirely of valid romaji syllables. It's not a perfect check of valid romaji, but it correctly evaluates the text fed into this program.
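Here is why the comparison with '' works. re.sub deletes every syllable the pattern finds, so if nothing is left over, the whole string was made of valid syllables. Below is a minimal standalone sketch of the same check written as a named function. is_romaji is a hypothetical name, and the pattern is copied from the lambda above.

```python
import re

# The syllable pattern from the lambda; '~' is shorthand for '{1,2}', which also
# allows a doubled consonant such as the 'tt' in 'kitte'.
PATTERN = (
    "[bghkmnpr]~([auoei]|y[auo])|[sz]~[auoe]|[dt]~[aeo]|w~[ao]"
    "|([fv]~|ts)u|(j~|[cs]h)(i|y[auo])|y~[auo]|[auoien]"
).replace("~", "{1,2}")


def is_romaji(text):
    """True if `text` consists only of syllables the pattern recognises.

    re.sub removes every syllable it finds; any leftover characters were not
    part of a valid syllable, so an empty remainder means the check passed.
    """
    return re.sub(PATTERN, "", text) == ""


print(is_romaji("pu"))   # True
print(is_romaji("shi"))  # True
print(is_romaji("up"))   # False: the 'p' has no vowel after it
```

Note that the check accepts any run of syllables ('upu' passes), not just a single one. That is why the splitting loop further down has to test short slices of each word.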


From there, the main meat of the program begins:

with open('test.txt', 'r') as f:
    lines = f.readlines()
    for line in lines: 
            romaji = romajitable.to_kana(line)
            hira = romaji.hiragana.replace("・", "")
            weeb = kks.convert(hira)
            romajiLines = []
            removedDuplicates = []


I open my text file, read it line by line, and convert each line to kana with romajitable. romajitable places a dot・in・between・words, so I remove all of them from the hiragana line before pykakasi turns it into romaji.

Next, I store the romaji lines in an array and remove the duplicate lines created by pykakasi. It's possible my own implementation is what duplicates them, but I don't currently see where. I reset both arrays on each pass of the for line in lines loop.

            for item in weeb:
                romajiLines.append(format(item['hepburn'])) #turn line of text into romaji
            
            for v in romajiLines:
                if v not in removedDuplicates:
                    print(v)
                    removedDuplicates.append(v) #remove duplicate lines made by pykakasi
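If all you need is order-preserving de-duplication, Python's dict can do it in one line. Dict keys are unique and keep their insertion order, which is guaranteed from Python 3.7. Here is a small sketch, with remove_duplicates as a hypothetical helper name.

```python
def remove_duplicates(items):
    """Drop repeated entries, keeping each item at its first position.

    Equivalent to the `if v not in removedDuplicates` loop, but without
    rescanning the output list for every item.
    """
    return list(dict.fromkeys(items))


romaji_lines = ["wakeupu", "masuteru", "wakeupu"]
print(remove_duplicates(romaji_lines))  # ['wakeupu', 'masuteru']
```

Like the original loop, this keeps only the first copy. A romaji chunk that legitimately appears twice in one line is therefore dropped as well.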


So now we've got an array with romaji lines and no duplicates! Next, I needed to get the syllables of each word in that line and store them in an array. I begin by setting up my temporary variables:

            for item in removedDuplicates:
                split_strings = []
                n  = 2
                export_sounds = []
                combined_sounds = AudioSegment.empty()
                combined = AudioSegment.empty()


combined_sounds and combined are empty pydub AudioSegments. AudioSegment.empty() creates a zero-length clip that the syllable sounds can be appended to, and only combined ends up being used. To build the files correctly, I need to make sure the right wav files are called. Since my wav files are named to match romaji syllables, the thing to check is the text!

                for index in range(0, len(item), n):
                    test = item[index : index + n] #take 2 characters at a time
                    print(index, test, L(test))
                    if (L(test) != True): #check if pair of chars is valid romaji, if not...
                        test = item[index : index + n - 1] #try the first char
                        if (L(test)): #if first char is valid romaji
                            print(index, test, L(test))
                            split_strings.append(test)
                            test = item[index + 1: index + n + 1]
                            if (L(test)): #check if second char is start of new syllable, if it is...
                                print(index, test, L(test))
                                split_strings.append(test)
                            else:  #check if initial 2 chars + next char make a valid syllable
                                test = item[index: index + n + 1]
                                if (L(test)):
                                    print(index, test, L(test))
                                    split_strings.append(test)
                                else:
                                    continue
                        else: #first char isn't valid romaji on its own, keep it anyway
                            split_strings.append(test)
                            print(index, test, L(test))
                    else: #if pair of chars is valid romaji...
                        print(test)
                        split_strings.append(test)


The majority of romaji syllables are two characters long, so this loop steps through each romaji word two characters at a time. If a pair is valid romaji, we add that syllable to an array. If it's not, we check whether the first character on its own is valid. If it is, we then check whether the second character plus the character after it makes a syllable. Failing that, we check whether the two characters being checked plus the next character make a valid syllable. If the first character isn't valid on its own, it is added to the array as-is.

On the third check of 'wake upu', the program finds 'up'. It checks that 'u' is valid romaji, checks that 'pu' is valid romaji, and then adds both to the split_strings array that holds each syllable.
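This is not what the script above does, but there is an alternative to fixed two-character windows. You can ask the same regex which syllable starts at the current position and then jump to the end of that match. That way, syllable boundaries never have to line up with even indices. Here is a minimal sketch using the post's pattern. split_syllables is a hypothetical name.

```python
import re

# Same syllable pattern as the post's lambda, compiled once ('~' means '{1,2}').
SYLLABLE = re.compile(
    (
        "[bghkmnpr]~([auoei]|y[auo])|[sz]~[auoe]|[dt]~[aeo]|w~[ao]"
        "|([fv]~|ts)u|(j~|[cs]h)(i|y[auo])|y~[auo]|[auoien]"
    ).replace("~", "{1,2}")
)


def split_syllables(word):
    """Split a romaji word into syllables by matching at the current position.

    Characters that can't start any syllable (spaces, stray consonants) are
    skipped rather than kept, so they never turn into a bogus .wav name.
    """
    syllables = []
    pos = 0
    while pos < len(word):
        match = SYLLABLE.match(word, pos)  # anchored at `pos`, not a search
        if match is None:
            pos += 1  # nothing valid starts here; move on one character
            continue
        syllables.append(match.group(0))
        pos = match.end()
    return syllables


print(split_syllables("wakeupu"))   # ['wa', 'ke', 'u', 'pu']
print(split_syllables("masuteru"))  # ['ma', 'su', 'te', 'ru']
```

Python's regex alternation takes the first alternative that matches, not the longest. Because this pattern lists the lone vowel/'n' case last, 'na' is picked over a bare 'n'.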

The rest of the program is fairly simple. I check whether each syllable matches the name of a wav file. If it does, I load that file and add it to an array that holds the sounds in the order they should play.

                for syllable in split_strings:
                    syllable = 'C:/Users/User/Desktop/test/voice/' + syllable + '.wav'
                    if (os.path.isfile(syllable)):
                        sound = AudioSegment.from_wav(syllable)
                        export_sounds.append(sound)
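Here is a variation on that lookup that also tells you which syllables have no recording, instead of skipping them silently. This is a sketch using pathlib. find_syllable_files is a hypothetical helper, and voice_dir stands in for the post's C:/Users/User/Desktop/test/voice/ folder.

```python
from pathlib import Path


def find_syllable_files(syllables, voice_dir):
    """Map each syllable to <voice_dir>/<syllable>.wav.

    Returns (paths, missing): `paths` holds the existing files in playback
    order, and `missing` lists syllables that have no recording yet.
    """
    voice_dir = Path(voice_dir)
    paths, missing = [], []
    for syllable in syllables:
        path = voice_dir / f"{syllable}.wav"
        if path.is_file():
            paths.append(path)
        else:
            missing.append(syllable)
    return paths, missing
```

For example, paths, missing = find_syllable_files(split_strings, "C:/Users/User/Desktop/test/voice") returns the files to load in order, plus a list of recordings still to make.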


Next, the sounds are joined into a single clip, and a file is only exported on every second iteration. This is because of the way my text is saved. An empty line sits between each line of dialogue, so skipping every other iteration means each exported file is numbered to match its line of dialogue!

                for fname in export_sounds:
                    combined += fname    
                if (i % 2) == 0:
                    print(j)
                    generatedFile = 'test/' + str(j) + '.wav'
                    combined.export(generatedFile, format='wav')    
                    j += 1
                
                i += 1
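You may prefer not to depend on pydub for this step, or you may want the numbering to be independent of the blank lines. Here is a sketch of both using only the standard library. It swaps pydub for Python's built-in wave module and assumes every syllable clip shares the same channel count, sample width, and frame rate. It also skips blank lines explicitly instead of exporting on every second iteration. concatenate_wavs and dialogue_lines are hypothetical helper names.

```python
import wave


def concatenate_wavs(input_paths, output_path):
    """Write the frames of each input WAV, in order, into one output WAV."""
    if not input_paths:
        raise ValueError("need at least one clip to concatenate")
    with wave.open(str(output_path), "wb") as out:
        expected = None
        for path in input_paths:
            with wave.open(str(path), "rb") as clip:
                params = clip.getparams()
                if expected is None:
                    # The first clip decides the output format.
                    expected = params[:3]  # (nchannels, sampwidth, framerate)
                    out.setparams(params)
                elif params[:3] != expected:
                    raise ValueError(f"{path} does not match the first clip's format")
                out.writeframes(clip.readframes(clip.getnframes()))


def dialogue_lines(lines):
    """Pair each non-blank line with a 1-based number, skipping empty lines."""
    stripped = (line.strip() for line in lines)
    return list(enumerate((text for text in stripped if text), start=1))
```

With these helpers, each (number, text) pair from dialogue_lines can be turned into syllable files and written with concatenate_wavs(paths, f"test/{number}.wav"). No i % 2 bookkeeping is needed.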


Now, this program has some areas for improvement, but currently does a satisfactory job. The two most important areas for improvement I see are:

  1. Creating 'Engrish' versions of the lines requires knowing how a word is pronounced, which isn't captured by its spelling alone. For example, this program turns 'master' into 'masuteru', but the ideal program would create "masutaa" instead.
  2. Optimization. I believe this could be cleaned up to be a lot more readable. It currently uses nested for loops, and where I call the lambda regex function I see room for a recursive function. The code below also nests some for loops inside others that don't need to be. Oops! I also left my print statements in there.
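Picking up the recursion idea from point 2, here is a minimal sketch of a recursive splitter. It tries the longest possible first syllable and recurses on the rest of the word. If the rest can't be split, it backs off to a shorter first syllable. It reuses the post's pattern but checks one syllable at a time with fullmatch. segment and MAX_SYLLABLE are hypothetical names, and functools.lru_cache memoizes results so repeated word endings aren't split twice.

```python
import re
from functools import lru_cache

# One syllable from the post's pattern ('~' means '{1,2}').
SYLLABLE = re.compile(
    (
        "[bghkmnpr]~([auoei]|y[auo])|[sz]~[auoe]|[dt]~[aeo]|w~[ao]"
        "|([fv]~|ts)u|(j~|[cs]h)(i|y[auo])|y~[auo]|[auoien]"
    ).replace("~", "{1,2}")
)

MAX_SYLLABLE = 4  # longest match this pattern allows, e.g. 'kkya'


@lru_cache(maxsize=None)
def segment(word):
    """Recursively split `word` into syllables; return None if impossible."""
    if not word:
        return ()  # nothing left to split
    for size in range(min(MAX_SYLLABLE, len(word)), 0, -1):
        head = word[:size]
        if SYLLABLE.fullmatch(head):  # `head` is exactly one syllable
            rest = segment(word[size:])
            if rest is not None:
                return (head,) + rest
    return None  # no first syllable leads to a complete split
```

segment("wakeupu") returns ('wa', 'ke', 'u', 'pu'). A word that can't be fully split, such as "q", returns None, which makes unsupported words easy to spot.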


The full code is below:

import romajitable
import pykakasi
import re
from pydub import AudioSegment
import os.path

kks = pykakasi.kakasi()
# True if x is made up entirely of valid romaji syllables ('~' stands for '{1,2}')
L = lambda x: re.sub('[bghkmnpr]~([auoei]|y[auo])|[sz]~[auoe]|[dt]~[aeo]|w~[ao]|([fv]~|ts)u|(j~|[cs]h)(i|y[auo])|y~[auo]|[auoien]'.replace('~', '{1,2}'), '', x) == ''
i = 0  # counts items processed; only even ones are exported (blank lines fall in between)
j = 1  # number of the next .wav file to export
n = 2  # window size: most romaji syllables are two characters long

with open('test.txt', 'r') as f:
    lines = f.readlines()
    for line in lines:
        romaji = romajitable.to_kana(line)
        hira = romaji.hiragana.replace("・", "")  # strip romajitable's word separators
        weeb = kks.convert(hira)
        romajiLines = []
        removedDuplicates = []

        for item in weeb:
            romajiLines.append(item['hepburn'])  # turn line of text into romaji

        for v in romajiLines:
            if v not in removedDuplicates:
                removedDuplicates.append(v)  # remove duplicate lines made by kks

        for item in removedDuplicates:
            split_strings = []
            export_sounds = []
            combined = AudioSegment.empty()  # zero-length clip to append sounds to
            for index in range(0, len(item), n):
                test = item[index : index + n]  # take 2 characters at a time
                if not L(test):  # pair of chars is not valid romaji...
                    test = item[index : index + n - 1]  # ...so try the first char
                    if L(test):  # first char is valid romaji
                        split_strings.append(test)
                        test = item[index + 1 : index + n + 1]
                        if L(test):  # second char starts a new syllable
                            split_strings.append(test)
                        else:  # check if initial 2 chars + next char make a valid syllable
                            test = item[index : index + n + 1]
                            if L(test):
                                split_strings.append(test)
                    else:  # first char isn't valid romaji on its own; keep it anyway
                        split_strings.append(test)
                        print(index, test, L(test))
                else:  # pair of chars is valid romaji
                    print(test)
                    split_strings.append(test)

            for syllable in split_strings:
                syllable = 'C:/Users/User/Desktop/test/voice/' + syllable + '.wav'
                if os.path.isfile(syllable):  # only use syllables that have a recording
                    sound = AudioSegment.from_wav(syllable)
                    export_sounds.append(sound)
            for clip in export_sounds:
                combined += clip  # append each syllable in order
            if i % 2 == 0:
                print(j)
                generatedFile = 'test/' + str(j) + '.wav'
                combined.export(generatedFile, format='wav')
                j += 1

            i += 1
            

