Speech To Text

Post by **jollybv** » Thu Sep 22, 2016 8:39 am

Hi Guys

I would like to use the speech component in my project and am not sure where to start. What I'm trying to achieve is text to speech. Can this be done, and if so, are there any examples of how to go about it? I'm about to change my PCB and would like to add this feature to the board by adding an LM386 amplifier, if that is possible. Also, are there any filter caps I need to add to the PWM output to make the speech sound better than just a bad robot?

Post by **Benj** » Thu Sep 22, 2016 9:45 am

Hi Brian,

Yes there are some examples here to help get you started.

http://www.matrixtsl.com/wiki/index.php ... de5f9b41a8

Let me know how you're getting on.

Post by **Benj** » Thu Sep 22, 2016 12:52 pm

Whoops, I've just seen you are looking for "text to speech". This will work in the simulation only, using the RealSpeak engine.

You can get the embedded code to do a similar thing, but this means breaking up words into phonemes and then outputting the phonemes. I don't think there is an automated way of doing this, so your embedded code could only say specific pre-programmed phrases. Think of the old talking watches: they can tell you the time but they can't read you a story. I used to have one of these back in the day.

I would be tempted to record the audio to, say, WAV files on an SD card. This way you get a much more natural voice output with the same limited dictionary. This is how things like musical cards work, though they probably use a Flash IC rather than an SD card.

Your topic title is "speech to text". This is not possible, though there are ICs out there that claim to do this for you. The truth is that even smartphones with a high-speed internet link to a supercomputer, like Siri, struggle with this and with all the unique regional dialects etc. It may be possible to do specific word recognition in a specific dialect on an embedded device, e.g. "on" and "off", and there may be Arduino projects like this, but it is not trivial and would be an immense amount of work. Even then it would probably switch on if you said something irrelevant like "tonne", "one" or "won".

Post by **jollybv** » Thu Sep 22, 2016 2:13 pm

Hi Ben

Thanks for that. I do have an SD card on board, and that would work as I only have a few sentences that need to be played if someone does not do something. In my product the instructions are on the LCD screen, but I still find people do not read them, so what I want to do is prerecord a number of responses. When a button is pressed and the program is at a certain place, it will talk people through by playing a predetermined audio file.
How would I stream the audio from an SD card? Could I record a message on the PC, call it, for instance, Response 1, and then just select the file when required? Also, does the streaming use the PWM output?

P.S. I made a mistake in the title; it should be Text to Speech.

Post by **Benj** » Thu Sep 22, 2016 3:24 pm

Hello,

Yes, you can create audio .wav files on your PC using the free software Audacity and then simply stream these files from the SD card using a PWM or a DAC.

I show an example of 16-bit, 16 kHz audio streaming here. Look for the macros in the Master project that begin with WAV.
http://www.matrixtsl.com/mmforums/viewt ... ris#p73371

I have set the audio to loop automatically, but you could instead simply disable the timer interrupt when the end of the file is detected.

I can reduce this into a more concise example if you like, it would be a nice addition to the wiki page.

16-bit @ 16 kHz is quite demanding. It depends what type of quality you're going for, but 8-bit @ 16 kHz and 16-bit @ 8 kHz are both good options that essentially halve the data throughput.

8-bit @ 8 kHz is usually quite poor but requires the least amount of processing time and buffering.
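
To put rough numbers on the formats above: the raw data rate is just the sample rate multiplied by the bytes per sample. The following is a minimal PC-side Python sketch, assuming mono uncompressed PCM; the function name is only for illustration and is not part of Flowcode.

```python
def pcm_bytes_per_second(sample_rate_hz, bits_per_sample, channels=1):
    """Raw PCM data rate in bytes per second (WAV header not counted)."""
    return sample_rate_hz * (bits_per_sample // 8) * channels


# The four formats discussed above, assuming a single (mono) channel.
formats = [(16000, 16), (16000, 8), (8000, 16), (8000, 8)]
rates = {fmt: pcm_bytes_per_second(*fmt) for fmt in formats}

for (rate_hz, bits), bytes_per_s in rates.items():
    print(f"{bits}-bit @ {rate_hz // 1000} kHz: {bytes_per_s} bytes/s")
```

The two middle formats come out at half the rate of 16-bit @ 16 kHz, and 8-bit @ 8 kHz at a quarter, which is the data the SD card read loop has to keep up with.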

Post by **Benj** » Thu Sep 22, 2016 4:29 pm

Simpler example now available from here.
http://www.matrixtsl.com/wikiv7/index.p ... udio_Files

Post by **jollybv** » Thu Sep 22, 2016 4:45 pm

Hi Ben

Yes please, reduce this into a more concise example. I would like quite good quality, so I think 8-bit @ 16 kHz, as I'm using a PIC18F47J13.

Post by **jollybv** » Fri Sep 23, 2016 8:39 am

Benj wrote: Hi Brian,

Yes there are some examples here to help get you started.

http://www.matrixtsl.com/wiki/index.php ... de5f9b41a8

Let me know how you're getting on.

Hi Ben

Thanks for this. I downloaded the RealSpeak voice downloads and installed them, but now have no clue how to use them. What I would like to try first, before going the SD card way, is to use the speech component to say something like "Enter Unit number press Call then select user 1 to 6 by pressing key 1 to 6". I will be using the PWM pin RB4 on the PIC18F47J13. How would I use the RealSpeak voice this way?
I also think there is a bug in the Speech component's output channel selection. If I select channel 1 it sometimes works; if I go to channel 2 it does not change; if I go to channel 5 it gives me RB4, but after a while it changes to RB5 by itself.

Post by **jollybv** » Mon Sep 26, 2016 7:09 am

Hi guys

I have downloaded the RealSpeak voices and installed them, but I can't seem to select them in rs_voice. Is there a specific directory I need to install them into?

Post by **Benj** » Mon Sep 26, 2016 12:09 pm

Hi Brian,

The RealSpeak voices seem to be unavailable now. Maybe the company had a policy change or got bought out. Windows should have at least one voice installed as standard for sight-impaired users; at least my Win7 machine does.

You can still download some of the voices here for free. However, they don't seem to work with the current version of the RealSpeak DLL we are using in Flowcode, so I'll double check whether a new version of the DLL is available.
https://www.freedomscientific.com/downl ... nthesizers

RealSpeak is simulation only, so routing it to a PWM pin is not possible. The only approaches that can be downloaded to a chip are the phoneme approach and the WAV file approach. You could always use the RealSpeak simulation to generate the WAV files, using Audacity to record the audio live from your PC. If you search YouTube for Audacity there should be tonnes of tutorials to help you get started.

You can test if the speech is installed on your machine by running the following project in the simulator with the Windows sound switched on and turned up.

Attachment: SpeechTest.fcfx (7.43 KiB)

It should be installed, as the Narrator can be switched on via the Ease of Access control panel. Note that you don't have to switch the Narrator on for the Flowcode simulation to work; it should simply work.

Image: Narrator.jpg (163.97 KiB)

Post by **jollybv** » Mon Sep 26, 2016 4:04 pm

Hi Ben

Thanks. I have checked it and I do have the standard Windows 7 voice installed on my machine, but I think the easiest option would be to record wave files onto an SD card, then call them up and stream them as I need them, as there is no easy way to do text to speech. Maybe I should try the phoneme approach. Is there a library of sounds for this that I can make up sentences from?

Post by **Benj** » Mon Sep 26, 2016 4:10 pm

Hi Brian,

Yep, the library of sounds is the aa, ae, ao etc. phoneme sounds.

The first two examples on this page use the Phoneme approach.

http://www.matrixtsl.com/wiki/index.php ... :_General)

You can string the phonemes together. For example, the word "Flowcode" would look something like this:

Output Phoneme ("ff,ll,ow,kk3,ow,dd1")

The phoneme data is stored in the ROM of the microcontroller, and if all the phonemes are present they will consume 64 KB of ROM on their own.

The talking Volt meter is another good example. Both phoneme examples should also simulate, so you can hear what they will sound like before you download to a chip.

Building up words out of phonemes is largely trial and error to see what sounds best, though it seems there is software to help with this.

https://en.wikipedia.org/wiki/Arpabet

Post by **jollybv** » Wed Sep 28, 2016 3:31 pm

Hi Ben

I have decided to go the WAV streaming route for the quality, and have modified the WAV_Streaming example so that I can understand how things work before adding it to my program. I am using a PIC18F47J13 processor with a 20 MHz crystal. I think this processor can run at 48 MHz, which I think I have set up right (could you please check the config file and let me know if I'm wrong?). Also, will this processor work with the streaming? I see you are using a dsPIC33 with a clock speed of 140 MHz, and you also have a really big circular buffer in the example.
I have been stepping through the example and found that when I open the file it returns 239 in .TempB, which, if I'm correct, should be 0 if the file is open. I have set the simulation directory to C:\FC\ and have created the same folder on my hard drive. I have copied a wave file, "Audio1", into this folder, but it does not seem to open. Can I open audio files in simulation and test them?
Also, how and where would I set this to 8-bit @ 16 kHz?

Post by **Benj** » Thu Sep 29, 2016 10:50 am

Hi Brian,

Right, I've had a play with the file for you and hopefully this will run from a 20 MHz crystal at 48 MHz. You had some problems with your configuration settings, so I have fixed these for you.

Attachment: WAV_Streaming_Mod.fcfx (19.42 KiB)

As you are only streaming out the audio, you can have a significantly smaller circular buffer. I used such a big one because I wanted to stream out the audio while doing a lot of other things, e.g. playing Tetris.

To lower the size of the buffer, change the Buffer Size property on the SoundBuffer component and change the SoundBufferSize constant to match. You will probably also have to change the circular buffer's memory type property to near memory.

The value 239 (file not found) when opening a file could be down to the config settings you were using, so hopefully it will work correctly for you now (the extended instruction set configuration setting is known to corrupt strings). If you're still having problems, try changing the FAT prescaler properties to see if that makes any difference.

Post by **jollybv** » Thu Sep 29, 2016 2:26 pm

Hi Ben

Thanks for that, I will have a play around with this. One question: in my project I already have two circular buffers, one of 30 bytes and one of 200. Could I use the 200-byte buffer for the streaming, or should I increase it from 200 to 512 bytes?

Post by **Benj** » Thu Sep 29, 2016 4:53 pm

Hi Brian,

200 bytes might be plenty, as you are sitting in a while 1 loop keeping the buffer topped up.

You can calculate how long a full buffer will last.

Assuming 16-bit @ 8 kHz:

200 bytes / 2 bytes per sample = 100 16-bit samples.

100 / 8000 = 0.0125 seconds (12.5 ms) worth of data in the buffer.

The only problem you might have is if the FAT component has to go round the houses to get to the data sectors in the file. If this is the case then you may need a slightly bigger buffer to work around the file seek time. Formatting the card before you store the data files could help here.

I would keep the buffer size to 200 and see how well the audio plays. If you notice pauses or stutters in the audio then increase the buffer size.
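
The buffer arithmetic above can be repeated for other sizes with a short PC-side Python sketch. It assumes mono PCM; the function name is illustrative only, and the 512-byte case is simply the alternative size raised in the question.

```python
def buffer_seconds(buffer_bytes, bits_per_sample, sample_rate_hz):
    """How long a full buffer lasts before it runs dry (mono PCM)."""
    samples = buffer_bytes // (bits_per_sample // 8)
    return samples / sample_rate_hz


# Compare the existing 200-byte buffer with a 512-byte one at 16-bit @ 8 kHz.
for size in (200, 512):
    ms = buffer_seconds(size, 16, 8000) * 1000
    print(f"{size} bytes -> {ms:.1f} ms of audio")
```

Whatever figure comes out is the longest the SD card read can stall, for example during a slow sector seek, before the audio stutters.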

Post by **jollybv** » Fri Sep 30, 2016 6:45 am

Hi Ben

Thanks, this is great, but I'm still having a problem opening the wave file: when I step through the program I still get 239 returned. If I'm correct, if the file opens then it will be played back to me in the simulation just after the open file step. I created an audio file in Audacity and exported it to C:\FC as WAV (Microsoft) signed 16-bit PCM. Is this the correct format?
I did try attaching the audio file so you could see if it's the correct format, but the forum won't let me.

Post by **Benj** » Fri Sep 30, 2016 7:10 pm

Hi Brian,

If you stick all the files in a zip archive then the forums should let you upload that.

Post by **jollybv** » Mon Oct 03, 2016 7:00 am

Hi Ben

Thanks, here is the audio file.

Post by **Benj** » Mon Oct 03, 2016 11:05 am

Hi Brian,

I thought the file looked a bit big for an 8-second sound file. You currently have the sample rate set to 48 kHz and the bit depth set to 32-bit.

I would recommend one of the following formats.

16 kHz @ 16-bit
16 kHz @ 8-bit
8 kHz @ 16-bit
8 kHz @ 8-bit

As you're using an 8-bit PIC you might struggle to maintain the first format; the middle two might be possible.

You can change all this using Audacity and hear the quality change between different formats.
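
One way to confirm what an exported file actually contains, before copying it to the SD card, is to read the WAV header on the PC. The Python sketch below parses the standard RIFF "fmt " chunk with the `struct` module; the function name is an illustrative assumption and is not part of Flowcode or Audacity.

```python
import struct


def read_wav_format(path):
    """Return (format_tag, channels, sample_rate_hz, bits_per_sample)."""
    with open(path, "rb") as f:
        riff, _size, wave_id = struct.unpack("<4sI4s", f.read(12))
        if riff != b"RIFF" or wave_id != b"WAVE":
            raise ValueError("not a RIFF/WAVE file")
        while True:
            header = f.read(8)
            if len(header) < 8:
                raise ValueError("no 'fmt ' chunk found")
            chunk_id, chunk_size = struct.unpack("<4sI", header)
            if chunk_id == b"fmt ":
                # First 16 bytes of the fmt chunk hold the basic format fields.
                tag, channels, rate, _byte_rate, _align, bits = struct.unpack(
                    "<HHIIHH", f.read(16)
                )
                return tag, channels, rate, bits
            # Skip any other chunk; chunk data is padded to an even length.
            f.seek(chunk_size + (chunk_size & 1), 1)
```

For a mono 8 kHz @ 16-bit export the function should return a sample rate of 8000 and 16 bits per sample. A format tag of 1 means integer PCM, while 3 means IEEE float, which a simple streaming routine that expects integer samples would not play correctly.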

The value you're getting, 239, means file not found, so check that the Root Directory property for the FAT component is correct. If it says $(srcdir) then this is the current Flowcode project directory.

Post by **jollybv** » Wed Oct 05, 2016 7:29 am

Hi Ben

Thanks, I have managed to record at 8 kHz @ 16-bit and can open the file, but now I can't seem to compile. It is giving me the following message:

Licensed to FlowCode User under Single user Pro License for 1 node(s)
Limitations: PIC18 max code size:Unlimited, max RAM banks:Unlimited
WAV_Streaming_Mod.c
Starting preprocessor: "C:\Program Files\Flowcode 6\compilers\pic\boostc\pp.exe" WAV_Streaming_Mod.c -i "C:\Program Files\Flowcode 6\compilers\pic\boostc\include" -d _PIC18F47J13 -la -c2 -o WAV_Streaming_Mod.pp -v -d _BOOSTC -d _PIC18 -d _CHAR_INDEX
..........
WAV_Streaming_Mod.c(495): error: missing right paren
WAV_Streaming_Mod.c(495): error: failure
failure
Completed BoostC compilation, return = 1
C:\Program Files\Flowcode 6\compilers\pic\boostc\boostc_18F.exe reported error code 1
FINISHED

What I have seen is that your example is set up for 16 kHz @ 16-bit. How would I change this to 8 kHz @ 16-bit? Do I change the Timer1 interrupt properties from (read/write in one 16bit) to (read/write in two 8bit)?

Post by **Benj** » Wed Oct 05, 2016 10:05 am

Hi Brian,

The only thing you should have to do is change the timer interrupt enable icon in the WavStartStream macro. You are looking for an interrupt frequency of 8000 Hz, i.e. 8 kHz.

I changed the timer from timer 1 to timer 2 as this is more flexible on an 8-bit PIC device.

At your current clock speed I have used these settings.

Image: IntFreq.jpg (33.72 KiB)
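
The exact values in the screenshot are not visible in this text, but candidate settings for an 8 kHz interrupt can be worked out on the PC. The Python sketch below rests on assumptions that should be checked against the device datasheet: Timer 2 is clocked from Fosc/4, the prescaler is 1, 4 or 16, the postscaler is 1 to 16, and the period register PR2 is 8 bits wide. The function name is illustrative only.

```python
def timer2_settings(fosc_hz, target_hz):
    """List (prescaler, PR2, postscaler) combinations giving exactly target_hz."""
    timer_clock = fosc_hz // 4             # assumed: Timer 2 runs from Fosc/4
    matches = []
    for prescaler in (1, 4, 16):           # assumed prescaler options
        for postscaler in range(1, 17):    # assumed postscaler range 1:1 to 1:16
            divisor = prescaler * postscaler
            ticks, remainder = divmod(timer_clock, divisor * target_hz)
            # The period is PR2 + 1 timer ticks, and PR2 must fit in 8 bits.
            if remainder == 0 and 1 <= ticks <= 256:
                matches.append((prescaler, ticks - 1, postscaler))
    return matches


for prescaler, pr2, postscaler in timer2_settings(48_000_000, 8000):
    print(f"prescaler 1:{prescaler}, PR2 = {pr2}, postscaler 1:{postscaler}")
```

Under these assumptions, a 48 MHz clock with prescaler 1:4, PR2 = 124 and postscaler 1:3 gives 12 MHz / (4 x 125 x 3) = 8000 Hz. Any of the listed combinations should behave the same as far as the interrupt rate is concerned.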

The compilation error is happening because the SoundBuffer component was set to use Far memory, which is not available on your target microcontroller. I've now changed this to the Near memory type instead. We probably need to make this more obvious, or switch to the near model automatically if the target chip doesn't support far memory.

Attachment: WAV_Streaming_Mod.fcfx (23.21 KiB)

Post by **jollybv** » Wed Oct 05, 2016 12:17 pm

Hi Ben

Thanks, I will play with it a bit and let you know how I get along.

Post by **jollybv** » Fri Oct 07, 2016 7:35 am

Hi Ben

I have been playing with this with no luck. The file opens and then starts to play, but it's just noise and takes a long time to run. I found a circuit of yours at https://sites.google.com/site/progic2/P ... dioAmp.GIF, which I am using as the playback amp with the low-pass filter. I have checked the output on the scope but it looks nothing like audio. I am using RB7 as the PWM output. Are there any other settings in the WAVStreemIntrrupt that I should change?

Post by **Benj** » Mon Oct 10, 2016 3:17 pm

Hi Brian,

Have you tried increasing the size of the circular buffer to see if this makes things any more reliable? It might be that simply jumping from sector to sector on the card is taking too much time and the buffer is not big enough to cope. Increase the buffer size until the compilation fails, then slowly decrease the size until you can compile again. This can be your benchmark to see if it's going to work at all on the 8-bit PIC.

PWM audio is generally reasonably good (the speaker will generally act as a low-pass filter for you), but a DAC will be significantly better, especially if you are looking at the signal using a scope. The higher you can get the PWM frequency, the better the output will be. On my Tetris table I started with PWM and got a bit of an audible hiss on the audio; after switching to a high-speed SPI DAC the hiss was completely gone.

Also, the signing of the WAV file can have an effect. If you're still having problems, make sure the file is in a signed 16-bit format. If this is wrong then you get an unintelligible static sound.
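
To see why the signedness matters, here is a small Python model of the conversion from a 16-bit sample to a PWM duty value. It assumes an 8-bit duty range (0 to 255), which may not match the PWM resolution in the actual project, and both function names are illustrative only.

```python
import struct


def duty_from_signed16(sample):
    """Map a signed 16-bit sample (-32768..32767) to an 8-bit duty (0..255)."""
    return (sample + 32768) >> 8


def duty_if_misread_as_unsigned(sample):
    """Same two bytes, wrongly treated as unsigned 16-bit, then scaled to 8 bits."""
    (raw,) = struct.unpack("<H", struct.pack("<h", sample))
    return raw >> 8


# A very quiet waveform crossing zero: samples just below and just above 0.
for s in (-2, -1, 0, 1, 2):
    print(s, duty_from_signed16(s), duty_if_misread_as_unsigned(s))
```

With the correct interpretation, near-silence sits around mid-scale (127 to 128). If the signed data is treated as unsigned, the same samples jump between 255 and 0 on every zero crossing, which comes out of the speaker as harsh static.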
