
STT Coqui Server

REST API for local STT (speech-to-text) on Speech Interaction Picrofts, based on the websocket-based server example from coqui-ai's Python examples. A REST API is used here to make integration into the Mycroft project easier.

Configuration

Server configuration is specified in the application.conf file.
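
A purely hypothetical sketch of what such a file might contain (the actual keys and structure are defined by the application.conf shipped in this repository, so check that file before editing):

host = "0.0.0.0"                    # hypothetical key: address the server binds to
port = 8000                         # hypothetical key: port of the REST API
model = "model.tflite"              # hypothetical key: path to the Coqui STT model
scorer = "large-vocabulary.scorer"  # hypothetical key: path to the scorer file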

Usage

Setup

Install OS dependencies:

sudo apt-get update && \
    sudo apt-get install --no-install-recommends -y wget ffmpeg && \
    sudo apt-get clean

If apt-get update fails because a repository changed its 'Suite' value from 'stable' to 'oldstable' (which is likely on a Picroft), use sudo apt-get update --allow-releaseinfo-change

Download pretrained model:

mkdir ~/coqui_stt
cd ~/coqui_stt
mkdir models
cd models
curl -O https://github.com/coqui-ai/STT-models/releases/download/english/coqui/v1.0.0-large-vocab/model.tflite
curl -O https://github.com/coqui-ai/STT-models/releases/download/english/coqui/v1.0.0-large-vocab/large-vocabulary.scorer

Check out the model zoo to find versions matching your use case.

Create and activate venv:

python3 -m venv stt-venv
source stt-venv/bin/activate

This will likely fail on the Picroft; there you need to run sudo apt-get install python3-venv first, as suggested in the corresponding error message.

Clone this repo and install dependencies:

git clone https://gitlab.mi.hdm-stuttgart.de/heisler/stt-coqui-server.git
cd stt-coqui-server
pip3 install https://github.com/coqui-ai/STT/releases/download/v1.3.0/stt-1.3.0-cp37-cp37m-linux_armv7l.whl
pip3 install -r requirements.txt

Note that installing stt using pip might fail on a Raspberry Pi (ERROR: Could not find a version that satisfies the requirement stt). The wheel installed by the command above worked for me and is documented here

Starting the server

Make sure your model and scorer files are present in the same directory as the application.conf file. Then execute:

python -m stt_server.app

Starting the server as a service (automatically on startup)

Put a file called stt.service in /etc/systemd/system with the following contents:

[Unit]
Description=STT API Service
After=network.target

[Service]
User=pi
Restart=always
Type=simple
WorkingDirectory=/home/pi/coqui-stt
ExecStart=/home/pi/coqui-stt/venv-stt/bin/python -m stt_server.app

[Install]
WantedBy=multi-user.target

Enable and start the newly defined service:

sudo systemctl enable stt.service
sudo systemctl start stt.service

Sending requests to server

Use curl or, from Python, requests.post():

curl -X POST -F "speech=@speech.mp3" http://127.0.0.1:8000/api/v1/stt
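
The same request from Python, as a minimal sketch using the requests library (how the transcription is returned is an assumption here; inspect the response to see what the server actually sends back):

import requests

# Open the recorded audio and post it as the "speech" form field,
# exactly like the curl example above.
with open("speech.mp3", "rb") as f:
    response = requests.post("http://127.0.0.1:8000/api/v1/stt", files={"speech": f})

response.raise_for_status()
print(response.text)  # transcription returned by the server (exact format depends on the server)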

Speech data may be provided in any audio format that ffmpeg is able to convert to WAV.

TODO: since Mycroft only records in WAV, we might be able to drop the ffmpeg dependency.

Usage in mycroft

  • Create a corresponding STT class in mycroft-core/mycroft/stt/__init__.py (see the sketch below this list)
  • Add it to the STTFactory in the same file
  • Adjust the Mycroft settings to use it
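
A rough sketch of what such a class could look like, assuming the STT base class and the audio.get_wav_data() helper used by the other backends in mycroft-core; the class name CoquiServerSTT, the "url" config key and the default URL are hypothetical, not part of this repository:

import requests

from mycroft.stt import STT  # import not needed when the class is added directly to mycroft/stt/__init__.py


class CoquiServerSTT(STT):
    """Send recorded audio to the local STT Coqui Server (hypothetical sketch)."""

    def __init__(self):
        super().__init__()
        # "url" is an assumed key in the stt section of the Mycroft configuration
        self.url = self.config.get("url", "http://127.0.0.1:8000/api/v1/stt")

    def execute(self, audio, language=None):
        # audio is the recording captured by Mycroft; get_wav_data() yields WAV bytes
        response = requests.post(self.url, files={"speech": audio.get_wav_data()})
        response.raise_for_status()
        return response.text.strip()

The class would then be registered in the CLASSES dict of STTFactory (e.g. under a key such as "coqui_server") and selected via the stt section of the Mycroft configuration.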