STT Coqui Server
REST API for local STT on Speech Interaction Picrofts, based on the websocket-based server example from coqui-ai (Python). It exposes a REST API instead of a websocket so that it is easier to use from the Mycroft project.
Configuration
Server configuration is specified in the application.conf file.
Usage
Setup
Install OS dependencies:
sudo apt-get update && \
sudo apt-get install --no-install-recommends -y wget ffmpeg && \
sudo apt-get clean
If apt-get update fails with an error saying that the repository changed its 'Suite' value from 'stable' to 'oldstable' (which is likely on Picroft), use sudo apt-get update --allow-releaseinfo-change instead.
Download pretrained model:
mkdir ~/coqui_stt
cd ~/coqui_stt
mkdir models
cd models
curl -L -O https://github.com/coqui-ai/STT-models/releases/download/english/coqui/v1.0.0-large-vocab/model.tflite
curl -L -O https://github.com/coqui-ai/STT-models/releases/download/english/coqui/v1.0.0-large-vocab/large-vocabulary.scorer
Check out the model zoo to find a version matching your use case.
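A failed download (for example, when a redirect is not followed) tends to leave a tiny or empty file behind rather than an obvious error. The following is a minimal sketch of a sanity check for the two files downloaded above. The function name check_model_files and the 1 MB threshold are assumptions made for illustration; they are not part of this repository.

```python
from pathlib import Path

# Assumed threshold: the real model and scorer are many megabytes in size,
# while a failed download is usually empty or a few hundred bytes.
MIN_BYTES = 1_000_000


def check_model_files(model_dir, names=("model.tflite", "large-vocabulary.scorer")):
    """Return a list of problems found; an empty list means the files look fine."""
    problems = []
    for name in names:
        path = Path(model_dir) / name
        if not path.is_file():
            problems.append(f"{name}: missing")
        elif path.stat().st_size < MIN_BYTES:
            size = path.stat().st_size
            problems.append(f"{name}: only {size} bytes, download probably failed")
    return problems
```

Calling check_model_files(Path.home() / "coqui_stt" / "models") and printing the result should give an empty list if both downloads succeeded.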
Create and activate venv:
python3 -m venv stt-venv
source stt-venv/bin/activate
This likely fails on Picroft; in that case, first run sudo apt-get install python3-venv, as suggested in the corresponding error message.
Clone this repo and install dependencies:
git clone https://gitlab.mi.hdm-stuttgart.de/heisler/stt-coqui-server.git
cd stt-coqui-server
pip3 install https://github.com/coqui-ai/STT/releases/download/v1.3.0/stt-1.3.0-cp37-cp37m-linux_armv7l.whl
pip3 install -r requirements.txt
Note that installing stt using pip might fail on a Raspberry Pi (ERROR: Could not find a version that satisfies the requirement stt). The command above worked for me and is documented here.
Starting the server
Make sure your model and scorer files are present in the same directory as the application.conf file. Then execute:
python -m stt_server.app
(Automatically) Starting the server as a service (on startup)
Put a file called stt.service in /etc/systemd/system with the following contents:
[Unit]
Description=STT API Service
After=network.target
[Service]
User=pi
Restart=always
Type=simple
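# Adjust both paths to your setup: WorkingDirectory must be the directory that contains the stt_server package, and ExecStart must point at the python binary of your venv.
WorkingDirectory=/home/pi/coqui-stt
ExecStart=/home/pi/coqui-stt/venv-stt/bin/python -m stt_server.app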
[Install]
WantedBy=multi-user.target
Enable and start the newly defined service:
sudo systemctl enable stt.service
sudo systemctl start stt.service
Sending requests to server
Use curl or, preferably, requests.post():
curl -X POST -F "speech=@speech.mp3" http://127.0.0.1:8000/api/v1/stt
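The same request can be sent from Python. The sketch below mirrors the curl call: it uploads the file as the multipart field speech to the same URL. The helper name transcribe and its post parameter are hypothetical (the parameter only exists so the function can be tried without a running server), and returning the raw response body is an assumption, since the response format is not described here.

```python
STT_URL = "http://127.0.0.1:8000/api/v1/stt"  # endpoint from the curl example


def transcribe(audio_path, url=STT_URL, post=None, timeout=30):
    """POST an audio file as the multipart field "speech" and return the response body."""
    if post is None:
        # Imported lazily so a stand-in `post` callable can be passed in instead.
        import requests
        post = requests.post
    with open(audio_path, "rb") as audio_file:
        response = post(url, files={"speech": audio_file}, timeout=timeout)
    # Raise an exception on HTTP error status codes (4xx/5xx).
    response.raise_for_status()
    # Assumption: the body is returned unchanged; parse it if the server sends JSON.
    return response.text
```

With the server running, print(transcribe("speech.mp3")) should show whatever the server returns for that file.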
Speech data may be provided in any audio format that ffmpeg is able to convert to WAV.
TODO: since Mycroft only records in WAV, we might get rid of the ffmpeg dependency.
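To illustrate what such a conversion step usually involves, here is a minimal sketch that builds and runs an ffmpeg command producing mono 16-bit PCM WAV. It is not taken from this server's code: the function names are hypothetical, and the 16 kHz sample rate is an assumption about what the model expects.

```python
import subprocess


def ffmpeg_to_wav_command(src, dst, sample_rate=16000):
    """Build an ffmpeg command that converts `src` to mono 16-bit PCM WAV at `dst`."""
    return [
        "ffmpeg",
        "-y",                     # overwrite dst if it already exists
        "-i", str(src),           # input in any format ffmpeg can decode
        "-ac", "1",               # downmix to one channel (mono)
        "-ar", str(sample_rate),  # resample; 16 kHz is an assumption
        "-acodec", "pcm_s16le",   # 16-bit little-endian PCM samples
        str(dst),
    ]


def convert_to_wav(src, dst, sample_rate=16000):
    """Run the conversion; raises subprocess.CalledProcessError if ffmpeg fails."""
    subprocess.run(
        ffmpeg_to_wav_command(src, dst, sample_rate),
        check=True,
        capture_output=True,
    )
```

If clients only ever send WAV in the right sample rate and channel layout, a step like this becomes unnecessary, which is the point of the TODO above.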
Usage in Mycroft
- Create a corresponding class in mycroft-core/mycroft/stt/__init__.py
- Add it to STTFactory in the same file
- Adjust the settings to use it
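The steps above could look roughly like the following sketch. Everything in it is an assumption made for illustration: the class name CoquiServerSTT, the "uri" config key, and the post parameter are hypothetical, and the small STT base class is only a stand-in for Mycroft's own mycroft.stt.STT so that the example runs outside mycroft-core. It relies on Mycroft passing a speech_recognition AudioData object to execute(), whose get_wav_data() method returns WAV bytes.

```python
from abc import ABC, abstractmethod


class STT(ABC):
    """Stand-in for mycroft.stt.STT (simplified: config is passed in directly)."""

    def __init__(self, config=None):
        self.config = config or {}

    @abstractmethod
    def execute(self, audio, language=None):
        ...


class CoquiServerSTT(STT):
    """Hypothetical STT backend that forwards recorded audio to the REST server."""

    DEFAULT_URI = "http://127.0.0.1:8000/api/v1/stt"

    def __init__(self, config=None, post=None):
        super().__init__(config)
        self._post = post  # optional stand-in for requests.post

    def execute(self, audio, language=None):
        post = self._post
        if post is None:
            import requests
            post = requests.post
        # The "uri" config key is an assumption, not an existing Mycroft setting.
        uri = self.config.get("uri", self.DEFAULT_URI)
        # Send the recording as the multipart field "speech", like the curl example.
        files = {"speech": ("speech.wav", audio.get_wav_data(), "audio/wav")}
        response = post(uri, files=files)
        response.raise_for_status()
        # Assumption: the response body is the transcription text.
        return response.text
```

Inside mycroft-core the class would derive from the real STT base class instead of the stand-in, be registered in STTFactory under a module name of your choice, and be selected through the stt section of the Mycroft configuration.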