Python

Getting Started

To get started, import AprilASR

import april_asr as april

Model

You can load a model like so:

your_model_path = "/path/to/model.april"
model = april.Model(your_model_path)

Models have a few metadata methods:

name: str = model.get_name()
description: str = model.get_description()
language: str = model.get_language()
sample_rate: int = model.get_sample_rate()

Session

Before creating a session, define a handler callback. Here is an example handler function that concatenates the tokens to a string and prints it:

def handler(result_type, tokens):
    s = ""
    for token in tokens:
        s = s + token.token
    
    if result_type == april.Result.FINAL_RECOGNITION:
        print("@"+s)
    elif result_type == april.Result.PARTIAL_RECOGNITION:
        print("-"+s)
    else:
        print("")

Now, a session may be created:

session = april.Session(model, handler)

Session Options

There are more options when it comes to creating a session, here is the initializer signature:

 class Session (model: april_asr.Model, callback: Callable[[april_asr.Result, List[april_asr.Token]], None], asynchronous: bool = False, no_rt: bool = False, speaker_name: str = '')

Refer to the General Concepts page for an explanation on asynchronous, non-realtime, and speaker name options

Feed data

Most of the examples use a very simple method like this to load and feed audio:

with open(wav_file_path, "rb") as f:
    data = f.read()

session.feed_pcm16(data)

This works only if the wav file is PCM16 and sampled in the correct sample rate. When you attempt to load an mp3, non-PCM16/non-16kHz wav file, or any other audio file in this way, you will likely get gibberish or no results.

To load more arbitrary audio files, you can use a Python library that handles audio loading (make sure librosa is installed: pip install librosa):

import librosa

# Load the audio samples as numpy floats
data, sr = librosa.load("/path/to/anything.mp3", sr=model.get_sample_rate(), mono=True)

# Convert the floats to PCM16 bytes
data = (data * 32767).astype("short").astype("<u2").tobytes()

session.feed_pcm16(data)

You can flush the session once the end of the file has been reached to force a final result:

session.flush()

Asynchronous

Asynchronous sessions are a little more complicated. You can create one by setting the asynchronous flag to true:

session = april.Session(model, handler, asynchronous=True)

Now, when feeding audio, be sure to feed it in realtime.

import librosa
import time

data, sr = librosa.load("/path/to/anything.mp3", sr=model.get_sample_rate(), mono=True)
data = (data * 32767).astype("short").astype("<u2").tobytes()

while len(data) > 0:
    chunk = data[:2400]
    data = data[2400:]
    
    session.feed_pcm16(chunk)
    if session.get_rt_speedup() > 1.5:
        print("Warning: System can't keep up, realtime speedup value of " + str(session.get_rt_speedup()))

    time.sleep(2400 / model.get_sample_rate())

Complete example

import april_asr as april
import librosa

# Change these values
model_path = "aprilv0_en-us.april"
audio_path = "audio.wav"

model = april.Model(model_path)


def handler(result_type, tokens):
    s = ""
    for token in tokens:
        s = s + token.token
    
    if result_type == april.Result.FINAL_RECOGNITION:
        print("@"+s)
    elif result_type == april.Result.PARTIAL_RECOGNITION:
        print("-"+s)
    else:
        print("")

session = april.Session(model, handler)

data, sr = librosa.load(audio_path, sr=model.get_sample_rate(), mono=True)
data = (data * 32767).astype("short").astype("<u2").tobytes()

session.feed_pcm16(data)
session.flush()

Congratulations! You have just performed speech recognition using AprilASR!

april-asr Documentation