april-asr

aprilasr is a minimal library that provides an API for offline streaming speech-to-text applications.

Documentation

Status

This library is currently under development. Some features are unimplemented, it may have bugs or crash, and there may be significant changes to the API. It may not yet be production-ready.

Furthermore, only one model is currently available; it supports only English, and has some accuracy issues at that.

Language support

The library has a C API, and there are C# and Python bindings available, but these may not be stable yet.

Example

An example use of this library is provided in example.cpp. It can perform speech recognition on a wave file, or do streaming recognition by reading stdin.

It's built as the target main. After building aprilasr, you can run it like so:

$ ./main /path/to/file.wav /path/to/model.april

For streaming recognition, you can pipe parec into it:

$ parec --format=s16 --rate=16000 --channels=1 --latency-ms=100 | ./main - /path/to/model.april
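
parec is PulseAudio-specific; any tool that can emit raw 16 kHz mono PCM16 to stdout should work the same way. For example, with ffmpeg (assuming, as the parec example implies, that main reads raw s16le audio from stdin):

$ ffmpeg -i /path/to/input.mp3 -f s16le -ar 16000 -ac 1 - | ./main - /path/to/model.april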

Models

Currently only one model is available: the English model, which uses csukuangfj's trained icefall model as the base and was trained with some extra data.

To make your own models, check out extra/exporting-howto.md

Building on Linux

Building requires ONNXRuntime v1.13.1. You can either try to build it from source or just download the release binaries.

Downloading ONNXRuntime

Run ./download_onnx_linux_x64.sh for linux-x64.

For other platforms the script should be very similar. Alternatively, visit https://github.com/microsoft/onnxruntime/releases/tag/v1.13.1, download the right zip/tgz file for your platform, and extract the contents to a directory named lib.

You may also define the environment variable ONNX_ROOT with the path to where you extracted the archive, if placing it in lib isn't an option.
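
For example, if you extracted the archive to your home directory (adjust the path to wherever you extracted it):

$ export ONNX_ROOT=$HOME/onnxruntime-linux-x64-1.13.1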

Building ONNXRuntime from source (untested)

You don't need to do this if you've downloaded ONNXRuntime.

Follow the instructions here: https://onnxruntime.ai/docs/how-to/build/inferencing.html#linux

Then run:

cd build/Linux/RelWithDebInfo/
sudo make install

Building aprilasr

Run:

$ mkdir build
$ cd build
$ cmake -DCMAKE_BUILD_TYPE=Release ..
$ make -j4

You should now have main, libaprilasr.so and libaprilasr_static.so.

If running main fails because it can't find libonnxruntime.so.1.13.1, you may need to add the directory containing it to your library search path like so:

$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:`pwd`/../lib/lib/

Building on Windows (msvc)

Create a folder called lib in the april-asr folder.

Download onnxruntime-win-x64-1.13.1.zip and extract the contents of the onnxruntime-win-x64-1.13.1 folder into the lib folder.

Run cmake to configure and generate Visual Studio project files. Make sure you select x64 as the target if you have downloaded the x64 version of ONNXRuntime.

Open ALL_BUILD.vcxproj and everything should build. The output will be in the Release or Debug folder.

When running main.exe you may receive an error message like this:

The application was unable to start correctly (0xc000007b)

To fix this, you need to make onnxruntime.dll available. One way is to copy it from lib/lib/onnxruntime.dll to build/Debug and build/Release. You may need to distribute the dll together with your application.

Applications

Currently I'm developing Live Captions, a Linux desktop app that provides live captioning.

Acknowledgements

Thanks to the k2-fsa/icefall contributors for creating the speech recognition recipes and models.

This project makes use of a few libraries:

  • pocketfft, authored by Martin Reinecke, Copyright (C) 2008-2018 Max-Planck-Society, licensed under BSD-3-Clause
  • Sonic library, authored by Bill Cox, Copyright (C) 2010 Bill Cox, licensed under Apache 2.0 license
  • tinycthread, authored by Marcus Geelnard and Evan Nemerson, licensed under zlib/libpng license

The bindings are based on the Vosk API bindings. Vosk is another speech recognition library, based on previous-generation Kaldi; it is Copyright 2019 Alpha Cephei Inc. and licensed under the Apache 2.0 license.

General Concepts

Before writing any code, it's recommended to understand these concepts. These apply to all of the language bindings.

Model

Model Diagram

Models end with the file extension .april. You can load these files using the AprilASR API.

Each model has its own sample rate in which it expects audio. There is a method to get the expected sample rate. Usually, this is 16000 Hz.

Models also have additional metadata, such as name, description, and language.

After loading a model, you can create one or more sessions that use the model.

Session

Session Diagram

In the most common case, you will have one session based on one model.

The session is what performs the actual speech recognition. It has methods to input audio, and it calls your given handler with decoded results.

Decoding

Data flow diagram

To perform speech-to-text, feed PCM16 audio of the speech to the session through the feed_pcm16 method (or its equivalent in your language's bindings). Make sure the audio is mono and in the correct sample rate.

PCM16 means an array of 16-bit signed integers (shorts) with values from -32768 to 32767, each describing one sample.

After calling feed_pcm16, the session will invoke the neural network and call your specified handler with a result. You can present this result to the user or do whatever you want with the result.

Multiple sessions

Multi-Session Diagram

In more advanced use cases, you may have multiple sessions performing recognition on multiple separate audio streams. When doing this, you can re-use the same model to minimize memory use, as in the sketch below.
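
A minimal sketch of this using the Python bindings (introduced later in this document); the handler names and the commented feed calls are illustrative:

import april_asr as april

model = april.Model("/path/to/model.april")

# Hypothetical handlers, one per audio stream
def handler_a(result_type, tokens):
    print("A:", "".join(t.token for t in tokens))

def handler_b(result_type, tokens):
    print("B:", "".join(t.token for t in tokens))

# Both sessions share the same model, so the model is loaded only once
session_a = april.Session(model, handler_a)
session_b = april.Session(model, handler_b)

# Each session is then fed its own audio stream independently:
# session_a.feed_pcm16(chunk_from_stream_a)
# session_b.feed_pcm16(chunk_from_stream_b)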

Async vs Non-Async

Synchronous session (default)

Synchronous session diagram

The simplest (and default) mode of operation is the synchronous session.

In a synchronous session, when you call the function to feed audio, it will process the audio synchronously, call the handler if a new result is decoded, and finally return once it's done.

This means that calls to feed audio are extremely slow. This may be undesirable in some cases, such as in a live captioning situation. For this reason, you can choose to construct asynchronous sessions instead.

Asynchronous session

Asynchronous session diagram

An asynchronous session does not perform calculations on the calling thread.

Calls to feed audio are quick, as it copies the data and triggers a second thread to do the actual calculations. The second thread calls the handler at some point, when processing is done.

A caveat is that you must feed audio at a realtime rate, i.e. one second of audio per second. You should not feed multiple seconds or minutes at once; the internal buffer cannot fit more than a few seconds.

Asynchronous sessions are intended for streaming audio as it comes in, for live captioning for example. If you feed more than 1 second every second, you will get poor results (if any).

Async-Realtime vs Async-Non-Realtime

In an asynchronous session, the system may not be fast enough to process audio at the rate it comes in. This is where realtime and non-realtime sessions differ in behavior.

Realtime session diagram

A realtime session will work around this by automatically deciding to speed up incoming audio to a rate where the system can keep up. This involves some audio processing code, which may or may not be desirable.

Speeding up audio may reduce accuracy. It may not be severe at small values (such as 1.2x), but at larger values (such as over 2.0x) the accuracy may be severely impacted. There is a method you can call to get the current speedup value to know when this is happening, so you can display a warning to the user or similar.

A non-realtime session ignores this problem and assumes the system is fast enough. If it isn't, the results will fall behind, the internal buffer will fill up, the handler will be called with an ErrorCantKeepUp result, and the results will be severely degraded.

Handler

The results are given via a callback (handler). It gets called by the session whenever it has new results. The parameters given to the callback include the result type and the token array.

Note that in an asynchronous session, the handler will be called from a different thread. Be sure to expect this and write thread-safe code, or use a synchronous session.

You should try to make your handler function fast to avoid slowing down the session.

The actual text can be extracted from the token array.
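
For example, in Python, a handler in an asynchronous session can hand results off to the main thread through a thread-safe queue instead of doing any real work itself; a minimal sketch:

import queue

results = queue.Queue()

def handler(result_type, tokens):
    # Called from the session's background thread in an asynchronous
    # session: keep it fast and thread-safe by only enqueueing the data
    text = "".join(t.token for t in tokens) if tokens else ""
    results.put((result_type, text))

# The main thread can then drain the queue whenever convenient:
# result_type, text = results.get()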

Result Type

The handler gets called with an enum explaining the result type:

  • Partial Recognition - the token array is a partial result; an updated array will be given in the next call
  • Final Recognition - the token array is final; the next call will start from an empty array
  • Error Can't Keep Up - called in an asynchronous non-realtime session if the system is not fast enough; it may also be called briefly in an asynchronous realtime session; the token array is empty or null
  • Silence - there has been silence; the token array is empty or null

Token

A token may be a single letter, a word chunk, an entire word, punctuation, or another arbitrary set of characters.

To convert a token array to a string, simply concatenate the strings from each token. You don't need to add spaces between tokens; the tokens contain their own formatting.

Token example list

For example, the text "THAT'S COOL ELEPHANTS" may be represented as tokens like so:

  • [" THAT", "'", "S", " CO", "OL", " E", "LE", "P", "H", "ANT", "S"]

Simply concatenating these strings gives the correct " THAT'S COOL ELEPHANTS", but with an extra space at the beginning. You may want to strip the final string to avoid the leading space.

Tokens contain more data than just the string, however. They also contain the log probability, and a boolean denoting whether or not the token is a word boundary. In English, the word boundary value is equivalent to checking whether the first character is a space.
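
As an illustration, here is a hypothetical Python helper that groups a token array into whole words using the word_boundary flag (token fields as documented in the Python bindings):

def tokens_to_words(tokens):
    # Group a token array into whole words using the word_boundary flag
    words = []
    for token in tokens:
        if token.word_boundary or not words:
            words.append(token.token.strip())
        else:
            words[-1] += token.token
    return words

# For the example above, tokens_to_words would yield
# ["THAT'S", "COOL", "ELEPHANTS"]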

Dependencies

AprilASR depends on ONNXRuntime for ML inference. You will need both libraries for it to work:

  • Linux: libaprilasr.so and libonnxruntime.so
  • Windows: libaprilasr.dll and onnxruntime.dll

Model Downloads

There are two published English models:

Some non-English models have been trained which are available for testing:

All models are based on RNN-T (the lstm-transducer-stateless2 recipe in icefall).

More models to come

Python

Installation

Run pip install april_asr

Getting Started

To get started, import the april_asr package:

import april_asr as april

Model

You can load a model like so:

your_model_path = "/path/to/model.april"
model = april.Model(your_model_path)

Models have a few metadata methods:

name: str = model.get_name()
description: str = model.get_description()
language: str = model.get_language()
sample_rate: int = model.get_sample_rate()

Session

Before creating a session, define a handler callback. Here is an example handler function that concatenates the tokens to a string and prints it:

def handler(result_type, tokens):
    s = ""
    for token in tokens:
        s = s + token.token
    
    if result_type == april.Result.FINAL_RECOGNITION:
        print("@"+s)
    elif result_type == april.Result.PARTIAL_RECOGNITION:
        print("-"+s)
    else:
        print("")

Now, a session may be created:

session = april.Session(model, handler)

Session Options

There are more options when creating a session; here is the initializer signature:

 class Session (model: april_asr.Model, callback: Callable[[april_asr.Result, List[april_asr.Token]], None], asynchronous: bool = False, no_rt: bool = False, speaker_name: str = '')

Refer to the General Concepts page for an explanation of the asynchronous, non-realtime, and speaker name options.
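
For example, to create an asynchronous non-realtime session:

session = april.Session(model, handler, asynchronous=True, no_rt=True)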

Feed data

Most of the examples use a very simple method like this to load and feed audio:

with open(wav_file_path, "rb") as f:
    data = f.read()

session.feed_pcm16(data)

This works only if the wav file is PCM16 and sampled at the correct sample rate. If you attempt to load an mp3, a non-PCM16 or non-16kHz wav file, or any other audio file this way, you will likely get gibberish or no results.
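
Note that reading the whole file this way also feeds the wav header bytes as if they were audio samples. If you prefer the standard library, a sketch using Python's wave module checks the format and feeds only the audio frames (it still performs no resampling):

import wave

with wave.open(wav_file_path, "rb") as f:
    assert f.getsampwidth() == 2, "expected 16-bit PCM"
    assert f.getnchannels() == 1, "expected mono audio"
    assert f.getframerate() == model.get_sample_rate(), "unexpected sample rate"
    data = f.readframes(f.getnframes())

session.feed_pcm16(data)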

To load arbitrary audio files, you can use a Python library that handles audio decoding, such as librosa (make sure it is installed: pip install librosa):

import librosa

# Load the audio samples as numpy floats
data, sr = librosa.load("/path/to/anything.mp3", sr=model.get_sample_rate(), mono=True)

# Convert the floats to PCM16 bytes
data = (data * 32767).astype("<i2").tobytes()

session.feed_pcm16(data)

You can flush the session once the end of the file has been reached to force a final result:

session.flush()

Asynchronous

Asynchronous sessions are a little more complicated. You can create one by setting the asynchronous flag to true:

session = april.Session(model, handler, asynchronous=True)

Now, when feeding audio, be sure to feed it in realtime.

import librosa
import time

data, sr = librosa.load("/path/to/anything.mp3", sr=model.get_sample_rate(), mono=True)
data = (data * 32767).astype("<i2").tobytes()

while len(data) > 0:
    chunk = data[:2400]
    data = data[2400:]
    
    session.feed_pcm16(chunk)
    if session.get_rt_speedup() > 1.5:
        print("Warning: System can't keep up, realtime speedup value of " + str(session.get_rt_speedup()))

    # each 2400-byte chunk is 1200 16-bit samples
    time.sleep((len(chunk) // 2) / model.get_sample_rate())

Complete example

import april_asr as april
import librosa

# Change these values
model_path = "aprilv0_en-us.april"
audio_path = "audio.wav"

model = april.Model(model_path)


def handler(result_type, tokens):
    s = ""
    for token in tokens:
        s = s + token.token
    
    if result_type == april.Result.FINAL_RECOGNITION:
        print("@"+s)
    elif result_type == april.Result.PARTIAL_RECOGNITION:
        print("-"+s)
    else:
        print("")

session = april.Session(model, handler)

data, sr = librosa.load(audio_path, sr=model.get_sample_rate(), mono=True)
data = (data * 32767).astype("<i2").tobytes()

session.feed_pcm16(data)
session.flush()

Congratulations! You have just performed speech recognition using AprilASR!


april_asr API documentation

Package april_asr

april_asr provides Python bindings for the aprilasr library.

aprilasr provides an API for offline streaming speech-to-text applications, and enables low-latency on-device realtime speech recognition for live captioning or other speech recognition use cases.

Source code:

"""
april_asr provides Python bindings for the aprilasr library.

aprilasr provides an API for offline streaming speech-to-text applications,
and enables low-latency on-device realtime speech recognition for live
captioning or other speech recognition use cases.
"""

__all__ = ["Token", "Result", "Model", "Session"]

from ._april import Token, Result, Model, Session

Classes

class Model (path: str)

Models end with the file extension .april. You need to pass a path to such a file to construct a Model type.

Each model has its own sample rate in which it expects audio. There is a method to get the expected sample rate. Usually, this is 16000 Hz.

Models also have additional metadata such as name, description, language.

After loading a model, you can create one or more sessions that use the model.

Source code:

class Model:
    """
    Models end with the file extension `.april`. You need to pass a path to
    such a file to construct a Model type.

    Each model has its own sample rate in which it expects audio. There is a
    method to get the expected sample rate. Usually, this is 16000 Hz.

    Models also have additional metadata such as name, description, language.

    After loading a model, you can create one or more sessions that use the
    model.
    """
    def __init__(self, path: str):
        self._handle = _c.ffi.aam_create_model(path)

        if self._handle is None:
            raise Exception("Failed to load model")

    def get_name(self) -> str:
        """Get the name from the model's metadata"""
        return _c.ffi.aam_get_name(self._handle)

    def get_description(self) -> str:
        """Get the description from the model's metadata"""
        return _c.ffi.aam_get_description(self._handle)

    def get_language(self) -> str:
        """Get the language from the model's metadata"""
        return _c.ffi.aam_get_language(self._handle)

    def get_sample_rate(self) -> int:
        """Get the sample rate from the model's metadata"""
        return _c.ffi.aam_get_sample_rate(self._handle)

    def __del__(self):
        _c.ffi.aam_free(self._handle)
        self._handle = None

Methods

def get_description(self) -> str

Get the description from the model's metadata.

def get_language(self) -> str

Get the language from the model's metadata.

def get_name(self) -> str

Get the name from the model's metadata.

def get_sample_rate(self) -> int

Get the sample rate from the model's metadata.
class Result (value, names=None, *, module=None, qualname=None, type=None, start=1)

Result type that is passed to your handler

Source code:

class Result(IntEnum):
    """
    Result type that is passed to your handler
    """

    PARTIAL_RECOGNITION = 1
    """A partial recognition. The next handler call will contain an updated
    list of tokens."""

    FINAL_RECOGNITION = 2
    """A final recognition. The next handler call will start from an empty
    token list."""

    ERROR_CANT_KEEP_UP = 3
    """In an asynchronous session, this may be called when the system can't
    keep up with the incoming audio, and samples have been dropped. The
    accuracy will be affected. An empty token list is given"""

    SILENCE = 4
    """Called after some silence. An empty token list is given"""

Ancestors

  • enum.IntEnum
  • builtins.int
  • enum.Enum

Class variables

var ERROR_CANT_KEEP_UP

In an asynchronous session, this may be called when the system can't keep up with the incoming audio, and samples have been dropped. The accuracy will be affected. An empty token list is given

var FINAL_RECOGNITION

A final recognition. The next handler call will start from an empty token list.

var PARTIAL_RECOGNITION

A partial recognition. The next handler call will contain an updated list of tokens.

var SILENCE

Called after some silence. An empty token list is given

class Session (model: april_asr.Model, callback: Callable[[april_asr.Result, List[april_asr.Token]], None], asynchronous: bool = False, no_rt: bool = False, speaker_name: str = '')

The session is what performs the actual speech recognition. It has methods to input audio, and it calls your given handler with decoded results.

You need to pass a Model when constructing a Session.

Source code:

class Session:
    """
    The session is what performs the actual speech recognition. It has
    methods to input audio, and it calls your given handler with decoded
    results.

    You need to pass a Model when constructing a Session.
    """
    def __init__(self,
            model: Model,
            callback: Callable[[Result, List[Token]], None],
            asynchronous: bool = False,
            no_rt: bool = False,
            speaker_name: str = ""
        ):
        config = _c.AprilConfig()
        config.flags = _c.AprilConfigFlagBits()

        if asynchronous and no_rt:
            config.flags.value = 2
        elif asynchronous:
            config.flags.value = 1
        else:
            config.flags.value = 0

        if speaker_name != "":
            spkr_data = struct.pack("@q", hash(speaker_name)) * 2
            config.speaker = _c.AprilSpeakerID.from_buffer_copy(spkr_data)

        config.handler = _HANDLER
        config.userdata = id(self)

        self.model = model
        self._handle = _c.ffi.aas_create_session(model._handle, config)
        if self._handle is None:
            raise Exception()

        self.callback = callback

    def get_rt_speedup(self) -> float:
        """
        If the session is asynchronous and realtime, this will return a
        positive float. A value below 1.0 means the session is keeping up, and
        a value greater than 1.0 means the input audio is being sped up by that
        factor in order to keep up. When the value is greater than 1.0, the
        accuracy is likely to be affected.
        """
        return _c.ffi.aas_realtime_get_speedup(self._handle)

    def feed_pcm16(self, data: bytes) -> None:
        """
        Feed the given pcm16 samples in bytes to the session. If the session is
        asynchronous, this will return immediately and queue the data for the
        background thread to process. If the session is not asynchronous, this
        will block your thread and potentially call the handler before
        returning.
        """
        _c.ffi.aas_feed_pcm16(self._handle, data)

    def flush(self) -> None:
        """
        Flush any remaining samples and force the session to produce a final
        result.
        """
        _c.ffi.aas_flush(self._handle)

    def __del__(self):
        _c.ffi.aas_free(self._handle)
        self.model = None
        self._handle = None

Methods

def feed_pcm16(self, data: bytes) -> None

Feed the given pcm16 samples in bytes to the session. If the session is asynchronous, this will return immediately and queue the data for the background thread to process. If the session is not asynchronous, this will block your thread and potentially call the handler before returning.

def flush(self) -> None

Flush any remaining samples and force the session to produce a final result.

def get_rt_speedup(self) -> float

If the session is asynchronous and realtime, this will return a positive float. A value below 1.0 means the session is keeping up, and a value greater than 1.0 means the input audio is being sped up by that factor in order to keep up. When the value is greater than 1.0, the accuracy is likely to be affected.
class Token (token)

A token may be a single letter, a word chunk, an entire word, punctuation, or other arbitrary set of characters.

To convert a token array to a string, simply concatenate the strings from each token. You don't need to add spaces between tokens, the tokens contain their own formatting.

Tokens also contain the log probability, and a boolean denoting whether or not it's a word boundary. In English, the word boundary value is equivalent to checking if the first character is a space.

Source code:

class Token:
    """
    A token may be a single letter, a word chunk, an entire word, punctuation,
    or other arbitrary set of characters.

    To convert a token array to a string, simply concatenate the strings from
    each token. You don't need to add spaces between tokens, the tokens
    contain their own formatting.

    Tokens also contain the log probability, and a boolean denoting whether or
    not it's a word boundary. In English, the word boundary value is equivalent
    to checking if the first character is a space.
    """

    token: str = ""
    logprob: float = 0.0
    word_boundary: bool = False
    sentence_end: bool = False
    time: float = 0.0

    def __init__(self, token):
        self.token = token.token.decode("utf-8")
        self.logprob = token.logprob
        self.word_boundary = (token.flags.value & 1) != 0
        self.sentence_end = (token.flags.value & 2) != 0
        self.time = float(token.time_ms) / 1000.0

Class variables

var logprob : float
var sentence_end : bool
var time : float
var token : str
var word_boundary : bool

C#

Installation

Install the nuget package from https://www.nuget.org/packages/AprilAsr

Getting Started

To get started, import the AprilAsr namespace:

using AprilAsr;

Model

You can load a model like so:

string modelPath = "/path/to/model.april"; 
AprilModel model = new AprilModel(modelPath);

Models have a few metadata fields:

string name = model.Name;
string description = model.Description;
string language = model.Language;
int sampleRate = model.SampleRate;

Session

A session needs a callback. You can define one inline; this example concatenates the tokens into a string and prints it:

AprilSession session = new AprilSession(model, (result, tokens) => {
    if (tokens == null) return;

    string s = "";
    if(result == AprilResultKind.PartialRecognition) {
        s = "- ";
    }else if(result == AprilResultKind.FinalRecognition) {
        s = "@ ";
    }else{
        s = " ";
    }

    foreach(AprilToken token in tokens) {
        s += token.Token;
    }

    Console.WriteLine(s);
});

Session Options

There are more options when creating a session; here is the initializer signature:

public AprilSession(AprilModel model, SessionCallback callback, bool async = false, bool noRT = false, string speakerName = "") {

Refer to the General Concepts page for an explanation of the asynchronous, non-realtime, and speaker name options.

Feed data

Most of the examples use a very simple method like this to load and feed audio:

// Read the file data (assumes wav file is 16-bit PCM wav)
var fileData = File.ReadAllBytes(wavFilePath);
short[] shorts = new short[fileData.Length / 2];
Buffer.BlockCopy(fileData, 0, shorts, 0, fileData.Length);

// Feed the data
session.FeedPCM16(shorts, shorts.Length);

This works only if the wav file is PCM16 and sampled at the correct sample rate. If you attempt to load an mp3, a non-PCM16 or non-16kHz wav file, or any other audio file this way, you will likely get gibberish or no results.

Asynchronous

Asynchronous sessions are a little more complicated. You can create one by setting the asynchronous flag to true:

AprilSession session = new AprilSession(..., async: true);

Now, when feeding audio, be sure to feed it in realtime.

var fileData = File.ReadAllBytes(wavFilePath);
short[] shorts = new short[2400];

for(int i=0; i<(fileData.Length/2); i+=shorts.Length){
    int size = Math.Min(shorts.Length, (fileData.Length/2) - i);
    Buffer.BlockCopy(fileData, i*2, shorts, 0, size*2);
    session.FeedPCM16(shorts, size);
    Thread.Sleep(size * 1000 / model.SampleRate);
}

session.Flush();

Complete example

using AprilAsr;

var modelPath = "aprilv0_en-us.april";
var wavFilePath = "audio.wav";

// Load the model and print metadata
var model = new AprilModel(modelPath);
Console.WriteLine("Name: " + model.Name);
Console.WriteLine("Description: " + model.Description);
Console.WriteLine("Language: " + model.Language);

// Create the session with an inline callback
var session = new AprilSession(model, (result, tokens) => {
    string s = "";
    if(result == AprilResultKind.PartialRecognition) {
        s = "- ";
    }else if(result == AprilResultKind.FinalRecognition) {
        s = "@ ";
    }else{
        s = " ";
    }

    foreach(AprilToken token in tokens) {
        s += token.Token;
    }

    Console.WriteLine(s);
});

// Read the file data (assumes wav file is 16-bit PCM wav)
var fileData = File.ReadAllBytes(wavFilePath);
short[] shorts = new short[fileData.Length / 2];
Buffer.BlockCopy(fileData, 0, shorts, 0, fileData.Length);

// Feed the data and flush
session.FeedPCM16(shorts, shorts.Length);
session.Flush();

Congratulations! You have just performed speech recognition using AprilAsr!

AprilAsr Namespace Reference

Classes

class AprilModel

Models end with the file extension .april. You need to pass a path to such a file to construct a Model type.

class AprilSession

The session is what performs the actual speech recognition. It has methods to input audio, and it calls your given handler with decoded results.

struct AprilToken

A token may be a single letter, a word chunk, an entire word, punctuation, or other arbitrary set of characters.

Enumerations

enum class AprilResultKind : int { PartialRecognition = 1, FinalRecognition = 2, ErrorCantKeepUp = 3, Silence = 4 }

Result type that is passed to your handler.
 

Functions

delegate void SessionCallback(AprilResultKind kind, AprilToken[] tokens)

Session callback type. You must provide a callback of this type when constructing a session.
 

Enumeration Type Documentation

◆ AprilResultKind

enum AprilAsr.AprilResultKind : int

Result type that is passed to your handler

Enumerator

PartialRecognition
A partial recognition. The next handler call will contain an updated list of tokens.

FinalRecognition
A final recognition. The next handler call will start from an empty token list.

ErrorCantKeepUp
In an asynchronous session, this may be called when the system can't keep up with the incoming audio, and samples have been dropped. The accuracy will be affected. An empty token list is given.

Silence
Called after some silence. An empty token list is given.

Function Documentation

◆ SessionCallback()

delegate void AprilAsr.SessionCallback(AprilResultKind kind, AprilToken[] tokens)

Session callback type. You must provide a callback of this type when constructing a session.