Package april_asr
april_asr provides Python bindings for the aprilasr library.
aprilasr provides an API for offline streaming speech-to-text applications, and enables low-latency on-device realtime speech recognition for live captioning or other speech recognition use cases.
__all__ = ["Token", "Result", "Model", "Session"]
from ._april import Token, Result, Model, Session
Classes
class Model (path: str)
-
Model files end with the file extension `.april`. To construct a Model, pass the path to such a file. If the model cannot be loaded, the constructor raises an Exception.
Each model expects audio at its own sample rate, usually 16000 Hz. Call get_sample_rate() to query it.
Models also have additional metadata such as name, description, and language.
After loading a model, you can create one or more sessions that use the model.
    class Model:
        """
        Models end with the file extension `.april`. You need to pass a path to
        such a file to construct a Model type.

        Each model has its own sample rate in which it expects audio. There is a
        method to get the expected sample rate. Usually, this is 16000 Hz.

        Models also have additional metadata such as name, description, language.

        After loading a model, you can create one or more sessions that use the
        model.
        """
        def __init__(self, path: str):
            self._handle = _c.ffi.aam_create_model(path)

            if self._handle is None:
                raise Exception("Failed to load model")

        def get_name(self) -> str:
            """Get the name from the model's metadata"""
            return _c.ffi.aam_get_name(self._handle)

        def get_description(self) -> str:
            """Get the description from the model's metadata"""
            return _c.ffi.aam_get_description(self._handle)

        def get_language(self) -> str:
            """Get the language from the model's metadata"""
            return _c.ffi.aam_get_language(self._handle)

        def get_sample_rate(self) -> int:
            """Get the sample rate from the model's metadata"""
            return _c.ffi.aam_get_sample_rate(self._handle)

        def __del__(self):
            _c.ffi.aam_free(self._handle)
            self._handle = None
Methods
def get_description(self) ‑> str
-
Get the description from the model's metadata
def get_language(self) ‑> str
-
Get the language from the model's metadata
def get_name(self) ‑> str
-
Get the name from the model's metadata
def get_sample_rate(self) ‑> int
-
Get the sample rate from the model's metadata
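The metadata getters above can be combined into a small inspection helper. The following is a minimal sketch: `describe_model` and `check_sample_rate` are hypothetical names introduced for illustration, not part of april_asr, and they only call the four getters documented for Model.

```python
from typing import Any, Dict


def describe_model(model: Any) -> Dict[str, Any]:
    """Collect a model's metadata using the documented getters."""
    return {
        "name": model.get_name(),
        "description": model.get_description(),
        "language": model.get_language(),
        "sample_rate": model.get_sample_rate(),
    }


def check_sample_rate(model: Any, audio_rate: int) -> None:
    """Raise ValueError if the audio rate differs from what the model expects."""
    expected = model.get_sample_rate()
    if audio_rate != expected:
        raise ValueError(
            f"model expects {expected} Hz audio, got {audio_rate} Hz"
        )


# Usage (hypothetical path; needs the april_asr package and a model file):
#   import april_asr
#   model = april_asr.Model("path/to/model.april")
#   print(describe_model(model))
#   check_sample_rate(model, 16000)
```

Checking the rate before feeding audio is a design choice of this sketch; the documentation only states that each model expects audio at its own sample rate.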
class Result (value, names=None, *, module=None, qualname=None, type=None, start=1)
-
The result type passed to your handler as its first argument.
    class Result(IntEnum):
        """
        Result type that is passed to your handler
        """

        PARTIAL_RECOGNITION = 1,
        """A partial recognition. The next handler call will contain an updated
        list of tokens."""

        FINAL_RECOGNITION = 2,
        """A final recognition. The next handler call will start from an empty
        token list."""

        ERROR_CANT_KEEP_UP = 3,
        """In an asynchronous session, this may be called when the system can't
        keep up with the incoming audio, and samples have been dropped. The
        accuracy will be affected. An empty token list is given"""

        SILENCE = 4
        """Called after some silence. An empty token list is given"""
Ancestors
- enum.IntEnum
- builtins.int
- enum.Enum
Class variables
var ERROR_CANT_KEEP_UP
-
In an asynchronous session, the handler may receive this when the system can't keep up with the incoming audio and samples have been dropped. Accuracy will be affected. An empty token list is given.
var FINAL_RECOGNITION
-
A final recognition. The next handler call will start from an empty token list.
var PARTIAL_RECOGNITION
-
A partial recognition. The next handler call will contain an updated list of tokens.
var SILENCE
-
Passed to the handler after a period of silence. An empty token list is given.
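A handler typically branches on the four Result values described above. The sketch below shows one way to do that. `CaptionBuffer` is a hypothetical helper, not part of april_asr, and the fallback enum is a stand-in that mirrors the documented values so the example also runs where the package is not installed.

```python
from enum import IntEnum
from typing import Any, List

try:
    from april_asr import Result
except ImportError:
    # Stand-in mirroring the documented values; only used when the
    # april_asr package is not available.
    class Result(IntEnum):
        PARTIAL_RECOGNITION = 1
        FINAL_RECOGNITION = 2
        ERROR_CANT_KEEP_UP = 3
        SILENCE = 4


class CaptionBuffer:
    """Hypothetical helper that tracks live and finished caption lines."""

    def __init__(self) -> None:
        self.partial = ""               # latest in-progress text
        self.finished: List[str] = []   # completed utterances
        self.dropped_audio = 0          # number of ERROR_CANT_KEEP_UP events

    def handler(self, result: Any, tokens: List[Any]) -> None:
        # Tokens carry their own spacing, so plain concatenation is enough.
        text = "".join(t.token for t in tokens)
        if result == Result.PARTIAL_RECOGNITION:
            # The next call will bring an updated list, so overwrite.
            self.partial = text
        elif result == Result.FINAL_RECOGNITION:
            # The next call starts from an empty list, so commit this one.
            self.finished.append(text)
            self.partial = ""
        elif result == Result.ERROR_CANT_KEEP_UP:
            self.dropped_audio += 1
        # Result.SILENCE carries an empty token list; nothing to update.
```

An instance's bound method, `CaptionBuffer().handler`, has the `(Result, List[Token])` shape that the Session callback expects.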
class Session (model: april_asr.Model, callback: Callable[[april_asr.Result, List[april_asr.Token]], None], asynchronous: bool = False, no_rt: bool = False, speaker_name: str = '')
-
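A session performs the actual speech recognition. It has methods for feeding audio, and it calls your handler with the decoded results.
You need to pass a Model and a callback when constructing a Session. The callback receives a Result and a list of Token objects. Set asynchronous=True to have audio processed on a background thread; no_rt is honored only when asynchronous is also True. A non-empty speaker_name is hashed into the speaker ID passed to the library.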
    class Session:
        """
        The session is what performs the actual speech recognition. It has
        methods to input audio, and it calls your given handler with decoded
        results.

        You need to pass a Model when constructing a Session.
        """
        def __init__(self,
                model: Model,
                callback: Callable[[Result, List[Token]], None],
                asynchronous: bool = False,
                no_rt: bool = False,
                speaker_name: str = ""
            ):
            config = _c.AprilConfig()
            config.flags = _c.AprilConfigFlagBits()

            if asynchronous and no_rt:
                config.flags.value = 2
            elif asynchronous:
                config.flags.value = 1
            else:
                config.flags.value = 0

            if speaker_name != "":
                spkr_data = struct.pack("@q", hash(speaker_name)) * 2
                config.speaker = _c.AprilSpeakerID.from_buffer_copy(spkr_data)

            config.handler = _HANDLER
            config.userdata = id(self)

            self.model = model
            self._handle = _c.ffi.aas_create_session(model._handle, config)
            if self._handle is None:
                raise Exception()

            self.callback = callback

        def get_rt_speedup(self) -> float:
            """
            If the session is asynchronous and realtime, this will return a
            positive float. A value below 1.0 means the session is keeping up,
            and a value greater than 1.0 means the input audio is being sped up
            by that factor in order to keep up. When the value is greater 1.0,
            the accuracy is likely to be affected.
            """
            return _c.ffi.aas_realtime_get_speedup(self._handle)

        def feed_pcm16(self, data: bytes) -> None:
            """
            Feed the given pcm16 samples in bytes to the session. If the session
            is asynchronous, this will return immediately and queue the data for
            the background thread to process. If the session is not
            asynchronous, this will block your thread and potentially call the
            handler before returning.
            """
            _c.ffi.aas_feed_pcm16(self._handle, data)

        def flush(self) -> None:
            """
            Flush any remaining samples and force the session to produce a final
            result.
            """
            _c.ffi.aas_flush(self._handle)

        def __del__(self):
            _c.ffi.aas_free(self._handle)
            self.model = None
            self._handle = None
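Constructing a session takes a loaded Model and a callback, as the signature above shows. A minimal sketch of that wiring follows. `open_session` is a hypothetical helper name, and the import is deferred into the function body so that the module can be loaded even where april_asr is not installed.

```python
from typing import Any, Callable, List, Tuple


def open_session(
    model_path: str,
    callback: Callable[[Any, List[Any]], None],
    asynchronous: bool = False,
) -> Tuple[Any, Any]:
    """Load a model and create one session bound to `callback`."""
    # Deferred import: only needed once a session is actually requested.
    import april_asr

    model = april_asr.Model(model_path)
    session = april_asr.Session(model, callback, asynchronous=asynchronous)
    return model, session


# Usage (hypothetical path):
#   def on_result(result, tokens):
#       print(result, "".join(t.token for t in tokens))
#
#   model, session = open_session("path/to/model.april", on_result)
```

Returning the model alongside the session lets the caller query metadata such as the sample rate before feeding audio.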
Methods
def feed_pcm16(self, data: bytes) ‑> None
-
Feed PCM16 samples, given as bytes, to the session. If the session is asynchronous, this returns immediately and queues the data for the background thread to process. If the session is not asynchronous, this blocks your thread and may call the handler before returning.
def flush(self) ‑> None
-
Flush any remaining samples and force the session to produce a final result.
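Feeding and flushing usually go together: audio is passed in chunks, then flush() forces the final result. The sketch below reads a WAV file with the standard library `wave` module and feeds it to a session. `feed_wav` is a hypothetical helper, and the mono 16-bit requirement is an assumption of this sketch (the documentation above only specifies PCM16 bytes at the model's sample rate).

```python
import wave
from typing import Any


def feed_wav(session: Any, wav_path: str, expected_rate: int,
             chunk_frames: int = 4096) -> int:
    """Feed a WAV file to `session` in chunks, then flush.

    Returns the number of bytes fed.
    """
    total = 0
    with wave.open(wav_path, "rb") as wav:
        # Assumption: the session wants mono, 16-bit samples.
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected mono 16-bit PCM audio")
        if wav.getframerate() != expected_rate:
            raise ValueError("sample rate does not match the model")
        while True:
            chunk = wav.readframes(chunk_frames)
            if not chunk:
                break
            session.feed_pcm16(chunk)
            total += len(chunk)
    # Force a final result for whatever audio is still buffered.
    session.flush()
    return total


# Usage (hypothetical file name):
#   feed_wav(session, "speech.wav", model.get_sample_rate())
```

In a synchronous session each feed_pcm16 call may invoke the handler before it returns, so results arrive while the loop is still running.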
def get_rt_speedup(self) ‑> float
-
If the session is asynchronous and realtime, this returns a positive float. A value below 1.0 means the session is keeping up, and a value greater than 1.0 means the input audio is being sped up by that factor in order to keep up. When the value is greater than 1.0, accuracy is likely to be affected.
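The speedup value can be turned into a simple status check. This is a small sketch: `describe_speedup` is a hypothetical helper, and treating a value of exactly 1.0 as "keeping up" is an assumption, since the description above only covers values below and above 1.0.

```python
from typing import Any


def describe_speedup(session: Any) -> str:
    """Summarize get_rt_speedup() using the 1.0 threshold described above."""
    speedup = session.get_rt_speedup()
    if speedup > 1.0:
        return f"falling behind: audio sped up {speedup:.2f}x, accuracy may drop"
    return f"keeping up (speedup {speedup:.2f})"
```

Polling this periodically from the thread that feeds audio is one way to surface overload to the user before ERROR_CANT_KEEP_UP results appear.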
class Token (token)
-
A token may be a single letter, a word chunk, an entire word, punctuation, or any other arbitrary sequence of characters.
To convert a list of tokens to a string, concatenate the strings from each token. You don't need to add spaces between tokens because the tokens contain their own formatting.
Each token also carries its log probability and a boolean denoting whether it is a word boundary. In English, the word boundary value is equivalent to checking whether the first character is a space. Tokens additionally expose a sentence-end flag and a timestamp.
    class Token:
        """
        A token may be a single letter, a word chunk, an entire word,
        punctuation, or other arbitrary set of characters.

        To convert a token array to a string, simply concatenate the strings
        from each token. You don't need to add spaces between tokens, the tokens
        contain their own formatting.

        Tokens also contain the log probability, and a boolean denoting whether
        or not it's a word boundary. In English, the word boundary value is
        equivalent to checking if the first character is a space.
        """

        token: str = ""
        logprob: float = 0.0
        word_boundary: bool = False
        sentence_end: bool = False
        time: float = 0.0

        def __init__(self, token):
            self.token = token.token.decode("utf-8")
            self.logprob = token.logprob
            self.word_boundary = (token.flags.value & 1) != 0
            self.sentence_end = (token.flags.value & 2) != 0
            self.time = float(token.time_ms) / 1000.0
Class variables
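var logprob : float
-
Log probability of the token.
var sentence_end : bool
-
True if the token is flagged as ending a sentence.
var time : float
-
Token timestamp in seconds, converted from the underlying millisecond value.
var token : str
-
The token text, decoded as UTF-8.
var word_boundary : bool
-
True if the token is flagged as a word boundary.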
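The token fields can be combined to rebuild text, split it into words, or estimate confidence. The sketch below illustrates this with a stand-in class. `FakeToken` and the three helper functions are hypothetical names, and treating logprob as a natural logarithm is an assumption that the documentation does not confirm.

```python
import math
from dataclasses import dataclass
from typing import Any, List


@dataclass
class FakeToken:
    """Stand-in with the same fields as Token, for running the sketch offline."""
    token: str
    logprob: float = 0.0
    word_boundary: bool = False
    sentence_end: bool = False
    time: float = 0.0


def tokens_to_text(tokens: List[Any]) -> str:
    """Concatenate token strings; tokens carry their own spacing."""
    return "".join(t.token for t in tokens)


def tokens_to_words(tokens: List[Any]) -> List[str]:
    """Group tokens into words, starting a new word at each word boundary."""
    words: List[str] = []
    for t in tokens:
        if t.word_boundary or not words:
            words.append(t.token.strip())
        else:
            words[-1] += t.token
    return words


def mean_confidence(tokens: List[Any]) -> float:
    """Average per-token probability, assuming logprob is a natural log."""
    if not tokens:
        return 0.0
    return sum(math.exp(t.logprob) for t in tokens) / len(tokens)
```

The helpers only read attributes, so they accept real Token objects from a handler as well as the stand-in.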