POST /tasks/modal/stt
Transcribe Audio via Modal Whisper ASR
curl --request POST \
  --url https://api.sunbird.ai/tasks/modal/stt \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: multipart/form-data' \
  --form audio='@example-file' \
  --form 'language=<string>'
{
  "audio_transcription": "<string>",
  "diarization_output": {},
  "formatted_diarization_output": "<string>",
  "audio_transcription_id": 123,
  "audio_url": "<string>",
  "language": "<string>",
  "was_audio_trimmed": false,
  "original_duration_minutes": 123
}

Authorizations

Authorization
string
header
required

The access token received from the authorization server in the OAuth 2.0 flow, sent as a bearer token in the form `Authorization: Bearer <token>`.

Body

multipart/form-data
audio
file
required

Audio file to transcribe

language
string | null

Optional language hint that can improve transcription quality, especially for local languages. Accepts a 3-letter ISO 639-2 code (e.g. 'eng', 'lug') or a full language name (e.g. 'english', 'luganda'). If omitted, the model auto-detects the language.
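The request described above can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the helper name `build_stt_request`, the `application/octet-stream` part type, and the placeholder token are assumptions. The URL, the `audio` and `language` field names, and the bearer header come from the curl example. The function only builds the request; nothing is sent until it is passed to `urllib.request.urlopen`.

```python
import urllib.request
import uuid


def build_stt_request(audio_bytes, filename, token, language=None,
                      url="https://api.sunbird.ai/tasks/modal/stt"):
    """Build (but do not send) a multipart/form-data POST for the STT endpoint.

    Hypothetical helper: the name and signature are illustrative only.
    """
    boundary = uuid.uuid4().hex
    parts = []

    # Optional text field. It is left out entirely when no hint is given,
    # which per the docs lets the model auto-detect the language.
    if language is not None:
        header = (
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="language"\r\n\r\n'
        )
        parts.append(header.encode("utf-8") + language.encode("utf-8") + b"\r\n")

    # Required file field. The part content type is an assumption;
    # the docs only say "Audio file to transcribe".
    header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="audio"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    parts.append(header.encode("utf-8") + audio_bytes + b"\r\n")

    # Closing boundary marks the end of the multipart body.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))

    return urllib.request.Request(
        url,
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send it, pass the returned object to `urllib.request.urlopen(...)` and decode the JSON body of the response. Supplying `language="lug"` or `language="luganda"` adds the hint; omitting the argument leaves the field out of the form.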

Response

Successful Response

Response model for speech-to-text transcription results.

This model represents the output of a speech-to-text (STT) request, including the transcribed text, speaker diarization data, and metadata.

Attributes:
audio_transcription: The transcribed text from the audio.
diarization_output: Speaker diarization data as a dictionary.
formatted_diarization_output: Human-readable diarization output.
audio_transcription_id: Database ID of the saved transcription.
audio_url: URL or path to the processed audio file.
language: The language code used for transcription.
was_audio_trimmed: Whether the audio was trimmed to max duration.
original_duration_minutes: Original duration if audio was trimmed.

audio_transcription
string | null

The transcribed text from the audio

diarization_output
Diarization Output · object

Speaker diarization data

formatted_diarization_output
string | null

Human-readable diarization output

audio_transcription_id
integer | null

Database ID of the saved transcription

audio_url
string | null

URL or path to the processed audio file

language
string | null

The language code used for transcription

was_audio_trimmed
boolean | null
default:false

Whether the audio was trimmed to the maximum allowed duration

original_duration_minutes
number | null

Original duration in minutes if audio was trimmed
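Because almost every response field is nullable, a client may want to map the JSON onto a typed structure with safe defaults. The sketch below is one way to do that. The `SttResult` class and `parse_stt_response` function are hypothetical names, not part of any Sunbird SDK; the field names, types, and the `false` default for `was_audio_trimmed` follow the response schema listed above.

```python
import json
from dataclasses import dataclass, field, fields
from typing import Any, Dict, Optional


@dataclass
class SttResult:
    """Illustrative container mirroring the documented response fields."""
    audio_transcription: Optional[str] = None
    diarization_output: Dict[str, Any] = field(default_factory=dict)
    formatted_diarization_output: Optional[str] = None
    audio_transcription_id: Optional[int] = None
    audio_url: Optional[str] = None
    language: Optional[str] = None
    was_audio_trimmed: bool = False  # documented default
    original_duration_minutes: Optional[float] = None


def parse_stt_response(payload: str) -> SttResult:
    """Parse a JSON response body, treating null or missing fields as defaults."""
    data = json.loads(payload)
    # Keep only documented fields with non-null values; unknown keys are ignored.
    known = {
        f.name: data[f.name]
        for f in fields(SttResult)
        if data.get(f.name) is not None
    }
    return SttResult(**known)
```

A caller can then check `result.was_audio_trimmed` and, when it is true, read `result.original_duration_minutes` to learn how long the uploaded audio was before trimming.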