Transcribe Audio via Modal Whisper ASR

POST /tasks/modal/stt
Example request:

```shell
curl --request POST \
  --url https://api.sunbird.ai/tasks/modal/stt \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: multipart/form-data' \
  --form audio='@example-file' \
  --form 'language=<string>'
```
Example response:

```json
{
  "audio_transcription": "<string>",
  "diarization_output": {},
  "formatted_diarization_output": "<string>",
  "audio_transcription_id": 123,
  "audio_url": "<string>",
  "language": "<string>",
  "was_audio_trimmed": false,
  "original_duration_minutes": 123
}
```

Authorizations

Authorization (string, header, required)

The access token received from the authorization server in the OAuth 2.0 flow, sent as a bearer token: `Authorization: Bearer <token>`.

Body (multipart/form-data)

audio (file, required)

Audio file to transcribe.

language (string | null, optional)

Optional language hint that improves transcription accuracy, especially for local languages. Accepts a three-letter ISO 639-2 code (e.g. 'eng', 'lug') or a full language name (e.g. 'english', 'luganda'). If omitted, the model auto-detects the language.
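For clients that prefer not to shell out to curl, the request can also be assembled with the Python standard library alone. This is a minimal sketch, not an official client: the helper names `build_stt_request` and `transcribe` are hypothetical, and sending the file with the generic `application/octet-stream` part type is an assumption. The endpoint URL, the `audio` and `language` form fields, and the bearer header come from this page. Leaving `language` as `None` omits the field so the model auto-detects the language.

```python
import json
import uuid
import urllib.request
from pathlib import Path
from typing import Optional

STT_URL = "https://api.sunbird.ai/tasks/modal/stt"


def build_stt_request(
    audio_bytes: bytes,
    filename: str,
    token: str,
    language: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST for the STT endpoint."""
    boundary = uuid.uuid4().hex
    parts = []

    # Required file field "audio". The octet-stream part type is an assumption.
    parts.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="audio"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode()
        + audio_bytes
        + b"\r\n"
    )

    # Optional text field "language" (e.g. "lug" or "luganda"); omitted when None.
    if language is not None:
        parts.append(
            (
                f"--{boundary}\r\n"
                'Content-Disposition: form-data; name="language"\r\n\r\n'
                f"{language}\r\n"
            ).encode()
        )

    # Closing boundary marks the end of the multipart body.
    body = b"".join(parts) + f"--{boundary}--\r\n".encode()
    return urllib.request.Request(
        STT_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def transcribe(path: str, token: str, language: Optional[str] = None) -> dict:
    """Send the request and return the decoded JSON response body."""
    audio_path = Path(path)
    req = build_stt_request(audio_path.read_bytes(), audio_path.name, token, language)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage, with a placeholder file and token: `result = transcribe("interview.wav", token="<token>", language="lug")`, then read `result["audio_transcription"]`. Splitting request construction from sending keeps the multipart encoding inspectable without a network call.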

Response

Successful Response

Response model for speech-to-text transcription results.

This model represents the output of an STT transcription request, including the transcribed text, speaker diarization data, and metadata.

Attributes:
- audio_transcription: The transcribed text from the audio.
- diarization_output: Speaker diarization data as a dictionary.
- formatted_diarization_output: Human-readable diarization output.
- audio_transcription_id: Database ID of the saved transcription.
- audio_url: URL or path to the processed audio file.
- language: The language code used for transcription.
- was_audio_trimmed: Whether the audio was trimmed to the maximum duration.
- original_duration_minutes: Original duration in minutes, if the audio was trimmed.
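On the client side, these attributes can be mirrored in a typed container. The sketch below is a hypothetical `STTResponse` dataclass, not the server's actual model: field names, nullability, and the `was_audio_trimmed` default of `false` follow this page, while `diarization_output` is kept as a raw dictionary because its internal structure is not documented here.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict, Optional


@dataclass
class STTResponse:
    """Client-side mirror of the documented STT response attributes (hypothetical)."""

    audio_transcription: Optional[str] = None
    # Structure of the diarization object is not documented, so keep it untyped.
    diarization_output: Dict[str, Any] = field(default_factory=dict)
    formatted_diarization_output: Optional[str] = None
    audio_transcription_id: Optional[int] = None
    audio_url: Optional[str] = None
    language: Optional[str] = None
    was_audio_trimmed: Optional[bool] = False  # documented default: false
    original_duration_minutes: Optional[float] = None

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "STTResponse":
        # Keep only known keys so extra server-side fields do not break parsing.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})
```

Filtering to known keys is a design choice for forward compatibility; a stricter client could reject unexpected fields instead.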

audio_transcription (string | null)

The transcribed text from the audio.

diarization_output (object)

Speaker diarization data.

formatted_diarization_output (string | null)

Human-readable diarization output.

audio_transcription_id (integer | null)

Database ID of the saved transcription.

audio_url (string | null)

URL or path to the processed audio file.

language (string | null)

The language code used for transcription.

was_audio_trimmed (boolean | null, default: false)

Whether the audio was trimmed to the maximum duration.

original_duration_minutes (number | null)

Original duration of the audio in minutes, if the audio was trimmed.
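Because a trimmed upload presumably means the transcription covers only part of the recording, a client may want to surface that to users. The helper below is a hypothetical sketch operating on the decoded JSON body as a plain dictionary; it treats a missing or null `was_audio_trimmed` as "not trimmed" and uses `original_duration_minutes` only when it is present.

```python
from typing import Any, Dict, Optional


def trimming_notice(response: Dict[str, Any]) -> Optional[str]:
    """Return a warning string if the service trimmed the audio, otherwise None.

    Hypothetical helper; assumes `response` is the decoded JSON of a successful call.
    """
    # Missing, null, or false all mean the audio was not trimmed.
    if not response.get("was_audio_trimmed"):
        return None
    minutes = response.get("original_duration_minutes")
    if minutes is None:
        return "Audio was trimmed; the transcription may cover only part of the recording."
    # ":g" drops trailing zeros, so 12.0 prints as "12" and 12.5 as "12.5".
    return (
        f"Audio was trimmed from {minutes:g} minutes; "
        "the transcription may cover only the retained portion."
    )
```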