Upload an audio file and get a transcription from the Modal-hosted Whisper model. Optionally specify a language to improve accuracy.
The access token received from the authorization server in the OAuth 2.0 flow.
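As a rough illustration of how a client might call this endpoint, the sketch below uploads a file with the OAuth 2.0 access token attached. Several details are assumptions, not taken from this reference: the endpoint URL (`STT_ENDPOINT` is a placeholder), the `Bearer` authorization scheme, the multipart field name `audio`, and the form field name `language`. Check the actual request schema before relying on any of them.

```python
from typing import Dict, Optional

# Hypothetical placeholder: the real endpoint URL is not given in this reference.
STT_ENDPOINT = "https://api.example.com/stt"


def build_auth_headers(access_token: str) -> Dict[str, str]:
    """Build the Authorization header (assumes the Bearer scheme)."""
    return {"Authorization": f"Bearer {access_token}"}


def build_form_fields(language: Optional[str] = None) -> Dict[str, str]:
    """Build the optional form fields; 'language' is an assumed field name."""
    return {"language": language} if language else {}


def transcribe(audio_path: str, access_token: str,
               language: Optional[str] = None) -> dict:
    """Upload an audio file and return the parsed JSON response."""
    # Third-party dependency, imported lazily so the helpers above
    # can be used and tested without it installed.
    import requests

    with open(audio_path, "rb") as audio_file:
        response = requests.post(
            STT_ENDPOINT,
            headers=build_auth_headers(access_token),
            files={"audio": audio_file},  # assumed upload field name
            data=build_form_fields(language),
            timeout=300,  # transcription of long audio can be slow
        )
    response.raise_for_status()
    return response.json()
```

Omitting `language` sends no language field at all, which leaves the choice to the server.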
Successful Response
Response model for speech-to-text transcription results.
This model represents the output of an STT transcription request, including the transcribed text, diarization data, and metadata.
Attributes:
    audio_transcription: The transcribed text from the audio.
    diarization_output: Speaker diarization data as a dictionary.
    formatted_diarization_output: Human-readable diarization output.
    audio_transcription_id: Database ID of the saved transcription.
    audio_url: URL or path to the processed audio file.
    language: The language code used for transcription.
    was_audio_trimmed: Whether the audio was trimmed to max duration.
    original_duration_minutes: Original duration if audio was trimmed.
The transcribed text from the audio
Speaker diarization data
Human-readable diarization output
Database ID of the saved transcription
URL or path to the processed audio file
The language code used for transcription
Whether the audio was trimmed to max duration
Original duration in minutes if audio was trimmed
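The response fields described above could be mirrored on the client side roughly as follows. This is a minimal sketch, not the service's actual model: the class name `STTTranscriptionResponse`, the Python types, and the defaults are assumptions (for example, the reference does not say whether `audio_transcription_id` is an integer or a string, or which fields may be null). Only the field names come from this reference.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class STTTranscriptionResponse:  # hypothetical client-side class name
    audio_transcription: Optional[str] = None
    diarization_output: Optional[Dict[str, Any]] = None
    formatted_diarization_output: Optional[str] = None
    audio_transcription_id: Optional[Any] = None  # ID type is not documented
    audio_url: Optional[str] = None
    language: Optional[str] = None
    was_audio_trimmed: bool = False  # assumed default
    original_duration_minutes: Optional[float] = None

    @classmethod
    def from_json(cls, payload: Dict[str, Any]) -> "STTTranscriptionResponse":
        """Build an instance from a decoded JSON body, ignoring unknown keys."""
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in payload.items() if k in known})


# Illustrative payload with made-up values, not real API output.
sample = {
    "audio_transcription": "hello world",
    "language": "en",
    "was_audio_trimmed": True,
    "original_duration_minutes": 42.5,
    "unexpected_field": "ignored",
}
result = STTTranscriptionResponse.from_json(sample)
if result.was_audio_trimmed:
    print(f"Trimmed from {result.original_duration_minutes} minutes")
```

Ignoring unknown keys keeps the client tolerant of fields the server may add later; a stricter client could raise on them instead.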