This skill converts speech to text in 23 Indian languages, with output modes such as transcribe and translate. It is designed for developers who need speech recognition in their applications.
$ npx skills add https://github.com/sarvamai/skills --skill speech-to-text
This skill transcribes audio to text using Sarvam AI's Saaras v3 model, supporting 23 Indian languages with automatic language detection. It offers five output modes: transcribe, translate, verbatim, transliteration, and code-mixed text. Developers can choose between the REST API (up to 30 seconds of audio), WebSocket streaming (up to 8 hours), or the Batch API with speaker diarization for longer audio files. The skill is well suited to voice-enabled applications, meeting transcription systems, and voice interfaces that require accurate speech recognition across Indian languages.
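The three access paths described above can be told apart with a small routing helper. The following is a minimal sketch, not part of the skill or the Sarvam AI client: the function name `choose_api`, its parameters, and the returned labels are hypothetical, and the only facts taken from the description are the 30-second REST limit, the 8-hour streaming limit, and that speaker diarization belongs to the Batch API.

```python
# Limits quoted in the skill description above.
REST_MAX_SECONDS = 30
STREAMING_MAX_SECONDS = 8 * 60 * 60

# Output modes as named in the description; the actual API parameter
# values may be spelled differently (assumption, not verified here).
OUTPUT_MODES = {"transcribe", "translate", "verbatim", "transliteration", "code-mixed"}


def choose_api(duration_seconds: float, live: bool = False,
               need_diarization: bool = False) -> str:
    """Return a hypothetical label ("rest", "websocket", "batch") for the API to use."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    if live:
        # Live audio goes over WebSocket streaming, within the 8-hour limit.
        if duration_seconds > STREAMING_MAX_SECONDS:
            raise ValueError("streaming sessions are limited to 8 hours")
        return "websocket"
    if need_diarization:
        # Speaker diarization is described only for the Batch API.
        return "batch"
    if duration_seconds <= REST_MAX_SECONDS:
        return "rest"
    return "batch"


if __name__ == "__main__":
    print(choose_api(12))                          # short clip
    print(choose_api(45 * 60))                     # recorded meeting
    print(choose_api(10 * 60, live=True))          # live voice interface
```

Keeping the limits as named constants makes them easy to update if the service changes them.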
Install via command line and integrate using the Sarvam AI Python client.
Transcribing meetings or lectures into text
Translating spoken content for multilingual audiences
Creating subtitles or captions for videos
Building voice-enabled applications for accessibility
Copy the install command above and run it in your terminal, or clone the repository with git clone https://github.com/sarvamai/skills.
Launch Claude Code, Cursor, or your preferred AI coding agent.
Use the prompt template or examples below to test the skill.
Adapt the skill to your specific use case and workflow.
Convert the following [LANGUAGE] audio into text. Use the [OUTPUT_MODE] mode to [TRANSCRIBE/TRANSLATE] the content. Ensure accuracy for technical terms in [INDUSTRY]. Audio file: [UPLOAD_AUDIO_FILE_LINK].
### Transcription & Translation Results

**Original Language:** Spanish (es-ES)

**Transcription:**
> "Hola equipo, hoy revisaremos los datos de ventas del Q2 en [COMPANY]. Necesitamos identificar las tendencias clave en el mercado europeo, especialmente en Alemania e Italia. También compararemos estos resultados con los del Q1 para evaluar nuestro progreso."

**Translated to English:**
> "Hi team, today we’ll review Q2 sales data for [COMPANY]. We need to identify key trends in the European market, particularly in Germany and Italy. We’ll also compare these results with Q1 to assess our progress."

**Confidence Score:** 98%
**Detected Industry:** Retail/E-commerce
**Speaker Count:** 1 (male, 30s)

*Notes:*
- Audio was clear with minimal background noise.
- Technical terms (e.g., "Q2", "mercado europeo") were accurately transcribed.
- Translation prioritized clarity over literal word-for-word accuracy.
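The bracketed placeholders in the prompt template can be filled programmatically before the prompt is sent to an agent. This is a minimal sketch using only the Python standard library; the helper name `fill_template`, the sample values, and the `example.com` audio link are hypothetical and chosen purely for illustration.

```python
import re

# The prompt template from this page, with its bracketed placeholders.
TEMPLATE = (
    "Convert the following [LANGUAGE] audio into text. "
    "Use the [OUTPUT_MODE] mode to [TRANSCRIBE/TRANSLATE] the content. "
    "Ensure accuracy for technical terms in [INDUSTRY]. "
    "Audio file: [UPLOAD_AUDIO_FILE_LINK]."
)


def fill_template(template: str, values: dict) -> str:
    """Replace each [PLACEHOLDER] with its value; fail loudly if one is missing."""
    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value supplied for placeholder [{key}]")
        return values[key]

    # Placeholders are upper-case words that may contain "_" or "/".
    return re.sub(r"\[([A-Z_/]+)\]", substitute, template)


if __name__ == "__main__":
    prompt = fill_template(TEMPLATE, {
        "LANGUAGE": "Hindi",
        "OUTPUT_MODE": "translate",
        "TRANSCRIBE/TRANSLATE": "translate",
        "INDUSTRY": "healthcare",
        "UPLOAD_AUDIO_FILE_LINK": "https://example.com/meeting.wav",  # hypothetical link
    })
    print(prompt)
```

Raising on a missing placeholder, rather than leaving the bracket in place, avoids sending a half-filled prompt to the agent.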