Back to Projectsexperimental

Voice Note Transcriber

An experimental tool that converts voice memos into structured notes with AI-powered summarization and action item extraction.

October 20, 2023
2 min read
Aaron M Sabu
WhisperPythonFastAPIReactOpenAI

Overview

Voice Note Transcriber is an experiment in turning spoken thoughts into organized, actionable notes.

The Experiment

I often record voice memos with ideas, meeting notes, or random thoughts. The problem? They pile up and never get processed. This tool aims to:

  1. Transcribe voice recordings accurately
  2. Structure the content into organized notes
  3. Extract action items and key points
  4. Summarize for quick review

How It Works

1. Transcription

Using OpenAIs Whisper model for accurate speech-to-text:

import whisper

model = whisper.load_model("base")
result = model.transcribe("voice_memo.mp3")
transcript = result["text"]

2. Processing

The transcript is then processed by GPT-4 to:

  • Correct transcription errors based on context
  • Add punctuation and formatting
  • Identify speakers (if multiple)

3. Structuring

The AI organizes content into:

## Summary
Brief overview of the main points

## Key Points
- Point 1
- Point 2
- Point 3

## Action Items
- [ ] Task extracted from the recording
- [ ] Another task

## Raw Transcript
Full transcription for reference

Technical Challenges

Audio Quality

Voice memos are often recorded in noisy environments. Solutions:

  • Noise reduction preprocessing
  • Multiple transcription passes
  • Confidence scoring for uncertain words

Context Understanding

Spoken language is different from written:

  • Filler words ("um", "uh")
  • Incomplete sentences
  • Topic jumping

The AI needs to clean this up while preserving meaning.

Current Status

This is an ongoing experiment. Current capabilities:

  • Transcription accuracy: ~95% for clear audio
  • Structure quality: Good for meeting notes, improving for brainstorms
  • Processing time: ~30 seconds for a 5-minute recording

Future Ideas

  • Real-time transcription
  • Mobile app with one-tap recording
  • Integration with note-taking apps (Notion, Obsidian)
  • Speaker identification for meetings

What Im Learning

  • Speech-to-text has come incredibly far
  • The gap between transcription and understanding is where AI shines
  • Voice interfaces are underutilized for productivity