Tag: open-source

  • I’ve nearly completed a transcription publishing pipeline I’ve wanted since 2005

    Audio in. Text out. Publishing only when I say so — all on my own machine. Image made with AI.

    I’ve always wanted a transcription machine because for years, typing has been a bottleneck.

    Not thinking.
    Not clarity.
    Not ideas.

    Typing.

    Back when the first iPhones came out, I had a simple wish:

    let me talk, and let my words appear in my blog.

    At the time, that was fantasy. Speech recognition existed, but only in research labs, big companies, or cloud services that didn’t really work well and definitely weren’t private. I moved on, kept typing, and learned to live with the speed limit.


    Fast-forward to now.

    Modern hardware.

    Local machine learning.

    Open models.

    Enough computing power sitting on my desk to do what used to require a lab.

    So I finally did it.

    I built a fully local transcription, voice cloning, and publishing pipeline on my own laptop. No cloud inference. No subscriptions. No dashboards. No usage caps. No data leaving my machine.

    My intellectual property never leaves my machine unless I explicitly choose it.

    That constraint mattered more than the tech itself.


    What I wanted (and what I refused)

    I didn’t want:

    • another AI subscription
    • another web interface
    • another service asking me to “upgrade” my own brain
    • another place my raw thoughts were stored on someone else’s servers

    I wanted:

    • text → audio
    • audio → text
    • both directions
    • locally
    • for free
    • automated, but only when I asked for it


    The tool I built

    At a high level, the system now does two things:

    1. Transcription
      • I record audio
      • Drop it in a folder
      • Whisper runs locally on Apple Silicon using Metal
      • Clean, readable text appears
      • Optional publishing happens only if I explicitly speak intent
    2. Voice synthesis
      • I provide my own voice reference
      • Text files dropped into a folder become .m4a files
      • The voice is mine
      • The processing is local
      • The output is mine to keep or discard

    No GPU calls inside Python ML stacks.

    No fragile cloud dependencies.

    No long-running services pretending to be “magic.”

    Just files, folders, and clear contracts.
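
    To make “files, folders, and clear contracts” concrete, here is a minimal dispatch sketch in Python. The folder names, extensions, and handler stubs are assumptions for illustration rather than my exact layout; the native transcription call itself is covered in the follow-up post below.

        from pathlib import Path

        # Assumed drop folders; the real pipeline follows the same idea: one folder per direction.
        AUDIO_IN = Path.home() / "Pipeline" / "audio-in"   # audio dropped here becomes text
        TEXT_IN = Path.home() / "Pipeline" / "text-in"     # text dropped here becomes .m4a audio

        AUDIO_EXTS = {".wav", ".m4a", ".mp3"}

        def transcribe(path: Path) -> None:
            """Placeholder: audio -> text via a local Whisper build, no cloud and no Python ML stack."""
            print(f"would transcribe {path.name}")

        def synthesize(path: Path) -> None:
            """Placeholder: text -> .m4a via a local voice model built from my own reference audio."""
            print(f"would synthesize {path.name}")

        def handle(path: Path) -> None:
            """Route a dropped file to the right local tool based on its extension."""
            if path.suffix.lower() in AUDIO_EXTS:
                transcribe(path)
            elif path.suffix.lower() == ".txt":
                synthesize(path)
            # anything else is ignored: the contract is simply known file types in, files out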


    Why this is finally possible

    In 2008, this idea simply wasn’t realistic.

    Speech models weren’t good enough. Hardware wasn’t accessible. Tooling didn’t exist outside academic circles.

    Today, it is.

    Not because of one model or one framework, but because the ecosystem finally matured:

    • open speech models
    • commodity GPUs
    • local inference
    • better system-level tooling

    This is the kind of problem that’s only solvable now.


    What this unlocks for me

    I can think out loud without restraint.

    I can write at the speed of thought.

    I can turn raw thinking into drafts without ceremony.

    And I can do it knowing:

    • my data stays local
    • my voice is mine
    • my process is under my control

    This isn’t a product (yet).

    It’s a personal tool.

    But it’s also a case study in how I approach problems:

    constraints first, workflow second, technology last.

    If you’re curious how it works in detail, I’ve written more about the architecture and tradeoffs here:

    👉 My Local Transcription Pipeline

    More soon.

  • AI helped me develop a free solution to a real problem: using AI in 2025

    Stories about art, mental health and more. Art by Bert with AI.

    Hours Working With AI, Dozens of Dead Ends, and Discovering the Right Way to Do Whisper on macOS.

    Sometimes progress doesn’t feel like progress. It feels like friction, wrong turns, and the quiet realization that the thing you’re trying to force is never going to cooperate.

    This was one of those stretches.

    In roughly six hours of real human time, I managed to:

    • Diagnose Python version incompatibilities
    • Run headlong into PEP 668 and Homebrew’s “externally managed” rules
    • Set up and tear down multiple virtual environments
    • Confirm GPU availability on Apple Silicon
    • Discover numerical instability with MPS-backed PyTorch inference
    • Identify backend limitations in popular Python ML stacks
    • Switch architectures entirely
    • And finally land on the correct long-term solution

    Not the “it works on my machine” solution. The durable one.

    This post is about how I got there, and more importantly, what changed in my thinking along the way.


    The Trap: Python Everywhere, All the Time

    My first instinct was predictable. Whisper transcription? Python.

    Faster-Whisper. Torch. MPS. Virtual environments. Requirements files.

    And to be fair, that path mostly works. Until it doesn’t.

    On macOS with Apple Silicon, Python ML stacks sit at an awkward intersection:

    • PyTorch supports MPS, but not all models behave well
    • Some backends silently fall back to CPU
    • Others appear to run on GPU while producing NaNs
    • Version pinning becomes a minefield
    • One Homebrew update can break everything

    You only find out after you’ve already invested time and energy trying to stabilize a system that fundamentally does not want to be stable.
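
    If you’re fighting the same battle, a small sanity check like this (standard PyTorch calls) at least tells you whether MPS is present and whether a trivial computation on it stays finite; the exact failure mode will vary by model and PyTorch version.

        import torch

        # Is the Metal (MPS) backend even present in this build of PyTorch?
        print("MPS built:    ", torch.backends.mps.is_built())
        print("MPS available:", torch.backends.mps.is_available())

        # Some models appear to run on MPS while quietly producing NaNs.
        # A tiny smoke test catches the obvious cases before you blame the model.
        if torch.backends.mps.is_available():
            x = torch.randn(64, 64, device="mps")
            y = torch.nn.functional.softmax(x @ x.T, dim=-1)
            print("NaNs on MPS:  ", bool(torch.isnan(y).any()))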

    That’s when the signal finally cut through the noise.


    The Bigger Takeaway (This Is the Real Value)

    I learned a durable rule that I’ll carry forward: keep Python for glue and orchestration, and push the heavy, GPU-bound work into native code the platform actually supports.

    This isn’t an anti-Python stance. It’s about choosing the right tool for the job.

    Python remains excellent for:

    • Glue code
    • Orchestration
    • Text processing
    • Automation
    • Pipelines that coordinate other tools

    But it is not ideal for:

    • Long-running GPU inference
    • Fire-and-forget background jobs
    • Stability-critical systems
    • Workflows that should survive OS upgrades untouched


    The Pivot: Native Whisper, Native Metal

    Once I stopped asking “How do I make Python behave?” and instead asked “What does macOS want me to do?”, the solution became obvious.

    whisper.cpp.

    A native implementation of Whisper, compiled directly for Apple Silicon, using Metal properly. No Python ML runtime. No torch. No MPS heuristics. No dependency roulette.

    Just:

    • A native binary
    • A Metal backend
    • Predictable performance
    • Deterministic behavior

    I rebuilt the system around that assumption instead of fighting it.
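
    Concretely, the transcription step collapses to a single native process call, with Python reduced to glue. A minimal sketch, assuming a locally built whisper.cpp binary and a downloaded ggml model; the flags shown come from the whisper.cpp CLI, though names and paths can shift between versions:

        import subprocess
        from pathlib import Path

        # Assumed locations of a locally built whisper.cpp binary and a ggml model file.
        WHISPER = Path.home() / "whisper.cpp" / "main"
        MODEL = Path.home() / "whisper.cpp" / "models" / "ggml-base.en.bin"

        def transcribe(wav: Path, out_dir: Path) -> Path:
            """Transcribe one 16 kHz WAV file with whisper.cpp and return the .txt path."""
            out_base = out_dir / wav.stem
            subprocess.run(
                [
                    str(WHISPER),
                    "-m", str(MODEL),      # model to load
                    "-f", str(wav),        # input audio (whisper.cpp expects 16 kHz WAV)
                    "-otxt",               # write plain-text output
                    "-of", str(out_base),  # output base name; whisper.cpp appends .txt
                ],
                check=True,
            )
            return out_base.with_suffix(".txt")

    No torch import anywhere. Python only launches a native binary and collects the output file, which is exactly the division of labor described above.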


    What I Ended Up With (And Why It Matters)

    The final system is intentionally boring. That’s the highest compliment I can give it.

    I now have:

    • A watch-folder transcription system
    • Using a native Metal GPU backend
    • With zero Python ML dependencies
    • Fully automated
    • Crash-resistant
    • macOS-appropriate
    • Future-proof

    Audio files dropped into a folder get picked up, moved, transcribed, logged, and written out without intervention. Python still exists in the system, but only as glue and orchestration. The heavy lifting happens where it belongs: in native code.
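
    A simplified version of that loop, reusing the transcribe() helper sketched earlier, looks something like this. The folder names and polling interval are illustrative; the repository linked below is the authoritative version.

        import logging
        import shutil
        import time
        from pathlib import Path

        BASE = Path.home() / "Transcribe"
        INBOX, WORK, DONE = BASE / "inbox", BASE / "processing", BASE / "done"
        for d in (INBOX, WORK, DONE):
            d.mkdir(parents=True, exist_ok=True)

        logging.basicConfig(filename=str(BASE / "pipeline.log"),
                            level=logging.INFO, format="%(asctime)s %(message)s")

        def watch() -> None:
            """Poll the inbox, claim each audio file, transcribe it, and log the result."""
            while True:
                for audio in sorted(INBOX.glob("*.wav")):
                    working = WORK / audio.name
                    shutil.move(str(audio), str(working))    # claim the file so nothing touches it twice
                    text = transcribe(working, DONE)         # native whisper.cpp call from the sketch above
                    shutil.move(str(working), str(DONE / working.name))
                    logging.info("transcribed %s -> %s", working.name, text.name)
                time.sleep(5)                                # plain polling, no long-running daemons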

    This is the setup people usually arrive at after months of trial and error. I got there in an afternoon because I stopped trying to be clever and started listening to the platform.


    The Project (If You Want the Source Code)

    The full pipeline is open-sourced here:

    https://github.com/berchman/macos-whisper-metal

    It includes:

    • A Metal-accelerated Whisper backend
    • A folder-watching automation script
    • Clear documentation
    • A frozen, reproducible system state
    • No hidden magic


    The Real Lesson

    This wasn’t about Whisper.

    It was about recognizing when a stack is fighting you instead of supporting you.

    About knowing when to stop patching and switch architectures entirely.

    About respecting the grain of the operating system instead of sanding against it.

    The tools we choose shape not just our code, but our cognitive load. The right architecture doesn’t just run faster. It lets you stop thinking about it. And sometimes, that’s the whole point.

    Be well.