    Audio Transcribed into WordPress Draft; Completely Private

    Privacy wasn’t a feature — it was a constraint

    The original idea didn’t start as “an AI project.” It started as a very specific itch I’ve had since 2007/08, right when smartphones began making it effortless to record audio.

    Back then, I was already deep inside the WordPress ecosystem, even custom-coding templates. And I had a simple wish: let me think out loud, then have the text show up in my blog.

    Not because I love transcription.
    Because I love unrestrained thinking.

    Typing is a speed limit.
    Speaking is closer to the velocity of thought.

    When an idea is moving fast, typing becomes friction, and friction becomes loss. So the dream was: record the thought while it’s alive… then let it become editable text later, when I’m calm and focused.

    That idea wasn’t really solvable for regular people at the time.

    Speech-to-text existed, but not at this level, not locally, and not with the kind of reliability you’d trust for a real workflow. If you had access to a lab-grade setup in the late 2000s, you might have been able to stitch something together. Most of us didn’t have that. I definitely didn’t.

    Fast-forward to now: Apple Silicon is absurdly capable, Whisper-class transcription is accessible, and “local-first” tooling has finally caught up with what I was after 15+ years ago.

    And that’s where this project actually begins.


    Free mattered more than anything

    [Image: a human brain connected to a software dashboard labeled “Premium Mode.” Caption: I don’t want a dashboard telling me my brain is now in “premium mode.”]

    I wanted this to be free to run forever.

    Every cloud service I tried eventually turned into the same contract:

    • free minutes (daily/weekly/monthly)
    • hit the ceiling
    • pay to keep going

    I’m not morally opposed to paying for good tools.

    But I knew I’d burn through limits fast because I don’t want to ration thinking. If I’m on a roll, I’m on a roll.

    I don’t want a dashboard telling me my brain is now in “premium mode.”

    So the goal became clear:

    Record audio → drop into a folder → get a transcript.
    Optionally: a WordPress draft waiting for me.

    No subscriptions.
    No login loops.
    No cloud inference by default.

    And this line ended up becoming the north star:

    “My intellectual property never leaves my machine unless I explicitly choose it.”

    That’s not paranoia. That’s design.
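
    The “drop into a folder → get a transcript” step can be sketched as a simple scan for recordings that have no transcript yet. This is a minimal illustration, not the actual setup: the function name, the extension list, and the convention of a sibling .txt file marking a recording as done are all assumptions.

```python
from pathlib import Path

# Hypothetical: which file types count as recordings is an assumption.
AUDIO_EXTENSIONS = {".m4a", ".wav", ".mp3"}


def pending_recordings(inbox: Path) -> list[Path]:
    """Return audio files in `inbox` that have no transcript beside them."""
    pending = []
    for path in sorted(inbox.iterdir()):
        if not path.is_file() or path.suffix.lower() not in AUDIO_EXTENSIONS:
            continue
        # Assumed convention: "memo.m4a" is done once "memo.txt" exists.
        if not path.with_suffix(".txt").exists():
            pending.append(path)
    return pending
```

    Everything here is local file inspection. Nothing in the scan touches the network, which fits the “no cloud inference by default” rule.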


    Privacy stayed inside my orbit

    I’m in the Apple ecosystem, which made the privacy model unusually clean.

    The audio starts on my iPhone.

    The processing happens on my MacBook Pro.

    The transfer happens via AirDrop, which keeps the file movement inside my immediate environment.

    The audio doesn’t need to touch a third-party server just to become text.

    That matters for obvious reasons (privacy), but also for less obvious ones (creative freedom). When you’re speaking raw ideas, you’re not just recording words.

    You’re recording unreleased drafts of your thinking.
    That’s intellectual property, even if it’s messy.

    So the system architecture became a kind of promise:

    • Local transcription
    • Local automation
    • Local storage
    • And publishing only happens when I explicitly authorize it

    The real breakthrough: a spoken publishing contract

    The most important part of this system isn’t Whisper. It’s the rule that prevents automation from turning into a runaway machine.

    This is the difference between:

    • Automation that empowers
    • Automation that erodes judgment

    So I designed a “spoken contract” that the system must hear before it does anything beyond transcription.

    A transcript only becomes a WordPress draft if I say both:

    • “Meta note” (or “System note”)
    • “Create blog post” (or “Create a blog post”)

    That’s it. If I don’t say the words, the system stays quiet.
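
    A minimal sketch of that rule, assuming the check runs on the plain transcript text. The function names are hypothetical, and the normalization step (lowercasing and stripping punctuation so that “Meta note:” still matches) is an assumption about how a transcript might need to be cleaned up.

```python
import re

# The two halves of the spoken contract, with their accepted variants.
MARKER_PHRASES = ("meta note", "system note")
COMMAND_PHRASES = ("create blog post", "create a blog post")


def normalize(transcript: str) -> str:
    """Lowercase, turn punctuation into spaces, and collapse whitespace."""
    lowered = transcript.lower()
    no_punctuation = re.sub(r"[^a-z0-9\s]", " ", lowered)
    return " ".join(no_punctuation.split())


def contains_phrase(text: str, phrase: str) -> bool:
    """Match whole words only, so 'meta notebook' does not count."""
    return f" {phrase} " in f" {text} "


def wants_blog_draft(transcript: str) -> bool:
    """True only when BOTH a marker phrase and a command phrase were spoken."""
    text = normalize(transcript)
    has_marker = any(contains_phrase(text, p) for p in MARKER_PHRASES)
    has_command = any(contains_phrase(text, p) for p in COMMAND_PHRASES)
    return has_marker and has_command
```

    The default answer is False. A transcript has to earn a draft by containing both phrases; anything ambiguous stays a plain transcript.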

    That means I can record:

    • personal notes
    • sketch ideas
    • work drafts
    • private reflections

    …and the system will transcribe them, but it won’t send them to WordPress. No accidental posts. No surprises. No “AI guessed what you meant.”

    This is production-grade behavior, not a demo.


    The final stack (and why it’s the right one)

    I started in the Python ecosystem because that’s where most “AI workflow” advice leads. But on macOS, the most durable lesson I learned was this:

    If you want long-running, stable, GPU-accelerated transcription on Apple Silicon, prefer native Metal tooling over Python ML stacks.

    Python is great for:

    • glue
    • orchestration
    • parsing
    • publishing logic

    But it’s not where you want to host GPU inference if your goal is “drop audio and walk away.”

    So the final system has three responsibility layers:

    1. Shell + whisper.cpp: audio → text (Metal GPU, local, stable)
    2. Python (glue only): parse intent + publish to WordPress
    3. Launch Agents (launchd): background lifecycle so it runs automatically

    No ML runtime lives in Python.
    No GPU calls happen outside native code.
    No process depends on another being “just right.”

    That’s how systems survive.
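
    To show how thin the Python glue layer can be, here is a sketch of the publishing half. It assumes the draft is created through the WordPress REST API with an Application Password. That transport is an assumption for illustration, and the function name and parameters are hypothetical. The function only builds the request, so sending it stays a separate, explicit step.

```python
import base64
import json
import urllib.request


def build_draft_request(site_url, username, app_password, title, content):
    """Build (but do not send) a request that creates a WordPress draft."""
    endpoint = site_url.rstrip("/") + "/wp-json/wp/v2/posts"
    # status="draft" keeps the post unpublished until a human reviews it.
    payload = {"title": title, "content": content, "status": "draft"}
    credentials = f"{username}:{app_password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Sending is deliberately left to the caller, for example:
# with urllib.request.urlopen(request, timeout=30) as response:
#     created = json.load(response)
```

    There is no model loading and no GPU call here. The glue layer handles text and HTTP, and the native layer handles audio.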


    What’s next: formatting, tags, and polish

    Now that the pipeline is stable, the remaining work is refinement:

    • timestamps in the transcript (useful for editing)
    • paragraph breaks based on pauses (a conservative starting threshold: 1.5 seconds or longer)
    • a word-count footer in both the transcript and the WordPress draft, which helps when I start editing
    • simple auto-tags based on word frequency (the top 5–7 or so, leaning toward broad concepts but specific when the content and context warrant it)
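
    The refinements above could look roughly like this. It is a sketch under stated assumptions: segments arrive as (start_seconds, end_seconds, text) tuples, the stopword list is a tiny placeholder, and tagging is raw word frequency with no bias toward broad concepts yet. All names are hypothetical.

```python
import re
from collections import Counter

PAUSE_SECONDS = 1.5  # the conservative threshold mentioned in the list
# Placeholder stopword list; a real one would be much longer.
STOPWORDS = {"that", "this", "with", "have", "from", "then", "when", "what"}


def paragraphs_from_segments(segments, pause=PAUSE_SECONDS):
    """Join segment texts, starting a new paragraph after a long pause."""
    paragraphs, current, previous_end = [], [], None
    for start, end, text in segments:
        gap_is_long = previous_end is not None and start - previous_end >= pause
        if gap_is_long and current:
            paragraphs.append(" ".join(current))
            current = []
        current.append(text.strip())
        previous_end = end
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


def with_word_count_footer(paragraphs):
    """Return the body text followed by a word-count line."""
    body = "\n\n".join(paragraphs)
    return f"{body}\n\nWord count: {len(body.split())}"


def frequent_tags(text, limit=5):
    """Return the most frequent words longer than three letters."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if len(w) > 3 and w not in STOPWORDS)
    return [word for word, _ in counts.most_common(limit)]
```

    The pause threshold is a parameter rather than a constant buried in the loop, so it can be tuned against real recordings without touching the logic.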

    None of those features change the heart of the project.

    The heart is still the same thing I wanted in 2007:

    A way to think out loud at full speed… and turn it into text without handing my raw ideas to someone else’s servers.

    And now it finally exists.