Category: Technology

  • I’ve nearly completed a transcription publishing pipeline I’ve wanted since 2005


    Illustration showing audio sound waves transforming into binary data and then into text and media files, representing a local, privacy-first audio-to-text publishing workflow.
    Audio in. Text out. Publishing only when I say so — all on my own machine. Image made with AI.

    I’ve always wanted a transcription machine because for years, typing has been a bottleneck.

    Not thinking.
    Not clarity.
    Not ideas.

    Typing.

    Back when the first iPhones came out, I had a simple wish:

    let me talk, and let my words appear in my blog.

    At the time, that was fantasy. Speech recognition existed, but only in research labs, big companies, or cloud services that didn’t really work well and definitely weren’t private. I moved on, kept typing, and learned to live with the speed limit.


    Fast-forward to now.

    Modern hardware.

    Local machine learning.

    Open models.

    Enough computing power sitting on my desk to do what used to require a lab.

    So I finally did it.

    I built a fully local transcription, voice cloning, and publishing pipeline on my own laptop. No cloud inference. No subscriptions. No dashboards. No usage caps. No data leaving my machine unless I explicitly choose it.

    My intellectual property never leaves my machine unless I explicitly choose it.

    That constraint mattered more than the tech itself.


    What I wanted (and what I refused)

    I didn’t want:

    • another AI subscription
    • another web interface
    • another service asking me to “upgrade” my own brain
    • another place my raw thoughts were stored on someone else’s servers

    I wanted:

    • text → audio
    • audio → text
    • both directions
    • locally
    • for free
    • automated, but only when I asked for it


    The tool I built

    At a high level, the system now does two things:

    1. Transcription
      • I record audio
      • Drop it in a folder
      • Whisper runs locally on Apple Silicon using Metal
      • Clean, readable text appears
      • Optional publishing happens only if I explicitly speak intent
    2. Voice synthesis
      • I provide my own voice reference
      • Text files dropped into a folder become .m4a files
      • The voice is mine
      • The processing is local
      • The output is mine to keep or discard

    No GPU calls inside Python ML stacks.

    No fragile cloud dependencies.

    No long-running services pretending to be “magic.”

    Just files, folders, and clear contracts.
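
    If you want a feel for how small that contract really is, here's a simplified sketch of the transcription step: a thin piece of glue that hands one audio file to a locally built whisper.cpp binary and gets plain text back. The binary name, model, and folder layout below are illustrative stand-ins, not my exact setup.

    ```python
    import subprocess
    from pathlib import Path

    # Illustrative paths; the real locations depend on where whisper.cpp was built
    # and which model was downloaded.
    WHISPER_BIN = Path.home() / "src/whisper.cpp/build/bin/whisper-cli"
    MODEL = Path.home() / "src/whisper.cpp/models/ggml-base.en.bin"

    def transcribe(audio_path: Path) -> Path:
        """Run the native, Metal-accelerated whisper.cpp binary on one audio file."""
        out_base = audio_path.with_suffix("")      # e.g. note.wav -> note
        subprocess.run(
            [
                str(WHISPER_BIN),
                "-m", str(MODEL),        # local model file, nothing leaves the machine
                "-f", str(audio_path),   # input audio (16 kHz WAV works out of the box)
                "-otxt",                 # write a plain .txt transcript
                "-of", str(out_base),    # output path; whisper.cpp adds the extension
            ],
            check=True,
        )
        return out_base.with_suffix(".txt")

    if __name__ == "__main__":
        print(transcribe(Path.home() / "Transcription/inbox/note.wav"))
    ```

    Everything heavy happens inside the native binary; Python only builds the command line and waits.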


    Why this is finally possible

    In 2008, this idea simply wasn’t realistic.

    Speech models weren’t good enough. Hardware wasn’t accessible. Tooling didn’t exist outside academic circles.

    Today, it is.

    Not because of one model or one framework, but because the ecosystem finally matured:

    • open speech models
    • commodity GPUs
    • local inference
    • better system-level tooling

    This is the kind of problem that’s only solvable now.


    What this unlocks for me

    I can think out loud without restraint.

    I can write at the speed of thought.

    I can turn raw thinking into drafts without ceremony.

    And I can do it knowing:

    • my data stays local
    • my voice is mine
    • my process is under my control

    This isn’t a product (yet).

    It’s a personal tool.

    But it’s also a case study in how I approach problems:

    constraints first, workflow second, technology last.

    If you’re curious how it works in detail, I’ve written more about the architecture and tradeoffs here:

    👉 My Local Transcription Pipeline

    More soon.

  • Audio Transcribed into WordPress Draft; Completely Private


    Privacy wasn’t a feature — it was a constraint

    The original idea didn’t start as “an AI project.” It started as a very specific itch I’ve had since 2007/08, right when smartphones began making it effortless to record audio.

    Back then, I was already deep inside the WordPress ecosystem, even custom-coding templates. And I had a simple wish: let me think out loud, then have the text show up in my blog.

    Not because I love transcription.
    Because I love unrestrained thinking.

    Typing is a speed limit.
    Speaking is closer to the velocity of thought.

    When an idea is moving fast, typing becomes friction, and friction becomes loss. So the dream was: record the thought while it’s alive… then let it become editable text later, when I’m calm and focused.

    That idea wasn’t really solvable for regular people at the time.

    Speech-to-text existed, but not at this level, not locally, and not with reliability that you’d trust for a real workflow. If you had access to a lab-grade setup in the late 2000s, you might have been able to stitch something together. Most of us didn’t have that. I definitely didn’t.

    Fast-forward to now: Apple Silicon is absurdly capable, Whisper-class transcription is accessible, and “local-first” tooling has finally caught up with what I was after 15+ years ago.

    And that’s where this project actually begins.


    Free mattered more than anything

    Illustration of a human brain connected to a software dashboard labeled “Premium Mode,” symbolizing frustration with subscription-based automation tools and loss of creative control.

    I don’t want a dashboard telling me my brain is now in “premium mode.”

    I wanted this to be free to run forever.

    Every cloud service I tried eventually turned into the same contract:

    • free minutes (daily/weekly/monthly)
    • hit the ceiling
    • pay to keep going

    I’m not morally opposed to paying for good tools.

    But I knew I’d burn through limits fast because I don’t want to ration thinking. If I’m on a roll, I’m on a roll.

    I don’t want a dashboard telling me my brain is now in “premium mode.”

    So the goal became clear:

    Record audio → drop into a folder → get a transcript.
    Optionally: a WordPress draft waiting for me.

    No subscriptions.
    No login loops.
    No cloud inference by default.

    And this line ended up becoming the north star:

    “My intellectual property never leaves my machine unless I explicitly choose it.”

    That’s not paranoia. That’s design.


    Privacy stayed inside my orbit

    I’m in the Apple ecosystem, which made the privacy model unusually clean.

    The audio starts on my iPhone.

    The processing happens on my MacBook Pro.

    The transfer happens via AirDrop, which keeps the file movement inside my immediate environment.

    The audio doesn’t need to touch a third-party server just to become text.

    That matters for obvious reasons (privacy), but also for less obvious ones (creative freedom). When you’re speaking raw ideas, you’re not just recording words.

    You’re recording unreleased drafts of your thinking.
    That’s intellectual property, even if it’s messy.

    So the system architecture became a kind of promise:

    • Local transcription
    • Local automation
    • Local storage
    • And publishing only happens when I explicitly authorize it

    The real breakthrough: a spoken publishing contract

    The most important part of this system isn’t Whisper. It’s the rule that prevents automation from turning into a runaway machine.

    This is the difference between:

    • Automation that empowers
    • Automation that erodes judgment

    So I designed a “spoken contract” that the system must hear before it does anything beyond transcription.

    A transcript only becomes a WordPress draft if I say both:

    • “Meta note” (or “System note”)
    • “Create blog post” (or “Create a blog post”)

    That’s it. If I don’t say the words, the system stays quiet.

    That means I can record:

    • personal notes
    • sketch ideas
    • work drafts
    • private reflections

    …and the system will transcribe them, but it won’t publish them. No accidental posts. No surprises. No “AI guessed what you meant.”

    This is production-grade behavior, not a demo.
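
    For the technically curious, the whole contract fits in a few lines. This is a simplified sketch, not my exact implementation, of the kind of check the Python glue performs before anything moves past transcription; the phrase list mirrors the rule above.

    ```python
    import re

    # The spoken contract: BOTH a meta/system marker AND an explicit publish command
    # must appear in the transcript, otherwise nothing beyond transcription happens.
    META_PHRASES = ("meta note", "system note")
    PUBLISH_PHRASES = ("create blog post", "create a blog post")

    def _contains_any(text: str, phrases: tuple[str, ...]) -> bool:
        return any(re.search(r"\b" + re.escape(p) + r"\b", text) for p in phrases)

    def publishing_authorized(transcript: str) -> bool:
        """Return True only if the speaker explicitly authorized a WordPress draft."""
        text = transcript.lower()
        return _contains_any(text, META_PHRASES) and _contains_any(text, PUBLISH_PHRASES)

    # A private note stays private; an explicit request becomes a draft.
    assert not publishing_authorized("Just a private reflection about the week.")
    assert publishing_authorized("Meta note. Create a blog post from this recording.")
    ```

    If the check fails, the transcript simply stays in its folder.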


    The final stack (and why it’s the right one)

    We started in the Python ecosystem because that’s where most “AI workflow” advice leads. But on macOS, the most durable lesson I learned was this:

    If you want long-running, stable, GPU-accelerated transcription on Apple Silicon, prefer native Metal tooling over Python ML stacks.

    Python is great for:

    • glue
    • orchestration
    • parsing
    • publishing logic

    But it’s not where you want to host GPU inference if your goal is “drop audio and walk away.”

    So the final system has three responsibility layers:

    1. Shell + whisper.cpp: audio → text (Metal GPU, local, stable)
    2. Python (glue only): parse intent + publish to WordPress
    3. Launch Agents: daemonized lifecycle so it runs automatically

    No ML runtime lives in Python.
    No GPU calls happen outside native code.
    No process depends on another being “just right.”

    That’s how systems survive.
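
    To make “glue only” concrete, here's a hedged sketch of the publishing step using the standard WordPress REST API to create a draft. The site URL, username, and application password are placeholders, and this code only ever runs after the spoken contract has been satisfied.

    ```python
    import requests  # plain HTTP glue; no ML runtime anywhere in this layer

    # Placeholders; a real setup would read these from a local config file.
    WP_SITE = "https://example.com"
    WP_USER = "bert"
    WP_APP_PASSWORD = "xxxx xxxx xxxx xxxx"  # a WordPress "application password"

    def create_draft(title: str, body: str) -> int:
        """Create a WordPress draft via the core REST API and return its post ID."""
        resp = requests.post(
            f"{WP_SITE}/wp-json/wp/v2/posts",
            auth=(WP_USER, WP_APP_PASSWORD),
            json={
                "title": title,
                "content": body,
                "status": "draft",  # drafts only; nothing is ever published automatically
            },
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()["id"]
    ```

    The important part is status: "draft". Even when the contract is satisfied, nothing goes live until I open WordPress and hit publish myself.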


    What’s next: formatting, tags, and polish

    Now that the pipeline is stable, the remaining work is refinement:

    • timestamps in the transcript (useful for editing)
    • paragraph breaks based on pauses (a conservative threshold of roughly 1.5 s or longer)
    • a word-count footer in the transcript and the WordPress draft, which helps when I start editing
    • simple auto-tags based on word frequency (the top ~5–7, biased toward broad concepts but specific when warranted, drawn from the content and its context); a rough sketch of the idea follows below
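
    Here's roughly how simple that last item can stay: count word frequencies, drop the noise words, keep the top handful. The stop-word list and thresholds below are illustrative only.

    ```python
    import re
    from collections import Counter

    # A tiny illustrative stop-word list; a real one would be much longer.
    STOP_WORDS = {
        "the", "a", "an", "and", "or", "but", "so", "of", "to", "in", "on",
        "for", "with", "is", "are", "was", "were", "it", "this", "that", "have",
    }

    def candidate_tags(transcript: str, limit: int = 7) -> list[str]:
        """Return the most frequent non-stop-words as candidate tags."""
        words = re.findall(r"[a-z']+", transcript.lower())
        counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 3)
        return [word for word, _ in counts.most_common(limit)]
    ```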

    None of those features change the heart of the project.

    The heart is still the same thing I wanted in 2007:

    A way to think out loud at full speed… and turn it into text without handing my raw ideas to someone else’s servers.

    And now it finally exists.

  • AI helped me develop a free solution to a real problem: using AI in 2025


    Stories about art, mental health and more. Art by Bert with AI.

    Hours Working With AI, Dozens of Dead Ends, and Discovering the Right Way to Do Whisper on macOS.

    Sometimes progress doesn’t feel like progress. It feels like friction, wrong turns, and the quiet realization that the thing you’re trying to force is never going to cooperate.

    This was one of those hours.

    In roughly six hours of real human time, I managed to:

    • Diagnose Python version incompatibilities
    • Run headlong into PEP 668 and Homebrew’s “externally managed” rules
    • Set up and tear down multiple virtual environments
    • Confirm GPU availability on Apple Silicon
    • Discover numerical instability with MPS-backed PyTorch inference
    • Identify backend limitations in popular Python ML stacks
    • Switch architectures entirely
    • And finally land on the correct long-term solution

    Not the “it works on my machine” solution. The durable one.

    This post is about how I got there, and more importantly, what changed in my thinking along the way.


    The Trap: Python Everywhere, All the Time

    My first instinct was predictable. Whisper transcription? Python.

    Faster-Whisper. Torch. MPS. Virtual environments. Requirements files.

    And to be fair, that path mostly works. Until it doesn’t.

    On macOS with Apple Silicon, Python ML stacks sit at an awkward intersection:

    • PyTorch supports MPS, but not all models behave well
    • Some backends silently fall back to CPU
    • Others appear to run on GPU while producing NaNs
    • Version pinning becomes a minefield
    • One Homebrew update can break everything

    You only find out after you’ve already invested time and energy trying to stabilize a system that fundamentally does not want to be stable.
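
    If you're walking the same path, a quick diagnostic like this (assuming a PyTorch build recent enough to ship the MPS backend) exposes both failure modes in a few lines: whether the GPU backend is really available, and whether a computation run on it comes back full of NaNs.

    ```python
    import torch

    # 1. Is the Metal (MPS) backend actually there, or will work silently fall back to CPU?
    print("MPS built:    ", torch.backends.mps.is_built())
    print("MPS available:", torch.backends.mps.is_available())

    # 2. Does a simple computation on the GPU produce sane numbers?
    if torch.backends.mps.is_available():
        x = torch.randn(1024, 1024, device="mps")
        y = torch.softmax(x @ x.T, dim=-1)
        print("NaNs in result:", torch.isnan(y).any().item())
    ```

    Green lights here still don't guarantee a given model behaves, which is exactly the problem.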

    That’s when the signal finally cut through the noise.


    The Bigger Takeaway (This Is the Real Value)

    I learned a durable rule that I’ll carry forward:

    On Apple Silicon, keep long-running GPU inference in native, Metal-backed tooling, and use Python only for glue and orchestration.

    This isn’t an anti-Python stance. It’s about choosing the right tool for the job.

    Python remains excellent for:

    • Glue code
    • Orchestration
    • Text processing
    • Automation
    • Pipelines that coordinate other tools

    But it is not ideal for:

    • Long-running GPU inference
    • Fire-and-forget background jobs
    • Stability-critical systems
    • Workflows that should survive OS upgrades untouched

    The Pivot: Native Whisper, Native Metal

    Once I stopped asking “How do I make Python behave?” and instead asked “What does macOS want me to do?”, the solution became obvious.

    whisper.cpp.

    A native implementation of Whisper, compiled directly for Apple Silicon, using Metal properly. No Python ML runtime. No torch. No MPS heuristics. No dependency roulette.

    Just:

    • A native binary
    • A Metal backend
    • Predictable performance
    • Deterministic behavior

    I rebuilt the system around that assumption instead of fighting it.


    What I Ended Up With (And Why It Matters)

    The final system is intentionally boring. That’s the highest compliment I can give it.

    I now have:

    • A watch-folder transcription system
    • Using a native Metal GPU backend
    • With zero Python ML dependencies
    • Fully automated
    • Crash-resistant
    • macOS-appropriate
    • Future-proof

    Audio files dropped into a folder get picked up, moved, transcribed, logged, and written out without intervention. Python still exists in the system, but only as glue and orchestration. The heavy lifting happens where it belongs: in native code.
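
    For a sense of how little machinery “picked up, moved, transcribed, logged” really needs, here's a simplified polling sketch. The real pipeline is shell driven by a Launch Agent, so the folder names, binary invocation, and polling loop below are stand-ins for illustration.

    ```python
    import shutil
    import subprocess
    import time
    from pathlib import Path

    INBOX = Path.home() / "Transcription/inbox"          # where audio gets dropped
    PROCESSING = Path.home() / "Transcription/processing"
    DONE = Path.home() / "Transcription/done"
    LOG = Path.home() / "Transcription/pipeline.log"

    for d in (INBOX, PROCESSING, DONE):
        d.mkdir(parents=True, exist_ok=True)

    def log(msg: str) -> None:
        with LOG.open("a") as f:
            f.write(f"{time.strftime('%Y-%m-%d %H:%M:%S')}  {msg}\n")

    while True:
        for audio in sorted(INBOX.glob("*.wav")):
            work = PROCESSING / audio.name
            shutil.move(str(audio), str(work))             # 1. pick up and move
            subprocess.run(                                # 2. transcribe with the native binary
                ["whisper-cli", "-m", "models/ggml-base.en.bin",
                 "-f", str(work), "-otxt", "-of", str(work.with_suffix(""))],
                check=True,
            )
            shutil.move(str(work), str(DONE / work.name))  # 3. file the audio away
            log(f"transcribed {work.name}")                # 4. leave a trace
        time.sleep(5)                                      # a Launch Agent with WatchPaths avoids even this
    ```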

    This is the setup people usually arrive at after months of trial and error. I got there in an afternoon because I stopped trying to be clever and started listening to the platform.


    The Project (If You Want the Source Code)

    The full pipeline is open-sourced here:

    https://github.com/berchman/macos-whisper-metal

    It includes:

    • A Metal-accelerated Whisper backend
    • A folder-watching automation script
    • Clear documentation
    • A frozen, reproducible system state
    • No hidden magic

    The Real Lesson

    This wasn’t about Whisper.

    It was about recognizing when a stack is fighting you instead of supporting you.

    About knowing when to stop patching and switch architectures entirely.

    About respecting the grain of the operating system instead of sanding against it.

    The tools we choose shape not just our code, but our cognitive load. The right architecture doesn’t just run faster. It lets you stop thinking about it. And sometimes, that’s the whole point.

    Be well.

  • How Focus Music Helped Me in a Distracted World


    Focus Music and My Struggle With Concentration in a World Full of Noise

    Focusing on a single task has felt like trying to read in the middle of a concert. My mind darts from thought to thought—unfinished emails, buzzing notifications, ideas half-written and instantly forgotten. The world demands more of our attention than ever, and I found myself chronically depleted and deeply frustrated.

    It wasn’t just about being unproductive. It was emotional. The guilt of not finishing things, the anxiety of falling behind—it all added up. I tried apps, caffeine, even silence. Nothing seemed to bring that mental click I was searching for.

    Discovering MindAmend: A Personal Turning Point

    One day, after stumbling through another disjointed work session, I came across a YouTube recommendation: @MindAmend, created by Jason Lewis. I clicked. The screen faded into soft visuals. No ads. No voices. Just music—not music, really, but a soundscape.

    I let it play in the background while I tackled a small task. To my surprise, I didn’t check my phone once. My thoughts slowed. I completed my work, and even kept going. It wasn’t magic—it was the first moment in a long time when I felt centered, calm, and capable.

    What Makes MindAmend Focus Music Different?

    Hyperfocus, Ambient Electronic Soft Beats + 40Hz Gamma Isochronic Tones

    I’ve tried lo-fi playlists and nature sounds. They’re relaxing, sure, but MindAmend is different. Jason Lewis creates his tracks using isochronic tones, a form of brainwave entrainment where distinct beats pulse at specific frequencies. Unlike binaural beats, these don’t require headphones—and they work more directly.

    The sounds feel engineered to support a purpose—not just to sound good. Whether I’m writing, brainstorming, or decompressing, there’s a track that meets me where I am.

    The Science Behind the Calm: Brainwave Entrainment Explained

    Isochronic tones help the brain shift into desired states by using rhythmic, evenly spaced pulses. When listened to over time, these tones guide your brainwaves into frequencies associated with deep focus, relaxation, or creative flow. It’s like tuning your mental radio to the right channel—alpha for calm, beta for alert focus, gamma for peak concentration.

    This isn’t fringe science. Studies have shown isochronic tones can effectively modulate brainwaves and improve attention and memory.

    A 2024 neurophysiology study also demonstrated a significant finding: compared with binaural beats, isochronic tones produced a roughly 15% larger attention-related EEG potential at the prefrontal cortex.

    In another experiment, beta-frequency music significantly enhanced sustained attention, especially in participants with ADHD traits.

    Lastly, a 2025 integrative review in Frontiers in Digital Health positioned brainwave entrainment as a promising tool for cognitive rehabilitation and emotional regulation.

    Why It Works Like Magic for Me

    Subtle tones that don’t distract, but hold your attention just enough to anchor you in the moment.

    For me, it’s like flipping a switch. Within five minutes, I stop fidgeting. Within ten, I’m in flow.

    The YouTube comments say the same:

    “I just finished a three-hour study session.”
    “Haven’t felt this grounded in weeks.”
    “This literally saved my focus.”

    I couldn’t agree more.

    My Daily Ritual and the Emotional Impact of Sound

    I integrate focus music into my daily routine as a cornerstone of my productivity toolkit. I usually choose a task I’ve been putting off, whether that’s writing, planning, or tackling a long article, then press play on a track. Within minutes, the sounds create a steady pulse that anchors my attention.

    The effect is subtle but powerful. These tones help shift my mindset from scattered to centered. Over time, this has built up into a habit of returning to calm focus, even when external distractions pile up.

    From Overwhelm to Order: The Emotional Transformation

    Isochronic tones helped me transition from mental chaos to emotional clarity. When my mind spins from anxiety or fatigue, these soundscapes provide structure. They ground me, slow down racing thoughts, and make way for inner peace.

    What Jason Lewis Has Created Goes Beyond Music

    This isn’t ambient noise or white sound—it’s a deliberate, precise design for mental performance. Jason Lewis’s productions are clean and consistent, tailored to facilitate brainwave entrainment. His tracks focus on listener experience rather than catchy melodies. For me, they feel like mental architecture, helping me build focus from the inside out.

    Comments That Echo My Story

    Scrolling MindAmend videos, I frequently see comments that mirror my own:

    “I just finished writing eight pages while listening to this.”
    “This channel is the only way I can work through brain fog.”

    It’s humbling and inspiring to know others are finding clarity through the same sounds that help me.

    Practical Tips for Getting Started With Focus Music

    1. Select a track that matches your goal—beta-frequency for focus, alpha or theta for relaxation or creative work.
    2. Use a moderate, comfortable volume (not too loud), ideally with headphones or two-channel playback.
    3. Time your work—try 25‑minute blocks with 5‑minute pauses to assess how you feel.
    4. Pair it with a clean environment—minimal visual distractions amplify the effect.
    5. Reflect after each session—note what tasks went well, how your mood changed, and use that feedback.

    My Message to Anyone Struggling With Focus or Anxiety

    Sound can be a first-line tool: accessible, free via YouTube, and non-invasive. You don’t need to be an expert to benefit—just be consistent and open-minded. Even if the effect feels subtle at first, persistence helps it build and compound over time, producing calmer clarity and better focus.

    Conclusion: Healing Happens in Small, Consistent Waves of Sound

    My experience with MindAmend focus music, especially those using carefully crafted isochronic tones, has transformed how I approach work, reading, and mental rest. What began as curiosity quickly became routine, and the sense of calm, clarity, and productivity I gained motivated me to keep coming back.

    If chaos, distraction, or overwhelm rings familiar—this might help you too. Even small, incremental shifts through sound can create meaningful change. Healing and focus don’t always come in giant leaps—they often happen in steady, rhythmic waves.