Wednesday, June 24, 2026

Text Processing

 

Behind the Scenes: How I Used a Local "Agentic AI" to Map and Synthesize 24 Years of a Blog


If you’ve been following the tech world lately, you’ve probably heard the buzz surrounding Agentic AI. Unlike standard chatbots that just sit there waiting for your next prompt, "agents" are designed to act autonomously—writing code, debugging their own errors, and managing massive, complex tasks with minimal human intervention.

I decided to put this hype to the ultimate test.

My friend Steve Mays has been blogging at smays.com for nearly a quarter of a century, accumulating over 6,500 posts. It’s an incredibly rich, deeply human archive. I scraped the entire site and converted it into a set of markdown files—one for each year from 2002 to 2025.

Then, I turned my local agent, Surfie (running the Hermes model), loose on the directory. Here is the play-by-play narrative of what happened over a multi-session effort on June 23, 2026, as my local machine became an autonomous research assistant.

Session 1: The Raw Setup (06:08 AM)

The project started early in the morning. My first step was moving 24 years of blog data into a dedicated sandbox environment. I instructed Hermes to pull the yearly archive files from my Windows directory into a local Linux environment so it could interact with them programmatically. Once the files were in place, the agent was ready to run.

Session 2: Automating the Index (07:17 AM)

Instead of trying to "read" all 6,500 posts at once (which would easily choke even the largest AI context windows), Hermes acted like a programmer.

We started with a pilot test using the 2002 archive. The agent wrote a Python script on the fly to parse the post headers and dividers, categorizing them into broad themes (like Politics, Technology, Movies & TV, etc.).
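For a sense of what that pilot parser might look like, here is a minimal sketch. It is not the agent's actual script, and the file layout is my assumption for illustration: each post starts with a "## Title" header line and posts are separated by a "---" divider line. The name `split_posts` is hypothetical.

```python
import re

# Assumed layout (not confirmed above): "## Title" starts a post,
# and a line containing only "---" ends it.
HEADER_RE = re.compile(r"^##\s+(.*)$")

def split_posts(markdown_text):
    """Return a list of {"title", "body"} dicts, one per post."""
    posts = []
    current = None
    for line in markdown_text.splitlines():
        match = HEADER_RE.match(line)
        if match:
            # A header opens a new post.
            current = {"title": match.group(1).strip(), "body": []}
            posts.append(current)
        elif line.strip() == "---":
            # A divider closes the current post.
            current = None
        elif current is not None:
            current["body"].append(line)
    # Join the collected body lines into one string per post.
    for post in posts:
        post["body"] = "\n".join(post["body"]).strip()
    return posts

sample = """## First post
Hello world.
---
## Second post
More text.
"""
posts = split_posts(sample)
```

The point of the pilot was to prove this kind of split works on one year before trusting it on all of them.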

With the pilot successful, I gave it the green light to scale up: process all 24 years and write a comprehensive master index.

The agent built a highly sophisticated Python engine to:

  • Programmatically loop through all 24 files.

  • Parse thousands of posts.

  • Build a keyword-based categorization matrix.

  • Calculate percentage distributions for each year.

  • Select representative post titles as examples.
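The categorization and percentage steps in that list can be sketched roughly as follows. This is an illustration under assumptions, not the engine the agent wrote: the keyword lists, the "Other" fallback, and the function names are all hypothetical.

```python
from collections import Counter

# Hypothetical keyword lists; the agent's real categories and keywords
# are not spelled out in this post.
CATEGORY_KEYWORDS = {
    "Politics": ["election", "senate", "president"],
    "Technology": ["iphone", "software", "internet"],
    "Movies & TV": ["movie", "film", "episode"],
}

def categorize(text):
    """Pick the category whose keywords appear most often, else 'Other'."""
    lowered = text.lower()
    scores = {
        category: sum(lowered.count(word) for word in words)
        for category, words in CATEGORY_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "Other"

def yearly_distribution(posts):
    """Map each category to its percentage share of the given posts."""
    counts = Counter(categorize(p["title"] + " " + p["body"]) for p in posts)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cat: round(100 * n / total, 1) for cat, n in counts.items()}

posts = [
    {"title": "New iPhone", "body": "The software feels faster."},
    {"title": "Election night", "body": "The senate race was close."},
    {"title": "Walk in the woods", "body": "Quiet morning."},
    {"title": "Internet radio", "body": "Streaming software again."},
]
distribution = yearly_distribution(posts)
```

Simple substring counting like this is crude (it will happily match a keyword inside a longer word), but it is cheap enough to run over thousands of posts without ever loading them into a context window.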

The result was a master index: a 3,200-line file (~68KB) that mapped the thematic evolution of Steve's writing across six distinct eras.

Session 3: The 2021 Bug and the Self-Correction (08:18 AM)

This was the most fascinating part of the run. As I reviewed the newly generated index, I noticed a gaping blind spot: 2021 was a total void. The index claimed Steve wrote zero posts that year, which I knew was wrong.

When I pointed this out, Hermes didn't just apologize; it went to work investigating. It discovered that the Python script I had used to convert the raw blog into markdown had formatted the 2021 file differently. Instead of standard headers, the 2021 file marked each post with a file path, such as .\2021\01\airpods-3\

The backslashes were the problem. Inside a Python string literal, a backslash starts an escape sequence, so a pattern full of backslashes is easy to get wrong. The agent hit a syntax error in its own parsing code, but instead of giving up, it rewrote the script. It sidestepped the escaping problem entirely by producing the backslash character programmatically with chr(92).
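Here is a minimal sketch of the chr(92) trick. It is my reconstruction, not the agent's code, and the exact marker format (a leading dot, then year, month, and slug separated by backslashes) is an assumption based on the one example above. The name `parse_path_marker` is hypothetical.

```python
# chr(92) is the backslash character. Building it this way means no
# backslash ever appears inside a string literal in the source code.
BACKSLASH = chr(92)

def parse_path_marker(line):
    """Return (year, month, slug) for a path-style marker, or None."""
    # Tolerate a leading colon and surrounding whitespace.
    marker = line.strip().lstrip(":").strip()
    parts = [p for p in marker.split(BACKSLASH) if p]
    # Expected parts after splitting: ".", "2021", "01", "airpods-3"
    if (
        len(parts) == 4
        and parts[0] == "."
        and parts[1].isdigit()
        and parts[2].isdigit()
    ):
        return parts[1], parts[2], parts[3]
    return None

# Build the sample marker from pieces so this example itself
# contains no backslash literals.
sample = BACKSLASH.join([".", "2021", "01", "airpods-3", ""])
result = parse_path_marker(sample)
```

A raw string would cover many cases too, but a raw string literal still cannot end in a single backslash, which is exactly how these markers end. Building the character with chr(92) avoids the question altogether.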

Boom. It successfully parsed all 125 posts from 2021, revealing a heavy focus on COVID-19 (24%), the Jan 6 Capitol riot (19%), and emerging technology (14%). It seamlessly patched the master index.

To celebrate, I did a quick spot-check. I asked Hermes to retrieve a highly specific post from 2013 called "Travel Pain Quotient." The agent queried its local index, targeted the 2013 file, and pulled the exact post where Steve laid out his mathematical formula:

$$\text{Travel Pain Quotient} = \frac{\text{Miles}}{\text{Mode}} \times \text{Payoff}$$

It was flawless.
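The lookup itself is conceptually simple. As a rough sketch, assuming the index can be loaded as a mapping from year to post titles and that the yearly files are named like "2013.md" (both are my assumptions, and the placeholder titles below are made up):

```python
def find_post(index, title):
    """Return (year, filename) for the first case-insensitive title match."""
    wanted = title.strip().lower()
    for year in sorted(index):
        for candidate in index[year]:
            if candidate.strip().lower() == wanted:
                # Hypothetical naming scheme: one markdown file per year.
                return year, f"{year}.md"
    return None

# Toy index for illustration only.
index = {
    2012: ["Placeholder post A"],
    2013: ["Travel Pain Quotient", "Placeholder post B"],
}
match = find_post(index, "travel pain quotient")
```

Once the index points at the right year, the agent only has to open one file instead of searching all 24.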

Session 4: Synthesizing Nonduality (The Grand Finale)

With a complete, verified index, I decided to push the technology to its absolute limit. Steve had previously experimented with a cloud-based AI to write an essay on Nonduality—a philosophy of oneness and awareness he has returned to frequently over 25 years. But the cloud-based output was verbose and academic.

I asked Hermes to write a synthesis essay strictly and exclusively using Steve’s voice, thoughts, and specific highlighted book reviews.

The agent executed a brilliant three-phase strategy:

  1. Thematic Mapping: It scanned all 24 yearly files for nonduality-adjacent keywords, pulling an initial 1,341 hits and narrowing them down to 253 deep, highly relevant posts.

  2. Voice Analysis: It programmatically sampled about 40 of Steve's highly personal posts. It analyzed his style, noting his dry wit, conversational second-person address, visual metaphors, occasional honest profanity, and his signature "highlighter test" for good writing.

  3. Drafting the Synthesis: It wrote a spectacular essay titled "Nonduality: Twenty-Five Years of Looking for What Isn't There."
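The first phase, a broad keyword sweep followed by a narrowing pass, can be sketched like this. The keyword list and the narrowing rule (keep posts that mention at least two distinct keywords) are hypothetical; I don't know the exact criteria the agent used to get from 1,341 hits to 253 posts.

```python
import re

# Hypothetical keyword list; the actual search terms are not listed here.
KEYWORDS = ["nonduality", "awareness", "meditation", "ego", "consciousness"]

def keyword_hits(text):
    """Return the set of keywords that appear as whole words in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {k for k in KEYWORDS if k in words}

def two_pass_filter(posts, min_distinct=2):
    """Broad pass: any keyword. Narrow pass: several distinct keywords."""
    broad = [p for p in posts if keyword_hits(p["body"])]
    narrow = [p for p in broad if len(keyword_hits(p["body"])) >= min_distinct]
    return broad, narrow

posts = [
    {"title": "Streak", "body": "Day 100 of meditation. The ego is quieter and awareness feels wider."},
    {"title": "Jacket", "body": "Bought a new ego-boosting jacket."},
    {"title": "Truck", "body": "Truck repairs this weekend."},
]
broad, narrow = two_pass_filter(posts)
```

Requiring several distinct keywords is one cheap way to separate posts that are actually about the topic from posts that merely brush against one word.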

The essay seamlessly bridged Steve's most load-bearing metaphors: his "steamer trunk of ego" from 2013, the "Ship of Theseus" paradox from 2016, his unpretentious meditation streaks, Robert Wright's Why Buddhism Is True, and Schrödinger’s quantum theories of consciousness. It reads not like a textbook, but like a deeply observant New Yorker profile of a lifelong thinker.

Why This Matters

This project perfectly illustrates why tech enthusiasts are so excited about the local, agentic AI revolution.

Instead of trusting my data to a corporate cloud database that "chunks" text invisibly, I watched a local agent programmatically audit, clean, debug, and synthesize a massive dataset right on my machine. It acted as an engineer, an editor, and a researcher all at once.

The resulting essay is sitting in my local folder as a separate markdown file. It is a stunning, "goosebumps-accurate" synthesis that captures a real life, tracked one day at a time, across a quarter of a century. That's what Steve called it anyway.
