The Hidden Co-Pilot: How AI Turned a Months-Long Project Into a Week
Last week, I finished setting up a local AI workstation—a Framework Desktop running a 70-billion parameter language model entirely on local hardware. It took me about a week, working 4-5 hours daily.
Afterward, I found myself wondering: How long would this have taken without AI assistance?
The answer unsettled me. And I think it says something important about where we are right now.
My Starting Point
Let me be clear about my qualifications for this project:
- Last serious Linux experience: ten years ago
- Programming background: BASIC, decades ago
- Machine learning expertise: enthusiastic amateur
- AMD ROCm experience: none
- Docker experience: minimal
By any traditional measure, I had no business attempting to deploy bleeding-edge AI infrastructure on brand-new hardware running an unreleased Ubuntu version with a compute stack that didn't officially support my configuration.
And yet, I did it. In a week.
What I Actually Built
The final system runs:
- Ubuntu 25.10 on an AMD Ryzen AI Max+ 395 (a chip released just weeks earlier)
- ROCm 7.1 GPU compute platform (AMD's answer to NVIDIA CUDA)
- Three Docker containers: Ollama, Open WebUI, and AnythingLLM
- Llama 3.3 70B—a state-of-the-art language model using 46% of my 96GB GPU allocation
- A complete RAG (Retrieval Augmented Generation) pipeline for document analysis
- Remote SSH access with key-based authentication
This isn't a "Hello World" tutorial project. This is production-grade infrastructure that, five years ago, would have required a dedicated ML engineering team.
The AI Assistants I Used
Throughout the project, I worked with three AI assistants:
- Perplexity for research and documentation retrieval
- Google Gemini (via a custom "Gem" configured for Framework/LLM setup)
- Claude for analysis, troubleshooting, and synthesis
I saved all our conversations. Reading back through them, I realized these weren't just search engines with better grammar. They were functioning as something else entirely.
What the AI Actually Provided
1. Architectural Understanding
When I asked why unified memory mattered for LLMs, I didn't get a Wikipedia summary. I got this:
"The Radeon 8060S is an iGPU. By default, it might only reserve 512MB of RAM for graphics. To run large models, you need to change this behavior... Set iGPU Memory Configuration to Custom and specify 96GB. On Linux with Strix Halo, the driver can dynamically borrow more system RAM, but setting a higher base allocation in BIOS often improves stability for the NPU."
That's not documentation retrieval. That's synthesized expertise—combining hardware architecture, BIOS behavior, Linux driver mechanics, and practical experience into actionable guidance.
2. Platform-Specific Edge Cases
The Strix Halo chip was weeks old. The AI knew:
- Kernel 6.16+ was required to fix a GPU memory allocation bug
- Standard ROCm installation would break my system
- The --no-dkms flag was essential to skip kernel module replacement
- gfx1151 was the correct device identifier for verification
This information existed nowhere in official documentation. It was scattered across GitHub issues, Reddit threads, and early-adopter forum posts—if it existed at all.
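One concrete example: the gfx1151 check is easy to script. This is a minimal sketch assuming rocminfo is on your PATH after the user-space install; the report format varies between ROCm releases, so it only looks for the device string.

```python
import shutil
import subprocess

# Sketch: confirm the ROCm user-space stack can see the Strix Halo iGPU.
# Assumes rocminfo is on PATH; output formatting varies by ROCm release,
# so we only look for the gfx1151 identifier anywhere in the report.
if shutil.which("rocminfo") is None:
    raise SystemExit("rocminfo not found; is the ROCm user-space install complete?")

report = subprocess.run(["rocminfo"], capture_output=True, text=True, check=True).stdout

if "gfx1151" in report:
    print("GPU visible to ROCm: gfx1151 (Radeon 8060S)")
else:
    print("gfx1151 not reported; check kernel version and BIOS iGPU allocation")
```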
3. Real-Time Diagnosis
When my system crashed with an out-of-memory error, the AI immediately identified the cause:
"Setting the context to 128,000 for a 70B parameter model pushes the memory requirement right to the edge (approx. 40GB for the model + 40GB-60GB for the conversation context + OS overhead). It likely tried to allocate a massive block of memory and the system killed the process."
It then calculated the safe limit (32K tokens) and explained the math. Without this, I would have been randomly adjusting settings for hours.
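The arithmetic is worth seeing once. Below is a rough back-of-envelope sketch of the KV-cache cost, using assumed Llama 3.3 70B figures (80 layers, grouped-query attention with 8 KV heads, 128-dimension heads, fp16 cache); the real footprint depends on the runtime and quantization, but the shape of the problem is the same.

```python
# Back-of-envelope KV-cache estimate for Llama 3.3 70B. The architecture
# numbers are assumptions (80 layers, 8 KV heads via grouped-query
# attention, 128-dim heads, fp16 cache); real usage depends on the runtime.
LAYERS = 80
KV_HEADS = 8
HEAD_DIM = 128
BYTES_PER_VALUE = 2  # fp16

def kv_cache_gib(context_tokens: int) -> float:
    """Keys + values across all layers, in GiB."""
    per_token = 2 * KV_HEADS * HEAD_DIM * BYTES_PER_VALUE * LAYERS
    return context_tokens * per_token / (1024 ** 3)

for ctx in (32_000, 128_000):
    print(f"{ctx:>7} tokens -> ~{kv_cache_gib(ctx):.0f} GiB of KV cache")

# Roughly 40 GiB of quantized weights plus ~39 GiB of cache at 128K context
# leaves almost no headroom in a 96 GiB allocation; at 32K it fits easily.
```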
4. Configuration Synthesis
For AnythingLLM's RAG settings, the AI didn't just list options—it explained tradeoffs specific to my use case (historical and philosophical research):
"If you search for 'What were the precursors to the Categorical Imperative?', a High threshold will look for chunks that match those specific keywords. It might filter out a paragraph discussing 'universal moral duties' because the vector score wasn't 'similar' enough. Set the threshold to Low. Your 70B model is smart enough to read tangentially related snippets and ignore what isn't relevant."
That's not information retrieval. That's consulting.
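A toy sketch makes the tradeoff concrete. The chunks and scores below are invented for illustration; AnythingLLM computes the real scores inside its vector database.

```python
# Toy illustration of a RAG similarity threshold. The chunks and scores
# are invented; AnythingLLM's vector database produces the real ones.
retrieved = [
    ("Kant's formulation of the Categorical Imperative...", 0.82),
    ("Earlier accounts of universal moral duties...", 0.61),
    ("A digression on 18th-century printing practices...", 0.22),
]

def passes(chunks, threshold):
    """Return only the chunks whose similarity score clears the threshold."""
    return [text for text, score in chunks if score >= threshold]

# High threshold: only the near-verbatim match survives, and the
# tangentially related paragraph the question needs gets filtered out.
print(passes(retrieved, threshold=0.75))

# Low threshold: more context reaches the 70B model, which is capable
# of ignoring whatever turns out to be irrelevant.
print(passes(retrieved, threshold=0.30))
```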
The Time Math
I asked Claude to estimate how long this project would have taken without AI assistance. The breakdown was sobering:
| Task | With AI | Without AI |
| --- | --- | --- |
| Understanding unified memory architecture | 30 min | 4-8 hours |
| BIOS iGPU configuration | 15 min | 2-4 hours |
| ROCm installation | 2 hours | 8-40 hours |
| Docker networking fix | 20 min | 2-6 hours |
| Ollama service configuration | 30 min | 2-4 hours |
| AnythingLLM optimization | 2 hours | 8-20 hours |
| OOM crash diagnosis | 15 min | 4-12 hours |
My actual time: ~30 hours
Conservative estimate without AI: 80-150 hours
Realistic estimate for my skill level: 150-300+ hours
That's the difference between a week-long project and a multi-month odyssey—assuming I didn't give up entirely.
The Project-Killer Moment
Here's what haunts me: the ROCm installation.
AMD's compute platform doesn't officially support Ubuntu 25.10 or kernel 6.17. The standard installation process would have replaced my kernel modules with older versions, likely breaking the entire system—possibly requiring a complete OS reinstall.
The AI knew to use --no-dkms to install only user-space libraries while trusting the mainline kernel's built-in AMD drivers. That single flag was the difference between success and catastrophic failure.
Without AI guidance, here's what would have happened:
- Run standard ROCm installer
- System fails to boot or GPU stops working
- Spend hours troubleshooting kernel issues
- Eventually reinstall Ubuntu
- Try again with an older Ubuntu version (which lacks the required kernel features)
- Discover the chip needs kernel 6.16+
- Search forums for days trying to find the magic incantation
- Maybe find --no-dkms buried in a GitHub issue from someone with similar hardware
- Or give up
That's not 40 hours of extra work. That's potentially project abandonment.
What This Means
I essentially had access to:
- A Linux systems administrator
- An AMD/ROCm specialist
- A Docker networking expert
- An LLM deployment consultant
- A RAG systems architect
All available instantly. All with infinite patience. All willing to explain not just what to do but why.
Ten years ago, this project would have required:
- Being deeply embedded in the Linux/ML community already, OR
- Hiring multiple consultants at significant cost, OR
- Months of self-education before even attempting the build, OR
- Getting extraordinarily lucky with forum posts and Stack Overflow answers
Today, an enthusiastic amateur with decade-old skills can deploy state-of-the-art AI infrastructure in a week.
The Uncomfortable Implications
I keep thinking about the paradox here:
I used AI to build a system that runs AI locally so I don't have to depend on cloud AI.
But without cloud AI assistance, I couldn't have built the system in the first place.
This isn't a contradiction—it's a transition. The AI assistants served as scaffolding: temporary support structures that let me build something I'll eventually be able to maintain and extend myself. Now that I understand how the pieces fit together, I'm not starting from zero next time.
But it does raise questions:
- How do we value expertise when AI can synthesize it on demand?
- What happens to the forums and communities where this knowledge traditionally accumulated?
- Are we building real skills or just learning to prompt effectively?
I don't have clean answers. But I notice that I understand my system better than I would have if I'd just followed a tutorial. The AI didn't give me a fish or teach me to fish—it fished alongside me, explaining every cast.
The Human Element
Here's what the AI couldn't do:
- Decide that local AI mattered enough to invest a week of my life
- Persist through the frustrating moments (and there were several)
- Recognize when something "felt wrong" and needed more investigation
- Connect this project to my broader interests in privacy, self-reliance, and technology ownership
- Feel the satisfaction of watching that VRAM meter climb to 46% as a 70-billion parameter model loaded successfully
The AI was a tool—an extraordinarily powerful one—but the project was still mine.
Looking Forward
I'm going to keep building. The web search agent still isn't working right. I want to experiment with thinking models like DeepSeek-R1. Fine-tuning on my own data is next on the list.
And yes, I'll keep using AI assistants for the parts where their knowledge exceeds mine.
But I'm also going to keep documenting. I saved every conversation from this project—not just for my own reference, but because these transcripts are themselves training data. They show how humans and AI collaborate on complex technical problems. They capture the back-and-forth of troubleshooting, the "aha" moments, the dead ends.
Somewhere, someday, an AI might learn from my confusion. And help someone else avoid it.
That's not a bad legacy for a week's work.
This post is part of an ongoing series about building and running local AI infrastructure. The previous post, "Building My Own AI Powerhouse", covers the technical details of the build itself.
Tags: AI, artificial intelligence, machine learning, productivity, local LLM, self-hosted AI, AI assistance, technology, future of work