Sunday, October 12, 2025


Taming the Digital Library: A Hands-On Journey to Bulk Ebook Conversion with Command-Line Tools

My digital bookshelf was a mess, cluttered with files from over the years in a dozen different formats. I had old Word documents, forgotten ebook types from the early 2000s, and modern files that still weren't compatible with everything I wanted to use. My goal was to finally clean it all up, creating a standardized library that would work perfectly on my e-reader and be ready for new AI tools like Google's NotebookLM. Instead of clicking "Convert" on hundreds of files one by one, I set out to learn how to automate the entire process.

The project became a three-step assembly line. First, I used a powerful, free ebook management program to tackle the most obscure and outdated ebook files, converting them all into a universal e-reader format. Next, I turned to a free office suite that has a clever feature allowing it to convert documents in the background without ever opening a window. I used this to automatically turn all my old Word documents into the same modern e-reader format.

With all my books finally standardized, the last step was to run them through a digital "Swiss-army knife" program that could turn them into simple, clean text files for my AI notebook. The real story, however, was in the troubleshooting. It was a journey of trial and error, from learning that you have to restart the command window for it to recognize new programs, to the final "aha!" moment of realizing a script wasn't running simply because I forgot to press Enter. This project was a powerful lesson in how a little automation and persistence can solve a massive organizational headache.


For anyone with a growing digital library, the dream of having all your books in a universally accessible, easily processable format is often just that – a dream. Ebooks come in a bewildering array of formats: .epub, .mobi, .pdf, .doc, .lit, .txt, .rtf, .html, and more. Manually converting them is a daunting task.

What follows is a detailed account of how I automated the bulk conversion of these formats. The goal was twofold: to create a standardized library for my e-reader, and to prepare content for AI-powered note-taking tools like Google's NotebookLM. Throughout, I relied on the Gemini feature in Chrome for guidance.

Let's dive into the tools and techniques that made this possible!


The Challenge: A Disparate Ebook Collection

My ebook collection was a mixed bag. I had:

  • Modern Ebook Formats: .epub (already good for e-readers, but also good as an intermediary)
  • Older Ebook Formats: .lit (Microsoft Reader format), .mobi (Amazon Kindle's older format)
  • Document Formats: .doc (older Microsoft Word files), .rtf, .txt
  • Web & Print Formats: .htm/.html, .pdf

My ultimate objective was to:

  1. Standardize for E-readers: Convert everything compatible into a widely accepted e-reader format (like .epub).
  2. Optimize for AI: Convert suitable content into a lightweight, AI-friendly format (like Markdown) for tools such as NotebookLM.
  3. Automate: Avoid manual conversion one by one.
  4. Learn: Master command-line tools in the process.

The Toolkit: Pandoc, Calibre, and LibreOffice (and Gemini!)

To achieve this, I enlisted three heavy-hitters of the digital conversion world:

  1. Pandoc: The "Swiss-army knife" of document conversion. It converts between a wide range of markup and document formats.
  2. Calibre: The ultimate ebook management software, with powerful command-line tools for proprietary ebook formats.
  3. LibreOffice: A free and open-source office suite, crucial for handling older Microsoft Word documents via its command-line interface.

And, of course, the Gemini feature in Chrome served as an indispensable guide throughout the entire process, providing specific commands and troubleshooting assistance.
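The division of labor between the three tools can be sketched as a small lookup table. This is only an illustration: the function name Get-ConversionTool and the table are hypothetical, and routing .mobi to Calibre is my assumption rather than something tested in this post.

```powershell
# Hypothetical routing table: file extension -> tool used in this post.
$toolByExtension = @{
    '.lit'  = 'calibre'      # ebook-convert
    '.mobi' = 'calibre'      # assumption: handled like other older ebook formats
    '.doc'  = 'libreoffice'  # soffice --headless
    '.epub' = 'pandoc'
    '.docx' = 'pandoc'
    '.html' = 'pandoc'
    '.htm'  = 'pandoc'
    '.rtf'  = 'pandoc'
    '.txt'  = 'pandoc'
}

function Get-ConversionTool {
    param([Parameter(Mandatory)][string]$Path)
    # Normalize the extension so 'BOOK.LIT' and 'book.lit' route the same way
    $ext = [System.IO.Path]::GetExtension($Path).ToLowerInvariant()
    if ($toolByExtension.ContainsKey($ext)) { return $toolByExtension[$ext] }
    return 'unsupported'     # anything not in the table, e.g. .pdf
}
```

Calling Get-ConversionTool on each file in the source folder gives a quick inventory of how much work each tool will have before any conversion starts.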


Phase 1: Setting Up the Conversion Environment on Windows

My system is Windows 10, so all commands and paths are tailored for PowerShell.

A. Organizing Directories:

First, I created clear input and output directories:

  • F:\Backup\Backup\Ebooks (My primary messy source folder, containing subfolders)
  • C:\Users\gygee\Documents\booksLit (Temporary staging for .lit files)
  • C:\Users\gygee\Documents\booksDoc (Temporary staging for .doc files)
  • C:\Users\gygee\Documents\booksEpub (Final .epub output for e-readers)
  • C:\Users\gygee\Documents\booksMarkdown (Final Markdown output for NotebookLM)
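The staging folders above can also be created from PowerShell instead of File Explorer. A minimal sketch, assuming the four folder names from the list; the function name New-StagingDirectories is hypothetical.

```powershell
function New-StagingDirectories {
    param(
        [Parameter(Mandatory)][string]$Root,
        [string[]]$Names = @('booksLit', 'booksDoc', 'booksEpub', 'booksMarkdown')
    )
    foreach ($name in $Names) {
        $dir = Join-Path -Path $Root -ChildPath $name
        # -Force makes this a no-op when the folder already exists
        New-Item -ItemType Directory -Force -Path $dir | Out-Null
        $dir   # emit the full path so the caller can see what was ensured
    }
}

# Example: New-StagingDirectories -Root "C:\Users\gygee\Documents"
```

Because of -Force, the function is safe to run again later; existing folders and their contents are left alone.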

B. Ensuring Tools are Accessible (PATH Variable):

To run ebook-convert (Calibre) and soffice.exe (LibreOffice) from any PowerShell window without typing their full paths every time, their folders need to be added to the PATH environment variable.

To check/add to PATH:

  1. Find the executable's directory (e.g., C:\Program Files\Calibre2 for Calibre, C:\Program Files\LibreOffice\program for LibreOffice).
  2. Search Windows Start for "Edit the system environment variables."
  3. Click "Environment Variables...", then select "Path" under "User variables" and click "Edit...".
  4. Click "New", paste the directory path, and click "OK" on all windows.
  5. Crucially: Close and reopen PowerShell for changes to take effect!
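After reopening PowerShell, a quick check confirms whether the new PATH entries were picked up. The sketch below uses the built-in Get-Command cmdlet; the helper name Test-ToolOnPath is hypothetical, and the three tool names are the ones used in this post.

```powershell
function Test-ToolOnPath {
    param([Parameter(Mandatory)][string]$Name)
    # Get-Command resolves the name the same way PowerShell would when running it
    $cmd = Get-Command -Name $Name -CommandType Application -ErrorAction SilentlyContinue
    return [bool]$cmd
}

foreach ($tool in 'ebook-convert', 'soffice', 'pandoc') {
    if (Test-ToolOnPath $tool) {
        Write-Host "$tool found"
    } else {
        Write-Host "$tool NOT found on PATH in this session"
    }
}
```

A window that was already open keeps the PATH it started with, which is why a tool can show as missing here even though the environment variable dialog lists its folder.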

Phase 2: Converting Proprietary and Older Formats

A. Converting .lit Files to .epub with Calibre:

Calibre's ebook-convert tool is the hero here. It effortlessly converts .lit files (which Pandoc doesn't support) into standard .epub.

  • Input: C:\Users\gygee\Documents\booksLit (containing .lit files)
  • Output: C:\Users\gygee\Documents\booksLit (.epub files alongside original .lit)
# Navigate to the directory first
cd C:\Users\gygee\Documents\booksLit

# Then run the conversion (this form assumes ebook-convert is on your PATH)
Get-ChildItem -Filter *.lit | ForEach-Object { ebook-convert $_.FullName "$($_.BaseName).epub" }

# If ebook-convert isn't in PATH, use:
# Get-ChildItem -Filter *.lit | ForEach-Object { & "C:\Program Files\Calibre2\ebook-convert.exe" $_.FullName "$($_.BaseName).epub" }
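A variation worth considering is to write the .epub files straight into the booksEpub folder and skip anything already converted, so the command can be re-run after an interruption. The sketch below only builds the list of work to do; Get-ConversionPlan is a hypothetical helper, not part of Calibre.

```powershell
function Get-ConversionPlan {
    param(
        [Parameter(Mandatory)][string]$InputDir,
        [Parameter(Mandatory)][string]$OutputDir,
        [Parameter(Mandatory)][string]$InputExt,   # without the dot, e.g. 'lit'
        [Parameter(Mandatory)][string]$OutputExt   # without the dot, e.g. 'epub'
    )
    Get-ChildItem -Path $InputDir -File |
        Where-Object { $_.Extension -eq ".$InputExt" } |
        ForEach-Object {
            $target = Join-Path -Path $OutputDir -ChildPath "$($_.BaseName).$OutputExt"
            # Skip files that already have a converted counterpart
            if (-not (Test-Path -LiteralPath $target)) {
                [pscustomobject]@{ Source = $_.FullName; Target = $target }
            }
        }
}

# Usage (assumes ebook-convert is on PATH and the output folder exists):
# Get-ConversionPlan -InputDir "C:\Users\gygee\Documents\booksLit" `
#                    -OutputDir "C:\Users\gygee\Documents\booksEpub" `
#                    -InputExt lit -OutputExt epub |
#     ForEach-Object { ebook-convert $_.Source $_.Target }
```

Keeping the plan separate from the conversion also makes it easy to preview the list before committing to a long run.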

B. Converting .doc Files to .epub with LibreOffice:

Pandoc doesn't read the older binary .doc format, but LibreOffice (specifically soffice.exe) does, and recent versions can export directly to .epub.

  • Input: C:\Users\gygee\Documents\booksDoc (containing .doc files)
  • Output: C:\Users\gygee\Documents\booksEpub (converted .epub files)
# --- Configuration ---
$inputDir = "C:\Users\gygee\Documents\booksDoc"
$outputDir = "C:\Users\gygee\Documents\booksEpub" # Ensure this folder exists!
$libreOfficePath = "C:\Program Files\LibreOffice\program\soffice.exe" # Verify your path
# ---------------------

# Verify LibreOffice executable exists (error handling)
if (-not (Test-Path $libreOfficePath)) {
    Write-Host "Error: LibreOffice executable not found at '$libreOfficePath'. Please check the path and try again."
} else {
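    # Find all .doc files and convert them.
    # The Where-Object guard excludes .docx, which -Filter *.doc can also match on Windows.
    Get-ChildItem -Path $inputDir -Filter *.doc | Where-Object { $_.Extension -eq '.doc' } | ForEach-Object {
        Write-Host "Converting '$($_.Name)'..."
        & $libreOfficePath --headless --convert-to epub --outdir $outputDir $_.FullName
    }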
    Write-Host "Conversion complete! EPUB files are in '$outputDir'."
}

(Lesson learned: after pasting a multi-line script into PowerShell, you still have to press "Enter" to execute it. A simple oversight, but it cost me some head-scratching.)
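One thing to watch for with the script above: on Windows, soffice.exe may hand control back to PowerShell before the conversion has finished, so "Conversion complete!" can appear early. A possible workaround, sketched here under that assumption, is to launch each conversion with Start-Process -Wait. The helper names Get-SofficeArguments and Convert-WithLibreOffice are hypothetical.

```powershell
function Get-SofficeArguments {
    param(
        [Parameter(Mandatory)][string]$InputFile,
        [Parameter(Mandatory)][string]$OutputDir,
        [string]$Format = 'epub'
    )
    # Start-Process joins these with spaces and does not add quotes itself,
    # so the paths are quoted here in case they contain spaces.
    @('--headless', '--convert-to', $Format, '--outdir', "`"$OutputDir`"", "`"$InputFile`"")
}

function Convert-WithLibreOffice {
    param(
        [Parameter(Mandatory)][string]$SofficePath,
        [Parameter(Mandatory)][string]$InputFile,
        [Parameter(Mandatory)][string]$OutputDir
    )
    $arguments = Get-SofficeArguments -InputFile $InputFile -OutputDir $OutputDir
    # -Wait blocks until soffice exits; -PassThru returns the process object
    $proc = Start-Process -FilePath $SofficePath -ArgumentList $arguments -Wait -PassThru
    return $proc.ExitCode
}

# Usage inside the loop, with the variables from the script above:
# Convert-WithLibreOffice -SofficePath $libreOfficePath -InputFile $_.FullName -OutputDir $outputDir
```

I would not rely on the exit code alone as proof of success; checking that the expected .epub actually exists in the output folder is the safer test.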


Phase 3: Mass Conversion to Markdown with Pandoc (for NotebookLM)

With .lit and .doc files handled by specialized tools, and .pdf files set aside because Pandoc cannot read PDF as an input format, the next step was to convert the remaining compatible formats (.epub, .html, .rtf, .txt, and any .docx files) into Markdown (.md) for NotebookLM. Markdown is plain text, so the files are small and free of the layout and packaging overhead that comes with PDFs or even EPUBs.

  • Input: F:\Backup\Backup\Ebooks (recursively, targeting .epub specifically for this example)
  • Output: C:\Users\gygee\Documents\booksMarkdown (all converted .md files)
# --- Configuration ---
$sourceDir = "F:\Backup\Backup\Ebooks" # Your main ebook folder
$outputDir = "C:\Users\gygee\Documents\booksMarkdown" # Destination for Markdown files
$inputExt = "epub" # Starting with EPUBs from the source
$outputExt = "md"
# ---------------------

# Create output directory if it doesn't exist
if (-not (Test-Path $outputDir)) {
    Write-Host "Creating output directory: $outputDir"
    New-Item -ItemType Directory -Force -Path $outputDir | Out-Null
}

# Find all specified files recursively and convert them
Get-ChildItem -Path $sourceDir -Filter "*.$inputExt" -Recurse | ForEach-Object {
    $baseName = $_.BaseName
    $outputFile = Join-Path -Path $outputDir -ChildPath "$baseName.$outputExt"
    Write-Host "Converting '$($_.FullName)' to '$outputFile'"
    pandoc $_.FullName -o $outputFile
}

Write-Host "----------------------------------------"
Write-Host "Conversion complete! All converted Markdown files are in: $outputDir"
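One caveat with flattening a recursive search into a single output folder: two books with the same file name in different subfolders would write to the same .md file, and the second would overwrite the first. A minimal sketch of one way around that, assuming a numeric suffix is acceptable; Get-UniqueOutputPath is a hypothetical helper, and it only tracks names claimed during the current run, not files already on disk.

```powershell
function Get-UniqueOutputPath {
    param(
        [Parameter(Mandatory)][string]$OutputDir,
        [Parameter(Mandatory)][string]$BaseName,
        [Parameter(Mandatory)][string]$Extension,   # without the dot, e.g. 'md'
        [System.Collections.Generic.HashSet[string]]$Taken
    )
    $candidate = Join-Path -Path $OutputDir -ChildPath "$BaseName.$Extension"
    $n = 2
    # HashSet.Add returns $false when the path was already claimed in this run
    while (-not $Taken.Add($candidate)) {
        $candidate = Join-Path -Path $OutputDir -ChildPath "$BaseName ($n).$Extension"
        $n++
    }
    $candidate
}

# Case-insensitive set, since Windows treats 'Dune.md' and 'dune.md' as the same file
$taken = [System.Collections.Generic.HashSet[string]]::new([System.StringComparer]::OrdinalIgnoreCase)

# Usage with the variables from the script above, covering several input types at once:
# $extensions = '.epub', '.html', '.htm', '.rtf', '.docx'
# Get-ChildItem -Path $sourceDir -Recurse -File |
#     Where-Object { $extensions -contains $_.Extension.ToLowerInvariant() } |
#     ForEach-Object {
#         $out = Get-UniqueOutputPath -OutputDir $outputDir -BaseName $_.BaseName -Extension 'md' -Taken $taken
#         pandoc $_.FullName -o $out
#     }
```

The same pattern also removes the need to edit $inputExt and re-run the script once per format.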

(Note on .txt files: Pandoc expects its input to be UTF-8 and may stop with an encoding error when a .txt file was saved in an older encoding. The fix is to re-save the file as UTF-8 in your text editor before converting. UTF-8 is the modern standard and prevents compatibility issues.)
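For more than a handful of files, re-saving by hand gets tedious, and the re-encoding can be scripted with .NET's file APIs. This is a sketch under a loud assumption: that the legacy files are ISO-8859-1 (Latin-1). I did not verify the actual encoding of every file, and Convert-FileToUtf8 is a hypothetical helper name.

```powershell
function Convert-FileToUtf8 {
    param(
        [Parameter(Mandatory)][string]$Path,
        # ASSUMPTION: the source is ISO-8859-1 (Latin-1); change this for other legacy encodings
        [string]$SourceEncodingName = 'iso-8859-1'
    )
    # .NET resolves relative paths against the process directory, not PowerShell's
    # current location, so resolve to a full path first
    $full = (Resolve-Path -LiteralPath $Path).Path
    $source = [System.Text.Encoding]::GetEncoding($SourceEncodingName)
    $text = [System.IO.File]::ReadAllText($full, $source)
    # UTF8Encoding($false) writes UTF-8 without a byte-order mark
    $utf8NoBom = New-Object System.Text.UTF8Encoding($false)
    [System.IO.File]::WriteAllText($full, $text, $utf8NoBom)
}

# Example: Get-ChildItem *.txt | ForEach-Object { Convert-FileToUtf8 -Path $_.FullName }
```

The function overwrites the file in place, so work on copies. Running it on a file that is already UTF-8 would garble any accented characters, so apply it only to files known to be in the legacy encoding.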


Phase 4: Cleaning Up (Optional, but Handy)

Once files are converted and verified, you might want to delete the originals. For .txt files, a PowerShell one-liner can convert each file to Markdown and delete the original only if Pandoc reports success:

# CAUTION: This deletes files permanently. Back up first!
# Navigate to the directory containing the .txt files
cd C:\Users\gygee\Documents\booksMarkdown # Or wherever your .txt files ended up

Get-ChildItem *.txt | ForEach-Object { pandoc $_.FullName -o "$($_.BaseName).md"; if ($?) { Remove-Item $_.FullName } }
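A more cautious take on the same cleanup separates deletion from conversion and supports a dry run. This is a sketch rather than what I originally ran: Remove-ConvertedOriginals is a hypothetical helper that deletes an original only when a non-empty .md file with the same base name exists.

```powershell
function Remove-ConvertedOriginals {
    [CmdletBinding(SupportsShouldProcess)]
    param(
        [Parameter(Mandatory)][string]$SourceDir,
        [Parameter(Mandatory)][string]$MarkdownDir,
        [string]$SourceExt = 'txt'   # without the dot
    )
    Get-ChildItem -Path $SourceDir -File |
        Where-Object { $_.Extension -eq ".$SourceExt" } |
        ForEach-Object {
            $md = Join-Path -Path $MarkdownDir -ChildPath "$($_.BaseName).md"
            $converted = Get-Item -LiteralPath $md -ErrorAction SilentlyContinue
            # Delete only when a non-empty .md counterpart exists
            if ($converted -and $converted.Length -gt 0) {
                # ShouldProcess honors -WhatIf and -Confirm
                if ($PSCmdlet.ShouldProcess($_.FullName, 'Remove original')) {
                    Remove-Item -LiteralPath $_.FullName
                }
            }
        }
}

# Dry run first: Remove-ConvertedOriginals -SourceDir . -MarkdownDir . -WhatIf
```

With -WhatIf the function only reports what it would delete, which makes it easy to spot a surprise before anything is gone. When the .md files sit next to the .txt files, as in the one-liner above, both parameters can point at the same folder.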

Conclusion: A Streamlined Digital Future

This journey, guided step-by-step by Gemini, transformed a daunting task into a series of manageable, automatable steps. Now, I have:

  • A well-organized collection of .epub files, perfect for my e-reader.
  • A clean set of .md files, optimized for rapid processing by AI tools like NotebookLM, enhancing my learning and research capabilities.

The biggest takeaway is the power of combining specialized command-line tools with scripting. While Pandoc is incredibly versatile, knowing when to bring in Calibre for ebook formats or LibreOffice for document formats completes the toolkit. And for navigating this complex landscape, an interactive AI assistant proved to be an invaluable co-pilot!

Hopefully, this detailed guide helps you embark on your own digital library automation journey. Happy converting!
