Tool · 2025 · 04 / 10
RepoDigest.
A Node CLI that turns any repo, local or remote, into an LLM-ready digest with optional Gemini analysis on top.
- Context
RepoDigest is a TypeScript command-line tool that walks a Git repository or local directory and emits a single structured digest optimised for large language models. It handles remote cloning through simple-git, binary detection, language identification, glob-based include/exclude filters, depth caps, size caps, and token estimation via tiktoken, then writes the result as plain text, JSON, or Markdown. An optional Google Gemini layer adds repository summaries, file-level quality analysis, and AI security scans behind dedicated flags. The project ships as a globally installable npm package with a `repodigest` binary, a modular help system, an interactive setup mode, and persistent AI configuration under `~/.repodigest/`.
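To make "a single structured digest" concrete, the sketch below models what the JSON output could look like as a TypeScript interface; the field names are hypothetical stand-ins for illustration, not the schema repodigest actually emits.

```typescript
// Hypothetical shape of the JSON digest; field names are illustrative,
// not the tool's actual output schema.
interface DigestFile {
  path: string;            // repo-relative path
  language: string;        // detected language, e.g. "TypeScript"
  sizeBytes: number;
  content: string;         // omitted or truncated for binaries
}

interface Digest {
  source: string;          // local path or remote URL
  generatedAt: string;     // ISO timestamp
  fileCount: number;
  estimatedTokens: number; // tiktoken estimate for the whole digest
  files: DigestFile[];
}
```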
- Approach
The design keeps the digest pipeline and the AI layer decoupled on purpose. Core processing (`query-parser`, `repository-cloner`, `file-processor`, `output-generator`) produces a fully formed digest with no dependency on any model; the AI service only activates when one of the `--ai-*` flags is set and bails gracefully if the key is missing or the network fails. Filters compose instead of nest: language, glob, size, and depth constraints all combine through fast-glob and ignore, so real-world repositories with noisy `node_modules` and large binaries behave the way you'd expect on the first run. The CLI itself is written against Commander with a custom help formatter that splits long documentation into sub-commands (`examples`, `ai-help`, `options`, `features`) to keep `--help` legible, and inquirer drives interactive mode for users who'd rather answer questions than read flag tables.
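As a rough illustration of that decoupling, the sketch below wires hypothetical stand-ins for those modules together; the function names are invented for this write-up, and the real interfaces inside repodigest differ.

```typescript
import { simpleGit } from 'simple-git';

type AiOptions = { enabled: boolean; apiKey?: string };
type Query = { source: string; isRemote: boolean; ai: AiOptions };

// Placeholder stand-ins for file-processor / output-generator / ai-service.
async function buildDigest(root: string): Promise<string> {
  return `Digest of ${root}`;
}
async function analyseWithGemini(digest: string, ai: AiOptions): Promise<void> {
  console.log(`Would send ${digest.length} chars to Gemini`, ai.enabled);
}

// repository-cloner: remote sources get a shallow clone, local paths pass through.
async function resolveSource(query: Query): Promise<string> {
  if (!query.isRemote) return query.source;
  const dest = './.repodigest-tmp';
  await simpleGit().clone(query.source, dest, ['--depth', '1']);
  return dest;
}

async function run(query: Query): Promise<void> {
  const root = await resolveSource(query);
  const digest = await buildDigest(root);
  // The AI layer is strictly additive: it only runs when an --ai-* flag
  // was set, and a failure there never invalidates the digest itself.
  if (query.ai.enabled && query.ai.apiKey) {
    try {
      await analyseWithGemini(digest, query.ai);
    } catch (err) {
      console.warn('AI analysis skipped:', (err as Error).message);
    }
  }
}
```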
- Outcome
The tool is published open-source under MIT and installs globally via npm as `repodigest`. It reads almost any repository end-to-end, including private ones via `GITHUB_TOKEN`, and emits digests sized for real LLM context windows with an accurate token count up front. The AI path supports Gemini 2.5 Flash, 1.5 Pro, and 1.5 Flash, with environment-variable overrides for model, temperature, max tokens, and timeout. What started as a scratch-your-own-itch utility for feeding codebases to models became a useful piece of my own workflow and a small but complete example of how to build a Node CLI that doesn't feel like a weekend hack.
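For the override mechanism, something like the snippet below would do the job; the variable names here are hypothetical and not necessarily the ones repodigest documents.

```typescript
// Hypothetical env-var overrides for the AI layer; the names are
// illustrative, not repodigest's documented variables.
const aiDefaults = {
  model: process.env.REPODIGEST_MODEL ?? 'gemini-2.5-flash',
  temperature: Number(process.env.REPODIGEST_TEMPERATURE ?? 0.3),
  maxOutputTokens: Number(process.env.REPODIGEST_MAX_TOKENS ?? 8192),
  timeoutMs: Number(process.env.REPODIGEST_TIMEOUT_MS ?? 60_000),
};
```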
- Process
- 01
One command, a repo-shaped prompt
The core job is boring on purpose. Point repodigest at a path or a GitHub URL and it walks the tree, respects `.gitignore` plus a `.repodigestignore`, detects binaries, tags languages, and emits one digest file that a model can actually reason about. Text, JSON, or Markdown out of the same pipeline.
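A minimal sketch of how layered ignore files can be combined with the `ignore` package mentioned in the approach above; the helper is illustrative rather than repodigest's actual implementation.

```typescript
import { existsSync, readFileSync } from 'node:fs';
import { join } from 'node:path';
import ignore from 'ignore';

// Build one matcher from .gitignore plus .repodigestignore (if present),
// then use it to drop matching repo-relative paths. Illustrative only.
function buildIgnoreMatcher(repoRoot: string) {
  const ig = ignore();
  for (const name of ['.gitignore', '.repodigestignore']) {
    const file = join(repoRoot, name);
    if (existsSync(file)) ig.add(readFileSync(file, 'utf8'));
  }
  return ig;
}

// Usage: keep only the files the combined rules do not exclude.
// const ig = buildIgnoreMatcher('/path/to/repo');
// const kept = ig.filter(['src/index.ts', 'node_modules/x/y.js']);
```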
- 02
Filters that survive real repos
fast-glob drives include/exclude patterns, language filters compose with size caps and depth caps, and tiktoken does the token count up front so you know what you're shipping to an LLM before you ship it. No surprises at the context window.
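A rough sketch of how glob, depth, and size filters can compose with an up-front tiktoken estimate, assuming the fast-glob and tiktoken packages named above; the patterns and thresholds are example values, not the tool's defaults.

```typescript
import { readFileSync, statSync } from 'node:fs';
import fg from 'fast-glob';
import { get_encoding } from 'tiktoken';

// Illustrative composition of include/exclude globs, a depth cap, and a
// per-file size cap, followed by a token estimate. Not repodigest's code.
async function estimateDigestTokens(root: string): Promise<number> {
  const paths = await fg(['**/*.ts', '**/*.md'], {
    cwd: root,
    ignore: ['**/node_modules/**', '**/dist/**'],
    deep: 6,              // depth cap
    absolute: true,
    onlyFiles: true,
  });

  const maxBytes = 512 * 1024; // example size cap per file
  const text = paths
    .filter((p) => statSync(p).size <= maxBytes)
    .map((p) => readFileSync(p, 'utf8'))
    .join('\n');

  const enc = get_encoding('cl100k_base');
  const tokens = enc.encode(text).length;
  enc.free();
  return tokens;
}
```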
- 03
AI is an option, not the product
A Gemini layer sits behind `--ai-analysis`, `--ai-summary`, and `--security-scan` for when you want prose back instead of raw content. It runs on top of the digest, caps the number of files it sends to avoid rate limits, and degrades cleanly if the key is missing. The tool still works with zero AI configured.
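A minimal sketch of that "degrade cleanly" behaviour, assuming the `@google/generative-ai` SDK and a hypothetical `GEMINI_API_KEY` variable; repodigest's real AI service and configuration may differ.

```typescript
import { GoogleGenerativeAI } from '@google/generative-ai';

// Illustrative only: summarise a digest if a key is present, otherwise
// return null so the caller carries on with the plain digest.
async function summariseDigest(digest: string): Promise<string | null> {
  const apiKey = process.env.GEMINI_API_KEY;  // hypothetical variable name
  if (!apiKey) return null;                   // no key: digest still works

  try {
    const model = new GoogleGenerativeAI(apiKey).getGenerativeModel({
      model: 'gemini-1.5-flash',
    });
    // Cap the payload rather than sending everything, to stay under rate
    // and context limits (the cap here is an arbitrary example).
    const prompt = `Summarise this repository digest:\n${digest.slice(0, 100_000)}`;
    const result = await model.generateContent(prompt);
    return result.response.text();
  } catch {
    return null;                              // network/model failure: degrade
  }
}
```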
- 04
CLI that respects the terminal
Welcome banner in figlet, progress in ora, a modular help system split into `examples`/`ai-help`/`options`/`features` so the main `--help` stays short. Interactive mode covers first-time setup without forcing you to memorise flags.
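The terminal polish could be wired up roughly like this, assuming the commander, figlet, and ora packages; the sub-command names come from this write-up, the `--output` flag is invented for the sketch, and none of it is the actual source.

```typescript
import { Command } from 'commander';
import figlet from 'figlet';
import ora from 'ora';

// Illustrative wiring only — not repodigest's actual CLI setup.
const program = new Command();

program
  .name('repodigest')
  .description('Turn a repository into an LLM-ready digest')
  .argument('<source>', 'local path or remote Git URL')
  .option('-o, --output <file>', 'where to write the digest'); // hypothetical flag

// Keep the main --help short by pushing long docs into sub-commands.
for (const topic of ['examples', 'ai-help', 'options', 'features']) {
  program.command(topic).description(`Extended help: ${topic}`);
}

async function main() {
  console.log(figlet.textSync('RepoDigest')); // welcome banner
  program.parse();

  const spinner = ora('Generating digest...').start();
  // ... digest pipeline would run here ...
  spinner.succeed('Digest written');
}

main();
```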
- Credits
- Author
- Salah Boussettah
- Next
→ ReadWise+
Mobile · 2025