paper-reader/docs/plans/2026-02-19-paper-cli-design.md
Ellie f82b738db7 Add paper CLI: download academic papers by DOI and convert to markdown
Downloads PDFs from LibGen (primary) or Anna's Archive API (fallback),
converts to markdown via marker_single, and prints to stdout. Includes
XDG-compliant caching, nix flake with marker-pdf packaging, and a
Claude Code skill for paper-reader integration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 22:54:30 -08:00

paper CLI — Design

A CLI tool that downloads academic papers by DOI from Anna's Archive and converts them to markdown.

CLI Interface

paper <DOI>

Takes a single positional argument, the DOI. Markdown output goes to stdout, so it can be redirected to a file:

paper 10.1038/nature12373 > paper.md
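
The interface above can be sketched with the standard library alone. This is a minimal stand-in, not the clap-based parser the design calls for: `parse_doi` is a hypothetical helper, and the DOI sanity check (a `10.` prefix and a `/`) is an assumption that the design does not specify.

```rust
/// Extract the single DOI argument from an argv-style iterator.
/// Hypothetical helper: the real tool is meant to use clap instead.
fn parse_doi<I: IntoIterator<Item = String>>(args: I) -> Result<String, String> {
    // Skip the program name, then expect exactly one positional argument.
    let mut args = args.into_iter().skip(1);
    let doi = args.next().ok_or("usage: paper <DOI>")?;
    if args.next().is_some() {
        return Err("usage: paper <DOI> (exactly one argument expected)".to_string());
    }

    // Assumed sanity check: DOIs start with "10." and contain a '/'.
    if !doi.starts_with("10.") || !doi.contains('/') {
        return Err(format!("'{}' does not look like a DOI", doi));
    }
    Ok(doi)
}
```

In a binary this would be called as `parse_doi(std::env::args())`, printing the error to stderr and exiting non-zero on failure so that stdout stays reserved for markdown.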

Download Flow

  1. Request https://annas-archive.org/scidb/<DOI> with a browser-like User-Agent
  2. Parse the HTML for an <iframe> or <embed> with id="pdf" and take its src as the direct PDF URL
  3. Fallback: find any link ending in .pdf
  4. If neither step yields a PDF URL, exit with a clear error
  5. Download the PDF to a temp file
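
The URL construction and the PDF lookup in the flow above can be sketched as follows. This is a rough illustration under stated assumptions: it uses naive string scanning where the design uses the scraper crate, it omits the HTTP request itself (reqwest in the design), and `scidb_url`, `attr_value`, and `find_pdf_url` are hypothetical names. It only handles double-quoted attributes preceded by a space.

```rust
/// Build the SciDB page URL for a DOI.
fn scidb_url(doi: &str) -> String {
    format!("https://annas-archive.org/scidb/{}", doi.trim())
}

/// Return the value of ` name="..."` inside one tag's text, if present.
/// Naive stand-in for a real HTML parser.
fn attr_value<'a>(tag: &'a str, name: &str) -> Option<&'a str> {
    let needle = format!(" {}=\"", name);
    let start = tag.find(needle.as_str())? + needle.len();
    let len = tag[start..].find('"')?;
    Some(&tag[start..start + len])
}

/// Prefer an <iframe>/<embed> with id="pdf"; otherwise fall back to
/// any <a> whose href ends in ".pdf". Returns None if neither exists.
fn find_pdf_url(html: &str) -> Option<String> {
    // Collect the text between each '<' and the following '>'.
    let tags: Vec<&str> = html
        .split('<')
        .skip(1)
        .map(|chunk| chunk.split('>').next().unwrap_or(""))
        .collect();

    // Primary: the embedded viewer element.
    for tag in tags.iter().copied() {
        let is_viewer = tag.starts_with("iframe") || tag.starts_with("embed");
        if is_viewer && attr_value(tag, "id") == Some("pdf") {
            if let Some(src) = attr_value(tag, "src") {
                return Some(src.to_string());
            }
        }
    }

    // Fallback: any link ending in .pdf.
    for tag in tags.iter().copied() {
        if let Some(href) = attr_value(tag, "href") {
            if tag.starts_with("a ") && href.ends_with(".pdf") {
                return Some(href.to_string());
            }
        }
    }
    None
}
```

A `None` result corresponds to the "no PDF found" exit. The extracted URL may be relative to the page, so a real implementation would likely need to resolve it against the page URL before downloading.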

Conversion

  1. Shell out to marker_single <tempfile.pdf> --output_dir <tempdir>
  2. Read the generated .md file from the output dir
  3. Print to stdout
  4. Clean up temp dir
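
The conversion steps might look roughly like this with `std::process::Command`. It is a sketch, not the final implementation: `marker_command`, `find_markdown`, and `convert` are hypothetical names, and the recursive search is an assumption made because the exact location of the generated .md file inside the output dir is not specified here. Cleanup is left to the caller (for example, by dropping a tempfile `TempDir`).

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::process::Command;

/// Build `marker_single <pdf> --output_dir <out_dir>` without running it.
fn marker_command(pdf: &Path, out_dir: &Path) -> Command {
    let mut cmd = Command::new("marker_single");
    cmd.arg(pdf).arg("--output_dir").arg(out_dir);
    cmd
}

/// Find the first .md file under `dir`, searching subdirectories too.
fn find_markdown(dir: &Path) -> io::Result<Option<PathBuf>> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            if let Some(found) = find_markdown(&path)? {
                return Ok(Some(found));
            }
        } else if path.extension().and_then(|ext| ext.to_str()) == Some("md") {
            return Ok(Some(path));
        }
    }
    Ok(None)
}

/// Run the converter and return the markdown text.
/// On a non-zero exit, marker's stderr is returned as the error.
fn convert(pdf: &Path, out_dir: &Path) -> Result<String, String> {
    let output = marker_command(pdf, out_dir)
        .output()
        .map_err(|e| format!("failed to run marker_single: {}", e))?;
    if !output.status.success() {
        return Err(String::from_utf8_lossy(&output.stderr).into_owned());
    }
    let md = find_markdown(out_dir)
        .map_err(|e| e.to_string())?
        .ok_or_else(|| "marker_single produced no .md file".to_string())?;
    fs::read_to_string(md).map_err(|e| e.to_string())
}
```

The caller would print the returned string to stdout and then remove the temp dir.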

Error Handling

  • marker_single not on PATH: tell user to install (pip install marker-pdf)
  • Conversion failure: forward marker's stderr
  • Network errors: surface reqwest errors clearly
  • No PDF found: specific error message with the DOI
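
One way to keep these four cases distinct is a small error enum. The sketch below uses only the standard library as a stand-in for anyhow; `PaperError` and `classify_spawn_error` are hypothetical names, and the message wording is illustrative.

```rust
use std::fmt;
use std::io;

/// The four failure cases listed above.
#[derive(Debug)]
enum PaperError {
    MarkerMissing,
    Conversion { stderr: String },
    Network(String),
    NoPdf { doi: String },
}

impl fmt::Display for PaperError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PaperError::MarkerMissing => write!(
                f,
                "marker_single not found on PATH; install it with `pip install marker-pdf`"
            ),
            // Forward marker's stderr unchanged.
            PaperError::Conversion { stderr } => write!(f, "marker_single failed:\n{}", stderr),
            PaperError::Network(msg) => write!(f, "network error: {}", msg),
            PaperError::NoPdf { doi } => write!(f, "no PDF found for DOI {}", doi),
        }
    }
}

impl std::error::Error for PaperError {}

/// Map a failure to start marker_single onto the right case:
/// NotFound means the binary is not on PATH.
fn classify_spawn_error(err: &io::Error) -> PaperError {
    if err.kind() == io::ErrorKind::NotFound {
        PaperError::MarkerMissing
    } else {
        PaperError::Conversion { stderr: err.to_string() }
    }
}
```

Because `PaperError` implements `std::error::Error`, it can still be wrapped by anyhow at the top level if the design's choice of anyhow is kept.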

Dependencies

  • clap — argument parsing
  • reqwest (blocking, rustls-tls) — HTTP
  • scraper — HTML parsing
  • tempfile — temp directory
  • anyhow — error handling

Dev Environment

The Nix flake provides the Rust nightly toolchain and marker-pdf in the dev shell.