docs: update skill docs for unpaywall and pdftotext support

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 13:35:18 -08:00 · 2026-02-25 13:35:18 -08:00 · 91920d4103
commit 91920d4103
parent c3b63ea2f5
3 changed files with 339 additions and 2 deletions
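The download order documented in the SKILL.md diff below amounts to a credential-gated fallback chain. Here is a minimal Python sketch of that idea. It assumes each source is a hypothetical callable that returns PDF bytes or `None`. The names `SOURCES`, `enabled_sources`, and `fetch_pdf` are illustrative only and are not the `paper` CLI's actual code.

```python
import os
from typing import Callable, Mapping, Optional

# Hypothetical table in the documented order: (source name, env var it needs).
# None means the source needs no credentials.
SOURCES: list[tuple[str, Optional[str]]] = [
    ("unpaywall", "UNPAYWALL_EMAIL"),
    ("libgen", None),
    ("annas_archive", "ANNAS_ARCHIVE_KEY"),
]

# A fetcher takes a DOI and returns PDF bytes, or None on a miss (hypothetical).
Fetcher = Callable[[str], Optional[bytes]]


def enabled_sources(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return usable sources in order; an unset or empty env var skips its source."""
    return [name for name, var in SOURCES if var is None or env.get(var)]


def fetch_pdf(doi: str, fetchers: Mapping[str, Fetcher],
              env: Mapping[str, str] = os.environ) -> Optional[bytes]:
    """Try each enabled source in turn and return the first PDF found."""
    for name in enabled_sources(env):
        fetch = fetchers[name]
        try:
            pdf = fetch(doi)
        except Exception:
            continue  # treat a failing source as a miss and move on
        if pdf is not None:
            return pdf
    return None  # every enabled source came up empty
```

Gating on the env vars up front means a missing `UNPAYWALL_EMAIL` or `ANNAS_ARCHIVE_KEY` simply shortens the chain instead of producing an error.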
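The conversion fallback described in the SKILL.md diff below can be sketched in Python as well. This version relies on a hypothetical quality check (`looks_usable`) with made-up thresholds, because the diff only says "garbled or too short". It invokes `pdftotext <file> -`, the standard Poppler form for writing the extracted text to stdout. The `marker_single` step is left as a caller-supplied function so that no flags are guessed. The sketch returns plain text. Any markdown shaping the real tool applies is not modeled here.

```python
import subprocess
from typing import Callable

# Hypothetical thresholds; the real tool's criteria are not documented.
MIN_CHARS = 2000
MIN_TEXT_RATIO = 0.6


def run_pdftotext(pdf_path: str) -> str:
    """Extract the PDF's text layer with Poppler's pdftotext ('-' = stdout)."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True, encoding="utf-8", errors="replace", check=True,
    )
    return result.stdout


def looks_usable(text: str, min_chars: int = MIN_CHARS,
                 min_ratio: float = MIN_TEXT_RATIO) -> bool:
    """Crude quality check on extracted text."""
    stripped = text.strip()
    if len(stripped) < min_chars:
        return False  # too short: likely a scan with no real text layer
    letters_or_space = sum(1 for c in stripped if c.isalpha() or c.isspace())
    return letters_or_space / len(stripped) >= min_ratio  # low ratio: likely garbled


def pdf_to_text(pdf_path: str, ocr_fallback: Callable[[str], str],
                extract: Callable[[str], str] = run_pdftotext) -> str:
    """Use the cheap text layer when it looks good, else fall back to OCR."""
    try:
        text = extract(pdf_path)
    except (OSError, subprocess.CalledProcessError):
        text = ""  # pdftotext missing or failed: go straight to the fallback
    return text if looks_usable(text) else ocr_fallback(pdf_path)
```

Because `ocr_fallback` is passed in, the sketch does not depend on any particular `marker_single` command line. The cheap `pdftotext` path runs first, so the slower ML step only runs when the heuristic rejects its output.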
--- a/skills/paper-reader/SKILL.md
+++ b/skills/paper-reader/SKILL.md
@@ -5,7 +5,7 @@ description: Fetch and read academic papers by DOI. Use when (1) the user mentio

 # Paper Reader

-Fetch academic papers by DOI using the `paper` CLI, which downloads PDFs and converts them to markdown via `marker_single`.
+Fetch academic papers by DOI using the `paper` CLI, which downloads PDFs and converts them to markdown. It uses simple text extraction (`pdftotext`) when possible and falls back to ML OCR (`marker_single`) for scanned or image-heavy papers.

 ## Usage

@@ -32,7 +32,15 @@ Results are cached at `~/.cache/paper/<DOI>.md`. Subsequent requests for the sam

 ## Download Sources

-The tool tries LibGen first (free, no authentication), then falls back to Anna's Archive fast download API if `ANNAS_ARCHIVE_KEY` is set.
+The tool tries sources in this order:
+
+1. **Unpaywall** — free open-access PDFs (requires `UNPAYWALL_EMAIL` env var)
+2. **LibGen** — free, no authentication
+3. **Anna's Archive** — fast download API (requires `ANNAS_ARCHIVE_KEY` env var)
+
+## Conversion
+
+PDF-to-markdown conversion tries simple text extraction first (`pdftotext`), which works well for most modern papers with proper text layers. If the output is low quality (garbled or too short), it falls back to ML OCR via `marker_single`.

 ## Errors