docs: update skill docs for unpaywall and pdftotext support

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Ellie 2026-02-25 13:35:18 -08:00
parent c3b63ea2f5
commit 91920d4103
3 changed files with 339 additions and 2 deletions


@@ -5,7 +5,7 @@ description: Fetch and read academic papers by DOI. Use when (1) the user mentio
# Paper Reader
Fetch academic papers by DOI using the `paper` CLI, which downloads PDFs and converts them to markdown via `marker_single`.
Fetch academic papers by DOI using the `paper` CLI, which downloads PDFs and converts them to markdown. It uses simple text extraction (`pdftotext`) when possible and falls back to ML OCR (`marker_single`) for scanned or image-heavy papers.
## Usage
@@ -32,7 +32,15 @@ Results are cached at `~/.cache/paper/<DOI>.md`. Subsequent requests for the sam
## Download Sources
The tool tries LibGen first (free, no authentication), then falls back to Anna's Archive fast download API if `ANNAS_ARCHIVE_KEY` is set.
The tool tries sources in this order:
1. **Unpaywall** — free open-access PDFs (requires `UNPAYWALL_EMAIL` env var)
2. **LibGen** — free, no authentication
3. **Anna's Archive** — fast download API (requires `ANNAS_ARCHIVE_KEY` env var)
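The ordering above can be sketched roughly as follows. This is an illustration only, not the `paper` CLI's actual code: the `SOURCES` table and `enabled_sources` helper are hypothetical names, and it assumes a source is simply skipped when its required environment variable is unset or empty.

```python
import os
from typing import Mapping, Optional

# Priority order taken from the list above.
# Each entry is (source name, required env var or None).
# Skipping a source when its variable is unset is an assumption.
SOURCES: list[tuple[str, Optional[str]]] = [
    ("unpaywall", "UNPAYWALL_EMAIL"),
    ("libgen", None),
    ("annas_archive", "ANNAS_ARCHIVE_KEY"),
]


def enabled_sources(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return source names in priority order, dropping unconfigured ones."""
    return [name for name, var in SOURCES if var is None or env.get(var)]
```

With neither variable set, only LibGen would be tried; setting `UNPAYWALL_EMAIL` puts Unpaywall ahead of it.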
## Conversion
PDF-to-markdown conversion tries simple text extraction first (`pdftotext`), which works well for most modern papers with proper text layers. If the output is low quality (garbled or too short), it falls back to ML OCR via `marker_single`.
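A minimal sketch of this fallback, under stated assumptions: the thresholds `MIN_CHARS` and `MIN_READABLE_RATIO`, the `looks_low_quality` heuristic, and the `convert` helper are all hypothetical, and the tool's real quality check may differ. The two converters are passed in as plain callables so the sketch does not assume any particular command-line flags.

```python
from typing import Callable

# Hypothetical thresholds; the real tool's criteria are not documented here.
MIN_CHARS = 500            # shorter output counts as "too short"
MIN_READABLE_RATIO = 0.9   # lower share of readable characters counts as "garbled"

_PUNCTUATION = set(".,;:!?()[]{}'\"-/%+=<>*&#@_")


def looks_low_quality(text: str) -> bool:
    """Heuristic: flag output that is too short or mostly unreadable characters."""
    stripped = text.strip()
    if len(stripped) < MIN_CHARS:
        return True
    readable = sum(
        ch.isalnum() or ch.isspace() or ch in _PUNCTUATION for ch in stripped
    )
    return readable / len(stripped) < MIN_READABLE_RATIO


def convert(
    pdf_path: str,
    fast: Callable[[str], str],
    ocr: Callable[[str], str],
) -> str:
    """Try the fast extractor first; fall back to OCR if its output looks poor."""
    text = fast(pdf_path)
    return ocr(pdf_path) if looks_low_quality(text) else text
```

In practice `fast` would wrap a `pdftotext` invocation and `ocr` a `marker_single` invocation; keeping them injectable means the expensive OCR path runs only when the cheap one fails the check.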
## Errors