Add paper CLI: download academic papers by DOI and convert to markdown

Downloads PDFs from LibGen (primary) or Anna's Archive API (fallback),
converts to markdown via marker_single, and prints to stdout. Includes
XDG-compliant caching, nix flake with marker-pdf packaging, and a
Claude Code skill for paper-reader integration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Ellie 2026-02-19 22:54:30 -08:00
commit f82b738db7
10 changed files with 2860 additions and 0 deletions


@@ -0,0 +1,49 @@
# paper CLI — Design
A CLI tool that downloads academic papers by DOI from Anna's Archive and converts them to markdown.
## CLI Interface
```
paper <DOI>
```
Single positional argument. Markdown output goes to stdout.
```
paper 10.1038/nature12373 > paper.md
```
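The single-argument interface can be sketched without any crates. The design names `clap` for argument parsing; the std-only stand-in below only illustrates the contract of exactly one positional DOI. The function name `parse_doi` and the usage string are assumptions made for illustration.

```rust
/// Hypothetical helper: accept exactly one positional argument (the DOI).
/// The first item of `args` is treated as the program name, as with `std::env::args()`.
fn parse_doi<I>(mut args: I) -> Result<String, String>
where
    I: Iterator<Item = String>,
{
    let program = args.next().unwrap_or_else(|| "paper".to_string());
    match (args.next(), args.next()) {
        // Exactly one argument after the program name.
        (Some(doi), None) => Ok(doi),
        // Zero arguments, or more than one.
        _ => Err(format!("usage: {} <DOI>", program)),
    }
}
```

In a real binary this would be called as `parse_doi(std::env::args())`, with the error printed to stderr so that stdout carries only markdown.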
## Download Flow
1. Request `https://annas-archive.org/scidb/<DOI>` with a browser-like User-Agent
2. Parse the HTML for an `<iframe>` or `<embed>` with `id="pdf"` and take its `src` as the direct PDF URL
3. Fallback: use any link whose URL ends in `.pdf`
4. If neither yields a URL, exit with a clear error
5. Download the PDF to a temp file
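The URL construction in step 1 and the lookup order in steps 2 and 3 can be sketched as follows. The design uses `scraper` for HTML parsing; this std-only version substitutes naive string matching so that it stays self-contained, and it is not a robust HTML parser. It only handles double-quoted attributes and does not resolve relative URLs. The names `scidb_url`, `attr_value`, and `find_pdf_url` are hypothetical.

```rust
/// Base URL taken from step 1 of the design.
const SCIDB_BASE: &str = "https://annas-archive.org/scidb/";

/// Step 1: build the page URL for a DOI.
fn scidb_url(doi: &str) -> String {
    format!("{}{}", SCIDB_BASE, doi.trim())
}

/// Return the value of a double-quoted attribute inside one tag's text.
/// Naive on purpose: requires a space before the attribute name.
fn attr_value<'a>(tag: &'a str, name: &str) -> Option<&'a str> {
    let needle = format!(" {}=\"", name);
    let start = tag.find(&needle)? + needle.len();
    let end = tag[start..].find('"')? + start;
    Some(&tag[start..end])
}

/// Steps 2 and 3: prefer an <iframe>/<embed> with id="pdf",
/// otherwise fall back to any href ending in ".pdf".
fn find_pdf_url(html: &str) -> Option<String> {
    // Each element is the text between '<' and the next '>'.
    let tags: Vec<&str> = html
        .split('<')
        .skip(1)
        .filter_map(|piece| piece.split('>').next())
        .collect();

    for &tag in &tags {
        let is_viewer = tag.starts_with("iframe") || tag.starts_with("embed");
        if is_viewer && attr_value(tag, "id") == Some("pdf") {
            if let Some(src) = attr_value(tag, "src") {
                return Some(src.to_string());
            }
        }
    }

    // Fallback: first href that ends in ".pdf".
    tags.iter()
        .copied()
        .filter_map(|tag| attr_value(tag, "href"))
        .find(|href| href.ends_with(".pdf"))
        .map(|href| href.to_string())
}
```

A `None` result corresponds to the "no PDF found" exit. The viewer element wins over a plain link even when the link appears earlier in the page, which matches the priority the steps describe.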
## Conversion
1. Shell out to `marker_single <tempfile.pdf> --output_dir <tempdir>`
2. Read the generated `.md` file from the output dir
3. Print to stdout
4. Clean up temp dir
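A minimal sketch of the conversion steps, using only the standard library. The `marker_single <pdf> --output_dir <dir>` invocation is taken from the design; everything else is an assumption: the function names `find_markdown` and `convert` are hypothetical, `io::Error` stands in for `anyhow`, and the sketch searches the output directory recursively because it does not assume where marker places the `.md` file.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::process::Command;

/// Step 2 helper: recursively find the first `.md` file under `dir`.
fn find_markdown(dir: &Path) -> io::Result<Option<PathBuf>> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            if let Some(found) = find_markdown(&path)? {
                return Ok(Some(found));
            }
        } else if path.extension().and_then(|ext| ext.to_str()) == Some("md") {
            return Ok(Some(path));
        }
    }
    Ok(None)
}

/// Steps 1 and 2: run marker_single, then read the generated markdown.
/// The caller prints the returned text to stdout (step 3).
fn convert(pdf: &Path, out_dir: &Path) -> io::Result<String> {
    let output = Command::new("marker_single")
        .arg(pdf)
        .arg("--output_dir")
        .arg(out_dir)
        .output()?;

    if !output.status.success() {
        // Forward marker's stderr in the error message.
        let stderr = String::from_utf8_lossy(&output.stderr);
        return Err(io::Error::new(
            io::ErrorKind::Other,
            format!("marker_single failed: {}", stderr),
        ));
    }

    match find_markdown(out_dir)? {
        Some(md) => fs::read_to_string(md),
        None => Err(io::Error::new(
            io::ErrorKind::NotFound,
            "no .md file in output dir",
        )),
    }
}
```

Cleanup (step 4) is left to the caller here. With the `tempfile` crate listed under Dependencies, a `TempDir` value removes its directory when dropped, so no explicit cleanup call would be needed.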
## Error Handling
- `marker_single` not on PATH: tell user to install (`pip install marker-pdf`)
- Conversion failure: forward marker's stderr
- Network errors: surface reqwest errors clearly
- No PDF found: specific error message with the DOI
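The four cases above could be modelled as one error type. The design lists `anyhow` for error handling, so the enum below is only an illustrative, std-only sketch; `PaperError`, its variants, the message wording, and `from_spawn_error` are all hypothetical.

```rust
use std::fmt;
use std::io;

/// Hypothetical error type covering the four cases in the list.
#[derive(Debug)]
enum PaperError {
    MarkerNotInstalled,
    ConversionFailed { stderr: String },
    Network(String),
    NoPdfFound { doi: String },
}

impl fmt::Display for PaperError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PaperError::MarkerNotInstalled => write!(
                f,
                "marker_single not found on PATH; install it with `pip install marker-pdf`"
            ),
            // Forward marker's stderr verbatim.
            PaperError::ConversionFailed { stderr } => {
                write!(f, "conversion failed:\n{}", stderr)
            }
            PaperError::Network(msg) => write!(f, "network error: {}", msg),
            PaperError::NoPdfFound { doi } => write!(f, "no PDF found for DOI {}", doi),
        }
    }
}

impl std::error::Error for PaperError {}

/// Map a failure to spawn `marker_single` onto the cases above:
/// `NotFound` means the binary is not on PATH.
fn from_spawn_error(err: io::Error) -> PaperError {
    if err.kind() == io::ErrorKind::NotFound {
        PaperError::MarkerNotInstalled
    } else {
        PaperError::ConversionFailed { stderr: err.to_string() }
    }
}
```

Checking for `io::ErrorKind::NotFound` when spawning the process avoids a separate PATH lookup before the conversion runs.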
## Dependencies
- `clap` — argument parsing
- `reqwest` (blocking, rustls-tls) — HTTP
- `scraper` — HTML parsing
- `tempfile` — temp directory
- `anyhow` — error handling
## Dev Environment
The nix flake's devshell provides the Rust nightly toolchain and marker-pdf.