paper-reader/docs/plans/2026-02-19-paper-cli-design.md

# paper CLI — Design
A CLI tool that downloads academic papers by DOI from Anna's Archive and converts them to Markdown.
## CLI Interface
```
paper <DOI>
```
Single positional argument. Markdown output goes to stdout.
```
paper 10.1038/nature12373 > paper.md
```
## Download Flow
1. Request `https://annas-archive.org/scidb/<DOI>` with a browser-like User-Agent
2. Parse HTML for `<iframe>` or `<embed>` with `id="pdf"` — extract `src` for direct PDF URL
3. Fallback: if no such element exists, find any link ending in `.pdf`
4. If neither step yields a PDF URL, exit with a clear error
5. Otherwise, download the PDF to a temp file
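
The selection order in the steps above can be sketched without any HTML library. This is a minimal sketch under assumptions, not the tool's actual code: `Element`, `scidb_url`, and `pick_pdf_url` are hypothetical names, and in the real tool the tag, `id`, and `src`/`href` values would presumably be extracted with `scraper` rather than passed in by hand.

```rust
/// One element pulled out of the SciDB HTML page.
/// Hypothetical model type; the real values would come from an HTML parser.
struct Element<'a> {
    tag: &'a str,         // e.g. "iframe", "embed", "a"
    id: Option<&'a str>,  // value of the `id` attribute, if any
    url: Option<&'a str>, // value of `src` (iframe/embed) or `href` (a)
}

/// Build the SciDB page URL for a DOI.
fn scidb_url(doi: &str) -> String {
    format!("https://annas-archive.org/scidb/{}", doi.trim())
}

/// Prefer an <iframe> or <embed> with id="pdf"; otherwise fall back to
/// any link whose URL ends in ".pdf". Returns None if neither exists.
fn pick_pdf_url<'a>(elements: &[Element<'a>]) -> Option<&'a str> {
    let viewer = elements
        .iter()
        .find(|e| (e.tag == "iframe" || e.tag == "embed") && e.id == Some("pdf"))
        .and_then(|e| e.url);

    viewer.or_else(|| {
        elements
            .iter()
            .filter(|e| e.tag == "a")
            .filter_map(|e| e.url)
            .find(|u| u.ends_with(".pdf"))
    })
}
```

A `None` result from `pick_pdf_url` corresponds to the "no PDF found" exit path. Keeping the choice in a pure function like this makes the priority order easy to unit-test separately from the network request.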
## Conversion
1. Shell out to `marker_single <tempfile.pdf> --output_dir <tempdir>`
2. Read the generated `.md` file from the output dir
3. Print to stdout
4. Clean up temp dir
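
The first two conversion steps can be sketched with the standard library alone. The helper names `marker_command` and `find_markdown` are hypothetical, and the recursive search is an assumption: it covers the case where `marker_single` writes the `.md` file into a subdirectory of the output dir rather than directly into it.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::process::Command;

/// Build (but do not run) the `marker_single` invocation.
fn marker_command(pdf: &Path, out_dir: &Path) -> Command {
    let mut cmd = Command::new("marker_single");
    cmd.arg(pdf).arg("--output_dir").arg(out_dir);
    cmd
}

/// Find the first `.md` file under `dir`, descending into subdirectories.
/// Returns Ok(None) when the conversion produced no Markdown file.
fn find_markdown(dir: &Path) -> io::Result<Option<PathBuf>> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            if let Some(found) = find_markdown(&path)? {
                return Ok(Some(found));
            }
        } else if path.extension().map_or(false, |ext| ext == "md") {
            return Ok(Some(path));
        }
    }
    Ok(None)
}
```

A caller would run `marker_command(..).output()`, check the exit status, then pass the output dir to `find_markdown` and print the file contents to stdout. Cleanup is not shown here; with the `tempfile` crate the directory is removed when the `TempDir` value is dropped.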
## Error Handling
- `marker_single` not on PATH: tell user to install (`pip install marker-pdf`)
- Conversion failure: forward marker's stderr
- Network errors: surface reqwest errors clearly
- No PDF found: specific error message with the DOI
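
One way to keep these four cases distinct is a small error enum. This is only a sketch: `PaperError` and `classify_spawn_error` are hypothetical names, the message wording is invented for illustration, and since the design lists `anyhow`, the real tool may attach equivalent messages as error context instead. The one grounded detail is that spawning a program that is not on PATH fails with `io::ErrorKind::NotFound`.

```rust
use std::fmt;
use std::io;

/// Hypothetical error type covering the four failure cases.
#[derive(Debug)]
enum PaperError {
    MarkerMissing,
    ConversionFailed { stderr: String },
    Network(String),
    NoPdfFound { doi: String },
}

impl fmt::Display for PaperError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PaperError::MarkerMissing => write!(
                f,
                "marker_single not found on PATH; install it with `pip install marker-pdf`"
            ),
            // Forward marker's stderr unchanged so the user sees the real cause.
            PaperError::ConversionFailed { stderr } => {
                write!(f, "conversion failed:\n{}", stderr)
            }
            PaperError::Network(msg) => write!(f, "network error: {}", msg),
            PaperError::NoPdfFound { doi } => write!(f, "no PDF found for DOI {}", doi),
        }
    }
}

/// Classify a failure to spawn `marker_single`: a NotFound error means the
/// binary is not installed; anything else is reported as a conversion failure.
fn classify_spawn_error(err: &io::Error) -> PaperError {
    if err.kind() == io::ErrorKind::NotFound {
        PaperError::MarkerMissing
    } else {
        PaperError::ConversionFailed { stderr: err.to_string() }
    }
}
```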
## Dependencies
- `clap` — argument parsing
- `reqwest` (blocking, rustls-tls) — HTTP
- `scraper` — HTML parsing
- `tempfile` — temp directory
- `anyhow` — error handling
## Dev Environment
The Nix flake provides the Rust nightly toolchain and marker-pdf in its devshell.