Downloads PDFs from LibGen (primary) or Anna's Archive API (fallback), converts to markdown via marker_single, and prints to stdout. Includes XDG-compliant caching, nix flake with marker-pdf packaging, and a Claude Code skill for paper-reader integration. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
# paper CLI — Design

A CLI tool that downloads academic papers by DOI from Anna's Archive and converts them to markdown.

## CLI Interface

```
paper <DOI>
```

Single positional argument. Markdown output goes to stdout.

```
paper 10.1038/nature12373 > paper.md
```

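As a rough illustration of the single-argument contract, the check can be sketched with the standard library alone. The design lists `clap` for argument parsing, so this is only a stand-in; `parse_doi` is a hypothetical helper name and the usage string is an assumption.

```rust
/// Accept exactly one non-empty positional argument (the DOI).
/// `args` is expected to exclude the program name.
/// Hypothetical helper: the actual tool is designed around `clap`.
fn parse_doi<I: IntoIterator<Item = String>>(args: I) -> Result<String, String> {
    let mut args = args.into_iter();
    match (args.next(), args.next()) {
        // One argument and nothing after it: treat it as the DOI.
        (Some(doi), None) if !doi.trim().is_empty() => Ok(doi.trim().to_string()),
        // Zero arguments, an empty argument, or more than one argument.
        _ => Err("usage: paper <DOI>".to_string()),
    }
}
```

In a real `main`, this would be called as `parse_doi(std::env::args().skip(1))`.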
## Download Flow

1. Request `https://annas-archive.org/scidb/<DOI>` with a browser-like User-Agent
2. Parse the HTML for an `<iframe>` or `<embed>` with `id="pdf"` and extract its `src` as the direct PDF URL
3. Fallback: find any link ending in `.pdf`
4. Download the PDF to a temp file
5. Exit with a clear error if steps 2 and 3 find no PDF URL

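Steps 1 to 3 can be sketched as follows. This is a minimal, standard-library-only stand-in: the design uses `scraper` for HTML parsing, whereas this sketch scans tags with naive string matching, so it assumes lowercase tag names, double-quoted attributes, and no `>` inside attribute values. `scidb_url`, `attr`, and `find_pdf_url` are hypothetical names, and the fallback only considers `<a href>` links.

```rust
/// Step 1: build the SciDB page URL for a DOI.
fn scidb_url(doi: &str) -> String {
    format!("https://annas-archive.org/scidb/{}", doi.trim())
}

/// Return the value of ` name="..."` inside a single tag, if present.
/// Naive: only handles double-quoted attributes preceded by a space.
fn attr<'a>(tag: &'a str, name: &str) -> Option<&'a str> {
    let key = format!(" {}=\"", name);
    let start = tag.find(&key)? + key.len();
    let len = tag[start..].find('"')?;
    Some(&tag[start..start + len])
}

/// Steps 2 and 3: prefer an <iframe>/<embed> with id="pdf",
/// otherwise fall back to the first <a href> ending in ".pdf".
fn find_pdf_url(html: &str) -> Option<String> {
    // Split on '<' so each piece starts with a tag name, then cut at '>'.
    let tags: Vec<&str> = html
        .split('<')
        .filter_map(|piece| piece.split('>').next())
        .collect();

    for tag in &tags {
        let is_viewer = tag.starts_with("iframe") || tag.starts_with("embed");
        if is_viewer && attr(tag, "id") == Some("pdf") {
            if let Some(src) = attr(tag, "src") {
                return Some(src.to_string());
            }
        }
    }

    tags.iter()
        .filter(|tag| tag.starts_with("a "))
        .filter_map(|tag| attr(tag, "href"))
        .find(|href| href.ends_with(".pdf"))
        .map(str::to_string)
}
```

A `src` or `href` may be relative, in which case it would need resolving against the page URL before step 4.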
## Conversion

1. Shell out to `marker_single <tempfile.pdf> --output_dir <tempdir>`
2. Read the generated `.md` file from the output dir
3. Print to stdout
4. Clean up temp dir

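The conversion steps might look like this using only `std::process::Command` and `std::fs`. The function names are hypothetical, and because the layout marker writes under the output directory is not specified here, the sketch searches that directory recursively for the first `.md` file. Cleanup (step 4) is not shown; with `tempfile`, the directory is removed when the `TempDir` value is dropped.

```rust
use std::ffi::OsStr;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::process::Command;

/// Step 1: build (but do not run) the marker_single invocation.
fn marker_command(pdf: &Path, out_dir: &Path) -> Command {
    let mut cmd = Command::new("marker_single");
    cmd.arg(pdf).arg("--output_dir").arg(out_dir);
    cmd
}

/// Step 2: depth-first search for the first `.md` file under `dir`.
fn find_markdown(dir: &Path) -> io::Result<Option<PathBuf>> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            if let Some(found) = find_markdown(&path)? {
                return Ok(Some(found));
            }
        } else if path.extension() == Some(OsStr::new("md")) {
            return Ok(Some(path));
        }
    }
    Ok(None)
}

/// Steps 1 and 2 combined: run the converter and return the Markdown text.
/// The caller prints the result to stdout (step 3).
fn convert(pdf: &Path, out_dir: &Path) -> io::Result<String> {
    let status = marker_command(pdf, out_dir).status()?;
    if !status.success() {
        let msg = format!("marker_single exited with {}", status);
        return Err(io::Error::new(io::ErrorKind::Other, msg));
    }
    match find_markdown(out_dir)? {
        Some(md) => fs::read_to_string(md),
        None => Err(io::Error::new(
            io::ErrorKind::NotFound,
            "no .md file in output dir",
        )),
    }
}
```

Using `status()` lets marker's stderr pass straight through to the terminal, which keeps stdout reserved for the Markdown itself.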
## Error Handling

- `marker_single` not on PATH: tell user to install (`pip install marker-pdf`)
- Conversion failure: forward marker's stderr
- Network errors: surface reqwest errors clearly
- No PDF found: specific error message with the DOI

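One way to keep these four cases distinct is a small error enum. The design uses `anyhow`, so the plain-`std` type below is only an illustrative sketch; `PaperError`, its variants, `classify_spawn_error`, and the message wording are all assumptions. It relies on the fact that spawning a missing executable fails with `io::ErrorKind::NotFound`.

```rust
use std::fmt;
use std::io;

/// Hypothetical error type covering the four cases listed above.
#[derive(Debug)]
enum PaperError {
    MarkerMissing,
    ConversionFailed { stderr: String },
    Network(String),
    NoPdfFound { doi: String },
}

impl fmt::Display for PaperError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            PaperError::MarkerMissing => write!(
                f,
                "marker_single not found on PATH; install it with `pip install marker-pdf`"
            ),
            PaperError::ConversionFailed { stderr } => {
                write!(f, "marker_single failed:\n{}", stderr)
            }
            PaperError::Network(msg) => write!(f, "network error: {}", msg),
            PaperError::NoPdfFound { doi } => write!(f, "no PDF found for DOI {}", doi),
        }
    }
}

/// Map a failure to spawn marker_single onto the cases above:
/// NotFound means the executable is not on PATH.
fn classify_spawn_error(err: &io::Error) -> PaperError {
    if err.kind() == io::ErrorKind::NotFound {
        PaperError::MarkerMissing
    } else {
        PaperError::ConversionFailed {
            stderr: format!("could not start marker_single: {}", err),
        }
    }
}
```

Including the DOI in `NoPdfFound` keeps the message useful when the tool is run in a loop over many papers.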
## Dependencies

- `clap` — argument parsing
- `reqwest` (blocking, rustls-tls) — HTTP
- `scraper` — HTML parsing
- `tempfile` — temp directory
- `anyhow` — error handling

## Dev Environment

The Nix flake's devshell includes the Rust nightly toolchain and marker-pdf.