Add paper CLI: download academic papers by DOI and convert to markdown

Downloads PDFs from LibGen (primary) or Anna's Archive API (fallback), converts them to markdown via marker_single, and prints the result to stdout. Includes XDG-compliant caching, a nix flake with marker-pdf packaging, and a Claude Code skill for paper-reader integration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

commit f82b738db7
10 changed files with 2860 additions and 0 deletions

docs/plans/2026-02-19-paper-cli-design.md (new file, +49 lines)

# paper CLI — Design

A CLI tool that downloads academic papers by DOI from Anna's Archive and converts them to markdown.

## CLI Interface

```
paper <DOI>
```

Takes a single positional argument, the DOI. Markdown output goes to stdout:

```
paper 10.1038/nature12373 > paper.md
```

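The design uses `clap` for argument parsing. To illustrate the single-positional-argument contract without dependencies, here is a `std`-only sketch. The names `parse_doi_arg` and `looks_like_doi` are hypothetical. The DOI shape check (a `10.` prefix, a `/`, and a non-empty suffix) is an extra assumption; the design does not call for it.

```rust
// Sketch only: hypothetical std-only stand-in for the clap parser.

/// Assumed DOI shape check: "10.<registrant>/<suffix>".
fn looks_like_doi(s: &str) -> bool {
    match s.split_once('/') {
        Some((prefix, suffix)) => {
            prefix.len() > 3 && prefix.starts_with("10.") && !suffix.is_empty()
        }
        None => false,
    }
}

/// Accept exactly one positional argument (the DOI). `args` includes argv[0].
fn parse_doi_arg<I>(args: I) -> Result<String, String>
where
    I: IntoIterator<Item = String>,
{
    let mut args = args.into_iter().skip(1); // skip the program name
    match (args.next(), args.next()) {
        (Some(doi), None) if looks_like_doi(&doi) => Ok(doi),
        (Some(other), None) => Err(format!("not a DOI: {other}")),
        _ => Err("usage: paper <DOI>".to_string()),
    }
}
```

In `main`, this would be called as `parse_doi_arg(std::env::args())`. On failure, it would print the error to stderr and exit non-zero. With `clap`, the same contract becomes a struct with one required positional field.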
## Download Flow

1. Request `https://annas-archive.org/scidb/<DOI>` with a browser-like User-Agent
2. Parse the HTML for an `<iframe>` or `<embed>` with `id="pdf"` and extract its `src` as the direct PDF URL
3. Fallback: if no such element exists, use any link ending in `.pdf`
4. Download the PDF to a temp file
5. If no PDF URL is found, exit with a clear error

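Steps 1 to 3 could be sketched as follows. The design parses HTML with `scraper`. To stay self-contained, this sketch swaps in a naive string scan over `<...>` tags. That scan ignores comments, scripts, unquoted attributes, and `>` inside attribute values. The helper names are hypothetical, and relative URLs are returned as-is rather than resolved against the page URL.

```rust
// Sketch only: a naive std-only scan standing in for `scraper`; names are hypothetical.

/// Step 1: the SciDB page URL for a DOI.
fn scidb_url(doi: &str) -> String {
    format!("https://annas-archive.org/scidb/{doi}")
}

/// Text of every `<...>` tag, without the angle brackets.
fn tags(html: &str) -> impl Iterator<Item = &str> {
    html.split('<')
        .skip(1) // text before the first tag
        .filter_map(|chunk| chunk.split_once('>').map(|(tag, _)| tag))
}

/// Value of a double-quoted attribute `name="..."` within one tag's text.
/// Naive: `id` would also match inside `data-id`.
fn attr<'a>(tag: &'a str, name: &str) -> Option<&'a str> {
    let needle = format!("{name}=\"");
    let start = tag.find(&needle)? + needle.len();
    let len = tag[start..].find('"')?;
    Some(&tag[start..start + len])
}

fn extract_pdf_url(html: &str) -> Option<String> {
    // Step 2: an <iframe> or <embed> with id="pdf"; take its src.
    for tag in tags(html) {
        let lower = tag.to_ascii_lowercase();
        let is_frame = lower.starts_with("iframe") || lower.starts_with("embed");
        if is_frame && attr(tag, "id") == Some("pdf") {
            if let Some(src) = attr(tag, "src") {
                return Some(src.to_string());
            }
        }
    }
    // Step 3: fall back to any <a> whose href ends in ".pdf" (case-insensitive).
    tags(html)
        .filter(|t| t.to_ascii_lowercase().starts_with("a "))
        .filter_map(|t| attr(t, "href"))
        .find(|href| href.to_ascii_lowercase().ends_with(".pdf"))
        .map(str::to_string)
}
```

With `scraper`, step 2 would instead be a CSS selector such as `iframe#pdf, embed#pdf`, which handles real-world markup far more robustly than this scan.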
## Conversion

1. Shell out to `marker_single <tempfile.pdf> --output_dir <tempdir>`
2. Read the generated `.md` file from the output dir
3. Print it to stdout
4. Clean up the temp dir

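Here is a minimal `std`-only sketch of steps 1 to 3. It assumes `marker_single` accepts the `--output_dir` flag shown above. The file layout marker writes inside the output directory may vary between versions, so `find_markdown` searches recursively for the first `.md` file rather than assuming a fixed path. Discarding marker's own stdout is an assumption made here, so that only the markdown reaches this tool's stdout. All function names are hypothetical.

```rust
// Sketch only: hypothetical helpers; assumes marker_single's `--output_dir` flag.
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::process::{Command, Stdio};

/// Step 1: build the `marker_single <pdf> --output_dir <dir>` invocation.
fn marker_command(pdf: &Path, out_dir: &Path) -> Command {
    let mut cmd = Command::new("marker_single");
    cmd.arg(pdf)
        .arg("--output_dir")
        .arg(out_dir)
        // Forward marker's diagnostics straight to the user's terminal.
        .stderr(Stdio::inherit())
        // Assumption: keep marker's own stdout out of ours, which carries the markdown.
        .stdout(Stdio::null());
    cmd
}

/// Step 2: find the first `.md` file anywhere under `dir` (sorted for determinism).
fn find_markdown(dir: &Path) -> io::Result<Option<PathBuf>> {
    let mut entries: Vec<PathBuf> = fs::read_dir(dir)?
        .map(|entry| entry.map(|e| e.path()))
        .collect::<io::Result<_>>()?;
    entries.sort();
    for path in entries {
        if path.is_dir() {
            if let Some(found) = find_markdown(&path)? {
                return Ok(Some(found));
            }
        } else if path.extension().map_or(false, |ext| ext == "md") {
            return Ok(Some(path));
        }
    }
    Ok(None)
}

/// Steps 1-3: run marker, then return the markdown text for printing.
fn convert(pdf: &Path, out_dir: &Path) -> io::Result<String> {
    let status = marker_command(pdf, out_dir).status()?;
    if !status.success() {
        let msg = format!("marker_single failed: {status}");
        return Err(io::Error::new(io::ErrorKind::Other, msg));
    }
    let md = find_markdown(out_dir)?
        .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, "marker produced no .md file"))?;
    fs::read_to_string(md)
}
```

For step 4, creating `out_dir` with `tempfile::TempDir` gives cleanup for free, because the directory is deleted when the `TempDir` is dropped.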
## Error Handling

- `marker_single` not on PATH: tell the user to install it (`pip install marker-pdf`)
- Conversion failure: forward marker's stderr
- Network errors: surface reqwest errors clearly
- No PDF found: report a specific error message that includes the DOI

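Two of the cases above can be sketched with `std` alone:

- a PATH lookup, so a missing `marker_single` can be reported with the install hint before anything is downloaded;
- an error type whose message carries the DOI.

`PaperError`, `find_on_path`, and `require_marker` are hypothetical names. The lookup only checks that a regular file exists, ignoring executable permissions and Windows `.exe` suffixes.

```rust
// Sketch only: hypothetical error type and PATH lookup using std alone.
use std::env;
use std::ffi::OsStr;
use std::fmt;
use std::path::PathBuf;

#[derive(Debug)]
enum PaperError {
    MarkerMissing,
    NoPdfFound { doi: String },
}

impl fmt::Display for PaperError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PaperError::MarkerMissing => write!(
                f,
                "marker_single not found on PATH; install it with `pip install marker-pdf`"
            ),
            PaperError::NoPdfFound { doi } => write!(f, "no PDF found for DOI {doi}"),
        }
    }
}

impl std::error::Error for PaperError {}

/// Return the first `<dir>/<name>` that is a regular file, for each dir in a PATH-style list.
fn find_on_path(name: &str, path_var: &OsStr) -> Option<PathBuf> {
    env::split_paths(path_var)
        .map(|dir| dir.join(name))
        .find(|candidate| candidate.is_file())
}

/// Check the real PATH for marker_single.
fn require_marker() -> Result<PathBuf, PaperError> {
    let path = env::var_os("PATH").unwrap_or_default();
    find_on_path("marker_single", &path).ok_or(PaperError::MarkerMissing)
}
```

Because `PaperError` implements `std::error::Error`, `?` converts it into `anyhow::Error` alongside the reqwest errors the design surfaces for network failures.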
## Dependencies

- `clap`: argument parsing
- `reqwest` (blocking, rustls-tls): HTTP
- `scraper`: HTML parsing
- `tempfile`: temp directory
- `anyhow`: error handling

## Dev Environment

The nix flake includes the Rust nightly toolchain and marker-pdf in the devshell.