docs: update skill docs for unpaywall and pdftotext support
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
parent c3b63ea2f5
commit 91920d4103
3 changed files with 339 additions and 2 deletions
@ -5,7 +5,7 @@ description: Fetch and read academic papers by DOI. Use when (1) the user mentio
# Paper Reader
-Fetch academic papers by DOI using the `paper` CLI, which downloads PDFs and converts them to markdown via `marker_single`.
+Fetch academic papers by DOI using the `paper` CLI, which downloads PDFs and converts them to markdown. Uses simple text extraction (`pdftotext`) when possible, falling back to ML OCR (`marker_single`) for scanned or image-heavy papers.
## Usage
@ -32,7 +32,15 @@ Results are cached at `~/.cache/paper/<DOI>.md`. Subsequent requests for the sam
## Download Sources
-The tool tries LibGen first (free, no authentication), then falls back to Anna's Archive fast download API if `ANNAS_ARCHIVE_KEY` is set.
+The tool tries sources in this order:
+1. **Unpaywall** — free open-access PDFs (requires `UNPAYWALL_EMAIL` env var)
+2. **LibGen** — free, no authentication
+3. **Anna's Archive** — fast download API (requires `ANNAS_ARCHIVE_KEY` env var)
+## Conversion
+PDF-to-markdown conversion tries simple text extraction first (`pdftotext`), which works well for most modern papers with proper text layers. If the output is low quality (garbled or too short), it falls back to ML OCR via `marker_single`.
## Errors
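The source order added to the Download Sources section amounts to a fallback chain gated on environment variables. Below is a minimal Python sketch under stated assumptions: each source is wrapped in a hypothetical fetcher that takes a DOI and returns PDF bytes or `None`, and `fetch_pdf` and its parameters are illustrative names, not the `paper` CLI's actual code.

```python
from __future__ import annotations

import os
from typing import Callable, Mapping, Optional

# Hypothetical fetcher type: takes a DOI, returns PDF bytes or None on a miss.
Fetcher = Callable[[str], Optional[bytes]]


def fetch_pdf(
    doi: str,
    unpaywall: Fetcher,
    libgen: Fetcher,
    annas_archive: Fetcher,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[tuple[str, bytes]]:
    """Try sources in the documented order; return (source_name, pdf_bytes) or None."""
    env = os.environ if env is None else env
    # (name, fetcher, env var that must be set for this source to be tried)
    sources: list[tuple[str, Fetcher, Optional[str]]] = [
        ("unpaywall", unpaywall, "UNPAYWALL_EMAIL"),
        ("libgen", libgen, None),  # free, no authentication
        ("annas_archive", annas_archive, "ANNAS_ARCHIVE_KEY"),
    ]
    for name, fetch, required_var in sources:
        if required_var is not None and not env.get(required_var):
            continue  # source not configured; skip it
        try:
            pdf = fetch(doi)
        except Exception:
            pdf = None  # a failing source should not abort the chain
        if pdf:
            return name, pdf
    return None
```

Injecting the fetchers keeps the ordering logic testable without network access; a real fetcher would wrap the HTTP calls for its service.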
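The Conversion section describes a two-stage pipeline: cheap text extraction first, and OCR only when the result looks bad. The sketch below shows that control flow. It assumes poppler's `pdftotext` is on `PATH` and is invoked as `pdftotext -layout <pdf> -`, where `-` sends the text to stdout. The thresholds in `looks_low_quality` are hypothetical, since the docs only say "garbled or too short". `ocr_fallback` is a hypothetical stand-in for however the tool invokes `marker_single`; no `marker_single` flags are assumed.

```python
from __future__ import annotations

import shutil
import subprocess
from typing import Callable


def extract_with_pdftotext(pdf_path: str) -> str:
    """Run poppler's pdftotext; "-" as the output file writes text to stdout."""
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        capture_output=True,
        encoding="utf-8",   # assumes pdftotext's default UTF-8 output
        errors="replace",   # undecodable bytes become U+FFFD
        check=True,         # raise CalledProcessError on a non-zero exit
    )
    return result.stdout


def looks_low_quality(text: str, min_chars: int = 2000, max_garbage_ratio: float = 0.05) -> bool:
    """Hypothetical heuristic: too short, or too many replacement/control characters."""
    stripped = text.strip()
    if len(stripped) < min_chars:
        return True
    garbage = sum(
        1
        for ch in stripped
        # U+FFFD marks undecodable glyphs; other non-printables (except
        # newlines, tabs, and pdftotext's form-feed page breaks) count too.
        if ch == "\ufffd" or (not ch.isprintable() and ch not in "\n\t\f")
    )
    return garbage / len(stripped) > max_garbage_ratio


def pdf_to_text(pdf_path: str, ocr_fallback: Callable[[str], str]) -> tuple[str, str]:
    """Return (method, text): pdftotext output if it looks usable, else OCR output."""
    if shutil.which("pdftotext") is not None:
        try:
            text = extract_with_pdftotext(pdf_path)
        except subprocess.CalledProcessError:
            text = ""  # treat extraction failure like unusable output
        if not looks_low_quality(text):
            return "pdftotext", text
    # Hypothetical hook standing in for the marker_single invocation.
    return "ocr", ocr_fallback(pdf_path)
```

Treating a missing binary or a non-zero exit the same as low-quality output means the OCR path also serves as the general fallback.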