Initialize qqzot skill
commit f78d22f1e5
@@ -0,0 +1,126 @@
---
name: qqzot
description: "Manage the user's Zotero literature database from Codex: search local Zotero items, generate and audit Zotero child-note AI literature notes, summarize selected Zotero papers into Markdown tables, export citation metadata, and provide Zotero-side inputs to QQnote, qqcites, and qqsci. Use when the user asks about Zotero item keys, Zotero child notes, Zotero Local API, missing AI notes, Zotero paper tables, Zotero metadata, or library-side citation inputs. This skill does not own Obsidian vault maintenance or manuscript writing."
---

# QQzot

Use this skill for Zotero-side literature operations. It owns Zotero item lookup,
metadata extraction, child-note generation, generated-note audits, Zotero paper
tables, and Zotero inputs for the user's QQ literature workflow.

Boundary:

- `qqzot`: Zotero library, Zotero item keys, Zotero child notes, Zotero metadata,
  local Zotero API, citation metadata exports, and Zotero-derived paper tables.
- `QQnote-skill`: Obsidian literature notes, vault organization, Markdown note
  cleanup, Dataview dashboards, and Obsidian presentation of literature notes.
- `qqcites`: sentence/claim-to-reference ranking, support grading, duplicate
  citation control, and citation verification.
- `qqsci`: manuscript writing, scientific logic, section structure, novelty,
  claim-evidence checks, and reviewer-risk audits.

## Requirements

- Zotero Desktop must be open.
- Zotero Local API must be available at `http://127.0.0.1:23119`.
- Default Obsidian vault, only when a script needs templates or an output path:
  `C:\Users\qyh15\Documents\Obsidian Vault`.
- `ZOTERO_API_KEY` is required only for Zotero Web API writes.
- LLM settings come from `AWESOMEGPT_API_KEY`, `AWESOMEGPT_BASE_URL`,
  `AWESOMEGPT_MODEL`, or AwesomeGPT preferences in the default Zotero profile.

Never store API keys in skill files, vault helpers, Git commits, zip files,
terminal output, or chat. If keys appear in logs or chat, tell the user to
rotate them.
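
Before running any script, it can help to confirm that the Local API from the requirements list actually answers. The following is a minimal sketch, not part of the skill: the helper name `local_api_available` and the 3-second timeout are assumptions, while the base URL, the `/items/top` path, and the `Zotero-API-Version` header mirror what the skill's scripts use.

```python
import urllib.request

# Base URL from the Requirements list, with the "/api/users/0" prefix the scripts use.
LOCAL_ZOTERO = "http://127.0.0.1:23119/api/users/0"


def local_api_available(base_url: str = LOCAL_ZOTERO, timeout: float = 3.0) -> bool:
    """Return True if the Zotero Local API answers a minimal request (hypothetical helper)."""
    request = urllib.request.Request(
        base_url + "/items/top?limit=1",
        headers={"Zotero-API-Version": "3"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError, refused connections, and timeouts all derive from OSError.
        return False


if __name__ == "__main__":
    print("Zotero Local API reachable:", local_api_available())
```

A `False` result usually means Zotero Desktop is closed or the Local API is not reachable on that port; it says nothing about API keys, which the Local API reads in this skill do not use.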

## Workflow Decision

Use the smallest Zotero operation that answers the request:

1. Need AI child notes for Zotero items: run `generate_zotero_ai_note.py`.
2. Need to know which Zotero items lack generated notes: run
   `audit_zotero_ai_notes.py`.
3. Need a comparison table for selected Zotero papers: run
   `summarize_zotero_table.py`.
4. Need manuscript citations for a sentence or claim: hand candidates to
   `qqcites` after Zotero retrieval.
5. Need Obsidian note organization, cleanup, or Dataview: hand off to
   `QQnote-skill`.
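
The decision list above amounts to a lookup from request type to target. A minimal sketch of that idea follows; the request labels (`generate_notes`, `audit_missing_notes`, and so on) and the `route` helper are hypothetical names chosen for illustration, and only the script and skill names come from the list.

```python
from __future__ import annotations

# Hypothetical request labels mapped to the scripts and skills named in the list.
ROUTES: dict[str, str] = {
    "generate_notes": "generate_zotero_ai_note.py",
    "audit_missing_notes": "audit_zotero_ai_notes.py",
    "comparison_table": "summarize_zotero_table.py",
    "claim_citations": "qqcites",
    "obsidian_cleanup": "QQnote-skill",
}


def route(need: str) -> str:
    """Return the script or skill that handles a classified request."""
    try:
        return ROUTES[need]
    except KeyError:
        raise ValueError(f"no Zotero-side operation for request type: {need!r}") from None


if __name__ == "__main__":
    print(route("audit_missing_notes"))
```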

## Generate Zotero AI Child Notes

State clearly before live writes that Zotero child notes will be created.
Use `--dry-run` for first-time validation or template changes.

Single item:

```powershell
py "$env:USERPROFILE\.codex\skills\qqzot\scripts\generate_zotero_ai_note.py" --vault "C:\Users\qyh15\Documents\Obsidian Vault" --item-key SXAIQUJT --skip-existing
```

Multiple items:

```powershell
py "$env:USERPROFILE\.codex\skills\qqzot\scripts\generate_zotero_ai_note.py" --vault "C:\Users\qyh15\Documents\Obsidian Vault" --item-keys "SXAIQUJT X7GJZ627 ZCZXGRAM" --limit 0 --skip-existing --fulltext-chars 4000
```

Whole library:

Use only after the user explicitly approves library-wide writes.

```powershell
py "$env:USERPROFILE\.codex\skills\qqzot\scripts\generate_zotero_ai_note.py" --vault "C:\Users\qyh15\Documents\Obsidian Vault" --all --limit 0 --skip-existing --fulltext-chars 4000
```

For long runs, prefer 20-30 item batches. If a run times out, rerun with
`--skip-existing`; duplicate detection uses the Zotero item link inside child
notes.
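
The batching advice can be sketched as a small helper that splits a key list into groups and builds one argument list per run. This is an illustration under assumptions: `batches` and `batch_arguments` are hypothetical names and the default size of 25 is simply a value inside the suggested 20-30 range; the flags are the ones shown in the commands above.

```python
from __future__ import annotations

from typing import Iterator


def batches(keys: list[str], size: int = 25) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` item keys."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(keys), size):
        yield keys[start:start + size]


def batch_arguments(keys: list[str], size: int = 25) -> list[list[str]]:
    """Build one argument list per batch for generate_zotero_ai_note.py."""
    return [
        # --skip-existing makes a rerun after a timeout safe for already-noted items.
        ["--item-keys", " ".join(batch), "--limit", "0", "--skip-existing"]
        for batch in batches(keys, size)
    ]


if __name__ == "__main__":
    for args in batch_arguments(["SXAIQUJT", "X7GJZ627", "ZCZXGRAM"], size=2):
        print(args)
```

Each argument list would be appended to the `py ...generate_zotero_ai_note.py --vault ...` invocation; running the batches one at a time keeps a timeout from costing more than one batch of work.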

## Audit Missing Generated Notes

This is a deterministic library comparison. Do not use DeepSeek or another LLM to
decide whether notes are missing.

```powershell
py "$env:USERPROFILE\.codex\skills\qqzot\scripts\audit_zotero_ai_notes.py" --vault "C:\Users\qyh15\Documents\Obsidian Vault" --rebuild
py "$env:USERPROFILE\.codex\skills\qqzot\scripts\audit_zotero_ai_notes.py" --vault "C:\Users\qyh15\Documents\Obsidian Vault" --keys-only
```

The cache lives at:

```text
<vault>\00 Templater\.zotero-ai-notes-index.json
```

Use `--refresh` or `--rebuild` when notes were edited outside this workflow.
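
The cache can also be inspected directly without contacting Zotero. The sketch below assumes the cache layout that `audit_zotero_ai_notes.py` writes (an `items` mapping whose records carry `hasGeneratedNote` and, for stale entries, `deletedOrNotTopLevel`); the helper names are hypothetical.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any


def load_index(vault: Path) -> dict[str, Any]:
    """Read the audit cache from its location inside the vault."""
    path = vault / "00 Templater" / ".zotero-ai-notes-index.json"
    return json.loads(path.read_text(encoding="utf-8"))


def missing_keys_from_cache(cache: dict[str, Any]) -> list[str]:
    """List item keys that the cached index records as lacking a generated note."""
    missing = []
    for key, record in (cache.get("items") or {}).items():
        if record.get("deletedOrNotTopLevel"):
            continue  # no longer a top-level item in the library
        if not record.get("hasGeneratedNote"):
            missing.append(key)
    return missing
```

This only reflects the last audit run, so it is no substitute for `--refresh` or `--rebuild` after notes change outside the workflow.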

## Summarize Zotero Papers Into A Table

Use this for one paper or a batch of papers when the user wants a compact
comparison table for Obsidian, Word, qqcites, or qqsci.

```powershell
py "$env:USERPROFILE\.codex\skills\qqzot\scripts\summarize_zotero_table.py" --vault "C:\Users\qyh15\Documents\Obsidian Vault" --item-keys "SXAIQUJT X7GJZ627 ZCZXGRAM" --batch-size 3 --out "C:\Users\qyh15\Documents\Obsidian Vault\99 项目\文献对比表.md"
```

Rules:

- This workflow reads Zotero Local API plus LLM config, but does not create
  Zotero child notes.
- Prefer 3-10 papers per batch to keep tables accurate and readable.
- Use custom columns when the user asks for a specific comparison axis.
- If `--out` is omitted, print the Markdown table to stdout.
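
The last two rules (custom columns, and stdout when no output path is given) can be illustrated with a short sketch. It is not the implementation of `summarize_zotero_table.py`, which is not shown here; `render_markdown_table` and `emit` are hypothetical helpers and the row data is assumed to be already summarized.

```python
from __future__ import annotations

import sys
from pathlib import Path


def render_markdown_table(columns: list[str], rows: list[dict[str, str]]) -> str:
    """Render rows as a Markdown table, one column per requested comparison axis."""

    def cell(value: str) -> str:
        # Pipes and newlines would break the table layout, so neutralize them.
        return str(value).replace("|", "\\|").replace("\n", " ").strip()

    lines = [
        "| " + " | ".join(cell(c) for c in columns) + " |",
        "| " + " | ".join("---" for _ in columns) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(cell(row.get(c, "")) for c in columns) + " |")
    return "\n".join(lines)


def emit(table: str, out: Path | None = None) -> None:
    """Write the table to `out`, or print it to stdout when no path is given."""
    if out is None:
        sys.stdout.write(table + "\n")
    else:
        out.parent.mkdir(parents=True, exist_ok=True)
        out.write_text(table + "\n", encoding="utf-8")
```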

## Operating Rules

- Prefer Zotero item keys, DOI, title, first author, and year as metadata anchors.
- Do not delete duplicate Zotero notes unless the user explicitly requests
  cleanup.
- Do not add visible machine markers to note bodies.
- Reserve DeepSeek for note content generation or summarization, not
  deterministic library audits.
- If PDF full text is unavailable, fall back to metadata, abstract, BibTeX, and
  optional local PDF extraction if a Python PDF library exists.
- If PowerShell output has Unicode issues, set `PYTHONIOENCODING=utf-8` or rely
  on the scripts' UTF-8 handling.
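
The UTF-8 handling mentioned in the last rule is a guarded `reconfigure` call on the standard streams. A slightly more defensive variant is sketched below; the function name `force_utf8_streams` is hypothetical, and the sketch assumes Python 3.7 or later, where `TextIOWrapper.reconfigure` exists.

```python
import sys


def force_utf8_streams() -> bool:
    """Try to switch stdout/stderr to UTF-8; return True if both were switched."""
    switched = True
    for stream in (sys.stdout, sys.stderr):
        reconfigure = getattr(stream, "reconfigure", None)
        if reconfigure is None:
            switched = False  # e.g. a replaced stream object without reconfigure()
            continue
        try:
            reconfigure(encoding="utf-8")
        except (OSError, ValueError):
            switched = False
    return switched


if __name__ == "__main__":
    force_utf8_streams()
    print("文献对比表")  # non-ASCII output that a legacy console code page may garble
```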

@@ -0,0 +1,4 @@
interface:
  display_name: "QQzot"
  short_description: "Zotero library notes and metadata workflow"
  default_prompt: "Use $qqzot to search Zotero, generate AI child notes, audit missing notes, or summarize selected Zotero papers."

@@ -0,0 +1,180 @@
#!/usr/bin/env python3
"""Audit which Zotero top-level items have generated AI child notes.

This script is deterministic and does not call any LLM.
It can build a local cache for fast repeated missing-note checks.
"""

from __future__ import annotations

import argparse
import json
import re
import sys
import urllib.parse
import urllib.request
from datetime import datetime, timezone
from pathlib import Path
from typing import Any

try:
    sys.stdout.reconfigure(encoding="utf-8")
    sys.stderr.reconfigure(encoding="utf-8")
except Exception:
    pass

LOCAL_ZOTERO = "http://127.0.0.1:23119/api/users/0"


def fail(message: str) -> None:
    print(f"error: {message}", file=sys.stderr)
    raise SystemExit(1)


def zotero_get(path: str) -> Any:
    req = urllib.request.Request(LOCAL_ZOTERO + path, headers={"Zotero-API-Version": "3"})
    with urllib.request.urlopen(req, timeout=30) as response:
        return json.loads(response.read().decode("utf-8", errors="replace"))


def all_top_items() -> list[dict[str, Any]]:
    items: list[dict[str, Any]] = []
    start = 0
    limit = 100
    while True:
        page = zotero_get("/items/top?" + urllib.parse.urlencode({"limit": limit, "start": start}))
        if not page:
            break
        items.extend(page)
        if len(page) < limit:
            break
        start += limit
    return items


def item_summary(item: dict[str, Any]) -> dict[str, Any]:
    data = item.get("data") or {}
    return {
        "key": item.get("key"),
        "title": data.get("title"),
        "itemType": data.get("itemType"),
        "version": item.get("version") or data.get("version"),
        "dateModified": data.get("dateModified"),
    }


def note_is_generated_for_parent(note_html: str, parent_key: str) -> bool:
    return (
        f"items/{parent_key}" in note_html
        or "AI Literature Note" in note_html
        or bool(re.search(r"<h1>[^<]*📝", note_html))
    )


def scan_parent(parent_key: str) -> dict[str, Any]:
    children = zotero_get(f"/items/{urllib.parse.quote(parent_key)}/children")
    generated_notes: list[str] = []
    child_note_count = 0
    for child in children or []:
        data = child.get("data") or {}
        if data.get("itemType") != "note":
            continue
        child_note_count += 1
        note_key = child.get("key")
        if not note_key:
            continue
        full = zotero_get(f"/items/{urllib.parse.quote(note_key)}")
        note_html = ((full.get("data") or {}).get("note") or "")
        if note_is_generated_for_parent(note_html, parent_key):
            generated_notes.append(note_key)
    return {
        "hasGeneratedNote": bool(generated_notes),
        "generatedNoteKeys": generated_notes,
        "childNoteCount": child_note_count,
    }


def load_cache(path: Path) -> dict[str, Any]:
    if not path.exists():
        return {"items": {}}
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return {"items": {}}


def save_cache(path: Path, cache: dict[str, Any]) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    cache["updatedAt"] = datetime.now(timezone.utc).isoformat()
    path.write_text(json.dumps(cache, ensure_ascii=False, indent=2), encoding="utf-8")


def audit(vault: Path, *, refresh: bool, rebuild: bool, limit: int | None) -> dict[str, Any]:
    cache_path = vault / "00 Templater" / ".zotero-ai-notes-index.json"
    cache = {"items": {}} if rebuild else load_cache(cache_path)
    cached_items: dict[str, Any] = cache.setdefault("items", {})
    top_items = all_top_items()
    if limit:
        top_items = top_items[:limit]

    current_keys = set()
    scanned = 0
    for item in top_items:
        summary = item_summary(item)
        key = summary.get("key")
        if not key:
            continue
        current_keys.add(key)
        record = cached_items.get(key)
        if rebuild or refresh or not record:
            record = {**summary, **scan_parent(key)}
            cached_items[key] = record
            scanned += 1
        else:
            record.update(summary, deletedOrNotTopLevel=False)  # clear a stale flag from an earlier run

    for key in list(cached_items):
        if key not in current_keys:
            cached_items[key]["deletedOrNotTopLevel"] = True

    save_cache(cache_path, cache)

    active_records = [
        record for key, record in cached_items.items()
        if key in current_keys and not record.get("deletedOrNotTopLevel")
    ]
    missing = [record for record in active_records if not record.get("hasGeneratedNote")]
    duplicates = [
        record for record in active_records
        if len(record.get("generatedNoteKeys") or []) > 1
    ]
    return {
        "cache": str(cache_path),
        "total": len(active_records),
        "withGeneratedNote": len(active_records) - len(missing),
        "missingCount": len(missing),
        "duplicateGeneratedNoteItems": len(duplicates),
        "scannedParentsThisRun": scanned,
        "missing": missing,
        "duplicates": duplicates,
    }


def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--vault", default=str(Path.cwd()), help="Obsidian vault path")
    parser.add_argument("--refresh", action="store_true", help="Rescan all current top-level parents")
    parser.add_argument("--rebuild", action="store_true", help="Discard cache and rescan all current top-level parents")
    parser.add_argument("--limit", type=int, help="Only audit the first N top-level items")
    parser.add_argument("--keys-only", action="store_true", help="Print only missing item keys")
    args = parser.parse_args()

    result = audit(Path(args.vault).expanduser().resolve(), refresh=args.refresh, rebuild=args.rebuild, limit=args.limit)
    if args.keys_only:
        print(" ".join(record["key"] for record in result["missing"] if record.get("key")))
    else:
        print(json.dumps(result, ensure_ascii=False, indent=2))


if __name__ == "__main__":
    main()

@@ -0,0 +1,586 @@
#!/usr/bin/env python3
"""
Generate an AI literature note from Zotero metadata and save it as a Zotero child note.

Required environment variables:
  AWESOMEGPT_API_KEY   DeepSeek/OpenAI-compatible API key
  AWESOMEGPT_BASE_URL  Example: https://api.deepseek.com
  AWESOMEGPT_MODEL     Example: deepseek-v4-pro
  ZOTERO_API_KEY       Zotero Web API key with library write permission

Optional:
  ZOTERO_USER_ID       If omitted, resolved from /keys/current
"""

from __future__ import annotations

import argparse
import html
import importlib.util
import json
import os
import re
import sys
import urllib.error
import urllib.parse
import urllib.request
from pathlib import Path
from typing import Any, NoReturn

try:
    sys.stdout.reconfigure(encoding="utf-8")
    sys.stderr.reconfigure(encoding="utf-8")
except Exception:
    pass


LOCAL_ZOTERO = "http://127.0.0.1:23119/api/users/0"
ZOTERO_WEB = "https://api.zotero.org"
DEFAULT_VAULT = Path.cwd()


def fail(message: str) -> NoReturn:
    print(f"error: {message}", file=sys.stderr)
    raise SystemExit(1)


def load_dotenv(path: Path) -> None:
    if not path.exists():
        return
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        os.environ.setdefault(key, value)


def zotero_profile_prefs() -> Path | None:
    profiles_ini = Path.home() / "AppData/Roaming/Zotero/Zotero/profiles.ini"
    profiles_root = profiles_ini.parent
    try:
        profiles_ini_exists = profiles_ini.exists()
    except OSError:
        return None
    if profiles_ini_exists:
        try:
            text = profiles_ini.read_text(encoding="utf-8", errors="replace")
        except OSError:
            return None
        blocks = re.split(r"\n(?=\[Profile\d+\])", text)
        for block in blocks:
            if "Default=1" not in block:
                continue
            path_match = re.search(r"^Path=(.+)$", block, re.MULTILINE)
            relative_match = re.search(r"^IsRelative=(\d+)$", block, re.MULTILINE)
            if path_match:
                profile_path = Path(path_match.group(1).strip())
                if relative_match and relative_match.group(1) == "1":
                    profile_path = profiles_root / profile_path
                prefs = profile_path / "prefs.js"
                if prefs.exists():
                    return prefs
    profiles_dir = profiles_root / "Profiles"
    try:
        profiles_dir_exists = profiles_dir.exists()
    except OSError:
        return None
    if profiles_dir_exists:
        try:
            for prefs in profiles_dir.glob("*/prefs.js"):
                return prefs
        except OSError:
            return None
    return None


def load_awesomegpt_prefs(path: Path | None = None) -> None:
    if path is None:
        path = zotero_profile_prefs()
    if path is None:
        return
    try:
        path_exists = path.exists()
    except OSError:
        return
    if not path_exists:
        return
    try:
        text = path.read_text(encoding="utf-8", errors="replace")
    except OSError:
        return
    prefs: dict[str, Any] = {}
    for name, raw_value in re.findall(r'user_pref\("([^"]+)",\s*(.*?)\);', text):
        if not name.startswith("extensions.zotero.zoterogpt."):
            continue
        try:
            prefs[name] = json.loads(raw_value)
        except json.JSONDecodeError:
            continue

    settings_raw = prefs.get("extensions.zotero.zoterogpt.settings")
    if isinstance(settings_raw, str):
        try:
            settings = json.loads(settings_raw)
        except json.JSONDecodeError:
            settings = {}
    else:
        settings = {}

    direct_api = prefs.get("extensions.zotero.zoterogpt.api")
    direct_model = prefs.get("extensions.zotero.zoterogpt.model")
    direct_key = prefs.get("extensions.zotero.zoterogpt.secretKey")
    provider = None
    if isinstance(settings, dict):
        provider = settings.get("DeepSeek") or next(
            (value for key, value in settings.items() if key.lower() == "deepseek"),
            None,
        )
    if isinstance(provider, dict):
        os.environ.setdefault("AWESOMEGPT_BASE_URL", provider.get("api") or "")
        os.environ.setdefault("AWESOMEGPT_MODEL", provider.get("model") or "")
        os.environ.setdefault("AWESOMEGPT_API_KEY", provider.get("secretKey") or "")
    if direct_api:
        os.environ.setdefault("AWESOMEGPT_BASE_URL", str(direct_api))
    if direct_model:
        os.environ.setdefault("AWESOMEGPT_MODEL", str(direct_model))
    if direct_key:
        os.environ.setdefault("AWESOMEGPT_API_KEY", str(direct_key))


def http_json(
    url: str,
    *,
    method: str = "GET",
    headers: dict[str, str] | None = None,
    payload: Any = None,
    timeout: int = 90,
) -> Any:
    body = None
    req_headers = dict(headers or {})
    if payload is not None:
        body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
        req_headers.setdefault("Content-Type", "application/json")
    request = urllib.request.Request(url, data=body, method=method, headers=req_headers)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            text = response.read().decode("utf-8", errors="replace")
            if not text:
                return None
            return json.loads(text)
    except urllib.error.HTTPError as exc:
        detail = exc.read().decode("utf-8", errors="replace")
        fail(f"{method} {url} failed: HTTP {exc.code}: {detail[:800]}")
    except urllib.error.URLError as exc:
        fail(f"{method} {url} failed: {exc}")


def zotero_local(path: str) -> Any:
    url = LOCAL_ZOTERO + path
    return http_json(url, headers={"Zotero-API-Version": "3"}, timeout=20)


def zotero_local_optional(path: str) -> Any | None:
    url = LOCAL_ZOTERO + path
    request = urllib.request.Request(url, headers={"Zotero-API-Version": "3"})
    try:
        with urllib.request.urlopen(request, timeout=20) as response:
            text = response.read().decode("utf-8", errors="replace")
            return json.loads(text) if text else None
    except (urllib.error.HTTPError, urllib.error.URLError, json.JSONDecodeError):
        return None


def zotero_web(path: str, *, method: str = "GET", payload: Any = None) -> Any:
    api_key = os.environ.get("ZOTERO_API_KEY")
    if not api_key:
        fail("ZOTERO_API_KEY is required to write Zotero child notes")
    url = ZOTERO_WEB + path
    return http_json(
        url,
        method=method,
        headers={"Zotero-API-Version": "3", "Zotero-API-Key": api_key},
        payload=payload,
        timeout=60,
    )


def resolve_user_id() -> str:
    explicit = os.environ.get("ZOTERO_USER_ID")
    if explicit:
        return explicit
    current = zotero_web("/keys/current")
    user_id = current.get("userID") if isinstance(current, dict) else None
    if not user_id:
        fail("could not resolve Zotero userID from /keys/current")
    return str(user_id)


def find_item(item_key: str | None, query: str | None) -> dict[str, Any]:
    if item_key:
        return zotero_local(f"/items/{urllib.parse.quote(item_key)}")
    if not query:
        fail("provide --item-key or --query")
    qs = urllib.parse.urlencode({"q": query, "limit": 5})
    matches = zotero_local(f"/items/top?{qs}")
    if not matches:
        fail(f"no Zotero item matched query: {query}")
    if len(matches) > 1:
        print(f"warning: {len(matches)} matches; using {matches[0].get('key')}", file=sys.stderr)
    return matches[0]


def find_items(keys: list[str], query: str | None, limit: int) -> list[dict[str, Any]]:
    items: list[dict[str, Any]] = []
    for key in keys:
        items.append(find_item(key, None))
    if query:
        qs = urllib.parse.urlencode({"q": query, "limit": limit})
        matches = zotero_local(f"/items/top?{qs}")
        seen = {item.get("key") for item in items}
        for item in matches or []:
            if item.get("key") not in seen:
                items.append(item)
                seen.add(item.get("key"))
    if not items:
        fail("provide --item-key/--item-keys or --query")
    return items[:limit] if limit else items


def all_top_items(limit: int = 0) -> list[dict[str, Any]]:
    items: list[dict[str, Any]] = []
    start = 0
    page_limit = 100
    while True:
        qs = urllib.parse.urlencode({"limit": page_limit, "start": start})
        page = zotero_local(f"/items/top?{qs}")
        if not page:
            break
        items.extend(page)
        if limit and len(items) >= limit:
            return items[:limit]
        if len(page) < page_limit:
            break
        start += page_limit
    return items


def has_existing_ai_note(parent_key: str) -> bool:
    children = zotero_local(f"/items/{urllib.parse.quote(parent_key)}/children")
    for child in children or []:
        data = child.get("data") or {}
        if data.get("itemType") != "note":
            continue
        note = data.get("note") or ""
        if not note and child.get("key"):
            full_note = zotero_local_optional(f"/items/{urllib.parse.quote(child['key'])}")
            note = ((full_note or {}).get("data") or {}).get("note") or ""
        if "AI文献笔记" in note or "AI Literature Note" in note or f"items/{parent_key}" in note:
            return True
    return False


def export_bibtex(item_key: str) -> str:
    qs = urllib.parse.urlencode({"itemKey": item_key, "format": "bibtex"})
    url = f"{LOCAL_ZOTERO}/items?{qs}"
    request = urllib.request.Request(url, headers={"Zotero-API-Version": "3"})
    with urllib.request.urlopen(request, timeout=20) as response:
        return response.read().decode("utf-8", errors="replace").strip()


def local_fulltext(parent_key: str, max_chars: int) -> str:
    children = zotero_local(f"/items/{urllib.parse.quote(parent_key)}/children")
    parts: list[str] = []
    for child in children or []:
        data = child.get("data") or {}
        if data.get("itemType") != "attachment":
            continue
        key = child.get("key")
        if not key:
            continue
        fulltext = zotero_local_optional(f"/items/{urllib.parse.quote(key)}/fulltext")
        if not fulltext:
            continue
        content = fulltext.get("content") if isinstance(fulltext, dict) else ""
        if content:
            parts.append(content)
        if sum(len(p) for p in parts) >= max_chars:
            break
    if not parts:
        for child in children or []:
            data = child.get("data") or {}
            if data.get("itemType") != "attachment" or data.get("contentType") != "application/pdf":
                continue
            path = data.get("path")
            if not path:
                key = child.get("key")
                if key:
                    full = zotero_local_optional(f"/items/{urllib.parse.quote(key)}")
                    path = ((full or {}).get("data") or {}).get("path")
            if not path:
                continue
            extracted = extract_pdf_text(Path(path), max_chars=max_chars)
            if extracted:
                parts.append(extracted)
                break
    text = "\n\n".join(parts)
    return text[:max_chars]


def template_paths(vault: Path) -> tuple[Path, Path, Path]:
    templates = vault / "00 Templater"
    if not templates.exists():
        fail(f"template directory not found: {templates}")
    files = {path.name: path for path in templates.glob("*.md")}

    def match(prefix: str, contains: str) -> Path:
        candidates = [
            path for name, path in files.items()
            if name.startswith(prefix) and contains in name
        ]
        if not candidates:
            candidates = [path for name, path in files.items() if name.startswith(prefix)]
        if not candidates:
            fail(f"missing template file starting with {prefix} in {templates}")
        return candidates[0]

    return (
        match("03", "AI"),
        match("01", ""),
        match("02", ""),
    )


def extract_pdf_text(path: Path, max_chars: int) -> str:
    try:
        path_exists = path.exists()
    except OSError:
        return ""
    if not path_exists:
        return ""
    max_pages = 8
    if importlib.util.find_spec("fitz"):
        import fitz  # type: ignore

        chunks = []
        with fitz.open(str(path)) as doc:
            for page in doc[:max_pages]:
                chunks.append(page.get_text("text"))
                if sum(len(chunk) for chunk in chunks) >= max_chars:
                    break
        return "\n".join(chunks)[:max_chars]
    if importlib.util.find_spec("pypdf"):
        from pypdf import PdfReader  # type: ignore

        reader = PdfReader(str(path))
        chunks = []
        for page in reader.pages[:max_pages]:
            chunks.append(page.extract_text() or "")
            if sum(len(chunk) for chunk in chunks) >= max_chars:
                break
        return "\n".join(chunks)[:max_chars]
    if importlib.util.find_spec("PyPDF2"):
        from PyPDF2 import PdfReader  # type: ignore

        reader = PdfReader(str(path))
        chunks = []
        for page in reader.pages[:max_pages]:
            chunks.append(page.extract_text() or "")
            if sum(len(chunk) for chunk in chunks) >= max_chars:
                break
        return "\n".join(chunks)[:max_chars]
    return ""


def creators_text(creators: list[dict[str, Any]]) -> str:
    names = []
    for creator in creators:
        name = creator.get("name")
        if not name:
            name = " ".join(x for x in [creator.get("firstName"), creator.get("lastName")] if x)
        if name:
            names.append(name)
    return "; ".join(names)


def build_prompt(item: dict[str, Any], bibtex: str, fulltext: str, vault: Path) -> str:
    data = item.get("data") or {}
    prompt_path, research_template_path, review_template_path = template_paths(vault)
    prompt = prompt_path.read_text(encoding="utf-8")
    research_template = research_template_path.read_text(encoding="utf-8")
    review_template = review_template_path.read_text(encoding="utf-8")
    metadata = {
        "zoteroKey": item.get("key"),
        "title": data.get("title"),
        "itemType": data.get("itemType"),
        "authors": creators_text(data.get("creators") or []),
        "publicationTitle": data.get("publicationTitle"),
        "date": data.get("date"),
        "DOI": data.get("DOI"),
        "url": data.get("url"),
        "abstractNote": data.get("abstractNote"),
    }
    source = {
        "metadata": metadata,
        "bibtex": bibtex,
        "indexedFullTextExcerpt": fulltext,
    }
    return "\n\n".join(
        [
            prompt,
            "请先判断文献类型:综述型文献或研究型文献。",
            "若为研究型文献,请严格填充下面的研究型模板:",
            research_template,
            "若为综述型文献,请严格填充下面的综述型模板:",
            review_template,
            "请将模板中的 ${topItem.getField('title')} 替换为真实题名,将 ${topItem.key} 替换为 Zotero key。",
            "只输出最终 Markdown 笔记,不要输出解释、判断过程或代码围栏。",
            "文献材料如下:",
            json.dumps(source, ensure_ascii=False, indent=2),
        ]
    )


def call_llm(prompt: str) -> str:
    api_key = os.environ.get("AWESOMEGPT_API_KEY")
    base_url = (os.environ.get("AWESOMEGPT_BASE_URL") or "").rstrip("/")
    model = os.environ.get("AWESOMEGPT_MODEL")
    if not api_key or not base_url or not model:
        fail("AWESOMEGPT_API_KEY, AWESOMEGPT_BASE_URL, and AWESOMEGPT_MODEL are required")
    if not base_url.endswith("/v1"):
        base_url = base_url + "/v1"
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a materials science literature-note assistant. Output Simplified Chinese Markdown only."},
            {"role": "user", "content": prompt},
        ],
        "temperature": 0.3,
    }
    response = http_json(
        base_url + "/chat/completions",
        method="POST",
        headers={"Authorization": f"Bearer {api_key}"},
        payload=payload,
        timeout=180,
    )
    try:
        return response["choices"][0]["message"]["content"].strip()
    except Exception as exc:
        fail(f"unexpected LLM response shape: {exc}; response={response}")
|
||||||
|
|
||||||
|
|
||||||
|
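The base-URL handling in `call_llm` can be checked in isolation. The sketch below mirrors its two normalization steps (strip trailing slashes, append `/v1` when absent) in a hypothetical helper named `chat_completions_url`, which is not part of the script. It assumes the provider exposes an OpenAI-style `/v1/chat/completions` route, as the script itself does.

```python
def chat_completions_url(base_url: str) -> str:
    """Hypothetical helper mirroring call_llm's URL normalization."""
    base = base_url.rstrip("/")
    # Append the version segment only when the caller has not already supplied it.
    if not base.endswith("/v1"):
        base += "/v1"
    return base + "/chat/completions"


if __name__ == "__main__":
    for raw in ("https://api.deepseek.com", "https://api.deepseek.com/v1/"):
        print(raw, "->", chat_completions_url(raw))
```

Both spellings resolve to the same endpoint, so `AWESOMEGPT_BASE_URL` may be given with or without the `/v1` suffix.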
def markdown_to_zotero_html(markdown: str) -> str:
    """Convert a small Markdown subset to the HTML stored in a Zotero note.

    Handles fenced code blocks, ATX headings, and blockquotes; every other
    non-empty line becomes its own <p>. Inline markup is escaped, not rendered.
    """
    lines = markdown.strip().splitlines()
    out: list[str] = []
    in_pre = False
    pre_lines: list[str] = []
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("```") or stripped.startswith("~~~"):
            if in_pre:
                out.append("<pre>" + html.escape("\n".join(pre_lines)) + "</pre>")
                pre_lines = []
                in_pre = False
            else:
                in_pre = True
            continue
        if in_pre:
            pre_lines.append(line)
            continue
        if not stripped:
            continue
        heading = re.match(r"^(#{1,6})\s+(.+)$", stripped)
        if heading:
            # The pattern already caps the marker at six '#' characters.
            level = len(heading.group(1))
            out.append(f"<h{level}>{html.escape(heading.group(2))}</h{level}>")
        elif stripped.startswith(">"):
            out.append(f"<blockquote>{html.escape(stripped.lstrip('> ').strip())}</blockquote>")
        elif re.match(r"^[-*]\s+", stripped):
            # List items keep their "-"/"*" marker and are emitted as plain paragraphs.
            out.append(f"<p>{html.escape(stripped)}</p>")
        else:
            out.append(f"<p>{html.escape(stripped)}</p>")
    if in_pre:
        # Flush a code block whose closing fence is missing.
        out.append("<pre>" + html.escape("\n".join(pre_lines)) + "</pre>")
    return "\n".join(out)


def create_child_note(user_id: str, parent_key: str, markdown: str, dry_run: bool) -> Any:
    note_html = markdown_to_zotero_html(markdown)
    payload = [{"itemType": "note", "parentItem": parent_key, "note": note_html}]
    if dry_run:
        return {"dryRun": True, "payload": payload}
    return zotero_web(f"/users/{user_id}/items", method="POST", payload=payload)


def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--item-key", action="append", default=[], help="Zotero top-level item key; can be repeated")
    parser.add_argument("--item-keys", help="Comma/space separated Zotero top-level item keys")
    parser.add_argument("--query", help="Search query; first top-level match is used")
    parser.add_argument("--all", action="store_true", help="Process all top-level Zotero items")
    parser.add_argument("--limit", type=int, default=1, help="Maximum number of items to process")
    parser.add_argument("--fulltext-chars", type=int, default=12000)
    parser.add_argument("--skip-existing", action="store_true", help="Skip items that already have a generated AI child note")
    parser.add_argument("--dry-run", action="store_true", help="generate but do not write Zotero note")
    parser.add_argument("--vault", default=str(DEFAULT_VAULT), help="Obsidian vault containing 00 Templater")
    parser.add_argument("--env-file", help="Optional .env path; defaults to <vault>/.env")
    args = parser.parse_args()

    vault = Path(args.vault).expanduser().resolve()
    load_dotenv(Path(args.env_file).expanduser().resolve() if args.env_file else vault / ".env")
    load_awesomegpt_prefs()

    keys = list(args.item_key)
    if args.item_keys:
        keys.extend([key for key in re.split(r"[\s,]+", args.item_keys.strip()) if key])
    if args.all:
        # With --all, the default --limit of 1 is treated as "no limit".
        items = all_top_items(args.limit if args.limit != 1 else 0)
    else:
        items = find_items(keys, args.query, args.limit)
    user_id = resolve_user_id()
    results = []
    for index, item in enumerate(items, 1):
        # Failures are recorded per item so one bad item does not stop the batch.
        try:
            key = item.get("key")
            title = (item.get("data") or {}).get("title")
            if not key:
                results.append({"index": index, "status": "skipped", "reason": "missing key", "title": title})
                continue
            if args.skip_existing and has_existing_ai_note(key):
                results.append({"index": index, "itemKey": key, "title": title, "status": "skipped", "reason": "existing AI note"})
                continue
            print(f"[{index}/{len(items)}] generating {key}: {title}", file=sys.stderr)
            bibtex = export_bibtex(key)
            fulltext = local_fulltext(key, args.fulltext_chars)
            prompt = build_prompt(item, bibtex, fulltext, vault)
            markdown = call_llm(prompt)
            result = create_child_note(user_id, key, markdown, args.dry_run)
            results.append({"index": index, "itemKey": key, "title": title, "status": "ok", "result": result})
        except SystemExit as exc:
            # fail() raises SystemExit after printing its message to stderr.
            results.append({
                "index": index,
                "itemKey": item.get("key"),
                "title": (item.get("data") or {}).get("title"),
                "status": "error",
                "error": str(exc),
            })
            print(f"[{index}/{len(items)}] error; continuing", file=sys.stderr)
            continue
        except Exception as exc:
            results.append({
                "index": index,
                "itemKey": item.get("key"),
                "title": (item.get("data") or {}).get("title"),
                "status": "error",
                "error": repr(exc),
            })
            print(f"[{index}/{len(items)}] error; continuing: {exc}", file=sys.stderr)
            continue
    print(json.dumps(results, ensure_ascii=False, indent=2))


if __name__ == "__main__":
    main()

@@ -0,0 +1,493 @@
#!/usr/bin/env python3
"""
Summarize Zotero items into a Markdown table using an OpenAI-compatible LLM.

Required environment variables:
  AWESOMEGPT_API_KEY   DeepSeek/OpenAI-compatible API key
  AWESOMEGPT_BASE_URL  Example: https://api.deepseek.com
  AWESOMEGPT_MODEL     Example: deepseek-v4-pro

Optional:
  Load from <vault>/.env or AwesomeGPT preferences in the default Zotero profile.
"""

from __future__ import annotations

import argparse
import importlib.util
import json
import os
import re
import sys
import urllib.error
import urllib.parse
import urllib.request
from pathlib import Path
from typing import Any

try:
    sys.stdout.reconfigure(encoding="utf-8")
    sys.stderr.reconfigure(encoding="utf-8")
except Exception:
    pass


LOCAL_ZOTERO = "http://127.0.0.1:23119/api/users/0"
DEFAULT_VAULT = Path.cwd()
# Column headers stay in Chinese because the prompt asks for a Chinese table.
DEFAULT_COLUMNS = [
    "Zotero Key",
    "标题",  # title
    "类型",  # type
    "年份",  # year
    "期刊/来源",  # journal / source
    "研究对象/材料",  # research subject / material
    "关键方法",  # key methods
    "核心结论",  # core conclusions
    "相关启发",  # related takeaways
]


def fail(message: str) -> None:
    print(f"error: {message}", file=sys.stderr)
    raise SystemExit(1)


def load_dotenv(path: Path) -> None:
    if not path.exists():
        return
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))

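Because `load_dotenv` uses `os.environ.setdefault`, a variable that is already set in the shell takes precedence over the `.env` file. The following sketch illustrates that precedence with a condensed copy of the loader; the `QQZOT_DEMO_*` variable names and the `example.invalid` URL are made up for the demonstration.

```python
import os
import tempfile
from pathlib import Path


def load_dotenv_sketch(path: Path) -> None:
    """Condensed copy of load_dotenv: KEY=VALUE lines, existing variables win."""
    if not path.exists():
        return
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # setdefault leaves a variable untouched when it is already defined.
        os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))


def demo() -> dict:
    # Hypothetical variable names, chosen so they cannot clash with real settings.
    os.environ["QQZOT_DEMO_MODEL"] = "from-shell"
    os.environ.pop("QQZOT_DEMO_BASE_URL", None)
    with tempfile.TemporaryDirectory() as tmp:
        env_file = Path(tmp) / ".env"
        env_file.write_text(
            "# comment lines are skipped\n"
            "QQZOT_DEMO_MODEL=from-dotenv\n"
            'QQZOT_DEMO_BASE_URL="https://example.invalid"\n',
            encoding="utf-8",
        )
        load_dotenv_sketch(env_file)
    return {
        "model": os.environ["QQZOT_DEMO_MODEL"],
        "base_url": os.environ["QQZOT_DEMO_BASE_URL"],
    }


if __name__ == "__main__":
    print(demo())
```

The shell value of the model survives, while the base URL, which was unset, is taken from the file with its surrounding quotes removed.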
def zotero_profile_prefs() -> Path | None:
    profiles_ini = Path.home() / "AppData/Roaming/Zotero/Zotero/profiles.ini"
    profiles_root = profiles_ini.parent
    try:
        profiles_ini_exists = profiles_ini.exists()
    except OSError:
        return None
    if profiles_ini_exists:
        try:
            text = profiles_ini.read_text(encoding="utf-8", errors="replace")
        except OSError:
            return None
        blocks = re.split(r"\n(?=\[Profile\d+\])", text)
        for block in blocks:
            if "Default=1" not in block:
                continue
            path_match = re.search(r"^Path=(.+)$", block, re.MULTILINE)
            relative_match = re.search(r"^IsRelative=(\d+)$", block, re.MULTILINE)
            if path_match:
                profile_path = Path(path_match.group(1).strip())
                if relative_match and relative_match.group(1) == "1":
                    profile_path = profiles_root / profile_path
                prefs = profile_path / "prefs.js"
                try:
                    if prefs.exists():
                        return prefs
                except OSError:
                    return None
    profiles_dir = profiles_root / "Profiles"
    try:
        profiles_dir_exists = profiles_dir.exists()
    except OSError:
        return None
    if profiles_dir_exists:
        try:
            for prefs in profiles_dir.glob("*/prefs.js"):
                return prefs
        except OSError:
            return None
    return None


def load_awesomegpt_prefs(path: Path | None = None) -> None:
    if path is None:
        path = zotero_profile_prefs()
    if path is None:
        return
    try:
        path_exists = path.exists()
    except OSError:
        return
    if not path_exists:
        return
    try:
        text = path.read_text(encoding="utf-8", errors="replace")
    except OSError:
        return
    prefs: dict[str, Any] = {}
    for name, raw_value in re.findall(r'user_pref\("([^"]+)",\s*(.*?)\);', text):
        if not name.startswith("extensions.zotero.zoterogpt."):
            continue
        try:
            prefs[name] = json.loads(raw_value)
        except json.JSONDecodeError:
            continue

    # The settings preference is itself a JSON document stored as a string.
    settings_raw = prefs.get("extensions.zotero.zoterogpt.settings")
    settings = {}
    if isinstance(settings_raw, str):
        try:
            settings = json.loads(settings_raw)
        except json.JSONDecodeError:
            settings = {}

    direct_api = prefs.get("extensions.zotero.zoterogpt.api")
    direct_model = prefs.get("extensions.zotero.zoterogpt.model")
    direct_key = prefs.get("extensions.zotero.zoterogpt.secretKey")
    provider = None
    if isinstance(settings, dict):
        provider = settings.get("DeepSeek") or next(
            (value for key, value in settings.items() if key.lower() == "deepseek"),
            None,
        )
    if isinstance(provider, dict):
        # Only export non-empty provider values; an empty string would otherwise
        # occupy the variable and block the direct-preference fallback below.
        for env_name, field in (
            ("AWESOMEGPT_BASE_URL", "api"),
            ("AWESOMEGPT_MODEL", "model"),
            ("AWESOMEGPT_API_KEY", "secretKey"),
        ):
            value = provider.get(field)
            if value:
                os.environ.setdefault(env_name, str(value))
    if direct_api:
        os.environ.setdefault("AWESOMEGPT_BASE_URL", str(direct_api))
    if direct_model:
        os.environ.setdefault("AWESOMEGPT_MODEL", str(direct_model))
    if direct_key:
        os.environ.setdefault("AWESOMEGPT_API_KEY", str(direct_key))

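The preference parsing in `load_awesomegpt_prefs` relies on two facts: each `user_pref("name", value);` line carries a JSON-compatible literal, and the `settings` preference holds a second JSON document encoded as a string. The sketch below reproduces that two-stage decode on a made-up `prefs.js` fragment. The sample values, including `example-model`, are assumptions for illustration, and the real file written by the plugin may differ.

```python
import json
import re

PREF_PATTERN = re.compile(r'user_pref\("([^"]+)",\s*(.*?)\);')
PREFIX = "extensions.zotero.zoterogpt."


def parse_prefs(text: str) -> dict:
    """Collect plugin preferences whose values are valid JSON literals."""
    prefs = {}
    for name, raw_value in PREF_PATTERN.findall(text):
        if not name.startswith(PREFIX):
            continue  # unrelated Zotero/Firefox preference
        try:
            prefs[name] = json.loads(raw_value)
        except json.JSONDecodeError:
            continue  # skip values that are not JSON-compatible
    return prefs


# Hypothetical prefs.js content; the settings value is JSON encoded as a string.
settings_json = json.dumps(
    {"DeepSeek": {"api": "https://api.deepseek.com", "model": "example-model"}}
)
sample = "\n".join(
    [
        'user_pref("browser.startup.page", 0);',
        f'user_pref("{PREFIX}settings", {json.dumps(settings_json)});',
        f'user_pref("{PREFIX}model", "example-model");',
    ]
)

prefs = parse_prefs(sample)
# Second decode: the stored string becomes a provider dictionary.
settings = json.loads(prefs[PREFIX + "settings"])
print(settings["DeepSeek"]["api"])
```

Building the sample with `json.dumps` avoids hand-written backslash escapes, which are easy to get wrong in a doubly encoded value.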
def http_json(
    url: str,
    *,
    method: str = "GET",
    headers: dict[str, str] | None = None,
    payload: Any = None,
    timeout: int = 90,
) -> Any:
    body = None
    req_headers = dict(headers or {})
    if payload is not None:
        body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
        req_headers.setdefault("Content-Type", "application/json")
    request = urllib.request.Request(url, data=body, method=method, headers=req_headers)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            text = response.read().decode("utf-8", errors="replace")
            if not text:
                return None
            return json.loads(text)
    except urllib.error.HTTPError as exc:
        detail = exc.read().decode("utf-8", errors="replace")
        fail(f"{method} {url} failed: HTTP {exc.code}: {detail[:800]}")
    except urllib.error.URLError as exc:
        fail(f"{method} {url} failed: {exc}")


def zotero_local(path: str) -> Any:
    return http_json(LOCAL_ZOTERO + path, headers={"Zotero-API-Version": "3"}, timeout=20)


def zotero_local_optional(path: str) -> Any | None:
    url = LOCAL_ZOTERO + path
    request = urllib.request.Request(url, headers={"Zotero-API-Version": "3"})
    try:
        with urllib.request.urlopen(request, timeout=20) as response:
            text = response.read().decode("utf-8", errors="replace")
            return json.loads(text) if text else None
    except (urllib.error.HTTPError, urllib.error.URLError, json.JSONDecodeError):
        return None


def find_item(item_key: str | None, query: str | None) -> dict[str, Any]:
    if item_key:
        return zotero_local(f"/items/{urllib.parse.quote(item_key)}")
    if not query:
        fail("provide --item-key or --query")
    qs = urllib.parse.urlencode({"q": query, "limit": 5})
    matches = zotero_local(f"/items/top?{qs}")
    if not matches:
        fail(f"no Zotero item matched query: {query}")
    return matches[0]


def find_items(keys: list[str], query: str | None, limit: int) -> list[dict[str, Any]]:
    items: list[dict[str, Any]] = []
    for key in keys:
        items.append(find_item(key, None))
    if query:
        qs = urllib.parse.urlencode({"q": query, "limit": limit})
        matches = zotero_local(f"/items/top?{qs}")
        seen = {item.get("key") for item in items}
        for item in matches or []:
            if item.get("key") not in seen:
                items.append(item)
                seen.add(item.get("key"))
    if not items:
        fail("provide --item-key/--item-keys or --query")
    return items[:limit] if limit else items


def all_top_items(limit: int = 0) -> list[dict[str, Any]]:
    items: list[dict[str, Any]] = []
    start = 0
    page_limit = 100
    while True:
        qs = urllib.parse.urlencode({"limit": page_limit, "start": start})
        page = zotero_local(f"/items/top?{qs}")
        if not page:
            break
        items.extend(page)
        if limit and len(items) >= limit:
            return items[:limit]
        if len(page) < page_limit:
            break
        start += page_limit
    return items

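The paging loop in `all_top_items` stops on an empty page, on a short page, or once `limit` items have been collected. A minimal sketch of the same loop, with the Local API call swapped for a hypothetical in-memory `fake_fetch` so it runs without Zotero, shows which offsets are requested. `collect_pages`, `FAKE_LIBRARY`, and `calls` are illustrative names, not part of the script.

```python
from typing import Callable, List


def collect_pages(
    fetch_page: Callable[[int, int], List[str]],
    limit: int = 0,
    page_limit: int = 100,
) -> List[str]:
    """Same control flow as all_top_items, with the HTTP call injected."""
    items: List[str] = []
    start = 0
    while True:
        page = fetch_page(start, page_limit)
        if not page:
            break  # nothing left
        items.extend(page)
        if limit and len(items) >= limit:
            return items[:limit]  # caller's cap reached
        if len(page) < page_limit:
            break  # short page means this was the last one
        start += page_limit
    return items


# Stand-in for a library of 250 top-level items.
FAKE_LIBRARY = [f"ITEM{i:03d}" for i in range(250)]
calls: List[int] = []


def fake_fetch(start: int, page_limit: int) -> List[str]:
    calls.append(start)  # record each requested offset
    return FAKE_LIBRARY[start:start + page_limit]


if __name__ == "__main__":
    everything = collect_pages(fake_fetch)
    print(len(everything), calls)
```

Injecting the fetch function is only a testing convenience here; the script itself calls `zotero_local` directly.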
def export_bibtex(item_key: str) -> str:
    qs = urllib.parse.urlencode({"itemKey": item_key, "format": "bibtex"})
    request = urllib.request.Request(
        f"{LOCAL_ZOTERO}/items?{qs}",
        headers={"Zotero-API-Version": "3"},
    )
    with urllib.request.urlopen(request, timeout=20) as response:
        return response.read().decode("utf-8", errors="replace").strip()


def extract_pdf_text(path: Path, max_chars: int) -> str:
    """Extract text from the first pages of a PDF with whichever library is installed.

    Tries PyMuPDF (fitz), then pypdf, then PyPDF2; returns "" if none is available.
    """
    try:
        path_exists = path.exists()
    except OSError:
        return ""
    if not path_exists:
        return ""
    max_pages = 8
    if importlib.util.find_spec("fitz"):
        import fitz  # type: ignore

        chunks = []
        with fitz.open(str(path)) as doc:
            for page in doc[:max_pages]:
                chunks.append(page.get_text("text"))
                if sum(len(chunk) for chunk in chunks) >= max_chars:
                    break
        return "\n".join(chunks)[:max_chars]
    if importlib.util.find_spec("pypdf"):
        from pypdf import PdfReader  # type: ignore

        reader = PdfReader(str(path))
        chunks = []
        for page in reader.pages[:max_pages]:
            chunks.append(page.extract_text() or "")
            if sum(len(chunk) for chunk in chunks) >= max_chars:
                break
        return "\n".join(chunks)[:max_chars]
    if importlib.util.find_spec("PyPDF2"):
        from PyPDF2 import PdfReader  # type: ignore

        reader = PdfReader(str(path))
        chunks = []
        for page in reader.pages[:max_pages]:
            chunks.append(page.extract_text() or "")
            if sum(len(chunk) for chunk in chunks) >= max_chars:
                break
        return "\n".join(chunks)[:max_chars]
    return ""


def local_fulltext(parent_key: str, max_chars: int) -> str:
    children = zotero_local(f"/items/{urllib.parse.quote(parent_key)}/children")
    parts: list[str] = []
    # First choice: text that Zotero has already indexed for each attachment.
    for child in children or []:
        data = child.get("data") or {}
        if data.get("itemType") != "attachment":
            continue
        key = child.get("key")
        if not key:
            continue
        fulltext = zotero_local_optional(f"/items/{urllib.parse.quote(key)}/fulltext")
        if not fulltext:
            continue
        content = fulltext.get("content") if isinstance(fulltext, dict) else ""
        if content:
            parts.append(content)
        if sum(len(p) for p in parts) >= max_chars:
            break
    # Fallback: read the first PDF attachment whose file path can be resolved.
    if not parts:
        for child in children or []:
            data = child.get("data") or {}
            if data.get("itemType") != "attachment" or data.get("contentType") != "application/pdf":
                continue
            path = data.get("path")
            if not path:
                key = child.get("key")
                if key:
                    full = zotero_local_optional(f"/items/{urllib.parse.quote(key)}")
                    path = ((full or {}).get("data") or {}).get("path")
            if not path:
                continue
            extracted = extract_pdf_text(Path(path), max_chars=max_chars)
            if extracted:
                parts.append(extracted)
                break
    return "\n\n".join(parts)[:max_chars]


def creators_text(creators: list[dict[str, Any]]) -> str:
    names = []
    for creator in creators:
        name = creator.get("name")
        if not name:
            name = " ".join(x for x in [creator.get("firstName"), creator.get("lastName")] if x)
        if name:
            names.append(name)
    return "; ".join(names)


def year_from_date(raw: str | None) -> str:
    if not raw:
        return ""
    match = re.search(r"\b(19|20)\d{2}\b", raw)
    return match.group(0) if match else raw[:4]

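`year_from_date` looks for a standalone four-digit year starting with 19 or 20 and otherwise falls back to the first four characters of the raw string. The standalone copy below, which uses `Optional` only so that it runs without the `__future__` import, makes both paths easy to try; the sample date strings are illustrative.

```python
import re
from typing import Optional


def year_from_date(raw: Optional[str]) -> str:
    if not raw:
        return ""
    # A year counts only when it stands alone, e.g. "2021" in "2021-03-04".
    match = re.search(r"\b(19|20)\d{2}\b", raw)
    return match.group(0) if match else raw[:4]


if __name__ == "__main__":
    for sample in ("2021-03-04", "March 2019", "in press", None):
        print(repr(sample), "->", repr(year_from_date(sample)))
```

The fallback means a non-date value such as `"in press"` yields `"in p"` rather than an empty string, which is worth keeping in mind when reading the year column of the generated table.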
def build_source_record(item: dict[str, Any], fulltext_chars: int) -> dict[str, Any]:
    data = item.get("data") or {}
    key = item.get("key")
    return {
        "zoteroKey": key,
        "title": data.get("title"),
        "itemType": data.get("itemType"),
        "year": year_from_date(data.get("date")),
        "publicationTitle": data.get("publicationTitle"),
        "authors": creators_text(data.get("creators") or []),
        "DOI": data.get("DOI"),
        "url": data.get("url"),
        "abstractNote": data.get("abstractNote"),
        "bibtex": export_bibtex(key) if key else "",
        "indexedFullTextExcerpt": local_fulltext(key, fulltext_chars) if key else "",
    }


def build_prompt(records: list[dict[str, Any]], columns: list[str]) -> str:
    # The instructions stay in Chinese because the table is written in
    # Simplified Chinese; each one is glossed in English above it.
    instructions = [
        # "You are a literature-organizing assistant for materials and chemistry."
        "你是材料与化学方向的文献整理助手。",
        # "Output one Markdown table based on the given literature information."
        "请根据给定文献信息输出一个 Markdown 表格。",
        # "Output only the table; no explanations, headings, bullets, or code blocks."
        "只输出表格,不要输出任何解释、标题、项目符号或代码块。",
        # "The header must use exactly the following columns, in this order: ..."
        f"表头必须严格使用以下列,并保持顺序:{' | '.join(columns)}。",
        # "One row per paper; do not omit any."
        "每篇文献占一行,不要漏项。",
        # "Fill in '-' when information is insufficient."
        "信息不足时填写 - 。",
        # "Summarize in Simplified Chinese, keep it compact, at most two sentences per cell."
        "请使用简体中文概括,保持内容紧凑,单元格内避免超过两句话。",
        # "If a paper is clearly not about materials, summarize it anyway but stay objective."
        "如果文献明显不是材料方向,也照样总结,但保持客观。",
    ]
    payload = {"columns": columns, "papers": records}
    return "\n\n".join(instructions + [json.dumps(payload, ensure_ascii=False, indent=2)])


def call_llm(prompt: str) -> str:
    api_key = os.environ.get("AWESOMEGPT_API_KEY")
    base_url = (os.environ.get("AWESOMEGPT_BASE_URL") or "").rstrip("/")
    model = os.environ.get("AWESOMEGPT_MODEL")
    if not api_key or not base_url or not model:
        fail("AWESOMEGPT_API_KEY, AWESOMEGPT_BASE_URL, and AWESOMEGPT_MODEL are required")
    if not base_url.endswith("/v1"):
        base_url = base_url + "/v1"
    payload = {
        "model": model,
        "messages": [
            {
                "role": "system",
                "content": "You summarize scientific papers into compact Simplified Chinese Markdown tables only.",
            },
            {"role": "user", "content": prompt},
        ],
        "temperature": 0.2,
    }
    response = http_json(
        base_url + "/chat/completions",
        method="POST",
        headers={"Authorization": f"Bearer {api_key}"},
        payload=payload,
        timeout=240,
    )
    try:
        return response["choices"][0]["message"]["content"].strip()
    except Exception as exc:
        fail(f"unexpected LLM response shape: {exc}; response={response}")


def split_batches(items: list[dict[str, Any]], batch_size: int) -> list[list[dict[str, Any]]]:
    if batch_size <= 0:
        return [items]
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def normalize_table(markdown: str, keep_header: bool) -> list[str]:
    lines = [line.rstrip() for line in markdown.splitlines() if line.strip().startswith("|")]
    if not lines:
        fail("LLM did not return a Markdown table")
    if keep_header:
        return lines
    if len(lines) >= 3:
        return lines[2:]
    return lines

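When several batches are summarized, each LLM reply contains its own header and separator rows; `normalize_table` keeps them only for the first batch so the merged output forms a single table. The sketch below replays that merge on two hand-written replies. `table_rows` is a simplified stand-in that raises `ValueError` where the script calls `fail`, and the item keys and titles are invented.

```python
from typing import List


def table_rows(markdown: str, keep_header: bool) -> List[str]:
    """Simplified normalize_table: keep only lines that start with '|'."""
    lines = [line.rstrip() for line in markdown.splitlines() if line.strip().startswith("|")]
    if not lines:
        raise ValueError("no Markdown table found")
    if keep_header or len(lines) < 3:
        return lines
    return lines[2:]  # drop header and separator rows


# Hypothetical replies; the second includes stray prose that gets filtered out.
batch_1 = """| Zotero Key | Title |
| --- | --- |
| AAAA1111 | First paper |"""
batch_2 = """Here is the table:
| Zotero Key | Title |
| --- | --- |
| BBBB2222 | Second paper |"""

merged = table_rows(batch_1, keep_header=True) + table_rows(batch_2, keep_header=False)
print("\n".join(merged))
```

Note that a later reply with fewer than three table lines is passed through unchanged, so a header-only reply would be appended as-is.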
def write_output(table: str, out_path: Path | None) -> None:
    if out_path is None:
        print(table)
        return
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(table + "\n", encoding="utf-8")
    print(str(out_path))


def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--item-key", action="append", default=[], help="Zotero top-level item key; can be repeated")
    parser.add_argument("--item-keys", help="Comma/space separated Zotero top-level item keys")
    parser.add_argument("--query", help="Search query; top-level matches are used")
    parser.add_argument("--all", action="store_true", help="Process all top-level Zotero items")
    parser.add_argument("--limit", type=int, default=5, help="Maximum number of items to process; 0 means no limit")
    parser.add_argument("--batch-size", type=int, default=5, help="Number of papers per LLM call")
    parser.add_argument("--fulltext-chars", type=int, default=4000)
    parser.add_argument("--columns", help="Comma-separated Markdown table columns")
    parser.add_argument("--vault", default=str(DEFAULT_VAULT), help="Obsidian vault containing optional .env")
    parser.add_argument("--env-file", help="Optional .env path; defaults to <vault>/.env")
    parser.add_argument("--out", help="Optional output Markdown file path")
    args = parser.parse_args()

    vault = Path(args.vault).expanduser().resolve()
    load_dotenv(Path(args.env_file).expanduser().resolve() if args.env_file else vault / ".env")
    load_awesomegpt_prefs()

    keys = list(args.item_key)
    if args.item_keys:
        keys.extend([key for key in re.split(r"[\s,]+", args.item_keys.strip()) if key])
    limit = args.limit  # 0 means no limit
    if args.all:
        items = all_top_items(limit)
    else:
        # A query needs a concrete page size even when no limit was requested.
        query_limit = limit or 100
        items = find_items(keys, args.query, query_limit)
    if not items:
        fail("no Zotero items selected")

    columns = [col.strip() for col in (args.columns.split(",") if args.columns else DEFAULT_COLUMNS) if col.strip()]
    batches = split_batches(items, args.batch_size)

    output_lines: list[str] = []
    for index, batch in enumerate(batches, 1):
        print(f"[{index}/{len(batches)}] summarizing {len(batch)} items", file=sys.stderr)
        records = [build_source_record(item, args.fulltext_chars) for item in batch]
        table = call_llm(build_prompt(records, columns))
        # Only the first batch keeps its header and separator rows.
        output_lines.extend(normalize_table(table, keep_header=index == 1))

    write_output("\n".join(output_lines), Path(args.out).expanduser().resolve() if args.out else None)


if __name__ == "__main__":
    main()