Initialize qqzot skill

2026-06-09 11:01:32 +08:00 · commit f78d22f1e5
5 changed files with 1389 additions and 0 deletions
--- a/SKILL.md
+++ b/SKILL.md
@@ -0,0 +1,126 @@
 ---
 name: qqzot
 description: Manage the user's Zotero literature database from Codex: search local Zotero items, generate and audit Zotero child-note AI literature notes, summarize selected Zotero papers into Markdown tables, export citation metadata, and provide Zotero-side inputs to QQnote, qqcites, and qqsci. Use when the user asks about Zotero item keys, Zotero child notes, Zotero Local API, missing AI notes, Zotero paper tables, Zotero metadata, or library-side citation inputs. This skill does not own Obsidian vault maintenance or manuscript writing.
 ---
 # QQzot
 Use this skill for Zotero-side literature operations. It owns Zotero item lookup,
 metadata extraction, child-note generation, generated-note audits, Zotero paper
 tables, and Zotero inputs for the user's QQ literature workflow.
 Boundary:
 - `qqzot`: Zotero library, Zotero item keys, Zotero child notes, Zotero metadata,
  local Zotero API, citation metadata exports, and Zotero-derived paper tables.
 - `QQnote-skill`: Obsidian literature notes, vault organization, Markdown note
  cleanup, Dataview dashboards, and Obsidian presentation of literature notes.
 - `qqcites`: sentence/claim-to-reference ranking, support grading, duplicate
  citation control, and citation verification.
 - `qqsci`: manuscript writing, scientific logic, section structure, novelty,
  claim-evidence checks, and reviewer-risk audits.
 ## Requirements
 - Zotero Desktop must be open.
 - Zotero Local API must be available at `http://127.0.0.1:23119`.
 - Default Obsidian vault, only when a script needs templates or an output path:
  `C:\Users\qyh15\Documents\Obsidian Vault`.
 - `ZOTERO_API_KEY` is required only for Zotero Web API writes.
 - LLM settings come from `AWESOMEGPT_API_KEY`, `AWESOMEGPT_BASE_URL`,
  `AWESOMEGPT_MODEL`, or AwesomeGPT preferences in the default Zotero profile.
 Never store API keys in skill files, vault helpers, Git commits, zip files,
 terminal output, or chat. If keys appear in logs or chat, tell the user to
 rotate them.
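A quick reachability check can confirm the first two requirements before any script runs. The sketch below is only an illustration under stated assumptions: `probe_url` and `local_api_available` are hypothetical helpers, not part of this skill, and the probe path `/items/top?limit=1` simply mirrors the kind of request the bundled scripts send to the Local API.

```python
import urllib.error
import urllib.request

# Base URL as used by the scripts in this commit.
LOCAL_ZOTERO = "http://127.0.0.1:23119/api/users/0"


def probe_url(base_url: str = LOCAL_ZOTERO) -> str:
    """Build a minimal read-only request URL (hypothetical helper)."""
    return base_url.rstrip("/") + "/items/top?limit=1"


def local_api_available(base_url: str = LOCAL_ZOTERO, timeout: float = 3.0) -> bool:
    """Return True if the Local API answers the probe with a 2xx status."""
    request = urllib.request.Request(
        probe_url(base_url),
        headers={"Zotero-API-Version": "3"},
    )
    # An empty ProxyHandler keeps the loopback request away from any system proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or HTTP error: treat as unavailable.
        return False
```

If `local_api_available()` returns `False`, the likely causes are that Zotero Desktop is closed or that the Local API is not enabled.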
 ## Workflow Decision
 Use the smallest Zotero operation that answers the request:
 1. Need AI child notes for Zotero items: run `generate_zotero_ai_note.py`.
 2. Need to know which Zotero items lack generated notes: run
   `audit_zotero_ai_notes.py`.
 3. Need a comparison table for selected Zotero papers: run
   `summarize_zotero_table.py`.
 4. Need manuscript citations for a sentence or claim: hand candidates to
   `qqcites` after Zotero retrieval.
 5. Need Obsidian note organization, cleanup, or Dataview: hand off to
   `QQnote-skill`.
 ## Generate Zotero AI Child Notes
 State clearly before live writes that Zotero child notes will be created.
 Use `--dry-run` for first-time validation or template changes.
 Single item:
 ```powershell
 py "$env:USERPROFILE\.codex\skills\qqzot\scripts\generate_zotero_ai_note.py" --vault "C:\Users\qyh15\Documents\Obsidian Vault" --item-key SXAIQUJT --skip-existing
 ```
 Multiple items:
 ```powershell
 py "$env:USERPROFILE\.codex\skills\qqzot\scripts\generate_zotero_ai_note.py" --vault "C:\Users\qyh15\Documents\Obsidian Vault" --item-keys "SXAIQUJT X7GJZ627 ZCZXGRAM" --limit 0 --skip-existing --fulltext-chars 4000
 ```
 Whole library:
 Use only after the user explicitly approves library-wide writes.
 ```powershell
 py "$env:USERPROFILE\.codex\skills\qqzot\scripts\generate_zotero_ai_note.py" --vault "C:\Users\qyh15\Documents\Obsidian Vault" --all --limit 0 --skip-existing --fulltext-chars 4000
 ```
 For long runs, prefer batches of 20-30 items. If a run times out, rerun with
 `--skip-existing`; duplicate detection looks for the parent item link
 (`items/<key>`) or an AI-note title marker inside existing child notes.
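One way to follow the batching advice is to split a long key list into space-separated chunks, each suitable as a single `--item-keys` value. This is a minimal sketch; `batch_item_keys` is a hypothetical helper, and the default of 25 is just a midpoint of the suggested 20-30 range.

```python
from typing import Iterator


def batch_item_keys(keys: list[str], batch_size: int = 25) -> Iterator[str]:
    """Yield space-separated key groups of at most batch_size keys each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(keys), batch_size):
        yield " ".join(keys[start:start + batch_size])
```

Each yielded string can be passed as the `--item-keys` argument of one `generate_zotero_ai_note.py` run, together with `--limit 0 --skip-existing`.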
 ## Audit Missing Generated Notes
 This is a deterministic library comparison. Do not use DeepSeek or another LLM to
 decide whether notes are missing.
 ```powershell
 py "$env:USERPROFILE\.codex\skills\qqzot\scripts\audit_zotero_ai_notes.py" --vault "C:\Users\qyh15\Documents\Obsidian Vault" --rebuild
 py "$env:USERPROFILE\.codex\skills\qqzot\scripts\audit_zotero_ai_notes.py" --vault "C:\Users\qyh15\Documents\Obsidian Vault" --keys-only
 ```
 The cache lives at:
 ```text
 <vault>\00 Templater\.zotero-ai-notes-index.json
 ```
 Use `--refresh` or `--rebuild` when notes were edited outside this workflow.
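The cache can also be inspected directly without rerunning the audit. The sketch below assumes the cache layout written by `audit_zotero_ai_notes.py` in this commit (an `items` mapping whose records carry `hasGeneratedNote` and, for removed items, `deletedOrNotTopLevel`); the helper name `missing_keys_from_cache` is hypothetical.

```python
import json
from pathlib import Path
from typing import Any


def missing_keys_from_cache(cache: dict[str, Any]) -> list[str]:
    """List keys of cached items that have no generated note recorded."""
    missing: list[str] = []
    for key, record in (cache.get("items") or {}).items():
        # Skip items the audit flagged as deleted or no longer top-level.
        if record.get("deletedOrNotTopLevel"):
            continue
        if not record.get("hasGeneratedNote"):
            missing.append(key)
    return missing


def load_cache(path: Path) -> dict[str, Any]:
    """Read the JSON cache file from disk."""
    return json.loads(path.read_text(encoding="utf-8"))
```

Because the cache only reflects the last scan, its answer can be stale; the audit script remains the authoritative check.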
 ## Summarize Zotero Papers Into A Table
 Use this for one paper or a batch of papers when the user wants a compact
 comparison table for Obsidian, Word, qqcites, or qqsci.
 ```powershell
 py "$env:USERPROFILE\.codex\skills\qqzot\scripts\summarize_zotero_table.py" --vault "C:\Users\qyh15\Documents\Obsidian Vault" --item-keys "SXAIQUJT X7GJZ627 ZCZXGRAM" --batch-size 3 --out "C:\Users\qyh15\Documents\Obsidian Vault\99 项目\文献对比表.md"
 ```
 Rules:
 - This workflow reads Zotero Local API plus LLM config, but does not create
  Zotero child notes.
 - Prefer 3-10 papers per batch to keep tables accurate and readable.
 - Use custom columns when the user asks for a specific comparison axis.
 - If `--out` is omitted, print the Markdown table to stdout.
 ## Operating Rules
 - Prefer Zotero item keys, DOI, title, first author, and year as metadata anchors.
 - Do not delete duplicate Zotero notes unless the user explicitly requests
  cleanup.
 - Do not add visible machine markers to note bodies.
 - Reserve DeepSeek for note content generation or summarization, not
  deterministic library audits.
 - If Zotero's indexed full text is unavailable, fall back to metadata, abstract,
  BibTeX, and local PDF extraction when a Python PDF library (PyMuPDF, pypdf,
  or PyPDF2) is installed.
 - If PowerShell output has Unicode issues, set `PYTHONIOENCODING=utf-8` or rely
  on the scripts' UTF-8 handling.
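The metadata-anchor rule can be sketched as a small extraction step over one Zotero item as returned by the Local API. This is an illustrative assumption rather than code from the skill: `metadata_anchor` is a hypothetical helper, the field names (`key`, `DOI`, `title`, `creators`, `date`) are the ones the bundled scripts read, and the year is taken as the first four-digit year found in `date`.

```python
import re
from typing import Any


def metadata_anchor(item: dict[str, Any]) -> dict[str, str]:
    """Collect the identifying fields of one Zotero item."""
    data = item.get("data") or {}
    creators = data.get("creators") or []
    first = creators[0] if creators else {}
    # Single-field creators use "name"; two-field creators use "lastName".
    first_author = first.get("name") or first.get("lastName") or ""
    # Accept years from 1500 to 2099 anywhere in the free-form date string.
    year_match = re.search(r"\b(?:1[5-9]|20)\d{2}\b", data.get("date") or "")
    return {
        "key": item.get("key") or "",
        "doi": data.get("DOI") or "",
        "title": data.get("title") or "",
        "firstAuthor": first_author,
        "year": year_match.group(0) if year_match else "",
    }
```

Keeping all five fields together lets downstream skills match a reference even when one anchor, such as the DOI, is missing.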
--- a/agents/openai.yaml
+++ b/agents/openai.yaml
@@ -0,0 +1,4 @@
 interface:
  display_name: "QQzot"
  short_description: "Zotero library notes and metadata workflow"
  default_prompt: "Use $qqzot to search Zotero, generate AI child notes, audit missing notes, or summarize selected Zotero papers."
--- a/scripts/audit_zotero_ai_notes.py
+++ b/scripts/audit_zotero_ai_notes.py
@@ -0,0 +1,180 @@
 #!/usr/bin/env python3
 """Audit which Zotero top-level items have generated AI child notes.
 This script is deterministic and does not call any LLM.
 It can build a local cache for fast repeated missing-note checks.
 """
 from __future__ import annotations
 import argparse
 import json
 import re
 import sys
 import urllib.parse
 import urllib.request
 from datetime import datetime, timezone
 from pathlib import Path
 from typing import Any
 try:
    sys.stdout.reconfigure(encoding="utf-8")
    sys.stderr.reconfigure(encoding="utf-8")
 except Exception:
    pass
 LOCAL_ZOTERO = "http://127.0.0.1:23119/api/users/0"
 def fail(message: str) -> None:
    print(f"error: {message}", file=sys.stderr)
    raise SystemExit(1)
 def zotero_get(path: str) -> Any:
    req = urllib.request.Request(LOCAL_ZOTERO + path, headers={"Zotero-API-Version": "3"})
    with urllib.request.urlopen(req, timeout=30) as response:
        return json.loads(response.read().decode("utf-8", errors="replace"))
 def all_top_items() -> list[dict[str, Any]]:
    items: list[dict[str, Any]] = []
    start = 0
    limit = 100
    while True:
        page = zotero_get("/items/top?" + urllib.parse.urlencode({"limit": limit, "start": start}))
        if not page:
            break
        items.extend(page)
        if len(page) < limit:
            break
        start += limit
    return items
 def item_summary(item: dict[str, Any]) -> dict[str, Any]:
    data = item.get("data") or {}
    return {
        "key": item.get("key"),
        "title": data.get("title"),
        "itemType": data.get("itemType"),
        "version": item.get("version") or data.get("version"),
        "dateModified": data.get("dateModified"),
    }
 def note_is_generated_for_parent(note_html: str, parent_key: str) -> bool:
    return (
        f"items/{parent_key}" in note_html
        or "AI Literature Note" in note_html
        or bool(re.search(r"<h1>[^<]*📝", note_html))
    )
 def scan_parent(parent_key: str) -> dict[str, Any]:
    children = zotero_get(f"/items/{urllib.parse.quote(parent_key)}/children")
    generated_notes: list[str] = []
    child_note_count = 0
    for child in children or []:
        data = child.get("data") or {}
        if data.get("itemType") != "note":
            continue
        child_note_count += 1
        note_key = child.get("key")
        if not note_key:
            continue
        full = zotero_get(f"/items/{urllib.parse.quote(note_key)}")
        note_html = ((full.get("data") or {}).get("note") or "")
        if note_is_generated_for_parent(note_html, parent_key):
            generated_notes.append(note_key)
    return {
        "hasGeneratedNote": bool(generated_notes),
        "generatedNoteKeys": generated_notes,
        "childNoteCount": child_note_count,
    }
 def load_cache(path: Path) -> dict[str, Any]:
    if not path.exists():
        return {"items": {}}
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return {"items": {}}
 def save_cache(path: Path, cache: dict[str, Any]) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    cache["updatedAt"] = datetime.now(timezone.utc).isoformat()
    path.write_text(json.dumps(cache, ensure_ascii=False, indent=2), encoding="utf-8")
 def audit(vault: Path, *, refresh: bool, rebuild: bool, limit: int | None) -> dict[str, Any]:
    cache_path = vault / "00 Templater" / ".zotero-ai-notes-index.json"
    cache = {"items": {}} if rebuild else load_cache(cache_path)
    cached_items: dict[str, Any] = cache.setdefault("items", {})
    top_items = all_top_items()
    if limit:
        top_items = top_items[:limit]
    current_keys = set()
    scanned = 0
    for item in top_items:
        summary = item_summary(item)
        key = summary.get("key")
        if not key:
            continue
        current_keys.add(key)
        record = cached_items.get(key)
        if rebuild or refresh or not record:
            record = {**summary, **scan_parent(key)}
            cached_items[key] = record
            scanned += 1
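         else:
             record.update(summary)
         # The item is a current top-level item, so clear any stale deletion flag
         # left by an earlier run (for example one that used --limit).
         record.pop("deletedOrNotTopLevel", None)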
    for key in list(cached_items):
        if key not in current_keys:
            cached_items[key]["deletedOrNotTopLevel"] = True
    save_cache(cache_path, cache)
    active_records = [
        record for key, record in cached_items.items()
        if key in current_keys and not record.get("deletedOrNotTopLevel")
    ]
    missing = [record for record in active_records if not record.get("hasGeneratedNote")]
    duplicates = [
        record for record in active_records
        if len(record.get("generatedNoteKeys") or []) > 1
    ]
    return {
        "cache": str(cache_path),
        "total": len(active_records),
        "withGeneratedNote": len(active_records) - len(missing),
        "missingCount": len(missing),
        "duplicateGeneratedNoteItems": len(duplicates),
        "scannedParentsThisRun": scanned,
        "missing": missing,
        "duplicates": duplicates,
    }
 def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--vault", default=str(Path.cwd()), help="Obsidian vault path")
    parser.add_argument("--refresh", action="store_true", help="Rescan all current top-level parents")
    parser.add_argument("--rebuild", action="store_true", help="Discard cache and rescan all current top-level parents")
    parser.add_argument("--limit", type=int, help="Only audit the first N top-level items")
    parser.add_argument("--keys-only", action="store_true", help="Print only missing item keys")
    args = parser.parse_args()
    result = audit(Path(args.vault).expanduser().resolve(), refresh=args.refresh, rebuild=args.rebuild, limit=args.limit)
    if args.keys_only:
        print(" ".join(record["key"] for record in result["missing"] if record.get("key")))
    else:
        print(json.dumps(result, ensure_ascii=False, indent=2))
 if __name__ == "__main__":
    main()
--- a/scripts/generate_zotero_ai_note.py
+++ b/scripts/generate_zotero_ai_note.py
@@ -0,0 +1,586 @@
 #!/usr/bin/env python3
 """
 Generate an AI literature note from Zotero metadata and save it as a Zotero child note.
 Required environment variables:
  AWESOMEGPT_API_KEY       DeepSeek/OpenAI-compatible API key
  AWESOMEGPT_BASE_URL      Example: https://api.deepseek.com
  AWESOMEGPT_MODEL         Example: deepseek-v4-pro
  ZOTERO_API_KEY           Zotero Web API key with library write permission
 Optional:
  ZOTERO_USER_ID           If omitted, resolved from /keys/current
 """
 from __future__ import annotations
 import argparse
 import html
 import importlib.util
 import json
 import os
 import re
 import sys
 import urllib.error
 import urllib.parse
 import urllib.request
 from pathlib import Path
 from typing import Any
 try:
    sys.stdout.reconfigure(encoding="utf-8")
    sys.stderr.reconfigure(encoding="utf-8")
 except Exception:
    pass
 LOCAL_ZOTERO = "http://127.0.0.1:23119/api/users/0"
 ZOTERO_WEB = "https://api.zotero.org"
 DEFAULT_VAULT = Path.cwd()
 def fail(message: str) -> None:
    print(f"error: {message}", file=sys.stderr)
    raise SystemExit(1)
 def load_dotenv(path: Path) -> None:
    if not path.exists():
        return
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        os.environ.setdefault(key, value)
 def zotero_profile_prefs() -> Path | None:
    profiles_ini = Path.home() / "AppData/Roaming/Zotero/Zotero/profiles.ini"
    profiles_root = profiles_ini.parent
    try:
        profiles_ini_exists = profiles_ini.exists()
    except OSError:
        return None
    if profiles_ini_exists:
        try:
            text = profiles_ini.read_text(encoding="utf-8", errors="replace")
        except OSError:
            return None
        blocks = re.split(r"\n(?=\[Profile\d+\])", text)
        for block in blocks:
            if "Default=1" not in block:
                continue
            path_match = re.search(r"^Path=(.+)$", block, re.MULTILINE)
            relative_match = re.search(r"^IsRelative=(\d+)$", block, re.MULTILINE)
            if path_match:
                profile_path = Path(path_match.group(1).strip())
                if relative_match and relative_match.group(1) == "1":
                    profile_path = profiles_root / profile_path
                prefs = profile_path / "prefs.js"
                if prefs.exists():
                    return prefs
    profiles_dir = profiles_root / "Profiles"
    try:
        profiles_dir_exists = profiles_dir.exists()
    except OSError:
        return None
    if profiles_dir_exists:
        try:
            for prefs in profiles_dir.glob("*/prefs.js"):
                return prefs
        except OSError:
            return None
    return None
 def load_awesomegpt_prefs(path: Path | None = None) -> None:
    if path is None:
        path = zotero_profile_prefs()
    if path is None:
        return
    try:
        path_exists = path.exists()
    except OSError:
        return
    if not path_exists:
        return
    try:
        text = path.read_text(encoding="utf-8", errors="replace")
    except OSError:
        return
    prefs: dict[str, Any] = {}
    for name, raw_value in re.findall(r'user_pref\("([^"]+)",\s*(.*?)\);', text):
        if not name.startswith("extensions.zotero.zoterogpt."):
            continue
        try:
            prefs[name] = json.loads(raw_value)
        except json.JSONDecodeError:
            continue
    settings_raw = prefs.get("extensions.zotero.zoterogpt.settings")
    if isinstance(settings_raw, str):
        try:
            settings = json.loads(settings_raw)
        except json.JSONDecodeError:
            settings = {}
    else:
        settings = {}
    direct_api = prefs.get("extensions.zotero.zoterogpt.api")
    direct_model = prefs.get("extensions.zotero.zoterogpt.model")
    direct_key = prefs.get("extensions.zotero.zoterogpt.secretKey")
    provider = None
    if isinstance(settings, dict):
        provider = settings.get("DeepSeek") or next(
            (value for key, value in settings.items() if key.lower() == "deepseek"),
            None,
        )
     if isinstance(provider, dict):
         # Skip empty values: setdefault(name, "") would block the direct prefs below.
         for env_name, field in (("AWESOMEGPT_BASE_URL", "api"), ("AWESOMEGPT_MODEL", "model"), ("AWESOMEGPT_API_KEY", "secretKey")):
             if provider.get(field):
                 os.environ.setdefault(env_name, str(provider[field]))
    if direct_api:
        os.environ.setdefault("AWESOMEGPT_BASE_URL", str(direct_api))
    if direct_model:
        os.environ.setdefault("AWESOMEGPT_MODEL", str(direct_model))
    if direct_key:
        os.environ.setdefault("AWESOMEGPT_API_KEY", str(direct_key))
 def http_json(
    url: str,
    *,
    method: str = "GET",
    headers: dict[str, str] | None = None,
    payload: Any = None,
    timeout: int = 90,
 ) -> Any:
    body = None
    req_headers = dict(headers or {})
    if payload is not None:
        body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
        req_headers.setdefault("Content-Type", "application/json")
    request = urllib.request.Request(url, data=body, method=method, headers=req_headers)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            text = response.read().decode("utf-8", errors="replace")
            if not text:
                return None
            return json.loads(text)
    except urllib.error.HTTPError as exc:
        detail = exc.read().decode("utf-8", errors="replace")
        fail(f"{method} {url} failed: HTTP {exc.code}: {detail[:800]}")
    except urllib.error.URLError as exc:
        fail(f"{method} {url} failed: {exc}")
 def zotero_local(path: str) -> Any:
    url = LOCAL_ZOTERO + path
    return http_json(url, headers={"Zotero-API-Version": "3"}, timeout=20)
 def zotero_local_optional(path: str) -> Any | None:
    url = LOCAL_ZOTERO + path
    request = urllib.request.Request(url, headers={"Zotero-API-Version": "3"})
    try:
        with urllib.request.urlopen(request, timeout=20) as response:
            text = response.read().decode("utf-8", errors="replace")
            return json.loads(text) if text else None
    except (urllib.error.HTTPError, urllib.error.URLError, json.JSONDecodeError):
        return None
 def zotero_web(path: str, *, method: str = "GET", payload: Any = None) -> Any:
    api_key = os.environ.get("ZOTERO_API_KEY")
    if not api_key:
        fail("ZOTERO_API_KEY is required to write Zotero child notes")
    url = ZOTERO_WEB + path
    return http_json(
        url,
        method=method,
        headers={"Zotero-API-Version": "3", "Zotero-API-Key": api_key},
        payload=payload,
        timeout=60,
    )
 def resolve_user_id() -> str:
    explicit = os.environ.get("ZOTERO_USER_ID")
    if explicit:
        return explicit
    current = zotero_web("/keys/current")
    user_id = current.get("userID") if isinstance(current, dict) else None
    if not user_id:
        fail("could not resolve Zotero userID from /keys/current")
    return str(user_id)
 def find_item(item_key: str | None, query: str | None) -> dict[str, Any]:
    if item_key:
        return zotero_local(f"/items/{urllib.parse.quote(item_key)}")
    if not query:
        fail("provide --item-key or --query")
    qs = urllib.parse.urlencode({"q": query, "limit": 5})
    matches = zotero_local(f"/items/top?{qs}")
    if not matches:
        fail(f"no Zotero item matched query: {query}")
    if len(matches) > 1:
        print(f"warning: {len(matches)} matches; using {matches[0].get('key')}", file=sys.stderr)
    return matches[0]
 def find_items(keys: list[str], query: str | None, limit: int) -> list[dict[str, Any]]:
    items: list[dict[str, Any]] = []
    for key in keys:
        items.append(find_item(key, None))
    if query:
        qs = urllib.parse.urlencode({"q": query, "limit": limit})
        matches = zotero_local(f"/items/top?{qs}")
        seen = {item.get("key") for item in items}
        for item in matches or []:
            if item.get("key") not in seen:
                items.append(item)
                seen.add(item.get("key"))
    if not items:
        fail("provide --item-key/--item-keys or --query")
    return items[:limit] if limit else items
 def all_top_items(limit: int = 0) -> list[dict[str, Any]]:
    items: list[dict[str, Any]] = []
    start = 0
    page_limit = 100
    while True:
        qs = urllib.parse.urlencode({"limit": page_limit, "start": start})
        page = zotero_local(f"/items/top?{qs}")
        if not page:
            break
        items.extend(page)
        if limit and len(items) >= limit:
            return items[:limit]
        if len(page) < page_limit:
            break
        start += page_limit
    return items
 def has_existing_ai_note(parent_key: str) -> bool:
    children = zotero_local(f"/items/{urllib.parse.quote(parent_key)}/children")
    for child in children or []:
        data = child.get("data") or {}
        if data.get("itemType") != "note":
            continue
        note = data.get("note") or ""
        if not note and child.get("key"):
            full_note = zotero_local_optional(f"/items/{urllib.parse.quote(child['key'])}")
            note = ((full_note or {}).get("data") or {}).get("note") or ""
        if "AI文献笔记" in note or "AI Literature Note" in note or f"items/{parent_key}" in note:
            return True
    return False
 def export_bibtex(item_key: str) -> str:
    qs = urllib.parse.urlencode({"itemKey": item_key, "format": "bibtex"})
    url = f"{LOCAL_ZOTERO}/items?{qs}"
    request = urllib.request.Request(url, headers={"Zotero-API-Version": "3"})
    with urllib.request.urlopen(request, timeout=20) as response:
        return response.read().decode("utf-8", errors="replace").strip()
 def local_fulltext(parent_key: str, max_chars: int) -> str:
    children = zotero_local(f"/items/{urllib.parse.quote(parent_key)}/children")
    parts: list[str] = []
    for child in children or []:
        data = child.get("data") or {}
        if data.get("itemType") != "attachment":
            continue
        key = child.get("key")
        if not key:
            continue
        fulltext = zotero_local_optional(f"/items/{urllib.parse.quote(key)}/fulltext")
        if not fulltext:
            continue
        content = fulltext.get("content") if isinstance(fulltext, dict) else ""
        if content:
            parts.append(content)
        if sum(len(p) for p in parts) >= max_chars:
            break
    if not parts:
        for child in children or []:
            data = child.get("data") or {}
            if data.get("itemType") != "attachment" or data.get("contentType") != "application/pdf":
                continue
            path = data.get("path")
            if not path:
                key = child.get("key")
                if key:
                    full = zotero_local_optional(f"/items/{urllib.parse.quote(key)}")
                    path = ((full or {}).get("data") or {}).get("path")
            if not path:
                continue
            extracted = extract_pdf_text(Path(path), max_chars=max_chars)
            if extracted:
                parts.append(extracted)
                break
    text = "\n\n".join(parts)
    return text[:max_chars]
 def template_paths(vault: Path) -> tuple[Path, Path, Path]:
    templates = vault / "00 Templater"
    if not templates.exists():
        fail(f"template directory not found: {templates}")
    files = {path.name: path for path in templates.glob("*.md")}
    def match(prefix: str, contains: str) -> Path:
        candidates = [
            path for name, path in files.items()
            if name.startswith(prefix) and contains in name
        ]
        if not candidates:
            candidates = [path for name, path in files.items() if name.startswith(prefix)]
        if not candidates:
            fail(f"missing template file starting with {prefix} in {templates}")
        return candidates[0]
    return (
        match("03", "AI"),
        match("01", ""),
        match("02", ""),
    )
 def extract_pdf_text(path: Path, max_chars: int) -> str:
    try:
        path_exists = path.exists()
    except OSError:
        return ""
    if not path_exists:
        return ""
    max_pages = 8
    if importlib.util.find_spec("fitz"):
        import fitz  # type: ignore
        chunks = []
        with fitz.open(str(path)) as doc:
            for page in doc[:max_pages]:
                chunks.append(page.get_text("text"))
                if sum(len(chunk) for chunk in chunks) >= max_chars:
                    break
        return "\n".join(chunks)[:max_chars]
    if importlib.util.find_spec("pypdf"):
        from pypdf import PdfReader  # type: ignore
        reader = PdfReader(str(path))
        chunks = []
        for page in reader.pages[:max_pages]:
            chunks.append(page.extract_text() or "")
            if sum(len(chunk) for chunk in chunks) >= max_chars:
                break
        return "\n".join(chunks)[:max_chars]
    if importlib.util.find_spec("PyPDF2"):
        from PyPDF2 import PdfReader  # type: ignore
        reader = PdfReader(str(path))
        chunks = []
        for page in reader.pages[:max_pages]:
            chunks.append(page.extract_text() or "")
            if sum(len(chunk) for chunk in chunks) >= max_chars:
                break
        return "\n".join(chunks)[:max_chars]
    return ""
 def creators_text(creators: list[dict[str, Any]]) -> str:
    names = []
    for creator in creators:
        name = creator.get("name")
        if not name:
            name = " ".join(x for x in [creator.get("firstName"), creator.get("lastName")] if x)
        if name:
            names.append(name)
    return "; ".join(names)
 def build_prompt(item: dict[str, Any], bibtex: str, fulltext: str, vault: Path) -> str:
    data = item.get("data") or {}
    prompt_path, research_template_path, review_template_path = template_paths(vault)
    prompt = prompt_path.read_text(encoding="utf-8")
    research_template = research_template_path.read_text(encoding="utf-8")
    review_template = review_template_path.read_text(encoding="utf-8")
    metadata = {
        "zoteroKey": item.get("key"),
        "title": data.get("title"),
        "itemType": data.get("itemType"),
        "authors": creators_text(data.get("creators") or []),
        "publicationTitle": data.get("publicationTitle"),
        "date": data.get("date"),
        "DOI": data.get("DOI"),
        "url": data.get("url"),
        "abstractNote": data.get("abstractNote"),
    }
    source = {
        "metadata": metadata,
        "bibtex": bibtex,
        "indexedFullTextExcerpt": fulltext,
    }
    return "\n\n".join(
        [
            prompt,
            "请先判断文献类型：综述型文献或研究型文献。",
            "若为研究型文献，请严格填充下面的研究型模板：",
            research_template,
            "若为综述型文献，请严格填充下面的综述型模板：",
            review_template,
            "请将模板中的 ${topItem.getField('title')} 替换为真实题名，将 ${topItem.key} 替换为 Zotero key。",
            "只输出最终 Markdown 笔记，不要输出解释、判断过程或代码围栏。",
            "文献材料如下：",
            json.dumps(source, ensure_ascii=False, indent=2),
        ]
    )
 def call_llm(prompt: str) -> str:
    api_key = os.environ.get("AWESOMEGPT_API_KEY")
    base_url = (os.environ.get("AWESOMEGPT_BASE_URL") or "").rstrip("/")
    model = os.environ.get("AWESOMEGPT_MODEL")
    if not api_key or not base_url or not model:
        fail("AWESOMEGPT_API_KEY, AWESOMEGPT_BASE_URL, and AWESOMEGPT_MODEL are required")
    if not base_url.endswith("/v1"):
        base_url = base_url + "/v1"
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a materials science literature-note assistant. Output Simplified Chinese Markdown only."},
            {"role": "user", "content": prompt},
        ],
        "temperature": 0.3,
    }
    response = http_json(
        base_url + "/chat/completions",
        method="POST",
        headers={"Authorization": f"Bearer {api_key}"},
        payload=payload,
        timeout=180,
    )
    try:
        return response["choices"][0]["message"]["content"].strip()
    except Exception as exc:
        fail(f"unexpected LLM response shape: {exc}; response={response}")
 def markdown_to_zotero_html(markdown: str) -> str:
    lines = markdown.strip().splitlines()
    out: list[str] = []
    in_pre = False
    pre_lines: list[str] = []
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("```") or stripped.startswith("~~~"):
            if in_pre:
                out.append("<pre>" + html.escape("\n".join(pre_lines)) + "</pre>")
                pre_lines = []
                in_pre = False
            else:
                in_pre = True
            continue
        if in_pre:
            pre_lines.append(line)
            continue
        if not stripped:
            continue
        heading = re.match(r"^(#{1,6})\s+(.+)$", stripped)
        if heading:
            level = min(len(heading.group(1)), 6)
            out.append(f"<h{level}>{html.escape(heading.group(2))}</h{level}>")
        elif stripped.startswith(">"):
            out.append(f"<blockquote>{html.escape(stripped.lstrip('> ').strip())}</blockquote>")
         else:
             # List items and plain text both become paragraphs; list markers stay as text.
             out.append(f"<p>{html.escape(stripped)}</p>")
    if in_pre:
        out.append("<pre>" + html.escape("\n".join(pre_lines)) + "</pre>")
    return "\n".join(out)
 def create_child_note(user_id: str, parent_key: str, markdown: str, dry_run: bool) -> Any:
    note_html = markdown_to_zotero_html(markdown)
    payload = [{"itemType": "note", "parentItem": parent_key, "note": note_html}]
    if dry_run:
        return {"dryRun": True, "payload": payload}
    return zotero_web(f"/users/{user_id}/items", method="POST", payload=payload)
 def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--item-key", action="append", default=[], help="Zotero top-level item key; can be repeated")
    parser.add_argument("--item-keys", help="Comma/space separated Zotero top-level item keys")
    parser.add_argument("--query", help="Search query; first top-level match is used")
    parser.add_argument("--all", action="store_true", help="Process all top-level Zotero items")
    parser.add_argument("--limit", type=int, default=1, help="Maximum number of items to process")
    parser.add_argument("--fulltext-chars", type=int, default=12000)
    parser.add_argument("--skip-existing", action="store_true", help="Skip items that already have a generated AI child note")
    parser.add_argument("--dry-run", action="store_true", help="generate but do not write Zotero note")
    parser.add_argument("--vault", default=str(DEFAULT_VAULT), help="Obsidian vault containing 00 Templater")
    parser.add_argument("--env-file", help="Optional .env path; defaults to <vault>/.env")
    args = parser.parse_args()
    vault = Path(args.vault).expanduser().resolve()
    load_dotenv(Path(args.env_file).expanduser().resolve() if args.env_file else vault / ".env")
    load_awesomegpt_prefs()
    keys = list(args.item_key)
    if args.item_keys:
        keys.extend([key for key in re.split(r"[\s,]+", args.item_keys.strip()) if key])
    if args.all:
        items = all_top_items(args.limit if args.limit != 1 else 0)
    else:
        items = find_items(keys, args.query, args.limit)
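     # A dry run never calls the Web API, so it should not require ZOTERO_API_KEY.
     user_id = "" if args.dry_run else resolve_user_id()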
    results = []
    for index, item in enumerate(items, 1):
        try:
            key = item.get("key")
            title = (item.get("data") or {}).get("title")
            if not key:
                results.append({"index": index, "status": "skipped", "reason": "missing key", "title": title})
                continue
            if args.skip_existing and has_existing_ai_note(key):
                results.append({"index": index, "itemKey": key, "title": title, "status": "skipped", "reason": "existing AI note"})
                continue
            print(f"[{index}/{len(items)}] generating {key}: {title}", file=sys.stderr)
            bibtex = export_bibtex(key)
            fulltext = local_fulltext(key, args.fulltext_chars)
            prompt = build_prompt(item, bibtex, fulltext, vault)
            markdown = call_llm(prompt)
            result = create_child_note(user_id, key, markdown, args.dry_run)
            results.append({"index": index, "itemKey": key, "title": title, "status": "ok", "result": result})
        except SystemExit as exc:
            results.append({
                "index": index,
                "itemKey": item.get("key"),
                "title": (item.get("data") or {}).get("title"),
                "status": "error",
                "error": str(exc),
            })
            print(f"[{index}/{len(items)}] error; continuing", file=sys.stderr)
            continue
        except Exception as exc:
            results.append({
                "index": index,
                "itemKey": item.get("key"),
                "title": (item.get("data") or {}).get("title"),
                "status": "error",
                "error": repr(exc),
            })
            print(f"[{index}/{len(items)}] error; continuing: {exc}", file=sys.stderr)
            continue
    print(json.dumps(results, ensure_ascii=False, indent=2))
 if __name__ == "__main__":
    main()
--- a/scripts/summarize_zotero_table.py
+++ b/scripts/summarize_zotero_table.py
@ -0,0 +1,493 @@
 #!/usr/bin/env python3
 """
 Summarize Zotero items into a Markdown table using an OpenAI-compatible LLM.
 Required environment variables:
  AWESOMEGPT_API_KEY       DeepSeek/OpenAI-compatible API key
  AWESOMEGPT_BASE_URL      Example: https://api.deepseek.com
  AWESOMEGPT_MODEL         Example: deepseek-v4-pro
 Fallbacks when a variable is unset (checked in this order):
  <vault>/.env, then the AwesomeGPT preferences in the default Zotero profile.
 """
 from __future__ import annotations
 import argparse
 import importlib.util
 import json
 import os
 import re
 import sys
 import urllib.error
 import urllib.parse
 import urllib.request
 from pathlib import Path
 from typing import Any
 try:
    sys.stdout.reconfigure(encoding="utf-8")
    sys.stderr.reconfigure(encoding="utf-8")
 except Exception:
    pass
 LOCAL_ZOTERO = "http://127.0.0.1:23119/api/users/0"
 DEFAULT_VAULT = Path.cwd()
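 DEFAULT_COLUMNS = [
    "Zotero Key",
    "标题",  # Title
    "类型",  # Item type
    "年份",  # Year
    "期刊/来源",  # Journal / source
    "研究对象/材料",  # Research subject / material
    "关键方法",  # Key methods
    "核心结论",  # Main conclusions
    "相关启发",  # Relevant takeaways
 ]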
 def fail(message: str) -> None:
    print(f"error: {message}", file=sys.stderr)
    raise SystemExit(1)
 def load_dotenv(path: Path) -> None:
    if not path.exists():
        return
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))
 def zotero_profile_prefs() -> Path | None:
    profiles_ini = Path.home() / "AppData/Roaming/Zotero/Zotero/profiles.ini"
    profiles_root = profiles_ini.parent
    try:
        profiles_ini_exists = profiles_ini.exists()
    except OSError:
        return None
    if profiles_ini_exists:
        try:
            text = profiles_ini.read_text(encoding="utf-8", errors="replace")
        except OSError:
            return None
        blocks = re.split(r"\n(?=\[Profile\d+\])", text)
        for block in blocks:
            if "Default=1" not in block:
                continue
            path_match = re.search(r"^Path=(.+)$", block, re.MULTILINE)
            relative_match = re.search(r"^IsRelative=(\d+)$", block, re.MULTILINE)
            if path_match:
                profile_path = Path(path_match.group(1).strip())
                if relative_match and relative_match.group(1) == "1":
                    profile_path = profiles_root / profile_path
                prefs = profile_path / "prefs.js"
                try:
                    if prefs.exists():
                        return prefs
                except OSError:
                    return None
    profiles_dir = profiles_root / "Profiles"
    try:
        profiles_dir_exists = profiles_dir.exists()
    except OSError:
        return None
    if profiles_dir_exists:
        try:
            for prefs in profiles_dir.glob("*/prefs.js"):
                return prefs
        except OSError:
            return None
    return None
 def load_awesomegpt_prefs(path: Path | None = None) -> None:
    if path is None:
        path = zotero_profile_prefs()
    if path is None:
        return
    try:
        path_exists = path.exists()
    except OSError:
        return
    if not path_exists:
        return
    try:
        text = path.read_text(encoding="utf-8", errors="replace")
    except OSError:
        return
    prefs: dict[str, Any] = {}
    for name, raw_value in re.findall(r'user_pref\("([^"]+)",\s*(.*?)\);', text):
        if not name.startswith("extensions.zotero.zoterogpt."):
            continue
        try:
            prefs[name] = json.loads(raw_value)
        except json.JSONDecodeError:
            continue
    settings_raw = prefs.get("extensions.zotero.zoterogpt.settings")
    settings = {}
    if isinstance(settings_raw, str):
        try:
            settings = json.loads(settings_raw)
        except json.JSONDecodeError:
            settings = {}
    direct_api = prefs.get("extensions.zotero.zoterogpt.api")
    direct_model = prefs.get("extensions.zotero.zoterogpt.model")
    direct_key = prefs.get("extensions.zotero.zoterogpt.secretKey")
    provider = None
    if isinstance(settings, dict):
        provider = settings.get("DeepSeek") or next(
            (value for key, value in settings.items() if key.lower() == "deepseek"),
            None,
        )
    # Provider-level settings take precedence over the top-level prefs. Empty
    # values are skipped so they cannot block a later fallback via setdefault.
    provider_values = provider if isinstance(provider, dict) else {}
    for env_name, provider_field, direct_value in (
        ("AWESOMEGPT_BASE_URL", "api", direct_api),
        ("AWESOMEGPT_MODEL", "model", direct_model),
        ("AWESOMEGPT_API_KEY", "secretKey", direct_key),
    ):
        value = provider_values.get(provider_field) or direct_value
        if value:
            os.environ.setdefault(env_name, str(value))
 def http_json(
    url: str,
    *,
    method: str = "GET",
    headers: dict[str, str] | None = None,
    payload: Any = None,
    timeout: int = 90,
 ) -> Any:
    body = None
    req_headers = dict(headers or {})
    if payload is not None:
        body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
        req_headers.setdefault("Content-Type", "application/json")
    request = urllib.request.Request(url, data=body, method=method, headers=req_headers)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            text = response.read().decode("utf-8", errors="replace")
            if not text:
                return None
            return json.loads(text)
    except urllib.error.HTTPError as exc:
        detail = exc.read().decode("utf-8", errors="replace")
        fail(f"{method} {url} failed: HTTP {exc.code}: {detail[:800]}")
    except urllib.error.URLError as exc:
        fail(f"{method} {url} failed: {exc}")
 def zotero_local(path: str) -> Any:
    return http_json(LOCAL_ZOTERO + path, headers={"Zotero-API-Version": "3"}, timeout=20)
 def zotero_local_optional(path: str) -> Any | None:
    url = LOCAL_ZOTERO + path
    request = urllib.request.Request(url, headers={"Zotero-API-Version": "3"})
    try:
        with urllib.request.urlopen(request, timeout=20) as response:
            text = response.read().decode("utf-8", errors="replace")
            return json.loads(text) if text else None
    # OSError covers HTTPError, URLError, and socket timeouts raised while reading.
    except (OSError, json.JSONDecodeError):
        return None
 def find_item(item_key: str | None, query: str | None) -> dict[str, Any]:
    if item_key:
        return zotero_local(f"/items/{urllib.parse.quote(item_key)}")
    if not query:
        fail("provide --item-key or --query")
    qs = urllib.parse.urlencode({"q": query, "limit": 5})
    matches = zotero_local(f"/items/top?{qs}")
    if not matches:
        fail(f"no Zotero item matched query: {query}")
    return matches[0]
 def find_items(keys: list[str], query: str | None, limit: int) -> list[dict[str, Any]]:
    items: list[dict[str, Any]] = []
    for key in keys:
        items.append(find_item(key, None))
    if query:
        qs = urllib.parse.urlencode({"q": query, "limit": limit})
        matches = zotero_local(f"/items/top?{qs}")
        seen = {item.get("key") for item in items}
        for item in matches or []:
            if item.get("key") not in seen:
                items.append(item)
                seen.add(item.get("key"))
    if not items:
        fail("provide --item-key/--item-keys or --query")
    return items[:limit] if limit else items
 def all_top_items(limit: int = 0) -> list[dict[str, Any]]:
    items: list[dict[str, Any]] = []
    start = 0
    page_limit = 100
    while True:
        qs = urllib.parse.urlencode({"limit": page_limit, "start": start})
        page = zotero_local(f"/items/top?{qs}")
        if not page:
            break
        items.extend(page)
        if limit and len(items) >= limit:
            return items[:limit]
        if len(page) < page_limit:
            break
        start += page_limit
    return items
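 def export_bibtex(item_key: str) -> str:
    qs = urllib.parse.urlencode({"itemKey": item_key, "format": "bibtex"})
    request = urllib.request.Request(
        f"{LOCAL_ZOTERO}/items?{qs}",
        headers={"Zotero-API-Version": "3"},
    )
    # BibTeX is supplementary context: report a failed export and continue
    # instead of aborting the run with an unhandled urllib error.
    try:
        with urllib.request.urlopen(request, timeout=20) as response:
            return response.read().decode("utf-8", errors="replace").strip()
    except OSError as exc:
        print(f"warning: BibTeX export failed for {item_key}: {exc}", file=sys.stderr)
        return ""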
 def extract_pdf_text(path: Path, max_chars: int) -> str:
    try:
        path_exists = path.exists()
    except OSError:
        return ""
    if not path_exists:
        return ""
    max_pages = 8
    if importlib.util.find_spec("fitz"):
        import fitz  # type: ignore
        chunks = []
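        with fitz.open(str(path)) as doc:
            # Index pages explicitly; slicing a Document may not be supported
            # by older PyMuPDF releases.
            for page_index in range(min(max_pages, len(doc))):
                chunks.append(doc[page_index].get_text("text"))
                if sum(len(chunk) for chunk in chunks) >= max_chars:
                    break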
        return "\n".join(chunks)[:max_chars]
    if importlib.util.find_spec("pypdf"):
        from pypdf import PdfReader  # type: ignore
        reader = PdfReader(str(path))
        chunks = []
        for page in reader.pages[:max_pages]:
            chunks.append(page.extract_text() or "")
            if sum(len(chunk) for chunk in chunks) >= max_chars:
                break
        return "\n".join(chunks)[:max_chars]
    if importlib.util.find_spec("PyPDF2"):
        from PyPDF2 import PdfReader  # type: ignore
        reader = PdfReader(str(path))
        chunks = []
        for page in reader.pages[:max_pages]:
            chunks.append(page.extract_text() or "")
            if sum(len(chunk) for chunk in chunks) >= max_chars:
                break
        return "\n".join(chunks)[:max_chars]
    return ""
 def local_fulltext(parent_key: str, max_chars: int) -> str:
    children = zotero_local(f"/items/{urllib.parse.quote(parent_key)}/children")
    parts: list[str] = []
    for child in children or []:
        data = child.get("data") or {}
        if data.get("itemType") != "attachment":
            continue
        key = child.get("key")
        if not key:
            continue
        fulltext = zotero_local_optional(f"/items/{urllib.parse.quote(key)}/fulltext")
        if not fulltext:
            continue
        content = fulltext.get("content") if isinstance(fulltext, dict) else ""
        if content:
            parts.append(content)
        if sum(len(p) for p in parts) >= max_chars:
            break
    if not parts:
        for child in children or []:
            data = child.get("data") or {}
            if data.get("itemType") != "attachment" or data.get("contentType") != "application/pdf":
                continue
            path = data.get("path")
            if not path:
                key = child.get("key")
                if key:
                    full = zotero_local_optional(f"/items/{urllib.parse.quote(key)}")
                    path = ((full or {}).get("data") or {}).get("path")
            if not path:
                continue
            extracted = extract_pdf_text(Path(path), max_chars=max_chars)
            if extracted:
                parts.append(extracted)
                break
    return "\n\n".join(parts)[:max_chars]
 def creators_text(creators: list[dict[str, Any]]) -> str:
    names = []
    for creator in creators:
        name = creator.get("name")
        if not name:
            name = " ".join(x for x in [creator.get("firstName"), creator.get("lastName")] if x)
        if name:
            names.append(name)
    return "; ".join(names)
 def year_from_date(raw: str | None) -> str:
    if not raw:
        return ""
    match = re.search(r"\b(19|20)\d{2}\b", raw)
    return match.group(0) if match else raw[:4]
 def build_source_record(item: dict[str, Any], fulltext_chars: int) -> dict[str, Any]:
    data = item.get("data") or {}
    key = item.get("key")
    return {
        "zoteroKey": key,
        "title": data.get("title"),
        "itemType": data.get("itemType"),
        "year": year_from_date(data.get("date")),
        "publicationTitle": data.get("publicationTitle"),
        "authors": creators_text(data.get("creators") or []),
        "DOI": data.get("DOI"),
        "url": data.get("url"),
        "abstractNote": data.get("abstractNote"),
        "bibtex": export_bibtex(key) if key else "",
        "indexedFullTextExcerpt": local_fulltext(key, fulltext_chars) if key else "",
    }
 def build_prompt(records: list[dict[str, Any]], columns: list[str]) -> str:
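    instructions = [
        # You are a literature-organizing assistant for materials science and chemistry.
        "你是材料与化学方向的文献整理助手。",
        # Output one Markdown table based on the given paper information.
        "请根据给定文献信息输出一个 Markdown 表格。",
        # Output only the table: no explanations, headings, bullets, or code blocks.
        "只输出表格，不要输出任何解释、标题、项目符号或代码块。",
        # The header must use exactly these columns, in this order.
        f"表头必须严格使用以下列，并保持顺序：{' | '.join(columns)}。",
        # One row per paper; do not omit any.
        "每篇文献占一行，不要漏项。",
        # Fill in "-" when information is insufficient.
        "信息不足时填写 - 。",
        # Summarize in Simplified Chinese, keep it compact, at most two sentences per cell.
        "请使用简体中文概括，保持内容紧凑，单元格内避免超过两句话。",
        # If a paper is clearly outside materials science, still summarize it objectively.
        "如果文献明显不是材料方向，也照样总结，但保持客观。",
    ]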
    payload = {"columns": columns, "papers": records}
    return "\n\n".join(instructions + [json.dumps(payload, ensure_ascii=False, indent=2)])
 def call_llm(prompt: str) -> str:
    api_key = os.environ.get("AWESOMEGPT_API_KEY")
    base_url = (os.environ.get("AWESOMEGPT_BASE_URL") or "").rstrip("/")
    model = os.environ.get("AWESOMEGPT_MODEL")
    if not api_key or not base_url or not model:
        fail("AWESOMEGPT_API_KEY, AWESOMEGPT_BASE_URL, and AWESOMEGPT_MODEL are required")
    if not base_url.endswith("/v1"):
        base_url = base_url + "/v1"
    payload = {
        "model": model,
        "messages": [
            {
                "role": "system",
                "content": "You summarize scientific papers into compact Simplified Chinese Markdown tables only.",
            },
            {"role": "user", "content": prompt},
        ],
        "temperature": 0.2,
    }
    response = http_json(
        base_url + "/chat/completions",
        method="POST",
        headers={"Authorization": f"Bearer {api_key}"},
        payload=payload,
        timeout=240,
    )
    try:
        return response["choices"][0]["message"]["content"].strip()
    except Exception as exc:
        fail(f"unexpected LLM response shape: {exc}; response={response}")
 def split_batches(items: list[dict[str, Any]], batch_size: int) -> list[list[dict[str, Any]]]:
    if batch_size <= 0:
        return [items]
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
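 def normalize_table(markdown: str, keep_header: bool) -> list[str]:
    lines = [line.rstrip() for line in markdown.splitlines() if line.strip().startswith("|")]
    if not lines:
        fail("LLM did not return a Markdown table")
    if keep_header:
        return lines
    # Later batches: drop everything up to the |---|---| separator row so the
    # header is not repeated; a reply without a header keeps all of its rows.
    separator = re.compile(r"\|?\s*:?-{2,}:?\s*(\|\s*:?-{2,}:?\s*)*\|?")
    for index, line in enumerate(lines[:2]):
        if separator.fullmatch(line.strip()):
            return lines[index + 1:]
    return lines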
 def write_output(table: str, out_path: Path | None) -> None:
    if out_path is None:
        print(table)
        return
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(table + "\n", encoding="utf-8")
    print(str(out_path))
 def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--item-key", action="append", default=[], help="Zotero top-level item key; can be repeated")
    parser.add_argument("--item-keys", help="Comma/space separated Zotero top-level item keys")
    parser.add_argument("--query", help="Search query; top-level matches are used")
    parser.add_argument("--all", action="store_true", help="Process all top-level Zotero items")
    parser.add_argument("--limit", type=int, default=5, help="Maximum number of items to process; 0 means no limit")
    parser.add_argument("--batch-size", type=int, default=5, help="Number of papers per LLM call")
    parser.add_argument("--fulltext-chars", type=int, default=4000, help="Maximum number of full-text characters to extract per item")
    parser.add_argument("--columns", help="Comma-separated Markdown table columns")
    parser.add_argument("--vault", default=str(DEFAULT_VAULT), help="Obsidian vault containing optional .env")
    parser.add_argument("--env-file", help="Optional .env path; defaults to <vault>/.env")
    parser.add_argument("--out", help="Optional output Markdown file path")
    args = parser.parse_args()
    vault = Path(args.vault).expanduser().resolve()
    load_dotenv(Path(args.env_file).expanduser().resolve() if args.env_file else vault / ".env")
    load_awesomegpt_prefs()
    keys = list(args.item_key)
    if args.item_keys:
        keys.extend([key for key in re.split(r"[\s,]+", args.item_keys.strip()) if key])
    # 0 (or a negative value) means no limit.
    limit = max(args.limit, 0)
    if args.all:
        items = all_top_items(limit)
    else:
        # The query search still needs a concrete cap when no limit is set.
        query_limit = limit or 100
        items = find_items(keys, args.query, query_limit)
    if not items:
        fail("no Zotero items selected")
    columns = [col.strip() for col in (args.columns.split(",") if args.columns else DEFAULT_COLUMNS) if col.strip()]
    batches = split_batches(items, args.batch_size)
    output_lines: list[str] = []
    for index, batch in enumerate(batches, 1):
        print(f"[{index}/{len(batches)}] summarizing {len(batch)} items", file=sys.stderr)
        records = [build_source_record(item, args.fulltext_chars) for item in batch]
        table = call_llm(build_prompt(records, columns))
        output_lines.extend(normalize_table(table, keep_header=index == 1))
    write_output("\n".join(output_lines), Path(args.out).expanduser().resolve() if args.out else None)
 if __name__ == "__main__":
    main()