Add citation workflow improvements

2026-06-09 10:36:19 +08:00 · 17c015ec76
parent 14c52ac70f
commit 17c015ec76
3 changed files with 277 additions and 1 deletions
--- a/README.md
+++ b/README.md
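@@ -48,6 +48,119 @@ QQcites is used during manuscript writing to "look up references by sentence, paragraph, or performance table
 6. Record the ledger  
   The output should be appendable to the used-reference log or the performance-table ledger.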

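+## New capabilities: aligning with the Nature-series skills
+
+With this update, QQcites gains several rules better suited to long text and to tracing parameters back to the original paper.
+
+### 1. Stable segment IDs
+
+When the user provides a long paragraph or multiple sentences, first split the input into stable segments: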
+
+```text
+S001, S002, S003 ...
+```
+
+Each segment should record a claim type, for example:
+
+- background
+- review-context
+- mechanism
+- material-property
+- method
+- characterization
+- performance
+- application
+- limitation
+
+This makes it clear later which sentence maps to which paper, and makes duplicate citations easier to check.
+
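+### 2. More standardized support grades
+
+Use finer-grained support grades when recommending papers:
+
+| Grade | Meaning |
+|---|---|
+| strong support | Directly supports the sentence or parameter. |
+| partial support | Supports only part of the claim; split the sentence or soften the wording. |
+| background support | Suitable as a background review; not suitable for supporting a specific experimental conclusion. |
+| contradictory/limiting | Conflicts with the original sentence or narrows its scope. |
+| metadata-only candidate | Only the title/metadata has been checked, not the abstract or full text; do not cite directly. |
+| weak | Keywords are similar but support is weak. |
+| unrelated | Not relevant; exclude. |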
+
+### 3. Evidence note template
+
+Record important citations using the structure below where possible:
+
+```text
+Segment: S001
+Claim: original sentence or claim
+Candidate: author/year/title/journal/DOI/Zotero key
+Support grade: strong / partial / background / contradictory / metadata-only / weak
+Evidence basis: local note / Zotero metadata / abstract / main text p.X / figure caption / SI / publisher page
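+Reasoning: why it does or does not support the claim
+Caveat: whether it is a repeated reference, a figure-read value, angle-only, total power, etc.
+Citation wording: suggested insertion point or suggested rewording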
+```
+
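+### 4. Source tiers
+
+QQcites ranks its search sources by reliability:
+
+| Tier | Source | Use |
+|---|---|---|
+| T0 | Local Zotero, Obsidian, QQnote, PDFs | Default first choice; best matches the user's manuscript context. |
+| T1 | DOI/CrossRef, publisher page, PubMed | Metadata verification and official-page checks. |
+| T2 | Semantic Scholar, arXiv/bioRxiv/medRxiv | Broader discovery and citation networks. |
+| T3 | Google Scholar, general web pages, manually retrieved CNKI/Wanfang results | Last resort; risks must be labeled. |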
+
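+### 5. Citation verification mode
+
+When the user wants to check an existing reference list rather than find papers, switch to verification mode:
+
+| Status | Meaning |
+|---|---|
+| verified | Matches the DOI or official metadata. |
+| duplicate | The same paper appears more than once. |
+| mismatch | Title, journal, year, author, or DOI is inconsistent. |
+| not_found | No reliable match. |
+| suspicious | Likely garbled text, abnormal page numbers, journal mismatch, etc. |
+| manual_needed | Too little information; needs manual confirmation. |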
+
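+### 6. Source map
+
+When looking up parameters in the original paper, record a source map rather than just writing "mentioned in the main text".
+
+Record at least:
+
+- Parameter ID, e.g. `P001`
+- Title, DOI, Zotero key
+- Parameter name and value
+- Unit
+- Source level
+- Page, figure number, table number, or SI location
+- Short excerpt from the original text
+- Conversion formula and assumptions
+- Risk notes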
+
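+### 7. Batch processing for long text
+
+If the input contains more than 10 citable segments, process it in batches:
+
+- 1-10 segments: process normally.
+- 11-25 segments: about 10 segments per batch; merge and deduplicate at the end.
+- 26 or more segments: split by manuscript section, then merge and deduplicate by DOI.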
+
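+### 8. Reference export
+
+If the user asks for an export, be able to prepare:
+
+- RIS
+- BibTeX
+- ENW
+
+Export only verified metadata; leave missing fields blank and mark them metadata incomplete. Do not fabricate DOIs or volume/issue/page numbers.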
+
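 ## Performance comparison table methodology

 A performance table cannot be built from keyword matches alone. Define hard inclusion criteria first, then screen papers for the main table.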
--- a/SKILL.md
+++ b/SKILL.md
@@ -11,12 +11,15 @@ Use this skill to turn a manuscript sentence or claim into a ranked list of cita

 1. Identify the citation need.
   - Split long input into one claim per search unit.
+   - For multi-sentence or paragraph input, assign stable segment IDs such as `S001`, `S002`, and `S003`.
+   - Record claim type when useful: background, review-context, mechanism, material-property, method, characterization, performance, application, or limitation.
   - Extract mechanism, material system, method, performance metric, disease/application, and comparison terms.
   - Preserve the user's wording so the final answer can map references back to the exact sentence.

 2. Check the current manuscript citation ledger before searching.
   - If a manuscript-specific `used_references.md`, `used_references.csv`, or extracted citation list is available, read it before ranking candidates.
   - Treat DOI, Zotero item key, title, and normalized first-author/year as duplicate-detection keys.
+   - Use DOI as the primary duplicate key; when DOI is missing, compare normalized title plus first-author/year.
   - If no ledger exists, infer recent used references from the current conversation and recommend creating a ledger rather than relying on memory.
   - Zotero and Obsidian identify candidate papers; the ledger or manuscript citation fields identify papers already used in this manuscript.

@@ -35,6 +38,7 @@ Use this skill to turn a manuscript sentence or claim into a ranked list of cita
   - Default to local QQnote/Obsidian/Zotero evidence first.
   - If local notes produce too few or weak candidates, supplement with web literature search and clearly label web-only candidates.
   - Do not replace strong local evidence with web results unless the web result is clearly more relevant or more authoritative.
+   - Treat local Zotero/Obsidian/PDF evidence as T0, structured metadata or publisher pages as T1, scholarly discovery APIs as T2, and general web/search-engine results as T3.

 6. Use DeepSeek for local semantic screening.
   - When deciding which local notes or papers are relevant, send the manuscript claim plus candidate note snippets to DeepSeek through the user's configured QQnote/AwesomeGPT/DeepSeek route.
@@ -44,6 +48,8 @@ Use this skill to turn a manuscript sentence or claim into a ranked list of cita

 7. Rank candidates by manuscript usefulness.
   - Rank by direct claim support first.
+   - Use the support grades from `references/citation-ranking.md`: strong support, partial support, background support, contradictory/limiting, metadata-only candidate, weak, or unrelated.
+   - Do not cite metadata-only candidates as support until the abstract, publisher page, local note, or local PDF has been checked.
   - Within similar relevance, put review articles before primary research when the user needs background, broad motivation, mechanism overview, or field status.
   - Put primary research before reviews when the sentence makes a specific experimental, material, performance, or mechanistic claim that needs original evidence.
   - Force review-only or review-priority ranking only when the user explicitly asks for reviews, "需要综述", or "综述优先".
@@ -62,6 +68,12 @@ Use this skill to turn a manuscript sentence or claim into a ranked list of cita
   - If file editing is not requested or the ledger path is unknown, include a compact "used-reference log" block in the answer so it can be appended later.
   - Keep the ledger manuscript-specific; do not globally ban papers across unrelated manuscripts or projects.

+10. For verification, batch, export, or source-map requests, switch workflows.
+   - If the user asks to check an existing bibliography, classify references as `verified`, `duplicate`, `mismatch`, `not_found`, `suspicious`, or `manual_needed`.
+   - If the user provides more than about 10 citable segments, process in batches and deduplicate final candidates by DOI.
+   - If the user asks for RIS, BibTeX, or ENW, export only verified metadata and leave missing fields blank rather than inventing them.
+   - If extracting original-paper parameters, keep a source map with value, unit, page/figure/table location, evidence level, conversion rule, and caveat.
+
 ## Output

 For each user sentence, return:
@@ -71,10 +83,13 @@ For each user sentence, return:
 - `Why relevant`: one concise reason tied to the sentence.
 - `Type`: `review`, `primary research`, `method`, `dataset`, or `unclear`.
 - `Strength`: `direct`, `partial`, `background`, or `weak`.
+- `Evidence basis`: local note, Zotero metadata, abstract, main text, figure caption, supporting information, graph digitization, or calculated.
+- `Caveat`: repeated reference, metadata-only, figure-only, angle-only, total power, mismatch, or other risk when relevant.
 - `Suggested citation use`: where the paper should be cited in the sentence or paragraph.

 Read [references/citation-ranking.md](references/citation-ranking.md) when the task involves multiple sentences, many candidate notes, tie-breaking, or final table formatting.
 Also read it when the user asks for performance comparison tables, literature tables, parameter extraction from original papers, unit normalization, graph-derived values, or inclusion/exclusion decisions for main-table versus supplemental-table references.
+Also read it when the user asks for segmented citations, claim IDs, citation verification, source-map tracing, long-manuscript batch processing, or RIS/BibTeX/ENW export.

 ## Operating Rules

--- a/references/citation-ranking.md
+++ b/references/citation-ranking.md
@@ -2,6 +2,31 @@

 Use this reference when a request has more than one claim, many local-note hits, or ambiguous citation choices.

+## Segmentation and Claim IDs
+
+When the user provides a paragraph, manuscript section, or multiple sentences, create stable citable
+segments before searching.
+
+- Use `S001`, `S002`, `S003` for sentence or claim segments.
+- Preserve the user's original text for each segment.
+- Split broad paragraphs into smaller claims when one sentence contains multiple citation needs.
+- Skip purely connective wording unless the user asks to cite every sentence.
+- For each segment, record:
+  - `claim_type`: background, review-context, mechanism, material-property, method, characterization, performance, application, or limitation.
+  - `entity`: material, molecule, device, method, application, or phenomenon.
+  - `relationship`: improves, drives, coordinates, adsorbs, transports, converts, senses, stabilizes, limits, etc.
+  - `context`: stimulus, material system, measurement method, device condition, organism/model, or application boundary.
+  - `boundary`: only under a stated condition such as humidity range, light intensity, cell model, or target journal scope.
+
+Generate 2-4 search queries per segment:
+
+1. `precise`: entity + relationship + context + target metric.
+2. `synonym`: alternate names, abbreviations, Chinese/English equivalents, chemical formulas.
+3. `broad`: field or mechanism context if direct local matches are weak.
+4. `method/performance`: include measurement method, figure metric, or device parameter when the claim is quantitative.
+
+For Chinese manuscript text, translate scientific concepts rather than the sentence literally. Keep standard formulas, acronyms, and material names unchanged.
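
A minimal sketch, not part of this commit, of how stable segment IDs could be assigned once the input has already been split into claims. The function name `assign_segment_ids` and the dictionary keys are hypothetical; sentence splitting itself is left out.

```python
def assign_segment_ids(claims, start=1):
    """Attach stable IDs (S001, S002, ...) to already-split claims.

    `claims` is a list of strings in manuscript order. The original wording
    is preserved so references can be mapped back to the exact sentence.
    """
    segments = []
    for index, text in enumerate(claims, start=start):
        segments.append({
            "id": f"S{index:03d}",  # zero-padded, e.g. S001
            "text": text,           # user's original wording, unchanged
            "claim_type": None,     # filled in later: background, mechanism, ...
        })
    return segments


if __name__ == "__main__":
    for seg in assign_segment_ids(["First claim.", "Second claim."]):
        print(seg["id"], seg["text"])
```

Keeping the ID separate from the text means a later reword of the sentence does not change which ledger rows point at it.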
+
 ## Candidate Search Pattern

 Search each claim with four groups of terms:
@@ -19,6 +44,25 @@ rg -n -i "MXene|Ti3C2|conductivity|electromagnetic shielding" "C:\Users\qyh15\Do

 Adjust the vault subfolder if QQnote-skill documents a different current literature-note folder.

+## Source Tiers and Routing
+
+Use local sources before external sources, but record source reliability explicitly.
+
+| Tier | Source | Use |
+|---|---|---|
+| T0 | User local Zotero, Obsidian literature notes, QQnote notes, local PDFs | Default first source and strongest grounding for this user's manuscripts |
+| T1 | DOI/CrossRef metadata, publisher pages, PubMed when relevant | Metadata verification and official claim/source checking |
+| T2 | Semantic Scholar, arXiv/bioRxiv/medRxiv when relevant | Broader discovery, citation graph, preprints |
+| T3 | Google Scholar, general web pages, institutional pages, manually accessed CNKI/Wanfang | Last resort; label as incomplete or web-sourced |
+
+Fallback routing:
+
+1. Search T0 with exact and synonym terms.
+2. If T0 is insufficient, verify or supplement with T1.
+3. If T1 is still insufficient, broaden with T2.
+4. Use T3 only when T0-T2 fail or the user explicitly asks for broad web search.
+5. Never let weak T3 evidence replace strong T0 evidence unless the local item is wrong, outdated, or not actually supportive.
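
The fallback routing above can be sketched roughly as follows. This is an illustration rather than code from this commit: `route_search`, the `searchers` mapping, and the `is_sufficient` callback are all hypothetical names, and the explicit "user asks for broad web search" override is not modeled.

```python
TIER_ORDER = ("T0", "T1", "T2", "T3")


def route_search(claim, searchers, is_sufficient):
    """Query tiers in order and stop as soon as the results are sufficient.

    searchers: dict mapping a tier name to a callable(claim) -> list of dicts.
    is_sufficient: callable(list of candidates) -> bool, supplied by the caller.
    """
    collected = []
    for tier in TIER_ORDER:
        search = searchers.get(tier)
        if search is None:
            continue  # tier not configured
        for candidate in search(claim):
            # Record the source tier explicitly on every candidate.
            collected.append({**candidate, "tier": tier})
        if is_sufficient(collected):
            break  # do not fall through to weaker tiers
    return collected
```

Tagging each candidate with its tier keeps the reliability label available when the final ranking is written up.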
+
 ## DeepSeek Screening Prompt

 When DeepSeek is available, pass only the manuscript claim and candidate snippets needed for ranking:
@@ -33,7 +77,7 @@ Candidate notes:
 <numbered snippets with title/year/DOI/path if available>

 For each candidate, judge:
-1. relevance: direct, partial, background/review, weak, unrelated
+1. relevance: strong support, partial support, background support, contradictory/limiting, metadata-only candidate, weak, unrelated
 2. article type: review, primary research, method, unclear
 3. whether it truly supports the sentence or only shares keywords
 4. one-sentence reason
@@ -42,6 +86,64 @@ Return a ranked list. Prefer review articles for broad background claims, but pr
 Do not invent metadata or references not present in the candidates.
 ```

+## Support Grades
+
+Use the most conservative support grade that is defensible.
+
+| Grade | Meaning | Good use |
+|---|---|---|
+| strong support | Directly tests or reviews the same core claim in a matching context | Specific mechanism, material property, method, or performance statements |
+| partial support | Supports only part of the sentence, a narrower system, or a related condition | Claims that can be qualified or split |
+| background support | Establishes field context but not the exact claim | Introduction, motivation, broad status statements |
+| contradictory/limiting | Conflicts with or narrows the claim | Avoid as support; use for limitations or revise wording |
+| metadata-only candidate | Title/metadata suggest relevance but abstract/full text has not been checked | Screening only; do not cite as support yet |
+| weak | Shares keywords but does not support the sentence well | Usually exclude |
+| unrelated | Not useful for the claim | Exclude |
+
+Do not cite a `metadata-only candidate` as support until the abstract, note, publisher page, or local PDF has been checked.
+
+## Evidence Note Template
+
+Use this template when ranking important citations, when the user asks why a paper was selected, or when the claim is high-risk:
+
+```text
+Segment: S001
+Claim: <original claim>
+Candidate: <first author/year/title/journal/DOI/Zotero key>
+Support grade: <strong/partial/background/contradictory/metadata-only/weak>
+Evidence basis: <local note / Zotero metadata / abstract / main text p.X / figure caption / SI / publisher page>
+Reasoning: <why it supports, partially supports, or fails to support the exact claim>
+Caveat: <repeated reference, review-only, different material system, graph-only value, etc.>
+Citation wording: <where or how to cite; suggest wording change if the manuscript overclaims>
+```
+
+## Deduplication
+
+Use DOI as the primary duplicate key.
+
+1. Normalize DOI by lowercasing, trimming whitespace, and removing `https://doi.org/`.
+2. Treat identical normalized DOI values as the same paper even if Zotero keys differ.
+3. If DOI is missing, compare Zotero key, normalized title, first author, and year.
+4. Normalize titles by lowercasing, removing punctuation and stopwords, and collapsing whitespace.
+5. Treat records as duplicates when the first-author surname matches and normalized-title token overlap is very high, approximately Jaccard similarity >= 0.90.
+
+When duplicate records differ in metadata quality, prefer the record with DOI, complete journal/year/volume/pages, and a local PDF or QQnote note.
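
A minimal sketch of the deduplication rules above, not taken from this commit. The stopword list, the record keys (`doi`, `title`, `first_author`), and the function names are assumptions; the Zotero-key and year comparisons from step 3 are omitted for brevity.

```python
import re

# Assumed stopword list; the document does not specify one.
STOPWORDS = {"a", "an", "the", "of", "and", "in", "on", "for", "to", "with"}
DOI_PREFIX = "https://doi.org/"


def normalize_doi(doi):
    """Lowercase, trim whitespace, and remove a leading https://doi.org/."""
    doi = (doi or "").strip().lower()
    if doi.startswith(DOI_PREFIX):
        doi = doi[len(DOI_PREFIX):]
    return doi


def title_tokens(title):
    """Lowercase, replace punctuation with spaces, drop stopwords."""
    cleaned = re.sub(r"[^\w\s]", " ", (title or "").lower())
    return {tok for tok in cleaned.split() if tok not in STOPWORDS}


def jaccard(a, b):
    """Jaccard similarity of two token sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def is_duplicate(rec_a, rec_b, threshold=0.90):
    doi_a = normalize_doi(rec_a.get("doi"))
    doi_b = normalize_doi(rec_b.get("doi"))
    if doi_a and doi_b:
        return doi_a == doi_b  # DOI is the primary key when both exist
    # Fallback: first-author surname plus high normalized-title overlap.
    author_a = (rec_a.get("first_author") or "").strip().lower()
    author_b = (rec_b.get("first_author") or "").strip().lower()
    same_author = bool(author_a) and author_a == author_b
    overlap = jaccard(title_tokens(rec_a.get("title")), title_tokens(rec_b.get("title")))
    return same_author and overlap >= threshold
```

Two records with different Zotero keys but the same normalized DOI compare as duplicates here, which matches rule 2.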
+
+## Citation Verification Mode
+
+When the user asks to check an existing reference list, manuscript bibliography, or Zotero export, switch from recommendation mode to verification mode.
+
+Classify each reference as:
+
+- `verified`: metadata matches DOI or official source.
+- `duplicate`: same DOI/title appears more than once.
+- `mismatch`: title, journal, year, author, or DOI conflicts with retrieved metadata.
+- `not_found`: no reliable match in local library or external metadata.
+- `suspicious`: likely typo, encoding problem, impossible year/pages, or journal mismatch.
+- `manual_needed`: insufficient identifiers or ambiguous title.
+
+Return a summary count plus a detail table with DOI/Zotero key, issue, and suggested correction. Do not silently rewrite a bibliography without showing what changed.
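
A rough sketch of the classification and summary count described above, not part of this commit. `classify_reference` and `summarize` are hypothetical names, `retrieved` stands for whatever metadata a lookup returned (or `None`), and the `suspicious` status is left to separate heuristics that are not shown.

```python
from collections import Counter


def _norm(value):
    """Compare fields case-insensitively and ignore surrounding whitespace."""
    return str(value or "").strip().lower()


def classify_reference(ref, retrieved, seen_dois):
    """Return one verification status for a single bibliography entry.

    seen_dois is a set shared across calls so repeated DOIs are detected.
    """
    doi = _norm(ref.get("doi"))
    if not doi and not _norm(ref.get("title")):
        return "manual_needed"  # nothing to search with
    if doi and doi in seen_dois:
        return "duplicate"
    if doi:
        seen_dois.add(doi)
    if retrieved is None:
        return "not_found"
    for field in ("title", "journal", "year"):
        ours, theirs = _norm(ref.get(field)), _norm(retrieved.get(field))
        if ours and theirs and ours != theirs:
            return "mismatch"
    return "verified"


def summarize(statuses):
    """Summary count to show above the detail table."""
    return dict(Counter(statuses))
```

The function only reports a status; it never rewrites the entry, in line with the rule about not silently changing a bibliography.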
+
 ## Tie-Breaking

 Rank candidates using this order:
@@ -64,6 +166,12 @@ Use this compact table for most answers:

 If a sentence has no strong local match, write `No strong local match found` and list the closest weak candidates separately.

+For multi-segment citation work, use stable segment IDs:
+
+| Segment | Claim type | Best candidate | Support grade | Evidence basis | Suggested use |
+|---|---|---|---|---|---|
+| S001 | background | Title. Journal, Year. DOI: ... | background support | local note + abstract | Cite after broad field sentence |
+
 ## Performance Comparison Tables

 When the user asks for a literature performance table, define hard inclusion criteria before recommending papers. Typical criteria include material family, stimulus type, whether the paper is primary research, whether the target metric is reported, whether units can be normalized, and whether the item is present in local Zotero/Obsidian. Put only papers that satisfy the table criteria in the main table. Papers that report only angle, displacement, speed, demonstration photos, or concept-level behavior should be listed as supplemental or non-comparable unless the user explicitly wants them.
@@ -81,3 +189,43 @@ Default table triage:
 - `not recommended`: review, wrong stimulus, non-target material family, no performance metric, or only broad background relevance.

 Maintain a manuscript-specific performance-table ledger in addition to the citation ledger. Track table name, DOI, Zotero key, title, material system, metric values, evidence level, whether it was used in the main or supplemental table, and any caveats such as `figure only`, `angle-only`, or `total power`.
+
+## Source Map for Original-Paper Parameters
+
+When extracting parameters from PDFs or full texts, maintain a source map so the value can be rechecked later.
+
+Minimum source-map fields:
+
+| Field | Meaning |
+|---|---|
+| `value_id` | Stable ID such as `P001`, `P002` |
+| `paper` | title, DOI, Zotero key |
+| `parameter` | curvature, light intensity, response time, Tmax, PTCE, RH range, etc. |
+| `value` | extracted numeric value and unit |
+| `source_level` | main text, figure caption, SI, graph digitization, abstract only, calculated |
+| `location` | page, figure, table, supplementary note, or caption |
+| `snippet` | short local text around the value when available |
+| `conversion` | formula and assumptions if calculated |
+| `caveat` | figure-only, total power, angle-only, missing area, repeated paper, etc. |
+
+If the user asks follow-up questions about a table value, answer using the source map rather than memory.
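
One possible shape for a source-map record, sketched here as an illustration and not taken from this commit. The class name `SourceMapEntry` and the helper `find_value` are hypothetical; the field names follow the table above.

```python
from dataclasses import dataclass, asdict


@dataclass
class SourceMapEntry:
    value_id: str        # stable ID such as P001
    paper: str           # title, DOI, Zotero key
    parameter: str       # e.g. response time, light intensity
    value: str           # extracted numeric value and unit
    source_level: str    # main text, figure caption, SI, graph digitization, ...
    location: str        # page, figure, table, supplementary note, or caption
    snippet: str = ""    # short local text around the value, when available
    conversion: str = "" # formula and assumptions if calculated
    caveat: str = ""     # figure-only, total power, angle-only, ...


def find_value(source_map, value_id):
    """Answer follow-up questions from the source map rather than memory."""
    for entry in source_map:
        if entry.value_id == value_id:
            return entry
    return None


if __name__ == "__main__":
    # Placeholder values only; no real paper data is implied.
    entry = SourceMapEntry(
        value_id="P001",
        paper="<title>; DOI <doi>; Zotero <key>",
        parameter="response time",
        value="<value> <unit>",
        source_level="figure caption",
        location="<figure or page>",
        caveat="figure-only",
    )
    print(asdict(entry))
```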
+
+## Batch Mode for Long Manuscripts
+
+When the input has more than about 10 citable segments, use batch mode.
+
+- 1-10 segments: process normally and include inline evidence notes.
+- 11-25 segments: split into batches of about 10, return a compact summary table, and keep a ledger/source-map artifact if file editing is available.
+- 26+ segments: split by manuscript section first, process section by section, then merge and deduplicate by DOI.
+
+For long runs, avoid writing long explanations for every weak candidate. Focus detailed notes on missing, contradictory, repeated, or high-risk segments.
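
The three size bands above could be planned along these lines. This is a sketch under assumptions, not code from this commit: `plan_batches` is a hypothetical name, each segment is assumed to be a dict with an optional `section` key, and the "about 10" threshold is fixed at exactly 10.

```python
def plan_batches(segments, batch_size=10):
    """Return (mode, batches) for a list of segment dicts in manuscript order."""
    count = len(segments)
    if count <= 10:
        return "normal", [list(segments)]
    if count <= 25:
        batches = [segments[i:i + batch_size] for i in range(0, count, batch_size)]
        return "batched", batches
    # 26+ segments: group by manuscript section, keeping first-seen order.
    by_section = {}
    for segment in segments:
        by_section.setdefault(segment.get("section", "unknown"), []).append(segment)
    return "by_section", list(by_section.values())
```

Merging and DOI deduplication happen after every batch has been processed, so they are deliberately left out of the planner.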
+
+## Reference Export
+
+When the user asks for export, prepare metadata for one reference-manager format:
+
+- RIS for Zotero/EndNote/Mendeley interchange.
+- BibTeX for LaTeX/manuscript projects.
+- ENW when EndNote tagged export is requested.
+
+Do not invent missing fields. If DOI, volume, issue, or pages are unavailable, leave the field blank and mark `metadata incomplete`. Deduplicate exported records by DOI before writing an export file.
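
A minimal sketch of an RIS writer that follows the rule above, not part of this commit. The record keys and the function name `to_ris` are hypothetical, the entry type is assumed to be a journal article (`JOUR`), and whether a given reference manager accepts empty tags has not been verified here.

```python
# Assumed mapping from hypothetical record keys to common RIS tags.
RIS_FIELDS = [
    ("TI", "title"),
    ("JO", "journal"),
    ("PY", "year"),
    ("VL", "volume"),
    ("IS", "issue"),
    ("SP", "start_page"),
    ("EP", "end_page"),
    ("DO", "doi"),
]


def to_ris(record):
    """Serialize one verified record; never invent missing values."""
    lines = ["TY  - JOUR"]
    for author in record.get("authors", []):
        lines.append(f"AU  - {author}")
    incomplete = False
    for tag, key in RIS_FIELDS:
        value = str(record.get(key) or "").strip()
        if not value:
            incomplete = True  # leave the field blank instead of guessing
        lines.append(f"{tag}  - {value}".rstrip())
    if incomplete:
        lines.append("N1  - metadata incomplete")
    lines.append("ER  - ")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder record for illustration only.
    print(to_ris({
        "authors": ["Surname, A."],
        "title": "Example title",
        "journal": "Example Journal",
        "year": 2024,
        "doi": "10.1000/example",
    }))
```

Deduplicating records by DOI before calling the writer keeps the export file free of repeated entries.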