Abstract
Internal flow signatures analyze depthwise dynamics in large language models to enable self-checking and targeted refinement without modifying the base model.
Large language models can generate fluent answers that are unfaithful to the provided context, and many existing safeguards rely on external verification or a separate judge applied after generation. We introduce internal flow signatures, which audit decision formation from depthwise dynamics at a fixed inter-block monitoring boundary. The method stabilizes token-wise motion via bias-centered monitoring, then summarizes trajectories in compact, moving, readout-aligned subspaces constructed from the top token and its close competitors within each depth window. Neighboring window frames are aligned by an orthogonal transport, yielding depth-comparable transported step lengths, turning angles, and subspace drift summaries that are invariant to within-window basis choices. A lightweight GRU validator trained on these signatures performs self-checking without modifying the base model. Beyond detection, the validator localizes a culprit depth event and enables a targeted refinement: the model rolls back to the culprit token and clamps the abnormal transported step at the identified block while preserving the orthogonal residual. The resulting pipeline provides actionable localization and low-overhead self-checking from internal decision dynamics. Code is available at github.com/EavnJeong/Internal-Flow-Signatures-for-Self-Checking-and-Refinement-in-LLMs.
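The abstract describes the signature construction only at a high level, so the NumPy sketch below illustrates one plausible reading of it. The function names (`window_basis`, `flow_signature`), the top-k competitor rule, and the drift measure are illustrative assumptions rather than the paper's exact definitions; `W_U` denotes the unembedding (readout) matrix and `hs` the hidden states captured at the fixed inter-block boundary for one token position. The bias-centering step mentioned in the abstract is omitted.

```python
import numpy as np

def window_basis(h, W_U, k=8):
    """Orthonormal basis of the readout-aligned window subspace at one depth:
    spanned by the unembedding rows of the top token and its k-1 closest
    competitors under the boundary state h (assumed top-k rule)."""
    logits = W_U @ h                       # [vocab]
    top = np.argsort(logits)[-k:]          # top token + close competitors
    B, _ = np.linalg.qr(W_U[top].T)        # [d_model, k], orthonormal columns
    return B

def flow_signature(hs, W_U, k=8):
    """hs: list of boundary hidden states [d_model], one per block, for a single
    token position. Returns per-depth transported step lengths, turning angles,
    and subspace-drift summaries."""
    steps, angles, drifts = [], [], []
    prev_step = None
    for l in range(1, len(hs)):
        B_prev = window_basis(hs[l - 1], W_U, k)
        B_cur = window_basis(hs[l], W_U, k)
        # Orthogonal transport between neighboring window frames (Procrustes
        # alignment of the two bases); invariant to within-window basis choices.
        U, S, Vt = np.linalg.svd(B_prev.T @ B_cur)
        R = U @ Vt                                        # k x k orthogonal map
        # Depthwise step, projected into the previous window and transported
        # into the current frame.
        step = R.T @ (B_prev.T @ (hs[l] - hs[l - 1]))
        steps.append(np.linalg.norm(step))
        if prev_step is not None:
            prev_t = R.T @ prev_step                      # compare steps in a shared frame
            cos = step @ prev_t / (np.linalg.norm(step) * np.linalg.norm(prev_t) + 1e-8)
            angles.append(float(np.arccos(np.clip(cos, -1.0, 1.0))))
        # Drift: 1 minus the mean cosine of the principal angles between windows.
        drifts.append(1.0 - float(S.mean()))
        prev_step = step
    return np.array(steps), np.array(angles), np.array(drifts)
```

Per the abstract, such per-depth summaries would then be stacked across decoding positions and fed to the lightweight GRU validator for self-checking.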
Community
This repository implements Internal Flow Signatures, a method for auditing and refining LLM decisions by analyzing depthwise hidden-state dynamics. A lightweight validator trained on these signatures enables self-checking and targeted refinement without modifying the base model.
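To make the targeted refinement concrete, here is a minimal sketch of the clamping step described in the abstract: the component of a block's update that lies in the readout-aligned window subspace is rescaled when its transported length is abnormal, while the orthogonal residual is passed through unchanged. The helper name `clamp_step`, the threshold `max_norm`, and how the basis `B` is obtained are assumptions for illustration, not the repository's exact interface.

```python
import numpy as np

def clamp_step(h_in, h_out, B, max_norm):
    """Hypothetical refinement at the identified block: rescale the part of the
    block update that lies in the readout-aligned window subspace B when its
    length exceeds max_norm, and keep the orthogonal residual unchanged."""
    delta = h_out - h_in
    coords = B.T @ delta                  # in-subspace (transported-step) component
    in_sub = B @ coords
    residual = delta - in_sub             # orthogonal residual, preserved
    norm = np.linalg.norm(coords)
    if norm > max_norm:
        in_sub *= max_norm / norm         # clamp the abnormal step length
    return h_in + in_sub + residual
```

In the full pipeline described in the abstract, the model would first roll back to the culprit token flagged by the validator, apply a clamp of this kind at the identified block, and then resume decoding.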
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Prompt-Induced Over-Generation as Denial-of-Service: A Black-Box Attack-Side Benchmark (2025)
- Entropy Sentinel: Continuous LLM Accuracy Monitoring from Decoding Entropy Traces in STEM (2026)
- The Trojan in the Vocabulary: Stealthy Sabotage of LLM Composition (2025)
- When Benchmarks Leak: Inference-Time Decontamination for LLMs (2026)
- Theoretical Foundations of Prompt Engineering: From Heuristics to Expressivity (2025)
- How Does Prefix Matter in Reasoning Model Tuning? (2026)
- Q-realign: Piggybacking Realignment on Quantization for Safe and Efficient LLM Deployment (2026)