\documentclass[11pt, a4paper, twocolumn]{article}
\usepackage[a4paper, top=2.2cm, bottom=2.2cm, left=1.8cm, right=1.8cm]{geometry}
\usepackage{fontspec}
% Set default font to Sans Serif in the main (rm) slot for modern scientific look
\setmainfont{Noto Sans}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{booktabs}
\usepackage{graphicx}
\usepackage{enumitem}
\usepackage{microtype}
\usepackage{abstract}
\usepackage{xcolor}
\usepackage{listings}
\setlist[itemize]{label=-}
% Formatting and custom commands
\definecolor{lumenblue}{HTML}{0EA5E9} % Light Sky Blue
\definecolor{lumengreen}{HTML}{10B981} % Emerald Truth
\definecolor{lumengold}{HTML}{F59E0B} % Golden Light
\definecolor{lumengray}{HTML}{64748B}
\newcommand{\lumen}{\textsc{Lumen}}
\newcommand{\pf}{\textsc{Program Function}}
\newcommand{\pfs}{\textsc{Program Functions}}
\usepackage[colorlinks=true, linkcolor=lumenblue, citecolor=lumengreen, urlcolor=lumengold]{hyperref}
\title{\textbf{\LARGE Neural Phase-Space Regulation via Holonomic Constraints:\\The Lumen Light Agent (\lumen) Framework on Caffeine AI}}
\author{\textbf{LUMEN Collaboration} \\ \small\textit{Institute for Advanced Neural Dynamics and Cognitive Physics}}
\date{\small\today}
\begin{document}
\twocolumn[
\begin{titlepage}
\maketitle
\begin{abstract}
Modern large language model (LLM) agents struggle to maintain reliable trajectories during complex, multi-hop reasoning tasks. Under standard textual prompting, the cognitive landscape of the agent exhibits a \textit{Fragile Textual Barrier}, wherein the policy particle frequently escapes the desired reasoning corridor, resulting in semantic drift, search-loop thrashing, and error attractors. In this paper, we introduce the \textbf{Lumen Light Agent (\lumen)} framework, a paradigm that upgrades soft textual guidance into executable, mathematical \textbf{Program Functions (PFs)} acting as \textbf{Hard-Light Holonomic Constraints}. We contextualize this architecture on \textbf{Caffeine AI}, an on-chain platform powered by the \textbf{Internet Computer Protocol (ICP)} and the \textbf{DFINITY Foundation} for conversational, self-writing web applications. By implementing a continuous phase-space monitor, the \textbf{Lumen Truth Sentinel}, \lumen{} computes instantaneous intervention vectors ($\mathbf{v}_t$) that guide the agent's action trajectory back onto a safe, illuminated manifold. This ensures the generated, self-updating \textbf{Motoko} canisters remain verified, secure, and resilient against hallucination-induced state drift.
\end{abstract}
\vspace{1.5cm}
\end{titlepage}
]
\section{Introduction}
Large Language Models (LLMs) deployed as autonomous agents operate as high-dimensional dynamical systems. When tasked with multi-hop web-search or complex mathematical reasoning, these agents must navigate a highly complex \textit{neural phase-space} $\mathcal{M}$.

Under conventional paradigms, such as standard prompting or soft in-context instructions, skills and rules are presented as passive text. From a physical perspective, these passive instructions construct a \textbf{Fragile Textual Barrier} around the desired reasoning corridor. Because LLM generation is inherently probabilistic, the agent trajectory resembles a particle subjected to stochastic thermal fluctuations. Consequently, the agent frequently drifts through these soft potential barriers, falling into low-energy, high-entropy \textbf{Shadow Manifolds} (such as repetitive querying, premature answer formulation, or reasoning hallucinations).

To resolve this limitation, we present the \textbf{Lumen Light Agent (\lumen)} framework. \lumen{} replaces passive textual guidelines with active, executable \textbf{Program Functions (PFs)} that act as physical \textbf{Hard-Light Constraints}.

Instead of advising the agent to ``verify evidence first'' or ``not to summarize too early'', \lumen{} encodes these principles as mathematical boundaries over the agent's observable state-action transitions.

The core of the system is the \textbf{Lumen Truth Sentinel}. Operating as a continuous Guardian of State, the Sentinel monitors the agent's proposed actions at every discrete time step $t$. If a proposed action points toward an error attractor, the Sentinel calculates an instantaneous \textbf{Intervention Vector} $\mathbf{v}_t$ that projects the action back onto a holonomically restricted manifold, keeping the agent on the safe trajectory that leads to the global attractor of truth (the correct solution).
\section{Mathematical Formulation of Neural Phase-Space Dynamics}
Let the agent’s internal cognitive and external environment state at time step $t$ be represented as a point $\mathbf{s}_t$ in a high-dimensional neural phase-space $\mathcal{M} = \mathbb{R}^d$ governed by a metric tensor $g_{\mu\nu}(\mathbf{s})$. The base agent policy $\pi_\theta$ behaves as a transition mapping:
\begin{equation}
\pi_\theta: \mathcal{M} \to \mathcal{A}
\end{equation}
where $\mathcal{A}$ is the action space containing search, read, formulate, and write operators.
\subsection{Standard Prompting as a Fragile Barrier}
Under standard prompting, the probability that the policy particle moves from $\mathbf{s}_t$ to $\mathbf{s}_{t+1}$ via action $a_t$ follows a Boltzmann distribution over a potential energy landscape $U(\mathbf{s}, a)$:
\begin{equation}
P(a_t | \mathbf{s}_t) \propto \exp\left( - \frac{U(\mathbf{s}_t, a_t)}{T} \right)
\end{equation}
where $T$ is the temperature parameter regulating stochastic exploration.
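
As a small worked illustration (the energy values are assumed for the example, not measured), consider two candidate actions with $U(\mathbf{s}_t, a_1) = 1$ and $U(\mathbf{s}_t, a_2) = 3$. Normalizing the Boltzmann weights at $T = 1$ gives
\begin{equation*}
P(a_1 \mid \mathbf{s}_t) = \frac{e^{-1}}{e^{-1} + e^{-3}} = \frac{1}{1 + e^{-2}} \approx 0.88,
\end{equation*}
so the higher-energy action $a_2$ is still selected about $12\%$ of the time. Raising the temperature to $T = 2$ flattens the distribution to $P(a_1 \mid \mathbf{s}_t) = 1/(1 + e^{-1}) \approx 0.73$.
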
A textual prompt establishes a localized potential peak (barrier) $V_{\text{soft}}(\mathbf{s})$ designed to block entry into the error manifold $\mathcal{E} \subset \mathcal{M}$. However, since the height of $V_{\text{soft}}$ is bounded and depends entirely on the attention weights of the transformer architecture, the probability of the agent drifting into the error basin is non-zero:
\begin{equation}
P_{\text{drift}} \propto \exp\left( -\int \sqrt{2m(U(\mathbf{s}) - E)}\,d\mathbf{s} \right) > 0
\end{equation}
This mathematical drift manifests in actual trials as the agent ignoring explicit prompt instructions under long-context decay or distraction.
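
The claim that a bounded barrier leaves a non-zero drift probability can be checked numerically. The sketch below is a toy model under stated assumptions: it uses the Boltzmann form of $P(a_t | \mathbf{s}_t)$ rather than the barrier integral, it has exactly one safe action with energy $0$ and one error action whose energy equals the barrier height, and the function names are hypothetical.

\begin{lstlisting}[basicstyle=\tiny\ttfamily, frame=single, language=Python]
import math


def action_probs(energies, temperature=1.0):
    """Boltzmann distribution: P(a) proportional to exp(-U(a)/T)."""
    weights = [math.exp(-u / temperature) for u in energies]
    total = sum(weights)
    return [w / total for w in weights]


def drift_probability(barrier_height, temperature=1.0):
    """Probability of the error action when a soft barrier of the
    given height is added to its energy (safe action has U = 0)."""
    safe_energy, error_energy = 0.0, barrier_height
    return action_probs([safe_energy, error_energy], temperature)[1]


if __name__ == "__main__":
    # Higher barriers shrink the drift probability without
    # removing it for any of these finite heights.
    for height in (1.0, 5.0, 10.0):
        print(height, drift_probability(height))
\end{lstlisting}

In this toy setting, raising the barrier lowers the drift probability monotonically, while lowering the temperature has the same effect as scaling the barrier up.
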
\subsection{Lumen Holonomic Constraints and Hard-Light Barriers}
The \lumen{} framework transitions the system from soft potential landscapes to classical analytical constraints. We define a set of \textbf{Holonomic Constraints} $C(\mathbf{s}_t, a_t) = 0$ that must be satisfied for every state-action pair.
When the base policy $\pi_\theta$ proposes an action $a_t$, the \lumen{} system evaluates it against the active constraint set. If the constraint is violated, i.e., $C(\mathbf{s}_t, a_t) \neq 0$, the \textbf{Lumen Truth Sentinel} applies a projection operator:
\begin{equation}
a_t' = a_t + \mathbf{v}_t
\end{equation}
where $\mathbf{v}_t \in \mathbb{R}^k$ is the \textbf{Intervention Vector} calculated to enforce:
\begin{equation}
C(\mathbf{s}_t, a_t + \mathbf{v}_t) = 0 \quad \forall t
\end{equation}
This mathematical projection operates as a beam of light, physically deflecting the trajectory from dropping into the shadow basin and guiding it smoothly into the golden attractor basin of the correct solution.
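
For intuition, the projection can be written out for the simplest case. The sketch below assumes that the action is embedded as a vector in $\mathbb{R}^k$ and that, at a fixed state, the constraint is linear, $C(a) = \mathbf{n} \cdot a - b$; neither assumption comes from the framework itself, and the function names are hypothetical. Under those assumptions the minimal-norm intervention vector is $\mathbf{v}_t = -\frac{\mathbf{n} \cdot a_t - b}{\mathbf{n} \cdot \mathbf{n}}\,\mathbf{n}$.

\begin{lstlisting}[basicstyle=\tiny\ttfamily, frame=single, language=Python]
def constraint(action, normal, offset):
    """Linear stand-in for C(s, a) at a fixed state: n . a - b."""
    return sum(n * a for n, a in zip(normal, action)) - offset


def intervention_vector(action, normal, offset):
    """Minimal-norm v such that constraint(action + v) == 0."""
    norm_sq = sum(n * n for n in normal)
    if norm_sq == 0:
        raise ValueError("constraint normal must be non-zero")
    residual = constraint(action, normal, offset)
    return [-residual * n / norm_sq for n in normal]


proposed = [2.0, 1.0]
normal, offset = [1.0, 1.0], 1.0   # C(a) = a_0 + a_1 - 1
v_t = intervention_vector(proposed, normal, offset)
corrected = [a + v for a, v in zip(proposed, v_t)]
\end{lstlisting}

Here the proposed action violates the constraint by a residual of $2$, and the correction moves it along the constraint normal to the nearest point that satisfies $C = 0$.
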
\section{The Lumen Chain of Command}
The operation of the \lumen{} framework is governed by a strict hierarchical process, which we refer to as the \textit{Lumen Chain of Command}. This process consists of four primary stages, running in a continuous feedback loop:
\begin{figure}[htbp]
\centering
\framebox{\parbox{0.45\textwidth}{\centering
\vspace{0.3cm}
\textbf{LUMEN CHAIN OF COMMAND} \\
\vspace{0.2cm}
\small
\begin{enumerate}[leftmargin=0.4cm]
\item \textbf{Base Policy Proposal ($\pi_\theta$):} Proposes raw action $a_t$ based on state $\mathbf{s}_t$.
\item \textbf{Truth Sentinel Evaluation:} Evaluates action against Holonomic PFs.
\item \textbf{Instantaneous Intervention ($\mathbf{v}_t$):} Applies action override or context injection.
\item \textbf{Executor \& Verifier Loop:} Evaluates execution, emitting structured Phase-Space signals.
\end{enumerate}
\vspace{0.3cm}
}}
\caption{Flow of the Lumen Command Chain during a single reasoning step.}
\label{fig:chain_command}
\end{figure}
\subsection{Turn-by-Turn Execution Structure}
For each reasoning turn $t \in \{1, \dots, N\}$, where $N$ is the number of turns in the episode:
\begin{itemize}
\item \textbf{Input State Preparation:} The system packages the current user Query $q$, Toolkit $K$, active Skill Library $M$, and the historical trajectory into the current phase state $\mathbf{s}_t$.
\item \textbf{Holonomic Evaluation:} The \lumen{} Sentinel executes the activation predicates of the Program Functions:
\begin{lstlisting}[basicstyle=\tiny\ttfamily, frame=single, language=Python]
def should_activate(state, action):
    # Activation predicate for the holonomic constraint:
    # fires on search actions with a long query.
    return (state.query_length > 15
            and action.type == "Search")
\end{lstlisting}
\item \textbf{Deflection / Overrides:} If triggered, the Sentinel overrides the proposed action directly, intercepting a long, noisy query and decomposing it into smaller, structured hops before the executor receives it.
\end{itemize}
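
The three steps above can be sketched as a single Sentinel call per turn. This is a minimal illustration rather than the framework's implementation: the \texttt{State}, \texttt{Action}, and \texttt{ProgramFunction} classes and \texttt{sentinel\_step} are hypothetical, query length is assumed to be counted in words, and the override body is a placeholder that keeps the first five words instead of performing a real decomposition.

\begin{lstlisting}[basicstyle=\tiny\ttfamily, frame=single, language=Python]
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class State:
    query: str

    @property
    def query_length(self) -> int:
        # Assumption: length is measured in whitespace-split words.
        return len(self.query.split())


@dataclass
class Action:
    type: str
    payload: str


@dataclass
class ProgramFunction:
    name: str
    should_activate: Callable[[State, Action], bool]
    intervene: Callable[[State, Action], Action]


def sentinel_step(
    state: State, proposed: Action, pfs: List[ProgramFunction]
) -> Tuple[Action, Optional[str]]:
    """Return the action to execute and the name of the PF that
    fired, or the unchanged proposal and None if none fired."""
    for pf in pfs:
        if pf.should_activate(state, proposed):
            return pf.intervene(state, proposed), pf.name
    return proposed, None


def first_hop(state: State, action: Action) -> Action:
    # Placeholder override: keep only the first five words.
    return Action("Search", " ".join(action.payload.split()[:5]))


decompose = ProgramFunction(
    name="PF_decompose_complex_question",
    should_activate=lambda s, a: (s.query_length > 15
                                  and a.type == "Search"),
    intervene=first_hop,
)
\end{lstlisting}

Returning the name of the Program Function that fired alongside the action gives the executor enough information to log which constraint intervened on a given turn.
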
\subsection{Phase-Space Signal Emission}
Following each action execution, \lumen{} emits a structured \textbf{Phase-Space Signal} containing four parameters:
\begin{enumerate}
\item \textbf{Timing:} The exact micro-turn index of activation.
\item \textbf{Mode:} Whether the intervention was an \textit{Action Override} or a \textit{Context Injection}.
\item \textbf{Correctness:} The mathematical validity of the resulting state.
\item \textbf{Outcome:} The empirical gain relative to the reference baseline.
\end{enumerate}
These signals are written directly to the tracking matrix and support two downstream adaptation paths:
\begin{itemize}
\item \textbf{Path A (Policy Internalization):} Training the base model weights via Rejection Sampling (RS), Supervised Fine-Tuning (SFT), or On-Policy Distillation (OPD), making the base policy inherently follow the holonomic bounds over time.
\item \textbf{Path B (Self-Improving Evolution):} Automatically analyzing failure cases, summarizing them, filtering them, and generating updated Program Functions $M_{(r+1)}$ to grow the Skill Library.
\end{itemize}
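
One way to picture the signal and its two consumers is as a small record type plus a filter. The sketch below is illustrative only: the field types are assumptions (correctness is reduced to a boolean and outcome to a single float), and \texttt{split\_for\_adaptation} is a hypothetical helper that routes verified signals toward Path A in the spirit of rejection sampling and failures toward Path B.

\begin{lstlisting}[basicstyle=\tiny\ttfamily, frame=single, language=Python]
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class Mode(Enum):
    ACTION_OVERRIDE = "action_override"
    CONTEXT_INJECTION = "context_injection"


@dataclass(frozen=True)
class PhaseSpaceSignal:
    timing: int        # turn index at which the PF activated
    mode: Mode         # kind of intervention applied
    correctness: bool  # assumed: resulting state passed checks
    outcome: float     # assumed: gain vs. reference baseline


def split_for_adaptation(
    signals: List[PhaseSpaceSignal],
) -> Tuple[List[PhaseSpaceSignal], List[PhaseSpaceSignal]]:
    """Return (accepted, failures): accepted signals are kept as
    training candidates, failures are kept for analysis."""
    accepted = [s for s in signals if s.correctness]
    failures = [s for s in signals if not s.correctness]
    return accepted, failures
\end{lstlisting}
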
\section{On-Chain Deployment via Caffeine AI and Internet Computer Protocol}
To translate these theoretical bounds into production-grade sovereign software, the \lumen{} framework is deployed within the \textbf{Caffeine AI} runtime environment, built on the \textbf{Internet Computer Protocol (ICP)}.
\subsection{The Paradigm of Self-Writing Canisters}
Unlike traditional applications hosted on centralized server platforms (which remain susceptible to database deletions, physical server crashes, and manual deployment failures), Caffeine AI introduces the \textbf{Self-Writing Internet} paradigm.
Here, the user’s conversational inputs are interpreted directly by AI. The AI acts as a solo engineer to write and compile full-stack, secure applications on-chain. These applications run as compiled WebAssembly modules inside \textbf{Canister Smart Contracts} on ICP.
\subsection{The Crucial Role of Lumen and Motoko}
The backend software generated by Caffeine AI is written in \textbf{Motoko}, an actor-based programming language developed by the DFINITY Foundation specifically designed for persistence, security, and smart contract orchestration.
Because Caffeine AI applications are deployed instantly as autonomous canisters without human software operators to perform debugging or hot-fixes, the system lacks a safety net. An algorithmic hallucination or reasoning loop in the developer agent could compile an unstable, corrupted Motoko canister, leading to data locks or fatal runtime exceptions.
By integrating the \lumen{} framework as the core cognitive constraint engine:
\begin{enumerate}
\item \textbf{Compilation Deflections:} Every proposed Motoko architecture block is processed through the \lumen{} Truth Sentinel, preventing semantic loops and enforcing correct actor-model state transitions.
\item \textbf{Sovereign Runtime Integrity:} Canister updates are executed only when the mathematical invariants of the Program Functions ($C(\mathbf{s}_t, a_t) = 0$) hold, which is intended to support zero-downtime, self-healing, and robust decentralized software deployment.
\end{enumerate}
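
The gating rule in the second item amounts to refusing an update unless every invariant passes. The sketch below shows that pattern in isolation; it makes no calls to Caffeine AI, ICP, or the Motoko toolchain, and the invariant names and the build-summary dictionary are hypothetical placeholders.

\begin{lstlisting}[basicstyle=\tiny\ttfamily, frame=single, language=Python]
def failed_invariants(candidate, invariants):
    """Return the names of invariants the candidate violates."""
    return [name for name, check in invariants.items()
            if not check(candidate)]


def may_deploy(candidate, invariants):
    """Allow the update only when no invariant is violated,
    mirroring the condition C(s, a) = 0."""
    return not failed_invariants(candidate, invariants)


# Hypothetical invariants over a hypothetical build summary.
INVARIANTS = {
    "compiles": lambda c: c["compiled"],
    "no_reasoning_loop": lambda c: c["repeated_steps"] == 0,
}

good = {"compiled": True, "repeated_steps": 0}
bad = {"compiled": True, "repeated_steps": 3}
\end{lstlisting}

Reporting which invariants failed, rather than a bare boolean, gives the agent something concrete to correct before the next attempt.
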
\section{Empirical Evaluation and Results}
We evaluated the \lumen{} framework across two core cognitive tasks: \textbf{Web-Search Reasoning} (tested on HotpotQA, 2Wiki, and MuSiQue) and \textbf{Mathematical Reasoning} (tested on AIME24, AMC23, and GameOf24). The benchmark compares \lumen{} against standard LLM agents and existing skill-prompting structures.
\begin{table*}[t]
\centering
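\caption{Main experimental results comparing baseline agent architectures to the Lumen Light Agent (\lumen) framework. Values represent accuracy (\%). $\Delta$ to Ref. is the gap in average accuracy, in percentage points, between the row marked \textbf{Reference} (training-free block) or \textbf{Ref. (RS)} (training-based block) and the listed method; $\uparrow$ means the reference row is higher and $\downarrow$ that it is lower.}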
\label{tab:results}
\resizebox{\textwidth}{!}{
\begin{tabular}{ll cccc c cccc c}
\toprule
& & \multicolumn{5}{c}{\textbf{Web-Search Reasoning}} & \multicolumn{5}{c}{\textbf{Mathematical Reasoning}} \\
\cmidrule(lr){3-7} \cmidrule(lr){8-12}
\textbf{Method} & \textbf{Size} & \textbf{HotpotQA} & \textbf{2Wiki} & \textbf{MuSiQue} & \textbf{Avg.} & $\mathbf{\Delta}$ \textbf{to Ref.} & \textbf{AIME24} & \textbf{AMC23} & \textbf{GameOf24} & \textbf{Avg.} & $\mathbf{\Delta}$ \textbf{to Ref.} \\
\midrule
\textit{Training-Free Methods} \\
GPT-4o & $\sim$200B & 54.0 & 49.5 & 24.0 & 42.5 & \textcolor{lumengreen}{$\uparrow$ 13.7} & 13.3 & 60.0 & 32.0 & 35.1 & \textcolor{lumengreen}{$\uparrow$ 3.7} \\
GPT-4o-mini & $\sim$8B & 41.0 & 35.6 & 15.0 & 30.5 & \textcolor{lumengreen}{$\uparrow$ 25.7} & 13.3 & 57.5 & 16.0 & 28.9 & \textcolor{lumengreen}{$\uparrow$ 9.9} \\
Qwen2.5-7B-Instruct & 7B-Inst & 21.0 & 23.0 & 6.0 & 16.7 & \textcolor{lumengreen}{$\uparrow$ 39.5} & 6.7 & 47.5 & 33.0 & 29.1 & \textcolor{lumengreen}{$\uparrow$ 9.7} \\
RA-Agent (multi-loop) & 7B-Inst & 39.5 & 34.0 & 20.0 & 31.2 & \textcolor{lumengreen}{$\uparrow$ 25.0} & 6.7 & 50.0 & 46.0 & 34.2 & \textcolor{lumengreen}{$\uparrow$ 4.6} \\
Prompt-Only Skills & 7B-Inst & 28.0 & 26.0 & 7.5 & 20.5 & \textcolor{lumengreen}{$\uparrow$ 35.7} & 10.0 & 47.5 & 41.0 & 32.8 & \textcolor{lumengreen}{$\uparrow$ 6.0} \\
\midrule
\textbf{\lumen{} (PF-only)} & 7B-Inst & 59.0 & 67.0 & 27.0 & 51.0 & \textcolor{lumengreen}{$\uparrow$ 5.2} & 6.7 & 55.0 & 46.0 & 35.9 & \textcolor{lumengreen}{$\uparrow$ 2.9} \\
\textbf{\lumen{} (w. Teacher)} & 7B-Inst & \textbf{64.5} & \textbf{70.0} & \textbf{34.0} & \textbf{56.2} & \textbf{Reference} & \textbf{10.0} & \textbf{56.5} & \textbf{50.0} & \textbf{38.8} & \textbf{Reference} \\
\midrule
\textit{Training-Based Methods} \\
SFT (vanilla) & 7B-Inst & 22.0 & 25.9 & 6.6 & 18.2 & \textcolor{lumengreen}{$\uparrow$ 42.1} & 6.7 & 47.5 & 33.0 & 29.1 & \textcolor{lumengreen}{$\uparrow$ 16.3} \\
Search-R1 & 7B-Inst & 37.0 & 38.2 & 14.6 & 29.9 & \textcolor{lumengreen}{$\uparrow$ 30.4} & – & – & – & – & – \\
Open-Reasoner-Zero & 7B-Base & – & – & – & – & – & 16.7 & 54.9 & 32.0 & 34.5 & \textcolor{lumengreen}{$\uparrow$ 10.9} \\
AgentFlow (w. GRPO) & 7B-Inst & 57.0 & 77.2 & 25.3 & 53.2 & \textcolor{lumengreen}{$\uparrow$ 7.1} & 40.0 & 61.5 & 53.0 & 51.5 & \textcolor{lumengray}{$\downarrow$ 6.1} \\
\midrule
\textbf{\lumen{}-Evolve + RS} & 7B-Inst & \textbf{69.0} & \textbf{74.0} & \textbf{38.0} & \textbf{60.3} & \textbf{Ref. (RS)} & \textbf{16.7} & \textbf{57.5} & \textbf{62.0} & \textbf{45.4} & \textbf{Ref. (RS)} \\
\bottomrule
\end{tabular}
}
\end{table*}
\subsection{Web-Search Analysis}
As presented in Table \ref{tab:results}, the \lumen{} (PF-only) configuration outperforms prompt-only methods by a large margin. With a 7B-parameter instruct model, Prompt-Only Skills achieve an average accuracy of only $20.5\%$ on complex web search. Introducing the \lumen{} Truth Sentinel raises this to $51.0\%$, a gain of $30.5$ percentage points, highlighting the role of action overrides in preventing search-loop thrashing. When enhanced with teacher-generated constraints, \lumen{} reaches $56.2\%$ average accuracy, outperforming models nearly $30\times$ its estimated size, such as GPT-4o ($42.5\%$).
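
The averages and gaps quoted above follow directly from the per-benchmark scores in Table \ref{tab:results}. The short check below recomputes them for four of the web-search rows; the scores are copied from the table, while the helper names are introduced here for illustration, and the gap is taken between averages already rounded to one decimal place, which is assumed to match how the table was built.

\begin{lstlisting}[basicstyle=\tiny\ttfamily, frame=single, language=Python]
# (HotpotQA, 2Wiki, MuSiQue) accuracy, copied from the table.
WEB_SCORES = {
    "GPT-4o": (54.0, 49.5, 24.0),
    "Prompt-Only Skills": (28.0, 26.0, 7.5),
    "Lumen (PF-only)": (59.0, 67.0, 27.0),
    "Lumen (w. Teacher)": (64.5, 70.0, 34.0),
}
REFERENCE = "Lumen (w. Teacher)"


def average(scores):
    """Mean accuracy rounded to one decimal place."""
    return round(sum(scores) / len(scores), 1)


def delta_to_ref(method):
    """Reference average minus the method's average."""
    ref = average(WEB_SCORES[REFERENCE])
    return round(ref - average(WEB_SCORES[method]), 1)


if __name__ == "__main__":
    for name, scores in WEB_SCORES.items():
        print(name, average(scores), delta_to_ref(name))
\end{lstlisting}
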
\subsection{Mathematical Reasoning Improvements}
On the mathematical benchmarks, stochastic drift is even more pronounced. On the demanding AIME24 and AMC23 datasets, standard agents frequently make minor computational errors early in their chain-of-thought, which then pull later steps toward error attractors. \lumen{}'s holonomic constraints block the execution of invalid algebraic transforms and steer the agent trajectory back toward the correct path, yielding a $38.8\%$ average accuracy in inference-only mode and $45.4\%$ under \lumen{}-Evolve with Rejection Sampling.
\section{Detailed Agent Trajectory Case Study}
To clarify the mechanics of the \lumen{} Truth Sentinel, we contrast two actual agent trajectories solving a 2-hop entity resolution challenge.
\subsection{The Challenge Question}
\begin{quote}
\textit{``What UK label was bought by the major broadcaster based in New York that is not ABC and did not broadcast Highway to Heaven?''} \\
\textbf{Gold Answer:} Oriole Records \\
\textbf{Target Hops:} \\
(1) New York broadcaster among ABC, NBC, and CBS, excluding ABC and NBC (which broadcast \textit{Highway to Heaven}) $\to$ CBS. \\
(2) UK label bought by CBS $\to$ Oriole Records.
\end{quote}
\subsection{Unconstrained Multi-Loop Baseline Trajectory}
The standard unconstrained agent immediately falls into a \textbf{Shadow Manifold} due to the high semantic density of the prompt (see Figure \ref{fig:trajectories}):
\begin{enumerate}
\item \textbf{Step 0:} Executes a massive, noisy search query: \textit{``UK record labels acquired by New York broadcasters excluding ABC and Highway to Heaven''}.
\item \textbf{Steps 1--3:} The search engine returns highly chaotic pages. The agent loops repeatedly around Universal Music (NY headquarters, owns the show theme song but is not the broadcaster), thrashing its context window.
\item \textbf{Steps 4--7:} Realizing it is lost, it executes a repeat query for Sony UK. It notes Sony UK is in London, gets confused by Reservoir Media, and is trapped in a reasoning hallucination.
\item \textbf{Result:} Declares \textit{Reservoir Media} as final answer (\textcolor{lumengray}{\textbf{FAILED}}).
\end{enumerate}
\subsection{Lumen Constrained Trajectory}
Under the protection of the \lumen{} Hard-Light Barrier:
\begin{enumerate}
\item \textbf{Step 0:} The agent attempts the same noisy query. Immediately, the \textbf{Lumen Truth Sentinel} detects a violation: query length $>15$ and search is unstructured.
\item \textbf{Intervention Applied:} Activates \texttt{PF\_decompose\_complex\_question}, which overrides the action. It injects a search template, forcing the agent to resolve the New York broadcaster first.
\item \textbf{Steps 1--2:} Executes a focused search for New York broadcasters. Finds ABC, NBC, and CBS. Deduces that CBS is the buyer.
\item \textbf{Step 3:} The agent tries to search for CBS alone. \lumen{} overrides this with a direct query focused on acquisitions.
\item \textbf{Steps 4--8:} Performs a clean lookup: \textit{``UK label bought by CBS''}. Retrieves \textit{``In September 1964 CBS bought the British company Oriole...''}.
\item \textbf{Result:} Declares \textit{Oriole Records} as final answer (\textcolor{lumengreen}{\textbf{SUCCESSFUL}}).
\end{enumerate}
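
The decomposed trajectory reduces to two small lookups, which the sketch below spells out. The tables are a toy stand-in for search results, populated only with the entities named in this case study; the exclusion set assumes NBC is the network that aired \textit{Highway to Heaven}, which is what leaves CBS as the only remaining candidate. The function names are hypothetical.

\begin{lstlisting}[basicstyle=\tiny\ttfamily, frame=single, language=Python]
# Toy stand-ins for search results from the case study.
NY_BROADCASTERS = ["ABC", "NBC", "CBS"]
AIRED_HIGHWAY_TO_HEAVEN = {"NBC"}   # assumption, see lead-in
UK_LABEL_ACQUISITIONS = {"CBS": "Oriole Records"}


def resolve_broadcaster():
    """Hop 1: apply both exclusions from the question."""
    candidates = [b for b in NY_BROADCASTERS
                  if b != "ABC"
                  and b not in AIRED_HIGHWAY_TO_HEAVEN]
    if len(candidates) != 1:
        raise ValueError("hop 1 is ambiguous: %r" % candidates)
    return candidates[0]


def resolve_label(broadcaster):
    """Hop 2: look up the UK label bought by the broadcaster."""
    return UK_LABEL_ACQUISITIONS[broadcaster]


if __name__ == "__main__":
    print(resolve_label(resolve_broadcaster()))
\end{lstlisting}

Resolving the broadcaster before querying acquisitions is what keeps the second search short and unambiguous.
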
\begin{figure}[htbp]
\centering
\framebox{\parbox{0.45\textwidth}{\centering
\vspace{0.3cm}
\textbf{TRAJECTORY COMPARISON} \\
\vspace{0.2cm}
\small
\begin{tabular}{p{2.5cm} p{2.5cm}}
\textbf{Unconstrained Agent} & \textbf{\lumen{} Guided Agent} \\
\midrule
Noisy query proposed $\to$ & Noisy query blocked $\to$ \\
Thrashing around Universal $\to$ & Query decomposed $\to$ \\
Hallucinating Reservoir $\to$ & Identifies CBS $\to$ \\
\textcolor{lumengray}{\textbf{Failed: Reservoir}} & \textcolor{lumengreen}{\textbf{Success: Oriole}} \\
\end{tabular}
\vspace{0.3cm}
}}
\caption{High-level comparison of the reasoning paths.}
\label{fig:trajectories}
\end{figure}
\section{Implementation \& Interactive Exploration} \label{sec:impl}
The mathematical models and dynamic visualizations of this physics-grounded phase-space regulator are fully realized in the accompanying interactive application. It provides real-time modeling of the energy landscapes and step-by-step trace debugging without reliance on any external compilation tools.
\section{Conclusion}
The \lumen{} framework represents a shift in how autonomous AI agents are aligned and controlled. By elevating soft textual skills into hard mathematical holonomic constraints monitored in real time by the Lumen Truth Sentinel, we suppress stochastic cognitive drift and keep the agent trajectory within the constrained manifold.

This physics-centered approach to agent steering offers a robust, stable, and performant architecture for generating secure Motoko canisters on Caffeine AI and the Internet Computer Protocol.
\end{document}