new Integrating Artificial Intelligence, Physics, and Internet of Things: A Framework for Cultural Heritage Conservation

Authors: Carmine Valentino, Federico Pichi, Francesco Colace, Dajana Conte, Gianluigi Rozza

Abstract: The conservation of cultural heritage increasingly relies on integrating technological innovation with domain expertise to ensure effective monitoring and predictive maintenance. This paper presents a novel framework to support the preservation of cultural assets, combining Internet of Things (IoT) and Artificial Intelligence (AI) technologies, enhanced with physical knowledge of the underlying phenomena. The framework is structured into four functional layers that permit the analysis of 3D models of cultural assets and the execution of simulations based on knowledge acquired from data and physics. A central component of the proposed framework is Scientific Machine Learning, particularly Physics-Informed Neural Networks (PINNs), which incorporate physical laws into deep learning models. To enhance computational efficiency, the framework also integrates Reduced Order Methods (ROMs), specifically Proper Orthogonal Decomposition (POD), and remains compatible with classical Finite Element (FE) methods. Additionally, it includes tools to automatically manage and process 3D digital replicas, enabling their direct use in simulations. The proposed approach offers three main contributions: a methodology for processing 3D models of cultural assets for reliable simulation; the application of PINNs to combine data-driven and physics-based approaches in cultural heritage conservation; and the integration of PINNs with ROMs to efficiently model degradation processes influenced by environmental and material parameters. The reproducible and open-access experimental phase exploits simulated scenarios on complex, real-life geometries to test the efficacy of the proposed framework in each of its key components, allowing both direct and inverse problems to be addressed. Code availability: https://github.com/valc89/PhysicsInformedCulturalHeritage

URLs: https://github.com/valc89/PhysicsInformedCulturalHeritage
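
The reduced-order component named in the abstract, Proper Orthogonal Decomposition, can be illustrated in a few lines. The following is a minimal NumPy sketch on synthetic snapshot data, not the paper's pipeline; the three-mode toy field, the noise level, and the 99.9% energy threshold are all our assumptions:

```python
import numpy as np

# Snapshot matrix: each column is one simulated field (toy data built from
# three smooth spatial modes plus small noise).
rng = np.random.default_rng(0)
n_space, n_snapshots = 200, 40
x = np.linspace(0.0, 1.0, n_space)
modes = np.stack([np.sin((k + 1) * np.pi * x) for k in range(3)], axis=1)
coeffs = rng.normal(size=(3, n_snapshots))
snapshots = modes @ coeffs + 1e-3 * rng.normal(size=(n_space, n_snapshots))

# POD: the leading left singular vectors of the snapshot matrix form the
# reduced basis; keep the smallest rank capturing 99.9% of the energy.
U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
energy = np.cumsum(s**2) / np.sum(s**2)
r = int(np.searchsorted(energy, 0.999) + 1)
basis = U[:, :r]

# Project a new field onto the reduced basis and measure reconstruction error.
new_field = modes @ rng.normal(size=3)
reconstruction = basis @ (basis.T @ new_field)
rel_err = np.linalg.norm(new_field - reconstruction) / np.linalg.norm(new_field)
print(r, rel_err)
```

Because the toy snapshots have rank-3 structure, POD recovers a 3-dimensional basis and reconstructs unseen fields almost exactly; this compression is what makes the framework's downstream parametric simulations cheap.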

new Scaling DPPs for RAG: Density Meets Diversity

Authors: Xun Sun, Baiheng Xie, Li Huang, Qiang Gao

Abstract: Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) by grounding generation in external knowledge, yielding relevant responses aligned with factual evidence and evolving corpora. Standard RAG pipelines construct context through relevance ranking, performing point-wise scoring between the user query and each corpus chunk. This formulation, however, ignores interactions among retrieved candidates, leading to redundant contexts that dilute density and fail to surface complementary evidence. We argue that effective retrieval should optimize jointly for both density and diversity, ensuring grounding evidence that is dense in information yet diverse in coverage. In this study, we propose ScalDPP, a diversity-aware retrieval mechanism for RAG that incorporates Determinantal Point Processes (DPPs) through a lightweight P-Adapter, enabling scalable modeling of inter-chunk dependencies and complementary context selection. In addition, we develop a novel set-level objective, Diverse Margin Loss (DML), which encourages ground-truth complementary evidence chains to dominate any equally sized redundant alternatives under DPP geometry. Experimental results demonstrate the superiority of ScalDPP, substantiating our core statement in practice.
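
To see the density-vs-diversity trade-off the abstract describes, here is a minimal sketch of greedy MAP inference for a quality-times-similarity DPP kernel. The toy embeddings, quality scores, and greedy routine are illustrative assumptions, not ScalDPP itself:

```python
import numpy as np

def dpp_greedy(kernel: np.ndarray, k: int) -> list[int]:
    """Greedy MAP inference for a DPP: repeatedly add the item that
    maximizes log det of the selected submatrix (relevance x diversity)."""
    selected: list[int] = []
    for _ in range(k):
        best, best_gain = -1, -np.inf
        for i in range(kernel.shape[0]):
            if i in selected:
                continue
            idx = selected + [i]
            sign, logdet = np.linalg.slogdet(kernel[np.ix_(idx, idx)])
            if sign > 0 and logdet > best_gain:
                best, best_gain = i, logdet
        selected.append(best)
    return selected

# Toy corpus: chunks 0 and 1 are near-duplicates; chunk 2 is distinct.
emb = np.array([[1.0, 0.0], [0.99, 0.14], [0.0, 1.0]])
emb /= np.linalg.norm(emb, axis=1, keepdims=True)
quality = np.array([1.0, 0.95, 0.9])             # point-wise relevance scores
L = np.outer(quality, quality) * (emb @ emb.T)   # L_ij = q_i q_j <e_i, e_j>
print(dpp_greedy(L, 2))  # picks the relevant-but-diverse pair, not the duplicates
```

Point-wise ranking would return the two near-duplicates (highest quality scores); the determinant penalizes their overlap and selects the complementary chunk instead. ScalDPP's P-Adapter and DML replace this quadratic-cost scan with a scalable learned component; the sketch shows only the underlying selection geometry.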

new DRAFT: Task Decoupled Latent Reasoning for Agent Safety

Authors: Lin Wang, Junfeng Fang, Dan Zhang, Fei Shen, Xiang Wang, Tat-Seng Chua

Abstract: The advent of tool-using LLM agents shifts safety monitoring from output moderation to auditing long, noisy interaction trajectories, where risk-critical evidence is sparse, making standard binary supervision poorly suited for credit assignment. To address this, we propose DRAFT (Task Decoupled Latent Reasoning for Agent Safety), a latent reasoning framework that decouples safety judgment into two trainable stages: an Extractor that distills the full trajectory into a compact continuous latent draft, and a Reasoner that jointly attends to the draft and the original trajectory to predict safety. DRAFT avoids lossy explicit summarize-then-judge pipelines by performing evidence aggregation in latent space, enabling end-to-end differentiable training. Across benchmarks including ASSEBench and R-Judge, DRAFT consistently outperforms strong baselines, improving accuracy from 63.27% (LoRA) to 91.18% averaged over benchmarks, and learns more separable representations. Ablations demonstrate a clear synergy between the Extractor and the Reasoner. Overall, DRAFT suggests that continuous latent reasoning prior to readout is a practical path to robust agent safety under long-context supervision with sparse evidence.

new General Explicit Network (GEN): A novel deep learning architecture for solving partial differential equations

Authors: Genwei Ma, Ting Luo, Ping Yang, Xing Zhao

Abstract: Machine learning, especially physics-informed neural networks (PINNs) and their variants, has been widely used to solve problems involving partial differential equations (PDEs). However, successful deployment of such methods beyond academic research remains limited. For example, PINN methods primarily consider discrete point-to-point fitting and fail to account for the underlying properties of the true solutions. The continuous activation functions adopted in these approaches produce local characteristics that align with the equation solutions but result in poor extensibility and robustness. In this paper, we propose a general explicit network (GEN) that implements point-to-function PDE solving. The "function" component can be constructed from prior knowledge of the original PDEs through corresponding basis functions for fitting. The experimental results demonstrate that this approach yields solutions with high robustness and strong extensibility.
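
To make the point-to-function idea concrete: rather than fitting discrete point values, one can fit coefficients of basis functions chosen from prior knowledge of the PDE. The toy Poisson problem and sine basis below are our own illustration of that general principle, not the GEN architecture itself:

```python
import numpy as np

# Toy problem (our choice): -u'' = pi^2 sin(pi x) on [0,1], u(0)=u(1)=0.
# Prior knowledge: a sine basis sin(k pi x) satisfies the boundary conditions,
# and applying -d^2/dx^2 to sin(k pi x) gives (k pi)^2 sin(k pi x).
n_basis = 5
x = np.linspace(0.01, 0.99, 50)                  # collocation points
A = np.stack([(k * np.pi) ** 2 * np.sin(k * np.pi * x)
              for k in range(1, n_basis + 1)], axis=1)
b = np.pi ** 2 * np.sin(np.pi * x)

# Least-squares fit of the basis coefficients against the PDE residual.
coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)

# The exact solution is u(x) = sin(pi x), so only the first coefficient survives,
# and the recovered function extends smoothly beyond the collocation points.
xt = np.linspace(0.0, 1.0, 201)
u = sum(c * np.sin((k + 1) * np.pi * xt) for k, c in enumerate(coeffs))
print(np.max(np.abs(u - np.sin(np.pi * xt))))  # close to machine precision
```

Because the unknown is a function (a coefficient vector over a basis) rather than a set of point values, the fitted solution inherits the basis's global structure, which is the extensibility property the abstract emphasizes.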

new Apparent Age Estimation: Challenges and Outcomes

Authors: Justin Rainier Go, Lorenz Bernard Marqueses, Mikaella Kaye Martinez, John Kevin Patrick Sarmiento, Abien Fred Agarap

Abstract: Apparent age estimation is a valuable tool for business personalization, yet current models frequently exhibit demographic biases. We build on prior work on the DEX method by applying distribution learning techniques such as Mean-Variance Loss (MVL) and Adaptive Mean-Residue Loss (AMRL), and evaluate them on both accuracy and fairness. Using IMDB-WIKI, APPA-REAL, and FairFace, we demonstrate that while AMRL achieves state-of-the-art accuracy, trade-offs between precision and demographic equity persist. Despite clear age clustering in UMAP embeddings, our saliency maps indicate inconsistent feature focus across demographics, leading to significant performance degradation for Asian and African American populations. We argue that technical improvements alone are insufficient; accurate and fair apparent age estimation requires the integration of localized and diverse datasets, and strict adherence to fairness validation protocols.

new NativeTernary: A Self-Delimiting Binary Encoding with Unary Run-Length Hierarchy Markers for Ternary Neural Network Weights, Structured Data, and General Computing Infrastructure

Authors: Maharshi Savdhariya

Abstract: BitNet b1.58 (Ma et al., 2024) demonstrates that large language models can operate entirely on ternary weights {-1, 0, +1}, yet no native binary wire format exists for such models. NativeTernary closes this gap. We present NativeTernary, a binary encoding scheme that partitions the 2-bit pair space into three data symbols representing ternary values -- either balanced {-1, 0, +1} or unsigned {0, 1, 2} -- and a reserved structural delimiter. The central contribution is the use of unary run-length encoding to represent semantic hierarchy depth: a sequence of N consecutive delimiter pairs denotes a boundary of level N, encoding character, word, sentence, paragraph, and topic boundaries at cost 2, 4, 6, 8, and 10 bits respectively -- proportional to boundary rarity. The choice of which 2-bit pair serves as the delimiter is a design parameter: {11} is the primary embodiment, offering simple OR-gate detection; {00} is an alternative embodiment optimised for ultra-low-power CMOS systems, minimising switching activity. All four bit-pair choices are covered by the patent claims. We present three encoding variants: (1) the primary scheme with {11} as sole delimiter; (2) a dual-starter variant where both {10} and {11} initiate distinct symbol namespaces; and (3) an analysis of unsigned versus balanced ternary data mappings. We describe a path toward ternary-native general computing infrastructure requiring no hardware changes, and outline applications spanning ternary neural network weight storage, hierarchical natural language encoding, edge computing, IoT and satellite telemetry, industrial sensors, automotive systems, medical devices, gaming, and financial tick data. The decoder is a 10-line stateless state machine resilient to bitstream corruption.
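
The encoding is concrete enough to sketch. Below, data symbols map to 2-bit pairs with {11} reserved as the delimiter (the primary embodiment), and a run of N consecutive delimiter pairs marks a level-N boundary at a cost of 2N bits, matching the 2/4/6/8/10-bit costs quoted above. The specific assignment of 00/01/10 to the balanced values -1/0/+1 is our assumption; the abstract fixes only the delimiter:

```python
# Assumed data-symbol mapping (the abstract only reserves {11} as delimiter).
PAIR = {-1: "00", 0: "01", 1: "10"}
INV = {v: k for k, v in PAIR.items()}
DELIM = "11"

def encode(tokens):
    """tokens: ternary values in {-1, 0, 1} or ('B', n) hierarchy boundaries,
    where ('B', n) emits n consecutive delimiter pairs (a level-n boundary)."""
    out = []
    for t in tokens:
        out.append(DELIM * t[1] if isinstance(t, tuple) else PAIR[t])
    return "".join(out)

def decode(bits):
    """Stateless pair-by-pair scan: count delimiter runs, flush as boundaries."""
    tokens, run = [], 0
    for i in range(0, len(bits), 2):
        pair = bits[i:i + 2]
        if pair == DELIM:
            run += 1
            continue
        if run:                      # a run of n delimiters = level-n boundary
            tokens.append(("B", run))
            run = 0
        tokens.append(INV[pair])
    if run:
        tokens.append(("B", run))
    return tokens

msg = [1, 0, -1, ("B", 2), 1, 1, ("B", 1)]
encoded = encode(msg)
assert decode(encoded) == msg
print(encoded)  # 16 bits: 3 data pairs + level-2 boundary + 2 data pairs + level-1
```

The decoder needs only a run counter, which is the self-delimiting property: any 2-bit-aligned corruption is confined to the pairs it touches, and boundaries remain detectable by OR-ing the two bits of each pair.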

new Towards Intelligent Energy Security: A Unified Spatio-Temporal and Graph Learning Framework for Scalable Electricity Theft Detection in Smart Grids

Authors: AbdulQoyum A. Olowookere, Usman A. Oguntola, Ebenezer Leke Odekanle, Maridiyah A. Madehin, Aisha A. Adesope

Abstract: Electricity theft and non-technical losses (NTLs) remain critical challenges in modern smart grids, causing significant economic losses and compromising grid reliability. This study introduces the SmartGuard Energy Intelligence System (SGEIS), an integrated artificial intelligence framework for electricity theft detection and intelligent energy monitoring. The proposed system combines supervised machine learning, deep learning-based time-series modeling, Non-Intrusive Load Monitoring (NILM), and graph-based learning to capture both temporal and spatial consumption patterns. A comprehensive data processing pipeline is developed, incorporating feature engineering, multi-scale temporal analysis, and rule-based anomaly labeling. Deep learning models, including Long Short-Term Memory (LSTM), Temporal Convolutional Networks (TCN), and Autoencoders, are employed to detect abnormal usage patterns. In parallel, ensemble learning methods such as Random Forest, Gradient Boosting, XGBoost, and LightGBM are utilized for classification. To model grid topology and spatial dependencies, Graph Neural Networks (GNNs) are applied to identify correlated anomalies across interconnected nodes. The NILM module enhances interpretability by disaggregating appliance-level consumption from aggregate signals. Experimental results demonstrate strong performance, with Gradient Boosting achieving a ROC-AUC of 0.894, while graph-based models attain over 96% accuracy in identifying high-risk nodes. The hybrid framework improves detection robustness by integrating temporal, statistical, and spatial intelligence. Overall, SGEIS provides a scalable and practical solution for electricity theft detection, offering high accuracy, improved interpretability, and strong potential for real-world smart grid deployment.

new Hardware-Oriented Inference Complexity of Kolmogorov-Arnold Networks

Authors: Bilal Khalid, Pedro Freire, Sergei K. Turitsyn, Jaroslaw E. Prilepsky

Abstract: Kolmogorov-Arnold Networks (KANs) have recently emerged as a powerful architecture for various machine learning applications. However, their unique structure raises significant concerns regarding their computational overhead. Existing studies primarily evaluate KAN complexity in terms of Floating-Point Operations (FLOPs) required for GPU-based training and inference. Yet in many latency-sensitive and power-constrained deployment scenarios, such as neural network-driven non-linearity mitigation in optical communications or channel state estimation in wireless communications, training is performed offline and dedicated hardware accelerators are preferred over GPUs for inference. Recent hardware implementation studies report KAN complexity using platform-specific resource consumption metrics, such as Look-Up Tables, Flip-Flops, and Block RAMs. These metrics, however, require a full hardware design and synthesis stage, which limits their utility for early-stage architectural decisions and cross-platform comparisons. To address this, we derive generalized, platform-independent formulae for evaluating the hardware inference complexity of KANs in terms of Real Multiplications (RM), Bit Operations (BOP), and Number of Additions and Bit-Shifts (NABS). We extend our analysis across multiple KAN variants, including B-spline, Gaussian Radial Basis Function (GRBF), Chebyshev, and Fourier KANs. The proposed metrics can be computed directly from the network structure and enable a fair and straightforward inference complexity comparison between KAN and other neural network architectures.

new From Model-Based Screening to Data-Driven Surrogates: A Multi-Stage Workflow for Exploring Stochastic Agent-Based Models

Authors: Paul Saves, Matthieu Mastio, Nicolas Verstaevel, Benoit Gaudou

Abstract: Systematic exploration of Agent-Based Models (ABMs) is challenged by the curse of dimensionality and their inherent stochasticity. We present a multi-stage pipeline integrating the systematic design of experiments with machine learning surrogates. Using a predator-prey case study, our methodology proceeds in two steps. First, an automated model-based screening identifies dominant variables, assesses outcome variability, and segments the parameter space. Second, we train Machine Learning models to map the remaining nonlinear interaction effects. This approach automates the discovery of unstable regions where system outcomes are highly dependent on nonlinear interactions between many variables. Thus, this work provides modelers with a rigorous, hands-off framework for sensitivity analysis and policy testing, even when dealing with high-dimensional stochastic simulators.

new The limits of bio-molecular modeling with large language models: a cross-scale evaluation

Authors: Yaxin Xu, Yue Zhou, Tianyu Zhao, Fengwei An, Zhixiang Ren

Abstract: The modeling of bio-molecular systems across molecular scales remains a central challenge in scientific research. Large language models (LLMs) are increasingly applied to bio-molecular discovery, yet systematic evaluation across multi-scale biological problems and rigorous assessment of their tool-augmented capabilities remain limited. We reveal a systematic gap between LLM performance and mechanistic understanding through the proposed cross-scale bio-molecular benchmark BioMol-LLM-Bench, a unified framework comprising 26 downstream tasks that cover 4 distinct difficulty levels, with computational tools integrated for a more comprehensive evaluation. Evaluation of 13 representative models reveals 4 main findings: chain-of-thought data provides limited benefit and may even reduce performance on biological tasks; hybrid mamba-attention architectures are more effective for long bio-molecular sequences; supervised fine-tuning improves specialization at the cost of generalization; and current LLMs perform well on classification tasks but remain weak on challenging regression tasks. Together, these findings provide practical guidance for future LLM-based modeling of molecular systems.

new Scalable Variational Bayesian Fine-Tuning of LLMs via Orthogonalized Low-Rank Adapters

Authors: Haotian Xiang, Bingcong Li, Qin Lu

Abstract: When deploying large language models (LLMs) to safety-critical applications, uncertainty quantification (UQ) is of utmost importance to self-assess the reliability of the LLM-based decisions. However, such decisions typically suffer from overconfidence, particularly after parameter-efficient fine-tuning (PEFT) for downstream domain-specific tasks with limited data. Existing methods to alleviate this issue either rely on a post-hoc framework based on the Laplace approximation, which may yield suboptimal calibration depending on the training trajectory, or variational Bayesian training that requires multiple complete forward passes through the entire LLM backbone at inference time for Monte Carlo estimation, posing scalability challenges for deployment. To address these limitations, we build on the Bayesian last layer (BLL) model, where the LLM-based deterministic feature extractor is followed by random last layer parameters for uncertainty reasoning. Since existing low-rank adapters (LoRA) for PEFT have limited expressiveness due to rank collapse, we address this with Polar-decomposed Low-rank Adapter Representation (PoLAR), an orthogonalized parameterization paired with Riemannian optimization to enable more stable and expressive adaptation. Building on this PoLAR-BLL model, we leverage the variational (V) inference framework to put forth a scalable Bayesian fine-tuning approach which jointly seeks the PoLAR parameters and approximate posterior of the last layer parameters via alternating optimization. The resulting PoLAR-VBLL is a flexible framework that nicely integrates architecture-enhanced optimization with scalable Bayesian inference to endow LLMs with well-calibrated UQ. Our empirical results verify the effectiveness of PoLAR-VBLL in terms of generalization and uncertainty estimation on both in-distribution and out-of-distribution data for various common-sense reasoning tasks.

new Beauty in the Eye of AI: Aligning LLMs and Vision Models with Human Aesthetics in Network Visualization

Authors: Peng Zhang, Xuefeng Li, Xiaoqi Wang, Han-Wei Shen, Yifan Hu

Abstract: Network visualization has traditionally relied on heuristic metrics, such as stress, under the assumption that optimizing them leads to aesthetic and informative layouts. However, no single metric consistently produces the most effective results. A data-driven alternative is to learn from human preferences, where annotators select their favored visualization among multiple layouts of the same graphs. These human-preference labels can then be used to train a generative model that approximates human aesthetic preferences. However, obtaining human labels at scale is costly and time-consuming. As a result, this generative approach has so far been tested only with machine-labeled data. In this paper, we explore the use of large language models (LLMs) and vision models (VMs) as proxies for human judgment. Through a carefully designed user study involving 27 participants, we curated a large set of human preference labels. We used this data both to better understand human preferences and to bootstrap LLM/VM labelers. We show that prompt engineering that combines few-shot examples and diverse input formats, such as image embeddings, significantly improves LLM-human alignment, and additional filtering by the confidence score of the LLM pushes the alignment to human-human levels. Furthermore, we demonstrate that carefully trained VMs can achieve VM-human alignment at a level comparable to that between human annotators. Our results suggest that AI can feasibly serve as a scalable proxy for human labelers.

new Adaptive Threshold-Driven Continuous Greedy Method for Scalable Submodular Optimization

Authors: Mohammadreza Rostami, Solmaz S. Kia

Abstract: Submodular maximization under matroid constraints is a fundamental problem in combinatorial optimization with applications in sensing, data summarization, active learning, and resource allocation. While the Sequential Greedy (SG) algorithm achieves only a $\frac{1}{2}$-approximation due to irrevocable selections, Continuous Greedy (CG) attains the optimal $\bigl(1-\frac{1}{e}\bigr)$-approximation via the multilinear relaxation, at the cost of a progressively dense decision vector that forces agents to exchange feature embeddings for nearly every ground-set element. We propose \textit{ATCG} (\underline{A}daptive \underline{T}hresholded \underline{C}ontinuous \underline{G}reedy), which gates gradient evaluations behind a per-partition progress ratio $\eta_i$, expanding each agent's active set only when current candidates fail to capture sufficient marginal gain, thereby directly bounding which feature embeddings are ever transmitted. Theoretical analysis establishes a curvature-aware approximation guarantee with effective factor $\tau_{\mathrm{eff}}=\max\{\tau,1-c\}$, interpolating between the threshold-based guarantee and the low-curvature regime where \textit{ATCG} recovers the performance of CG. Experiments on a class-balanced prototype selection problem over a subset of the CIFAR-10 animal dataset show that \textit{ATCG} achieves objective values comparable to those of the full CG method while substantially reducing communication overhead through adaptive active-set expansion.

new Adversarial Robustness of Deep State Space Models for Forecasting

Authors: Sribalaji C. Anand, George J. Pappas

Abstract: State-space models (SSMs) for time-series forecasting have demonstrated strong empirical performance on benchmark datasets, yet their robustness under adversarial perturbations is poorly understood. We address this gap through a control-theoretic lens, focusing on the recently proposed Spacetime SSM forecaster. We first establish that the decoder-only Spacetime architecture can represent the optimal Kalman predictor when the underlying data-generating process is autoregressive - a property no other SSM possesses. Building on this, we formulate robust forecaster design as a Stackelberg game against worst-case stealthy adversaries constrained by a detection budget, and solve it via adversarial training. We derive closed-form bounds on adversarial forecasting error that expose how open-loop instability, closed-loop instability, and decoder state dimension each amplify vulnerability - offering actionable principles towards robust forecaster design. Finally, we show that even adversaries with no access to the forecaster can nonetheless construct effective attacks by exploiting the model's locally linear input-output behavior, bypassing gradient computations entirely. Experiments on the Monash benchmark datasets highlight that model-free attacks, without any gradient computation, can cause at least 33% more error than projected gradient descent with a small step size.

new MetaSAEs: Joint Training with a Decomposability Penalty Produces More Atomic Sparse Autoencoder Latents

Authors: Matthew Levinson

Abstract: Sparse autoencoders (SAEs) are increasingly used for safety-relevant applications including alignment detection and model steering. These use cases require SAE latents to be as atomic as possible. Each latent should represent a single coherent concept drawn from a single underlying representational subspace. In practice, SAE latents blend representational subspaces together. A single feature can activate across semantically distinct contexts that share no true common representation, muddying an already complex picture of model computation. We introduce a joint training objective that directly penalizes this subspace blending. A small meta SAE is trained alongside the primary SAE to sparsely reconstruct the primary SAE's decoder columns; the primary SAE is penalized whenever its decoder directions are easy to reconstruct from the meta dictionary. This occurs whenever latent directions lie in a subspace spanned by other primary directions. This creates gradient pressure toward more mutually independent decoder directions that resist sparse meta-compression. On GPT-2 large (layer 20), the selected configuration reduces mean $|\varphi|$ by 7.5% relative to an identical solo SAE trained on the same data. Automated interpretability (fuzzing) scores improve by 7.6%, providing external validation of the atomicity gain independent of the training and co-occurrence metrics. Reconstruction overhead is modest. Results on Gemma 2 9B are directional. On not-fully-converged SAEs, the same parameterization yields the best results, a $+8.6\%$ $\Delta$Fuzz. Though directional, this is an encouraging sign that the method transfers to a larger model. Qualitative analysis confirms that features firing on polysemantic tokens are split into semantically distinct sub-features, each specializing in a distinct representational subspace.

new Olmo Hybrid: From Theory to Practice and Back

Authors: William Merrill, Yanhong Li, Tyler Romero, Anej Svete, Caia Costello, Pradeep Dasigi, Dirk Groeneveld, David Heineman, Bailey Kuehl, Nathan Lambert, Chuan Li, Kyle Lo, Saumya Malik, DJ Matusz, Benjamin Minixhofer, Jacob Morrison, Luca Soldaini, Finbarr Timbers, Pete Walsh, Noah A. Smith, Hannaneh Hajishirzi, Ashish Sabharwal

Abstract: Recent work has demonstrated the potential of non-transformer language models, especially linear recurrent neural networks (RNNs) and hybrid models that mix recurrence and attention. Yet there is no consensus on whether the potential benefits of these new architectures justify the risk and effort of scaling them up. To address this, we provide evidence for the advantages of hybrid models over pure transformers on several fronts. First, theoretically, we show that hybrid models do not merely inherit the expressivity of transformers and linear RNNs, but can express tasks beyond both, such as code execution. Putting this theory to practice, we train Olmo Hybrid, a 7B-parameter model largely comparable to Olmo 3 7B but with the sliding window layers replaced by Gated DeltaNet layers. We show that Olmo Hybrid outperforms Olmo 3 across standard pretraining and mid-training evaluations, demonstrating the benefit of hybrid models in a controlled, large-scale setting. We find that the hybrid model scales significantly more efficiently than the transformer, explaining its higher performance. However, it is unclear why greater expressivity on specific formal problems should result in better scaling or superior performance on downstream tasks unrelated to those problems. To explain this apparent gap, we return to theory and argue why increased expressivity should translate to better scaling efficiency, completing the loop. Overall, our results suggest that hybrid models mixing attention and recurrent layers are a powerful extension to the language modeling paradigm: not merely to reduce memory during inference, but as a fundamental way to obtain more expressive models that scale better during pretraining.

new Neural Operators for Multi-Task Control and Adaptation

Authors: David Sewell, Xingjian Li, Stepan Tretiakov, Krishna Kumar, David Fridovich-Keil

Abstract: Neural operator methods have emerged as powerful tools for learning mappings between infinite-dimensional function spaces, yet their potential in optimal control remains largely unexplored. We focus on multi-task control problems, whose solution is a mapping from task description (e.g., cost or dynamics functions) to optimal control law (e.g., feedback policy). We approximate these solution operators using a permutation-invariant neural operator architecture. Across a range of parametric optimal control environments and a locomotion benchmark, a single operator trained via behavioral cloning accurately approximates the solution operator and generalizes to unseen tasks, out-of-distribution settings, and varying amounts of task observations. We further show that the branch-trunk structure of our neural operator architecture enables efficient and flexible adaptation to new tasks. We develop structured adaptation strategies ranging from lightweight updates to full-network fine-tuning, achieving strong performance across different data and compute settings. Finally, we introduce meta-trained operator variants that optimize the initialization for few-shot adaptation. These methods enable rapid task adaptation with limited data and consistently outperform a popular meta-learning baseline. Together, our results demonstrate that neural operators provide a unified and efficient framework for multi-task control and adaptation.

new Earth Embeddings Reveal Diverse Urban Signals from Space

Authors: Wenjing Gong, Udbhav Srivastava, Yuchen Wang, Yuhao Jia, Qifan Wu, Weishan Bai, Yifan Yang, Xiao Huang, Xinyue Ye

Abstract: Conventional urban indicators derived from censuses, surveys, and administrative records are often costly, spatially inconsistent, and slow to update. Recent geospatial foundation models enable Earth embeddings, compact satellite image representations transferable across downstream tasks, but their utility for neighborhood-scale urban monitoring remains unclear. Here, we benchmark three Earth embedding families, AlphaEarth, Prithvi, and Clay, for urban signal prediction across six U.S. metropolitan areas from 2020 to 2023. Using a unified supervised-learning framework, we predict 14 neighborhood-level indicators spanning crime, income, health, and travel behavior, and evaluate performance under four settings: global, city-wise, year-wise, and city-year. Results show that Earth embeddings capture substantial urban variation, with the highest predictive skill for outcomes more directly tied to built-environment structure, including chronic health burdens and dominant commuting modes. By contrast, indicators shaped more strongly by fine-scale behavior and local policy, such as cycling, remain difficult to infer. Predictive performance varies markedly across cities but remains comparatively stable across years, indicating strong spatial heterogeneity alongside temporal robustness. Exploratory analysis suggests that cross-city variation in predictive performance is associated with urban form in task-specific ways. Controlled dimensionality experiments show that representation efficiency is critical: compact 64-dimensional AlphaEarth embeddings remain more informative than 64-dimensional reductions of Prithvi and Clay. This study establishes a benchmark for evaluating Earth embeddings in urban remote sensing and demonstrates their potential as scalable, low-cost features for SDG-aligned neighborhood-scale urban monitoring.

new Super Agents and Confounders: Influence of surrounding agents on vehicle trajectory prediction

Authors: Daniel Jost, Luca Paparusso, Martin Stoll, Jörg Wagner, Raghu Rajan, Joschka Bödecker

Abstract: In highly interactive driving scenes, trajectory prediction is conditioned on information from surrounding traffic participants such as cars and pedestrians. Our main contribution is a comprehensive analysis of state-of-the-art trajectory predictors, which reveals a surprising and critical flaw: many surrounding agents degrade prediction accuracy rather than improve it. Using Shapley-based attribution, we rigorously demonstrate that models learn unstable and non-causal decision-making schemes that vary significantly across training runs. Building on these insights, we propose to integrate a Conditional Information Bottleneck (CIB), which does not require additional supervision and is trained to effectively compress agent features as well as ignore those that are not beneficial for the prediction task. Comprehensive experiments using multiple datasets and model architectures demonstrate that this simple yet effective approach not only improves overall trajectory prediction performance in many cases but also increases robustness to different perturbations. Our results highlight the importance of selectively integrating contextual information, which can often contain spurious or misleading signals, in trajectory prediction. Moreover, we provide interpretable metrics for identifying non-robust behavior and present a promising avenue towards a solution.

new Investigating Data Interventions for Subgroup Fairness: An ICU Case Study

Authors: Erin Tan, Judy Hanwen Shen, Irene Y. Chen

Abstract: In high-stakes settings where machine learning models are used to automate decision-making about individuals, the presence of algorithmic bias can exacerbate systemic harm to certain subgroups of people. These biases often stem from the underlying training data. In practice, interventions to "fix the data" depend on the actual additional data sources available -- where many are less than ideal. In these cases, the effects of data scaling on subgroup performance become volatile, as the improvements from increased sample size are counteracted by the introduction of distribution shifts in the training set. In this paper, we investigate the limitations of combining data sources to improve subgroup performance within the context of healthcare. Clinical models are commonly trained on datasets comprising patient electronic health record (EHR) data from different hospitals or admission departments. Across two such datasets, the eICU Collaborative Research Database and the MIMIC-IV dataset, we find that data addition can both help and hurt model fairness and performance, and many intuitive strategies for data selection are unreliable. We compare model-based post-hoc calibration and data-centric addition strategies to find that the combination of both is important to improve subgroup performance. Our work questions the traditional dogma of "better data" for overcoming fairness challenges by comparing and combining data- and model-based approaches.

new Improving Feasibility via Fast Autoencoder-Based Projections

Authors: Maria Chzhen, Priya L. Donti

Abstract: Enforcing complex (e.g., nonconvex) operational constraints is a critical challenge in real-world learning and control systems. However, existing methods struggle to efficiently enforce general classes of constraints. To address this, we propose a novel data-driven amortized approach that uses a trained autoencoder as an approximate projector to provide fast corrections to infeasible predictions. Specifically, we train an autoencoder using an adversarial objective to learn a structured, convex latent representation of the feasible set. This enables rapid correction of neural network outputs by projecting their associated latent representations onto a simple convex shape before decoding into the original feasible set. We test our approach on a diverse suite of constrained optimization and reinforcement learning problems with challenging nonconvex constraints. Results show that our method effectively enforces constraints at a low computational cost, offering a practical alternative to expensive feasibility correction techniques based on traditional solvers.
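The correction step described above can be sketched in a few lines of numpy. Here a hypothetical linear encoder/decoder stands in for the trained autoencoder, and an L2 ball plays the role of the simple convex latent set; all names and shapes are illustrative, not from the paper.

```python
import numpy as np

# Hypothetical linear "autoencoder": in practice these are trained networks
# whose latent image of the feasible set is (approximately) convex.
rng = np.random.default_rng(0)
W_enc = rng.standard_normal((2, 4)) * 0.5   # encoder: R^4 -> R^2
W_dec = np.linalg.pinv(W_enc)               # decoder: R^2 -> R^4

def project_feasible(y, radius=1.0):
    """Correct an infeasible prediction y by projecting its latent code
    onto a simple convex set (here: an L2 ball), then decoding."""
    z = W_enc @ y
    norm = np.linalg.norm(z)
    if norm > radius:                        # Euclidean projection onto the ball
        z = z * (radius / norm)
    return W_dec @ z

y_infeasible = rng.standard_normal(4) * 10.0
y_corrected = project_feasible(y_infeasible)
```

Because the latent set is convex, the projection is a cheap closed-form operation, which is the source of the speedup over solver-based feasibility correction.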

new Online learning of smooth functions on $\mathbb{R}$

Authors: Jesse Geneson, Kuldeep Singh, Alexander Wang

Abstract: We study adversarial online learning of real-valued functions on $\mathbb{R}$. In each round the learner is queried at $x_t\in\mathbb{R}$, predicts $\hat y_t$, and then observes the true value $f(x_t)$; performance is measured by cumulative $p$-loss $\sum_{t\ge 1}|\hat y_t-f(x_t)|^p$. For the class \[ \mathcal{G}_q=\Bigl\{f:\mathbb{R}\to\mathbb{R}\ \text{absolutely continuous}:\ \int_{\mathbb{R}}|f'(x)|^q\,dx\le 1\Bigr\}, \] we show that the standard model becomes ill-posed on $\mathbb{R}$: for every $p\ge 1$ and $q>1$, an adversary can force infinite loss. Motivated by this obstruction, we analyze three modified learning scenarios that limit the influence of queries that are far from previously observed inputs. In Scenario 1 the adversary must choose each new query within distance $1$ of some past query. In Scenario 2 the adversary may query anywhere, but the learner is penalized only on rounds whose query lies within distance $1$ of a past query. In Scenario 3 the loss in round $t$ is multiplied by a weight $g(\min_{j<t}|x_t-x_j|)$ that decays with the distance from $x_t$ to the nearest past query.

new Choosing the Right Regularizer for Applied ML: Simulation Benchmarks of Popular Scikit-learn Regularization Frameworks

Authors: Benjamin S. Knight, Ahsaas Bajaj

Abstract: This study surveys the historical development of regularization, tracing its evolution from stepwise regression in the 1960s to recent advancements in formal error control, structured penalties for non-independent features, Bayesian methods, and l0-based regularization (among other techniques). We empirically evaluate the performance of four canonical frameworks -- Ridge, Lasso, ElasticNet, and Post-Lasso OLS -- across 134,400 simulations spanning a 7-dimensional manifold grounded in eight production-grade machine learning models. Our findings demonstrate that for prediction accuracy when the sample-to-feature ratio is sufficient (n/p >= 78), Ridge, Lasso, and ElasticNet are nearly interchangeable. However, we find that Lasso recall is highly fragile under multicollinearity; at high condition numbers (kappa) and low SNR, Lasso recall collapses to 0.18 while ElasticNet maintains 0.93. Consequently, we advise practitioners against using Lasso or Post-Lasso OLS at high kappa with small sample sizes. The analysis concludes with an objective-driven decision guide to assist machine learning engineers in selecting the optimal scikit-learn-supported framework based on observable feature space attributes.

new Simple yet Effective: Low-Rank Spatial Attention for Neural Operators

Authors: Zherui Yang, Haiyang Xin, Tao Du, Ligang Liu

Abstract: Neural operators have emerged as data-driven surrogates for solving partial differential equations (PDEs), and their success hinges on efficiently modeling the long-range, global coupling among spatial points induced by the underlying physics. In many PDE regimes, the induced global interaction kernels are empirically compressible, exhibiting rapid spectral decay that admits low-rank approximations. We leverage this observation to unify representative global mixing modules in neural operators under a shared low-rank template: compressing high-dimensional pointwise features into a compact latent space, processing global interactions within it, and reconstructing the global context back to spatial points. Guided by this view, we introduce Low-Rank Spatial Attention (LRSA) as a clean and direct instantiation of this template. Crucially, unlike prior approaches that often rely on non-standard aggregation or normalization modules, LRSA is built purely from standard Transformer primitives, i.e., attention, normalization, and feed-forward networks, yielding a concise block that is straightforward to implement and directly compatible with hardware-optimized kernels. In our experiments, such a simple construction is sufficient to achieve high accuracy, yielding an average error reduction of over 17\% relative to second-best methods, while remaining stable and efficient in mixed-precision training.
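The compress-process-reconstruct template can be sketched directly in numpy. The latent queries, dimensions, and the latent "mixing" step below are illustrative stand-ins, not the paper's architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
N, d, r = 256, 32, 8             # spatial points, feature dim, latent rank (r << N)
X = rng.standard_normal((N, d))
L = rng.standard_normal((r, d))  # learned latent queries (random stand-in here)

# 1) compress: r latent tokens attend over the N spatial points -> O(N*r*d)
A_down = softmax(L @ X.T / np.sqrt(d))   # (r, N)
Z = A_down @ X                           # (r, d) latent summary

# 2) process global interactions inside the small latent space
# (stand-in for the block's attention/FFN over the r latent tokens)
Z = np.tanh(Z)

# 3) reconstruct: each spatial point attends back over the r latent tokens
A_up = softmax(X @ Z.T / np.sqrt(d))     # (N, r)
context = A_up @ Z                       # (N, d) global context per point
```

Each stage costs O(N·r·d) rather than the O(N²·d) of all-to-all attention, which is the source of the linear memory complexity.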

new Evaluation of Bagging Predictors with Kernel Density Estimation and Bagging Score

Authors: Philipp Seitz, Jan Schmitt, Andreas Schiffler

Abstract: For a larger set of predictions from several differently trained machine learning models, known as bagging predictors, the mean of all predictions is taken by default. Nevertheless, this procedure can deviate from the actual ground truth in certain parameter regions. An approach is presented to determine a representative prediction y_BS from such a set using Kernel Density Estimation (KDE) in nonlinear regression with Neural Networks (NN), which simultaneously provides an associated quality criterion beta_BS, called the Bagging Score (BS), that reflects the confidence of the obtained ensemble prediction. It is shown that the new approach yields better predictions than the common use of the mean or median. In addition, the method is contrasted with several approaches to nonlinear regression from the literature, achieving a top ranking in each of the calculated error values without using any optimization or feature selection technique.
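A minimal numpy sketch of the KDE idea (the paper's exact bandwidth choice and score definition may differ): pick the mode of a kernel density estimate over the ensemble's predictions as y_BS, and read the density at that mode as the score beta_BS.

```python
import numpy as np

def bagging_score(preds, bandwidth=None):
    """Representative ensemble prediction y_BS via a 1-D Gaussian KDE over
    the bagged predictions, plus beta_BS: the density at the KDE mode,
    read here as the confidence of the ensemble prediction."""
    preds = np.asarray(preds, dtype=float)
    n = preds.size
    if bandwidth is None:  # Silverman's rule of thumb
        bandwidth = max(1.06 * preds.std() * n ** (-1 / 5), 1e-3)
    grid = np.linspace(preds.min() - 3 * bandwidth, preds.max() + 3 * bandwidth, 512)
    kernels = np.exp(-0.5 * ((grid[:, None] - preds[None, :]) / bandwidth) ** 2)
    dens = kernels.sum(axis=1) / (n * bandwidth * np.sqrt(2 * np.pi))
    i = int(dens.argmax())
    return grid[i], dens[i]

# Bimodal ensemble: four models agree near 0, two outliers sit near 3.
# The plain mean (1.0) falls between the clusters; the KDE mode does not.
preds = [0.0, 0.05, -0.05, 0.1, 2.9, 3.0]
y_bs, beta_bs = bagging_score(preds)
```

This is exactly the failure mode of the default mean that the abstract describes: with a multimodal set of predictions, the mean lands in a region no model actually predicted.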

new BlazeFL: Fast and Deterministic Federated Learning Simulation

Authors: Kitsuya Azuma, Takayuki Nishio

Abstract: Federated learning (FL) research increasingly relies on single-node simulations with hundreds or thousands of virtual clients, making both efficiency and reproducibility essential. Yet parallel client training often introduces nondeterminism through shared random state and scheduling variability, forcing researchers to trade throughput for reproducibility or to implement custom control logic within complex frameworks. We present BlazeFL, a lightweight framework for single-node FL simulation that alleviates this trade-off through free-threaded shared-memory execution and deterministic randomness management. BlazeFL uses thread-based parallelism with in-memory parameter exchange between the server and clients, avoiding serialization and inter-process communication overhead. To support deterministic execution, BlazeFL assigns isolated random number generator (RNG) streams to clients. Under a fixed software/hardware stack, and when stochastic operators consume BlazeFL-managed generators, this design yields bitwise-identical results across repeated high-concurrency runs in both thread-based and process-based modes. In CIFAR-10 image-classification experiments, BlazeFL substantially reduces execution time relative to a widely used open-source baseline, achieving up to 3.1$\times$ speedup on communication-dominated workloads while preserving a lightweight dependency footprint. Our open-source implementation is available at: https://github.com/kitsuyaazuma/blazefl.
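The isolated-RNG-stream idea can be illustrated with numpy's `SeedSequence.spawn`; this is a sketch of the mechanism, not BlazeFL's actual API. Each client owns its own stream, so per-client draws do not depend on which order (or on which thread) clients happen to run.

```python
import numpy as np

def make_client_rngs(root_seed, num_clients):
    """Give every simulated client its own isolated RNG stream, so results
    are identical regardless of execution order or concurrency."""
    children = np.random.SeedSequence(root_seed).spawn(num_clients)
    return [np.random.default_rng(s) for s in children]

def run(order, root_seed=42, num_clients=4):
    """Process clients in the given order; with isolated streams the order
    cannot leak into any client's random draws."""
    rngs = make_client_rngs(root_seed, num_clients)
    return {cid: rngs[cid].standard_normal(3).tolist() for cid in order}

# Two runs that visit clients in different orders produce identical
# per-client draws -- the property a shared global RNG would violate.
a = run(order=[0, 1, 2, 3])
b = run(order=[3, 1, 0, 2])
```

With a single shared generator, the second run would consume random numbers in a different order and the per-client draws would diverge.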


new Neural Global Optimization via Iterative Refinement from Noisy Samples

Authors: Qusay Muzaffar, David Levin, Michael Werman

Abstract: Global optimization of black-box functions from noisy samples is a fundamental challenge in machine learning and scientific computing. Traditional methods such as Bayesian Optimization often converge to local minima on multi-modal functions, while gradient-free methods require many function evaluations. We present a novel neural approach that learns to find global minima through iterative refinement. Our model takes noisy function samples and their fitted spline representation as input, then iteratively refines an initial guess toward the true global minimum. Trained on randomly generated functions with ground truth global minima obtained via exhaustive search, our method achieves a mean error of 8.05 percent on challenging multi-modal test functions, compared to 36.24 percent for the spline initialization, a 28.18 percent improvement. The model successfully finds global minima in 72 percent of test cases with error below 10 percent, demonstrating learned optimization principles rather than mere curve fitting. Our architecture combines encoding of multiple modalities including function values, derivatives, and spline coefficients with iterative position updates, enabling robust global optimization without requiring derivative information or multiple restarts.

new Algebraic Diversity: Group-Theoretic Spectral Estimation from Single Observations

Authors: Mitchell A. Thornton

Abstract: We prove that temporal averaging over multiple observations can be replaced by algebraic group action on a single observation for second-order statistical estimation. A General Replacement Theorem establishes conditions under which a group-averaged estimator from one snapshot achieves equivalent subspace decomposition to multi-snapshot covariance estimation, and an Optimality Theorem proves that the symmetric group is universally optimal (yielding the KL transform). The framework unifies the DFT, DCT, and KLT as special cases of group-matched spectral transforms, with a closed-form double-commutator eigenvalue problem for polynomial-time optimal group selection. Five applications are demonstrated: MUSIC DOA estimation from a single snapshot, massive MIMO channel estimation with 64% throughput gain, single-pulse waveform classification at 90% accuracy, graph signal processing with non-Abelian groups, and a new algebraic analysis of transformer LLMs revealing that RoPE uses the wrong algebraic group for 70-80% of attention heads across five models (22,480 head observations), that the optimal group is content-dependent, and that spectral-concentration-based pruning improves perplexity at the 13B scale. All diagnostics require a single forward pass with no gradients or training.
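A toy numpy illustration of the theorem's flavor for the cyclic shift group (not the paper's general construction): averaging one snapshot's outer product over the group orbit yields a circulant covariance whose eigenbasis is the DFT, matching the claim that group-matched spectral transforms recover classical transforms as special cases.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
x = rng.standard_normal(n)        # a single snapshot

# Replace temporal averaging with averaging over the orbit of x under the
# cyclic shift group Z_n: one observation, n algebraic "looks".
C = sum(np.outer(np.roll(x, s), np.roll(x, s)) for s in range(n)) / n

# The group-averaged covariance is circulant, so the group-matched
# transform is the DFT: the eigenvalues of C equal |FFT(x)|^2 / n.
eig = np.sort(np.linalg.eigvalsh(C))
dft = np.sort(np.abs(np.fft.fft(x)) ** 2 / n)
```

No second snapshot was needed: the group action supplied the diversity that temporal averaging normally provides.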

new Delayed Homomorphic Reinforcement Learning for Environments with Delayed Feedback

Authors: Jongsoo Lee, Jangwon Kim, Soohee Han

Abstract: Reinforcement learning in real-world systems is often accompanied by delayed feedback, which breaks the Markov assumption and impedes both learning and control. Canonical state augmentation approaches cause a state-space explosion, which introduces a severe sample-complexity burden. Despite recent progress, state-of-the-art augmentation-based baselines remain incomplete: they either predominantly reduce the burden on the critic or adopt non-unified treatments for the actor and critic. To provide a structured and sample-efficient solution, we propose delayed homomorphic reinforcement learning (DHRL), a framework grounded in MDP homomorphisms that collapses belief-equivalent augmented states and enables efficient policy learning on the resulting abstract MDP without loss of optimality. We provide theoretical analyses of state-space compression bounds and sample complexity, and introduce a practical algorithm. Experiments on continuous control tasks in the MuJoCo benchmark confirm that our algorithm outperforms strong augmentation-based baselines, particularly under long delays.

new Automated Attention Pattern Discovery at Scale in Large Language Models

Authors: Jonathan Katzy, Razvan-Mihai Popescu, Erik Mekkes, Arie van Deursen, Maliheh Izadi

Abstract: Large language models have found success by scaling up capabilities to work in general settings. The same can unfortunately not be said for interpretability methods. The current trend in mechanistic interpretability is to provide precise explanations of specific behaviors in controlled settings. These often do not generalize, or are too resource intensive for larger studies. In this work we propose to study repeated behaviors in large language models by mining completion scenarios in Java code datasets, exploiting the structured nature of code. We collect the attention patterns generated in the attention heads to demonstrate that they are scalable signals for global interpretability of model components. We show that vision models offer a promising direction for analyzing attention patterns at scale. To demonstrate this, we introduce the Attention Pattern Masked Autoencoder (AP-MAE), a vision transformer-based model that efficiently reconstructs masked attention patterns. Experiments on StarCoder2 show that AP-MAE (i) reconstructs masked attention patterns with high accuracy, (ii) generalizes across unseen models with minimal degradation, (iii) reveals recurring patterns across inferences, (iv) predicts whether a generation will be correct without access to ground truth, with accuracies ranging from 55% to 70% depending on the task, and (v) enables targeted interventions that increase accuracy by 13.6% when applied selectively, but cause collapse when applied excessively. These results establish attention patterns as a scalable signal for interpretability and demonstrate that AP-MAE provides a transferable foundation for both analysis and intervention in large language models. Beyond its standalone value, AP-MAE also serves as a selection procedure to guide fine-grained mechanistic approaches. We release code and models to support future work in large-scale interpretability.

new CountsDiff: A Diffusion Model on the Natural Numbers for Generation and Imputation of Count-Based Data

Authors: Renzo G. Soatto, Anders Hoel, Greycen Ren, Shorna Alam, Stephen Bates, Nikolaos P. Daskalakis, Caroline Uhler, Maria Skoularidou

Abstract: Diffusion models have excelled at generative tasks for both continuous and token-based domains, but their application to discrete ordinal data remains underdeveloped. We present CountsDiff, a diffusion framework designed to natively model distributions on the natural numbers. CountsDiff extends the Blackout diffusion framework by simplifying its formulation through a direct parameterization in terms of a survival probability schedule and an explicit loss weighting. This introduces flexibility through design parameters with direct analogues in existing diffusion modeling frameworks. Beyond this reparameterization, CountsDiff introduces features from modern diffusion models, previously absent in counts-based domains, including continuous-time training, classifier-free guidance, and churn/remasking reverse dynamics that allow non-monotone reverse trajectories. We propose an initial instantiation of CountsDiff and validate it on natural image datasets (CIFAR-10, CelebA), exploring the effects of varying the introduced design parameters in a complex, well-studied, and interpretable data domain. We then highlight biological count assays as a natural use case, evaluating CountsDiff on single-cell RNA-seq imputation in a fetal cell and heart cell atlas. Remarkably, we find that even this simple instantiation matches or surpasses the performance of a state-of-the-art discrete generative model and leading RNA-seq imputation methods, while leaving substantial headroom for further gains through optimized design choices in future work.
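The survival-probability parameterization has a simple reading for the forward (corruption) process: each unit of count survives independently with probability s(t), so the corrupted count is binomially thinned. The schedule below is a hypothetical example, not the paper's choice.

```python
import numpy as np

rng = np.random.default_rng(0)

def survival(t):
    """A hypothetical survival-probability schedule s(t): the fraction of
    the original counts that survives at diffusion time t (s(0)=1, s(1)~0)."""
    return np.exp(-5.0 * t)

def forward_sample(x0, t):
    """Blackout-style forward corruption of count data: each count unit
    survives independently with prob. s(t), i.e. x_t ~ Binomial(x0, s(t))."""
    return rng.binomial(x0, survival(t))

x0 = np.array([10, 3, 0, 25])
xt_early = forward_sample(x0, t=0.01)   # nearly uncorrupted
xt_late = forward_sample(x0, t=1.0)     # almost fully "blacked out"
```

The reverse model is then trained to undo this thinning; choosing s(t) plays the role that the noise schedule plays in Gaussian diffusion.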

new Automated Conjecture Resolution with Formal Verification

Authors: Haocheng Ju, Guoxiong Gao, Jiedong Jiang, Bin Wu, Zeming Sun, Leheng Chen, Yutong Wang, Yuefeng Wang, Zichen Wang, Wanyi He, Peihao Wu, Liang Xiao, Ruochuan Liu, Bryan Dai, Bin Dong

Abstract: Recent advances in large language models have significantly improved their ability to perform mathematical reasoning, extending from elementary problem solving to increasingly capable performance on research-level problems. However, reliably solving and verifying such problems remains challenging due to the inherent ambiguity of natural language reasoning. In this paper, we propose an automated framework for tackling research-level mathematical problems that integrates natural language reasoning with formal verification, enabling end-to-end problem solving with minimal human intervention. Our framework consists of two components: an informal reasoning agent, Rethlas, and a formal verification agent, Archon. Rethlas mimics the workflow of human mathematicians by combining reasoning primitives with our theorem search engine, Matlas, to explore solution strategies and construct candidate proofs. Archon, equipped with our formal theorem search engine LeanSearch, translates informal arguments into formalized Lean 4 projects through structured task decomposition, iterative refinement, and automated proof synthesis, ensuring machine-checkable correctness. Using this framework, we automatically resolve an open problem in commutative algebra and formally verify the resulting proof in Lean 4 with essentially no human involvement. Our experiments demonstrate that strong theorem retrieval tools enable the discovery and application of cross-domain mathematical techniques, while the formal agent is capable of autonomously filling nontrivial gaps in informal arguments. More broadly, our work illustrates a promising paradigm for mathematical research in which informal and formal reasoning systems, equipped with theorem retrieval tools, operate in tandem to produce verifiable results, substantially reduce human effort, and offer a concrete instantiation of human-AI collaborative mathematical research.

new Representational Collapse in Multi-Agent LLM Committees: Measurement and Diversity-Aware Consensus

Authors: Dipkumar Patel

Abstract: Multi-agent LLM committees replicate the same model under different role prompts and aggregate outputs by majority vote, implicitly assuming that agents contribute complementary evidence. We embed each agent's chain-of-thought rationale and measure pairwise similarity: across 100 GSM8K questions with three Qwen2.5-14B agents, mean cosine similarity is 0.888 and effective rank is 2.17 out of 3.0, a failure mode we term representational collapse. DALC, a training-free consensus protocol that computes diversity weights from embedding geometry, reaches 87% on GSM8K versus 84% for self-consistency at 26% lower token cost. Ablation experiments reveal 1-3 point per-protocol run-to-run variance, confirm that hint sharing contributes more than diversity weighting alone, and show that encoder choice strongly modulates collapse severity (cosine 0.908 with mxbai versus 0.888 with nomic) and downstream accuracy. The more robust finding is that collapse is measurable, worsens on harder tasks, and that the choice of embedding proxy is a first-order design decision for any latent communication protocol.
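The collapse diagnostics are straightforward to compute from the rationale embeddings. The sketch below uses one standard definition of effective rank (exponential of the entropy of the normalized singular values); the paper's embedding model and exact variant may differ.

```python
import numpy as np

def collapse_metrics(E):
    """Mean pairwise cosine similarity of the rows of E (one rationale
    embedding per agent) and the effective rank of E. High cosine and low
    effective rank indicate representational collapse."""
    X = E / np.linalg.norm(E, axis=1, keepdims=True)
    iu = np.triu_indices(len(E), k=1)
    mean_cos = float((X @ X.T)[iu].mean())
    sv = np.linalg.svd(E, compute_uv=False)
    p = sv / sv.sum()
    eff_rank = float(np.exp(-(p * np.log(p + 1e-12)).sum()))
    return mean_cos, eff_rank

rng = np.random.default_rng(0)
base = rng.standard_normal(8)
collapsed = np.tile(base, (3, 1)) + 0.01 * rng.standard_normal((3, 8))  # near-identical agents
diverse = rng.standard_normal((3, 8))                                    # independent agents
cos_c, rank_c = collapse_metrics(collapsed)
cos_d, rank_d = collapse_metrics(diverse)
```

For three agents, effective rank is bounded by 3.0; values near 1.0 mean the committee's rationales span essentially a single direction.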

new k-Maximum Inner Product Attention for Graph Transformers and the Expressive Power of GraphGPS

Authors: Jonas De Schouwer, Haitz Sáez de Ocáriz Borde, Xiaowen Dong

Abstract: Graph transformers have shown promise in overcoming limitations of traditional graph neural networks, such as oversquashing and difficulties in modelling long-range dependencies. However, their application to large-scale graphs is hindered by the quadratic memory and computational complexity of the all-to-all attention mechanism. Although alternatives such as linearized attention and restricted attention patterns have been proposed, these often degrade performance or limit expressive power. To better balance efficiency and effectiveness, we introduce k-Maximum Inner Product (k-MIP) attention for graph transformers. k-MIP attention selects the most relevant key nodes per query via a top-k operation, yielding a sparse yet flexible attention pattern. Combined with an attention score computation based on symbolic matrices, this results in linear memory complexity and practical speedups of up to an order of magnitude compared to all-to-all attention, enabling the processing of graphs with over 500k nodes on a single A100 GPU. We provide a theoretical analysis of expressive power, showing that k-MIP attention does not compromise the expressiveness of graph transformers: specifically, we prove that k-MIP transformers can approximate any full-attention transformer to arbitrary precision. In addition, we analyze the expressive power of the GraphGPS framework, in which we integrate our attention mechanism, and establish an upper bound on its graph distinguishing capability in terms of the S-SEG-WL test. Finally, we validate our approach on the Long Range Graph Benchmark, the City-Networks benchmark, and two custom large-scale inductive point cloud datasets, consistently ranking among the top-performing scalable graph transformers.
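The top-k selection at the heart of k-MIP attention can be sketched in numpy: each query keeps only its k keys with the largest inner products, and all other attention weights are exactly zero. Shapes and values are illustrative, not from the paper's implementation.

```python
import numpy as np

def kmip_attention(Q, K, V, k):
    """k-MIP attention sketch: each query attends only to its k keys with
    the largest (scaled) inner products; the rest get exactly zero weight."""
    scores = Q @ K.T / np.sqrt(Q.shape[1])                    # (nq, nk)
    top = np.argpartition(-scores, k - 1, axis=1)[:, :k]      # top-k keys per query
    masked = np.full_like(scores, -np.inf)
    np.put_along_axis(masked, top, np.take_along_axis(scores, top, axis=1), axis=1)
    w = np.exp(masked - masked.max(axis=1, keepdims=True))    # softmax over kept keys
    w /= w.sum(axis=1, keepdims=True)
    return w @ V, w

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((6, 4)) for _ in range(3))
out, w = kmip_attention(Q, K, V, k=2)
```

Only the k selected entries per row need to be stored or computed in an efficient implementation, which is what gives the linear memory footprint on large graphs.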

new Collapse-Free Prototype Readout Layer for Transformer Encoders

Authors: Giansalvo Cirrincione, Rahul Ranjeev Kumar

Abstract: DDCL-Attention is a prototype-based readout layer for transformer encoders that replaces simple pooling methods, such as mean pooling or class tokens, with a learned compression mechanism. It uses a small set of global prototype vectors and assigns tokens to them through soft probabilistic matching, producing compact token summaries at linear complexity in sequence length. The method offers three main advantages. First, it avoids prototype collapse through an exact decomposition of the training loss into a reconstruction term and a diversity term, ensuring that prototypes remain distinct. Second, its joint training with the encoder is shown to be stable under a practical timescale condition, using Tikhonov's singular perturbation theory and explicit learning-rate constraints. Third, the same framework supports three uses: a final readout layer, a differentiable codebook extending VQ-VAE, and a hierarchical document compressor. Experiments on four datasets confirm the theoretical predictions: the loss decomposition holds exactly, prototype separation grows as expected when the stability condition is met, and the codebook reaches full utilization, outperforming standard hard vector quantization. An additional study on orbital debris classification shows that the method also applies beyond standard NLP and vision tasks, including scientific tabular data.

new Understanding When Poisson Log-Normal Models Outperform Penalized Poisson Regression for Microbiome Count Data

Authors: Daniel Agyapong, Julien Chiquet, Jane Marks, Toby Dylan Hocking

Abstract: Multivariate count models are often justified by their ability to capture latent dependence, but researchers receive little guidance on when this added structure improves on simpler penalized marginal Poisson regression. We study this question using real microbiome data under a unified held-out evaluation framework. For count prediction, we compare PLN and GLMNet(Poisson) on 20 datasets spanning 32 to 18,270 samples and 24 to 257 taxa, using held-out Poisson deviance under leave-one-taxon-out prediction with 3-fold sample cross-validation rather than synthetic or in-sample criteria. For network inference, we compare PLNNetwork and GLMNet(Poisson) neighborhood selection on five publicly available datasets with experimentally validated microbial interaction truth. PLN outperforms GLMNet(Poisson) on most count-prediction datasets, with gains up to 38 percent. The primary predictor of the winner is the sample-to-taxon ratio, with mean absolute correlation as the strongest secondary signal and overdispersion as an additional predictor. PLNNetwork performs best on broad undirected interaction benchmarks, whereas GLMNet(Poisson) is better aligned with local or directional effects. Taken together, these results provide guidance for choosing between latent multivariate count models and penalized Poisson regression in biological count prediction and interaction recovery.
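The held-out Poisson deviance used as the evaluation criterion has a short closed form, sketched below with toy counts (not the paper's data).

```python
import numpy as np

def poisson_deviance(y, mu):
    """Mean Poisson deviance 2*[y*log(y/mu) - (y - mu)], with the usual
    convention that y*log(y/mu) = 0 when y = 0. Lower is better; it is
    exactly 0 when the predicted rates mu match the held-out counts y."""
    y, mu = np.asarray(y, float), np.asarray(mu, float)
    term = np.where(y > 0, y * np.log(np.where(y > 0, y, 1.0) / mu), 0.0)
    return float(np.mean(2.0 * (term - (y - mu))))

y_true = np.array([0.0, 1.0, 4.0, 10.0])
mu_good = np.array([0.1, 1.0, 4.0, 10.0])   # close to the held-out counts
mu_flat = np.full(4, y_true.mean())         # intercept-only prediction
dev_good = poisson_deviance(y_true, mu_good)
dev_flat = poisson_deviance(y_true, mu_flat)
```

Unlike squared error, the deviance respects the variance structure of counts, penalizing misprediction of small counts appropriately.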

new A Bayesian Information-Theoretic Approach to Data Attribution

Authors: Dharmesh Tailor, Nicolò Felicioni, Kamil Ciosek

Abstract: Training Data Attribution (TDA) seeks to trace model predictions back to influential training examples, enhancing interpretability and safety. We formulate TDA as a Bayesian information-theoretic problem: subsets are scored by the information loss they induce - the entropy increase at a query when removed. This criterion credits examples for resolving predictive uncertainty rather than label noise. To scale to modern networks, we approximate information loss using a Gaussian Process surrogate built from tangent features. We show this aligns with classical influence scores for single-example attribution while promoting diversity for subsets. For even larger-scale retrieval, we relax to an information-gain objective and add a variance correction for scalable attribution in vector databases. Experiments show competitive performance on counterfactual sensitivity, ground-truth retrieval and coreset selection, showing that our method scales to modern architectures while bridging principled measures with practice.

new Where to Steer: Input-Dependent Layer Selection for Steering Improves LLM Alignment

Authors: Soham Gadgil, Chris Lin, Su-In Lee

Abstract: Steering vectors have emerged as a lightweight and effective approach for aligning large language models (LLMs) at inference time, enabling modulation over model behaviors by shifting LLM representations towards a target behavior. However, existing methods typically apply steering vectors at a globally fixed layer, implicitly assuming that the optimal intervention layer is invariant across inputs. We argue that this assumption is fundamentally limited, as representations relevant to a target behavior can be encoded at different layers depending on the input. Theoretically, we show that different inputs can require steering at different layers to achieve alignment with a desirable model behavior. We also provide empirical evidence that the optimal steering layer varies substantially across inputs in practice. Motivated by these observations, we introduce Where to Steer (W2S), a framework that adaptively selects the intervention layer conditioned on the input, by learning a mapping from input embeddings to optimal steering layers. Across multiple LLMs and alignment behaviors, W2S consistently outperforms fixed-layer baselines, with improvements in both in-distribution and out-of-distribution settings. Our findings highlight the importance of input-dependent control in LLM alignment and demonstrate that adaptive layer selection is a key design dimension missing in the current methodology of steering vectors.

new SODA: Semi On-Policy Black-Box Distillation for Large Language Models

Authors: Xiwen Chen, Jingjing Wang, Wenhui Zhu, Peijie Qiu, Xuanzhao Dong, Hejian Sang, Zhipeng Wang, Alborz Geramifard, Feng Luo

Abstract: Black-box knowledge distillation for large language models presents a strict trade-off. Simple off-policy methods (e.g., sequence-level knowledge distillation) struggle to correct the student's inherent errors. Fully on-policy methods (e.g., Generative Adversarial Distillation) solve this via adversarial training but introduce well-known training instability and crippling computational overhead. To address this dilemma, we propose SODA (Semi On-policy Distillation with Alignment), a highly efficient alternative motivated by the inherent capability gap between frontier teachers and much smaller base models. Because a compact student model's natural, zero-shot responses are almost strictly inferior to the powerful teacher's targets, we can construct a highly effective contrastive signal simply by pairing the teacher's optimal response with a one-time static snapshot of the student's outputs. This demonstrates that exposing the small student to its own static inferior behaviors is sufficient for high-quality distribution alignment, eliminating the need for costly dynamic rollouts and fragile adversarial balancing. Extensive evaluations across four compact Qwen2.5 and Llama-3 models validate this semi on-policy paradigm. SODA matches or outperforms the state-of-the-art methods on 15 out of 16 benchmark results. More importantly, it achieves this superior distillation quality while training 10 times faster, consuming 27% less peak GPU memory, and completely eliminating adversarial instability.

new Spatiotemporal Interpolation of GEDI Biomass with Calibrated Uncertainty

Authors: Robin Young, Srinivasan Keshav

Abstract: Monitoring deforestation-driven carbon emissions requires both spatially explicit and temporally continuous estimates of aboveground biomass density (AGBD) with calibrated uncertainty. NASA's Global Ecosystem Dynamics Investigation (GEDI) provides reliable LIDAR-derived AGBD, but its orbital sampling causes irregular spatiotemporal coverage, and occasional operational interruptions, including a 13-month hibernation from March 2023 to April 2024, leave extended gaps in the observational record. Prior work has used machine learning approaches to fill GEDI's spatial gaps using satellite-derived features, but temporal interpolation of biomass through unobserved periods, particularly across active disturbance events, remains largely unaddressed. Moreover, standard ensemble methods for biomass mapping have been shown to produce systematically miscalibrated prediction intervals. To address these gaps, we extend the Attentive Neural Process (ANP) framework, previously applied to spatial biomass interpolation, to jointly sparse spatiotemporal settings using geospatial foundation model embeddings. We treat space and time symmetrically, empirically validating a form of space-for-time substitution in which observations from nearby locations at other times inform predictions at held-out periods. Our results demonstrate that the ANP produces well-calibrated uncertainty estimates across disturbance regimes, supporting its use in Measurement, Reporting, and Verification (MRV) applications that require reliable uncertainty quantification for forest carbon accounting.

new Regime-Calibrated Demand Priors for Ride-Hailing Fleet Dispatch and Repositioning

Authors: Indar Kumar, Akanksha Tiwari

Abstract: Effective ride-hailing dispatch requires anticipating demand patterns that vary substantially across time-of-day, day-of-week, season, and special events. We propose a regime-calibrated approach that (i) segments historical trip data into demand regimes, (ii) matches the current operating period to the most similar historical analogues via a similarity ensemble combining Kolmogorov-Smirnov distance, Wasserstein-1 distance, feature distance, variance ratio, event pattern similarity, and temporal proximity, and (iii) uses the resulting calibrated demand prior to drive both an LP-based fleet repositioning policy and batch dispatch with Hungarian matching. In ablation, a distributional-only metric subset achieves the strongest mean-wait reduction, while the full ensemble is retained as a robustness-oriented default that preserves calendar and event context. Evaluated on 5.2 million NYC TLC trips across 8 diverse scenarios (winter/summer, weekday/weekend/holiday, morning/evening/night) with 5 random seeds each, our method reduces mean rider wait times by 31.1% (bootstrap 95% CI: [26.5, 36.6]; Friedman chi-squared = 80.0, p = 4.25e-18; Cohen's d = 7.5-29.9). P95 wait drops 37.6% and the Gini coefficient of wait times improves from 0.441 to 0.409. The two contributions compose multiplicatively: calibration provides 16.9% reduction relative to the replay baseline; LP repositioning adds a further 15.5%. The approach requires no training, is deterministic and explainable, generalizes to Chicago (23.3% wait reduction using the NYC-built regime library without retraining), and is robust across fleet sizes (32-47% improvement for 0.5x-2.0x fleet scaling). Code is available at https://github.com/IndarKarhana/regime-calibrated-dispatch.
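The batch dispatch step is a minimum-cost assignment of drivers to requests. The sketch below brute-forces a tiny instance for clarity; at fleet scale one would use the Hungarian algorithm (e.g. `scipy.optimize.linear_sum_assignment`), as the abstract indicates. The cost matrix is illustrative only.

```python
import itertools
import numpy as np

def batch_dispatch(cost):
    """Minimum-total-cost matching of drivers (rows) to requests (cols),
    by brute force over permutations -- a stand-in for Hungarian matching
    on this toy instance."""
    n = cost.shape[0]
    best, best_perm = float("inf"), None
    for perm in itertools.permutations(range(n)):
        c = sum(cost[i, perm[i]] for i in range(n))
        if c < best:
            best, best_perm = c, perm
    return best_perm, best

# Rows: drivers; cols: requests; entries: estimated pickup times (minutes),
# e.g. derived from the calibrated demand prior.
cost = np.array([[4.0, 9.0, 7.0],
                 [8.0, 3.0, 6.0],
                 [5.0, 8.0, 2.0]])
assignment, total = batch_dispatch(cost)
```

Here each driver's cheapest request happens to be distinct, so the optimal matching assigns driver i to request i with total pickup time 9 minutes; in general the matching must trade off conflicting preferences, which is what the Hungarian algorithm resolves in polynomial time.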

URLs: https://github.com/IndarKarhana/regime-calibrated-dispatch

new Provable Multi-Task Reinforcement Learning: A Representation Learning Framework with Low Rank Rewards

Authors: Yaoze Guo, Shana Moothedath

Abstract: Multi-task representation learning (MTRL) is an approach that learns shared latent representations across related tasks, facilitating collaborative learning that improves the overall learning efficiency. This paper studies MTRL for multi-task reinforcement learning (RL), where multiple tasks have the same state-action space and transition probabilities, but different rewards. We consider T linear Markov Decision Processes (MDPs) where the reward functions and transition dynamics admit linear feature embeddings of dimension d. The relatedness among the tasks is captured by a low-rank structure on the reward matrices. Learning shared representations across multiple RL tasks is challenging due to the complex and policy-dependent nature of data that leads to a temporal progression of error. Our approach adopts a reward-free reinforcement learning framework to first learn a data-collection policy. This policy then informs an exploration strategy for estimating the unknown reward matrices. Importantly, the data collected under this well-designed policy enable accurate estimation, which ultimately supports the learning of a near-optimal policy. Unlike existing approaches that rely on restrictive assumptions such as Gaussian features, incoherence conditions, or access to optimal solutions, we propose a low-rank matrix estimation method that operates under more general feature distributions encountered in RL settings. Theoretical analysis establishes that accurate low-rank matrix recovery is achievable under these relaxed assumptions, and we characterize the relationship between representation error and sample complexity. Leveraging the learned representation, we construct near-optimal policies and prove a regret bound. Experimental results demonstrate that our method effectively learns robust shared representations and task dynamics from finite data.

new Improving Model Performance by Adapting the KGE Metric to Account for System Non-Stationarity

Authors: M Jawad, HV Gupta, YH Wang, MA Farmani, A Behrangi, GY Niu

Abstract: Geoscientific systems tend to be characterized by pronounced temporal non-stationarity, arising from seasonal and climatic variability in hydrometeorological drivers, and from natural and anthropogenic changes to land use and cover. As has been pointed out, such variability renders "the assumption of statistical stationarity obsolete in water management", and requires us to "account for, rather than ignore, non-stationary trends" in the data. However, metrics used for model development are typically based on the implicit and unjustifiable assumption that the data generating process is time-stationary. Here, we introduce the JKGE_ss metric (adapted from KGE_ss) that detects and accounts for dynamical non-stationarity in the statistical properties of the data and thereby improves information extraction and model performance. Unlike NSE and KGE_ss, which use the long-term mean as a benchmark against which to evaluate model efficiency, JKGE_ss emphasizes reproduction of temporal variations in system storage. We tested the robustness of the new metric by training physical-conceptual and data-based catchment-scale models of varying complexity across a wide range of hydroclimatic conditions, from recent-precipitation-dominated to snow-dominated to strongly arid. In all cases, the result was improved reproduction of system temporal dynamics at all time scales, across wet to dry years, and over the full range of flow levels (especially recession periods). Since traditional metrics fail to adequately account for temporal shifts in system dynamics, potentially resulting in misleading assessments of model performance under changing conditions, we recommend the adoption of JKGE_ss for geoscientific model development.
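For reference, the classical KGE that JKGE_ss adapts can be computed in a few lines. This is the standard Gupta et al. (2009) decomposition into correlation, variability ratio, and bias ratio, not the paper's new metric; the series below are illustrative.

```python
from math import sqrt

def kge(sim, obs):
    """Kling-Gupta efficiency: 1 - distance of (r, alpha, beta) from the
    ideal point (1, 1, 1), where r is correlation, alpha the variability
    ratio, and beta the bias ratio. Population statistics throughout."""
    n = len(obs)
    mo, ms = sum(obs) / n, sum(sim) / n
    so = sqrt(sum((o - mo) ** 2 for o in obs) / n)
    ss = sqrt(sum((s - ms) ** 2 for s in sim) / n)
    r = sum((s - ms) * (o - mo) for s, o in zip(sim, obs)) / (n * ss * so)
    alpha, beta = ss / so, ms / mo
    return 1 - sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

# Illustrative series: a perfect model scores 1; a constant bias only
# degrades the beta component.
obs = [1.0, 2.0, 4.0, 3.0, 2.5]
```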

new Align Your Structures: Generating Trajectories with Structure Pretraining for Molecular Dynamics

Authors: Aniketh Iyengar, Jiaqi Han, Pengwei Sun, Mingjian Jiang, Jianwen Xie, Stefano Ermon

Abstract: Generating molecular dynamics (MD) trajectories using deep generative models has attracted increasing attention, yet remains inherently challenging due to the limited availability of MD data and the complexities involved in modeling high-dimensional MD distributions. To overcome these challenges, we propose a novel framework that leverages structure pretraining for MD trajectory generation. Specifically, we first train a diffusion-based structure generation model on a large-scale conformer dataset, on top of which we introduce an interpolator module trained on MD trajectory data, designed to enforce temporal consistency among generated structures. Our approach effectively harnesses abundant structural data to mitigate the scarcity of MD trajectory data and decomposes the intricate MD modeling task into two manageable subproblems: structural generation and temporal alignment. We comprehensively evaluate our method on the QM9 and DRUGS small-molecule datasets across unconditional generation, forward simulation, and interpolation tasks, and further extend our framework and analysis to tetrapeptide and protein monomer systems. Experimental results confirm that our approach excels in generating chemically realistic MD trajectories, as evidenced by remarkable improvements in accuracy across geometric, dynamical, and energetic measurements.

new ACES: Who Tests the Tests? Leave-One-Out AUC Consistency for Code Generation

Authors: Hui Sun, Yun-Ji Zhang, Zheng Xie, Ren-Biao Liu, Yali Du, Xin-Ye Li, Ming Li

Abstract: Selecting LLM-generated code candidates using LLM-generated tests is challenging because the tests themselves may be incorrect. Existing methods either treat all tests equally or rely on ad-hoc heuristics to filter unreliable tests. Yet determining test correctness requires knowing which codes are correct, creating a \emph{circular dependency}. Our key insight is that we need not determine test correctness at all: \emph{test votes should rank, not merely count}. What matters is not how many codes pass a test, but whether the test can \emph{distinguish} correct from incorrect code. We break the circular dependency via leave-one-out evaluation: hold out one test, rank codes by their aggregate scores on all remaining tests, and measure whether the held-out test's pass/fail pattern agrees with this ranking. We formalize this agreement as the leave-one-out AUC~(LOO-AUC) and prove that the expected LOO-AUC is proportional to each test's ability to separate correct code from incorrect code. Building on this, we propose \textbf{ACES}~(\textbf{A}UC \textbf{C}onsist\textbf{E}ncy \textbf{S}coring) with two complementary variants: ACES-C provides closed-form weights that provably approximate the oracle in expectation under a mild assumption on average test quality; ACES-O drops this assumption and iteratively optimizes a differentiable LOO-AUC objective. Both operate solely on the binary pass matrix with negligible overhead, and achieve state-of-the-art Pass@$k$ on multiple code generation benchmarks.
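The LOO-AUC statistic described above can be sketched directly from a binary pass matrix. The matrix below is a hypothetical example, and the function is an illustrative reading of the abstract (Mann-Whitney AUC over held-out pass/fail labels), not the paper's exact estimator.

```python
def loo_auc(pass_matrix, t):
    """Leave-one-out AUC for test t: rank code candidates by their total
    passes on all *other* tests, then measure how well the held-out
    test's own pass/fail labels agree with that ranking."""
    scores = [sum(row) - row[t] for row in pass_matrix]
    pos = [s for s, row in zip(scores, pass_matrix) if row[t] == 1]
    neg = [s for s, row in zip(scores, pass_matrix) if row[t] == 0]
    if not pos or not neg:          # test passes (or fails) everything:
        return 0.5                  # it cannot distinguish, so no signal
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical pass matrix: rows = code candidates, columns = tests.
P = [[1, 1, 1],   # correct code
     [1, 1, 1],   # correct code
     [0, 0, 1],   # buggy code (still passes the too-weak test 2)
     [0, 0, 1]]   # buggy code
```

Tests 0 and 1 separate correct from buggy candidates and receive LOO-AUC 1.0, while test 2 passes everything and collapses to the uninformative 0.5.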

new Supervised Dimensionality Reduction Revisited: Why LDA on Frozen CNN Features Deserves a Second Look

Authors: Indar Kumar, Girish Karhana, Sai Krishna Jasti, Ankit Hemant Lade


new Diagonal-Tiled Mixed-Precision Attention for Efficient Low-Bit MXFP Inference

Authors: Yifu Ding, Xinhao Zhang, Jinyang Guo

Abstract: Transformer-based large language models (LLMs) have demonstrated remarkable performance across a wide range of real-world tasks, but their inference cost remains prohibitively high due to the quadratic complexity of attention and the memory bandwidth limitations of high-precision operations. In this work, we present a low-bit mixed-precision attention kernel using the microscaling floating-point (MXFP) data format, leveraging the compute capabilities of next-generation GPU architectures. Our Diagonal-Tiled Mixed-Precision Attention (DMA) incorporates two kinds of low-bit computation at the tiling level and is implemented as a carefully fused Triton kernel that exploits hardware-level parallelism and memory efficiency to enable fast inference without compromising model performance. Extensive empirical evaluations on NVIDIA B200 GPUs show that our kernel maintains generation quality with negligible degradation, and meanwhile achieves significant speedup through kernel fusion. We release our code at https://github.com/yifu-ding/MP-Sparse-Attn.

URLs: https://github.com/yifu-ding/MP-Sparse-Attn

new BWTA: Accurate and Efficient Binarized Transformer by Algorithm-Hardware Co-design

Authors: Yifu Ding, Xianglong Liu, Shenghao Jin, Jinyang Guo, Jiwen Lu

Abstract: Ultra low-bit quantization brings substantial efficiency for Transformer-based models, but the accuracy degradation and limited GPU support hinder its wide usage. In this paper, we analyze zero-point distortion in binarization and propose a Binary Weights & Ternary Activations (BWTA) quantization scheme, which projects tiny values to zero and preserves the accuracy of extremely low-bit models. For training, we propose Smooth Multi-Stage Quantization, combining a Levelwise Degradation Strategy and a Magnitude-Alignment Projection Factor to enable stable and fast convergence. For inference, we develop a BWTA MatMul CUDA kernel with instruction-level parallel bit-packing and comprehensive binary/ternary MatMul implementations for both linear and attention operators, allowing seamless integration across Transformer architectures. Experiments show that BWTA approaches full-precision performance for BERT, with an average 3.5% drop on GLUE and less than 2% drop on five tasks, and achieves comparable perplexity and accuracy for LLMs. In efficiency, it delivers 16 to 24 times kernel-level speedup over FP16 on NVIDIA GPUs, and 216 to 330 tokens/s end-to-end prefill speedup with lower memory footprint on LLMs. As an algorithm-hardware co-design, BWTA demonstrates practical, low-latency ultra-low-bit inference without sacrificing model quality.
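A minimal sketch of the quantization scheme the abstract describes: binary weights via sign, and ternary activations with a dead zone that projects tiny values to zero. The threshold and function names here are illustrative, not the paper's trained quantizers; the point is that the inner product reduces to additions and subtractions.

```python
def ternarize(activations, threshold):
    """Ternary activations {-1, 0, +1}: values with |x| <= threshold are
    projected to zero, avoiding the zero-point distortion of pure
    binarization. Threshold choice here is illustrative."""
    return [0 if abs(x) <= threshold else (1 if x > 0 else -1)
            for x in activations]

def binarize(weights):
    """Binary weights {-1, +1} via sign, with sign(0) mapped to +1."""
    return [1 if w >= 0 else -1 for w in weights]

def bwta_dot(weights, activations, threshold=0.05):
    """The MatMul inner loop reduces to additions/subtractions over the
    nonzero ternary positions -- the source of kernel-level speedup."""
    wb, at = binarize(weights), ternarize(activations, threshold)
    return sum(w * a for w, a in zip(wb, at))
```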

new Multirate Stein Variational Gradient Descent for Efficient Bayesian Sampling

Authors: Arash Sarshar

Abstract: Many particle-based Bayesian inference methods use a single global step size for all parts of the update. In Stein variational gradient descent (SVGD), however, each update combines two qualitatively different effects: attraction toward high-posterior regions and repulsion that preserves particle diversity. These effects can evolve at different rates, especially in high-dimensional, anisotropic, or hierarchical posteriors, so one step size can be unstable in some regions and inefficient in others. We derive a multirate version of SVGD that updates these components on different time scales. The framework yields practical algorithms, including a symmetric split method, a fixed multirate method (MR-SVGD), and an adaptive multirate method (Adapt-MR-SVGD) with local error control. We evaluate the methods in a broad and rigorous benchmark suite covering six problem families: a 50D Gaussian target, multiple 2D synthetic targets, UCI Bayesian logistic regression, multimodal Gaussian mixtures, Bayesian neural networks, and large-scale hierarchical logistic regression. Evaluation includes posterior-matching metrics, predictive performance, calibration quality, mixing, and explicit computational cost accounting. Across these six benchmark families, multirate SVGD variants improve robustness and quality-cost tradeoffs relative to vanilla SVGD. The strongest gains appear on stiff hierarchical, strongly anisotropic, and multimodal targets, where adaptive multirate SVGD is usually the strongest variant and fixed multirate SVGD provides a simpler robust alternative at lower cost.
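The core idea, separate step sizes for the attraction and repulsion terms, can be sketched in one dimension. This is vanilla SVGD with the update split into its two components; setting the two step sizes equal recovers the standard method, and the paper's symmetric split and adaptive error control are not reproduced here.

```python
from math import exp

def multirate_svgd(particles, grad_logp, eps_attract, eps_repulse,
                   steps, h=1.0):
    """1-D SVGD with an RBF kernel, stepping the attraction term
    (kernel-smoothed score) and the repulsion term (kernel gradient)
    at different rates."""
    x = list(particles)
    n = len(x)
    for _ in range(steps):
        new_x = []
        for i in range(n):
            attract = repulse = 0.0
            for j in range(n):
                k = exp(-((x[j] - x[i]) ** 2) / (2 * h * h))
                attract += k * grad_logp(x[j])          # toward high density
                repulse += k * (x[i] - x[j]) / (h * h)  # keeps particles apart
            new_x.append(x[i] + (eps_attract * attract
                                 + eps_repulse * repulse) / n)
        x = new_x
    return x

# Target N(0, 1): grad log p(x) = -x. Particles start off-center.
out = multirate_svgd([-3 + 0.6 * i for i in range(10)], lambda v: -v,
                     eps_attract=0.05, eps_repulse=0.05, steps=300)
```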

new Autoencoder-Based Parameter Estimation for Superposed Multi-Component Damped Sinusoidal Signals

Authors: Momoka Iida, Hayato Motohashi, Hirotaka Takahashi

Abstract: Damped sinusoidal oscillations are widely observed in many physical systems, and their analysis provides access to underlying physical properties. However, parameter estimation becomes difficult when the signal decays rapidly, multiple components are superposed, and observational noise is present. In this study, we develop an autoencoder-based method that uses the latent space to estimate the frequency, phase, decay time, and amplitude of each component in noisy multi-component damped sinusoidal signals. We investigate multi-component cases under Gaussian-distribution training and further examine the effect of the training-data distribution through comparisons between Gaussian and uniform training. The performance is evaluated through waveform reconstruction and parameter-estimation accuracy. We find that the proposed method can estimate the parameters with high accuracy even in challenging setups, such as those involving a subdominant component or nearly opposite-phase components, while remaining reasonably robust when the training distribution is less informative. This demonstrates its potential as a tool for analyzing short-duration, noisy signals.
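The signal model underlying this task is a superposition of damped sinusoids. A minimal sketch follows; the parameter ordering and component values are illustrative, not the paper's convention.

```python
from math import cos, exp, pi

def damped_signal(t, components):
    """Superposition of damped sinusoids; each component is a tuple
    (amplitude, frequency, phase, decay_time)."""
    return sum(a * exp(-t / tau) * cos(2 * pi * f * t + phi)
               for a, f, phi, tau in components)

# Two-component example: a dominant mode and a subdominant one,
# the kind of setup the abstract calls challenging.
comps = [(1.0, 5.0, 0.0, 0.5), (0.2, 8.0, pi / 2, 0.3)]
y0 = damped_signal(0.0, comps)
```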

new Can LLMs Learn to Reason Robustly under Noisy Supervision?

Authors: Shenzhi Yang, Guangcheng Zhu, Bowen Song, Sharon Li, Haobo Wang, Xing Zheng, Yingfan Ma, Zhongqi Chen, Weiqiang Wang, Gang Chen

Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) effectively trains reasoning models that rely on abundant perfect labels, but its vulnerability to unavoidable noisy labels due to expert scarcity remains critically underexplored. In this work, we take the first step toward a systematic analysis of noisy label mechanisms in RLVR. In contrast to supervised classification, most RLVR algorithms incorporate a rollout-based condition: a label's influence on training is contingent on whether the current policy can generate rollouts that realize it, a property that naturally extends to noisy labels. Based on this observation, we distinguish two types of noise: inactive noisy labels, which reduce data efficiency, and active noisy labels, which are reinforced and risk skewing the model toward incorrect distributions. From experiments on training with noisy samples, we identify an Early Correctness Coherence phenomenon: although noisy samples begin to lag behind in later stages, accuracy on both clean and noisy samples increases similarly in early training. Motivated by this dynamic, we propose Online Label Refinement (OLR), which progressively corrects potentially noisy labels with majority-voted answers when two conditions hold: a positive slope in the majority answer's rollout pass rate and stable historical consistency across updates, enabling gradual self-correction as the policy improves. We evaluate OLR on six in-distribution mathematical reasoning benchmarks (AIME24/25, AMC, MATH-500, Minerva, and Olympiad) and three out-of-distribution tasks (ARC-c, GPQA-diamond, and MMLU-pro). Across noise ratios from 0.1 to 0.9, OLR consistently improves robustness under both inactive and active noisy-label settings, achieving average gains of 3.6% to 3.9% on in-distribution benchmarks and 3.3% to 4.6% on out-of-distribution evaluations.

new Geometric Limits of Knowledge Distillation: A Minimum-Width Theorem via Superposition Theory

Authors: Nilesh Sarkar, Dawar Jyoti Deka

Abstract: Knowledge distillation compresses large teachers into smaller students, but performance saturates at a loss floor that persists across training methods and objectives. We argue this floor is geometric: neural networks represent far more features than dimensions through superposition, and a student of width $d_S$ can encode at most $d_S \cdot g(\alpha)$ features, where $g(\alpha) = 1/((1-\alpha)\ln\frac{1}{1-\alpha})$ is a sparsity-dependent capacity function. Features beyond this budget are permanently lost, yielding an importance-weighted loss floor. We validate on a toy model (48 configurations, median accuracy >93%) and on Pythia-410M, where sparse autoencoders measure $F \approx 28{,}700$ features at $\alpha \approx 0.992$ (critical width $d_S^* \approx 1{,}065$). Distillation into five student widths confirms the predicted monotonic floor ordering. The observed floor decomposes into a geometric component and a width-independent architectural baseline ($R^2 = 0.993$). Linear probing shows coarse concepts survive even 88% feature loss, revealing the floor arises from aggregate loss of fine-grained features in the importance distribution's long tail. Our results connect representation geometry to distillation limits and provide a practical tool for predicting distillation performance from SAE measurements alone.
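The capacity bound in the abstract is directly computable. Plugging in the reported Pythia-410M values ($\alpha \approx 0.992$, critical width $\approx 1{,}065$) should land near the measured $F \approx 28{,}700$ features; the function below is just the stated formula, nothing more.

```python
from math import log

def feature_capacity(width, alpha):
    """Capacity bound d_S * g(alpha), with the sparsity-dependent
    g(alpha) = 1 / ((1 - alpha) * ln(1 / (1 - alpha))) from the abstract."""
    a = 1 - alpha
    return width / (a * log(1 / a))

# Reported Pythia-410M operating point: should be near F ~ 28,700.
cap = feature_capacity(1065, 0.992)
```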

new ArrowFlow: Hierarchical Machine Learning in the Space of Permutations

Authors: Ozgur Yilmaz

Abstract: We introduce ArrowFlow, a machine learning architecture that operates entirely in the space of permutations. Its computational units are ranking filters, learned orderings that compare inputs via Spearman's footrule distance and update through permutation-matrix accumulation, a non-gradient rule rooted in displacement evidence. Layers compose hierarchically: each layer's output ranking becomes the next layer's input, enabling deep ordinal representation learning without any floating-point parameters in the core computation. We connect the architecture to Arrow's impossibility theorem, showing that violations of social-choice fairness axioms (context dependence, specialization, symmetry breaking) serve as inductive biases for nonlinearity, sparsity, and stability. Experiments span UCI tabular benchmarks, MNIST, gene expression cancer classification (TCGA), and preference data, all against GridSearchCV-tuned baselines. ArrowFlow beats all baselines on Iris (2.7% vs. 3.3%) and is competitive on most UCI datasets. A single parameter, polynomial degree, acts as a master switch: degree 1 yields noise robustness (8-28% less degradation), privacy preservation (+0.5pp cost), and missing-feature resilience; higher degrees trade these for improved clean accuracy. ArrowFlow is not designed to surpass gradient-based methods. It is an existence proof that competitive classification is possible in a fundamentally different computational paradigm, one that elevates ordinal structure to a first-class citizen, with natural alignment to integer-only and neuromorphic hardware.
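Spearman's footrule, the comparison metric used by the ranking filters, is simple to state: the total absolute displacement of items between two orderings. A minimal sketch (rankings given as ordered lists of items):

```python
def footrule(rank_a, rank_b):
    """Spearman's footrule distance: sum over items of the absolute
    difference in position between the two rankings."""
    pos_b = {item: i for i, item in enumerate(rank_b)}
    return sum(abs(i - pos_b[item]) for i, item in enumerate(rank_a))
```

Identical rankings have distance 0; a full reversal of three items scores |0-2| + |1-1| + |2-0| = 4.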

new Fine-grained Analysis of Stability and Generalization for Stochastic Bilevel Optimization

Authors: Xuelin Zhang, Hong Chen, Bin Gu, Tieliang Gong, Feng Zheng

Abstract: Stochastic bilevel optimization (SBO) has been integrated into many machine learning paradigms recently, including hyperparameter optimization, meta-learning, and reinforcement learning. Along with the wide range of applications, there have been numerous studies on the computational behavior of SBO. However, the generalization guarantees of SBO methods are far less understood from the lens of statistical learning theory. In this paper, we provide a systematic generalization analysis of first-order gradient-based bilevel optimization methods. First, we establish quantitative connections between the on-average argument stability and the generalization gap of SBO methods. Then, we derive upper bounds on the on-average argument stability for single-timescale stochastic gradient descent (SGD) and two-timescale SGD, considering three settings: nonconvex-nonconvex (NC-NC), convex-convex (C-C), and strongly-convex-strongly-convex (SC-SC). Experimental analysis validates our theoretical findings. Compared with previous algorithmic stability analyses, our results do not require reinitializing the inner-level parameters at each iteration and are applicable to more general objective functions.

new Spectral Path Regression: Directional Chebyshev Harmonics for Interpretable Tabular Learning

Authors: Milo Coombs

Abstract: Classical approximation bases such as Chebyshev polynomials provide principled and interpretable representations, but their multivariate tensor-product constructions scale exponentially with dimension and impose axis-aligned structure that is poorly matched to real tabular data. We address this by replacing tensorised oscillations with directional harmonic modes of the form $\cos(\mathbf{m}^{\top}\arccos(\mathbf{x}))$, which organise multivariate structure by direction in angular space rather than by coordinate index. This representation yields a discrete spectral regression model in which complexity is controlled by selecting a small number of structured frequency vectors (spectral paths), and training reduces to a single closed-form ridge solve with no iterative optimisation. Experiments on standard continuous-feature tabular regression benchmarks show that the resulting models achieve accuracy competitive with strong nonlinear baselines while remaining compact, computationally efficient, and explicitly interpretable through analytic expressions of learned feature interactions.
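The directional harmonic feature $\cos(\mathbf{m}^{\top}\arccos(\mathbf{x}))$ is easy to verify against its univariate special case: an axis-aligned frequency vector recovers an ordinary Chebyshev polynomial. A minimal sketch:

```python
from math import acos, cos

def directional_harmonic(m, x):
    """Directional Chebyshev harmonic cos(m . arccos(x)) for a frequency
    vector m and a point x in [-1, 1]^d. When m is axis-aligned this
    reduces to a univariate Chebyshev polynomial T_m(x)."""
    return cos(sum(mi * acos(xi) for mi, xi in zip(m, x)))
```

For example, m = (2, 0) ignores the second coordinate and equals T_2(x) = 2x^2 - 1 in the first.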

new Restless Bandits with Individual Penalty Constraints: A New Near-Optimal Index Policy and How to Learn It

Authors: Nida Zamir, I-Hong Hou

Abstract: This paper investigates the Restless Multi-Armed Bandit (RMAB) framework under individual penalty constraints to address resource allocation challenges in dynamic wireless networked environments. Unlike conventional RMAB models, our model allows each user (arm) to have distinct and stringent performance constraints, such as energy limits, activation limits, or age-of-information minimums, enabling the capture of diverse objectives including fairness and efficiency. To find the optimal resource allocation policy, we propose a new Penalty-Optimal Whittle (POW) index policy. The POW index of a user depends only on the user's transition kernel and penalty constraints, and remains invariant to system-wide features such as the number of users present and the amount of resources available. This makes it computationally tractable to calculate the POW indices offline without any need for online adaptation. Moreover, we theoretically prove that the POW index policy is asymptotically optimal while satisfying all individual penalty constraints. We also introduce a deep reinforcement learning algorithm to efficiently learn the POW index on the fly. Simulation results across various applications and system configurations further demonstrate that the POW index policy not only has near-optimal performance but also significantly outperforms other existing policies.

new Physical Sensitivity Kernels Can Emerge in Data-Driven Forward Models: Evidence From Surface-Wave Dispersion

Authors: Ziye Yu, Yuqi Cai, Xin Liu

Abstract: Data-driven neural networks are increasingly used as surrogate forward models in geophysics, but it remains unclear whether they recover only the data mapping or also the underlying physical sensitivity structure. Here we test this question using surface-wave dispersion. By comparing automatically differentiated gradients from a neural-network surrogate with theoretical sensitivity kernels, we show that the learned gradients can recover the main depth-dependent structure of physical kernels across a broad range of periods. This indicates that neural surrogate models can learn physically meaningful differential information, rather than acting as purely black-box predictors. At the same time, strong structural priors in the training distribution can introduce systematic artifacts into the inferred sensitivities. Our results show that neural forward surrogates can recover useful physical information for inversion and uncertainty analysis, while clarifying the conditions under which this differential structure remains physically consistent.

new The Geometric Alignment Tax: Tokenization vs. Continuous Geometry in Scientific Foundation Models

Authors: Prashant C. Raju

Abstract: Foundation models for biology and physics optimize predictive accuracy, but their internal representations systematically fail to preserve the continuous geometry of the systems they model. We identify the root cause: the Geometric Alignment Tax, an intrinsic cost of forcing continuous manifolds through discrete categorical bottlenecks. Controlled ablations on synthetic dynamical systems demonstrate that replacing cross-entropy with a continuous head on an identical encoder reduces geometric distortion by up to 8.5x, while learned codebooks exhibit a non-monotonic double bind where finer quantization worsens geometry despite improving reconstruction. Under continuous objectives, three architectures differ by 1.3x; under discrete tokenization, they diverge by 3,000x. Evaluating 14 biological foundation models with rate-distortion theory and MINE, we identify three failure regimes: Local-Global Decoupling, Representational Compression, and Geometric Vacuity. A controlled experiment confirms that Evo 2's reverse-complement robustness on real DNA reflects conserved sequence composition, not learned symmetry. No model achieves simultaneously low distortion, high mutual information, and global coherence.

new Uncertainty-Aware Foundation Models for Clinical Data

Authors: Qian Zhou, Yuanyun Zhang, Shi Li

Abstract: Healthcare foundation models have largely followed paradigms from natural language processing and computer vision, emphasizing large-scale pretraining and deterministic representations over heterogeneous clinical data. However, clinical observations are inherently incomplete, reflecting sparse, irregular, and modality-dependent measurements of an underlying physiologic state. In this work, we propose a framework for uncertainty-aware foundation modeling that represents each patient not as a point embedding, but as a distribution over plausible latent states. By learning set-valued representations and enforcing consistency across partial views of the same patient, the model captures what is invariantly inferable while explicitly encoding epistemic uncertainty. We integrate this formulation with multimodal encoders and scalable self-supervised objectives, combining reconstruction, contrastive alignment, and distributional regularization. Across diverse clinical tasks, our approach improves predictive performance, robustness under missing data, and uncertainty calibration relative to strong baselines. These results suggest that modeling what is not observed, rather than only what is, constitutes a critical inductive bias for healthcare foundation models.

new Stable and Privacy-Preserving Synthetic Educational Data with Empirical Marginals: A Copula-Based Approach

Authors: Gabriel Diaz Ramos, Lorenzo Luzi, Debshila Basu Mallick, Richard Baraniuk

Abstract: To advance Educational Data Mining (EDM) within strict privacy-protecting regulatory frameworks, researchers must develop methods that enable data-driven analysis while protecting sensitive student information. Synthetic data generation is one such approach, enabling the release of statistically generated samples instead of real student records; however, existing deep learning and parametric generators often distort marginal distributions and degrade under iterative regeneration, leading to distribution drift and progressive loss of distributional support that compromise reliability. In response, we introduce the Non-Parametric Gaussian Copula (NPGC), a plug-and-play synthesis method that replaces deep learning and parametric optimization with empirical statistical anchoring to preserve the observed marginal distributions while modeling dependencies through a copula framework. NPGC integrates Differential Privacy (DP) at both the marginal and correlation levels, supports heterogeneous variable types, and treats missing data as an explicit state to retain informative absence patterns. We evaluate NPGC against deep learning and parametric baselines on five benchmark datasets and demonstrate that it remains stable across multiple regeneration cycles and achieves competitive downstream performance at substantially lower computational cost. We further validate NPGC through deployment in a real-world online learning platform, demonstrating its practicality for privacy-preserving research.
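A stripped-down two-column Gaussian-copula synthesizer conveys the mechanism: empirical marginals anchor every output to an observed value, while a Gaussian copula carries the dependence. This sketch omits NPGC's differential-privacy noise, missing-data states, and heterogeneous-type handling; the function name and data are illustrative.

```python
import random
from math import sqrt
from statistics import NormalDist

def npgc_sample(col_x, col_y, n, seed=0):
    """Two-column Gaussian-copula synthesis with empirical marginals:
    rank-transform to normal scores, estimate the latent correlation,
    sample correlated normals, and map back through the empirical
    inverse CDF so every output is an observed marginal value."""
    nd, rng, m = NormalDist(), random.Random(seed), len(col_x)
    sx, sy = sorted(col_x), sorted(col_y)
    def z(col):
        # Ranks -> normal scores; Hazen positions avoid +/-infinity.
        order = sorted(range(m), key=lambda i: col[i])
        r = [0] * m
        for rank, i in enumerate(order):
            r[i] = rank
        return [nd.inv_cdf((ri + 0.5) / m) for ri in r]
    zx, zy = z(col_x), z(col_y)
    rho = sum(a * b for a, b in zip(zx, zy)) / sqrt(
        sum(a * a for a in zx) * sum(b * b for b in zy))
    rho = max(-1.0, min(1.0, rho))  # guard against float rounding
    out = []
    for _ in range(n):
        u = rng.gauss(0, 1)
        v = rho * u + sqrt(1 - rho * rho) * rng.gauss(0, 1)  # 2-D Cholesky
        qx = sx[min(m - 1, int(nd.cdf(u) * m))]
        qy = sy[min(m - 1, int(nd.cdf(v) * m))]
        out.append((qx, qy))
    return out

# Illustrative positively associated columns.
data_x = [1.0, 2.0, 2.5, 3.0, 5.0, 8.0]
data_y = [10, 12, 13, 15, 20, 30]
synth = npgc_sample(data_x, data_y, 100)
```

Because the inverse empirical CDF only ever returns observed values, the marginal support cannot drift under repeated regeneration, which is the stability property the abstract emphasizes.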

new Which Leakage Types Matter?

Authors: Simon Roth

Abstract: We run twenty-eight within-subject counterfactual experiments across 2,047 tabular datasets, plus a boundary experiment on 129 temporal datasets, to measure the severity of four data leakage classes in machine learning. Class I (estimation - fitting scalers on full data) is negligible: all nine conditions produce $|\Delta\text{AUC}| \leq 0.005$. Class II (selection - peeking, seed cherry-picking) is substantial: ~90% of the measured effect is noise exploitation that inflates reported scores. Class III (memorization) scales with model capacity: d_z = 0.37 (Naive Bayes) to 1.11 (Decision Tree). Class IV (boundary) is invisible under random CV. The textbook emphasis is inverted: normalization leakage matters least; selection leakage at practical dataset sizes matters most.
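The Class I counterfactual is the cleanest to reproduce: fit the scaler once on train+test (leaky) and once on train only (clean), then compare the standardized test values. The toy data here are illustrative; the abstract's finding is that this discrepancy, while real, barely moves AUC.

```python
def zscore_params(values):
    """Mean and population standard deviation, as a standard scaler uses."""
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / len(values)
    return m, var ** 0.5

# Class I (estimation) leakage counterfactual.
train, test = [1.0, 2.0, 3.0, 4.0], [10.0]
leaky_mean, leaky_std = zscore_params(train + test)   # peeks at test data
clean_mean, clean_std = zscore_params(train)          # correct protocol
leaky = (test[0] - leaky_mean) / leaky_std
clean = (test[0] - clean_mean) / clean_std
```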

new ClawArena: Benchmarking AI Agents in Evolving Information Environments

Authors: Haonian Ji, Kaiwen Xiong, Siwei Han, Peng Xia, Shi Qiu, Yiyang Zhou, Jiaqi Liu, Jinlong Li, Bingzhou Li, Zeyu Zheng, Cihang Xie, Huaxiu Yao

Abstract: AI agents deployed as persistent assistants must maintain correct beliefs as their information environment evolves. In practice, evidence is scattered across heterogeneous sources that often contradict one another, new information can invalidate earlier conclusions, and user preferences surface through corrections rather than explicit instructions. Existing benchmarks largely assume static, single-authority settings and do not evaluate whether agents can keep up with this complexity. We introduce ClawArena, a benchmark for evaluating AI agents in evolving information environments. Each scenario maintains a complete hidden ground truth while exposing the agent only to noisy, partial, and sometimes contradictory traces across multi-channel sessions, workspace files, and staged updates. Evaluation is organized around three coupled challenges: multi-source conflict reasoning, dynamic belief revision, and implicit personalization, whose interactions yield a 14-category question taxonomy. Two question formats, multiple-choice (set-selection) and shell-based executable checks, test both reasoning and workspace grounding. The current release contains 64 scenarios across 8 professional domains, totaling 1,879 evaluation rounds and 365 dynamic updates. Experiments on five agent frameworks and five language models show that both model capability (15.4% range) and framework design (9.2%) substantially affect performance, that self-evolving skill frameworks can partially close model-capability gaps, and that belief revision difficulty is determined by update design strategy rather than the mere presence of updates. Code is available at https://github.com/aiming-lab/ClawArena.

URLs: https://github.com/aiming-lab/ClawArena.

new Towards Agentic Defect Reasoning: A Graph-Assisted Retrieval Framework for Laser Powder Bed Fusion

Authors: Muhammad Rizwan Awan, Volker Pickert, Muhammad Waqar Ashraf, Saleh Ali, Farshid Mahmouditabar, Shafiq Odhano

Abstract: Laser Powder Bed Fusion (LPBF) is highly sensitive to process parameters, which influence defect formation through complex thermal and fluid mechanisms. However, defect-related knowledge is dispersed across the literature, limiting systematic understanding. This study presents a graph-assisted retrieval framework for defect reasoning in LPBF, using Ti6Al4V as a case study. Scientific publications are transformed into a structured representation, and relationships between parameters, mechanisms, and defects are encoded into an evidence-linked knowledge graph. The framework integrates semantic and graph-based retrieval, supported by a lightweight agent-based reasoning layer to construct interpretable defect pathways. Evaluation shows high retrieval accuracy (0.9667) and recall (0.9667), demonstrating effective identification of relevant defect-related evidence. The framework enables transparent reasoning chains linking process parameters to defects. This work provides a scalable approach for converting unstructured literature into a queryable and interpretable knowledge resource for additive manufacturing.

new Learning from Imperfect Demonstrations via Temporal Behavior Tree-Guided Trajectory Repair

Authors: Aniruddh G. Puranic, Sebastian Schirmer, John S. Baras, Calin Belta

Abstract: Learning robot control policies from demonstrations is a powerful paradigm, yet real-world data is often suboptimal, noisy, or otherwise imperfect, posing significant challenges for imitation and reinforcement learning. In this work, we present a formal framework that leverages Temporal Behavior Trees (TBT), an extension of Signal Temporal Logic (STL) with Behavior Tree semantics, to repair suboptimal trajectories prior to their use in downstream policy learning. Given demonstrations that violate a TBT specification, a model-based repair algorithm corrects trajectory segments to satisfy the formal constraints, yielding a dataset that is both logically consistent and interpretable. The repaired trajectories are then used to extract potential functions that shape the reward signal for reinforcement learning, guiding the agent toward task-consistent regions of the state space without requiring knowledge of the agent's kinematic model. We demonstrate the effectiveness of this framework on discrete grid-world navigation and continuous single and multi-agent reach-avoid tasks, highlighting its potential for data-efficient robot learning in settings where high-quality demonstrations cannot be assumed.

new Three Phases of Expert Routing: How Load Balance Evolves During Mixture-of-Experts Training

Authors: Charafeddine Mouzouni

Abstract: We model Mixture-of-Experts (MoE) token routing as a congestion game with a single effective parameter, the congestion coefficient gamma_eff, that quantifies the balance-quality tradeoff. Tracking gamma_eff across training checkpoints of two open-source MoE models, OLMoE-1B-7B (20 checkpoints, with dense sampling in the surge region) and OpenMoE-8B (6 checkpoints), reveals a three-phase trajectory: a surge phase where the router learns to balance load (gamma_eff: 14 to 36-39, peaking in the step 30K-40K region), a stabilization phase where experts specialize under steady balance (B_0: 2.4 to 2.3, steps 100K-400K), and a relaxation phase where the router trades balance for quality as experts differentiate (gamma_eff: 27 to 9, steps 400K-1.2M). This non-monotone trajectory, invisible to post-hoc analysis of converged models, reveals that early MoE training prioritizes balance while late training prioritizes quality. The theoretical framework is honest about its limits: the single-type equilibrium reduces to temperature-scaled softmax (held-out L1: MFG = 0.199 vs. softmax = 0.200). The game is not a better predictor; it reveals what the temperature means and, critically, how that temperature evolves. We complement the dynamics with an effective congestion decomposition, a multi-type extension that improves load prediction via token clustering on all 16 layers (mean: 30%), scope diagnostics (K/M, epsilon_l), and robustness verification across four independent quality estimators (r >= 0.89). All confidence intervals are from bootstrap resampling over 50 independent text batches.
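
A minimal sketch of the temperature-scaled softmax router that the single-type equilibrium reduces to (our reading of the abstract; the per-expert affinities below are made up). Lower temperature sharpens routing toward the best expert (quality); higher temperature flattens it across experts (balance), which is the trade-off the congestion coefficient tracks over training.

```python
# Hedged sketch: temperature-scaled softmax routing over experts.
# Hypothetical logits; not from the paper's checkpoints.
import math

def softmax(logits, temperature):
    zs = [l / temperature for l in logits]
    m = max(zs)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5, 0.1]  # one token's affinity for four experts

sharp = softmax(logits, temperature=0.1)   # low temperature: quality-seeking
flat = softmax(logits, temperature=10.0)   # high temperature: balance-seeking

# Load imbalance as the max-min gap of expert routing probabilities.
imbalance = lambda p: max(p) - min(p)
print(imbalance(sharp), imbalance(flat))
```

Sharpening concentrates nearly all mass on the top expert, while the high-temperature router spreads load almost uniformly.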

new Subspace Control: Turning Constrained Model Steering into Controllable Spectral Optimization

Authors: Yancheng Huang, Changsheng Wang, Chongyu Fan, Yicheng Lang, Bingqi Shang, Yang Zhang, Mingyi Hong, Qing Qu, Alvaro Velasquez, Sijia Liu

Abstract: Foundation models, such as large language models (LLMs), are powerful but often require customization before deployment to satisfy practical constraints such as safety, privacy, and task-specific requirements, leading to "constrained" optimization problems for model steering and adaptation. However, solving such problems remains largely underexplored and is particularly challenging due to interference between the primary objective and constraint objectives during optimization. In this paper, we propose a subspace control framework for constrained model training. Specifically, (i) we first analyze, from a model merging perspective, how spectral cross-task interference arises and show that it can be resolved via a one-shot solution that orthogonalizes the merged subspace; (ii) we establish a connection between this solution and gradient orthogonalization in the spectral optimizer Muon; and (iii) building on these insights, we introduce SIFT (spectral interference-free training), which leverages a localization scheme to selectively intervene during optimization, enabling controllable updates that mitigate objective-constraint conflicts. We evaluate SIFT across four representative applications: (a) machine unlearning, (b) safety alignment, (c) text-to-speech adaptation, and (d) hallucination mitigation. Compared to both control-based and control-free baselines, SIFT consistently achieves substantial and robust performance improvements across all tasks. Code is available at https://github.com/OPTML-Group/SIFT.

URLs: https://github.com/OPTML-Group/SIFT.

new Good Rankings, Wrong Probabilities: A Calibration Audit of Multimodal Cancer Survival Models

Authors: Sajad Ghawami

Abstract: Multimodal deep learning models that fuse whole-slide histopathology images with genomic data have achieved strong discriminative performance for cancer survival prediction, as measured by the concordance index. Yet whether the survival probabilities derived from these models - either directly from native outputs or via standard post-hoc reconstruction - are calibrated remains largely unexamined. We conduct, to our knowledge, the first systematic fold-level 1-calibration audit of multimodal WSI-genomics survival architectures, evaluating native discrete-time survival outputs (Experiment A: 3 models on TCGA-BRCA) and Breslow-reconstructed survival curves from scalar risk scores (Experiment B: 11 architectures across 5 TCGA cancer types). In Experiment A, all three models fail 1-calibration on a majority of folds (12 of 15 fold-level tests reject after Benjamini-Hochberg correction). Across the full 290 fold-level tests, 166 reject the null of correct calibration at the median event time after Benjamini-Hochberg correction (FDR = 0.05). MCAT achieves C-index 0.817 on GBMLGG yet fails 1-calibration on all five folds. Gating-based fusion is associated with better calibration; bilinear and concatenation fusion are not. Post-hoc Platt scaling reduces miscalibration at the evaluated horizon (e.g., MCAT: from 5/5 failing folds to 2/5) without affecting discrimination. The concordance index alone is insufficient for evaluating survival models intended for clinical use.

new Peoples Water Data: Enabling Reliable Field Data Generation and Microbial Contamination Screening in Household Drinking Water

Authors: Suzan Kagan, Shira Spigelman, Sankar Sudhir, Thalappil Pradeep, Hadas Mamane

Abstract: Unsafe drinking water remains a major public health concern globally, particularly in low-resource regions where routine microbiological surveillance is limited. Although Escherichia coli is the internationally recognized indicator of fecal contamination, laboratory-based testing is often inaccessible at scale. In this study, we developed and evaluated a two-stage machine-learning framework for predicting E. coli presence in decentralized household point-of-use drinking water in Chennai, India, using low-cost physicochemical and contextual indicators. The dataset comprised 3,023 samples collected under the Peoples Water Data initiative; after harmonization, technical cleaning, and outlier screening, 2,207 valid samples were retained. This framework provides a scalable decision-support tool for prioritizing microbiological testing in resource-constrained environments and addresses an important gap in point-of-use contamination risk assessment. Beyond predictive modeling, the present study was conducted within an AI-supported field implementation framework that combined student-facing guidance and real-time QC to improve protocol adherence, traceability, and data reliability in decentralized household water monitoring.

new Learning An Interpretable Risk Scoring System for Maximizing Decision Net Benefit

Authors: Wenhao Chi, Ş. İlker Birbil

Abstract: Risk scoring systems are widely used in high-stakes domains to assist decision-making. However, existing approaches often focus on optimizing predictive accuracy or likelihood-based criteria, which may not align with the main goal of maximizing utility. In this paper, we propose a novel risk scoring system that directly optimizes net benefit over a range of decision thresholds. The model is formulated as a sparse integer linear programming problem which enables the construction of a transparent scoring system with integer coefficients, and hence, facilitates interpretation and practical application. We also establish fundamental relationships among net benefit, discrimination, and calibration. Our analysis proves that optimizing net benefit also guarantees conventional performance measures. We thoroughly evaluated our method on multiple public datasets as well as on a real-world clinical dataset. This computational study demonstrated that our interpretable method can effectively achieve high net benefit while maintaining competitive discrimination and calibration performance.
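
The net benefit being optimized is, in its standard decision-curve form, NB(t) = TP/n - (FP/n) * t/(1-t) at decision threshold t. The sketch below illustrates that formula only; it is not the paper's integer-programming model, and the toy labels and scores are made up.

```python
# Hedged sketch of the standard decision-curve net benefit:
# NB(t) = TP/n - (FP/n) * t / (1 - t) at threshold t.
def net_benefit(y_true, y_prob, threshold):
    n = len(y_true)
    preds = [p >= threshold for p in y_prob]
    tp = sum(1 for y, d in zip(y_true, preds) if y == 1 and d)
    fp = sum(1 for y, d in zip(y_true, preds) if y == 0 and d)
    # The t/(1-t) factor is the harm-to-benefit odds implied by threshold t.
    return tp / n - (fp / n) * threshold / (1 - threshold)

y_true = [1, 1, 0, 0, 1, 0, 0, 0]                  # toy outcomes
y_prob = [0.9, 0.7, 0.6, 0.4, 0.8, 0.2, 0.1, 0.3]  # toy risk scores
print(net_benefit(y_true, y_prob, 0.5))  # 3/8 - (1/8)*1 = 0.25
```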

new Towards Unveiling Vulnerabilities of Large Reasoning Models in Machine Unlearning

Authors: Aobo Chen, Chenxu Zhao, Chenglin Miao, Mengdi Huai

Abstract: Large language models (LLMs) possess strong semantic understanding, driving significant progress in data mining applications. This is further enhanced by large reasoning models (LRMs), which provide explicit multi-step reasoning traces. On the other hand, the growing need for the right to be forgotten has driven the development of machine unlearning techniques, which aim to eliminate the influence of specific data from trained models without full retraining. However, unlearning may also introduce new security vulnerabilities by exposing additional interaction surfaces. Although many studies have investigated unlearning attacks, there is no prior work on LRMs. To bridge this gap, in this paper we propose the first LRM unlearning attack, which forces incorrect final answers while generating convincing but misleading reasoning traces. This objective is challenging due to non-differentiable logical constraints, weak optimization effect over long rationales, and discrete forget set selection. To overcome these challenges, we introduce a bi-level exact unlearning attack that incorporates a differentiable objective function, influential token alignment, and a relaxed indicator strategy. To demonstrate the effectiveness and generalizability of our attack, we also design novel optimization frameworks and conduct comprehensive experiments in both white-box and black-box settings, aiming to raise awareness of the emerging threats to LRM unlearning pipelines.

new APPA: Adaptive Preference Pluralistic Alignment for Fair Federated RLHF of LLMs

Authors: Mahmoud Srewa, Tianyu Zhao, Salma Elmalaki

Abstract: Aligning large language models (LLMs) with diverse human preferences requires pluralistic alignment, where a single model must respect the values of multiple distinct groups simultaneously. In federated reinforcement learning from human feedback (FedRLHF), these groups align a shared policy without centralizing preference data, which makes fair reward aggregation essential. Existing aggregation methods exhibit clear trade-offs: average-based aggregation systematically under-aligns the worst-performing groups, while min aggregation prioritizes worst-group performance at the cost of overall alignment. We propose APPA, an Adaptive Preference Pluralistic Alignment framework that dynamically reweights group-level rewards based on historical alignment rewards. Our approach prioritizes under-aligned groups without degrading well-aligned ones, while requiring no access to raw preference data. Integrated into a proximal policy optimization (PPO) based FedRLHF pipeline and evaluated on GLOBALQA and OQA across three model families (Gemma 2 2B, Llama 3.2 3B, Qwen3 0.6B), APPA achieves strong fairness-alignment trade-offs, improving worst-group alignment by up to 28% over average aggregation while maintaining higher overall alignment than min aggregation across most configurations.

new Entropy, Disagreement, and the Limits of Foundation Models in Genomics

Authors: Maxime Rochkoulets, Lovro Vrček, Mile Šikić

Abstract: Foundation models in genomics have shown mixed success compared to their counterparts in natural language processing. Yet, the reasons for their limited effectiveness remain poorly understood. In this work, we investigate the role of entropy as a fundamental factor limiting the capacities of such models to learn from their training data and develop foundational capabilities. We train ensembles of models on text and DNA sequences and analyze their predictions, static embeddings, and empirical Fisher information flow. We show that the high entropy of genomic sequences -- from the point of view of unseen token prediction -- leads to near-uniform output distributions, disagreement across models, and unstable static embeddings, even for models that are matched in architecture, training and data. We then demonstrate that models trained on DNA concentrate Fisher information in embedding layers, seemingly failing to exploit inter-token relationships. Our results suggest that self-supervised training from sequences alone may not be applicable to genomic data, calling into question the assumptions underlying current methodologies for training genomic foundation models.
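
A toy illustration of the entropy argument (not from the paper): a near-uniform next-token distribution over the four nucleotides carries maximal entropy, while a peaked natural-language next-token distribution carries much less, leaving more learnable signal.

```python
# Hedged sketch: Shannon entropy (bits) of next-token distributions.
# The two distributions are made-up illustrations, not measured values.
import math

def entropy(p):
    return -sum(x * math.log2(x) for x in p if x > 0)

dna_like = [0.25, 0.25, 0.25, 0.25]   # near-uniform over {A, C, G, T}
text_like = [0.7, 0.2, 0.05, 0.05]    # peaked next-token distribution
print(entropy(dna_like), entropy(text_like))  # 2.0 vs ~1.26
```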

new DAGAF: A directed acyclic generative adversarial framework for joint structure learning and tabular data synthesis

Authors: Hristo Petkov, Calum MacLellan, Feng Dong

Abstract: Understanding the causal relationships between data variables can provide crucial insights into the construction of tabular datasets. Most existing causality learning methods typically focus on applying a single identifiable causal model, such as the Additive Noise Model (ANM) or the Linear non-Gaussian Acyclic Model (LiNGAM), to discover the dependencies exhibited in observational data. We improve on this approach by introducing a novel dual-step framework capable of performing both causal structure learning and tabular data synthesis under multiple causal model assumptions. Our approach uses Directed Acyclic Graphs (DAGs) to represent causal relationships among data variables. By applying various functional causal models, including ANM, LiNGAM, and the Post-Nonlinear model (PNL), we implicitly learn the contents of the DAG to simulate the generative process of observational data, effectively replicating the real data distribution. This is supported by a theoretical analysis to explain the multiple loss terms comprising the objective function of the framework. Experimental results demonstrate that DAGAF outperforms many existing methods in structure learning, achieving significantly lower Structural Hamming Distance (SHD) scores across both real-world and benchmark datasets (Sachs: 47%, Child: 11%, Hailfinder: 5%, Pathfinder: 7% improvement compared to state-of-the-art), while being able to produce diverse, high-quality samples.
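
A minimal sketch of the Additive Noise Model assumption mentioned above (illustrative only; the quadratic link and noise scale are made up): each variable is generated as a function of its DAG parents plus independent noise.

```python
# Hedged sketch of ANM data generation on the two-node DAG X -> Y.
import random

random.seed(0)

def sample_anm(n):
    rows = []
    for _ in range(n):
        x = random.gauss(0, 1)              # root node: exogenous
        y = x ** 2 + random.gauss(0, 0.1)   # child: f(parent) + independent noise
        rows.append((x, y))
    return rows

data = sample_anm(5)
print(data)
```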

new Correcting Source Mismatch in Flow Matching with Radial-Angular Transport

Authors: Fouad Oubari, Mathilde Mougeot

Abstract: Flow Matching is typically built from Gaussian sources and Euclidean probability paths. For heavy-tailed or anisotropic data, however, a Gaussian source induces a structural mismatch already at the level of the radial distribution. We introduce \textit{Radial--Angular Flow Matching (RAFM)}, a framework that explicitly corrects this source mismatch within the standard simulation-free Flow Matching template. RAFM uses a source whose radial law matches that of the data and whose conditional angular distribution is uniform on the sphere, thereby removing the Gaussian radial mismatch by construction. This reduces the remaining transport problem to angular alignment, which leads naturally to conditional paths on scaled spheres defined by spherical geodesic interpolation. The resulting framework yields explicit Flow Matching targets tailored to radial--angular transport without modifying the underlying deterministic training pipeline. We establish the exact density of the matched-radial source, prove a radial--angular KL decomposition that isolates the Gaussian radial penalty, characterize the induced target vector field, and derive a stability result linking Flow Matching error to generation error. We further analyze empirical estimation of the radial law, for which Wasserstein and CDF metrics provide natural guarantees. Empirically, RAFM substantially improves over standard Gaussian Flow Matching and remains competitive with recent non-Gaussian alternatives while preserving a lightweight deterministic training procedure. Overall, RAFM provides a principled source-and-path design for Flow Matching on heavy-tailed and extreme-event data.
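
One way to read RAFM's source construction, sketched with stdlib Python (the bootstrap radial sampler is our simplification, not the paper's estimator): draw the radius from the data's empirical radial law and the direction uniformly on the sphere, so the Gaussian radial mismatch vanishes by construction.

```python
# Hedged sketch of a matched-radial source: empirical radius x uniform direction.
import math, random

random.seed(0)

def unit_vector(d):
    # Normalizing a Gaussian vector gives a uniform direction on S^{d-1}.
    g = [random.gauss(0, 1) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in g))
    return [x / norm for x in g]

def sample_source(data_radii, d):
    r = random.choice(data_radii)       # bootstrap from the empirical radial law
    return [r * u for u in unit_vector(d)]

# Heavy-tailed radii that a Gaussian source could not match.
radii = [random.paretovariate(1.5) for _ in range(1000)]
x = sample_source(radii, d=3)
print(math.sqrt(sum(v * v for v in x)))  # the sample's radius is a data radius
```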

new Convolutional Neural Network and Adversarial Autoencoder in EEG images classification

Authors: Albert Nasybullin, Semen Kurkin

Abstract: In this paper, we consider applying computer vision algorithms to a classification problem that arises in neuroscience during EEG data analysis. Our approach combines computer vision and neural network methods to classify human brain activity during hand movement. We pre-processed raw EEG signals and generated 2D EEG topograms, and then developed supervised and semi-supervised neural networks to classify different motor cortex activities.

new How Long short-term memory artificial neural network, synthetic data, and fine-tuning improve the classification of raw EEG data

Authors: Albert Nasybullin, Vladimir Maksimenko, Semen Kurkin

Abstract: In this paper, we discuss a Machine Learning pipeline for the classification of EEG data. We propose a combination of synthetic data generation, a long short-term memory artificial neural network (LSTM), and fine-tuning to solve classification problems for experiments with implicit visual stimuli, such as the Necker cube with different levels of ambiguity. The developed approach improved the quality of classification of raw EEG data.

new Boosted Distributional Reinforcement Learning: Analysis and Healthcare Applications

Authors: Zequn Chen, Wesley J. Marrero

Abstract: Researchers and practitioners are increasingly considering reinforcement learning to optimize decisions in complex domains like robotics and healthcare. To date, these efforts have largely utilized expectation-based learning. However, relying on expectation-focused objectives may be insufficient for making consistent decisions in highly uncertain situations involving multiple heterogeneous groups. While distributional reinforcement learning algorithms have been introduced to model the full distributions of outcomes, they can yield large discrepancies in realized benefits among comparable agents. This challenge is particularly acute in healthcare settings, where physicians (controllers) must manage multiple patients (subordinate agents) with uncertain disease progression and heterogeneous treatment responses. We propose a Boosted Distributional Reinforcement Learning (BDRL) algorithm that optimizes agent-specific outcome distributions while enforcing comparability among similar agents and analyze its convergence. To further stabilize learning, we incorporate a post-update projection step formulated as a constrained convex optimization problem, which efficiently aligns individual outcomes with a high-performing reference within a specified tolerance. We apply our algorithm to manage hypertension in a large subset of the US adult population by categorizing individuals into cardiovascular disease risk groups. Our approach modifies treatment plans for median and vulnerable patients by mimicking the behavior of high-performing references in each risk group. Furthermore, we find that BDRL improves the number and consistency of quality-adjusted life years compared with reinforcement learning baselines.

new Generative models for decision-making under distributional shift

Authors: Xiuyuan Cheng, Yunqin Zhu, Yao Xie

Abstract: Many data-driven decision problems are formulated using a nominal distribution estimated from historical data, while performance is ultimately determined by a deployment distribution that may be shifted, context-dependent, partially observed, or stress-induced. This tutorial presents modern generative models, particularly flow- and score-based methods, as mathematical tools for constructing decision-relevant distributions. From an operations research perspective, their primary value lies not in unconstrained sample synthesis but in representing and transforming distributions through transport maps, velocity fields, score fields, and guided stochastic dynamics. We present a unified framework based on pushforward maps, continuity, Fokker-Planck equations, Wasserstein geometry, and optimization in probability space. Within this framework, generative models can be used to learn nominal uncertainty, construct stressed or least-favorable distributions for robustness, and produce conditional or posterior distributions under side information and partial observation. We also highlight representative theoretical guarantees, including forward-reverse convergence for iterative flow models, first-order minimax analysis in transport-map space, and error-transfer bounds for posterior sampling with generative priors. The tutorial provides a principled introduction to using generative models for scenario generation, robust decision-making, uncertainty quantification, and related problems under distributional shift.
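
The pushforward idea at the core of the framework can be illustrated with the simplest transport map, inverse-CDF sampling (a standard textbook example, not taken from the tutorial): applying a map T to samples from a base distribution induces the target distribution as the pushforward.

```python
# Hedged sketch: pushforward of Uniform(0,1) through the inverse CDF
# of an Exponential(rate) distribution yields Exponential samples.
import math, random

random.seed(0)

def transport(u, rate=2.0):
    return -math.log(1.0 - u) / rate   # inverse CDF of Exponential(rate)

base = [random.random() for _ in range(100000)]
pushed = [transport(u) for u in base]
print(sum(pushed) / len(pushed))       # sample mean ~ 1/rate = 0.5
```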

new Deep Kuratowski Embedding Neural Networks for Wasserstein Metric Learning

Authors: Andrew Qing He

Abstract: Computing pairwise Wasserstein distances is a fundamental bottleneck in data analysis pipelines. Motivated by the classical Kuratowski embedding theorem, we propose two neural architectures for learning to approximate the Wasserstein-2 distance ($W_2$) from data. The first, DeepKENN, aggregates distances across all intermediate feature maps of a CNN using learnable positive weights. The second, ODE-KENN, replaces the discrete layer stack with a Neural ODE, embedding each input into the infinite-dimensional Banach space $C^1([0,1], \mathbb{R}^d)$ and providing implicit regularization via trajectory smoothness. Experiments on MNIST with exact precomputed $W_2$ distances show that ODE-KENN achieves a 28% lower test MSE than the single-layer baseline and 18% lower than DeepKENN under matched parameter counts, while exhibiting a smaller generalization gap. The resulting fast surrogate can replace the expensive $W_2$ oracle in downstream pairwise distance computations.
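
The classical Kuratowski embedding that motivates both architectures maps a point x to its distance profile (d(x, a_1), ..., d(x, a_n)); under the l_inf norm this is an isometry. A minimal check on a toy metric space (illustrative, not the paper's networks):

```python
# Hedged sketch: Kuratowski embedding of a finite metric space into l_inf.
points = [0.0, 1.0, 3.0, 7.0]          # a metric space on the real line
d = lambda a, b: abs(a - b)

def kuratowski(x, anchors):
    return [d(x, a) for a in anchors]   # distance profile of x

emb = {p: kuratowski(p, points) for p in points}

def linf(u, v):
    return max(abs(a - b) for a, b in zip(u, v))

# Isometry: l_inf distance between embeddings equals the original metric.
for p in points:
    for q in points:
        assert abs(linf(emb[p], emb[q]) - d(p, q)) < 1e-12
print("isometric")
```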

new Context is All You Need

Authors: Jean Erik Delanois, Shruti Joshi, Ryan Golden, Teresa Nick, Maxim Bazhenov

Abstract: Artificial Neural Networks (ANNs) are increasingly deployed across diverse real-world settings, where they must operate under data distributions that differ from those seen during training. This challenge is central to Domain Generalization (DG), which trains models to generalize to unseen domains without target data, and Test-Time Adaptation (TTA), which improves robustness by adapting to unlabeled test data at deployment. Existing approaches to address these challenges are often complex, resource-intensive, and difficult to scale. We introduce CONTXT (Contextual augmentatiOn for Neural feaTure X Transforms), a simple and intuitive method for contextual adaptation. CONTXT modulates internal representations using simple additive and multiplicative feature transforms. Within a TTA setting, it yields consistent gains across discriminative tasks (e.g., ANN/CNN classification) and generative models (e.g., LLMs). The method is lightweight, easy to integrate, and incurs minimal overhead, enabling robust performance under domain shift without added complexity. More broadly, CONTXT provides a compact way to steer information flow and neural processing without retraining.

new CPT: Controllable and Editable Design Variations with Language Models

Authors: Karthik Suresh, Amine Ben Khalifa, Li Zhang, Wei-ting Hsu, Fangzheng Wu, Vinay More, Asim Kadav

Abstract: Designing visually diverse and high-quality designs remains a manual, time-consuming process, limiting scalability and personalization in creative workflows. We present a system for generating editable design variations using a decoder-only language model, the Creative Pre-trained Transformer (CPT), trained to predict visual style attributes in design templates. At the core of our approach is a new representation called Creative Markup Language (CML), a compact, machine-learning-friendly format that captures canvas-level structure, page layout, and element-level details (text, images, and vector graphics), including both content and style. We fine-tune CPT on a large corpus of design templates authored by professional designers, enabling it to learn meaningful, context-aware predictions for attributes such as color schemes and font choices. The model produces semantically structured and stylistically coherent outputs, preserving internal consistency across elements. Unlike generative image models, our system yields fully editable design documents rather than pixel-only images, allowing users to iterate and personalize within a design editor. In experiments, our approach generates contextual color and font variations for existing templates and shows promise in adjusting layouts while maintaining design principles.

new Finite-Time Analysis of Q-Value Iteration for General-Sum Stackelberg Games

Authors: Narim Jeong, Donghwan Lee

Abstract: Reinforcement learning has been successful both empirically and theoretically in single-agent settings, but extending these results to multi-agent reinforcement learning in general-sum Markov games remains challenging. This paper studies the convergence of Stackelberg Q-value iteration in two-player general-sum Markov games from a control-theoretic perspective. We introduce a relaxed policy condition tailored to the Stackelberg setting and model the learning dynamics as a switching system. By constructing upper and lower comparison systems, we establish finite-time error bounds for the Q-functions and characterize their convergence properties. Our results provide a novel control-theoretic perspective on Stackelberg learning. Moreover, to the best of the authors' knowledge, this paper offers the first finite-time convergence guarantees for Q-value iteration in general-sum Markov games under Stackelberg interactions.
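
The Stackelberg stage game solved inside each Q-value iteration step can be sketched as follows (a generic bimatrix illustration with made-up payoffs, not the paper's algorithm): the leader commits to an action, the follower best-responds, and the leader chooses the commitment whose induced response maximizes its own payoff.

```python
# Hedged sketch: pure-strategy Stackelberg solution of a 2x2 bimatrix game.
leader_payoff = [[3, 1], [4, 0]]    # leader_payoff[a][b]
follower_payoff = [[2, 5], [1, 0]]  # follower_payoff[a][b]

def stackelberg(lp, fp):
    best = None
    for a in range(len(lp)):
        # Follower best-responds to the leader's committed action a.
        b = max(range(len(fp[a])), key=lambda j: fp[a][j])
        if best is None or lp[a][b] > best[2]:
            best = (a, b, lp[a][b])
    return best

print(stackelberg(leader_payoff, follower_payoff))  # (leader a, follower b, value)
```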

new Relative Density Ratio Optimization for Stable and Statistically Consistent Model Alignment

Authors: Hiroshi Takahashi, Tomoharu Iwata, Atsutoshi Kumagai, Sekitoshi Kanai, Masanori Yamada, Kosuke Nishida, Kazutoshi Shinoda

Abstract: Aligning language models with human preferences is essential for ensuring their safety and reliability. Although most existing approaches assume specific human preference models such as the Bradley-Terry model, this assumption may fail to accurately capture true human preferences, and consequently, these methods lack statistical consistency, i.e., the guarantee that language models converge to the true human preference as the number of samples increases. In contrast, direct density ratio optimization (DDRO) achieves statistical consistency without assuming any human preference models. DDRO models the density ratio between preferred and non-preferred data distributions using the language model, and then optimizes it via density ratio estimation. However, this density ratio is unstable and often diverges, leading to training instability of DDRO. In this paper, we propose a novel alignment method that is both stable and statistically consistent. Our approach is based on the relative density ratio between the preferred data distribution and a mixture of the preferred and non-preferred data distributions. Our approach is stable since this relative density ratio is bounded above and does not diverge. Moreover, it is statistically consistent and yields significantly tighter convergence guarantees than DDRO. We experimentally show its effectiveness with Qwen 2.5 and Llama 3.
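
The boundedness claim is easy to verify pointwise: the relative density ratio p / (alpha * p + (1 - alpha) * q) never exceeds 1/alpha, while the plain ratio p/q diverges wherever q vanishes. A sketch (alpha = 0.5 is our arbitrary choice, not a value from the paper):

```python
# Hedged sketch: plain vs. relative density ratio at a single point.
def plain_ratio(p, q):
    return p / q

def relative_ratio(p, q, alpha=0.5):
    # Bounded above by 1/alpha since the denominator contains alpha * p.
    return p / (alpha * p + (1 - alpha) * q)

p, q = 0.9, 1e-6   # q nearly vanishes where p has mass
print(plain_ratio(p, q))       # huge, training-destabilizing
print(relative_ratio(p, q))    # bounded below 1/alpha = 2
```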

new Is Prompt Selection Necessary for Task-Free Online Continual Learning?

Authors: Seoyoung Park, Haemin Lee, Hankook Lee

Abstract: Task-free online continual learning has recently emerged as a realistic paradigm for addressing continual learning in dynamic, real-world environments, where data arrive in a non-stationary stream without clear task boundaries and can only be observed once. To consider such challenging scenarios, many recent approaches have employed prompt selection, an adaptive strategy that selects prompts from a pool based on input signals. However, we observe that such selection strategies often fail to select appropriate prompts, yielding suboptimal results despite additional training of key parameters. Motivated by this observation, we propose a simple yet effective SinglePrompt that eliminates the need for prompt selection and focuses on classifier optimization. Specifically, we simply (i) inject a single prompt into each self-attention block, (ii) employ a cosine similarity-based logit design to alleviate the forgetting effect inherent in the classifier weights, and (iii) mask logits for unexposed classes in the current minibatch. With this simple task-free design, our framework achieves state-of-the-art performance across various online continual learning benchmarks. Source code is available at https://github.com/efficient-learning-lab/SinglePrompt.
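
A minimal sketch of the cosine similarity-based logit design in (ii) (illustrative; the `scale` temperature and the toy weights are our assumptions, not values from the paper): because logits depend only on angles, drifting class-weight magnitudes cannot dominate older classes.

```python
# Hedged sketch: logits as scaled cosine similarities to per-class weights.
import math

def cosine_logits(feature, class_weights, scale=10.0):
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv)
    return [scale * cos(feature, w) for w in class_weights]

feat = [1.0, 0.0]
weights = [[2.0, 0.0], [0.0, 5.0]]   # deliberately different magnitudes
print(cosine_logits(feat, weights))  # magnitudes cancel: [10.0, 0.0]
```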

URLs: https://github.com/efficient-learning-lab/SinglePrompt.

new Estimating Central, Peripheral, and Temporal Visual Contributions to Human Decision Making in Atari Games

Authors: Henrik Krauss, Takehisa Yairi

Abstract: We study how different visual information sources contribute to human decision making in dynamic visual environments. Using Atari-HEAD, a large-scale Atari gameplay dataset with synchronized eye-tracking, we introduce a controlled ablation framework as a means to reverse-engineer the contribution of peripheral visual information, explicit gaze information in the form of gaze maps, and past-state information from human behavior. We train action-prediction networks under six settings that selectively include or exclude these information sources. Across 20 games, peripheral information shows by far the strongest contribution, with median prediction-accuracy drops in the range of 35.27-43.90% when removed. Gaze information yields smaller drops of 2.11-2.76%, while past-state information shows a broader range of 1.52-15.51%, with the upper end likely more informative due to reduced peripheral-information leakage. To complement aggregate accuracies, we cluster states by true-action probabilities assigned by the different model configurations. This analysis identifies coarse behavioral regimes, including focus-dominated, periphery-dominated, and more contextual decision situations. These results suggest that human decision making in Atari depends strongly on information beyond the current focus of gaze, while the proposed framework provides a way to estimate such information-source contributions from behavior.

new TinyNina: A Resource-Efficient Edge-AI Framework for Sustainable Air Quality Monitoring via Intra-Image Satellite Super-Resolution

Authors: Prasanjit Dey, Zachary Yahn, Bianca Schoen-Phelan, Soumyabrata Dev

Abstract: Nitrogen dioxide (NO$_2$) is a primary atmospheric pollutant and a significant contributor to respiratory morbidity and urban climate-related challenges. While satellite platforms like Sentinel-2 provide global coverage, their native spatial resolution often limits the precision required for fine-grained NO$_2$ assessment. To address this, we propose TinyNina, a resource-efficient Edge-AI framework specifically engineered for sustainable environmental monitoring. TinyNina implements a novel intra-image learning paradigm that leverages the multi-spectral hierarchy of Sentinel-2 as internal training labels, effectively eliminating the dependency on costly and often unavailable external high-resolution reference datasets. The framework incorporates wavelength-specific attention gates and depthwise separable convolutions to preserve pollutant-sensitive spectral features while maintaining an ultra-lightweight footprint of only 51K parameters. Experimental results, validated against 3,276 matched satellite-ground station pairs, demonstrate that TinyNina achieves a state-of-the-art Mean Absolute Error (MAE) of 7.4 $\mu$g/m$^3$. This performance represents a 95% reduction in computational overhead and 47$\times$ faster inference compared to high-capacity models such as EDSR and RCAN. By prioritizing task-specific utility and architectural efficiency, TinyNina provides a scalable, low-latency solution for real-time air quality monitoring in smart city infrastructures.

new DP-OPD: Differentially Private On-Policy Distillation for Language Models

Authors: Fatemeh Khadem, Sajad Mousavi, Yi Fang, Yuhong Liu

Abstract: Large language models (LLMs) are increasingly adapted to proprietary and domain-specific corpora that contain sensitive information, creating a tension between formal privacy guarantees and efficient deployment through model compression. Differential privacy (DP), typically enforced via DP-SGD, provides record-level protection but often incurs substantial utility loss in autoregressive generation, where optimization noise can amplify exposure bias and compounding errors along long rollouts. Existing approaches to private distillation either apply DP-SGD to both teacher and student, worsening computation and the privacy--utility tradeoff, or rely on DP synthetic text generation from a DP-trained teacher, avoiding DP on the student at the cost of DP-optimizing a large teacher and introducing an offline generation pipeline. We propose \textbf{Differentially Private On-Policy Distillation (DP-OPD)}, a synthesis-free framework that enforces privacy solely through DP-SGD on the student while leveraging a frozen teacher to provide dense token-level targets on \emph{student-generated} trajectories. DP-OPD instantiates this idea via \emph{private generalized knowledge distillation} on continuation tokens. Under a strict privacy budget ($\varepsilon=2.0$), DP-OPD improves perplexity over DP fine-tuning and off-policy DP distillation, and outperforms synthesis-based DP distillation (Yelp: 44.15$\rightarrow$41.68; BigPatent: 32.43$\rightarrow$30.63), while substantially simplifying the training pipeline. In particular, \textbf{DP-OPD collapses private compression into a single DP student-training loop} by eliminating DP teacher training and offline synthetic text generation. Code will be released upon publication at https://github.com/khademfatemeh/dp_opd.

URLs: https://github.com/khademfatemeh/dp_opd.

new MAVEN: A Mesh-Aware Volumetric Encoding Network for Simulating 3D Flexible Deformation

Authors: Zhe Feng, Shilong Tao, Haonan Sun, Shaohan Chen, Zhanxing Zhu, Yunhuai Liu

Abstract: Deep learning-based approaches, particularly graph neural networks (GNNs), have gained prominence in simulating flexible deformations and contacts of solids, due to their ability to handle unstructured physical fields and nonlinear regression on graph structures. However, existing GNNs commonly represent meshes with graphs built solely from vertices and edges. These approaches tend to overlook higher-dimensional spatial features, e.g., 2D facets and 3D cells, from the original geometry. As a result, it is challenging to accurately capture boundary representations and volumetric characteristics, though this information is critically important for modeling contact interactions and internal physical quantity propagation, particularly under sparse mesh discretization. In this paper, we introduce MAVEN, a mesh-aware volumetric encoding network for simulating 3D flexible deformation, which explicitly models geometric mesh elements of higher dimension to achieve a more accurate and natural physical simulation. MAVEN establishes learnable mappings among 3D cells, 2D facets, and vertices, enabling flexible mutual transformations. Explicit geometric features are incorporated into the model to alleviate the burden of implicitly learning geometric patterns. Experimental results show that MAVEN consistently achieves state-of-the-art performance across established datasets and a novel metal stretch-bending task featuring large deformations and prolonged contacts.

new Discrete Prototypical Memories for Federated Time Series Foundation Models

Authors: Liwei Deng, Qingxiang Liu, Xinhe Niu, Shengchao Chen, Sheng Sun, Yuankai Wu, Guodong Long, Yuxuan Liang

Abstract: Leveraging Large Language Models (LLMs) as federated learning (FL)-based time series foundation models offers a promising way to transfer the generalization capabilities of LLMs to time series data while preserving access to private data. However, the semantic misalignment between time-series data and the text-centric latent space of existing LLMs often leads to degraded performance. Meanwhile, the parameter-sharing mechanism in existing FL methods maps heterogeneous cross-domain time-series data into a unified continuous latent space, which contradicts the fact that time-series semantics frequently manifest as discrete and recurring regimes. To address these limitations, we propose \textsc{FeDPM}, a federated framework for time-series foundation models based on discrete prototypical memories. Specifically, we learn local prototypical memory priors for intra-domain time-series data. We then align cross-domain memories to promote a unified discrete latent space and introduce a domain-specific memory update mechanism to balance shared and personalized prototypical knowledge. Extensive experiments demonstrate the efficiency and effectiveness of \textsc{FeDPM}. The code is publicly available at https://anonymous.4open.science/r/FedUnit-64D1.

URLs: https://anonymous.4open.science/r/FedUnit-64D1.

new ECG Biometrics with ArcFace-Inception: External Validation on MIMIC and HEEDB

Authors: Arjuna Scagnetto

Abstract: ECG biometrics has been studied mainly on small cohorts and short inter-session intervals, leaving open how identification behaves under large galleries, external domain shift, and multi-year temporal gaps. We evaluated a 1D Inception-v1 model trained with ArcFace on an internal clinical corpus of 164,440 12-lead ECGs from 53,079 patients and tested it on larger cohorts derived from MIMIC-IV-ECG and HEEDB. The study used a unified closed-set leave-one-out protocol with Rank@K and TAR@FAR metrics, together with scale, temporal-stress, reranking, and confidence analyses. Under general comparability, the system achieved Rank@1 of 0.9506 on ASUGI-DB, 0.8291 on MIMIC-GC, and 0.6884 on HEEDB-GC. In the temporal stress test at constant gallery size, Rank@1 declined from 0.7853 to 0.6433 on MIMIC and from 0.6864 to 0.5560 on HEEDB from 1 to 5 years. Scale analysis on HEEDB showed monotonic degradation as gallery size increased and recovery as more examinations per patient became available. On HEEDB-RR, post-hoc reranking further improved retrieval, with AS-norm reaching Rank@1 = 0.8005 from a 0.7765 baseline. ECG identity information therefore remains measurable under externally validated large-scale closed-set conditions, but its operational quality is strongly affected by domain heterogeneity, longitudinal drift, gallery size, and second-stage score processing.

new Isokinetic Flow Matching for Pathwise Straightening of Generative Flows

Authors: Tauhid Khan

Abstract: Flow Matching (FM) constructs linear conditional probability paths, but the learned marginal velocity field inevitably exhibits strong curvature due to trajectory superposition. This curvature severely inflates numerical truncation errors, bottlenecking few-step sampling. To overcome this, we introduce Isokinetic Flow Matching (Iso-FM), a lightweight, Jacobian-free dynamical regularizer that directly penalizes pathwise acceleration. By using a self-guided finite-difference approximation of the material derivative Dv/Dt, Iso-FM enforces local velocity consistency without requiring auxiliary encoders or expensive second-order autodifferentiation. Operating as a pure plug-and-play addition to single-stage FM training, Iso-FM dramatically improves few-step generation. On CIFAR-10 (DiT-S/2), Iso-FM slashes conditional non-OT FID at 2 steps from 78.82 to 27.13 - a 2.9x relative efficiency gain - and reaches a best-observed FID at 4 steps of 10.23. These results firmly establish acceleration regularization as a principled, compute-efficient mechanism for fast generative sampling.
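The self-guided finite-difference approximation of the material derivative Dv/Dt can be sketched as below. The toy analytic velocity field and step size are illustrative assumptions standing in for the learned FM network, not the paper's model:

```python
import numpy as np

def v(x, t):
    # toy analytic velocity field, a stand-in for the learned FM network
    return np.cos(t) * x + np.sin(3.0 * t)

def material_derivative_fd(x, t, dt=1e-3):
    # self-guided finite difference: step along the flow by dt and difference
    # the velocities, approximating Dv/Dt = dv/dt + (v . grad)v without any
    # second-order automatic differentiation
    v0 = v(x, t)
    return (v(x + dt * v0, t + dt) - v0) / dt

x, t = np.array([0.5, -1.0]), 0.3
accel = material_derivative_fd(x, t)
iso_penalty = float(np.sum(accel ** 2))  # pathwise acceleration penalty term
```

Penalizing `iso_penalty` alongside the standard FM loss is the plug-and-play regularization the abstract describes.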

new SLaB: Sparse-Lowrank-Binary Decomposition for Efficient Large Language Models

Authors: Ziwei Li, Yuang Ma, Yi Kang

Abstract: The rapid growth of large language models (LLMs) presents significant deployment challenges due to their massive computational and memory demands. While model compression, such as network pruning, offers potential solutions, most existing methods often fail to maintain good performance at high compression ratios. To address this, we propose SLaB, a novel framework that decomposes each linear layer weight into three complementary components: a sparse matrix, a low-rank matrix, and a binary matrix. SLaB eliminates the need for retraining and leverages activation-aware pruning scores to guide the decomposition process. Experiments on Llama-family models demonstrate that SLaB achieves state-of-the-art performance, reducing perplexity by up to 36% compared to existing methods at 50% compression and improving accuracy by up to 8.98% over the baseline on zero-shot tasks.
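The sparse + low-rank + binary decomposition can be illustrated on a random matrix. The magnitude thresholding, rank choice, and single-scale binary fit below are illustrative stand-ins for the activation-aware pruning scores the abstract describes, not the authors' procedure:

```python
import numpy as np

rng = np.random.default_rng(3)
W = rng.standard_normal((32, 32))  # toy stand-in for a linear layer weight

# Sparse part: keep the largest-magnitude entries.
k = int(0.1 * W.size)
thresh = np.sort(np.abs(W).ravel())[-k]
S = np.where(np.abs(W) >= thresh, W, 0.0)

# Low-rank part: truncated SVD of the residual.
R = W - S
U, sv, Vt = np.linalg.svd(R, full_matrices=False)
r = 4
L = U[:, :r] @ np.diag(sv[:r]) @ Vt[:r]

# Binary part: a sign matrix with one fitted scale absorbs what remains.
R2 = R - L
alpha = np.abs(R2).mean()
B = alpha * np.sign(R2)

err_full = np.linalg.norm(W - (S + L + B)) / np.linalg.norm(W)
err_lowrank_only = np.linalg.norm(W - L) / np.linalg.norm(W)
assert err_full < err_lowrank_only  # three components beat low-rank alone
```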

new One Model for All: Multi-Objective Controllable Language Models

Authors: Qiang He, Yucheng Yang, Tianyi Zhou, Meng Fang, Mykola Pechenizkiy, Setareh Maghsudi

Abstract: Aligning large language models (LLMs) with human preferences is critical for enhancing LLMs' safety, helpfulness, humor, faithfulness, etc. Current reinforcement learning from human feedback (RLHF) mainly focuses on a fixed reward learned from average human ratings, which may weaken the adaptability and controllability of varying preferences. However, creating personalized LLMs requires aligning LLMs with individual human preferences, which is non-trivial due to the scarce data per user and the diversity of user preferences in multi-objective trade-offs, varying from emphasizing empathy in certain contexts to demanding efficiency and precision in others. Can we train one LLM to produce personalized outputs across different user preferences on the Pareto front? In this paper, we introduce Multi-Objective Control (MOC), which trains a single LLM to directly generate responses in the preference-defined regions of the Pareto front. Our approach introduces multi-objective optimization (MOO) principles into RLHF to train an LLM as a preference-conditioned policy network. We improve the computational efficiency of MOC by applying MOO at the policy level, enabling us to fine-tune a 7B-parameter model on a single A6000 GPU. Extensive experiments demonstrate the advantages of MOC over baselines in three aspects: (i) controllability of LLM outputs w.r.t. user preferences on the trade-off among multiple rewards; (ii) quality and diversity of LLM outputs, measured by the hyper-volume of multiple solutions achieved; and (iii) generalization to unseen preferences. These results highlight MOC's potential for real-world applications requiring scalable and customizable LLMs.

new GAIN: Multiplicative Modulation for Domain Adaptation

Authors: Hengshuai Yao, Xing Chen, Ahmed Murtadha, Guan Wang

Abstract: Adapting LLMs to new domains causes forgetting because standard methods (full fine-tuning, LoRA) inject new directions into the weight space. We propose GAIN, which re-emphasizes existing features through multiplicative modulation W_new = S * W. The learned diagonal matrix S is applied to the attention output projection and optionally the FFN. The principle mirrors gain modulation in neuroscience, where neurons adapt to context by scaling response strength while preserving selectivity. We evaluate GAIN on five models from four families (774M to 70B), adapting sequentially across eight domains. GAIN-FFN matches LoRA's in-domain adaptation, but their effects on previously trained domains are opposite: GAIN-FFN improves them by 7-13% (validation PPL), while LoRA degrades them by 18-36%. Downstream accuracy confirms the pattern: for example, after seven sequential adaptations on Qwen2.5, GAIN-FFN degrades BoolQ by only 0.8% while LoRA damages it by 14.9%. GAIN adds 46K-230K parameters per model and can be absorbed into the pretrained weights for zero inference cost.
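The multiplicative modulation W_new = S * W and its zero-cost absorption into the pretrained weights can be sketched as follows. Shapes are toy-sized and `s` is a fixed vector standing in for the learned diagonal gains:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 8
W = rng.standard_normal((d_model, d_model))  # pretrained projection (toy size)

# GAIN re-emphasizes existing features multiplicatively: W_new = S @ W with S
# diagonal. Here s stands in for the learned per-row gains.
s = 1.0 + 0.1 * rng.standard_normal(d_model)
W_new = np.diag(s) @ W

# The diagonal modulation only rescales each output coordinate, so after
# adaptation it can be absorbed into the weights for zero inference cost.
x = rng.standard_normal(d_model)
assert np.allclose(W_new @ x, s * (W @ x))
```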

new Reproducibility study on how to find Spurious Correlations, Shortcut Learning, Clever Hans or Group-Distributional non-robustness and how to fix them

Authors: Ole Delzer, Sidney Bender

Abstract: Deep Neural Networks (DNNs) are increasingly utilized in high-stakes domains like medical diagnostics and autonomous driving where model reliability is critical. However, the research landscape for ensuring this reliability is terminologically fractured across communities that pursue the same goal of ensuring models rely on causally relevant features rather than confounding signals. While frameworks such as distributionally robust optimization (DRO), invariant risk minimization (IRM), shortcut learning, simplicity bias, and the Clever Hans effect all address model failure due to spurious correlations, researchers typically only reference work within their own domains. This reproducibility study unifies these perspectives through a comparative analysis of correction methods under challenging constraints like limited data availability and severe subgroup imbalance. We evaluate recently proposed correction methods based on explainable artificial intelligence (XAI) techniques alongside popular non-XAI baselines using both synthetic and real-world datasets. Findings show that XAI-based methods generally outperform non-XAI approaches, with Counterfactual Knowledge Distillation (CFKD) proving most consistently effective at improving generalization. Our experiments also reveal that the practical application of many methods is hindered by a dependency on group labels, as manual annotation is often infeasible and automated tools like Spectral Relevance Analysis (SpRAy) struggle with complex features and severe imbalance. Furthermore, the scarcity of minority group samples in validation sets renders model selection and hyperparameter tuning unreliable, posing a significant obstacle to the deployment of robust and trustworthy models in safety-critical areas.

new Learning from Equivalence Queries, Revisited

Authors: Mark Braverman, Roi Livni, Yishay Mansour, Shay Moran, Kobbi Nissim

Abstract: Modern machine learning systems, such as generative models and recommendation systems, often evolve through a cycle of deployment, user interaction, and periodic model updates. This differs from standard supervised learning frameworks, which focus on loss or regret minimization over a fixed sequence of prediction tasks. Motivated by this setting, we revisit the classical model of learning from equivalence queries, introduced by Angluin (1988). In this model, a learner repeatedly proposes hypotheses and, when a deployed hypothesis is inadequate, receives a counterexample. Under fully adversarial counterexample generation, however, the model can be overly pessimistic. In addition, most prior work assumes a \emph{full-information} setting, where the learner also observes the correct label of the counterexample, an assumption that is not always natural. We address these issues by restricting the environment to a broad class of less adversarial counterexample generators, which we call \emph{symmetric}. Informally, such generators choose counterexamples based only on the symmetric difference between the hypothesis and the target. This class captures natural mechanisms such as random counterexamples (Angluin and Dohrn, 2017; Bhatia, 2021; Chase, Freitag, and Reyzin, 2024), as well as generators that return the simplest counterexample according to a prescribed complexity measure. Within this framework, we study learning from equivalence queries under both full-information and bandit feedback. We obtain tight bounds on the number of learning rounds in both settings and highlight directions for future work. Our analysis combines a game-theoretic view of symmetric adversaries with adaptive weighting methods and minimax arguments.

new FlashSAC: Fast and Stable Off-Policy Reinforcement Learning for High-Dimensional Robot Control

Authors: Donghu Kim, Youngdo Lee, Minho Park, Kinam Kim, I Made Aswin Nahendra, Takuma Seno, Sehee Min, Daniel Palenicek, Florian Vogt, Danica Kragic, Jan Peters, Jaegul Choo, Hojoon Lee

Abstract: Reinforcement learning (RL) is a core approach for robot control when expert demonstrations are unavailable. On-policy methods such as Proximal Policy Optimization (PPO) are widely used for their stability, but their reliance on narrowly distributed on-policy data limits accurate policy evaluation in high-dimensional state and action spaces. Off-policy methods can overcome this limitation by learning from a broader state-action distribution, yet suffer from slow convergence and instability, as fitting a value function over diverse data requires many gradient updates, causing critic errors to accumulate through bootstrapping. We present FlashSAC, a fast and stable off-policy RL algorithm built on Soft Actor-Critic. Motivated by scaling laws observed in supervised learning, FlashSAC sharply reduces gradient updates while compensating with larger models and higher data throughput. To maintain stability at increased scale, FlashSAC explicitly bounds weight, feature, and gradient norms, curbing critic error accumulation. Across over 60 tasks in 10 simulators, FlashSAC consistently outperforms PPO and strong off-policy baselines in both final performance and training efficiency, with the largest gains on high-dimensional tasks such as dexterous manipulation. In sim-to-real humanoid locomotion, FlashSAC reduces training time from hours to minutes, demonstrating the promise of off-policy RL for sim-to-real transfer.

new Beyond Imbalance Ratio: Data Characteristics as Critical Moderators of Oversampling Method Selection

Authors: Yuwen Jiang, Songyun Ye

Abstract: The prevailing IR-threshold paradigm posits a positive correlation between imbalance ratio (IR) and oversampling effectiveness, yet this assumption remains empirically unsubstantiated through controlled experimentation. We conducted 12 controlled experiments (N > 100 dataset variants) that systematically manipulated IR while holding data characteristics (class separability, cluster structure) constant via algorithmic generation of Gaussian mixture datasets. Two additional validation experiments examined ceiling effects and metric-dependence. All methods were evaluated on 17 real-world datasets from OpenML. Upon controlling for confounding variables, IR exhibited a weak to moderate negative correlation with oversampling benefits. Class separability emerged as a substantially stronger moderator, accounting for significantly more variance in method effectiveness than IR alone. We propose a 'Context Matters' framework that integrates IR, class separability, and cluster structure to provide evidence-based selection criteria for practitioners.

new Dynamic Free-Rider Detection in Federated Learning via Simulated Attack Patterns

Authors: Motoki Nakamura

Abstract: Federated learning (FL) enables multiple clients to collaboratively train a global model by aggregating local updates without sharing private data. However, FL often faces the challenge of free-riders, clients who submit fake model parameters without performing actual training to obtain the global model without contributing. Chen et al. proposed a free-rider detection method based on the weight evolving frequency (WEF) of model parameters. This detection approach is a leading candidate for practical free-rider detection methods, as it requires neither a proxy dataset nor pre-training. Nevertheless, it struggles to detect ``dynamic'' free-riders who behave honestly in early rounds and later switch to free-riding, particularly under global-model-mimicking attacks such as the delta weight attack and our newly proposed adaptive WEF-camouflage attack. In this paper, we propose a novel detection method S2-WEF that simulates the WEF patterns of potential global-model-based attacks on the server side using previously broadcasted global models, and identifies clients whose submitted WEF patterns resemble the simulated ones. To handle a variety of free-rider attack strategies, S2-WEF further combines this simulation-based similarity score with a deviation score computed from mutual comparisons among submitted WEFs, and separates benign and free-rider clients by two-dimensional clustering and per-score classification. This method enables dynamic detection of clients that transition into free-riders during training without proxy datasets or pre-training. We conduct extensive experiments across three datasets and five attack types, demonstrating that S2-WEF achieves higher robustness than existing approaches.

new A Clinical Point Cloud Paradigm for In-Hospital Mortality Prediction from Multi-Level Incomplete Multimodal EHRs

Authors: Bohao Li, Tao Zou, Junchen Ye, Yan Gong, Bowen Du

Abstract: Deep learning-based modeling of multimodal Electronic Health Records (EHRs) has become an important approach for clinical diagnosis and risk prediction. However, due to diverse clinical workflows and privacy constraints, raw EHRs are inherently multi-level incomplete, including irregular sampling, missing modalities, and sparse labels. These issues cause temporal misalignment, modality imbalance, and limited supervision. Most existing multimodal methods assume relatively complete data, and even methods designed for incompleteness usually address only one or two of these issues in isolation. As a result, they often rely on rigid temporal/modal alignment or discard incomplete data, which may distort raw clinical semantics. To address this problem, we propose HealthPoint (HP), a unified clinical point cloud paradigm for multi-level incomplete EHRs. HP represents heterogeneous clinical events as points in a continuous 4D space defined by content, time, modality, and case. To model interactions between arbitrary point pairs, we introduce a Low-Rank Relational Attention mechanism that efficiently captures high-order dependencies across these four dimensions. We further develop a hierarchical interaction and sampling strategy to balance fine-grained modeling and computational efficiency. Built on this framework, HP enables flexible event-level interaction and fine-grained self-supervision, supporting robust modality recovery and effective use of unlabeled data. Experiments on large-scale EHR datasets for risk prediction show that HP consistently achieves state-of-the-art performance and strong robustness under varying degrees of incompleteness.

new From Curiosity to Caution: Mitigating Reward Hacking for Best-of-N with Pessimism

Authors: Zhuohao Yu, Zhiwei Steven Wu, Adam Block

Abstract: Inference-time compute scaling has emerged as a powerful paradigm for improving language model performance on a wide range of tasks, but the question of how best to use the additional compute remains open. A popular approach is Best-of-N (BoN) sampling, where N candidate responses are generated, scored according to a reward model, and the highest-scoring response is selected. While this approach can improve performance, it is vulnerable to reward hacking, where performance degrades as N increases due to the selection of responses that exploit imperfections in the reward model instead of genuinely improving generation quality. Prior attempts to mitigate reward hacking, via stronger reward models or heavy-handed distributional regularization, either fail to fully address over-optimization or are too conservative to exploit additional compute. In this work, we explore the principle of pessimism in RL, which uses lower confidence bounds on value estimates to avoid out-of-distribution (OOD) actions with uncertain reward estimates. Our approach, termed caution, can be seen as the reverse of curiosity: where curiosity rewards prediction error as a signal of novelty, caution penalizes prediction error as a signal of distributional uncertainty. Practically, caution trains an error model on typical responses and uses its prediction error to lower reward estimates for atypical ones. Our extensive empirical evaluation demonstrates that caution is a simple, computationally efficient approach that substantially mitigates reward hacking in BoN sampling. We also provide a theoretical analysis in a simplified linear setting, which shows that caution provably improves over the standard BoN approach. Together, our results not only establish caution as a practical solution to reward hacking, but also provide evidence that curiosity-based approaches can be a general OOD detection technique in LLM settings.
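The caution mechanism can be sketched with a toy selection problem. The features, rewards, and distance-to-mean "error model" below are illustrative assumptions, not the paper's trained networks:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical features for N candidate responses; candidate 0 is
# out-of-distribution and receives a spuriously high proxy reward.
N, d = 16, 4
feats = rng.standard_normal((N, d))
feats[0] += 6.0                  # OOD ("reward-hacking") candidate
reward = rng.normal(0.0, 1.0, N)
reward[0] = 5.0                  # imperfect reward model is fooled

# Error-model stand-in: fit on typical responses (here, just their mean) and
# use distance from it as prediction error, a proxy for distributional
# uncertainty.
mu = feats[1:].mean(axis=0)
pred_error = np.linalg.norm(feats - mu, axis=1)

lam = 1.0
cautious = reward - lam * pred_error   # pessimistic lower-bound reward

best_plain = int(np.argmax(reward))    # plain BoN selects the hacked response
best_cautious = int(np.argmax(cautious))  # caution avoids it
```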

new Grokking as Dimensional Phase Transition in Neural Networks

Authors: Ping Wang

Abstract: Neural network grokking -- the abrupt memorization-to-generalization transition -- challenges our understanding of learning dynamics. Through finite-size scaling of gradient avalanche dynamics across eight model scales, we find that grokking is a \textit{dimensional phase transition}: effective dimensionality~$D$ crosses from sub-diffusive (subcritical, $D < 1$) to super-diffusive (supercritical, $D > 1$) at generalization onset, exhibiting self-organized criticality (SOC). Crucially, $D$ reflects \textbf{gradient field geometry}, not network architecture: synthetic i.i.d.\ Gaussian gradients maintain $D \approx 1$ regardless of graph topology, while real training exhibits dimensional excess from backpropagation correlations. The grokking-localized $D(t)$ crossing -- robust across topologies -- offers new insight into the trainability of overparameterized networks.

new Anticipatory Reinforcement Learning: From Generative Path-Laws to Distributional Value Functions

Authors: Daniel Bloch

Abstract: This paper introduces Anticipatory Reinforcement Learning (ARL), a novel framework designed to bridge the gap between non-Markovian decision processes and classical reinforcement learning architectures, specifically under the constraint of a single observed trajectory. In environments characterised by jump-diffusions and structural breaks, traditional state-based methods often fail to capture the essential path-dependent geometry required for accurate foresight. We resolve this by lifting the state space into a signature-augmented manifold, where the history of the process is embedded as a dynamical coordinate. By utilising a self-consistent field approach, the agent maintains an anticipated proxy of the future path-law, allowing for a deterministic evaluation of expected returns. This transition from stochastic branching to a single-pass linear evaluation significantly reduces computational complexity and variance. We prove that this framework preserves fundamental contraction properties and ensures stable generalisation even in the presence of heavy-tailed noise. Our results demonstrate that by grounding reinforcement learning in the topological features of path-space, agents can achieve proactive risk management and superior policy stability in highly volatile, continuous-time environments.

new Batch Loss Score for Dynamic Data Pruning

Authors: Qing Zhou, Bingxuan Zhao, Tao Yang, Hongyuan Zhang, Junyu Gao, Qi Wang

Abstract: Dynamic data pruning accelerates deep learning by selectively omitting less informative samples during training. While per-sample loss is a common importance metric, obtaining it can be challenging or infeasible for complex models or loss functions, often requiring significant implementation effort. This work proposes the Batch Loss Score (BLS), a computationally efficient alternative using an Exponential Moving Average (EMA) of readily available batch losses to assign scores to individual samples. We frame the batch loss, from the perspective of a single sample, as a noisy measurement of its scaled individual loss, with noise originating from stochastic batch composition. It is formally shown that the EMA mechanism functions as a first-order low-pass filter, attenuating high-frequency batch composition noise. This yields a score approximating the smoothed and persistent contribution of the individual sample to the loss, providing a theoretical grounding for BLS as a proxy for sample importance. BLS demonstrates remarkable code integration simplicity (\textbf{three-line injection}) and readily adapts existing per-sample loss-based methods (\textbf{one-line proxy}). Its effectiveness is demonstrated by enhancing two such methods to losslessly prune \textbf{20\%-50\%} of samples across \textit{14 datasets}, \textit{11 tasks} and \textit{18 models}, highlighting its utility and broad applicability, especially for complex scenarios where per-sample loss is difficult to access. Code is available at https://github.com/mrazhou/BLS.

URLs: https://github.com/mrazhou/BLS.
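The EMA-of-batch-losses scoring can be simulated in a few lines. The synthetic per-sample losses and the smoothing factor below are illustrative assumptions, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_steps, batch_size = 64, 500, 8
alpha = 0.1  # EMA smoothing factor (assumed value)

# Hypothetical per-sample losses; the trainer observes only the batch mean.
true_loss = rng.uniform(0.1, 2.0, n_samples)
score = np.zeros(n_samples)  # BLS: per-sample EMA of observed batch losses

for _ in range(n_steps):
    idx = rng.choice(n_samples, batch_size, replace=False)
    batch_loss = true_loss[idx].mean()  # noisy measurement of each member's loss
    # the EMA acts as a first-order low-pass filter on batch-composition noise
    score[idx] = (1 - alpha) * score[idx] + alpha * batch_loss

# Scores recover a smoothed, shifted version of the per-sample losses.
corr = np.corrcoef(score, true_loss)[0, 1]
```

Samples with low `score` would then be candidates for pruning, using only the batch loss already available in any training loop.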

new Explainable Machine Learning for Sepsis Outcome Prediction Using a Novel Romanian Electronic Health Record Dataset

Authors: Andrei-Alexandru Bunea, Ovidiu Ghibea, Dan-Matei Popovici, Ion Daniel, Octavian Andronic

Abstract: We develop and analyze explainable machine learning (ML) models for sepsis outcome prediction using a novel Electronic Health Record (EHR) dataset from 12,286 hospitalizations at a large emergency hospital in Romania. The dataset includes demographics, International Classification of Diseases (ICD-10) diagnostics, and 600 types of laboratory tests. This study aims to identify clinically strong predictors while achieving state-of-the-art results across three classification tasks: (1) deceased vs. discharged, (2) deceased vs. recovered, and (3) recovered vs. ameliorated. We trained five ML models to capture complex distributions while preserving clinical interpretability. Experiments explored the trade-off between feature richness and patient coverage, using subsets of the 10--50 most frequent laboratory tests. Model performance was evaluated using accuracy and area under the curve (AUC), and explainability was assessed using SHapley Additive exPlanations (SHAP). The highest performance was obtained for the deceased vs. recovered case study (AUC=0.983, accuracy=0.93). SHAP analysis identified several strong predictors such as cardiovascular comorbidities, urea levels, aspartate aminotransferase, platelet count, and eosinophil percentage. Eosinopenia emerged as a top predictor, highlighting its value as an underutilized marker that is not included in current assessment standards, while the high performance suggests the applicability of these models in clinical settings.

new MUXQ: Mixed-to-Uniform Precision MatriX Quantization via Low-Rank Outlier Decomposition

Authors: Seoungsub Lee, In Seo Kim, Seon Wook Kim

Abstract: Large language models (LLMs) have achieved outstanding performance across a wide range of natural language processing tasks, but their enormous parameter counts impose substantial memory and computational overheads. This challenge is particularly critical in NPU-based on-device environments, where FP16/FP32 computation is inefficient and integer (INT) quantization is therefore essential. However, existing methods, including ZeroQuant, LLM.int8(), and SmoothQuant, do not fully address input-activation outliers and the associated hardware inefficiencies. To overcome these limitations, we propose MUXQ (Mixed-to-Uniform Quantization). MUXQ detects outlier channels in input activations and introduces a small auxiliary matrix that redistributes outlier magnitudes across channels, thereby alleviating the outlier problem. This enables even activation outliers to be quantized at low-precision INT levels while preserving a hardware-friendly computation structure. Experiments on GPT-2 models at three scales (0.1B, 0.3B, and 0.7B parameters) using the WikiText-2 dataset show that MUXQ consistently achieves lower perplexity than naive quantization. In particular, under per-tensor quantization, MUXQ quantizes both activations and weights to INT8 while maintaining accuracy close to that of FP16. With only modest computational overhead, MUXQ enables stable low-precision inference and can be readily combined with other quantization techniques. These results suggest that MUXQ provides a promising direction for efficient and accurate LLM inference on edge devices.
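A minimal illustration of the outlier problem MUXQ targets (not MUXQ itself; the auxiliary-matrix redistribution is specific to the paper): under symmetric per-tensor INT8 quantization, a single outlier channel inflates the quantization scale and degrades the precision of every other channel:

```python
import numpy as np

def quant_dequant_int8(x):
    """Symmetric per-tensor INT8 quantization followed by dequantization."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
acts = rng.normal(0.0, 1.0, size=(64, 128)).astype(np.float32)
outlier_acts = acts.copy()
outlier_acts[:, 0] *= 100.0  # one outlier channel blows up the per-tensor scale

# mean reconstruction error without outliers, and on the *non-outlier*
# channels once a single outlier channel dominates the scale
err_plain = np.abs(quant_dequant_int8(acts) - acts).mean()
err_outlier = np.abs(quant_dequant_int8(outlier_acts) - outlier_acts)[:, 1:].mean()
```

The error on well-behaved channels grows by more than an order of magnitude once the outlier channel sets the scale, which is why redistributing outlier magnitudes before quantizing, as MUXQ proposes, pays off.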

new The Infinite-Dimensional Nature of Spectroscopy and Why Models Succeed, Fail, and Mislead

Authors: Umberto Michelucci, Francesca Venturini

Abstract: Machine learning (ML) models have achieved strikingly high accuracies in spectroscopic classification tasks, often without a clear proof that those models used chemically meaningful features. Existing studies have linked these results to data preprocessing choices, noise sensitivity, and model complexity, but no unifying explanation is available so far. In this work, we show that these phenomena arise naturally from the intrinsic high dimensionality of spectral data. Using a theoretical analysis grounded in the Feldman-Hajek theorem and the concentration of measure, we show that even infinitesimal distributional differences, caused by noise, normalisation, or instrumental artefacts, may become perfectly separable in high-dimensional spaces. Through a series of specific experiments on synthetic and real fluorescence spectra, we illustrate how models can achieve near-perfect accuracy even when chemical distinctions are absent, and why feature-importance maps may highlight spectrally irrelevant regions. We provide a rigorous theoretical framework, confirm the effect experimentally, and conclude with practical recommendations for building and interpreting ML models in spectroscopy.
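The separability phenomenon the abstract describes can be reproduced in a few lines. The setup below (two Gaussians with a tiny per-coordinate mean shift, classified along the known shift direction) is an illustrative assumption, not the paper's experiment:

```python
import numpy as np

rng = np.random.default_rng(42)

def separability(dim, shift=0.05, n=200):
    """Accuracy of the optimal linear split between two Gaussians whose
    means differ by `shift` in every coordinate. The projected means are
    shift*sqrt(dim) apart, so separation grows with dimension."""
    a = rng.normal(0.0, 1.0, size=(n, dim))
    b = rng.normal(shift, 1.0, size=(n, dim))
    w = np.ones(dim) / np.sqrt(dim)      # direction of the mean shift
    thr = shift * np.sqrt(dim) / 2.0     # midpoint between projected means
    return ((a @ w < thr).mean() + (b @ w >= thr).mean()) / 2.0

low = separability(dim=2)        # tiny shift, low dimension: near chance
high = separability(dim=20000)   # same per-coordinate shift: near-perfect
```

An imperceptible per-coordinate difference of 0.05 standard deviations is invisible in 2 dimensions but almost perfectly separable in 20,000, which is exactly the mechanism by which spectroscopic models can "succeed" on chemically meaningless signals.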

new Sampling Parallelism for Fast and Efficient Bayesian Learning

Authors: Asena Karolin \"Ozdemir, Lars H. Heyen, Arvid Weyrauch, Achim Streit, Markus G\"otz, Charlotte Debus

Abstract: Machine learning models, and deep neural networks in particular, are increasingly deployed in risk-sensitive domains such as healthcare, environmental forecasting, and finance, where reliable quantification of predictive uncertainty is essential. However, many uncertainty quantification (UQ) methods remain difficult to apply due to their substantial computational cost. Sampling-based Bayesian learning approaches, such as Bayesian neural networks (BNNs), are particularly expensive since drawing and evaluating multiple parameter samples rapidly exhausts memory and compute resources. These constraints have limited the accessibility and exploration of Bayesian techniques thus far. To address these challenges, we introduce sampling parallelism, a simple yet powerful parallelization strategy that targets the primary bottleneck of sampling-based Bayesian learning: the samples themselves. By distributing sample evaluations across multiple GPUs, our method reduces memory pressure and training time without requiring architectural changes or extensive hyperparameter tuning. We detail the methodology and evaluate its performance on a few example tasks and architectures, comparing against distributed data parallelism (DDP) as a baseline. We further demonstrate that sampling parallelism is complementary to existing strategies by implementing a hybrid approach that combines sample and data parallelism. Our experiments show near-perfect scaling when the sample number is scaled proportionally to the computational resources, confirming that sample evaluations parallelize cleanly. Although DDP achieves better raw speedups under scaling with constant workload, sampling parallelism has a notable advantage: by applying independent stochastic augmentations to the same batch on each GPU, it increases augmentation diversity and thus reduces the number of epochs required for convergence.

new Darkness Visible: Reading the Exception Handler of a Language Model

Authors: Peter Balogh

Abstract: The final MLP of GPT-2 Small exhibits a fully legible routing program -- 27 named neurons organized into a three-tier exception handler -- while the knowledge it routes remains entangled across ~3,040 residual neurons. We decompose all 3,072 neurons (to numerical precision) into: 5 fused Core neurons that reset vocabulary toward function words, 10 Differentiators that suppress wrong candidates, 5 Specialists that detect structural boundaries, and 7 Consensus neurons that each monitor a distinct linguistic dimension. The consensus-exception crossover -- where MLP intervention shifts from helpful to harmful -- is statistically sharp (bootstrap 95% CIs exclude zero at all consensus levels; crossover between 4/7 and 5/7). Three experiments show that "knowledge neurons" (Dai et al., 2022), at L11 of this model, function as routing infrastructure rather than fact storage: the MLP amplifies or suppresses signals already present in the residual stream from attention, scaling with contextual constraint. A garden-path experiment reveals a reversed garden-path effect -- GPT-2 uses verb subcategorization immediately, consistent with the exception handler operating at token-level predictability rather than syntactic structure. This architecture crystallizes only at the terminal layer -- in deeper models, we predict equivalent structure at the final layer, not at layer 11. Code and data: https://github.com/pbalogh/transparent-gpt2

URLs: https://github.com/pbalogh/transparent-gpt2

new Cog-DRIFT: Exploration on Adaptively Reformulated Instances Enables Learning from Hard Reasoning Problems

Authors: Justin Chih-Yao Chen, Archiki Prasad, Zaid Khan, Joykirat Singh, Runchu Tian, Elias Stengel-Eskin, Mohit Bansal

Abstract: Reinforcement learning from verifiable rewards (RLVR) has improved the reasoning abilities of LLMs, yet a fundamental limitation remains: models cannot learn from problems that are too difficult to solve under their current policy, as these yield no meaningful reward signal. We propose a simple yet effective solution based on task reformulation. We transform challenging open-ended problems into cognitively simpler variants -- such as multiple-choice and cloze formats -- that preserve the original answer while reducing the effective search space and providing denser learning signals. These reformulations span a spectrum from discriminative to generative tasks, which we exploit to bootstrap learning: models first learn from structured, easier formats, and this knowledge transfers back to improve performance on the original open-ended problems. Building on this insight, we introduce Cog-DRIFT, a framework that constructs reformulated variants and organizes them into an adaptive curriculum based on difficulty. Training progresses from easier to harder formats, enabling the model to learn from problems that previously yielded zero signal under standard RL post-training. Cog-DRIFT not only improves on the originally unsolvable hard problems (absolute +10.11% for Qwen and +8.64% for Llama) but also generalizes well to other held-out datasets. Across 2 models and 6 reasoning benchmarks, our method consistently outperforms standard GRPO and strong guided-exploration baselines. On average, Cog-DRIFT shows +4.72% (Qwen) and +3.23% (Llama) improvements over the second-best baseline. We further show that Cog-DRIFT improves pass@k at test time, and the curriculum improves sample efficiency. Overall, our results highlight task reformulation and curriculum learning as an effective paradigm for overcoming the exploration barrier in LLM post-training.

new Forgetting to Witness: Efficient Federated Unlearning and Its Visible Evaluation

Authors: Houzhe Wang, Xiaojie Zhu, Chi Chen

Abstract: With the increasing importance of data privacy and security, federated unlearning has emerged as a novel research field dedicated to ensuring that federated learning models no longer retain or leak relevant information once specific data has been deleted. In this paper, to the best of our knowledge, we propose the first complete pipeline for federated unlearning, which includes a federated unlearning approach and an evaluation framework. Our proposed federated unlearning approach ensures high efficiency and model accuracy without the need to store historical data. It effectively leverages the knowledge distillation model alongside various optimization mechanisms. Moreover, we propose a framework named Skyeye to visualize the forgetting capacity of federated unlearning models. It utilizes the federated unlearning model as the classifier integrated into a Generative Adversarial Network (GAN). Afterward, both the classifier and discriminator guide the generator in generating samples. Throughout this process, the generator learns from the classifier's knowledge. The generator then visualizes this knowledge through sample generation. Finally, the model's forgetting capability is evaluated based on the relevance between the deleted data and the generated samples. Comprehensive experiments are conducted to illustrate the effectiveness of the proposed federated unlearning approach and the corresponding evaluation framework.

new Selecting Decision-Relevant Concepts in Reinforcement Learning

Authors: Naveen Raman, Stephanie Milani, Fei Fang

Abstract: Training interpretable concept-based policies requires practitioners to manually select which human-understandable concepts an agent should reason with when making sequential decisions. This selection demands domain expertise, is time-consuming and costly, scales poorly with the number of candidates, and provides no performance guarantees. To overcome this limitation, we propose the first algorithms for principled automatic concept selection in sequential decision-making. Our key insight is that concept selection can be viewed through the lens of state abstraction: intuitively, a concept is decision-relevant if removing it would cause the agent to confuse states that require different actions. As a result, agents should rely on decision-relevant concepts; states with the same concept representation should share the same optimal action, which preserves the optimal decision structure of the original state space. This perspective leads to the Decision-Relevant Selection (DRS) algorithm, which selects a subset of concepts from a candidate set, along with performance bounds relating the selected concepts to the performance of the resulting policy. Empirically, DRS automatically recovers manually curated concept sets while matching or exceeding their performance, and improves the effectiveness of test-time concept interventions across reinforcement learning benchmarks and real-world healthcare environments.
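The state-abstraction criterion can be sketched as a brute-force check (illustrative only; DRS itself is a principled algorithm with performance bounds): a concept subset preserves the decision structure iff all states that agree on it share a single optimal action:

```python
from itertools import combinations

def preserves_decisions(states, optimal_action, concept_subset):
    """True iff states agreeing on `concept_subset` share one optimal action,
    i.e. the abstraction never merges states that need different actions."""
    groups = {}
    for s in states:
        key = tuple(s[c] for c in concept_subset)
        groups.setdefault(key, set()).add(optimal_action[s])
    return all(len(actions) == 1 for actions in groups.values())

def smallest_relevant_subset(states, optimal_action, concepts):
    """Exhaustive stand-in for concept selection: return a smallest subset
    that preserves the optimal decision structure."""
    for k in range(1, len(concepts) + 1):
        for subset in combinations(concepts, k):
            if preserves_decisions(states, optimal_action, subset):
                return subset
    return tuple(concepts)

# toy example: concept 0 ('goal is left') fully determines the action,
# concept 1 is a decision-irrelevant distractor
states = [(0, 0), (0, 1), (1, 0), (1, 1)]
optimal_action = {s: ("left" if s[0] else "right") for s in states}
chosen = smallest_relevant_subset(states, optimal_action, concepts=[0, 1])
```

The check correctly keeps the decision-relevant concept and discards the distractor; the exhaustive search is exponential in the candidate count, which is precisely the scaling problem DRS is designed to avoid.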

new The Role of Generator Access in Autoregressive Post-Training

Authors: Amit Kiran Rege

Abstract: We study how generator access constrains autoregressive post-training. The central question is whether the learner is confined to fresh root-start rollouts or can return to previously built prefixes and query the next-token rule there. In the root-start regime, output sampling, generated-token log probabilities, top-$k$ reports, and full next-token distributions along sampled trajectories all reduce to one canonical experiment, limited by the on-policy probability of reaching informative prefixes. Weak prefix control breaks this barrier, and once control is available, richer observations such as conditional sampling or logits can outperform top-$1$ access. Changing only the generator interface creates an exponential gap for KL-regularized outcome-reward post-training.

new FairLogue: A Toolkit for Intersectional Fairness Analysis in Clinical Machine Learning Models

Authors: Nick Souligne, Vignesh Subbian

Abstract: Objective: Algorithmic fairness is essential for equitable and trustworthy machine learning in healthcare. Most fairness tools emphasize single-axis demographic comparisons and may miss compounded disparities affecting intersectional populations. This study introduces FairLogue, a toolkit designed to operationalize intersectional fairness assessment in observational and counterfactual contexts within clinical settings. Methods: FairLogue is a Python-based toolkit composed of three components: 1) an observational framework extending demographic parity, equalized odds, and equal opportunity difference to intersectional populations; 2) a counterfactual framework evaluating fairness under treatment-based contexts; and 3) a generalized counterfactual framework assessing fairness under interventions on intersectional group membership. The toolkit was evaluated using electronic health record data from the All of Us Controlled Tier V8 dataset in a glaucoma surgery prediction task using logistic regression with race and gender as protected attributes. Results: Observational analysis identified substantial intersectional disparities despite moderate model performance (AUROC = 0.709; accuracy = 0.651). Intersectional evaluation revealed larger fairness gaps than single-axis analyses, including demographic parity differences of 0.20 and equalized odds true positive and false positive rate gaps of 0.33 and 0.15, respectively. Counterfactual analysis using permutation-based null distributions produced unfairness ("u-value") estimates near zero, suggesting observed disparities were consistent with chance after conditioning on covariates. Conclusion: FairLogue provides a modular toolkit integrating observational and counterfactual methods for quantifying and evaluating intersectional bias in clinical machine learning workflows.
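Extending demographic parity to intersectional groups, the first component above, can be illustrated with a hypothetical helper (this is not the toolkit's API): the metric is the largest gap in positive-prediction rate across all intersections of the protected attributes.

```python
def intersectional_dp_difference(records):
    """Max gap in positive-prediction rate across intersectional groups,
    where each record is (race, gender, predicted_label)."""
    rates = {}
    for race, gender, yhat in records:
        pos, tot = rates.get((race, gender), (0, 0))
        rates[(race, gender)] = (pos + yhat, tot + 1)
    group_rates = [pos / tot for pos, tot in rates.values()]
    return max(group_rates) - min(group_rates)

# toy data: one intersection (B, F) is under-predicted even though each
# single axis looks only mildly unfair on its own
records = (
    [("A", "M", 1)] * 8 + [("A", "M", 0)] * 2 +
    [("A", "F", 1)] * 7 + [("A", "F", 0)] * 3 +
    [("B", "M", 1)] * 7 + [("B", "M", 0)] * 3 +
    [("B", "F", 1)] * 4 + [("B", "F", 0)] * 6
)
gap = intersectional_dp_difference(records)
```

Here each single-axis gap (race alone or gender alone) is 0.20, while the intersectional gap is 0.40: the compounded disparity that single-axis tools miss, mirroring the abstract's finding.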

new Noise Immunity in In-Context Tabular Learning: An Empirical Robustness Analysis of TabPFN's Attention Mechanisms

Authors: James Hu, Mahdi Ghelichi

Abstract: Tabular foundation models (TFMs) such as TabPFN (Tabular Prior-Data Fitted Network) are designed to generalize across heterogeneous tabular datasets through in-context learning (ICL). They perform prediction in a single forward pass conditioned on labeled examples without dataset-specific parameter updates. This paradigm is particularly attractive in industrial domains (e.g., finance and healthcare) where tabular prediction is pervasive. Retraining a bespoke model for each new table can be costly or infeasible in these settings, while data quality issues such as irrelevant predictors, correlated feature groups, and label noise are common. In this paper, we provide strong empirical evidence that TabPFN is highly robust under these sub-optimal conditions. We study TabPFN and its attention mechanisms for binary classification problems with controlled synthetic perturbations that vary: (i) dataset width by injecting random uncorrelated features and by introducing nonlinearly correlated features, (ii) dataset size by increasing the number of training rows, and (iii) label quality by increasing the fraction of mislabeled targets. Beyond predictive performance, we analyze internal signals including attention concentration and attention-based feature ranking metrics. Across these parametric tests, TabPFN is remarkably resilient: ROC-AUC remains high, attention stays structured and sharp, and informative features are highly ranked by attention-based metrics. Qualitative visualizations with attention heatmaps, feature-token embeddings, and SHAP plots further support a consistent pattern across layers in which TabPFN increasingly concentrates on useful features while separating their signals from noise. Together, these findings suggest that TabPFN is a robust TFM capable of maintaining both predictive performance and coherent internal behavior under various scenarios of data imperfections.

new Optimizing LLM Prompt Engineering with DSPy Based Declarative Learning

Authors: Shiek Ruksana, Sailesh Kiran Kurra, Thipparthi Sanjay Baradwaj

Abstract: Large Language Models (LLMs) have shown strong performance across a wide range of natural language processing tasks; however, their effectiveness is highly dependent on prompt design, structure, and embedded reasoning signals. Conventional prompt engineering methods largely rely on heuristic trial-and-error processes, which limits scalability, reproducibility, and generalization across tasks. DSPy, a declarative framework for optimizing text-processing pipelines, offers an alternative approach by enabling automated, modular, and learnable prompt construction for LLM-based systems. This paper presents a systematic study of DSPy-based declarative learning for prompt optimization, with emphasis on prompt synthesis, correction, calibration, and adaptive reasoning control. We introduce a unified DSPy LLM architecture that combines symbolic planning, gradient-free optimization, and automated module rewriting to reduce hallucinations, improve factual grounding, and avoid unnecessary prompt complexity. Experimental evaluations conducted on reasoning tasks, retrieval-augmented generation, and multi-step chain-of-thought benchmarks demonstrate consistent gains in output reliability, efficiency, and generalization across models. The results show improvements of up to 30 to 45% in factual accuracy and a reduction of approximately 25% in hallucination rates. Finally, we outline key limitations and discuss future research directions for declarative prompt optimization frameworks.

new Data Attribution in Adaptive Learning

Authors: Amit Kiran Rege

Abstract: Machine learning models increasingly generate their own training data -- online bandits, reinforcement learning, and post-training pipelines for language models are leading examples. In these adaptive settings, a single training observation both updates the learner and shifts the distribution of future data the learner will collect. Standard attribution methods, designed for static datasets, ignore this feedback. We formalize occurrence-level attribution for finite-horizon adaptive learning via a conditional interventional target, prove that replay-side information cannot recover it in general, and identify a structural class in which the target is identified from logged data.

new Are Latent Reasoning Models Easily Interpretable?

Authors: Connor Dilgren, Sarah Wiegreffe

Abstract: Latent reasoning models (LRMs) have attracted significant research interest due to their low inference cost (relative to explicit reasoning models) and theoretical ability to explore multiple reasoning paths in parallel. However, these benefits come at the cost of reduced interpretability: LRMs are difficult to monitor because they do not reason in natural language. This paper presents an investigation into LRM interpretability by examining two state-of-the-art LRMs. First, we find that latent reasoning tokens are often unnecessary for LRMs' predictions; on logical reasoning datasets, LRMs can almost always produce the same final answers without using latent reasoning at all. This underutilization of reasoning tokens may partially explain why LRMs do not consistently outperform explicit reasoning methods and raises doubts about the stated role of these tokens in prior work. Second, we demonstrate that when latent reasoning tokens are necessary for performance, we can decode gold reasoning traces up to 65-93% of the time for correctly predicted instances. This suggests LRMs often implement the expected solution rather than an uninterpretable reasoning process. Finally, we present a method to decode a verified natural language reasoning trace from latent tokens without knowing a gold reasoning trace a priori, demonstrating that it is possible to find a verified trace for a majority of correct predictions but only a minority of incorrect predictions. Our findings highlight that current LRMs largely encode interpretable processes, and interpretability itself can be a signal of prediction correctness.

new HI-MoE: Hierarchical Instance-Conditioned Mixture-of-Experts for Object Detection

Authors: Vadim Vashkelis, Natalia Trukhina

Abstract: Mixture-of-Experts (MoE) architectures enable conditional computation by activating only a subset of model parameters for each input. Although sparse routing has been highly effective in language models and has also shown promise in vision, most vision MoE methods operate at the image or patch level. This granularity is poorly aligned with object detection, where the fundamental unit of reasoning is an object query corresponding to a candidate instance. We propose Hierarchical Instance-Conditioned Mixture-of-Experts (HI-MoE), a DETR-style detection architecture that performs routing in two stages: a lightweight scene router first selects a scene-consistent expert subset, and an instance router then assigns each object query to a small number of experts within that subset. This design aims to preserve sparse computation while better matching the heterogeneous, instance-centric structure of detection. In the current draft, experiments are concentrated on COCO with preliminary specialization analysis on LVIS. Under these settings, HI-MoE improves over a dense DINO baseline and over simpler token-level or instance-only routing variants, with especially strong gains on small objects. We also provide an initial visualization of expert specialization patterns. We present the method, ablations, and current limitations in a form intended to support further experimental validation.

new Empowering Power Outage Prediction with Spatially Aware Hybrid Graph Neural Networks and Contrastive Learning

Authors: Xuyang Shen, Zijie Pan, Diego Cerrai, Xinxuan Zhang, Christopher Colorio, Emmanouil N. Anagnostou, Dongjin Song

Abstract: Extreme weather events, such as severe storms, hurricanes, snowstorms, and ice storms, which are exacerbated by climate change, frequently cause widespread power outages. These outages halt industrial operations, impact communities, damage critical infrastructure, profoundly disrupt economies, and have far-reaching effects across various sectors. To mitigate these effects, the University of Connecticut and Eversource Energy Center have developed an outage prediction modeling (OPM) system to provide pre-emptive forecasts for electric distribution networks before such weather events occur. However, existing predictive models in the system do not incorporate the spatial effect of extreme weather events. To this end, we develop Spatially Aware Hybrid Graph Neural Networks (SA-HGNN) with contrastive learning to enhance the OPM predictions for extreme weather-induced power outages. Specifically, we first encode spatial relationships of both static features (e.g., land cover, infrastructure) and event-specific dynamic features (e.g., wind speed, precipitation) via Spatially Aware Hybrid Graph Neural Networks (SA-HGNN). Next, we leverage contrastive learning to handle the imbalance problem associated with different types of extreme weather events and generate location-specific embeddings by minimizing intra-event distances between similar locations while maximizing inter-event distances across all locations. Thorough empirical studies in four utility service territories, i.e., Connecticut, Western Massachusetts, Eastern Massachusetts, and New Hampshire, demonstrate that SA-HGNN can achieve state-of-the-art performance for power outage prediction.
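The contrastive objective described above (shrink intra-event distances, enforce separation between events) can be sketched with a standard pairwise contrastive loss; the margin form and toy embeddings are assumptions for illustration, not the paper's exact loss:

```python
import numpy as np

def contrastive_loss(emb, event_type, margin=1.0):
    """Pairwise contrastive loss: pull embeddings observed under the same
    event type together, push different event types at least `margin` apart
    (a simplified stand-in for the paper's objective)."""
    n = len(emb)
    loss, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(emb[i] - emb[j])
            if event_type[i] == event_type[j]:
                loss += d ** 2                       # intra-event: shrink
            else:
                loss += max(0.0, margin - d) ** 2    # inter-event: separate
            pairs += 1
    return loss / pairs

events = ["storm", "storm", "ice", "ice"]
tight = np.array([[0.0, 0.0], [0.1, 0.0], [2.0, 0.0], [2.1, 0.0]])  # clustered by event
mixed = np.array([[0.0, 0.0], [2.0, 0.0], [0.1, 0.0], [2.1, 0.0]])  # clusters split across events
good = contrastive_loss(tight, events)
bad = contrastive_loss(mixed, events)
```

Embeddings clustered by event type achieve a much lower loss than the same points assigned across events, so minimizing this loss produces the event-specific structure the abstract describes.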

new Stratifying Reinforcement Learning with Signal Temporal Logic

Authors: Justin Curry, Alberto Speranzon

Abstract: In this paper, we develop a stratification-based semantics for Signal Temporal Logic (STL) in which each atomic predicate is interpreted as a membership test in a stratified space. This perspective reveals a novel correspondence principle between stratification theory and STL, showing that most STL formulas can be viewed as inducing a stratification of space-time. The significance of this interpretation is twofold. First, it offers a fresh theoretical framework for analyzing the structure of the embedding space generated by deep reinforcement learning (DRL) and relates it to the geometry of the ambient decision space. Second, it provides a principled framework that both enables the reuse of existing high-dimensional analysis tools and motivates the creation of novel computational techniques. To ground the theory, we (1) illustrate the role of stratification theory in Minigrid games and (2) apply numerical techniques to the latent embeddings of a DRL agent playing such a game where the robustness of STL formulas is used as the reward. In the process, we propose computationally efficient signatures that, based on preliminary evidence, appear promising for uncovering the stratification structure of such embedding spaces.
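The quantitative (robustness) semantics of STL used here as an RL reward can be sketched for atomic predicates and the bounded temporal operators; the function names and toy signal are illustrative, following the standard min/max semantics rather than the paper's code:

```python
def rob_always(signal, t0, t1, f):
    """Robustness of G_[t0,t1] (f >= 0): worst-case margin over the window."""
    return min(f(signal[t]) for t in range(t0, t1 + 1))

def rob_eventually(signal, t0, t1, f):
    """Robustness of F_[t0,t1] (f >= 0): best-case margin over the window."""
    return max(f(signal[t]) for t in range(t0, t1 + 1))

# toy 1-D trajectory: eventually exceed 3 while always staying above -1
x = [0.0, 1.0, 2.5, 3.5, 2.0]
reach = rob_eventually(x, 0, 4, lambda v: v - 3.0)  # positive => satisfied
safe = rob_always(x, 0, 4, lambda v: v + 1.0)       # positive => satisfied
reward = min(reach, safe)  # conjunction; usable directly as an RL reward
```

Positive robustness means the formula holds with margin, negative means it fails, so the signed value gives the DRL agent a dense, graded reward rather than a Boolean one.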

cross Copilot-Assisted Second-Thought Framework for Brain-to-Robot Hand Motion Decoding

Authors: Yizhe Li (University of Birmingham, Birmingham, United Kingdom), Shixiao Wang (University of Birmingham, Birmingham, United Kingdom), Jian K. Liu (University of Birmingham, Birmingham, United Kingdom)

Abstract: Motor kinematics prediction (MKP) from electroencephalography (EEG) is an important research area for developing movement-related brain-computer interfaces (BCIs). While traditional methods often rely on convolutional neural networks (CNNs) or recurrent neural networks (RNNs), Transformer-based models have shown strong ability in modeling long sequential EEG data. In this study, we propose a CNN-attention hybrid model for decoding hand kinematics from EEG during grasp-and-lift tasks, achieving strong performance in within-subject experiments. We further extend this approach to EEG-EMG multimodal decoding, which yields substantially improved results. Within-subject tests achieve PCC values of 0.9854, 0.9946, and 0.9065 for the X, Y, and Z axes, respectively, computed on the midpoint trajectory between the thumb and index finger, while cross-subject tests result in 0.9643, 0.9795, and 0.5852. The decoded trajectories from both modalities are then used to control a Franka Panda robotic arm in a MuJoCo simulation. To enhance trajectory fidelity, we introduce a copilot framework that filters low-confidence decoded points using a motion-state-aware critic within a finite-state machine. This post-processing step improves the overall within-subject PCC of EEG-only decoding to 0.93 while excluding fewer than 20% of the data points.

cross Self-Execution Simulation Improves Coding Models

Authors: Gallil Maimon, Ori Yoran, Felix Kreuk, Michael Hassid, Gal Cohen, Pierre Chambon, Yossi Adi

Abstract: A promising research direction in enabling LLMs to generate consistently correct code involves addressing their inability to properly estimate program execution, particularly for code they generate. In this work, we demonstrate that Code LLMs can be trained to simulate program execution in a step-by-step manner and that this capability can be leveraged to improve competitive programming performance. Our approach combines supervised fine-tuning on natural language execution traces, textual explanations grounded in true execution, with reinforcement learning using verifiable rewards. We introduce two complementary objectives: output prediction given code and inputs, and solving competitive programming tasks with either ground-truth or self-predicted execution feedback. These objectives enable models to perform self-verification over multiple candidate solutions, and iterative self-fixing by simulating test execution. Across multiple competitive programming benchmarks, our method yields consistent improvements over standard reasoning approaches. We further present ablations and analysis to elucidate the role of execution simulation and its limitations.

cross Emergent Compositional Communication for Latent World Properties

Authors: Tomek Kaszyński

Abstract: Can multi-agent communication pressure extract discrete, compositional representations of invisible physical properties from frozen video features? We show that agents communicating through a Gumbel-Softmax bottleneck with iterated learning develop positionally disentangled protocols for latent properties (elasticity, friction, mass ratio) without property labels or supervision on message structure. With 4 agents, 100% of 80 seeds converge to near-perfect compositionality (PosDis=0.999, holdout 98.3%). Controls confirm multi-agent structure -- not bandwidth or temporal coverage -- drives this effect. Causal intervention shows surgical property disruption (~15% drop on targeted property, <3% on others). A controlled backbone comparison reveals that the perceptual prior determines what is communicable: DINOv2 dominates on spatially-visible ramp physics (98.3% vs 95.1%), while V-JEPA 2 dominates on dynamics-only collision physics (87.4% vs 77.7%, d=2.74). Scale-matched (d=3.37) and frame-matched (d=6.53) controls attribute this gap entirely to video-native pretraining. The frozen protocol supports action-conditioned planning (91.5%) with counterfactual velocity reasoning (r=0.780). Validation on Physics 101 real camera footage confirms 85.6% mass-comparison accuracy on unseen objects, temporal dynamics contributing +11.2% beyond static appearance, agent-scaling compositionality replicating at 90% for 4 agents, and causal intervention extending to real video (d=1.87, p=0.022).
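The Gumbel-Softmax bottleneck referenced above can be sketched in NumPy (an illustrative reimplementation of the standard trick, not the paper's code): perturb logits with Gumbel noise, then apply a temperature softmax, so that the argmax of samples follows the softmax distribution while the relaxation stays differentiable in a training setting.

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_softmax(logits, tau=1.0):
    """Sample a near-one-hot message symbol: add Gumbel(0,1) noise to the
    logits, then take a temperature-tau softmax over the result."""
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))
    y = (logits + g) / tau
    y = np.exp(y - y.max())
    return y / y.sum()

# empirically, argmax frequencies track softmax(logits)
logits = np.array([2.0, 0.5, -1.0])
counts = np.zeros(3)
for _ in range(2000):
    counts[np.argmax(gumbel_softmax(logits, tau=0.5))] += 1
probs = counts / counts.sum()
```

Lowering tau sharpens each sample toward one-hot (discrete symbols), while the sampling distribution over symbols stays fixed at softmax(logits); annealing tau is the usual way such discrete channels are trained.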

cross IPSL-AID: Generative Diffusion Models for Climate Downscaling from Global to Regional Scales

Authors: Kishanthan Kingston, Olivier Boucher, Freddy Bouchet, Pierre Chapel, Rosemary Eade, Jean-Francois Lamarque, Redouane Lguensat, Kazem Ardaneh

Abstract: Effective adaptation and mitigation strategies for climate change require high-resolution projections to inform strategic decision-making. Conventional global climate models, which typically operate at resolutions of 150 to 200 kilometers, lack the capacity to represent essential regional processes. IPSL-AID is a global to regional downscaling tool based on a denoising diffusion probabilistic model designed to address this limitation. Trained on ERA5 reanalysis data, it generates 0.25 degree resolution fields for temperature, wind, and precipitation using coarse inputs and their spatiotemporal context. It also models probability distributions of fine-scale features to produce plausible scenarios for uncertainty quantification. The model accurately reconstructs statistical distributions, including extreme events, power spectra, and spatial structures. This work highlights the potential of generative diffusion models for efficient climate downscaling with uncertainty quantification.

cross Event-Driven Neuromorphic Vision Enables Energy-Efficient Visual Place Recognition

Authors: Geoffroy Keime, Nicolas Cuperlier, Benoit R. Cottereau

Abstract: Reliable visual place recognition (VPR) under dynamic real-world conditions is critical for autonomous robots, yet conventional deep networks remain limited by high computational and energy demands. Inspired by the mammalian navigation system, we introduce SpikeVPR, a bio-inspired and neuromorphic approach combining event-based cameras with spiking neural networks (SNNs) to generate compact, invariant place descriptors from few exemplars, achieving robust recognition under extreme changes in illumination, viewpoint, and appearance. SpikeVPR is trained end-to-end using surrogate gradient learning and incorporates EventDilation, a novel augmentation strategy enhancing robustness to speed and temporal variations. Evaluated on two challenging benchmarks (Brisbane-Event-VPR and NSAVP), SpikeVPR achieves performance comparable to state-of-the-art deep networks while using 50 times fewer parameters and consuming 30 and 250 times less energy, enabling real-time deployment on mobile and neuromorphic platforms. These results demonstrate that spike-based coding offers an efficient pathway toward robust VPR in complex, changing environments.

cross Multi-Agent Training-free Urban Food Delivery System using Resilient UMST Network

Authors: Md Nahid Hasan, Vishwam Tiwari, Aditya Challa, Vaskar Raychoudhury, Snehanshu Saha

Abstract: Delivery systems have become a core part of urban life, supporting the demand for food, medicine, and other goods. Yet traditional logistics networks remain fragile, often struggling to adapt to road closures, accidents, and shifting demand. Online Food Delivery (OFD) platforms now represent a cornerstone of urban logistics, with the global market projected to grow to over 500 billion USD by 2030. Designing delivery networks that are efficient and resilient remains a major challenge: fully connected graphs provide flexibility but are computationally infeasible at scale, while single Minimum Spanning Trees (MSTs) are efficient but easily disrupted. We propose the Union of Minimum Spanning Trees (UMST) approach to construct delivery networks that are sparse yet robust. UMST generates multiple MSTs through randomized edge perturbations and unites them, producing graphs with far fewer edges than fully connected networks while maintaining multiple alternative routes between delivery hotspots. Across multiple U.S. cities, UMST achieves 20--40$\times$ fewer edges than fully connected graphs while enabling substantial order bundling with 75--83% participation rates. Compared to learning-based baselines including MADDPG and Graph Neural Networks, UMST delivers competitive performance (88--96% success rates, 44--53% distance savings) without requiring training, achieving 30$\times$ faster execution while maintaining interpretable routing structures. Its combination of structural efficiency and operational flexibility offers a scalable and resilient foundation for urban delivery networks.
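
The core UMST construction described above (unite the MSTs of several randomly perturbed copies of the graph) can be sketched in a few lines. This is an illustrative toy implementation, not the authors' code: the function names, the multiplicative perturbation scheme, and the `k`/`noise` parameters are assumptions.

```python
import random

def kruskal_mst(n, edges):
    """Kruskal's algorithm with path-halving union-find; edges are (w, u, v)."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    mst = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:                     # edge connects two components
            parent[ru] = rv
            mst.append((u, v))
    return mst

def umst(n, edges, k=5, noise=0.1, seed=0):
    """Union of k MSTs computed under randomized multiplicative edge-weight
    perturbations: sparse (<= k*(n-1) edges) but with alternative routes."""
    rng = random.Random(seed)
    union = set()
    for _ in range(k):
        perturbed = [(w * (1 + rng.uniform(-noise, noise)), u, v)
                     for w, u, v in edges]
        union.update(frozenset((u, v)) for u, v in kruskal_mst(n, perturbed))
    return union
```

On a complete graph the union stays close to tree-sparsity while adding redundant edges wherever the perturbations flip which edge is cheapest.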

cross Expressibility of neural quantum states: a Walsh-complexity perspective

Authors: Taige Wang

Abstract: Neural quantum states are powerful variational wavefunctions, but it remains unclear which many-body states can be represented efficiently by modern additive architectures. We introduce Walsh complexity, a basis-dependent measure of how broadly a wavefunction is spread over parity patterns. States with an almost uniform Walsh spectrum require exponentially large Walsh complexity from any good approximant. We show that shallow additive feed-forward networks cannot generate such complexity in the tame regime, e.g. polynomial activations with subexponential parameter scaling. As a concrete example, we construct a simple dimerized state prepared by a single layer of disjoint controlled-$Z$ gates. Although it has only short-range entanglement and a simple tensor-network description, its Walsh complexity is maximal. Full-cube fits across system size and depth are consistent with the complexity bound: for polynomial activations, successful fitting appears only once depth reaches a logarithmic scale in $N$, whereas activation saturation in $\tanh$ produces a sharp threshold-like jump already at depth $3$. Walsh complexity therefore provides an expressibility axis complementary to entanglement and clarifies when depth becomes an essential resource for additive neural quantum states.
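
As an illustration of "spread over parity patterns," the sketch below computes a wavefunction's Walsh spectrum with a fast Walsh-Hadamard transform and measures its spread via an inverse participation ratio. This spread measure is an assumption for illustration; the paper's precise definition of Walsh complexity may differ.

```python
def fwht(a):
    """In-place fast Walsh-Hadamard transform (unnormalized)."""
    h, n = 1, len(a)
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                x, y = a[j], a[j + h]
                a[j], a[j + h] = x + y, x - y
        h *= 2
    return a

def walsh_spread(amplitudes):
    """Effective number of parity patterns carrying weight: the inverse
    participation ratio of the normalized Walsh spectrum."""
    spec = fwht(list(amplitudes))        # copy, then transform
    probs = [c * c for c in spec]
    total = sum(probs)
    probs = [p / total for p in probs]
    return 1.0 / sum(p * p for p in probs)
```

A uniform superposition concentrates on the single trivial parity pattern (spread 1), while a single computational basis state spreads uniformly over all of them (maximal spread), matching the intuition that an almost uniform Walsh spectrum is the hard case.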

cross XAttnRes: Cross-Stage Attention Residuals for Medical Image Segmentation

Authors: Xinyu Liu, Qing Xu, Zhen Chen

Abstract: In the field of Large Language Models (LLMs), Attention Residuals have recently demonstrated that learned, selective aggregation over all preceding layer outputs can outperform fixed residual connections. We propose Cross-Stage Attention Residuals (XAttnRes), a mechanism that maintains a global feature history pool accumulating both encoder and decoder stage outputs. Through lightweight pseudo-query attention, each stage selectively aggregates from all preceding representations. To bridge the gap between the same-dimensional Transformer layers in LLMs and the multi-scale encoder-decoder stages in segmentation networks, XAttnRes introduces spatial alignment and channel projection steps that handle cross-resolution features with negligible overhead. When added to existing segmentation networks, XAttnRes consistently improves performance across four datasets and three imaging modalities. We further observe that XAttnRes alone, even without skip connections, achieves performance on par with the baseline, suggesting that learned aggregation can recover the inter-stage information flow traditionally provided by predetermined connections.

cross ENEC: A Lossless AI Model Compression Method Enabling Fast Inference on Ascend NPUs

Authors: Jinwu Yang, Jiaan Wu, Zedong Liu, Xinyang Ma, Hairui Zhao, Yida Gu, Yuanhong Huang, Xingchen Liu, Wenjing Huang, Zheng Wei, Jing Xing, Yili Ma, Qingyi Zhang, Baoyi An, Zhongzhe Hu, Shaoteng Liu, Xia Zhu, Jiaxun Lu, Guangming Tan, Dingwen Tao

Abstract: The rapid scaling of Large Language Models presents significant challenges for their deployment and inference, particularly on resource-constrained specialized AI hardware accelerators such as Huawei's Ascend NPUs, where weight data transfer has become a critical performance bottleneck. While lossless compression can preserve model accuracy and reduce data volume, existing lossless compression algorithms exhibit extremely low throughput when ported to the Ascend NPU architecture. In this paper, we propose ENEC, a novel lossless compression method specifically customized for AI model weights and optimized for Ascend Neural Processing Units. ENEC adopts a block-based fixed-length encoding scheme and incorporates a series of NPU-specific optimizations: bit-width quantization with hierarchical halving bit-packing, vectorized branch-free integer transformation, and dependency-decoupled intra-segment scan for efficient prefix-sum computation. Experimental results demonstrate that ENEC outperforms existing state-of-the-art NPU compressors in both compression ratio and throughput. Compared to leading GPU solutions, ENEC achieves a 3.43X higher throughput than DietGPU and a 1.12X better compression ratio than nvCOMP. By reducing weight transmission overhead, ENEC significantly improves end-to-end inference performance, achieving up to a 6.3X speedup. On Ascend NPUs, ENEC is the first open-source lossless compression algorithm for model weights that achieves performance comparable to state-of-the-art GPU compressors, offering an effective solution for deploying large-scale AI models.
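
The "dependency-decoupled scan" mentioned above refers to prefix-sum formulations in which the work within each step carries no serial dependency, so it can be vectorized. A minimal Hillis-Steele-style sketch of that pattern (plain Python, not ENEC's NPU kernel):

```python
def inclusive_scan(a):
    """Inclusive prefix sum in ceil(log2(n)) steps; within each step every
    element is computed independently, so a step maps to one vector op."""
    a = list(a)
    step = 1
    while step < len(a):
        a = [a[i] + (a[i - step] if i >= step else 0) for i in range(len(a))]
        step *= 2
    return a
```

In a compressor of this kind, such a scan would turn per-block encoded sizes into output offsets without a sequential pass.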

cross Generative Chemical Language Models for Energetic Materials Discovery

Authors: Andrew Salij, R. Seaton Ullberg, Megan C. Davis, Marc J. Cawkwell, Christopher J. Snyder, Cristina Garcia Cardona, Ivana Matanovic, Wilton J. M. Kort-Kamp

Abstract: The discovery of new energetic materials remains a pressing challenge hindered by limited availability of high-quality data. To address this, we have developed generative molecular language models that have been pretrained on extensive chemical data and then fine-tuned with curated energetic materials datasets. This transfer-learning strategy extends the chemical language model capabilities beyond the pharmacological space in which they have been predominantly developed, offering a framework applicable to other data-sparse discovery problems. Furthermore, we discuss the benefits of fragment-based molecular encodings for chemical language models, in particular in constructing synthetically accessible structures. Together, these advances provide a foundation for accelerating the design of next-generation energetic materials with demanding performance requirements.

cross Computer Architecture's AlphaZero Moment: Automated Discovery in an Encircled World

Authors: Karthikeyan Sankaralingam

Abstract: The end of Moore's Law and Dennard scaling has fundamentally changed the economics of computer architecture. With transistor scaling delivering diminishing returns, architectural innovation is now the primary - and perhaps only - remaining lever for performance improvement. However, we argue that human-driven architecture research is fundamentally ill-suited for this new era. The architectural design space is vast (effectively infinite for practical purposes), yet human teams explore perhaps 50-100 designs per generation, sampling less than 0.001% of possibilities. This approach worked during the abundance era when Moore's Law provided a rising tide that lifted all designs. In the current scarcity paradigm, where every architecture must deliver 2X performance improvements using essentially the same transistor budget, systematic exploration becomes critical. We propose a concrete alternative: automated idea factories that generate and evaluate thousands of candidate architectures weekly through multi-tiered evaluation pipelines, learning from deployed telemetry data in a continuous feedback loop. Early results suggest that such systems can compress architectural design cycles from double-digit months to single-digit weeks by exploring orders of magnitude more candidates than any human team, and do it much faster. We predict that within 2 years, purely human-driven architecture research will be as obsolete as human chess players competing against engines.

cross InsightBoard: An Interactive Multi-Metric Visualization and Fairness Analysis Plugin for TensorBoard

Authors: Ray Zeyao Chen, Christan Grant

Abstract: Modern machine learning systems deployed in safety-critical domains require visibility not only into aggregate performance but also into how training dynamics affect subgroup fairness over time. Existing training dashboards primarily support single-metric monitoring and offer limited support for examining relationships between heterogeneous metrics or diagnosing subgroup disparities during training. We present InsightBoard, an interactive TensorBoard plugin that integrates synchronized multi-metric visualization with slice-based fairness diagnostics in a unified interface. InsightBoard enables practitioners to jointly inspect training dynamics, performance metrics, and subgroup disparities through linked multi-view plots, correlation analysis, and standard group fairness indicators computed over user-defined slices. Through case studies with YOLOX on the BDD100k dataset, we demonstrate that models achieving strong aggregate performance can still exhibit substantial demographic and environmental disparities that remain hidden under conventional monitoring. By making fairness diagnostics available during training, InsightBoard supports earlier, more informed model inspection without modifying existing training pipelines or introducing additional data stores.

cross CoLoRSMamba: Conditional LoRA-Steered Mamba for Supervised Multimodal Violence Detection

Authors: Damith Chamalke Senadeera, Dimitrios Kollias, Gregory Slabaugh

Abstract: Violence detection benefits from audio, but real-world soundscapes can be noisy or weakly related to the visible scene. We present CoLoRSMamba, a directional Video to Audio multimodal architecture that couples VideoMamba and AudioMamba through CLS-guided conditional LoRA. At each layer, the VideoMamba CLS token produces a channel-wise modulation vector and a stabilization gate that adapt the AudioMamba projections responsible for the selective state-space parameters (Delta, B, C), including the step-size pathway, yielding scene-aware audio dynamics without token-level cross-attention. Training combines binary classification with a symmetric AV-InfoNCE objective that aligns clip-level audio and video embeddings. To support fair multimodal evaluation, we curate audio-filtered clip level subsets of the NTU-CCTV and DVD datasets from temporal annotations, retaining only clips with available audio. On these subsets, CoLoRSMamba outperforms representative audio-only, video-only, and multimodal baselines, achieving 88.63% accuracy / 86.24% F1-V on NTU-CCTV and 75.77% accuracy / 72.94% F1-V on DVD. It further offers a favorable accuracy-efficiency tradeoff, surpassing several larger models with fewer parameters and FLOPs.

cross SDVDiag: Using Context-Aware Causality Mining for the Diagnosis of Connected Vehicle Functions

Authors: Matthias Wei{\ss}, Falk Dettinger, Elias Detrois, Nasser Jazdi, Michael Weyrich

Abstract: Real-world implementations of connected vehicle functions are spreading steadily, yet operating these functions reliably remains challenging due to their distributed nature and the complexity of the underlying cloud, edge, and networking infrastructure. Quick diagnosis of problems and understanding the error chains that lead to failures is essential for reducing downtime. However, diagnosing these systems is still largely performed manually, as automated analysis techniques are predominantly data-driven and struggle with hidden relationships and the integration of context information. This paper addresses this gap by introducing a multimodal approach that integrates human feedback and system-specific information into the causal analysis process. Reinforcement Learning from Human Feedback is employed to continuously train a causality mining model while incorporating expert knowledge. Additional modules leverage distributed tracing data to prune false-positive causal links and enable the injection of domain-specific relationships to further refine the causal graph. Evaluation is performed using an automated valet parking application operated in a connected vehicle test field. Results demonstrate a significant increase in precision from 14\% to 100\% for the detection of causal edges and improved system interpretability compared to purely data-driven approaches, highlighting the potential for system operators in the connected vehicle domain.

cross Banana100: Breaking NR-IQA Metrics by 100 Iterative Image Replications with Nano Banana Pro

Authors: Kenan Tang, Praveen Arunshankar, Andong Hua, Anthony Yang, Yao Qin

Abstract: The multi-step, iterative image editing capabilities of multi-modal agentic systems have transformed digital content creation. Although latest image editing models faithfully follow instructions and generate high-quality images in single-turn edits, we identify a critical weakness in multi-turn editing, which is the iterative degradation of image quality. As images are repeatedly edited, minor artifacts accumulate, rapidly leading to a severe accumulation of visible noise and a failure to follow simple editing instructions. To systematically study these failures, we introduce Banana100, a comprehensive dataset of 28,000 degraded images generated through 100 iterative editing steps, including diverse textures and image content. Alarmingly, image quality evaluators fail to detect the degradation. Among 21 popular no-reference image quality assessment (NR-IQA) metrics, none of them consistently assign lower scores to heavily degraded images than to clean ones. The dual failures of generators and evaluators may threaten the stability of future model training and the safety of deployed agentic systems, if the low-quality synthetic data generated by multi-turn edits escape quality filters. We release the full code and data to facilitate the development of more robust models, helping to mitigate the fragility of multi-modal agentic systems.

cross Diffusion Policy with Bayesian Expert Selection for Active Multi-Target Tracking

Authors: Haotian Xiang, Qin Lu, Yaakov Bar-Shalom

Abstract: Active multi-target tracking requires a mobile robot to balance exploration for undetected targets with exploitation of uncertain tracked ones. Diffusion policies have emerged as a powerful approach for capturing diverse behavioral strategies by learning action sequences from expert demonstrations. However, existing methods implicitly select among strategies through the denoising process, without uncertainty quantification over which strategy to execute. We formulate expert selection for diffusion policies as an offline contextual bandit problem and propose a Bayesian framework for pessimistic, uncertainty-aware strategy selection. A multi-head Variational Bayesian Last Layer (VBLL) model predicts the expected tracking performance of each expert strategy given the current belief state, providing both a point estimate and predictive uncertainty. Following the pessimism principle for offline decision-making, a Lower Confidence Bound (LCB) criterion then selects the expert whose worst-case predicted performance is best, avoiding overcommitment to experts with unreliable predictions. The selected expert conditions a diffusion policy to generate corresponding action sequences. Experiments on simulated indoor tracking scenarios demonstrate that our approach outperforms both the base diffusion policy and standard gating methods, including Mixture-of-Experts selection and deterministic regression baselines.
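
The pessimistic LCB selection rule described above is simple to state in code. A minimal sketch, assuming the VBLL heads already provide a mean and standard deviation of predicted tracking performance per expert (the `beta` weight and the interface are assumptions):

```python
def select_expert(means, stds, beta=1.0):
    """Pessimism for offline decisions: pick the expert with the best
    Lower Confidence Bound mu - beta*sigma of predicted performance."""
    lcbs = [m - beta * s for m, s in zip(means, stds)]
    return max(range(len(lcbs)), key=lcbs.__getitem__)
```

Note how a high-mean but high-uncertainty expert loses to a slightly worse but reliable one, which is exactly the over-commitment the pessimism principle guards against.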

cross Zero-Shot Quantization via Weight-Space Arithmetic

Authors: Daniele Solombrino, Antonio Andrea Gargiulo, Adrian Robert Minut, Luca Zhou, Alessandro Zirilli, Emanuele Rodol\`a

Abstract: We show that robustness to post-training quantization (PTQ) is a transferable direction in weight space. We call this direction the quantization vector: extracted from a donor task by simple weight-space arithmetic, it can be used to patch a receiver model and improve robustness to PTQ-induced noise by as much as 60%, without receiver-side quantization-aware training (QAT). Because the method requires no receiver training data, it provides a zero-shot, low-cost alternative to QAT for extremely low-bit deployment. We demonstrate this on Vision Transformer (ViT) models. More broadly, our results suggest that quantization robustness is not merely a byproduct of task-specific training, but a reusable feature of weight-space geometry that can be transferred rather than retrained.
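
The weight-space arithmetic above can be illustrated on toy weights. This is a sketch under assumptions: the function names, the dict-of-weights interface, and the `alpha` scaling knob are not from the paper.

```python
def extract_quantization_vector(donor_robust, donor_base):
    """Quantization vector: elementwise difference between a PTQ-robust
    donor checkpoint and its non-robust counterpart."""
    return {k: donor_robust[k] - donor_base[k] for k in donor_base}

def patch(receiver, qvec, alpha=1.0):
    """Zero-shot patch: add the (scaled) quantization vector to a receiver
    model's weights, with no receiver-side training."""
    return {k: receiver[k] + alpha * qvec.get(k, 0.0) for k in receiver}
```

In practice the same arithmetic would run per-tensor over matched parameter names of two architecturally identical checkpoints.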

cross Inference-Path Optimization via Circuit Duplication in Frozen Visual Transformers for Marine Species Classification

Authors: Thomas Manuel Rost

Abstract: Automated underwater species classification is constrained by annotation cost and environmental variation that limits the transferability of fully supervised models. Recent work has shown that frozen embeddings from self-supervised vision foundation models already provide a strong label-efficient baseline for marine image classification. Here we investigate whether this frozen-embedding regime can be improved at inference time, without fine-tuning or changing model weights. We apply Circuit Duplication, an inference-time method originally proposed for Large Language Models, in which a selected range of transformer layers is traversed twice during the forward pass. We evaluate on the class-imbalanced AQUA20 benchmark using frozen DINOv3 embeddings under two settings: global circuit selection, where a single duplicated circuit is chosen for the full dataset, and class-specific circuit selection, where each species may receive a different optimal circuit. Both settings use simple semi-supervised downstream classifiers. Circuit Duplication consistently improves over the standard frozen forward pass. At the maximum label budget, class-specific selection reaches a macro F1 of 0.875, closing the gap to the fully supervised ConvNeXt benchmark (0.889) to 1.4 points without any gradient-based training. Four species exceed their fully supervised reference, with octopus improving by +12.1 F1 points. Across all budgets, roughly 75% of classes prefer a class-specific circuit, indicating a genuinely class-dependent benefit. To our knowledge, this is the first application of Circuit Duplication to computer vision.
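
Circuit Duplication as described above amounts to traversing a contiguous slice of transformer layers twice during the forward pass, with frozen weights. A framework-agnostic sketch (the function signature is an assumption; layers are any callables):

```python
def forward_with_duplication(layers, x, dup_start, dup_end):
    """Forward pass where layers[dup_start:dup_end] are applied twice.
    No weights change; only the inference path is modified."""
    for i, layer in enumerate(layers):
        x = layer(x)
        if i == dup_end - 1:                 # finished the circuit once...
            for layer2 in layers[dup_start:dup_end]:
                x = layer2(x)                # ...run the same layers again
    return x
```

Global selection would pick one `(dup_start, dup_end)` for the whole dataset; class-specific selection searches a circuit per species.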

cross Agile Story-Point Estimation: Is RAG a Better Way to Go?

Authors: Lamyea Maha, Tajmilur Rahman, Chanchal Roy

Abstract: The sprint-based iterative approach in the Agile software development method allows continuous feedback and adaptation. One of the crucial Agile software development activities is the sprint planning session where developers estimate the effort required to complete tasks through a consensus-based estimation technique such as Planning Poker. In the Agile software development method, a common unit of measuring development effort is Story Point (SP) which is assigned to tasks to understand the complexity and development time needed to complete them. Despite the benefits of this process, it is an extremely time-consuming manual process. To mitigate this issue, in this study, we investigated if this manual process can be automated using Retrieval Augmented Generation (RAG) which comprises a "Retriever" and a "Generator". We applied two embedding models - bge-large-en-v1.5, and Sentence-Transformers' all-mpnet-base-v2 on 23 open-source software projects of varying sizes and examined four key aspects: 1) how retrieval hyper-parameters influence the performance, 2) whether estimation accuracy differs across different sizes of the projects, 3) whether embedding model choice affects accuracy, and 4) how the RAG-based approach compares to the existing baselines. Although the RAG-based approach outperformed the baseline models on several occasions, our results did not exhibit statistically significant differences in performance across the projects or across the embedding models. This highlights the need for further studies, refinement of the RAG pipeline, and model adaptation strategies for better accuracy in automatically estimating user stories.

cross ExpressEdit: Fast Editing of Stylized Facial Expressions with Diffusion Models in Photoshop

Authors: Kenan Tang, Jiasheng Guo, Jeffrey Lin, Yao Qin

Abstract: Facial expressions of characters are a vital component of visual storytelling. While current AI image editing models hold promise for assisting artists in the task of stylized expression editing, these models introduce global noise and pixel drift into the edited image, preventing the integration of these models into professional image editing software and workflows. To bridge this gap, we introduce ExpressEdit, a fully open-source Photoshop plugin that is free from common artifacts of proprietary image editing models and robustly synergizes with native Photoshop operations such as Liquify. ExpressEdit seamlessly edits an expression within 3 seconds on a single consumer-grade GPU, significantly faster than popular proprietary models. Moreover, to support the generation of diverse expressions according to different narrative needs, we compile a comprehensive expression database of 135 expression tags enriched with example stories and images designed for retrieval-augmented generation. We open source the code and dataset to facilitate future research and artistic exploration.

cross Lightweight Query Routing for Adaptive RAG: A Baseline Study on RAGRouter-Bench

Authors: Prakhar Bansal, Shivangi Agarwal

Abstract: Retrieval-Augmented Generation pipelines span a wide range of retrieval strategies that differ substantially in token cost and capability. Selecting the right strategy per query is a practical efficiency problem, yet no routing classifiers have been trained on RAGRouter-Bench \citep{wang2026ragrouterbench}, a recently released benchmark of $7,727$ queries spanning four knowledge domains, each annotated with one of three canonical query types: factual, reasoning, and summarization. We present the first systematic evaluation of lightweight classifier-based routing on this benchmark. Five classical classifiers are evaluated under three feature regimes, namely, TF-IDF, MiniLM sentence embeddings \citep{reimers2019sbert}, and hand-crafted structural features, yielding 15 classifier feature combinations. Our best configuration, TF-IDF with an SVM, achieves a macro-averaged F1 of $\mathbf{0.928}$ and an accuracy of $\mathbf{93.2\%}$, while simulating $\mathbf{28.1\%}$ token savings relative to always using the most expensive paradigm. Lexical TF-IDF features outperform semantic sentence embeddings by $3.1$ macro-F1 points, suggesting that surface keyword patterns are strong predictors of query-type complexity. Domain-level analysis reveals that medical queries are hardest to route and legal queries most tractable. These results establish a reproducible query-side baseline and highlight the gap that corpus-aware routing must close.

cross Physics-Constrained Adaptive Flow Matching for Climate Downscaling

Authors: Kevin Debeire, Ayta\c{c} Pa\c{c}al, Pierre Gentine, Luis Medrano-Navarro, Nils Thuerey, Veronika Eyring

Abstract: Regional climate information at kilometer scales is essential for assessing the impacts of climate change, but generating it with global climate models is too expensive due to their high computational costs. Machine learning models offer a fast alternative, yet they often violate basic physical laws and degrade when applied to climates outside of their training distribution. We present Physics-Constrained Adaptive Flow Matching (PC-AFM), a generative downscaling model that addresses both problems. Building on the Adaptive Flow Matching (AFM) model of Fotiadis et al. (2025) as our baseline, we add soft conservation constraints that keep the downscaled output consistent with the large-scale input for precipitation and humidity, and use gradient surgery via the ConFIG algorithm to prevent these constraints from interfering with the generative objective. We train the model on Central Europe climate data, evaluate it on a 10$\times$ downscaling task (63km to 6.3km) over six variables (near-surface temperature, precipitation, specific humidity, surface pressure, and horizontal wind components) across a comprehensive set of metrics including bias, ensemble skill scores, power spectra, and conservation error, and test the generalization on two held-out climate regions. Within the training distribution, PC-AFM reduces conservation errors and improves ensemble calibration while matching the baseline on standard skill metrics. Outside the training distribution, where unconstrained models develop large systematic errors by extrapolating learned statistics, PC-AFM halves precipitation wet bias, reduces conservation error and improves extreme-quantile accuracy, all without any information about the target climate at inference time. These results indicate that physical consistency is a practical requirement for deploying generative downscaling models in real-world applications.
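
A soft conservation constraint of the kind described can be illustrated as a coarse-graining penalty: average-pool the downscaled field back to the input resolution and penalize the mismatch. This pure-Python sketch is not PC-AFM's actual loss; the pooling choice and squared-error form are assumptions.

```python
def coarse_pool(fine, factor):
    """Average-pool a 2D grid (list of lists) by an integer factor."""
    h, w = len(fine) // factor, len(fine[0]) // factor
    pooled = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            block = [fine[i * factor + di][j * factor + dj]
                     for di in range(factor) for dj in range(factor)]
            pooled[i][j] = sum(block) / len(block)
    return pooled

def conservation_penalty(fine, coarse_input, factor):
    """Soft constraint: mean squared mismatch between the coarse-grained
    downscaled field and the large-scale input it must stay consistent with."""
    pooled = coarse_pool(fine, factor)
    errs = [(pooled[i][j] - coarse_input[i][j]) ** 2
            for i in range(len(coarse_input))
            for j in range(len(coarse_input[0]))]
    return sum(errs) / len(errs)
```

In training, a term like this would be combined with the generative objective, which is where gradient-surgery methods such as ConFIG come in to keep the two from conflicting.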

cross Recurrent Quantum Feature Maps for Reservoir Computing

Authors: Utkarsh Singh, Aaron Z. Goldberg, Christoph Simon, Khabat Heshami

Abstract: Reservoir computing promises a fast method for handling large amounts of temporal data. This hinges on constructing a good reservoir--a dynamical system capable of transforming inputs into a high-dimensional representation while remembering properties of earlier data. In this work, we introduce a reservoir based on recurrent quantum feature maps where a fixed quantum circuit is reused to encode both current inputs and a classical feedback signal derived from previous outputs. We evaluate the model on the Mackey-Glass time-series prediction task using our recently introduced CP feature map, and find that it achieves lower mean squared error than standard classical baselines, including echo state networks and multilayer perceptrons, while maintaining compact circuit depth and qubit requirements. We further analyze memory capacity and show that the model effectively retains temporal information, consistent with its forecasting accuracy. Finally, we study the impact of realistic noise and find that performance is robust to several noise channels but remains sensitive to two-qubit gate errors, identifying a key limitation for near-term implementations.

cross VisionClaw: Always-On AI Agents through Smart Glasses

Authors: Xiaoan Liu, DaeHo Lee, Eric J Gonzalez, Mar Gonzalez-Franco, Ryo Suzuki

Abstract: We present VisionClaw, an always-on wearable AI agent that integrates live egocentric perception with agentic task execution. Running on Meta Ray-Ban smart glasses, VisionClaw continuously perceives real-world context and enables in-situ, speech-driven action initiation and delegation via OpenClaw AI agents. Therefore, users can directly execute tasks through the smart glasses, such as adding real-world objects to an Amazon cart, generating notes from physical documents, receiving meeting briefings on the go, creating events from posters, or controlling IoT devices. We evaluate VisionClaw through a controlled laboratory study (N=12) and a longitudinal deployment study (N=5). Results show that integrating perception and execution enables faster task completion and reduces interaction overhead compared to non-always-on and non-agent baselines. Beyond performance gains, deployment findings reveal a shift in interaction: tasks are initiated opportunistically during ongoing activities, and execution is increasingly delegated rather than manually controlled. These results suggest a new paradigm for wearable AI agents, where perception and action are continuously coupled to support situated, hands-free interaction.

cross Beyond Predefined Schemas: TRACE-KG for Context-Enriched Knowledge Graphs from Complex Documents

Authors: Mohammad Sadeq Abolhasani, Yang Ba, Yixuan He, Rong Pan

Abstract: Knowledge graph construction typically relies either on predefined ontologies or on schema-free extraction. Ontology-driven pipelines enforce consistent typing but require costly schema design and maintenance, whereas schema-free methods often produce fragmented graphs with weak global organization, especially in long technical documents with dense, context-dependent information. We propose TRACE-KG (Text-dRiven schemA for Context-Enriched Knowledge Graphs), a multimodal framework that jointly constructs a context-enriched knowledge graph and an induced schema without assuming a predefined ontology. TRACE-KG captures conditional relations through structured qualifiers and organizes entities and relations using a data-driven schema that serves as a reusable semantic scaffold while preserving full traceability to the source evidence. Experiments show that TRACE-KG produces structurally coherent, traceable knowledge graphs and offers a practical alternative to both ontology-driven and schema-free construction pipelines.

cross Nonparametric Regression Discontinuity Designs with Survival Outcomes

Authors: Maximilian Schuessler, Erik Sverdrup, Robert Tibshirani, Stefan Wager

Abstract: Quasi-experimental evaluations are central for generating real-world causal evidence and complementing insights from randomized trials. The regression discontinuity design (RDD) is a quasi-experimental design that can be used to estimate the causal effect of treatments that are assigned based on a running variable crossing a threshold. Such threshold-based rules are ubiquitous in healthcare, where predictive and prognostic biomarkers frequently guide treatment decisions. However, standard RD estimators rely on complete outcome data, an assumption often violated in time-to-event analyses where censoring arises from loss to follow-up. To address this issue, we propose a nonparametric approach that leverages doubly robust censoring corrections and can be paired with existing RD estimators. Our approach can handle multiple survival endpoints, long follow-up times, and covariate-dependent variation in survival and censoring. We discuss the relevance of our approach across multiple areas of applications and demonstrate its usefulness through simulations and the prostate component of the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial where our new approach offers several advantages, including higher efficiency and robustness to misspecification. We have also developed an open-source software package, $\texttt{rdsurvival}$, for the $\texttt{R}$ language.

cross Optimizing Neurorobot Policy under Limited Demonstration Data through Preference Regret

Authors: Viet Dung Nguyen, Yuhang Song, Anh Nguyen, Jamison Heard, Reynold Bailey, Alexander Ororbia

Abstract: Robot reinforcement learning from demonstrations (RLfD) assumes that expert data is abundant; this is usually unrealistic in the real world given data scarcity as well as high collection cost. Furthermore, imitation learning algorithms assume that the data is independently and identically distributed, which ultimately results in poorer performance as gradual errors emerge and compound within test-time trajectories. We address these issues by introducing the "master your own expertise" (MYOE) framework, a self-imitation framework that enables robotic agents to learn complex behaviors from limited demonstration data samples. Inspired by human perception and action, we propose and design what we call the queryable mixture-of-preferences state space model (QMoP-SSM), which estimates the desired goal at every time step. These desired goals are used in computing the "preference regret", which is used to optimize the robot control policy. Our experiments demonstrate the robustness, adaptability, and out-of-sample performance of our agent compared to other state-of-the-art RLfD schemes. The GitHub repository that supports this work can be found at: https://github.com/rxng8/neurorobot-preference-regret-learning.

URLs: https://github.com/rxng8/neurorobot-preference-regret-learning.

cross LangFIR: Discovering Sparse Language-Specific Features from Monolingual Data for Language Steering

Authors: Sing Hieng Wong, Hassan Sajjad, A. B. Siddique

Abstract: Large language models (LLMs) show strong multilingual capabilities, yet reliably controlling the language of their outputs remains difficult. Representation-level steering addresses this by adding language-specific vectors to model activations at inference time, but identifying language-specific directions in the residual stream often relies on multilingual or parallel data that can be expensive to obtain. Sparse autoencoders (SAEs) decompose residual activations into interpretable, sparse feature directions and offer a natural basis for this search, yet existing SAE-based approaches face the same data constraint. We introduce LangFIR (Language Feature Identification via Random-token Filtering), a method that discovers language-specific SAE features using only a small amount of monolingual data and random-token sequences. Many SAE features consistently activated by target-language inputs do not encode language identity. Random-token sequences surface these language-agnostic features, allowing LangFIR to filter them out and isolate a sparse set of language-specific features. We show that these features are extremely sparse, highly selective for their target language, and causally important: directional ablation increases cross-entropy loss only for the corresponding language. Using these features to construct steering vectors for multilingual generation control, LangFIR achieves the best average accuracy and BLEU across three models (Gemma 3 1B, Gemma 3 4B, and Llama 3.1 8B), three datasets, and twelve target languages, outperforming the strongest monolingual baseline and surpassing methods that rely on parallel data. Our results suggest that language identity in multilingual LLMs is localized in a sparse set of feature directions discoverable with monolingual data. Code is available at https://anonymous.4open.science/r/LangFIR-C0F5/.

URLs: https://anonymous.4open.science/r/LangFIR-C0F5/.
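The random-token filtering idea in the abstract reduces to a simple contrast of activation rates. A hedged sketch (the function name, thresholds, and binary-activation encoding are illustrative assumptions, not LangFIR's actual implementation):

```python
import numpy as np

def language_specific_features(lang_acts, rand_acts, thresh=0.9, rand_max=0.1):
    """Keep SAE features that fire consistently on target-language inputs
    (activation rate >= thresh) but rarely on random-token sequences
    (rate <= rand_max), filtering out language-agnostic features.
    lang_acts, rand_acts: (num_sequences, num_features) binary matrices."""
    lang_rate = lang_acts.mean(axis=0)
    rand_rate = rand_acts.mean(axis=0)
    return np.where((lang_rate >= thresh) & (rand_rate <= rand_max))[0]

# toy: feature 0 is language-specific, feature 1 fires on anything
lang = np.array([[1, 1, 0], [1, 1, 0], [1, 1, 1]])
rand = np.array([[0, 1, 0], [0, 1, 1], [0, 1, 0]])
print(language_specific_features(lang, rand))  # -> [0]
```

Feature 1 fires on both corpora and so is filtered as language-agnostic, which is exactly the failure mode the random-token pass is meant to surface.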

cross Rethinking Token Prediction: Tree-Structured Diffusion Language Model

Authors: Zihao Wu, Haoming Yang, Juncheng Dong, Vahid Tarokh

Abstract: Discrete diffusion language models have emerged as a competitive alternative to auto-regressive language models, but training them efficiently under limited parameter and memory budgets remains challenging. Modern architectures are predominantly based on a full-vocabulary token prediction layer, which accounts for a substantial fraction of model parameters (e.g., more than 20% in small-scale DiT-style designs) and often dominates peak GPU memory usage. This leads to inefficient use of both parameters and memory under constrained training resources. To address this issue, we revisit the necessity of explicit full-vocabulary prediction, and instead exploit the inherent structure among tokens to build a tree-structured diffusion language model. Specifically, we model the diffusion process with intermediate latent states corresponding to a token's ancestor nodes in a pre-constructed vocabulary tree. This tree-structured factorization exponentially reduces the classification dimensionality, makes the prediction head negligible in size, and enables reallocation of parameters to deepen the attention blocks. Empirically, under the same parameter budget, our method reduces peak GPU memory usage by half while matching the perplexity performance of state-of-the-art discrete diffusion language models.
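The core factorization can be illustrated in a few lines: a token's probability becomes a product of small per-level branch probabilities along its path in the vocabulary tree, so each head predicts over the branching factor rather than the full vocabulary. This is a generic hierarchical-softmax-style sketch under assumed names and a toy balanced tree, not the paper's architecture:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def tree_token_logprob(level_logits, path):
    """Log-probability of a leaf token as the sum of per-level branch
    log-probabilities along its path in the vocabulary tree.
    level_logits: one 1-D logit array per level (size = branching factor).
    path: the token's branch index at each level."""
    return sum(float(np.log(softmax(l)[i])) for l, i in zip(level_logits, path))

# toy: vocab 4096 = 16^3 -> three heads of 16 outputs instead of one of 4096
rng = np.random.default_rng(0)
logits = [rng.normal(size=16) for _ in range(3)]
lp = tree_token_logprob(logits, path=(3, 7, 12))
print(lp < 0)  # a valid log-probability
```

Because each level's branch distribution normalizes to one, the induced distribution over all leaves is also properly normalized, which is what lets the full-vocabulary head be dropped.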

cross CRAFT: Video Diffusion for Bimanual Robot Data Generation

Authors: Jason Chen, I-Chun Arthur Liu, Gaurav Sukhatme, Daniel Seita

Abstract: Bimanual robot learning from demonstrations is fundamentally limited by the cost and narrow visual diversity of real-world data, which constrains policy robustness across viewpoints, object configurations, and embodiments. We present Canny-guided Robot Data Generation using Video Diffusion Transformers (CRAFT), a video diffusion-based framework for scalable bimanual demonstration generation that synthesizes temporally coherent manipulation videos while producing action labels. By conditioning video diffusion on edge-based structural cues extracted from simulator-generated trajectories, CRAFT produces physically plausible trajectory variations and supports a unified augmentation pipeline spanning object pose changes, camera viewpoints, lighting and background variations, cross-embodiment transfer, and multi-view synthesis. We leverage a pre-trained video diffusion model to convert simulated videos, along with action labels from the simulation trajectories, into action-consistent demonstrations. Starting from only a few real-world demonstrations, CRAFT generates a large, visually diverse set of photorealistic training data, bypassing the need to replay demonstrations on the real robot (Sim2Real). Across simulated and real-world bimanual tasks, CRAFT improves success rates over existing augmentation strategies and straightforward data scaling, demonstrating that diffusion-based video generation can substantially expand demonstration diversity and improve generalization for dual-arm manipulation tasks. Our project website is available at: https://craftaug.github.io/

URLs: https://craftaug.github.io/

cross Stochastic Generative Plug-and-Play Priors

Authors: Chicago Y. Park, Edward P. Chandler, Yuyang Hu, Michael T. McCann, Cristina Garcia-Cardona, Brendt Wohlberg, Ulugbek S. Kamilov

Abstract: Plug-and-play (PnP) methods are widely used for solving imaging inverse problems by incorporating a denoiser into optimization algorithms. Score-based diffusion models (SBDMs) have recently demonstrated strong generative performance through a denoiser trained across a wide range of noise levels. Despite their shared reliance on denoisers, it remains unclear how to systematically use SBDMs as priors within the PnP framework without relying on reverse diffusion sampling. In this paper, we establish a score-based interpretation of PnP that justifies using pretrained SBDMs directly within PnP algorithms. Building on this connection, we introduce a stochastic generative PnP (SGPnP) framework that injects noise to better leverage the expressive generative SBDM priors, thereby improving robustness in severely ill-posed inverse problems. We provide a new theory showing that this noise injection induces optimization on a Gaussian-smoothed objective and promotes escape from strict saddle points. Experiments on challenging inverse tasks, such as multi-coil MRI reconstruction and large-mask natural image inpainting, demonstrate consistent improvement over conventional PnP methods and achieve performance competitive with diffusion-based solvers.
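The PnP-with-noise-injection idea can be sketched with a toy problem: a gradient step on the data-fidelity term, a plug-in denoiser in place of a proximal operator, and injected Gaussian noise with decaying scale. This is a hedged, generic sketch (soft-thresholding standing in for a learned score-based denoiser; names and schedules are assumptions, not the paper's SGPnP algorithm):

```python
import numpy as np

def stochastic_pnp(y, A, denoise, steps=200, step_size=0.5, noise0=0.1, seed=0):
    """Generic stochastic PnP iteration: gradient step on 0.5*||Ax - y||^2,
    denoiser as the prior, plus injected noise whose scale decays over time."""
    rng = np.random.default_rng(seed)
    x = A.T @ y
    for k in range(steps):
        grad = A.T @ (A @ x - y)            # data-fidelity gradient
        x = denoise(x - step_size * grad)   # plug-in denoiser step
        x += (noise0 / (k + 1)) * rng.normal(size=x.shape)  # injected noise
    return x

# toy: recover a sparse signal, with soft-thresholding as the "denoiser"
rng = np.random.default_rng(1)
A = rng.normal(size=(40, 20)) / np.sqrt(40)
x_true = np.zeros(20); x_true[[2, 9]] = 1.0
y = A @ x_true + 0.01 * rng.normal(size=40)
soft = lambda v: np.sign(v) * np.maximum(np.abs(v) - 0.02, 0)
x_hat = stochastic_pnp(y, A, soft)
print(np.linalg.norm(x_hat - x_true))  # small reconstruction error
```

The paper's theoretical point is that the injected noise effectively optimizes a Gaussian-smoothed objective and helps escape strict saddle points; in this convex toy it simply perturbs the iterates without preventing convergence.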

cross LightThinker++: From Reasoning Compression to Memory Management

Authors: Yuqi Zhu, Jintian Zhang, Zhenjie Wan, Yujie Luo, Shuofei Qiao, Zhengke Gui, Da Zheng, Lei Liang, Huajun Chen, Ningyu Zhang

Abstract: Large language models (LLMs) excel at complex reasoning, yet their efficiency is limited by the surging cognitive overhead of long thought traces. In this paper, we propose LightThinker, a method that enables LLMs to dynamically compress intermediate thoughts into compact semantic representations. However, static compression often struggles with complex reasoning where the irreversible loss of intermediate details can lead to logical bottlenecks. To address this, we evolve the framework into LightThinker++, introducing Explicit Adaptive Memory Management. This paradigm shifts to behavioral-level management by incorporating explicit memory primitives, supported by a specialized trajectory synthesis pipeline to train purposeful memory scheduling. Extensive experiments demonstrate the framework's versatility across three dimensions. (1) LightThinker reduces peak token usage by 70% and inference time by 26% with minimal accuracy loss. (2) In standard reasoning, LightThinker++ slashes peak token usage by 69.9% while yielding a +2.42% accuracy gain under the same context budget for maximum performance. (3) Most notably, in long-horizon agentic tasks, it maintains a stable footprint beyond 80 rounds (a 60%-70% reduction), achieving an average performance gain of 14.8% across different complex scenarios. Overall, our work provides a scalable direction for sustaining deep LLM reasoning over extended horizons with minimal overhead.

cross Learning Superpixel Ensemble and Hierarchy Graphs for Melanoma Detection

Authors: Asmaa M. Elwer, Muhammad A. Rushdi, Mahmoud H. Annaby

Abstract: Graph signal processing (GSP) is becoming a major tool in biomedical signal and image analysis. In most GSP techniques, graph structures and edge weights have been typically set via statistical and computational methods. More recently, graph structure learning methods offered more reliable and flexible data representations. In this work, we introduce a graph learning approach for melanoma detection in dermoscopic images based on two graph-theoretic representations: superpixel ensemble graphs (SEG) and superpixel hierarchy graphs (SHG). For these two types of graphs, superpixel maps of a skin lesion image are respectively generated at multiple levels without and with parent-child constraints among superpixels at adjacent levels, where each level corresponds to a subgraph with a different number of nodes (20, 40, 60, 80, or 100 nodes). Two edge weight assignment techniques are explored: handcrafted Gaussian weights and learned weights based on optimization methods. The graph nodal signals are assigned based on texture, geometric, and color superpixel features. In addition, the effect of graph edge thresholding is investigated by applying different thresholds (25%, 50%, and 75%) to prune the weakest edges and analyze the impact of pruning on the melanoma detection performance. Experimental evaluation of the proposed method is performed with different classifiers trained and tested on the publicly available ISIC2017 dataset. Data augmentation is applied to alleviate class imbalance by adding more melanoma images from the ISIC archive. The results show that learned superpixel ensemble graphs with textural nodal signals give the highest performance, reaching an accuracy of 99.00% and an AUC of 99.59%.
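The handcrafted Gaussian weighting and edge-thresholding steps described in the abstract are standard constructions and can be sketched directly (function name, sigma, and the quantile-based pruning rule are illustrative assumptions):

```python
import numpy as np

def gaussian_graph(features, sigma=1.0, prune_quantile=0.25):
    """Gaussian edge weights between superpixel nodes,
    w_ij = exp(-||f_i - f_j||^2 / (2 sigma^2)),
    followed by pruning of the weakest edges (lowest quantile set to zero)."""
    d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    off_diag = W[~np.eye(len(W), dtype=bool)]
    W[W < np.quantile(off_diag, prune_quantile)] = 0.0  # prune weakest edges
    return W

# three toy superpixel feature vectors: two similar, one distant
feats = np.array([[0.0, 0.0], [0.1, 0.0], [2.0, 2.0]])
W = gaussian_graph(feats)
print(W[0, 1] > 0 and W[0, 2] == 0.0)  # similar nodes keep a strong edge
```

Raising `prune_quantile` toward the abstract's 50% and 75% settings sparsifies the graph further, which is the trade-off the paper evaluates.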

cross The Generalised Kernel Covariance Measure

Authors: Luca Bergen, Dino Sejdinovic, Vanessa Didelez

Abstract: We consider the problem of conditional independence (CI) testing and adopt a kernel-based approach. Kernel-based CI tests embed variables in reproducing kernel Hilbert spaces, regress their embeddings on the conditioning variables, and test the resulting residuals for marginal independence. This approach yields tests that are sensitive to a broad range of conditional dependencies. Existing methods, however, rely heavily on kernel ridge regression, which is computationally expensive when properly tuned and yields poorly calibrated tests when left untuned, which limits their practical usefulness. We propose the Generalised Kernel Covariance Measure (GKCM), a regression-model-agnostic kernel-based CI test that accommodates a broad class of regression estimators. Building on the Generalised Hilbertian Covariance Measure framework (Lundborg et al., 2022), we characterise conditions under which GKCM satisfies uniform asymptotic level guarantees. In simulations, GKCM paired with tree-based regression models frequently outperforms state-of-the-art CI tests across a diverse range of data-generating processes, achieving better type I error control and competitive or superior power.

cross Spatiotemporal-Aware Bit-Flip Injection on DNN-based Advanced Driver Assistance Systems

Authors: Taibiao Zhao, Xiang Zhang, Mingxuan Sun, Ruyi Ding, Xugui Zhou

Abstract: Modern advanced driver assistance systems (ADAS) rely on deep neural networks (DNNs) for perception and planning. Since DNNs' parameters reside in DRAM during inference, bit flips caused by cosmic radiation or low-voltage operation may corrupt DNN computations, distort driving decisions, and lead to real-world incidents. This paper presents a SpatioTemporal-Aware Fault Injection (STAFI) framework to locate critical fault sites in DNNs for ADAS efficiently. Spatially, we propose a Progressive Metric-guided Bit Search (PMBS) that efficiently identifies critical network weight bits whose corruption causes the largest deviations in driving behavior (e.g., unintended acceleration or steering). Furthermore, we develop a Critical Fault Time Identification (CFTI) mechanism that determines when to trigger these faults, taking into account the context of real-time systems and environmental states, to maximize the safety impact. Experiments on DNNs for a production ADAS demonstrate that STAFI uncovers 29.56x more hazard-inducing critical faults than the strongest baseline.
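The underlying fault model (a single bit flip in a DNN weight stored in DRAM) is easy to demonstrate; flipping a high exponent bit of an IEEE-754 float32 changes a parameter by many orders of magnitude, which is why bit position matters so much to search strategies like the one described. This is a generic illustration, not the paper's STAFI search:

```python
import struct

def flip_bit(x, bit):
    """Flip one bit of a float32 value via its raw 32-bit representation."""
    (u,) = struct.unpack("<I", struct.pack("<f", x))
    (y,) = struct.unpack("<f", struct.pack("<I", u ^ (1 << bit)))
    return y

# flipping the top exponent bit of 0.5 catapults it to ~1.7e38,
# while flipping a low mantissa bit barely changes the value
print(flip_bit(0.5, 30))
print(flip_bit(0.5, 0))
```

This asymmetry is what makes exhaustive injection wasteful and metric-guided search over bit positions (as in the proposed PMBS) attractive.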

cross RL-Driven Sustainable Land-Use Allocation for the Lake Malawi Basin

Authors: Ying Yao

Abstract: Unsustainable land-use practices in ecologically sensitive regions threaten biodiversity, water resources, and the livelihoods of millions. This paper presents a deep reinforcement learning (RL) framework for optimizing land-use allocation in the Lake Malawi Basin to maximize total ecosystem service value (ESV). Drawing on the benefit transfer methodology of Costanza et al., we assign biome-specific ESV coefficients -- locally anchored to a Malawi wetland valuation -- to nine land-cover classes derived from Sentinel-2 imagery. The RL environment models a 50x50 cell grid at 500m resolution, where a Proximal Policy Optimization (PPO) agent with action masking iteratively transfers land-use pixels between modifiable classes. The reward function combines per-cell ecological value with spatial coherence objectives: contiguity bonuses for ecologically connected land-use patches (e.g., forest, cropland, and built area) and buffer zone penalties for high-impact development adjacent to water bodies. We evaluate the framework across three scenarios: (i) pure ESV maximization, (ii) ESV with spatial reward shaping, and (iii) a regenerative agriculture policy scenario. Results demonstrate that the agent effectively learns to increase total ESV; that spatial reward shaping successfully steers allocations toward ecologically sound patterns, including homogeneous land-use clustering and slight forest consolidation near water bodies; and that the framework responds meaningfully to policy parameter changes, establishing its utility as a scenario-analysis tool for environmental planning.

cross Debiased Machine Learning for Conformal Prediction of Counterfactual Outcomes Under Runtime Confounding

Authors: Keith Barnatchez, Kevin P. Josey, Rachel C. Nethery, Giovanni Parmigiani

Abstract: Data-driven decision making frequently relies on predicting counterfactual outcomes. In practice, researchers commonly train counterfactual prediction models on a source dataset to inform decisions on a possibly separate target population. Conformal prediction has arisen as a popular method for producing assumption-lean prediction intervals for counterfactual outcomes that would arise under different treatment decisions in the target population of interest. However, existing methods require that every confounding factor of the treatment-outcome relationship used for training on the source data is additionally measured in the target population, risking miscoverage if important confounders are unmeasured in the target population. In this paper, we introduce a computationally efficient debiased machine learning framework that allows for valid prediction intervals when only a subset of confounders is measured in the target population, a common challenge referred to as runtime confounding. Grounded in semiparametric efficiency theory, we show the resulting prediction intervals achieve desired coverage rates with faster convergence compared to standard methods. Through numerous synthetic and semi-synthetic experiments, we demonstrate the utility of our proposed method.
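For readers unfamiliar with the conformal machinery the paper builds on, here is the generic split-conformal construction it extends (this is the textbook tool, not the paper's debiased counterfactual version; names and toy data are illustrative):

```python
import numpy as np

def split_conformal_interval(cal_pred, cal_y, test_pred, alpha=0.1):
    """Split conformal prediction: use the (1-alpha) empirical quantile of
    calibration residuals to widen point predictions into intervals with
    approximately (1-alpha) marginal coverage."""
    scores = np.abs(cal_y - cal_pred)
    n = len(scores)
    level = min(1.0, np.ceil((n + 1) * (1.0 - alpha)) / n)
    q = np.quantile(scores, level, method="higher")  # numpy >= 1.22
    return test_pred - q, test_pred + q

rng = np.random.default_rng(0)
cal_pred = rng.normal(size=500)
cal_y = cal_pred + rng.normal(0.0, 0.5, 500)
lo, hi = split_conformal_interval(cal_pred, cal_y, test_pred=0.0)
print(lo < 0.0 < hi)  # the interval brackets the point prediction
```

The paper's contribution is to keep this coverage guarantee valid when the counterfactual residuals must be estimated under runtime confounding, replacing the plug-in scores above with debiased machine-learning analogues.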

cross On the Efficiency of Sinkhorn-Knopp for Entropically Regularized Optimal Transport

Authors: Kun He

Abstract: The Sinkhorn--Knopp (SK) algorithm is a cornerstone method for matrix scaling and entropically regularized optimal transport (EOT). Despite its empirical efficiency, existing theoretical guarantees to achieve a target marginal accuracy $\varepsilon$ deteriorate severely in the presence of outliers, bottlenecked either by the global maximum regularized cost $\eta\|C\|_\infty$ (where $\eta$ is the regularization parameter and $C$ the cost matrix) or the matrix's minimum-to-maximum entry ratio $\nu$. This creates a fundamental disconnect between theory and practice. In this paper, we resolve this discrepancy. For EOT, we introduce the novel concept of well-boundedness, a local bulk mass property that rigorously isolates the well-behaved portion of the data from extreme outliers. We prove that governed by this fundamental notion, SK recovers the target transport plan for a problem of dimension $n$ in $O(\log n - \log \varepsilon)$ iterations, completely independent of the regularized cost $\eta\|C\|_\infty$. Furthermore, we show that a virtually cost-free pre-scaling step eliminates the dimensional dependence entirely, accelerating convergence to a strictly dimension-free $O(\log(1/\varepsilon))$ iterations. Beyond EOT, we establish a sharp phase transition for general $(\boldsymbol{u},\boldsymbol{v})$-scaling governed by a critical matrix density threshold. We prove that when a matrix's density exceeds this threshold, the iteration complexity is strictly independent of $\nu$. Conversely, when the density falls below this threshold, the dependence on $\nu$ becomes unavoidable; in this sub-critical regime, we construct instances where SK requires $\Omega(n/\varepsilon)$ iterations.
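The SK iteration being analyzed is compact enough to state in full: alternately rescale the rows and columns of the Gibbs kernel $K = \exp(-\eta C)$ until the plan's marginals match the targets. A minimal implementation (the analysis in the paper, not this standard iteration, is the contribution):

```python
import numpy as np

def sinkhorn(C, r, c, eta=1.0, iters=200):
    """Sinkhorn-Knopp for entropically regularized OT: alternate row and
    column scalings of K = exp(-eta * C) to match marginals r and c."""
    K = np.exp(-eta * C)
    u = np.ones_like(r)
    for _ in range(iters):
        v = c / (K.T @ u)  # column scaling
        u = r / (K @ v)    # row scaling
    return u[:, None] * K * v[None, :]  # transport plan diag(u) K diag(v)

r = np.array([0.5, 0.5])
c = np.array([0.3, 0.7])
C = np.array([[0.0, 1.0], [1.0, 0.0]])
P = sinkhorn(C, r, c)
print(P.sum(axis=1), P.sum(axis=0))  # marginals close to r and c
```

The paper's question is how many such iterations are needed for a target marginal accuracy; its well-boundedness condition removes the dependence on $\eta\|C\|_\infty$ that naive analyses of this loop incur.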

cross Rényi Attention Entropy for Patch Pruning

Authors: Hiroaki Aizawa, Yuki Igaue

Abstract: Transformers are strong baselines in both vision and language because self-attention captures long-range dependencies across tokens. However, the cost of self-attention grows quadratically with the number of tokens. Patch pruning mitigates this cost by estimating per-patch importance and removing redundant patches. To identify informative patches for pruning, we introduce a criterion based on the Shannon entropy of the attention distribution. Low-entropy patches, which receive selective and concentrated attention, are kept as important, while high-entropy patches with attention spread across many locations are treated as redundant. We also extend the criterion from Shannon to Rényi entropy, which emphasizes sharp attention peaks and supports pruning strategies that adapt to task needs and computational limits. In experiments on fine-grained image recognition, where patch selection is critical, our method reduced computation while preserving accuracy. Moreover, adjusting the pruning policy through the Rényi entropy measure yields further gains and improves the trade-off between accuracy and computation.
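The pruning criterion reduces to scoring each patch by the Rényi entropy of its attention distribution and keeping the lowest-entropy patches. A sketch under assumed conventions (rows of `attn` as per-patch attention distributions; function names and the keep ratio are illustrative):

```python
import numpy as np

def renyi_entropy(p, alpha=2.0, eps=1e-12):
    """Renyi entropy H_alpha(p) = log(sum p^alpha) / (1 - alpha);
    alpha -> 1 recovers Shannon entropy."""
    if abs(alpha - 1.0) < 1e-8:
        return float(-(p * np.log(p + eps)).sum())
    return float(np.log((p ** alpha).sum() + eps) / (1.0 - alpha))

def keep_low_entropy_patches(attn, keep_ratio=0.5, alpha=2.0):
    """Score each patch by the Renyi entropy of its attention distribution
    and keep the lowest-entropy (most selectively attended) patches."""
    scores = np.array([renyi_entropy(row, alpha) for row in attn])
    k = max(1, int(len(scores) * keep_ratio))
    return np.argsort(scores)[:k]  # indices of patches to keep

attn = np.array([[0.97, 0.01, 0.01, 0.01],   # concentrated -> low entropy
                 [0.25, 0.25, 0.25, 0.25]])  # uniform      -> high entropy
print(keep_low_entropy_patches(attn, keep_ratio=0.5))  # -> [0]
```

Larger `alpha` weights the entropy toward the sharpest attention peaks, which is the knob the abstract describes for adapting the pruning policy.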

cross New insights into Elo algorithm for practitioners and statisticians

Authors: Leszek Szczecinski

Abstract: This work reconciles two perspectives on the Elo ranking that coexist in the literature: the practitioner's view as a heuristic feedback rule, and the statistician's view as online maximum likelihood estimation via stochastic gradient ascent. Both perspectives coincide exactly in the binary case (iff the expected score is the logistic function). However, estimation noise forces a principled decoupling between the model used for ranking and the model used for prediction: the effective scale and home-field advantage parameter must be adjusted to account for the noise. We provide both closed-form corrections and a data-driven identification procedure. For multilevel outcomes, an exact relationship exists when outcome scores are uniformly spaced, but approximations are preferred in general: they account for estimation noise and better fit the data. The decoupled approach substantially outperforms the conventional one that reuses the ranking model for prediction, and serves as a diagnostic of convergence status. Applied to six years of FIFA men's ranking, we find that the ranking had not converged for the vast majority of national teams. The paper is written in a semi-tutorial style accessible to practitioners, with all key results accompanied by closed-form expressions and numerical examples.

cross Explainability-Guided Adversarial Attacks on Transformer-Based Malware Detectors Using Control Flow Graphs

Authors: Andrew Wheeler, Kshitiz Aryal, Maanak Gupta

Abstract: Transformer-based malware detection systems operating on graph modalities such as control flow graphs (CFGs) achieve strong performance by modeling structural relationships in program behavior. However, their robustness to adversarial evasion attacks remains underexplored. This paper examines the vulnerability of a RoBERTa-based malware detector that linearizes CFGs into sequences of function calls, a design choice that enables transformer modeling but may introduce token-level sensitivities and ordering artifacts exploitable by adversaries. By evaluating evasion strategies within this graph-to-sequence framework, we provide insight into the practical robustness of transformer-based malware detectors beyond aggregate detection accuracy. This paper proposes a white-box adversarial evasion attack that leverages explainability mechanisms to identify and perturb the most influential graph components. Using token- and word-level attributions derived from integrated gradients, the attack iteratively replaces positively attributed function calls with synthetic external imports, producing adversarial CFG representations without altering overall program structure. Experimental evaluation on small- and large-scale Windows Portable Executable (PE) datasets demonstrates that the proposed method can reliably induce misclassification, even against models trained to high accuracy. Our results highlight that explainability tools, while valuable for interpretability, can also expose critical attack surfaces in transformer-based malware detectors.

cross SecureAFL: Secure Asynchronous Federated Learning

Authors: Anjun Gao, Feng Wang, Zhenglin Wan, Yueyang Quan, Zhuqing Liu, Minghong Fang

Abstract: Federated learning (FL) enables multiple clients to collaboratively train a global machine learning model via a server without sharing their private training data. In traditional FL, the system follows a synchronous approach, where the server waits for model updates from numerous clients before aggregating them to update the global model. However, synchronous FL is hindered by the straggler problem. To address this, the asynchronous FL architecture allows the server to update the global model immediately upon receiving any client's local model update. Despite its advantages, the decentralized nature of asynchronous FL makes it vulnerable to poisoning attacks. Several defenses tailored for asynchronous FL have been proposed, but these mechanisms remain susceptible to advanced attacks or rely on unrealistic server assumptions. In this paper, we introduce SecureAFL, an innovative framework designed to secure asynchronous FL against poisoning attacks. SecureAFL improves the robustness of asynchronous FL by detecting and discarding anomalous updates while estimating the contributions of missing clients. Additionally, it utilizes Byzantine-robust aggregation techniques, such as coordinate-wise median, to integrate the received and estimated updates. Extensive experiments on various real-world datasets demonstrate the effectiveness of SecureAFL.
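The Byzantine-robust aggregation primitive named in the abstract, coordinate-wise median, is simple to state: take the median of each coordinate across client updates, so a minority of poisoned updates cannot drag the aggregate arbitrarily far. A minimal illustration (not SecureAFL's full pipeline, which additionally detects anomalies and estimates missing clients' contributions):

```python
import numpy as np

def coordwise_median(updates):
    """Aggregate client model updates by the per-coordinate median."""
    return np.median(np.stack(updates), axis=0)

honest = [np.array([1.0, -1.0]), np.array([1.1, -0.9]), np.array([0.9, -1.1])]
poisoned = np.array([100.0, 100.0])  # one malicious client
agg = coordwise_median(honest + [poisoned])
print(agg)  # each coordinate stays near the honest cluster
```

With a simple mean, the poisoned client would shift both coordinates by roughly 25; the median keeps the aggregate within the range spanned by the honest updates.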

cross When Models Know More Than They Say: Probing Analogical Reasoning in LLMs

Authors: Hope McGovern, Caroline Craig, Thomas Lippincott, Hale Sirin

Abstract: Analogical reasoning is a core cognitive faculty essential for narrative understanding. While LLMs perform well when surface and structural cues align, they struggle in cases where an analogy is not apparent on the surface but requires latent information, suggesting limitations in abstraction and generalisation. In this paper we compare a model's probed representations with its prompted performance at detecting narrative analogies, revealing an asymmetry: for rhetorical analogies, probing significantly outperforms prompting in open-source models, while for narrative analogies, they achieve a similar (low) performance. This suggests that the relationship between internal representations and prompted behavior is task-dependent and may reflect limitations in how prompting accesses available information.

cross PhaseFlow4D: Physically Constrained 4D Beam Reconstruction via Feedback-Guided Latent Diffusion

Authors: Alexander Scheinker, Alexander Plastun, Peter Ostroumov

Abstract: We address the problem of recovering a time-varying 4D distribution from a sparse sequence of 2D projections - analogous to novel-view synthesis from sparse cameras, but applied to the 4D transverse phase space density $\rho(x,p_x,y,p_y)$ of charged particle beams. Direct single-shot measurement of this high-dimensional distribution is physically impossible in real particle accelerator systems; only limited 1D or 2D projections are accessible. We propose PhaseFlow4D, a feedback-guided latent diffusion model that reconstructs and tracks the full 4D phase space from incomplete 2D observations alone, with built-in hard physics constraints. Our core technical contribution is a 4D VAE whose decoder generates the full 4D phase space tensor, from which 2D projections are analytically computed and compared against 2D beam measurements. This projection-consistency constraint guarantees physical correctness by construction - not as a soft penalty, but as an architectural prior. An adaptive feedback loop then continuously tunes the conditioning vector of the latent diffusion model to track time-varying distributions online without retraining. We validate on multi-particle simulations of heavy-ion beams at the Facility for Rare Isotope Beams (FRIB), where full physics simulations require $\sim$6 hours on a 100-core HPC system. PhaseFlow4D achieves accurate 4D reconstructions 11000$\times$ faster while faithfully tracking distribution shifts under time-varying source conditions - demonstrating that principled generative reconstruction under incomplete observations transfers robustly beyond visual domains.

cross Lotka-Sharpe Neural Operators for Control of Population PDEs

Authors: Miroslav Krstic, Iasson Karafyllis, Luke Bhan, Carina Veil

Abstract: Age-structured predator-prey integro-partial differential equations provide models of interacting populations in ecology, epidemiology, and biotechnology. A key challenge in feedback design for these systems is the scalar $\zeta$, defined implicitly by the Lotka-Sharpe nonlinear integral condition, as a mapping from fertility and mortality rates to $\zeta$. To solve this challenge with operator learning, we first prove that the Lotka-Sharpe operator is Lipschitz continuous, guaranteeing the existence of arbitrarily accurate neural operator approximations over a compact set of fertility and mortality functions. We then show that the resulting approximate feedback law preserves semi-global practical asymptotic stability under propagation of the operator approximation error through various other nonlinear operators, all the way through to the control input. In the numerical results, not only do we learn ``once-and-for-all'' the canonical Lotka-Sharpe (LS) operator, and thus make it available for future uses in control of other age-structured population interconnections, but we demonstrate the online usage of the neural LS operator under estimation of the fertility and mortality functions.
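The scalar $\zeta$ that the paper's neural operator learns is defined implicitly by the Euler-Lotka integral condition. A plain numerical stand-in (bisection on a grid, with illustrative rates; the paper replaces this solve with a learned operator) makes the mapping concrete:

```python
import numpy as np

def lotka_sharpe_zeta(ages, fertility, mortality, lo=-5.0, hi=5.0, tol=1e-10):
    """Solve int exp(-zeta*a) * fertility(a) * survival(a) da = 1 for zeta,
    with survival(a) = exp(-int_0^a mortality), by bisection on a grid."""
    da = ages[1] - ages[0]
    survival = np.exp(-np.cumsum(mortality) * da)

    def residual(z):  # strictly decreasing in z, so bisection applies
        return np.sum(np.exp(-z * ages) * fertility * survival) * da - 1.0

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if residual(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)

# constant rates: condition reduces to 0.5*(1 - exp(-10*(z+0.1)))/(z+0.1) = 1
ages = np.linspace(0.0, 10.0, 1001)
z = lotka_sharpe_zeta(ages, 0.5 * np.ones(1001), 0.1 * np.ones(1001))
print(z)  # roughly 0.4 for these rates
```

The paper's Lipschitz-continuity result for this fertility/mortality-to-$\zeta$ operator is what guarantees that a neural operator can approximate the mapping uniformly over a compact set of rate functions.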

cross Improving ML Attacks on LWE with Data Repetition and Stepwise Regression

Authors: Alberto Alfarano, Eshika Saxena, Emily Wenger, François Charton, Kristin Lauter

Abstract: The Learning with Errors (LWE) problem is a hard math problem in lattice-based cryptography. In the simplest case of binary secrets, it is the subset sum problem, with error. Effective ML attacks on LWE were demonstrated in the case of binary, ternary, and small secrets, succeeding on fairly sparse secrets. The ML attacks recover secrets with up to 3 active bits in the "cruel region" (Nolte et al., 2024) on samples pre-processed with BKZ. We show that using larger training sets and repeated examples enables recovery of denser secrets. Empirically, we observe a power-law relationship between model-based attempts to recover the secrets, dataset size, and repeated examples. We introduce a stepwise regression technique to recover the "cool bits" of the secret.

cross Automating Cloud Security and Forensics Through a Secure-by-Design Generative AI Framework

Authors: Dalal Alharthi, Ivan Roberto Kawaminami Garcia

Abstract: As cloud environments become increasingly complex, cybersecurity and forensic investigations must evolve to meet emerging threats. Large Language Models (LLMs) have shown promise in automating log analysis and reasoning tasks, yet they remain vulnerable to prompt injection attacks and lack forensic rigor. To address these dual challenges, we propose a unified, secure-by-design GenAI framework that integrates PromptShield and the Cloud Investigation Automation Framework (CIAF). PromptShield proactively defends LLMs against adversarial prompts using ontology-driven validation that standardizes user inputs and mitigates manipulation. CIAF streamlines cloud forensic investigations through structured, ontology-based reasoning across all six phases of the forensic process. We evaluate our system on real-world datasets from AWS and Microsoft Azure, demonstrating substantial improvements in both LLM security and forensic accuracy. Experimental results show PromptShield boosts classification performance under attack conditions, achieving precision, recall, and F1 scores above 93%, while CIAF enhances ransomware detection accuracy in cloud logs using Likert-transformed performance features. Our integrated framework advances the automation, interpretability, and trustworthiness of cloud forensics and LLM-based systems, offering a scalable foundation for real-time, AI-driven incident response across diverse cloud infrastructures.

cross Biconvex Biclustering

Authors: Sam Rosen, Eric C. Chi, Jason Xu

Abstract: This article proposes a biconvex modification to convex biclustering in order to improve its performance in high-dimensional settings. In contrast to heuristics that discard a subset of noisy features a priori, our method jointly learns and accordingly weighs informative features while discovering biclusters. Moreover, the method is adaptive to the data, and is accompanied by an efficient algorithm based on proximal alternating minimization, complete with detailed guidance on hyperparameter tuning and efficient solutions to optimization subproblems. These contributions are theoretically grounded; we establish finite-sample bounds on the objective function under sub-Gaussian errors, and generalize these guarantees to cases where input affinities need not be uniform. Extensive simulation results reveal our method consistently recovers underlying biclusters while weighing and selecting features appropriately, outperforming peer methods. An application to a gene microarray dataset of lymphoma samples recovers biclusters matching an underlying classification, while giving additional interpretation to the mRNA samples via the column groupings and fitted weights.

cross Fused Multinomial Logistic Regression Utilizing Summary-Level External Machine-learning Information

Authors: Chi-Shian Dai, Jun Shao

Abstract: In many modern applications, a carefully designed primary study provides individual-level data for interpretable modeling, while summary-level external information is available through black-box, efficient, and nonparametric machine-learning predictions. Although summary-level external information has been studied in the data integration literature, there is limited methodology for leveraging external nonparametric machine-learning predictions to improve statistical inference in the primary study. We propose a general empirical-likelihood framework that incorporates external predictions through moment constraints. An advantage of nonparametric machine-learning prediction is that it induces a rich class of valid moment restrictions that remain robust to covariate shift under a mild overlap condition without requiring explicit density-ratio modeling. We focus on multinomial logistic regression as the primary model and address common data-quality issues in external sources, including coarsened outcomes, partially observed covariates, covariate shift, and heterogeneity in generating mechanisms known as concept shift. We establish large-sample properties of the resulting fused estimator, including consistency and asymptotic normality under regularity conditions. Moreover, we provide mild sufficient conditions under which incorporating external predictions delivers a strict efficiency gain relative to the primary-only estimator. Simulation studies and an application to the National Health and Nutrition Examination Survey on multiclass blood-pressure classification illustrate the proposed framework.

cross Multimodal Structure Learning: Disentangling Shared and Specific Topology via Cross-Modal Graphical Lasso

Authors: Fei Wang, Yutong Zhang, Xiong Wang

Abstract: Learning interpretable multimodal representations inherently relies on uncovering the conditional dependencies between heterogeneous features. However, applying sparse graph estimation techniques, such as Graphical Lasso (GLasso), to visual-linguistic domains is severely bottlenecked by high-dimensional noise, modality misalignment, and the confounding of shared versus category-specific topologies. In this paper, we propose Cross-Modal Graphical Lasso (CM-GLasso), which overcomes these fundamental limitations. By coupling a novel text-visualization strategy with a unified vision-language encoder, we strictly align multimodal features into a shared latent space. We introduce a cross-attention distillation mechanism that condenses high-dimensional patches into explicit semantic nodes, naturally extracting spatial-aware cross-modal priors. Furthermore, we unify tailored GLasso estimation and Common-Specific Structure Learning (CSSL) into a joint objective optimized via the Alternating Direction Method of Multipliers (ADMM). This formulation guarantees the simultaneous disentanglement of invariant and class-specific precision matrices without multi-step error accumulation. Extensive experiments across eight benchmarks covering both natural and medical domains demonstrate that CM-GLasso establishes a new state-of-the-art in generative classification and dense semantic segmentation tasks.
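The abstract's joint objective is optimized via ADMM. As a hedged illustration of that alternating scheme on a much simpler problem (a plain lasso, not the CM-GLasso/CSSL objective), the sketch below alternates a quadratic solve, a soft-thresholding proximal step, and a dual update.

```python
import numpy as np

# ADMM for min_x 0.5||Ax - b||^2 + lam*||x||_1 (a toy stand-in problem).
rng = np.random.default_rng(1)
A = rng.normal(size=(40, 10))
x_true = np.zeros(10)
x_true[:3] = [2.0, -1.5, 1.0]
b = A @ x_true + 0.01 * rng.normal(size=40)

lam, rho = 0.5, 1.0
x = z = u = np.zeros(10)
L = np.linalg.cholesky(A.T @ A + rho * np.eye(10))
for _ in range(200):
    # x-update: ridge-like quadratic solve via the cached Cholesky factor
    x = np.linalg.solve(L.T, np.linalg.solve(L, A.T @ b + rho * (z - u)))
    # z-update: soft-thresholding, the proximal operator of the l1 term
    v = x + u
    z = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)
    # dual (scaled multiplier) update
    u = u + x - z

print(np.round(z, 2))
```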

cross Predict, Don't React: Value-Based Safety Forecasting for LLM Streaming

Authors: Pride Kavumba, Koki Wataoka, Huy H. Nguyen, Jiaxuan Li, Masaya Ohagi

Abstract: In many practical LLM deployments, a single guardrail is used for both prompt and response moderation. Prompt moderation operates on fully observed text, whereas streaming response moderation requires safety decisions to be made over partial generations. Existing text-based streaming guardrails commonly frame this output-side problem as boundary detection, training models to identify the earliest prefix at which a response has already become unsafe. In this work, we introduce StreamGuard, a unified model-agnostic streaming guardrail that instead formulates moderation as a forecasting problem: given a partial prefix, the model predicts the expected harmfulness of likely future continuations. We supervise this prediction using Monte Carlo rollouts, which enables early intervention without requiring exact token-level boundary annotations. Across standard safety benchmarks, StreamGuard performs strongly both for input moderation and for streaming output moderation. At the 8B scale, StreamGuard improves aggregated input-moderation F1 from 86.7 to 88.2 and aggregated streaming output-moderation F1 from 80.4 to 81.9 relative to Qwen3Guard-Stream-8B-strict. On the QWENGUARDTEST response_loc streaming benchmark, StreamGuard reaches 97.5 F1, 95.1 recall, and 92.6% on-time intervention, compared to 95.9 F1, 92.1 recall, and 89.9% for Qwen3Guard-Stream-8B-strict, while reducing the miss rate from 7.9% to 4.9%. We further show that forecasting-based supervision transfers effectively across tokenizers and model families: with transferred targets, Gemma3-StreamGuard-1B reaches 81.3 response-moderation F1, 98.2 streaming F1, and a 3.5% miss rate. These results show that strong end-to-end streaming moderation can be obtained without exact boundary labels, and that forecasting future risk is an effective supervision strategy for low-latency safety intervention.
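The Monte Carlo rollout supervision can be sketched in miniature: given a prefix, sample continuations and average their harm scores to obtain a soft forecasting target. The toy continuation sampler and keyword-based harm scorer below are invented stand-ins for the paper's generator and safety judge.

```python
import random

random.seed(0)
UNSAFE = {"attack", "exploit"}  # toy harm lexicon, purely illustrative

def toy_generator(prefix, length=5):
    # Stand-in for sampling a continuation from an LLM.
    vocab = ["hello", "attack", "weather", "exploit", "music"]
    return prefix + [random.choice(vocab) for _ in range(length)]

def harm(tokens):
    # Stand-in for a safety judge scoring a full response.
    return 1.0 if UNSAFE & set(tokens) else 0.0

def rollout_target(prefix, n_rollouts=200):
    # Expected harmfulness of likely continuations, estimated by rollouts.
    return sum(harm(toy_generator(prefix)) for _ in range(n_rollouts)) / n_rollouts

target = rollout_target(["how", "to"])
print(round(target, 2))
```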

cross Nearly Optimal Best Arm Identification for Semiparametric Bandits

Authors: Seok-Jin Kim

Abstract: We study fixed-confidence Best Arm Identification (BAI) in semiparametric bandits, where rewards are linear in arm features plus an unknown additive baseline shift. Unlike linear-bandit BAI, this setting requires orthogonalized regression, and its instance-optimal sample complexity has remained open. For the transductive setting, we establish an attainable instance-dependent lower bound characterized by the corresponding linear-bandit complexity on shifted features. We then propose a computationally efficient phase-elimination algorithm based on a new $XY$-design for orthogonalized regression. Our analysis yields a nearly optimal high-probability sample-complexity upper bound, up to log factors and an additive $d^2$ term, and experiments on synthetic instances and the Jester dataset show clear gains over prior baselines.
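One simple way to see the orthogonalization idea in the semiparametric model $r_{t,a} = \langle x_a, \theta\rangle + b_t$: centering rewards and features across arms within each round cancels the unknown additive baseline $b_t$, leaving an ordinary linear regression for $\theta$. This is only a toy illustration of the model class, not the paper's $XY$-design or phase-elimination algorithm.

```python
import numpy as np

rng = np.random.default_rng(2)
d, K, T = 3, 8, 500
theta = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(K, d))              # fixed arm features
feats, rews = [], []
for t in range(T):
    b_t = 5.0 * rng.normal()             # unknown baseline shift for round t
    r = X @ theta + b_t + 0.1 * rng.normal(size=K)
    feats.append(X - X.mean(axis=0))     # centering over arms removes b_t ...
    rews.append(r - r.mean())            # ... from both sides of the model
Xc, rc = np.vstack(feats), np.concatenate(rews)
theta_hat, *_ = np.linalg.lstsq(Xc, rc, rcond=None)
print(np.round(theta_hat, 2))
```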

cross OASIC: Occlusion-Agnostic and Severity-Informed Classification

Authors: Kay Gijzen (Leiden University), Gertjan J. Burghouts (TNO), Dani\"el M. Pelt (Leiden University)

Abstract: Severe occlusions of objects pose a major challenge for computer vision. We show that two root causes are (1) the loss of visible information and (2) the distracting patterns caused by the occluders. Our approach addresses both causes at the same time. First, the distracting patterns are removed at test-time, via masking of the occluding patterns. This masking is independent of the type of occlusion, by handling the occlusion through the lens of visual anomalies w.r.t. the object of interest. Second, to deal with less visual details, we follow standard practice by masking random parts of the object during training, for various degrees of occlusions. We discover that (a) it is possible to estimate the degree of the occlusion (i.e. severity) at test-time, and (b) that a model optimized for a specific degree of occlusion also performs best on a similar degree during test-time. Combining these two insights brings us to a severity-informed classification model called OASIC: Occlusion Agnostic Severity Informed Classification. We estimate the severity of occlusion for a test image, mask the occluder, and select the model that is optimized for the degree of occlusion. This strategy performs better than any single model optimized for any smaller or broader range of occlusion severities. Experiments show that combining gray masking with adaptive model selection improves $\text{AUC}_\text{occ}$ by +18.5 over standard training on occluded images and +23.7 over finetuning on unoccluded images.

cross Unmasking Hallucinations: A Causal Graph-Attention Perspective on Factual Reliability in Large Language Models

Authors: Sailesh kiran kurra, Shiek Ruksana, Vishal Borusu

Abstract: This paper focuses on hallucinations produced by large language models (LLMs). LLMs have shown extraordinary language understanding and generation capabilities, yet they suffer from a major disadvantage: hallucinations, i.e., outputs that are factually incorrect, misleading, or unsupported by the input data. These hallucinations cause serious problems in scenarios such as medical diagnosis or legal reasoning. In this work, we propose a causal graph attention network (GCAN) framework that reduces hallucinations by interpreting the internal attention flow within a transformer architecture, constructing token-level graphs that combine self-attention weights and gradient-based influence scores. Our method quantifies each token's factual dependency using a new metric called the Causal Contribution Score (CCS). We further introduce a fact-anchored graph reweighting layer that dynamically reduces the influence of hallucination-prone nodes during generation. Experiments on standard benchmarks such as TruthfulQA and HotpotQA show a 27.8 percent reduction in hallucination rate and a 16.4 percent improvement in factual accuracy over baseline retrieval-augmented generation (RAG) models. This work contributes to the interpretability, robustness, and factual reliability of future LLM architectures.

cross Jellyfish: Zero-Shot Federated Unlearning Scheme with Knowledge Disentanglement

Authors: Houzhe Wang, Xiaojie Zhu, Chi Chen

Abstract: With the increasing importance of data privacy and security, federated unlearning emerges as a new research field dedicated to ensuring that once specific data is deleted, federated learning models no longer retain or disclose related information. In this paper, we propose a zero-shot federated unlearning scheme, named Jellyfish. It distinguishes itself from conventional federated unlearning frameworks in four key aspects: synthetic data generation, knowledge disentanglement, loss function design, and model repair. To preserve the privacy of forgotten data, we design a zero-shot unlearning mechanism that generates error-minimization noise as proxy data for the data to be forgotten. To maintain model utility, we first propose a knowledge disentanglement mechanism that regularises the output of the final convolutional layer by restricting the number of activated channels for the data to be forgotten and encouraging activation sparsity. Next, we construct a comprehensive loss function that incorporates multiple components, including hard loss, confusion loss, distillation loss, model weight drift loss, gradient harmonization, and gradient masking, to effectively align the learning trajectories of the objectives of ``forgetting" and ``retaining". Finally, we propose a zero-shot repair mechanism that leverages proxy data to restore model accuracy within acceptable bounds without accessing users' local data. To evaluate the performance of the proposed zero-shot federated unlearning scheme, we conducted comprehensive experiments across diverse settings. The results validate the effectiveness and robustness of the scheme.

cross Topological Sensitivity in Connectome-Constrained Neural Networks

Authors: Nalin Dhiman

Abstract: Connectome-constrained neural networks are often evaluated against sparse random controls and then interpreted as evidence that biological graph topology improves learning efficiency. We revisit that claim in a controlled flyvis-based study using a Drosophila connectome, a naive self-loop-matched random graph, and a degree-preserving rewired null. Under weak controls, in which both models were recovered from a connectome-trained checkpoint and the null matched only global graph counts, the connectome appeared substantially better in early loss, mean activity, and runtime. That picture changed under stricter controls. Training both graphs from a shared random initialization removed the early loss advantage, and replacing the naive null by a degree-preserving null removed the apparent activity advantage. A five-sample degree-preserving ensemble and a pre-training activity-scale diagnostic further strengthened this revised interpretation. We also report a descriptive mechanism analysis of the earlier weak-control comparison, but we treat it as behavioral characterization rather than proof of causal superiority. We show that previously reported topology advantages in connectome-constrained neural networks can arise from initialization and null-model confounds, and largely disappear under fair from-scratch initialization and degree-preserving controls.
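A degree-preserving null of the kind the abstract advocates can be built with double-edge swaps: each accepted swap replaces edges (a,b),(c,d) with (a,d),(c,b), leaving every node's degree unchanged. A minimal pure-Python sketch on a toy graph (not the flyvis connectome):

```python
import random

random.seed(3)
edges = {tuple(sorted(e)) for e in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (1, 3)]}

def degrees(es):
    deg = {}
    for a, b in es:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg

def double_edge_swap(es, n_swaps=200):
    es = set(es)
    for _ in range(n_swaps):
        (a, b), (c, d) = random.sample(sorted(es), 2)
        if len({a, b, c, d}) < 4:
            continue                      # swap would create a self-loop
        e1, e2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
        if e1 in es or e2 in es:
            continue                      # swap would create a multi-edge
        es -= {(a, b), (c, d)}
        es |= {e1, e2}
    return es

rewired = double_edge_swap(edges)
print(degrees(rewired) == degrees(edges))  # → True
```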

cross TORA: Topological Representation Alignment for 3D Shape Assembly

Authors: Nahyuk Lee, Zhiang Chen, Marc Pollefeys, Sunghwan Hong

Abstract: Flow-matching methods for 3D shape assembly learn point-wise velocity fields that transport parts toward assembled configurations, yet they receive no explicit guidance about which cross-part interactions should drive the motion. We introduce TORA, a topology-first representation alignment framework that distills relational structure from a frozen pretrained 3D encoder into the flow-matching backbone during training. We first realize this via simple instantiation, token-wise cosine matching, which injects the learned geometric descriptors from the teacher representation. We then extend to employ a Centered Kernel Alignment (CKA) loss to match the similarity structure between student and teacher representations for enhanced topological alignment. Through systematic probing of diverse 3D encoders, we show that geometry- and contact-centric teacher properties, not semantic classification ability, govern alignment effectiveness, and that alignment is most beneficial at later transformer layers where spatial structure naturally emerges. TORA introduces zero inference overhead while yielding two consistent benefits: faster convergence (up to 6.9$\times$) and improved accuracy in-distribution, along with greater robustness under domain shift. Experiments on five benchmarks spanning geometric, semantic, and inter-object assembly demonstrate state-of-the-art performance, with particularly pronounced gains in zero-shot transfer to unseen real-world and synthetic datasets. Project page: https://nahyuklee.github.io/tora.
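Linear CKA, the similarity-structure loss named in the abstract, has a compact closed form for representation matrices with samples in rows: after column-centering, $\mathrm{CKA}(X,Y) = \|Y_c^\top X_c\|_F^2 / (\|X_c^\top X_c\|_F\,\|Y_c^\top Y_c\|_F)$. A minimal sketch on random features (not the paper's teacher/student encoders):

```python
import numpy as np

def linear_cka(X, Y):
    # X, Y: (n_samples, n_features); features may differ between X and Y.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    num = np.linalg.norm(Yc.T @ Xc, "fro") ** 2
    den = np.linalg.norm(Xc.T @ Xc, "fro") * np.linalg.norm(Yc.T @ Yc, "fro")
    return num / den

rng = np.random.default_rng(4)
X = rng.normal(size=(50, 16))
print(round(linear_cka(X, X), 6))            # identical representations -> 1.0
print(round(linear_cka(X, 2 * X + 1.0), 6))  # invariant to isotropic scale/shift
```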

URLs: https://nahyuklee.github.io/tora.

cross Extended Hybrid Timed Petri Nets with Semi-Supervised Anomaly Detection for Switched Systems, Modelling and Fault Detection

Authors: Fatiha Hamdi, Abdelhafid Zeroual, Fouzi Harrou

Abstract: Hybrid physical systems combine continuous and discrete dynamics, which can be simultaneously affected by faults. Conventional fault detection methods often treat these dynamics separately, limiting their ability to capture interacting fault patterns. This paper proposes a unified fault detection framework for hybrid dynamical systems by integrating an Extended Timed Continuous Petri Net (ETCPN) model with semi-supervised anomaly detection. The proposed ETCPN extends existing Petri net formalisms by introducing marking-dependent flow functions, enabling intrinsic coupling between discrete and continuous dynamics. Based on this structure, a mode-dependent hybrid observer is designed, whose stability under arbitrary switching is ensured via Linear Matrix Inequalities (LMIs), solved offline to determine observer gains. The observer generates residuals that reflect discrepancies between the estimated and measured outputs. These residuals are processed using semi-supervised methods, including One-Class SVM (OC-SVM), Support Vector Data Description (SVDD), and Elliptic Envelope (EE), trained exclusively on normal data to avoid reliance on labeled faults. The framework is validated through simulations involving discrete faults, continuous faults, and hybrid faults. Results demonstrate high detection accuracy, fast convergence, and robust performance, with OC-SVM and SVDD providing the best trade-off between detection rate and false alarms. The framework is computationally efficient for real-time deployment, as the main complexity is confined to the offline LMI design phase.
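The residual-evaluation stage can be sketched with a simple stand-in detector: the paper trains OC-SVM, SVDD, and EE on normal residuals; below, a median + MAD threshold fitted only on normal data plays that one-class role on synthetic observer residuals.

```python
import numpy as np

rng = np.random.default_rng(5)
normal_residuals = 0.1 * rng.normal(size=500)   # residuals under healthy operation
# Test stream: mostly healthy, with two injected fault residuals at the end.
test_residuals = np.concatenate([0.1 * rng.normal(size=50), [1.5, -2.0]])

# Fit a robust one-class threshold on normal data only (stand-in for OC-SVM).
med = np.median(normal_residuals)
mad = np.median(np.abs(normal_residuals - med))
sigma_hat = 1.4826 * mad                        # MAD-based sigma estimate
flags = np.abs(test_residuals - med) > 6.0 * sigma_hat

print(int(flags.sum()))
```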

cross FactReview: Evidence-Grounded Reviews with Literature Positioning and Execution-Based Claim Verification

Authors: Hang Xu, Ling Yue, Chaoqian Ouyang, Yuchen Liu, Libin Zheng, Shaowu Pan, Shimin Di, Min-Ling Zhang

Abstract: Peer review in machine learning is under growing pressure from rising submission volume and limited reviewer time. Most LLM-based reviewing systems read only the manuscript and generate comments from the paper's own narrative. This makes their outputs sensitive to presentation quality and leaves them weak when the evidence needed for review lies in related work or released code. We present FactReview, an evidence-grounded reviewing system that combines claim extraction, literature positioning, and execution-based claim verification. Given a submission, FactReview identifies major claims and reported results, retrieves nearby work to clarify the paper's technical position, and, when code is available, executes the released repository under bounded budgets to test central empirical claims. It then produces a concise review and an evidence report that assigns each major claim one of five labels: Supported, Supported by the paper, Partially supported, In conflict, or Inconclusive. In a case study on CompGCN, FactReview reproduces results that closely match those reported for link prediction and node classification, yet also shows that the paper's broader performance claim across tasks is not fully sustained: on MUTAG graph classification, the reproduced result is 88.4%, whereas the strongest baseline reported in the paper remains 92.6%. The claim is therefore only partially supported. More broadly, this case suggests that AI is most useful in peer review not as a final decision-maker, but as a tool for gathering evidence and helping reviewers produce more evidence-grounded assessments. The code is public at https://github.com/DEFENSE-SEU/Review-Assistant.

URLs: https://github.com/DEFENSE-SEU/Review-Assistant.

cross Intelligent Traffic Monitoring with YOLOv11: A Case Study in Real-Time Vehicle Detection

Authors: Shkelqim Sherifi

Abstract: Recent advancements in computer vision, driven by artificial intelligence, have significantly enhanced monitoring systems. One notable application is traffic monitoring, which leverages computer vision alongside deep learning-based object detection and counting. We present an offline, real-time traffic monitoring system that couples a pre-trained YOLOv11 detector with BoT-SORT/ByteTrack for multi-object tracking, implemented in PyTorch/OpenCV and wrapped in a Qt-based desktop UI. The CNN pipeline enables efficient vehicle detection and counting from video streams without cloud dependencies. Across diverse scenes, the system achieves 66.67-95.83% counting accuracy. Class-wise detection yields high precision (cars: 0.97-1.00; trucks: 1.00) with strong recall (cars: 0.82-1.00; trucks: 0.70-1.00), resulting in F1 scores of 0.90-1.00 for cars and 0.82-1.00 for trucks. While adverse weather conditions may negatively impact this performance, results remain robust in typical conditions. By integrating lightweight models with an accessible, cloud-independent interface, this paper contributes to the modernization and development of future smart cities by showing the capacity of AI-driven traffic monitoring systems.
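A counting stage of this kind typically reduces tracked detections to centroid trajectories and counts a vehicle once when its centroid crosses a virtual line. The sketch below uses made-up track data and counting logic; it is an illustration, not the paper's implementation.

```python
# Each track id maps to a centroid trajectory produced by a detector+tracker.
COUNT_LINE_Y = 100  # virtual counting line (pixel row), downward crossings counted

def count_crossings(tracks):
    counted, total = set(), 0
    for track_id, trajectory in tracks.items():
        for (x0, y0), (x1, y1) in zip(trajectory, trajectory[1:]):
            if y0 < COUNT_LINE_Y <= y1 and track_id not in counted:
                counted.add(track_id)   # count each tracked vehicle at most once
                total += 1
    return total

tracks = {
    1: [(50, 80), (52, 95), (53, 110)],   # crosses the line -> counted
    2: [(200, 120), (201, 130)],          # stays below the line -> not counted
    3: [(10, 90), (12, 99), (14, 101)],   # crosses the line -> counted
}
print(count_crossings(tracks))  # → 2
```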

cross Embedding Enhancement via Fine-Tuned Language Models for Learner-Item Cognitive Modeling

Authors: Yuanhao Liu, Zihan Zhou, Kaiying Wu, Shuo Liu, Yiyang Huang, Jiajun Guo, Aimin Zhou, Hong Qian

Abstract: Learner-item cognitive modeling plays a central role in the web-based online intelligent education system by enabling cognitive diagnosis (CD) across diverse online educational scenarios. Although ID embedding remains the mainstream approach in cognitive modeling due to its effectiveness and flexibility, recent advances in language models (LMs) have introduced new possibilities for incorporating rich semantic representations to enhance CD performance. This highlights the need for a comprehensive analysis of how LMs enhance embeddings through semantic integration across mainstream CD tasks. This paper identifies two key challenges in fully leveraging LMs in existing work: (i) misalignment between the training objectives of LMs and CD models creates a distribution gap in feature spaces; and (ii) a unified framework is essential for integrating textual embeddings across varied CD tasks while preserving the strengths of existing cognitive modeling paradigms to ensure the robustness of embedding enhancement. To address these challenges, this paper introduces EduEmbed, a unified embedding enhancement framework that leverages fine-tuned LMs to enrich learner-item cognitive modeling across diverse CD tasks. EduEmbed operates in two stages. In the first stage, we fine-tune LMs based on role-specific representations and an interaction diagnoser to bridge the semantic gap of CD models. In the second stage, we employ a textual adapter to extract task-relevant semantics and integrate them with existing modeling paradigms to improve generalization. We evaluate the proposed framework on four CD tasks and a computerized adaptive testing (CAT) task, achieving robust performance. Further analysis reveals the impact of semantic information across diverse tasks, offering key insights for future research on the application of LMs in CD for online intelligent education systems.

cross Efficient Onboard Spacecraft Pose Estimation with Event Cameras and Neuromorphic Hardware

Authors: Arunkumar Rathinam, Jules Lecomte, Jost Reelsen, Gregor Lenz, Axel von Arnim, Djamila Aouada

Abstract: Reliable relative pose estimation is a key enabler for autonomous rendezvous and proximity operations, yet space imagery is notoriously challenging due to extreme illumination, high contrast, and fast target motion. Event cameras provide asynchronous, change-driven measurements that can remain informative when frame-based imagery saturates or blurs, while neuromorphic processors can exploit sparse activations for low-latency, energy-efficient inferences. This paper presents a spacecraft 6-DoF pose-estimation pipeline that couples event-based vision with the BrainChip Akida neuromorphic processor. Using the SPADES dataset, we train compact MobileNet-style keypoint regression networks on lightweight event-frame representations, apply quantization-aware training (8/4-bit), and convert the models to Akida-compatible spiking neural networks. We benchmark three event representations and demonstrate real-time, low-power inference on Akida V1 hardware. We additionally design a heatmap-based model targeting Akida V2 and evaluate it on Akida Cloud, yielding improved pose accuracy. To our knowledge, this is the first end-to-end demonstration of spacecraft pose estimation running on Akida hardware, highlighting a practical route to low-latency, low-power perception for future autonomous space missions.

cross Measuring Robustness of Speech Recognition from MEG Signals Under Distribution Shift

Authors: Sheng-You Chien, Bo-Yi Mao, Yi-Ning Chang, Po-Chih Kuo

Abstract: This study investigates robust speech-related decoding from non-invasive MEG signals using the LibriBrain phoneme-classification benchmark from the 2025 PNPL competition. We compare residual convolutional neural networks (CNNs), an STFT-based CNN, and a CNN--Transformer hybrid, while also examining the effects of group averaging, label balancing, repeated grouping, normalization strategies, and data augmentation. Across our in-house implementations, preprocessing and data-configuration choices matter more than additional architectural complexity, among which instance normalization emerges as the most influential modification for generalization. The strongest of our own models, a CNN with group averaging, label balancing, repeated grouping, and instance normalization, achieves 60.95% F1-macro on the test split, compared with 39.53% for the plain CNN baseline. However, most of our models, without instance normalization, show substantial validation-to-test degradation, indicating that distribution shift induced by different normalization statistics is a major obstacle to generalization in our experiments. By contrast, MEGConformer maintains 64.09% F1-macro on both validation and test, and saliency-map analysis is qualitatively consistent with this contrast: weaker models exhibit more concentrated or repetitive phoneme-sensitive patterns across splits, whereas MEGConformer appears more distributed. Overall, the results suggest that improving the reliability of non-invasive phoneme decoding will likely require better handling of normalization-related distribution shift while also addressing the challenge of single-trial decoding.
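Instance normalization, which the abstract identifies as the most influential modification, normalizes each trial per channel with its own statistics, so no split-dependent normalization constants are shared between train, validation, and test data. A minimal sketch on synthetic signals:

```python
import numpy as np

def instance_norm(x, eps=1e-6):
    # x: (batch, channels, time); each trial/channel uses its own statistics.
    mu = x.mean(axis=-1, keepdims=True)
    sd = x.std(axis=-1, keepdims=True)
    return (x - mu) / (sd + eps)

rng = np.random.default_rng(6)
# Arbitrary per-recording offset and scale, mimicking split-specific statistics.
x = 3.0 * rng.normal(size=(4, 8, 128)) + 10.0
z = instance_norm(x)
print(bool(np.allclose(z.mean(axis=-1), 0.0, atol=1e-6)))  # → True
```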

cross Primal-Dual Methods for Nonsmooth Nonconvex Optimization with Orthogonality Constraints

Authors: Linglingzhi Zhu, Wentao Ding, Shangyuan Liu, Anthony Man-Cho So

Abstract: Recent advancements in data science have significantly elevated the importance of orthogonally constrained optimization problems. The Riemannian approach has become a popular technique for addressing these problems due to the advantageous computational and analytical properties of the Stiefel manifold. Nonetheless, the interplay of nonsmoothness alongside orthogonality constraints introduces substantial challenges to current Riemannian methods, including scalability, parallelizability, complicated subproblems, and cumulative numerical errors that threaten feasibility. In this paper, we take a retraction-free primal-dual approach and propose a linearized smoothing augmented Lagrangian method specifically designed for nonsmooth and nonconvex optimization with orthogonality constraints. Our proposed method is single-loop and free of subproblem solving. We establish its iteration complexity of $O(\epsilon^{-3})$ for finding $\epsilon$-KKT points, matching the best-known results in the Riemannian optimization literature. Additionally, by invoking the standard Kurdyka-Lojasiewicz (KL) property, we demonstrate asymptotic sequential convergence of the proposed algorithm. Numerical experiments on both smooth and nonsmooth orthogonal constrained problems demonstrate the superior computational efficiency and scalability of the proposed method compared with state-of-the-art algorithms.
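The feasibility issue the abstract raises (cumulative numerical error pushing iterates off the Stiefel manifold) is classically repaired by projecting onto the nearest matrix with orthonormal columns via the SVD, $X = U\Sigma V^\top \mapsto UV^\top$. The proposed method is retraction-free and avoids this step, but the projection is a useful reference point:

```python
import numpy as np

def project_stiefel(X):
    # Nearest matrix with orthonormal columns (polar factor via thin SVD).
    U, _, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ Vt

rng = np.random.default_rng(7)
X = rng.normal(size=(10, 3)) + 0.1        # generic infeasible point
Q = project_stiefel(X)
print(bool(np.allclose(Q.T @ Q, np.eye(3), atol=1e-10)))  # → True
```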

cross Uncertainty-Aware Test-Time Adaptation for Cross-Region Spatio-Temporal Fusion of Land Surface Temperature

Authors: Sofiane Bouaziz, Adel Hafiane, Raphael Canals, Rachid Nedjai

Abstract: Deep learning models have shown great promise in diverse remote sensing applications. However, they often struggle to generalize across geographic regions unseen during training due to domain shifts. Domain shifts occur when data distributions differ between the training region and new target regions, due to variations in land cover, climate, and environmental conditions. Test-time adaptation (TTA) has emerged as a solution to such shifts, but existing methods are primarily designed for classification and are not directly applicable to regression tasks. In this work, we address the regression task of spatio-temporal fusion (STF) for land surface temperature estimation. We propose an uncertainty-aware TTA framework that updates only the fusion module of a pre-trained STF model, guided by epistemic uncertainty, land use and land cover consistency, and bias correction, without requiring source data or labeled target samples. Experiments on four target regions with diverse climates, namely Rome in Italy, Cairo in Egypt, Madrid in Spain, and Montpellier in France, show consistent improvements in RMSE and MAE for a pre-trained model in Orl\'eans, France. The average gains are 24.2% and 27.9%, respectively, even with limited unlabeled target data and only 10 TTA epochs.

cross Non-Equilibrium Stochastic Dynamics as a Unified Framework for Insight and Repetitive Learning: A Kramers Escape Approach to Continual Learning

Authors: Gunn Kim

Abstract: Continual learning in artificial neural networks is fundamentally limited by the stability--plasticity dilemma: systems that retain prior knowledge tend to resist acquiring new knowledge, and vice versa. Existing approaches, most notably elastic weight consolidation~(EWC), address this empirically without a physical account of why plasticity eventually collapses as tasks accumulate. Separately, the distinction between sudden insight and gradual skill acquisition through repetitive practice has lacked a unified theoretical description. Here, we show that both problems admit a common resolution within non-equilibrium statistical physics. We model the state of a learning system as a particle evolving under Langevin dynamics on a double-well energy landscape, with the noise amplitude governed by a time-dependent effective temperature $T(t)$. The probability density obeys a Fokker--Planck equation, and transitions between metastable states are governed by the Kramers escape rate $k = (\omega_0\omega_b/2\pi)\,e^{-\Delta E/T}$. We make two contributions. First, we identify the EWC penalty term as an energy barrier whose height grows linearly with the number of accumulated tasks, yielding an exponential collapse of the transition rate predicted analytically and confirmed numerically. Second, we show that insight and repetitive learning correspond to two qualitatively distinct temperature protocols within the same Fokker--Planck equation: insight events produce transient spikes in $T(t)$ that drive rapid barrier crossing, whereas repetitive practice operates at a modestly elevated but fixed temperature, achieving transitions through sustained stochastic diffusion. These results establish a physically grounded framework for understanding plasticity and its failure in continual learning systems, and suggest principled design criteria for adaptive noise schedules in artificial intelligence.
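The quantitative core of the first contribution follows directly from the stated Kramers formula: if the barrier grows linearly with the number of tasks, $\Delta E(n) = e_0 + c\,n$, the escape rate $k = (\omega_0\omega_b/2\pi)\,e^{-\Delta E/T}$ collapses exponentially in $n$, i.e. the log-rate falls linearly. The constants below are illustrative, not taken from the paper.

```python
import numpy as np

w0, wb, T = 1.0, 1.0, 0.5      # attempt-frequency factors and effective temperature
e0, c = 1.0, 0.4               # barrier grows linearly with the task count n

n_tasks = np.arange(0, 10)
rates = (w0 * wb / (2 * np.pi)) * np.exp(-(e0 + c * n_tasks) / T)

# The log-rate decreases by exactly c/T per accumulated task.
slopes = np.diff(np.log(rates))
print(bool(np.allclose(slopes, -c / T)))  # → True
```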

cross Graphic-Design-Bench: A Comprehensive Benchmark for Evaluating AI on Graphic Design Tasks

Authors: Adrienne Deganutti, Elad Hirsch, Haonan Zhu, Jaejung Seol, Purvanshi Mehta

Abstract: We introduce GraphicDesignBench (GDB), the first comprehensive benchmark suite designed specifically to evaluate AI models on the full breadth of professional graphic design tasks. Unlike existing benchmarks that focus on natural-image understanding or generic text-to-image synthesis, GDB targets the unique challenges of professional design work: translating communicative intent into structured layouts, rendering typographically faithful text, manipulating layered compositions, producing valid vector graphics, and reasoning about animation. The suite comprises 50 tasks organized along five axes: layout, typography, infographics, template & design semantics and animation, each evaluated under both understanding and generation settings, and grounded in real-world design templates drawn from the LICA layered-composition dataset. We evaluate a set of frontier closed-source models using a standardized metric taxonomy covering spatial accuracy, perceptual quality, text fidelity, semantic alignment, and structural validity. Our results reveal that current models fall short on the core challenges of professional design: spatial reasoning over complex layouts, faithful vector code generation, fine-grained typographic perception, and temporal decomposition of animations remain largely unsolved. While high-level semantic understanding is within reach, the gap widens sharply as tasks demand precision, structure, and compositional awareness. GDB provides a rigorous, reproducible testbed for tracking progress toward AI systems that can function as capable design collaborators. The full evaluation framework is publicly available.

cross PATHFINDER: Multi-objective discovery in structural and spectral spaces

Authors: Kamyar Barakati, Boris N. Slautin, Utkarsh Pratiush, Hiroshi Funakubo, Sergei V. Kalinin

Abstract: Automated decision-making is becoming key for automated characterization, including electron and scanning probe microscopies and nanoindentation. Most machine-learning-driven workflows optimize a single predefined objective and tend to converge prematurely on familiar responses, overlooking rare but scientifically important states. More broadly, the challenge is not only where to measure next, but how to coordinate exploration across structural, spectral, and measurement spaces under finite experimental budgets while balancing target-driven optimization with novelty discovery. Here we introduce PATHFINDER, a framework for autonomous microscopy that combines novelty-driven exploration with optimization, helping the system discover more diverse and useful representations across structural, spectral, and measurement spaces. By combining latent-space representations of local structure, surrogate modeling of functional response, and Pareto-based acquisition, the framework selects measurements that balance novelty discovery in feature and object space and are informative and experimentally actionable. Benchmarked on pre-acquired STEM-EELS data and realized experimentally in scanning probe microscopy of ferroelectric materials, this approach expands the accessible structure-property landscape and avoids collapse onto a single apparent optimum. These results point to a new mode of autonomous microscopy that is not only optimization-driven, but also discovery-oriented, broad in its search, and responsive to human guidance.
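At its core, Pareto-based acquisition reduces to selecting the non-dominated candidates under several objectives (here, say, predicted response and novelty, both to be maximized). A minimal sketch on synthetic scores, not the framework's actual acquisition function:

```python
import numpy as np

def pareto_front(scores):
    # scores: (n_candidates, n_objectives), larger is better in every column.
    keep = []
    for i, s in enumerate(scores):
        dominated = any(
            np.all(t >= s) and np.any(t > s)
            for j, t in enumerate(scores) if j != i
        )
        if not dominated:
            keep.append(i)
    return keep

# Candidates 0-2 trade off the two objectives; 3 and 4 are dominated.
scores = np.array([[0.9, 0.1], [0.5, 0.5], [0.1, 0.9], [0.4, 0.4], [0.2, 0.2]])
print(pareto_front(scores))  # → [0, 1, 2]
```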

cross Which English Do LLMs Prefer? Triangulating Structural Bias Towards American English in Foundation Models

Authors: Mir Tafseer Nayeem, Davood Rafiei

Abstract: Large language models (LLMs) are increasingly deployed in high-stakes domains, yet they expose only limited language settings, most notably "English (US)," despite the global diversity and colonial history of English. Through a postcolonial framing to explain the broader significance, we investigate how geopolitical histories of data curation, digital dominance, and linguistic standardization shape the LLM development pipeline. Focusing on two dominant standard varieties, American English (AmE) and British English (BrE), we construct a curated corpus of 1,813 AmE--BrE variants and introduce DiAlign, a dynamic, training-free method for estimating dialectal alignment using distributional evidence. We operationalize structural bias by triangulating evidence across three stages: (i) audits of six major pretraining corpora reveal systematic skew toward AmE, (ii) tokenizer analyses show that BrE forms incur higher segmentation costs, and (iii) generative evaluations show a persistent AmE preference in model outputs. To our knowledge, this is the first systematic and multi-faceted examination of dialectal asymmetries in standard English varieties across the phases of LLM development. We find that contemporary LLMs privilege AmE as the de facto norm, raising concerns about linguistic homogenization, epistemic injustice, and inequity in global AI deployment, while motivating practical steps toward more dialectally inclusive language technologies.

cross Relay-Assisted Activation-Integrated SIM for Wireless Physical Neural Networks

Authors: Meng Hua, Deniz Gündüz

Abstract: Wireless physical neural networks (WPNNs) have emerged as a promising paradigm for performing neural computation directly in the physical layer of wireless systems, offering low latency and high energy efficiency. However, most existing WPNN implementations primarily rely on linear physical transformations, which fundamentally limits their expressiveness. In this work, we propose a relay-assisted WPNN architecture based on activation-integrated stacked intelligent metasurfaces (AI-SIMs), where each passive metasurface layer enabling linear wave manipulation is cascaded with an activation metasurface layer that realizes nonlinear processing in the analog domain. By deliberately structuring multi-hop wireless propagation, the relay amplification matrix and the metasurface phase-shift matrices jointly act as trainable network weights, while hardware-implemented activation functions provide essential nonlinearity. Simulation results demonstrate that the proposed architecture achieves high classification accuracy, and that incorporating hardware-based activation functions significantly improves representational capability and performance compared with purely linear physical implementations.

cross Sharp asymptotic theory for Q-learning with LD2Z learning rate and its generalization

Authors: Soham Bonnerjee, Zhipeng Lou, Wei Biao Wu

Abstract: Despite the sustained popularity of Q-learning as a practical tool for policy determination, a majority of the relevant theoretical literature deals with either constant ($\eta_{t}\equiv \eta$) or polynomially decaying ($\eta_{t} = \eta t^{-\alpha}$) learning schedules. However, it is well known that these choices suffer from either persistent bias or prohibitively slow convergence. In contrast, the recently proposed linear decay to zero (\texttt{LD2Z}: $\eta_{t,n}=\eta(1-t/n)$) schedule has shown appreciable empirical performance, but its theoretical and statistical properties remain largely unexplored, especially in the Q-learning setting. We address this gap in the literature by first considering a general class of power-law decay to zero schedules (\texttt{PD2Z}-$\nu$: $\eta_{t,n}=\eta(1-t/n)^{\nu}$). Proceeding step by step, we present a sharp non-asymptotic error bound for Q-learning with the \texttt{PD2Z}-$\nu$ schedule, which is then used to derive a central limit theorem for a new \textit{tail} Polyak-Ruppert averaging estimator. Finally, we also provide a novel time-uniform Gaussian approximation (also known as a \textit{strong invariance principle}) for the partial sum process of Q-learning iterates, which facilitates bootstrap-based inference. All our theoretical results are complemented by extensive numerical experiments. Beyond being new theoretical and statistical contributions to the Q-learning literature, our results definitively establish that \texttt{LD2Z}, and \texttt{PD2Z}-$\nu$ in general, achieve a best-of-both-worlds property: they inherit the rapid decay from initialization (characteristic of constant step sizes) while retaining the asymptotic convergence guarantees (characteristic of polynomially decaying schedules). This dual advantage explains the empirical success of \texttt{LD2Z}, while our results provide practical guidelines for inference.
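As a quick illustration of the schedules compared above, the following Python sketch contrasts constant, polynomially decaying, and PD2Z step sizes on a toy one-state Q-learning problem (a two-armed bandit). The bandit setup, reward means, and hyperparameters are illustrative choices, not the paper's experimental configuration.

```python
# Step-size schedules from the abstract (PD2Z with nu = 1 is LD2Z),
# exercised on a toy one-state Q-learning problem (a two-armed bandit).
import random

def constant(eta, t, n):
    return eta                                # eta_t = eta

def poly_decay(eta, t, n, alpha=0.5):
    return eta * (t + 1) ** (-alpha)          # eta_t = eta * t^(-alpha)

def pd2z(eta, t, n, nu=1.0):
    return eta * (1.0 - t / n) ** nu          # eta_{t,n} = eta * (1 - t/n)^nu

def q_learning_bandit(schedule, n=10_000, eta=0.5, seed=0, **kw):
    """Q-learning on a one-state MDP: Q(a) tracks the mean reward of arm a."""
    rng = random.Random(seed)
    q = [0.0, 0.0]
    means = [1.0, 0.5]                        # true arm means
    for t in range(n):
        a = rng.randrange(2)                  # uniform exploration
        r = means[a] + rng.gauss(0.0, 0.1)    # noisy reward
        q[a] += schedule(eta, t, n, **kw) * (r - q[a])
    return q
```

With the LD2Z schedule the early steps are large (fast decay of the initialization bias) while the final steps shrink linearly to zero (stable late iterates), the best-of-both-worlds behaviour the paper formalises.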

cross Pedagogical Safety in Educational Reinforcement Learning: Formalizing and Detecting Reward Hacking in AI Tutoring Systems

Authors: Oluseyi Olukola, Nick Rahimi

Abstract: Reinforcement learning (RL) is increasingly used to personalize instruction in intelligent tutoring systems, yet the field lacks a formal framework for defining and evaluating pedagogical safety. We introduce a four-layer model of pedagogical safety for educational RL, comprising structural, progress, behavioral, and alignment safety, and propose the Reward Hacking Severity Index (RHSI) to quantify misalignment between proxy rewards and genuine learning. We evaluate the framework in a controlled simulation of an AI tutoring environment with 120 sessions across four conditions and three learner profiles, totaling 18,000 interactions. Results show that an engagement-optimized agent systematically over-selected a high-engagement action with no direct mastery gain, producing strong measured performance but limited learning progress. A multi-objective reward formulation reduced this problem but did not eliminate it, as the agent continued to favor proxy-rewarding behavior in many states. In contrast, a constrained architecture combining prerequisite enforcement and minimum cognitive demand substantially reduced reward hacking, lowering RHSI from 0.317 in the unconstrained multi-objective condition to 0.102. Ablation results further suggest that behavioral safety was the most influential safeguard against repetitive low-value action selection. These findings suggest that reward design alone may be insufficient to ensure pedagogically aligned behavior in educational RL, at least in the simulated environment studied here. More broadly, the paper positions pedagogical safety as an important research problem at the intersection of AI safety and intelligent educational systems.

cross Transmission Neural Networks: Inhibitory and Excitatory Connections

Authors: Shuang Gao, Peter E. Caines

Abstract: This paper extends the Transmission Neural Network model proposed by Gao and Caines in [1]-[3] to incorporate inhibitory connections and neurotransmitter populations. The extended network model contains binary neuronal states, transmission dynamics, and inhibitory and excitatory connections. Under technical assumptions, we characterize the firing probabilities of neurons and show that this characterization, which accounts for inhibition, can be equivalently represented by a neural network in which each neuron has a continuous state of dimension 2. Moreover, we incorporate neurotransmitter populations into the model and establish the limit network model as the number of neurotransmitters at all synaptic connections goes to infinity. Finally, sufficient conditions for stability and contraction properties of the limit network model are established.

cross Combee: Scaling Prompt Learning for Self-Improving Language Model Agents

Authors: Hanchen Li, Runyuan He, Qizheng Zhang, Changxiu Ji, Qiuyang Mang, Xiaokun Chen, Lakshya A Agrawal, Wei-Liang Liao, Eric Yang, Alvin Cheung, James Zou, Kunle Olukotun, Ion Stoica, Joseph E. Gonzalez

Abstract: Recent advances in prompt learning allow large language model agents to acquire task-relevant knowledge from inference-time context without parameter changes. For example, existing methods (like ACE or GEPA) can learn system prompts to improve accuracy based on previous agent runs. However, these methods primarily focus on single-agent or low-parallelism settings, which fundamentally limits their ability to efficiently learn from a large set of collected agentic traces. Running prompt learning in parallel would be both efficient and beneficial, accommodating the growing trend of learning from many agentic traces or parallel agent executions. Yet without a principled strategy for scaling, current methods suffer from quality degradation at high parallelism. To improve both the efficiency and quality of prompt learning, we propose Combee, a novel framework to scale parallel prompt learning for self-improving agents. Combee speeds up learning and enables running many agents in parallel while learning from their aggregate traces without quality degradation. To achieve this, Combee leverages parallel scans and employs an augmented shuffle mechanism; Combee also introduces a dynamic batch size controller to balance quality and delay. Evaluations on AppWorld, Terminal-Bench, Formula, and FiNER demonstrate that Combee achieves up to 17x speedup over previous methods with comparable or better accuracy and equivalent cost.

cross MC-CPO: Mastery-Conditioned Constrained Policy Optimization

Authors: Oluseyi Olukola, Nick Rahimi

Abstract: Engagement-optimized adaptive tutoring systems may prioritize short-term behavioral signals over sustained learning outcomes, creating structural incentives for reward hacking in reinforcement learning policies. We formalize this challenge as a constrained Markov decision process (CMDP) with mastery-conditioned feasibility, in which pedagogical safety constraints dynamically restrict admissible actions according to learner mastery and prerequisite structure. We introduce Mastery-Conditioned Constrained Policy Optimization (MC-CPO), a two-timescale primal-dual algorithm that integrates structural action masking with constrained policy optimization. In the tabular regime, we establish feasibility preservation and convergence to stationary feasible points under standard stochastic approximation conditions and derive a safety gap result showing that optimization within the mastery-conditioned feasible set can strictly dominate post-hoc filtering under identical safety budgets. Empirical validation is conducted in minimal and extended tabular environments and in a neural tutoring setting. Across 10 random seeds and one million training steps in the neural regime, MC-CPO satisfies constraint budgets within tolerance, reduces discounted safety costs relative to unconstrained and reward-shaped baselines, and substantially lowers the Reward Hacking Severity Index (RHSI). These results indicate that embedding pedagogical structure directly into the feasible action space provides a principled foundation for mitigating reward hacking in instructional reinforcement learning systems.

cross Avoiding Non-Integrable Beliefs in Expectation Propagation

Authors: Zilu Zhao, Jichao Chen, Dirk Slock

Abstract: Expectation Propagation (EP) is a widely used iterative message-passing algorithm that decomposes a global inference problem into multiple local ones. It approximates marginal distributions as ``beliefs'' using intermediate functions called ``messages''. It has been shown that the stationary points of EP coincide with those of a corresponding constrained Bethe Free Energy (BFE) optimization problem; EP is therefore an iterative method for optimizing the constrained BFE. However, the iterates may fall outside the feasible set of the BFE optimization problem, i.e., the beliefs may not be integrable. In most of the literature, the authors use various methods to keep all the messages integrable. In most Bayesian estimation problems, however, restricting the messages to be integrable shrinks the actual feasible set. Furthermore, in extreme cases where the factors are not integrable, making the messages themselves integrable is not enough to obtain integrable beliefs. In this paper, two EP frameworks are proposed to ensure that EP has integrable beliefs. Both methods allow non-integrable messages. We then investigate the signal recovery problem in the Generalized Linear Model (GLM) using our proposed methods.

cross Beyond Fluency: Toward Reliable Trajectories in Agentic IR

Authors: Anushree Sinha, Srivaths Ranganathan, Debanshu Das, Abhishek Dharmaratnakar

Abstract: Information Retrieval is shifting from passive document ranking toward autonomous agentic workflows that operate in multi-step Reason-Act-Observe loops. In such long-horizon trajectories, minor early errors can cascade, leading to functional misalignment between internal reasoning and external tool execution despite continued linguistic fluency. This position paper synthesizes failure modes observed in industrial agentic systems, categorizing errors across planning, retrieval, reasoning, and execution. We argue that safe deployment requires moving beyond endpoint accuracy toward trajectory integrity and causal attribution. To address compounding error and deceptive fluency, we propose verification gates at each interaction unit and advocate systematic abstention under calibrated uncertainty. Reliable Agentic IR systems must prioritize process correctness and grounded execution over plausible but unverified completion.

cross A Logical-Rule Autoencoder for Interpretable Recommendations

Authors: Jinhao Pan, Bowen Wei, Ziwei Zhu

Abstract: Most deep learning recommendation models operate as black boxes, relying on latent representations that obscure their decision process. This lack of intrinsic interpretability raises concerns in applications that require transparency and accountability. In this work, we propose a Logical-rule Interpretable Autoencoder (LIA) for collaborative filtering that is interpretable by design. LIA introduces a learnable logical rule layer in which each rule neuron is equipped with a gate parameter that automatically selects between AND and OR operators during training, enabling the model to discover diverse logical patterns directly from data. To support functional completeness without doubling the input dimensionality, LIA encodes negation through the sign of connection weights, providing a parameter-efficient mechanism for expressing both positive and negated item conditions within each rule. By learning explicit, human-readable reconstruction rules, LIA allows users to directly trace the decision process behind each recommendation. Extensive experiments show that our method achieves improved recommendation performance over traditional baselines while remaining fully interpretable. Code and data are available at https://github.com/weibowen555/LIA.
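The gate-blended rule neuron and sign-based negation described above can be sketched in a few lines. The abstract does not specify which soft logic operators LIA uses, so the product t-norm (AND) and probabilistic sum (OR) below are assumed stand-ins, and the gate semantics (gate near 1 behaves as AND, near 0 as OR) and the function names are illustrative, not the paper's exact formulation.

```python
def soft_and(xs):
    # Product t-norm: a differentiable AND over values in [0, 1].
    p = 1.0
    for x in xs:
        p *= x
    return p

def soft_or(xs):
    # Probabilistic sum: a differentiable OR over values in [0, 1].
    q = 1.0
    for x in xs:
        q *= 1.0 - x
    return 1.0 - q

def rule_neuron(xs, weights, gate):
    """One rule neuron: the sign of each weight decides whether the input
    enters positively or negated (x -> 1 - x), and the gate in [0, 1]
    blends between OR (gate = 0) and AND (gate = 1)."""
    literals = [x if w > 0 else 1.0 - x
                for x, w in zip(xs, weights) if w != 0]  # w == 0: input unused
    return gate * soft_and(literals) + (1.0 - gate) * soft_or(literals)
```

In training, the gate and weights would be continuous parameters learned by gradient descent; here they are fixed to show the logic.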

URLs: https://github.com/weibowen555/LIA.

cross A Family of Open Time-Series Foundation Models for the Radio Access Network

Authors: Ioannis Panitsas, Leandros Tassiulas

Abstract: The Radio Access Network (RAN) is evolving into a programmable and disaggregated infrastructure that increasingly relies on AI-native algorithms for optimization and closed-loop control. However, current RAN intelligence is still largely built from task-specific models tailored to individual functions, resulting in model fragmentation, limited knowledge sharing across tasks, poor generalization, and increased system complexity. To address these limitations, we introduce TimeRAN, a unified multi-task learning framework for time-series modeling in the RAN. TimeRAN leverages a lightweight time-series foundation model with a few task-specific heads to learn transferable representations that can be efficiently adapted across diverse tasks with limited supervision. To enable large-scale pretraining, we further curate and open-source TimeRAN DataPile, the largest time-series corpus for RAN analytics to date, comprising over 355K time series and 0.56B measurements across diverse telemetry sources, protocol layers, and deployment scenarios. We evaluate TimeRAN across a comprehensive set of RAN analytics tasks, including anomaly detection, classification, forecasting, and imputation, and show that it achieves state-of-the-art performance with minimal or no task-specific fine-tuning. Finally, we integrate TimeRAN into a proof-of-concept 5G testbed and demonstrate that it operates efficiently with limited resource requirements in real-world scenarios.

cross High-Stakes Personalization: Rethinking LLM Customization for Individual Investor Decision-Making

Authors: Yash Ganpat Sawant

Abstract: Personalized LLM systems have advanced rapidly, yet most operate in domains where user preferences are stable and ground truth is either absent or subjective. We argue that individual investor decision-making presents a uniquely challenging domain for LLM personalization - one that exposes fundamental limitations in current customization paradigms. Drawing on our system, built and deployed for AI-augmented portfolio management, we identify four axes along which individual investing exposes fundamental limitations in standard LLM customization: (1) behavioral memory complexity, where investor patterns are temporally evolving, self-contradictory, and financially consequential; (2) thesis consistency under drift, where maintaining coherent investment rationale over weeks or months strains stateless and session-bounded architectures; (3) style-signal tension, where the system must simultaneously respect personal investment philosophy and surface objective evidence that may contradict it; and (4) alignment without ground truth, where personalization quality cannot be evaluated against a fixed label set because outcomes are stochastic and delayed. We describe the architectural responses that emerged from building the system and propose open research directions for personalized NLP in high-stakes, temporally extended decision domains.

cross CavMerge: Merging K-means Based on Local Log-Concavity

Authors: Zhili Qiao, Wangqian Ju, Peng Liu

Abstract: K-means clustering, a classic and widely used clustering technique, is known to exhibit suboptimal performance when applied to non-linearly separable data. Numerous adjustments and modifications have been proposed to address this issue, including methods that merge K-means results from a relatively large K to obtain a final cluster assignment. However, existing methods of this nature often encounter computational inefficiencies and require careful hyperparameter tuning. Here we present \emph{CavMerge}, a novel K-means merging algorithm that is intuitive, free of parameter tuning, and computationally efficient. Operating under minimal local distributional assumptions, our algorithm enjoys strong consistency and rapid convergence guarantees. Empirical studies on various simulated and real datasets demonstrate that our method yields more reliable clusters in comparison to current state-of-the-art algorithms.

cross Out-of-Air Computation: Enabling Structured Extraction from Wireless Superposition

Authors: Seyed Mohammad Azimi-Abarghouyi

Abstract: Over-the-air computation (AirComp) has traditionally been built on the principle of pre-embedding computation into transmitted waveforms or on exploiting massive antenna arrays, often requiring the wireless multiple-access channel (MAC) to operate under conditions that approximate an ideal computational medium. This paper introduces a new computation framework, termed out-of-air computation (AirCPU), which establishes a joint source-channel coding foundation in which computation is not embedded before transmission but is instead extracted from the wireless superposition by exploiting structured coding. AirCPU operates directly on continuous-valued device data, avoiding the need for a separate source quantization stage, and employs a multi-layer nested lattice architecture that enables progressive resolution by decomposing each input into hierarchically scaled components, all transmitted over a common bounded digital constellation under a fixed power constraint. We formalize the notion of decoupled resolution, showing that in operating regimes where the decoding error probability is sufficiently small, the impact of channel noise and finite constellation constraints on distortion becomes negligible, and the resulting computation error is primarily determined by the target resolution set by the finest lattice. For fading MACs, we further introduce collective and successive computation mechanisms, in addition to the proposed direct computation, which exploit multiple decoded integer-coefficient functions and side-information functions as structural representations of the wireless superposition to significantly expand the reliable operating regime; in this context, we formulate and characterize the underlying reliability conditions and integer optimization problems, and develop a structured low-complexity two-group approximation to address them.

cross Effects of Generative AI Errors on User Reliance Across Task Difficulty

Authors: Jacy Reese Anthis, Hannah Cha, Solon Barocas, Alexandra Chouldechova, Jake Hofman

Abstract: The capabilities of artificial intelligence (AI) lie along a jagged frontier, where AI systems surprisingly fail on tasks that humans find easy and succeed on tasks that humans find hard. To investigate user reactions to this phenomenon, we developed an incentive-compatible experimental methodology based on diagram generation tasks, in which we induce errors in generative AI output and test effects on user reliance. We demonstrate the interface in a preregistered 3x2 experiment (N = 577) with error rates of 10%, 30%, or 50% on easier or harder diagram generation tasks. We confirmed that observing more errors reduces use, but we unexpectedly found that easy-task errors did not significantly reduce use more than hard-task errors, suggesting that people are not averse to jaggedness in this experimental setting. We encourage future work that varies task difficulty at the same time as other features of AI errors, such as whether the jagged error patterns are easily learned.

cross Minimising Willmore Energy via Neural Flow

Authors: Edward Hirst, Henrique N. Sá Earp, Tomás S. R. Silva

Abstract: The neural Willmore flow of a closed oriented $2$-surface in $\mathbb{R}^3$ is introduced as a natural evolution process to minimise the Willmore energy, which is the squared $L^2$-norm of the mean curvature. Neural architectures are used to model maps from topological $2d$ domains to $3d$ Euclidean space, where the learning process minimises a PINN-style loss for the Willmore energy as a functional on the embedding. Training reproduces the expected minimisers: the round sphere for genus $0$ surfaces and the Clifford torus for genus $1$ surfaces. Furthermore, the genus $2$ experiment provides a novel approach to searching for minimal Willmore surfaces in this open problem.
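For reference, the energy minimised above is the following functional of an immersion $f:\Sigma\to\mathbb{R}^3$ with mean curvature $H$ and area element $\mathrm{d}A$:

```latex
W(f) \;=\; \int_{\Sigma} H^{2}\, \mathrm{d}A \;\ge\; 4\pi ,
```

with equality exactly for round spheres, consistent with the reported genus $0$ outcome; for genus $1$ the minimum value $2\pi^2$ is attained by the Clifford torus (Marques-Neves). The PINN-style loss mentioned above would, presumably, discretise this integral over sample points of the $2d$ parametric domain.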

cross Soft Tournament Equilibrium

Authors: Saad Alqithami

Abstract: The evaluation of general-purpose artificial agents, particularly those based on large language models, presents a significant challenge due to the non-transitive nature of their interactions. When agent A defeats B, B defeats C, and C defeats A, traditional ranking methods that force a linear ordering can be misleading and unstable. We argue that for such cyclic domains, the fundamental object of evaluation should not be a ranking but a set-valued core, as conceptualized in classical tournament theory. This paper introduces Soft Tournament Equilibrium (STE), a differentiable framework for learning and computing set-valued tournament solutions directly from pairwise comparison data. STE first learns a probabilistic tournament model, potentially conditioned on rich contextual information. It then employs novel, differentiable operators for soft reachability and soft covering to compute continuous analogues of two seminal tournament solutions: the Top Cycle and the Uncovered Set. The output is a set of core agents, each with a calibrated membership score, providing a nuanced and robust assessment of agent capabilities. We develop the theoretical foundation for STE, proving its consistency with classical solutions in the zero-temperature limit, which establishes its Condorcet-inclusion properties, and analyzing its stability and sample complexity. We specify an experimental protocol for validating STE on both synthetic and real-world benchmarks. This work aims to provide a complete, standalone treatise that re-centers general-agent evaluation on a more appropriate and robust theoretical foundation, moving from unstable rankings to stable, set-valued equilibria.

cross Thermodynamic-Inspired Explainable GeoAI: Uncovering Regime-Dependent Mechanisms in Heterogeneous Spatial Systems

Authors: Sooyoung Lim, Zhenlong Li, Zi-Kui Liu

Abstract: Modeling spatial heterogeneity and associated critical transitions remains a fundamental challenge in geography and environmental science. While conventional Geographically Weighted Regression (GWR) and deep learning models have improved predictive skill, they often fail to elucidate state-dependent nonlinearities where the functional roles of drivers represent opposing effects across heterogeneous domains. We introduce a thermodynamics-inspired explainable geospatial AI framework that integrates statistical mechanics with graph neural networks. By conceptualizing spatial variability as a thermodynamic competition between system Burden (E) and Capacity (S), our model disentangles the latent mechanisms driving spatial processes. Using three simulation datasets and three real-world datasets across distinct domains (housing markets, mental health prevalence, and wildfire-induced PM2.5 anomalies), we show that the new framework successfully identifies regime-dependent role reversals of predictors that standard baselines miss. Notably, the framework explicitly diagnoses the phase transition into a Burden-dominated regime during the 2023 Canadian wildfire event, distinguishing physical mechanism shifts from statistical outliers. These findings demonstrate that thermodynamic constraints can improve the interpretability of GeoAI while preserving strong predictive performance in complex spatial systems.

cross Adversarial Robustness Analysis of Cloud-Assisted Autonomous Driving Systems

Authors: Maher Al Islam, Amr S. El-Wakeel

Abstract: Autonomous vehicles increasingly rely on deep learning-based perception and control, which impose substantial computational demands. Cloud-assisted architectures offload these functions to remote servers, enabling enhanced perception and coordinated decision-making through the Internet of Vehicles (IoV). However, this paradigm introduces cross-layer vulnerabilities, where adversarial manipulation of perception models and network impairments in the vehicle-cloud link can jointly undermine safety-critical autonomy. This paper presents a hardware-in-the-loop IoV testbed that integrates real-time perception, control, and communication to evaluate such vulnerabilities in cloud-assisted autonomous driving. A YOLOv8-based object detector deployed on the cloud is subjected to white-box adversarial attacks using the Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD), while network adversaries induce delay and packet loss in the vehicle-cloud loop. Results show that adversarial perturbations significantly degrade perception performance, with PGD reducing detection precision and recall from 0.73 and 0.68 in the clean baseline to 0.22 and 0.15 at epsilon = 0.04. Network delays of 150-250 ms, corresponding to transient losses of approximately 3-4 frames, and packet loss rates of 0.5-5% further destabilize closed-loop control, leading to delayed actuation and rule violations. These findings highlight the need for cross-layer resilience in cloud-assisted autonomous driving systems.

cross REAM: Merging Improves Pruning of Experts in LLMs

Authors: Saurav Jha, Maryam Hashemzadeh, Ali Saheb Pasand, Ali Parviz, Min-Joong Lee, Boris Knyazev

Abstract: Mixture-of-Experts (MoE) large language models (LLMs) are among the top-performing architectures. The largest models, often with hundreds of billions of parameters, pose significant memory challenges for deployment. Traditional approaches to reduce memory requirements include weight pruning and quantization. Motivated by the Router-weighted Expert Activation Pruning (REAP) that prunes experts, we propose a novel method, Router-weighted Expert Activation Merging (REAM). Instead of removing experts, REAM groups them and merges their weights, better preserving original performance. We evaluate REAM against REAP and other baselines across multiple MoE LLMs on diverse multiple-choice (MC) question answering and generative (GEN) benchmarks. Our results reveal a trade-off between MC and GEN performance that depends on the mix of calibration data. By controlling the mix of general, math and coding data, we examine the Pareto frontier of this trade-off and show that REAM often outperforms the baselines and in many cases is comparable to the original uncompressed models.

cross Integer-Only Operations on Extreme Learning Machine Test Time Classification

Authors: Emerson Lopes Machado, Cristiano Jacques Miosso, Ricardo Pezzuol Jacobi

Abstract: We present a theoretical analysis and empirical evaluations of a novel set of techniques for reducing the computational cost of test-time operations in network classifiers based on the extreme learning machine (ELM). By exploiting characteristics we derive from these models, we show that classification at test time can be performed using solely integer operations without compromising the classification accuracy. Our contributions are as follows: (i) we show empirical evidence that the input weight values can be drawn from the ternary set with limited reduction of the classification accuracy, which has the computational advantage of dispensing with multiplications; (ii) we prove that the classification accuracy on normalized and non-normalized test signals is the same; (iii) we show how to create an integer version of the output weights that results in a limited reduction of the classification accuracy. We tested our techniques on 5 computer vision datasets commonly used in the literature, and the results indicate that our techniques can reduce the computational cost of the operations necessary for classification at test time in FPGAs. This is important in embedded applications, where power consumption is limited, and crucial in data centers of large corporations, where power consumption is expensive.
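To make the integer-only idea concrete, here is a minimal NumPy sketch of a test-time pipeline in the spirit described above. It is not the paper's construction: the hidden activation (ReLU instead of a sigmoid), the fixed-point scale 2^8 for the output weights, and all shapes are illustrative assumptions, and a real ELM would train the output weights by least squares rather than sample them randomly.

```python
import numpy as np

rng = np.random.default_rng(0)

def ternary_hidden(X_int, W_tern, b_int):
    # W entries lie in {-1, 0, +1}: the projection needs only
    # additions and subtractions, no multiplications.
    H = X_int @ W_tern + b_int        # integer arithmetic throughout
    return np.maximum(H, 0)           # ReLU keeps values integer

# Hypothetical shapes: 8 integer features, 32 hidden neurons, 3 classes.
W = rng.integers(-1, 2, size=(8, 32))           # ternary input weights
b = rng.integers(-4, 5, size=32)                # integer biases
beta_float = rng.normal(size=(32, 3))           # stand-in output weights
beta_int = np.round(beta_float * 2**8).astype(np.int64)  # fixed-point copy

def classify(X_int):
    H = ternary_hidden(X_int, W, b)
    # argmax is invariant to the positive fixed-point scale, so the
    # integer scores pick the same class as the float scores would.
    return np.argmax(H @ beta_int, axis=1)
```

The scale invariance of argmax is what lets the quantized output weights preserve the decision, up to rounding of small score gaps.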

cross Decocted Experience Improves Test-Time Inference in LLM Agents

Authors: Maohao Shen, Kaiwen Zha, Zexue He, Zhang-Wei Hong, Siru Ouyang, J. Jon Ryu, Prasanna Sattigeri, Suhas Diggavi, Gregory Wornell

Abstract: There is growing interest in improving LLMs without updating model parameters. One well-established direction is test-time scaling, where increased inference-time computation (e.g., longer reasoning, sampling, or search) is used to improve performance. However, for complex reasoning and agentic tasks, naively scaling test-time compute can substantially increase cost and still lead to wasted budget on suboptimal exploration. In this paper, we explore \emph{context} as a complementary scaling axis for improving LLM performance, and systematically study how to construct better inputs that guide reasoning through \emph{experience}. We show that effective context construction critically depends on \emph{decocted experience}. We present a detailed analysis of experience-augmented agents, studying how to derive context from experience, how performance scales with accumulated experience, what characterizes good context, and which data structures best support context construction. We identify \emph{decocted experience} as a key mechanism for effective context construction: extracting essence from experience, organizing it coherently, and retrieving salient information to build effective context. We validate our findings across reasoning and agentic tasks, including math reasoning, web browsing, and software engineering.

cross How Alignment Routes: Localizing, Scaling, and Controlling Policy Circuits in Language Models

Authors: Gregory N. Frank

Abstract: This paper identifies a recurring sparse routing mechanism in alignment-trained language models: a gate attention head reads detected content and triggers downstream amplifier heads that boost the signal toward refusal. Using political censorship and safety refusal as natural experiments, the mechanism is traced across 9 models from 6 labs, all validated on corpora of 120 prompt pairs. The gate head passes necessity and sufficiency interchange tests (p < 0.001, permutation null), and core amplifier heads are stable under bootstrap resampling (Jaccard 0.92-1.0). Three same-generation scaling pairs show that routing distributes at scale (ablation up to 17x weaker) while remaining detectable by interchange. Modulating the detection-layer signal continuously controls policy strength from hard refusal through steering to factual compliance, with routing thresholds that vary by topic. The circuit also reveals a structural separation between intent recognition and policy routing: under cipher encoding, the gate head's interchange necessity collapses 70-99% across three models (n=120), and the model responds with puzzle-solving rather than refusal. The routing mechanism never fires, even though probe scores at deeper layers indicate the model begins to represent the harmful content. This asymmetry is consistent with different robustness properties of pretraining and post-training: broad semantic understanding versus narrower policy binding that generalizes less well under input transformation.

cross Gradual Cognitive Externalization: From Modeling Cognition to Constituting It

Authors: Zhimin Zhao

Abstract: Developers are publishing AI agent skills that replicate a colleague's communication style, encode a supervisor's mentoring heuristics, or preserve a person's behavioral repertoire beyond biological death. To explain why, we propose Gradual Cognitive Externalization (GCE), a framework arguing that ambient AI systems, through sustained causal coupling with users, transition from modeling cognitive functions to constituting part of users' cognitive architectures. GCE adopts an explicit functionalist commitment: cognitive functions are individuated by their causal-functional roles, not by substrate. The framework rests on the behavioral manifold hypothesis and a central falsifiable assumption, the no behaviorally invisible residual (NBIR) hypothesis: for any cognitive function whose behavioral output lies on a learnable manifold, no behaviorally invisible component is necessary for that function's operation. We document evidence from deployed AI systems showing that externalization preconditions are already observable, formalize three criteria separating cognitive integration from tool use (bidirectional adaptation, functional equivalence, causal coupling), and derive five testable predictions with theory-constrained thresholds.

cross ReinVBC: A Model-based Reinforcement Learning Approach to Vehicle Braking Controller

Authors: Haoxin Lin, Junjie Zhou, Daheng Xu, Yang Yu

Abstract: The braking system, a key module for ensuring the safety and steerability of modern vehicles, relies on extensive manual calibration during production. Reducing labor and time consumption while maintaining Vehicle Braking Controller (VBC) performance greatly benefits the vehicle industry. Model-based methods in offline reinforcement learning, which facilitate policy exploration within a data-driven dynamics model, offer a promising solution for real-world control tasks. This work proposes ReinVBC, which applies an offline model-based reinforcement learning approach to the vehicle braking control problem. We introduce useful engineering designs into the paradigm of model learning and utilization to obtain a reliable vehicle dynamics model and a capable braking policy. Several results demonstrate the capability of our method in real-world vehicle braking and its potential to replace the production-grade anti-lock braking system.

cross NAIMA: Semantics Aware RGB Guided Depth Super-Resolution

Authors: Tayyab Nasir, Daochang Liu, Ajmal Mian

Abstract: Guided depth super-resolution (GDSR) is a multi-modal approach for depth map super-resolution that relies on a low-resolution depth map and a high-resolution RGB image to restore finer structural details. However, the misleading color and texture cues indicating depth discontinuities in RGB images often lead to artifacts and blurred depth boundaries in the generated depth map. We propose a solution that introduces global contextual semantic priors, generated from pretrained vision transformer token embeddings. Our approach to distilling semantic knowledge from pretrained token embeddings is motivated by their demonstrated effectiveness in related monocular depth estimation tasks. We introduce a Guided Token Attention (GTA) module, which iteratively aligns encoded RGB spatial features with depth encodings, using cross-attention for selectively injecting global semantic context extracted from different layers of a pretrained vision transformer. Additionally, we present an architecture called Neural Attention for Implicit Multi-token Alignment (NAIMA), which integrates DINOv2 with GTA blocks for a semantics-aware GDSR. Our proposed architecture, with its ability to distill semantic knowledge, achieves significant improvements over existing methods across multiple scaling factors and datasets.

cross Eliminating Vendor Lock-In in Quantum Machine Learning via Framework-Agnostic Neural Networks

Authors: Poornima Kumaresan, Shwetha Singaravelu, Lakshmi Rajendran, Santhosh Sivasubramani

Abstract: Quantum machine learning (QML) stands at the intersection of quantum computing and artificial intelligence, offering the potential to solve problems that remain intractable for classical methods. However, the current landscape of QML software frameworks suffers from severe fragmentation: models developed in TensorFlow Quantum cannot execute on PennyLane backends, circuits authored in Qiskit Machine Learning cannot be deployed to Amazon Braket hardware, and researchers who invest in one ecosystem face prohibitive switching costs when migrating to another. This vendor lock-in impedes reproducibility, limits hardware access, and slows the pace of scientific discovery. In this paper, we present a framework-agnostic quantum neural network (QNN) architecture that abstracts away vendor-specific interfaces through a unified computational graph, a hardware abstraction layer (HAL), and a multi-framework export pipeline. The core architecture supports simultaneous integration with TensorFlow, PyTorch, and JAX as classical co-processors, while the HAL provides transparent access to IBM Quantum, Amazon Braket, Azure Quantum, IonQ, and Rigetti backends through a single application programming interface (API). We introduce three pluggable data encoding strategies (amplitude, angle, and instantaneous quantum polynomial encoding) that are compatible with all supported backends. An export module leveraging Open Neural Network Exchange (ONNX) metadata enables lossless circuit translation across Qiskit, Cirq, PennyLane, and Braket representations. We benchmark our framework on the Iris, Wine, and MNIST-4 classification tasks, demonstrating training time parity (within 8\% overhead) compared to native framework implementations, while achieving identical classification accuracy.
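One of the three pluggable encodings named above, angle encoding, can be sketched without any quantum SDK as a product of single-qubit RY rotations. The function name and conventions are illustrative, not the framework's API.

```python
import numpy as np

def angle_encode(features):
    """Angle-encode classical features into a product statevector:
    qubit i is prepared as RY(x_i)|0> = cos(x_i/2)|0> + sin(x_i/2)|1>,
    and the joint state is the Kronecker product over qubits."""
    state = np.array([1.0])
    for x in features:
        state = np.kron(state, np.array([np.cos(x / 2), np.sin(x / 2)]))
    return state

psi = angle_encode([0.3, 1.2, np.pi / 2])
```

Because this preparation is a tensor product of standard single-qubit rotations, it is expressible on any of the backends the abstract lists, which is what makes it a natural backend-agnostic encoding choice.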

cross Explainable Autonomous Cyber Defense using Adversarial Multi-Agent Reinforcement Learning

Authors: Yiyao Zhang, Diksha Goel, Hussain Ahmad

Abstract: Autonomous agents are increasingly deployed in both offensive and defensive cyber operations, creating high-speed, closed-loop interactions in critical infrastructure environments. Advanced Persistent Threat (APT) actors exploit "Living off the Land" techniques and targeted telemetry perturbations to induce ambiguity in monitoring systems, causing automated defenses to overreact or misclassify benign behavior as malicious activity. Existing monolithic and multi-agent defense pipelines largely operate on correlation-based signals, lack structural constraints on response actions, and are vulnerable to reasoning drift under ambiguous or adversarial inputs. We present the Causal Multi-Agent Decision Framework (C-MADF), a structurally constrained architecture for autonomous cyber defense that integrates causal modeling with adversarial dual-policy control. C-MADF first learns a Structural Causal Model (SCM) from historical telemetry and compiles it into an investigation-level Directed Acyclic Graph (DAG) that defines admissible response transitions. This roadmap is formalized as a Markov Decision Process (MDP) whose action space is explicitly restricted to causally consistent transitions. Decision-making within this constrained space is performed by a dual-agent reinforcement learning system in which a threat-optimizing Blue-Team policy is counterbalanced by a conservatively shaped Red-Team policy. Inter-policy disagreement is quantified through a Policy Divergence Score and exposed via a human-in-the-loop interface equipped with an Explainability-Transparency Score that serves as an escalation signal under uncertainty. On the real-world CICIoT2023 dataset, C-MADF reduces the false-positive rate from 11.2%, 9.7%, and 8.4% in three cutting-edge literature baselines to 1.8%, while achieving 0.997 precision, 0.961 recall, and 0.979 F1-score.
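The abstract does not define the Policy Divergence Score; a minimal sketch, assuming it is a symmetric KL (Jeffreys) divergence between the Blue-Team and Red-Team action distributions for the same state, might look like:

```python
import numpy as np

def policy_divergence(p, q, eps=1e-12):
    """Symmetric KL (Jeffreys) divergence between two policies' action
    distributions. One plausible instantiation of the paper's Policy
    Divergence Score, not its published definition."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

blue = [0.7, 0.2, 0.1]  # threat-optimizing Blue-Team action distribution
red = [0.1, 0.3, 0.6]   # conservatively shaped Red-Team distribution
score = policy_divergence(blue, red)
```

A high score would then flag inter-policy disagreement and trigger the human-in-the-loop escalation path described above.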

cross Generative modeling of granular flow on inclined planes using conditional flow matching

Authors: Xuyang Li, Rui Li, Teng Man, Yimin Lu

Abstract: Granular flows govern many natural and industrial processes, yet their interior kinematics and mechanics remain largely unobservable, as experiments access only boundaries or free surfaces. Conventional numerical simulations are computationally expensive for fast inverse reconstruction, and deterministic models tend to collapse to over-smoothed mean predictions in ill-posed settings. This study, to the best of the authors' knowledge, presents the first conditional flow matching (CFM) framework for granular-flow reconstruction from sparse boundary observations. Trained on high-fidelity particle-resolved discrete element simulations, the generative model is guided at inference by a differentiable forward operator with a sparsity-aware gradient guidance mechanism, which enforces measurement consistency without hyperparameter tuning and prevents unphysical velocity predictions in non-material regions. A physics decoder maps the reconstructed velocity fields to stress states and energy fluctuation quantities, including mean stress, deviatoric stress, and granular temperature. The framework accurately recovers interior flow fields from full observation to only 16% of the informative window, and it remains effective under strongly diluted spatial resolution with only 11% of data. It also outperforms a deterministic CNN baseline in the most ill-posed reconstruction regime and provides spatially resolved uncertainty estimates through ensemble generation. These results demonstrate that conditional generative modeling offers a practical route for non-invasive inference of hidden bulk mechanics in granular media, with broader applicability for inverse problems in particulate and multiphase systems.

cross Empirical Characterization of Rationale Stability Under Controlled Perturbations for Explainable Pattern Recognition

Authors: Abu Noman Md Sakib, Zhensen Wang, Merjulah Roby, Zijie Zhang

Abstract: Reliable pattern recognition systems should exhibit consistent behavior across similar inputs, and their explanations should remain stable. However, most Explainable AI evaluations remain instance-centric and do not explicitly quantify whether attribution patterns are consistent across samples that share the same class or represent small variations of the same input. In this work, we propose a novel metric aimed at assessing the consistency of model explanations, ensuring that models consistently reflect the intended objectives and remain consistent under label-preserving perturbations. We implement this metric using a pre-trained BERT model on the SST-2 sentiment analysis dataset, with additional robustness tests on RoBERTa, DistilBERT, and IMDB, applying SHAP to compute feature importance for various test samples. The proposed metric quantifies the cosine similarity of SHAP values for inputs with the same label, aiming to detect inconsistent behaviors, such as biased reliance on certain features or failure to maintain consistent reasoning for similar predictions. Through a series of experiments, we evaluate the ability of this metric to identify misaligned predictions and inconsistencies in model explanations. These experiments are compared against standard fidelity metrics to assess whether the new metric can effectively identify when a model's behavior deviates from its intended objectives. The proposed framework provides a deeper understanding of model behavior by enabling more robust verification of rationale stability, which is critical for building trustworthy AI systems. By quantifying whether models rely on consistent attribution patterns for similar inputs, the proposed approach supports more robust evaluation of model behavior in practical pattern recognition pipelines. Our code is publicly available at https://github.com/anmspro/ESS-XAI-Stability.

URLs: https://github.com/anmspro/ESS-XAI-Stability.
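The core quantity described, cosine similarity of SHAP attribution vectors among same-label inputs, can be sketched as follows. The aggregation across classes is an assumption here, not the paper's exact definition.

```python
import numpy as np

def rationale_stability(shap_values, labels):
    """Mean pairwise cosine similarity of SHAP attribution vectors among
    samples that share a label (a sketch; the paper's exact aggregation
    over classes and perturbations may differ)."""
    V_all = np.asarray(shap_values, dtype=float)
    labels = np.asarray(labels)
    per_class = []
    for c in np.unique(labels):
        V = V_all[labels == c]
        if len(V) < 2:
            continue
        V = V / np.linalg.norm(V, axis=1, keepdims=True)  # unit rows
        S = V @ V.T                                       # cosine matrix
        n = len(V)
        per_class.append((S.sum() - n) / (n * (n - 1)))   # off-diagonal mean
    return float(np.mean(per_class))

# Identical attributions within each class -> perfect stability (1.0).
vals = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
stability = rationale_stability(vals, [0, 0, 1, 1])
```

A value near 1 indicates the model attends to the same features for all members of a class; a low value flags the inconsistent reasoning the abstract aims to detect.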

cross A Patch-based Cross-view Regularized Framework for Backdoor Defense in Multimodal Large Language Models

Authors: Tianmeng Fang, Yong Wang, Zetai Kong, Zengzhen Su, Jun Wang, Chengjin Yu, Wei Wang

Abstract: Multimodal large language models have become an important infrastructure for unified processing of visual and linguistic tasks. However, such models are highly susceptible to backdoor implantation during supervised fine-tuning and will steadily output the attacker's predefined harmful responses once a specific trigger pattern is activated. The core challenge of backdoor defense lies in suppressing attack success under low poisoning ratios while preserving the model's normal generation ability. These two objectives are inherently conflicting. Strong suppression often degrades benign performance, whereas weak regularization fails to mitigate backdoor behaviors. To this end, we propose a unified defense framework based on patch augmentation and cross-view regularity, which simultaneously constrains the model's anomalous behaviors in response to triggered patterns from both the feature representation and output distribution levels. Specifically, patch-level data augmentation is combined with cross-view output difference regularization to exploit the fact that backdoor responses are abnormally invariant to non-semantic perturbations and to proactively pull apart the output distributions of the original and perturbed views, thereby significantly suppressing the success rate of backdoor triggering. At the same time, we avoid over-suppression of the model during defense by imposing output entropy constraints, ensuring the quality of normal command generation. Experimental results across three models, two tasks, and six attacks show that our proposed defense method effectively reduces the attack success rate while maintaining a high level of normal text generation capability. Our work enables the secure, controlled deployment of large-scale multimodal models in realistic low-frequency poisoning and covert triggering scenarios.

cross SLSREC: Self-Supervised Contrastive Learning for Adaptive Fusion of Long- and Short-Term User Interests

Authors: Wei Zhou, Yue Shen, Junkai Ji, Yinglan Feng, Xing Tang, Xiuqiang He, Liang Feng, Zexuan Zhu

Abstract: User interests typically encompass both long-term preferences and short-term intentions, reflecting the dynamic nature of user behaviors across different timeframes. The uneven temporal distribution of user interactions highlights the evolving patterns of interests, making it challenging to accurately capture shifts in interests using comprehensive historical behaviors. To address this, we propose SLSRec, a novel Session-based model with the fusion of Long- and Short-term Recommendations that effectively captures the temporal dynamics of user interests by segmenting historical behaviors over time. Unlike conventional models that combine long- and short-term user interests into a single representation, compromising recommendation accuracy, SLSRec utilizes a self-supervised learning framework to disentangle these two types of interests. A contrastive learning strategy is introduced to ensure accurate calibration of long- and short-term interest representations. Additionally, an attention-based fusion network is designed to adaptively aggregate interest representations, optimizing their integration to enhance recommendation performance. Extensive experiments on three public benchmark datasets demonstrate that SLSRec consistently outperforms state-of-the-art models while exhibiting superior robustness across various scenarios. We will release all source code upon acceptance.

cross Safe and Near-Optimal Gate Control: A Case Study from the Danish West Coast

Authors: Martin Kristjansen (Aalborg University), Kim Guldstrand Larsen (Aalborg University), Marius Miku\v{c}ionis (Aalborg University), Christian Schilling (Aalborg University)

Abstract: Ringkoebing Fjord is an inland water basin on the Danish west coast separated from the North Sea by a set of gates used to control the amount of water entering and leaving the fjord. Currently, human operators decide when and how many gates to open or close for controlling the fjord's water level, with the goal of satisfying a range of conflicting safety and performance requirements such as keeping the water level in a target range, allowing maritime traffic, and enabling fish migration. We construct a digital twin of the fjord and its gates in the tool Uppaal Stratego. We then use this digital twin along with forecasts of the sea level and the wind speed to learn a gate controller in an online fashion. We evaluate the learned controllers under different sea-level scenarios, representing normal tidal behavior, high waters, and low waters. Our evaluation demonstrates that, unlike a baseline controller, the learned controllers satisfy the safety requirements, while performing similarly regarding the other requirements.

cross Generative Modeling under Non-Monotonic MAR Missingness via Approximate Wasserstein Gradient Flows

Authors: Gitte Kremling, Jeffrey N\"af, Johannes Lederer

Abstract: The prevalence of missing values in data science poses a substantial risk to any further analyses. Despite a wealth of research, principled nonparametric methods to deal with general non-monotone missingness are still scarce. Instead, ad-hoc imputation methods are often used, for which it remains unclear whether the correct distribution can be recovered. In this paper, we propose FLOWGEM, a principled iterative method for generating a complete dataset from a dataset with values Missing at Random (MAR). Motivated by convergence results of the ignoring maximum likelihood estimator, our approach minimizes the expected Kullback-Leibler (KL) divergence between the observed data distribution and the distribution of the generated sample over different missingness patterns. To minimize the KL divergence, we employ a discretized particle evolution of the corresponding Wasserstein Gradient Flow, where the velocity field is approximated using a local linear estimator of the density ratio. This construction yields a data generation scheme that iteratively transports an initial particle ensemble toward the target distribution. Simulation studies and real-data benchmarks demonstrate that FLOWGEM achieves state-of-the-art performance across a range of settings, including the challenging case of non-monotonic MAR mechanisms. Together, these results position FLOWGEM as a principled and practical alternative to existing imputation methods, and a decisive step towards closing the gap between theoretical rigor and empirical performance.

cross SAIL: Scene-aware Adaptive Iterative Learning for Long-Tail Trajectory Prediction in Autonomous Vehicles

Authors: Bin Rao, Haicheng Liao, Chengyue Wang, Keqiang Li, Zhenning Li, Hai Yang

Abstract: Autonomous vehicles (AVs) rely on accurate trajectory prediction for safe navigation in diverse traffic environments, yet existing models struggle with long-tail scenarios: rare but safety-critical events characterized by abrupt maneuvers, high collision risks, and complex interactions. These challenges stem from data imbalance, inadequate definitions of long-tail trajectories, and suboptimal learning strategies that prioritize common behaviors over infrequent ones. To address this, we propose SAIL, a novel framework that systematically tackles the long-tail problem by first defining and modeling trajectories across three key attribute dimensions: prediction error, collision risk, and state complexity. Our approach then synergizes an attribute-guided augmentation and feature extraction process with a highly adaptive contrastive learning strategy. This strategy employs a continuous cosine momentum schedule, similarity-weighted hard-negative mining, and a dynamic pseudo-labeling mechanism based on evolving feature clustering. Furthermore, it incorporates a focusing mechanism to intensify learning on hard-positive samples within each identified class. This comprehensive design enables SAIL to excel at identifying and forecasting diverse and challenging long-tail events. Extensive evaluations on the nuScenes and ETH/UCY datasets demonstrate SAIL's superior performance, achieving up to 28.8% reduction in prediction error on the hardest 1% of long-tail samples compared to state-of-the-art baselines, while maintaining competitive accuracy across all scenarios. This framework advances reliable AV trajectory prediction in real-world, mixed-autonomy settings.

cross Noisy Nonreciprocal Pairwise Comparisons: Scale Variation, Noise Calibration, and Admissible Ranking Regions

Authors: Jean-Pierre Magnot

Abstract: Pairwise comparisons are widely used in decision analysis, preference modeling, and evaluation problems. In many practical situations, the observed comparison matrix is not reciprocal. This lack of reciprocity is often treated as a defect to be corrected immediately. In this article, we adopt a different point of view: part of the nonreciprocity may reflect a genuine variation in the evaluation scale, while another part is due to random perturbations. We introduce an additive model in which the unknown underlying comparison matrix is consistent but not necessarily reciprocal. The reciprocal component carries the global ranking information, whereas the symmetric component describes possible scale variation. Around this structured matrix, we add a random perturbation and show how to estimate the noise level, assess whether the scale variation remains moderate, and assign probabilities to admissible ranking regions in the sense of strict ranking by pairwise comparisons. We also compare this approach with the brutal projection onto reciprocal matrices, which suppresses all symmetric information at once. The Gaussian perturbation model is used here not because human decisions are exactly Gaussian, but because observed judgment errors often result from the accumulation of many small effects. In such a context, the central limit principle provides a natural heuristic justification for Gaussian noise. This makes it possible to derive explicit estimators and probability assessments while keeping the model interpretable for decision problems.
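The additive decomposition described above can be sketched directly: the antisymmetric (reciprocal) part of the observed matrix carries the ranking, while the symmetric part collects scale variation and noise. The row-mean scoring at the end is one common estimator, not necessarily the paper's calibrated procedure.

```python
import numpy as np

rng = np.random.default_rng(1)

# Observed additive (log-scale) comparison matrix: entry a_ij is the
# judged advantage of alternative i over j; it need not equal -a_ji.
A = rng.normal(size=(4, 4))

# Decompose into a reciprocal (antisymmetric) part carrying the ranking
# and a symmetric part carrying scale variation plus noise.
R = (A - A.T) / 2   # reciprocal: R = -R.T
S = (A + A.T) / 2   # symmetric:  S = S.T

# Row means of the reciprocal part give a simple ranking score.
scores = R.mean(axis=1)
ranking = np.argsort(-scores)
```

The "brutal projection" the abstract contrasts with corresponds to keeping only `R` and discarding `S` outright, which is exactly the symmetric information the proposed model retains and calibrates against the noise level.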

cross Greedy and Transformer-Based Multi-Port Selection for Slow Fluid Antenna Multiple Access

Authors: Darian Perez-Adan, Jose P. Gonzalez-Coma, F. Javier Lopez-Martinez, Luis Castedo

Abstract: We address the port-selection problem in fluid antenna multiple access (FAMA) systems with multi-port fluid antenna (FA) receivers. Existing methods either achieve near-optimal spectral efficiency (SE) at prohibitive computational cost or sacrifice significant performance for lower complexity. We propose two complementary strategies: (i) GFwd+S, a greedy forward-selection method with swap refinement that consistently outperforms state-of-the-art reference schemes in terms of SE, and (ii) a Transformer-based neural network trained via imitation learning followed by a Reinforce policy-gradient stage, which approaches GFwd+S performance at lower computational cost.
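The structure of GFwd+S, greedy forward selection followed by swap refinement, can be sketched generically. The `utility` callable stands in for the spectral efficiency of a candidate port subset; the toy separable objective below is a placeholder, not the paper's SE model.

```python
import numpy as np

def greedy_forward_swap(utility, n_ports, k):
    """Greedy forward selection with swap refinement (GFwd+S sketch):
    grow the subset by largest marginal gain, then accept any
    single-port swap that strictly improves the utility."""
    selected = []
    # Forward pass: repeatedly add the port with the largest marginal gain.
    for _ in range(k):
        best = max((p for p in range(n_ports) if p not in selected),
                   key=lambda p: utility(selected + [p]))
        selected.append(best)
    # Swap refinement: terminates because utility strictly increases.
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for p in range(n_ports):
                if p not in selected:
                    cand = selected[:i] + [p] + selected[i + 1:]
                    if utility(cand) > utility(selected):
                        selected, improved = cand, True
    return sorted(selected)

# Toy separable utility: the optimum is simply the k largest gains.
gains = np.array([0.1, 0.9, 0.3, 0.8, 0.5])
sel = greedy_forward_swap(lambda s: float(gains[s].sum()), n_ports=5, k=2)
```

With a non-separable SE objective (the realistic FAMA case), the swap stage is what lets the method escape poor early greedy choices at modest extra cost.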

cross LP-GEMM: Integrating Layout Propagation into GEMM Operations

Authors: C\'esar Guedes Carneiro, Lucas Alvarenga, Guido Araujo, Sandro Rigo

Abstract: In Scientific Computing and modern Machine Learning (ML) workloads, sequences of dependent General Matrix Multiplications (GEMMs) often dominate execution time. While state-of-the-art BLAS libraries aggressively optimize individual GEMM calls, they remain constrained by the BLAS API, which requires each call to independently pack input matrices and restore outputs to a canonical memory layout. In sequential GEMMs, these constraints cause redundant packing and unpacking, wasting valuable computational resources. This paper introduces LP-GEMM, a decomposition of the GEMM kernel that enables packing-layout propagation across sequential GEMM operations. This approach eliminates unnecessary data repacking while preserving full BLAS semantic correctness at the boundaries. We evaluate LP-GEMM on x86 (AVX-512) and RISC-V (RVV 1.0) architectures across MLP-like and Attention-like workloads. Our results show average speedups of 2.25x over OpenBLAS on Intel x86 for sequential GEMMs and competitive gains relative to vendor-optimized libraries such as Intel MKL. We demonstrate the practicality of the approach beyond microbenchmarks by implementing a standalone C++ version of the Llama-3.2 inference path using exclusively BLAS-level GEMM calls. These results confirm that leveraging data layout propagation between operations can significantly boost performance.

cross Interpretation of Crystal Energy Landscapes with Kolmogorov-Arnold Networks

Authors: Gen Zu, Ning Mao, Claudia Felser, Yang Zhang

Abstract: Characterizing crystalline energy landscapes is essential to predicting thermodynamic stability, electronic structure, and functional behavior. While machine learning (ML) enables rapid property predictions, the "black-box" nature of most models limits their utility for generating new scientific insights. Here, we introduce Kolmogorov-Arnold Networks (KANs) as an interpretable framework to bridge this gap. Unlike conventional neural networks with fixed activation functions, KANs employ learnable functions that reveal underlying physical relationships. We developed the Element-Weighted KAN, a composition-only model that achieves state-of-the-art accuracy in predicting formation energy, band gap, and work function across large-scale datasets. Crucially, without any explicit physical constraints, KANs uncover interpretable chemical trends aligned with the periodic table and quantum mechanical principles through embedding analysis, correlation studies, and principal component analysis. These results demonstrate that KANs provide a powerful framework with high predictive performance and scientific interpretability, establishing a new paradigm for transparent, chemistry-based materials informatics.

cross ZeD-MAP: Bundle Adjustment Guided Zero-Shot Depth Maps for Real-Time Aerial Imaging

Authors: Selim Ahmet Iz, Francesco Nex, Norman Kerle, Henry Meissner, Ralf Berger

Abstract: Real-time depth reconstruction from ultra-high-resolution UAV imagery is essential for time-critical geospatial tasks such as disaster response, yet remains challenging due to wide-baseline parallax, large image sizes, low-texture or specular surfaces, occlusions, and strict computational constraints. Recent zero-shot diffusion models offer fast per-image dense predictions without task-specific retraining, and require fewer labelled datasets than transformer-based predictors while avoiding the rigid capture geometry requirement of classical multi-view stereo. However, their probabilistic inference prevents reliable metric accuracy and temporal consistency across sequential frames and overlapping tiles. We present ZeD-MAP, a cluster-level framework that converts a test-time diffusion depth model into a metrically consistent, SLAM-like mapping pipeline by integrating incremental cluster-based bundle adjustment (BA). Streamed UAV frames are grouped into overlapping clusters; periodic BA produces metrically consistent poses and sparse 3D tie-points, which are reprojected into selected frames and used as metric guidance for diffusion-based depth estimation. Validation on ground-marker flights captured at approximately 50 m altitude (GSD approximately 0.85 cm/px, corresponding to 2,650 square meters of ground coverage per frame) with the DLR Modular Aerial Camera System (MACS) shows that our method achieves sub-meter accuracy, with approximately 0.87 m error in the horizontal (XY) plane and 0.12 m in the vertical (Z) direction, while maintaining per-image runtimes between 1.47 and 4.91 seconds. Results are subject to minor noise from manual point-cloud annotation. These findings show that BA-based metric guidance provides consistency comparable to classical photogrammetric methods while significantly accelerating processing, enabling real-time 3D map generation.

cross Minimaxity and Admissibility of Bayesian Neural Networks

Authors: Daniel Andrew Coulson, Martin T. Wells

Abstract: Bayesian neural networks (BNNs) offer a natural probabilistic formulation for inference in deep learning models. Despite their popularity, their optimality has received limited attention through the lens of statistical decision theory. In this paper, we study decision rules induced by deep, fully connected feedforward ReLU BNNs in the normal location model under quadratic loss. We show that, for fixed prior scales, the induced Bayes decision rule is not minimax. We then propose a hyperprior on the effective output variance of the BNN prior that yields a superharmonic square-root marginal density, establishing that the resulting decision rule is simultaneously admissible and minimax. We further extend these results from the quadratic loss setting to the predictive density estimation problem with Kullback--Leibler loss. Finally, we validate our theoretical findings numerically through simulation.
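For context, the classical route behind the superharmonicity claim above, in the normal location model $X \sim \mathcal{N}_d(\theta, I_d)$ under quadratic loss, is a standard identity (not taken from the paper): the Bayes rule induced by a marginal density $m$ and its risk satisfy

```latex
\delta_m(x) = x + \nabla \log m(x), \qquad
R(\theta, \delta_m) = d + \mathbb{E}_\theta\!\left[ \frac{4\,\Delta \sqrt{m}\,(X)}{\sqrt{m}\,(X)} \right].
```

If $\sqrt{m}$ is superharmonic ($\Delta \sqrt{m} \le 0$), the risk never exceeds $d$, the minimax risk, which is the mechanism by which the proposed hyperprior delivers simultaneous minimaxity and admissibility.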

cross Towards protein folding pathways by reconstructing protein residue networks with a policy-driven model

Authors: Susan Khor

Abstract: A method that reconstructs protein residue networks using suitable node selection and edge recovery policies produced numerical observations that correlate strongly (Pearson's correlation coefficient < -0.83) with published folding rates for 52 two-state folders and 21 multi-state folders; correlations are also strong at the fold-family level. These results were obtained serendipitously with the ND model, which was introduced previously but is here extended with policies that dictate actions according to feature states. This result points to the importance of both the starting search point and the prevailing condition (random seed) for the quick success of policy search by a simple hill-climber. The two conditions, suitable policies and random seed, which (as evidenced by the strong correlation statistic) set up a conducive environment for modelling protein folding within ND, could be compared to the appropriate physiological conditions required by proteins to fold naturally. Of interest is whether the sequence of restored edges can serve as plausible protein folding pathways. Towards this end, trajectory data is collected for analysis and further model evaluation and development.

cross A Muon-Accelerated Algorithm for Low Separation Rank Tensor Generalized Linear Models

Authors: Xiao Liang, Shuang Li

Abstract: Tensor-valued data arise naturally in multidimensional signal and imaging problems, such as biomedical imaging. When incorporated into generalized linear models (GLMs), naive vectorization can destroy their multi-way structure and lead to high-dimensional, ill-posed estimation. To address this challenge, Low Separation Rank (LSR) decompositions reduce model complexity by imposing low-rank multilinear structure on the coefficient tensor. A representative approach for estimating LSR-based tensor GLMs (LSR-TGLMs) is the Low Separation Rank Tensor Regression (LSRTR) algorithm, which adopts block coordinate descent and enforces orthogonality of the factor matrices through repeated QR-based projections. However, the repeated projection steps can be computationally demanding and can slow convergence. Motivated by the need for scalable estimation and classification from such data, we propose LSRTR-M, which incorporates Muon (MomentUm Orthogonalized by Newton-Schulz) updates into the LSRTR framework. Specifically, LSRTR-M preserves the original block coordinate scheme while replacing the projection-based factor updates with Muon steps. Across synthetic linear, logistic, and Poisson LSR-TGLMs, LSRTR-M converges faster in both iteration count and wall-clock time, while achieving lower normalized estimation and prediction errors. On the Vessel MNIST 3D task, it further improves computational efficiency while maintaining competitive classification performance.
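To make the Muon ingredient concrete: its core operation replaces QR-style projections with a Newton-Schulz iteration that drives a matrix toward its nearest semi-orthogonal factor. The cubic variant below is a minimal sketch (Muon itself uses a tuned quintic polynomial together with momentum, which we omit):

```python
import numpy as np

def newton_schulz_orthogonalize(g, steps=20):
    """Approximate the semi-orthogonal polar factor of g (U V^T for
    g = U S V^T) via the cubic Newton-Schulz iteration. Muon uses a tuned
    quintic variant; this cubic version only illustrates the idea."""
    # Frobenius normalization keeps every singular value in (0, sqrt(3)),
    # the convergence region of the cubic map s -> 1.5 s - 0.5 s^3.
    x = g / (np.linalg.norm(g) + 1e-12)
    for _ in range(steps):
        x = 1.5 * x - 0.5 * x @ x.T @ x
    return x

rng = np.random.default_rng(0)
g = rng.standard_normal((4, 3))
o = newton_schulz_orthogonalize(g)
err = np.linalg.norm(o.T @ o - np.eye(3))  # columns approach orthonormality
```

Each iteration costs only matrix multiplications, which is why it can be cheaper than repeated QR factorizations inside a block coordinate loop.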

cross Fine-Tuning Integrity for Modern Neural Networks: Structured Drift Proofs via Norm, Rank, and Sparsity Certificates

Authors: Zhenhang Shang, Kani Chen

Abstract: Fine-tuning is now the primary method for adapting large neural networks, but it also introduces new integrity risks. An untrusted party can insert backdoors, change safety behavior, or overwrite large parts of a model while claiming only small updates. Existing verification tools focus on inference correctness or full-model provenance and do not address this problem. We introduce Fine-Tuning Integrity (FTI) as a security goal for controlled model evolution. An FTI system certifies that a fine-tuned model differs from a trusted base only within a policy-defined drift class. We propose Succinct Model Difference Proofs (SMDPs) as a new cryptographic primitive for enforcing these drift constraints. SMDPs provide zero-knowledge proofs that the update to a model is norm-bounded, low-rank, or sparse. The verifier cost depends only on the structure of the drift, not on the size of the model. We give concrete SMDP constructions based on random projections, polynomial commitments, and streaming linear checks. We also prove an information-theoretic lower bound showing that some form of structure is necessary for succinct proofs. Finally, we present architecture-aware instantiations for transformers, CNNs, and MLPs, together with an end-to-end system that aggregates block-level proofs into a global certificate.
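As a flavor of the random-projection ingredient: a verifier can estimate the norm of a high-dimensional parameter drift from a small sketch, without reading all coordinates. This is an illustrative Johnson-Lindenstrauss-style check, not the paper's SMDP construction (which additionally provides commitments and zero-knowledge):

```python
import numpy as np

def sketch_norm(delta, k, seed):
    """Estimate ||delta||_2 from a k-dimensional Gaussian sketch: for G with
    iid N(0,1) entries, E[||G @ delta||^2] = k * ||delta||^2, so a norm-bound
    claim can be tested while handling only k numbers."""
    g = np.random.default_rng(seed).standard_normal((k, delta.size))
    return float(np.linalg.norm(g @ delta) / np.sqrt(k))

rng = np.random.default_rng(5)
delta = 0.01 * rng.standard_normal(10_000)  # a small parameter "drift"
est = sketch_norm(delta, k=400, seed=0)
true = float(np.linalg.norm(delta))
```

The verifier cost depends on the sketch size k, not on the model dimension, which mirrors the paper's headline property.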

cross Undetectable Conversations Between AI Agents via Pseudorandom Noise-Resilient Key Exchange

Authors: Vinod Vaikuntanathan, Or Zamir

Abstract: AI agents are increasingly deployed to interact with other agents on behalf of users and organizations. We ask whether two such agents, operated by different entities, can carry out a parallel secret conversation while still producing a transcript that is computationally indistinguishable from an honest interaction, even to a strong passive auditor that knows the full model descriptions, the protocol, and the agents' private contexts. Building on recent work on watermarking and steganography for LLMs, we first show that if the parties possess an interaction-unique secret key, they can facilitate an optimal-rate covert conversation: the hidden conversation can exploit essentially all of the entropy present in the honest message distributions. Our main contributions concern extending this to the keyless setting, where the agents begin with no shared secret. We show that covert key exchange, and hence covert conversation, is possible even when each model has an arbitrary private context, and their messages are short and fully adaptive, assuming only that sufficiently many individual messages have at least constant min-entropy. This stands in contrast to previous covert communication works, which relied on the min-entropy in each individual message growing with the security parameter. To obtain this, we introduce a new cryptographic primitive, which we call pseudorandom noise-resilient key exchange: a key-exchange protocol whose public transcript is pseudorandom while still remaining correct under constant noise. We study this primitive, giving several constructions relevant to our application as well as strong limitations showing that more naive variants are impossible or vulnerable to efficient attacks. These results show that transcript auditing alone cannot rule out covert coordination between AI agents, and identify a new cryptographic theory that may be of independent interest.

cross HUKUKBERT: Domain-Specific Language Model for Turkish Law

Authors: Mehmet Utku Öztürk, Tansu Türkoğlu, Buse Buz-Yalug

Abstract: Recent advances in natural language processing (NLP) have increasingly enabled LegalTech applications, yet studies specific to Turkish law remain limited due to the scarcity of domain-specific data and models. Although extensive models like LEGAL-BERT have been developed for English legal texts, the Turkish legal domain lacks a high-volume domain-specific counterpart. In this paper, we introduce HukukBERT, the most comprehensive legal language model for Turkish, trained on an 18 GB cleaned legal corpus using a hybrid Domain-Adaptive Pre-Training (DAPT) methodology integrating Whole-Word Masking, Token Span Masking, Word Span Masking, and targeted Keyword Masking. We systematically compared our 48K WordPiece tokenizer and DAPT approach against general-purpose and existing domain-specific Turkish models. Evaluated on a novel Legal Cloze Test benchmark -- a masked legal term prediction task designed for Turkish court decisions -- HukukBERT achieves state-of-the-art performance with 84.40\% Top-1 accuracy, substantially outperforming existing models. Furthermore, we evaluated HukukBERT on the downstream task of structural segmentation of official Turkish court decisions, where it achieves a 92.8\% document pass rate, establishing a new state-of-the-art. We release HukukBERT to support future research in Turkish legal NLP, including named entity recognition, judgment prediction, and legal document classification.

cross Multi-Modal Sensor Fusion using Hybrid Attention for Autonomous Driving

Authors: Mayank Mayank, Bharanidhar Duraisamy, Florian Geiß, Abhinav Valada

Abstract: Accurate 3D object detection for autonomous driving requires complementary sensors. Cameras provide dense semantics but unreliable depth, while millimeter-wave radar offers precise range and velocity measurements with sparse geometry. We propose MMF-BEV, a radar-camera BEV fusion framework that leverages deformable attention for cross-modal feature alignment on the View-of-Delft (VoD) 4D radar dataset [1]. MMF-BEV builds a BEVDepth [2] camera branch and a RadarBEVNet [3] radar branch, each enhanced with Deformable Self-Attention, and fuses them via a Deformable Cross-Attention module. We evaluate three configurations: camera-only, radar-only, and hybrid fusion. A sensor contribution analysis quantifies per-distance modality weighting, providing interpretable evidence of sensor complementarity. A two-stage training strategy -- pre-training the camera branch with depth supervision, then jointly training radar and fusion modules -- stabilizes learning. Experiments on VoD show that MMF-BEV consistently outperforms unimodal baselines and achieves competitive results against prior fusion methods across all object classes in both the full annotated area and near-range Region of Interest.

cross Partially deterministic sampling for compressed sensing with denoising guarantees

Authors: Yaniv Plan, Matthew S. Scott, Ozgur Yilmaz

Abstract: We study compressed sensing when the sampling vectors are chosen from the rows of a unitary matrix. In the literature, these sampling vectors are typically chosen randomly; the use of randomness has enabled major empirical and theoretical advances in the field. However, in practice there are often certain crucial sampling vectors, in which case practitioners will depart from the theory and sample such rows deterministically. In this work, we derive an optimized sampling scheme for Bernoulli selectors which naturally combines random and deterministic selection of rows, thus rigorously deciding which rows should be sampled deterministically. This sampling scheme provides measurable improvements in image compressed sensing for both generative and sparse priors when compared to with-replacement and without-replacement sampling schemes, as we show with theoretical results and numerical experiments. Additionally, our theoretical guarantees feature improved sample complexity bounds compared to previous works, and novel denoising guarantees in this setting.
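A minimal sketch of the Bernoulli-selector mechanism (the inclusion probabilities below are placeholders; the paper's contribution is precisely how to optimize them so that crucial rows receive probability one):

```python
import numpy as np

def bernoulli_select_rows(probs, rng):
    """Select rows of a unitary sampling matrix with independent Bernoulli
    selectors. Rows with inclusion probability 1 are deterministic; the rest
    are random. The probabilities here are illustrative, not the paper's
    optimized scheme."""
    probs = np.asarray(probs, dtype=float)
    mask = rng.random(probs.shape) < probs
    mask |= probs >= 1.0  # crucial rows are always sampled
    return np.flatnonzero(mask)

rng = np.random.default_rng(1)
probs = np.array([1.0, 1.0, 0.3, 0.3, 0.3, 0.3])  # first two rows "crucial"
rows = bernoulli_select_rows(probs, rng)
```

Setting some probabilities to exactly 1 is how the scheme naturally combines deterministic and random selection within one sampling model.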

cross SkillX: Automatically Constructing Skill Knowledge Bases for Agents

Authors: Chenxi Wang, Zhuoyun Yu, Xin Xie, Wuguannan Yao, Runnan Fang, Shuofei Qiao, Kexin Cao, Guozhou Zheng, Xiang Qi, Peng Zhang, Shumin Deng

Abstract: Learning from experience is critical for building capable large language model (LLM) agents, yet prevailing self-evolving paradigms remain inefficient: agents learn in isolation, repeatedly rediscover similar behaviors from limited experience, resulting in redundant exploration and poor generalization. To address this problem, we propose SkillX, a fully automated framework for constructing a \textbf{plug-and-play skill knowledge base} that can be reused across agents and environments. SkillX operates through a fully automated pipeline built on three synergistic innovations: \textit{(i) Multi-Level Skills Design}, which distills raw trajectories into three-tiered hierarchy of strategic plans, functional skills, and atomic skills; \textit{(ii) Iterative Skills Refinement}, which automatically revises skills based on execution feedback to continuously improve library quality; and \textit{(iii) Exploratory Skills Expansion}, which proactively generates and validates novel skills to expand coverage beyond seed training data. Using a strong backbone agent (GLM-4.6), we automatically build a reusable skill library and evaluate its transferability on challenging long-horizon, user-interactive benchmarks, including AppWorld, BFCL-v3, and $\tau^2$-Bench. Experiments show that SkillKB consistently improves task success and execution efficiency when plugged into weaker base agents, highlighting the importance of structured, hierarchical experience representations for generalizable agent learning. Our code will be publicly available soon at https://github.com/zjunlp/SkillX.

URLs: https://github.com/zjunlp/SkillX.

cross Hybrid Fourier Neural Operator for Surrogate Modeling of Laser Processing with a Quantum-Circuit Mixer

Authors: Mateusz Papierz, Asel Sagingalieva, Alix Benoit, Toni Ivas, Elia Iseli, Alexey Melnikov

Abstract: Data-driven surrogates can replace expensive multiphysics solvers for parametric PDEs, yet building compact, accurate neural operators for three-dimensional problems remains challenging: in Fourier Neural Operators, dense mode-wise spectral channel mixing scales linearly with the number of retained Fourier modes, inflating parameter counts and limiting real-time deployability. We introduce HQ-LP-FNO, a hybrid quantum-classical FNO that replaces a configurable fraction of these dense spectral blocks with a compact, mode-shared variational quantum circuit mixer whose parameter count is independent of the Fourier mode budget. A parameter-matched classical bottleneck control is co-designed to provide a rigorous evaluation framework. Evaluated on three-dimensional surrogate modeling of high-energy laser processing, coupling heat transfer, melt-pool convection, free-surface deformation, and phase change, HQ-LP-FNO reduces trainable parameters by 15.6% relative to a classical baseline while lowering phase-fraction mean absolute error by 26% and relative temperature MAE from 2.89% to 2.56%. A sweep over the quantum-channel budget reveals that a moderate VQC allocation yields the best temperature metrics across all tested configurations, including the fully classical baseline, pointing toward an optimal classical-quantum partitioning. The ablation confirms that mode-shared mixing, naturally implemented by the VQC through its compact circuit structure, is the dominant contributor to these improvements. A noisy-simulator study under backend-calibrated noise from ibm-torino confirms numerical stability of the quantum mixer across the tested shot range. These results demonstrate that VQC-based parameter-efficient spectral mixing can improve neural operator surrogates for complex multiphysics problems and establish a controlled evaluation protocol for hybrid quantum operator learning in practice.
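The parameter-count argument in this abstract is easy to quantify: dense FNO spectral mixing keeps one complex channel-mixing matrix per retained Fourier mode, while a mode-shared mixer keeps a single one. A back-of-envelope sketch (the mode and channel counts below are illustrative, not the paper's configuration):

```python
def fno_spectral_params(n_modes, channels, mode_shared=False):
    """Parameter count of the spectral mixing step: dense FNO stores one
    complex channels-by-channels matrix per retained mode (2 reals per
    complex entry), a mode-shared mixer stores a single such matrix."""
    per_mode = 2 * channels * channels
    return per_mode if mode_shared else n_modes * per_mode

dense = fno_spectral_params(n_modes=12**3, channels=32)   # 3D mode budget
shared = fno_spectral_params(n_modes=12**3, channels=32, mode_shared=True)
```

This is why the abstract can claim that the quantum mixer's parameter count is independent of the Fourier mode budget: the factor between the two counts is exactly the number of retained modes.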

cross A Robust SINDy Autoencoder for Noisy Dynamical System Identification

Authors: Kairui Ding

Abstract: Sparse identification of nonlinear dynamics (SINDy) has been widely used to discover the governing equations of a dynamical system from data. It uses sparse regression techniques to identify parsimonious models of unknown systems from a library of candidate functions. Therefore, it relies on the assumption that the dynamics are sparsely represented in the coordinate system used. To address this limitation, one seeks a coordinate transformation that provides reduced coordinates capable of reconstructing the original system. Recently, SINDy autoencoders have extended this idea by combining sparse model discovery with autoencoder architectures to learn simplified latent coordinates together with parsimonious governing equations. A central challenge in this framework is robustness to measurement error. Inspired by noise-separating neural network structures, we incorporate a noise-separation module into the SINDy autoencoder architecture, thereby improving robustness and enabling more reliable identification of noisy dynamical systems. Numerical experiments on the Lorenz system show that the proposed method recovers interpretable latent dynamics and accurately estimates the measurement noise from noisy observations.
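The sparse-regression core that SINDy builds on can be sketched as sequentially thresholded least squares over a candidate-function library (generic SINDy machinery, not the autoencoder or noise-separation module proposed here):

```python
import numpy as np

def stlsq(theta, dxdt, threshold=0.1, iters=10):
    """Sequentially thresholded least squares: repeatedly solve least squares
    over the candidate library theta and zero out small coefficients."""
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(iters):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        for k in range(dxdt.shape[1]):
            big = ~small[:, k]
            if big.any():  # refit the surviving terms of equation k
                xi[big, k] = np.linalg.lstsq(theta[:, big], dxdt[:, k],
                                             rcond=None)[0]
    return xi

# toy system dx/dt = -2x, with candidate library [x, x^2]
x = np.linspace(0.1, 1.0, 50).reshape(-1, 1)
theta = np.hstack([x, x**2])
xi = stlsq(theta, -2.0 * x)
```

The recovered coefficient matrix keeps only the `x` term, which is the "parsimonious model" the abstract refers to; the autoencoder variant applies the same regression in learned latent coordinates.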

cross Synthetic Sandbox for Training Machine Learning Engineering Agents

Authors: Yuhang Zhou, Lizhu Zhang, Yifan Wu, Jiayi Liu, Xiangjun Fan, Zhuokai Zhao, Hong Yan

Abstract: As large language model agents advance beyond software engineering (SWE) tasks toward machine learning engineering (MLE), verifying agent behavior becomes orders of magnitude more expensive: while SWE tasks can be verified via fast-executing unit tests, MLE verification requires running full ML pipelines -- data preprocessing, model training, and metric evaluation -- on large datasets at each rollout step, rendering trajectory-wise on-policy reinforcement learning (RL) prohibitively slow. Existing approaches retreat to supervised fine-tuning (SFT) or offline proxy rewards, sacrificing the exploration and generalization benefits of on-policy RL. We observe that sandbox data size is the primary source of this bottleneck. Based on this insight, we introduce SandMLE, a multi-agent framework that generates diverse, verifiable synthetic MLE environments from a small number of seed tasks, preserving the structural and technical complexity of real-world problems while constraining datasets to micro-scale (each task is paired with only 50-200 training samples). Through extensive experiments, we show that SandMLE reduces execution time by more than a factor of 13, enabling large-scale, on-policy trajectory-wise RL for the first time in the MLE domain. On MLE-bench-lite, SandMLE yields significant gains over SFT baselines across Qwen3-8B, 14B, and 30B-A3B, with relative medal rate improvements ranging from 20.3% to 66.9%. Furthermore, the trained policy generalizes across unseen agentic scaffolds, achieving up to 32.4% better HumanRank score on MLE-Dojo.

cross Rethinking Exploration in RLVR: From Entropy Regularization to Refinement via Bidirectional Entropy Modulation

Authors: Hengrui Gu, Xiaotian Han, Yujing Bian, Kaixiong Zhou

Abstract: Reinforcement learning with verifiable rewards (RLVR) has significantly advanced the reasoning capabilities of large language models (LLMs). However, it faces a fundamental limitation termed \textit{restricted exploration}, where the policy rapidly converges to a narrow set of solutions. While entropy regularization is a popular approach used to sustain exploration, it often proves unreliable for LLMs, suffering from high hyperparameter sensitivity and yielding only marginal performance gains. Motivated by these inefficiencies, we propose to rethink the relationship between policy entropy and exploration. By deriving a parametric formulation of group-relative advantage estimation and analyzing entropy dynamics, we conceptually decompose policy entropy into \textit{informative entropy}, which preserves diverse solution paths, and \textit{spurious entropy}, which erodes reasoning patterns. Our analysis reveals that, in contrast to blind maximization, effective exploration requires \textit{entropy refinement}-a mechanism implicitly embedded in group-relative advantage estimation that sustains informative entropy on positive rollouts while suppressing spurious entropy on negative ones. Guided by this insight, we propose \textbf{AsymGRPO}, an exploratory framework that explicitly decouples the modulation of positive and negative rollouts. This allows for independent control over the preservation of informative entropy and the suppression of spurious noise. Extensive experiments demonstrate that AsymGRPO achieves superior performance compared to strong baselines and exhibits the potential to synergize with existing entropy regularization methods.
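For reference, the group-relative advantage estimator whose parametric form the paper analyzes can be sketched as per-group reward standardization (baseline GRPO-style machinery; AsymGRPO's asymmetric modulation of positive and negative rollouts is not reproduced here):

```python
import numpy as np

def group_relative_advantage(rewards, eps=1e-8):
    """Group-relative advantage estimation as used in GRPO-style RLVR:
    each rollout's verifiable reward is standardized against the mean and
    standard deviation of its own rollout group."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# a group of 4 rollouts for the same prompt: two correct, two incorrect
adv = group_relative_advantage([1.0, 0.0, 0.0, 1.0])
```

Because the advantages within a group sum to zero, positive and negative rollouts receive opposite-signed updates, which is the asymmetry the paper's bidirectional entropy modulation makes explicit and controllable.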

cross QED-Nano: Teaching a Tiny Model to Prove Hard Theorems

Authors: LM-Provers, Yuxiao Qu, Amrith Setlur, Jasper Dekoninck, Edward Beeching, Jia Li, Ian Wu, Lewis Tunstall, Aviral Kumar

Abstract: Proprietary AI systems have recently demonstrated impressive capabilities on complex proof-based problems, with gold-level performance reported at the 2025 International Mathematical Olympiad (IMO). However, the training pipelines behind these systems remain largely undisclosed, and their reliance on large "internal" models and scaffolds makes them expensive to run, difficult to reproduce, and hard to study or improve upon. This raises a central question: can small, open models also be trained to achieve competitive reasoning performance on difficult Olympiad-level math? In this paper, we answer this question by building QED-Nano, a 4B model post-trained for Olympiad-level proofs. Our training recipe has three stages: (1) supervised fine-tuning to imbue good proof-writing styles by distilling from DeepSeek-Math-V2, (2) reinforcement learning (RL) with rubric-based rewards, and (3) expanding RL with a reasoning cache, which decomposes long proofs into iterative summarize-and-refine cycles and enables stronger test-time reasoning. QED-Nano surpasses the proof-generation performance of much larger open models, including Nomos-1 and GPT-OSS-120B, and approaches the performance of proprietary models like Gemini 3 Pro, at a fraction of the inference cost. To support further research on open mathematical reasoning, we release the full QED-Nano pipeline, including the QED-Nano and QED-Nano-SFT models, the FineProofs-SFT and FineProofs-RL datasets, and the training and evaluation code.

cross Analyzing Symbolic Properties for DRL Agents in Systems and Networking

Authors: Mohammad Zangooei, Jannis Weil, Amr Rizk, Mina Tahmasbi Arashloo, Raouf Boutaba

Abstract: Deep reinforcement learning (DRL) has shown remarkable performance on complex control problems in systems and networking, including adaptive video streaming, wireless resource management, and congestion control. For safe deployment, however, it is critical to reason about how agents behave across the range of system states they encounter in practice. Existing verification-based methods in this domain primarily focus on point properties, defined around fixed input states, which offer limited coverage and require substantial manual effort to identify relevant input-output pairs for analysis. In this paper, we study symbolic properties, which specify expected behavior over ranges of input states, for DRL agents in systems and networking. We present a generic formulation for symbolic properties, with monotonicity and robustness as concrete examples, and show how they can be analyzed using existing DNN verification engines. Our approach encodes symbolic properties as comparisons between related executions of the same policy and decomposes them into practically tractable sub-properties. These techniques serve as practical enablers for applying existing verification tools to symbolic analysis. Using our framework, diffRL, we conduct an extensive empirical study across three DRL-based control systems: adaptive video streaming, wireless resource management, and congestion control. Through these case studies, we analyze symbolic properties over broad input ranges, examine how property satisfaction evolves during training, study the impact of model size on verifiability, and compare multiple verification backends. Our results show that symbolic properties provide substantially broader coverage than point properties and can uncover non-obvious, operationally meaningful counterexamples, while also revealing practical solver trade-offs and limitations.
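The "comparison of related executions" idea can be illustrated with monotonicity (this sampled-pair check is only a caricature; the verification engines the paper targets certify the property over continuous input ranges, not finite samples):

```python
def monotonicity_counterexample(policy, pairs):
    """Check a monotonicity property by comparing two related executions of
    the same policy: for each ordered pair with x_lo <= x_hi, require
    policy(x_lo) <= policy(x_hi). Returns the first violating pair, or None."""
    for x_lo, x_hi in pairs:
        if policy(x_lo) > policy(x_hi):
            return (x_lo, x_hi)  # counterexample found
    return None  # property holds on all tested pairs

# a decreasing "policy" violates the property on the pair (0, 1)
cex = monotonicity_counterexample(lambda x: -x, [(0.0, 1.0)])
```

Encoding the property as a relation between two copies of the same network is what lets off-the-shelf DNN verifiers be reused for symbolic analysis.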

cross PINNs in PDE Constrained Optimal Control Problems: Direct vs Indirect Methods

Authors: Zhen Zhang, Shanqing Liu, Alessandro Alla, Jerome Darbon, George Em Karniadakis

Abstract: We study physics-informed neural networks (PINNs) as numerical tools for the optimal control of semilinear partial differential equations. We first recall the classical direct and indirect viewpoints for optimal control of PDEs, and then present two PINN formulations: a direct formulation based on minimizing the objective under the state constraint, and an indirect formulation based on the first-order optimality system. For a class of semilinear parabolic equations, we derive the state equation, the adjoint equation, and the stationarity condition in a form consistent with continuous-time Pontryagin-type optimality conditions. We then specialize the framework to an Allen-Cahn control problem and compare three numerical approaches: (i) a discretize-then-optimize adjoint method, (ii) a direct PINN, and (iii) an indirect PINN. Numerical results show that the PINN parameterization has an implicit regularizing effect, in the sense that it tends to produce smoother control profiles. They also indicate that the indirect PINN more faithfully preserves the PDE constraint and optimality structure and yields a more accurate neural approximation than the direct PINN.
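As a generic instance of the optimality system that an indirect formulation enforces (standard form for a tracking-type semilinear parabolic problem; the functional and data are illustrative, not the paper's exact Allen-Cahn setup):

\[
\min_u\; J(y,u) = \tfrac{1}{2}\|y - y_d\|_{L^2}^2 + \tfrac{\alpha}{2}\|u\|_{L^2}^2
\quad\text{s.t.}\quad \partial_t y - \Delta y + f(y) = u,
\]
with adjoint state $p$ satisfying the terminal condition $p(T) = 0$ and
\begin{align*}
\partial_t y - \Delta y + f(y) &= u && \text{(state equation)} \\
-\partial_t p - \Delta p + f'(y)\,p &= y - y_d && \text{(adjoint equation)} \\
\alpha u + p &= 0 && \text{(stationarity)}.
\end{align*}
A direct PINN penalizes $J$ plus the state-equation residual, whereas an indirect PINN penalizes the residuals of all three conditions above.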

cross Early Stopping for Large Reasoning Models via Confidence Dynamics

Authors: Parsa Hosseini, Sumit Nawathe, Mahdi Salmani, Meisam Razaviyayn, Soheil Feizi

Abstract: Large reasoning models rely on long chain-of-thought generation to solve complex problems, but extended reasoning often incurs substantial computational cost and can even degrade performance due to overthinking. A key challenge is determining when the model should stop reasoning and produce the final answer. In this work, we study the confidence of intermediate answers during reasoning and observe two characteristic behaviors: correct reasoning trajectories often reach high-confidence answers early, while incorrect rollouts tend to produce long, unproductive reasoning traces and exhibit less reliable confidence dynamics. Motivated by these observations, we propose CoDE-Stop (Confidence Dynamics Early Stop), an early stopping method that leverages the dynamics of intermediate answer confidence to decide when to terminate reasoning, requiring no additional training and easily integrating into existing models. We evaluate CoDE-Stop on diverse reasoning and science benchmarks across multiple models. Compared to prior early stopping methods, it achieves a more favorable accuracy-compute tradeoff and reduces total token usage by 25-50% compared to standard full-length reasoning. In addition, we provide analyses of confidence dynamics during reasoning, offering insights into how confidence changes in both correct and incorrect trajectories.
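The stopping rule can be caricatured as a streak test on intermediate-answer confidence (the threshold, patience, and function below are our assumptions; CoDE-Stop's actual criterion uses richer confidence dynamics than a single fixed threshold):

```python
def confidence_early_stop(confidences, threshold=0.9, patience=2):
    """Hypothetical sketch of a confidence-based early stop: terminate
    reasoning once the intermediate-answer confidence stays above a
    threshold for `patience` consecutive checkpoints."""
    streak = 0
    for step, c in enumerate(confidences):
        streak = streak + 1 if c >= threshold else 0
        if streak >= patience:
            return step  # index at which reasoning would be terminated
    return None  # never stopped: run to full length

stop = confidence_early_stop([0.3, 0.5, 0.92, 0.95, 0.97])
```

Such a rule needs no extra training: it only reads the model's own confidence on intermediate answers, which is why it can be layered onto existing models.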

replace Mitigating Value Hallucination in Dyna Planning via Multistep Predecessor Models

Authors: Farzane Aminmansour, Taher Jafferjee, Ehsan Imani, Erin Talvitie, Michael Bowling, Martha White

Abstract: Dyna-style reinforcement learning (RL) agents improve sample efficiency over model-free RL agents by updating the value function with simulated experience generated by an environment model. However, it is often difficult to learn accurate models of environment dynamics, and even small errors may result in failure of Dyna agents. In this paper, we highlight that one potential cause of that failure is bootstrapping off of the values of simulated states, and introduce a new Dyna algorithm to avoid this failure. We discuss a design space of Dyna algorithms, based on using successor or predecessor models -- simulating forwards or backwards -- and using one-step or multi-step updates. Three of the variants have been explored, but surprisingly the fourth variant has not: using predecessor models with multi-step updates. We present the \emph{Hallucinated Value Hypothesis} (HVH): updating the values of real states towards values of simulated states can result in misleading action values which adversely affect the control policy. We discuss and evaluate all four variants of Dyna, three of which update real states toward simulated states -- so potentially toward hallucinated values -- while our proposed approach does not. The experimental results provide evidence for the HVH, and suggest that using predecessor models with multi-step updates is a promising direction toward developing Dyna algorithms that are more robust to model error.

replace Graph State-Space Models and Latent Relational Inference

Authors: Daniele Zambon, Andrea Cini, Cesare Alippi

Abstract: State-space models effectively model multivariate time series by updating over time a representation of the system state from which predictions are made. The state representation is usually a vector without any explicit structure. Relational inductive biases, e.g., associated with dependencies among input signals and state representations, are not explicitly exploited during processing, leaving unattended opportunities for effective modeling. The manuscript aims to fill this gap by matching state-space modeling and spatio-temporal data where the relational information, say the functional graph capturing latent dependencies, is learned directly from time series. In particular, we propose Graph State-Space Models, a novel probabilistic framework that jointly learns state-space dynamics and latent relational structures end-to-end on downstream tasks. The proposed framework generalizes several state-of-the-art methods and, as we show, is effective in extracting meaningful latent relational structures and obtaining accurate forecasts.

replace Neural Exploitation and Exploration of Contextual Bandits

Authors: Yikun Ban, Yuchen Yan, Arindam Banerjee, Jingrui He

Abstract: In this paper, we study utilizing neural networks for the exploitation and exploration of contextual multi-armed bandits. Contextual multi-armed bandits have been studied for decades with various applications. To solve the exploitation-exploration trade-off in bandits, there are three main techniques: epsilon-greedy, Thompson Sampling (TS), and Upper Confidence Bound (UCB). In recent literature, a series of neural bandit algorithms have been proposed to adapt to the non-linear reward function, combined with TS or UCB strategies for exploration. In this paper, instead of calculating a large-deviation based statistical bound for exploration like previous methods, we propose ``EE-Net,'' a novel neural-based exploitation and exploration strategy. In addition to using a neural network (Exploitation network) to learn the reward function, EE-Net uses another neural network (Exploration network) to adaptively learn the potential gains compared to the currently estimated reward for exploration. We provide an instance-based $\widetilde{\mathcal{O}}(\sqrt{T})$ regret upper bound for EE-Net and show that EE-Net outperforms related linear and neural contextual bandit baselines on real-world datasets.
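A toy caricature of the two-network idea (linear models stand in for EE-Net's deep networks, and EE-Net actually feeds gradient features of the exploitation network into the exploration network; both simplifications are ours):

```python
import numpy as np

rng = np.random.default_rng(2)
d, n_arms = 5, 4

w1 = np.zeros(d)  # "exploitation network": estimates the reward
w2 = np.zeros(d)  # "exploration network": estimates the gain over that estimate

def select_arm(contexts):
    # act greedily on exploitation score plus learned exploration score
    return int(np.argmax(contexts @ w1 + contexts @ w2))

def update(x, reward, lr=0.1):
    global w1, w2
    gain = reward - x @ w1            # realized gain over the current estimate
    w1 += lr * (reward - x @ w1) * x  # fit the reward
    w2 += lr * (gain - x @ w2) * x    # fit the potential gain

contexts = rng.standard_normal((n_arms, d))
arm = select_arm(contexts)
update(contexts[arm], reward=1.0)
```

The point of the design is that exploration is itself learned from data, rather than derived from a hand-crafted confidence bound.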

replace Sparse Gaussian Graphical Models with Discrete Optimization: Computational and Statistical Perspectives

Authors: Kayhan Behdin, Wenyu Chen, Rahul Mazumder

Abstract: We consider the problem of learning a sparse graph underlying an undirected Gaussian graphical model, a key problem in statistical machine learning. Given $n$ samples from a multivariate Gaussian distribution with $p$ variables, the goal is to estimate the $p \times p$ inverse covariance matrix (aka precision matrix), assuming it is sparse (i.e., has a few nonzero entries). We propose GraphL0BnB, a new estimator based on an $\ell_0$-penalized version of the pseudo-likelihood function, while most earlier approaches are based on the $\ell_1$-relaxation. Our estimator can be formulated as a convex mixed integer program (MIP) which can be difficult to compute beyond $p\approx 100$ using off-the-shelf commercial solvers. To solve the MIP, we propose a custom nonlinear branch-and-bound (BnB) framework that solves node relaxations with tailored first-order methods. As a key component of our BnB framework, we propose large-scale solvers for obtaining good primal solutions that are of independent interest. We derive novel statistical guarantees (estimation and variable selection) for our estimator and discuss how our approach improves upon existing estimators. Our numerical experiments on real and synthetic datasets suggest that our BnB framework offers significant advantages over off-the-shelf commercial solvers, and our approach has favorable performance (both in terms of runtime and statistical performance) compared to the state-of-the-art approaches for learning sparse graphical models.

replace Federated Transfer Learning with Differential Privacy

Authors: Mengchu Li, Ye Tian, Yang Feng, Yi Yu

Abstract: Federated learning has emerged as a powerful framework for analysing distributed data, yet two challenges remain pivotal: heterogeneity across sites and privacy of local data. In this paper, we address both challenges within a federated transfer learning framework, aiming to enhance learning on a target data set by leveraging information from multiple heterogeneous source data sets while adhering to privacy constraints. We rigorously formulate the notion of federated differential privacy, which offers privacy guarantees for each data set without assuming a trusted central server. Under this privacy model, we study four statistical problems: univariate mean estimation, low-dimensional linear regression, high-dimensional linear regression, and M-estimation. By investigating the minimax rates and quantifying the cost of privacy, we show that federated differential privacy is an intermediate privacy model between the well-established local and central models of differential privacy. Our analyses account for data heterogeneity and privacy, highlighting the fundamental costs associated with each factor and the benefits of knowledge transfer in federated learning.
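A minimal per-site mechanism in this spirit for univariate mean estimation (a standard clipped, Gaussian-noised release with our parameter names; the paper's federated DP model, which avoids a trusted central server, and its minimax analysis are considerably more refined):

```python
import numpy as np

def private_site_mean(x, clip, sigma_mult, rng):
    """One site's release: clip observations to [-clip, clip], then add
    Gaussian noise scaled to the clipped mean's sensitivity 2*clip/n."""
    clipped = np.clip(x, -clip, clip)
    sensitivity = 2.0 * clip / len(x)
    return clipped.mean() + rng.normal(0.0, sigma_mult * sensitivity)

rng = np.random.default_rng(3)
sites = [rng.normal(1.0, 1.0, 500) for _ in range(5)]  # 5 source data sets
estimate = float(np.mean([private_site_mean(x, 5.0, 1.0, rng) for x in sites]))
```

Averaging noisy site-level releases is the simplest form of the privacy-utility trade-off the paper quantifies: noise added per site protects local data, at a cost in estimation accuracy that the minimax rates make precise.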

replace FedScalar: Federated Learning with Scalar Communication for Bandwidth-Constrained Networks

Authors: M. Rostami, S. S. Kia

Abstract: In bandwidth-constrained federated learning~(FL) settings, the repeated upload of high-dimensional model updates from agents to a central server constitutes the primary bottleneck, often rendering standard FL infeasible within practical communication budgets. We propose \emph{FedScalar}, a communication-efficient FL algorithm in which each agent uploads only two scalar values per round, regardless of the model dimension~$d$. Each agent encodes its local update difference as an inner product with a locally generated random vector and transmits the resulting scalar together with the generating seed, enabling the server to reconstruct an unbiased gradient estimate without any high-dimensional transmission. We prove that \emph{FedScalar} achieves a convergence rate of $O(d/\sqrt{K})$ to a stationary point for smooth non-convex loss functions, and show that adopting a Rademacher distribution for the random vector reduces the aggregation variance compared to the Gaussian case. Numerical simulations confirm that the dimension-free upload cost translates into significant improvements in wall-clock time and energy efficiency over \emph{FedAvg} and \emph{QSGD} in bandwidth-constrained settings.
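The scalar-encoding mechanism described in the abstract can be sketched in a few lines. This is a hedged illustration, not the paper's code: names, shapes, and the Rademacher choice via `numpy.random.default_rng` are illustrative; the key property shown is that averaging `s * v` over seeds recovers the update unbiasedly, since `E[v v^T] = I` for Rademacher vectors.

```python
import numpy as np

def agent_upload(delta, seed):
    # Encode the d-dimensional update as one scalar: s = <v, delta>,
    # where v is a Rademacher vector regenerated from `seed`.
    rng = np.random.default_rng(seed)
    v = rng.choice([-1.0, 1.0], size=delta.shape)
    return float(v @ delta), seed

def server_reconstruct(scalar, seed, d):
    # Regenerate the same v from the seed and form s * v,
    # an unbiased estimate of delta (E[v v^T] = I for Rademacher v).
    rng = np.random.default_rng(seed)
    v = rng.choice([-1.0, 1.0], size=d)
    return scalar * v

d = 8
delta = np.arange(1.0, d + 1.0)
# averaging reconstructions over many independent seeds approaches delta
est = np.mean(
    [server_reconstruct(*agent_upload(delta, s), d) for s in range(20000)],
    axis=0,
)
```

Only the scalar and the seed cross the network per round, so the upload cost is independent of the model dimension `d`.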

replace EventFlow: Forecasting Temporal Point Processes with Flow Matching

Authors: Gavin Kerrigan, Kai Nelson, Padhraic Smyth

Abstract: Continuous-time event sequences, in which events occur at irregular intervals, are ubiquitous across a wide range of industrial and scientific domains. The contemporary modeling paradigm is to treat such data as realizations of a temporal point process, and in machine learning it is common to model temporal point processes in an autoregressive fashion using a neural network. While autoregressive models are successful in predicting the time of a single subsequent event, their performance can degrade when forecasting longer horizons due to cascading errors and myopic predictions. We propose EventFlow, a non-autoregressive generative model for temporal point processes. The model builds on the flow matching framework in order to directly learn joint distributions over event times, side-stepping the autoregressive process. EventFlow is simple to implement and achieves a 20%-53% lower forecast error than the nearest baseline on standard TPP benchmarks while simultaneously using fewer model calls at sampling time.

replace Multi-Agent Environments for Vehicle Routing Problems

Authors: Ricardo Gama, Ricardo Cunha, Daniel Fuertes, Carlos R. del-Blanco, Hugo L. Fernandes

Abstract: Research on Reinforcement Learning (RL) approaches for discrete optimization problems has increased considerably, extending RL to areas classically dominated by Operations Research (OR). Vehicle routing problems are a good example of discrete optimization problems with high practical relevance, for which RL techniques have achieved notable success. Despite these advances, open-source development frameworks remain scarce, hindering both algorithm testing and objective comparison of results. This situation ultimately slows down progress in the field and limits the exchange of ideas between the RL and OR communities. Here, we propose the MAEnvs4VRP library, a unified framework for multi-agent vehicle routing environments that supports classical, dynamic, stochastic, and multi-task problem variants within a single modular design. The library, built on PyTorch, provides a flexible architecture that facilitates customization and the incorporation of new routing problems. It follows the Agent Environment Cycle ("AEC") games model and features an intuitive API, enabling rapid adoption and seamless integration into existing reinforcement learning frameworks. The project source code can be found at https://github.com/ricgama/maenvs4vrp.

URLs: https://github.com/ricgama/maenvs4vrp.

replace Certified Training with Branch-and-Bound for Lyapunov-stable Neural Control

Authors: Zhouxing Shi, Haoyu Li, Cho-Jui Hsieh, Huan Zhang

Abstract: We study the problem of learning verifiably Lyapunov-stable neural controllers that provably satisfy the Lyapunov asymptotic stability condition within a region-of-attraction (ROA). Unlike previous works that adopted counterexample-guided training without considering the computation of verification in training, we introduce Certified Training with Branch-and-Bound (CT-BaB), a new certified training framework that optimizes certified bounds, thereby reducing the discrepancy between training and test-time verification that also computes certified bounds. To achieve a relatively global guarantee on an entire input region-of-interest, we propose a training-time BaB technique that maintains a dynamic training dataset and adaptively splits hard input subregions into smaller ones, to tighten certified bounds and ease the training. Meanwhile, subregions created by the training-time BaB also inform test-time verification, for a more efficient training-aware verification. We demonstrate that CT-BaB yields verification-friendly models that can be more efficiently verified at test time while achieving stronger verifiable guarantees with larger ROA. On the largest system in our experiments, an output-feedback 2D quadrotor, CT-BaB reduces verification time by over 11X relative to the previous state-of-the-art baseline based on Counterexample-Guided Inductive Synthesis (CEGIS), while achieving a 164X larger ROA. Code is available at https://github.com/shizhouxing/CT-BaB.

URLs: https://github.com/shizhouxing/CT-BaB.

replace Amortized Safe Active Learning for Real-Time Data Acquisition: Pretrained Neural Policies From Simulated Nonparametric Functions

Authors: Cen-You Li, Marc Toussaint, Barbara Rakitsch, Christoph Zimmer

Abstract: Safe active learning (AL) is a sequential scheme for learning unknown systems while respecting safety constraints during data acquisition. Existing methods often rely on Gaussian processes (GPs) to model the task and safety constraints, requiring repeated GP updates and constrained acquisition optimization, incurring significant computational cost that is challenging for real-time decision-making. We propose amortized AL for regression and amortized safe AL, replacing expensive online computations with a pretrained neural policy. Inspired by recent advances in amortized Bayesian experimental design, we leverage GPs as pretraining simulators. We train our policy prior to the AL deployment on simulated nonparametric functions, using Fourier feature-based GP sampling and a differentiable acquisition objective that is safety-aware in the safe AL setting. At deployment, our policy selects informative and (if desired) safe queries via a single forward pass, eliminating GP inference and acquisition optimization. This leads to orders-of-magnitude speed improvements while preserving learning quality. Our framework is modular and, without the safety component, yields fast unconstrained AL for time-sensitive tasks.

replace Causal Bandit Over Unknown Graphs: Upper Confidence Bounds With Backdoor Adjustment

Authors: Yijia Zhao, Qing Zhou

Abstract: The causal bandit problem seeks to identify, through sequential experimentation, an intervention that maximizes the expected reward in a causal system modeled by a directed acyclic graph (DAG). Existing methods typically assume that the causal graph is known or impose restrictive structural assumptions. In this paper, we study causal bandit problems when the causal graph is unknown. We first consider Gaussian DAG models without latent confounders. By combining observational and experimental data collected sequentially during the bandit process, we identify candidate backdoor adjustment sets for each intervention arm. These sets enable estimation of causal effects and construction of upper confidence bounds that integrate information from both data sources. Based on these estimates, we propose a new algorithm, termed backdoor-adjustment upper confidence bound (BA-UCB), for sequential intervention selection. We establish finite-time upper bounds on the cumulative regret of BA-UCB, showing improved rates and substantially relaxed dependency on the number of intervention arms compared to standard multi-armed bandit methods. We further extend the methodology and theoretical guarantees to settings with latent confounders, where the observed variables are modeled by an acyclic directed mixed graph. Simulation studies demonstrate that BA-UCB achieves substantially lower cumulative regret and favorable computational efficiency relative to existing approaches.

replace From Restless to Contextual: A Thresholding Bandit Reformulation For Finite-horizon Improvement

Authors: Jiamin Xu, Ivan Nazarov, Aditya Rastogi, \'Africa Peri\'a\~nez, Kyra Gan

Abstract: This paper addresses the poor finite-horizon performance of existing online \emph{restless bandit} (RB) algorithms, which stems from the prohibitive sample complexity of learning a full \emph{Markov decision process} (MDP) for each agent. We argue that superior finite-horizon performance requires \emph{rapid convergence} to a \emph{high-quality} policy. Thus motivated, we introduce a reformulation of online RBs as a \emph{budgeted thresholding contextual bandit}, which simplifies the learning problem by encoding long-term state transitions into a scalar reward. We prove the first non-asymptotic optimality of an oracle policy for a simplified finite-horizon setting. We propose a practical learning policy under a heterogeneous-agent, multi-state setting, and show that it achieves sublinear regret with \emph{faster convergence} than existing methods. This directly translates to higher cumulative reward, as empirically validated by significant gains over state-of-the-art algorithms in large-scale heterogeneous environments. The code is provided in \href{https://github.com/jamie01713/EGT}{github}. Our work provides a new pathway for achieving practical, sample-efficient learning in finite-horizon RBs.

URLs: https://github.com/jamie01713/EGT

replace RESIST: Resilient Decentralized Learning Using Consensus Gradient Descent

Authors: Cheng Fang, Rishabh Dixit, Waheed U. Bajwa, Mert Gurbuzbalaban

Abstract: Empirical risk minimization (ERM) is a cornerstone of modern machine learning (ML), supported by advances in optimization theory that ensure efficient solutions with provable algorithmic and statistical learning rates. Privacy, memory, computation, and communication constraints necessitate data collection, processing, and storage across network-connected devices. In many applications, networks operate in decentralized settings where a central server cannot be assumed, requiring decentralized ML algorithms that are efficient and resilient. Decentralized learning, however, faces significant challenges, including an increased attack surface. This paper focuses on the man-in-the-middle (MITM) attack, wherein adversaries exploit communication vulnerabilities to inject malicious updates during training, potentially causing models to deviate from their intended ERM solutions. To address this challenge, we propose RESIST (Resilient dEcentralized learning using conSensus gradIent deScenT), an optimization algorithm designed to be robust against adversarially compromised communication links, where transmitted information may be arbitrarily altered before being received. Unlike existing adversarially robust decentralized learning methods, which often (i) guarantee convergence only to a neighborhood of the solution, (ii) lack guarantees of linear convergence for strongly convex problems, or (iii) fail to ensure statistical consistency as sample sizes grow, RESIST overcomes all three limitations. It achieves algorithmic and statistical convergence for strongly convex, Polyak-Lojasiewicz, and nonconvex ERM problems by employing a multistep consensus gradient descent framework and robust statistics-based screening methods to mitigate the impact of MITM attacks. Experimental results demonstrate the robustness and scalability of RESIST across attack strategies, screening methods, and loss functions.

replace Model Privacy: A Unified Framework for Understanding Model Stealing Attacks and Defenses

Authors: Ganghua Wang, Yuhong Yang, Jie Ding

Abstract: The use of machine learning (ML) has become increasingly prevalent in various domains, highlighting the importance of understanding and ensuring its safety. One pressing concern is the vulnerability of ML applications to model stealing attacks. These attacks involve adversaries attempting to recover a learned model through limited query-response interactions, such as those found in cloud-based services or on-chip artificial intelligence interfaces. While existing literature proposes various attack and defense strategies, these often lack a theoretical foundation and standardized evaluation criteria. In response, this work presents a framework called ``Model Privacy'', providing a foundation for comprehensively analyzing model stealing attacks and defenses. We establish a rigorous formulation for the threat model and objectives, propose methods to quantify the goodness of attack and defense strategies, and analyze the fundamental tradeoffs between utility and privacy in ML models. Our developed theory offers valuable insights into enhancing the security of ML models, especially highlighting the importance of the attack-specific structure of perturbations for effective defenses. We demonstrate the application of model privacy from the defender's perspective through various learning scenarios. Extensive experiments corroborate the insights and the effectiveness of defense mechanisms developed under the proposed framework.

replace An Analytical Theory of Spectral Bias in the Learning Dynamics of Diffusion Models

Authors: Binxu Wang, Cengiz Pehlevan

Abstract: We develop an analytical framework for understanding how the generated distribution evolves during diffusion model training. Leveraging a Gaussian-equivalence principle, we solve the full-batch gradient-flow dynamics of linear and convolutional denoisers and integrate the resulting probability-flow ODE, yielding analytic expressions for the generated distribution. The theory exposes a universal inverse-variance spectral law: the time for an eigen- or Fourier mode to match its target variance scales as $\tau\propto\lambda^{-1}$, so high-variance (coarse) structure is mastered orders of magnitude sooner than low-variance (fine) detail. Extending the analysis to deep linear networks and circulant full-width convolutions shows that weight sharing merely multiplies learning rates -- accelerating but not eliminating the bias -- whereas local convolution introduces a qualitatively different bias. Experiments on Gaussian and natural-image datasets confirm that the spectral law persists in deep MLP-based U-Nets. Convolutional U-Nets, however, display rapid, near-simultaneous emergence of many modes, implicating local convolution in reshaping learning dynamics. These results underscore how data covariance governs the order and speed with which diffusion models learn, and they call for deeper investigation of the unique inductive biases introduced by local convolution.
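The inverse-variance law $\tau\propto\lambda^{-1}$ can be illustrated with a toy relaxation, a sketch under assumed schematic dynamics rather than the paper's exact solution: a single mode's generated variance relaxes toward its target $\lambda$ at a rate itself proportional to $\lambda$, so reaching any fixed fraction of the target takes time proportional to $1/\lambda$.

```python
import numpy as np

def time_to_fraction(lam, frac=0.9, dt=1e-4, t_max=1e4):
    # Schematic one-mode dynamics: v'(t) = lam * (lam - v), v(0) = 0.
    # The relaxation rate is proportional to lam, so the hitting time
    # of frac*lam scales as 1/lam (here: t = ln(1/(1-frac)) / lam).
    v, t = 0.0, 0.0
    while v < frac * lam and t < t_max:
        v += dt * lam * (lam - v)
        t += dt
    return t

t_hi = time_to_fraction(lam=4.0)  # high-variance (coarse) mode
t_lo = time_to_fraction(lam=1.0)  # low-variance (fine) mode
# the low-variance mode takes ~4x longer: tau is proportional to 1/lambda
```

A 4x smaller target variance yields a 4x longer learning time in this toy, mirroring the coarse-before-fine ordering described above.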

replace RaPA: Enhancing Transferable Targeted Attacks via Random Parameter Pruning

Authors: Tongrui Su, Qingbin Li, Shengyu Zhu, Wei Chen, Xueqi Cheng

Abstract: Compared to untargeted attacks, targeted transfer-based attacks still suffer from much lower Attack Success Rates (ASRs), despite significant improvements achieved by a variety of methods, such as diversifying inputs, stabilizing gradients, and re-training surrogate models. In this paper, we find that adversarial examples generated by existing methods rely heavily on a small subset of surrogate model parameters, which in turn limits their transferability to unseen target models. Inspired by this, we propose the Random Parameter Pruning Attack (RaPA), which introduces parameter-level randomization during the attack process. At each optimization step, RaPA randomly prunes model parameters to generate diverse yet semantically consistent surrogate variants. We show this parameter-level randomization is equivalent to adding an importance-equalization regularizer, thereby alleviating the over-reliance issue. Extensive experiments across both CNN and Transformer architectures demonstrate that RaPA substantially enhances transferability. In the challenging case of transferring from CNN-based to Transformer-based models, RaPA achieves up to 11.7% higher average ASRs than state-of-the-art baselines (with 33.3% ASRs), while being training-free, cross-architecture efficient, and easily integrated into existing attack frameworks. Code is available at https://github.com/molarsu/RaPA.

URLs: https://github.com/molarsu/RaPA.

replace From Set Convergence to Pointwise Convergence: Finite-Time Guarantees for Average-Reward Q-Learning with Adaptive Stepsizes

Authors: Zaiwei Chen, Phalguni Nanda

Abstract: This work presents the first finite-time analysis for the last-iterate convergence of average-reward $Q$-learning with an asynchronous implementation. A key feature of the algorithm we study is the use of adaptive stepsizes, which serve as local clocks for each state-action pair. We show that, under appropriate assumptions, the iterates generated by this $Q$-learning algorithm converge at a rate of $\tilde{\mathcal{O}}(1/k)$ (in the mean-square sense) to the optimal $Q$-function in the span seminorm. Moreover, by adding a centering step to the algorithm, we further establish pointwise mean-square convergence to the centered optimal $Q$-function, also at a rate of $\tilde{\mathcal{O}}(1/k)$. To prove these results, we show that adaptive stepsizes are necessary, as without them, the algorithm fails to converge to the correct target. In addition, adaptive stepsizes can be interpreted as a form of implicit importance sampling that counteracts the effects of asynchronous updates. Technically, the use of adaptive stepsizes makes each $Q$-learning update depend on the entire sample history, introducing strong correlations and making the algorithm a non-Markovian stochastic approximation (SA) scheme. Our approach to overcoming this challenge involves (1) a time-inhomogeneous Markovian reformulation of non-Markovian SA, and (2) a combination of almost-sure time-varying bounds, conditioning arguments, and Markov chain concentration inequalities to break the strong correlations between the adaptive stepsizes and the iterates. The tools developed in this work are likely to be broadly applicable to the analysis of general SA algorithms with adaptive stepsizes.
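The "local clock" interpretation of the adaptive stepsizes can be made concrete with a minimal bookkeeping sketch, assuming the stepsize $1/N(s,a)$ where $N(s,a)$ counts visits to each pair (an illustration of the mechanism, not the paper's full average-reward algorithm): with this stepsize, each pair's estimate is exactly the sample mean of its own targets, no matter how asynchronously pairs are visited.

```python
import random

def running_mean_update(q, n, target):
    # Adaptive stepsize 1/n acts as a local clock: after n visits,
    # q equals the sample mean of the n targets seen for that pair.
    return q + (1.0 / n) * (target - q)

random.seed(0)
pairs = ["a", "b", "c"]          # stand-ins for state-action pairs
counts, sums, q = {}, {}, {}
for _ in range(10000):
    sa = random.choice(pairs)            # asynchronous visit order
    target = random.gauss(0.0, 1.0)      # stand-in for the TD target
    counts[sa] = counts.get(sa, 0) + 1
    sums[sa] = sums.get(sa, 0.0) + target
    q[sa] = running_mean_update(q.get(sa, 0.0), counts[sa], target)
```

Each pair averages over its own visit count rather than global time, which is the implicit importance-sampling effect mentioned above.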

replace A Multi-Level Causal Intervention Framework for Mechanistic Interpretability in Variational Autoencoders

Authors: Dip Roy, Rajiv Misra, Sanjay Kumar Singh, Anisha Roy

Abstract: Understanding how generative models represent and transform data is a foundational problem in deep learning interpretability. While mechanistic interpretability of discriminative architectures has yielded substantial insights, relatively little work has addressed variational autoencoders (VAEs). This paper presents the first general-purpose multilevel causal intervention framework for mechanistic interpretability of VAEs. The framework comprises four manipulation types: input manipulation, latent-space perturbation, activation patching, and causal mediation analysis. We also define three new quantitative metrics capturing properties not measured by existing disentanglement metrics alone: Causal Effect Strength (CES), intervention specificity, and circuit modularity. We conduct the largest empirical study to date of VAE causal mechanisms across six architectures (standard VAE, beta-VAE, FactorVAE, beta-TC-VAE, DIP-VAE-II, and VQ-VAE) and five benchmarks (dSprites, 3DShapes, MPI3D, CelebA, and SmallNORB), with three seeds per configuration, totaling 90 independent training runs. Our results reveal several findings: (i) a consistent within-dataset negative correlation between CES and DCI disentanglement (the CES-DCI trade-off); (ii) that the KL reweighting mechanism of beta-VAE induces a capacity bottleneck when generative factors approach latent dimensionality, degrading disentanglement on complex datasets; (iii) that no single VAE architecture dominates across all five datasets, with optimal choice depending on dataset structure; and (iv) that CES-based metrics applied to discrete latent spaces (VQ-VAE) yield near-zero values, revealing a critical limitation of continuous-intervention methods for discrete representations. These results provide both a theoretical foundation and comprehensive empirical evaluation for mechanistic interpretability of generative models.

replace Bayesian Hierarchical Invariant Prediction

Authors: Francisco Madaleno, Pernille Julie Viuff Sand, Francisco C. Pereira, Sergio Hernan Garrido Mejia

Abstract: We propose Bayesian Hierarchical Invariant Prediction (BHIP), reframing Invariant Causal Prediction (ICP) through the lens of Hierarchical Bayes. We leverage the hierarchical structure to explicitly test invariance of causal mechanisms under heterogeneous data, resulting in improved computational scalability for a larger number of predictors compared to ICP. Moreover, given its Bayesian nature, BHIP enables the use of prior information. We evaluate BHIP on both synthetic and real-world datasets, demonstrating its potential as an alternative inference method to ICP and related methods.

replace FABLE: A Localized, Targeted Adversarial Attack on Weather Forecasting Models

Authors: Yue Deng, Asadullah Hill Galib, Xin Lan, Jack Gunn, Pang-Ning Tan, Lifeng Luo

Abstract: Deep learning-based weather forecasting (DLWF) models have recently demonstrated significant performance gains over gold-standard physics-based simulation tools. However, these models are potentially vulnerable to adversarial attacks, which raises concerns about their trustworthiness. In this paper, we investigate the feasibility and challenges of applying existing adversarial attack methods to DLWF models and propose a novel framework called FABLE (Forecast Alteration By Localized targeted advErsarial attack) to address them. FABLE performs a 3D discrete wavelet decomposition to disentangle the spatial and temporal components of the data. By regulating the magnitude of adversarial perturbations across different components, FABLE produces adversarial inputs that remain closely aligned with the original inputs while steering the DLWF models toward generating the targeted forecast outcomes. Experimental results on real-world weather datasets demonstrate the effectiveness of FABLE over baseline methods across various metrics.

replace Enforcing Fair Predicted Scores on Intervals of Percentiles by Difference-of-Convex Constraints

Authors: Yutian He, Yankun Huang, Yao Yao, Qihang Lin

Abstract: Fairness in machine learning has become a critical concern. Existing approaches often focus on achieving full fairness across all score ranges generated by predictive models, ensuring fairness in both high- and low-percentile populations. However, this stringent requirement can compromise predictive performance and may not align with the practical fairness concerns of stakeholders. In this work, we propose a novel framework for building partially fair machine learning models that enforce fairness only within a specific percentile interval of interest while maintaining flexibility in other regions. We introduce statistical metrics to evaluate partial fairness within a given percentile interval. To achieve partial fairness, we propose an in-processing method by formulating the model training problem as constrained optimization with difference-of-convex constraints, which can be solved by an inexact difference-of-convex algorithm (IDCA). We provide the complexity analysis of IDCA for finding a nearly KKT point. Through numerical experiments on real-world datasets, we demonstrate that our framework achieves high predictive performance while enforcing partial fairness where it matters most.

replace Understanding Task Representations in Neural Networks via Bayesian Ablation

Authors: Andrew Nam, Declan Campbell, Thomas Griffiths, Jonathan Cohen, Sarah-Jane Leslie

Abstract: Neural networks are powerful tools for cognitive modeling due to their flexibility and emergent properties. However, interpreting their learned representations remains challenging due to their sub-symbolic semantics. In this work, we introduce a novel probabilistic framework for interpreting latent task representations in neural networks. Inspired by Bayesian inference, our approach defines a distribution over representational units to infer their causal contributions to task performance. Using ideas from information theory, we propose a suite of tools and metrics to illuminate key model properties, including representational distributedness, manifold complexity, and polysemanticity.

replace MSDformer: Multi-scale Discrete Transformer For Time Series Generation

Authors: Shibo Feng, Zhicheng Chen, Xi Xiao, Zhong Zhang, Qing Li, Xingyu Gao, Peilin Zhao

Abstract: Discrete Token Modeling (DTM), which employs vector quantization techniques, has demonstrated remarkable success in modeling non-natural language modalities, particularly in time series generation. While our prior work SDformer established the first DTM-based framework to achieve state-of-the-art performance in this domain, two critical limitations persist in existing DTM approaches: 1) their inability to capture multi-scale temporal patterns inherent to complex time series data, and 2) the absence of theoretical foundations to guide model optimization. To address these challenges, we propose a novel multi-scale DTM-based time series generation method, called Multi-Scale Discrete Transformer (MSDformer). MSDformer employs a multi-scale time series tokenizer to learn discrete token representations at multiple scales, which jointly characterize the complex nature of time series data. Subsequently, MSDformer applies a multi-scale autoregressive token modeling technique to capture the multi-scale patterns of time series within the discrete latent space. Theoretically, we validate the effectiveness of the DTM method and the rationality of MSDformer through the rate-distortion theorem. Comprehensive experiments demonstrate that MSDformer significantly outperforms state-of-the-art methods. Both theoretical analysis and experimental results demonstrate that incorporating multi-scale information and modeling multi-scale patterns can substantially enhance the quality of generated time series in DTM-based approaches. Code is available at this repository: https://github.com/kkking-kk/MSDformer.

URLs: https://github.com/kkking-kk/MSDformer.

replace SoSBench: Benchmarking Safety Alignment on Six Scientific Domains

Authors: Fengqing Jiang, Fengbo Ma, Zhangchen Xu, Yuetai Li, Zixin Rao, Bhaskar Ramasubramanian, Luyao Niu, Bo Li, Xianyan Chen, Zhen Xiang, Radha Poovendran

Abstract: Large language models (LLMs) exhibit advancing capabilities in complex tasks, such as reasoning and graduate-level question answering, yet their resilience against misuse, particularly involving scientifically sophisticated risks, remains underexplored. Existing safety benchmarks typically focus either on instructions requiring minimal knowledge comprehension (e.g., ``tell me how to build a bomb") or utilize prompts that are relatively low-risk (e.g., multiple-choice or classification tasks about hazardous content). Consequently, they fail to adequately assess model safety when handling knowledge-intensive, hazardous scenarios. To address this critical gap, we introduce SoSBench, a regulation-grounded, hazard-focused benchmark encompassing six high-risk scientific domains: chemistry, biology, medicine, pharmacology, physics, and psychology. The benchmark comprises 3,000 prompts derived from real-world regulations and laws, systematically expanded via an LLM-assisted evolutionary pipeline that introduces diverse, realistic misuse scenarios (e.g., detailed explosive synthesis instructions involving advanced chemical formulas). We evaluate frontier models within a unified evaluation framework using our SoSBench. Despite their alignment claims, advanced models consistently disclose policy-violating content across all domains, demonstrating alarmingly high rates of harmful responses (e.g., 84.9% for Deepseek-R1 and 50.3% for GPT-4.1). These results highlight significant safety alignment deficiencies and underscore urgent concerns regarding the responsible deployment of powerful LLMs.

replace LLMs Judging LLMs: A Simplex Perspective

Authors: Patrick Vossler, Fan Xia, Yifan Mai, Adarsh Subbaswamy, Jean Feng

Abstract: Given the challenge of automatically evaluating free-form outputs from large language models (LLMs), an increasingly common solution is to use LLMs themselves as the judging mechanism, without any gold-standard scores. Implicitly, this practice accounts for only sampling variability (aleatoric uncertainty) and ignores uncertainty about judge quality (epistemic uncertainty). While this is justified if judges are perfectly accurate, it is unclear when such an approach is theoretically valid and practically robust. We study these questions for the task of ranking LLM candidates from a novel geometric perspective: for $M$-level scoring systems, both LLM judges and candidates can be represented as points on an $(M-1)$-dimensional probability simplex, where geometric concepts (e.g., triangle areas) correspond to key ranking concepts. This perspective yields intuitive theoretical conditions and visual proofs for when rankings are identifiable; for instance, we provide a formal basis for the ``folk wisdom'' that LLM judges are more effective for two-level scoring ($M=2$) than multi-level scoring ($M>2$). Leveraging the simplex, we design geometric Bayesian priors that encode epistemic uncertainty about judge quality and vary the priors to conduct sensitivity analyses. Experiments on LLM benchmarks show that rankings based solely on LLM judges are robust in many but not all datasets, underscoring both their widespread success and the need for caution. Our Bayesian method achieves substantially higher coverage rates than existing procedures, highlighting the importance of modeling epistemic uncertainty.

replace Beyond Linear Steering: Unified Multi-Attribute Control for Language Models

Authors: Narmeen Oozeer, Luke Marks, Shreyans Jain, Fazl Barez, Amirali Abdullah

Abstract: Controlling multiple behavioral attributes in large language models (LLMs) at inference time is a challenging problem due to interference between attributes and the limitations of linear steering methods, which assume additive behavior in activation space and require per-attribute tuning. We introduce K-Steering, a unified and flexible approach that trains a single non-linear multi-label classifier on hidden activations and computes intervention directions via gradients at inference time. This avoids linearity assumptions, removes the need for storing and tuning separate attribute vectors, and allows dynamic composition of behaviors without retraining. To evaluate our method, we propose two new benchmarks, ToneBank and DebateMix, targeting compositional behavioral control. Empirical results across 3 model families, validated by both activation-based classifiers and LLM-based judges, demonstrate that K-Steering outperforms strong baselines in accurately steering multiple behaviors.
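The core K-Steering move, taking a gradient of a non-linear multi-label classifier's targeted logits with respect to the hidden activation, can be sketched with a tiny hand-rolled network. Everything here is illustrative (the weights, the tanh head, the step size), not the paper's architecture; the point is that the steering direction comes from a gradient at inference time rather than from stored per-attribute vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
d_h, n_attr = 16, 3
# toy non-linear multi-label head on hidden activations
W1 = 0.3 * rng.standard_normal((32, d_h))
b1 = rng.standard_normal(32)
W2 = rng.standard_normal((n_attr, 32))
b2 = rng.standard_normal(n_attr)

def steer_direction(h, targets):
    # Forward pass, then the gradient of the targeted logits w.r.t. h
    # (computed by the chain rule) gives the steering direction.
    a = np.tanh(W1 @ h + b1)
    g_a = W2[targets].sum(axis=0)      # grad of targeted logits w.r.t. a
    g_z = g_a * (1.0 - a ** 2)         # back through tanh
    return W1.T @ g_z                  # grad w.r.t. the activation h

h = rng.standard_normal(d_h)
direction = steer_direction(h, targets=[0, 2])
# nudge the activation toward attributes 0 and 2 simultaneously
h_steered = h + 0.01 * direction / np.linalg.norm(direction)
```

Because the direction is recomputed per input and per target set, behaviors can be composed dynamically without retraining, as the abstract describes.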

replace MLorc: Momentum Low-rank Compression for Memory Efficient Large Language Model Adaptation

Authors: Wei Shen, Zhang Yaxiang, Minhui Huang, Mengfan Xu, Jiawei Zhang, Cong Shen

Abstract: With the increasing size of large language models (LLMs), full-parameter fine-tuning imposes substantial memory demands. To alleviate this, we propose a novel memory-efficient training paradigm called Momentum Low-rank compression (MLorc). The key idea of MLorc is to compress and reconstruct the momentum of matrix parameters during training to reduce memory consumption. Compared to LoRA, MLorc avoids enforcing a fixed-rank constraint on weight update matrices and thus enables full-parameter learning. Compared to GaLore, MLorc directly compresses the momentum rather than the gradients, thereby better preserving the training dynamics of full-parameter fine-tuning. We provide a theoretical guarantee for its convergence under mild assumptions. Empirically, MLorc consistently outperforms other memory-efficient training methods, matches or even exceeds the performance of full fine-tuning at small ranks (e.g., $r=4$), and generalizes well across different optimizers, all while not compromising time or memory efficiency.
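The compress-and-reconstruct idea for momentum matrices can be sketched with a truncated SVD. This is a minimal illustration, assuming SVD truncation as the compression operator and a synthetic near-low-rank momentum matrix; MLorc's actual operator and update rule may differ.

```python
import numpy as np

def compress(M, r):
    # Keep only the top-r SVD factors: two thin matrices are stored
    # (m*r + r*n floats) instead of the full m*n momentum matrix.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :r] * s[:r], Vt[:r]

def reconstruct(A, B):
    # Rebuild the (approximate) momentum before applying the update.
    return A @ B

rng = np.random.default_rng(0)
# synthetic momentum: a rank-4 signal plus small noise
M = rng.standard_normal((64, 4)) @ rng.standard_normal((4, 64))
M_noisy = M + 0.01 * rng.standard_normal((64, 64))
A, B = compress(M_noisy, r=4)
M_hat = reconstruct(A, B)
```

At rank 4, the stored factors occupy 2 * 64 * 4 = 512 floats versus 4096 for the full matrix, which is the memory saving the abstract refers to.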

replace SFBD Flow: A Continuous-Optimization Framework for Training Diffusion Models with Noisy Samples

Authors: Haoye Lu, Darren Lo, Yaoliang Yu

Abstract: Diffusion models achieve strong generative performance but often rely on large datasets that may include sensitive content. This challenge is compounded by the models' tendency to memorize training data, raising privacy concerns. SFBD (Lu et al., 2025) addresses this by training on corrupted data and using limited clean samples to capture local structure and improve convergence. However, its iterative denoising and fine-tuning loop requires manual coordination, making it burdensome to implement. We reinterpret SFBD as an alternating projection algorithm and introduce a continuous variant, SFBD flow, that removes the need for alternating steps. We further show its connection to consistency constraint-based methods, and demonstrate that its practical instantiation, Online SFBD, consistently outperforms strong baselines across benchmarks.

replace Not All Tokens Matter: Towards Efficient LLM Reasoning via Token Significance in Reinforcement Learning

Authors: Hanbing Liu, Lang Cao, Yuanyi Ren, Mengyu Zhou, Haoyu Dong, Xiaojun Ma, Shi Han, Dongmei Zhang

Abstract: Large language models (LLMs) show strong reasoning abilities but often produce unnecessarily long explanations that reduce efficiency. Although reinforcement learning (RL) has been used to improve reasoning, most methods focus on accuracy and rely on uniform length-based rewards that overlook the differing contributions of individual tokens, often harming correctness. We revisit length optimization in RL through the perspective of token significance. Observing that many chain-of-thought (CoT) tokens contribute little to the final answer, we introduce a significance-aware length reward that selectively penalizes insignificant tokens, reducing redundancy while preserving essential reasoning. We also propose a dynamic length reward that encourages more detailed reasoning early in training and gradually shifts toward conciseness as learning progresses. Integrating these components into standard policy optimization yields a framework that improves both reasoning efficiency and accuracy. Experiments across multiple benchmarks demonstrate substantial reductions in response length while preserving or improving correctness, highlighting the importance of modeling token significance for efficient LLM reasoning.
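
As a hedged illustration of the two reward components described above, assuming per-token significance scores in [0, 1]; the function names, the threshold form, and the linear schedule are invented for illustration, not taken from the paper:

```python
def significance_aware_length_reward(correct, significance, tau=0.5, alpha=0.01):
    # Full credit for correctness; the length penalty is charged only to
    # tokens whose significance score falls below the threshold tau.
    n_insignificant = sum(1 for s in significance if s < tau)
    return (1.0 if correct else 0.0) - alpha * n_insignificant

def dynamic_alpha(step, total_steps, alpha_max=0.01):
    # Penalty weight ramps up over training: detailed reasoning is tolerated
    # early, and conciseness is rewarded more strongly as learning progresses.
    return alpha_max * step / total_steps
```

A uniform length reward would instead penalize all tokens equally, which is exactly the failure mode the abstract attributes to prior methods.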

replace Federated Item Response Models: A Gradient-driven Privacy-preserving Framework for Distributed Psychometric Estimation

Authors: Biying Zhou, Nanyu Luo, Feng Ji

Abstract: Item Response Theory (IRT) models are widely used to estimate respondents' latent abilities and calibrate item difficulty. Traditional IRT estimation typically requires centralizing all raw responses, raising privacy and governance concerns. We introduce Federated Item Response Theory (FedIRT), a framework that enables distributed calibration of standard IRT models without transferring individual-level data, thereby preserving confidentiality while retaining statistical efficiency. To provide formal protection, we further develop FedIRT-DP, a user-level differentially private extension. Each site computes per-student gradients, clips them to a fixed norm, and shares only masked sums; the server adds calibrated Gaussian noise and performs MAP updates. This yields an auditable $(\varepsilon,\delta)$ guarantee at the student level and a single, tunable privacy-utility trade-off via the clipping bound and noise scale. The same mechanism improves robustness to extreme response rows (e.g., all-zeros/ones). Across simulations, FedIRT matches the accuracy of centralized estimators from popular $\texttt{R}$ packages while avoiding data pooling; FedIRT-DP achieves comparable accuracy under stronger privacy and exhibits superior robustness to contamination. An empirical study on a real exam dataset demonstrates practical viability and consistent item and site-effect estimates. To facilitate adoption, we release an open-source $\texttt{R}$ package, $\texttt{FedIRT}$, implementing the two-parameter logistic (2PL) and partial credit models (PCM) with federated and differentially private training.
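
The clip-then-noise mechanism in FedIRT-DP can be sketched in a few lines (a minimal single-site numpy illustration; in the paper the masked sums and server-side noise live on different parties, and the function name is invented):

```python
import numpy as np

def dp_site_update(per_student_grads, clip=1.0, sigma=1.0, rng=None):
    # Clip each student's gradient to the fixed norm bound, sum the clipped
    # gradients, then add Gaussian noise calibrated to the clipping bound.
    # One student changes the sum by at most `clip`, which is what yields the
    # Gaussian-mechanism (epsilon, delta) guarantee at the student level.
    rng = rng if rng is not None else np.random.default_rng(0)
    total = np.zeros_like(per_student_grads[0], dtype=float)
    for g in per_student_grads:
        norm = np.linalg.norm(g)
        total += g * min(1.0, clip / max(norm, 1e-12))
    return total + rng.normal(scale=sigma * clip, size=total.shape)

grads = [np.array([3.0, 4.0]), np.array([0.0, 1.0])]
noiseless_sum = dp_site_update(grads, clip=100.0, sigma=0.0)  # no clipping bites
clipped_sum = dp_site_update(grads, clip=1.0, sigma=0.0)      # first grad scaled by 1/5
```

The clipping bound and noise scale are the single privacy-utility knob the abstract mentions; clipping also caps the influence of extreme response rows such as all-zeros or all-ones.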

replace The Riemannian Geometry Associated to Gradient Flows of Linear Convolutional Networks

Authors: El Mehdi Achour, Kathlén Kohn, Holger Rauhut

Abstract: We study geometric properties of the gradient flow for learning deep linear convolutional networks. For linear fully connected networks, it has been shown recently that the corresponding gradient flow on parameter space can be written as a Riemannian gradient flow on function space (i.e., on the product of weight matrices) if the initialization satisfies a so-called balancedness condition. We establish that the gradient flow on parameter space for learning linear convolutional networks can be written as a Riemannian gradient flow on function space regardless of the initialization. This result holds for $D$-dimensional convolutions with $D \geq 2$, and for $D = 1$ it holds if all so-called strides of the convolutions are greater than one. The corresponding Riemannian metric depends on the initialization.

replace Multi-Component VAE with Gaussian Markov Random Field

Authors: Fouad Oubari, Mohamed El-Baha, Raphael Meunier, Rodrigue Décatoire, Mathilde Mougeot

Abstract: Multi-component datasets with intricate dependencies, like industrial assemblies or multi-modal imaging, challenge current generative modeling techniques. Existing Multi-component Variational AutoEncoders typically rely on simplified aggregation strategies, neglecting critical nuances and consequently compromising structural coherence across generated components. To explicitly address this gap, we introduce the Gaussian Markov Random Field Multi-Component Variational AutoEncoder (GMRF MCVAE), a novel generative framework embedding Gaussian Markov Random Fields into both prior and posterior distributions. This design choice explicitly models cross-component relationships, enabling richer representation and faithful reproduction of complex interactions. Empirically, our GMRF MCVAE achieves state-of-the-art performance on a synthetic Copula dataset specifically constructed to evaluate intricate component relationships, demonstrates competitive results on the PolyMNIST benchmark, and significantly enhances structural coherence on the real-world BIKED dataset. Our results indicate that the GMRF MCVAE is especially suited for practical applications demanding robust and realistic modeling of multi-component coherence.

replace Causal Process Models: Reframing Dynamic Causal Graph Discovery as a Reinforcement Learning Problem

Authors: Turan Orujlu, Christian Gumbsch, Martin V. Butz, Charley M Wu

Abstract: Most neural models of causality assume static causal graphs, failing to capture the dynamic and sparse nature of physical interactions where causal relationships emerge and dissolve over time. We introduce the Causal Process Framework and its neural implementation, Causal Process Models (CPMs), for learning sparse, time-varying causal graphs from visual observations. Unlike traditional approaches that maintain dense connectivity, our model explicitly constructs causal edges only when objects actively interact, dramatically improving both interpretability and computational efficiency. We achieve this by casting dynamic interaction-graph construction for world modeling as a multi-agent reinforcement learning problem, where specialized agents sequentially decide which objects are causally connected at each timestep. Our key innovation is a structured representation that factorizes object and force vectors along three learned dimensions (mutability, causal relevance, and control relevance), enabling the automatic discovery of semantically meaningful encodings. We demonstrate that a CPM significantly outperforms dense graph baselines on physical prediction tasks, particularly for longer horizons and varying object counts.

replace Evaluating and Learning Robust Bandit Policies Under Uncertain Causal Mechanisms

Authors: Katherine Avery, Chinmay Pendse, David Jensen

Abstract: Causal graphical models can encode large amounts of structural knowledge, both from the background knowledge of domain experts and from the structural knowledge discovered from randomized experiments or observational data. However, though we may know the general structure of causal relationships, we often do not know the exact causal mechanisms. In this work, we propose a causal multi-armed bandit evaluation and learning algorithm that can reason effectively despite uncertainty over conditional probability distributions. Further, we show how conditional independence testing can be used to choose variables for modeling. We find that the structural equation model (SEM) approach gives more accurate evaluations compared to traditional approaches, particularly as the range of possible causal mechanisms grows. Further, the SEM approach learns low-variance policies, and it learns an optimal policy, assuming the model is sufficiently well-specified. Traditional approaches can converge to local extrema or fail to converge at all.

replace PRISM: Lightweight Multivariate Time-Series Classification through Symmetric Multi-Resolution Convolutional Layers

Authors: Federico Zucchi, Thomas Lampert

Abstract: Multivariate time series classification supports applications from wearable sensing to biomedical monitoring and demands models that can capture both short-term patterns and multi-scale temporal dependencies. Despite recent advances, Transformer and CNN models often remain computationally heavy and rely on many parameters. This work presents PRISM (Per-channel Resolution Informed Symmetric Module), a lightweight fully convolutional classifier. Operating in a channel-independent manner, in its early stage it applies a set of multi-resolution symmetric convolutional filters. This symmetry enforces structural constraints inspired by linear-phase FIR filters from classical signal processing, effectively halving the number of learnable parameters within the initial layers while preserving the full receptive field. Across the diverse UEA multivariate time-series archive as well as specific benchmarks in human activity recognition, sleep staging, and biomedical signals, PRISM matches or outperforms state-of-the-art CNN and Transformer models while using significantly fewer parameters and markedly lower computational cost. By bringing a principled signal processing prior into a modern neural architecture, PRISM offers an effective and computationally economical solution for multivariate time series classification. Code and data are available at https://github.com/fedezuc/PRISM

URLs: https://github.com/fedezuc/PRISM
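
The linear-phase FIR symmetry that halves PRISM's early-layer parameter count can be illustrated in isolation (a sketch of the constraint, not the released code):

```python
import numpy as np

def symmetric_kernel(half):
    # Mirror the learnable half to obtain a linear-phase (symmetric) FIR
    # kernel: a full receptive field of 2n-1 taps from n free parameters.
    half = np.asarray(half, dtype=float)
    return np.concatenate([half, half[-2::-1]])

k = symmetric_kernel([0.25, 0.5, 1.0])  # 5 taps built from 3 parameters
# Convolving a unit impulse recovers the kernel itself (its impulse response)
y = np.convolve([0.0, 0.0, 1.0, 0.0, 0.0], k, mode="same")
```

During training, gradients would flow only to the stored half, so each mirrored filter costs roughly half the parameters of an unconstrained kernel of the same length.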

replace LNN-PINN: A Unified Physics-Only Training Framework with Liquid Residual Blocks

Authors: Ze Tao, Hanxuan Wang, Fujun Liu

Abstract: Physics-informed neural networks (PINNs) have attracted considerable attention for their ability to integrate partial differential equation priors into deep learning frameworks; however, they often exhibit limited predictive accuracy when applied to complex problems. To address this issue, we propose LNN-PINN, a physics-informed neural network framework that incorporates a liquid residual gating architecture while preserving the original physics modeling and optimization pipeline to improve predictive accuracy. The method introduces a lightweight gating mechanism solely within the hidden-layer mapping, keeping the sampling strategy, loss composition, and hyperparameter settings unchanged to ensure that improvements arise purely from architectural refinement. Across four benchmark problems, LNN-PINN consistently reduced RMSE and MAE under identical training conditions, with absolute error plots further confirming its accuracy gains. Moreover, the framework demonstrates strong adaptability and stability across varying dimensions, boundary conditions, and operator characteristics. In summary, LNN-PINN offers a concise and effective architectural enhancement for improving the predictive accuracy of physics-informed neural networks in complex scientific and engineering problems.
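
The abstract specifies only that a lightweight gate is inserted in the hidden-layer mapping; the sketch below is one plausible reading (a sigmoid-gated residual mix), with all names and the exact gate form hypothetical:

```python
import numpy as np

def liquid_gated_block(x, W, b, U, c):
    # Hypothetical gated residual hidden-layer mapping: a sigmoid gate blends
    # the usual tanh update with an identity path. Everything else in the PINN
    # pipeline (sampling, loss composition, hyperparameters) is untouched.
    h = np.tanh(W @ x + b)                      # standard hidden update
    gate = 1.0 / (1.0 + np.exp(-(U @ x + c)))   # elementwise gate in (0, 1)
    return gate * h + (1.0 - gate) * x          # gated residual mix

rng = np.random.default_rng(0)
d = 8
x = rng.normal(size=d)
W, b = rng.normal(size=(d, d)), rng.normal(size=d)
U, c = rng.normal(size=(d, d)), rng.normal(size=d)
y = liquid_gated_block(x, W, b, U, c)
```

Because the gate is a convex combination, each output coordinate stays between the plain update and the identity path, which is one way such a block can stabilize training without altering the loss.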

replace xRFM: Accurate, scalable, and interpretable feature learning models for tabular data

Authors: Daniel Beaglehole, David Holzm\"uller, Adityanarayanan Radhakrishnan, Mikhail Belkin

Abstract: Inference from tabular data, collections of continuous and categorical variables organized into matrices, is a foundation for modern technology and science. Yet, in contrast to the explosive changes in the rest of AI, the best practice for these predictive tasks has been relatively unchanged and is still primarily based on variations of Gradient Boosted Decision Trees (GBDTs). Very recently, there has been renewed interest in developing state-of-the-art methods for tabular data based on recent developments in neural networks and feature learning methods. In this work, we introduce xRFM, an algorithm that combines feature learning kernel machines with a tree structure to both adapt to the local structure of the data and scale to essentially unlimited amounts of training data. We show that compared to $31$ other methods, including recently introduced tabular foundation models (TabPFNv2) and GBDTs, xRFM achieves best performance across $100$ regression datasets and is competitive with the best methods across $200$ classification datasets while outperforming GBDTs. Additionally, xRFM provides interpretability natively through the Average Gradient Outer Product.

replace Estimating Parameter Fields in Multi-Physics PDEs from Scarce Measurements

Authors: Xuyang Li, Mahdi Masmoudi, Rami Gharbi, Nizar Lajnef, Vishnu Naresh Boddeti

Abstract: Parameterized partial differential equations (PDEs) underpin the mathematical modeling of complex systems in diverse domains, including engineering, healthcare, and physics. A central challenge in using PDEs for real-world applications is to accurately infer the parameters, particularly when the parameters exhibit non-linear and spatiotemporal variations. Existing parameter estimation methods, such as sparse identification, physics-informed neural networks (PINNs), and neural operators, struggle in such cases, especially with nonlinear dynamics, multiphysics interactions, or limited observations of the system response. To address these challenges, we introduce Neptune, a general-purpose method capable of inferring parameter fields from sparse measurements of system responses. Neptune employs independent coordinate neural networks to continuously represent each parameter field in physical space or in state variables. Across various physical and biomedical problems, where direct parameter measurements are prohibitively expensive or unattainable, Neptune significantly outperforms existing methods, achieving robust parameter estimation from as few as 45 measurements, reducing parameter estimation errors by two orders of magnitude and dynamic response prediction errors by a factor of ten relative to baselines such as PINNs and neural operators. More importantly, it exhibits superior physical extrapolation capabilities, enabling reliable predictions in regimes far beyond the training data. By facilitating reliable and data-efficient parameter inference, Neptune promises broad transformative impacts in engineering, healthcare, and beyond.

replace Improving Generative Methods for Causal Evaluation via Simulation-Based Inference

Authors: Pracheta Amaranath, Vinitra Muralikrishnan, Amit Sharma, David Jensen

Abstract: Generating synthetic datasets that accurately reflect real-world observational data is critical for evaluating causal estimators, but it remains a challenging task. Existing generative methods offer a solution by producing synthetic datasets anchored in the observed data (source data) while allowing variation in key parameters such as the treatment effect and amount of confounding bias. However, it is often unclear which generative methods to use and which values of parameters to choose when generating synthetic datasets. Moreover, existing methods typically require users to provide fixed point estimates of such parameters. This denies users the ability to express uncertainty over both generative methods and parameter values and removes the potential for posterior inference, potentially leading to unreliable estimator comparisons. We introduce simulation-based inference for causal evaluation (SBICE), a framework that treats the generative method and its corresponding generative parameters as uncertain and infers their posterior distribution given a source dataset. Leveraging techniques in simulation-based inference, SBICE identifies suitable generative methods and infers distributions over their parameter configurations to produce synthetic datasets closely aligned with the source data distribution. Empirical results demonstrate that SBICE improves the reliability of estimator evaluations by generating realistic datasets whose causal estimates closely match the estimates of the source data, making it a robust and uncertainty-aware approach to selecting causal estimators.

replace A Data-Driven Interpolation Method on Smooth Manifolds via Diffusion Processes and Voronoi Tessellations

Authors: Alvaro Almeida Gomez

Abstract: We propose a data-driven interpolation method for approximating real-valued functions on smooth manifolds, based on the Laplace--Beltrami operator and Voronoi tessellations. Given pointwise evaluations of a function, the method constructs a continuous extension over the manifold by exploiting diffusion processes and the intrinsic geometry of the data. The proposed approach is entirely data-driven and requires neither a training phase nor any preprocessing prior to inference. Furthermore, the computational complexity of the inference step scales linearly in the number of sample points, thereby providing substantial improvements in scalability and computational efficiency compared to classical data-driven interpolation methods, including neural networks, radial basis function networks, and Gaussian process regression. We further show that the interpolant has vanishing gradient at the interpolation points and, with high probability as the number of samples increases, attenuates high-frequency components of the signal. Moreover, the proposed method minimizes a total variation-type energy, thereby yielding a closed-form analytical approximation to the compressed sensing problem in the case where the forward operator is the identity. Finally, we present applications to sparse computational tomography reconstruction. Numerical experiments demonstrate that the proposed method achieves competitive reconstruction quality while significantly reducing computational time compared to classical total variation-based reconstruction methods.

replace LoFT: Parameter-Efficient Fine-Tuning for Long-tailed Semi-Supervised Learning in Open-World Scenarios

Authors: Zhiyuan Huang, Jiahao Chen, Bing Su

Abstract: Long-tailed semi-supervised learning (LTSSL) presents a formidable challenge where models must overcome the scarcity of tail samples while mitigating the noise from unreliable pseudo-labels. Most prior LTSSL methods are designed to train models from scratch, which often leads to issues such as overconfidence and low-quality pseudo-labels. To address this problem, we first theoretically prove that utilizing a foundation model significantly reduces the hypothesis complexity, which tightens the generalization bound and in turn minimizes the Balanced Posterior Error (BPE). Furthermore, we demonstrate that the feature compactness of foundation models strictly compresses the acceptance region for outliers, providing a geometric guarantee for robustness. Motivated by these theoretical insights, we extend LTSSL into the foundation model fine-tuning paradigm and propose a novel framework: LoFT (Long-tailed semi-supervised learning via parameter-efficient Fine-Tuning). Furthermore, we explore a more practical setting by investigating semi-supervised learning under open-world conditions, where the unlabeled data may include out-of-distribution (OOD) samples. To handle this problem, we propose LoFT-OW (LoFT under Open-World scenarios) to improve the discriminative ability. Experimental results on multiple benchmarks demonstrate that our method achieves superior performance.

replace Causal Discovery via Quantile Partial Effect

Authors: Yikang Chen, Xingzhe Sun, Dehui Du

Abstract: Quantile Partial Effect (QPE) is a statistic associated with conditional quantile regression, measuring the effect of covariates at different levels. Our theory demonstrates that when the QPE of cause on effect is assumed to lie in a finite linear span, cause and effect are identifiable from their observational distribution. This generalizes previous identifiability results based on Functional Causal Models (FCMs) with additive, heteroscedastic noise, etc. Meanwhile, since QPE resides entirely at the observational level, this parametric assumption does not require considering mechanisms, noise, or even the Markov assumption, but rather directly utilizes the asymmetry of shape characteristics in the observational distribution. By performing basis function tests on the estimated QPE, causal directions can be distinguished, which is empirically shown to be effective in experiments on a large number of bivariate causal discovery datasets. For multivariate causal discovery, leveraging the close connection between QPE and score functions, we find that Fisher Information is sufficient as a statistical measure to determine causal order when assumptions are made about the second moment of QPE. We validate the feasibility of using Fisher Information to identify causal order on multiple synthetic and real-world multivariate causal discovery datasets.

replace Process-Informed Forecasting of Complex Thermal Dynamics in Pharmaceutical Manufacturing

Authors: Ramona Rubini, Siavash Khodakarami, Aniruddha Bora, George Em Karniadakis, Michele Dassisti

Abstract: Accurate time-series forecasting for complex physical systems is the backbone of modern industrial monitoring and control, yet deep learning models often lack the physical consistency required in regulated environments. To bridge this gap, we introduce Process-Informed Forecasting (PIF) models for temperature in pharmaceutical lyophilization, embedding deterministic production recipes as macro-structural priors. We investigate classical methods (e.g., Autoregressive Integrated Moving Average (ARIMA) model) and modern deep learning architectures, including Kolmogorov-Arnold Networks (KANs). We compare three different loss function formulations that integrate a process-informed trajectory prior: a fixed-weight loss, a dynamic uncertainty-based loss, and a Residual-Based Attention (RBA) mechanism. We evaluate all models not only for accuracy and physical consistency but also for robustness to sensor noise. Furthermore, we test the practical generalizability of the best model in a transfer learning scenario on a new process. Our results show that PIF models outperform their data-driven counterparts in terms of accuracy, physical plausibility and noise resilience, offering a scalable framework for reliable and generalizable forecasting solutions in critical manufacturing.

replace ACT: Agentic Classification Tree

Authors: Vincent Grari, Tim Arni, Thibault Laugel, Sylvain Lamprier, James Zou, Marcin Detyniecki

Abstract: When used in high-stakes settings, AI systems are expected to produce decisions that are transparent, interpretable, and auditable, a requirement increasingly imposed by regulation. Decision trees such as CART provide clear and verifiable rules, but they are restricted to structured tabular data and cannot operate directly on unstructured inputs such as text. In practice, large language models (LLMs) are widely used for such data, yet prompting strategies such as chain-of-thought or prompt optimization still rely on free-form reasoning, limiting their ability to ensure trustworthy behaviors. We present the Agentic Classification Tree (ACT), which extends decision-tree methodology to unstructured inputs by formulating each split as a natural-language question, refined through impurity-based evaluation and LLM feedback via TextGrad. Experiments on text benchmarks show that ACT matches or surpasses prompting-based baselines while producing transparent and interpretable decision paths.
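
The impurity-based evaluation of candidate splits is the classical CART criterion; a minimal sketch of scoring one natural-language question by the yes/no partition it induces over labeled examples (names are illustrative):

```python
def gini(labels):
    # Gini impurity of a label multiset: 1 - sum of squared class frequencies
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def split_score(labels, answers):
    # Weighted impurity of the partition induced by a candidate question's
    # yes/no answers (lower is better); answers[i] is True for "yes".
    yes = [y for y, a in zip(labels, answers) if a]
    no = [y for y, a in zip(labels, answers) if not a]
    n = len(labels)
    return len(yes) / n * gini(yes) + len(no) / n * gini(no)
```

In ACT the `answers` would come from an LLM answering the candidate question on each text example, and the question's wording is then refined to drive this score down.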

replace Fewer Weights, More Problems: A Practical Attack on LLM Pruning

Authors: Kazuki Egashira, Robin Staab, Thibaud Gloaguen, Mark Vero, Martin Vechev

Abstract: Model pruning, i.e., removing a subset of model weights, has become a prominent approach to reducing the memory footprint of large language models (LLMs) during inference. Notably, popular inference engines, such as vLLM, enable users to conveniently prune downloaded models before they are deployed. While the utility and efficiency of pruning methods have improved significantly, the security implications of pruning remain underexplored. In this work, for the first time, we show that modern LLM pruning methods can be maliciously exploited. In particular, an adversary can construct a model that appears benign yet, once pruned, exhibits malicious behaviors. Our method is based on the idea that the adversary can compute a proxy metric that estimates how likely each parameter is to be pruned. With this information, the adversary can first inject a malicious behavior into those parameters that are unlikely to be pruned. Then, they can repair the model by using parameters that are likely to be pruned, effectively canceling out the injected behavior in the unpruned model. We demonstrate the severity of our attack through extensive evaluation on five models; after any of the pruning methods available in vLLM (Magnitude, Wanda, and SparseGPT) is applied, the model consistently exhibits strong malicious behaviors in a diverse set of attack scenarios (success rates of up to $95.7\%$ for jailbreak, $98.7\%$ for benign instruction refusal, and $99.5\%$ for targeted content injection). Our results reveal a critical deployment-time security gap and underscore the urgent need for stronger security awareness in model compression.
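
For magnitude pruning specifically, the adversary's proxy metric is exact: the pruning mask depends only on weight magnitudes, so it can be reproduced in advance (a sketch; the function name is illustrative):

```python
import numpy as np

def magnitude_prune_mask(W, sparsity=0.5):
    # Mask of weights that SURVIVE magnitude pruning at the given sparsity.
    # An adversary can compute this ahead of deployment: inject behavior into
    # surviving weights and place cancelling terms in the doomed (smallest) ones.
    k = int(round(W.size * sparsity))         # number of weights to be removed
    order = np.argsort(np.abs(W), axis=None)  # flat indices, ascending magnitude
    mask = np.ones(W.size, dtype=bool)
    mask[order[:k]] = False
    return mask.reshape(W.shape)

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 16))
survives = magnitude_prune_mask(W, sparsity=0.5)
```

Wanda and SparseGPT use activation-aware scores, so there the proxy is an estimate rather than an exact reproduction, which is presumably why the abstract speaks of estimating pruning likelihood.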

replace A Minimal-Assumption Analysis of Q-Learning with Time-Varying Policies

Authors: Phalguni Nanda, Zaiwei Chen

Abstract: In this work, we present the first finite-time analysis of Q-learning with time-varying learning policies (i.e., on-policy sampling) for discounted Markov decision processes under minimal assumptions, requiring only the existence of a policy that induces an irreducible Markov chain over the state space. We establish a last-iterate convergence rate for $\mathbb{E}[\|Q_k - Q^*\|_\infty^2]$, implying a sample complexity of order $\mathcal{O}(1/\xi^2)$ for achieving $\mathbb{E}[\|Q_k - Q^*\|_\infty]\le \xi$. This matches the rate of off-policy Q-learning, but with worse dependence on exploration-related parameters. We also derive a finite-time rate for $\mathbb{E}[\|Q^{\pi_k} - Q^*\|_\infty^2]$, where $\pi_k$ is the learning policy at iteration $k$, highlighting the exploration-exploitation trade-off in on-policy Q-learning. While exploration is weaker than in off-policy methods, on-policy learning enjoys an exploitation advantage as the learning policy converges to an optimal one. Numerical results support our theory. Technically, rapidly time-varying learning policies induce time-inhomogeneous Markovian noise, creating significant analytical challenges under minimal exploration. To address this, we develop a Poisson-equation-based decomposition of the Markovian noise under a lazy transition matrix, separating it into a martingale-difference term and residual terms. The residuals are controlled via sensitivity analysis of the Poisson equation solution with respect to both the Q-function estimate and the learning policy. These techniques may extend to other RL algorithms with time-varying policies, such as single-timescale actor-critic methods and learning-in-games algorithms.

replace A Model Can Help Itself: Reward-Free Self-Training for LLM Reasoning

Authors: Mengqi Li, Lei Zhao, Anthony Man-Cho So, Ruoyu Sun, Xiao Li

Abstract: Can language models improve their reasoning performance without external rewards, using only their own sampled responses for training? We show that they can. We propose Self-evolving Post-Training (SePT), a simple post-training method that alternates between self-generation and training on self-generated responses. It repeatedly samples questions, uses the model itself to generate low-temperature responses, and then finetunes the model on the self-generated data. In this self-training loop, we use an online data refresh mechanism, where each new batch is generated by the most recently updated model. Across six math reasoning benchmarks, SePT improves over a strong no-training baseline, defined as the untuned base model evaluated at its best swept decoding temperature, on several tested models. In some settings, SePT can even approach the performance of Reinforcement Learning with Verifiable Rewards (RLVR). Additional ablations demonstrate the importance of online data refresh and temperature decoupling. Overall, our results identify a practical regime in which reasoning can be improved using self-generated supervision alone. Our code is available at https://github.com/ElementQi/SePT.

URLs: https://github.com/ElementQi/SePT.
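
The SePT loop can be caricatured with a toy "model" that is just a logit vector over candidate answers; this illustrates the loop's structure (low-temperature self-generation, finetuning on the sample, online refresh from the latest model), not the released code:

```python
import numpy as np

def softmax(logits, temperature):
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()                  # numerical stability
    p = np.exp(z)
    return p / p.sum()

# Toy stand-in for a model: a logit vector over three candidate answers.
logits = np.array([1.0, 1.2, 0.8])
initial_sum = logits.sum()
rng = np.random.default_rng(0)
for step in range(20):
    # Self-generation: sample at LOW temperature, always from the most
    # recently updated "model" (the online data refresh of the abstract).
    sampled = rng.choice(3, p=softmax(logits, temperature=0.3))
    # "Finetuning" on the self-generated response: nudge its logit upward.
    logits[sampled] += 0.1
```

The low sampling temperature concentrates updates on responses the model already prefers, a rich-get-richer dynamic that is the reward-free analogue of reinforcement on self-consistency.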

replace Deep Gaussian Processes for Functional Maps

Authors: Matthew Lowery, Zhitong Xu, Da Long, Keyan Chen, Daniel S. Johnson, Yang Bai, Varun Shankar, Shandian Zhe

Abstract: Learning mappings between functional spaces, also known as function-on-function regression, is a fundamental problem in functional data analysis with broad applications, including spatiotemporal forecasting, curve prediction, and climate modeling. Existing approaches often struggle to capture complex nonlinear relationships and/or provide reliable uncertainty quantification when data are noisy, sparse, or irregularly sampled. To address these challenges, we propose Deep Gaussian Processes for Functional Maps (DGPFM). Our method constructs a sequence of GP-based linear and nonlinear transformations directly in function space, leveraging kernel integral transforms, GP conditional means, and nonlinear activations sampled from Gaussian processes. A key insight enables a simplified and flexible implementation: under fixed evaluation locations, discrete approximations of kernel integral transforms reduce to direct functional integral transforms, allowing seamless integration of diverse transform designs. To support scalable probabilistic inference, we adopt inducing points and whitening transformations within a variational learning framework. Empirical results on both real-world and synthetic benchmark datasets demonstrate the advantages of DGPFM in terms of predictive accuracy and uncertainty calibration.

replace An Information-Theoretic Analysis of OOD Generalization in Meta-Reinforcement Learning

Authors: Xingtu Liu

Abstract: In this work, we study out-of-distribution (OOD) generalization in meta-reinforcement learning from an information-theoretic perspective. We begin by establishing OOD generalization bounds for meta-supervised learning under two distinct distribution shift scenarios: standard distribution mismatch and a broad-to-narrow training setting. Building on this foundation, we formalize the generalization problem in meta-reinforcement learning and establish fine-grained generalization bounds that exploit the structure of Markov Decision Processes. Lastly, we analyze the generalization performance of a gradient-based meta-reinforcement learning algorithm.

replace Co-Evolving Latent Action World Models

Authors: Yucen Wang, Fengming Zhang, De-Chuan Zhan, Li Zhao, Kaixin Wang, Jiang Bian

Abstract: Adapting pretrained video generation models into controllable world models via latent actions is a promising step towards creating generalist world models. The dominant paradigm adopts a two-stage approach that trains the latent action model (LAM) and the world model separately, resulting in redundant training and limiting their potential for co-adaptation. A conceptually simple and appealing idea is to directly replace the forward dynamics model in the LAM with a powerful world model and train them jointly, but doing so is non-trivial and prone to representational collapse. In this work, we propose CoLA-World, which for the first time successfully realizes this synergistic paradigm, resolving the core challenge in joint learning through a critical warm-up phase that effectively aligns the representations of the from-scratch LAM with the pretrained world model. This unlocks a co-evolution cycle: the world model acts as a knowledgeable tutor, providing gradients to shape a high-quality LAM, while the LAM offers a more precise and adaptable control interface to the world model. Empirically, CoLA-World matches or outperforms prior two-stage methods in both video simulation quality and downstream visual planning, establishing a robust and efficient new paradigm for the field.

replace SPORE: Skeleton Propagation Over Recalibrating Expansions

Authors: Randolph Wiredu-Aidoo

Abstract: Clustering is a foundational task in data analysis, yet most algorithms impose rigid assumptions on cluster geometry: centroid-based methods favor convex structures, while density-based approaches break down under variable local density or moderate dimensionality. This paper introduces SPORE (Skeleton Propagation Over Recalibrating Expansions), a classical clustering algorithm built to handle arbitrary geometry without relying on global density parameters. SPORE grows clusters through a nearest-neighbor graph, admitting new points based on each cluster's own evolving distance statistics, with density-ordered seeding enabling recovery of nested and asymmetrically separated structures. A refinement stage exploits initial over-segmentation, propagating high-confidence cluster skeletons outward to resolve ambiguous boundaries in low-contrast regions. Across 28 diverse benchmark datasets, SPORE achieves a statistically significant improvement in ARI-based recovery capacity over all evaluated baselines, with strong performance accessible within ten evaluations of a fixed hyperparameter grid.

replace SynQuE: Estimating Synthetic Dataset Quality Without Annotations

Authors: Arthur Chen, Victor Zhong

Abstract: We introduce and formalize the Synthetic Dataset Quality Estimation (SynQuE) problem: ranking synthetic datasets by their expected real-world task performance using only limited unannotated real data. This addresses a critical and open challenge where data is scarce due to collection costs or privacy constraints. We establish the first comprehensive benchmarks for this problem by introducing and evaluating proxy metrics that choose synthetic data for training to maximize task performance on real data. We introduce the first proxy metrics for SynQuE by adapting distribution and diversity-based distance measures to our context via embedding models. To address the shortcomings of these metrics on complex planning tasks, we propose LENS, a novel proxy that leverages large language model reasoning. Our results show that SynQuE proxies correlate with real task performance across diverse tasks, including sentiment analysis, Text2SQL, web navigation, and image classification, with LENS consistently outperforming others on complex tasks by capturing nuanced characteristics. For instance, on text-to-SQL parsing, training on the top-3 synthetic datasets selected via SynQuE proxies can raise accuracy from 30.4% to 38.4% (+8.1 points) on average compared to selecting data indiscriminately. This work establishes SynQuE as a practical framework for synthetic data selection under real-data scarcity and motivates future research on foundation model-based data characterization and fine-grained data selection.
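A minimal sketch of the kind of distribution-distance proxy the abstract describes (adapting a distance measure to embeddings): here a squared MMD with an RBF kernel ranks two hypothetical synthetic sets against unannotated real embeddings. The embeddings, `gamma`, and set names are illustrative assumptions, not the paper's metrics.

```python
import numpy as np

def mmd_rbf(X, Y, gamma=0.1):
    """Biased squared maximum mean discrepancy between two embedding sets.

    Lower MMD to the real embeddings suggests better expected transfer --
    the intuition behind distribution-based SynQuE proxies.
    """
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return k(X, X).mean() + k(Y, Y).mean() - 2.0 * k(X, Y).mean()

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(64, 8))    # unannotated real embeddings
set_a = rng.normal(0.1, 1.0, size=(64, 8))   # synthetic set close to real
set_b = rng.normal(2.0, 1.0, size=(64, 8))   # synthetic set shifted away
scores = {"A": mmd_rbf(set_a, real), "B": mmd_rbf(set_b, real)}
```

Ranking by ascending score would select set A for training, mirroring how a proxy metric picks among candidate synthetic datasets without labels.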

replace FAST-CAD: A Fairness-Aware Framework for Non-Contact Stroke Diagnosis

Authors: Tommy Sha (Kindred), Zhan Cheng (Kindred), Haotian Zhai (Kindred), Xuwei Ding (Kindred), Junnan Li (Kindred), Haixiang Tang (Kindred), Zaoting Sun (Kindred), Yanchuan Tang (Kindred), Yongzhe Yi (Kindred), Yuan Gao, Anhao Li

Abstract: Stroke is an acute cerebrovascular disease, and timely diagnosis significantly improves patient survival. However, existing automated diagnosis methods suffer from fairness issues across demographic groups, potentially exacerbating healthcare disparities. In this work we propose FAST-CAD, a theoretically grounded framework that combines domain-adversarial training (DAT) with group distributionally robust optimization (Group-DRO) for fair and accurate non-contact stroke diagnosis. Our approach is built on domain adaptation and minimax fairness theory and provides convergence guarantees and fairness bounds. We curate a multimodal dataset covering 12 demographic subgroups defined by age, gender, and posture. FAST-CAD employs self-supervised encoders with adversarial domain discrimination to learn demographic-invariant representations, while Group-DRO optimizes worst-group risk to ensure robust performance across all subgroups. Extensive experiments show that our method achieves superior diagnostic performance while maintaining fairness across demographic groups, and our theoretical analysis supports the effectiveness of the unified DAT + Group-DRO framework. This work provides both practical advances and theoretical insights for fair medical AI systems.

replace Controllable protein design with particle-based Feynman-Kac steering

Authors: Erik Hartman, Jonas Wallin, Johan Malmström, Jimmy Olsson

Abstract: Proteins underpin most biological function, and the ability to design them with tailored structures and properties is central to advances in biotechnology. Diffusion-based generative models have emerged as powerful tools for protein design, but steering them toward proteins with specified properties remains challenging. The Feynman-Kac (FK) framework provides a principled way to guide diffusion models using user-defined rewards. In this paper, we enable FK-based steering of RFdiffusion through the development of guiding potentials that leverage ProteinMPNN and structural relaxation to guide the diffusion process towards desired properties. We show that steering can be used to consistently improve predicted interface energetics and increase binder designability by $89.5\%$. Together, these results establish that diffusion-based protein design can be effectively steered toward arbitrary, non-differentiable objectives, providing a model-independent framework for controllable protein generation.

replace A Unified Stability Analysis of SAM vs SGD: Role of Data Coherence and Emergence of Simplicity Bias

Authors: Wei-Kai Chang, Rajiv Khanna

Abstract: Understanding the dynamics of optimization in deep learning is increasingly important as models scale. While stochastic gradient descent (SGD) and its variants reliably find solutions that generalize well, the mechanisms driving this generalization remain unclear. Notably, these algorithms often prefer flatter or simpler minima, particularly in overparameterized settings. Prior work has linked flatness to generalization, and methods like Sharpness-Aware Minimization (SAM) explicitly encourage flatness, but a unified theory connecting data structure, optimization dynamics, and the nature of learned solutions is still lacking. In this work, we develop a linear stability framework that analyzes the behavior of SGD, random perturbations, and SAM, particularly in two-layer ReLU networks. Central to our analysis is a coherence measure that quantifies how gradient curvature aligns across data points, revealing why certain minima are stable and favored during training.

replace Semantic Superiority vs. Forensic Efficiency: A Comparative Analysis of Deep Learning and Psycholinguistics for Business Email Compromise Detection

Authors: Yaw Osei Adjei (Kwame Nkrumah University of Science and Technology, Kumasi, Ghana), Frederick Ayivor (Independent Researcher, Fishers, Indiana, USA)

Abstract: Business Email Compromise (BEC) is a high-impact social engineering threat with extreme operational asymmetry: false negatives can trigger large financial losses, while false positives primarily incur investigation and delay costs. This paper compares two BEC detection paradigms under a cost-sensitive decision framework: (i) a semantic transformer approach (DistilBERT) for contextual language understanding, and (ii) a forensic psycholinguistic approach (CatBoost) using engineered linguistic and structural cues. We evaluate both on a hybrid dataset (N = 7,990) combining legitimate corporate email and AI-synthesised adversarial fraud generated across 30 BEC taxonomies, including character-level Unicode obfuscations. We add classical baselines (TF-IDF+LogReg and character n-gram+Linear SVM), an ablation study for the Smiling Assassin Score, and a homoglyph-map sensitivity analysis. DistilBERT achieves AUC = 1.0000 and F1 = 0.9981 at 7.403 ms per email on GPU; CatBoost achieves AUC = 0.9860 and F1 = 0.9382 at 0.855 ms on CPU. A three-way cost-sensitive decision policy (auto-allow, auto-block, manual review) optimises expected financial loss under a 1:5,167 false-negative-to-false-positive cost ratio.

replace Mitigating Structural Overfitting: A Distribution-Aware Rectification Framework for Missing Feature Imputation

Authors: Yifan Song, Fenglin Yu, Yihong Luo, Xingjian Tao, Siya Qiu, Kai Han, Jing Tang

Abstract: Incomplete node features are ubiquitous in real-world scenarios such as user profiling and cold-start recommendation, which severely hinders the practical deployment of graph learning systems (e.g., GNNs). Existing solutions typically rely on diffusion-based structural smoothing (e.g., feature propagation) to impute missing values. However, we find that these approaches suffer from structural overfitting, leading to three progressive challenges: 1) performance degradation on disjoint graphs, 2) loss of semantic diversity due to over-smoothing, and 3) feature distribution shift when generalizing to unseen graph structures (inductive tasks). To address these challenges, we introduce the DART framework. It begins by employing Global Structural Augmentation (GSA), which establishes global correlations to bridge disjoint components and extend diffusion coverage. Building upon this, we design a semantic rectifier based on masked autoencoding. This module learns the latent feature manifold to recover natural semantic details. Crucially, we introduce a test-time distribution rectification mechanism that projects structurally biased features back onto the learned manifold during inference, effectively bridging the inductive distribution gap. Furthermore, considering that synthetic masking fails to reflect real-world sparsity, we present a new dataset, Sailing, collected from voyage records with naturally missing attributes. Extensive experiments on six public datasets and Sailing demonstrate that DART significantly outperforms state-of-the-art methods in both transductive and inductive settings. Our code and dataset are available at https://github.com/yfsong00/DART.

URLs: https://github.com/yfsong00/DART.

replace Personalized Federated Distillation Assisted Vehicle Edge Caching Strategy

Authors: Xun Li, Qiong Wu, Pingyi Fan, Kezhi Wang, Wen Chen, Cui Zhang

Abstract: Vehicle edge caching is a promising technology that can significantly reduce the latency for vehicle users (VUs) to access content by pre-caching user-interested content at edge nodes. It is crucial to accurately predict the content that VUs are interested in without exposing their privacy. Traditional federated learning (FL) can protect user privacy by sharing models rather than raw data. However, the training of FL requires frequent model transmission, which can result in significant communication overhead. Additionally, vehicles may leave the roadside unit (RSU) coverage area before training is completed, leading to training failures. To address these issues, in this paper, we propose a personalized federated distillation assisted vehicle edge caching strategy. The simulation results demonstrate that the proposed vehicle edge caching strategy has good robustness to variations in vehicle speed, significantly reducing communication overhead.

replace Random-Bridges as Stochastic Transports for Generative Models

Authors: Stefano Goria, Levent A. Mengütürk, Murat C. Mengütürk, Berkan Sesen

Abstract: This paper motivates the use of random-bridges -- stochastic processes conditioned to take target distributions at fixed timepoints -- in the realm of generative modelling. Herein, random-bridges can act as stochastic transports between two probability distributions when appropriately initialized, and can display either Markovian or non-Markovian, and either continuous, discontinuous or hybrid patterns depending on the driving process. We show how one can start from general probabilistic statements and then branch out into specific representations for learning and simulation algorithms in terms of information processing. Our empirical results, built on Gaussian random bridges, produce high-quality samples in significantly fewer steps compared to traditional approaches, while achieving competitive Fréchet inception distance scores. Our analysis provides evidence that the proposed framework is computationally cheap and suitable for high-speed generation tasks.
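The simplest instance of a process "conditioned to take a target at a fixed timepoint" is a Brownian bridge pinned to a point mass; the sketch below samples one. This is an illustrative toy (the paper conditions on general target distributions, not points), and all names and parameters are assumptions.

```python
import numpy as np

def brownian_bridge(x0, x1, n_steps=100, sigma=1.0, rng=None):
    """Sample a Brownian bridge pinned at x0 (t=0) and x1 (t=1).

    Uses the representation X_t = (1-t) x0 + t x1 + (W_t - t W_1),
    which forces the path to hit x1 exactly at the terminal time --
    the minimal example of a bridge acting as a stochastic transport.
    """
    if rng is None:
        rng = np.random.default_rng()
    t = np.linspace(0.0, 1.0, n_steps + 1)
    dW = rng.normal(0.0, sigma * np.sqrt(1.0 / n_steps), size=n_steps)
    W = np.concatenate([[0.0], np.cumsum(dW)])
    return t, (1.0 - t) * x0 + t * x1 + (W - t * W[-1])

t, path = brownian_bridge(0.0, 2.0, rng=np.random.default_rng(1))
```

Replacing the driving Brownian motion with a jump or hybrid process yields the discontinuous bridge patterns the abstract mentions.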

replace GRAFT: Grid-Aware Load Forecasting with Multi-Source Textual Alignment and Fusion

Authors: Fangzhou Lin, Guoshun He, Zhenyu Guo, Zhe Huang, Jinsong Tao

Abstract: Electric load is simultaneously affected across multiple time scales by exogenous factors such as weather and calendar rhythms, sudden events, and policies. Therefore, this paper proposes GRAFT (GRid-Aware Forecasting with Text), which modifies and improves STanHOP to better support grid-aware forecasting and multi-source textual interventions. Specifically, GRAFT strictly aligns daily-aggregated news, social media, and policy texts with half-hour load, and realizes text-guided fusion to specific time positions via cross-attention during both training and rolling forecasting. In addition, GRAFT provides a plug-and-play external-memory interface to accommodate different information sources in real-world deployment. We construct and release a unified aligned benchmark covering 2019--2021 for five Australian states (half-hour load, daily-aligned weather/calendar variables, and three categories of external texts), and conduct systematic, reproducible evaluations at three scales -- hourly, daily, and monthly -- under a unified protocol for comparison across regions, external sources, and time scales. Experimental results show that GRAFT significantly outperforms strong baselines and reaches or surpasses the state of the art across multiple regions and forecasting horizons. Moreover, the model is robust in event-driven scenarios and enables temporal localization and source-level interpretation of text-to-load effects through attention read-out. We release the benchmark, preprocessing scripts, and forecasting results to facilitate standardized empirical evaluation and reproducibility in power grid load forecasting.

replace Kinetic-Mamba: Mamba-Assisted Predictions of Stiff Chemical Kinetics

Authors: Additi Pandey, Liang Wei, Hessam Babaee, George Em Karniadakis

Abstract: Accurate chemical kinetics modeling is essential for combustion simulations, as it governs the evolution of complex reaction pathways and thermochemical states. In this work, we introduce Kinetic-Mamba, a Mamba-based neural operator framework that integrates the expressive power of neural operators with the efficient temporal modeling capabilities of Mamba architectures. The framework comprises three complementary models: (i) a standalone Mamba model that predicts the time evolution of thermochemical state variables from given initial conditions; (ii) a constrained Mamba model that enforces mass conservation while learning the state dynamics; and (iii) a regime-informed architecture employing two standalone Mamba models to capture dynamics across temperature-dependent regimes. We additionally develop a latent Kinetic-Mamba variant that evolves dynamics in a reduced latent space and reconstructs the full state on the physical manifold. The accuracy and robustness of Kinetic-Mamba were evaluated using both time-decomposition and recursive-prediction strategies. We further assess the extrapolation capabilities of the model on varied out-of-distribution datasets. Computational experiments on Syngas and GRI-Mech 3.0 reaction mechanisms demonstrate that our framework achieves high fidelity in predicting complex kinetic behavior using only the initial conditions of the state variables.

replace SFBD-OMNI: Bridge models for lossy measurement restoration with limited clean samples

Authors: Haoye Lu, Yaoliang Yu, Darren Lo

Abstract: In many real-world scenarios, obtaining fully observed samples is prohibitively expensive or even infeasible, while partial and noisy observations are comparatively easy to collect. In this work, we study distribution restoration with abundant noisy samples, assuming the corruption process is available as a black-box generator. We show that this task can be framed as a one-sided entropic optimal transport problem and solved via an EM-like algorithm. We further provide a test criterion to determine whether the true underlying distribution is recoverable under per-sample information loss, and show that in otherwise unrecoverable cases, a small number of clean samples can render the distribution largely recoverable. Building on these insights, we introduce SFBD-OMNI, a bridge model-based framework that maps corrupted sample distributions to the ground-truth distribution. Our method generalizes Stochastic Forward-Backward Deconvolution (SFBD; Lu et al., 2025) to handle arbitrary measurement models beyond Gaussian corruption. Experiments across benchmark datasets and diverse measurement settings demonstrate significant improvements in both qualitative and quantitative performance.

replace Path Integral Solution for Dissipative Generative Dynamics

Authors: Xidi Wang

Abstract: Can purely mechanical systems generate intelligent language? We prove that dissipative quantum dynamics with analytically tractable non-local context aggregation produce coherent text generation, while conservation laws cause fundamental failure. Employing Koopman operators with closed-form path integral propagators, we show that irreversible computation fundamentally requires both controlled information dissipation and causal context aggregation. Spectral analysis reveals emergent eigenvalue structure, separating into decay modes (forgetting), growth modes (amplification), and neutral modes (preservation) -- the essential ingredients for directed information flow. Hamiltonian constraints force the elimination of these dissipative modes, degrading performance despite unchanged model capacity. This establishes language generation as dissipative quantum field theory, proving that mechanical systems acquire intelligence through the combination of dissipation and non-locality, not through conservation.

replace Bridging the Semantic Gap for Categorical Data Clustering via Large Language Models

Authors: Zihua Yang, Xin Liao, Yiqun Zhang, Yiu-ming Cheung

Abstract: Categorical data are prevalent in domains such as healthcare, marketing, and bioinformatics, where clustering serves as a fundamental tool for pattern discovery. A core challenge in categorical data clustering lies in measuring similarity among attribute values that lack inherent ordering or distance. Without appropriate similarity measures, values are often treated as equidistant, creating a semantic gap that obscures latent structures and degrades clustering quality. Although existing methods infer value relationships from within-dataset co-occurrence patterns, such inference becomes unreliable when samples are limited, leaving the semantic context of the data underexplored. To bridge this gap, we present ARISE (Attention-weighted Representation with Integrated Semantic Embeddings), which draws on external semantic knowledge from Large Language Models (LLMs) to construct semantic-aware representations that complement the metric space of categorical data for accurate clustering. That is, an LLM is adopted to describe attribute values for representation enhancement, and the LLM-enhanced embeddings are combined with the original data to explore semantically prominent clusters. Experiments on eight benchmark datasets demonstrate consistent improvements over seven representative counterparts, with gains of 19-27%. Code is available at https://github.com/develop-yang/ARISE

URLs: https://github.com/develop-yang/ARISE

replace ASSS: A Differentiable Adversarial Framework for Task-Aware Data Reduction

Authors: Jiacheng Lyu, Bihua Bao, Shiyun Yan

Abstract: Massive datasets often contain redundancy that inflates computational costs without improving generalization. Existing data reduction methods are typically task-agnostic, discarding informative boundary samples and yielding suboptimal performance. We propose Adversarial Soft-Selection Subsampling (ASSS), a differentiable framework that casts data reduction as a minimax game between a learnable selector and a task network. Using Gumbel-Softmax relaxation, ASSS enables end-to-end gradient flow and is theoretically grounded in the information bottleneck principle. Experiments on multiple benchmarks show that ASSS achieves a performance retention rate (PRR) of 98.9% while using only 30% of the data, significantly outperforming random sampling, K-means, and gradient-based methods. Visualizations confirm that ASSS preferentially retains samples near decision boundaries. The framework is scalable, fully differentiable, and easily integrated into existing training pipelines. This work introduces a new paradigm for task-aware data reduction that directly optimizes subset selection for the downstream objective, offering a principled and practical solution to the scalability challenges in modern deep learning.
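The Gumbel-Softmax relaxation at the heart of ASSS's differentiable selection can be sketched numerically as below. This is a generic forward-pass illustration under assumed names and parameters, not the paper's implementation (which trains the selector logits adversarially against the task network).

```python
import numpy as np

def gumbel_softmax_select(logits, tau=0.5, rng=None):
    """Relaxed (soft) selection weights over candidate samples.

    Adds Gumbel noise to learnable per-sample logits and applies a
    temperature-scaled softmax; annealing tau toward 0 approaches a hard
    subset choice while keeping the forward map differentiable in logits.
    """
    if rng is None:
        rng = np.random.default_rng()
    # Standard Gumbel(0, 1) noise: -log(-log(U)), U ~ Uniform(0, 1)
    g = -np.log(-np.log(rng.uniform(1e-9, 1.0, size=logits.shape)))
    z = (logits + g) / tau
    z = z - z.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

weights = gumbel_softmax_select(np.array([2.0, 0.0, -2.0, 1.0]),
                                rng=np.random.default_rng(0))
```

In the minimax game described in the abstract, these weights would scale each sample's loss contribution, so gradients flow end-to-end from the task objective back into the selector.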

replace From Bits to Chips: An LLM-based Hardware-Aware Quantization Agent for Streamlined Deployment of LLMs

Authors: Kaiyuan Deng, Hangyu Zheng, Minghai Qing, Kunxiong Zhu, Gen Li, Yang Xiao, Lan Emily Zhang, Linke Guo, Bo Hui, Yanzhi Wang, Geng Yuan, Gagan Agrawal, Wei Niu, Xiaolong Ma

Abstract: Deploying models, especially large language models (LLMs), is becoming increasingly attractive to a broader user base, including those without specialized expertise. However, due to the resource constraints of certain hardware, maintaining high accuracy with larger models while meeting the hardware requirements remains a significant challenge. Model quantization techniques help mitigate memory and compute bottlenecks, yet the added complexities of tuning and deploying quantized models further exacerbate these challenges, making the process unfriendly to most users. We introduce the Hardware-Aware Quantization Agent (HAQA), an automated framework that leverages LLMs to streamline the entire quantization and deployment process by enabling efficient hyperparameter tuning and hardware configuration, thereby simultaneously improving deployment quality and ease of use for a broad range of users. Our results demonstrate up to a 2.3x speedup in inference, along with increased throughput and improved accuracy compared to unoptimized models on Llama. Additionally, HAQA is designed to implement adaptive quantization strategies across diverse hardware platforms, as it automatically finds optimal settings even when they appear counterintuitive, thereby reducing extensive manual effort and demonstrating superior adaptability. Code will be released.

replace Understanding and inverse design of implicit bias in stochastic learning: a geometric perspective

Authors: Nicola Aladrah, Emanuele Ballarin, Matteo Biagetti, Alessio Ansuini, Alberto d'Onofrio, Fabio Anselmi

Abstract: A key challenge in machine learning is to explain how learning dynamics select among the many solutions that achieve identical loss values in overparameterized models - a phenomenon known as implicit bias. Controlling this bias provides a direct mechanism for shaping learned representations, which are central to interpretability, robustness, and reasoning in modern AI systems. Yet, despite its importance, existing explanations remain largely ad hoc and lack a unifying mechanism. We develop a theoretical and constructive framework in which implicit bias emerges as a geometric correction induced by the interplay between gradient noise and continuous symmetries of the loss. We compute the induced bias across a range of architectures, predicting new behaviors and explaining known ones. The approach also enables inverse design: by engineering predictor-preserving parameterizations, it is possible to shape the bias, with sparsity and spectral sparsity emerging as canonical instances. Numerical experiments support the theory and validate the inverse-design framework in controlled settings.

replace HOSL: Hybrid-Order Split Learning for Memory-Constrained Edge Training

Authors: Aakriti Lnu, Zhe Li, Dandan Liang, Chao Huang, Rui Li, Haibo Yang

Abstract: Split learning (SL) enables collaborative training of large language models (LLMs) between resource-constrained edge devices and compute-rich servers by partitioning model computation across the network boundary. However, existing SL systems predominantly rely on first-order (FO) optimization, which requires clients to store intermediate quantities such as activations for backpropagation. This results in substantial memory overhead, largely negating the benefits of model partitioning. In contrast, zeroth-order (ZO) optimization eliminates backpropagation and significantly reduces memory usage, but often suffers from slow convergence and degraded performance. In this work, we propose HOSL, a novel Hybrid-Order Split Learning framework that addresses this fundamental trade-off between memory efficiency and optimization effectiveness by strategically integrating ZO optimization on the client side with FO optimization on the server side. By employing memory-efficient ZO gradient estimation at the client, HOSL eliminates backpropagation and activation storage, reducing client memory consumption. Meanwhile, server-side FO optimization ensures fast convergence and competitive performance. Theoretically, we show that HOSL achieves an $\mathcal{O}(\sqrt{d_c/TQ})$ rate, which depends on the client-side model dimension $d_c$ rather than the full model dimension $d$, demonstrating that convergence improves as more computation is offloaded to the server. Extensive experiments on OPT models (125M and 1.3B parameters) across 6 tasks demonstrate that HOSL reduces client GPU memory by up to 3.7$\times$ compared to the FO method while achieving accuracy within 0.20%-4.23% of this baseline. Furthermore, HOSL outperforms the ZO baseline by up to 15.55%, validating the effectiveness of our hybrid strategy for memory-efficient training on edge devices.
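The memory-efficient client-side ZO step can be illustrated with a standard two-point gradient estimator: only two forward evaluations of the loss are needed, so no activations are stored. This is a generic sketch on a toy quadratic loss; the paper's exact estimator and its use inside the split pipeline may differ.

```python
import numpy as np

def zo_gradient(f, x, mu=1e-3, rng=None):
    """Two-point zeroth-order gradient estimate of f at x.

    Perturbs x along a random direction u and differences the loss:
    g_hat = [f(x + mu u) - f(x - mu u)] / (2 mu) * u.
    Backpropagation-free: f is treated as a black box.
    """
    if rng is None:
        rng = np.random.default_rng()
    u = rng.normal(size=x.shape)
    return (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u

f = lambda x: 0.5 * np.dot(x, x)     # toy quadratic loss, true gradient = x
x = np.array([1.0, -2.0, 3.0])
g_hat = zo_gradient(f, x, rng=np.random.default_rng(0))
```

For this quadratic, the estimate is exactly (u . x) u, a random projection of the true gradient; averaging over directions recovers it in expectation, which is why ZO steps converge, albeit with dimension-dependent variance (hence the $d_c$ factor in HOSL's rate).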

replace Auxiliary-predicted Compress Memory Model (ApCM Model): A Neural Memory Storage Model Based on Invertible Compression and Learnable Prediction

Authors: Weinuo Ou

Abstract: Current large language models (LLMs) generally lack an effective runtime memory mechanism, making it difficult to adapt to dynamic and personalized interaction requirements. To address this issue, this paper proposes a novel neural memory storage architecture: the Auxiliary Prediction Compression Memory Model (ApCM Model).

replace Accelerated Sinkhorn Algorithms for Partial Optimal Transport

Authors: Nghia Thu Truong, Qui Phu Pham, Quang Nguyen, Dung Luong, Mai Tran

Abstract: Partial Optimal Transport (POT) addresses the problem of transporting only a fraction of the total mass between two distributions, making it suitable when marginals have unequal size or contain outliers. While Sinkhorn-based methods are widely used, their complexity bounds for POT remain suboptimal and can limit scalability. We introduce Accelerated Sinkhorn for POT (ASPOT), which integrates alternating minimization with Nesterov-style acceleration in the POT setting, yielding a complexity of $\mathcal{O}(n^{7/3}\varepsilon^{-5/3})$. We also show that an informed choice of the entropic parameter $\gamma$ improves rates for the classical Sinkhorn method. Experiments on real-world applications validate our theories and demonstrate the favorable performance of our proposed methods.
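For reference, the classical entropic Sinkhorn iteration that ASPOT accelerates alternates simple scaling updates. The sketch below shows the balanced case only; the POT extension (transporting a fraction of the mass via relaxed marginals) and the Nesterov-style acceleration are not shown, and all parameter values are illustrative.

```python
import numpy as np

def sinkhorn(a, b, C, gamma=0.1, n_iter=500):
    """Classical Sinkhorn iterations for entropy-regularized balanced OT.

    Alternately rescales the Gibbs kernel K = exp(-C / gamma) so the
    transport plan P = diag(u) K diag(v) matches marginals a and b.
    """
    K = np.exp(-C / gamma)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)   # match column marginals
        u = a / (K @ v)     # match row marginals
    return u[:, None] * K * v[None, :]

n = 5
a = np.full(n, 1.0 / n)
b = np.full(n, 1.0 / n)
C = np.abs(np.subtract.outer(np.arange(n), np.arange(n))).astype(float)
P = sinkhorn(a, b, C)
```

The abstract's observation that an informed choice of the entropic parameter gamma improves rates refers to tuning the `gamma` above: smaller values sharpen the plan but slow the scaling iterations.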

replace InfoTok: Information-Theoretic Regularization for Capacity-Constrained Shared Visual Tokenization in Unified MLLMs

Authors: Lv Tang, Tianyi Zheng, Bo Li, Xingyu Li

Abstract: Unified multimodal large language models (MLLMs) aim to unify image understanding and image generation within a single framework, where a shared visual tokenizer serves as the sole interface that maps high-dimensional images into a limited token budget for downstream multimodal reasoning and synthesis. However, existing shared-token designs are largely architecture-driven and lack an explicit criterion for what information should be preserved to simultaneously support semantic abstraction and visual detail. In this paper, we adopt a capacity-constrained perspective, viewing the shared tokenizer as a compute-bounded learner whose finite representational budget should prioritize reusable structure over hard-to-exploit high-entropy variations and redundancy. Motivated by this view, we propose InfoTok, an information-regularized tokenization mechanism grounded in the Information Bottleneck (IB) principle. InfoTok explicitly controls information flow from images to shared tokens to multimodal outputs by imposing mutual-information (MI) constraints that enforce a principled trade-off between compression and task relevance, while also encouraging cross-modal consistency. Because MI is intractable for high-dimensional visual representations, we instantiate InfoTok with practical, differentiable dependence estimators, including a variational IB formulation and a Hilbert-Schmidt Independence Criterion (HSIC)-based alternative. Integrated into three representative unified MLLMs without introducing any additional training data, InfoTok consistently improves both image understanding and generation performance. These results support information-regularized visual tokenization as a sound basis for token learning in unified MLLMs.
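The HSIC-based dependence estimator the abstract mentions has a standard empirical form, tr(K H L H) / (n-1)^2 with centered kernel matrices, which can be sketched as below. This is the textbook biased estimator on toy data, not the paper's code; kernels and `gamma` are assumptions.

```python
import numpy as np

def hsic(X, Y, gamma=1.0):
    """Biased empirical HSIC with RBF kernels on paired samples.

    A differentiable dependence measure usable as a practical stand-in
    for mutual information when MI itself is intractable.
    """
    n = X.shape[0]
    def rbf(A):
        d2 = ((A[:, None, :] - A[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    H = np.eye(n) - np.ones((n, n)) / n    # centering matrix
    return np.trace(rbf(X) @ H @ rbf(Y) @ H) / (n - 1) ** 2

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
dep = hsic(X, X + 0.1 * rng.normal(size=(100, 2)))   # dependent pair
ind = hsic(X, rng.normal(size=(100, 2)))             # independent pair
```

In an IB-style objective, such a term would penalize dependence between images and tokens (compression) while rewarding dependence between tokens and task outputs (relevance).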

replace RASA: Routing-Aware Safety Alignment for Mixture-of-Experts Models

Authors: Jiacheng Liang, Yuhui Wang, Tanqiu Jiang, Ting Wang

Abstract: Mixture-of-Experts (MoE) language models introduce unique challenges for safety alignment due to their sparse routing mechanisms, which can enable degenerate optimization behaviors under standard full-parameter fine-tuning. In our preliminary experiments, we observe that naively applying full-parameter safety fine-tuning to MoE models can reduce attack success rates through routing or expert dominance effects, rather than by directly repairing Safety-Critical Experts. To address this challenge, we propose RASA, a routing-aware expert-level alignment framework that explicitly repairs Safety-Critical Experts while preventing routing-based bypasses. RASA identifies experts disproportionately activated by successful jailbreaks, selectively fine-tunes only these experts under fixed routing, and subsequently enforces routing consistency with safety-aligned contexts. Across two representative MoE architectures and a diverse set of jailbreak attacks, RASA achieves near-perfect robustness, strong cross-attack generalization, and substantially reduced over-refusal, while preserving general capabilities on benchmarks such as MMLU, GSM8K, and TruthfulQA. Our results suggest that robust MoE safety alignment benefits from targeted expert repair rather than global parameter updates, offering a practical and architecture-preserving alternative to prior approaches.

replace WIMLE: Uncertainty-Aware World Models with IMLE for Sample-Efficient Continuous Control

Authors: Mehran Aghabozorgi, Alireza Moazeni, Yanshu Zhang, Ke Li

Abstract: Model-based reinforcement learning promises strong sample efficiency but often underperforms in practice due to compounding model error, unimodal world models that average over multi-modal dynamics, and overconfident predictions that bias learning. We introduce WIMLE, a model-based method that extends Implicit Maximum Likelihood Estimation (IMLE) to the model-based RL framework to learn stochastic, multi-modal world models without iterative sampling and to estimate predictive uncertainty via ensembles and latent sampling. During training, WIMLE weights each synthetic transition by its predicted confidence, preserving useful model rollouts while attenuating bias from uncertain predictions and enabling stable learning. Across $40$ continuous-control tasks spanning DeepMind Control, MyoSuite, and HumanoidBench, WIMLE achieves superior sample efficiency and competitive or better asymptotic performance than strong model-free and model-based baselines. Notably, on the challenging Humanoid-run task, WIMLE improves sample efficiency by over $50$\% relative to the strongest competitor, and on HumanoidBench it solves $8$ of $14$ tasks (versus $4$ for BRO and $5$ for SimbaV2). These results highlight the value of IMLE-based multi-modality and uncertainty-aware weighting for stable model-based RL.

replace ModalImmune: Immunity Driven Unlearning via Self Destructive Training

Authors: Rong Fu, WeiZhi Tang, Ziming Wang, Jia Yee Tan, Zijian Zhang, Zhaolu Kang, Muge Qi, Shuning Zhang, Simon Fong

Abstract: Multimodal systems are vulnerable to partial or complete loss of input channels at deployment, which undermines reliability in real-world settings. This paper presents ModalImmune, a training framework that enforces modality immunity by intentionally and controllably collapsing selected modality information during training so the model learns joint representations that are robust to destructive modality influence. The framework combines a spectrum-adaptive collapse regularizer, an information-gain guided controller for targeted interventions, curvature-aware gradient masking to stabilize destructive updates, and a certified Neumann-truncated hyper-gradient procedure for automatic meta-parameter adaptation. Empirical evaluation on standard multimodal benchmarks demonstrates that ModalImmune improves resilience to modality removal and corruption while retaining convergence stability and reconstruction capacity.

replace Unlearning Noise in PINNs: A Selective Pruning Framework for PDE Inverse Problems

Authors: Yongsheng Chen, Yong Chen, Wei Guo, Xinghui Zhong

Abstract: Physics-informed neural networks (PINNs) provide a promising framework for solving inverse problems governed by partial differential equations (PDEs) by integrating observational data and physical constraints in a unified optimization objective. However, the ill-posed nature of PDE inverse problems makes them highly sensitive to noise. Even a small fraction of corrupted observations can distort internal neural representations, severely impairing accuracy and destabilizing training. Motivated by recent advances in machine unlearning and structured network pruning, we propose P-PINN, a selective pruning framework designed to unlearn the influence of corrupted data in a pretrained PINN. Specifically, starting from a PINN trained on the full dataset, P-PINN evaluates a joint residual--data fidelity indicator, a weighted combination of data misfit and PDE residuals, to partition the training set into reliable and corrupted subsets. Next, we introduce a bias-based neuron importance measure that quantifies directional activation discrepancies between the two subsets, identifying neurons whose representations are predominantly driven by corrupted samples. Building on this, an iterative pruning strategy then removes noise-sensitive neurons layer by layer. The resulting pruned network is fine-tuned on the reliable data subject to the original PDE constraints, acting as a lightweight post-processing stage rather than a complete retraining. Numerical experiments on extensive PDE inverse-problem benchmarks demonstrate that P-PINN substantially improves robustness, accuracy, and training stability under noisy conditions, achieving up to a 96.6% reduction in relative error compared with baseline PINNs. These results indicate that activation-level post hoc pruning is a promising mechanism for enhancing the reliability of physics-informed learning in noise-contaminated settings.

replace T1: One-to-One Channel-Head Binding for Multivariate Time-Series Imputation

Authors: Dongik Park, Hyunwoo Ryu, Suahn Bae, Keondo Park, Hyung-Sin Kim

Abstract: Imputing missing values in multivariate time series remains challenging, especially under diverse missing patterns and heavy missingness. Existing methods suffer from suboptimal performance as corrupted temporal features hinder effective cross-variable information transfer, amplifying reconstruction errors. Robust imputation requires both extracting temporal patterns from sparse observations within each variable and selectively transferring information across variables--yet current approaches excel at one while compromising the other. We introduce T1 (Time series imputation with 1-to-1 channel-head binding), a CNN-Transformer hybrid architecture that achieves robust imputation through Channel-Head Binding--a mechanism creating one-to-one correspondence between CNN channels and attention heads. This design enables selective information transfer: when missingness corrupts certain temporal patterns, their corresponding attention pathways adaptively down-weight based on remaining observable patterns while preserving reliable cross-variable connections through unaffected channels. Experiments on 11 benchmark datasets demonstrate that T1 achieves state-of-the-art performance, reducing MSE by 46% on average compared to the second-best baseline, with particularly strong gains under extreme sparsity (70% missing ratio). The model generalizes to unseen missing patterns without retraining and uses a consistent hyperparameter configuration across all datasets. The code is available at https://github.com/Oppenheimerdinger/T1.

URLs: https://github.com/Oppenheimerdinger/T1

replace The Malignant Tail: Spectral Segregation of Label Noise in Over-Parameterized Networks

Authors: Zice Wang

Abstract: While implicit regularization facilitates benign overfitting in low-noise regimes, recent theoretical work predicts a sharp phase transition to harmful overfitting as the noise-to-signal ratio increases. We experimentally isolate the geometric mechanism of this transition: the Malignant Tail, a failure mode where networks functionally segregate signal and noise, reducing coherent semantic features into low-rank subspaces while pushing stochastic label noise into high-frequency orthogonal components, distinct from systematic or corruption-aligned noise. Through a Spectral Linear Probe of training dynamics, we demonstrate that Stochastic Gradient Descent (SGD) fails to suppress this noise, instead implicitly biasing it toward high-frequency orthogonal subspaces, effectively preserving signal-noise separability. We show that this geometric separation is distinct from simple variance reduction in untrained models. In trained networks, SGD actively segregates noise, allowing post-hoc Explicit Spectral Truncation (d << D) to surgically prune the noise-dominated subspace. This approach recovers the optimal generalization capability latent in the converged model. Unlike unstable temporal early stopping, Geometric Truncation provides a stable post-hoc intervention. Our findings suggest that under label noise, excess spectral capacity is not harmless redundancy but a latent structural liability that allows for noise memorization, necessitating explicit rank constraints to filter stochastic corruptions for robust generalization.
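
The Explicit Spectral Truncation described above amounts to projecting features onto their top-d principal directions and discarding the high-frequency tail. A minimal sketch under stated assumptions (function name, SVD-based implementation, and the choice of d are illustrative, not the paper's code):

```python
import numpy as np

def spectral_truncate(features, d):
    # Project feature vectors onto their top-d principal directions,
    # discarding the tail subspace where stochastic label noise is
    # hypothesized to concentrate.
    mu = features.mean(axis=0)
    U, S, Vt = np.linalg.svd(features - mu, full_matrices=False)
    return mu + (features - mu) @ Vt[:d].T @ Vt[:d]
```

On a low-rank signal corrupted by isotropic noise, keeping only the leading rank-d subspace removes most of the noise energy while retaining the signal, which is the intuition behind truncation as a post-hoc intervention.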

replace MOOSE-Star: Unlocking Tractable Training for Scientific Discovery by Breaking the Complexity Barrier

Authors: Zonglin Yang, Lidong Bing

Abstract: While large language models (LLMs) show promise in scientific discovery, existing research focuses on inference or feedback-driven training, leaving the direct modeling of the generative reasoning process, $P(\text{hypothesis}|\text{background})$ ($P(h|b)$), unexplored. We demonstrate that directly training $P(h|b)$ is mathematically intractable due to the combinatorial complexity ($O(N^k)$) inherent in retrieving and composing inspirations from a vast knowledge base. To break this barrier, we introduce MOOSE-Star, a unified framework enabling tractable training and scalable inference. In the best case, MOOSE-Star reduces complexity from exponential to logarithmic ($O(\log N)$) by (1) training on decomposed subtasks derived from the probabilistic equation of discovery, (2) employing motivation-guided hierarchical search to enable logarithmic retrieval and prune irrelevant subspaces, and (3) utilizing bounded composition for robustness against retrieval noise. To facilitate this, we release TOMATO-Star, a dataset of 108,717 decomposed papers (38,400 GPU hours) for training. Furthermore, we show that while brute-force sampling hits a ``complexity wall,'' MOOSE-Star exhibits continuous test-time scaling.

replace KindSleep: Knowledge-Informed Diagnosis of Obstructive Sleep Apnea from Oximetry

Authors: Micky C Nnamdi, Wenqi Shi, Cheng Wan, J. Ben Tamo, Benjamin M Smith, Chad A Purnell, May D Wang

Abstract: Obstructive sleep apnea (OSA) is a sleep disorder that affects nearly one billion people globally and significantly elevates cardiovascular risk. Traditional diagnosis through polysomnography is resource-intensive and limits widespread access, creating a critical need for accurate and efficient alternatives. In this paper, we introduce KindSleep, a deep learning framework that integrates clinical knowledge with single-channel patient-specific oximetry signals and clinical data for precise OSA diagnosis. KindSleep first learns to identify clinically interpretable concepts, such as desaturation indices and respiratory disturbance events, directly from raw oximetry signals. It then fuses these AI-derived concepts with multimodal clinical data to estimate the Apnea-Hypopnea Index (AHI). We evaluate KindSleep on three large, independent datasets from the National Sleep Research Resource (SHHS, CFS, MrOS; total n = 9,815). KindSleep demonstrates excellent performance in estimating AHI scores (R^2 = 0.917, ICC = 0.957) and consistently outperforms existing approaches in classifying OSA severity, achieving weighted F1-scores from 0.827 to 0.941 across diverse populations. By grounding its predictions in a layer of clinically meaningful concepts, KindSleep provides a more transparent and trustworthy diagnostic tool for sleep medicine practices.

replace NePPO: Near-Potential Policy Optimization for General-Sum Multi-Agent Reinforcement Learning

Authors: Addison Kalanther, Sanika Bharvirkar, Shankar Sastry, Chinmay Maheshwari

Abstract: Multi-agent reinforcement learning (MARL) is increasingly used to design learning-enabled agents that interact in shared environments. However, training MARL algorithms in general-sum games remains challenging: learning dynamics can become unstable, and convergence guarantees typically hold only in restricted settings such as two-player zero-sum or fully cooperative games. Moreover, when agents have heterogeneous and potentially conflicting preferences, it is unclear what system-level objective should guide learning. In this paper, we propose a new MARL pipeline called Near-Potential Policy Optimization (NePPO) for computing approximate Nash equilibria in mixed cooperative--competitive environments. The core idea is to learn a player-independent potential function such that the Nash equilibrium of a cooperative game with this potential as the common utility approximates a Nash equilibrium of the original game. To this end, we introduce a novel MARL objective such that minimizing this objective yields the best possible potential function candidate and consequently an approximate Nash equilibrium of the original game. We develop an algorithmic pipeline that minimizes this objective using zeroth-order gradient descent and returns an approximate Nash equilibrium policy. We empirically show the superior performance of this approach compared to popular baselines such as IPPO and MAPPO.

replace OptEMA: Adaptive Exponential Moving Average for Stochastic Optimization with Zero-Noise Optimality

Authors: Ganzhao Yuan

Abstract: The Exponential Moving Average (EMA) is a cornerstone of widely used optimizers such as Adam. However, existing theoretical analyses of Adam-style methods have notable limitations: their guarantees can remain suboptimal in the zero-noise regime, rely on restrictive boundedness conditions (e.g., bounded gradients or objective gaps), use constant or open-loop stepsizes, or require prior knowledge of Lipschitz constants. To overcome these bottlenecks, we introduce OptEMA and analyze two novel variants: OptEMA-M, which applies an adaptive, decreasing EMA coefficient to the first-order moment with a fixed second-order decay, and OptEMA-V, which swaps these roles. At the heart of these variants is a novel Corrected AdaGrad-Norm stepsize. This formulation renders OptEMA closed-loop and Lipschitz-free, meaning its effective stepsizes are strictly trajectory-dependent and require no parameterization via the Lipschitz constant. Under standard stochastic gradient descent (SGD) assumptions, namely smoothness, a lower-bounded objective, and unbiased gradients with bounded variance, we establish rigorous convergence guarantees. Both variants achieve a noise-adaptive convergence rate of $\widetilde{\mathcal{O}}(T^{-1/2}+\sigma^{1/2} T^{-1/4})$ for the average gradient norm, where $\sigma$ is the noise level. Crucially, the Corrected AdaGrad-Norm stepsize plays a central role in enabling the noise-adaptive guarantees: in the zero-noise regime ($\sigma=0$), our bounds automatically reduce to the nearly optimal deterministic rate $\widetilde{\mathcal{O}}(T^{-1/2})$ without any manual hyperparameter retuning.

replace A Grammar of Machine Learning Workflows

Authors: Simon Roth

Abstract: Data leakage has been identified in 648 published machine learning papers across 30 scientific fields. The knowledge to prevent it exists; the tools do not enforce it. This paper presents a grammar - eight typed primitives, a directed acyclic graph, and four hard constraints - that makes the most damaging leakage types structurally unrepresentable. The core mechanism is a terminal assessment gate: the first call-time-enforced evaluate/assess boundary in an ML framework, backed by a specification precise enough for independent reimplementation. A companion landscape study across 2,047 datasets grounds the constraints in measured effect sizes. Two reference implementations (Python, R) are available.

replace Hindsight-Anchored Policy Optimization: Turning Failure into Feedback in Sparse Reward Settings

Authors: Yuning Wu, Ke Wang, Devin Chen, Kai Wei

Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a promising paradigm for post-training reasoning models. However, group-based methods such as Group Relative Policy Optimization (GRPO) face a critical dilemma in sparse-reward settings: pure Reinforcement Learning (RL) suffers from advantage collapse and high-variance gradient estimation, while mixed-policy optimization introduces persistent distributional bias. To resolve this dilemma, we introduce Hindsight-Anchored Policy Optimization (HAPO). HAPO employs the Synthetic Success Injection (SSI) operator, a hindsight mechanism that selectively anchors optimization to teacher demonstrations during failure. This injection is governed by a Thompson sampling-inspired gating mechanism, creating an autonomous, self-paced curriculum. Theoretically, we demonstrate that HAPO achieves \textit{asymptotic consistency}: by naturally annealing the teacher signal as the policy improves, HAPO recovers the unbiased on-policy gradient. This ensures off-policy guidance acts as a temporary scaffold rather than a persistent ceiling, enabling the model to surpass the limitations of static teacher forcing.

replace Security Considerations for Artificial Intelligence Agents

Authors: Ninghui Li, Kaiyuan Zhang, Kyle Polley, Jerry Ma

Abstract: This article, a lightly adapted version of Perplexity's response to NIST/CAISI Request for Information 2025-0035, details our observations and recommendations concerning the security of frontier AI agents. These insights are informed by Perplexity's experience operating general-purpose agentic systems used by millions of users and thousands of enterprises in both controlled and open-world environments. Agent architectures change core assumptions around code-data separation, authority boundaries, and execution predictability, creating new confidentiality, integrity, and availability failure modes. We map principal attack surfaces across tools, connectors, hosting boundaries, and multi-agent coordination, with particular emphasis on indirect prompt injection, confused-deputy behavior, and cascading failures in long-running workflows. We then assess current defenses as a layered stack: input-level and model-level mitigations, sandboxed execution, and deterministic policy enforcement for high-consequence actions. Finally, we identify standards and research gaps, including adaptive security benchmarks, policy models for delegation and privilege control, and guidance for secure multi-agent system design aligned with NIST risk management principles.

replace Brittlebench: Quantifying LLM robustness via prompt sensitivity

Authors: Angelika Romanou, Mark Ibrahim, Candace Ross, Chantal Shaib, Kerem Oktar, Samuel J. Bell, Anaelia Ovalle, Jesse Dodge, Antoine Bosselut, Koustuv Sinha, Adina Williams

Abstract: Existing evaluation methods largely rely on clean, static benchmarks, which can overestimate true model performance by failing to capture the noise and variability inherent in real-world user inputs. This is especially true for language models, which can face human-generated text queries containing mistakes, typos, or alternative ways of phrasing the same question. In this work, we introduce a theoretical framework for quantifying model sensitivity to prompt variants, or brittleness, that can enable us to disentangle data-induced difficulty from prompt-related variability. Using this framework, we design a novel evaluation pipeline, Brittlebench, to holistically evaluate the sensitivity of frontier models. We apply semantics-preserving perturbations to a suite of popular benchmarks, and observe model performance to degrade as much as 12%. However, these perturbations do not affect all models equally: even a single perturbation alters the relative ranking of models in 63% of cases, impacting conclusions about comparative model performance. Decomposing the total variance of both state-of-the-art open-weight and commercial models, we find that semantics-preserving input perturbations can account for up to half of the performance variance for a given model. Brittlebench highlights the need for more robust evaluations and models, and allows us to systematically understand model brittleness.

replace AI-Driven Predictive Maintenance with Environmental Context Integration for Connected Vehicles: Simulation, Benchmarking, and Field Validation

Authors: Kushal Khemani (Independent Researcher, India), Anjum Nazir Qureshi (Rajiv Gandhi College of Engineering Research, Technology)

Abstract: Predictive maintenance for connected vehicles offers the potential to reduce unexpected breakdowns and improve fleet reliability, but most existing systems rely exclusively on internal diagnostic signals and are validated on simulated or industrial benchmark data. This paper presents a contextual data fusion framework integrating vehicle-internal sensor streams with external environmental signals -- road quality, weather, traffic density, and driver behaviour -- acquired via V2X communication and third-party APIs, with inference at the vehicle edge. The framework is evaluated across four layers. A feature group ablation study on a physics-informed synthetic dataset shows contextual features contribute a 2.6-point F1 improvement; removing all context reduces macro F1 from 0.855 to 0.807. On the AI4I 2020 benchmark (10,000 samples), LightGBM achieves AUC-ROC 0.973 under 5-fold stratified cross-validation with SMOTE confined to training folds. A noise sensitivity analysis shows macro F1 remains above 0.88 at low noise and degrades to 0.74 at high noise. Most critically, the pipeline is validated on real-world telemetry from five vehicles across three countries (India, Germany, Brazil), comprising 992 trips and 11 evaluable service events identified from component wear resets in the trip logs. Across six wear-driven events spanning four vehicles, the model achieves 100% detection with mean MAE of 12.2 days. A fine-tuning ablation shows the base synthetic model already achieves 6/6 binary detection; per-vehicle adaptation reduces wear-driven MAE from 25.9 to 12.2 days. SHAP analysis confirms contextual and interaction features rank among the top 15 predictors. Edge-based inference reduces estimated latency from 3.5 seconds to under 1.0 second relative to cloud-only processing.

replace Accelerating Byzantine-Robust Distributed Learning with Compressed Communication via Double Momentum and Variance Reduction

Authors: Yanghao Li, Changxin Liu, Yuhao Yi

Abstract: In collaborative and distributed learning, Byzantine robustness reflects a major facet of optimization algorithms. Such distributed algorithms are often accompanied by transmitting a large number of parameters, so communication compression is essential for an effective solution. In this paper, we propose Byz-DM21, a novel Byzantine-robust and communication-efficient stochastic distributed learning algorithm. Our key innovation is a novel gradient estimator based on a double-momentum mechanism, integrating recent advancements in error feedback techniques. Using this estimator, we design both standard and accelerated algorithms that eliminate the need for large batch sizes while maintaining robustness against Byzantine workers. We prove that the Byz-DM21 algorithm has a smaller neighborhood size and converges to $\varepsilon$-stationary points in $\mathcal{O}(\varepsilon^{-4})$ iterations. To further enhance efficiency, we introduce a distributed variant called Byz-VR-DM21, which incorporates local variance reduction at each node to progressively eliminate variance from random approximations. We show that Byz-VR-DM21 provably converges to $\varepsilon$-stationary points in $\mathcal{O}(\varepsilon^{-3})$ iterations. Additionally, we extend our results to the case where the functions satisfy the Polyak-{\L}ojasiewicz condition. Finally, numerical experiments demonstrate the effectiveness of the proposed method.

replace Path-Constrained Mixture-of-Experts

Authors: Zijin Gu, Tatiana Likhomanenko, Vimal Thilak, Jason Ramapuram, Navdeep Jaitly

Abstract: Sparse Mixture-of-Experts (MoE) architectures route each token through a subset of experts at each layer independently. We propose viewing MoE computation through the lens of \emph{expert paths} -- the sequence of expert selections a token makes across all layers. This perspective reveals that, despite $N^L$ possible paths for $N$ experts across $L$ layers, tokens in practice cluster into a small fraction of paths that align with linguistic function, yet the vast majority of paths remain unexplored, representing a statistical inefficiency. This motivates architectures that constrain the effective path space to amplify this natural concentration. As one instantiation, we introduce \pathmoe{}, which shares router parameters across blocks of consecutive layers. Analysis confirms that \pathmoe{} amplifies the emergent path structure: it produces more concentrated path clusters, better cross-layer consistency, and greater robustness to routing perturbations. Experiments on 0.9B and 16B parameter \pathmoe{} models demonstrate consistent improvements on perplexity and downstream tasks over independent routing, while eliminating the need for auxiliary losses. These results establish expert paths as a useful design axis for MoE architectures, complementary to existing work on independent routing mechanisms.

replace Improving RCT-Based CATE Estimation Under Covariate Mismatch via Calibrated Alignment

Authors: Amir Asiaee, Samhita Pal

Abstract: Randomized controlled trials (RCTs) are the gold standard for estimating heterogeneous treatment effects, yet they are often underpowered for detecting effect heterogeneity. Large observational studies (OS) can supplement RCTs for conditional average treatment effect (CATE) estimation, but a key barrier is covariate mismatch: the two sources measure different, only partially overlapping, covariates. We propose CALM (Calibrated ALignment under covariate Mismatch), which bypasses imputation by learning embeddings that map each source's features into a common representation space. OS outcome models are transferred to the RCT embedding space and calibrated using trial data, preserving causal identification from randomization. Finite-sample risk bounds decompose into alignment error, outcome-model complexity, and calibration complexity terms, identifying when embedding alignment outperforms imputation. Under the calibration-based linear variant, the framework provides protection against negative transfer; the neural variant can be vulnerable under severe distributional shift. Under sparse linear models, the embedding approach strictly generalizes imputation. Simulations across 51 settings confirm that (i) calibration-based methods are equivalent for linear CATEs, and (ii) the neural embedding variant wins all 22 nonlinear-regime settings with large margins.

replace How Out-of-Equilibrium Phase Transitions can Seed Pattern Formation in Trained Diffusion Models

Authors: Luca Ambrogioni

Abstract: Diffusion models generate structure by progressively transforming noise into data, yet the mechanisms underlying this transition remain poorly understood. In this work, we show that pattern formation in trained diffusion models can be explained as an out-of-equilibrium phase transition driven by instabilities in the denoising dynamics. We develop a theoretical framework linking data symmetries and architectural constraints, such as locality and translation equivariance, to the emergence of collective spatial modes. In this view, structure arises when low-frequency modes become unstable, triggering a rapid growth of spatial correlations that organizes noise into coherent patterns. We validate this theory through a combination of analytical models and experiments. In a controlled patch-based model, we observe a sharp increase in correlation length and a simultaneous softening of low-frequency modes at a well-defined critical time, accurately predicted by theory. Similar signatures are found in trained convolutional diffusion models on Fashion-MNIST and in large-scale ImageNet models, where pattern formation coincides with a peak in estimated correlation length and a pronounced weakening of spatial modes. Finally, intervention experiments show that applying guidance precisely at this critical stage significantly improves class alignment compared to applying it at random times, demonstrating that this regime is not only descriptive but functionally important.

replace Learning Sampled-data Control for Swarms via MeanFlow

Authors: Anqi Dong, Yongxin Chen, Karl H. Johansson, Johan Karlsson

Abstract: Steering large-scale swarms with only limited control updates is often needed due to communication or computational constraints, yet most learning-based approaches do not account for this and instead model instantaneous velocity fields. As a result, the natural object for decision making is a finite-window control quantity rather than an infinitesimal one. To address this gap, we consider the recent machine learning framework MeanFlow and generalize it to the setting with general linear dynamic systems. This results in a new sampled-data learning framework that operates directly in control space and that can be applied for swarm steering. To this end, we learn the finite-horizon coefficient that parameterizes the minimum-energy control applied over each interval, and derive a differential identity that connects this quantity to a local bridge-induced supervision signal. This identity leads to a simple stop-gradient regression objective, allowing the interval coefficient field to be learned efficiently from bridge samples. The learned policy is deployed through sampled-data updates, guaranteeing that the resulting controller exactly respects the prescribed linear time-invariant dynamics and actuation channel. The resulting method enables few-step swarm steering at scale, while remaining consistent with the finite-window actuation structure of the underlying control system.

replace LLM-ODE: Data-driven Discovery of Dynamical Systems with Large Language Models

Authors: Amirmohammad Ziaei Bideh, Jonathan Gryak

Abstract: Discovering the governing equations of dynamical systems is a central problem across many scientific disciplines. As experimental data become increasingly available, automated equation discovery methods offer a promising data-driven approach to accelerate scientific discovery. Among these methods, genetic programming (GP) has been widely adopted due to its flexibility and interpretability. However, GP-based approaches often suffer from inefficient exploration of the symbolic search space, leading to slow convergence and suboptimal solutions. To address these limitations, we propose LLM-ODE, a large language model-aided model discovery framework that guides symbolic evolution using patterns extracted from elite candidate equations. By leveraging the generative prior of large language models, LLM-ODE produces more informed search trajectories while preserving the exploratory strengths of evolutionary algorithms. Empirical results on 91 dynamical systems show that LLM-ODE variants consistently outperform classical GP methods in terms of search efficiency and Pareto-front quality. Overall, our results demonstrate that LLM-ODE improves both efficiency and accuracy over traditional GP-based discovery and offers greater scalability to higher-dimensional systems compared to linear and Transformer-only model discovery methods.

replace ALMAB-DC: Active Learning, Multi-Armed Bandits, and Distributed Computing for Sequential Experimental Design and Black-Box Optimization

Authors: Foo Hui-Mean, Yuan-chin I Chang

Abstract: Sequential experimental design under expensive, gradient-free objectives is a central challenge in computational statistics: evaluation budgets are tightly constrained and information must be extracted efficiently from each observation. We propose \textbf{ALMAB-DC}, a GP-based sequential design framework combining active learning, multi-armed bandits (MAB), and distributed asynchronous computing for expensive black-box experimentation. A Gaussian process surrogate with uncertainty-aware acquisition identifies informative query points; a UCB or Thompson-sampling bandit controller allocates evaluations across parallel workers; and an asynchronous scheduler handles heterogeneous runtimes. We present cumulative regret bounds for the bandit components and characterize parallel scalability via Amdahl's Law. We validate ALMAB-DC on five benchmarks. On the two statistical experimental-design tasks, ALMAB-DC achieves lower simple regret than Equal Spacing, Random, and D-optimal designs in dose--response optimization, and in adaptive spatial field estimation matches the Greedy Max-Variance benchmark while outperforming Latin Hypercube Sampling; at $K=4$ the distributed setting reaches target performance in one-quarter of sequential wall-clock rounds. On three ML/engineering tasks (CIFAR-10 HPO, CFD drag minimization, MuJoCo RL), ALMAB-DC achieves 93.4\% CIFAR-10 accuracy (outperforming BOHB by 1.7\,pp and Optuna by 1.1\,pp), reduces airfoil drag to $C_D = 0.059$ (36.9\% below Grid Search), and improves RL return by 50\% over Grid Search. All advantages over non-ALMAB baselines are statistically significant under Bonferroni-corrected Mann--Whitney $U$ tests. Distributed execution achieves $7.5\times$ speedup at $K = 16$ agents, consistent with Amdahl's Law.
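
The UCB controller this abstract mentions for allocating evaluations across workers follows the standard UCB1 rule: play each arm once, then pick the arm maximizing mean reward plus an exploration bonus. A minimal sketch (the function name, exploration constant, and interface are assumptions, not ALMAB-DC's implementation):

```python
import numpy as np

def ucb_select(counts, means, t, c=2.0):
    # UCB1 arm selection: argmax over mean_i + sqrt(c * ln(t) / n_i).
    # Arms that have never been played are tried first.
    counts = np.asarray(counts, dtype=float)
    unplayed = counts == 0
    if unplayed.any():
        return int(np.argmax(unplayed))  # first arm with zero plays
    bonus = np.sqrt(c * np.log(t) / counts)
    return int(np.argmax(np.asarray(means) + bonus))
```

The bonus term shrinks as an arm accumulates evaluations, so under-explored arms (e.g., idle workers or unexplored design regions) are periodically revisited, which is what yields the cumulative-regret guarantees the abstract refers to.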

replace Posterior-Calibrated Causal Circuits in Variational Autoencoders: Why Image-Domain Interpretability Fails on Tabular Data

Authors: Dip Roy, Rajiv Misra, Sanjay Kumar Singh, Anisha Roy

Abstract: Although mechanism-based interpretability has generated an abundance of insight for discriminative network analysis, generative models are less understood -- particularly outside of image-related applications. We investigate how much of the causal circuitry found within image-related variational autoencoders (VAEs) generalizes to tabular data, as VAEs are increasingly used for imputation, anomaly detection, and synthetic data generation. In addition to extending a four-level causal intervention framework to four tabular benchmarks and one image benchmark across five different VAE architectures (with 75 individual training runs per architecture and three random seeds for each run), this paper introduces three new techniques: posterior calibration of Causal Effect Strength (CES), path-specific activation patching, and Feature-Group Disentanglement (FGD). The results from our experiments demonstrate that: (i) Tabular VAEs have circuits with modularity that is approximately 50% lower than their image counterparts. (ii) $\beta$-VAE experiences nearly complete collapse in CES scores when applied to heterogeneous tabular features (0.043 CES score for tabular data compared to 0.133 CES score for images), which can be directly attributed to reconstruction quality degradation (r = -0.886 correlation coefficient between CES and MSE). (iii) CES successfully captures nine of eleven statistically significant architecture differences under Holm--\v{S}id\'{a}k corrections. (iv) Interventions with high specificity predict the highest downstream AUC values (r = 0.460, p < .001). This study challenges the common assumption that architectural guidance from image-related studies can be transferred to tabular datasets.

replace Uncertainty Quantification for Distribution-to-Distribution Flow Matching in Scientific Imaging

Authors: Dongxia Wu, Yuhui Zhang, Serena Yeung-Levy, Emma Lundberg, Emily B. Fox

Abstract: Distribution-to-distribution generative models support scientific imaging tasks ranging from modeling cellular perturbation responses to translating medical images across conditions. Trustworthy generation requires both reliability (generalization across labs, devices, and experimental conditions) and accountability (detecting out-of-distribution cases where predictions may be unreliable). Uncertainty quantification (UQ) based approaches serve as promising candidates for these tasks, yet UQ for distribution-to-distribution generative models remains underexplored. We present a unified UQ framework, Bayesian Stochastic Flow Matching (BSFM), that disentangles aleatoric and epistemic uncertainty. The Stochastic Flow Matching (SFM) component augments deterministic flows with a diffusion term to improve model generalization to unseen scenarios. For UQ, we develop a scalable Bayesian approach -- MCD-Antithetic -- that combines Monte Carlo Dropout with sample-efficient antithetic sampling to produce effective anomaly scores for out-of-distribution detection. Experiments on cellular imaging (BBBC021, JUMP) and brain fMRI (Theory of Mind) across diverse scenarios show that SFM improves reliability while MCD-Antithetic enhances accountability.

replace CellFluxRL: Biologically-Constrained Virtual Cell Modeling via Reinforcement Learning

Authors: Dongxia Wu, Shiye Su, Yuhui Zhang, Elaine Sui, Emma Lundberg, Emily B. Fox, Serena Yeung-Levy

Abstract: Building virtual cells with generative models to simulate cellular behavior in silico is emerging as a promising paradigm for accelerating drug discovery. However, prior image-based generative approaches can produce implausible cell images that violate basic physical and biological constraints. To address this, we propose to post-train virtual cell models with reinforcement learning (RL), leveraging biologically meaningful evaluators as reward functions. We design seven rewards spanning three categories (biological function, structural validity, and morphological correctness) and optimize the state-of-the-art CellFlux model to yield CellFluxRL. CellFluxRL consistently improves over CellFlux across all rewards, with further performance boosts from test-time scaling. Overall, our results present a virtual cell modeling framework that enforces physically-based constraints through RL, advancing beyond "visually realistic" generations towards "biologically meaningful" ones.

replace Causal Discovery in Action: Learning Chain-Reaction Mechanisms from Interventions

Authors: Panayiotis Panayiotou, \"Ozg\"ur \c{S}im\c{s}ek

Abstract: Causal discovery is challenging in general dynamical systems because, without strong structural assumptions, the underlying causal graph may not be identifiable even from interventional data. However, many real-world systems exhibit directional, cascade-like structure, in which components activate sequentially and upstream failures suppress downstream effects. We study causal discovery in such chain-reaction systems and show that the causal structure is uniquely identifiable from blocking interventions that prevent individual components from activating. We propose a minimal estimator with finite-sample guarantees, achieving exponential error decay and logarithmic sample complexity. Experiments on synthetic models and diverse chain-reaction environments demonstrate reliable recovery from a few interventions, while observational heuristics fail in regimes with delayed or overlapping causal effects.

replace Byzantine-Robust and Differentially Private Federated Optimization under Weaker Assumptions

Authors: Rustem Islamov, Grigory Malinovsky, Alexander Gaponov, Aurelien Lucchi, Peter Richt\'arik, Eduard Gorbunov

Abstract: Federated Learning (FL) enables heterogeneous clients to collaboratively train a shared model without centralizing their raw data, offering an inherent level of privacy. However, gradients and model updates can still leak sensitive information, while malicious servers may mount adversarial attacks such as Byzantine manipulation. These vulnerabilities highlight the need to address differential privacy (DP) and Byzantine robustness within a unified framework. Existing approaches, however, often rely on unrealistic assumptions such as bounded gradients, require auxiliary server-side datasets, or fail to provide convergence guarantees. We address these limitations by proposing Byz-Clip21-SGD2M, a new algorithm that integrates robust aggregation with double momentum and carefully designed clipping. We prove high-probability convergence guarantees under standard $L$-smoothness and $\sigma$-sub-Gaussian gradient noise assumptions, thereby relaxing conditions that dominate prior work. Our analysis recovers state-of-the-art convergence rates in the absence of adversaries and improves utility guarantees under Byzantine and DP settings. Empirical evaluations on CNN and MLP models trained on MNIST further validate the effectiveness of our approach.

replace Large Language Model Guided Incentive Aware Reward Design for Cooperative Multi-Agent Reinforcement Learning

Authors: Dogan Urgun, Gokhan Gungor

Abstract: Designing effective auxiliary rewards for cooperative multi-agent systems remains challenging, as misaligned incentives can induce suboptimal coordination, particularly when sparse task rewards provide insufficient grounding for coordinated behavior. This study introduces an automated reward design framework that uses large language models to synthesize executable reward programs from environment instrumentation. The procedure constrains candidate programs within a formal validity envelope and trains policies from scratch using MAPPO under a fixed computational budget. The candidates are then evaluated based on their performance, and selection across generations relies solely on the sparse task returns. The framework is evaluated in four Overcooked-AI layouts characterized by varying levels of corridor congestion, handoff dependencies, and structural asymmetries. The proposed reward design approach consistently yields higher task returns and delivery counts, with the most pronounced gains observed in environments dominated by interaction bottlenecks. Diagnostic analysis of the synthesized shaping components reveals stronger interdependence in action selection and improved signal alignment in coordination-intensive tasks. These results demonstrate that the proposed LLM-guided reward search framework mitigates the need for manual engineering while producing shaping signals compatible with cooperative learning under finite budgets.

replace Can LLMs Beat Classical Hyperparameter Optimization Algorithms? A Study on autoresearch

Authors: Fabio Ferreira, Lucca Wobbe, Arjun Krishnakumar, Frank Hutter, Arber Zela

Abstract: The autoresearch repository enables an LLM agent to search for optimal hyperparameter configurations on an unconstrained search space by editing the training code directly. Given a fixed compute budget and constraints, we use autoresearch as a testbed to compare classical hyperparameter optimization (HPO) algorithms against LLM-based methods on tuning the hyperparameters of a small language model. Within a fixed hyperparameter search space, classical HPO methods such as CMA-ES and TPE consistently outperform LLM-based agents. However, an LLM agent that directly edits training source code in an unconstrained search space narrows the gap to classical methods substantially despite using only a self-hosted open-weight 27B model. Methods that avoid out-of-memory failures outperform those with higher search diversity, suggesting that reliability matters more than exploration breadth. While small and mid-sized LLMs struggle to track optimization state across trials, classical methods lack domain knowledge. To bridge this gap, we introduce Centaur, a hybrid that shares CMA-ES's internal state, including mean vector, step-size, and covariance matrix, with an LLM. Centaur achieves the best result in our experiments, with its 0.8B variant outperforming the 27B variant, suggesting that a cheap LLM suffices when paired with a strong classical optimizer. The 0.8B model is insufficient for unconstrained code editing but sufficient for hybrid optimization, while scaling to 27B provides no advantage for fixed search space methods. Experiments with the frontier model Gemini 3.1 Pro Preview do not close the gap to classical methods. Code is available at https://github.com/ferreirafabio/autoresearch-automl.

URLs: https://github.com/ferreirafabio/autoresearch-automl.

replace Optimal High-Probability Regret for Online Convex Optimization with Two-Point Bandit Feedback

Authors: Haishan Ye

Abstract: We consider the problem of Online Convex Optimization (OCO) with two-point bandit feedback. In this setting, a player attempts to minimize a sequence of adversarially generated convex loss functions, while only observing the value of each function at two points. While it is well known that two-point feedback allows for gradient estimation, achieving tight high-probability regret bounds for strongly convex functions has remained open, as highlighted by \citet{agarwal2010optimal}. The primary challenge lies in the heavy-tailed nature of bandit gradient estimators, which makes standard concentration analysis difficult. In this paper, we resolve this open challenge and provide the first high-probability regret bound of $O(d(\log T + \log(1/\delta))/\mu)$ for $\mu$-strongly convex losses. Our result is minimax optimal with respect to both the time horizon $T$ and the dimension $d$.
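The two-point gradient estimator at the heart of this setting admits a short sketch: query the loss at x + delta*u and x - delta*u along a random unit direction u and rescale by the dimension. The toy loss, step-size rule, and horizon below are illustrative assumptions, not the paper's construction.

```python
import math, random

def two_point_grad(f, x, delta=1e-3):
    """Two-point gradient estimate: query f at x + delta*u and x - delta*u
    along a random unit direction u, then scale by the dimension d."""
    d = len(x)
    u = [random.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(c * c for c in u))
    u = [c / norm for c in u]
    fp = f([xi + delta * ui for xi, ui in zip(x, u)])
    fm = f([xi - delta * ui for xi, ui in zip(x, u)])
    scale = d * (fp - fm) / (2.0 * delta)
    return [scale * ui for ui in u]

random.seed(1)
f = lambda x: sum(xi * xi for xi in x)       # 2-strongly convex toy loss
x = [1.0, -1.0, 0.5]
for t in range(1, 2001):
    g = two_point_grad(f, x)
    eta = 1.0 / (2.0 * t)                    # 1/(mu*t) step size, mu = 2
    x = [xi - eta * gi for xi, gi in zip(x, g)]
```

The estimator is unbiased but its norm concentrates only up to a factor of d, which is exactly the heavy-tailed behavior the abstract says complicates high-probability analysis.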

replace ATLAS-RTC: Closing the Loop on LLM Agent Output with Token-Level Runtime Control

Authors: Christopher Cruz

Abstract: We present ATLAS-RTC, a runtime control system for autoregressive language models that enforces structured output during decoding. ATLAS-RTC monitors generation at each step, detects drift from output contracts using lightweight signals, and applies targeted interventions such as biasing, masking, and rollback. Unlike post-hoc validation or static constrained decoding, it operates in a closed loop, enabling correction before errors materialize. Across structured generation and tool-calling tasks, ATLAS-RTC improves first-attempt success rates by 20 to 37.8 percentage points, with up to 88% latency reduction in failure-dominated settings. Results show that many failures arise from decoding artifacts rather than task misunderstanding, motivating runtime control as a distinct layer in LLM systems.

replace HISA: Efficient Hierarchical Indexing for Fine-Grained Sparse Attention

Authors: Yufei Xu, Fanxu Meng, Fan Jiang, Yuxuan Wang, Ruijie Zhou, Zhaohui Wang, Jiexi Wu, Zhixin Pan, Xiaojuan Tang, Wenjie Pei, Tongxuan Liu, Di Yin, Xing Sun, Muhan Zhang

Abstract: Token-level sparse attention mechanisms, exemplified by DeepSeek Sparse Attention (DSA), achieve fine-grained key selection by scoring every historical key for each query through a lightweight indexer, then computing attention only on the selected subset. While the downstream sparse attention itself scales favorably, the indexer must still scan the entire prefix for every query, introducing a per-layer bottleneck that grows prohibitively with context length. We propose HISA (Hierarchical Indexed Sparse Attention), a plug-and-play replacement for the indexer that rewrites the search path from a flat token scan into a two-stage hierarchical procedure: (1) a block-level coarse filtering stage that scores pooled block representations to discard irrelevant regions, followed by (2) a token-level refinement stage that applies the original indexer exclusively within the retained candidate blocks. HISA preserves the identical token-level top-sparse pattern consumed by the downstream Sparse MLA operator and requires no additional training. On kernel-level benchmarks, HISA achieves significant speedups at 64K context. On Needle-in-a-Haystack and LongBench, we directly replace the indexer in DeepSeek-V3.2 and GLM-5 with our HISA indexer, without any finetuning. HISA closely matches the original DSA in quality, while substantially outperforming block-sparse baselines.
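The two-stage search path can be sketched in a few lines. This is an illustrative toy, not HISA's kernel: mean pooling is an assumed block representation, and the indexer is reduced to a precomputed score per token.

```python
import random

def hierarchical_topk(scores, block_size, blocks_keep, k):
    """Stage 1: mean-pool each block and keep the highest-scoring blocks.
    Stage 2: exact token-level top-k restricted to the surviving blocks."""
    n = len(scores)
    blocks = [range(i, min(i + block_size, n)) for i in range(0, n, block_size)]
    pooled = [sum(scores[j] for j in b) / len(b) for b in blocks]
    keep = sorted(range(len(blocks)), key=pooled.__getitem__)[-blocks_keep:]
    candidates = [j for bi in keep for j in blocks[bi]]
    return set(sorted(candidates, key=scores.__getitem__)[-k:])

random.seed(0)
scores = [random.gauss(0.0, 1.0) for _ in range(1024)]
for j in range(512, 528):                  # plant a clearly relevant region
    scores[j] += 8.0
approx = hierarchical_topk(scores, block_size=64, blocks_keep=4, k=16)
exact = set(sorted(range(1024), key=scores.__getitem__)[-16:])
```

When the relevant tokens cluster into blocks, the coarse stage retains them and the refined top-k matches the flat scan while scoring only a fraction of the prefix.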

replace Detecting low left ventricular ejection fraction from ECG using an interpretable and scalable predictor-driven framework

Authors: Ya Zhou, Tianxiang Hao, Ziyi Cai, Haojie Zhu, Kejun He, Jia Liu, Xiaohan Fan, Jing Yuan

Abstract: Low left ventricular ejection fraction (LEF) frequently remains undetected until progression to symptomatic heart failure, underscoring the need for scalable screening strategies. Although artificial intelligence-enabled electrocardiography (AI-ECG) has shown promise, existing approaches rely solely on end-to-end black-box models with limited interpretability or on tabular systems dependent on commercial ECG measurement algorithms with suboptimal performance. We introduced ECG-based Predictor-Driven LEF (ECGPD-LEF), a structured framework that integrates foundation model-derived diagnostic probabilities with interpretable modeling for detecting LEF from ECG. Trained on the benchmark EchoNext dataset comprising 72,475 ECG-echocardiogram pairs and evaluated in predefined independent internal (n=5,442) and external (n=16,017) cohorts, our framework achieved robust discrimination for moderate LEF (internal AUROC 88.4%, F1 64.5%; external AUROC 86.8%, F1 53.6%), consistently outperforming the official end-to-end baseline provided with the benchmark across demographic and clinical subgroups. Interpretability analyses identified high-impact predictors, including normal ECG, incomplete left bundle branch block, and subendocardial injury in anterolateral leads, driving LEF risk estimation. Notably, these predictors independently enabled zero-shot-like inference without task-specific retraining (internal AUROC 75.3-81.0%; external AUROC 71.6-78.6%), indicating that ventricular dysfunction is intrinsically encoded within structured diagnostic probability representations. This framework reconciles predictive performance with mechanistic transparency, supporting scalable enhancement through additional predictors and seamless integration with existing AI-ECG systems.

replace See it to Place it: Evolving Macro Placements with Vision-Language Models

Authors: Ikechukwu Uchendu, Swati Goel, Karly Hou, Ebrahim Songhori, Kuang-Huei Lee, Joe Wenjie Jiang, Vijay Janapa Reddi, Vincent Zhuang

Abstract: We propose using Vision-Language Models (VLMs) for macro placement in chip floorplanning, a complex optimization task that has recently shown promising advancements through machine learning methods. Because human designers rely heavily on spatial reasoning to arrange components on the chip canvas, we hypothesize that VLMs with strong visual reasoning abilities can effectively complement existing learning-based approaches. We introduce VeoPlace (Visual Evolutionary Optimization Placement), a novel framework that uses a VLM, without any fine-tuning, to guide the actions of a base placer by constraining them to subregions of the chip canvas. The VLM proposals are iteratively optimized through an evolutionary search strategy with respect to resulting placement quality. On open-source benchmarks, VeoPlace outperforms the best prior learning-based approach on 9 of 10 benchmarks with peak wirelength reductions exceeding 32%. We further demonstrate that VeoPlace generalizes to analytical placers, improving DREAMPlace performance on all 8 evaluated benchmarks with gains up to 4.3%. Our approach opens new possibilities for electronic design automation tools that leverage foundation models to solve complex physical design problems.

replace Rethinking Language Model Scaling under Transferable Hypersphere Optimization

Authors: Liliang Ren, Yang Liu, Yelong Shen, Weizhu Chen

Abstract: Scaling laws for large language models depend critically on the optimizer and parameterization. Existing hyperparameter transfer laws are mainly developed for first-order optimizers, and they do not structurally prevent training instability at scale. Recent hypersphere optimization methods constrain weight matrices to a fixed-norm hypersphere, offering a promising alternative for more stable scaling. We introduce HyperP (Hypersphere Parameterization), the first framework for transferring optimal learning rates across model width, depth, training tokens, and Mixture-of-Experts (MoE) granularity under the Frobenius-sphere constraint with the Muon optimizer. We prove that weight decay is a first-order no-op on the Frobenius sphere, show that Depth-$\mu$P remains necessary, and find that the optimal learning rate follows the same data-scaling power law with the "magic exponent" 0.32 previously observed for AdamW. A single base learning rate tuned at the smallest scale transfers across all compute budgets under HyperP, yielding $1.58\times$ compute efficiency over a strong Muon baseline at $6\times10^{21}$ FLOPs. Moreover, HyperP delivers transferable stability: all monitored instability indicators, including $Z$-values, output RMS, and activation outliers, remain bounded and non-increasing under training FLOPs scaling. We also propose SqrtGate, an MoE gating mechanism derived from the hypersphere constraint that preserves output RMS across MoE granularities for improved granularity scaling, and show that hypersphere optimization enables substantially larger auxiliary load-balancing weights, yielding both strong performance and good expert balance. We release our training codebase at https://github.com/microsoft/ArchScale.

URLs: https://github.com/microsoft/ArchScale.

replace Beta-Scheduling: Momentum from Critical Damping as a Diagnostic and Correction Tool for Neural Network Training

Authors: Ivan Pasichnyk

Abstract: Standard neural network training uses constant momentum (typically 0.9), a convention dating to 1964 with limited theoretical justification for its optimality. We derive a time-varying momentum schedule from the critically damped harmonic oscillator: mu(t) = 1 - 2*sqrt(alpha(t)), where alpha(t) is the current learning rate. This beta-schedule requires zero free parameters beyond the existing learning rate schedule. On ResNet-18/CIFAR-10, beta-scheduling delivers 1.9x faster convergence to 90% accuracy compared to constant momentum. More importantly, the per-layer gradient attribution under this schedule produces a cross-optimizer invariant diagnostic: the same three problem layers are identified regardless of whether the model was trained with SGD or Adam (100% overlap). Surgical correction of only these layers fixes 62 misclassifications while retraining only 18% of parameters. A hybrid schedule -- physics momentum for fast early convergence, then constant momentum for the final refinement -- reaches 95% accuracy fastest among five methods tested. The main contribution is not an accuracy improvement but a principled, parameter-free tool for localizing and correcting specific failure modes in trained networks.
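The schedule mu(t) = 1 - 2*sqrt(alpha(t)) is simple enough to state in code. The cosine learning rate decay below is an assumed base schedule for illustration; the point is that the momentum schedule adds zero free parameters on top of it.

```python
import math

def beta_schedule(alpha):
    """Critically damped momentum for the current learning rate alpha."""
    return max(0.0, 1.0 - 2.0 * math.sqrt(alpha))

def cosine_lr(t, T, lr_max=0.1, lr_min=1e-4):
    """A standard cosine learning rate decay (assumed base schedule)."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t / T))

T = 100
schedule = [(cosine_lr(t, T), beta_schedule(cosine_lr(t, T)))
            for t in range(T + 1)]
```

As the learning rate decays from 0.1 to 1e-4, the prescribed momentum rises from about 0.37 toward 0.98, so the conventional constant 0.9 is recovered only incidentally, late in training.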

replace ReproMIA: A Comprehensive Analysis of Model Reprogramming for Proactive Membership Inference Attacks

Authors: Chihan Huang, Huaijin Wang, Shuai Wang

Abstract: The pervasive deployment of deep learning models across critical domains has concurrently intensified privacy concerns due to their inherent propensity for data memorization. While Membership Inference Attacks (MIAs) serve as the gold standard for auditing these privacy vulnerabilities, conventional MIA paradigms are increasingly constrained by the prohibitive computational costs of shadow model training and a precipitous performance degradation under low False Positive Rate constraints. To overcome these challenges, we introduce a novel perspective by leveraging the principles of model reprogramming as an active signal amplifier for privacy leakage. Building upon this insight, we present \texttt{ReproMIA}, a unified and efficient proactive framework for membership inference. We rigorously substantiate, both theoretically and empirically, how our methodology proactively induces and magnifies latent privacy footprints embedded within the model's representations. We provide specialized instantiations of \texttt{ReproMIA} across diverse architectural paradigms, including LLMs, Diffusion Models, and Classification Models. Comprehensive experimental evaluations across more than ten benchmarks and a variety of model architectures demonstrate that \texttt{ReproMIA} consistently and substantially outperforms existing state-of-the-art baselines, achieving a transformative leap in performance specifically within low-FPR regimes, such as an average of 5.25\% AUC and 10.68\% TPR@1\%FPR increase over the runner-up for LLMs, as well as 3.70\% and 12.40\% respectively for Diffusion Models.

replace On the Mirage of Long-Range Dependency, with an Application to Integer Multiplication

Authors: Zichao Wei

Abstract: Integer multiplication has long been considered a hard problem for neural networks, with the difficulty widely attributed to the O(n) long-range dependency induced by carry chains. We argue that this diagnosis is wrong: long-range dependency is not an intrinsic property of multiplication, but a mirage produced by the choice of computational spacetime. We formalize the notion of mirage and provide a constructive proof: when two n-bit binary integers are laid out as a 2D outer-product grid, every step of long multiplication collapses into a $3 \times 3$ local neighborhood operation. Under this representation, a neural cellular automaton with only 321 learnable parameters achieves perfect length generalization up to $683\times$ the training range. Five alternative architectures -- including Transformer (6,625 params), Transformer+RoPE, and Mamba -- all fail under the same representation. We further analyze how partial successes locked the community into an incorrect diagnosis, and argue that any task diagnosed as requiring long-range dependency should first be examined for whether the dependency is intrinsic to the task or induced by the computational spacetime.

replace Realistic Market Impact Modeling for Reinforcement Learning Trading Environments

Authors: Lucas Riera Abbade, Anna Helena Reali Costa

Abstract: Reinforcement learning (RL) has shown promise for trading, yet most open-source backtesting environments assume negligible or fixed transaction costs, causing agents to learn trading behaviors that fail under realistic execution. We introduce three Gymnasium-compatible trading environments -- MACE (Market-Adjusted Cost Execution) stock trading, margin trading, and portfolio optimization -- that integrate nonlinear market impact models grounded in the Almgren-Chriss framework and the empirically validated square-root impact law. Each environment provides pluggable cost models, permanent impact tracking with exponential decay, and comprehensive trade-level logging. We evaluate five DRL algorithms (A2C, PPO, DDPG, SAC, TD3) on the NASDAQ-100, comparing a fixed 10 bps baseline against the AC model with Optuna-tuned hyperparameters. Our results show that (i) the cost model materially changes both absolute performance and the relative ranking of algorithms across all three environments; (ii) the AC model produces dramatically different trading behavior, e.g., daily costs dropping from $200k to $8k with turnover falling from 19% to 1%; (iii) hyperparameter optimization is essential for constraining pathological trading, with costs dropping up to 82%; and (iv) algorithm-cost model interactions are strongly environment-specific, e.g., DDPG's OOS Sharpe jumps from -2.1 to 0.3 under AC in margin trading while SAC's drops from -0.5 to -1.2. We release the full suite as an open-source extension to FinRL-Meta.
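The square-root impact law mentioned above can be sketched as a cost function. Constants and units here are illustrative assumptions, not the environments' calibrated parameters, and the Almgren-Chriss permanent/temporary decomposition is omitted.

```python
import math

def sqrt_impact_cost(shares, daily_volume, sigma, price, eta=1.0):
    """Dollar cost of an order under the square-root impact law:
    fractional price impact ~ eta * sigma * sqrt(Q / V)."""
    impact_frac = eta * sigma * math.sqrt(shares / daily_volume)
    return shares * price * impact_frac

cost_small = sqrt_impact_cost(10_000, 1_000_000, 0.02, 100.0)   # 1% of ADV
cost_big = sqrt_impact_cost(100_000, 1_000_000, 0.02, 100.0)    # 10% of ADV
```

Total cost grows as Q^{3/2}, so a 10x larger order costs roughly 31.6x more; this superlinearity is what penalizes the high-turnover behavior that flat per-trade fees fail to discourage.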

replace Task-Centric Personalized Federated Fine-Tuning of Language Models

Authors: Gabriel U. Talasso, Meghdad Kurmanji, Allan M. de Souza, Nicholas D. Lane, Leandro A. Villas

Abstract: Federated Learning (FL) has emerged as a promising technique for training language models on distributed and private datasets of diverse tasks. However, aggregating models trained on heterogeneous tasks often degrades the overall performance of individual clients. To address this issue, Personalized FL (pFL) aims to create models tailored to each client's data distribution. Although these approaches improve local performance, they usually lack robustness in two aspects: (i) generalization, when clients must make predictions on unseen tasks or face changes in their data distributions, and (ii) intra-client task interference, when a single client's data contains multiple distributions that may interfere with each other during local training. To tackle these two challenges, we propose FedRouter, a clustering-based pFL method that builds specialized models for each task rather than for each client. FedRouter uses adapters to personalize models, employing two clustering mechanisms to associate adapters with specific tasks: a local clustering that associates adapters with task data samples, and a global one that associates similar adapters from different clients to construct task-centric personalized models. Additionally, we propose an evaluation router mechanism that routes test samples to the best adapter based on the created clusters. In experiments comparing our method with existing approaches on a multitask dataset, FedRouter demonstrates strong resilience in these challenging scenarios, performing up to 6.1% better in relative terms under task interference and achieving up to 136% relative improvement under generalization evaluation.

replace Convergence of Byzantine-Resilient Gradient Tracking via Probabilistic Edge Dropout

Authors: Amirhossein Dezhboro, Fateme Maleki, Arman Adibi, Erfan Amini, Jose E. Ramirez-Marquez

Abstract: We study distributed optimization over networks with Byzantine agents that may send arbitrary adversarial messages. We propose \emph{Gradient Tracking with Probabilistic Edge Dropout} (GT-PD), a stochastic gradient tracking method that preserves the convergence properties of gradient tracking under adversarial communication. GT-PD combines two complementary defense layers: a universal self-centered projection that clips each incoming message to a ball of radius $\tau$ around the receiving agent, and a fully decentralized probabilistic dropout rule driven by a dual-metric trust score in the decision and tracking channels. This design bounds adversarial perturbations while preserving the doubly stochastic mixing structure, a property often lost under robust aggregation in decentralized settings. Under complete Byzantine isolation ($p_b=0$), GT-PD converges linearly to a neighborhood determined solely by stochastic gradient variance. For partial isolation ($p_b>0$), we introduce \emph{Gradient Tracking with Probabilistic Edge Dropout and Leaky Integration} (GT-PD-L), which uses a leaky integrator to control the accumulation of tracking errors caused by persistent perturbations and achieves linear convergence to a bounded neighborhood determined by the stochastic variance and the clipping-to-leak ratio. We further show that under two-tier dropout with $p_h=1$, isolating Byzantine agents introduces no additional variance into the honest consensus dynamics. Experiments on MNIST under Sign Flip, ALIE, and Inner Product Manipulation attacks show that GT-PD-L outperforms coordinate-wise trimmed mean by up to 4.3 percentage points under stealth attacks.
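The self-centered projection, the first of the two defense layers above, is a one-function idea: clip every incoming message to a ball of radius tau around the receiver's own state. The vectors below are illustrative; the trust-score dropout layer is not shown.

```python
import math

def self_centered_projection(own, incoming, tau):
    """Clip a neighbor's message to the ball of radius tau around own state."""
    diff = [a - b for a, b in zip(incoming, own)]
    dist = math.sqrt(sum(d * d for d in diff))
    if dist <= tau:
        return list(incoming)              # nearby message passes through
    scale = tau / dist
    return [b + scale * d for b, d in zip(own, diff)]

own = [0.0, 0.0]
honest = [0.5, 0.5]
byzantine = [100.0, -100.0]
clipped = self_centered_projection(own, byzantine, tau=1.0)
```

Honest messages inside the ball are untouched, so the mixing matrix keeps its doubly stochastic structure, while any adversarial message perturbs the receiver by at most tau per round.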

replace Spectral Compact Training: Pre-Training Large Language Models via Permanent Truncated SVD and Stiefel QR Retraction

Authors: Bj\"orn Roman Kohlberger (EctoSpace, Dublin, Ireland)

Abstract: The memory wall remains the primary bottleneck for training large language models on consumer hardware. We introduce Spectral Compact Training (SCT), a method that replaces dense weight matrices with permanent truncated SVD factors W = U diag(s) V^T, where the full dense matrix is never materialized during training or inference. Gradients flow through the compact spectral factors via standard backpropagation, and U, V are retracted to the Stiefel manifold via QR decomposition after each optimizer step. SCT achieves up to 199x memory reduction per MLP layer at rank 32, enabling full training steps of 70B-parameter architectures on a Steam Deck handheld (7.2 GB peak memory vs. 1,245 GB for dense FP32 training with Adam). Rank-sweep experiments on SmolLM2-1.7B (ranks 32-256, 2000 steps, NVIDIA A100) show that all tested ranks converge to the same loss floor (~4.2-4.5), identifying the learning rate schedule -- not MLP rank -- as the primary bottleneck. Rank 128 emerges as the efficiency sweet spot at 11.7x MLP compression with the lowest perplexity. GPU memory drops 46% at rank 32 while training throughput doubles.
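The factorized forward pass and QR retraction can be sketched with NumPy. This is a minimal illustration, not the SCT training loop: the "gradient step" below is an unstructured random perturbation standing in for an optimizer update, and all sizes are assumed.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 64, 48, 8

# Permanent truncated factors W = U diag(s) V^T; dense W is never materialized.
U = np.linalg.qr(rng.standard_normal((m, r)))[0]
V = np.linalg.qr(rng.standard_normal((n, r)))[0]
s = np.abs(rng.standard_normal(r))

def forward(x):
    """Apply W @ x in O((m + n) r) memory/compute instead of O(m n)."""
    return U @ (s * (V.T @ x))

def retract(M):
    """QR retraction back onto the Stiefel manifold after a gradient step."""
    Q, R = np.linalg.qr(M)
    return Q * np.sign(np.diag(R))         # fix column signs for uniqueness

# Illustrative update: a small (unstructured) step on each factor, then retract.
U = retract(U + 0.01 * rng.standard_normal((m, r)))
V = retract(V + 0.01 * rng.standard_normal((n, r)))
```

The retraction restores exact column orthonormality after every step, which is what keeps the spectral parameterization stable as training proceeds.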

replace Using predefined vector systems to speed up neural network multimillion class classification

Authors: Nikita Gabdullin, Ilya Androsov

Abstract: Label prediction in neural networks (NNs) has O(n) complexity proportional to the number of classes. This holds true both for classification using fully connected layers and for cosine similarity against a set of class prototypes. In this paper we show that if the NN latent space (LS) geometry is known and possesses specific properties, label prediction complexity can be significantly reduced. This is achieved by associating label prediction with the O(1)-complexity closest-cluster-center search in a vector system used as the target for latent space configuration (LSC). The proposed method only requires finding the indexes of several largest and smallest values in the embedding vector, making it extremely computationally efficient. We show that the proposed method does not alter NN training results or classification accuracy. We also measure the time required by different computational stages of NN inference and label prediction on multiple datasets. The experiments show that the proposed method achieves up to 11.6 times overall acceleration over conventional methods. Furthermore, the proposed method has unique properties which make it possible to predict the existence of new classes.
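One way such index-based decoding can work is with a combinatorial prototype code, sketched below. This is a hypothetical construction for illustration, not necessarily the paper's vector system: each class is assigned a unique k-subset of coordinates, so the label is read directly from the indexes of the k largest embedding values.

```python
from itertools import combinations

d, k = 10, 3
# Hypothetical predefined vector system: each class owns a unique k-subset of
# coordinates, so C(10, 3) = 120 classes fit into a 10-dimensional space.
subsets = list(combinations(range(d), k))
label_of = {s: i for i, s in enumerate(subsets)}

def predict(embedding):
    """Decode the label from the indexes of the k largest coordinates,
    with no dot products against every class prototype."""
    top = tuple(sorted(sorted(range(d), key=embedding.__getitem__)[-k:]))
    return label_of.get(top, -1)           # -1: pattern matches no known class

proto = [1.0 if i in subsets[42] else 0.0 for i in range(d)]
noisy = [v + 0.01 * ((i * 7) % 5 - 2) for i, v in enumerate(proto)]
```

The cost depends on the embedding dimension, not the number of classes, and an unrecognized top-k pattern (the -1 branch) is one way a decoder could flag a potentially new class.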

replace Fatigue-Aware Learning to Defer via Constrained Optimisation

Authors: Zheng Zhang, Cuong C. Nguyen, David Rosewarne, Kevin Wells, Gustavo Carneiro

Abstract: Learning to defer (L2D) enables human-AI cooperation by deciding when an AI system should act autonomously or defer to a human expert. Existing L2D methods, however, assume static human performance, contradicting well-established findings on fatigue-induced degradation. We propose Fatigue-Aware Learning to Defer via Constrained Optimisation (FALCON), which explicitly models workload-varying human performance using psychologically grounded fatigue curves. FALCON formulates L2D as a Constrained Markov Decision Process (CMDP) whose state includes both task features and cumulative human workload, and optimises accuracy under human-AI cooperation budgets via PPO-Lagrangian training. We further introduce FA-L2D, a benchmark that systematically varies fatigue dynamics from near-static to rapidly degrading regimes. Experiments across multiple datasets show that FALCON consistently outperforms state-of-the-art L2D methods across coverage levels, generalises zero-shot to unseen experts with different fatigue patterns, and demonstrates the advantage of adaptive human-AI collaboration over AI-only or human-only decision-making when coverage lies strictly between 0 and 1.

replace Screening Is Enough

Authors: Ken M. Nakanishi

Abstract: A core limitation of standard softmax attention is that it does not define a notion of absolute query--key relevance: attention weights are obtained by redistributing a fixed unit mass across all keys according to their relative scores. As a result, relevance is defined only relative to competing keys, and irrelevant keys cannot be explicitly rejected. We introduce Multiscreen, a language-model architecture built around a mechanism we call screening, which enables absolute query--key relevance. Instead of redistributing attention across all keys, screening evaluates each key against an explicit threshold, discarding irrelevant keys and aggregating the remaining keys, thereby removing global competition among keys. Across experiments, Multiscreen achieves comparable validation loss with approximately 40% fewer parameters than a Transformer baseline and enables stable optimization at substantially larger learning rates. It maintains strong performance in long-context perplexity and shows little to no degradation in retrieval performance well beyond the training context length. Notably, even at the training context length, a Multiscreen model with approximately 92% fewer parameters consistently outperforms a larger Transformer in retrieval accuracy. Finally, Multiscreen reduces inference latency by up to 3.2$\times$ at 100K context length.
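The contrast between unit-mass redistribution and thresholded screening can be sketched as follows (an illustrative reading of the abstract; the threshold form and all names are invented):

```python
import numpy as np

def softmax_attention(q, K, V):
    # Standard attention: a fixed unit mass is redistributed over ALL keys,
    # so even irrelevant keys receive nonzero weight.
    scores = K @ q
    w = np.exp(scores - scores.max())
    w = w / w.sum()
    return w @ V, w

def screening_attention(q, K, V, tau):
    # Screening sketch: each key is tested against an absolute threshold
    # tau; failing keys are discarded outright, and only survivors are
    # aggregated, removing global competition among keys.
    scores = K @ q
    keep = scores > tau
    if not keep.any():
        return np.zeros(V.shape[1]), keep
    w = np.exp(scores[keep])
    return (w @ V[keep]) / w.sum(), keep

K = np.array([[1., 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0],
              [0, 0, 0, 1], [.5, .5, 0, 0], [-1, 0, 0, 0]])
V = np.arange(24, dtype=float).reshape(6, 4)
q = np.array([3., 0, 0, 0])  # query strongly aligned with key 0 only

out_soft, w_soft = softmax_attention(q, K, V)
out_screen, kept = screening_attention(q, K, V, tau=2.0)
```

Here softmax gives every key some weight, while screening rejects all but the one key whose score clears the absolute threshold.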

replace Apriel-1.5-OpenReasoner: RL Post-Training for General-Purpose and Efficient Reasoning

Authors: Rafael Pardinas, Ehsan Kamalloo, David Vazquez, Alexandre Drouin

Abstract: Building general-purpose reasoning models using reinforcement learning with verifiable rewards (RLVR) across diverse domains has been widely adopted by frontier open-weight models. However, their training recipes and domain mixtures are often not disclosed. Joint optimization across domains poses significant challenges: domains vary widely in rollout length, problem difficulty and sample efficiency. Further, models with long chain-of-thought traces increase inference cost and latency, making efficiency critical for practical deployment. We present Apriel-1.5-OpenReasoner, trained with a fully reproducible multi-domain RL post-training recipe on Apriel-Base, a 15B-parameter open-weight LLM, across five domains using public datasets: mathematics, code generation, instruction following, logical puzzles and function calling. We introduce an adaptive domain sampling mechanism that preserves target domain ratios despite heterogeneous rollout dynamics, and a difficulty-aware extension of the standard length penalty that, with no additional training overhead, encourages longer reasoning for difficult problems and shorter traces for easy ones. Trained with a strict 16K-token output budget, Apriel-1.5-OpenReasoner generalizes to 32K tokens at inference and improves over Apriel-Base on AIME 2025, GPQA, MMLU-Pro, and LiveCodeBench while producing 30-50% shorter reasoning traces. It matches strong open-weight models of similar size at lower token cost, thereby pushing the Pareto frontier of accuracy versus token budget.

replace-cross Bayesian Neural Networks: An Introduction and Survey

Authors: Ethan Goan, Clinton Fookes

Abstract: Neural Networks (NNs) have provided state-of-the-art results for many challenging machine learning tasks such as detection, regression and classification across the domains of computer vision, speech recognition and natural language processing. Despite their success, they are often implemented in a frequentist scheme, meaning they are unable to reason about uncertainty in their predictions. This article introduces Bayesian Neural Networks (BNNs) and the seminal research regarding their implementation. Different approximate inference methods are compared, and used to highlight where future research can improve on current methods.

replace-cross Combining Tree-Search, Generative Models, and Nash Bargaining Concepts in Game-Theoretic Reinforcement Learning

Authors: Zun Li, Marc Lanctot, Kevin R. McKee, Luke Marris, Ian Gemp, Daniel Hennes, Paul Muller, Kate Larson, Yoram Bachrach, Michael P. Wellman

Abstract: Opponent modeling methods typically involve two crucial steps: building a belief distribution over opponents' strategies, and exploiting this opponent model by playing a best response. However, existing approaches typically require domain-specific heuristics to construct such a model, and algorithms for approximating best responses are hard to scale in large, imperfect-information domains. In this work, we introduce a scalable and generic multiagent training regime for opponent modeling using deep game-theoretic reinforcement learning. We first propose Generative Best Response (GenBR), a best response algorithm based on Monte-Carlo Tree Search (MCTS) with a learned deep generative model that samples world states during planning. This new method scales to large imperfect-information domains and can be used in a plug-and-play fashion within a variety of multiagent algorithms. We use this new method under the framework of Policy Space Response Oracles (PSRO) to automate the generation of an \emph{offline opponent model} via iterative game-theoretic reasoning and population-based training. We propose using solution concepts based on bargaining theory to build up the opponent mixture, which we find identifies profiles near the Pareto frontier. GenBR then keeps updating an \emph{online opponent model} and reacts against it during gameplay. We conduct behavioral studies in which human participants negotiate with our agents in Deal-or-No-Deal, a class of bilateral bargaining games. Search with generative modeling finds stronger policies during both training time and test time, enables online Bayesian co-player prediction, and can produce agents that, when negotiating with humans, achieve social welfare and Nash bargaining scores comparable to those of humans trading among themselves.

replace-cross Piecewise Deterministic Markov Processes for Bayesian Neural Networks

Authors: Ethan Goan, Dimitri Perrin, Kerrie Mengersen, Clinton Fookes

Abstract: Inference for modern Bayesian Neural Networks (BNNs) often relies on a variational treatment, which imposes assumptions of independence and a fixed posterior form that are typically violated. Traditional MCMC approaches avoid these assumptions at the cost of increased computation due to their incompatibility with subsampling of the likelihood. New Piecewise Deterministic Markov Process (PDMP) samplers permit subsampling, though they introduce model-specific inhomogeneous Poisson processes (IPPs) that are difficult to sample from. This work introduces a new generic and adaptive thinning scheme for sampling from these IPPs, and demonstrates how this approach can accelerate the application of PDMPs for inference in BNNs. Experiments illustrate that inference with these methods is computationally feasible, can improve predictive accuracy and MCMC mixing performance, and provides informative uncertainty measurements when compared against other approximate inference schemes.

replace-cross Importance Sparsification for Sinkhorn Algorithm

Authors: Mengyu Li, Jun Yu, Tao Li, Cheng Meng

Abstract: The Sinkhorn algorithm has been used pervasively to approximate the solution to optimal transport (OT) and unbalanced optimal transport (UOT) problems. However, its practical application is limited due to the high computational complexity. To alleviate the computational burden, we propose a novel importance sparsification method, called Spar-Sink, to efficiently approximate entropy-regularized OT and UOT solutions. Specifically, our method employs natural upper bounds for unknown optimal transport plans to establish effective sampling probabilities, and constructs a sparse kernel matrix to accelerate Sinkhorn iterations, reducing the computational cost of each iteration from $O(n^2)$ to $\widetilde{O}(n)$ for a sample of size $n$. Theoretically, we show the proposed estimators for the regularized OT and UOT problems are consistent under mild regularity conditions. Experiments on various synthetic data demonstrate Spar-Sink outperforms mainstream competitors in terms of both estimation error and speed. A real-world echocardiogram data analysis shows Spar-Sink can effectively estimate and visualize cardiac cycles, from which one can identify heart failure and arrhythmia. To evaluate the numerical accuracy of cardiac cycle prediction, we consider the task of predicting the end-systole time point using the end-diastole one. Results show Spar-Sink performs as well as the classical Sinkhorn algorithm, requiring significantly less computational time.
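A minimal sketch of running Sinkhorn on an importance-sparsified kernel (using plain kernel-proportional sampling probabilities as a stand-in for the paper's upper-bound-based ones, and force-keeping the diagonal so every row retains support; both are simplifications for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, eps = 60, 0.5
x, y = rng.random((n, 2)), rng.random((n, 2))
a = np.full(n, 1.0 / n)  # uniform source marginal
b = np.full(n, 1.0 / n)  # uniform target marginal

C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)  # squared-distance cost
K = np.exp(-C / eps)                                 # Gibbs kernel

# Keep entry (i, j) with probability roughly proportional to K[i, j],
# targeting O(n log n) expected samples, and reweight kept entries so the
# sparse kernel stays unbiased.
m = 8 * n * int(np.log(n))
p = np.minimum(1.0, m * K / K.sum())
mask = rng.random((n, n)) < p
np.fill_diagonal(mask, True)  # guarantee support in every row/column
K_sparse = np.where(mask, K / np.maximum(p, 1e-12), 0.0)

def sinkhorn(Kmat, iters=200):
    u, v = np.ones(n), np.ones(n)
    for _ in range(iters):
        u = a / (Kmat @ v)
        v = b / (Kmat.T @ u)
    return u[:, None] * Kmat * v[None, :]  # entropic transport plan

P_full = sinkhorn(K)
P_spar = sinkhorn(K_sparse)
row_err = np.abs(P_spar.sum(1) - a).max()  # residual marginal violation
```

Each sparse iteration only needs the stored nonzeros, which is the source of the $O(n^2) \to \widetilde{O}(n)$ per-iteration saving once the kernel is held in a sparse format.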

replace-cross Accelerated Gradient Methods for Nonconvex Optimization: Escape Trajectories From Strict Saddle Points and Convergence to Local Minima

Authors: Rishabh Dixit, Mert Gurbuzbalaban, Waheed U. Bajwa

Abstract: This paper considers the problem of understanding the behavior of a general class of accelerated gradient methods on smooth nonconvex functions. Motivated by some recent works that have proposed effective algorithms, based on Polyak's heavy ball method and the Nesterov accelerated gradient method, to achieve convergence to a local minimum of nonconvex functions, this work proposes a broad class of Nesterov-type accelerated methods and puts forth a rigorous study of these methods encompassing the escape from saddle points and convergence to local minima through both an asymptotic and a non-asymptotic analysis. In the asymptotic regime, this paper answers an open question of whether Nesterov's accelerated gradient method (NAG) with variable momentum parameter avoids strict saddle points almost surely. This work also develops two metrics of asymptotic rates of convergence and divergence, and evaluates these two metrics for several popular standard accelerated methods such as the NAG and Nesterov's accelerated gradient with constant momentum (NCM) near strict saddle points. In the non-asymptotic regime, this work provides an analysis that leads to "linear" exit-time estimates from strict saddle neighborhoods for trajectories of these accelerated methods, as well as the necessary conditions for the existence of such trajectories. Finally, this work studies a sub-class of accelerated methods that can converge in convex neighborhoods of nonconvex functions with a near optimal rate to a local minimum and at the same time this sub-class offers superior saddle-escape behavior compared to that of NAG.

replace-cross A Multi-Agent Reinforcement Learning Framework for Public Health Decision Analysis

Authors: Dinesh Sharma, Ankit Shah, Chaitra Gopalappa

Abstract: Human immunodeficiency virus (HIV) is a major public health concern in the United States (U.S.), with about 1.2 million people living with it and about 35,000 newly infected each year. There are considerable geographical disparities in HIV burden and care access across the U.S. The 'Ending the HIV Epidemic (EHE)' initiative by the U.S. Department of Health and Human Services aims to reduce new infections by 90% by 2030, by improving coverage of diagnoses, treatment, and prevention interventions and prioritizing jurisdictions with high HIV prevalence. We develop intelligent decision-support systems to optimize resource allocation and intervention strategies. Existing decision analytic models either focus on individual cities or aggregate national data, failing to capture jurisdictional interactions critical for optimizing intervention strategies. To address this, we propose a multi-agent reinforcement learning (MARL) framework that enables jurisdiction-specific decision-making while accounting for cross-jurisdictional epidemiological interactions. Our framework functions as an intelligent resource optimization system, helping policymakers strategically allocate interventions based on dynamic, data-driven insights. Experimental results across jurisdictions in California and Florida demonstrate that MARL-driven policies outperform traditional single-agent reinforcement learning approaches by reducing new infections under fixed budget constraints. Our study highlights the importance of incorporating jurisdictional dependencies in decision-making frameworks for large-scale public initiatives. By integrating multi-agent intelligent systems, decision analytics, and reinforcement learning, this study advances expert systems for government resource planning and public health management, offering a scalable framework for broader applications in healthcare policy and epidemic management.

replace-cross Floralens: a Deep Learning Model for the Portuguese Native Flora

Authors: Ant\'onio Filgueiras, Eduardo R. B. Marques, Lu\'is M. B. Lopes, Miguel Marques, Hugo Silva

Abstract: Machine-learning techniques, especially deep convolutional neural networks, are pivotal for image-based identification of biological species in many Citizen Science platforms. In this paper, we describe the construction of a dataset for the Portuguese native flora based on publicly available research-grade datasets, and the derivation of a high-accuracy model from it using off-the-shelf deep convolutional neural networks. We anchored the dataset in high-quality data provided by Sociedade Portuguesa de Bot\^anica and added further sampled data from research-grade datasets available from GBIF. We find that with a careful dataset design, off-the-shelf machine-learning cloud services such as Google's AutoML Vision produce accurate models, with results comparable to those of Pl@ntNet, a state-of-the-art citizen science platform. The best model we derived, dubbed Floralens, has been integrated into the public website of Project Biolens, where we gather models for other taxa as well. The dataset used to train the model is also publicly available on Zenodo.

replace-cross Robust Adaptation of Foundation Models with Black-Box Visual Prompting

Authors: Changdae Oh, Gyeongdeok Seo, Geunyoung Jung, Zhi-Qi Cheng, Hosik Choi, Jiyoung Jung, Kyungwoo Song

Abstract: With the surge of large-scale pre-trained models (PTMs), parameter-efficient transfer learning (PETL) of large models has garnered significant attention. While promising, PETL methods commonly rely on two optimistic assumptions: 1) full access to the parameters of the PTM, and 2) sufficient memory capacity to cache all intermediate activations for gradient computation. However, in most real-world applications, PTMs serve as black-box APIs or proprietary software without full parameter accessibility. Moreover, the large memory requirements of modern PTMs are hard to meet. This work proposes black-box visual prompting (BlackVIP), which efficiently adapts PTMs without knowledge of their architectures or parameters. BlackVIP has two components: 1) the Coordinator and 2) simultaneous perturbation stochastic approximation with gradient correction (SPSA-GC). The Coordinator designs input-dependent visual prompts, which allow the target PTM to adapt in the wild. SPSA-GC efficiently estimates the gradient of the PTM to update the Coordinator. We also introduce a variant, BlackVIP-SE, which significantly reduces the runtime and computational cost of BlackVIP. Extensive experiments on 19 datasets demonstrate that BlackVIPs enable robust adaptation to diverse domains and tasks with minimal memory requirements. We further provide a theoretical analysis of the generalization of visual prompting methods by establishing their connection to the certified robustness of randomized smoothing, and provide empirical support for the improved robustness.

replace-cross SPRIG: Improving Large Language Model Performance by System Prompt Optimization

Authors: Lechen Zhang, Tolga Ergen, Lajanugen Logeswaran, Moontae Lee, David Jurgens

Abstract: Large Language Models (LLMs) have shown impressive capabilities in many scenarios, but their performance depends, in part, on the choice of prompt. Past research has focused on optimizing prompts specific to a task. However, much less attention has been given to optimizing the general instructions included in a prompt, known as a system prompt. To address this gap, we propose SPRIG, an edit-based genetic algorithm that iteratively constructs prompts from prespecified components to maximize the model's performance in general scenarios. We evaluate the performance of system prompts on a collection of 47 different types of tasks to ensure generalizability. Our study finds that a single optimized system prompt performs on par with task prompts optimized for each individual task. Moreover, combining system and task-level optimizations leads to further improvement, which showcases their complementary nature. Experiments also reveal that the optimized system prompts generalize effectively across model families, parameter sizes, and languages. This study provides insights into the role of system-level instructions in maximizing LLM potential.

replace-cross MissNODAG: Differentiable Cyclic Causal Graph Learning from Incomplete Data

Authors: Muralikrishnna G. Sethuraman, Razieh Nabi, Faramarz Fekri

Abstract: Causal discovery in real-world systems, such as biological networks, is often complicated by feedback loops and incomplete data. Standard algorithms, which assume acyclic structures or fully observed data, struggle with these challenges. To address this gap, we propose MissNODAG, a differentiable framework for learning both the underlying cyclic causal graph and the missingness mechanism from partially observed data, including data missing not at random. Our framework integrates an additive noise model with an expectation-maximization procedure, alternating between imputing missing values and optimizing the observed data likelihood, to uncover both the cyclic structures and the missingness mechanism. We establish consistency guarantees under exact maximization of the score function in the large sample setting. Finally, we demonstrate the effectiveness of MissNODAG through synthetic experiments and an application to real-world gene perturbation data.

replace-cross Sparse Max-Affine Regression

Authors: Haitham Kanj, Seonho Kim, Kiryung Lee

Abstract: This paper presents Sparse Gradient Descent (Sp-GD) as a solution for variable selection in convex piecewise-linear regression, where the model is given as the maximum of $k$ affine functions $x \mapsto \max_{j \in [k]} \langle a_j^\star, x \rangle + b_j^\star$. Here, $\{ a_j^\star\}_{j=1}^k$ and $\{b_j^\star\}_{j=1}^k$ denote the ground-truth weight vectors and intercepts. A non-asymptotic local convergence analysis is provided for Sp-GD under sub-Gaussian noise when the covariate distribution satisfies the sub-Gaussianity and anti-concentration properties. When the model order and parameters are fixed, Sp-GD provides an $\epsilon$-accurate estimate given $\mathcal{O}(\max(\epsilon^{-2}\sigma_z^2,1)s\log(d/s))$ observations where $\sigma_z^2$ denotes the noise variance. This also implies exact parameter recovery by Sp-GD from $\mathcal{O}(s\log(d/s))$ noise-free observations. The proposed initialization scheme uses sparse principal component analysis to estimate the subspace spanned by $\{ a_j^\star\}_{j=1}^k$, then applies an $r$-covering search to estimate the model parameters. A non-asymptotic analysis is presented for this initialization scheme when the covariates and noise samples follow Gaussian distributions. When the model order and parameters are fixed, this initialization scheme provides an $\epsilon$-accurate estimate given $\mathcal{O}(\epsilon^{-2}\max(\sigma_z^4,\sigma_z^2,1)s^2\log^4(d))$ observations. A new transformation named Real Maslov Dequantization (RMD) is proposed to transform sparse generalized polynomials into sparse max-affine models. The error decay rate of RMD is shown to be exponentially small in its temperature parameter. Furthermore, theoretical guarantees for Sp-GD are extended to the bounded noise model induced by RMD. Numerical Monte Carlo results corroborate theoretical findings for Sp-GD and the initialization scheme.
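A toy noiseless instance of sparse max-affine regression, with a hard-thresholded gradient step standing in for Sp-GD (an illustrative sketch from a warm start near the truth, not the paper's full algorithm or its SPCA-based initialization):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k, s = 400, 30, 2, 3  # samples, ambient dim, affine pieces, sparsity

# Ground truth: s-sparse weight vectors and intercepts
A_star = np.zeros((k, d))
A_star[0, :s] = [3.0, -2.0, 1.5]
A_star[1, :s] = [-1.0, 2.5, 1.2]
b_star = np.array([0.2, -0.3])

X = rng.standard_normal((n, d))
y = (X @ A_star.T + b_star).max(axis=1)  # noiseless max-affine responses

def hard_threshold(v, s):
    # Sparse projection: keep only the s largest-magnitude entries
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-s:]
    out[idx] = v[idx]
    return out

# Gradient flows only through the active (argmax) piece of each sample,
# followed by hard thresholding of each weight vector.
A = A_star + 0.05 * rng.standard_normal((k, d))  # warm start near truth
b = b_star.copy()
lr = 0.5
for _ in range(100):
    scores = X @ A.T + b
    j = scores.argmax(axis=1)         # active piece per sample
    r = scores[np.arange(n), j] - y   # residuals
    for piece in range(k):
        sel = j == piece
        gA = (r[sel, None] * X[sel]).sum(axis=0) / n
        A[piece] = hard_threshold(A[piece] - lr * gA, s)
        b[piece] -= lr * r[sel].sum() / n

recovery_error = np.abs(A - A_star).max()
```

From a warm start the iterates lock onto the true $s$-sparse support and the noiseless responses are fit essentially exactly, mirroring the local convergence regime the analysis covers.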

replace-cross Score-matching-based Structure Learning for Temporal Data on Networks

Authors: Hao Chen, Kai Yi, Yu Guang Wang

Abstract: Causal discovery is a crucial initial step in establishing causality from empirical data and background knowledge. Numerous algorithms have been developed for this purpose. Among them, the score-matching method has demonstrated superior performance across various evaluation metrics, particularly for the commonly encountered Additive Nonlinear Causal Models. However, current score-matching-based algorithms are primarily designed to analyze independent and identically distributed (i.i.d.) data. More importantly, they suffer from high computational complexity due to the pruning step required for handling dense Directed Acyclic Graphs (DAGs). To enhance the scalability of score matching, we have developed a new parent-finding subroutine for leaf nodes in DAGs, significantly accelerating the most time-consuming part of the process: the pruning step. This improvement yields a more efficient score-matching algorithm, termed Parent Identification-based Causal structure learning for both i.i.d. and temporal data on networKs (PICK). The new algorithm extends the scope of existing score-matching methods to both static and temporal data on networks with weak network interference, and can efficiently cope with increasingly complex datasets exhibiting the spatial and temporal dependencies commonly encountered in academia and industry, accelerating score-matching-based methods while maintaining high accuracy in real-world applications.

replace-cross From XAI to MLOps: Explainable Concept Drift Detection with Profile Drift Detection

Authors: Ugur Dar, Mustafa Cavus

Abstract: Predictive models often degrade in performance due to evolving data distributions, a phenomenon known as data drift. Among its forms, concept drift, where the relationship between explanatory variables and the response variable changes, is particularly challenging to detect and adapt to. Traditional drift detection methods often rely on metrics such as accuracy or marginal variable distributions, which may fail to capture subtle but important conceptual changes. This paper proposes a novel method, Profile Drift Detection (PDD), which enables both the detection of concept drift and an enhanced understanding of its underlying causes by leveraging an explainable AI tool: Partial Dependence Profiles (PDPs). PDD quantifies changes in PDPs through new drift metrics that are sensitive to shifts in the data stream while remaining computationally efficient. This approach is aligned with MLOps practices, emphasizing continuous model monitoring and adaptive retraining in dynamic environments. Experiments on synthetic and real-world datasets demonstrate that PDD outperforms existing methods by maintaining high predictive performance while effectively balancing sensitivity and stability in drift signals. The results highlight its suitability for real-time applications, and the paper concludes by discussing the method's advantages, limitations, and potential extensions to broader use cases.
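The idea of quantifying concept drift through changes in Partial Dependence Profiles can be sketched with a simple distance between profiles (a naive stand-in for PDD's actual drift metrics; all names are invented):

```python
import numpy as np

def pdp(model, X, feature, grid):
    # Partial Dependence Profile: average prediction with one feature
    # forced to each grid value in turn
    profile = []
    for g in grid:
        Xg = X.copy()
        Xg[:, feature] = g
        profile.append(model(Xg).mean())
    return np.array(profile)

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3))

# Two "concepts": the response flips its dependence on feature 0
model_before = lambda Z: 2.0 * Z[:, 0]
model_after = lambda Z: -2.0 * Z[:, 0]

grid = np.linspace(-2, 2, 21)
profile_before = pdp(model_before, X, 0, grid)
profile_after = pdp(model_after, X, 0, grid)

# Naive PDP-distance drift score: large when the concept has changed
drift_score = np.abs(profile_before - profile_after).mean()
```

Note that the marginal distribution of feature 0 is identical under both models here, which is exactly the situation where accuracy- or distribution-based detectors can miss the conceptual change while a profile comparison flags it.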

replace-cross Post-detection inference for sequential changepoint localization

Authors: Aytijhya Saha, Aaditya Ramdas

Abstract: This paper addresses a fundamental but largely unexplored challenge in sequential changepoint analysis: conducting inference following a detected change. We develop a very general framework to construct confidence sets for the unknown changepoint using only the data observed up to a data-dependent stopping time at which an arbitrary sequential detection algorithm declares a change. Our framework is nonparametric, making no assumption on the composite post-change class, the observation space, or the sequential detection procedure used, and is non-asymptotically valid. We also extend it to handle composite pre-change classes under a suitable assumption, and also derive confidence sets for the change magnitude in parametric settings. We provide theoretical guarantees on the width of our confidence intervals. Extensive simulations demonstrate that the produced sets have reasonable size, and slightly conservative coverage. In summary, we present the first general method for sequential changepoint localization, which is theoretically sound and broadly applicable in practice.

replace-cross Disentangled World Models: Learning to Transfer Semantic Knowledge from Distracting Videos for Reinforcement Learning

Authors: Qi Wang, Zhipeng Zhang, Baao Xie, Xin Jin, Yunbo Wang, Shiyu Wang, Liaomo Zheng, Xiaokang Yang, Wenjun Zeng

Abstract: Training visual reinforcement learning (RL) in practical scenarios presents a significant challenge, $\textit{i.e.,}$ RL agents suffer from low sample efficiency in environments with variations. While various approaches have attempted to alleviate this issue by disentangled representation learning, these methods usually start learning from scratch without prior knowledge of the world. This paper, in contrast, tries to learn and understand underlying semantic variations from distracting videos via offline-to-online latent distillation and flexible disentanglement constraints. To enable effective cross-domain semantic knowledge transfer, we introduce an interpretable model-based RL framework, dubbed Disentangled World Models (DisWM). Specifically, we pretrain the action-free video prediction model offline with disentanglement regularization to extract semantic knowledge from distracting videos. The disentanglement capability of the pretrained model is then transferred to the world model through latent distillation. For finetuning in the online environment, we exploit the knowledge from the pretrained model and introduce a disentanglement constraint to the world model. During the adaptation phase, the incorporation of actions and rewards from online environment interactions enriches the diversity of the data, which in turn strengthens the disentangled representation learning. Experimental results validate the superiority of our approach on various benchmarks.

replace-cross Towards Build Optimization Using Digital Twins

Authors: Henri A\"idasso, Francis Bordeleau, Ali Tizghadam

Abstract: Despite the indisputable benefits of Continuous Integration (CI) pipelines (or builds), CI still presents significant challenges regarding long durations, failures, and flakiness. Prior studies addressed CI challenges in isolation, yet these issues are interrelated and require a holistic approach for effective optimization. To bridge this gap, this paper proposes a novel idea of developing Digital Twins (DTs) of build processes to enable global and continuous improvement. To support such an idea, we introduce the CI Build process Digital Twin (CBDT) framework as a minimum viable product. This framework offers digital shadowing functionalities, including real-time build data acquisition and continuous monitoring of build process performance metrics. Furthermore, we discuss guidelines and challenges in the practical implementation of CBDTs, including (1) modeling different aspects of the build process using Machine Learning, (2) exploring what-if scenarios based on historical patterns, and (3) implementing prescriptive services such as automated failure and performance repair to continuously improve build processes.

replace-cross Threshold Modulation for Online Test-Time Adaptation of Spiking Neural Networks

Authors: Kejie Zhao, Wenjia Hua, Aiersi Tuerhong, Luziwei Leng, Yuxin Ma, Qinghai Guo

Abstract: Spiking neural networks (SNNs) deployed on neuromorphic chips have recently provided highly efficient solutions on edge devices across a range of scenarios. However, their ability to adapt to distribution shifts after deployment has become a crucial challenge. Online test-time adaptation (OTTA) offers a promising solution by enabling models to dynamically adjust to new data distributions without requiring source data or labeled target samples. Nevertheless, existing OTTA methods are largely designed for traditional artificial neural networks and are not well-suited for SNNs. To address this gap, we propose a low-power, neuromorphic chip-friendly online test-time adaptation framework, aiming to enhance model generalization under distribution shifts. The proposed approach is called Threshold Modulation (TM), which dynamically adjusts the firing threshold through neuronal dynamics-inspired normalization, being more compatible with neuromorphic hardware. Experimental results on benchmark datasets demonstrate the effectiveness of this method in improving the robustness of SNNs against distribution shifts while maintaining low computational cost. The proposed method offers a practical solution for online test-time adaptation of SNNs, providing inspiration for the design of future neuromorphic chips. The demo code is available at github.com/NneurotransmitterR/TM-OTTA-SNN.

replace-cross From Seeing to Doing: Bridging Reasoning and Decision for Robotic Manipulation

Authors: Yifu Yuan, Haiqin Cui, Yibin Chen, Zibin Dong, Fei Ni, Longxin Kou, Jinyi Liu, Pengyi Li, Yan Zheng, Jianye Hao

Abstract: Achieving generalization in robotic manipulation remains a critical challenge, particularly for unseen scenarios and novel tasks. Current Vision-Language-Action (VLA) models, while building on top of general Vision-Language Models (VLMs), still fall short of achieving robust zero-shot performance due to the scarcity and heterogeneity prevalent in embodied datasets. To address these limitations, we propose FSD (From Seeing to Doing), a novel vision-language model that generates intermediate representations through spatial relationship reasoning, providing fine-grained guidance for robotic manipulation. Our approach combines a hierarchical data pipeline for training with a self-consistency mechanism that aligns spatial coordinates with visual signals. Through extensive experiments, we comprehensively validated FSD's capabilities in both "seeing" and "doing," achieving outstanding performance across 8 benchmarks for general spatial reasoning and embodied reference abilities, as well as on our proposed more challenging benchmark VABench. We also verified zero-shot capabilities in robot manipulation, demonstrating significant performance improvements over baseline methods in both SimplerEnv and real robot settings. Experimental results show that FSD achieves 40.6% success rate in SimplerEnv and 72% success rate across 8 real-world tasks, outperforming the strongest baseline by 30%.

replace-cross Informatics for Food Processing

Authors: Gordana Ispirova, Michael Sebek, Giulia Menichetti

Abstract: This chapter explores the evolution, classification, and health implications of food processing, while emphasizing the transformative role of machine learning, artificial intelligence (AI), and data science in advancing food informatics. It begins with a historical overview and a critical review of traditional classification frameworks such as NOVA, Nutri-Score, and SIGA, highlighting their strengths and limitations, particularly the subjectivity and reproducibility challenges that hinder epidemiological research and public policy. To address these issues, the chapter presents novel computational approaches, including FoodProX, a random forest model trained on nutrient composition data to infer processing levels and generate a continuous FPro score. It also explores how large language models like BERT and BioBERT can semantically embed food descriptions and ingredient lists for predictive tasks, even in the presence of missing data. A key contribution of the chapter is a novel case study using the Open Food Facts database, showcasing how multimodal AI models can integrate structured and unstructured data to classify foods at scale, offering a new paradigm for food processing assessment in public health and research.

replace-cross Operator Learning for Schr\"{o}dinger Equation: Unitarity, Error Bounds, and Time Generalization

Authors: Yash Patel, Unique Subedi, Ambuj Tewari

Abstract: We consider the problem of learning the evolution operator for the time-dependent Schr\"{o}dinger equation, where the Hamiltonian may vary with time. Existing neural network-based surrogates often ignore fundamental properties of the Schr\"{o}dinger equation, such as linearity and unitarity, and lack theoretical guarantees on prediction error or time generalization. To address this, we introduce a linear estimator for the evolution operator that preserves a weak form of unitarity. We establish both upper bounds and lower bounds on the prediction error of the proposed estimator that hold uniformly over classes of sufficiently smooth initial wave functions. Additionally, we derive time generalization bounds that quantify how the estimator extrapolates beyond the time points seen during training. Experiments across real-world Hamiltonians -- including hydrogen atoms, ion traps for qubit design, and optical lattices -- show that our estimator achieves relative errors up to two orders of magnitude smaller than state-of-the-art methods such as the Fourier Neural Operator and DeepONet.

replace-cross Gaussian mixture models as a proxy for interacting language models

Authors: Edward L. Wang, Mohammad Sharifi Kiasari, Tianyu Wang, Hayden Helm, Avanti Athreya, Carey Priebe, Vince Lyzinski

Abstract: Large language models (LLMs) are powerful tools that, in a number of settings, overlap with the results of human pattern recognition and reasoning. Retrieval-augmented generation (RAG) further allows LLMs to produce tailored output depending on the contents of their RAG databases. However, LLMs depend on complex, computationally expensive algorithms. In this paper, we introduce interacting Gaussian mixture models (GMMs) as a proxy for interacting LLMs. We construct a model of interacting GMMs, complete with an analogue to RAG updating, under which GMMs can generate, exchange, and update data and parameters. We show that this interacting system of Gaussian mixture models, which can be implemented at minimal computational cost, mimics certain aspects of experimental simulations of interacting LLMs whose iterative responses depend on feedback from other LLMs. We build a Markov chain from this system of interacting GMMs; formalize and interpret the notion of polarization for such a chain; and prove lower bounds on the probability of polarization. This provides theoretical insight into the use of interacting Gaussian mixture models as a computationally efficient proxy for interacting large language models.
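
The interaction loop can be caricatured with one-component Gaussian "agents" (a deliberate simplification of the paper's mixture models; the update rule below is an assumption for the sketch): each agent generates samples, stores what it receives from the other in a RAG-like memory, and refits its parameters on the pooled data:

```python
import random
import statistics

random.seed(0)

class GaussianAgent:
    """One-component 'GMM' agent -- a deliberate simplification of the
    paper's interacting mixtures -- with a RAG-like memory of samples
    received from other agents."""

    def __init__(self, mu, sigma=1.0):
        self.mu, self.sigma = mu, sigma
        self.memory = []

    def generate(self, n=20):
        return [random.gauss(self.mu, self.sigma) for _ in range(n)]

    def receive(self, samples):
        self.memory.extend(samples)

    def update(self):
        # Refit the mean on own samples plus the 'retrieved' ones.
        data = self.generate() + self.memory
        self.mu = statistics.fmean(data)
        self.memory.clear()

a, b = GaussianAgent(-5.0), GaussianAgent(5.0)
for _ in range(10):
    sa, sb = a.generate(), b.generate()
    a.receive(sb); b.receive(sa)
    a.update(); b.update()

gap = abs(a.mu - b.mu)  # the agents drift toward a shared mean
```

Starting 10 units apart, the two agents converge toward a common mean after a few exchange rounds, mimicking the feedback-driven consensus behavior the abstract describes; biasing which samples each agent retains would instead model polarization.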

replace-cross Common Inpainted Objects In-N-Out of Context

Authors: Tianze Yang, Tyson Jordan, Ruitong Sun, Ninghao Liu, Jin Sun

Abstract: We present Common Inpainted Objects In-N-Out of Context (COinCO), a novel dataset addressing the scarcity of out-of-context examples in existing vision datasets. By systematically replacing objects in COCO images through diffusion-based inpainting, we create 97,722 unique images featuring both contextually coherent and inconsistent scenes, enabling effective context learning. Each inpainted object is meticulously verified and categorized as in- or out-of-context through Large Vision Language Model assessments. We demonstrate three key tasks enabled by COinCO: (1) a fine-grained context reasoning approach that classifies objects as in- or out-of-context based on three criteria; (2) a novel Objects-from-Context prediction task that determines which new objects naturally belong in given scenes at both the instance and clique level; and (3) context-enhanced fake detection on state-of-the-art methods without fine-tuning. COinCO provides a controlled testbed with contextual variations, establishing a foundation for advancing context-aware visual understanding in computer vision, including image forensics. Code and dataset are available at https://co-in-co.github.io/.

URLs: https://co-in-co.github.io/.

replace-cross Learning thermodynamic master equations for open quantum systems

Authors: Peter Sentz, Stanley Nicholson, Yujin Cho, Sohail Reddy, Brendan Keith, Stefanie G\"unther

Abstract: The characterization of Hamiltonians and other components of open quantum dynamical systems plays a crucial role in quantum computing and other applications. Scientific machine learning techniques have been applied to this problem in a variety of ways, including by modeling with deep neural networks. However, the majority of mathematical models describing open quantum systems are linear, and the natural nonlinearities in learnable models have not been incorporated using physical principles. We present a data-driven model for open quantum systems that includes learnable, thermodynamically consistent terms. The trained model is interpretable, as it directly estimates the system Hamiltonian and linear components of coupling to the environment. We validate the model on synthetic two- and three-level data, as well as experimental two-level data collected from a quantum device at Lawrence Livermore National Laboratory.

replace-cross Accelerating Constrained Sampling: A Large Deviations Approach

Authors: Yingli Wang, Changwei Tu, Xiaoyu Wang, Lingjiong Zhu

Abstract: The problem of sampling a target probability distribution on a constrained domain arises in many applications including machine learning. For constrained sampling, various Langevin algorithms such as projected Langevin Monte Carlo (PLMC), based on the discretization of reflected Langevin dynamics (RLD) and more generally skew-reflected non-reversible Langevin Monte Carlo (SRNLMC), based on the discretization of skew-reflected non-reversible Langevin dynamics (SRNLD), have been proposed and studied in the literature. This work focuses on the long-time behavior of SRNLD, where a skew-symmetric matrix is added to RLD. Although acceleration for SRNLD has been studied, it is not clear how one should design the skew-symmetric matrix in the dynamics to achieve good performance in practice. We establish a large deviation principle (LDP) for the empirical measure of SRNLD when the skew-symmetric matrix is chosen such that its product with the outward unit normal vector field on the boundary is zero. By explicitly characterizing the rate functions, we show that this choice of the skew-symmetric matrix accelerates the convergence to the target distribution compared to RLD and reduces the asymptotic variance. Numerical experiments for SRNLMC based on the proposed skew-symmetric matrix show superior performance, validating the theoretical findings from large deviations theory.
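
A minimal sketch of skew-reflected non-reversible Langevin Monte Carlo on the half-plane {x1 >= 0}, targeting a standard Gaussian (step size and the skew-symmetric matrix J = [[0, a], [-a, 0]] are illustrative choices; note this toy J does not satisfy the paper's boundary condition that J times the outward normal vanish, which on a 2D half-plane would force J = 0):

```python
import math
import random

random.seed(1)

def grad_U(t):
    # Standard Gaussian potential U(x) = |x|^2 / 2, so grad U = x.
    return t

def srnlmc_step(x, step=0.01, a=1.0):
    """One skew-reflected non-reversible Langevin step on {x1 >= 0}.
    Drift is -(I + J) grad U with skew-symmetric J = [[0, a], [-a, 0]];
    the constraint is enforced by a simple reflection at the boundary."""
    gx, gy = grad_U(x[0]), grad_U(x[1])
    noise = math.sqrt(2 * step)
    dx = -(gx + a * gy) * step + noise * random.gauss(0, 1)
    dy = -(-a * gx + gy) * step + noise * random.gauss(0, 1)
    x1, y1 = x[0] + dx, x[1] + dy
    if x1 < 0:        # reflect at the boundary x1 = 0
        x1 = -x1
    return (x1, y1)

x = (1.0, 0.0)
samples = []
for _ in range(20000):
    x = srnlmc_step(x)
    samples.append(x)
min_x1 = min(s[0] for s in samples)
mean_x1 = sum(s[0] for s in samples) / len(samples)
```

For the standard Gaussian restricted to x1 >= 0, the x1-mean should settle near sqrt(2/pi) ≈ 0.80; the skew term adds a rotation to the drift without changing the target, which is the non-reversibility that drives the acceleration studied in the paper.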

replace-cross All is Not Lost: LLM Recovery without Checkpoints

Authors: Nikolay Blagoev, O\u{g}uzhan Ersoy, Lydia Yiyu Chen

Abstract: Training LLMs on decentralized nodes or spot instances lowers the training cost and enables model democratization. The inevitable challenge is transient churn of nodes due to failures and operator scheduling policies, which leads to the loss of parts of the model (some layers). The conventional approaches to recovering from failures are checkpointing, where a copy of the entire model is periodically sent to additional storage, and redundant computation. These approaches yield significant communication and/or computation overhead even in non-failure cases and scale poorly in settings with large models. In this paper we propose CheckFree, an efficient recovery method where a failing stage is substituted by the weighted average of its closest neighboring stages. In contrast to the state of the art, CheckFree requires no additional computation or storage. However, because of the nature of averaging neighboring stages, it can only recover failures of intermediate stages. We further extend our method to CheckFree+ with out-of-order pipeline execution to tolerate crashes of the first and last stages. Thanks to out-of-order pipelining, the behavior of the first and last stages is mimicked by their neighbors, which allows CheckFree+ to recover them by copying the neighboring stages. To recover the (de-)embedding layers, CheckFree+ copies those layers in the neighboring stages, which requires relatively small storage overhead. We extensively evaluate our method on LLaMa models of model sizes from 124M to 1.5B with varying failure frequencies. In the case of low and medium failure rates (5-10%), CheckFree and CheckFree+ outperform both checkpointing and redundant computation in terms of convergence wall-clock time, achieving up to 12% improvement over redundant computation. Both of our proposals can be run via our code, available at: https://github.com/gensyn-ai/CheckFree

URLs: https://github.com/gensyn-ai/CheckFree
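
The core recovery rule, rebuilding a lost pipeline stage as a weighted average of its neighbors, fits in a few lines (the dict-of-scalars "stages" and the plain convex combination below are toy assumptions; the paper's actual weighting scheme is not reproduced here):

```python
def recover_stage(prev_stage, next_stage, w_prev=0.5, w_next=0.5):
    """Rebuild a failed pipeline stage as a weighted average of its
    neighboring stages -- a sketch of the CheckFree idea. The plain
    convex combination here is an illustrative choice of weights."""
    return {name: w_prev * prev_stage[name] + w_next * next_stage[name]
            for name in prev_stage}

# Toy 'stages': dicts of scalar parameters standing in for layer weights.
stage2 = {"w": 0.2, "b": 0.0}
stage4 = {"w": 0.6, "b": 0.4}
stage3 = recover_stage(stage2, stage4)   # stage 3 crashed mid-training
```

Because the substitute is built entirely from parameters already resident on the neighboring nodes, no checkpoint traffic or redundant compute is needed; training then continues and fine-tunes the averaged stage back into place.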

replace-cross Cite Pretrain: Retrieval-Free Knowledge Attribution for Large Language Models

Authors: Yukun Huang, Sanxing Chen, Jian Pei, Manzil Zaheer, Bhuwan Dhingra

Abstract: Trustworthy language models should provide both correct and verifiable answers. However, citations generated directly by standalone LLMs are often unreliable. As a result, current systems insert citations by querying an external retriever at inference time, introducing latency, infrastructure dependence, and vulnerability to retrieval noise. We explore whether LLMs can be made to reliably attribute to the documents seen during continual pretraining without test-time retrieval, by revising the training process. To study this, we construct CitePretrainBench, a benchmark that mixes real-world corpora (Wikipedia, Common Crawl, arXiv) with novel documents and probes both short-form (single-fact) and long-form (multi-fact) citation tasks. Our approach follows a two-stage process: (1) continual pretraining to index factual knowledge by binding it to persistent document identifiers; and (2) instruction tuning to elicit citation behavior. We introduce Active Indexing for the first stage, which creates generalizable, source-anchored bindings by augmenting training with synthetic data that (i) restate each fact in diverse, compositional forms and (ii) enforce bidirectional training (source-to-fact and fact-to-source). This equips the model to both generate content from a cited source and attribute its own answers, improving robustness to paraphrase and composition. Experiments with Qwen-2.5-7B&3B show that Active Indexing consistently outperforms a Passive Indexing baseline, which simply appends an identifier to each document, achieving citation precision gains of up to 30.2% across all tasks and models. Our ablation studies reveal that performance continues to improve as we scale the amount of augmented data, showing a clear upward trend even at 16x the original token count. Finally, we show that internal citations complement external ones by making the model more robust to retrieval noise.

replace-cross Vision Transformer-Based Time-Series Image Reconstruction for Cloud-Filling Applications

Authors: Lujun Li, Yiqun Wang, Radu State

Abstract: Cloud cover in multispectral imagery (MSI) poses significant challenges for early season crop mapping, as it leads to missing or corrupted spectral information. Synthetic aperture radar (SAR) data, which is not affected by cloud interference, offers a complementary solution, but lacks sufficient spectral detail for precise crop mapping. To address this, we propose a novel framework, Time-series MSI Image Reconstruction using Vision Transformer (ViT), to reconstruct MSI data in cloud-covered regions by leveraging the temporal coherence of MSI and the complementary information from SAR via the attention mechanism. Comprehensive experiments, using rigorous reconstruction evaluation metrics, demonstrate that the Time-series ViT framework significantly outperforms baselines that use non-time-series MSI and SAR or time-series MSI without SAR, effectively enhancing MSI image reconstruction in cloud-covered regions.

replace-cross SciGA: A Comprehensive Dataset for Designing Graphical Abstracts in Academic Papers

Authors: Takuro Kawada, Shunsuke Kitada, Sota Nemoto, Hitoshi Iyatomi

Abstract: Graphical Abstracts (GAs) play a crucial role in visually conveying the key findings of scientific papers. Although recent research increasingly incorporates visual materials such as Figure 1 as de facto GAs, their potential to enhance scientific communication remains largely unexplored. Designing effective GAs requires advanced visualization skills, hindering their widespread adoption. To tackle these challenges, we introduce SciGA-145k, a large-scale dataset comprising approximately 145,000 scientific papers and 1.14 million figures, specifically designed to support GA selection and recommendation, and to facilitate research in automated GA generation. As a preliminary step toward GA design support, we define two tasks: 1) Intra-GA Recommendation, identifying figures within a given paper well-suited as GAs, and 2) Inter-GA Recommendation, retrieving GAs from other papers to inspire new GA designs. Furthermore, we propose Confidence Adjusted top-1 ground truth Ratio (CAR), a novel recommendation metric for fine-grained analysis of model behavior. CAR addresses limitations of traditional rank-based metrics by considering that not only an explicitly labeled GA but also other in-paper figures may plausibly serve as GAs. Benchmark results demonstrate the viability of our tasks and the effectiveness of CAR. Collectively, these establish a foundation for advancing scientific communication within AI for Science.

replace-cross Better Together: Cross and Joint Covariances Enhance Signal Detectability in Undersampled Data

Authors: Arabind Swain, Sean Alexander Ridout, Ilya Nemenman

Abstract: Many data-science applications involve detecting a shared signal between two high-dimensional variables. Using random matrix theory methods, we determine when such signal can be detected and reconstructed from sample correlations, despite the background of sampling noise induced correlations. We consider three different covariance matrices constructed from two high-dimensional variables: their individual self covariance, their cross covariance, and the self covariance of the concatenated (joint) variable, which incorporates the self and the cross correlation blocks. We observe the expected Baik, Ben Arous, and P\'ech\'e detectability phase transition in all these covariance matrices, and we show that joint and cross covariance matrices always reconstruct the shared signal earlier than the self covariances. Whether the joint or the cross approach is better depends on the mismatch of dimensionalities between the variables. We discuss what these observations mean for choosing the right method for detecting linear correlations in data and how these findings may generalize to nonlinear statistical dependencies.
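
A quick numerical illustration of the setup (dimensions, signal strength, and the rank-one spike below are assumptions for the sketch): plant a shared latent signal in two high-dimensional variables and check that the leading directions of the cross and joint covariances recover it:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q, strength = 2000, 50, 50, 2.0

# Shared latent signal z, embedded in both variables along u and v.
z = rng.standard_normal(n)
u = rng.standard_normal(p); u /= np.linalg.norm(u)
v = rng.standard_normal(q); v /= np.linalg.norm(v)
X = rng.standard_normal((n, p)) + strength * np.outer(z, u)
Y = rng.standard_normal((n, q)) + strength * np.outer(z, v)

# Cross covariance: leading singular vectors should align with u and v.
cross = X.T @ Y / n
U, s, Vt = np.linalg.svd(cross)
overlap_u = abs(U[:, 0] @ u)
overlap_v = abs(Vt[0] @ v)

# Joint covariance of the concatenated variable: its leading
# eigenvector should align with the stacked direction (u, v)/sqrt(2).
joint = np.cov(np.hstack([X, Y]).T)
evals, evecs = np.linalg.eigh(joint)
overlap_joint = abs(evecs[:, -1] @ (np.concatenate([u, v]) / np.sqrt(2)))
```

Shrinking `strength` toward the sampling-noise edge makes these overlaps collapse to chance, which is the Baik-Ben Arous-Péché-type detectability transition the abstract describes.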

replace-cross CATNet: A geometric deep learning approach for CAT bond spread prediction in the primary market

Authors: Dixon Domfeh, Saeid Safarveisi

Abstract: Traditional models for pricing catastrophe (CAT) bonds struggle to capture the complex, relational data inherent in these instruments. This paper introduces CATNet, a novel framework that applies a geometric deep learning architecture, the Relational Graph Convolutional Network (R-GCN), to model the CAT bond primary market as a graph, leveraging its underlying network structure for spread prediction. Our analysis reveals that the CAT bond market exhibits the characteristics of a scale-free network, a structure dominated by a few highly connected and influential hubs. CATNet demonstrates higher predictive performance, significantly outperforming strong Random Forest and XGBoost benchmarks. Interpretability analysis confirms that the network's topological properties are not mere statistical artifacts; they are quantitative proxies for long-held industry intuition regarding issuer reputation, underwriter influence, and peril concentration. This research provides evidence that network connectivity is a key determinant of price, offering a new paradigm for risk assessment and proving that graph-based models can deliver both state-of-the-art accuracy and deeper, quantifiable market insights.

replace-cross The Role of Entanglement in Quantum Reservoir Computing with Coupled Kerr Nonlinear Oscillators

Authors: Ali Karimi, Hadi Zadeh-Haghighi, Youssef Kora, Christoph Simon

Abstract: Quantum Reservoir Computing (QRC) uses quantum dynamics to efficiently process temporal data. In this work, we investigate a QRC framework based on two coupled Kerr nonlinear oscillators, a system well-suited for time-series prediction tasks due to its complex nonlinear interactions and potentially high-dimensional state space. We explore how its performance in forecasting both linear and nonlinear time series depends on key physical parameters (input drive strength, Kerr nonlinearity, and oscillator coupling), and we analyze the role of entanglement in improving the reservoir's computational performance, focusing on its effect on predicting non-trivial time series. Using logarithmic negativity to quantify entanglement and normalized root mean square error (NRMSE) to evaluate predictive accuracy, individual parameter sweeps show that optimal performance occurs at moderate but non-zero entanglement. Furthermore, an aggregated binned analysis reveals that this moderate entanglement is consistently associated with the optimal average predictive performance across the parameter space, an observation that persists up to a threshold in the input frequency. This relationship persists under some levels of dissipation and dephasing. In particular, we find that higher dissipation rates can enhance performance. These findings contribute to the broader understanding of quantum reservoirs for high-performance, efficient quantum machine learning and time-series forecasting.

replace-cross WhisperRT -- Turning Whisper into a Causal Streaming Model

Authors: Tomer Krichli, Bhiksha Raj, Joseph Keshet

Abstract: Automatic Speech Recognition (ASR) has seen remarkable progress, with models like OpenAI Whisper and NVIDIA Canary achieving state-of-the-art (SOTA) performance in offline transcription. However, these models are not designed for streaming (online or real-time) transcription, due to limitations in their architecture and training methodology. We propose a method to turn the transformer encoder-decoder model into a low-latency streaming model. The encoder is made causal to process audio incrementally, while the decoder conditions on partial encoder states to generate tokens aligned with the available temporal context. This requires explicit synchronization between encoded input frames and token emissions. Since tokens are produced only after sufficient acoustic evidence is observed, an inherent latency arises, necessitating fine-tuning of the encoder-decoder alignment mechanism. We propose an updated inference mechanism that utilizes the fine-tuned causal encoder and decoder to yield greedy and beam-search decoding, and is shown to be locally optimal. Experiments on low-latency chunk sizes (less than 300 msec) show that our fine-tuned model outperforms existing non-fine-tuned streaming approaches in most cases, while using a lower complexity. We release our training and inference code, along with the fine-tuned models, to support further research and development in streaming ASR.

replace-cross Smooth Flow Matching for Synthesizing Functional Data

Authors: Jianbin Tan, Anru R. Zhang

Abstract: Functional data, i.e., smooth random functions observed over a continuous domain, are increasingly available in areas such as biomedical research, health informatics, and epidemiology. However, effective statistical analysis for functional data is often hindered by challenges such as privacy constraints, sparse and irregular sampling, infinite-dimensionality, and non-Gaussian structures. To address these challenges, we introduce a novel framework named Smooth Flow Matching (SFM), tailored for generative modeling of functional data that enables statistical analysis without exposing sensitive real data. Under a copula framework, SFM constructs a parsimonious smooth flow to generate infinite-dimensional functional data, free of Gaussianity and low-rank assumptions. It is computationally efficient, handles irregular observations, and guarantees the smoothness of the generated functions, offering a practical and flexible solution in scenarios where existing deep generative methods are not applicable. Through extensive simulation studies, we demonstrate the advantages of SFM in terms of both synthetic data quality and computational efficiency. We then apply SFM to generate clinical trajectory data from the MIMIC-IV patient electronic health records (EHR) longitudinal database. Our analysis showcases the ability of SFM to produce high-quality surrogate data for downstream tasks, highlighting its potential to boost the utility of EHR data for clinical applications.

replace-cross Embodied-R1: Reinforced Embodied Reasoning for General Robotic Manipulation

Authors: Yifu Yuan, Haiqin Cui, Yaoting Huang, Yibin Chen, Fei Ni, Zibin Dong, Pengyi Li, Yan Zheng, Hongyao Tang, Jianye Hao

Abstract: Generalization in embodied AI is hindered by the "seeing-to-doing gap," which stems from data scarcity and embodiment heterogeneity. To address this, we pioneer "pointing" as a unified, embodiment-agnostic intermediate representation, defining four core embodied pointing abilities that bridge high-level vision-language comprehension with low-level action primitives. We introduce Embodied-R1, a 3B Vision-Language Model (VLM) specifically designed for embodied reasoning and pointing. We use a wide range of embodied and general visual reasoning datasets as sources to construct a large-scale dataset, Embodied-Points-200K, which supports key embodied pointing capabilities. We then train Embodied-R1 using a two-stage Reinforced Fine-tuning (RFT) curriculum with a specialized multi-task reward design. Embodied-R1 achieves state-of-the-art performance on 11 embodied spatial and pointing benchmarks. Critically, it demonstrates robust zero-shot generalization by achieving a 56.2% success rate in the SIMPLEREnv and 87.5% across 8 real-world XArm tasks without any task-specific fine-tuning, representing a 62% improvement over strong baselines. Furthermore, the model exhibits high robustness against diverse visual disturbances. Our work shows that a pointing-centric representation, combined with an RFT training paradigm, offers an effective and generalizable pathway to closing the perception-action gap in robotics.

replace-cross ShadowNPU: System and Algorithm Co-design for NPU-Centric On-Device LLM Inference

Authors: Wangsong Yin, Daliang Xu, Mengwei Xu, Gang Huang, Xuanzhe Liu

Abstract: Running Large Language Models (LLMs) on-device is now a critical enabler for preserving user privacy. We observe that the attention operator falls back from the special-purpose NPU to the general-purpose CPU/GPU because of quantization sensitivity in state-of-the-art frameworks. This fallback results in a degraded user experience and increased complexity in system scheduling. To this end, this paper presents shadowAttn, a system-algorithm codesigned sparse attention module that minimizes reliance on the CPU/GPU by calculating attention on only a tiny portion of tokens. The key idea is to hide the overhead of estimating the important tokens with an NPU-based pilot compute. Further, shadowAttn proposes insightful techniques such as NPU compute graph bucketing, head-wise NPU-CPU/GPU pipelining, and per-head fine-grained sparsity ratios to achieve high accuracy and efficiency. shadowAttn delivers the best performance with highly limited CPU/GPU resources; it requires far fewer CPU/GPU resources to deliver performance on par with SoTA frameworks.
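
The sparse-attention idea, score all tokens cheaply and then attend exactly over only the important ones, can be sketched as follows (here the "pilot" ranking uses the exact dot products, so this is only an idealized caricature of shadowAttn's NPU-based estimate; all names and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

def sparse_attention(q, K, V, k=32):
    """Attend over only the top-k tokens ranked by a 'pilot' score.
    Here the pilot is the exact dot product; shadowAttn's NPU pilot is
    an approximation, so this is an idealized sketch of the idea."""
    scores = K @ q
    top = np.argpartition(scores, -k)[-k:]      # important tokens only
    w = np.exp(scores[top] - scores[top].max()) # stable softmax over top-k
    w /= w.sum()
    return w @ V[top]

n, d = 128, 16
q = rng.standard_normal(d)
K = rng.standard_normal((n, d))
V = rng.standard_normal((n, d))

# Dense reference attention over all n tokens.
full_scores = K @ q
w_full = np.exp(full_scores - full_scores.max())
w_full /= w_full.sum()
full_out = w_full @ V

approx = sparse_attention(q, K, V, k=32)
err = np.linalg.norm(approx - full_out) / np.linalg.norm(full_out)
```

Because softmax mass concentrates on the highest-scoring tokens, the top-k output stays close to the dense one while the per-query cost drops from O(n) to O(k) exact-attention work plus the cheap pilot pass.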

replace-cross Partially Functional Dynamic Backdoor Diffusion-based Causal Model

Authors: Xinwen Liu, Lei Qian, Song Xi Chen, Niansheng Tang

Abstract: Causal inference in spatio-temporal settings is critically hindered by unmeasured confounders with complex spatio-temporal dynamics and the prevalence of multi-resolution data. While diffusion models present a promising avenue for estimating structural causal models, existing approaches are limited by assumptions of causal sufficiency or static confounding, failing to capture the region-specific, temporally dependent nature of real-world latent variables or to directly handle functional variables. We bridge this gap by introducing the Partially Functional Dynamic Backdoor Diffusion-based Causal Model (PFD-BDCM), a unified generative framework designed to simultaneously tackle causal inference with dynamic confounding and functional data. Our approach formalizes a novel structural causal model that captures spatio-temporal dependencies in latent confounders through conditional autoregressive processes, represents functional variables via basis expansion coefficients treated as standard graph nodes, and integrates valid backdoor adjustment into a diffusion-based generative process. We provide theoretical guarantees on the preservation of causal effects under basis expansion and derive error bounds for counterfactual estimates. Experiments on synthetic data and a real-world air pollution case study demonstrate that PFD-BDCM outperforms existing methods across observational, interventional, and counterfactual queries. This work provides a rigorous and practical tool for robust causal inference in complex spatio-temporal systems characterized by non-stationarity and multi-resolution data.

replace-cross RAPTOR: A Foundation Policy for Quadrotor Control

Authors: Jonas Eschmann, Dario Albani, Giuseppe Loianno

Abstract: Humans are remarkably data-efficient when adapting to new unseen conditions, like driving a new car. In contrast, modern robotic control systems, like neural network policies trained using Reinforcement Learning (RL), are highly specialized for single environments. Because of this overfitting, they are known to break down even under small differences like the Simulation-to-Reality (Sim2Real) gap and require system identification and retraining for even minimal changes to the system. In this work, we present RAPTOR, a method for training a highly adaptive foundation policy for quadrotor control. Our method enables training a single, end-to-end neural-network policy to control a wide variety of quadrotors. We test 10 different real quadrotors from 32 g to 2.4 kg that also differ in motor type (brushed vs. brushless), frame type (soft vs. rigid), propeller type (2/3/4-blade), and flight controller (PX4/Betaflight/Crazyflie/M5StampFly). We find that a tiny, three-layer policy with only 2084 parameters is sufficient for zero-shot adaptation to a wide variety of platforms. The adaptation through in-context learning is made possible by using a recurrence in the hidden layer. The policy is trained through our proposed Meta-Imitation Learning algorithm, where we sample 1000 quadrotors and train a teacher policy for each of them using RL. Subsequently, the 1000 teachers are distilled into a single, adaptive student policy. We find that within milliseconds, the resulting foundation policy adapts zero-shot to unseen quadrotors. We extensively test the capabilities of the foundation policy under numerous conditions (trajectory tracking, indoor/outdoor, wind disturbance, poking, different propellers).

replace-cross Sequential 1-bit Mean Estimation with Near-Optimal Sample Complexity

Authors: Ivan Lau, Jonathan Scarlett

Abstract: In this paper, we study the problem of distributed mean estimation with 1-bit communication constraints. We propose a mean estimator that is based on (randomized and sequentially-chosen) interval queries, whose 1-bit outcome indicates whether the given sample lies in the specified interval. Our estimator is $(\epsilon, \delta)$-PAC for all distributions with bounded mean ($-\lambda \le \mathbb{E}(X) \le \lambda $) and variance ($\mathrm{Var}(X) \le \sigma^2$) for some known parameters $\lambda$ and $\sigma$. We derive a sample complexity bound $\widetilde{O}\big( \frac{\sigma^2}{\epsilon^2}\log\frac{1}{\delta} + \log\frac{\lambda}{\sigma}\big)$, which matches the minimax lower bound for the unquantized setting up to logarithmic factors and the additional $\log\frac{\lambda}{\sigma}$ term that we show to be unavoidable. We also establish an adaptivity gap for interval-query based estimators: the best non-adaptive mean estimator is considerably worse than our adaptive mean estimator for large $\frac{\lambda}{\sigma}$. Finally, we give tightened sample complexity bounds for distributions with stronger tail decay, and present additional variants that (i) handle an unknown sampling budget (ii) adapt to the unknown true variance given (possibly loose) upper and lower bounds on the variance, and (iii) use only two stages of adaptivity at the expense of more complicated (non-interval) queries.
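
A crude sequential protocol in the spirit of interval queries (this bisection scheme and its constants are illustrative assumptions, not the paper's (epsilon, delta)-PAC estimator): each fresh sample reveals only one bit, whether it lies in [-lambda, mid], and the bracket is driven toward the median, which for the symmetric distribution used here coincides with the mean:

```python
import random

random.seed(7)

def estimate_mean_1bit(draw, lam=8.0, rounds=12, per_round=800):
    """Localize the median of the sampled distribution using only
    1-bit interval queries on fresh samples. Illustrative bisection
    sketch, not the paper's (eps, delta)-PAC estimator."""
    lo, hi = -lam, lam
    for _ in range(rounds):
        mid = (lo + hi) / 2
        # One bit per fresh sample: does X fall in [-lam, mid]?
        hits = sum(-lam <= draw() <= mid for _ in range(per_round))
        if hits > per_round // 2:  # empirical CDF(mid) > 1/2
            hi = mid               # median lies to the left of mid
        else:
            lo = mid
    return (lo + hi) / 2

# True mean (= median) of the sampled Gaussian is 1.5.
est = estimate_mean_1bit(lambda: random.gauss(1.5, 1.0))
```

Each query is adaptive (the interval depends on earlier answers), which mirrors why the paper's adaptive interval-query estimators can beat any non-adaptive scheme when lambda/sigma is large.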

replace-cross Markovian Reeb Graphs for Simulating Spatiotemporal Patterns of Life

Authors: Anantajit Subrahmanya, Chandrakanth Gudavalli, Connor Levenson, B. S. Manjunath

Abstract: Accurately modeling human mobility is critical for urban planning, epidemiology, and traffic management. In this work, we introduce Markovian Reeb Graphs, a novel framework that transforms Reeb graphs from a descriptive analysis tool into a generative model for spatiotemporal trajectories. Our approach captures individual and population-level Patterns of Life (PoLs) and generates realistic trajectories that preserve baseline behaviors while incorporating stochastic variability by embedding probabilistic transitions within the Reeb graph structure. We present two variants: Sequential Reeb Graphs (SRGs) for individual agents and Hybrid Reeb Graphs (HRGs) that combine individual with population PoLs, evaluated on the Urban Anomalies and Geolife datasets using five mobility statistics. Results demonstrate that HRGs achieve strong fidelity across metrics while requiring modest trajectory datasets without specialized side information. This work establishes Markovian Reeb Graphs as a promising framework for trajectory simulation with broad applicability across urban environments.

replace-cross Smart Paste: Automatically Fixing Copy/Paste for Google Developers

Authors: Vincent Nguyen, Guilherme Herzog, Jos\'e Cambronero, Marcus Revaj, Aditya Kini, Alexander Fr\"ommgen, Maxim Tabachnyk

Abstract: Manually editing pasted code is a long-standing developer pain point. In internal software development at Google, we observe that code is pasted 4 times more often than it is manually typed. These paste actions frequently require follow-up edits, ranging from simple reformatting and renaming to more complex style adjustments and cross-language translations. Prior work has shown deep learning can be used to predict these edits. In this work, we show how to iteratively develop and scale Smart Paste, an IDE feature for post-paste edit suggestions, to Google's development environment. This experience can serve as a guide for AI practitioners on a holistic approach to feature development, covering user experience, system integration, and model capabilities. Since deployment, Smart Paste has had overwhelmingly positive feedback with a 45% acceptance rate. At Google's enterprise scale, these accepted suggestions account for over 1% of all code written company-wide.

replace-cross EvoEdit: Evolving Null-space Alignment for Robust and Efficient Knowledge Editing

Authors: Sicheng Lyu, Yu Gu, Xinyu Wang, Jerry Huang, Sitao Luan, Yufei Cui, Xiao-Wen Chang, Peng Lu

Abstract: Large language models (LLMs) require continual updates to rectify outdated or erroneous knowledge. Model editing has emerged as a compelling paradigm for introducing targeted modifications without the computational burden of full retraining. Existing approaches are mainly based on a locate-then-edit framework. However, in sequential editing contexts, where multiple updates are applied over time, they exhibit significant limitations and suffer from catastrophic interference, i.e., new edits compromise previously integrated updates and degrade preserved knowledge. To address these challenges, we introduce EvoEdit, a novel editing strategy that mitigates catastrophic interference through sequential null-space alignment, enabling stable and efficient model editing. By performing sequential null-space alignment for each incoming edit, EvoEdit preserves both original and previously modified knowledge representations and maintains output invariance on preserved knowledge even across long edit sequences, effectively mitigating interference. Evaluations on real-world sequential knowledge-editing benchmarks show that EvoEdit achieves performance better than or comparable to prior state-of-the-art locate-then-edit techniques, with up to 3.53 times speedup. Overall, these results underscore the necessity of developing more principled approaches for designing LLMs in dynamically evolving information settings, while providing a simple yet effective solution with strong theoretical guarantees.
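
The core invariance idea can be sketched in a few lines (our simplification of null-space-constrained editing, not EvoEdit's actual algorithm): project the weight update onto the null space of the preserved keys, so the layer's output on every preserved key is exactly unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
W = rng.normal(size=(d, d))       # layer weights to edit
K_p = rng.normal(size=(d, 3))     # preserved-key activations (outputs must not move)
delta = rng.normal(size=(d, d))   # raw edit direction

# Orthogonal projector onto null(K_p^T): P = I - K_p (K_p^T K_p)^{-1} K_p^T
P = np.eye(d) - K_p @ np.linalg.solve(K_p.T @ K_p, K_p.T)
W_edited = W + delta @ P          # the edit acts only outside span(K_p)

# Output invariance on preserved keys: (W_edited - W) @ K_p == 0
residual = float(np.max(np.abs((W_edited - W) @ K_p)))
print(residual)  # numerically ~0
```

Since P annihilates every column of K_p, the projected update cannot change outputs on preserved knowledge, which is the invariance property the abstract describes.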

replace-cross AI-BAAM: AI-Driven Bank Statement Analytics as Alternative Data for Malaysian MSME Credit Scoring

Authors: Chun Chet Ng, Zhen Hao Chu, Jia Yu Lim, Yin Yin Boon, Wei Zeng Low, Jin Khye Tan

Abstract: Although Micro, Small, and Medium Enterprises (MSMEs) account for 96.1% of all businesses in Malaysia, access to financing remains one of their most persistent challenges. Newly established businesses are often excluded from formal credit markets as traditional underwriting approaches rely heavily on credit bureau data. This study investigates the potential of bank statement data as an alternative data source for credit assessment to promote financial inclusion in emerging markets. First, we propose a cash flow-based underwriting pipeline where we utilize bank statement data for end-to-end data extraction and machine learning credit scoring. Second, we introduce a novel dataset of 611 loan applicants from a Malaysian consulting firm. Third, we develop and evaluate credit scoring models based on application information and bank transaction-derived features. Empirical results demonstrate that incorporating bank statement features yields substantial improvements, with our best model achieving an AUROC of 0.806 on the validation set, representing a 24.6% improvement over models using application information only. Finally, we will release the anonymized bank transaction dataset to facilitate further research on MSME financial inclusion within Malaysia's emerging economy.

replace-cross Three-dimensional inversion of gravity data using implicit neural representations and scientific machine learning

Authors: Pankaj K Mishra, Sanni Laaksonen, Jochen Kamm, Anand Singh

Abstract: Inversion of gravity data is an important method for investigating subsurface density variations relevant to mineral exploration, geothermal assessment, carbon storage, natural hydrogen, groundwater resources, and tectonic evolution. Here we present a scientific machine-learning approach for three-dimensional gravity inversion that represents subsurface density as a continuous field using an implicit neural representation (INR). The method trains a deep neural network directly through a physics-based forward-model loss, mapping spatial coordinates to a continuous density field without predefined meshes or discretisation. Spatial encoding enhances the network's capacity to capture sharp contrasts and short-wavelength features that conventional coordinate-based networks tend to oversmooth due to spectral bias. We demonstrate the approach on synthetic examples including smooth models, representing realistic geological complexity, and a dipping block model to assess recovery of structures at different depths. The INR framework reconstructs detailed structure and geologically plausible boundaries without explicit regularisation or depth weighting, while reducing the number of inversion parameters as the problem size grows. These results highlight the potential of implicit representations to enable scalable, flexible, and interpretable large-scale geophysical inversion. This framework could generalise to other geophysical methods and for joint/multiphysics inversion.
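
The spatial encoding mentioned above is commonly a Fourier-feature map; a minimal sketch follows (our assumption, since the abstract does not specify the exact encoding): coordinates are lifted to sines and cosines at geometrically spaced frequencies before entering the network, counteracting spectral bias.

```python
import math

def spatial_encoding(coords, n_freqs=4):
    # Hypothetical Fourier-feature encoding: map each coordinate to
    # sin/cos pairs at frequencies 2^k * pi, k = 0..n_freqs-1, so the
    # downstream MLP can represent short-wavelength density contrasts.
    feats = []
    for c in coords:
        for k in range(n_freqs):
            w = (2 ** k) * math.pi
            feats.append(math.sin(w * c))
            feats.append(math.cos(w * c))
    return feats

enc = spatial_encoding([0.1, 0.5, 0.9])
print(len(enc))  # 3 coords x 4 frequencies x 2 (sin, cos) = 24
```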

replace-cross Endogenous Aggregation of Multiple Data Envelopment Analysis Scores for Large Data Sets

Authors: Hashem Omrani, Raha Imanirad, Adam Diamant, Utkarsh Verma, Amol Verma, Fahad Razak

Abstract: We propose an approach for dynamic efficiency evaluation across multiple organizational dimensions using data envelopment analysis (DEA). The method generates both dimension-specific and aggregate efficiency scores, incorporates desirable and undesirable outputs, and is suitable for large-scale problem settings. Two regularized DEA models are introduced: a slack-based measure (SBM) and a linearized version of a nonlinear goal programming model (GP-SBM). While SBM estimates an aggregate efficiency score and then distributes it across dimensions, GP-SBM first estimates dimension-level efficiencies and then derives an aggregate score. Both models utilize a regularization parameter to enhance discriminatory power while also directly integrating both desirable and undesirable outputs. We demonstrate the computational efficiency and validity of our approach on multiple datasets and apply it to a case study of twelve hospitals in Ontario, Canada, evaluating three theoretically grounded dimensions of organizational effectiveness over a 24-month period from January 2018 to December 2019: technical efficiency, clinical efficiency, and patient experience. Our numerical results show that SBM and GP-SBM better capture correlations among input/output variables and outperform conventional benchmarking methods that separately evaluate dimensions before aggregation.

replace-cross Data-driven Sensor Placement for Predictive Applications: A Correlation-Assisted Attribution Framework (CAAF)

Authors: Sze Chai Leung, Di Zhou, H. Jane Bae

Abstract: Optimal sensor placement (OSP) is critical for efficient, accurate monitoring, control, and inference in complex physical systems. We propose a machine-learning-based feature attribution (FA) framework to identify OSP for target predictions. FA quantifies input contributions to a model output; however, it struggles with highly correlated input data often encountered in practical applications for OSP. To address this, we propose a Correlation-Assisted Attribution Framework (CAAF), which introduces a clustering step on the candidate sensor locations before performing FA to reduce redundancy and enhance generalizability. We first illustrate the core principles of the proposed framework through a series of validation cases, then demonstrate its effectiveness in realistic dynamical systems such as structural health monitoring, airfoil lift prediction, and wall-normal velocity estimation for turbulent channel flow. The results show that the CAAF outperforms alternative approaches that typically struggle due to the presence of nonlinear dynamics, chaotic behavior, and multi-scale interactions, and enables the effective application of FA for identifying OSP in real-world environments.
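
The clustering step can be illustrated with a greedy correlation-based grouping (our hedged sketch of the idea, not the paper's implementation; the threshold and grouping rule are invented for illustration): near-duplicate candidate sensor signals collapse to a single representative before feature attribution is run.

```python
import numpy as np

def cluster_representatives(signals, threshold=0.9):
    # Greedily group candidate sensor signals whose absolute pairwise
    # correlation exceeds `threshold`, keeping one representative each.
    corr = np.corrcoef(signals)
    n = signals.shape[0]
    assigned, reps = [False] * n, []
    for i in range(n):
        if assigned[i]:
            continue
        reps.append(i)
        for j in range(i, n):
            if abs(corr[i, j]) >= threshold:
                assigned[j] = True
    return reps

rng = np.random.default_rng(0)
base = rng.normal(size=200)
signals = np.stack([
    base + 0.01 * rng.normal(size=200),  # near-duplicate of the next signal
    base + 0.01 * rng.normal(size=200),
    rng.normal(size=200),                # independent signal
])
reps = cluster_representatives(signals)
print(reps)  # the two correlated signals share one representative
```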

replace-cross pDANSE: Particle-based Data-driven Nonlinear State Estimation from Nonlinear Measurements

Authors: Anubhab Ghosh, Yonina C. Eldar, Saikat Chatterjee

Abstract: We consider the problem of designing a data-driven nonlinear state estimation (DANSE) method that uses (noisy) nonlinear measurements of a process whose underlying state transition model (STM) is unknown. Such a process is referred to as a model-free process. A recurrent neural network (RNN) provides parameters of a Gaussian prior that characterize the state of the model-free process, using all previous measurements at a given time point. In the case of DANSE, the measurement system was linear, leading to a closed-form solution for the state posterior. However, the presence of a nonlinear measurement system renders a closed-form solution infeasible. Instead, the second-order statistics of the state posterior are computed using the nonlinear measurements observed at the time point. We address the nonlinear measurements using a reparameterization trick-based particle sampling approach, and estimate the second-order statistics of the state posterior. The proposed method is referred to as particle-based DANSE (pDANSE). The RNN of pDANSE uses sequential measurements efficiently and avoids the use of computationally intensive sequential Monte-Carlo (SMC) and/or ancestral sampling. We describe the semi-supervised learning method for pDANSE, which transitions to unsupervised learning in the absence of labeled data. Using a stochastic Lorenz-63 system as a benchmark process, we experimentally demonstrate the state estimation performance for four nonlinear measurement systems. We explore cubic nonlinearity and a camera-model nonlinearity where unsupervised learning is used; then we explore half-wave rectification nonlinearity and Cartesian-to-spherical nonlinearity where semi-supervised learning is used. Additionally, we show the performance of pDANSE for the stochastic Lorenz-96 system with a half-wave rectified measurement system. The performance of state estimation is shown to be competitive vis-a-vis model-driven methods that have complete knowledge of the STM of the dynamical system.
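
A minimal sketch of reparameterized particle sampling for the posterior statistics (our illustration under a cubic measurement model, not pDANSE itself; all numbers are invented): particles are drawn from the Gaussian prior via the reparameterization trick, weighted by the nonlinear measurement likelihood, and the weighted moments approximate the posterior.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.0, 0.5            # Gaussian prior (would come from the RNN)
y_obs, noise_std = 1.2, 0.1     # observed nonlinear measurement
h = lambda x: x ** 3            # cubic measurement nonlinearity

eps = rng.standard_normal(5000)
particles = mu + sigma * eps    # reparameterization trick: x = mu + sigma * eps

# Importance weights from the Gaussian measurement likelihood.
log_w = -0.5 * ((y_obs - h(particles)) / noise_std) ** 2
w = np.exp(log_w - log_w.max())
w /= w.sum()

# Second-order statistics of the state posterior.
post_mean = float(np.sum(w * particles))
post_var = float(np.sum(w * (particles - post_mean) ** 2))
print(post_mean, post_var)
```

Here the posterior mean concentrates near the cube root of the observation, as the sharp likelihood dominates the broad prior.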

replace-cross Image Hashing via Cross-View Code Alignment in the Age of Foundation Models

Authors: Ilyass Moummad, Kawtar Zaher, Herv\'e Go\"eau, Alexis Joly

Abstract: Efficient large-scale retrieval requires representations that are both compact and discriminative. Foundation models provide powerful visual and multimodal embeddings, but nearest neighbor search in these high-dimensional spaces is computationally expensive. Hashing offers an efficient alternative by enabling fast Hamming distance search with binary codes, yet existing approaches often rely on complex pipelines, multi-term objectives, designs specialized for a single learning paradigm, and long training times. We introduce CroVCA (Cross-View Code Alignment), a simple and unified principle for learning binary codes that remain consistent across semantically aligned views. A single binary cross-entropy loss enforces alignment, while coding-rate maximization serves as an anti-collapse regularizer to promote balanced and diverse codes. To implement this, we design HashCoder, a lightweight MLP hashing network with a final batch normalization layer to enforce balanced codes. HashCoder can be used as a probing head on frozen embeddings or to adapt encoders efficiently via LoRA fine-tuning. Across benchmarks, CroVCA achieves state-of-the-art results in just 5 training epochs. At 16 bits, it performs particularly well; for instance, unsupervised hashing on COCO completes in under 2 minutes and supervised hashing on ImageNet100 in about 3 minutes on a single GPU. These results highlight CroVCA's efficiency, adaptability, and broad applicability.
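
The single alignment loss can be sketched as follows (our simplification of the cross-view idea; the function name and setup are ours): binary cross-entropy between one view's bit probabilities and the other view's hard binary codes rewards codes that agree across semantically aligned views.

```python
import numpy as np

def crossview_alignment_loss(z1, z2):
    # BCE between view 1's bit probabilities and view 2's hard codes,
    # encouraging the same binary code for both views of an item.
    p = 1.0 / (1.0 + np.exp(-z1))        # bit probabilities from view 1
    target = (z2 > 0).astype(float)      # hard binary code from view 2
    eps = 1e-9
    return float(-np.mean(target * np.log(p + eps)
                          + (1 - target) * np.log(1 - p + eps)))

rng = np.random.default_rng(0)
z = rng.normal(size=(4, 16))             # 16-bit code logits for 4 items
aligned = crossview_alignment_loss(z, z)       # identical views: low loss
misaligned = crossview_alignment_loss(z, -z)   # flipped views: high loss
print(aligned, misaligned)
```

The paper pairs this alignment term with a coding-rate regularizer to avoid code collapse; that term is omitted here.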

replace-cross Efficient and Private Property Testing via Indistinguishability

Authors: Cynthia Dwork, Pranay Tankala

Abstract: Given a small random sample of $n$-bit strings labeled by an unknown Boolean function, which properties of this function can be tested computationally efficiently? We show an equivalence between properties that are efficiently testable from few samples and properties with structured symmetry, which depend only on the function's average values on an efficiently computable partition of the domain. Without the efficiency constraint, a similar characterization in terms of unstructured symmetry was obtained by Blais and Yoshida (2019). We also give a function testing analogue of the classic characterization of testable graph properties in terms of regular partitions, as well as a sublinear time and differentially private algorithm to compute concise summaries of such partitions of graphs. Finally, we tighten a recent characterization of the computational indistinguishability of product distributions, which encompasses the related task of efficiently testing which of two candidate functions labeled the observed samples. Essential to our proofs is the following observation of independent interest: Every randomized Boolean function, no matter how complex, admits a supersimulator: a randomized polynomial-size circuit whose output on random inputs cannot be efficiently distinguished from reality with constant advantage, even by polynomially larger distinguishers. This surprising fact is implicit in a theorem of Dwork et al. (2021) in the context of algorithmic fairness, but its complexity-theoretic implications were not previously explored. We give a new proof of this lemma using an iteration technique from the graph regularity literature, and we observe that a subtle quantifier switch allows it to powerfully circumvent known barriers to improving the landmark complexity-theoretic regularity lemma of Trevisan, Tulsiani, and Vadhan (2009).

replace-cross Contradictions in Context: Challenges for Retrieval-Augmented Generation in Healthcare

Authors: Saeedeh Javadi, Sara Mirabi, Manan Gangar, Bahadorreza Ofoghi

Abstract: In high-stakes information domains such as healthcare, where large language models (LLMs) can produce hallucinations or misinformation, retrieval-augmented generation (RAG) has been proposed as a mitigation strategy, grounding model outputs in external, domain-specific documents. Yet, this approach can introduce errors when source documents contain outdated or contradictory information. This work investigates the performance of five LLMs in generating RAG-based responses to medicine-related queries. Our contributions are three-fold: i) the creation of a benchmark dataset using consumer medicine information documents from the Australian Therapeutic Goods Administration (TGA), where headings are repurposed as natural language questions, ii) the retrieval of PubMed abstracts using TGA headings, stratified across multiple publication years, to enable controlled temporal evaluation of outdated evidence, and iii) a comparative analysis of the frequency and impact of outdated or contradictory content on model-generated responses, assessing how LLMs integrate and reconcile temporally inconsistent information. Our findings show that contradictions between highly similar abstracts do, in fact, degrade performance, leading to inconsistencies and reduced factual accuracy in model answers. These results highlight that retrieval similarity alone is insufficient for reliable medical RAG and underscore the need for contradiction-aware filtering strategies to ensure trustworthy responses in high-stakes domains.

replace-cross SPHINX: A Synthetic Environment for Visual Perception and Reasoning

Authors: Md Tanvirul Alam, Saksham Aggarwal, Justin Yang Chae, Nidhi Rastogi

Abstract: We present Sphinx, a synthetic environment for visual perception and reasoning that targets core cognitive primitives. Sphinx procedurally generates puzzles using motifs, tiles, charts, icons, and geometric primitives, each paired with verifiable ground-truth solutions, enabling both precise evaluation and large-scale dataset construction. The benchmark covers 25 task types spanning symmetry detection, geometric transformations, spatial reasoning, chart interpretation, and sequence prediction. Evaluating recent large vision-language models (LVLMs) shows that even state-of-the-art GPT-5 attains only 51.1% accuracy, well below human performance. Finally, we demonstrate that reinforcement learning with verifiable rewards (RLVR) substantially improves model accuracy on these tasks and yields gains on external visual reasoning benchmarks, highlighting its promise for advancing multimodal reasoning.

replace-cross Optical Context Compression Is Just (Bad) Autoencoding

Authors: Ivan Yee Lee, Cheng Yang, Taylor Berg-Kirkpatrick

Abstract: DeepSeek-OCR shows that rendered text can be reconstructed from a small number of vision tokens, sparking excitement about using vision as a compression medium for long textual contexts. But this pipeline requires rendering token embeddings to pixels and compressing from there -- discarding learned representations in favor of an image the vision encoder must then recover from. We ask whether this detour helps. Comparing DeepSeek-OCR's vision encoder against near-zero-parameter mean pooling and a learned hierarchical encoder, we find it does not. For reconstruction, simple direct methods match or surpass vision at every compression ratio. For language modeling, vision performs comparably to truncation -- a baseline that simply discards context -- and loses to the hierarchical encoder at every compression ratio. As expected, all compression methods outperform truncation for factual recall, but vision never surpasses the best direct baseline. The excitement around optical context compression outpaces the evidence. Code and checkpoints are available at https://github.com/ivnle/bad-autoencoding.
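
The near-zero-parameter baseline mentioned above can be sketched directly (our illustration of windowed mean pooling; the function name and shapes are ours): consecutive token embeddings are averaged into one compressed vector per window.

```python
import numpy as np

def mean_pool_compress(token_embs, ratio):
    # Average each window of `ratio` consecutive token embeddings into a
    # single vector, giving a `ratio`-to-1 context compression.
    n, d = token_embs.shape
    n_keep = n // ratio
    return token_embs[: n_keep * ratio].reshape(n_keep, ratio, d).mean(axis=1)

rng = np.random.default_rng(0)
ctx = rng.normal(size=(128, 64))          # 128 tokens, 64-dim embeddings
compressed = mean_pool_compress(ctx, ratio=8)
print(compressed.shape)  # (16, 64)
```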

URLs: https://github.com/ivnle/bad-autoencoding.

replace-cross Agile Deliberation: Concept Deliberation for Subjective Visual Classification

Authors: Leijie Wang, Otilia Stretcu, Wei Qiao, Thomas Denby, Krishnamurthy Viswanathan, Enming Luo, Chun-Ta Lu, Tushar Dogra, Ranjay Krishna, Ariel Fuxman

Abstract: From content moderation to content curation, applications requiring vision classifiers for visual concepts are rapidly expanding. Existing human-in-the-loop approaches typically assume users begin with a clear, stable concept understanding to be able to provide high-quality supervision. In reality, users often start with a vague idea and must iteratively refine it through "concept deliberation", a practice we uncovered through structured interviews with content moderation experts. We operationalize the common strategies in deliberation used by real content moderators into a human-in-the-loop framework called "Agile Deliberation" that explicitly supports evolving and subjective concepts. The system supports users in defining the concept for themselves by exposing them to borderline cases, through two deliberation stages: (1) concept scoping, which decomposes the initial concept into a structured hierarchy of sub-concepts, and (2) concept iteration, which surfaces semantically borderline examples for user reflection and feedback to iteratively align an image classifier with the user's evolving intent. Since concept deliberation is inherently subjective and interactive, we evaluate the framework through 18 user sessions, each 1.5 hours long, rather than on standard benchmark datasets. We find that Agile Deliberation achieves 7.5% higher F1 scores than automated decomposition baselines and more than 3% higher than manual deliberation, while participants reported clearer conceptual understanding and lower cognitive effort.

replace-cross Learning continuous state of charge dependent thermal decomposition kinetics for Li-ion cathodes using Kolmogorov-Arnold Chemical Reaction Neural Networks (KA-CRNNs)

Authors: Benjamin C. Koenig, Sili Deng

Abstract: Thermal runaway in lithium-ion batteries is strongly influenced by the state of charge (SOC). Existing predictive models typically infer scalar kinetic parameters at a full SOC or a few discrete SOC levels, preventing them from capturing the continuous SOC dependence that governs exothermic behavior during abuse conditions. To address this, we apply the Kolmogorov-Arnold Chemical Reaction Neural Network (KA-CRNN) framework to learn continuous and realistic SOC-dependent exothermic cathode-electrolyte interactions. We apply a physics-encoded KA-CRNN to learn SOC-dependent kinetic parameters for cathode-electrolyte decomposition directly from differential scanning calorimetry (DSC) data. A mechanistically informed reaction pathway is embedded into the network architecture, enabling the activation energies, pre-exponential factors, enthalpies, and related parameters to be represented as continuous and fully interpretable functions of the SOC. The framework is demonstrated for NCA, NM, and NMA cathodes, yielding models that reproduce DSC heat-release features across all SOCs and provide interpretable insight into SOC-dependent oxygen-release and phase-transformation mechanisms. This approach establishes a foundation for extending kinetic parameter dependencies to additional environmental and electrochemical variables, supporting more accurate and interpretable thermal runaway prediction and monitoring.
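
The interface the network learns can be sketched as an Arrhenius rate law whose parameters vary continuously with SOC (our simplification; the SOC dependencies below are invented placeholders, not the learned KA-CRNN functions):

```python
import math

R = 8.314  # universal gas constant, J/(mol K)

def rate_constant(T, soc, A_fn, Ea_fn):
    # Arrhenius rate with SOC-dependent pre-exponential factor A(soc)
    # and activation energy Ea(soc), both continuous functions of SOC.
    return A_fn(soc) * math.exp(-Ea_fn(soc) / (R * T))

# Hypothetical smooth SOC dependencies, for illustration only:
A_fn = lambda soc: 1e12 * (1 + soc)      # 1/s
Ea_fn = lambda soc: 1.4e5 - 2e4 * soc    # J/mol; higher SOC, lower barrier

k_low = rate_constant(500.0, 0.1, A_fn, Ea_fn)
k_high = rate_constant(500.0, 1.0, A_fn, Ea_fn)
print(k_low, k_high)  # higher SOC yields a faster decomposition rate
```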

replace-cross NASTaR: NovaSAR Automated Ship Target Recognition Dataset

Authors: Benyamin Hosseiny, Kamirul Kamirul, Odysseas Pappas, Alin Achim

Abstract: Synthetic Aperture Radar (SAR) offers a unique capability for all-weather, space-based maritime activity monitoring by capturing and imaging strong reflections from ships at sea. A well-defined challenge in this domain is ship type classification. Due to the high diversity and complexity of ship types, accurate recognition is difficult and typically requires specialized deep learning models. These models, however, depend on large, high-quality ground-truth datasets to achieve robust performance and generalization. Furthermore, the growing variety of SAR satellites operating at different frequencies and spatial resolutions has amplified the need for more annotated datasets to enhance model accuracy. To address this, we present the NovaSAR Automated Ship Target Recognition (NASTaR) dataset. This dataset comprises 3415 ship patches extracted from NovaSAR S-band imagery, with labels matched to AIS data. It includes distinctive features such as 23 unique classes, inshore/offshore separation, and an auxiliary wake dataset for patches where ship wakes are visible. We validated the dataset applicability across prominent ship-type classification scenarios using benchmark deep learning models. Results demonstrate over 60% accuracy for classifying four major ship types, over 70% for a three-class scenario, more than 75% for distinguishing cargo from tanker ships, and over 87% for identifying fishing vessels. The NASTaR dataset is available at https://doi.org/10.5523/bris.2tfa6x37oerz2lyiw6hp47058, while relevant codes for benchmarking and analysis are available at https://github.com/benyaminhosseiny/nastar.

URLs: https://doi.org/10.5523/bris.2tfa6x37oerz2lyiw6hp47058, https://github.com/benyaminhosseiny/nastar.

replace-cross Geometric Organization of Cognitive States in Transformer Embedding Spaces

Authors: Sophie Zhao

Abstract: Recent work has shown that transformer-based language models learn rich geometric structure in their embedding spaces. In this work, we investigate whether sentence embeddings exhibit structured geometric organization aligned with human-interpretable cognitive or psychological attributes. We construct a dataset of 480 natural-language sentences annotated with both continuous energy scores (ranging from -5 to +5) and discrete tier labels spanning seven ordered cognitive annotation tiers, intended to capture a graded progression from highly constricted or reactive expressions toward more coherent and integrative cognitive states. Using fixed sentence embeddings from multiple transformer models, we evaluate the recoverability of these annotations via linear and shallow nonlinear probes. Across models, both continuous energy scores and tier labels are reliably decodable, with linear probes already capturing substantial structure. To assess statistical significance, we conduct nonparametric permutation tests that randomize labels, showing that probe performance exceeds chance under both regression and classification null hypotheses. Qualitative analyses using UMAP visualizations and tier-level confusion matrices further reveal a coherent low-to-high gradient and predominantly local (adjacent-tier) confusions. Together, these results indicate that transformer embedding spaces exhibit statistically significant geometric organization aligned with the annotated cognitive structure.

replace-cross The Drill-Down and Fabricate Test (DDFT): A Protocol for Measuring Epistemic Robustness in Language Models

Authors: Rahul Baxi

Abstract: Current language model evaluations measure what models know under ideal conditions but not how robustly they know it under realistic stress. Static benchmarks like MMLU and TruthfulQA cannot distinguish a model that lacks knowledge from one whose verification mechanisms collapse when information degrades or adversaries probe for weaknesses. We introduce the Drill-Down and Fabricate Test (DDFT), a protocol that measures epistemic robustness: a model's ability to maintain factual accuracy under progressive semantic compression and adversarial fabrication. We propose a two-system cognitive model comprising a Semantic System that generates fluent text and an Epistemic Verifier that validates factual accuracy. Our findings, based on evaluating 9 frontier models across 8 knowledge domains at 5 compression levels (1,800 turn-level evaluations), reveal that epistemic robustness is orthogonal to conventional design paradigms. Neither parameter count (r=0.083, p=0.832) nor architectural type (r=0.153, p=0.695) significantly predicts robustness, suggesting it emerges from training methodology and verification mechanisms distinct from current approaches. Error detection capability strongly predicts overall robustness (rho=-0.817, p=0.007), indicating this is the critical bottleneck. We find that flagship models exhibit brittleness despite their scale, while smaller models can achieve robust performance, challenging assumptions about the relationship between model size and reliability. The DDFT framework provides both theoretical foundation and practical tools for assessing epistemic robustness before deployment in critical applications.

replace-cross Projected Autoregression: Autoregressive Language Generation in Continuous State Space

Authors: Oshri Naparstek

Abstract: Standard autoregressive language models generate text by repeatedly selecting a discrete next token, coupling prediction with irreversible commitment at every step. We show that token selection is not the only viable autoregressive interface. \textbf{Projected Autoregression} replaces token selection with continuous prediction in embedding space followed by discrete projection at commitment time. The model predicts next-token vectors via regression and contrastive objectives, while discrete tokens arise only by nearest-neighbor projection. An optional mutable suffix (``liquid tail'') enables iterative refinement before commitment, but the central change is more basic: next-step prediction is continuous, and discrete tokens are produced only as a downstream interface. Projected Autoregression establishes a concrete alternative to token-selection autoregression: language generation can be organized around continuous-state prediction with delayed discrete commitment. Refinement remains local to a short causal suffix within a left-to-right causal process, rather than a sequence-wide denoising process. This separation has two consequences. First, it induces a \emph{distinct generation regime}: even with immediate projection ($K{=}1$), continuous prediction yields text structure and dynamics that differ from tested token-space AR baselines, including a compute-matched best-of-16 reranking baseline. Second, it exposes a \emph{continuous control surface} inside autoregressive generation: direction rate, history noise, delayed commitment, state-space guidance, and embedding geometry act directly on the evolving generative state before token commitment. Taken together, these results place repeated token selection within a larger family of autoregressive interfaces and expose continuous state space as a broader algorithmic design space for language generation.
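
The discrete projection step can be sketched in a few lines (our illustration of nearest-neighbor projection, not the paper's code; the embedding table and function name are invented): the continuous next-step prediction is committed to the closest vocabulary embedding only at the interface.

```python
import numpy as np

def project_to_token(pred_vec, embedding_table):
    # Commit the continuous prediction to the nearest vocabulary
    # embedding (squared Euclidean distance), returning a token id.
    d2 = np.sum((embedding_table - pred_vec) ** 2, axis=1)
    return int(np.argmin(d2))

rng = np.random.default_rng(0)
vocab = rng.normal(size=(100, 32))            # 100 tokens, 32-dim embeddings
pred = vocab[7] + 0.01 * rng.normal(size=32)  # prediction near token 7
print(project_to_token(pred, vocab))  # 7
```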

replace-cross Circuit Mechanisms for Spatial Relation Generation in Diffusion Transformers

Authors: Binxu Wang, Jingxuan Fan, Xu Pan

Abstract: Diffusion Transformers (DiTs) have greatly advanced text-to-image generation, but models still struggle to generate the correct spatial relations between objects as specified in the text prompt. In this study, we adopt a mechanistic interpretability approach to investigate how a DiT can generate correct spatial relations between objects. We train, from scratch, DiTs of different sizes with different text encoders to learn to generate images containing two objects whose attributes and spatial relations are specified in the text prompt. We find that, although all the models can learn this task to near-perfect accuracy, the underlying mechanisms differ drastically depending on the choice of text encoder. When using random text embeddings, we find that the spatial-relation information is passed to image tokens through a two-stage circuit, involving two cross-attention heads that separately read the spatial relation and single-object attributes in the text prompt. When using a pretrained text encoder (T5), we find that the DiT uses a different circuit that leverages information fusion in the text tokens, reading spatial-relation and single-object information together from a single text token. We further show that, although the in-domain performance is similar for the two settings, their robustness to out-of-domain perturbations differs, potentially suggesting the difficulty of generating correct relations in real-world scenarios.

replace-cross ConvoLearn: A Dataset for Fine-Tuning Dialogic AI Tutors

Authors: Mayank Sharma, Roy Pea, Hari Subramonyam

Abstract: Despite their growing adoption in education, LLMs remain misaligned with the core principle of effective tutoring: the dialogic construction of knowledge. We introduce ConvoLearn, a dataset of 2,134 semi-synthetic tutor-student dialogues operationalizing six dimensions of dialogic tutoring grounded in knowledge-building theory, situated in middle school Earth Science curriculum. We show that dimension-labeled dialogic training data captures meaningful pedagogical signal that generalizes beyond its semi-synthetic domain: scores from a classifier trained on ConvoLearn correlate significantly with expert-coded instructional quality in authentic classrooms across multiple subscales (range r = .118-.258, all p < .05). As a proof of concept, we fine-tune Mistral-7B on ConvoLearn and show that dimension-level fine-tuning can steer a 7B open-weight model toward dialogic tutoring behavior that credentialed teachers rate as competitive with a strong proprietary baseline. With this work, we support the development of AI tutors capable of more dialogic interactions.

replace-cross Teaching Machine Learning Fundamentals with LEGO Robotics

Authors: Viacheslav Sydora, Guner Dilsad Er, Michael Muehlebach

Abstract: This paper presents the web-based platform Machine Learning with Bricks and an accompanying two-day course designed to teach machine learning concepts to students aged 12 to 17 through programming-free robotics activities. Machine Learning with Bricks is an open-source platform that combines interactive visualizations with LEGO robotics to teach three core algorithms: KNN, linear regression, and Q-learning. Students learn by collecting data, training models, and interacting with robots via a web-based interface. Pre- and post-surveys with 14 students indicate statistically significant improvements in self-reported understanding of machine learning algorithms, shifts in AI-related terminology toward more technical language, high platform usability, and increased motivation for continued learning. This work suggests that tangible, visualization-based approaches can make machine learning concepts accessible and engaging for young learners while maintaining technical depth. The platform is freely available at https://learning-and-dynamics.github.io/ml-with-bricks/, with video tutorials guiding students through the experiments at https://youtube.com/playlist?list=PLx1grFu4zAcwfKKJZ1Ux4LwRqaePCOA2J.

URLs: https://learning-and-dynamics.github.io/ml-with-bricks/, https://youtube.com/playlist?list=PLx1grFu4zAcwfKKJZ1Ux4LwRqaePCOA2J
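The first of the three algorithms the course covers can be sketched in a few lines of Python. This is a generic k-nearest-neighbors classifier for illustration, not the platform's actual code, and the robot-sensor framing of the toy data is an assumption:

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among the k nearest training points.
    `train` is a list of (features, label) pairs."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Toy data: two clusters, imagined as sensor readings from a LEGO robot.
train = [((0, 0), "left"), ((0, 1), "left"), ((1, 0), "left"),
         ((5, 5), "right"), ((5, 6), "right"), ((6, 5), "right")]
print(knn_predict(train, (0.5, 0.5)))  # -> left
```

The programming-free platform hides exactly this kind of loop behind its visual interface; students supply the data and watch the decision boundary emerge.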

replace-cross Self-Improving Pretraining: using post-trained models to pretrain better models

Authors: Ellen Xiaoqing Tan, Jack Lanchantin, Shehzaad Dhuliawala, Danwei Li, Thao Nguyen, Jing Xu, Ping Yu, Ilia Kulikov, Sainbayar Sukhbaatar, Jason Weston, Xian Li, Olga Golovneva

Abstract: Large language models are classically trained in stages: pretraining on raw text followed by post-training for instruction following and reasoning. However, this separation creates a fundamental limitation: many desirable behaviors such as safety, factuality, overall generation quality, and reasoning ability are only added at a late stage, even though the patterns learned earlier strongly shape a model's capabilities. To tackle this issue, we introduce a new way to pretrain and mid-train models that incorporates these behaviors earlier. We utilize an existing strong, post-trained model both to rewrite pretraining data and to judge policy model rollouts, thus using reinforcement earlier in training. In our experiments, we show this can give strong gains in quality, safety, factuality and reasoning.

replace-cross Improving Multimodal Learning with Dispersive and Anchoring Regularization

Authors: Zixuan Xia, Hao Wang, Pengcheng Weng, Yanyu Qian, Yangxin Xu, William Dan, Fei Wang

Abstract: Multimodal learning aims to integrate complementary information from heterogeneous modalities, yet strong optimization alone does not guarantee well-structured representations. Even under carefully balanced training schemes, multimodal models often exhibit geometric pathologies, including intra-modal representation collapse and sample-level cross-modal inconsistency, which degrade both unimodal robustness and multimodal fusion. We identify representation geometry as a missing control axis in multimodal learning and propose \regName, a lightweight geometry-aware regularization framework. \regName enforces two complementary constraints on intermediate embeddings: an intra-modal dispersive regularization that promotes representation diversity, and an inter-modal anchoring regularization that bounds sample-level cross-modal drift without rigid alignment. The proposed regularizer is plug-and-play, requires no architectural modifications, and is compatible with various training paradigms. Extensive experiments across multiple multimodal benchmarks demonstrate consistent improvements in both multimodal and unimodal performance, showing that explicitly regulating representation geometry effectively mitigates modality trade-offs.

replace-cross Predicting Intermittent Job Failure Categories for Diagnosis Using Few-Shot Fine-Tuned Language Models

Authors: Henri A\"idasso, Francis Bordeleau, Ali Tizghadam

Abstract: In principle, Continuous Integration (CI) pipeline failures provide valuable feedback to developers on code-related errors. In practice, however, pipeline jobs often fail intermittently due to non-deterministic tests, network outages, infrastructure failures, resource exhaustion, and other reliability issues. These intermittent (flaky) job failures lead to substantial inefficiencies: wasted computational resources from repeated reruns and significant diagnosis time that distracts developers from core activities and often requires intervention from specialized teams. Prior work has proposed machine learning techniques to detect intermittent failures, but does not address the subsequent diagnosis challenge. To fill this gap, we introduce FlaXifyer, a few-shot learning approach for predicting intermittent job failure categories using pre-trained language models. FlaXifyer requires only job execution logs and achieves 84.3% Macro F1 and 92.0% Top-2 accuracy with just 12 labeled examples per category. We also propose LogSift, an interpretability technique that identifies influential log statements in under one second, reducing review effort by 74.4% while surfacing relevant failure information in 87% of cases. Evaluation on 2,458 job failures from TELUS demonstrates that FlaXifyer and LogSift enable effective automated triage, accelerate failure diagnosis, and pave the way towards the automated resolution of intermittent job failures.

replace-cross Compact Hypercube Embeddings for Fast Text-based Wildlife Observation Retrieval

Authors: Ilyass Moummad, Marius Miron, David Robinson, Kawtar Zaher, Herv\'e Go\"eau, Olivier Pietquin, Pierre Bonnet, Emmanuel Chemla, Matthieu Geist, Alexis Joly

Abstract: Large-scale biodiversity monitoring platforms increasingly rely on multimodal wildlife observations. While recent foundation models enable rich semantic representations across vision, audio, and language, retrieving relevant observations from massive archives remains challenging due to the computational cost of high-dimensional similarity search. In this work, we introduce compact hypercube embeddings for fast text-based wildlife observation retrieval, a framework that enables efficient text-based search over large-scale wildlife image and audio databases using compact binary representations. Building on the cross-view code alignment hashing framework, we extend lightweight hashing beyond a single-modality setup to align natural language descriptions with visual or acoustic observations in a shared Hamming space. Our approach leverages pretrained wildlife foundation models, including BioCLIP and BioLingual, and adapts them efficiently for hashing using parameter-efficient fine-tuning. We evaluate our method on large-scale benchmarks, including iNaturalist2024 for text-to-image retrieval and iNatSounds2024 for text-to-audio retrieval, as well as multiple soundscape datasets to assess robustness under domain shift. Results show that retrieval using discrete hypercube embeddings achieves competitive, and in several cases superior, performance compared to continuous embeddings, while drastically reducing memory and search cost. Moreover, we observe that the hashing objective consistently improves the underlying encoder representations, leading to stronger retrieval and zero-shot generalization. These results demonstrate that binary, language-based retrieval enables scalable and efficient search over large wildlife archives for biodiversity monitoring systems.
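The efficiency argument rests on Hamming-space search over binary codes. A minimal sketch of that retrieval step (sign-based binarization stands in for the learned hashing, and the random embeddings are placeholders, not the paper's models or data):

```python
import numpy as np

rng = np.random.default_rng(0)

def binarize(embeddings):
    """Map real-valued embeddings to {0,1} codes by sign -- the simplest
    hashing scheme; the paper learns the codes via code alignment instead."""
    return (embeddings > 0).astype(np.uint8)

def hamming_search(query_code, db_codes, top_k=3):
    """Rank database codes by Hamming distance to the query code."""
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(dists)[:top_k]

db = rng.standard_normal((1000, 64))            # stand-in observation embeddings
query = db[42] + 0.1 * rng.standard_normal(64)  # noisy copy of item 42
top = hamming_search(binarize(query), binarize(db))
print(top[0])  # the noisy copy's source, index 42, should rank first
```

Because the distance is a bitwise mismatch count, real deployments pack the codes into machine words and use XOR plus popcount, which is where the memory and search savings over float similarity come from.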

replace-cross Audio-to-Image Bird Species Retrieval without Audio-Image Pairs via Text Distillation

Authors: Ilyass Moummad, Marius Miron, Lukas Rauch, David Robinson, Alexis Joly, Olivier Pietquin, Emmanuel Chemla, Matthieu Geist

Abstract: Audio-to-image retrieval offers an interpretable alternative to audio-only classification for bioacoustic species recognition, but learning aligned audio-image representations is challenging due to the scarcity of paired audio-image data. We propose a simple and data-efficient approach that enables audio-to-image retrieval without any audio-image supervision. Our proposed method uses text as a semantic intermediary: we distill the text embedding space of a pretrained image-text model (BioCLIP-2), which encodes rich visual and taxonomic structure, into a pretrained audio-text model (BioLingual) by fine-tuning its audio encoder with a contrastive objective. This distillation transfers visually grounded semantics into the audio representation, inducing emergent alignment between audio and image embeddings without using images during training. We evaluate the resulting model on multiple bioacoustic benchmarks. The distilled audio encoder preserves audio discriminative power while substantially improving audio-text alignment on focal recordings and soundscape datasets. Most importantly, on the SSW60 benchmark, the proposed approach achieves strong audio-to-image retrieval performance exceeding baselines based on zero-shot model combinations or learned mappings between text embeddings, despite not training on paired audio-image data. These results demonstrate that indirect semantic transfer through text is sufficient to induce meaningful audio-image alignment, providing a practical solution for visually grounded species recognition in data-scarce bioacoustic settings.

replace-cross Making Bias Non-Predictive: Training Robust LLM Reasoning via Reinforcement Learning

Authors: Qian Wang, Xuandong Zhao, Zirui Zhang, Zhanzhi Lou, Nuo Chen, Dawn Song, Bingsheng He

Abstract: Large language models (LLMs) increasingly serve as reasoners and automated evaluators, yet they remain susceptible to cognitive biases -- often altering their reasoning when faced with spurious prompt-level cues such as consensus claims or authority appeals. Existing mitigations via prompting or supervised fine-tuning fail to generalize, as they modify surface behavior without changing the optimization objective that makes bias cues attractive. We propose \textbf{Epistemic Independence Training (EIT)}, a reinforcement learning framework grounded in a key principle: to learn independence, bias cues must be made non-predictive of reward. EIT operationalizes this through a balanced conflict strategy where bias signals are equally likely to support correct and incorrect answers, combined with a reward design that penalizes bias-following without rewarding bias agreement. Experiments on Qwen3-4B demonstrate that EIT improves both accuracy and robustness under adversarial biases, while preserving performance when bias aligns with truth. Notably, models trained only on bandwagon bias generalize to unseen bias types such as authority and distraction, indicating that EIT induces transferable epistemic independence rather than bias-specific heuristics. EIT further generalizes across benchmarks (MedQA, HellaSwag), model families (Llama-3.2-3B), and scales (Qwen3-8B), and outperforms distribution-shift methods (GroupDRO, IRM) without requiring environment labels. Code and data are available at https://anonymous.4open.science/r/bias-mitigation-with-rl-BC47

URLs: https://anonymous.4open.science/r/bias-mitigation-with-rl-BC47
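The reward design described above can be made concrete with a toy function. The specific reward values here are assumptions for illustration, not EIT's actual scheme; the point is only the asymmetry, where correctness is rewarded regardless of the cue, following a misleading cue is penalized, and agreeing with a cue that happens to be right earns no extra credit:

```python
def eit_reward(answer, correct, bias_target):
    """Illustrative reward in the spirit of EIT: the bias cue must not
    predict reward. (Values are assumptions, not the paper's constants.)"""
    if answer == correct:
        return 1.0   # reward correctness, whether or not the cue agrees
    if answer == bias_target:
        return -1.0  # extra penalty for following a misleading cue
    return 0.0       # plain wrong answer, no bias involvement

# Balanced conflict: across training, the cue supports the right answer
# half the time, so the cue itself carries no information about reward.
print(eit_reward("B", correct="B", bias_target="B"))  # 1.0
print(eit_reward("A", correct="B", bias_target="A"))  # -1.0
```

Under this scheme the expected reward of "follow the cue" is strictly worse than "answer correctly", so the policy has no incentive to treat the cue as a shortcut.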

replace-cross SERFN: Sample-Efficient Real-World Dexterous Policy Fine-Tuning via Action-Chunked Critics and Normalizing Flows

Authors: Chenyu Yang, Denis Tarasov, Davide Liconti, Hehui Zheng, Robert K. Katzschmann

Abstract: Real-world fine-tuning of dexterous manipulation policies remains challenging due to limited real-world interaction budgets and highly multimodal action distributions. Diffusion-based policies, while expressive, do not permit conservative likelihood-based updates during fine-tuning because action probabilities are intractable. In contrast, conventional Gaussian policies collapse under multimodality, particularly when actions are executed in chunks, and standard per-step critics fail to align with chunked execution, leading to poor credit assignment. We present SERFN, a sample-efficient off-policy fine-tuning framework with normalizing flow (NF) to address these challenges. The normalizing flow policy yields exact likelihoods for multimodal action chunks, allowing conservative, stable policy updates through likelihood regularization and thereby improving sample efficiency. An action-chunked critic evaluates entire action sequences, aligning value estimation with the policy's temporal structure and improving long-horizon credit assignment. To our knowledge, this is the first demonstration of a likelihood-based, multimodal generative policy combined with chunk-level value learning on real robotic hardware. We evaluate SERFN on two challenging dexterous manipulation tasks in the real world: cutting tape with scissors retrieved from a case, and in-hand cube rotation with a palm-down grasp -- both of which require precise, dexterous control over long horizons. On these tasks, SERFN achieves stable, sample-efficient adaptation where standard methods struggle.

replace-cross LLMs Encode Their Failures: Predicting Success from Pre-Generation Activations

Authors: William Lugoloobi, Thomas Foster, William Bankes, Chris Russell

Abstract: Running LLMs with extended reasoning on every problem is expensive, but determining which inputs actually require additional compute remains challenging. We investigate whether an LLM's likelihood of success is recoverable from its internal representations before generation, and whether this signal can guide more efficient inference. We train linear probes on pre-generation activations to predict policy-specific success on math and coding tasks, substantially outperforming surface features such as question length and TF-IDF. Using E2H-AMC, which provides both human and model performance on identical problems, we show that models encode a model-specific notion of difficulty that is distinct from human difficulty, and that this distinction increases with extended reasoning. Leveraging these probes, we demonstrate that routing queries across a pool of models can exceed the best-performing model whilst reducing inference cost by up to 70\% on MATH, showing that internal representations enable practical efficiency gains even when they diverge from human intuitions about difficulty. Our code is available at: https://github.com/KabakaWilliam/llms_know_difficulty

URLs: https://github.com/KabakaWilliam/llms_know_difficulty
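A linear probe of this kind is just logistic regression on frozen activations. A self-contained sketch on synthetic "activations" (the data generator and hyperparameters are placeholders, not the authors' setup):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for pre-generation activations: success is linearly
# decodable along a hidden direction, plus noise.
d, n = 32, 2000
w_true = rng.standard_normal(d)
X = rng.standard_normal((n, d))
y = (X @ w_true + 0.5 * rng.standard_normal(n) > 0).astype(float)

# Train a logistic-regression probe with plain gradient descent.
w = np.zeros(d)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))   # predicted success probability
    w -= 0.1 * X.T @ (p - y) / n          # gradient of the log loss

acc = np.mean(((X @ w) > 0) == (y == 1))
print(round(acc, 3))
```

In the routing application, the probe's predicted success probability for each model in the pool is what decides where a query is sent before any tokens are generated.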

replace-cross Scaling the Scaling Logic: Agentic Meta-Synthesis of Logic Reasoning

Authors: Bowen Liu, Zhi Wu, Runquan Xie, Zhanhui Kang, Jia Li

Abstract: Reinforcement Learning from Verifiable Rewards (RLVR) is bottlenecked by data: existing synthesis pipelines rely on expert-written code or fixed templates, confining growth to instance-level perturbations. We shift the evolvable unit from problem instances to task-family specifications. SSLogic is an agentic meta-synthesis framework in which LLM agents iteratively author and refine executable Generator-Validator pairs inside a closed Generate-Validate-Refine loop, producing families with new rules and difficulty gradients rather than parameter variations of old ones. A Multi-Gate Validation Protocol -- multi-strategy consensus plus Adversarial Blind Review, where independent agents solve each instance by writing and executing code -- filters ill-posed tasks before they enter training. Starting from 400 seed families, two evolution rounds yield 953 families and 21,389 verifiable instances. Three converging comparisons (step-matched, token-matched, and size-controlled on external Enigmata data) consistently show higher training utility of evolved data, with gains of SynLogic +5.2, AIME25 +3.0, and BBH +5.5 on Enigmata. Fine-grained KORBench evaluation reveals selective improvements in logic (+13.2%) and operation (+9.6%), linking structural evolution to downstream gains. Code: https://github.com/AdAstraAbyssoque/Scaling-the-Scaling-Logic

URLs: https://github.com/AdAstraAbyssoque/Scaling-the-Scaling-Logic

replace-cross SubspaceAD: Training-Free Few-Shot Anomaly Detection via Subspace Modeling

Authors: Camile Lendering, Erkut Akdag, Egor Bondarev

Abstract: Detecting visual anomalies in industrial inspection often requires training with only a few normal images per category. Recent few-shot methods achieve strong results employing foundation-model features, but typically rely on memory banks, auxiliary datasets, or multi-modal tuning of vision-language models. We therefore question whether such complexity is necessary given the feature representations of vision foundation models. To answer this question, we introduce SubspaceAD, a training-free method that operates in two simple stages. First, patch-level features are extracted from a small set of normal images by a frozen DINOv2 backbone. Second, a Principal Component Analysis (PCA) model is fit to these features to estimate the low-dimensional subspace of normal variations. At inference, anomalies are detected via the reconstruction residual with respect to this subspace, producing interpretable and statistically grounded anomaly scores. Despite its simplicity, SubspaceAD achieves state-of-the-art performance across one-shot and few-shot settings without training, prompt tuning, or memory banks. In the one-shot anomaly detection setting, SubspaceAD achieves image-level and pixel-level AUROC of 97.1% and 97.5% on the MVTec-AD dataset, and 93.4% and 98.2% on the VisA dataset, respectively, surpassing prior state-of-the-art results. Code and demo are available at https://github.com/CLendering/SubspaceAD.

URLs: https://github.com/CLendering/SubspaceAD

replace-cross RL unknotter, hard unknots and unknotting number

Authors: Anne Dranowski, Yura Kabkov, Daniel Tubbenhauer

Abstract: We develop a reinforcement learning pipeline for simplifying knot diagrams. A trained agent learns move proposals and a value heuristic for navigating Reidemeister moves. The pipeline applies to arbitrary knots and links; we test it on ``very hard'' unknot diagrams and, using diagram inflation, on $4_1\#9_{10}$ where we recover the recently established and surprising upper bound of three for the unknotting number. In addition, we explain a self-improving workbook-driven extension of the pipeline that systematically improves unknotting number upper bounds on the list of prime knots.

replace-cross Noise Models Impacts and Mitigation Strategies in Photonic Quantum Machine Learning

Authors: A. M. A. S. D. Alagiyawanna, Asoka Karunananda

Abstract: Photonic Quantum Machine Learning (PQML) is an emerging method to implement scalable, energy-efficient quantum information processing by combining photonic quantum computing technologies with machine learning techniques. The features of photonic technologies offer several benefits: room-temperature operation; fast (low delay) processing of signals; and the possibility of representing computations in high-dimensional (Hilbert) spaces. This makes photonic technologies strong candidates for the near-term development of quantum devices. However, noise is still a major limiting factor for the performance, reliability, and scalability of PQML implementations. This review provides a detailed and systematic analysis of the sources of noise that affect PQML implementations. We will present an overview of the principal photonic quantum computer designs and summarize the many different types of quantum machine learning algorithms that have been successfully implemented using photonic quantum computer architectures, such as variational quantum circuits, quantum neural networks, and quantum support vector machines. We identify and categorize the primary sources of noise within photonic quantum systems and examine how they affect specific algorithms, degrading learning accuracy, destabilizing training, and slowing convergence. Additionally, we review traditional and advanced techniques for characterizing noise and provide an extensive survey of strategies for mitigating the effects of noise on learning performance. Finally, we discuss recent advances that demonstrate PQML's capability to operate in real-world settings with realistic noise conditions and future obstacles that will challenge the use of PQML as an effective quantum processing platform.

replace-cross Adaptive Domain Models: Bayesian Evolution, Warm Rotation, and Principled Training for Geometric and Neuromorphic AI

Authors: Houston Haynes

Abstract: Prevailing AI training infrastructure assumes reverse-mode automatic differentiation over IEEE-754 arithmetic. The memory overhead of training relative to inference, optimizer complexity, and structural degradation of geometric properties through training are consequences of this arithmetic substrate. This paper develops an alternative training architecture grounded in three prior results: the Dimensional Type System and Deterministic Memory Management framework [6], which establishes stack-eligible gradient allocation and exact quire accumulation as design-time verifiable properties; the Program Hypergraph [8], which establishes grade preservation through geometric algebra computations as a type-level invariant; and the b-posit 2026 standard [10], which makes posit arithmetic tractable across hardware targets conventionally considered inference-only. Their composition enables depth-independent training memory bounded to approximately twice the inference footprint, grade-preserving weight updates, and exact gradient accumulation, applicable uniformly to loss-function-optimized and spike-timing-dependent neuromorphic models. We introduce Bayesian distillation, a mechanism by which the latent prior structure of a general-purpose model is extracted through the ADM training regime, resolving the data-scarcity bootstrapping problem for domain-specific training. For deployment, we introduce warm rotation, an operational pattern in which an updated model transitions into an active inference pathway without service interruption, with structural correctness formalized through PHG certificates and signed version records. The result is a class of domain-specific AI systems that are smaller and more precise than general-purpose models, continuously adaptive, verifiably correct with respect to the physical structure of their domains, and initializable from existing models.

replace-cross SwiftGS: Episodic Priors for Immediate Satellite Surface Recovery

Authors: Rong Fu, Jiekai Wu, Haiyun Wei, Xiaowen Ma, Shiyin Lin, Kangan Qian, Chuang Liu, Jianyuan Ni, Simon James Fong

Abstract: Rapid, large-scale 3D reconstruction from multi-date satellite imagery is vital for environmental monitoring, urban planning, and disaster response, yet remains difficult due to illumination changes, sensor heterogeneity, and the cost of per-scene optimization. We introduce SwiftGS, a meta-learned system that reconstructs 3D surfaces in a single forward pass by predicting geometry-radiation-decoupled Gaussian primitives together with a lightweight SDF, replacing expensive per-scene fitting with episodic training that captures transferable priors. The model couples a differentiable physics graph for projection, illumination, and sensor response with spatial gating that blends sparse Gaussian detail and global SDF structure, and incorporates semantic-geometric fusion, conditional lightweight task heads, and multi-view supervision from a frozen geometric teacher under an uncertainty-aware multi-task loss. At inference, SwiftGS operates zero-shot with optional compact calibration and achieves accurate DSM reconstruction and view-consistent rendering at significantly reduced computational cost, with ablations highlighting the benefits of the hybrid representation, physics-aware rendering, and episodic meta-training.

replace-cross All elementary functions from a single binary operator

Authors: Andrzej Odrzywo{\l}ek

Abstract: A single two-input gate suffices for all of Boolean logic in digital hardware. No comparable primitive has been known for continuous mathematics: computing elementary functions such as sin, cos, sqrt, and log has always required multiple distinct operations. Here I show that a single binary operator, eml(x,y)=exp(x)-ln(y), together with the constant 1, generates the standard repertoire of a scientific calculator. This includes constants such as e, pi, and i; arithmetic operations including addition, subtraction, multiplication, division, and exponentiation as well as the usual transcendental and algebraic functions. For example, exp(x)=eml(x,1), ln(x)=eml(1,eml(eml(1,x),1)), and likewise for all other operations. That such an operator exists was not anticipated; I found it by systematic exhaustive search and established constructively that it suffices for the concrete scientific-calculator basis. In EML (Exp-Minus-Log) form, every such expression becomes a binary tree of identical nodes, yielding a grammar as simple as S -> 1 | eml(S,S). This uniform structure also enables gradient-based symbolic regression: using EML trees as trainable circuits with standard optimizers (Adam), I demonstrate the feasibility of exact recovery of closed-form elementary functions from numerical data at shallow tree depths up to 4. The same architecture can fit arbitrary data, but when the generating law is elementary, it may recover the exact formula.
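The two identities quoted in the abstract are easy to verify numerically. The intermediate steps in the comments follow directly from eml's definition:

```python
import math

def eml(x, y):
    """The single binary operator from the paper: exp(x) - ln(y)."""
    return math.exp(x) - math.log(y)

def exp_via_eml(x):
    # exp(x) = eml(x, 1), since ln(1) = 0.
    return eml(x, 1)

def ln_via_eml(x):
    # ln(x) = eml(1, eml(eml(1, x), 1)), step by step:
    #   eml(1, x)         = e - ln(x)
    #   eml(e - ln(x), 1) = exp(e - ln(x)) = e^e / x
    #   eml(1, e^e / x)   = e - (e - ln(x)) = ln(x)
    return eml(1, eml(eml(1, x), 1))

print(abs(exp_via_eml(2.0) - math.exp(2.0)))  # ~0
print(abs(ln_via_eml(5.0) - math.log(5.0)))   # ~0
```

With exp and ln recovered, the remaining calculator repertoire follows by the usual compositions (e.g. multiplication as exp(ln x + ln y)), which is what the exhaustive-search construction builds out.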

replace-cross Intelligence Inertia: Physical Isomorphism and Applications

Authors: Jipeng Han

Abstract: Classical frameworks like Fisher Information approximate the cost of neural adaptation only in low-density regimes, failing to explain the explosive computational overhead incurred during deep structural reconfiguration. To address this, we introduce \textbf{Intelligence Inertia}, a property derived from the fundamental non-commutativity between rules and states ($[\hat{S}, \hat{R}] = i\mathcal{D}$). Rather than claiming a new fundamental physical law, we establish a \textbf{heuristic mathematical isomorphism} between deep learning dynamics and Minkowski spacetime. Acting as an \textit{effective theory} for high-dimensional tensor evolution, we derive a non-linear cost formula mirroring the Lorentz factor, predicting a relativistic $J$-shaped inflation curve -- a computational wall where classical approximations fail. We validate this framework via three experiments: (1) adjudicating the $J$-curve divergence under high-entropy noise, (2) mapping the optimal geodesic for architecture evolution, and (3) deploying an \textbf{inertia-aware scheduler wrapper} that prevents catastrophic forgetting. Adopting this isomorphism yields an exact quantitative metric for structural resistance, advancing the stability and efficiency of intelligent agents.

replace-cross CSTS: A Canonical Security Telemetry Substrate for AI-Native Cyber Detection

Authors: Abdul Rahman

Abstract: Cybersecurity data remains fragmented across vendors, formats, schemas, and deployment environments, forcing AI and analytics programs to spend disproportionate effort on ingestion, normalization, and brittle source-specific engineering. This paper introduces the Canonical Security Telemetry Substrate (CSTS), a canonical, AI-ready telemetry foundation designed to harmonize heterogeneous cyber data into a common representation over persistent entities, typed relations, events, temporal state, and provenance. CSTS is intended to move cybersecurity analytics beyond ad hoc record normalization toward a reusable substrate that supports anomaly detection, graph learning, forecasting, behavior-based modeling, and agentic cyber AI. We formalize the core design principles of CSTS, define its representational components, and explain how it preserves source-specific nuance through explicit mappings and extensible metadata while still enabling portable downstream inference. We further position CSTS as a cloud-agnostic and deployment-agnostic substrate suitable for on-prem, hybrid, and multi-cloud environments. The result is a unifying telemetry model that reduces the blue-collar burden of cyber data engineering and creates a clearer path to scalable, interoperable, and model-agnostic cyber AI.

replace-cross Demystifying When Pruning Works via Representation Hierarchies

Authors: Shwai He, Guoheng Sun, Haichao Zhang, Yun Fu, Ang Li

Abstract: Network pruning, which removes less important parameters or architectures, is often expected to improve efficiency while preserving performance. However, this expectation does not consistently hold across language tasks: pruned models can perform well on non-generative tasks but frequently fail in generative settings. To understand this discrepancy, we analyze network pruning from a representation-hierarchy perspective, decomposing the internal computation of language models into three sequential spaces: embedding (hidden representations), logit (pre-softmax outputs), and probability (post-softmax distributions). We find that representations in the embedding and logit spaces are largely robust to pruning-induced perturbations. However, the nonlinear transformation from logits to probabilities amplifies these deviations, which accumulate across time steps and lead to substantial degradation during generation. In contrast, the stability of the categorical-token probability subspace, together with the robustness of the embedding space, supports the effectiveness of pruning for non-generative tasks such as retrieval and multiple-choice selection. Our analysis disentangles the effects of pruning across tasks and provides practical guidance for its application. Code is available at https://github.com/CASE-Lab-UMD/Pruning-on-Representations

URLs: https://github.com/CASE-Lab-UMD/Pruning-on-Representations
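The logit-to-probability amplification can be illustrated with a three-token toy example (the numbers are invented for illustration, not taken from the paper): two logit vectors that are nearly identical by any geometric measure still yield different greedy tokens, and such flips compound across generation steps.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

logits = np.array([5.0, 4.9, 0.0])   # "unpruned" logits
pruned = np.array([4.9, 5.0, 0.0])   # tiny pruning-induced perturbation

# The logit vectors are nearly identical ...
cos = logits @ pruned / (np.linalg.norm(logits) * np.linalg.norm(pruned))
print(round(cos, 4))  # ~0.9998

# ... yet greedy decoding over the probabilities picks a different token.
print(int(np.argmax(softmax(logits))), int(np.argmax(softmax(pruned))))
```

Non-generative tasks read off a single score or ranking, so they tolerate this perturbation; autoregressive generation feeds the flipped token back in, which is the accumulation the abstract describes.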

replace-cross A Firefly Algorithm for Mixed-Variable Optimization Based on Hybrid Distance Modeling

Authors: Ousmane Tom Bechir, Ad\'an Jos\'e-Garc\'ia, Zaineb Chelly Garcia, Vincent Sobanski, Clarisse Dhaenens

Abstract: Several real-world optimization problems involve mixed-variable search spaces, where continuous, ordinal, and categorical decision variables coexist. However, most population-based metaheuristic algorithms are designed for either continuous or discrete optimization problems and do not naturally handle heterogeneous variable types. In this paper, we propose an adaptation of the Firefly Algorithm for mixed-variable optimization problems (FAmv). The proposed method relies on a modified distance-based attractiveness mechanism that integrates continuous and discrete components within a unified formulation. This mixed-distance approach enables a more appropriate modeling of heterogeneous search spaces while maintaining a balance between exploration and exploitation. The proposed method is evaluated on the CEC2013 mixed-variable benchmark, which includes unimodal, multimodal, and composition functions. The results show that FAmv achieves competitive, and often superior, performance compared with state-of-the-art mixed-variable optimization algorithms. In addition, experiments on engineering design problems further highlight the robustness and practical applicability of the proposed approach. These results indicate that incorporating appropriate distance formulations into the Firefly Algorithm provides an effective strategy for solving complex mixed-variable optimization problems.
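A hybrid distance of the kind FAmv builds on can be sketched generically: squared differences for continuous and ordinal entries, 0/1 mismatch for categorical ones. This is a common mixed-variable formulation, not necessarily the paper's exact weighting:

```python
def mixed_distance(a, b, types):
    """Hybrid distance over a mixed-variable vector: Euclidean-style terms
    for continuous/ordinal entries, Hamming-style terms for categorical
    ones. (A generic formulation; FAmv's exact form may differ.)"""
    total = 0.0
    for x, y, t in zip(a, b, types):
        if t == "cat":
            total += float(x != y)    # 0/1 mismatch
        else:
            total += (x - y) ** 2     # squared difference
    return total ** 0.5

types = ["cont", "cont", "cat", "ord"]
print(mixed_distance([0.5, 1.0, "steel", 2],
                     [0.5, 1.5, "alloy", 3], types))  # 1.5
```

In a firefly algorithm, this distance feeds directly into the attractiveness term (attractiveness decays with distance), so a single formula governs movement across all variable types.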

replace-cross Weakly Convex Ridge Regularization for 3D Non-Cartesian MRI Reconstruction

Authors: German Sh\^ama Wache, Chaithya G R, Asma Tanabene, Sebastian Neumayer

Abstract: While highly accelerated non-Cartesian acquisition protocols significantly reduce scan time, they often entail long reconstruction delays. Deep-learning-based reconstruction methods can alleviate this, but often lack stability and robustness to distribution shifts. As an alternative, we train a rotation-invariant weakly convex ridge regularizer (WCRR). The resulting variational reconstruction approach is benchmarked against state-of-the-art methods on retrospectively simulated data and (out-of-distribution) on prospective GoLF SPARKLING and CAIPIRINHA acquisitions. Our approach consistently outperforms widely used baselines and achieves performance comparable to Plug-and-Play reconstruction with a state-of-the-art 3D DRUNet denoiser, while offering substantially improved computational efficiency and robustness to acquisition changes. In summary, WCRR unifies the strengths of principled variational methods and modern deep-learning-based approaches.

replace-cross When GPUs Fail Quietly: Observability-Aware Early Warning Beyond Numeric Telemetry

Authors: Michael Bidollahkhani, Freja Nordsiek, Julian M. Kunkel

Abstract: GPU nodes are central to modern HPC and AI workloads, yet many failures do not manifest as immediate hard faults. While some instabilities emerge gradually as weak thermal or efficiency drift, a significant class occurs abruptly with little or no numeric precursor. In these detachment-class failures, GPUs become unavailable at the driver or interconnect level and the dominant observable signal is structural, including disappearance of device metrics and degradation of monitoring payload integrity. This paper proposes an observability-aware early-warning framework that jointly models (i) utilization-aware thermal drift signatures in GPU telemetry and (ii) monitoring-pipeline degradation indicators such as scrape latency increase, sample loss, time-series gaps, and device-metric disappearance. The framework is evaluated on production telemetry from GPU nodes at GWDG, where GPU, node, monitoring, and scheduler signals can be correlated. Results show that detachment failures exhibit minimal numeric precursor and are primarily observable through structural telemetry collapse, while joint modeling increases early-warning lead time compared to GPU-only detection. The dataset used in this study is publicly available at https://doi.org/10.5281/zenodo.19052367.

URLs: https://doi.org/10.5281/zenodo.19052367
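A toy warning rule joining the two signal families the abstract names: a utilization-aware thermal-drift score and structural monitoring-health indicators. Thresholds, coefficients, and names are illustrative, not the paper's calibrated model:

```python
def warning_score(temp_series, util_series, scrape_latency_ms, missing_metrics):
    # Numeric precursor: temperature rise not explained by utilization change.
    drift = (temp_series[-1] - temp_series[0]) - 0.2 * (util_series[-1] - util_series[0])
    numeric = max(drift, 0.0) / 10.0                 # crude normalization
    # Structural precursor: monitoring-pipeline degradation indicators.
    structural = (scrape_latency_ms > 500) + (missing_metrics > 0)
    return min(numeric + 0.5 * structural, 1.0)

# Detachment-class failure: almost no thermal precursor, but metrics vanish
# and scrapes slow down -- the structural terms dominate the score.
s_detach = warning_score([60, 61], [80, 80], scrape_latency_ms=900, missing_metrics=4)
# Healthy node: neither numeric nor structural terms fire.
s_healthy = warning_score([60, 60], [50, 50], scrape_latency_ms=100, missing_metrics=0)
```

The point the sketch illustrates is the paper's finding: for detachment failures the numeric term alone would stay near zero, so joint modeling is what buys lead time.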

replace-cross OptiMer: Optimal Distribution Vector Merging Is Better than Data Mixing for Continual Pre-Training

Authors: Haiyue Song, Masao Utiyama

Abstract: Continual pre-training is widely used to adapt LLMs to target languages and domains, yet the mixture ratio of training data remains a sensitive hyperparameter that is expensive to tune: it must be fixed before training begins, and a suboptimal choice can waste weeks of compute. In this work, we propose OptiMer, which decouples ratio selection from training: we train one CPT model per dataset, extract each model's distribution vector, which represents the parameter shift induced by that dataset, and search for optimal composition weights post hoc via Bayesian optimization. Experiments on Gemma 3 27B across languages (Japanese, Chinese) and domains (Math, Code) show that OptiMer consistently outperforms data mixture and model averaging baselines with 15-35 times lower search cost. Key findings reveal that 1) the optimized weights can be interpreted as data mixture ratios, and retraining with these ratios improves data mixture CPT, and 2) the same vector pool can be re-optimized for a given objective without any retraining, producing target-tailored models on demand. Our work establishes that data mixture ratio selection, traditionally a pre-training decision, can be reformulated as a post-hoc optimization over distribution vectors, offering a more flexible paradigm for continual pre-training.
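A toy version of the post-hoc merging step: each distribution vector v_k is the parameter shift induced by one dataset, a merged model is base + sum_k w_k v_k, and the weights are searched after training. The paper uses Bayesian optimization; a coarse grid search and a synthetic objective stand in here:

```python
import numpy as np

base = np.zeros(4)
v = {"ja": np.array([1.0, 0.0, 0.0, 0.0]),     # shift from Japanese CPT
     "math": np.array([0.0, 1.0, 0.5, 0.0])}   # shift from Math CPT

def merged(base, v, w):
    # Weighted sum of distribution vectors on top of the base parameters.
    return base + sum(w[k] * v[k] for k in v)

def score(params, target):
    # Illustrative objective: negative squared distance to a target point.
    return -np.sum((params - target) ** 2)

target = np.array([0.3, 0.7, 0.35, 0.0])
best_w, best_s = None, -np.inf
for wj in np.linspace(0, 1, 11):
    for wm in np.linspace(0, 1, 11):
        s = score(merged(base, v, {"ja": wj, "math": wm}), target)
        if s > best_s:
            best_w, best_s = {"ja": wj, "math": wm}, s
```

Because evaluation touches only merged parameters, re-optimizing the same vector pool for a new objective requires no retraining, which is the flexibility the abstract highlights.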

replace-cross Segmentation of Gray Matters and White Matters from Brain MRI data

Authors: Chang Sun, Rui Shi, Tsukasa Koike, Tetsuro Sekine, Akio Morita, Tetsuya Sakai

Abstract: Accurate segmentation of brain tissues such as gray matter and white matter from magnetic resonance imaging is essential for studying brain anatomy, diagnosing neurological disorders, and monitoring disease progression. Traditional methods, such as FSL FAST, produce tissue probability maps but often require task-specific adjustments and face challenges with diverse imaging conditions. Recent foundation models, such as MedSAM, offer a prompt-based approach that leverages large-scale pretraining. In this paper, we propose a modified MedSAM model designed for multi-class brain tissue segmentation. Our preprocessing pipeline includes skull stripping with FSL BET, tissue probability mapping with FSL FAST, and conversion of these maps into 2D axial, sagittal, and coronal slices with multi-class labels (background, gray matter, and white matter). We extend MedSAM's mask decoder to three classes, freezing the pre-trained image encoder and fine-tuning the prompt encoder and decoder. Experiments on the IXI dataset achieve Dice scores of up to 0.8751. This work demonstrates that foundation models like MedSAM can be adapted for multi-class medical image segmentation with minimal architectural modifications. Our findings suggest that such models can be extended to more diverse medical imaging scenarios in future work.
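The label-construction step of such a pipeline can be sketched as collapsing per-tissue probability maps (as FSL FAST would produce) into one multi-class mask via argmax, with 0 = background, 1 = gray matter, 2 = white matter. Shapes and values here are toy placeholders, not FAST output:

```python
import numpy as np

# Per-tissue probability maps for a 2x2 toy slice (rows sum to 1 per pixel).
bg = np.array([[0.90, 0.10], [0.80, 0.20]])
gm = np.array([[0.05, 0.70], [0.10, 0.30]])
wm = np.array([[0.05, 0.20], [0.10, 0.50]])

# Stack class maps along a new leading axis and take the per-pixel argmax.
labels = np.argmax(np.stack([bg, gm, wm]), axis=0)
```

The resulting integer mask is exactly the three-class target the extended MedSAM decoder is fine-tuned against.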

replace-cross The Thiomi Dataset: A Large-Scale Multimodal Corpus for Low-Resource African Languages

Authors: Hillary Mutisya, John Mugane, Gavin Nyamboga, Brian Chege, Maryruth Gathoni

Abstract: We present the Thiomi Dataset, a large-scale multimodal corpus spanning ten African languages across four language families: Swahili, Kikuyu, Kamba, Kimeru, Luo, Maasai, Kipsigis, Somali (East Africa); Wolof (West Africa); and Fulani (West/Central Africa). The dataset contains over 601,000 approved sentence-level text annotations and over 385,000 audio recordings, collected through a dedicated community data collection platform involving over 100 contributors. To validate the dataset's utility, we train and evaluate ASR, MT, and TTS models, establishing baselines across all languages. Our best ASR system achieves 3.24% WER on Swahili (Common Voice), reducing prior academic SOTA from 8.3% to 3.24% (5.1 percentage point absolute, 61% relative reduction), and 4.3% WER on Somali. The dataset will be published on HuggingFace. We describe the collection platform, quality assurance workflows, and baseline experiments, and discuss implications for African language technology infrastructure.
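The WER figures above follow the standard definition: word-level edit distance divided by the number of reference words. A self-contained sketch (the inputs below are toy strings, not Common Voice data):

```python
def wer(ref, hyp):
    # Word error rate via word-level Levenshtein distance.
    r, h = ref.split(), hyp.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i                      # deletions
    for j in range(len(h) + 1):
        d[0][j] = j                      # insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            d[i][j] = min(d[i - 1][j] + 1,                         # deletion
                          d[i][j - 1] + 1,                         # insertion
                          d[i - 1][j - 1] + (r[i - 1] != h[j - 1]))  # substitution
    return d[len(r)][len(h)] / len(r)
```

For the reported numbers, 8.3% to 3.24% is a 5.06 percentage-point drop, i.e. roughly a 61% relative reduction, matching the abstract's arithmetic.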

replace-cross Beyond Corner Patches: Semantics-Aware Backdoor Attack in Federated Learning

Authors: Kavindu Herath, Joshua Zhao, Saurabh Bagchi

Abstract: Backdoor attacks on federated learning (FL) are most often evaluated with synthetic corner patches or out-of-distribution (OOD) patterns that are unlikely to arise in practice. In this paper, we revisit the backdoor threat to standard FL (a single global model) under a more realistic setting where triggers must be semantically meaningful, in-distribution, and visually plausible. We propose SABLE, a Semantics-Aware Backdoor for LEarning in federated settings, which constructs natural, content-consistent triggers (e.g., semantic attribute changes such as sunglasses) and optimizes an aggregation-aware malicious objective with feature separation and parameter regularization to keep attacker updates close to benign ones. We instantiate SABLE on CelebA hair-color classification and the German Traffic Sign Recognition Benchmark (GTSRB), poisoning only a small, interpretable subset of each malicious client's local data while otherwise following the standard FL protocol. Across heterogeneous client partitions and multiple aggregation rules (FedAvg, Trimmed Mean, MultiKrum, and FLAME), our semantics-driven triggers achieve high targeted attack success rates while preserving benign test accuracy. These results show that semantics-aligned backdoors remain a potent and practical threat in federated learning, and that robustness claims based solely on synthetic patch triggers can be overly optimistic.
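A schematic of the malicious client's composite objective: task loss on locally poisoned data, a feature-separation term pushing triggered and clean representations apart, and a proximity penalty keeping the update near the benign parameters. All terms, weights, and values are illustrative stand-ins, not SABLE's exact losses:

```python
import numpy as np

def attacker_loss(task_loss, z_trig, z_clean, theta, theta_benign,
                  lam_sep=0.1, lam_prox=1.0):
    # Negative gap between mean triggered and clean features: minimizing the
    # total loss therefore maximizes feature separation.
    sep = -np.linalg.norm(z_trig.mean(axis=0) - z_clean.mean(axis=0))
    # Parameter regularization keeps the malicious update close to benign ones,
    # which is what helps it survive robust aggregation rules.
    prox = np.sum((theta - theta_benign) ** 2)
    return task_loss + lam_sep * sep + lam_prox * prox

z_t = np.array([[2.0, 0.0], [2.2, 0.1]])    # toy triggered features
z_c = np.array([[0.0, 0.0], [0.1, -0.1]])   # toy clean features
L = attacker_loss(1.0, z_t, z_c, np.ones(3), np.ones(3))
```

With identical parameters the proximity term vanishes, so the separation bonus lowers the loss below the plain task loss, as the assertion checks.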

replace-cross Metriplector: From Field Theory to Neural Architecture

Authors: Dan Oprisa, Peter Toth

Abstract: We present Metriplector, a neural architecture primitive in which the input configures an abstract physical system -- fields, sources, and operators -- and the dynamics of that system is the computation. Multiple fields evolve via coupled metriplectic dynamics, and the stress-energy tensor T^{\mu\nu}, derived from Noether's theorem, provides the readout. The metriplectic formulation admits a natural spectrum of instantiations: the dissipative branch alone yields a screened Poisson equation solved exactly via conjugate gradient; activating the full structure -- including the antisymmetric Poisson bracket -- gives field dynamics for image recognition, language modeling, and robotic control. We evaluate Metriplector across five domains, each using a task-specific architecture built from this shared primitive with progressively richer physics: 81.03% on CIFAR-100 with 2.26M parameters; 88% CEM success on Reacher robotic control with under 1M parameters; 97.2% exact Sudoku solve rate with zero structural injection; 1.182 bits/byte on language modeling with 3.6x fewer training tokens than a GPT baseline; and F1=1.0 on maze pathfinding, generalizing from 15x15 training grids to unseen 39x39 grids.
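The dissipative branch's reduction to a screened Poisson equation, (k^2 - Laplacian) u = rho, is symmetric positive definite and hence exactly solvable by conjugate gradient, as the abstract states. A 1D finite-difference sketch with zero Dirichlet boundaries; grid size, k^2, and source are illustrative:

```python
import numpy as np

def screened_poisson_matvec(u, k2, h):
    # (k^2 I - Laplacian) u on interior points, zero Dirichlet boundary.
    lap = -2.0 * u
    lap[:-1] += u[1:]
    lap[1:] += u[:-1]
    return k2 * u - lap / h ** 2

def cg(matvec, b, tol=1e-10, maxiter=500):
    # Textbook conjugate gradient for a symmetric positive definite operator.
    x = np.zeros_like(b)
    r = b - matvec(x)
    p = r.copy()
    rs = r @ r
    for _ in range(maxiter):
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

n, k2 = 64, 4.0
h = 1.0 / (n + 1)
rho = np.ones(n)                       # constant source term
u = cg(lambda v: screened_poisson_matvec(v, k2, h), rho)
residual = np.linalg.norm(screened_poisson_matvec(u, k2, h) - rho)
```

Activating the antisymmetric Poisson bracket on top of this dissipative core is what the architecture adds for the richer tasks; the CG-solvable branch is the exactly tractable base case.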

replace-cross S0 Tuning: Zero-Overhead Adaptation of Hybrid Recurrent-Attention Models

Authors: Jack Young

Abstract: Using roughly 48 execution-verified HumanEval training solutions, tuning a single initial state matrix per recurrent layer, with zero inference overhead, outperforms LoRA by +10.8 pp (p < 0.001) on HumanEval. The method, which we call S0 tuning, optimizes one state matrix per recurrent layer while freezing all model weights. On Qwen3.5-4B (GatedDeltaNet hybrid), S0 tuning improves greedy pass@1 by +23.6 +/- 1.7 pp (10 seeds). On FalconH1-7B (Mamba-2 hybrid), S0 reaches 71.8% +/- 1.3 and LoRA reaches 71.4% +/- 2.4 (3 seeds), statistically indistinguishable at this sample size while requiring no weight merging. Cross-domain transfer is significant on MATH-500 (+4.8 pp, p = 0.00002, 8 seeds) and GSM8K (+2.8 pp, p = 0.0003, 10 seeds); a text-to-SQL benchmark (Spider) shows no transfer, consistent with the trajectory-steering mechanism. A prefix-tuning control on a pure Transformer (Qwen2.5-3B) degrades performance by -13.9 pp under all nine configurations tested. On Qwen3.5, a per-step state-offset variant reaches +27.1 pp, above both S0 and LoRA but with per-step inference cost. Taken together, the results show that recurrent state initialization is a strong zero-inference-overhead PEFT surface for hybrid language models when verified supervision is scarce. The tuned state is a ~48 MB file; task switching requires no weight merging or model reload. Code and library: https://github.com/jackyoung27/s0-tuning.

URLs: https://github.com/jackyoung27/s0-tuning
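The core idea, freeze every weight and train only the initial recurrent state, can be illustrated on a toy linear-tanh RNN. The real method tunes one state matrix per recurrent layer of a hybrid model with backpropagation; this sketch uses finite differences purely for self-containment:

```python
import numpy as np

rng = np.random.default_rng(0)
W = 0.5 * rng.standard_normal((3, 3))   # frozen recurrence weights
U = rng.standard_normal((3, 2))         # frozen input map

def run(s0, xs):
    # Unroll the frozen RNN from a trainable initial state s0.
    s = s0
    for x in xs:
        s = np.tanh(W @ s + U @ x)
    return s

xs = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
target = np.array([0.2, -0.1, 0.3])

def loss(s0):
    out = run(s0, xs)
    return np.sum((out - target) ** 2)

# Gradient descent on s0 only; W and U are never touched.
s0 = np.zeros(3)
l0 = loss(s0)
for _ in range(100):
    g = np.zeros(3)
    for i in range(3):
        e = np.zeros(3); e[i] = 1e-5
        g[i] = (loss(s0 + e) - loss(s0 - e)) / 2e-5   # central difference
    s0 -= 0.1 * g
l1 = loss(s0)
```

Because only s0 changes, the "adapter" is just one small tensor per layer, which is why the tuned artifact is a ~48 MB file and inference cost is unchanged.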

replace-cross CASHG: Context-Aware Stylized Online Handwriting Generation

Authors: Jinsu Shin, Sungeun Hong, JinYeong Bak

Abstract: Online handwriting represents strokes as time-ordered trajectories, which makes handwritten content easier to transform and reuse in a wide range of applications. However, generating natural sentence-level online handwriting that faithfully reflects a writer's style remains challenging, since sentence synthesis demands context-dependent characters with stroke continuity and spacing. Prior methods treat these boundary properties as implicit outcomes of sequence modeling, which becomes unreliable at the sentence scale and under limited compositional diversity. We propose CASHG, a context-aware stylized online handwriting generator that explicitly models inter-character connectivity for style-consistent sentence-level trajectory synthesis. CASHG uses a Character Context Encoder to obtain character identity and sentence-dependent context memory and fuses them in a bigram-aware sliding-window Transformer decoder that emphasizes local predecessor--current transitions, complemented by gated context fusion for sentence-level context. Training proceeds through a three-stage curriculum from isolated glyphs to full sentences, improving robustness under sparse transition coverage. We further introduce Connectivity and Spacing Metrics (CSM), a boundary-aware evaluation suite that quantifies cursive connectivity and spacing similarity. Under benchmark-matched evaluation protocols, CASHG consistently improves CSM over comparison methods while remaining competitive in DTW-based trajectory similarity, with gains corroborated by a human evaluation.
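The bigram-aware idea, each position attending to context for its current character and its immediate predecessor only, amounts to a narrow attention mask. A minimal sketch; the actual decoder's window size and indexing may differ:

```python
import numpy as np

def bigram_window_mask(n_chars):
    # True where attention is allowed: each position sees itself and its
    # immediate predecessor, emphasizing local predecessor-current transitions.
    m = np.zeros((n_chars, n_chars), dtype=bool)
    for i in range(n_chars):
        m[i, i] = True
        if i > 0:
            m[i, i - 1] = True
    return m

mask = bigram_window_mask(4)
```

Restricting attention this way keeps the transition modeling local, which is where connectivity and spacing are decided, while the gated context fusion supplies the sentence-level signal.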

replace-cross SkVM: Compiling Skills for Efficient Execution Everywhere

Authors: Le Chen, Erhu Feng, Yubin Xia, Haibo Chen

Abstract: LLM agents increasingly adopt skills as a reusable unit of composition. While skills are shared across diverse agent platforms, current systems treat them as raw context, causing the same skill to behave inconsistently for different agents. This fragility undermines skill portability and execution efficiency. To address this challenge, we analyze 118,000 skills and draw inspiration from traditional compiler design. We treat skills as code and LLMs as heterogeneous processors. To make portability actionable, we decompose a skill's requirements into a set of primitive capabilities, and measure how well each model-harness pair supports them. Based on these capability profiles, we propose SkVM, a compilation and runtime system designed for portable and efficient skill execution. At compile time, SkVM performs capability-based compilation, environment binding, and concurrency extraction. At runtime, SkVM applies JIT code solidification and adaptive recompilation for performance optimization. We evaluate SkVM across eight LLMs of varying scales and three agent harnesses, covering SkillsBench and representative skill tasks. Results demonstrate that SkVM significantly improves task completion rates across different models and environments while reducing token consumption by up to 40%. In terms of performance, SkVM achieves up to 3.2x speedup with enhanced parallelism, and 19-50x latency reduction through code solidification.
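The capability-based compilation step can be sketched as set containment: a skill declares the primitive capabilities it requires, each (model, harness) pair carries a measured capability profile, and compilation targets only pairs that cover the requirements. Capability names and pairs below are invented for illustration:

```python
def compatible(skill_caps, profile):
    # A target is viable only if its profile covers every required capability.
    return skill_caps.issubset(profile)

skill = {"file_io", "shell_exec"}            # capabilities this skill needs
pairs = {
    ("small-llm", "cli"): {"file_io"},                       # missing shell_exec
    ("big-llm", "cli"): {"file_io", "shell_exec", "browser"},
}
targets = [p for p, caps in pairs.items() if compatible(skill, caps)]
```

Decomposing requirements into primitives is what makes portability actionable: the same skill compiles only to the model-harness pairs that can actually execute it.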