new Translational Gaps in Graph Transformers for Longitudinal EHR Prediction: A Critical Appraisal of GT-BEHRT

Authors: Krish Tadigotla

Abstract: Transformer-based models have improved predictive modeling on longitudinal electronic health records through large-scale self-supervised pretraining. However, most EHR transformer architectures treat each clinical encounter as an unordered collection of codes, which limits their ability to capture meaningful relationships within a visit. Graph-transformer approaches aim to address this limitation by modeling visit-level structure while retaining the ability to learn long-term temporal patterns. This paper provides a critical review of GT-BEHRT, a graph-transformer architecture evaluated on MIMIC-IV intensive care outcomes and heart failure prediction in the All of Us Research Program. We examine whether the reported performance gains reflect genuine architectural benefits and whether the evaluation methodology supports claims of robustness and clinical relevance. We analyze GT-BEHRT across seven dimensions relevant to modern machine learning systems, including representation design, pretraining strategy, cohort construction transparency, evaluation beyond discrimination, fairness assessment, reproducibility, and deployment feasibility. GT-BEHRT reports strong discrimination for heart failure prediction within 365 days, with AUROC 94.37 +/- 0.20, AUPRC 73.96 +/- 0.83, and F1 64.70 +/- 0.85. Despite these results, we identify several important gaps, including the lack of calibration analysis, incomplete fairness evaluation, sensitivity to cohort selection, limited analysis across phenotypes and prediction horizons, and limited discussion of practical deployment considerations. Overall, GT-BEHRT represents a meaningful architectural advance in EHR representation learning, but more rigorous evaluation focused on calibration, fairness, and deployment is needed before such models can reliably support clinical decision-making.

new RFX-Fuse: Breiman and Cutler's Unified ML Engine + Native Explainable Similarity

Authors: Chris Kuchar

Abstract: Breiman and Cutler's original Random Forest was designed as a unified ML engine -- not merely an ensemble predictor. Their implementation included classification, regression, unsupervised learning, proximity-based similarity, outlier detection, missing value imputation, and visualization -- capabilities that modern libraries like scikit-learn never implemented. RFX-Fuse (Random Forests X [X=compression] -- Forest Unified Learning and Similarity Engine) delivers Breiman and Cutler's complete vision with native GPU/CPU support. Modern ML pipelines require 5+ separate tools -- XGBoost for prediction, FAISS for similarity, SHAP for explanations, Isolation Forest for outliers, custom code for importance. RFX-Fuse provides an alternative of one to two model objects -- a single set of trees grown once. Novel Contributions: (1) Proximity Importance -- native explainable similarity: proximity measures that samples are similar; proximity importance explains why. (2) Dataset-specific imputation validation for general tabular data -- ranking imputation methods by how real the imputed data looks, without ground-truth labels.
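The proximity notion the abstract refers to can be sketched in a few lines. In Breiman's formulation, the proximity of two samples is the fraction of trees in which they land in the same terminal node. The snippet below is an illustrative sketch of that definition, not RFX-Fuse's actual API; the toy leaf-assignment matrix stands in for a trained forest.

```python
# Breiman-style random-forest proximity (illustrative sketch):
# proximity(i, j) = fraction of trees in which samples i and j
# fall in the same leaf. `leaves[i][t]` is the hypothetical leaf
# index of sample i in tree t.

def proximity(leaves, i, j):
    """Fraction of trees where samples i and j share a terminal node."""
    n_trees = len(leaves[i])
    same = sum(1 for t in range(n_trees) if leaves[i][t] == leaves[j][t])
    return same / n_trees

# Toy leaf-assignment matrix: 3 samples x 4 trees.
leaves = [
    [0, 1, 2, 0],
    [0, 1, 3, 0],  # shares a leaf with sample 0 in trees 0, 1, 3
    [5, 6, 7, 8],  # never co-occurs with the others
]
print(proximity(leaves, 0, 1))  # 0.75
print(proximity(leaves, 0, 2))  # 0.0
```

"Proximity importance" as described would then ask which features drive these shared-leaf co-occurrences, explaining why two samples are deemed similar.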

new Continual Fine-Tuning with Provably Accurate and Parameter-Free Task Retrieval

Authors: Hang Thi-Thuy Le, Long Minh Bui, Minh Hoang, Trong Nghia Hoang

Abstract: Continual fine-tuning aims to adapt a pre-trained backbone to new tasks sequentially while preserving performance on earlier tasks whose data are no longer available. Existing approaches fall into two categories: input-adaptation and parameter-adaptation. Input-adaptation methods rely on retrieving the most relevant prompts at test time, but require continuously learning a retrieval function that is prone to forgetting. Parameter-adaptation methods instead use a fixed input embedding function to enable retrieval-free prediction and avoid forgetting, but sacrifice representation adaptability. To combine their strengths, we propose a new parameter-adaptation method that enables adaptive use of input embeddings at test time with parameter-free retrieval. We derive task-retrieval error bounds for a clustering-based, parameter-free paradigm, providing theoretical guarantees that link low retrieval error to structural properties of task-specific representation clusters and revealing a fresh insight into how a well-organized clustering structure enables reliable retrieval. Motivated by this insight, our method is designed with two key components: (i) an adaptive module composition strategy that learns informative task-specific updates to preserve and complement prior knowledge, and (ii) a clustering-based retrieval mechanism that captures distinct representation signatures for each task, enabling adaptive representation use at test time. Extensive experiments show that these components work synergistically to improve retrieval and predictive performance under large shifts in task semantics.
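The simplest instance of a clustering-based, parameter-free retrieval mechanism is nearest-centroid lookup: summarize each task by the centroid of its representation cluster and, at test time, retrieve the task whose centroid is closest to the test embedding. The sketch below is illustrative only (the names and the Euclidean metric are assumptions, not the paper's specification), but it shows why well-separated clusters imply low retrieval error.

```python
import math

# Minimal sketch of clustering-based, parameter-free task retrieval:
# each task is summarized by the centroid of its training embeddings,
# and a test embedding is routed to the nearest centroid. No retrieval
# function is learned, so nothing can be forgotten.

def centroid(vectors):
    d = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(d)]

def retrieve_task(centroids, x):
    """Return the task id whose centroid is closest to embedding x."""
    return min(centroids, key=lambda task: math.dist(centroids[task], x))

# Two toy tasks with well-separated representation clusters.
centroids = {
    "task_a": centroid([[0.0, 0.1], [0.2, -0.1]]),
    "task_b": centroid([[5.0, 5.2], [4.8, 5.0]]),
}
print(retrieve_task(centroids, [0.1, 0.0]))  # task_a
print(retrieve_task(centroids, [5.1, 4.9]))  # task_b
```

When cluster separation shrinks relative to within-cluster spread, nearest-centroid retrieval starts to fail, which is exactly the structural dependence the error bounds formalize.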

new Introducing Feature-Based Trajectory Clustering, a clustering algorithm for longitudinal data

Authors: Marie-Pierre Sylvestre, Laurence Boulanger

Abstract: We present a new algorithm for clustering longitudinal data. Data of this type can be conceptualized as consisting of individuals and, for each individual, observations of a time-dependent variable made at various times. Generically, the specific way in which this variable evolves with time differs from one individual to the next. However, there may also be commonalities: characteristic features of the time evolution shared by many individuals. The purpose of the method we put forward is to find clusters of individuals whose underlying time-dependent variables share such characteristic features. This is done in two steps. The first step maps each individual to a point in Euclidean space whose coordinates are determined by specific mathematical formulae meant to capture a variety of characteristic features. The second step finds the clusters by applying the Spectral Clustering algorithm to the resulting point cloud.
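The two-step structure can be sketched concretely. Below, each trajectory is mapped to a feature point (mean level and overall slope, illustrative stand-ins for the paper's formulae), and the point cloud is then clustered; a plain k-means stands in here for the Spectral Clustering step used by the actual method.

```python
import math

# Step 1: map each trajectory to a point in feature space.
def features(times, values):
    mean = sum(values) / len(values)                      # average level
    slope = (values[-1] - values[0]) / (times[-1] - times[0])  # overall trend
    return (mean, slope)

# Step 2: cluster the point cloud (k-means as a stand-in for spectral clustering).
def kmeans(points, centers, iters=10):
    for _ in range(iters):
        labels = [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        for c in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

t = [0.0, 1.0, 2.0, 3.0]
trajectories = [
    [1.0, 1.1, 0.9, 1.0],  # flat, low
    [1.1, 1.0, 1.0, 1.1],  # flat, low
    [1.0, 3.0, 5.0, 7.0],  # rising
    [0.5, 2.5, 4.5, 6.5],  # rising
]
points = [features(t, v) for v in trajectories]
labels = kmeans(points, centers=[points[0], points[2]])
print(labels)  # [0, 0, 1, 1]: flat pair in one cluster, rising pair in the other
```

The key property is that clustering happens in feature space, so trajectories observed at different times or frequencies can still be compared once their characteristic features are extracted.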

new Your Code Agent Can Grow Alongside You with Structured Memory

Authors: Yi-Xuan Deng, Xiaoqin Liu, Yi Zhang, Guo-Wei Yang, Shuojin Yang

Abstract: While "Intent-oriented programming" (or "Vibe Coding") redefines software engineering, existing code agents remain tethered to static code snapshots. Consequently, they struggle to model the critical information embedded in the temporal evolution of projects, failing to leverage the "reasoning trajectories" implicit in past successful practices. This limitation results in rigid behavioral logic and a lack of autonomous adaptability, ultimately hindering their ability to tackle complex, repository-level problems. To bridge this static-dynamic mismatch, we propose MemCoder, a framework designed to enable continual human-AI co-evolution. MemCoder first structures historical human experience to distill latent intent-to-code mappings from past commits. It then employs a self-refinement mechanism driven by verification feedback to correct agent behavior in real-time. Crucially, an experience self-internalization mechanism is introduced to crystallize human-validated solutions into long-term knowledge, thereby supporting sustained evolution. Experimental results on SWE-bench Verified demonstrate that MemCoder not only achieves State-of-the-Art (SOTA) performance but also delivers a 9.4% improvement in resolved rate over the general foundation model DeepSeek-V3.2. These findings indicate that equipping agents with the capability to co-evolve with humans via project history and real-time feedback effectively unlocks the potential of general models in complex software engineering tasks.

new Beyond Attention: True Adaptive World Models via Spherical Kernel Operator

Authors: Vladimer Khasia

Abstract: The pursuit of world-model-based artificial intelligence has predominantly relied on projecting high-dimensional observations into parameterized latent spaces, wherein transition dynamics are subsequently learned. However, this conventional paradigm is mathematically flawed: it merely displaces the manifold learning problem into the latent space. When the underlying data distribution shifts, the latent manifold shifts accordingly, forcing the predictive operator to implicitly relearn the new topological structure. Furthermore, by classical approximation theory, positive operators like dot product attention inevitably suffer from the saturation phenomenon, permanently bottlenecking their predictive capacity and leaving them vulnerable to the curse of dimensionality. In this paper, we formulate a mathematically rigorous paradigm for world model construction by redefining the core predictive mechanism. Inspired by Ryan O'Dowd's foundational work, we introduce the Spherical Kernel Operator (SKO), a framework that replaces standard attention. By projecting the unknown data manifold onto a unified ambient hypersphere and utilizing a localized sequence of ultraspherical (Gegenbauer) polynomials, SKO performs direct integral reconstruction of the target function. Because this localized spherical polynomial kernel is not strictly positive, it bypasses the saturation phenomenon, yielding approximation error bounds that depend strictly on the intrinsic manifold dimension q, rather than the ambient dimension. Furthermore, by formalizing its unnormalized output as an authentic measure support estimator, SKO mathematically decouples the true environmental transition dynamics from the biased observation frequency of the agent. Empirical evaluations confirm that SKO significantly accelerates convergence and outperforms standard attention baselines in autoregressive language modeling.
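The ultraspherical (Gegenbauer) polynomials the abstract relies on satisfy a standard three-term recurrence, which is how they are usually evaluated in practice. The sketch below implements that textbook recurrence only; it is not the SKO kernel itself.

```python
def gegenbauer(n, alpha, x):
    """C_n^alpha(x) via the standard three-term recurrence:
    n*C_n = 2x(n+alpha-1)*C_{n-1} - (n+2alpha-2)*C_{n-2},
    with C_0 = 1 and C_1 = 2*alpha*x."""
    if n == 0:
        return 1.0
    c_prev, c_curr = 1.0, 2.0 * alpha * x
    for k in range(2, n + 1):
        c_prev, c_curr = c_curr, (2.0 * x * (k + alpha - 1) * c_curr
                                  - (k + 2 * alpha - 2) * c_prev) / k
    return c_curr

# Sanity check: alpha = 1/2 recovers the Legendre polynomials,
# e.g. P_2(x) = (3x^2 - 1)/2.
x = 0.4
print(gegenbauer(2, 0.5, x))  # approximately -0.26
print((3 * x * x - 1) / 2)    # same value, up to float rounding
```

Localized kernels built from truncated sums of such polynomials can take negative values, which is the "not strictly positive" property the abstract credits with avoiding saturation.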

new Federated Personal Knowledge Graph Completion with Lightweight Large Language Models for Personalized Recommendations

Authors: Fernando Spadea, Oshani Seneviratne

Abstract: Personalized recommendation increasingly relies on private user data, motivating approaches that can adapt to individuals without centralizing their information. We present Federated Targeted Recommendations with Evolving Knowledge graphs and Language Models (FedTREK-LM), a framework that unifies lightweight large language models (LLMs), evolving personal knowledge graphs (PKGs), federated learning (FL), and Kahneman-Tversky Optimization to enable scalable, decentralized personalization. By prompting LLMs with structured PKGs, FedTREK-LM performs context-aware reasoning for personalized recommendation tasks such as movie and recipe suggestions. Across three lightweight Qwen3 models (0.6B, 1.7B, 4B), FedTREK-LM consistently and substantially outperforms state-of-the-art KG completion and federated recommendation baselines (HAKE, KBGAT, and FedKGRec), achieving more than a 4x improvement in F1-score on the movie and food benchmarks. Our results further show that real user data is critical for effective personalization, as synthetic data degrades performance by up to 46%. Overall, FedTREK-LM offers a practical paradigm for adaptive, LLM-powered personalization that generalizes across decentralized, evolving user PKGs.

new Knowledge, Rules and Their Embeddings: Two Paths towards Neuro-Symbolic JEPA

Authors: Yongchao Huang, Hassan Raza

Abstract: Modern self-supervised predictive architectures excel at capturing complex statistical correlations from high-dimensional data but lack mechanisms to internalize verifiable human logic, leaving them susceptible to spurious correlations and shortcut learning. Conversely, traditional rule-based inference systems offer rigorous, interpretable logic but suffer from discrete boundaries and NP-hard combinatorial explosion. To bridge this divide, we propose a bidirectional neuro-symbolic framework centered around Rule-informed Joint-Embedding Predictive Architectures (RiJEPA). In the first direction, we inject structured inductive biases into JEPA training via Energy-Based Constraints (EBC) and a multi-modal dual-encoder architecture. This fundamentally reshapes the representation manifold, replacing arbitrary statistical correlations with geometrically sound logical basins. In the second direction, we demonstrate that by relaxing rigid, discrete symbolic rules into a continuous, differentiable logic, we can bypass traditional combinatorial search for new rule generation. By leveraging gradient-guided Langevin diffusion within the rule energy landscape, we introduce novel paradigms for continuous rule discovery, which enable unconditional joint generation, conditional forward and abductive inference, and marginal predictive translation. Empirical evaluations on both synthetic topological simulations and a high-stakes clinical use case confirm the efficacy of our approach. Ultimately, this framework establishes a powerful foundation for robust, generative, and interpretable neuro-symbolic representation learning.

new CAMEL-CLIP: Channel-aware Multimodal Electroencephalography-text Alignment for Generalizable Brain Foundation Models

Authors: Hanseul Choi, Jinyeong Park, Seongwon Jin, Sungho Park, Jibum Kim

Abstract: Electroencephalography (EEG) foundation models have shown promise for learning generalizable representations, yet they remain sensitive to channel heterogeneity, such as changes in channel composition or ordering. We propose channel-aware multimodal EEG-text alignment contrastive language-image pretraining (CAMEL-CLIP), a contrastive EEG-text multimodal foundation model designed to be robust to heterogeneous channel configurations and widely applicable to diverse downstream tasks. CAMEL-CLIP introduces three key components: (1) channel attribute-based positional encoding, which identifies channels through semantic information; (2) dynamic channel projection, which generates variable-length embeddings by independently projecting each channel without feature compression; and (3) dual-level contrastive learning, which jointly performs channel-level and sample-level contrastive learning to capture both channel-specific and global signal characteristics. Experimental results demonstrate that CAMEL-CLIP achieves state-of-the-art performance under linear-probing and outperforms existing foundation models that rely on full-finetuning.

new Spatially Aware Deep Learning for Microclimate Prediction from High-Resolution Geospatial Imagery

Authors: Idan Sulami, Alon Itzkovitch, Michael R. Kearney, Moni Shahar, Ofir Levy

Abstract: Microclimate models are essential for linking climate to ecological processes, yet most physically based frameworks estimate temperature independently for each spatial unit and rely on simplified representations of lateral heat exchange. As a result, the spatial scales over which surrounding environmental conditions influence local microclimates remain poorly quantified. Here, we show how remote sensing can help quantify the contribution of spatial context to microclimate temperature predictions. Building on convolutional neural network principles, we designed a task-specific deep neural network and trained a series of models in which the spatial extent of input data was systematically varied. Drone-derived spatial layers and meteorological data were used to predict ground temperature at a focal location, allowing direct assessment of how prediction accuracy changes with increasing spatial context. Our results show that incorporating spatially adjacent information substantially improves prediction accuracy, with diminishing returns beyond spatial extents of approximately 5-7 m. This characteristic scale indicates that ground temperatures are influenced not only by local surface properties, but also by horizontal heat transfer and radiative interactions operating across neighboring microhabitats. The magnitude of spatial effects varied systematically with time of day, microhabitat type, and local environmental characteristics, highlighting context-dependent spatial coupling in microclimate formation. By treating deep learning as a diagnostic tool rather than solely a predictive one, our approach provides a general and transferable method for quantifying spatial dependencies in microclimate models and informing the development of hybrid mechanistic-data-driven approaches that explicitly account for spatial interactions while retaining physical interpretability.

new Learning from Partial Chain-of-Thought via Truncated-Reasoning Self-Distillation

Authors: Gianluigi Silvestri, Edoardo Cetin

Abstract: Reasoning-oriented language models achieve strong performance by generating long chain-of-thought traces at inference time. However, this capability comes with substantial and often excessive computational cost, which can manifest as redundant or inefficient reasoning. We study this setting and introduce Truncated-Reasoning Self-Distillation (TRSD), a lightweight post-training procedure that encourages models to produce correct predictions from partial reasoning traces. In TRSD, a frozen teacher model first generates a full reasoning trace and evaluates the corresponding answer distribution conditioned on the prompt and the complete reasoning to construct a synthetic training target. A student model with the same architecture is then trained to match the teacher's answer distribution while being conditioned only on a truncated prefix of its reasoning trace. Across multiple reasoning benchmarks and token budgets, we demonstrate that TRSD improves robustness to truncated inference, with far smaller accuracy tradeoffs when applied to a diverse set of reasoning models. Moreover, although never explicitly regularized for shorter generation during training, we also find that TRSD-trained models inherently output shorter reasoning traces without truncation, significantly reducing inference-time costs even without artificial interventions.
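The training signal described above is a distribution-matching loss: the teacher's answer distribution under the full trace is the soft target, and the student, conditioned only on a truncated prefix, is pushed toward it. The toy below illustrates that target construction with hand-set distributions (illustrative numbers, not the paper's code or data).

```python
import math

# Toy sketch of the TRSD objective: the teacher's answer distribution,
# conditioned on the full reasoning trace, is a soft target; the student
# sees only a truncated prefix and is trained to minimize KL to it.

def kl_divergence(p, q):
    """KL(p || q) over a shared discrete answer vocabulary."""
    return sum(p[a] * math.log(p[a] / q[a]) for a in p if p[a] > 0)

# Teacher: answer distribution given prompt + complete chain-of-thought.
teacher = {"42": 0.9, "41": 0.05, "43": 0.05}
# Student: answer distribution given prompt + truncated prefix only,
# before and after TRSD-style training (illustrative values).
student_before = {"42": 0.4, "41": 0.3, "43": 0.3}
student_after = {"42": 0.85, "41": 0.08, "43": 0.07}

print(kl_divergence(teacher, student_before))  # larger: prefix alone is uninformative
print(kl_divergence(teacher, student_after))   # smaller: prefix now suffices
```

Because the target is a full distribution rather than a hard label, the student also inherits the teacher's uncertainty over near-miss answers.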

new PREBA: Surgical Duration Prediction via PCA-Weighted Retrieval-Augmented LLMs and Bayesian Averaging Aggregation

Authors: Wanyin Wu, Kanxue Li, Baosheng Yu, Haoyun Zhao, Yibing Zhan, Dapeng Tao, Hua Jin

Abstract: Accurate prediction of surgical duration is pivotal for hospital resource management. Although recent supervised learning approaches, from machine learning (ML) to fine-tuned large language models (LLMs), have shown strong performance, they remain constrained by the need for high-quality labeled data and computationally intensive training. In contrast, zero-shot LLM inference offers a promising training-free alternative, but it lacks grounding in institution-specific clinical context (e.g., local demographics and case-mix distributions), making its predictions clinically misaligned and prone to instability. To address these limitations, we present PREBA, a retrieval-augmented framework that integrates PCA-weighted retrieval and Bayesian averaging aggregation to ground LLM predictions in institution-specific clinical evidence and statistical priors. The core of PREBA is to construct an evidence-based prompt for the LLM, comprising (1) the most clinically similar historical surgical cases and (2) clinical statistical priors. To achieve this, PREBA first encodes heterogeneous clinical features into a unified representation space, enabling systematic retrieval. It then performs PCA-weighted retrieval to identify clinically relevant historical cases, which form the evidence context supplied to the LLM. Finally, PREBA applies Bayesian averaging to fuse multi-round LLM predictions with population-level statistical priors, yielding calibrated and clinically plausible duration estimates. We evaluate PREBA on two real-world clinical datasets using three state-of-the-art LLMs: Qwen3, DeepSeek-R1, and HuatuoGPT-o1. PREBA significantly improves performance, for instance reducing MAE by up to 40% and raising R^2 from -0.13 to 0.62 over zero-shot inference, and achieves accuracy competitive with supervised ML methods, demonstrating strong effectiveness and generalization.
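The fusion step can be made concrete with the standard conjugate-Gaussian case: with a Gaussian prior over duration (from population statistics) and Gaussian-noise LLM predictions, the posterior mean is a precision-weighted average. This is a common instance of Bayesian averaging, not necessarily PREBA's exact formulation; all numbers below are illustrative.

```python
# Precision-weighted fusion of multi-round LLM predictions with a
# population-level prior (conjugate-Gaussian sketch).

def fuse(prior_mean, prior_var, predictions, pred_var):
    """Posterior mean/variance after observing multi-round LLM predictions."""
    precision = 1.0 / prior_var + len(predictions) / pred_var
    mean = (prior_mean / prior_var + sum(predictions) / pred_var) / precision
    return mean, 1.0 / precision

# Hypothetical prior: this procedure type averages 120 min (variance 900);
# three LLM rounds predicted 150, 160, and 155 min (assumed variance 400).
mean, var = fuse(prior_mean=120.0, prior_var=900.0,
                 predictions=[150.0, 160.0, 155.0], pred_var=400.0)
print(round(mean, 1))  # pulled from the prior toward the LLM consensus
print(var < 900.0)     # posterior is tighter than the prior alone
```

The prior anchors unstable zero-shot outputs, while repeated LLM rounds shrink the posterior variance, which is one route to the calibrated estimates the abstract describes.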

new FastODT: A tree-based framework for efficient continual learning

Authors: Daniel Bretsko, Piotr Walas, Devashish Khulbe, Sebastian Stros, Stanislav Sobolevsky, Tomas Satura

Abstract: Machine learning models deployed in real-world settings must operate under evolving data distributions and constrained computational resources. This challenge is particularly acute in non-stationary domains such as energy time series, weather monitoring, and environmental sensing. To remain effective, models must support adaptability, continuous learning, and long-term knowledge retention. This paper introduces an oblivious tree-based model with a Hoeffding bound controlling its growth. It seamlessly integrates rapid learning and inference with efficient memory management and robust knowledge preservation, thus allowing for online learning. Extensive experiments across energy and environmental sensing time-series benchmarks demonstrate that the proposed framework achieves performance competitive with, and in several cases surpassing, existing online and batch learning methods, while maintaining superior computational efficiency. Collectively, these results demonstrate that the proposed approach fulfills the core objectives of adaptability, continual updating, and efficient retraining without full model retraining. The framework provides a scalable and resource-aware foundation for deployment in real-world non-stationary environments where resources are constrained and sustained adaptation is essential.
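The Hoeffding bound that gates growth in Hoeffding-tree-style online learners has a closed form: after n observations of a statistic with range R, the true mean lies within eps of the sample mean with probability 1 - delta. A split is accepted only once the observed gain gap exceeds eps. The sketch below shows the generic bound, not FastODT's specific implementation.

```python
import math

# Hoeffding bound: eps = sqrt(R^2 * ln(1/delta) / (2n)).
def hoeffding_epsilon(value_range, delta, n):
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

# Grow the tree only when the gap between the best and second-best
# candidate splits is statistically reliable.
def should_split(gain_best, gain_second, value_range, delta, n):
    return (gain_best - gain_second) > hoeffding_epsilon(value_range, delta, n)

# Same observed 0.05 gain gap, different amounts of evidence.
print(should_split(0.30, 0.25, value_range=1.0, delta=1e-6, n=500))    # False
print(should_split(0.30, 0.25, value_range=1.0, delta=1e-6, n=50000))  # True
```

Because eps shrinks as 1/sqrt(n), the tree defers splits until the stream has supplied enough evidence, which is what makes growth safe under drifting data.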

new Learning Retrieval Models with Sparse Autoencoders

Authors: Thibault Formal, Maxime Louis, Hervé Dejean, Stéphane Clinchant

Abstract: Sparse autoencoders (SAEs) provide a powerful mechanism for decomposing the dense representations produced by Large Language Models (LLMs) into interpretable latent features. We posit that SAEs constitute a natural foundation for Learned Sparse Retrieval (LSR), whose objective is to encode queries and documents into high-dimensional sparse representations optimized for efficient retrieval. In contrast to existing LSR approaches that project input sequences into the vocabulary space, SAE-based representations offer the potential to produce more semantically structured, expressive, and language-agnostic features. Building on this insight, we introduce SPLARE, a method to train SAE-based LSR models. Our experiments, relying on recently released open-source SAEs, demonstrate that this technique consistently outperforms vocabulary-based LSR in multilingual and out-of-domain settings. SPLARE-7B, a multilingual retrieval model capable of producing generalizable sparse latent embeddings for a wide range of languages and domains, achieves top results on MMTEB's multilingual and English retrieval tasks. We also developed a 2B-parameter variant with a significantly lighter footprint.

new Demand Acceptance using Reinforcement Learning for Dynamic Vehicle Routing Problem with Emission Quota

Authors: Farid Najar, Dominique Barth, Yann Strozecki

Abstract: This paper introduces and formalizes the Dynamic and Stochastic Vehicle Routing Problem with Emission Quota (DS-QVRP-RR), a novel routing problem that integrates dynamic demand acceptance and routing with a global emission constraint. A key contribution is a two-layer optimization framework designed to facilitate anticipatory rejection of demands and the generation of new routes. To solve this problem, we develop hybrid algorithms that combine reinforcement learning with combinatorial optimization techniques. We present a comprehensive computational study that compares our approach against traditional methods. Our findings demonstrate the relevance of our approach for different types of inputs, even when the horizon of the problem is uncertain.

new A Stability-Aware Frozen Euler Autoencoder for Physics-Informed Tracking in Continuum Mechanics (SAFE-PIT-CM)

Authors: Emil Hovad

Abstract: We introduce a Stability-Aware Frozen Euler autoencoder for Physics-Informed Tracking in Continuum Mechanics (SAFE-PIT-CM) that recovers material parameters and temporal field evolution from videos of physical processes. The architecture is an autoencoder whose latent-space transition is governed by a frozen PDE operator: a convolutional encoder maps each frame to a latent field; the SAFE operator propagates it forward via sub-stepped finite differences; and a decoder reconstructs the video. Because the physics is embedded as a frozen, differentiable layer, backpropagation yields gradients that directly supervise an attention-based estimator for the transport coefficient alpha, requiring no ground-truth labels. The SAFE operator is the central contribution. Temporal snapshots are saved at intervals far larger than the simulation time step; a forward Euler step at the frame interval violates the von Neumann stability condition, causing alpha to collapse to an unphysical value. The SAFE operator resolves this by sub-stepping the frozen finite-difference stencil to match the original temporal resolution, restoring stability and enabling accurate parameter recovery. We demonstrate SAFE-PIT-CM on the heat equation (diffusion, alpha < 0) and the reverse heat equation (mobility, alpha > 0). SAFE-PIT-CM also supports zero-shot inference: learning alpha from a single simulation with no training data, using only the SAFE loss as supervision. The zero-shot mode achieves accuracy comparable to a pre-trained model. The architecture generalises to any PDE admitting a convolutional finite-difference discretisation. Because latent dynamics are governed by a known PDE, SAFE-PIT-CM is inherently explainable: every prediction is traceable to a physical transport coefficient and step-by-step PDE propagation.
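The sub-stepping argument can be demonstrated on the 1-D heat equation with a few lines of code. The sketch below is a minimal illustration of the idea, not the paper's implementation: one explicit Euler step at the coarse frame interval violates the von Neumann condition alpha*dt/dx^2 <= 1/2 and blows up, while sub-stepping the same frozen stencil restores stability.

```python
import math

# One explicit forward-Euler step of u_t = alpha * u_xx (fixed endpoints).
def euler_heat_step(u, alpha, dt, dx):
    return [u[0]] + [
        u[i] + alpha * dt / dx**2 * (u[i+1] - 2*u[i] + u[i-1])
        for i in range(1, len(u) - 1)
    ] + [u[-1]]

# SAFE-style propagation: split the frame interval into enough sub-steps
# that each one satisfies the von Neumann condition alpha*dt/dx^2 <= 1/2.
def propagate(u, alpha, dt_frame, dx):
    n = max(1, math.ceil(2 * alpha * dt_frame / dx**2))
    for _ in range(n):
        u = euler_heat_step(u, alpha, dt_frame / n, dx)
    return u

u0 = [0.0, 0.0, 1.0, 0.0, 0.0]  # a heat spike between fixed cold ends
unstable = euler_heat_step(u0, alpha=1.0, dt=0.1, dx=0.1)  # r = 10: blows up
stable = propagate(u0, alpha=1.0, dt_frame=0.1, dx=0.1)    # 20 sub-steps, r = 0.5
print(max(abs(v) for v in unstable))  # amplitude grows far past the initial 1.0
print(max(abs(v) for v in stable))    # amplitude decays, as diffusion should
```

With the single coarse step, gradient-based recovery of alpha would chase the unphysical blow-up; sub-stepping keeps the frozen operator faithful to the PDE, which is why the parameter estimate stays physical.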

new ICaRus: Identical Cache Reuse for Efficient Multi Model Inference

Authors: Sunghyeon Woo, Jaeeun Kil, Hoseung Kim, Minsub Kim, Joonghoon Kim, Ahreum Seo, Sungjae Lee, Minjung Jo, Jiwon Ryu, Baeseong Park, Se Jung Kwon, Dongsoo Lee

Abstract: Multi model inference has recently emerged as a prominent paradigm, particularly in the development of agentic AI systems. However, in such scenarios, each model must maintain its own Key-Value (KV) cache for the identical prompt, leading to substantial memory consumption. This explosive growth of KV caches forces LLM serving systems to evict previously stored caches, which in turn introduces significant recomputation overhead whenever the evicted caches are required again. Moreover, prefix caching is inherently infeasible across different models, forcing each model to recompute the KV cache for the identical prompt, which leads to significant overhead. To alleviate these issues, we propose Identical Cache Reuse (ICaRus), a novel architecture that allows multiple models to share identical KV caches across all layers. ICaRus is based on the key observation that a decoder-only Transformer can be conceptually decomposed into a logical encoder, which generates KV caches, and a logical decoder, which predicts output tokens from the KV caches. ICaRus fine-tunes only the logical decoder while freezing the logical encoder, enabling multiple models to share an identical KV cache. This eliminates cache memory explosion and unexpected evictions while also allowing cross-model reuse of KV caches for new input tokens, thereby removing redundant recomputation in multi model inference and achieving both efficiency and scalability. Moreover, by incorporating lightweight adapters such as LoRA, ICaRus parallelizes KV cache generation and next-token prediction during decoding. ICaRus achieves comparable accuracy to task-specific fine-tuned models across a diverse set of tasks, while allowing multiple specialized models to fully share KV caches. ICaRus achieves up to 11.1x lower P95 latency and 3.8x higher throughput in a multi agent workflow with 8 different models, compared to a conventional multi model system.

new FedTreeLoRA: Reconciling Statistical and Functional Heterogeneity in Federated LoRA Fine-Tuning

Authors: Jieming Bian, Lei Wang, Letian Zhang, Jie Xu

Abstract: Federated Learning (FL) with Low-Rank Adaptation (LoRA) has become a standard for privacy-preserving LLM fine-tuning. However, existing personalized methods predominantly operate under a restrictive Flat-Model Assumption: they address client-side \textit{statistical heterogeneity} but treat the model as a monolithic block, ignoring the \textit{functional heterogeneity} across LLM layers. We argue that these two dimensions, statistical (horizontal) and functional (vertical), are \textit{orthogonal in source yet coupled in interaction}, implying that the optimal depth of parameter sharing is functionally dependent on client similarity. To address this, we propose \textbf{FedTreeLoRA}, a framework employing tree-structured aggregation for fine-grained, layer-wise alignment. By dynamically constructing an aggregation hierarchy, FedTreeLoRA allows clients to share broad consensus on shallow `trunks' while progressively specializing on deep `branches'. Experiments on NLU and NLG benchmarks demonstrate that FedTreeLoRA significantly outperforms state-of-the-art methods by effectively reconciling generalization and personalization.

new Do Diffusion Models Dream of Electric Planes? Discrete and Continuous Simulation-Based Inference for Aircraft Design

Authors: Aurelien Ghiglino, Daniel Elenius, Anirban Roy, Ramneet Kaur, Manoj Acharya, Colin Samplawski, Brian Matejek, Susmit Jha, Juan Alonso, Adam Cobb

Abstract: In this paper, we generate conceptual engineering designs of electric vertical take-off and landing (eVTOL) aircraft. We follow the paradigm of simulation-based inference (SBI), whereby we look to learn a posterior distribution over the full eVTOL design space. To learn this distribution, we sample over discrete aircraft configurations (topologies) and their corresponding set of continuous parameters. Therefore, we introduce a hierarchical probabilistic model consisting of two diffusion models. The first model leverages recent work on Riemannian Diffusion Language Modeling (RDLM) and Unified World Models (UWMs) to enable us to sample topologies from a discrete and continuous space. For the second model we introduce a masked diffusion approach to sample the corresponding parameters conditioned on the topology. Our approach rediscovers known trends and governing physical laws in aircraft design, while significantly accelerating design generation.

new Brittlebench: Quantifying LLM robustness via prompt sensitivity

Authors: Angelika Romanou, Mark Ibrahim, Candace Ross, Chantal Shaib, Kerem Okta, Sam Bell, Elia Ovalle, Jesse Dodge, Antoine Bosselut, Koustuv Sinha, Adina Williams

Abstract: Existing evaluation methods largely rely on clean, static benchmarks, which can overestimate true model performance by failing to capture the noise and variability inherent in real-world user inputs. This is especially true for language models, which can face human-generated text queries containing mistakes, typos, or alternative ways of phrasing the same question. In this work, we introduce a theoretical framework for quantifying model sensitivity to prompt variants, or brittleness, that enables us to disentangle data-induced difficulty from prompt-related variability. Using this framework, we design a novel evaluation pipeline, Brittlebench, to holistically evaluate the sensitivity of frontier models. We apply semantics-preserving perturbations to a suite of popular benchmarks, and observe model performance degrade by as much as 12%. However, these perturbations do not affect all models equally: even a single perturbation alters the relative ranking of models in 63% of cases, impacting conclusions about comparative model performance. Decomposing the total variance of both state-of-the-art open-weight and commercial models, we find that semantics-preserving input perturbations can account for up to half of the performance variance for a given model. Brittlebench highlights the need for more robust evaluations and models, and allows us to systematically understand model brittleness.
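A semantics-preserving perturbation in this spirit can be as simple as an adjacent-character swap, the kind of typo a human might plausibly make. The sketch below is illustrative, not the benchmark's actual perturbation pipeline; the brittleness signal would come from measuring accuracy spread across such variants.

```python
import random

# One family of semantics-preserving perturbations: swap a random
# adjacent pair of letters inside a word (a plausible human typo).
def swap_typo(text, rng):
    idxs = [i for i in range(len(text) - 1)
            if text[i].isalpha() and text[i + 1].isalpha()]
    if not idxs:
        return text
    i = rng.choice(idxs)
    return text[:i] + text[i + 1] + text[i] + text[i + 2:]

rng = random.Random(0)
prompt = "What is the capital of France?"
variants = [swap_typo(prompt, rng) for _ in range(3)]
for v in variants:
    print(v)
# A robust model should answer every variant the way it answers the
# original; variance in accuracy across variants is the brittleness signal.
```

Decomposing total score variance into a data-difficulty term and a perturbation term, as the framework proposes, then separates hard questions from brittle models.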

new From Stochastic Answers to Verifiable Reasoning: Interpretable Decision-Making with LLM-Generated Code

Authors: Anirudh Jaidev Mahesh, Ben Griffin, Fuat Alican, Joseph Ternasky, Zakari Salifu, Kelvin Amoaba, Yagiz Ihlamur, Aaron Ontoyin Yin, Aikins Laryea, Afriyie Samuel, Yigit Ihlamur

Abstract: Large language models (LLMs) are increasingly used for high-stakes decision-making, yet existing approaches struggle to reconcile scalability, interpretability, and reproducibility. Black-box models obscure their reasoning, while recent LLM-based rule systems rely on per-sample evaluation, causing costs to scale with dataset size and introducing stochastic, hallucination-prone outputs. We propose reframing LLMs as code generators rather than per-instance evaluators. A single LLM call generates executable, human-readable decision logic that runs deterministically over structured data, eliminating per-sample LLM queries while enabling reproducible and auditable predictions. We combine code generation with automated statistical validation using precision lift, binomial significance testing, and coverage filtering, and apply cluster-based gap analysis to iteratively refine decision logic without human annotation. We instantiate this framework in venture capital founder screening, a rare-event prediction task with strong interpretability requirements. On VCBench, a benchmark of 4,500 founders with a 9% base success rate, our approach achieves 37.5% precision and an F0.5 score of 25.0%, outperforming GPT-4o (at 30.0% precision and an F0.5 score of 25.7%) while maintaining full interpretability. Each prediction traces to executable rules over human-readable attributes, demonstrating verifiable and interpretable LLM-based decision-making in practice.
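The statistical validation step the abstract names (precision lift, binomial significance testing, coverage filtering) can be sketched in a few lines. The thresholds and function names below are illustrative assumptions, not the paper's exact values:

```python
import math

def binom_p_value(successes: int, trials: int, base_rate: float) -> float:
    """One-sided P(X >= successes) under Binomial(trials, base_rate)."""
    return sum(
        math.comb(trials, k) * base_rate**k * (1 - base_rate) ** (trials - k)
        for k in range(successes, trials + 1)
    )

def validate_rule(hits: int, covered: int, base_rate: float,
                  min_lift: float = 2.0, min_coverage: int = 30,
                  alpha: float = 0.05) -> bool:
    """Keep a generated rule only if its precision lift over the base rate
    is large, statistically significant, and backed by enough samples."""
    if covered < min_coverage:
        return False                    # coverage filter
    precision = hits / covered
    lift = precision / base_rate
    return lift >= min_lift and binom_p_value(hits, covered, base_rate) < alpha

# e.g. a rule covering 40 founders with 15 successes, against a 9% base rate
print(validate_rule(15, 40, 0.09))      # True
```

Because the generated decision logic is plain code, this filter runs deterministically over the whole dataset with no further LLM calls.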

new RelayCaching: Accelerating LLM Collaboration via Decoding KV Cache Reuse

Authors: Yingsheng Geng, Yuchong Gao, Weihong Wu, Guyue Liu, Jiang Liu

Abstract: The increasing complexity of AI tasks has shifted the paradigm from monolithic models toward multi-agent large language model (LLM) systems. However, these collaborative architectures introduce a critical bottleneck: redundant prefill computation for shared content generated by previous agents, which significantly increases KV cache memory usage and time-to-first-token (TTFT). While various KV cache methods have been proposed to mitigate prefill redundancy, they either fail to maintain accuracy on agent-generated outputs or exhibit low reuse rates due to rigid constraints. We present RelayCaching, a training-free inference method that directly reuses decoding-phase KV caches from previous agents in subsequent prefill phases. Our key insight is that KV caches for identical content are highly consistent across phases, while prefix-induced deviations are sparse and localized within a limited range of layers and token positions. By selectively recomputing KV caches at these positions, RelayCaching preserves model accuracy with minimal overhead, yielding a superior accuracy-efficiency trade-off over existing methods. Experiments on diverse collaborative LLM tasks spanning mathematical reasoning, general knowledge, and code generation demonstrate that RelayCaching achieves over 80% KV cache reuse and reduces TTFT by up to $4.7\times$ compared to the standard pipeline, all with negligible accuracy degradation.
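The reuse idea can be caricatured in a few lines: keep the previous agent's decoding-phase KV entries and recompute only the positions where prefix-induced deviations concentrate. The fixed `boundary` placement below is an illustrative stand-in for the paper's selective-recomputation rule, not its actual layer/position selection:

```python
def relay_merge(cached_kv, recompute, boundary=4):
    """Merge a prior agent's decoding-phase KV cache into the current
    prefill: positions < `boundary` are recomputed (where prefix-induced
    deviations are assumed to concentrate); the rest are reused verbatim.
    Returns the merged cache and the fraction of entries reused."""
    merged = [recompute(pos) if pos < boundary else vec
              for pos, vec in enumerate(cached_kv)]
    reuse_rate = 1 - min(boundary, len(cached_kv)) / len(cached_kv)
    return merged, reuse_rate

# toy 1-D "KV vectors" for 20 tokens; recompute flips the sign for visibility
merged, rate = relay_merge(list(range(20)), recompute=lambda p: -p)
print(rate)   # 0.8
```

With a 4-token boundary over 20 cached tokens, 80% of entries are reused, mirroring the reuse rates the paper reports.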

new FedUAF: Uncertainty-Aware Fusion with Reliability-Guided Aggregation for Multimodal Federated Sentiment Analysis

Authors: Xianxun Zhu, Zezhong Sun, Imad Rida, Erik Cambria, Junqi Su, Rui Wang, Hui Chen

Abstract: Multimodal sentiment analysis in federated learning environments faces significant challenges due to missing modalities, heterogeneous data distributions, and unreliable client updates. Existing federated approaches often struggle to maintain robust performance under these practical conditions. In this paper, we propose FedUAF, a unified multimodal federated learning framework that addresses these challenges through uncertainty-aware fusion and reliability-guided aggregation. FedUAF explicitly models modality-level uncertainty during local training and leverages client reliability to guide global aggregation, enabling effective learning under incomplete and noisy multimodal data. Extensive experiments on CMU-MOSI and CMU-MOSEI demonstrate that FedUAF consistently outperforms state-of-the-art federated baselines across various missing-modality patterns and Non-IID settings. Moreover, FedUAF exhibits superior robustness against noisy clients, highlighting its potential for real-world multimodal federated applications.

new Pragma-VL: Towards a Pragmatic Arbitration of Safety and Helpfulness in MLLMs

Authors: Ming Wen, Kun Yang, Xin Chen, Jingyu Zhang, Dingding Han, Shiwen Cui, Yuedong Xu

Abstract: Multimodal Large Language Models (MLLMs) pose critical safety challenges, as they are susceptible not only to adversarial attacks such as jailbreaking but also to inadvertently generating harmful content for benign users. While internal safety alignment via Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) is a primary mitigation strategy, current methods often face a safety-utility trade-off: they either refuse benign queries out of excessive caution or overlook latent risks in cross-modal interactions. To resolve this, we introduce Pragma-VL, an end-to-end alignment algorithm that enables MLLMs to pragmatically arbitrate between safety and helpfulness. First, we enhance visual risk perception with a novel cold-start SFT stage. This is achieved by applying risk-aware clustering to the visual encoder and using an interleaved dataset of risk descriptions and high-quality data. Second, we introduce a theoretically-guaranteed reward model that leverages synergistic learning. We train it with a novel data augmentation method that assigns dynamic weights based on the queries, enabling contextual arbitration between safety and helpfulness. Extensive experiments show that Pragma-VL effectively balances safety and helpfulness, outperforming baselines by 5% to 20% on most multimodal safety benchmarks while preserving its general capabilities in areas such as mathematics and knowledge reasoning.

new A Robust Framework for Secure Cardiovascular Risk Prediction: An Architectural Case Study of Differentially Private Federated Learning

Authors: Rodrigo Tertulino, Laércio Alencar

Abstract: Accurate cardiovascular risk prediction is crucial for preventive healthcare; however, the development of robust Artificial Intelligence (AI) models is hindered by the fragmentation of clinical data across institutions due to stringent privacy regulations. This paper presents a comprehensive architectural case study validating the engineering robustness of FedCVR, a privacy-preserving Federated Learning framework applied to heterogeneous clinical networks. Rather than proposing a new theoretical optimizer, this work focuses on a systems engineering analysis to quantify the operational trade-offs of server-side adaptive optimization under utility-prioritized Differential Privacy (DP). By conducting a rigorous stress test in a high-fidelity synthetic environment calibrated against real-world datasets (Framingham, Cleveland), we systematically evaluate the system's resilience to statistical noise. The validation results demonstrate that integrating server-side momentum as a temporal denoiser allows the architecture to achieve a stable F1-score of 0.84 and an Area Under the Curve (AUC) of 0.96, statistically outperforming standard stateless baselines. Our findings confirm that server-side adaptivity is a structural prerequisite for recovering clinical utility under realistic privacy budgets, providing a validated engineering blueprint for secure multi-institutional collaboration.

new ICPRL: Acquiring Physical Intuition from Interactive Control

Authors: Xinrun Xu, Pi Bu, Ye Wang, Börje F. Karlsson, Ziming Wang, Tengtao Song, Qi Zhu, Jun Song, Shuo Zhang, Zhiming Ding, Bo Zheng

Abstract: VLMs excel at static perception but falter in interactive reasoning in dynamic physical environments, which demands planning and adaptation to dynamic outcomes. Existing physical reasoning methods often depend on abstract symbolic inputs or lack the ability to learn and adapt from direct, pixel-based visual interaction in novel scenarios. We introduce ICPRL (In-Context Physical Reinforcement Learning), a framework inspired by In-Context Reinforcement Learning (ICRL) that empowers VLMs to acquire physical intuition and adapt their policies in-context. Our approach trains a vision-grounded policy model via multi-turn Group Relative Policy Optimization (GRPO) over diverse multi-episode interaction histories. This enables the agent to adapt strategies by conditioning on past trial-and-error sequences, without requiring any weight updates. This adaptive policy works in concert with a separately trained world model that provides explicit physical reasoning by predicting the results of potential actions. At inference, the policy proposes candidate actions, while the world model predicts outcomes to guide a root-node PUCT search to select the most promising action. Evaluated on the diverse physics-based puzzle-solving tasks in the DeepPHY benchmark, ICPRL demonstrates significant improvements across both its policy-only and its world-model-augmented stages. Notably, these gains are retained in unseen physical environments, demonstrating that our framework facilitates genuine in-context acquisition of the environment's physical dynamics from interactive experience.

new Enhanced Atrial Fibrillation Prediction in ESUS Patients with Hypergraph-based Pre-training

Authors: Yuzhang Xie, Yuhua Wu, Ruiyu Wang, Fadi Nahab, Xiao Hu, Carl Yang

Abstract: Atrial fibrillation (AF) is a major complication following embolic stroke of undetermined source (ESUS), elevating the risk of recurrent stroke and mortality. Early identification is clinically important, yet existing tools face limitations in accuracy, scalability, and cost. Machine learning (ML) offers promise but is hindered by small ESUS cohorts and high-dimensional medical features. To address these challenges, we introduce supervised and unsupervised hypergraph-based pre-training strategies to improve AF prediction in ESUS patients. We first pre-train hypergraph-based patient embedding models on a large stroke cohort (7,780 patients) to capture salient features and higher-order interactions. The resulting embeddings are transferred to a smaller ESUS cohort (510 patients), reducing feature dimensionality while preserving clinically meaningful information, enabling effective prediction with lightweight models. Experiments show that both pre-training approaches outperform traditional models trained on raw data, improving accuracy and robustness. This framework offers a scalable and efficient solution for AF risk prediction after stroke.

new FusionCast: Enhancing Precipitation Nowcasting with Asymmetric Cross-Modal Fusion and Future Radar Priors

Authors: Henan Wang, Shengwu Xiong, Yifang Zhang, Wenjie Yin, Chen Zhou, Yuqiang Zhang, Pengfei Duan

Abstract: Deep learning has significantly improved the accuracy of precipitation nowcasting. However, most existing multimodal models use simple channel concatenation or interpolation for data fusion, which overlooks the feature differences between modalities. This paper therefore proposes a novel precipitation nowcasting optimisation framework called FusionCast. This framework incorporates three types of data: historical precipitable water vapour (PWV) data derived from global navigation satellite system (GNSS) inversions, historical radar-based quantitative precipitation estimation (QPE), and forecasted radar QPE serving as a future prior. The FusionCast model comprises two core modules: the future-prior radar QPE processing module, which forecasts future radar data, and the Radar PWV Fusion (RPF) module, which uses a gating mechanism to efficiently combine features from various sources. Experimental results show that FusionCast significantly improves nowcasting performance.

new DreamReader: An Interpretability Toolkit for Text-to-Image Models

Authors: Nirmalendu Prakash, Narmeen Oozeer, Michael Lan, Luka Samkharadze, Phillip Howard, Roy Ka-Wei Lee, Dhruv Nathawani, Shivam Raval, Amirali Abdullah

Abstract: Despite the rapid adoption of text-to-image (T2I) diffusion models, causal and representation-level analysis remains fragmented and largely limited to isolated probing techniques. To address this gap, we introduce DreamReader: a unified framework that formalizes diffusion interpretability as composable representation operators spanning activation extraction, causal patching, structured ablations, and activation steering across modules and timesteps. DreamReader provides a model-agnostic abstraction layer enabling systematic analysis and intervention across diffusion architectures. Beyond consolidating existing methods, DreamReader introduces three novel intervention primitives for diffusion models: (1) representation fine-tuning (LoReFT) for subspace-constrained internal adaptation; (2) classifier-guided gradient steering using MLP probes trained on activations; and (3) component-level cross-model mapping for systematic study of transferability of representations across modalities. These mechanisms allow lightweight white-box interventions on T2I models, drawing inspiration from interpretability techniques for LLMs. We demonstrate DreamReader through controlled experiments that (i) perform activation stitching between two models, and (ii) apply LoReFT to steer multiple activation units, reliably injecting a target concept into the generated images. Experiments are specified declaratively and executed in controlled batched pipelines to enable reproducible large-scale analysis. Across multiple case studies, we show that techniques adapted from language model interpretability yield promising and controllable interventions in diffusion models. DreamReader is released as an open source toolkit for advancing research on T2I interpretability.

new Machine Learning Models to Identify Promising Nested Antiresonance Nodeless Fiber Designs

Authors: Rania A. Eltaieb, Sophie LaRochelle, Leslie A. Rusch

Abstract: Hollow-core fibers offer superior loss and latency characteristics compared to solid-core alternatives, yet the geometric complexity of nested antiresonance nodeless fibers (NANFs) makes traditional optimization computationally prohibitive. We propose a high-efficiency, two-stage machine learning framework designed to identify high-performance NANF designs using minimal training data. The model employs a neural network (NN) classifier to filter for single-mode designs (suppression ratio $\ge$ 50 dB), followed by a regressor that predicts confinement loss (CL). By training on the common logarithm of the loss, the regressor overcomes the challenges of high dynamic range. Using a sparse data set of only 1,819 designs, all with CL greater than or equal to 1 dB/km, the model successfully identified optimized designs with a confirmed CL of 0.25 dB/km. This demonstrates that the NN has captured the underlying physical behavior and is able to extrapolate to regions of lower CL. We show that small data sets are sufficient for stable, high-accuracy performance prediction, enabling the exploration of design spaces as large as $14 \times 10^{6}$ cases at a negligible computational cost compared to finite element methods.
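A skeleton of the two-stage screen, including the log10 target transform that tames the loss's dynamic range. The callables and cutoff are placeholders for the trained NN classifier and regressor, not the paper's models:

```python
def two_stage_screen(designs, is_single_mode, predict_log10_cl,
                     cutoff_db_per_km=1.0):
    """Stage 1: keep designs classified as single-mode (suppression >= 50 dB).
    Stage 2: regress confinement loss in log10 space, invert back to dB/km,
    and return the designs predicted below the cutoff, best first."""
    survivors = [d for d in designs if is_single_mode(d)]
    scored = [(10 ** predict_log10_cl(d), d) for d in survivors]
    scored.sort(key=lambda t: t[0])
    return [d for cl, d in scored if cl <= cutoff_db_per_km]

# toy stand-ins: design = (gap, thickness); smaller gap -> lower predicted CL
designs = [(0.1, 1), (0.5, 1), (0.9, 1)]
picks = two_stage_screen(designs,
                         is_single_mode=lambda d: d[1] == 1,
                         predict_log10_cl=lambda d: d[0] - 0.6)
print(picks)   # [(0.1, 1), (0.5, 1)]
```

Predicting `log10(CL)` rather than CL itself keeps targets on a comparable scale whether a design loses 100 dB/km or 0.25 dB/km.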

new Evidence-based Distributional Alignment for Large Language Models

Authors: Viet-Thanh Pham, Lizhen Qu, Zhuang Li, Gholamreza Haffari

Abstract: Distributional alignment enables large language models (LLMs) to predict how a target population distributes its responses across answer options, rather than collapsing disagreement into a single consensus answer. However, existing LLM-based distribution prediction is often unstable and degrades under cultural and domain shift. Token score-based estimates can change with minor option wording or formatting, response sampling-based estimates are expensive and sensitive to prompts and decoding settings, and directly generated distributions are frequently miscalibrated. We propose Evi-DA, an evidence-based alignment technique that improves the fidelity and robustness of LLM-based distribution estimation under domain and cultural shift. Given a target country and a multiple-choice question, Evi-DA retrieves related World Values Survey items and their answer distributions, predicts a coarse Welzel value signature for each option, and infers the country-conditioned answer distribution in a structured format. We train the LLMs using a two-stage pipeline, where reinforcement learning optimizes survey-derived rewards that encourage accurate intermediate value predictions, faithful final distributions, well-formed structured outputs, and reduced cultural bias. Across in-domain and out-of-domain benchmarks and multiple open-source backbones, Evi-DA reduces Jensen-Shannon divergence between predicted and gold distributions relative to strong baselines, with average relative improvements of up to 44%.

new Task Expansion and Cross Refinement for Open-World Conditional Modeling

Authors: Shreyas Bhat Brahmavar, Qiyang Liu, Yang Li, Junier Oliva

Abstract: Open-world conditional modeling (OCM) requires a single model to answer arbitrary conditional queries across heterogeneous datasets, where observed variables and targets vary and arise from a vast open-ended task universe. Because any finite collection of real-world datasets covers only a small fraction of this space, we propose Task Expansion and Cross Refinement (TEXR), a semi-supervised framework that enlarges effective task coverage through structured synthesis and refinement of semantic data contexts. TEXR first generates diverse uninstantiated dataset schemas and weakly instantiates them via structured probabilistic generators guided by large language models. It then performs cross-model refinement by training on disjoint data partitions and revising synthetic values across splits to reduce confirmation bias and improve pseudo-value quality. The refined synthetic datasets are aggregated with real data to train a unified conditional model. Across heterogeneous tabular benchmarks, TEXR consistently improves zero-, few-, and many-shot performance for multiple OCM backbones, demonstrating that structured task expansion and cross refinement enhance open-world conditional modeling.

new Preventing Curriculum Collapse in Self-Evolving Reasoning Systems

Authors: Vaibhav Mishra

Abstract: Self-evolving reasoning frameworks let LLMs improve their reasoning capabilities by iteratively generating and solving problems without external supervision, using verifiable rewards. Ideally, such systems are expected to explore a diverse problem space and propose new challenges of high learning value. While prior work has largely focused on solver-side optimisation and verification, recent evidence suggests that self-evolving systems can exhibit diversity collapse in posing new problems after just a few iterations, even when surface-level variation is preserved. We introduce Prism, a question-centric self-evolution method that directly tackles this collapse. Prism defines a persistent diversity signal over an embedding-induced semantic partition of mathematical problems and uses it to encourage balanced exploration of underrepresented regions across iterations. This coverage signal is combined with a Zone-of-Proximal-Development (ZPD) gate to preserve edge-of-solvability difficulty. Evaluated on seven widely used mathematical reasoning benchmarks against five self-evolving baselines, Prism achieves the highest accuracy on six out of seven tasks, achieving gains of +3.98 absolute points over R-Zero on AMC and +3.68 on Minerva Math. Prism also generates semantically diverse and challenging questions across iterations, resulting in the construction of the Prism-Math dataset comprising 100k mathematical questions. These results demonstrate that cross-iteration semantic coverage is a high-leverage and under-explored axis for building more capable self-evolving reasoners. We release the code, dataset, and models to facilitate further research.
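The two gates the abstract describes, a coverage signal over semantic cells and a Zone-of-Proximal-Development difficulty gate, reduce to a few lines. Cell names and thresholds below are illustrative, not Prism's actual values:

```python
def next_cell(cell_counts):
    """Coverage signal: target the least-represented semantic cell so the
    question proposer is steered toward underrepresented regions."""
    return min(cell_counts, key=cell_counts.get)

def zpd_gate(solver_success_rate, low=0.2, high=0.8):
    """Keep a proposed problem only when the solver succeeds sometimes but
    not always, i.e. it sits at the edge of solvability."""
    return low <= solver_success_rate <= high

counts = {"algebra": 41, "geometry": 7, "combinatorics": 23}
print(next_cell(counts))               # geometry
print(zpd_gate(0.5), zpd_gate(0.95))   # True False
```

Applied across iterations, the coverage signal counters the diversity collapse the paper documents, while the ZPD gate discards problems that are trivially easy or unsolvable for the current solver.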

new Neural Approximation and Its Applications

Authors: Wei-Hao Wu, Ting-Zhu Huang, Xi-Le Zhao, Yisi Luo, Deyu Meng

Abstract: Multivariate function approximation is a fundamental problem in machine learning. Classic multivariate function approximations rely on hand-crafted basis functions (e.g., polynomial and Fourier bases), which limits their approximation ability and data adaptability, resulting in unsatisfactory performance. To address these challenges, we introduce the neural basis function, which leverages an untrained neural network as the basis function. Equipped with the proposed neural basis function, we suggest the neural approximation (NeuApprox) paradigm for multivariate function approximation. Specifically, the underlying multivariate function behind the multi-dimensional data is decomposed into a sum of block terms. Each block term has a clear physical interpretation as the product of expressive neural basis functions and their corresponding learnable coefficients, which allows us to faithfully capture distinct components of the underlying data and to flexibly adapt to new data by fine-tuning the neural basis functions. Owing to these elaborately designed block terms, NeuApprox enjoys stronger approximation ability and more flexible data adaptation than hand-crafted basis function-based methods. We also theoretically prove that NeuApprox can approximate any multivariate continuous function to arbitrary accuracy. Extensive experiments on diverse multi-dimensional datasets (including multispectral images, light field data, videos, traffic data, and point cloud data) demonstrate the promising performance of NeuApprox in terms of both approximation capability and adaptability.

new Linear Predictability of Attention Heads in Large Language Models

Authors: Khalid Shaikh, Asmit Kumar Singh, Rebecca Christopher Dsouza, Shikhar Shiromani

Abstract: Large language model (LLM) inference is increasingly bottlenecked by the Key-Value (KV) cache, yet the fine-grained structure of attention-head activations remains poorly understood. We show that pretrained Transformers exhibit a pervasive inter-head linear structure: for a given token, the Query, Key, and Value (QKV) vectors of an attention head can often be reconstructed as a linear combination of a small number of peer heads, typically within the same layer. Across Llama-3.1-8B, Falcon3-10B, OLMo-2-7B, and Qwen3-32B, just 2-5 reference heads recover many target heads with high fidelity (e.g., mean $R^2 \approx 0.76$ for Keys on C4 with five references, and frequently $R^2 > 0.85$ on GSM8K). This predictability is learned rather than architectural: it is largely absent at random initialization, rises rapidly during pretraining as we track through OLMo-2 checkpoints, and is supported by a theoretical lower bound showing high mean-squared error for linear prediction at initialization. We further connect this emergence to increasing intra-layer alignment of Key projection subspaces. Finally, we exploit this redundancy for efficiency by caching only reference-head KV states and reconstructing the remaining heads on the fly via lightweight linear maps, achieving $2\times$ KV-cache reduction with model-dependent accuracy trade-offs (4.5-5.5 percentage point average drop on Falcon3-10B and Qwen3-32B across five benchmarks, and larger drops on Llama-3.1-8B), and we find that reconstructing Keys is substantially less harmful than reconstructing Values.
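The core measurement, reconstructing one head's vectors as a least-squares combination of peer heads and scoring the fit with $R^2$, can be reproduced in plain Python. This is an illustrative re-implementation of the idea, not the paper's code:

```python
def fit_weights(refs, target):
    """Least-squares w minimizing ||target - sum_i w_i * refs[i]||^2,
    via the normal equations and Gaussian elimination."""
    k = len(refs)
    A = [[sum(a * b for a, b in zip(refs[i], refs[j])) for j in range(k)]
         for i in range(k)]
    y = [sum(r * t for r, t in zip(refs[i], target)) for i in range(k)]
    for col in range(k):                       # forward elimination
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        y[col], y[piv] = y[piv], y[col]
        for row in range(col + 1, k):
            f = A[row][col] / A[col][col]
            for c in range(col, k):
                A[row][c] -= f * A[col][c]
            y[row] -= f * y[col]
    w = [0.0] * k
    for row in range(k - 1, -1, -1):           # back substitution
        tail = sum(A[row][c] * w[c] for c in range(row + 1, k))
        w[row] = (y[row] - tail) / A[row][row]
    return w

def r_squared(refs, target):
    """Fraction of the target head's variance explained by the references."""
    w = fit_weights(refs, target)
    pred = [sum(wi * ref[t] for wi, ref in zip(w, refs))
            for t in range(len(target))]
    mu = sum(target) / len(target)
    ss_res = sum((a - b) ** 2 for a, b in zip(target, pred))
    ss_tot = sum((a - mu) ** 2 for a in target)
    return 1 - ss_res / ss_tot

# a target "head" that is exactly 2*ref0 + 1*ref1 is perfectly recovered
refs = [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 1.0, 3.0]]
target = [2.0, 1.0, 5.0, 5.0]
print(round(r_squared(refs, target), 6))   # 1.0
```

In the paper's efficiency application, the fitted weights become the lightweight linear maps that reconstruct non-cached heads from the cached reference heads at inference time.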

new Evaluating Large Language Models for Gait Classification Using Text-Encoded Kinematic Waveforms

Authors: Carlo Dindorf, Jonas Dully, Rebecca Keilhauer, Michael Lorenz, Michael Fröhlich

Abstract: Background: Machine learning (ML) enhances gait analysis but often lacks the level of interpretability desired for clinical adoption. Large Language Models (LLMs) may offer explanatory capabilities and confidence-aware outputs when applied to structured kinematic data. This study therefore evaluated whether general-purpose LLMs can classify continuous gait kinematics when represented as textual numeric sequences and how their performance compares to conventional ML approaches. Methods: Lower-body kinematics were recorded from 20 participants performing seven gait patterns. A supervised KNN classifier and a class-independent One-Class SVM (OCSVM) were compared against zero-shot LLMs (GPT-5, GPT-5-mini, GPT-4.1, and o4-mini). Models were evaluated using Leave-One-Subject-Out (LOSO) cross-validation. LLMs were tested both with and without explicit reference gait statistics. Results: The supervised KNN achieved the highest performance (multiclass Matthews Correlation Coefficient, MCC = 0.88). The best-performing LLM (GPT-5) with reference grounding achieved a multiclass MCC of 0.70 and a binary MCC of 0.68, outperforming the class-independent OCSVM (binary MCC = 0.60). Performance of the LLM was highly dependent on explicit reference information and self-rated confidence; when restricted to high-confidence predictions, multiclass MCC increased to 0.83 on the filtered subset. Notably, the computationally efficient o4-mini model performed comparably to larger models. Conclusion: When continuous kinematic waveforms were encoded as textual numeric tokens, general-purpose LLMs, even with reference grounding, did not match supervised multiclass classifiers for precise gait classification and are better regarded as exploratory systems requiring cautious, human-guided interpretation rather than diagnostic use.

new Residual Stream Analysis of Overfitting And Structural Disruptions

Authors: Quan Liu, Han Zhou, Wenquan Wu, Hua Wu, Sen Su

Abstract: Ensuring that large language models (LLMs) remain both helpful and harmless poses a significant challenge: fine-tuning on repetitive safety datasets, where unsafe prompts are paired with standard refusal templates, often leads to false refusals, in which benign queries are declined. We first quantify this effect, showing that safety data exhibits substantially lower token entropy and 2-gram diversity (0.048) compared to general instruction data. To uncover the root cause, we introduce FlowLens, a stable PCA-based tool for residual-stream geometry analysis, and reveal that higher proportions of safety examples concentrate variance along a few components, reducing representational smoothness and driving false refusals (false refusal rate rises from 63 percent to 84 percent as safety data increases from 0 percent to 40 percent). Guided by these insights, we propose Variance Concentration Loss (VCL), an auxiliary regularizer that penalizes excessive variance concentration in mid-layer residuals. Empirical results demonstrate that VCL reduces false refusals by over 35 percentage points while maintaining or improving performance on general benchmarks such as MMLU and GSM8K.
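The quantity FlowLens monitors, how much residual-stream variance concentrates in the top principal component, has a closed form in two dimensions. A sketch in that restricted setting (the 2-D case and the penalty threshold are our simplifications of the paper's PCA-based analysis):

```python
def variance_concentration(xs, ys):
    """Fraction of total variance captured by the top principal component
    of 2-D activations (closed form for a 2x2 covariance matrix)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cxx = sum((x - mx) ** 2 for x in xs) / n
    cyy = sum((y - my) ** 2 for y in ys) / n
    cxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    half_gap = (((cxx - cyy) / 2) ** 2 + cxy ** 2) ** 0.5
    top = (cxx + cyy) / 2 + half_gap          # largest eigenvalue
    return top / (cxx + cyy)

def vcl_penalty(xs, ys, threshold=0.7):
    """A VCL-style regularizer term: penalize concentration above a target
    (the threshold is illustrative; the paper's exact form may differ)."""
    return max(0.0, variance_concentration(xs, ys) - threshold)
```

Isotropic activations score 0.5 (no concentration) and collinear activations score 1.0; the penalty pushes mid-layer residuals away from the collapsed, refusal-prone end of that range.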

new LightningRL: Breaking the Accuracy-Parallelism Trade-off of Block-wise dLLMs via Reinforcement Learning

Authors: Yanzhe Hu, Yijie Jin, Pengfei Liu, Kai Yu, Zhijie Deng

Abstract: Diffusion Large Language Models (dLLMs) have emerged as a promising paradigm for parallel token generation, with block-wise variants garnering significant research interest. Despite their potential, existing dLLMs typically suffer from a rigid accuracy-parallelism trade-off: increasing the number of tokens per forward (TPF) via aggressive parallel decoding often leads to performance degradation and increased generation instability. We identify that this limitation stems from the model's inability to navigate high-parallelism regimes where approximation errors and local corruptions accumulate, ultimately undermining the reliability of parallel generation. To address this, we propose LightningRL, a post-training framework designed to directly optimize the speed-quality Pareto frontier of pre-trained dLLMs. Instead of forcing uniform parallelization, our approach leverages reinforcement learning to identify and reinforce high-parallelism trajectories that maintain generation accuracy. Built upon the Group Relative Policy Optimization (GRPO) framework, LightningRL introduces several enhancements tailored for dLLMs: (1) stabilized training via per-reward decoupled normalization; (2) token-level negative log-likelihood (NLL) regularization on correct trajectories to anchor model performance; and (3) a dynamic sampling strategy with TPF-aware filtering to enhance training efficiency. Experimental results across mathematical and coding benchmarks demonstrate that LightningRL consistently advances the Pareto frontier, achieving competitive task accuracy while significantly increasing parallelism, reaching an average TPF of 7.32 (with a peak of 11.10 on the MBPP dataset). Our code is available at https://github.com/SJTU-DENG-Lab/LightningRL.

URLs: https://github.com/SJTU-DENG-Lab/LightningRL.

new Modular Neural Computer

Authors: Florin Leon

Abstract: This paper introduces the Modular Neural Computer (MNC), a memory-augmented neural architecture for exact algorithmic computation on variable-length inputs. The model combines an external associative memory of scalar cells, explicit read and write heads, a controller multi-layer perceptron (MLP), and a homogeneous set of functional MLP modules. Rather than learning an algorithm end to end from data, it realizes a given algorithm through analytically specified neural components with fixed interfaces and exact behavior. The control flow is represented inside the neural computation through one-hot module gates, where inactive modules are inhibited. Computation unfolds as a sequence of memory transformations generated by a fixed graph. The architecture is illustrated through three case studies: computing the minimum of an array, sorting an array in place, and executing A* search on a fixed problem instance. These examples show that algorithmic procedures can be compiled into modular neural components with external memory while preserving deterministic behavior and explicit intermediate state.
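The one-hot module gating the abstract describes can be sketched directly. The toy modules are hypothetical, but the gating logic mirrors the description: every module runs, and the one-hot gate selects which transformation of memory survives while inactive modules are inhibited:

```python
def step(memory, modules, gate):
    """One computation step of an MNC-style machine: `gate` is a one-hot
    vector, so exactly one module's transformation of memory is kept and
    the other modules' outputs are gated out."""
    assert sum(gate) == 1 and all(g in (0, 1) for g in gate)
    outputs = [m(memory) for m in modules]   # all modules compute
    return outputs[gate.index(1)]            # inactive results discarded

# toy modules over a scalar-cell memory (hypothetical, for illustration)
inc_first = lambda mem: [mem[0] + 1] + mem[1:]
swap01 = lambda mem: [mem[1], mem[0]] + mem[2:]

mem = step([3, 7, 2], [inc_first, swap01], [0, 1])
print(mem)   # [7, 3, 2]
```

Chaining such steps under a fixed control graph yields the deterministic sequence of memory transformations the paper uses for its minimum, sorting, and A* case studies.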

new The Challenge of Out-Of-Distribution Detection in Motor Imagery BCIs

Authors: Merlijn Quincent Mulder, Matias Valdenegro-Toro, Andreea Ioana Sburlea, Ivo Pascal de Jong

Abstract: Machine Learning classifiers used in Brain-Computer Interfaces make classifications based on the distribution of data they were trained on. When they must make inferences on samples that fall outside this distribution, they can only make blind guesses. Instead of allowing random guesses, these Out-of-Distribution (OOD) samples should be detected and rejected. We study OOD detection in Motor Imagery BCIs by training a model on a subset of classes and observing whether unfamiliar classes can be detected based on increased uncertainty. We test seven OOD detection techniques, along with an additional method claimed to boost OOD detection quality. Our findings show that OOD detection for Brain-Computer Interfaces is more challenging than in other machine learning domains due to the high uncertainty inherent in classifying EEG signals. For many subjects, uncertainty for in-distribution classes can still be higher than for out-of-distribution classes. As a result, many OOD detection methods prove ineffective, though MC Dropout performed best. Additionally, we show that high in-distribution classification performance predicts high OOD detection performance, suggesting that improved accuracy can also lead to improved robustness. Our research demonstrates a setup for studying how models deal with unfamiliar EEG data and evaluates methods that are robust to these unfamiliar inputs. OOD detection can improve the overall safety and reliability of BCIs.
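Uncertainty-based rejection, the mechanism common to the methods studied, can be sketched with predictive entropy as the uncertainty score. The threshold is illustrative; in practice it must be tuned per subject, which is exactly what the paper shows is hard for noisy EEG:

```python
import math

def predictive_entropy(probs):
    """Shannon entropy of a classifier's softmax output, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def classify_or_reject(probs, entropy_threshold=0.9):
    """Reject a trial as out-of-distribution when the classifier is too
    uncertain; otherwise return the predicted class index."""
    if predictive_entropy(probs) > entropy_threshold:
        return None                     # OOD: refuse to classify
    return max(range(len(probs)), key=probs.__getitem__)

print(classify_or_reject([0.95, 0.05]))   # 0
print(classify_or_reject([0.5, 0.5]))     # None
```

The paper's core difficulty appears when in-distribution trials also produce near-uniform outputs: then no threshold cleanly separates familiar from unfamiliar classes.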

new Feature-level Interaction Explanations in Multimodal Transformers

Authors: Yeji Kim, Housam Khalifa Bashier Babiker, Mi-Young Kim, Randy Goebel

Abstract: Multimodal Transformers often produce predictions without clarifying how different modalities jointly support a decision. Most existing multimodal explainable AI (MXAI) methods extend unimodal saliency to multimodal backbones, highlighting important tokens or patches within each modality, but they rarely pinpoint which cross-modal feature pairs provide complementary evidence (synergy) or serve as reliable backups (redundancy). We present Feature-level I2MoE (FL-I2MoE), a structured Mixture-of-Experts layer that operates directly on token/patch sequences from frozen pretrained encoders and explicitly separates unique, synergistic, and redundant evidence at the feature level. We further develop an expert-wise explanation pipeline that combines attribution with top-K% masking to assess faithfulness, and we introduce Monte Carlo interaction probes to quantify pairwise behavior: the Shapley Interaction Index (SII) to score synergistic pairs and a redundancy-gap score to capture substitutable (redundant) pairs. Across three benchmarks (MMIMDb, ENRICO, and MMHS150K), FL-I2MoE yields more interaction-specific and concentrated importance patterns than a dense Transformer with the same encoders. Finally, pair-level masking shows that removing pairs ranked by SII or redundancy-gap degrades performance more than masking randomly chosen pairs under the same budget, supporting that the identified interactions are causally relevant.
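The Monte Carlo probe for the Shapley Interaction Index can be sketched with a toy value function. The uniform coalition sampling below is a simplification of the exact SII coalition weights, and `score` is a hypothetical stand-in for a model evaluated on feature subsets, not the paper's estimator.

```python
import random

def shapley_interaction(value, players, i, j, n_samples=200, seed=0):
    """Monte Carlo estimate of the Shapley Interaction Index for (i, j).

    Averages the discrete mixed difference
        v(S + {i,j}) - v(S + {i}) - v(S + {j}) + v(S)
    over random coalitions S of the remaining players. Uniform subset
    sampling here is a simplification of the exact SII weights.
    """
    rng = random.Random(seed)
    rest = [p for p in players if p not in (i, j)]
    total = 0.0
    for _ in range(n_samples):
        S = frozenset(p for p in rest if rng.random() < 0.5)
        total += (value(S | {i, j}) - value(S | {i})
                  - value(S | {j}) + value(S))
    return total / n_samples

# Toy score: features 0 and 1 only help when present together (pure synergy).
def score(S):
    return 1.0 if {0, 1} <= S else 0.0

sii = shapley_interaction(score, players=[0, 1, 2, 3], i=0, j=1)
print(sii)  # 1.0
```

A positive SII flags a synergistic pair; a redundant pair (either feature substituting for the other) would instead show a large negative mixed difference.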

new LUMINA: Laplacian-Unifying Mechanism for Interpretable Neurodevelopmental Analysis via Quad-Stream GCN

Authors: Minkyung Cha, Jooyoung Bae, Jaewon Jung, Ping Shu Ho, Ka Chun Cheung, Namjoon Kim

Abstract: Functional Magnetic Resonance Imaging (fMRI) has become a standard way of measuring brain activity, and the recent trend is shifting toward utilizing fMRI brain data for AI-driven diagnosis. Given that the brain functions not as a discrete collection of regions but as an interconnected whole, graph-based architectures, represented by the Graph Convolutional Network (GCN), have emerged as a dominant framework for such tasks, since they can treat ROIs as dynamically interconnected nodes and extract the relational architecture between them. Ironically, however, it is the very nature of the GCN's architecture that acts as an obstacle to its performance. The mathematical foundation of the GCN, effective for capturing global regularities, comes with a tradeoff: by repeatedly smoothing features across connected nodes, traditional GCNs tend to blur out the contrastive dynamics that can be crucial for identifying certain neurological disorders. To break through this structural bottleneck, we propose LUMINA, a Laplacian-Unifying Mechanism for Interpretable Neurodevelopmental Analysis. Our model is a Quad-Stream GCN that employs a bipolar ReLU activation and a dual-spectrum graph Laplacian filtering mechanism, thereby capturing heterogeneous dynamics that are often blurred out in conventional GCNs. By doing so, we preserve the diverse range and characteristics of neural connections in each fMRI dataset. Through 5-fold cross-validation on the ADHD200 (N=144) and ABIDE (N=579) datasets, LUMINA demonstrates stable diagnostic performance on two of the most critical neurodevelopmental disorders in childhood, ADHD and ASD, outperforming existing models with accuracies of 84.66% and 88.41%, respectively.
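The abstract does not define the bipolar ReLU. One plausible reading, sketched here purely for illustration, is to route positive and negative responses into separate channels rather than discarding the negative half, so anti-correlated (contrastive) dynamics survive repeated graph smoothing; the paper's actual definition may differ.

```python
import numpy as np

def bipolar_relu(x):
    """Illustrative 'bipolar ReLU' guess: keep positive and negative
    responses in separate output channels instead of zeroing the
    negative half, doubling the last dimension.
    """
    return np.concatenate([np.maximum(x, 0.0), np.minimum(x, 0.0)], axis=-1)

x = np.array([1.5, -2.0, 0.0])
print(bipolar_relu(x))  # values: 1.5, 0, 0, 0, -2, 0
```

Under this reading, a downstream graph filter sees excitatory and inhibitory components as distinct features, which is one way the blurring described above could be avoided.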

new RBF-Solver: A Multistep Sampler for Diffusion Probabilistic Models via Radial Basis Functions

Authors: Soochul Park, Yeon Ju Lee, SeongJin Yoon, Jiyub Shin, Juhee Lee, Seongwoon Jo

Abstract: Diffusion probabilistic models (DPMs) are widely adopted for their outstanding generative fidelity, yet their sampling is computationally demanding. Polynomial-based multistep samplers mitigate this cost by accelerating inference; however, despite their theoretical accuracy guarantees, they generate the sampling trajectory according to a predefined scheme, providing no flexibility for further optimization. To address this limitation, we propose RBF-Solver, a multistep diffusion sampler that interpolates model evaluations with Gaussian radial basis functions (RBFs). By leveraging learnable shape parameters in Gaussian RBFs, RBF-Solver explicitly follows optimal sampling trajectories. At first order, it reduces to the Euler method (DDIM). At second order or higher, as the shape parameters approach infinity, RBF-Solver converges to the Adams method, ensuring its compatibility with existing samplers. Owing to the locality of Gaussian RBFs, RBF-Solver maintains high image fidelity even at fourth order or higher, where previous samplers deteriorate. For unconditional generation, RBF-Solver consistently outperforms polynomial-based samplers in the high-NFE regime (NFE >= 15). On CIFAR-10 with the Score-SDE model, it achieves an FID of 2.87 with 15 function evaluations and further improves to 2.48 with 40 function evaluations. For conditional ImageNet 256 x 256 generation with the Guided Diffusion model at a guidance scale 8.0, substantial gains are achieved in the low-NFE range (5-10), yielding a 16.12-33.73% reduction in FID relative to polynomial-based samplers.
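The core interpolation step -- fitting Gaussian RBFs with a shape parameter to past model evaluations -- can be sketched in a scalar setting. The fixed `shape` value is a simplifying assumption standing in for the paper's learnable shape parameters.

```python
import numpy as np

def rbf_interpolate(t_nodes, f_nodes, t_query, shape=2.0):
    """Interpolate model evaluations f(t_i) with Gaussian RBFs
    phi(t) = exp(-(shape * (t - t_i))^2) and evaluate at t_query.
    `shape` plays the role of the learnable shape parameter; as it
    grows, the basis localizes around each node.
    """
    t_nodes = np.asarray(t_nodes, float)
    Phi = np.exp(-(shape * (t_nodes[:, None] - t_nodes[None, :])) ** 2)
    w = np.linalg.solve(Phi, np.asarray(f_nodes, float))  # RBF weights
    basis = np.exp(-(shape * (t_query - t_nodes)) ** 2)
    return float(basis @ w)

# The interpolant reproduces the stored evaluations at the nodes.
ts, fs = [0.0, 0.5, 1.0], [1.0, 0.4, 0.1]
print(rbf_interpolate(ts, fs, 0.5))  # ≈ 0.4
```

The locality noted in the abstract comes from the Gaussian kernel: distant evaluations contribute exponentially little, which is what keeps high-order multistep combinations stable.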

new Lipschitz-Based Robustness Certification Under Floating-Point Execution

Authors: Toby Murray

Abstract: Sensitivity-based robustness certification has emerged as a practical approach for certifying neural network robustness, including in settings that require verifiable guarantees. A key advantage of these methods is that certification is performed by concrete numerical computation (rather than symbolic reasoning) and scales efficiently with network size. However, as with the vast majority of prior work on robustness certification and verification, the soundness of these methods is typically proved with respect to a semantic model that assumes exact real arithmetic. In reality deployed neural network implementations execute using floating-point arithmetic. This mismatch creates a semantic gap between certified robustness properties and the behaviour of the executed system. As motivating evidence, we exhibit concrete counterexamples showing that real arithmetic robustness guarantees can fail under floating-point execution, even for previously verified certifiers, with discrepancies becoming pronounced at lower-precision formats such as float16. We then develop a formal, compositional theory relating real arithmetic Lipschitz-based sensitivity bounds to the sensitivity of floating-point execution under standard rounding-error models, specialised to feed-forward neural networks with ReLU activations. We derive sound conditions for robustness under floating-point execution, including bounds on certificate degradation and sufficient conditions for the absence of overflow. We formalize the theory and its main soundness results, and implement an executable certifier based on these principles, which we empirically evaluate to demonstrate its practicality.
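The kind of adjustment the theory formalizes -- shrinking a real-arithmetic Lipschitz certificate by a floating-point error budget -- can be illustrated schematically. The formula below is a hedged sketch of the idea, not the paper's actual bound.

```python
def certified_radius_fp(margin, lipschitz, fp_error_bound):
    """Schematic floating-point-aware certificate.

    Real-arithmetic Lipschitz certification gives radius = margin / (2 L).
    If rounding can perturb each of the two compared logits by at most
    fp_error_bound, the usable margin shrinks by 2 * fp_error_bound.
    Illustrative only; the paper derives its own degradation bounds.
    """
    usable = margin - 2.0 * fp_error_bound
    return max(usable, 0.0) / (2.0 * lipschitz)

print(certified_radius_fp(1.0, 2.0, 0.1))  # 0.2, down from the real-arithmetic 0.25
print(certified_radius_fp(0.1, 2.0, 0.1))  # 0.0: the certificate becomes vacuous
```

The second call shows the failure mode motivating the paper: at low precision (large error bound, e.g. float16), a margin that certifies robustness in real arithmetic can certify nothing at all.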

new AdaBox: Adaptive Density-Based Box Clustering with Parameter Generalization

Authors: Ahmed Elmahdi

Abstract: Density-based clustering algorithms like DBSCAN and HDBSCAN are foundational tools for discovering arbitrarily shaped clusters, yet their practical utility is undermined by acute hyperparameter sensitivity -- parameters tuned on one dataset frequently fail to transfer to others, requiring expensive re-optimization for each deployment. We introduce AdaBox (Adaptive Density-Based Box Clustering), a grid-based density clustering algorithm designed for robustness across diverse data geometries. AdaBox features a six-parameter design where parameters capture cluster structure rather than pairwise point relationships. Four parameters are inherently scale-invariant, one self-corrects for sampling bias, and one is adjusted via a density scaling stage, enabling reliable parameter transfer across 30-200x scale factors. AdaBox processes data through five stages: adaptive grid construction, liberal seed initialization, iterative growth with graduation, statistical cluster merging, and Gaussian boundary refinement. Comprehensive evaluation across 111 datasets demonstrates three key findings: (1) AdaBox significantly outperforms DBSCAN and HDBSCAN across five evaluation metrics, achieving the best score on 78\% of datasets with p < 0.05; (2) AdaBox uniquely exhibits parameter generalization: under Protocol A (direct transfer to 30-100x larger datasets), AdaBox maintains performance while the baselines collapse; and (3) ablation studies confirm the necessity of all five architectural stages for maintaining robustness.

new MS2MetGAN: Latent-space adversarial training for metabolite-spectrum matching in MS/MS database search

Authors: Meng Tsai, Alexzander Dwyer, Estelle Nuckels, Yingfeng Wang

Abstract: Database search is a widely used approach for identifying metabolites from tandem mass spectra (MS/MS). In this strategy, an experimental spectrum is matched against a user-specified database of candidate metabolites, and candidates are ranked such that true metabolite-spectrum matches receive the highest scores. Machine-learning methods have been widely incorporated into database-search-based identification tools and have substantially improved performance. To further improve identification accuracy, we propose a new framework for generating negative training samples. The framework first uses autoencoders to learn latent representations of metabolite structures and MS/MS spectra, thereby recasting metabolite-spectrum matching as matching between latent vectors. It then uses a GAN to generate latent vectors of decoy metabolites and constructs decoy metabolite-spectrum matches as negative samples for training. Experimental results show that our tool, MS2MetGAN, achieves better overall performance than existing metabolite identification methods.

new AI-Driven Predictive Maintenance with Real-Time Contextual Data Fusion for Connected Vehicles: A Multi-Dataset Evaluation

Authors: Kushal Khemani (Independent Researcher, India), Anjum Nazir Qureshi (Rajiv Gandhi College of Engineering Research and Technology)

Abstract: Most vehicle predictive maintenance systems rely exclusively on internal diagnostic signals and are validated on deterministic synthetic data, limiting the credibility of reported metrics. This paper presents a simulation-validated proof-of-concept framework for V2X-augmented predictive maintenance, integrating on-board sensor streams with external contextual signals -- road quality, weather, traffic density, and driver behaviour -- acquired via V2X communication and third-party APIs, with inference at the vehicle edge. Field validation on instrumented vehicles is identified as the required next step. Three experiments address common shortcomings of prior work. A feature group ablation study shows that V2X contextual features contribute a 2.6-point F1 gain, with full context removal reducing macro F1 from 0.855 to 0.807. On the AI4I 2020 real-world industrial failure dataset (10,000 samples, five failure modes), LightGBM achieves AUC-ROC of 0.973 under 5-fold stratified CV with SMOTE confined to training folds. A noise sensitivity analysis shows macro F1 remains above 0.88 under low noise and degrades to 0.74 under very high noise. SHAP analysis confirms that V2X and engineered interaction features rank among the top 15 predictors. Edge inference is estimated to reduce latency from 3.5s to under 1.0s versus cloud-only processing.

new PolyGLU: State-Conditional Activation Routing in Transformer Feed-Forward Networks

Authors: Daniel Nobrega Medeiros

Abstract: Biological neural systems employ diverse neurotransmitters -- glutamate, GABA, dopamine, acetylcholine -- to implement distinct signal-processing modalities within shared neural circuits. In contrast, modern transformers apply a single fixed activation function across all feed-forward neurons. We introduce PolyGLU (Polychromatic Gated Linear Unit), a drop-in replacement for SwiGLU that enables each FFN neuron to dynamically route among K=4 activation functions via a differentiable mechanism combining learned static preferences with input-conditioned gating, trained end-to-end with Gumbel-Softmax. We train PolychromaticLM, a 597M-parameter transformer, on ~10B tokens using a single NVIDIA A100 GPU. Our key finding is emergent routing behavior: without any explicit sparsity loss or entropy regularization, the routing mechanism converges to near-deterministic activation selections (mean dynamic entropy = 0.030% of maximum), with a striking depth-dependent specialization pattern -- early layers prefer GELU while deep layers strongly favor Tanh. Three layers maintain elevated routing entropy, suggesting computational flexibility points. The routing architecture adds only 0.23% parameter overhead (~1.4M parameters) and proves fully robust to supervised fine-tuning: routing entropy remains constant at ln(4) throughout 13,067 SFT steps. On standard benchmarks, PolychromaticLM achieves 62-89% of Qwen3-0.6B-Base performance despite training on 3,600x fewer tokens. All code, weights, and training infrastructure are released under Apache 2.0.
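The routing mechanism -- a Gumbel-Softmax mixture over K activation functions per neuron -- can be sketched as follows. The K=4 menu, the temperature, and the logit values are assumptions for illustration; the paper's exact activation set and gating parameterization may differ.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

# Hypothetical K=4 activation menu.
ACTIVATIONS = [np.tanh, gelu, lambda x: np.maximum(x, 0.0), lambda x: x]

def routed_activation(x, logits, tau=0.5, rng=None):
    """Soft routing over K activations via Gumbel-Softmax.

    `logits` are learned static preferences; Gumbel noise plus a low
    temperature gives a differentiable, near-one-hot mixture.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    gumbel = -np.log(-np.log(rng.uniform(size=logits.shape)))
    z = (logits + gumbel) / tau
    z = z - z.max()                     # numerical stability
    w = np.exp(z) / np.exp(z).sum()     # near-one-hot routing weights
    outs = np.stack([f(x) for f in ACTIVATIONS])
    return (w[:, None] * outs).sum(axis=0)

x = np.array([-1.0, 0.5, 2.0])
logits = np.array([8.0, 0.0, 0.0, 0.0])   # strong static preference for tanh
y = routed_activation(x, logits, tau=0.1)
print(np.allclose(y, np.tanh(x), atol=1e-4))  # True: routing is near-deterministic
```

The near-deterministic selection here mirrors the paper's emergent finding: with a strong learned preference and low temperature, the mixture collapses to a single activation without any explicit sparsity loss.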

new Thermal Robustness of Retrieval in Dense Associative Memories: LSE vs LSR Kernels

Authors: Tatiana Petrova (Interdisciplinary Centre for Security, Reliability and Trust)

Abstract: Understanding whether retrieval in dense associative memories survives thermal noise is essential for bridging zero-temperature capacity proofs with the finite-temperature conditions of practical inference and biological computation. We use Monte Carlo simulations to map the retrieval phase boundary of two continuous dense associative memories (DAMs) on the $N$-sphere with an exponential number of stored patterns $M = e^{\alpha N}$: a log-sum-exp (LSE) kernel and a log-sum-ReLU (LSR) kernel. Both kernels share the zero-temperature critical load $\alpha_c(0)=0.5$, but their finite-temperature behavior differs markedly. The LSE kernel sustains retrieval at arbitrarily high temperatures for sufficiently low load, whereas the LSR kernel exhibits a finite support threshold below which retrieval is perfect at any temperature; for typical sharpness values this threshold approaches $\alpha_c$, making retrieval nearly perfect across the entire load range. We also compare the measured equilibrium alignment with analytical Boltzmann predictions within the retrieval basin.

new A Hierarchical End-of-Turn Model with Primary Speaker Segmentation for Real-Time Conversational AI

Authors: Karim Helwani, Hoang Do, James Luan, Sriram Srinivasan

Abstract: We present a real-time front-end for voice-based conversational AI to enable natural turn-taking in two-speaker scenarios by combining primary speaker segmentation with hierarchical End-of-Turn (EOT) detection. To operate robustly in multi-speaker environments, the system continuously identifies and tracks the primary user, ensuring that downstream EOT decisions are not confounded by background conversations. The tracked activity segments are fed to a hierarchical, causal EOT model that predicts the immediate conversational state by independently analyzing per-speaker speech features from both the primary speaker and the bot. Simultaneously, the model anticipates near-future states ($t{+}10/20/30$\,ms) through probabilistic predictions that are aware of the conversation partner's speech. Task-specific knowledge distillation compresses wav2vec~2.0 representations (768\,D) into a compact MFCC-based student (32\,D) for efficient deployment. The system achieves 82\% multi-class frame-level F1 and 70.6\% F1 on Backchannel detection, with 69.3\% F1 on a binary Final vs.\ Others task. On an end-to-end turn-detection benchmark, our model reaches 87.7\% recall vs.\ 58.9\% for Smart Turn~v3 while keeping a median detection latency of 36\,ms versus 800--1300\,ms. Despite using only 1.14\,M parameters, the proposed model matches or exceeds transformer-based baselines while substantially reducing latency and memory footprint, making it suitable for edge deployment.

new Beyond Linearity in Attention Projections: The Case for Nonlinear Queries

Authors: Marko Karbevski

Abstract: Recent algebraic analysis shows that in decoder-only and encoder-only transformers, the Query projection $W_Q$ may be set to identity without noticeable performance deterioration. This is possible because attention depends on $X$ only through the products $XW_Q, XW_K, XW_V$, allowing basis transformations to be absorbed by adjacent layers and propagated through the network. We replace $W_Q \in \mathbb{R}^{d \times d}$ with a nonlinear residual of the form $Q(X) = X + f_\theta(X)$, where $f_\theta$ is a bottleneck MLP with $d^2 + O(d)$ parameters. The identity term anchors the nonlinearity to a known-good prior. Experiments on GPT-3 small style models show consistent improvement over the baseline, comfortably outperforming a model with 12.5% more non-embedding parameters. These results motivate investigation at larger scales and across modalities.
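The proposed replacement $Q(X) = X + f_\theta(X)$ can be sketched directly; the dimensions, initialization scale, and ReLU choice below are illustrative assumptions.

```python
import numpy as np

def nonlinear_query(X, W1, W2):
    """Q(X) = X + f_theta(X): an identity-anchored bottleneck MLP in place
    of the linear query projection W_Q. With W1 of shape (d, r) and W2 of
    shape (r, d), a suitable bottleneck width r keeps the parameter count
    near d^2 + O(d).
    """
    h = np.maximum(X @ W1, 0.0)  # ReLU bottleneck
    return X + h @ W2

rng = np.random.default_rng(0)
d, r, n = 8, 2, 4                      # illustrative sizes
X = rng.standard_normal((n, d))
W1 = 0.1 * rng.standard_normal((d, r))
W2 = 0.1 * rng.standard_normal((r, d))
Q = nonlinear_query(X, W1, W2)
print(Q.shape)  # (4, 8)
# The identity term anchors the map: with W2 = 0, Q(X) == X exactly,
# recovering the known-good "W_Q = identity" prior the abstract describes.
```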

new GPrune-LLM: Generalization-Aware Structured Pruning for Large Language Models

Authors: Xiaoyun Liu, Divya Saxena, Jiannong Cao, Yuqing Zhao, Yiying Dong, Penghui Ruan

Abstract: Structured pruning is widely used to compress large language models (LLMs), yet its effectiveness depends heavily on neuron importance estimation. Most existing methods estimate neuron importance from activation statistics on a single calibration dataset, which introduces calibration bias and degrades downstream cross-task generalization. We observe that neurons exhibit heterogeneous distribution sensitivity, with distribution-robust neurons maintaining consistent rankings across datasets and distribution-sensitive neurons showing high cross-dataset ranking variance. Based on this, we identify two structural limitations in existing methods. First, ranking all neurons within a shared space causes distribution-sensitive neurons that strongly activate on calibration inputs to dominate, crowding out distribution-robust neurons critical for out-of-distribution tasks. Second, applying activation-based importance metrics uniformly can be unreliable. Distribution-sensitive neurons that infrequently activate on calibration data receive insufficient activation signal for accurate local ranking. To address these limitations, we propose GPrune-LLM, a generalization-aware structured pruning framework that explicitly accounts for neuron differences in cross-distribution behavior. We first partition neurons into behavior-consistent modules to localize ranking competition, then evaluate activation-based metric reliability per module according to distribution sensitivity and score magnitude. For modules where activation-based scoring is unreliable, we switch to an activation-independent metric. Finally, we adaptively learn module-wise sparsity. Extensive experiments across multiple downstream tasks demonstrate GPrune-LLM's consistent improvements in post-compression generalization, particularly at high sparsity, and reduced dependence on importance metric choice.

new Diffusion Models Generalize but Not in the Way You Might Think

Authors: Tim Kaiser, Markus Kollmann

Abstract: Standard evaluation metrics suggest that Denoising Diffusion Models based on U-Net or Transformer architectures generalize well in practice. However, as it can be shown that an optimal Diffusion Model fully memorizes the training data, the model error determines generalization. Here, we show that although sufficiently large denoiser models show increasing memorization of the training set with increasing training time, the resulting denoising trajectories do not follow this trend. Our experiments indicate that the reason for this observation is rooted in the fact that overfitting occurs at intermediate noise levels, but the distribution of noisy training data at these noise levels has little overlap with denoising trajectories during inference. To gain more insight, we make use of a 2D toy diffusion model to show that overfitting at intermediate noise levels is largely determined by model error and the density of the data support. While the optimal denoising flow field localizes sharply around training samples, sufficient model error or dense support on the data manifold suppresses exact recall, yielding a smooth, generalizing flow field. To further support our results, we investigate how several factors, such as training time, model size, dataset size, condition granularity, and diffusion guidance, influence generalization behavior.

new Generalization and Memorization in Rectified Flow

Authors: Mingxing Rao, Daniel Moyer

Abstract: Generative models based on the Flow Matching objective, particularly Rectified Flow, have emerged as a dominant paradigm for efficient, high-fidelity image synthesis. However, while existing research heavily prioritizes generation quality and architectural scaling, the underlying dynamics of how RF models memorize training data remain largely underexplored. In this paper, we systematically investigate the memorization behaviors of RF through the test statistics of Membership Inference Attacks (MIA). We progressively formulate three test statistics, culminating in a complexity-calibrated metric ($T_\text{mc\_cal}$) that successfully decouples intrinsic image spatial complexity from genuine memorization signals. This calibration yields a significant performance surge -- boosting attack AUC by up to 15\% and the privacy-critical TPR@1\%FPR metric by up to 45\% -- establishing the first non-trivial MIA specifically tailored for RF. Leveraging these refined metrics, we uncover a distinct temporal pattern: under standard uniform temporal training, a model's susceptibility to MIA strictly peaks at the integration midpoint, a phenomenon we justify via the network's forced deviation from linear approximations. Finally, we demonstrate that substituting uniform timestep sampling with a Symmetric Exponential (U-shaped) distribution effectively minimizes exposure to vulnerable intermediate timesteps. Extensive evaluations across three datasets confirm that this temporal regularization suppresses memorization while preserving generative fidelity.

new From Gradients to Riccati Geometry: Kalman World Models for Single-Pass Learning

Authors: Andrew Kiruluta

Abstract: Backpropagation dominates modern machine learning, yet it is not the only principled method for optimizing dynamical systems. We propose Kalman World Models (KWM), a class of learned state-space models trained via recursive Bayesian filtering rather than reverse-mode automatic differentiation. Instead of gradient descent updates, we replace parameter learning with Kalman-style gain adaptation. Training becomes online filtering; error signals become innovations. We further extend this framework to transformer-based large language models (LLMs), where internal activations are treated as latent dynamical states corrected via innovation terms. This yields a gradient-free training and adaptation paradigm grounded in control theory. We derive stability conditions, analyze computational complexity, and provide empirical results on sequence modeling tasks demonstrating competitive performance with improved robustness and continual adaptation properties.
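The "error signals become innovations" idea corresponds to classical recursive least squares / Kalman filtering over the parameters. Here is a minimal scalar-output sketch of that gain-based update, not the paper's full KWM (which extends this to learned state-space models and LLM activations).

```python
import numpy as np

def kalman_param_update(theta, P, x, y, R=0.1):
    """One Kalman / recursive-least-squares step with the parameters as
    the latent state: the error signal is the innovation, the update is a
    gain-weighted correction -- no gradients or backprop involved.
    """
    innovation = y - x @ theta
    S = x @ P @ x + R             # innovation variance
    K = P @ x / S                 # Kalman gain
    theta = theta + K * innovation
    P = P - np.outer(K, x) @ P    # posterior parameter covariance
    return theta, P

# Recover w = [2, -1] from a stream of (x, y) pairs in a single online pass.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
theta, P = np.zeros(2), 10.0 * np.eye(2)
for _ in range(200):
    x = rng.standard_normal(2)
    theta, P = kalman_param_update(theta, P, x, x @ true_w)
print(np.round(theta, 3))  # ≈ [ 2. -1.]
```

Note the single-pass character: each sample is consumed once as it arrives, which is what makes this a natural basis for online filtering-as-training.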

new Self-Flow-Matching assisted Full Waveform Inversion

Authors: Xinquan Huang, Paris Perdikaris

Abstract: Full-waveform inversion (FWI) is a high-resolution seismic imaging method that estimates subsurface velocity by matching simulated and recorded waveforms. However, FWI is highly nonlinear, prone to cycle skipping, and sensitive to noise, particularly when low frequencies are missing or the initial model is poor, leading to failures under imperfect acquisition. Diffusion-regularized FWI introduces generative priors to encourage geologically realistic models, but these priors typically require costly offline pretraining and can deteriorate under distribution shift. Moreover, they assume Gaussian initialization and a fixed noise schedule, in which it is unclear how to map a deterministic FWI iterate and its starting model to a well-defined diffusion time or noise level. To address these limitations, we introduce Self-Flow-Matching assisted Full-Waveform Inversion (SFM-FWI), a physics-driven framework that eliminates the need for large-scale offline pretraining while avoiding the noise-level alignment ambiguity. SFM-FWI leverages flow matching to learn a transport field without assuming Gaussian initialization or a predefined noise schedule, so the initial model can be used directly as the starting point of the dynamics. Our approach trains a single flow network online using the governing physics and observed data. At each outer iteration, we build an interpolated model and update the flow by backpropagating the FWI data misfit, providing self-supervision without external training pairs. Experiments on challenging synthetic benchmarks show that SFM-FWI delivers more accurate reconstructions, greater noise robustness, and more stable convergence than standard FWI and pretraining-free regularization methods.
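The flow-matching building block the abstract relies on -- a linear interpolant with a constant velocity target, requiring no Gaussian start or noise schedule -- can be sketched in a few lines; the toy 2-vector "models" are placeholders for velocity-model grids.

```python
import numpy as np

def flow_matching_pair(x0, x1, t):
    """Linear transport used in flow matching: the training input is the
    interpolant x_t = (1 - t) * x0 + t * x1, and the velocity-field
    regression target is the constant x1 - x0. Because no Gaussian start
    or noise schedule is assumed, x0 can be the current FWI iterate
    directly, avoiding the diffusion-time alignment ambiguity.
    """
    x_t = (1.0 - t) * x0 + t * x1
    v_target = x1 - x0
    return x_t, v_target

x0 = np.array([0.0, 0.0])    # e.g., a smooth starting velocity model
x1 = np.array([1.0, -2.0])   # e.g., the current high-resolution iterate
x_t, v = flow_matching_pair(x0, x1, t=0.25)
# x_t = [0.25, -0.5]; target velocity = [1.0, -2.0]
```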

new Outcome-Aware Tool Selection for Semantic Routers: Latency-Constrained Learning Without LLM Inference

Authors: Huamin Chen, Xunzhuo Liu, Junchen Jiang, Bowei He, Xue Liu

Abstract: Semantic routers in LLM inference gateways select tools in the critical request path, where every millisecond of added latency compounds across millions of requests. We propose Outcome-Aware Tool Selection (OATS), which interpolates tool embeddings toward the centroid of queries where they historically succeed -- an offline process that adds no parameters, latency, or GPU cost at serving time. On MetaTool (199~tools, 4,287~queries), this improves NDCG@5 from 0.869 to 0.940; on ToolBench (2,413~APIs), from 0.834 to 0.848. We also evaluate two learned extensions: a 2,625-parameter MLP re-ranker and a 197K-parameter contrastive adapter. The MLP re-ranker hurts or matches baseline when outcome data is sparse relative to the tool set; the contrastive adapter provides comparable gains on MetaTool (NDCG@5: 0.931). All methods are evaluated on the same held-out 30\% test split. The practical takeaway is to start with the zero-cost refinement and add learned components only when data density warrants it. All mechanisms run within single-digit millisecond CPU budgets.
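The zero-cost refinement -- interpolating a tool embedding toward the centroid of its historically successful queries -- can be sketched as follows. The interpolation weight `alpha` and the toy 2-D embeddings are hypothetical; the point is that everything happens offline and the serving-time cosine ranking is unchanged in cost.

```python
import numpy as np

def refine_tool_embedding(tool_emb, success_query_embs, alpha=0.3):
    """Offline OATS-style refinement: interpolate the tool embedding
    toward the centroid of queries where the tool historically succeeded.
    `alpha` is a hypothetical interpolation weight; re-normalizing keeps
    cosine similarity lookups identical in cost at serving time.
    """
    centroid = np.mean(success_query_embs, axis=0)
    centroid = centroid / np.linalg.norm(centroid)
    refined = (1.0 - alpha) * tool_emb + alpha * centroid
    return refined / np.linalg.norm(refined)

tool = np.array([1.0, 0.0])
wins = np.array([[0.6, 0.8], [0.8, 0.6]])   # queries the tool served well
new_tool = refine_tool_embedding(tool, wins)
q = np.array([0.6, 0.8])                    # a similar future query
print(float(q @ new_tool) > float(q @ tool))  # True: similarity to past wins rises
```

This matches the paper's takeaway: the refinement adds no parameters or latency, so it is the natural starting point before any learned re-ranker.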

new CHIMERA-Bench: A Benchmark Dataset for Epitope-Specific Antibody Design

Authors: Mansoor Ahmed, Nadeem Taj, Imdad Ullah Khan, Hemanth Venkateswara, Murray Patterson

Abstract: Computational antibody design has seen rapid methodological progress, with dozens of deep generative methods proposed in the past three years, yet the field lacks a standardized benchmark for fair comparison and model development. These methods are evaluated on different SAbDab snapshots, non-overlapping test sets, and incompatible metrics, and the literature fragments the design problem into numerous sub-tasks with no common definition. We introduce \textsc{Chimera-Bench} (\textbf{C}DR \textbf{M}odeling with \textbf{E}pitope-guided \textbf{R}edesign), a unified benchmark built around a single canonical task: \emph{epitope-conditioned CDR sequence-structure co-design}. \textsc{Chimera-Bench} provides (1) a curated, deduplicated dataset of \textbf{2,922} antibody-antigen complexes with epitope and paratope annotations; (2) three biologically motivated splits testing generalization to unseen epitopes, unseen antigen folds, and prospective temporal targets; and (3) a comprehensive evaluation protocol with five metric groups including novel epitope-specificity measures. We benchmark representative methods spanning different generative paradigms and report results across all splits. \textsc{Chimera-Bench} is the largest dataset of its kind for the antibody design problem, allowing the community to develop and test novel methods and evaluate their generalizability. The source code and data are available at: https://github.com/mansoor181/chimera-bench.git

URLs: https://github.com/mansoor181/chimera-bench.git

new Modality-free Graph In-context Alignment

Authors: Wei Zhuo, Siqiang Luo

Abstract: In-context learning (ICL) converts static encoders into task-conditioned reasoners, enabling adaptation to new data from just a few examples without updating pretrained parameters. This capability is essential for graph foundation models (GFMs) to approach LLM-level generality. Yet current GFMs struggle with cross-domain alignment, typically relying on modality-specific encoders that fail when graphs are pre-vectorized or raw data is inaccessible. In this paper, we introduce Modality-Free Graph In-context Alignment (MF-GIA), a framework that makes a pretrained graph encoder promptable for few-shot prediction across heterogeneous domains without modality assumptions. MF-GIA captures domain characteristics through gradient fingerprints, which parameterize lightweight transformations that align pre-encoded features and indexed labels into unified semantic spaces. During pretraining, a dual prompt-aware attention mechanism with episodic objective learns to match queries against aligned support examples to establish prompt-based reasoning capabilities. At inference, MF-GIA performs parameter-update-free adaptation using only a few-shot support set to trigger cross-domain alignment and enable immediate prediction on unseen domains. Experiments demonstrate that MF-GIA achieves superior few-shot performance across diverse graph domains and strong generalization to unseen domains.

new Improving Channel Estimation via Multimodal Diffusion Models with Flow Matching

Authors: Xiaotian Fan, Xingyu Zhou, Le Liang, Xiao Li, Shi Jin

Abstract: Deep generative models offer a powerful alternative to conventional channel estimation by learning complex channel distributions. By integrating the rich environmental information available in modern sensing-aided networks, this paper proposes MultiCE-Flow, a multimodal channel estimation framework based on flow matching and diffusion transformer (DiT). We design a specialized multimodal perception module that fuses LiDAR, camera, and location data into a semantic condition, while treating sparse pilots as a structural condition. These conditions guide a DiT backbone to reconstruct high-fidelity channels. Unlike standard diffusion models, we employ flow matching to learn a linear trajectory from noise to data, enabling efficient one-step sampling. By leveraging environmental semantics, our method mitigates the ill-posed nature of estimation with sparse pilots. Extensive experiments demonstrate that MultiCE-Flow consistently outperforms traditional baselines and existing generative models. Notably, it exhibits superior robustness to out-of-distribution scenarios and varying pilot densities, making it suitable for environment-aware communication systems.

new Scalable Machines with Intrinsic Higher Mental-State Dynamics

Authors: Ahsan Adeel, M. Bilal

Abstract: Drawing on recent breakthroughs in cellular neurobiology and detailed biophysical modeling linking neocortical pyramidal neurons to distinct mental-state regimes, this work introduces a mathematically grounded formulation showing how models (e.g., Transformers) can implement computational principles underlying awake imaginative thought to pre-select relevant information before attention is applied via triadic modulation loops among queries ($Q$), keys ($K$), and values ($V$). Scalability experiments on ImageNet-1K, benchmarked against a standard Vision Transformer (ViT), demonstrate significantly faster learning with reduced computational demand (fewer heads, layers, and tokens), consistent with our prior findings in reinforcement learning and language modeling. The approach operates at approximately $\mathcal{O}(N)$ complexity with respect to the number of input tokens $N$.

new Reconciling In-Context and In-Weight Learning via Dual Representation Space Encoding

Authors: Guanyu Chen, Ruichen Wang, Tianren Zhang, Feng Chen

Abstract: In-context learning (ICL) is a valuable capability exhibited by Transformers pretrained on diverse sequence tasks. However, previous studies have observed that ICL often conflicts with the model's inherent in-weight learning (IWL) ability. By examining the representation space learned by a toy model in synthetic experiments, we identify the shared encoding space for context and samples in Transformers as a potential source of this conflict. To address this, we modify the model architecture to separately encode the context and samples into two distinct spaces: a task representation space and a sample representation space. We model these two spaces under a simple yet principled framework, assuming a linear representational structure and treating them as a pair of dual spaces. Both theoretical analysis and empirical results demonstrate the effectiveness of our proposed architecture, CoQE, in the single-value answer setting. It not only enhances ICL performance through improved representation learning, but also successfully reconciles ICL and IWL capabilities across synthetic few-shot classification and a newly designed pseudo-arithmetic task. Code: https://github.com/McGuinnessChen/dual-representation-space-encoding

URLs: https://github.com/McGuinnessChen/dual-representation-space-encoding

new Resolving Interference (RI): Disentangling Models for Improved Model Merging

Authors: Pratik Ramesh, George Stoica, Arun Iyer, Leshem Choshen, Judy Hoffman

Abstract: Model merging has shown that multitask models can be created by directly combining the parameters of different models that are each specialized on tasks of interest. However, models trained independently on distinct tasks often exhibit interference that degrades the merged model's performance. To solve this problem, we formally define the notion of Cross-Task Interference as the drift in the representation of the merged model relative to its constituent models. Reducing cross-task interference is key to improving merging performance. To address this issue, we propose our method, Resolving Interference (RI), a light-weight adaptation framework which disentangles expert models to be functionally orthogonal to the space of other tasks, thereby reducing cross-task interference. RI does this whilst using only unlabeled auxiliary data as input (i.e., no task-data is needed), allowing it to be applied in data-scarce scenarios. RI consistently improves the performance of state-of-the-art merging methods by up to 3.8% and generalization to unseen domains by up to 2.3%. We also find RI to be robust to the source of auxiliary input while being significantly less sensitive to tuning of merging hyperparameters. Our codebase is available at: https://github.com/pramesh39/resolving_interference

URLs: https://github.com/pramesh39/resolving_interference

new Deep Invertible Autoencoders for Dimensionality Reduction of Dynamical Systems

Authors: Nicol\`o Botteghi, Silke Glas, Christoph Brune

Abstract: Constructing reduced-order models (ROMs) capable of efficiently predicting the evolution of high-dimensional, parametric systems is crucial in many applications in engineering and applied sciences. A popular class of projection-based ROMs projects the high-dimensional full-order model (FOM) dynamics onto a low-dimensional manifold. These projection-based ROM approaches often rely on classical model reduction techniques such as proper orthogonal decomposition (POD) or, more recently, on neural network architectures such as autoencoders (AEs). When the ROM is constructed via POD, one has approximation guarantees based on the singular values of the problem at hand. However, POD-based techniques can suffer from slow decay of the singular values in transport- and advection-dominated problems. In contrast, AEs offer better reduction capabilities than POD, often with only the first few modes, but at the price of weaker theoretical guarantees. In addition, it is often observed that AEs exhibit a plateau of the projection error as the dimension of the trial manifold increases. In this work, we propose a deep invertible AE architecture, named inv-AE, that improves upon both the stagnation of the projection error typical of traditional AE architectures, e.g., convolutional ones, and the reconstruction quality. Inv-AE is composed of several invertible neural network layers that allow for gradually recovering more information about the FOM solutions as the dimension of the reduced manifold increases. Through the application of inv-AE to a parametric 1D Burgers' equation and a parametric 2D fluid flow around an obstacle with variable geometry, we show that (i) inv-AE mitigates the issue of the characteristic plateau of (convolutional) AEs and (ii) inv-AE can be combined with popular projection-based ROM approaches to improve their accuracy.
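
One standard way to obtain exact invertibility in a neural layer, sketched below in plain Python, is an additive coupling layer (RealNVP-style): the second half of the state is shifted by a function of the first half, so inversion is exact subtraction regardless of how complex the shift function is. This illustrates the invertibility principle only; it is not the paper's inv-AE layer design.

```python
def coupling_forward(x, shift_fn):
    """Additive coupling: transform the second half of x conditioned on the
    first half. Invertible no matter how nonlinear shift_fn is."""
    h = len(x) // 2
    a, b = x[:h], x[h:]
    return a + [bi + s for bi, s in zip(b, shift_fn(a))]

def coupling_inverse(y, shift_fn):
    """Exact inverse: recompute the same conditioned shift and subtract it."""
    h = len(y) // 2
    a, b2 = y[:h], y[h:]
    return a + [bi - s for bi, s in zip(b2, shift_fn(a))]

shift = lambda a: [2.0 * ai + 1.0 for ai in a]  # stands in for a neural net
x = [1.0, 2.0, 3.0, 4.0]
y = coupling_forward(x, shift)
```

Stacking such layers lets a decoder recover progressively more of the encoded information, which is the property inv-AE exploits when the reduced dimension grows.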

new Exploring label correlations using decision templates for ensemble of classifier chains

Authors: Victor F. Rocha, Alexandre L. Rodrigues, Thiago Oliveira-Santos, Fl\'avio M. Varej\~ao

Abstract: The use of ensemble-based multi-label methods has been shown to be effective in improving multi-label classification results. One of the most widely used ensemble-based multi-label classifiers is Ensemble of Classifier Chains. Decision templates for Ensemble of Classifier Chains (DTECC) is a fusion scheme based on Decision Templates that combines the predictions of Ensemble of Classifier Chains using information from the decision profile for each label, without considering information about other labels that might contribute to the classification result. Based on DTECC, this work proposes the Unconditionally Dependent Decision Templates for Ensemble of Classifier Chains (UDDTECC) method, a classifier fusion method that seeks to exploit correlations between labels in the fusion process. In this way, the classification of each label in the problem takes into account the label values that are considered conditionally dependent and that can lead to an improvement in the classification performance. The proposed method is experimentally compared with two traditional classifier fusion strategies and with a stacking-based strategy. Empirical evidence shows that using the proposed Decision Templates adaptation can improve the performance compared to the traditionally used fusion schemes on most of the evaluated metrics.

new Probabilistic Gaussian Homotopy: A Probability-Space Continuation Framework for Nonconvex Optimization

Authors: Eshed Gal, Samy Wu Fung, Eldad Haber

Abstract: We introduce Probabilistic Gaussian Homotopy (PGH), a probability-space continuation framework for nonconvex optimization. Unlike classical Gaussian homotopy, which smooths the objective and uniformly averages gradients, PGH deforms the associated Boltzmann distribution and induces Boltzmann-weighted aggregation of perturbed gradients, which exponentially biases descent directions toward low-energy regions. We show that PGH corresponds to a log-sum-exp (soft-min) homotopy that smooths a nonconvex objective at scale $\lambda>0$ and recovers the original objective as $\lambda\to 0$, yielding a posterior-mean generalization of the Moreau envelope, and we derive a dynamical system governing minimizer evolution along an annealed homotopy path. This establishes a principled connection between Gaussian continuation, Bayesian denoising, and diffusion-style smoothing. We further propose Probabilistic Gaussian Homotopy Optimization (PGHO), a practical stochastic algorithm based on Monte Carlo gradient estimation, and demonstrate strong performance on high-dimensional nonconvex benchmarks and sparse recovery problems where classical gradient methods and objective-space smoothing frequently fail.
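
The two estimators the abstract describes can be written down directly: the soft-min smoothed objective $f_\lambda(x) = -\lambda \log \mathbb{E}_\varepsilon[\exp(-f(x+\varepsilon)/\lambda)]$ and its Monte Carlo gradient, a Boltzmann-weighted average of perturbed gradients. The sketch below is our own toy instantiation (a 1-D double well and hand-picked sampling parameters), not the paper's PGHO algorithm.

```python
import math, random

def f(x):
    """A simple nonconvex double-well objective with minima at x = +/-1."""
    return (x * x - 1.0) ** 2

def grad_f(x):
    return 4.0 * x * (x * x - 1.0)

def soft_min_value(x, lam, sigma, n=4000, seed=0):
    """Monte Carlo estimate of the log-sum-exp (soft-min) smoothed objective
    f_lam(x) = -lam * log E_eps[exp(-f(x + eps) / lam)]."""
    rng = random.Random(seed)
    es = [f(x + rng.gauss(0.0, sigma)) / lam for _ in range(n)]
    m = min(es)  # shift for numerical stability of the log-sum-exp
    return lam * (m - math.log(sum(math.exp(-(e - m)) for e in es) / n))

def pgh_gradient(x, lam, sigma, n=4000, seed=0):
    """Boltzmann-weighted aggregation of perturbed gradients: weights
    exp(-f/lam) exponentially favor low-energy perturbations."""
    rng = random.Random(seed)
    pts = [x + rng.gauss(0.0, sigma) for _ in range(n)]
    es = [f(p) / lam for p in pts]
    m = min(es)
    ws = [math.exp(-(e - m)) for e in es]
    z = sum(ws)
    return sum(w * grad_f(p) for w, p in zip(ws, pts)) / z
```

As $\lambda \to 0$ the smoothed value approaches the original objective, while for finite $\lambda$ the weighting pulls descent directions toward low-energy regions, which is the stated advantage over uniform Gaussian averaging.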

new Ghosts of Softmax: Complex Singularities That Limit Safe Step Sizes in Cross-Entropy

Authors: Piyush Sao

Abstract: Optimization analyses for cross-entropy training rely on local Taylor models of the loss to predict whether a proposed step will decrease the objective. These surrogates are reliable only inside the Taylor convergence radius of the true loss along the update direction. That radius is set not by real-line curvature alone but by the nearest complex singularity. For cross-entropy, the softmax partition function $F=\sum_j \exp(z_j)$ has complex zeros -- ``ghosts of softmax'' -- that induce logarithmic singularities in the loss and cap this radius. To make this geometry usable, we derive closed-form expressions under logit linearization along the proposed update direction. In the binary case, the exact radius is $\rho^*=\sqrt{\delta^2+ \pi^2}/\Delta_a$. In the multiclass case, we obtain the lower bound $\rho_a=\pi/\Delta_a$, where $\Delta_a=\max_k a_k-\min_k a_k$ is the spread of directional logit derivatives $a_k=\nabla z_k\cdot v$. This bound costs one Jacobian-vector product and reveals what makes a step fragile: samples that are both near a decision flip and highly sensitive to the proposed direction tighten the radius. The normalized step size $r=\tau/\rho_a$ separates safe from dangerous updates. Across six tested architectures and multiple step directions, no model fails for $r<1$, yet collapse appears once $r\ge 1$. Temperature scaling confirms the mechanism: normalizing by $\rho_a$ shrinks the onset-threshold spread from standard deviation $0.992$ to $0.164$. A controller that enforces $\tau\le\rho_a$ survives learning-rate spikes up to $10{,}000\times$ in our tests, where gradient clipping still collapses. Together, these results identify a geometric constraint on cross-entropy optimization that operates through Taylor convergence rather than Hessian curvature.
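
The closed forms quoted in the abstract transcribe directly into code: the multiclass lower bound $\rho_a = \pi/\Delta_a$ from the directional logit derivatives, the exact binary radius $\rho^* = \sqrt{\delta^2+\pi^2}/\Delta_a$, and the controller rule $\tau \le \rho_a$. In the sketch below the function names are ours, and reading $\delta$ as the binary logit margin is our interpretation of the formula.

```python
import math

def safe_radius(a):
    """Multiclass lower bound rho_a = pi / Delta_a, where Delta_a is the
    spread of directional logit derivatives a_k = grad z_k . v."""
    spread = max(a) - min(a)
    return math.inf if spread == 0.0 else math.pi / spread

def binary_radius(delta, spread):
    """Exact binary-case radius rho* = sqrt(delta^2 + pi^2) / Delta_a."""
    return math.sqrt(delta * delta + math.pi * math.pi) / spread

def clip_step(tau, a):
    """Controller enforcing tau <= rho_a, the rule the abstract reports
    surviving large learning-rate spikes."""
    return min(tau, safe_radius(a))
```

Note the consistency check: at $\delta = 0$ the binary radius reduces to $\pi/\Delta_a$, matching the multiclass lower bound.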

new Scalable Classification of Course Information Sheets Using Large Language Models: A Reusable Institutional Method for Academic Quality Assurance

Authors: Brecht Verbeken, Joke Van den Broeck, Inge De Cleyn, Steven Van Luchene, Nadine Engels, Andres Algaba, Vincent Ginis

Abstract: Purpose: Higher education institutions face increasing pressure to audit course designs for generative AI (GenAI) integration. This paper presents an end-to-end method for using large language models (LLMs) to scan course information sheets at scale, identify where assessments may be vulnerable to student use of GenAI tools, validate system performance through iterative refinement, and operationalise results through direct stakeholder communication and effort. Method: We developed a four-phase pipeline: (0) manual pilot sampling, (1) iterative prompt engineering with multi-model comparison, (2) full production scan of 4,684 Bachelor and Master course information sheets (Academic Year 2024-2025) from the Vrije Universiteit Brussel (VUB) with automated report generation and email distribution to teaching teams (91.4% address-matched) using a three-tier risk taxonomy (Clear risk, Potential risk, Low risk), and (3) longitudinal re-scan of 4,675 sheets after the next catalogue release. Results: Five iterations of prompt refinement achieved 87% agreement with expert labels. GPT-4o was selected for production based on superior handling of ambiguous cases involving internships and practical components. The Year 1 scan classified 60.3% of courses as Clear risk, 15.2% as Potential risk, and 24.5% as Low risk. Year 2 comparison revealed substantial shifts in risk distributions, with improvements most pronounced in practice-oriented programmes. Implications: The method enables institutions to rapidly transform heterogeneous catalogue data into structured and actionable intelligence. The approach is transferable to other audit domains (sustainability, accessibility, pedagogical alignment) and provides a template for responsible LLM deployment in higher education governance.

new MR-GNF: Multi-Resolution Graph Neural Forecasting on Ellipsoidal Meshes for Efficient Regional Weather Prediction

Authors: Andrii Shchur, Inna Skarga-Bandurova

Abstract: Weather forecasting offers an ideal testbed for artificial intelligence (AI) to learn complex, multi-scale physical systems. Traditional numerical weather prediction remains computationally costly for frequent regional updates, as high-resolution nests require intensive boundary coupling. We introduce Multi-Resolution Graph Neural Forecasting (MR-GNF), a lightweight, physics-aware model that performs short-term regional forecasts directly on an ellipsoidal, multi-scale graph of the Earth. The framework couples a 0.25{\deg} region of interest with a 0.5{\deg} context belt and 1.0{\deg} outer domain, enabling continuous cross-scale message passing without explicit nested boundaries. Its axial graph-attention network alternates vertical self-attention across pressure levels with horizontal graph attention across surface nodes, capturing implicit 3-D structure in just 1.6 M parameters. Trained on 40 years of ERA5 reanalysis (1980-2024), MR-GNF delivers stable +6 h to +24 h forecasts for near-surface temperature, wind, and precipitation over the UK-Ireland sector. Despite a total compute cost below 80 GPU-hours on a single RTX 6000 Ada, the model matches or exceeds heavier regional AI systems while preserving physical consistency across scales. These results demonstrate that graph-based neural operators can achieve trustworthy, high-resolution weather prediction at a fraction of NWP cost, opening a practical path toward AI-driven early-warning and renewable-energy forecasting systems. Project page and code: https://github.com/AndriiShchur/MR-GNF

URLs: https://github.com/AndriiShchur/MR-GNF

new Privacy-Preserving Machine Learning for IoT: A Cross-Paradigm Survey and Future Roadmap

Authors: Zakia Zaman, Praveen Gauravaram, Mahbub Hassan, Sanjay Jha, Wen Hu

Abstract: The rapid proliferation of the Internet of Things has intensified demand for robust privacy-preserving machine learning mechanisms to safeguard sensitive data generated by large-scale, heterogeneous, and resource-constrained devices. Unlike centralized environments, IoT ecosystems are inherently decentralized, bandwidth-limited, and latency-sensitive, exposing privacy risks across sensing, communication, and distributed training pipelines. These characteristics render conventional anonymization and centralized protection strategies insufficient for practical deployments. This survey presents a comprehensive IoT-centric, cross-paradigm analysis of privacy-preserving machine learning. We introduce a structured taxonomy spanning perturbation-based mechanisms such as differential privacy, distributed paradigms such as federated learning, cryptographic approaches including homomorphic encryption and secure multiparty computation, and generative synthesis techniques based on generative adversarial networks. For each paradigm, we examine formal privacy guarantees, computational and communication complexity, scalability under heterogeneous device participation, and resilience against threats including membership inference, model inversion, gradient leakage, and adversarial manipulation. We further analyze deployment constraints in wireless IoT environments, highlighting trade-offs between privacy, communication overhead, model convergence, and system efficiency within next-generation mobile architectures. We also consolidate evaluation methodologies, summarize representative datasets and open-source frameworks, and identify open challenges including hybrid privacy integration, energy-aware learning, privacy-preserving large language models, and quantum-resilient machine learning.

new Volumetric Radar Echo Motion Estimation Using Physics-Informed Deep Learning: A Case Study Over Slovakia

Authors: Peter Pavl\'ik, Anna Bou Ezzeddine, Viera Rozinajov\'a

Abstract: In precipitation nowcasting, most extrapolation-based methods rely on two-dimensional radar composites to estimate the horizontal motion of precipitation systems. However, in some cases, precipitation systems can exhibit varying motion at different heights. We propose a physics-informed convolutional neural network that estimates independent horizontal motion fields for multiple altitude layers directly from volumetric radar reflectivity data and investigate the practical benefits of altitude-wise motion field estimation for precipitation nowcasting. The model is trained end-to-end on volumetric observations from the Slovak radar network and its extrapolation nowcasting performance is evaluated. We compare the proposed model against an architecturally identical baseline operating on vertically pooled two-dimensional radar composites. Our results show that, although the model successfully learns altitude-wise motion fields, the estimated displacement is highly correlated across vertical levels for the vast majority of precipitation events. Consequently, the volumetric approach does not yield systematic improvements in nowcasting accuracy. While categorical metrics indicate increased precipitation detection at longer lead times, this gain is largely attributable to non-physical artifacts and is accompanied by a growing positive bias. A comprehensive inter-altitude motion field correlation analysis further confirms that events exhibiting meaningful vertical variability in horizontal motion are rare in the studied region. We conclude that, for the Slovak radar dataset, the additional complexity of three-dimensional motion field estimation is not justified by questionable gains in predictive skill. Nonetheless, the proposed framework remains applicable in climates where precipitation systems exhibit stronger vertical variability in horizontal motion.

new A Causal Framework for Mitigating Data Shifts in Healthcare

Authors: Kurt Butler, Stephanie Riley, Damian Machlanski, Edward Moroshko, Panagiotis Dimitrakopoulos, Thomas Melistas, Akchunya Chanchal, Konstantinos Vilouras, Zhihua Liu, Steven McDonagh, Hana Chockler, Ben Glocker, Niccolo Tempini, Matthew Sperrin, Sotirios A Tsaftaris, Ricardo Silva

Abstract: Developing predictive models that perform reliably across diverse patient populations and heterogeneous environments is a core aim of medical research. However, generalization is only possible if the learned model is robust to statistical differences between data used for training and data seen at the time and place of deployment. Domain generalization methods provide strategies to address data shifts, but each method comes with its own set of assumptions and trade-offs. To apply these methods in healthcare, we must understand how domain shifts arise, what assumptions we prefer to make, and what our design constraints are. This article proposes a causal framework for the design of predictive models to improve generalization. Causality provides a powerful language to characterize and understand diverse domain shifts, regardless of data modality. This allows us to pinpoint why models fail to generalize, leading to more principled strategies to prepare for and adapt to shifts. We recommend general mitigation strategies, discussing trade-offs and highlighting existing work. Our causality-based perspective offers a critical foundation for developing robust, interpretable, and clinically relevant AI solutions in healthcare, paving the way for reliable real-world deployment.

new Privacy-Preserving Federated Fraud Detection in Payment Transactions with NVIDIA FLARE

Authors: Holger R. Roth, Sarthak Tickoo, Mayank Kumar, Isaac Yang, Andrew Liu, Amit Varshney, Sayani Kundu, Iustina Vintila, Peter Madsgaard, Juraj Milcak, Chester Chen, Yan Cheng, Andrew Feng, Jeff Savio, Vikram Singh, Craig Stancill, Gloria Wan, Evan Powell, Anwar Ul Haq, Sudhir Upadhyay, Jisoo Lee

Abstract: Fraud-related financial losses continue to rise, while regulatory, privacy, and data-sovereignty constraints increasingly limit the feasibility of centralized fraud detection systems. Federated Learning (FL) has emerged as a promising paradigm for enabling collaborative model training across institutions without sharing raw transaction data. Yet, its practical effectiveness under realistic, non-IID financial data distributions remains insufficiently validated. In this work, we present a multi-institution, industry-oriented proof-of-concept study evaluating federated anomaly detection for payment transactions using the NVIDIA FLARE framework. We simulate a realistic federation of heterogeneous financial institutions, each observing distinct fraud typologies and operating under strict data isolation. Using a deep neural network trained via federated averaging (FedAvg), we demonstrate that federated models achieve a mean F1-score of 0.903 - substantially outperforming locally trained models (0.643) and closely approaching centralized training performance (0.925), while preserving full data sovereignty. We further analyze convergence behavior, showing that strong performance is achieved within 10 federated communication rounds, highlighting the operational viability of FL in latency- and cost-sensitive financial environments. To support deployment in regulated settings, we evaluate model interpretability using Shapley-based feature attribution and confirm that federated models rely on semantically coherent, domain-relevant decision signals. Finally, we incorporate sample-level differential privacy via DP-SGD and demonstrate favorable privacy-utility trade-offs...
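
The FedAvg rule the study trains with is simple to state: the server averages client parameter vectors weighted by local data size. A minimal sketch, with plain Python lists standing in for model weights; this is an illustration of the aggregation step, not the NVIDIA FLARE API.

```python
def fedavg(client_weights, client_sizes):
    """FedAvg aggregation: average client parameter vectors, each weighted
    by the number of local training samples it was fit on."""
    total = float(sum(client_sizes))
    dim = len(client_weights[0])
    merged = [0.0] * dim
    for w, n in zip(client_weights, client_sizes):
        for i in range(dim):
            merged[i] += (n / total) * w[i]
    return merged
```

In each communication round, clients train locally on their isolated data, send only parameters, and receive the merged model back, which is how raw transactions never leave an institution.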

new BERTology of Molecular Property Prediction

Authors: Mohammad Mostafanejad, Paul Saxe, T. Daniel Crawford

Abstract: Chemical language models (CLMs) have emerged as promising competitors to popular classical machine learning models for molecular property prediction (MPP) tasks. However, an increasing number of studies have reported inconsistent and contradictory results for the performance of CLMs across various MPP benchmark tasks. In this study, we conduct and analyze hundreds of meticulously controlled experiments to systematically investigate the effects of various factors, such as dataset size, model size, and standardization, on the pre-training and fine-tuning performance of CLMs for MPP. In the absence of well-established scaling laws for encoder-only masked language models, our aim is to provide comprehensive numerical evidence and a deeper understanding of the underlying mechanisms affecting the performance of CLMs for MPP tasks, some of which appear to be entirely overlooked in the literature.

new SemRep: Generative Code Representation Learning with Code Transformations

Authors: Weichen Li, Jiamin Song, Bogdan Alexandru Stoica, Arav Dhoot, Gabriel Ryan, Shengyu Fu, Kexin Pei

Abstract: Code transformation is a foundational capability in the software development process, where its effectiveness relies on constructing a high-quality code representation to characterize the input code semantics and guide the transformation. Existing approaches treat code transformation as an end-to-end learning task, leaving the construction of the representation needed for semantic reasoning implicit in model weights or relying on rigid compiler-level abstractions. We present SemRep, a framework that improves code transformation through generative code representation learning. Our key insight is to employ the semantics-preserving transformations as the intermediate representation, which serves as both a generative mid-training task and the guidance for subsequent instruction-specific code transformations. Across general code editing and optimization tasks (e.g., GPU kernel optimization), SemRep outperforms the extensively finetuned baselines with strictly the same training budget by 6.9% in correctness, 1.1x in performance, 13.9% in generalization, and 6.7% in robustness. With the improved exploration of diverse code transformations, SemRep is particularly amenable to evolutionary search. Combined with an evolutionary coding agent, SemRep finds optimizations that 685B larger-weight baselines fail to discover while achieving the same performance with 25% less inference compute.

new PLUME: Building a Network-Native Foundation Model for Wireless Traces via Protocol-Aware Tokenization

Authors: Swadhin Pradhan, Shazal Irshad, Jerome Henry

Abstract: Foundation models succeed when they learn in the native structure of a modality, whether morphology-respecting tokens in language or pixels in vision. Wireless packet traces deserve the same treatment: meaning emerges from layered headers, typed fields, timing gaps, and cross-packet state machines, not flat strings. We present Plume (Protocol Language Understanding Model for Exchanges), a compact 140M-parameter foundation model for 802.11 traces that learns from structured PDML dissections. A protocol-aware tokenizer splits along the dissector field tree, emits gap tokens for timing, and normalizes identifiers, yielding 6.2x shorter sequences than BPE with higher per-token information density. Trained on a curated corpus, Plume achieves 74-97% next-packet token accuracy across five real-world failure categories and AUROC >= 0.99 for zero-shot anomaly detection. On the same prediction task, frontier LLMs (Claude Opus 4.6, GPT-5.4) score comparably despite receiving identical protocol context, yet Plume does so with > 600x fewer parameters, fitting on a single GPU at effectively zero marginal cost vs. cloud API pricing, enabling on-prem, privacy-preserving root cause analysis.

new PDE-SSM: A Spectral State Space Approach to Spatial Mixing in Diffusion Transformers

Authors: Eshed Gal, Moshe Eliasof, Siddharth Rout, Eldad Haber

Abstract: The success of vision transformers-especially for generative modeling-is limited by the quadratic cost and weak spatial inductive bias of self-attention. We propose PDE-SSM, a spatial state-space block that replaces attention with a learnable convection-diffusion-reaction partial differential equation. This operator encodes a strong spatial prior by modeling information flow via physically grounded dynamics rather than all-to-all token interactions. Solving the PDE in the Fourier domain yields global coupling with near-linear complexity of $O(N \log N)$, delivering a principled and scalable alternative to attention. We integrate PDE-SSM into a flow-matching generative model to obtain the PDE-based Diffusion Transformer PDE-SSM-DiT. Empirically, PDE-SSM-DiT matches or exceeds the performance of state-of-the-art Diffusion Transformers while substantially reducing compute. Our results show that, analogous to 1D settings where SSMs supplant attention, multi-dimensional PDE operators provide an efficient, inductive-bias-rich foundation for next-generation vision models.
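
The key mechanism, solving the spatial PDE per Fourier mode so that every token couples globally at near-linear cost, can be shown on the simplest member of the convection-diffusion-reaction family: pure diffusion on a periodic 1-D grid. The sketch below uses a naive $O(N^2)$ DFT purely for self-containment (an FFT gives the $O(N \log N)$ the abstract cites), and the real PDE-SSM operator adds learnable convection and reaction terms.

```python
import cmath, math

def dft(x):
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n))
            for k in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * j / n) for k in range(n)).real / n
            for j in range(n)]

def diffusion_step(u, nu, dt):
    """One implicit step of u_t = nu * u_xx on a periodic unit interval,
    solved exactly per Fourier mode: U_k /= 1 + nu*dt*(2*pi*k)^2."""
    n = len(u)
    out = []
    for k, Uk in enumerate(dft(u)):
        kk = k if k <= n // 2 else k - n  # signed wavenumber
        out.append(Uk / (1.0 + nu * dt * (2.0 * math.pi * kk) ** 2))
    return idft(out)
```

Because the operator is diagonal in the Fourier basis, one step mixes all tokens at once: the constant mode passes through unchanged while high-frequency modes are damped, which is the spatial prior that replaces all-to-all attention.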

new Locally Linear Continual Learning for Time Series based on VC-Theoretical Generalization Bounds

Authors: Yan V. G. Ferreira, Igor B. Lima, Pedro H. G. Mapa S., Felipe V. Campos, Antonio P. Braga

Abstract: Most machine learning methods assume fixed probability distributions, limiting their applicability in nonstationary real-world scenarios. While continual learning methods address this issue, current approaches often rely on black-box models or require extensive user intervention for interpretability. We propose SyMPLER (Systems Modeling through Piecewise Linear Evolving Regression), an explainable model for time series forecasting in nonstationary environments based on dynamic piecewise-linear approximations. Unlike other locally linear models, SyMPLER uses generalization bounds from Statistical Learning Theory to automatically determine when to add new local models based on prediction errors, eliminating the need for explicit clustering of the data. Experiments show that SyMPLER can achieve comparable performance to both black-box and existing explainable models while maintaining a human-interpretable structure that reveals insights about the system's behavior. In this sense, our approach reconciles accuracy and interpretability, offering a transparent and adaptive solution for forecasting nonstationary time series.

new Quantum-Enhanced Vision Transformer for Flood Detection using Remote Sensing Imagery

Authors: Soumyajit Maity, Behzad Ghanbarian

Abstract: Reliable flood detection is critical for disaster management, yet classical deep learning models often struggle with the high-dimensional, nonlinear complexities inherent in remote sensing data. To mitigate these limitations, we introduce a novel Quantum-Enhanced Vision Transformer (ViT) that synergizes the global context-awareness of transformers with the expressive feature extraction capabilities of quantum computing. Using remote sensing imagery, we developed a hybrid architecture that processes inputs through parallel pathways: a ViT backbone and a quantum branch utilizing a 4-qubit parameterized quantum circuit for localized feature mapping. These distinct representations are fused to optimize binary classification. Results show that the proposed hybrid model significantly outperformed a classical ViT baseline, increasing overall accuracy from 84.48% to 94.47% and the F1-score from 0.841 to 0.944. Notably, the quantum integration substantially improved discriminative power in complex terrains for both classes. These findings validate the potential of quantum-classical hybrid models to enhance precision in hydrological monitoring and earth observation applications.

new Routing Channel-Patch Dependencies in Time Series Forecasting with Graph Spectral Decomposition

Authors: Dongyuan Li, Shun Zheng, Chang Xu, Jiang Bian, Renhe Jiang

Abstract: Time series forecasting has attracted significant attention in the field of AI. Previous works have revealed that the Channel-Independent (CI) strategy improves forecasting performance by modeling each channel individually, but it often suffers from poor generalization and overlooks meaningful inter-channel interactions. Conversely, Channel-Dependent (CD) strategies aggregate all channels, which may introduce irrelevant information and lead to oversmoothing. Despite recent progress, few existing methods offer the flexibility to adaptively balance CI and CD strategies in response to varying channel dependencies. To address this, we propose a generic plugin, xCPD, that can adaptively model the channel-patch dependencies from the perspective of graph spectral decomposition. Specifically, xCPD first projects multivariate signals into the frequency domain using a shared graph Fourier basis, and groups patches into low-, mid-, and high-frequency bands based on their spectral energy responses. xCPD then applies a channel-adaptive routing mechanism that dynamically adjusts the degree of inter-channel interaction for each patch, enabling selective activation of frequency-specific experts. This facilitates fine-grained input-aware modeling of smooth trends, local fluctuations, and abrupt transitions. xCPD can be seamlessly integrated on top of existing CI and CD forecasting models, consistently enhancing both accuracy and generalization across benchmarks. The code is available at https://github.com/Clearloveyuan/xCPD.
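
The graph-spectral machinery is easiest to see on the smallest possible graph: for a 2-node graph with one edge, the Laplacian eigenvectors are exactly one smooth mode and one oscillatory mode, and the graph Fourier transform is projection onto them. The sketch below is a minimal illustration of the band-energy idea, not xCPD's shared basis or routing network.

```python
import math

# Laplacian of a 2-node graph with one edge: L = [[1, -1], [-1, 1]].
# Its eigenpairs form the graph Fourier basis:
#   lambda = 0 -> (1, 1)/sqrt(2)   smooth (low-frequency) mode
#   lambda = 2 -> (1, -1)/sqrt(2)  oscillatory (high-frequency) mode
S = math.sqrt(2.0)
BASIS = [(1.0 / S, 1.0 / S), (1.0 / S, -1.0 / S)]

def gft(x):
    """Graph Fourier transform: project the signal onto the eigenbasis."""
    return [u[0] * x[0] + u[1] * x[1] for u in BASIS]

def band_energies(x):
    """Per-mode spectral energy; a router can inspect where this energy
    concentrates (low vs. high modes) to decide how much cross-channel
    mixing a patch should receive."""
    return [c * c for c in gft(x)]
```

A constant signal puts all its energy in the low-frequency band (a smooth trend), while an alternating signal puts it all in the high-frequency band (an abrupt transition), which is the distinction xCPD's frequency-specific experts act on.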

URLs: https://github.com/Clearloveyuan/xCPD

new Data-driven Progressive Discovery of Physical Laws

Authors: Mingkun Xia, Weiwei Zhang

Abstract: Symbolic regression is a powerful tool for knowledge discovery, enabling the extraction of interpretable mathematical expressions directly from data. However, conventional symbolic discovery typically follows an end-to-end, "one-step" process, which often generates lengthy and physically meaningless expressions when dealing with real physical systems, leading to poor model generalization. This limitation fundamentally stems from its deviation from the basic path of scientific discovery: physical laws do not exist in a single form but follow a hierarchical and progressive pattern from simplicity to complexity. Motivated by this principle, we propose Chain of Symbolic Regression (CoSR), a novel framework that models the discovery of physical laws as a chain of symbolic knowledge. This knowledge chain is formed by progressively combining multiple knowledge units with clear physical meanings along a specific logic, ultimately enabling the precise discovery of the underlying physical laws from data. CoSR fully recapitulates the progressive discovery path from Kepler's third law to the law of universal gravitation in classical mechanics, and is applied to three types of problems: turbulent Rayleigh-Benard convection, viscous flows in a circular pipe, and laser-metal interaction, demonstrating its ability to improve classical scaling theories. Finally, CoSR showcases its capability to discover new knowledge in the complex engineering problem of aerodynamic coefficients scaling for different aircraft.

new Few Batches or Little Memory, But Not Both: Simultaneous Space and Adaptivity Constraints in Stochastic Bandits

Authors: Ruiyuan Huang, Zicheng Lyu, Xiaoyi Zhu, Zengfeng Huang

Abstract: We study stochastic multi-armed bandits under simultaneous constraints on space and adaptivity: the learner interacts with the environment in $B$ batches and has only $W$ bits of persistent memory. Prior work shows that each constraint alone is surprisingly mild: near-minimax regret $\widetilde{O}(\sqrt{KT})$ is achievable with $O(\log T)$ bits of memory under fully adaptive interaction, and with a $K$-independent $O(\log\log T)$-type number of batches when memory is unrestricted. We show that this picture breaks down in the simultaneously constrained regime. We prove that any algorithm with a $W$-bit memory constraint must use at least $\Omega(K/W)$ batches to achieve near-minimax regret $\widetilde{O}(\sqrt{KT})$, even under adaptive grids. In particular, logarithmic memory rules out $K$-independent batch complexity. Our proof is based on an information bottleneck. We show that near-minimax regret forces the learner to acquire $\Omega(K)$ bits of information about the hidden set of good arms under a suitable hard prior, whereas an algorithm with $B$ batches and $W$ bits of memory allows only $O(BW)$ bits of information. A key ingredient is a localized change-of-measure lemma that yields probability-level arm exploration guarantees, which is of independent interest. We also give an algorithm using $O(\log T)$ bits of memory and $\widetilde{O}(K)$ batches that achieves regret $\widetilde{O}(\sqrt{KT})$, which nearly matches our lower bound.
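The information-bottleneck argument stated in the abstract compresses to a single counting display (a paraphrase of the stated bound in its own notation, not an independent derivation):

```latex
\underbrace{\Omega(K)}_{\text{bits required about the good-arm set}}
\;\le\;
\underbrace{O(B \cdot W)}_{\text{bits acquirable in } B \text{ batches with } W \text{ bits of memory}}
\quad\Longrightarrow\quad
B \;=\; \Omega\!\left(\frac{K}{W}\right).
```

With $W = O(\log T)$, this immediately rules out a $K$-independent number of batches, as the abstract notes.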

new Manifold-Orthogonal Dual-spectrum Extrapolation for Parameterized Physics-Informed Neural Networks

Authors: Zhangyong Liang, Ji Zhang

Abstract: Physics-informed neural networks (PINNs) have achieved notable success in modeling dynamical systems governed by partial differential equations (PDEs). To avoid computationally expensive retraining under new physical conditions, parameterized PINNs (P$^2$INNs) commonly adapt pre-trained operators using singular value decomposition (SVD) for out-of-distribution (OOD) regimes. However, SVD-based fine-tuning often suffers from rigid subspace locking and truncation of important high-frequency spectral modes, limiting its ability to capture complex physical transitions. While parameter-efficient fine-tuning (PEFT) methods appear to be promising alternatives, applying conventional adapters such as LoRA to P$^2$INNs introduces a severe Pareto trade-off, as additive updates increase parameter overhead and disrupt the structured physical manifolds inherent in operator representations. To address these limitations, we propose Manifold-Orthogonal Dual-spectrum Extrapolation (MODE), a lightweight micro-architecture designed for physics operator adaptation. MODE decomposes physical evolution into complementary mechanisms including principal-spectrum dense mixing that enables cross-modal energy transfer within frozen orthogonal bases, residual-spectrum awakening that activates high-frequency spectral components through a single trainable scalar, and affine Galilean unlocking that explicitly isolates spatial translation dynamics. Experiments on challenging PDE benchmarks including the 1D Convection--Diffusion--Reaction equation and the 2D Helmholtz equation demonstrate that MODE achieves strong out-of-distribution generalization while preserving the minimal parameter complexity of native SVD and outperforming existing PEFT-based baselines.

new Level Up: Defining and Exploiting Transitional Problems for Curriculum Learning

Authors: Zhenwei Tang, Amogh Inamdar, Ashton Anderson, Richard Zemel

Abstract: Curriculum learning--ordering training examples in a sequence to aid machine learning--takes inspiration from human learning, but has not gained widespread acceptance. Static strategies for scoring item difficulty rely on indirect proxy scores of varying quality and produce curricula that are not specific to the learner at hand. Dynamic approaches base difficulty estimates on gradient information, requiring considerable extra computation during training. We introduce a novel method for measuring the difficulty of individual problem instances directly relative to the ability of a given model, and identify transitional problems that are consistently easier as model ability increases. Applying this method to chess and mathematics, we find that training on a curriculum that "levels up" from easier to harder transitional problems most efficiently improves a model to the next tier of competence. These problems induce a natural progression from easier to harder items, which outperforms other training strategies. By measuring difficulty directly relative to model competence, our method yields interpretable problems, learner-specific curricula, and a principled basis for step-by-step improvement.
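One way to picture the "transitional problems" described above: items whose empirical solve rate rises consistently with model ability, ordered easy-to-hard at the learner's current tier. The following is a hypothetical sketch (function names, thresholds, and data are invented for illustration, not the authors' code):

```python
# Hypothetical sketch: identify "transitional" problems -- items whose
# empirical solve rate rises monotonically with model ability tier --
# and order them easy-to-hard relative to the current model.

def is_transitional(solve_rates):
    """solve_rates[i] = empirical solve rate of this problem at ability tier i."""
    return all(a < b for a, b in zip(solve_rates, solve_rates[1:]))

def build_curriculum(problems, current_tier):
    """Keep transitional problems; order them easy-to-hard by
    difficulty (1 - solve rate) at the model's current ability tier."""
    transitional = {p: r for p, r in problems.items() if is_transitional(r)}
    return sorted(transitional, key=lambda p: 1.0 - transitional[p][current_tier])

problems = {
    "endgame_A": [0.2, 0.5, 0.9],   # consistently easier as ability grows
    "tactic_B":  [0.4, 0.6, 0.7],   # also transitional
    "puzzle_C":  [0.5, 0.5, 0.4],   # not consistently easier -> dropped
}
print(build_curriculum(problems, current_tier=1))  # -> ['tactic_B', 'endgame_A']
```

Because difficulty is measured against the learner itself, the resulting ordering is specific to the model's current tier rather than to a fixed proxy score.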

new Greedy Information Projection for LLM Data Selection

Authors: Victor Ye Dong, Kuan-Yun Lee, Jiamei Shuai, Shengfei Liu, Yi Liu, Jian Jiao

Abstract: We present \emph{Greedy Information Projection} (\textsc{GIP}), a principled framework for choosing training examples for large language model fine-tuning. \textsc{GIP} casts selection as maximizing mutual information between a subset of examples and task-specific query signals, which may originate from LLM quality judgments, metadata, or other sources. The framework involves optimizing a closed-form mutual information objective defined using both data and query embeddings, naturally balancing {\it quality} and {\it diversity}. Optimizing this score is equivalent to maximizing the projection of the query embedding matrix onto the span of the selected data, which provides a geometric explanation for the co-emergence of quality and diversity. Building on this view, we employ a fast greedy matching-pursuit procedure with efficient projection-based updates. On instruction-following and mathematical reasoning datasets, \textsc{GIP} selects small subsets that match full-data fine-tuning while using only a fraction of examples and compute, unifying quality-aware and diversity-aware selection for efficient fine-tuning.
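The geometric view in the abstract (selection as enlarging the projection of the query embedding onto the span of the chosen examples) can be illustrated with a toy greedy matching-pursuit loop. This is an illustrative reconstruction under stated assumptions, not the authors' implementation; the tolerance and first-wins tie-breaking are arbitrary choices:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def greedy_projection_select(data, query, k):
    """Greedily pick up to k examples whose directions most enlarge the
    projection of the query embedding onto the span of the selected set."""
    residual_q = list(query)             # query component not yet covered
    residuals = [list(d) for d in data]  # candidates, orthogonalized vs. selection
    selected = []
    for _ in range(k):
        best, best_gain = None, 0.0
        for i, r in enumerate(residuals):
            n2 = dot(r, r)
            if i in selected or n2 < 1e-12:  # already chosen or redundant
                continue
            gain = dot(residual_q, r) ** 2 / n2  # squared projection gain
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:
            break
        selected.append(best)
        # orthogonalize query residual and remaining candidates (matching pursuit)
        b = residuals[best]
        bn = dot(b, b)
        cq = dot(residual_q, b) / bn
        residual_q = [q - cq * bi for q, bi in zip(residual_q, b)]
        residuals = [[rj - (dot(r, b) / bn) * bj for rj, bj in zip(r, b)]
                     for r in residuals]
    return selected

data = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]  # index 1 duplicates index 0
query = [1, 1, 1]
print(greedy_projection_select(data, query, 2))  # -> [0, 2]
```

The duplicate example (index 1) is skipped after its residual collapses to zero, which is the diversity effect the abstract attributes to the projection objective: quality comes from alignment with the query, diversity from orthogonalization against what is already selected.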

new IGU-LoRA: Adaptive Rank Allocation via Integrated Gradients and Uncertainty-Aware Scoring

Authors: Xuan Cui, Huiyue Li, Run Zeng, Yunfei Zhao, Jinrui Qian, Wei Duan, Bo Liu, Zhanpeng Zhou

Abstract: As large language models (LLMs) scale to billions of parameters, full-parameter fine-tuning becomes compute- and memory-prohibitive. Parameter-efficient fine-tuning (PEFT) mitigates this issue by updating only a small set of task-specific parameters while keeping the base model frozen. Among PEFT approaches, low-rank adaptation (LoRA) is widely adopted; however, it enforces a uniform rank across layers despite substantial variation in layer importance, motivating {layerwise} rank allocation. Recent adaptive-rank variants (e.g., AdaLoRA) allocate ranks based on importance scores, yet typically rely on instantaneous gradients that capture only local sensitivity, overlooking non-local, pathwise effects within the same layer, which yields unstable and biased scores. To address this limitation, we introduce IGU-LoRA, an adaptive-rank LoRA that (i) computes within-layer Integrated Gradients (IG) sensitivities and aggregates them into a layer-level score for rank allocation, and (ii) applies an uncertainty-aware scheme using exponential moving averages with deviation tracking to suppress noisy updates and calibrate rank selection. Theoretically, we prove an upper bound on the composite trapezoidal rule approximation error for parameter-space IG under a pathwise Hessian-Lipschitz condition, which informs the quadrature budget. Across diverse tasks and architectures, IGU-LoRA consistently outperforms strong PEFT baselines at matched parameter budgets, improving downstream accuracy and robustness. Ablations confirm the contributions of pathwise within-layer sensitivity estimates and uncertainty-aware selection to effective rank allocation. Our code is publicly available at https://github.com/withyou12/igulora.git

URLs: https://github.com/withyou12/igulora.git

new Computation and Communication Efficient Federated Unlearning via On-server Gradient Conflict Mitigation and Expression

Authors: Minh-Duong Nguyen, Senura Hansaja, Le-Tuan Nguyen, Quoc-Viet Pham, Ken-Tye Yong, Nguyen H. Tran, Dung D. Le

Abstract: Federated Unlearning (FUL) aims to remove specific participants' data contributions from a trained Federated Learning model, thereby ensuring data privacy and compliance with regulatory requirements. Despite its potential, progress in FUL has been limited due to several challenges, including cross-client knowledge inaccessibility and high computational and communication costs. To overcome these challenges, we propose Federated On-server Unlearning (FOUL), a novel framework that comprises two key stages. The learning-to-unlearn stage serves as a preparatory learning phase, during which the model identifies and encodes the key features associated with the forget clients. This stage is communication-efficient and establishes the basis for the subsequent unlearning process. Subsequently, the on-server knowledge aggregation stage performs the unlearning process at the server without requiring access to client data, thereby preserving both efficiency and privacy. We introduce a new data setting for FUL, which enables a more transparent and rigorous evaluation of unlearning. To highlight the effectiveness of our approach, we propose a novel evaluation metric termed time-to-forget, which measures how quickly the model achieves optimal unlearning performance. Extensive experiments conducted on three datasets under various unlearning scenarios demonstrate that FOUL outperforms retraining in FUL. Moreover, FOUL achieves competitive or superior results with significantly reduced time-to-forget, while maintaining low communication and computation costs.

new Node Role-Guided LLMs for Dynamic Graph Clustering

Authors: Dongyuan Li, Ying Zhang, Yaozu Wu, Renhe Jiang

Abstract: Dynamic graph clustering aims to detect and track time-varying clusters in dynamic graphs, revealing how complex real-world systems evolve over time. However, existing methods are predominantly black-box models. They lack interpretability in their clustering decisions and fail to provide semantic explanations of why clusters form or how they evolve, severely limiting their use in safety-critical domains such as healthcare or transportation. To address these limitations, we propose DyG-RoLLM, an end-to-end interpretable framework that maps continuous graph embeddings into discrete semantic concepts through learnable prototypes. Specifically, we first decompose node representations into orthogonal role and clustering subspaces, so that nodes with similar roles (e.g., hubs, bridges) but different cluster affiliations can be properly distinguished. We then introduce five node role prototypes (Leader, Contributor, Wanderer, Connector, Newcomer) in the role subspace as semantic anchors, transforming continuous embeddings into discrete concepts to facilitate LLM understanding of node roles within communities. Finally, we design a hierarchical LLM reasoning mechanism to generate both clustering results and natural language explanations, while providing consistency feedback as weak supervision to refine node representations. Experimental results on four synthetic and six real-world benchmarks demonstrate the effectiveness, interpretability, and robustness of DyG-RoLLM. Code is available at https://github.com/Clearloveyuan/DyG-RoLLM.

URLs: https://github.com/Clearloveyuan/DyG-RoLLM.

new Prototypical Exemplar Condensation for Memory-efficient Online Continual Learning

Authors: Minh-Duong Nguyen, Thien-Thanh Dao, Le-Tuan Nguyen, Dung D. Le, Kok-Seng Wong

Abstract: Rehearsal-based continual learning (CL) mitigates catastrophic forgetting by maintaining a subset of samples from previous tasks for replay. Existing studies primarily focus on optimizing memory storage through coreset selection strategies. While these methods are effective, they typically require storing a substantial number of samples per class (SPC), often exceeding 20, to maintain satisfactory performance. In this work, we propose to further compress the memory footprint by synthesizing and storing prototypical exemplars, which can form representative prototypes when passed through the feature extractor. Owing to their representative nature, these exemplars enable the model to retain previous knowledge using only a small number of samples while preserving privacy. Moreover, we introduce a perturbation-based augmentation mechanism that generates synthetic variants of previous data during training, thereby enhancing CL performance. Extensive evaluations on widely used benchmark datasets and settings demonstrate that the proposed algorithm achieves superior performance compared to existing baselines, particularly in scenarios involving large-scale datasets and a high number of tasks.

new Collapse or Preserve: Data-Dependent Temporal Aggregation for Spiking Neural Network Acceleration

Authors: Jiahao Qin

Abstract: Spike sparsity is widely believed to enable efficient spiking neural network (SNN) inference on GPU hardware. We demonstrate this is an illusion: five distinct sparse computation strategies on Apple M3 Max all fail to outperform dense convolution, because SIMD architectures cannot exploit the fine-grained, unstructured sparsity of i.i.d. binary spikes. Instead, we propose Temporal Aggregated Convolution (TAC), which exploits convolution linearity to pre-aggregate $K$ spike frames before a single convolution call, reducing $T$ calls to $T/K$. On rate-coded data, TAC achieves a 13.8x speedup with +1.6% accuracy on MNIST and +5.4% on Fashion-MNIST -- a simultaneous improvement in both speed and accuracy. However, on event-based data where the temporal dimension carries genuine motion information, TAC's temporal collapse is harmful. We therefore introduce TAC-TP (Temporal Preservation), which shares each group's convolution output across K independent LIF steps, preserving full temporal resolution for downstream layers. On DVS128-Gesture, TAC-TP achieves 95.1% accuracy (vs. 96.3% baseline) with 50% fewer convolution calls, while standard TAC drops to 91.3%. Our key finding is that the optimal temporal aggregation strategy is data-dependent: collapse the temporal dimension for rate-coded data (noise reduction) but preserve it for event data (information retention). The speedup is hardware-agnostic: TAC achieves 11.0x on NVIDIA V100, confirming the mechanism transfers across GPU architectures. All operators in the mlx-snn library are open source.
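The speedup rests on the linearity of convolution: convolving the sum of K spike frames equals summing the K per-frame convolutions, so T convolution calls can be folded into T/K. A minimal self-contained 1-D illustration of the identity (the paper targets 2-D GPU convolutions; kernel and frames here are invented for the demonstration):

```python
def conv1d(x, w):
    """'Valid' 1-D convolution (correlation form, as in deep learning)."""
    n = len(x) - len(w) + 1
    return [sum(x[i + j] * w[j] for j in range(len(w))) for i in range(n)]

kernel = [0.5, -1.0, 0.5]
frames = [[0, 1, 0, 1, 0], [1, 0, 0, 1, 1], [0, 0, 1, 1, 0]]  # K=3 binary spike frames

# K calls: convolve each frame separately, then sum the outputs
per_frame = [conv1d(f, kernel) for f in frames]
summed_outputs = [sum(col) for col in zip(*per_frame)]

# 1 call: pre-aggregate the K frames, then convolve once
aggregated = [sum(col) for col in zip(*frames)]
one_call = conv1d(aggregated, kernel)

assert summed_outputs == one_call  # linearity: identical result, K-fold fewer calls
```

The identity holds exactly; what differs between TAC and TAC-TP, per the abstract, is whether the temporal structure discarded by the pre-aggregation mattered to downstream layers.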

new Effective Sparsity: A Unified Framework via Normalized Entropy and the Effective Number of Nonzeros

Authors: Haoyu He, Hao Wang, Jiashan Wang, Hao Zeng

Abstract: Classical sparsity-promoting methods rely on the l0 norm, which treats all nonzero components as equally significant. In practical inverse problems, however, solutions often exhibit many small-amplitude components that have little effect on reconstruction but lead to an overestimation of signal complexity. We address this limitation by shifting the paradigm from discrete cardinality to effective sparsity. Our approach introduces the effective number of nonzeros (ENZ), a unified class of normalized entropy-based regularizers, including Shannon and Renyi forms, that quantifies the concentration of significant coefficients. We show that, unlike the classical l0 norm, the ENZ provides a stable and continuous measure of effective sparsity that is insensitive to negligible perturbations. For noisy linear inverse problems, we establish theoretical guarantees under the Restricted Isometry Property (RIP), proving that ENZ-based recovery is unique and stable. We also derive a decomposition showing that the ENZ equals the support cardinality times a distributional efficiency term, thereby linking entropy with l0 regularization. Numerical experiments show that this effective sparsity framework outperforms traditional cardinality-based methods in robustness and accuracy.
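The stability claim is easy to illustrate with one plausible Shannon-form instantiation: take exp of the Shannon entropy of the normalized magnitude distribution. The exact definition in the paper may differ (e.g., Renyi forms, squared magnitudes); this sketch only demonstrates the qualitative contrast with l0:

```python
import math

def enz_shannon(x):
    """Effective number of nonzeros as exp(Shannon entropy) of the
    normalized magnitudes p_i = |x_i| / sum_j |x_j|. For k equal nonzero
    components this gives exactly k; negligible components barely move it."""
    total = sum(abs(v) for v in x)
    if total == 0:
        return 0.0
    p = [abs(v) / total for v in x]
    return math.exp(-sum(pi * math.log(pi) for pi in p if pi > 0))

print(enz_shannon([1, 1, 1, 1]))        # ~4.0: four equally significant components
print(enz_shannon([1, 1, 1, 1, 1e-8])) # still ~4.0, while l0 would jump to 5
```

The second call shows the continuity property from the abstract: a perturbation of size 1e-8 changes the l0 count by a full unit but changes the entropy-based measure only infinitesimally.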

new Exploring the Dimensions of a Variational Neuron

Authors: Yves Ruffenach

Abstract: We introduce EVE (Elemental Variational Expanse), a variational distributional neuron formulated as a local probabilistic computational unit with an explicit prior, an amortized posterior, and unit-level variational regularization. In most modern architectures, uncertainty is modeled through global latent variables or parameter uncertainty, while the computational unit itself remains scalar. EVE instead relocates probabilistic structure to the neuron level, making it locally observable and controllable. In this paper, the term dimensions refers primarily to the neuron's internal latent dimensionality, denoted by k. We study how varying k, from the atomic case k = 1 to higher-dimensional latent spaces, changes the neuron's learned operating regime. We then examine how this main axis interacts with two additional structural properties: local capacity control and temporal persistence through a neuron-level autoregressive extension. To support this study, EVE is instrumented with internal diagnostics and constraints, including effective KL, a target band on mu^2, out-of-band fractions, and indicators of drift and collapse. Across selected forecasting and tabular settings, we show that latent dimensionality, control, and temporal extension shape the neuron's internal regime, and that some neuron-level variables are measurable, informative, and related to downstream behavior. Overall, the paper provides an experimentally grounded first map of the design space opened by a variational neuron.

new Fronto-parietal and fronto-temporal EEG coherence as predictive neuromarkers of transcutaneous auricular vagus nerve stimulation response in treatment-resistant schizophrenia: A machine learning study

Authors: Yapeng Cui, Ruoxi Yun, Shumin Zhang, Yi Gong, Zhiqin Li, Ying Chen, Mingbing Su, Dongniya Wu, Jingxia Wu, Qian Wang, Jianan Wang, Qianqian Tian, Yangyang Yuan, Shuhao Mei, Lei Wu, Xinghua Li, Bingkui Zhang, Taipin Guo, Jinbo Sun

Abstract: Response variability limits the clinical utility of transcutaneous auricular vagus nerve stimulation (taVNS) for negative symptoms in treatment-resistant schizophrenia (TRS). This study aimed to develop an electroencephalography (EEG)-based machine learning (ML) model to predict individual response and explore associated neurophysiological mechanisms. We used ML to develop and validate predictive models based on pre-treatment EEG data features (power, coherence, and dynamic functional connectivity) from 50 TRS patients enrolled in the taVNS trial, within a nested cross-validation framework. Participants received 20 sessions of active or sham taVNS (n = 25 each) over two weeks, followed by a two-week follow-up. The prediction target was the percentage change in the positive and negative syndrome scale-factor score for negative symptoms (PANSS-FSNS) from baseline to post-treatment, with further evaluation of model specificity and neurophysiological relevance. The optimal model accurately predicted taVNS response in the active group, with predicted PANSS-FSNS changes strongly correlated with observed changes (r = 0.87, p < .001); permutation testing confirmed performance above chance (p < .001). Nine consistently retained features were identified, predominantly fronto-parietal and fronto-temporal coherence features. Negligible predictive performance in the sham group and failure to predict positive symptom change support the predictive specificity of this oscillatory signature for taVNS-related negative symptom improvement. Two coherence features within fronto-parietal-temporal networks showed post-taVNS changes significantly associated with symptom improvement, suggesting dual roles as predictors and potential therapeutic targets. EEG oscillatory neuromarkers enable accurate prediction of individual taVNS response in TRS, supporting mechanism-informed precision neuromodulation strategies.

new OrigamiBench: An Interactive Environment to Synthesize Flat-Foldable Origamis

Authors: Naaisha Agarwal, Yihan Wu, Yichang Jian, Yikuan Hu, Nishad Mansoor, Mohan Li, Yifei Peng, Wang-Zhou Dai, Yao-Xiang Ding, Emanuele Sansone

Abstract: Building AI systems that can plan, act, and create in the physical world requires more than pattern recognition. Such systems must understand the causal mechanisms and constraints governing physical processes in order to guide sequential decisions. This capability relies on internal representations, analogous to an internal language model, that relate observations, actions, and resulting environmental changes. However, many existing benchmarks treat visual perception and programmatic reasoning as separate problems, focusing either on visual recognition or on symbolic tasks. The domain of origami provides a natural testbed that integrates these modalities. Constructing shapes through folding operations requires visual perception, reasoning about geometric and physical constraints, and sequential planning, while remaining sufficiently structured for systematic evaluation. We introduce OrigamiBench, an interactive benchmark in which models iteratively propose folds and receive feedback on physical validity and similarity to a target configuration. Experiments with modern vision-language models show that scaling model size alone does not reliably produce causal reasoning about physical transformations. Models fail to generate coherent multi-step folding strategies, suggesting that visual and language representations remain weakly integrated.

new On Interpolation Formulas Describing Neural Network Generalization

Authors: Jin Guo, Roy Y. He, Jean-Michel Morel

Abstract: In 2020 Domingos introduced an interpolation formula valid for "every model trained by gradient descent". He concluded that such models behave approximately as kernel machines. In this work, we extend the Domingos formula to stochastic training. We introduce a stochastic gradient kernel that extends the deterministic version via a continuous-time diffusion approximation. We prove stochastic Domingos theorems and show that the expected network output admits a kernel-machine representation with optimizer-specific weighting. It reveals that training samples contribute through loss-dependent weights and gradient alignment along the training trajectory. We then link the generalization error to the null space of the integral operator induced by the stochastic gradient kernel. The same path-kernel viewpoint provides a unified interpretation of diffusion models and GANs: diffusion induces stage-wise, noise-localized corrections, whereas GANs induce distribution-guided corrections shaped by discriminator geometry. We visualize the evolution of implicit kernels during optimization and quantify out-of-distribution behaviors through a series of numerical experiments. Our results support a feature-space memory view of learning: training stores data-dependent information in an evolving tangent feature geometry, and predictions at test time arise from kernel-weighted retrieval and aggregation of these stored features, with generalization governed by alignment between test points and the learned feature memory.

new Scribe Verification in Chinese manuscripts using Siamese, Triplet, and Vision Transformer Neural Networks

Authors: Dimitrios-Chrysovalantis Liakopoulos, Yanbo Zhang, Chongsheng Zhang, Constantine Kotropoulos

Abstract: The paper examines deep learning models for scribe verification in Chinese manuscripts, that is, the task of automatically determining whether two manuscript fragments were written by the same scribe, using deep metric learning methods. Two datasets were used: the Tsinghua Bamboo Slips Dataset and a selected subset of the Multi-Attribute Chinese Calligraphy Dataset, focusing on the calligraphers with a large number of samples. Siamese and Triplet neural network architectures are implemented, including convolutional and Transformer-based models. The experimental results show that the MobileNetV3+ Custom Siamese model trained with contrastive loss achieves either the best or the second-best overall accuracy and area under the Receiver Operating Characteristic Curve on both datasets.

new UVLM: A Universal Vision-Language Model Loader for Reproducible Multimodal Benchmarking

Authors: Joan Perez, Giovanni Fusco

Abstract: Vision-Language Models (VLMs) have emerged as powerful tools for image understanding tasks, yet their practical deployment remains hindered by significant architectural heterogeneity across model families. This paper introduces UVLM (Universal Vision-Language Model Loader), a Google Colab-based framework that provides a unified interface for loading, configuring, and benchmarking multiple VLM architectures on custom image analysis tasks. UVLM currently supports two major model families -- LLaVA-NeXT and Qwen2.5-VL -- which differ fundamentally in their vision encoding, tokenization, and decoding strategies. The framework abstracts these differences behind a single inference function, enabling researchers to compare models using identical prompts and evaluation protocols. Key features include a multi-task prompt builder with support for four response types (numeric, category, boolean, text), a consensus validation mechanism based on majority voting across repeated inferences, a flexible token budget (up to 1,500 tokens) enabling users to design custom reasoning strategies through prompt engineering, and a built-in chain-of-thought reference mode for benchmarking. UVLM is designed for reproducibility, accessibility, and extensibility and as such is freely deployable on Google Colab using consumer-grade GPU resources. The paper also presents the first benchmarking of different VLMs on tasks of increasing reasoning complexity using a corpus of 120 street-view images.
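The consensus validation mechanism described above (majority voting across repeated inferences of the same prompt) might look like the following helper; its name, signature, and threshold are hypothetical illustrations, not UVLM's actual interface:

```python
from collections import Counter

def consensus(responses, min_agreement=0.5):
    """Majority-vote consensus over repeated inferences of one prompt.
    Returns (answer, agreement), or (None, agreement) when no single
    response reaches the agreement threshold."""
    counts = Counter(responses)
    answer, votes = counts.most_common(1)[0]
    agreement = votes / len(responses)
    return (answer if agreement >= min_agreement else None, agreement)

print(consensus(["3", "3", "4", "3", "3"]))                    # ('3', 0.8)
print(consensus(["a", "b", "c", "a"], min_agreement=0.6))      # (None, 0.5)
```

Returning the agreement ratio alongside the answer lets a caller distinguish confident consensus from a bare plurality, which matters for the numeric and categorical response types the framework supports.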

new Robust Self-Training with Closed-loop Label Correction for Learning from Noisy Labels

Authors: Zhanhui Lin, Yanlin Liu, Sanping Zhou

Abstract: Training deep neural networks with noisy labels remains a significant challenge, often leading to degraded performance. Existing methods for handling label noise typically rely on either transition matrix, noise detection, or meta-learning techniques, but they often exhibit low utilization efficiency of noisy samples and incur high computational costs. In this paper, we propose a self-training label correction framework using decoupled bilevel optimization, where a classifier and neural correction function co-evolve. Leveraging a small clean dataset, our method employs noisy posterior simulation and intermediate features to transfer ground-truth knowledge, forming a closed-loop feedback system that prevents error amplification. Theoretical guarantees underpin the stability of our approach, and extensive experiments on benchmark datasets like CIFAR and Clothing1M confirm state-of-the-art performance with reduced training time, highlighting its practical applicability for learning from noisy labels.

new Distributed Acoustic Sensing for Urban Traffic Monitoring: Spatio-Temporal Attention in Recurrent Neural Networks

Authors: Izhan Fakhruzi, Manuel Titos, Carmen Benítez, Luz García

Abstract: Effective urban traffic monitoring is essential for improving mobility, enhancing safety, and supporting sustainable cities. Distributed Acoustic Sensing (DAS) enables large-scale traffic observation by transforming existing fiber-optic infrastructure into dense arrays of vibration sensors. However, modeling the high-resolution spatio-temporal structure of DAS data for reliable traffic event recognition remains challenging. This study presents a real-world DAS-based traffic monitoring experiment conducted in Granada, Spain, where vehicles cross a fiber deployed perpendicular to the roadway. Recurrent neural networks (RNNs) are employed to model intra- and inter-event temporal dependencies. Spatial and temporal attention mechanisms are systematically integrated within the RNN architecture to analyze their impact on recognition performance, parameter efficiency, and interpretability. Results show that an appropriate and complementary placement of attention modules improves the balance between accuracy and model complexity. Attention heatmaps provide physically meaningful interpretations of classification decisions by highlighting informative spatial locations and temporal segments. Furthermore, the proposed SA-bi-TA configuration demonstrates spatial transferability, successfully recognizing traffic events at sensing locations different from those used during training, with only moderate performance degradation. These findings support the development of scalable and interpretable DAS-based traffic monitoring systems capable of operating under heterogeneous urban sensing conditions.

new FedPBS: Proximal-Balanced Scaling Federated Learning Model for Robust Personalized Training for Non-IID Data

Authors: Eman M. AbouNassara, Amr Elshalla, Sameh Abdulah

Abstract: Federated learning (FL) enables a set of distributed clients to jointly train machine learning models while preserving their local data privacy, making it attractive for applications in healthcare, finance, mobility, and smart-city systems. However, FL faces several challenges, including statistical heterogeneity and uneven client participation, which can degrade convergence and model quality. In this work, we propose FedPBS, an FL algorithm that couples complementary ideas from FedBS and FedProx to address these challenges. FedPBS dynamically adapts batch sizes to client resources to support balanced and scalable participation, and selectively applies a proximal correction to small-batch clients to stabilize local updates and reduce divergence from the global model. Experiments on benchmark datasets such as CIFAR-10 and UCI-HAR under highly non-IID settings demonstrate that FedPBS consistently outperforms state-of-the-art methods, including FedBS, FedGA, MOON, and FedProx. The results demonstrate robust performance gains under extreme data heterogeneity, with smooth loss curves indicating stable and reliable convergence across diverse federated environments.

new Close to Reality: Interpretable and Feasible Data Augmentation for Imbalanced Learning

Authors: Matheus Camilo da Silva, Gabriel Gustavo Costanzo, Andrea de Lorenzo, Sylvio Barbon Junior

Abstract: Many machine learning classification tasks involve imbalanced datasets, which are often subject to over-sampling techniques aimed at improving model performance. However, these techniques are prone to generating unrealistic or infeasible samples. Furthermore, they often function as black boxes, lacking interpretability in their procedures. This opacity makes it difficult to track their effectiveness and provide necessary adjustments, and they may ultimately fail to yield significant performance improvements. To bridge this gap, we introduce the Decision Predicate Graphs for Data Augmentation (DPG-da), a framework that extracts interpretable decision predicates from trained models to capture domain rules and enforce them during sample generation. This design ensures that over-sampled data remain diverse, constraint-satisfying, and interpretable. In experiments on synthetic and real-world benchmark datasets, DPG-da consistently improves classification performance over traditional over-sampling methods, while guaranteeing logical validity and offering clear, interpretable explanations of the over-sampled data.

new True 4-Bit Quantized Convolutional Neural Network Training on CPU: Achieving Full-Precision Parity

Authors: Shivnath Tathe

Abstract: Low-precision neural network training has emerged as a promising direction for reducing computational costs and democratizing access to deep learning research. However, existing 4-bit quantization methods either rely on expensive GPU infrastructure or suffer from significant accuracy degradation. In this work, we present a practical method for training convolutional neural networks at true 4-bit precision using standard PyTorch operations on commodity CPUs. We introduce a novel tanh-based soft weight clipping technique that, combined with symmetric quantization, dynamic per-layer scaling, and straight-through estimators, achieves stable convergence and competitive accuracy. Training a VGG-style architecture with 3.25 million parameters from scratch on CIFAR-10, our method achieves 92.34% test accuracy on Google Colab's free CPU tier -- matching full-precision baseline performance (92.5%) with only a 0.16% gap. We further validate on CIFAR-100, achieving 70.94% test accuracy across 100 classes with the same architecture and training procedure, demonstrating that 4-bit training from scratch generalizes to harder classification tasks. Both experiments achieve 8x memory compression over FP32 while maintaining exactly 15 unique weight values per layer throughout training. We additionally validate hardware independence by demonstrating rapid convergence on a consumer mobile device (OnePlus 9R), achieving 83.16% accuracy in only 6 epochs. To the best of our knowledge, no prior work has demonstrated 4-bit quantization-aware training achieving full-precision parity on standard CPU hardware without specialized kernels or post-training quantization.
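The core forward-pass mechanics described above can be sketched in a few lines. This is a simplified illustration, not the paper's code: the clipping sharpness `alpha` is an assumed knob, and the straight-through estimator used in training is noted but omitted.

```python
import numpy as np

def soft_clip(w, alpha=3.0):
    """Tanh-based soft clipping: squashes weights smoothly into (-1, 1)
    before quantization, avoiding the hard saturation of plain clamping.
    alpha is an assumed sharpness parameter, not a value from the paper."""
    return np.tanh(alpha * w) / np.tanh(alpha)

def fake_quant_4bit(w):
    """Symmetric 4-bit fake quantization with dynamic per-layer scaling.
    Integer codes live on the zero-symmetric grid -7..7, i.e. at most the
    15 unique weight values per layer mentioned in the abstract."""
    scale = np.max(np.abs(w)) / 7.0 + 1e-12   # dynamic per-layer scale
    codes = np.clip(np.round(w / scale), -7, 7)
    return codes * scale, codes.astype(int)

w = np.random.default_rng(0).normal(0.0, 0.5, size=(64, 64))
w_q, codes = fake_quant_4bit(soft_clip(w))
```

In quantization-aware training, a straight-through estimator would treat the `round` as identity in the backward pass so gradients keep flowing through the quantizer; that backward machinery is what the abstract's "straight-through estimators" refers to.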

new Shapes are not enough: CONSERVAttack and its use for finding vulnerabilities and uncertainties in machine learning applications

Authors: Philip Bechtle, Lucie Flek, Philipp Alexander Jung, Akbar Karimi, Timo Saala, Alexander Schmidt, Matthias Schott, Philipp Soldin, Christopher Wiebusch, Ulrich Willemsen

Abstract: In High Energy Physics, as in many other fields of science, the application of machine learning techniques has been crucial in advancing our understanding of fundamental phenomena. Increasingly, deep learning models are applied to analyze both simulated and experimental data. In most experiments, a rigorous regime of testing for physically motivated systematic uncertainties is in place. The numerical evaluation of these tests for differences between data and simulation quantifies the effect of potential sources of mismodelling on the machine learning output. In addition, thorough comparisons of marginal distributions and (linear) feature correlations between data and simulation in "control regions" are applied. However, the guidance by physical motivation, and the need to constrain comparisons to specific regions, do not guarantee that all possible sources of deviations have been accounted for. We therefore propose a new adversarial attack - the CONSERVAttack - designed to exploit the remaining space of hypothetical deviations between simulation and data after the above-mentioned tests. The resulting adversarial perturbations are consistent within the uncertainty bounds - evading standard validation checks - while successfully fooling the underlying model. We further propose strategies to mitigate such vulnerabilities and argue that robustness to adversarial effects must be considered when interpreting results from deep learning in particle physics.
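The "consistent within the uncertainty bounds" constraint can be sketched as a gradient attack re-projected into a per-feature uncertainty box. This toy omits the correlation and marginal-distribution checks that the real CONSERVAttack also respects; the linear scorer and step size are assumptions for illustration.

```python
import numpy as np

def bounded_attack(x, grad_fn, sigma, steps=20, lr=0.5):
    """Gradient-ascent perturbation clipped after every step into the
    per-feature band [x - sigma, x + sigma], so the attacked point stays
    consistent with the stated systematic uncertainties."""
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + lr * sigma * np.sign(grad_fn(x_adv))
        x_adv = np.clip(x_adv, x - sigma, x + sigma)  # uncertainty box
    return x_adv

# toy linear scorer f(x) = w.x; the attack pushes the score upward
w = np.array([1.0, -2.0, 0.5])
grad = lambda x: w                   # gradient of the linear scorer
x0 = np.zeros(3)
sigma = np.array([0.1, 0.2, 0.05])   # hypothetical per-feature uncertainties
x_adv = bounded_attack(x0, grad, sigma)
```

The perturbation saturates at the uncertainty bounds, which is exactly the worst case a bound-only validation regime cannot exclude.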

new Chunk-Guided Q-Learning

Authors: Gwanwoo Song, Kwanyoung Park, Youngwoon Lee

Abstract: In offline reinforcement learning (RL), single-step temporal-difference (TD) learning can suffer from bootstrapping error accumulation over long horizons. Action-chunked TD methods mitigate this by backing up over multiple steps, but can introduce suboptimality by restricting the policy class to open-loop action sequences. To resolve this trade-off, we present Chunk-Guided Q-Learning (CGQ), a single-step TD algorithm that guides a fine-grained single-step critic by regularizing it toward a chunk-based critic trained using temporally extended backups. This reduces compounding error while preserving fine-grained value propagation. We theoretically show that CGQ attains tighter critic optimality bounds than either single-step or action-chunked TD learning alone. Empirically, CGQ achieves strong performance on challenging long-horizon OGBench tasks, often outperforming both single-step and action-chunked methods.

new Aumann-SHAP: The Geometry of Counterfactual Interaction Explanations in Machine Learning

Authors: Adam Belahcen, St\'ephane Mussard

Abstract: We introduce Aumann-SHAP, an interaction-aware framework that decomposes counterfactual transitions by restricting the model to a local hypercube connecting baseline and counterfactual features. Each hypercube is decomposed into a grid in order to construct an induced micro-player cooperative game in which elementary grid-step moves become players. Shapley and LES values on this TU-micro-game yield: (i) within-pot contribution of each feature to the interaction with other features (interaction explainability), and (ii) the contribution of each instance and each feature to the counterfactual analysis (individual and global explainability). In particular, Aumann-LES values produce individual and global explanations along the counterfactual transition. Shapley and LES values converge to the diagonal Aumann-Shapley (integrated-gradients) attribution method. Experiments on the German Credit dataset and MNIST data show that Aumann-LES produces robust results and better explanations than the standard Shapley value during the counterfactual transition.

new Benchmarking Open-Source PPG Foundation Models for Biological Age Prediction

Authors: N. Brag

Abstract: A task-specific model trained on 212,231 UK Biobank subjects to predict vascular age from PPG (AI-PPG Age) fails on a different clinical population: predictions collapse to a narrow 38-67 year range regardless of true age. Meanwhile, a general-purpose foundation model with no age-related training objective achieves lower error on the same data. We investigate why this happens and what it means for PPG-based biological age prediction. We evaluate three open-source PPG models (Pulse-PPG, PaPaGei-S, AI-PPG Age) on 906 surgical patients from PulseDB, using frozen embeddings with Ridge regression and 5-fold cross-validation. Pulse-PPG reaches MAE = 9.28 years, beating both AI-PPG Age in linear probe mode (9.72) and HR/HRV combined with demographics (9.59). Adding demographic features brings the best result down to MAE = 8.22 years (R2 = 0.517, r = 0.725). The predicted age gap correlates with diastolic blood pressure after adjusting for chronological age (r = -0.188, p = 1.2e-8), consistent with what Apple reported for their proprietary PpgAge model. The remaining gap with Apple (MAE 2.43) appears driven by dataset size (906 vs 213,593 subjects) and population differences rather than model architecture, as our learning curve shows no plateau. Code is publicly available.
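The evaluation protocol (frozen embeddings, ridge probe, 5-fold cross-validated MAE) can be sketched end to end. The synthetic data below merely stand in for PPG embeddings and ages; the regularization strength and data dimensions are assumptions.

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    # closed-form ridge: (X^T X + lam I)^-1 X^T y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def kfold_mae(X, y, k=5, lam=1.0, seed=0):
    """k-fold cross-validated MAE of a ridge probe on frozen embeddings,
    mirroring the linear-probe evaluation described in the abstract."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    maes = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        w = ridge_fit(X[train], y[train], lam)
        maes.append(np.mean(np.abs(X[test] @ w - y[test])))
    return float(np.mean(maes))

# synthetic "embeddings" carrying a linear age signal plus noise
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 16))
coef = rng.normal(size=16)
age = X @ coef * 3.0 + 50.0 + rng.normal(0.0, 2.0, size=300)
Xb = np.hstack([X, np.ones((300, 1))])  # bias column (penalized; fine for a sketch)
mae = kfold_mae(Xb, age, k=5)
```

Keeping the encoder frozen and fitting only a linear head, as above, isolates how much age information the pretrained representation already contains.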

new Gated Graph Attention Networks for Predicting Duration of Large Scale Power Outages Induced by Natural Disasters

Authors: Chenghao Duan, Chuanyi Ji, Anwar Walid, Scott Ganz

Abstract: The occurrence of large-scale power outages induced by natural disasters has been on the rise in a changing climate. Such power outages often last extended durations, causing substantial financial losses and socioeconomic impacts to customers. Accurate estimation of outage duration is thus critical for enhancing the resilience of energy infrastructure under severe weather. We formulate such a task as a machine learning (ML) problem with a focus on unique real-world challenges: high-order spatial dependency in the data, a moderate number of large-scale outage events, heterogeneous types of such events, and different impacts in a region within each event. To address these challenges, we develop a Bimodal Gated Graph Attention Network (BiGGAT), a graph-based neural network model, that integrates a Graph Attention Network (GAT) with a Gated Recurrent Unit (GRU) to capture the complex spatial characteristics. We evaluate the approach in a setting of inductive learning, using large-scale power outage data from six major hurricanes in the Southeastern United States. Experimental results demonstrate that BiGGAT achieves superior performance compared to benchmark models.

new Enhancing Mental Health Classification with Layer-Attentive Residuals and Contrastive Feature Learning

Authors: Menna Elgabry, Ali Hamdi, Khaled Shaban

Abstract: The classification of mental health is challenging for a variety of reasons. For one, there is overlap between mental health issues. In addition, the signs of mental health issues depend on the context of the situation, making classification difficult. Although fine-tuning transformers has improved performance for mental health classification, standard cross-entropy training tends to create entangled feature spaces and fails to utilize all the information the transformers contain. We present a new framework that focuses on representations to improve mental health classification. This is done using two methods. First, \textbf{layer-attentive residual aggregation} uses residual connections to weigh and fuse representations from all transformer layers while maintaining high-level semantics. Second, \textbf{supervised contrastive feature learning} uses temperature-scaled supervised contrastive learning with progressive weighting to increase the geometric margin between confusable mental health problems and decrease class overlap by restructuring the feature space. With a score of \textbf{74.36\%}, the proposed method is the best performing on the SWMH benchmark, outperforming domain-specialized models such as \textit{MentalBERT} and \textit{MentalRoBERTa} by margins of 3.25\% and 2.2\%, and exceeding the highest-achieving model by 2.41 recall points. These findings show that domain-adaptive pretraining for mental health text classification can be surpassed by carefully designed representation geometry and layer-aware residual integration, which also provide enhanced interpretability through learnt layer importance.
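The layer-attentive aggregation step can be sketched as a softmax-weighted fusion of per-layer representations. In the actual framework the layer logits would be learned parameters; here they are fixed values chosen purely for illustration.

```python
import numpy as np

def layer_attentive_aggregate(hidden_states, layer_logits):
    """Fuse per-layer representations with learnable importance weights:
    a softmax over layer_logits yields one weight per transformer layer,
    and the fused vector is their weighted sum. This is a minimal sketch
    of layer-attentive residual aggregation."""
    w = np.exp(layer_logits - layer_logits.max())
    w = w / w.sum()                      # softmax over layers
    # hidden_states: (n_layers, hidden_dim), one vector per layer
    return w @ hidden_states, w

H = np.random.default_rng(0).normal(size=(12, 768))   # 12 layer outputs
logits = np.linspace(-1.0, 1.0, 12)                   # favor upper layers
fused, weights = layer_attentive_aggregate(H, logits)
```

The learned weights double as an interpretability signal: inspecting them shows which layers the classifier relies on, which is the "learnt layer importance" the abstract mentions.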

new Bootstrapped Physically-Primed Neural Networks for Robust T2 Distribution Estimation in Low-SNR Pancreatic MRI

Authors: Hadas Ben Atya, Nicole Abramenkov, Noa Mashiah, Luise Brock, Daphna Link Sourani, Ram Weiss, Moti Freiman

Abstract: Estimating multi-component T2 relaxation distributions from Multi-Echo Spin Echo (MESE) MRI is a severely ill-posed inverse problem, traditionally solved using regularized non-negative least squares (NNLS). In abdominal imaging, particularly the pancreas, low SNR and residual uncorrelated noise challenge classical solvers and deterministic deep learning models. We introduce a bootstrap-based inference framework for robust distributional T2 estimation that performs stochastic resampling of the echo train and aggregates predictions across multiple subsets. This treats the acquisition as a distribution rather than a fixed input, yielding variance-reduced, physically consistent estimates and converting deterministic relaxometry networks into probabilistic ensemble predictors. Applied to the P2T2 architecture, our method uses inference-time bootstrapping to smooth noise artifacts and enhance fidelity to the underlying relaxation distribution. Noninvasive pancreatic evaluation is limited by location and biopsy risks, highlighting the need for biomarkers capable of capturing early pathophysiological changes. In type 1 diabetes (T1DM), progressive beta-cell destruction begins years before overt hyperglycemia, yet current imaging cannot assess early islet decline. We evaluate clinical utility via a test-retest reproducibility study (N=7) and a T1DM versus healthy differentiation task (N=8). Our approach achieves the lowest Wasserstein distances across repeated scans and superior sensitivity to physiology-driven shifts in the relaxation-time distribution, outperforming NNLS and deterministic deep learning baselines. These results establish inference-time bootstrapping as an effective enhancement for quantitative T2 relaxometry in low-SNR abdominal imaging.
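The inference-time bootstrapping idea can be demonstrated on a toy mono-exponential decay: random echo subsets are drawn, an estimator runs on each, and the predictions are averaged. The real method aggregates a neural network's predicted T2 distributions; the log-linear fit, subset fraction, and noise level below are stand-in assumptions.

```python
import numpy as np

def t2_from_echoes(times, signal):
    # log-linear least-squares fit: log s = -t/T2 + c  ->  T2 = -1/slope
    slope, _ = np.polyfit(times, np.log(signal), 1)
    return -1.0 / slope

def bootstrap_t2(times, signal, n_boot=50, frac=0.8, seed=0):
    """Inference-time bootstrapping over the echo train: draw random
    echo subsets, run the (frozen) estimator on each, and average the
    results, turning a deterministic predictor into a variance-reduced
    ensemble."""
    rng = np.random.default_rng(seed)
    k = max(3, int(frac * len(times)))
    ests = []
    for _ in range(n_boot):
        idx = np.sort(rng.choice(len(times), size=k, replace=False))
        ests.append(t2_from_echoes(times[idx], signal[idx]))
    return float(np.mean(ests))

# mono-exponential toy echo train with T2 = 50 ms plus acquisition noise
rng = np.random.default_rng(1)
t = np.arange(10.0, 160.0, 10.0)                 # 15 echo times (ms)
s = np.exp(-t / 50.0) + rng.normal(0.0, 0.005, size=t.size)
t2_hat = bootstrap_t2(t, s)
```

Treating the acquisition as a distribution over echo subsets, rather than a single fixed input, is what smooths noise artifacts in the aggregated estimate.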

new Understanding the Emergence of Seemingly Useless Features in Next-Token Predictors

Authors: Mark Rofin, Jalal Naghiyev, Michael Hahn

Abstract: Trained Transformers have been shown to compute abstract features that appear redundant for predicting the immediate next token. We identify which components of the gradient signal from the next-token prediction objective give rise to this phenomenon, and we propose a method to estimate the influence of those components on the emergence of specific features. After validating our approach on toy tasks, we use it to interpret the origins of the world model in OthelloGPT and syntactic features in a small language model. Finally, we apply our framework to a pretrained LLM, showing that features with extremely high or low influence on future tokens tend to be related to formal reasoning domains such as code. Overall, our work takes a step toward understanding hidden features of Transformers through the lens of their development during training.

new Soft Mean Expected Calibration Error (SMECE): A Calibration Metric for Probabilistic Labels

Authors: Michael Leznik

Abstract: The Expected Calibration Error (ece), the dominant calibration metric in machine learning, compares predicted probabilities against empirical frequencies of binary outcomes. This is appropriate when labels are binary events. However, many modern settings produce labels that are themselves probabilities rather than binary outcomes: a radiologist's stated confidence, a teacher model's soft output in knowledge distillation, a class posterior derived from a generative model, or an annotator agreement fraction. In these settings, ece commits a category error - it discards the probabilistic information in the label by forcing it into a binary comparison. The result is not a noisy approximation that more data will correct. It is a structural misalignment that persists and converges to the wrong answer with increasing precision as sample size grows. We introduce the Soft Mean Expected Calibration Error (smece), a calibration metric for settings where labels are of probabilistic nature. The modification to the ece formula is one line: replace the empirical hard-label fraction in each prediction bin with the mean probability label of the samples in that bin. smece reduces exactly to ece when labels are binary, making it a strict generalisation.
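The one-line modification described above is easy to make concrete: bin by predicted probability as usual, but compare each bin's mean prediction against the mean probabilistic label rather than the hard-label fraction. The example data are hypothetical.

```python
import numpy as np

def smece(pred, label, n_bins=10):
    """Soft Mean ECE as described in the abstract: in each prediction
    bin, replace the empirical hard-label fraction with the mean
    probability label. With binary 0/1 labels this reduces exactly to
    standard ECE, so the same function computes both."""
    bins = np.minimum((pred * n_bins).astype(int), n_bins - 1)
    err = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            err += mask.mean() * abs(pred[mask].mean() - label[mask].mean())
    return err

# probabilistic labels, e.g. annotator agreement fractions
pred = np.array([0.10, 0.15, 0.80, 0.85, 0.90])
soft = np.array([0.12, 0.14, 0.82, 0.83, 0.91])
hard = (soft > 0.5).astype(float)   # binarizing discards the information
```

On these (hypothetical) well-calibrated predictions, the soft metric reports near-zero error while the same formula applied to binarized labels, i.e. ordinary ECE, overstates the miscalibration - the category error the abstract describes.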

new Not All Latent Spaces Are Flat: Hyperbolic Concept Control

Authors: Maria Rosaria Briglia, Simone Facchiano, Paolo Cursi, Alessio Sampieri, Emanuele Rodol\`a, Guido Maria D'Amely di Melendugno, Luca Franco, Fabio Galasso, Iacopo Masi

Abstract: As modern text-to-image (T2I) models draw closer to synthesizing highly realistic content, the threat of unsafe content generation grows, and it becomes paramount to exercise control. Existing approaches steer these models by applying Euclidean adjustments to text embeddings, redirecting the generation away from unsafe concepts. In this work, we introduce hyperbolic control (HyCon): a novel control mechanism based on parallel transport that leverages semantically aligned hyperbolic representation space to yield more expressive and stable manipulation of concepts. HyCon reuses off-the-shelf generative models and a state-of-the-art hyperbolic text encoder, linked via a lightweight adapter. HyCon achieves state-of-the-art results across four safety benchmarks and four T2I backbones, showing that hyperbolic steering is a practical and flexible approach for more reliable T2I generation.

new Concisely Explaining the Doubt: Minimum-Size Abductive Explanations for Linear Models with a Reject Option

Authors: Gleilson Pedro Fernandes, Thiago Alves Rocha

Abstract: Trustworthiness in artificial intelligence depends not only on what a model decides, but also on how it handles and explains cases in which a reliable decision cannot be made. In critical domains such as healthcare and finance, a reject option allows the model to abstain when evidence is insufficient, making it essential to explain why an instance is rejected in order to support informed human intervention. In these settings, explanations must not only be interpretable, but also faithful to the underlying model and computationally efficient enough to support real-time decision making. Abductive explanations guarantee fidelity, but their exact computation is known to be NP-hard for many classes of models, limiting their practical applicability. Computing \textbf{minimum-size} abductive explanations is an even more challenging problem, as it requires reasoning not only about fidelity but also about optimality. Prior work has addressed this challenge in restricted settings, including log-linear-time algorithms for computing minimum-size abductive explanations in linear models without rejection, as well as a polynomial-time method based on linear programming for computing abductive explanations, without guarantees of minimum size, for linear models with a reject option. In this work, we bridge these lines of research by computing minimum-size abductive explanations for linear models with a reject option. For accepted instances, we adapt the log-linear algorithm to efficiently compute optimal explanations. For rejected instances, we formulate a 0-1 integer linear programming problem that characterizes minimum-size abductive explanations of rejection. Although this formulation is NP-hard in theory, our experimental results show that it is consistently more efficient in practice than the linear-programming-based approach that does not guarantee minimum-size explanations.

new ST-ResGAT: Explainable Spatio-Temporal Graph Neural Network for Road Condition Prediction and Priority-Driven Maintenance

Authors: Mohsin Mahmud Topu, Azmine Toushik Wasi, Mahfuz Ahmed Anik, MD Manjurul Ahsan

Abstract: Climate-vulnerable road networks require a paradigm shift from reactive, fix-on-failure repairs to predictive, decision-ready maintenance. This paper introduces ST-ResGAT, a novel Spatio-Temporal Residual Graph Attention Network that fuses residual graph-attention encoding with GRU temporal aggregation to forecast pavement deterioration. Engineered for resource-constrained deployment, the framework translates continuous Pavement Condition Index (PCI) forecasts directly into the American Society for Testing and Materials (ASTM)-compliant maintenance priorities. Using a real-world inspection dataset of 750 segments in Sylhet, Bangladesh (2021-2024), ST-ResGAT significantly outperforms traditional non-spatial machine learning baselines, achieving exceptional predictive fidelity (R2 = 0.93, RMSE = 2.72). Crucially, ablation testing confirmed the mathematical necessity of modeling topological neighbor effects, proving that structural decay acts as a spatial contagion. Uniquely, we integrate GNNExplainer to unbox the model, demonstrating that its learned priorities align perfectly with established physical engineering theory. Furthermore, we quantify classification safety: achieving 85.5% exact ASTM class agreement and 100% adjacent-class containment, ensuring bounded, engineer-safe predictions. To connect model outputs to policy, we generate localized longitudinal maintenance profiles, perform climate stress-testing, and derive Pareto sustainability frontiers. ST-ResGAT therefore offers a practical, explainable, and sustainable blueprint for intelligent infrastructure management in high-risk, low-resource geological settings.

new SVD Contextual Sparsity Predictors for Fast LLM Inference

Authors: Georgii Serbin, Kirill Koshkin, Zhongao Sun, Anastasiya Bistrigova, C. C. Korikov

Abstract: Contextual sparsity is one of the approaches used to reduce computational complexity in the inference process of large language models (LLMs). Existing techniques for efficient LLM inference acceleration based on contextual sparsity with minimal accuracy degradation require training sparse pattern predictors. This paper presents a framework for accelerating inference of ReGLU-based feed-forward networks (FFNs) within LLMs. The proposed framework provides a fast, training-free method for building sparse pattern predictors using truncation-aware singular value decomposition (SVD) of the gate projection matrix, along with a threshold calibration algorithm, and inference executors supporting conditional computation on CUDA and CANN devices. Experiments on three sparse LLMs with an average activation sparsity level of 90% in the FFNs demonstrate up to a 1.8x reduction in end-to-end decoding time while maintaining less than 1% degradation in benchmark scores on tasks involving complex math and code generation. This work advances the deployment of LLMs on edge devices.
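The training-free predictor construction can be sketched with numpy: a truncated SVD of the gate projection gives two thin factors whose product cheaply approximates the gate pre-activations, and entries predicted near zero are skipped. The rank, threshold, and near-low-rank test matrix are illustrative assumptions.

```python
import numpy as np

def build_predictor(W_gate, rank=8):
    """Truncation-aware SVD predictor: W ~= (U_r * S_r) @ Vt_r. The two
    thin factors cost O(d_in*r + r*d_ff) per token instead of the dense
    O(d_in*d_ff), giving a fast proxy for the gate pre-activations."""
    U, S, Vt = np.linalg.svd(W_gate, full_matrices=False)
    return U[:, :rank] * S[:rank], Vt[:rank]

def predict_active(x, A, B, threshold=0.0):
    # entries whose approximate pre-activation falls below the threshold
    # would be skipped by the conditional-computation executor
    return (x @ A) @ B > threshold

# near-low-rank gate matrix, the regime contextual-sparsity methods exploit
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 8)) @ rng.normal(size=(8, 256)) \
    + 0.01 * rng.normal(size=(64, 256))
A, B = build_predictor(W, rank=8)
x = rng.normal(size=64)
pred = predict_active(x, A, B)
true = x @ W > 0.0
agreement = np.mean(pred == true)
```

When the gate matrix is close to low rank, the cheap factorized pass recovers the active set almost exactly, which is why no predictor training is needed.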

new Is the reconstruction loss culprit? An attempt to outperform JEPA

Authors: Alexey Potapov, Oleg Shcherbakov, Ivan Kravchenko

Abstract: We evaluate JEPA-style predictive representation learning versus reconstruction-based autoencoders on a controlled "TV-series" linear dynamical system with known latent state and a single noise parameter. While an initial comparison suggests JEPA is markedly more robust to noise, further diagnostics show that autoencoder failures are strongly influenced by asymmetries in objectives and by bottleneck/component-selection effects (confirmed by PCA baselines). Motivated by these findings, we introduce gated predictive autoencoders that learn to select predictable components, mimicking the beneficial feature-selection behavior observed in over-parameterized PCA. On this toy testbed, the proposed gated model is stable across noise levels and matches or outperforms JEPA.

new Multifidelity Surrogate Modeling of Depressurized Loss of Forced Cooling in High-temperature Gas Reactors

Authors: Meredith Eaheart, Majdi I. Radaideh

Abstract: High-fidelity computational fluid dynamics (CFD) simulations are widely used to analyze nuclear reactor transients, but are computationally expensive when exploring large parameter spaces. Multifidelity surrogate models offer an approach to reduce cost by combining information from simulations of varying resolution. In this work, several multifidelity machine learning methods were evaluated for predicting the time to onset of natural circulation (ONC) and the temperature after ONC for a high-temperature gas reactor (HTGR) depressurized loss of forced cooling transient. A CFD model was developed in Ansys Fluent to generate 1000 simulation samples at each fidelity level, with low and medium-fidelity datasets produced by systematically coarsening the high-fidelity mesh. Multiple surrogate approaches were investigated, including multifidelity Gaussian processes and several neural network architectures, and validated on analytical benchmark functions before application to the ONC dataset. The results show that performance depends strongly on the informativeness of the input variables and the relationship between fidelity levels. Models trained using dominant inputs identified through prior sensitivity analysis consistently outperformed models trained on the full input set. The low- and high-fidelity pairing produced stronger performance than configurations involving medium-fidelity data, and two-fidelity configurations generally matched or exceeded three-fidelity counterparts at equivalent computational cost. Among the methods evaluated, multifidelity GP provided the most robust performance across input configurations, achieving excellent metrics for both time to ONC and temperature after ONC, while neural network approaches achieved comparable accuracy with substantially lower training times.

new Align Forward, Adapt Backward: Closing the Discretization Gap in Logic Gate Networks

Authors: Youngsung Kim

Abstract: In neural network models, soft mixtures of fixed candidate components (e.g., logic gates and sub-networks) are often used during training for stable optimization, while hard selection is typically used at inference. This raises questions about training-inference mismatch. We analyze this gap by separating forward-pass computation (hard selection vs. soft mixture) from stochasticity (with vs. without Gumbel noise). Using logic gate networks as a testbed, we observe distinct behaviors across four methods: Hard-ST achieves zero selection gap by construction; Gumbel-ST achieves near-zero gap when training succeeds but suffers accuracy collapse at low temperatures; Soft-Mix achieves small gap only at low temperature via weight concentration; and Soft-Gumbel exhibits large gaps despite Gumbel noise, confirming that noise alone does not reduce the gap. We propose CAGE (Confidence-Adaptive Gradient Estimation) to maintain gradient flow while preserving forward alignment. On logic gate networks, Hard-ST with CAGE achieves over 98% accuracy on MNIST and over 58% on CIFAR-10, both with zero selection gap across all temperatures, while Gumbel-ST without CAGE suffers a 47-point accuracy collapse.

new Deep probabilistic model synthesis enables unified modeling of whole-brain neural activity across individual subjects

Authors: William E. Bishop, Luuk W. Hesselink, Bernhard Englitz, Misha B. Ahrens, James E. Fitzgerald

Abstract: Many disciplines need quantitative models that synthesize experimental data across multiple instances of the same general system. For example, neuroscientists must combine data from the brains of many individual animals to understand the species' brain in general. However, typical machine learning models treat one system instance at a time. Here we introduce a machine learning framework, deep probabilistic model synthesis (DPMS), that leverages system properties auxiliary to the model to combine data across system instances. DPMS specifically uses variational inference to learn a conditional prior distribution and instance-specific posterior distributions over model parameters that respectively tie together the system instances and capture their unique structure. DPMS can synthesize a wide variety of model classes, such as those for regression, classification, and dimensionality reduction, and we demonstrate its ability to improve upon single-instance models on synthetic data and whole-brain neural activity data from larval zebrafish.

new TACTIC for Navigating the Unknown: Tabular Anomaly deteCTion via In-Context inference

Authors: Patryk Marsza{\l}ek, Tomasz Ku\'smierczyk, Marek \'Smieja

Abstract: Anomaly detection for tabular data has been a long-standing unsupervised learning problem that remains a major challenge for current deep learning models. Recently, in-context learning has emerged as a new paradigm that has shifted efforts from task-specific optimization to large-scale pretraining aimed at creating foundation models that generalize across diverse datasets. Although in-context models, such as TabPFN, perform well in supervised problems, their learned classification-based priors may not readily extend to anomaly detection. In this paper, we study in-context models for anomaly detection and show that the unsupervised extensions to TabPFN exhibit unstable behavior, particularly in noisy or contaminated contexts, and incur high computational cost. We address these challenges and introduce TACTIC, an in-context anomaly detection approach based on pretraining with anomaly-centric synthetic priors, which provides fast and data-dependent reasoning about anomalies while avoiding dataset-specific tuning. In contrast to typical score-based approaches, which produce uncalibrated anomaly scores that require post-processing (e.g. threshold selection or ranking heuristics), the proposed model is trained as a discriminative predictor, enabling unambiguous anomaly decisions in a single forward pass. Through experiments on real-world datasets, we examine the performance of TACTIC in clean and noisy contexts with varying anomaly rates and different anomaly types, as well as the impact of prior choices on detection quality. Our experiments clearly show that specialized anomaly-centric in-context models such as TACTIC are highly competitive compared to other task-specific methods.

new Hybrid Intent-Aware Personalization with Machine Learning and RAG-Enabled Large Language Models for Financial Services Marketing

Authors: Akhil Chandra Shanivendra

Abstract: Personalized marketing in financial services requires models that can both predict customer behavior and generate compliant, context-appropriate content. This paper presents a hybrid architecture that integrates classical machine learning for segmentation, latent intent modeling, and personalization prediction with retrieval-augmented large language models for grounded content generation. A synthetic, reproducible dataset is constructed to reflect temporal customer behavior, product interactions, and marketing responses. The proposed framework incorporates temporal encoders, latent representations, and multi-task classification to estimate segment membership, customer intent, and product-channel recommendations. A retrieval-augmented generation layer then produces customer-facing messages constrained by retrieved domain documents. Experiments show that temporal modeling and intent features improve personalization accuracy, while citation-based retrieval reduces unsupported generation and supports auditability in regulated settings. The contribution is primarily architectural, demonstrating how predictive modeling and RAG-based generation can be combined into a transparent, explainable pipeline for financial services personalization.

new Balancing Multimodal Domain Generalization via Gradient Modulation and Projection

Authors: Hongzhao Li, Guohao Shen, Shupan Li, Mingliang Xu, Muhammad Haris Khan

Abstract: Multimodal Domain Generalization (MMDG) leverages the complementary strengths of multiple modalities to enhance model generalization on unseen domains. A central challenge in multimodal learning is optimization imbalance, where modalities converge at different speeds during training. This imbalance leads to unequal gradient contributions, allowing some modalities to dominate the learning process while others lag behind. Existing balancing strategies typically regulate each modality's gradient contribution based on its classification performance on the source domain to alleviate this issue. However, relying solely on source-domain accuracy neglects a key insight in MMDG: modalities that excel on the source domain may generalize poorly to unseen domains, limiting cross-domain gains. To overcome this limitation, we propose Gradient Modulation Projection (GMP), a unified strategy that promotes balanced optimization in MMDG. GMP first decouples gradients associated with classification and domain-invariance objectives. It then modulates each modality's gradient based on semantic and domain confidence. Moreover, GMP dynamically adjusts gradient projections by tracking the relative strength of each task, mitigating conflicts between classification and domain-invariant learning within modality-specific encoders. Extensive experiments demonstrate that GMP achieves state-of-the-art performance and integrates flexibly with diverse MMDG methods, significantly improving generalization across multiple benchmarks.
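The projection step for conflicting objectives can be sketched as the classic conflicting-gradient projection. GMP additionally modulates magnitudes by semantic and domain confidence and tracks relative task strength over training; this minimal sketch keeps only the projection itself, with toy gradients as assumptions.

```python
import numpy as np

def project_conflict(g_task, g_other):
    """When two objective gradients conflict (negative inner product),
    remove from g_task its component along g_other, so the update no
    longer works against the other objective. Non-conflicting gradients
    are returned unchanged."""
    dot = float(g_task @ g_other)
    if dot < 0.0:
        g_task = g_task - (dot / float(g_other @ g_other)) * g_other
    return g_task

g_cls = np.array([1.0, 1.0])      # classification gradient
g_dom = np.array([-1.0, 0.5])     # conflicting domain-invariance gradient
g_proj = project_conflict(g_cls, g_dom)
```

After projection the classification update is orthogonal to the domain-invariance gradient, so optimizing one objective no longer directly undoes progress on the other within the shared encoder.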

new Artificial intelligence-enabled single-lead ECG for non-invasive hyperkalemia detection: development, multicenter validation, and proof-of-concept deployment

Authors: Gongzheng Tang, Qinghao Zhao, Guangkun Nie, Yujie Xiao, Shijia Geng, Donglin Xie, Shun Huang, Deyun Zhang, Xingchen Yao, Jinwei Wang, Kangyin Chen, Luxia Zhang, Shenda Hong

Abstract: Hyperkalemia is a life-threatening electrolyte disorder that is common in patients with chronic kidney disease and heart failure, yet frequent monitoring remains difficult outside hospital settings. We developed and validated Pocket-K, a single-lead AI-ECG system initialized from the ECGFounder foundation model for non-invasive hyperkalemia screening and handheld deployment. In this multicentre observational study using routinely collected clinical ECG and laboratory data, 34,439 patients contributed 62,290 ECG--potassium pairs. Lead I data were used to fine-tune the model. Data from Peking University People's Hospital were divided into development and temporal validation sets, and data from The Second Hospital of Tianjin Medical University served as an independent external validation set. Hyperkalemia was defined as venous serum potassium > 5.5 mmol/L. Pocket-K achieved AUROCs of 0.936 in internal testing, 0.858 in temporal validation, and 0.808 in external validation. For KDIGO-defined moderate-to-severe hyperkalemia (serum potassium >= 6.0 mmol/L), AUROCs increased to 0.940 and 0.861 in the temporal and external sets, respectively. External negative predictive value exceeded 99.3%. Model-predicted high risk below the hyperkalemia threshold was more common in patients with chronic kidney disease and heart failure. A handheld prototype enabled near-real-time inference, supporting future prospective evaluation in native handheld and wearable settings.

new Efficient Federated Conformal Prediction with Group-Conditional Guarantees

Authors: Haifeng Wen, Osvaldo Simeone, Hong Xing

Abstract: Deploying trustworthy AI systems requires principled uncertainty quantification. Conformal prediction (CP) is a widely used framework for constructing prediction sets with distribution-free coverage guarantees. In many practical settings, including healthcare, finance, and mobile sensing, the calibration data required for CP are distributed across multiple clients, each with its own local data distribution. In this federated setting, data can often be partitioned into potentially overlapping groups, which may reflect client-specific strata or cross-cutting attributes such as demographic or semantic categories. We propose group-conditional federated conformal prediction (GC-FCP), a novel protocol that provides group-conditional coverage guarantees. GC-FCP constructs mergeable, group-stratified coresets from local calibration scores, enabling clients to communicate compact weighted summaries that support efficient aggregation and calibration at the server. Experiments on synthetic and real-world datasets validate the performance of GC-FCP compared to centralized calibration baselines.
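A single-machine sketch of the group-conditional calibration idea (per-group split-conformal thresholds); the federated coreset and aggregation machinery described in the abstract is omitted, and all names are illustrative:

```python
import numpy as np

def group_conformal_thresholds(scores, groups, alpha=0.1):
    """Per-group split-conformal threshold: for each group g, take the
    ceil((n_g + 1) * (1 - alpha))-th smallest calibration nonconformity
    score, giving >= 1 - alpha coverage within that group."""
    thresholds = {}
    for g in set(groups):
        s = np.sort([sc for sc, gg in zip(scores, groups) if gg == g])
        n = len(s)
        k = min(n - 1, int(np.ceil((n + 1) * (1 - alpha))) - 1)
        thresholds[g] = s[k]
    return thresholds

rng = np.random.default_rng(0)
scores = rng.exponential(1.0, 200)                    # calibration scores
groups = ["a" if i < 100 else "b" for i in range(200)]  # group labels
thr = group_conformal_thresholds(scores, groups, alpha=0.1)
```

At test time, a prediction set would include every label whose nonconformity score falls below the threshold of the example's group, which is what makes the coverage guarantee group-conditional rather than only marginal.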

new Interleaved Resampling and Refitting: Data and Compute-Efficient Evaluation of Black-Box Predictors

Authors: Haichen Hu, David Simchi-Levi

Abstract: We study the problem of evaluating the excess risk of large-scale empirical risk minimization under the square loss. Leveraging the idea of wild refitting and resampling, we assume only black-box access to the training algorithm and develop an efficient procedure for estimating the excess risk. Our evaluation algorithm is both computationally and data efficient. In particular, it requires access to only a single dataset and does not rely on any additional validation data. Computationally, it only requires refitting the model on several much smaller datasets obtained through sequential resampling, in contrast to previous wild refitting methods that require full-scale retraining and might therefore be unsuitable for large-scale trained predictors. Our algorithm has an interleaved sequential resampling-and-refitting structure. We first construct pseudo-responses through a randomized residual symmetrization procedure. At each round, we then resample two sub-datasets from the resulting covariate pseudo-response pairs and retrain the model separately on these two small artificial datasets. We establish high probability excess risk guarantees under both fixed design and random design settings, showing that with a suitably chosen noise scale, our interleaved resampling and refitting algorithm yields an upper bound on the prediction error. Our theoretical analysis draws on tools from empirical process theory, harmonic analysis, Toeplitz operator theory, and sharp tensor concentration inequalities.
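One round of the resample-and-refit structure can be sketched as follows, with ordinary least squares standing in for the black-box trainer. This is illustrative only: the paper's noise-scale choice, multi-round schedule, and theoretical guarantees are not reproduced here, and all names are made up:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 500, 5
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.5 * rng.normal(size=n)

def fit(Xs, ys):
    """Black-box training algorithm (here: ordinary least squares)."""
    return np.linalg.lstsq(Xs, ys, rcond=None)[0]

w_hat = fit(X, y)
resid = y - X @ w_hat

# Wild pseudo-responses: symmetrize residuals with random signs.
eps = rng.choice([-1.0, 1.0], size=n)
y_wild = X @ w_hat + eps * resid

# Resample two small sub-datasets of the covariate/pseudo-response pairs
# and refit on each, instead of retraining at full scale.
m = n // 5
idx1 = rng.choice(n, m, replace=True)
idx2 = rng.choice(n, m, replace=True)
w1, w2 = fit(X[idx1], y_wild[idx1]), fit(X[idx2], y_wild[idx2])

# The discrepancy between the two refits serves as a proxy signal
# for the excess risk of the original fit.
gap = np.mean((X @ w1 - X @ w2) ** 2)
```

Only the single original dataset is touched, and each refit runs on a dataset five times smaller than the original, which is the data- and compute-efficiency claim of the abstract in miniature.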

new Self-Indexing KVCache: Predicting Sparse Attention from Compressed Keys

Authors: Xu Yang, Jiapeng Zhang, Dongyang Zhao, Guo Chen, Zhuo Tang

Abstract: The KV cache in self-attention has emerged as a major bottleneck in long-context and large-batch inference for LLMs. Existing approaches often treat sparsity prediction and compression as separate modules, relying on auxiliary index structures to select relevant tokens, and on complex quantization schemes to reduce memory usage. This fragmented design introduces redundant overhead and limits scalability. In this paper, we propose a novel paradigm: treating the compressed key representation not merely as storage, but as a self-indexing structure that directly enables efficient sparse attention. By designing a sign-based 1-bit vector quantization (VQ) scheme, our method unifies compression and retrieval in a single, hardware-friendly format. This approach eliminates the need for external indices or learning-based predictors, offering a lightweight yet robust solution for memory-constrained inference. All components are designed to be hardware-efficient and easy to implement. By implementing custom CUDA kernels, our method integrates seamlessly with FlashAttention, minimizing additional runtime and memory overhead. Experimental results demonstrate that our approach delivers both effectiveness and efficiency.
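The core idea, using the sign bits of the keys as both the compressed storage and the retrieval index, can be sketched in a few lines. This is a toy numpy version under assumed names; the paper's packed bit arithmetic and custom CUDA kernels are omitted:

```python
import numpy as np

def pack_signs(K: np.ndarray) -> np.ndarray:
    """Sign-based 1-bit VQ: store only the sign pattern of each key vector."""
    return (K > 0).astype(np.uint8)

def select_topk(q: np.ndarray, K_bits: np.ndarray, k: int) -> np.ndarray:
    """Use the compressed keys themselves as the index: score each cached
    key by sign agreement with the query (equivalent to Hamming similarity
    on the bit codes) and keep the top-k tokens for sparse attention."""
    q_bits = (q > 0).astype(np.uint8)
    agreement = (K_bits == q_bits).sum(axis=1)
    return np.argsort(-agreement)[:k]

rng = np.random.default_rng(0)
K = rng.normal(size=(64, 16))   # 64 cached key vectors, head dim 16
q = K[7]                        # query aligned with cached token 7
top = select_topk(q, pack_signs(K), k=4)  # token 7 is retrieved
```

Because the same 1-bit codes serve as both the compressed cache and the sparsity predictor, no auxiliary index structure or learned predictor is needed, which is the unification the abstract argues for.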

new Domain-Skewed Federated Learning with Feature Decoupling and Calibration

Authors: Huan Wang, Jun Shen, Jun Yan, Guansong Pang

Abstract: Federated learning (FL) allows distributed clients to collaboratively train a global model in a privacy-preserving manner. However, one major challenge is domain skew, where clients' data originating from diverse domains may hinder the aggregated global model from learning a consistent representation space, resulting in poor generalization across multiple domains. In this paper, we argue that the domain skew is reflected in the domain-specific biased features of each client, causing the local model's representations to collapse into a narrow low-dimensional subspace. We then propose Federated Feature Decoupling and Calibration ($F^2$DC), which liberates valuable class-relevant information by calibrating the domain-specific biased features, enabling more consistent representations across domains. A novel component, Domain Feature Decoupler (DFD), is first introduced in $F^2$DC to determine the robustness of each feature unit, thereby separating the local features into domain-robust features and domain-related features. A Domain Feature Corrector (DFC) is further proposed to calibrate these domain-related features by explicitly linking discriminative signals, capturing additional class-relevant clues that complement the domain-robust features. Finally, a domain-aware aggregation of the local models is performed to promote consensus among clients. Empirical results on three popular multi-domain datasets demonstrate the effectiveness of the proposed $F^2$DC and the contributions of its two modules. Code is available at https://github.com/mala-lab/F2DC.

URLs: https://github.com/mala-lab/F2DC.

new GoldenStart: Q-Guided Priors and Entropy Control for Distilling Flow Policies

Authors: He Zhang, Ying Sun, Hui Xiong

Abstract: Flow-matching policies hold great promise for reinforcement learning (RL) by capturing complex, multi-modal action distributions. However, their practical application is often hindered by prohibitive inference latency and ineffective online exploration. Although recent works have employed one-step distillation for fast inference, the structure of the initial noise distribution remains an overlooked factor that presents significant untapped potential. This overlooked factor, together with the challenge of controlling policy stochasticity, defines two critical areas for advancing distilled flow-matching policies. To overcome these limitations, we propose GoldenStart (GSFlow), a policy distillation method with Q-guided priors and explicit entropy control. Instead of initializing generation from uninformed noise, we introduce a Q-guided prior modeled by a conditional VAE. This state-conditioned prior repositions the starting points of the one-step generation process into high-Q regions, effectively providing a "golden start" that shortcuts the policy to promising actions. Furthermore, for effective online exploration, we enable our distilled actor to output a stochastic distribution instead of a deterministic point. This is governed by entropy regularization, allowing the policy to shift from pure exploitation to principled exploration. Our integrated framework demonstrates that by designing the generative startpoint and explicitly controlling policy entropy, it is possible to achieve efficient and exploratory policies, bridging generative models and practical actor-critic methods. We conduct extensive experiments on offline and online continuous control benchmarks, where our method significantly outperforms prior state-of-the-art approaches. Code will be available at https://github.com/ZhHe11/GSFlow-RL.

URLs: https://github.com/ZhHe11/GSFlow-RL.

new Sampling Boltzmann distributions via normalizing flow approximation of transport maps

Authors: Zia Ur Rehman, Gero Friesecke

Abstract: In a celebrated paper \cite{noe2019boltzmann}, No\'e, Olsson, K\"ohler and Wu introduced an efficient method for sampling high-dimensional Boltzmann distributions arising in molecular dynamics via normalizing flow approximation of transport maps. Here, we place this approach on a firm mathematical foundation. We prove the existence of a normalizing flow between the reference measure and the true Boltzmann distribution up to an arbitrarily small error in the Wasserstein distance. This result covers general Boltzmann distributions from molecular dynamics, which have low regularity due to the presence of interatomic Coulomb and Lennard-Jones interactions. The proof is based on a rigorous construction of the Moser transport map for low-regularity endpoint densities and approximation theorems for neural networks in Sobolev spaces. Numerical simulations for a simple model system and for the alanine dipeptide molecule confirm that the true and generated distributions are close in the Wasserstein distance. Moreover, we observe that the RealNVP architecture successfully captures not only the equilibrium Boltzmann distribution but also the metastable dynamics.
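For readers unfamiliar with the architecture, a single RealNVP affine coupling layer, the invertible building block whose composition forms the normalizing flow, can be sketched as follows. Toy linear maps stand in for the scale and translation networks:

```python
import numpy as np

def coupling_forward(x, s_w, t_w):
    """One RealNVP affine coupling layer: the first half of x passes through
    unchanged and parameterizes a scale-and-shift of the second half."""
    x1, x2 = x[: len(x) // 2], x[len(x) // 2 :]
    s, t = np.tanh(s_w @ x1), t_w @ x1
    return np.concatenate([x1, x2 * np.exp(s) + t])

def coupling_inverse(y, s_w, t_w):
    """Exact inverse: recompute s, t from the untouched half and undo
    the scale-and-shift; this is why the flow is invertible by design."""
    y1, y2 = y[: len(y) // 2], y[len(y) // 2 :]
    s, t = np.tanh(s_w @ y1), t_w @ y1
    return np.concatenate([y1, (y2 - t) * np.exp(-s)])

rng = np.random.default_rng(0)
s_w, t_w = rng.normal(size=(2, 2)), rng.normal(size=(2, 2))
x = rng.normal(size=4)
x_rec = coupling_inverse(coupling_forward(x, s_w, t_w), s_w, t_w)
```

The log-determinant of the Jacobian is simply the sum of the scale outputs `s`, which is what makes density evaluation, and hence transport between the reference measure and the Boltzmann target, tractable.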

new Learning in Function Spaces: A Unified Functional Analytic View of Supervised and Unsupervised Learning

Authors: K. Lakshmanan

Abstract: Many machine learning algorithms can be interpreted as procedures for estimating functions defined on the data distribution. In this paper we present a conceptual framework that formulates a wide range of learning problems as variational optimization over function spaces induced by the data distribution. Within this framework the data distribution defines operators that capture structural properties of the data, such as similarity relations or statistical dependencies. Learning algorithms can then be viewed as estimating functions expressed in bases determined by these operators. This perspective provides a unified way to interpret several learning paradigms. In supervised learning the objective functional is defined using labeled data and typically corresponds to minimizing prediction risk, whereas unsupervised learning relies on structural properties of the input distribution and leads to objectives based on similarity or smoothness constraints. From this viewpoint, the distinction between learning paradigms arises primarily from the choice of the functional being optimized rather than from the underlying function space. We illustrate this framework by discussing connections with kernel methods, spectral clustering, and manifold learning, highlighting how operators induced by data distributions naturally define function representations used by learning algorithms. The goal of this work is not to introduce a new algorithm but to provide a conceptual framework that clarifies the role of function spaces and operators in modern machine learning.

new High-Fidelity Compression of Seismic Velocity Models via SIREN Auto-Decoders

Authors: Caiyun Liu, Xiaoxue Luo, Jie Xiong

Abstract: Implicit Neural Representations (INRs) have emerged as a powerful paradigm for representing continuous signals independently of grid resolution. In this paper, we propose a high-fidelity neural compression framework based on a SIREN (Sinusoidal Representation Networks) auto-decoder to represent multi-structural seismic velocity models from the OpenFWI benchmark. Our method compresses each 70x70 velocity map (4,900 points) into a compact 256-dimensional latent vector, achieving a compression ratio of 19:1. We evaluate the framework on 1,000 samples across five diverse geological families: FlatVel, CurveVel, FlatFault, CurveFault, and Style. Experimental results demonstrate an average PSNR of 32.47 dB and SSIM of 0.956, indicating high-quality reconstruction. Furthermore, we showcase two key advantages of our implicit representation: (1) smooth latent space interpolation that generates plausible intermediate velocity structures, and (2) zero-shot super-resolution capability that reconstructs velocity fields at arbitrary resolutions up to 280x280 without additional training. The results highlight the potential of INR-based auto-decoders for efficient storage, multi-scale analysis, and downstream geophysical applications such as full waveform inversion.
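A minimal forward pass of the auto-decoder idea, in which a per-sample latent code is concatenated with query coordinates and passed through a sine-activated (SIREN) layer, so querying a denser grid yields higher resolution with no extra training. Shapes, weights, and names below are illustrative, not the paper's trained model:

```python
import numpy as np

def siren_decode(coords, z, W1, b1, W2, b2, w0=30.0):
    """Query the implicit field: concatenate each (x, y) coordinate with the
    sample's latent code z, apply one sine-activated layer, then a linear
    read-out predicting the velocity value at that coordinate."""
    inp = np.concatenate([coords, np.tile(z, (len(coords), 1))], axis=1)
    h = np.sin(w0 * (inp @ W1 + b1))
    return h @ W2 + b2

rng = np.random.default_rng(0)
z = rng.normal(size=16)                         # latent code for one map
W1, b1 = rng.normal(size=(18, 64)) * 0.1, np.zeros(64)
W2, b2 = rng.normal(size=(64, 1)) * 0.1, np.zeros(1)

def grid(n):
    """n x n coordinate grid in [-1, 1]^2, flattened to (n*n, 2)."""
    xy = np.linspace(-1.0, 1.0, n)
    return np.stack(np.meshgrid(xy, xy), -1).reshape(-1, 2)

lo = siren_decode(grid(70), z, W1, b1, W2, b2)   # native 70x70 resolution
hi = siren_decode(grid(280), z, W1, b1, W2, b2)  # zero-shot 280x280 query
```

Because the representation is a continuous function of coordinates rather than a grid, the same latent vector supports both the 19:1 compression and the zero-shot super-resolution described in the abstract.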

new Windowed Fourier Propagator: A Frequency-Local Neural Operator for Wave Equations in Inhomogeneous Media

Authors: Yiyang Cai, Zixuan Qiu, Yunlu Shu, Jiamao Wu, Yingzhou Li, Tianyu Wang, Xi Chen

Abstract: Wave equations are fundamental to describing a vast array of physical phenomena, yet their simulation in inhomogeneous media poses a computational challenge due to the highly oscillatory nature of the solutions. To overcome the high costs of traditional solvers, we propose the Windowed Fourier Propagator (WFP), a novel neural operator that efficiently learns the solution operator. The WFP's design is rooted in the physical principle of frequency locality, where wave energy scatters primarily to adjacent frequencies. By learning a set of compact, localized propagators, each mapping an input frequency to a small window of outputs, our method avoids the complexity of dense interaction models and achieves computational efficiency. Another key feature is the explicit preservation of superposition, which enables remarkable generalization from simple training data (e.g., plane waves) to arbitrary, complex wave states. We demonstrate that the WFP provides an explainable, efficient and accurate framework for data-driven wave modeling in complex media.

new Enhancing LLM Training via Spectral Clipping

Authors: Xiaowen Jiang, Andrei Semenov, Sebastian U. Stich

Abstract: While spectral-based optimizers like Muon operate directly on the spectrum of updates, standard adaptive methods such as AdamW do not account for the global spectral structure of weights and gradients, leaving them vulnerable to two empirical issues in large language model (LLM) training: (i) the optimizer updates can have large spectral norms, potentially destabilizing training and degrading generalization; (ii) stochastic gradient noise can exhibit sparse spectral spikes, with a few dominant singular values much larger than the rest. We propose SPECTRA, a general framework addressing these by (i) post-spectral clipping of updates to enforce spectral-norm constraints; (ii) optional pre-spectral clipping of gradients to suppress spectral noise spikes. We prove that post-clipping constitutes a Composite Frank-Wolfe method with spectral-norm constraints and weight regularization, recovering Frobenius and $\ell_{\infty}$-norm regularization with SGD-based and sign-based methods. We further analyze how pre-clipping mitigates sparse spectral spikes. We propose efficient soft spectral clipping via Newton-Schulz iterations, avoiding expensive SVD. Experiments on LLM pretraining show SPECTRA uniformly improves validation loss for various optimizers, including AdamW, Signum, and AdEMAMix, with the best-performing variants achieving state-of-the-art results. Models trained with SPECTRA exhibit smaller weight norms, confirming the link between spectral clipping and regularization.
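The post-clipping step can be illustrated directly. Note the hedge: the paper proposes soft spectral clipping via Newton-Schulz iterations precisely to avoid the SVD; an explicit SVD is used here only for clarity:

```python
import numpy as np

def spectral_clip(G: np.ndarray, tau: float) -> np.ndarray:
    """Clip the singular values of an update matrix at tau, enforcing a
    spectral-norm constraint on the optimizer update. (Illustrative hard
    clip; the paper's efficient variant uses Newton-Schulz iterations
    instead of an explicit SVD.)"""
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ np.diag(np.minimum(s, tau)) @ Vt

rng = np.random.default_rng(0)
G = rng.normal(size=(8, 4)) * 3.0   # stand-in for an optimizer update
G_clip = spectral_clip(G, tau=1.0)  # spectral norm now at most 1.0
```

Directions whose singular values already lie below the threshold pass through unchanged; only the dominant spectral spikes are flattened, which is the stabilizing effect the abstract attributes to SPECTRA.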

new Structure-Dependent Regret and Constraint Violation Bounds for Online Convex Optimization with Time-Varying Constraints

Authors: Xiufeng Liu, Qian Chen, Zhijin Wang, Ruyu Liu

Abstract: Online convex optimization (OCO) with time-varying constraints is a critical framework for sequential decision-making in dynamic networked systems, where learners must minimize cumulative loss while satisfying feasibility constraints that shift across rounds. Existing theoretical analyses typically treat constraint variation as a monolithic adversarial process, resulting in joint regret and violation bounds that are overly conservative for real-world network dynamics. In this paper, we introduce a structured characterization of constraint variation (smooth drift, periodic cycles, and sparse switching), mapping these classes to common network phenomena such as slow channel fading, diurnal traffic patterns, and discrete maintenance windows. We derive structure-dependent joint bounds that strictly improve upon adversarial rates when the constraint process exhibits regularity. To realize these gains, we propose the Structure-Adaptive Primal-Dual (SA-PD) algorithm, which utilizes observable constraint signals to detect environmental structure online and adapt dual update strategies accordingly. Extensive experiments on synthetic benchmarks and real-world datasets, including online electricity scheduling and transformer load management, demonstrate that SA-PD reduces cumulative constraint violation by up to 53% relative to structure-agnostic baselines while maintaining competitive utility. This work serves as a comprehensive guide for exploiting temporal regularity in constrained online learning for robust network engineering.
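The structure-agnostic primal-dual baseline that SA-PD builds on can be sketched as follows; the structure-detection and adaptive dual-update logic of SA-PD itself is omitted, and all names are illustrative:

```python
import numpy as np

def primal_dual_step(x, lam, grad_f, g, grad_g, eta=0.1, sigma=0.1):
    """One round of a generic primal-dual OCO update: the primal variable
    descends the loss plus the dual-weighted constraint, and the dual
    variable ascends on the observed violation (projected to stay >= 0)."""
    x_new = x - eta * (grad_f(x) + lam * grad_g(x))
    lam_new = max(0.0, lam + sigma * g(x_new))
    return x_new, lam_new

# Toy round-invariant instance: minimize x^2 subject to x >= 1,
# encoded as the constraint g(x) = 1 - x <= 0.
grad_f = lambda x: 2.0 * x
g = lambda x: 1.0 - x
grad_g = lambda x: -1.0

x, lam = 0.0, 0.0
for _ in range(2000):
    x, lam = primal_dual_step(x, lam, grad_f, g, grad_g)
# x converges toward the constrained optimum x* = 1.
```

In the time-varying setting, `g` changes each round; SA-PD's contribution is to adapt the dual step `sigma` and update strategy based on the detected drift, periodicity, or switching structure of that sequence.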

new ECG-Reasoning-Benchmark: A Benchmark for Evaluating Clinical Reasoning Capabilities in ECG Interpretation

Authors: Jungwoo Oh, Hyunseung Chung, Junhee Lee, Min-Gyu Kim, Hangyul Yoon, Ki Seong Lee, Youngchae Lee, Muhan Yeo, Edward Choi

Abstract: While Multimodal Large Language Models (MLLMs) show promising performance in automated electrocardiogram interpretation, it remains unclear whether they genuinely perform step-by-step reasoning or merely rely on superficial visual cues. To investigate this, we introduce \textbf{ECG-Reasoning-Benchmark}, a novel multi-turn evaluation framework comprising over 6,400 samples to systematically assess step-by-step reasoning across 17 core ECG diagnoses. Our comprehensive evaluation of state-of-the-art models reveals a critical failure in executing multi-step logical deduction. Although models possess the medical knowledge to retrieve clinical criteria for a diagnosis, they exhibit near-zero success rates (6% Completion) in maintaining a complete reasoning chain, primarily failing to ground the corresponding ECG findings to the actual visual evidence in the ECG signal. These results demonstrate that current MLLMs bypass actual visual interpretation, exposing a critical flaw in existing training paradigms and underscoring the necessity for robust, reasoning-centric medical AI. The code and data are available at https://github.com/Jwoo5/ecg-reasoning-benchmark.

URLs: https://github.com/Jwoo5/ecg-reasoning-benchmark.

new Localizing and Editing Knowledge in Large Audio-Language Models

Authors: Sung Kyun Chung, Jiaheng Dong, Qiuchi Hu, Gongping Huang, Hong Jia, Ting Dang

Abstract: Large Audio-Language Models (LALMs) have shown strong performance in speech understanding, making speech a natural interface for accessing factual information. Yet they are trained on static corpora and may encode incorrect facts. Existing model editing methods localize and update facts in text-only LLMs, but they do not account for continuous speech representations or for where knowledge is stored across the acoustic, language, and cross-modal modules. We construct the first audio benchmark for knowledge localization and editing in LALMs and propose a speech-driven locate-then-edit framework. First, we use speech-aware causal tracing to localize layers and modules that support factual retrieval and then apply editing at identified sites. Experiments show that factual knowledge is jointly encoded in audio and text modules, and that audio editing yields more effective updates than text editing or fine-tuning, enabling fine-grained knowledge control in speech AI systems.

new Refold: Refining Protein Inverse Folding with Efficient Structural Matching and Fusion

Authors: Yiran Zhu, Changxi Chi, Hongxin Xiang, Wenjie Du, Xiaoqi Wang, Jun Xia

Abstract: Protein inverse folding aims to design an amino acid sequence that will fold into a given backbone structure, serving as a central task in protein design. Two main paradigms have been widely explored. Template-based methods exploit database-derived structural priors and can achieve high local precision when close structural neighbors are available, but their dependence on database coverage and match quality often degrades performance on out-of-distribution (OOD) targets. Deep learning approaches, in contrast, learn general structure-to-sequence regularities and usually generalize better to new backbones. However, they struggle to capture fine-grained local structure, which can cause uncertain residue predictions and missed local motifs in ambiguous regions. We introduce Refold, a novel framework that synergistically integrates the strengths of database-derived structural priors and deep learning prediction to enhance inverse folding. Refold obtains structural priors from matched neighbors and fuses them with model predictions to refine residue probabilities. In practice, low-quality neighbors can introduce noise, potentially degrading model performance. We address this issue with a Dynamic Utility Gate that controls prior injection and falls back to the base prediction when the priors are untrustworthy. Comprehensive evaluations on standard benchmarks demonstrate that Refold achieves state-of-the-art native sequence recovery of 0.63 on both CATH 4.2 and CATH 4.3. Also, analysis indicates that Refold delivers larger gains on high-uncertainty regions, reflecting the complementarity between structural priors and deep learning predictions.

new Deconfounded Lifelong Learning for Autonomous Driving via Dynamic Knowledge Spaces

Authors: Jiayuan Du, Yuebing Song, Yiming Zhao, Xianghui Pan, Jiawei Lian, Yuchu Lu, Liuyi Wang, Chengju Liu, Qijun Chen

Abstract: End-to-End autonomous driving (E2E-AD) systems face challenges in lifelong learning, including catastrophic forgetting, difficulty in knowledge transfer across diverse scenarios, and spurious correlations between unobservable confounders and true driving intents. To address these issues, we propose DeLL, a Deconfounded Lifelong Learning framework that integrates a Dirichlet process mixture model (DPMM) with the front-door adjustment mechanism from causal inference. The DPMM is employed to construct two dynamic knowledge spaces: a trajectory knowledge space for clustering explicit driving behaviors and an implicit feature knowledge space for discovering latent driving abilities. Leveraging the non-parametric Bayesian nature of DPMM, our framework enables adaptive expansion and incremental updating of knowledge without predefining the number of clusters, thereby mitigating catastrophic forgetting. Meanwhile, the front-door adjustment mechanism utilizes the DPMM-derived knowledge as valid mediators to deconfound spurious correlations, such as those induced by sensor noise or environmental changes, and enhances the causal expressiveness of the learned representations. Additionally, we introduce an evolutionary trajectory decoder that enables non-autoregressive planning. To evaluate the lifelong learning performance of E2E-AD, we propose new evaluation protocols and metrics based on Bench2Drive. Extensive evaluations in the closed-loop CARLA simulator demonstrate that our framework significantly improves adaptability to new driving scenarios and overall driving performance, while effectively retaining previously acquired knowledge.

new M$^2$RNN: Non-Linear RNNs with Matrix-Valued States for Scalable Language Modeling

Authors: Mayank Mishra, Shawn Tan, Ion Stoica, Joseph Gonzalez, Tri Dao

Abstract: Transformers are highly parallel but are limited to computations in the TC$^0$ complexity class, excluding tasks such as entity tracking and code execution that provably require greater expressive power. Motivated by this limitation, we revisit non-linear Recurrent Neural Networks (RNNs) for language modeling and introduce Matrix-to-Matrix RNN (M$^2$RNN): an architecture with matrix-valued hidden states and expressive non-linear state transitions. We demonstrate that the language modeling performance of non-linear RNNs is limited by their state size. We also demonstrate how the state size expansion mechanism enables efficient use of tensor cores. Empirically, M$^2$RNN achieves perfect state tracking generalization at sequence lengths not seen during training. These benefits also translate to large-scale language modeling. In hybrid settings that interleave recurrent layers with attention, Hybrid M$^2$RNN outperforms equivalent Gated DeltaNet hybrids by $0.4$-$0.5$ perplexity points on a 7B MoE model, while using $3\times$ smaller state sizes for the recurrent layers. Notably, replacing even a single recurrent layer with M$^2$RNN in an existing hybrid architecture yields accuracy gains comparable to Hybrid M$^2$RNN with minimal impact on training throughput. Further, the Hybrid Gated DeltaNet models with a single M$^2$RNN layer also achieve superior long-context generalization, outperforming state-of-the-art hybrid linear attention architectures by up to $8$ points on LongBench. Together, these results establish non-linear RNN layers as a compelling building block for efficient and scalable language models.

new From Specification to Architecture: A Theory Compiler for Knowledge-Guided Machine Learning

Authors: Asela Hevapathige, Yu Xia, Sachith Seneviratne, Saman Halgamuge

Abstract: Theory-guided machine learning has demonstrated that including authentic domain knowledge directly into model design improves performance, sample efficiency and out-of-distribution generalisation. Yet the process by which a formal domain theory is translated into architectural constraints remains entirely manual, specific to each domain formalism, and devoid of any formal correctness guarantee. This translation is non-transferable between domains, not verified, and does not scale. We propose the Theory Compiler: a system that accepts a typed, machine-readable domain theory as input and automatically produces an architecture whose function space is provably constrained to be consistent with that theory by construction, not by regularisation. We identify three foundational open problems whose resolution defines our research agenda: (1) designing a universal theory formalisation language with decidable type-checking; (2) constructing a compositionally correct compilation algorithm from theory primitives to architectural modules; and (3) establishing soundness and completeness criteria for formal verification. We further conjecture that compiled architectures match or exceed manually-designed counterparts in generalisation performance while requiring substantially less training data, a claim we ground in classical statistical learning theory. We argue that recent advances in formal machine learning theory, large language models, and the growth of an interdisciplinary research community have made this paradigm achievable for the first time.

new SPARQ: Spiking Early-Exit Neural Networks for Energy-Efficient Edge AI

Authors: Parth Patne, Mahdi Taheri, Ali Mahani, Maksim Jenihhin, Reza Mahani, Christian Herglotz

Abstract: Spiking neural networks (SNNs) offer inherent energy efficiency due to their event-driven computation model, making them promising for edge AI deployment. However, their practical adoption is limited by the computational overhead of deep architectures and the absence of input-adaptive control. This work presents SPARQ, a unified framework that integrates spiking computation, quantization-aware training, and reinforcement learning-guided early exits for efficient and adaptive inference. Evaluations across MLP, LeNet, and AlexNet architectures demonstrate that the proposed Quantised Dynamic SNNs (QDSNN) consistently outperform conventional SNNs and QSNNs, achieving up to 5.15% higher accuracy than QSNNs, over 330 times lower system energy than baseline SNNs, and over 90% fewer synaptic operations across different datasets. These results validate SPARQ as a hardware-friendly, energy-efficient solution for real-time AI at the edge.

new From $\boldsymbol{\log\pi}$ to $\boldsymbol{\pi}$: Taming Divergence in Soft Clipping via Bilateral Decoupled Decay of Probability Gradient Weight

Authors: Xiaoliang Fu, Jiaye Lin, Yangyi Fang, Chaowen Hu, Cong Qin, Zekai Shao, Binbin Zheng, Lu Pan, Ke Zeng

Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has catalyzed a leap in Large Language Model (LLM) reasoning, yet its optimization dynamics remain fragile. Standard algorithms like GRPO enforce stability via ``hard clipping'', which inadvertently stifles exploration by discarding gradients of tokens outside the trust region. While recent ``soft clipping'' methods attempt to recover these gradients, they suffer from a critical challenge: relying on log-probability gradient ($\nabla_\theta\log \pi_\theta$) yields divergent weights as probabilities vanish, destabilizing LLM training. We rethink this convention by establishing probability gradient ($\nabla_\theta \pi_\theta$) as the superior optimization primitive. Accordingly, we propose Decoupled Gradient Policy Optimization (DGPO), which employs a decoupled decay mechanism based on importance sampling ratios. By applying asymmetric, continuous decay to boundary tokens, DGPO resolves the conflict between stability and sustained exploration. Extensive experiments across DeepSeek-R1-Distill-Qwen series models (1.5B/7B/14B) demonstrate that DGPO consistently outperforms strong baselines on various mathematical benchmarks, offering a robust and scalable solution for RLVR. Our code and implementation are available at: https://github.com/VenomRose-Juri/DGPO-RL.

URLs: https://github.com/VenomRose-Juri/DGPO-RL.
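The instability the abstract targets can be seen numerically: with a bounded probability gradient, the log-probability gradient carries a 1/pi factor that blows up for rare tokens. A toy illustration of that weighting, not the DGPO update itself:

```python
import numpy as np

# Probability of each token under a softmax policy with logits z;
# token 0 is extremely unlikely under the current policy.
z = np.array([0.0, 10.0])
pi = np.exp(z) / np.exp(z).sum()

# By the chain rule, grad log pi = (grad pi) / pi, so the weight each
# primitive attaches to token 0's parameter update differs sharply:
log_prob_weight = 1.0 / pi[0]   # diverges as pi -> 0
prob_weight = 1.0               # stays bounded regardless of pi
```

As `pi[0]` vanishes, `log_prob_weight` grows without bound (here above 2x10^4), while the probability-gradient weight remains fixed, which is the motivation the abstract gives for adopting the probability gradient as the optimization primitive.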

new WestWorld: A Knowledge-Encoded Scalable Trajectory World Model for Diverse Robotic Systems

Authors: Yuchen Wang, Jiangtao Kong, Sizhe Wei, Xiaochang Li, Haohong Lin, Hongjue Zhao, Tianyi Zhou, Lu Gan, Huajie Shao

Abstract: Trajectory world models play a crucial role in robotic dynamics learning, planning, and control. While recent works have explored trajectory world models for diverse robotic systems, they struggle to scale to a large number of distinct system dynamics and overlook domain knowledge of physical structures. To address these limitations, we introduce WestWorld, a knoWledge-Encoded Scalable Trajectory World model for diverse robotic systems. To tackle the scalability challenge, we propose a novel system-aware Mixture-of-Experts (Sys-MoE) that dynamically combines and routes specialized experts for different robotic systems via a learnable system embedding. To further enhance zero-shot generalization, we incorporate domain knowledge of robot physical structures by introducing a structural embedding that aligns trajectory representations with morphological information. After pretraining on 89 complex environments spanning diverse morphologies across both simulation and real-world settings, WestWorld achieves significant improvements over competitive baselines in zero- and few-shot trajectory prediction. Additionally, it shows strong scalability across a wide range of robotic environments and significantly improves performance on downstream model-based control for different robots. Finally, we deploy our model on a real-world Unitree Go1, where it demonstrates stable locomotion performance (see our demo on the website: https://westworldrobot.github.io/). The code will be available upon publication.

URLs: https://westworldrobot.github.io/

new ES-Merging: Biological MLLM Merging via Embedding Space Signals

Authors: Wonbin Lee, Dongki Kim, Sung Ju Hwang

Abstract: Biological multimodal large language models (MLLMs) have emerged as powerful foundation models for scientific discovery. However, existing models are specialized to a single modality, limiting their ability to solve inherently cross-modal scientific problems. While model merging is an efficient method to combine the different modalities into a unified MLLM, existing methods rely on input-agnostic parameter space heuristics that fail to faithfully capture modality specialization. To overcome this limitation, we propose a representation-aware merging framework that estimates merging coefficients from embedding space signals. We first design a probe input that consists of different modality tokens and forward it through each specialized MLLM to obtain layer-wise embedding responses that reflect modality-specific representation changes. We then estimate complementary merging coefficients at two granularities from the embedding space: layer-wise coefficients from coarse-grained signals and element-wise coefficients from fine-grained signals, which are jointly combined for robust coefficient estimation. Experiments on interactive effect prediction benchmarks show that our method outperforms existing merging methods and even surpasses task-specific fine-tuned models, establishing that embedding space signals provide a principled and effective foundation for cross-modal MLLM merging.
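A minimal sketch of the coarse-grained, layer-wise coefficient estimation the abstract describes, under assumptions of my own: the expert names are hypothetical, and using the norm of each expert's embedding-space change on a shared probe input (normalized across experts per layer) is one plausible way to turn "embedding space signals" into merging coefficients, not necessarily the paper's estimator:

```python
import numpy as np

def layerwise_merge_coeffs(base_acts, expert_acts):
    # base_acts: list of (d,) layer activations for the base model on a
    # shared probe input; expert_acts: {expert_name: list of (d,) activations}.
    # Each expert's layer-wise coefficient is its embedding change magnitude,
    # normalized across experts at every layer.
    deltas = {name: np.array([np.linalg.norm(a - b)
                              for a, b in zip(acts, base_acts)])
              for name, acts in expert_acts.items()}
    total = sum(deltas.values())            # (n_layers,) sum over experts
    return {name: d / total for name, d in deltas.items()}
```

Layers where one modality's representation shifts most would then dominate that layer's merged parameters.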

new Graph-Based Deep Learning for Intelligent Detection of Energy Losses, Theft, and Operational Inefficiencies in Oil & Gas Production Networks

Authors: AbdulQoyum A. Olowookere, Adewale U. Oguntola, Ebenezer Leke Odekanle

Abstract: Early detection of energy losses, theft, and operational inefficiencies remains a critical challenge in oil and gas production systems due to complex interdependencies among wells and facilities, evolving operating conditions, and limited labeled anomaly data. Traditional machine learning approaches often treat production units independently and struggle under temporal distribution shifts. This study proposes a spatiotemporal graph-based deep learning framework for anomaly detection in oil and gas production networks. The production system is modeled as a hierarchical graph of wells, facilities, and fields, with additional peer connections among wells sharing common infrastructure. Weakly supervised anomaly labels are derived from physically informed heuristics based on production, pressure, and flow behavior. Temporal dynamics are captured through sequence modeling, while relational dependencies are learned using a Temporal Graph Attention Network. Under time-based evaluation, the proposed model achieves an ROC-AUC of about 0.98 and anomaly recall above 0.93, demonstrating improved robustness and practical potential for proactive monitoring in real-world energy operations.

new Towards One-for-All Anomaly Detection for Tabular Data

Authors: Shiyuan Li, Yixin Liu, Yu Zheng, Xiaofeng Cao, Shirui Pan, Heng Tao Shen

Abstract: Tabular anomaly detection (TAD) aims to identify samples that deviate from the majority in tabular data and is critical in many real-world applications. However, existing methods follow a ``one model for one dataset (OFO)'' paradigm, which relies on dataset-specific training and thus incurs high computational cost and yields limited generalization to unseen domains. To address these limitations, we propose OFA-TAD, a generalist one-for-all (OFA) TAD framework that only requires one-time training on multiple source datasets and can generalize to unseen datasets from diverse domains on-the-fly. To realize one-for-all tabular anomaly detection, OFA-TAD extracts neighbor-distance patterns as transferable cues, and introduces multi-view neighbor-distance representations from multiple transformation-induced metric spaces to mitigate the transformation sensitivity of distance profiles. To adaptively combine multi-view distance evidence, a Mixture-of-Experts (MoE) scoring network is employed for view-specific anomaly scoring and entropy-regularized gated fusion, with a multi-strategy anomaly synthesis mechanism to support training under the one-class constraint. Extensive experiments on 34 datasets from 14 domains demonstrate that OFA-TAD achieves superior anomaly detection performance and strong cross-domain generalizability under the strict OFA setting.
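The "multi-view neighbor-distance representation" idea is concrete enough to sketch; the random linear transforms below are an assumption standing in for the paper's transformation-induced metric spaces, and no MoE scoring is shown:

```python
import numpy as np

def multiview_knn_distances(X, k=5, n_views=4, seed=0):
    # k-NN distance profiles computed in several randomly transformed
    # metric spaces, used as transferable anomaly cues: anomalies tend to
    # have large nearest-neighbor distances in every view.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    views = []
    for _ in range(n_views):
        W = rng.normal(size=(d, d)) / np.sqrt(d)   # random linear transform
        Z = X @ W
        D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
        np.fill_diagonal(D, np.inf)                # exclude self-distance
        views.append(np.sort(D, axis=1)[:, :k])    # k smallest distances
    return np.stack(views, axis=1)                 # shape (n, n_views, k)
```

Because the features are distances rather than raw attributes, they transfer across datasets with different schemas, which is what makes the one-for-all setting plausible.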

new MBD: A Model-Based Debiasing Framework Across User, Content, and Model Dimensions

Authors: Yuantong Li, Lei Yuan, Zhihao Zheng, Weimiao Wu, Songbin Liu, Jeong Min Lee, Ali Selman Aydin, Shaofeng Deng, Junbo Chen, Xinyi Zhang, Hongjing Xia, Sam Fieldman, Matthew Kosko, Wei Fu, Du Zhang, Peiyu Yang, Albert Jin Chung, Xianlei Qiu, Miao Yu, Zhongwei Teng, Hao Chen, Sunny Baek, Hui Tang, Yang Lv, Renze Wang, Qifan Wang, Zhan Li, Tiantian Xu, Peng Wu, Ji Liu

Abstract: Modern recommendation systems rank candidates by aggregating multiple behavioral signals through a value model. However, many commonly used signals are inherently affected by heterogeneous biases. For example, watch time naturally favors long-form content, loop rate favors short-form content, and comment probability favors videos over images. Such biases introduce two critical issues: (1) value model scores may be systematically misaligned with users' relative preferences; for instance, a seemingly low absolute like probability may represent exceptionally strong interest for a user who rarely engages; and (2) changes in value modeling rules can trigger abrupt and undesirable ecosystem shifts. In this work, we ask a fundamental question: can biased behavioral signals be systematically transformed into unbiased signals, under a user-defined notion of ``unbiasedness'', that are both personalized and adaptive? We propose a general, model-based debiasing (MBD) framework that addresses this challenge by augmenting the existing ranking model with distributional modeling. By conditioning on a flexible subset of features (partial feature set), we explicitly estimate the contextual mean and variance of the engagement distribution for arbitrary cohorts (e.g., specific video lengths or user regions) directly alongside the main prediction. This integration allows the framework to convert biased raw signals into unbiased representations, enabling the construction of higher-level, calibrated signals (such as percentiles or z-scores) suitable for the value model. Importantly, the definition of unbiasedness is flexible and controllable, allowing the system to adapt to different personalization objectives and modeling preferences. Crucially, this is implemented as a lightweight, built-in branch of the existing MTML ranking model, requiring no separate serving infrastructure.
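The z-score/percentile conversion the abstract describes can be sketched directly; the Gaussian percentile mapping is an assumption for illustration, since the paper leaves the cohort distribution model flexible:

```python
import numpy as np
from math import erf, sqrt

def debias_signal(raw, mean_c, var_c, eps=1e-6):
    # Cohort-relative z-score: how unusual is this raw engagement signal
    # given the contextual mean/variance estimated for the cohort
    # (e.g. this video length, this user region)?
    return (raw - mean_c) / np.sqrt(var_c + eps)

def to_percentile(z):
    # Calibrated percentile under a Gaussian assumption for the cohort.
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))
```

This captures the abstract's motivating example: a like probability of 0.02 is unremarkable against a cohort mean of 0.01 with wide variance, but exceptionally strong for a user whose cohort mean is 0.001 with tight variance.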

new Zoom to Essence: Trainless GUI Grounding by Inferring upon Interface Elements

Authors: Ziwei Liu, Tao Feng, Borui Kang, Yanbing Yang, Jun Luo

Abstract: Multimodal Large Language Model (MLLM)-based Graphical User Interface (GUI) agents are developing rapidly, with visual grounding, which maps natural language instructions to target UI elements, serving as the core capability. Existing GUI agents typically fine-tune MLLMs on massive datasets to handle challenges in understanding instructions and UI interfaces, which not only incurs high data annotation costs but also makes performance dependent on data quality and distribution. To avoid such cumbersome yet often ineffective training, we observe that complex UI interfaces can be decomposed into basic visual elements directly understandable by common MLLMs. Consequently, we propose ZoomUI, which leverages inference scaling to guide common MLLMs in progressively anchoring instructions to increasingly detailed interface elements. Specifically, ZoomUI first optimizes latent thinking to transform the original instruction into a description of the target element's visual features, and subsequently leverages internal attention to iteratively zoom in on the target element's interface region. Evaluations on extensive benchmarks demonstrate that ZoomUI reaches or even surpasses SOTA baselines.

new STAG-CN: Spatio-Temporal Apiary Graph Convolutional Network for Disease Onset Prediction in Beehive Sensor Networks

Authors: Sungwoo Kang

Abstract: Honey bee colony losses threaten global pollination services, yet current monitoring systems treat each hive as an isolated unit, ignoring the spatial pathways through which diseases spread across apiaries. This paper introduces the Spatio-Temporal Apiary Graph Convolutional Network (STAG-CN), a graph neural network that models inter-hive relationships for disease onset prediction. STAG-CN operates on a dual adjacency graph combining physical co-location and climatic sensor correlation among hive sessions, and processes multivariate IoT sensor streams through a temporal-spatial-temporal sandwich architecture built on causal dilated convolutions and Chebyshev spectral graph convolutions. Evaluated on the Korean AI Hub apiculture dataset (dataset #71488) with expanding-window temporal cross-validation, STAG-CN achieves an F1 score of 0.607 at a three-day forecast horizon. An ablation study reveals that the climatic adjacency matrix alone matches full-model performance (F1 = 0.607), while the physical adjacency alone yields F1 = 0.274, indicating that shared environmental response patterns carry stronger predictive signal than spatial proximity for disease onset. These results establish a proof-of-concept for graph-based biosecurity monitoring in precision apiculture, demonstrating that inter-hive sensor correlations encode disease-relevant information invisible to single-hive approaches.

new On the (Generative) Linear Sketching Problem

Authors: Xinyu Yuan, Yan Qiao, Zonghui Wang, Wenzhi Chen

Abstract: Sketch techniques have been extensively studied in recent years and are especially well-suited to data streaming scenarios, where the sketch summary is updated quickly and compactly. However, it is challenging to recover the current state from these summaries in a way that is accurate, fast, and real-time. In this paper, we seek a solution that reconciles this tension, aiming for near-perfect recovery with lightweight computational procedures. Focusing on linear sketching problems of the form $\boldsymbol{\Phi}f \rightarrow f$, our study proceeds in three stages. First, we dissect existing techniques and show the root cause of the sketching dilemma: an orthogonal information loss. Second, we examine how generative priors can be leveraged to bridge the information gap. Third, we propose FLORE, a novel generative sketching framework that embraces these analyses to achieve the best of all worlds. More importantly, FLORE can be trained without access to ground-truth data. Comprehensive evaluations demonstrate FLORE's ability to provide high-quality recovery and support summaries with low computational overhead, reducing error by up to 1000 times over previous methods and running up to 100 times faster than learning-based solutions.

new Geometric and Topological Deep Learning for Predicting Thermo-mechanical Performance in Cold Spray Deposition Process Modeling

Authors: Akshansh Mishra

Abstract: This study presents a geometric deep learning framework for predicting cold spray particle impact responses using finite element simulation data. A parametric dataset was generated through automated Abaqus simulations spanning a systematic range of particle velocity, particle temperature, and friction coefficient, yielding five output targets: maximum equivalent plastic strain, average contact plastic strain, maximum temperature, maximum von Mises stress, and deformation ratio. Four novel algorithms, namely a GraphSAGE-style inductive graph neural network (GraphSAGE), a Chebyshev spectral graph convolution network (ChebSpectral), a topological data analysis-augmented multilayer perceptron (TDA-MLP), and a geometric attention network (GAT), were implemented and evaluated. Each input sample was treated as a node in a k-nearest-neighbour feature-space graph, enabling the models to exploit spatial similarity between process conditions during training. Three-dimensional feature space visualisations and two-dimensional contour projections confirmed the highly non-linear and velocity-dominated nature of the input-output relationships. Quantitative evaluation demonstrated that GraphSAGE and GAT consistently achieved R-square values exceeding 0.93 across most targets, with GAT attaining peak performance of R-square equal to 0.97 for maximum plastic strain. ChebSpectral and TDA-MLP performed considerably worse, yielding negative R-square values for several targets. These findings establish spatial graph-based neighbourhood aggregation as a robust and physically interpretable surrogate modelling strategy for cold spray process optimisation.

new Disentangling Dynamical Systems: Causal Representation Learning Meets Local Sparse Attention

Authors: Markus W. Baumgartner, Anson Lei, Joe Watson, Ingmar Posner

Abstract: Parametric system identification methods estimate the parameters of explicitly defined physical systems from data. Yet, they remain constrained by the need to provide an explicit function space, typically through a predefined library of candidate functions chosen via available domain knowledge. In contrast, deep learning can demonstrably model systems of broad complexity with high fidelity, but black-box function approximation typically fails to yield explicit descriptive or disentangled representations revealing the structure of a system. We develop a novel identifiability theorem, leveraging causal representation learning, to uncover disentangled representations of system parameters without structural assumptions. We derive a graphical criterion specifying when system parameters can be uniquely disentangled from raw trajectory data, up to permutation and diffeomorphism. Crucially, our analysis demonstrates that global causal structures provide a lower bound on the disentanglement guarantees achievable when considering local state-dependent causal structures. We instantiate system parameter identification as a variational inference problem, leveraging a sparsity-regularised transformer to uncover state-dependent causal structures. We empirically validate our approach across four synthetic domains, demonstrating its ability to recover highly disentangled representations that baselines fail to recover. Corroborating our theoretical analysis, our results confirm that enforcing local causal structure is often necessary for full identifiability.

new Unlearning-based sliding window for continual learning under concept drift

Authors: Michal Wozniak, Marek Klonowski, Maciej Maczynski, Bartosz Krawczyk

Abstract: Traditional machine learning assumes a stationary data distribution, yet many real-world applications operate on nonstationary streams in which the underlying concept evolves over time. This problem can also be viewed as task-free continual learning under concept drift, where a model must adapt sequentially without explicit task identities or task boundaries. In such settings, effective learning requires both rapid adaptation to new data and forgetting of outdated information. A common solution is based on a sliding window, but this approach is often computationally demanding because the model must be repeatedly retrained from scratch on the most recent data. We propose a different perspective based on machine unlearning. Instead of rebuilding the model each time the active window changes, we remove the influence of outdated samples using unlearning and then update the model with newly observed data. This enables efficient, targeted forgetting while preserving adaptation to evolving distributions. To the best of our knowledge, this is the first work to connect machine unlearning with concept drift mitigation for task-free continual learning. Empirical results on image stream classification across multiple drift scenarios demonstrate that the proposed approach offers a competitive and computationally efficient alternative to standard sliding-window retraining. Our implementation can be found at https://anonymous.4open.science/r/MUNDataStream-60F3.

URLs: https://anonymous.4open.science/r/MUNDataStream-60F3

new Predicting Stress-strain Behaviors of Additively Manufactured Materials via Loss-based and Activation-based Physics-informed Machine Learning

Authors: Chenglong Duan, Dazhong Wu

Abstract: Predicting the stress-strain behaviors of additively manufactured materials is crucial for part qualification in additive manufacturing (AM). Conventional physics-based constitutive models often oversimplify material properties, while data-driven machine learning (ML) models often lack physical consistency and interpretability. To address these issues, we propose a physics-informed machine learning (PIML) framework to improve the predictive performance and physical consistency for predicting the stress-strain curves of additively manufactured polymers and metals. A polynomial regression model is used to predict the yield point from AM process parameters, then stress-strain curves are segmented into elastic and plastic regions. Two long short-term memory (LSTM) models are trained to predict the two regions separately. For the elastic region, Hooke's law is embedded into the LSTM model for both polymer and metal. For the plastic region, Voce hardening law and Hollomon's law are embedded into the LSTM model for polymer and metal, respectively. The loss-based and activation-based PIML architectures are developed by embedding the physical laws into the loss and activation functions, respectively. The performance of the two PIML architectures is compared with two LSTM-based ML models, three additional ML models, and a physics-based constitutive model. These models are built on experimental data collected from two additively manufactured polymers (i.e., Nylon and carbon fiber-acrylonitrile butadiene styrene) and two additively manufactured metals (i.e., AlSi10Mg and Ti6Al4V). Experimental results demonstrate that the two PIML architectures consistently outperform the other models. The segmental predictive model with activation-based PIML architecture achieves the lowest MAPE of 10.46+/-0.81% and the highest R^2 of 0.82+/-0.05 across four datasets.

new Trust-Region Noise Search for Black-Box Alignment of Diffusion and Flow Models

Authors: Niklas Schweiger, Daniel Cremers, Karnik Ram

Abstract: Optimizing the noise samples of diffusion and flow models is an increasingly popular approach to align these models to target rewards at inference time. However, we observe that these approaches are usually restricted to differentiable or cheap reward models, the formulation of the underlying pretrained generative model, or are memory/compute inefficient. We instead propose a simple trust-region based search algorithm (TRS) which treats the pre-trained generative and reward models as a black-box and only optimizes the source noise. Our approach achieves a good balance between global exploration and local exploitation, and is versatile and easily adaptable to various generative settings and reward models with minimal hyperparameter tuning. We evaluate TRS across text-to-image, molecule and protein design tasks, and obtain significantly improved output samples over the base generative models and other inference-time alignment approaches which optimize the source noise sample, or even the entire reverse-time sampling noise trajectories in the case of diffusion models. Our source code is publicly available.
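A minimal black-box sketch of the trust-region idea as the abstract frames it; the specific propose/expand/shrink rule and all hyperparameters below are assumptions, and `reward_fn` stands in for `reward(model(noise))`:

```python
import numpy as np

def trust_region_noise_search(reward_fn, dim, iters=50, radius=0.5,
                              shrink=0.7, grow=1.3, n_cand=8, seed=0):
    # Keep a best source-noise sample; propose candidates inside a trust
    # region around it; expand the region on improvement (exploitation is
    # paying off), shrink it otherwise (search more locally).
    rng = np.random.default_rng(seed)
    best = rng.standard_normal(dim)
    best_r = reward_fn(best)
    for _ in range(iters):
        cands = best + radius * rng.standard_normal((n_cand, dim))
        rewards = np.array([reward_fn(c) for c in cands])
        i = rewards.argmax()
        if rewards[i] > best_r:
            best, best_r = cands[i], rewards[i]
            radius *= grow
        else:
            radius *= shrink
    return best, best_r
```

Note that nothing here requires the reward to be differentiable or the generative model's internals, which is the property the abstract emphasizes.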

new High-Probability Bounds for SGD under the Polyak-Lojasiewicz Condition with Markovian Noise

Authors: Avik Kar, Siddharth Chandak, Rahul Singh, Eric Moulines, Shalabh Bhatnagar, Nicholas Bambos

Abstract: We present the first uniform-in-time high-probability bound for SGD under the PL condition, where the gradient noise contains both Markovian and martingale difference components. This significantly broadens the scope of finite-time guarantees, as the PL condition arises in many machine learning and deep learning models while Markovian noise naturally arises in decentralized optimization and online system identification problems. We further allow the magnitude of noise to grow with the function value, enabling the analysis of many practical sampling strategies. In addition to the high-probability guarantee, we establish a matching $1/k$ decay rate for the expected suboptimality. Our proof technique relies on the Poisson equation to handle the Markovian noise and a probabilistic induction argument to address the lack of almost-sure bounds on the objective. Finally, we demonstrate the applicability of our framework by analyzing three practical optimization problems: token-based decentralized linear regression, supervised learning with subsampling for privacy amplification, and online system identification.

new Excited Pfaffians: Generalized Neural Wave Functions Across Structure and State

Authors: Nicholas Gao, Till Grutschus, Frank Noé, Stephan Günnemann

Abstract: Neural-network wave functions in Variational Monte Carlo (VMC) have achieved great success in accurately representing both ground and excited states. However, achieving sufficient numerical accuracy in state overlaps requires increasing the number of Monte Carlo samples, and consequently the computational cost, with the number of states. We present a nearly constant sample-size approach, Multi-State Importance Sampling (MSIS), that leverages samples from all states to estimate pairwise overlap. To efficiently evaluate all states for all samples, we introduce Excited Pfaffians. Inspired by Hartree-Fock, this architecture represents many states within a single neural network. Excited Pfaffians also serve as generalized wave functions, allowing a single model to represent multi-state potential energy surfaces. On the carbon dimer, we match the $O(N_s^4)$-scaling natural excited states while training $>200\times$ faster and modeling 50\% more states. Our favorable scaling enables us to be the first to use neural networks to find all distinct energy levels of the beryllium atom. Finally, we demonstrate that a single wave function can represent excited states across various molecules.

new Visualizing Critic Match Loss Landscapes for Interpretation of Online Reinforcement Learning Control Algorithms

Authors: Jingyi Liu, Jian Guo, Eberhard Gill

Abstract: Reinforcement learning has proven its power on various occasions. However, its performance is not always guaranteed when system dynamics change. Instead, it largely relies on users' empirical experience. For reinforcement learning algorithms with an actor-critic structure, the critic neural network reflects the approximation and optimization process in the RL algorithm. Analyzing the performance of the critic neural network helps to understand the mechanism of the algorithm. To support systematic interpretation of such algorithms in dynamic control problems, this work proposes a critic match loss landscape visualization method for online reinforcement learning. The method constructs a loss landscape by projecting recorded critic parameter trajectories onto a low-dimensional linear subspace. The critic match loss is evaluated over the projected parameter grid using fixed reference state samples and temporal-difference targets. This yields a three-dimensional loss surface together with a two-dimensional optimization path that characterizes critic learning behavior. To extend analysis beyond visual inspection, quantitative landscape indices and a normalized system performance index are introduced, enabling structured comparison across different training outcomes. The approach is demonstrated using the Action-Dependent Heuristic Dynamic Programming algorithm on cart-pole and spacecraft attitude control tasks. Comparative analyses across projection methods and training stages reveal distinct landscape characteristics associated with stable convergence and unstable learning. The proposed framework enables both qualitative and quantitative interpretation of critic optimization behavior in online reinforcement learning.

new Learning to Order: Task Sequencing as In-Context Optimization

Authors: Jan Kobiolka, Christian Frey, Arlind Kadra, Gresa Shala, Josif Grabocka

Abstract: Task sequencing (TS) is one of the core open problems in Deep Learning, arising in a plethora of real-world domains, from robotic assembly lines to autonomous driving. Unfortunately, prior work has not convincingly demonstrated the generalization ability of meta-learned TS methods to solve new TS problems, given few initial demonstrations. In this paper, we demonstrate that deep neural networks can meta-learn over an infinite prior of synthetically generated TS problems and achieve a few-shot generalization. We meta-learn a transformer-based architecture over datasets of sequencing trajectories generated from a prior distribution that samples sequencing problems as paths in directed graphs. In a large-scale experiment, we provide ample empirical evidence that our meta-learned models discover optimal task sequences significantly quicker than non-meta-learned baselines.

new CausalEvolve: Towards Open-Ended Discovery with Causal Scratchpad

Authors: Yongqiang Chen, Chenxi Liu, Zhenhao Chen, Tongliang Liu, Bo Han, Kun Zhang

Abstract: Evolve-based agents such as AlphaEvolve are among the notable successes in using Large Language Models (LLMs) to build AI Scientists. These agents tackle open-ended scientific problems by iteratively improving and evolving programs, leveraging the prior knowledge and reasoning capabilities of LLMs. Despite this success, existing evolve-based agents lack targeted guidance for evolution and effective mechanisms for organizing and utilizing knowledge acquired from past evolutionary experience. Consequently, they suffer from decreasing evolution efficiency and exhibit oscillatory behavior when approaching known performance boundaries. To close this gap, we develop CausalEvolve, equipped with a causal scratchpad that leverages LLMs to identify and reason about guiding factors for evolution. At the start, CausalEvolve identifies outcome-level factors that offer complementary inspiration for improving the target objective. During evolution, CausalEvolve also inspects surprising patterns and uses abductive reasoning to hypothesize new factors, which in turn offer novel directions. Through comprehensive experiments, we show that CausalEvolve effectively improves evolutionary efficiency and discovers better solutions in 4 challenging open-ended scientific tasks.

new Adapting Critic Match Loss Landscape Visualization to Off-policy Reinforcement Learning

Authors: Jingyi Liu, Jian Guo, Eberhard Gill

Abstract: This work extends an established critic match loss landscape visualization method from online to off-policy reinforcement learning (RL), aiming to reveal the optimization geometry behind critic learning. Off-policy RL differs from stepwise online actor-critic learning in its replay-based data flow and target computation. Based on these two structural differences, the critic match loss landscape visualization method is adapted to the Soft Actor-Critic (SAC) algorithm by aligning the loss evaluation with its batch-based data flow and target computation, using a fixed replay batch and precomputed critic targets from the selected policy. Critic parameters recorded during training are projected onto a principal component plane, where the critic match loss is evaluated to form a 3-D landscape with an overlaid 2-D optimization path. Applied to a spacecraft attitude control problem, the resulting landscapes are analyzed both qualitatively and quantitatively using sharpness, basin area, and local anisotropy metrics, together with temporal landscape snapshots. Comparisons between convergent SAC, divergent SAC, and divergent Action-Dependent Heuristic Dynamic Programming (ADHDP) cases reveal distinct geometric patterns and optimization behaviors under different algorithmic structures. The results demonstrate that the adapted critic match loss visualization framework serves as a geometric diagnostic tool for analyzing critic optimization dynamics in replay-based off-policy RL-based control problems.

new FlashHead: Efficient Drop-In Replacement for the Classification Head in Language Model Inference

Authors: Wilhelm Tranheden, Shahnawaz Ahmed, Devdatt Dubhashi, Jonna Matthiesen, Hannes von Essen

Abstract: Language models are increasingly adopting smaller architectures optimized for consumer devices. In this setting, inference efficiency is the primary constraint. Meanwhile, vocabulary sizes continue to grow rapidly, making the classification head a critical bottleneck that accounts for up to 60\% of model parameters and 50\% of inference compute. We introduce FlashHead, the first efficient drop-in replacement for the dense classification head that is training-free and hardware-friendly. FlashHead builds on principles from information retrieval, reframing the computation at the output head as a retrieval problem rather than a dense classification over the full vocabulary. FlashHead introduces four key innovations: (1) a balanced clustering scheme that structures vocabulary partitions into compact hardware-efficient tensors, (2) extending multiprobe retrieval to language model heads, enabling thousands of clusters to be scored in parallel, (3) a novel inference-time sampling mechanism that extends retrieval beyond top tokens, enabling probabilistic sampling across the full vocabulary, and (4) selective quantization, enabling effective low-bit computation in the head. Experiments on Llama-3.2, Gemma-3, and Qwen-3 show that FlashHead delivers model-level inference speedups of up to \textbf{1.75x} while maintaining output accuracy compared to the original head. By overcoming the classification head bottleneck, FlashHead establishes a new benchmark for efficient inference and removes a key barrier to developing smaller, capable models for consumer hardware.
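A hypothetical sketch of the retrieval reframing (not FlashHead's actual clustering, multiprobe, or quantization machinery): score cluster centroids first, then compute exact logits only for tokens in the few best clusters instead of all V rows of the head:

```python
import numpy as np

def retrieval_head(h, W, labels, centroids, n_probe=2):
    # h: (d,) hidden state; W: (V, d) output embedding matrix;
    # labels: (V,) cluster id per token; centroids: (C, d) cluster centers.
    probe = np.argsort(centroids @ h)[-n_probe:]   # multiprobe: top clusters
    cand = np.flatnonzero(np.isin(labels, probe))  # candidate token ids
    logits = W[cand] @ h                           # exact scores, small set
    return int(cand[np.argmax(logits)])            # greedy top token id
```

If the clustering groups similar embedding rows, the candidate set is a small fraction of the vocabulary while the greedy token usually matches a full dense head, which is where the compute savings come from.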

new A Multi-Scale Graph Learning Framework with Temporal Consistency Constraints for Financial Fraud Detection in Transaction Networks under Non-Stationary Conditions

Authors: Yiming Lei, Qiannan Shen, Junhao Song

Abstract: Financial fraud detection in transaction networks involves modeling sparse anomalies, dynamic patterns, and severe class imbalance in the presence of temporal drift. In real-world transaction systems, a suspicious transaction is rarely isolated: legitimate and suspicious transactions are often connected through accounts, intermediaries, or temporal transaction sequences. Attribute-based or randomly partitioned learning pipelines are therefore insufficient to detect relationally structured fraud. We propose STC-MixHop, a graph-based framework combining spatial multi-resolution propagation with lightweight temporal consistency modeling for anomaly and fraud detection in dynamic transaction networks. It integrates three components: a MixHop-inspired multi-scale neighborhood diffusion encoder for learning structural patterns; a spatial-temporal attention module coupling current and preceding graph snapshots to stabilize representations; and a temporally informed self-supervised pretraining strategy exploiting unlabeled transaction interactions to improve representation quality. We evaluate the framework primarily on the PaySim dataset under strict chronological splits, supplementing the analysis with Porto Seguro and FEMA data to probe cross-domain component behavior. Results show that STC-MixHop is competitive among graph methods and achieves strong screening-oriented recall under highly imbalanced conditions. The experiments also reveal an important boundary condition: when node attributes are highly informative, tabular baselines remain difficult to outperform. Graph structure contributes most clearly where hidden relational dependencies are operationally important. These findings support a stability-focused view of graph learning for financial fraud detection.

new A Loss Landscape Visualization Framework for Interpreting Reinforcement Learning: An ADHDP Case Study

Authors: Jingyi Liu, Jian Guo, Eberhard Gill

Abstract: Reinforcement learning algorithms have been widely used in dynamic and control systems. However, interpreting their internal learning behavior remains a challenge. In the authors' previous work, a critic match loss landscape visualization method was proposed to study critic training. This study extends that method into a framework that provides a multi-perspective view of the learning dynamics, clarifying how value estimation, policy optimization, and temporal-difference (TD) signals interact during training. The proposed framework includes four complementary components: a three-dimensional reconstruction of the critic match loss surface that shows how TD targets shape the optimization geometry; an actor loss landscape under a frozen critic that reveals how the policy exploits that geometry; a trajectory combining time, Bellman error, and policy weights that indicates how updates move across the surface; and a state-TD map that identifies the state regions that drive those updates. The Action-Dependent Heuristic Dynamic Programming (ADHDP) algorithm for spacecraft attitude control is used as a case study. The framework is applied to compare several ADHDP variants and shows how training stabilizers and target updates change the optimization landscape and affect learning stability. Overall, the proposed framework provides a systematic and interpretable tool for analyzing reinforcement learning behavior across algorithmic designs.
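The surface-reconstruction component can be sketched generically: evaluate the loss on a 2-D grid spanned by two random directions around the current weights. This is a standard landscape-visualization recipe, not the paper's critic-match construction, and the scale-matching heuristic is an assumption:

```python
import numpy as np

def loss_surface(loss_fn, w, steps=21, scale=1.0, seed=0):
    """Evaluate loss_fn on a grid spanned by two random directions
    around weights w. Directions are scale-matched to ||w||; a
    generic sketch of loss-landscape visualization."""
    rng = np.random.default_rng(seed)
    d1 = rng.standard_normal(w.shape)
    d2 = rng.standard_normal(w.shape)
    d1 *= np.linalg.norm(w) / np.linalg.norm(d1)
    d2 *= np.linalg.norm(w) / np.linalg.norm(d2)
    alphas = np.linspace(-scale, scale, steps)
    Z = np.array([[loss_fn(w + a * d1 + b * d2) for b in alphas]
                  for a in alphas])
    return alphas, Z

# Quadratic toy 'critic loss'; the grid center sits at the current
# weights w0, so Z[10, 10] equals the loss at w0 itself.
w0 = np.ones(4)
alphas, Z = loss_surface(lambda v: float(np.sum(v ** 2)), w0)
print(Z.shape)
```

The resulting grid Z is what a 3-D surface plot would render; comparing such grids across algorithm variants is the kind of side-by-side analysis the framework performs.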

new Delightful Policy Gradient

Authors: Ian Osband

Abstract: Standard policy gradients weight each sampled action by advantage alone, regardless of how likely that action was under the current policy. This creates two pathologies: within a single decision context (e.g. one image or prompt), a rare negative-advantage action can disproportionately distort the update direction; across many such contexts in a batch, the expected gradient over-allocates budget to contexts the policy already handles well. We introduce the \textit{Delightful Policy Gradient} (DG), which gates each term with a sigmoid of \emph{delight}, the product of advantage and action surprisal (negative log-probability). For $K$-armed bandits, DG provably improves directional accuracy in a single context and, across multiple contexts, shifts the expected gradient strictly closer to the supervised cross-entropy oracle. This second effect is not variance reduction: it persists even with infinite samples. Empirically, DG outperforms REINFORCE, PPO, and advantage-weighted baselines across MNIST, transformer sequence modeling, and continuous control, with larger gains on harder tasks.
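The gating rule described above can be sketched directly from the abstract's definitions; any scaling constants beyond the plain sigmoid are assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dg_weights(advantages, probs):
    """Per-action weights for the gated policy gradient: each term is
    gated by a sigmoid of 'delight' = advantage * surprisal, with
    surprisal = -log p(action). A sketch of the rule as stated in
    the abstract, not a full training loop."""
    surprisal = -np.log(probs)
    delight = advantages * surprisal
    return sigmoid(delight) * advantages

# A rare negative-advantage action (p = 0.01) is strongly damped
# relative to plain advantage weighting, limiting its distortion
# of the update direction within a single context.
adv = np.array([1.0, -1.0])
p = np.array([0.5, 0.01])
w = dg_weights(adv, p)
print(w)
```

Note the asymmetry: a rare action with negative advantage has large negative delight, so its gate collapses toward zero, while likely actions with positive advantage keep moderate weight.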

new Proactive Routing to Interpretable Surrogates with Distribution-Free Safety Guarantees

Authors: Iqtedar Uddin, Mazin Khider, Andr\'e Bauer

Abstract: Model routing determines whether to use an accurate black-box model or a simpler surrogate that approximates it at lower cost or greater interpretability. In deployment settings, practitioners often wish to restrict surrogate use to inputs where its degradation relative to a reference model is controlled. We study proactive (input-based) routing, in which a lightweight gate selects the model before either runs, enabling distribution-free control of the fraction of routed inputs whose degradation exceeds a tolerance {\tau}. The gate is trained to distinguish safe from unsafe inputs, and a routing threshold is chosen via Clopper-Pearson conformal calibration on a held-out set, guaranteeing that the routed-set violation rate is at most {\alpha} with probability 1-{\delta}. We derive a feasibility condition linking safe routing to the base safe rate {\pi} and risk budget {\alpha}, along with sufficient AUC thresholds ensuring that feasible routing exists. Across 35 OpenML datasets and multiple black-box model families, gate-based conformal routing maintains controlled violation while achieving substantially higher coverage than regression conformal and naive baselines. We further show that probabilistic calibration primarily affects routing efficiency rather than distribution-free validity.
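The calibration step can be illustrated with a standalone Clopper-Pearson upper bound on the routed-set violation rate, here computed by bisection on the exact binomial CDF rather than a beta quantile; the numbers are illustrative, not from the paper:

```python
import math

def binom_cdf(k, n, p):
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k + 1))

def clopper_pearson_upper(k, n, delta=0.05):
    """One-sided Clopper-Pearson upper confidence bound on a binomial
    proportion: k violations observed among n routed calibration
    points. Bisection on p, since the CDF is decreasing in p."""
    if k == n:
        return 1.0
    lo, hi = k / n, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > delta:
            lo = mid
        else:
            hi = mid
    return hi

# A routing threshold is accepted only if the bound stays under alpha.
alpha = 0.10
ub = clopper_pearson_upper(k=3, n=100, delta=0.05)
print(round(ub, 3), ub <= alpha)
```

Because the bound is distribution-free, the guarantee holds for any input distribution; only the held-out calibration sample needs to be exchangeable with deployment inputs.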

new Anterior's Approach to Fairness Evaluation of Automated Prior Authorization System

Authors: Sai P. Selvaraj, Khadija Mahmoud, Anuj Iravane

Abstract: Growing staffing constraints and turnaround-time pressures in prior authorization (PA) have led to increasing automation of decision systems to support PA review. Evaluating fairness in such systems poses unique challenges because legitimate clinical guidelines and medical necessity criteria often differ across demographic groups, making parity in approval rates an inappropriate fairness metric. We propose a fairness evaluation framework for prior authorization models based on model error rates rather than approval outcomes. Using 7,166 human-reviewed cases spanning 27 medical necessity guidelines, we assessed consistency across sex, age, race/ethnicity, and socioeconomic status. Our evaluation combined error-rate comparisons, tolerance-band analysis with a predefined $\pm$5 percentage-point margin, statistical power evaluation, and protocol-controlled logistic regression. Across most demographics, model error rates were consistent, and confidence intervals fell within the predefined tolerance band, indicating no meaningful performance differences. For race/ethnicity, point estimates remained small, but subgroup sample sizes were limited, resulting in wide confidence intervals and underpowered tests, leaving the evidence inconclusive within the dataset we explored. These findings illustrate a rigorous and regulator-aligned approach to fairness evaluation in administrative healthcare AI systems.
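The tolerance-band analysis amounts to checking whether a confidence interval for the between-group error-rate gap sits entirely inside the predefined margin. A generic sketch with illustrative (not the paper's) group sizes and rates:

```python
import math

def error_rate_gap_ci(err_a, n_a, err_b, n_b, z=1.96):
    """95% normal-approximation CI for the gap in model error rates
    between two demographic groups. A generic sketch of the
    tolerance-band analysis; inputs here are illustrative."""
    gap = err_a - err_b
    se = math.sqrt(err_a * (1 - err_a) / n_a + err_b * (1 - err_b) / n_b)
    return gap - z * se, gap + z * se

def within_band(lo, hi, band=0.05):
    # Consistency holds when the whole CI sits inside the +/-5-point band
    return -band <= lo and hi <= band

lo, hi = error_rate_gap_ci(0.08, 2000, 0.07, 2000)
print(round(lo, 3), round(hi, 3), within_band(lo, hi))
```

The same machinery also explains the race/ethnicity caveat: with small subgroup n, the standard error grows, the interval exits the band, and the test becomes inconclusive rather than a finding of disparity.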

new A Methodology for Thermal Limit Bias Predictability Through Artificial Intelligence

Authors: Anirudh Tunga, Michael J. Mueterthies, Jonathan Nistor

Abstract: Nuclear power plant operators face significant challenges due to unpredictable deviations between offline and online thermal limits, a phenomenon known as thermal limit bias, which leads to conservative design margins, increased fuel costs, and operational inefficiencies. This work presents a deep learning-based methodology to predict and correct this bias for Boiling Water Reactors (BWRs), focusing on the Maximum Fraction of Limiting Power Density (MFLPD) metric used to track the Linear Heat Generation Rate (LHGR) limit. The proposed model employs a fully convolutional encoder-decoder architecture, incorporating a feature fusion network to predict corrected MFLPD values closer to online measurements. Evaluated across five independent fuel cycles, the model reduces the mean nodal array error by 74 percent, the mean absolute deviation in limiting values by 72 percent, and the maximum bias by 52 percent compared to offline methods. These results demonstrate the model's potential to meaningfully improve fuel cycle economics and operational planning, and a commercial variant has been deployed at multiple operating BWRs.

new EARCP: Self-Regulating Coherence-Aware Ensemble Architecture for Sequential Decision Making -- Ensemble Auto-R\'egul\'e par Coh\'erence et Performance

Authors: Mike Amega

Abstract: We present EARCP (Ensemble Auto-R\'egul\'e par Coh\'erence et Performance), a novel ensemble architecture that dynamically weights heterogeneous expert models based on both their individual performance and inter-model coherence. Unlike traditional ensemble methods that rely on static or offline-learned combinations, EARCP continuously adapts model weights through a principled online learning mechanism that balances exploitation of high-performing models with exploration guided by consensus signals. The architecture combines theoretical foundations from multiplicative weight update algorithms with a novel coherence-based regularization term, providing both theoretical guarantees through regret bounds and practical robustness in non-stationary environments. We formalize the EARCP framework, prove sublinear regret bounds of O(sqrt(T log M)) under standard assumptions, and demonstrate its effectiveness through empirical evaluation on sequential prediction tasks including time series forecasting, activity recognition, and financial prediction. The architecture is designed as a general-purpose framework applicable to any domain requiring ensemble learning with temporal dependencies. An open-source implementation is available at https://github.com/Volgat/earcp and via PyPI (pip install earcp).
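The weighting mechanism can be sketched as a multiplicative-weights step with an added coherence bonus. This is a hypothetical reading of the abstract: the learning rate, the bonus coefficient, and the exact form of the coherence term are all assumptions, not EARCP's published update:

```python
import numpy as np

def earcp_update(weights, losses, coherence, eta=0.5, gamma=0.1):
    """One multiplicative-weights step with a coherence bonus: experts
    are exponentially down-weighted by loss and nudged toward models
    that agree with the ensemble consensus. Hypothetical sketch;
    eta, gamma, and the bonus form are assumptions."""
    w = weights * np.exp(-eta * losses + gamma * coherence)
    return w / w.sum()

# Three experts: expert 0 has low loss and high consensus agreement,
# so its weight grows over repeated rounds.
w = np.ones(3) / 3
losses = np.array([0.1, 0.5, 0.9])
coherence = np.array([0.8, 0.5, 0.2])
for _ in range(10):
    w = earcp_update(w, losses, coherence)
print(w)
```

The exponential form is what yields the standard O(sqrt(T log M)) regret guarantee for M experts over T rounds; the coherence bonus only perturbs the exponent, which is why sublinear regret can survive it under bounded-signal assumptions.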

URLs: https://github.com/Volgat/earcp

new \texttt{BayesBreak}: Generalized Hierarchical Bayesian Segmentation with Irregular Designs, Multi-Sample Hierarchies, and Grouped/Latent-Group Designs

Authors: Omid Shams Solari

Abstract: Bayesian change-point and segmentation models provide uncertainty-aware piecewise-constant representations of ordered data, but exact inference is often tied to narrow likelihood classes, single-sequence settings, or index-uniform designs. We present \texttt{BayesBreak}, a modular offline Bayesian segmentation framework built around a simple separation: each candidate block contributes a marginal likelihood and any required moment numerators, and a global dynamic program combines those block scores into posterior quantities over segment counts, boundary locations, and latent signals. For weighted exponential-family likelihoods with conjugate priors, block evidences and posterior moments are available in closed form from cumulative sufficient statistics, yielding exact sum-product inference for $P(y\mid k)$, $P(k\mid y)$, boundary marginals, and Bayes regression curves. We also distinguish these quantities from the \emph{joint} MAP segmentation, which is recovered by a separate max-sum backtracking recursion.
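The max-sum recursion mentioned at the end can be sketched in isolation. For simplicity this uses a plain sum-of-squared-errors block cost rather than the conjugate marginal likelihoods BayesBreak plugs in, so it recovers a joint MAP segmentation, not the posterior quantities:

```python
import numpy as np

def map_segmentation(y, k, block_cost):
    """Max-sum dynamic program: split y into k contiguous blocks
    minimizing total block cost, then backtrack the boundaries.
    A minimal sketch of the framework's DP layer."""
    n = len(y)
    INF = float("inf")
    cost = [[block_cost(y[i:j]) for j in range(n + 1)]
            for i in range(n + 1)]
    best = [[INF] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for seg in range(1, k + 1):
        for j in range(1, n + 1):
            for i in range(seg - 1, j):
                c = best[seg - 1][i] + cost[i][j]
                if c < best[seg][j]:
                    best[seg][j], back[seg][j] = c, i
    bounds, j = [], n                    # backtrack boundary indices
    for seg in range(k, 0, -1):
        i = back[seg][j]
        bounds.append(i)
        j = i
    return sorted(bounds[:-1])

def sse(block):
    b = np.asarray(block, dtype=float)
    return float(np.sum((b - b.mean()) ** 2)) if len(b) else float("inf")

y = [0.0, 0.1, -0.1, 5.0, 5.1, 4.9]
print(map_segmentation(y, 2, sse))
```

Swapping `sse` for a closed-form block evidence (and sums for log-sum-exps) turns the same recursion into the sum-product pass that yields boundary marginals and $P(y\mid k)$.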

new AgentTrace: Causal Graph Tracing for Root Cause Analysis in Deployed Multi-Agent Systems

Authors: Zhaohui Geoffrey Wang

Abstract: As multi-agent AI systems are increasingly deployed in real-world settings - from automated customer support to DevOps remediation - failures become harder to diagnose due to cascading effects, hidden dependencies, and long execution traces. We present AgentTrace, a lightweight causal tracing framework for post-hoc failure diagnosis in deployed multi-agent workflows. AgentTrace reconstructs causal graphs from execution logs, traces backward from error manifestations, and ranks candidate root causes using interpretable structural and positional signals - without requiring LLM inference at debugging time. Across a diverse benchmark of multi-agent failure scenarios designed to reflect common deployment patterns, AgentTrace localizes root causes with high accuracy and sub-second latency, significantly outperforming both heuristic and LLM-based baselines. Our results suggest that causal tracing provides a practical foundation for improving the reliability and trustworthiness of agentic systems in the wild.
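The core backward-tracing step can be sketched as reachability over the reconstructed causal graph, ranking ancestors by distance from the error manifestation. This is a simplified stand-in: AgentTrace's structural and positional scoring is richer, and the node names below are hypothetical:

```python
from collections import defaultdict, deque

def trace_root_causes(edges, error_node):
    """Backward BFS from the error node over causal edges (src, dst),
    returning candidate root causes ordered by causal distance
    (closer ancestors first). A simplified sketch of the idea."""
    parents = defaultdict(list)
    for src, dst in edges:
        parents[dst].append(src)
    dist = {error_node: 0}
    q = deque([error_node])
    while q:
        node = q.popleft()
        for p in parents[node]:
            if p not in dist:
                dist[p] = dist[node] + 1
                q.append(p)
    ancestors = {n: d for n, d in dist.items() if n != error_node}
    return sorted(ancestors, key=lambda n: ancestors[n])

# Hypothetical 4-step workflow; the error surfaces at 'reply'.
edges = [("planner", "retriever"), ("retriever", "summarizer"),
         ("summarizer", "reply")]
print(trace_root_causes(edges, "reply"))
```

Because this is pure graph traversal over logs, it runs in sub-second time with no LLM call, which is the latency advantage the abstract emphasizes.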

new Chain-of-Trajectories: Unlocking the Intrinsic Generative Optimality of Diffusion Models via Graph-Theoretic Planning

Authors: Ping Chen, Xiang Liu, Xingpeng Zhang, Fei Shen, Xun Gong, Zhaoxiang Liu, Zezhou Chen, Huan Hu, Kai Wang, Shiguo Lian

Abstract: Diffusion models operate in a reflexive System 1 mode, constrained by a fixed, content-agnostic sampling schedule. This rigidity arises from the curse of state dimensionality, where the combinatorial explosion of possible states in the high-dimensional noise manifold renders explicit trajectory planning intractable and leads to systematic computational misallocation. To address this, we introduce Chain-of-Trajectories (CoTj), a train-free framework enabling System 2 deliberative planning. Central to CoTj is Diffusion DNA, a low-dimensional signature that quantifies per-stage denoising difficulty and serves as a proxy for the high-dimensional state space, allowing us to reformulate sampling as graph planning on a directed acyclic graph. Through a Predict-Plan-Execute paradigm, CoTj dynamically allocates computational effort to the most challenging generative phases. Experiments across multiple generative models demonstrate that CoTj discovers context-aware trajectories, improving output quality and stability while reducing redundant computation. This work establishes a new foundation for resource-aware, planning-based diffusion modeling. The code is available at https://github.com/UnicomAI/CoTj.

URLs: https://github.com/UnicomAI/CoTj

new Cross-RAG: Zero-Shot Retrieval-Augmented Time Series Forecasting via Cross-Attention

Authors: Seunghan Lee, Jaehoon Lee, Jun Seo, Sungdong Yoo, Minjae Kim, Tae Yoon Lim, Dongwan Kang, Hwanil Choi, SoonYoung Lee, Wonbin Ahn

Abstract: Recent advances in time series foundation models (TSFMs) demonstrate strong expressive capacity through large-scale pretraining across diverse time series domains. Zero-shot time series forecasting with TSFMs, however, exhibits limited generalization to unseen datasets, which retrieval-augmented forecasting addresses by leveraging an external knowledge base. Existing approaches rely on a fixed number of retrieved samples that may introduce irrelevant information. To this end, we propose Cross-RAG, a zero-shot retrieval-augmented forecasting framework that selectively attends to query-relevant retrieved samples. Cross-RAG models input-level relevance between the query and retrieved samples via query-retrieval cross-attention, while jointly incorporating information from the query and retrieved samples. Extensive experiments demonstrate that Cross-RAG consistently improves zero-shot forecasting performance across various TSFMs and RAG methods, and additional analyses confirm its effectiveness across diverse retrieval scenarios. Code is available at https://github.com/seunghan96/cross-rag/.

URLs: https://github.com/seunghan96/cross-rag/

new Training-Free Generation of Protein Sequences from Small Family Alignments via Stochastic Attention

Authors: Jeffrey D. Varner

Abstract: Most protein families have fewer than 100 known members, a regime where deep generative models overfit or collapse. We propose stochastic attention (SA), a training-free sampler that treats the modern Hopfield energy over a protein alignment as a Boltzmann distribution and draws samples via Langevin dynamics. The score function is a closed-form softmax attention operation requiring no training, no pretraining data, and no GPU, with cost linear in alignment size. Across eight Pfam families, SA generates sequences with low amino acid compositional divergence, substantial novelty, and structural plausibility confirmed by ESMFold and AlphaFold2. Generated sequences fold more faithfully to canonical family structures than natural members in six of eight families. Against profile HMMs, EvoDiff, and the MSA Transformer, which produce sequences that drift far outside the family, SA maintains 51 to 66 percent identity while remaining novel, in seconds on a laptop. The critical temperature governing generation is predicted from PCA dimensionality alone, enabling fully automatic operation. Controls confirm SA encodes correlated substitution patterns, not just per-position amino acid frequencies.
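The training-free sampler can be sketched concretely: treat a modern Hopfield energy over the stored (embedded) sequences as a Boltzmann distribution and run Langevin dynamics, where the score is a closed-form softmax attention read-out. This is a hedged toy in a 2-D continuous space; the paper's exact energy, temperature schedule, and sequence encoding may differ:

```python
import numpy as np

def langevin_attention_sample(patterns, steps=200, beta=4.0,
                              eps=0.05, temp=0.05, seed=0):
    """Draw one sample via Langevin dynamics on a modern Hopfield
    energy over stored patterns. The score is softmax attention
    over the patterns minus the current state, so no training is
    required. A hedged sketch, not the paper's full pipeline."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(patterns.shape[1])
    for _ in range(steps):
        logits = beta * patterns @ x
        attn = np.exp(logits - logits.max())
        attn /= attn.sum()
        score = patterns.T @ attn - x        # negative energy gradient
        x = (x + eps * score
             + np.sqrt(2 * eps * temp) * rng.standard_normal(x.shape))
    return x

# Two toy 'family profiles'; the chain settles near one of them,
# with the temperature controlling how far samples may wander.
patterns = np.array([[1.0, 0.0], [0.0, 1.0]])
s = langevin_attention_sample(patterns)
print(s)
```

The temperature plays the role of the critical temperature in the abstract: too low and samples collapse onto training members (no novelty); too high and they drift outside the family.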

new Multimodal Deep Learning for Early Prediction of Patient Deterioration in the ICU: Integrating Time-Series EHR Data with Clinical Notes

Authors: Binesh Sadanandan

Abstract: Early identification of patients at risk for clinical deterioration in the intensive care unit (ICU) remains a critical challenge. Delayed recognition of impending adverse events, including mortality, vasopressor initiation, and mechanical ventilation, contributes to preventable morbidity and mortality. We present a multimodal deep learning approach that combines structured time-series data (vital signs and laboratory values) with unstructured clinical notes to predict patient deterioration within 24 hours. Using the MIMIC-IV database, we constructed a cohort of 74,822 ICU stays and generated 5.7 million hourly prediction samples. Our architecture employs a bidirectional LSTM encoder for temporal patterns in physiologic data and ClinicalBERT embeddings for clinical notes, fused through a cross-modal attention mechanism. We also present a systematic review of existing approaches to ICU deterioration prediction, identifying 31 studies published between 2015 and 2024. Most existing models rely solely on structured data and achieve area under the curve (AUC) values between 0.70 and 0.85. Studies incorporating clinical notes remain rare but show promise for capturing information not present in structured fields. Our multimodal model achieves a test AUROC of 0.7857 and AUPRC of 0.1908 on 823,641 held-out samples, with a validation-to-test gap of only 0.6 percentage points. Ablation analysis validates the multimodal approach: clinical notes improve AUROC by 2.5 percentage points and AUPRC by 39.2% relative to a structured-only baseline, while deep learning models consistently outperform classical baselines (XGBoost AUROC: 0.7486, logistic regression: 0.7171). This work contributes both a thorough review of the field and a reproducible multimodal framework for clinical deterioration prediction.

new DeFRiS: Silo-Cooperative IoT Applications Scheduling via Decentralized Federated Reinforcement Learning

Authors: Zhiyu Wang, Mohammad Goudarzi, Mingming Gong, Rajkumar Buyya

Abstract: Next-generation IoT applications increasingly span across autonomous administrative entities, necessitating silo-cooperative scheduling to leverage diverse computational resources while preserving data privacy. However, realizing efficient cooperation faces significant challenges arising from infrastructure heterogeneity, Non-IID workload shifts, and the inherent risks of adversarial environments. Existing approaches, relying predominantly on centralized coordination or independent learning, fail to address the incompatibility of state-action spaces across heterogeneous silos and lack robustness against malicious attacks. This paper proposes DeFRiS, a Decentralized Federated Reinforcement Learning framework for robust and scalable Silo-cooperative IoT application scheduling. DeFRiS integrates three synergistic innovations: (i) an action-space-agnostic policy utilizing candidate resource scoring to enable seamless knowledge transfer across heterogeneous silos; (ii) a silo-optimized local learning mechanism combining Generalized Advantage Estimation (GAE) with clipped policy updates to resolve sparse delayed reward challenges; and (iii) a Dual-Track Non-IID robust decentralized aggregation protocol leveraging gradient fingerprints for similarity-aware knowledge transfer and anomaly detection, and gradient tracking for optimization momentum. Extensive experiments on a distributed testbed with 20 heterogeneous silos and realistic IoT workloads demonstrate that DeFRiS significantly outperforms state-of-the-art baselines, reducing average response time by 6.4% and energy consumption by 7.2%, while lowering tail latency risk (CVaR$_{0.95}$) by 10.4% and achieving near-zero deadline violations. Furthermore, DeFRiS achieves over 3 times better performance retention as the system scales and over 8 times better stability in adversarial environments compared to the best-performing baseline.

new GNNVerifier: Graph-based Verifier for LLM Task Planning

Authors: Yu Hao, Qiuyu Wang, Cheng Yang, Yawen Li, Zhiqiang Zhang, Chuan Shi

Abstract: Large language models (LLMs) facilitate the development of autonomous agents. As a core component of such agents, task planning aims to decompose complex natural language requests into concrete, solvable sub-tasks. Since LLM-generated plans are frequently prone to hallucinations and sensitive to long-context prompts, recent research has introduced plan verifiers to identify and correct potential flaws. However, most existing approaches still rely on an LLM as the verifier via additional prompting for plan review or self-reflection. LLM-based verifiers can be misled by plausible narration and struggle to detect failures caused by structural relations across steps, such as type mismatches, missing intermediates, or broken dependencies. To address these limitations, we propose a graph-based verifier for LLM task planning. Specifically, the proposed method has four major components. Firstly, we represent a plan as a directed graph with enriched attributes, where nodes denote sub-tasks and edges encode execution order and dependency constraints. Secondly, a graph neural network (GNN) performs structural evaluation and diagnosis, producing a graph-level plausibility score for plan acceptance as well as node/edge-level risk scores to localize erroneous regions. Thirdly, we construct controllable perturbations from ground truth plan graphs, and automatically generate training data with fine-grained annotations. Finally, guided by the feedback from our GNN verifier, we enable an LLM to conduct local edits (e.g., tool replacement or insertion) to correct the plan when the graph-level score is insufficient. Extensive experiments across diverse datasets, backbone LLMs, and planners demonstrate that our GNNVerifier achieves significant gains in improving plan quality. Our data and code are available at https://github.com/BUPT-GAMMA/GNNVerifier.

URLs: https://github.com/BUPT-GAMMA/GNNVerifier

new CAMD: Coverage-Aware Multimodal Decoding for Efficient Reasoning of Multimodal Large Language Models

Authors: Huijie Guo, Jingyao Wang, Lingyu Si, Jiahuan Zhou, Changwen Zheng, Wenwen Qiang

Abstract: Recent advances in Multimodal Large Language Models (MLLMs) have shown impressive reasoning capabilities across vision-language tasks, yet still face the challenge of compute-difficulty mismatch. Through empirical analyses, we identify that existing decoding methods may waste compute on easy cases while underserving hard ones, affecting both model effectiveness and efficiency. To address this issue, we first develop a theoretical framework that links sampling coverage, instance difficulty, and residual risk. Our analysis reveals that multimodal reasoning exhibits a heavy-tailed difficulty distribution; a small subset of hard or ambiguous samples dominates the residual failure probability. Based on this insight, we propose Coverage-Aware Multimodal Decoding (CAMD), an adaptive inference mechanism that dynamically allocates computation according to estimated uncertainty. CAMD integrates evidence-weighted scoring, posterior coverage estimation, and sequential Bayesian updating to balance efficiency and reliability under a limited token budget. Experiments on various benchmark datasets and baselines demonstrate the effectiveness and advantages of our approach.

new Understanding the geometry of deep learning with decision boundary volume

Authors: Matthew Burfitt, Jacek Brodzki, Pawel D{\l}otko

Abstract: For classification tasks, the performance of a deep neural network is determined by the structure of its decision boundary, whose geometry directly affects essential properties of the model, including accuracy and robustness. Motivated by a classical tube formula due to Weyl, we introduce a method to measure the decision boundary of a neural network through local surface volumes, providing a theoretically justified and efficient measure that enables a geometric interpretation of the effectiveness of the model, applicable to the high-dimensional feature spaces considered in deep learning. A smaller surface volume is expected to correspond to lower model complexity and better generalisation. We verify, on a number of image processing tasks with convolutional architectures, that decision boundary volume is inversely proportional to classification accuracy. Meanwhile, the relationship between local surface volume and generalisation for fully connected architectures is observed to be less stable between tasks. Therefore, for network architectures suited to a particular data structure, we demonstrate that smoother decision boundaries lead to better performance, as our intuition would suggest.

new POLCA: Stochastic Generative Optimization with LLM

Authors: Xuanfei Ren, Allen Nie, Tengyang Xie, Ching-An Cheng

Abstract: Optimizing complex systems, ranging from LLM prompts to multi-turn agents, traditionally requires labor-intensive manual iteration. We formalize this challenge as a stochastic generative optimization problem where a generative language model acts as the optimizer, guided by numerical rewards and text feedback to discover the best system. We introduce Prioritized Optimization with Local Contextual Aggregation (POLCA), a scalable framework designed to handle stochasticity in optimization -- such as noisy feedback, sampling minibatches, and stochastic system behaviors -- while effectively managing the unconstrained expansion of solution space. POLCA maintains a priority queue to manage the exploration-exploitation tradeoff, systematically tracking candidate solutions and their evaluation histories. To enhance efficiency, we integrate an $\varepsilon$-Net mechanism to maintain parameter diversity and an LLM Summarizer to perform meta-learning across historical trials. We theoretically prove that POLCA converges to near-optimal candidate solutions under stochasticity. We evaluate our framework on diverse benchmarks, including $\tau$-bench, HotpotQA (agent optimization), VeriBench (code translation) and KernelBench (CUDA kernel generation). Experimental results demonstrate that POLCA achieves robust, sample and time-efficient performance, consistently outperforming state-of-the-art algorithms in both deterministic and stochastic problems. The codebase for this work is publicly available at https://github.com/rlx-lab/POLCA.
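The priority-queue bookkeeping at POLCA's core can be sketched in a few lines: candidates are keyed by their running mean reward, and stale heap entries are lazily discarded. This is a hypothetical skeleton of the data structure only; the LLM-driven proposal loop, $\varepsilon$-Net, and Summarizer are omitted:

```python
import heapq

class CandidatePool:
    """Priority queue over candidate solutions keyed by mean observed
    reward, with per-candidate evaluation history. A hypothetical
    sketch of POLCA-style bookkeeping under noisy feedback."""
    def __init__(self):
        self._heap = []       # max-heap via negated mean rewards
        self._history = {}    # candidate -> list of observed rewards

    def report(self, candidate, reward):
        self._history.setdefault(candidate, []).append(reward)
        rewards = self._history[candidate]
        heapq.heappush(self._heap, (-sum(rewards) / len(rewards), candidate))

    def best(self):
        # Lazily skip heap entries whose mean has since changed
        while self._heap:
            neg_mean, cand = self._heap[0]
            rewards = self._history[cand]
            mean = sum(rewards) / len(rewards)
            if abs(-neg_mean - mean) < 1e-12:
                return cand, mean
            heapq.heappop(self._heap)
        return None

pool = CandidatePool()
pool.report("prompt-A", 0.4)
pool.report("prompt-B", 0.9)
pool.report("prompt-B", 0.1)   # noisy feedback drags B's mean down
print(pool.best())
```

Re-evaluating a candidate updates its mean in place, which is how the queue keeps the exploration-exploitation ranking honest under stochastic rewards.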

URLs: https://github.com/rlx-lab/POLCA

new HO-SFL: Hybrid-Order Split Federated Learning with Backprop-Free Clients and Dimension-Free Aggregation

Authors: Qiyuan Chen, Xian Wu, Yi Wang, Xianhao Chen

Abstract: Fine-tuning large models on edge devices is severely hindered by the memory-intensive backpropagation (BP) in standard frameworks like federated learning and split learning. While substituting BP with zeroth-order optimization can significantly reduce memory footprints, it typically suffers from prohibitively degraded convergence speed. To resolve this dilemma, we propose Hybrid-Order Split Federated Learning (HO-SFL). By reformulating the split learning process within a Lagrangian framework, HO-SFL decouples the optimization landscape: The server performs precise first-order updates (i.e., BP), whereas clients conduct memory-efficient zeroth-order optimization. This hybrid design not only eliminates the need for client-side BP but also enables dimension-free model aggregation, drastically lowering communication costs. Crucially, we provide a theoretical convergence analysis, demonstrating that HO-SFL mitigates the dimension-dependent convergence slowdown of zeroth-order optimization, achieving a convergence rate comparable to first-order methods. Extensive experiments on tasks across vision and language modalities validate that HO-SFL achieves convergence speeds comparable to first-order baselines while significantly reducing communication costs and client memory footprints.
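The client-side idea of replacing backpropagation with loss probes can be sketched with a standard two-point zeroth-order gradient estimator. This is a generic sketch of that building block; the Lagrangian split coupling and dimension-free aggregation specific to HO-SFL are omitted:

```python
import numpy as np

def zo_gradient(loss_fn, w, mu=1e-3, samples=32, seed=0):
    """Two-point zeroth-order gradient estimate: probe the loss along
    random directions instead of running backpropagation, so only
    forward passes (and no activation storage) are needed."""
    rng = np.random.default_rng(seed)
    g = np.zeros_like(w)
    for _ in range(samples):
        u = rng.standard_normal(w.shape)
        g += (loss_fn(w + mu * u) - loss_fn(w - mu * u)) / (2 * mu) * u
    return g / samples

# On a quadratic loss the estimate aligns with the true gradient 2w.
w = np.array([1.0, -2.0, 0.5])
g = zo_gradient(lambda v: float(np.sum(v ** 2)), w)
cos = float(g @ (2 * w) / (np.linalg.norm(g) * np.linalg.norm(2 * w)))
print(cos)
```

The estimator's variance grows with parameter dimension, which is exactly the slowdown the paper's hybrid design mitigates by keeping precise first-order updates on the server side.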

new Orthogonal Subspace Clustering: Enhancing High-Dimensional Data Analysis through Adaptive Dimensionality Reduction and Efficient Clustering

Authors: Qing-Yuan Wen, Da-Qing Zhang

Abstract: This paper presents Orthogonal Subspace Clustering (OSC), an innovative method for high-dimensional data clustering. We first establish a theorem proving that high-dimensional data can be decomposed into orthogonal subspaces in a statistical sense, whose form exactly matches the paradigm of Q-type factor analysis. This theorem lays a solid mathematical foundation for dimensionality reduction via matrix decomposition and factor analysis. Based on this theorem, we propose the OSC framework to address the "curse of dimensionality" -- a critical challenge that degrades clustering effectiveness due to sample sparsity and ineffective distance metrics. OSC integrates orthogonal subspace construction with classical clustering techniques, introducing a data-driven mechanism to select the subspace dimension based on cumulative variance contribution. This avoids manual selection biases while maximizing the retention of discriminative information. By projecting high-dimensional data into an uncorrelated, low-dimensional orthogonal subspace, OSC significantly improves clustering efficiency, robustness, and accuracy. Extensive experiments on various benchmark datasets demonstrate the effectiveness of OSC, with thorough analysis of evaluation metrics including Cluster Accuracy (ACC), Normalized Mutual Information (NMI), and Adjusted Rand Index (ARI) highlighting its advantages over existing methods.
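The data-driven dimension selection can be sketched as choosing the smallest number of principal directions whose cumulative explained variance clears a threshold; the 0.9 default below is an illustrative assumption, not OSC's published setting:

```python
import numpy as np

def select_subspace_dim(X, threshold=0.9):
    """Smallest subspace dimension whose principal directions explain
    at least `threshold` of the total variance. A sketch of the
    cumulative-variance selection rule described in the abstract."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)   # singular values, descending
    var = s ** 2 / np.sum(s ** 2)             # explained-variance ratios
    cum = np.cumsum(var)
    return int(np.searchsorted(cum, threshold) + 1)

# Five features, but only two carry real variance: dimension 2 suffices.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5)) * np.array([10.0, 5.0, 0.1, 0.1, 0.1])
print(select_subspace_dim(X, threshold=0.9))
```

Projecting onto those leading directions yields the uncorrelated low-dimensional coordinates on which the classical clustering step then runs.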

new LaPro-DTA: Latent Dual-View Drug Representations and Salient Protein Feature Extraction for Generalizable Drug--Target Affinity Prediction

Authors: Zihan Dun, Liuyi Xu, An-Yang Lu, Shuang Li, Yining Qian

Abstract: Drug--target affinity prediction is pivotal for accelerating drug discovery, yet existing methods suffer from significant performance degradation in realistic cold-start scenarios (unseen drugs/targets/pairs), primarily driven by overfitting to training instances and information loss from irrelevant target sequences. In this paper, we propose LaPro-DTA, a framework designed to achieve robust and generalizable DTA prediction. To tackle overfitting, we devise a latent dual-view drug representation mechanism. It synergizes an instance-level view to capture fine-grained substructures with stochastic perturbation and a distribution-level view to distill generalized chemical scaffolds via semantic remapping, thereby enforcing the model to learn transferable structural rules rather than memorizing specific samples. To mitigate information loss, we introduce a salient protein feature extraction strategy using pattern-aware top-$k$ pooling, which effectively filters background noise and isolates high-response bioactive regions. Furthermore, a cross-view multi-head attention mechanism fuses these purified features to model comprehensive interactions. Extensive experiments on benchmark datasets demonstrate that LaPro-DTA significantly outperforms state-of-the-art methods, achieving an 8\% MSE reduction on the Davis dataset in the challenging unseen-drug setting, while offering interpretable insights into binding mechanisms.

new GARCH-FIS: A Hybrid Forecasting Model with Dynamic Volatility-Driven Parameter Adaptation

Authors: Wen-Jing Li, Da-Qing Zhang

Abstract: This paper proposes a novel hybrid model, termed GARCH-FIS, for recursive rolling multi-step forecasting of financial time series. It integrates a Fuzzy Inference System (FIS) with a Generalized Autoregressive Conditional Heteroskedasticity (GARCH) model to jointly address nonlinear dynamics and time-varying volatility. The core innovation is a dynamic parameter adaptation mechanism for the FIS, specifically activated within the multi-step forecasting cycle. In this process, the conditional volatility estimated by a rolling window GARCH model is continuously translated into a price volatility measure. At each forecasting step, this measure, alongside the updated mean of the sliding window data -- which now incorporates the most recent predicted price -- jointly determines the parameters of the FIS membership functions for the next prediction. Consequently, the granularity of the fuzzy inference adapts as the forecast horizon extends: membership functions are automatically widened during high-volatility market regimes to bolster robustness and narrowed during stable periods to enhance precision. This constitutes a fundamental advancement over a static one-step-ahead prediction setup. Furthermore, the model's fuzzy rule base is automatically constructed from data using the Wang-Mendel method, promoting interpretability and adaptability. Empirical evaluation, focused exclusively on multi-step forecasting performance across ten diverse financial assets, demonstrates that the proposed GARCH-FIS model significantly outperforms benchmark models -- including Support Vector Regression (SVR), Long Short-Term Memory networks (LSTM), and an ARIMA-GARCH econometric model -- in terms of predictive accuracy and stability, while effectively mitigating error accumulation in extended recursive forecasts.
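The volatility-driven widening of membership functions can be sketched in isolation. The linear width-scaling rule and its coefficient below are assumptions for illustration, not the paper's exact adaptation formula:

```python
import numpy as np

def gaussian_membership(x, center, base_width, volatility, k=2.0):
    """Volatility-adaptive Gaussian membership: the set width is
    scaled up by the GARCH conditional volatility, widening fuzzy
    sets in turbulent regimes and narrowing them in calm ones.
    The linear scaling and k are illustrative assumptions."""
    sigma = base_width * (1.0 + k * volatility)
    return np.exp(-0.5 * ((x - center) / sigma) ** 2)

# The same deviation from the set center receives higher membership
# under high volatility, i.e. the fuzzy set is effectively wider.
calm = gaussian_membership(x=1.0, center=0.0, base_width=1.0,
                           volatility=0.05)
turbulent = gaussian_membership(x=1.0, center=0.0, base_width=1.0,
                                volatility=0.8)
print(calm, turbulent)
```

Wider sets in volatile regimes make the Wang-Mendel rules fire more tolerantly, trading precision for robustness exactly when recursive multi-step errors would otherwise compound.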

new Multi-Task Genetic Algorithm with Multi-Granularity Encoding for Protein-Nucleotide Binding Site Prediction

Authors: Yiming Gao, Liuyi Xu, Pengshan Cui, Yining Qian, An-Yang Lu, Xianpeng Wang

Abstract: Accurate identification of protein-nucleotide binding sites is fundamental to deciphering molecular mechanisms and accelerating drug discovery. However, current computational methods often struggle with suboptimal performance due to inadequate feature representation and rigid fusion mechanisms, which hinder the effective exploitation of cross-task information synergy. To bridge this gap, we propose MTGA-MGE, a framework that integrates a Multi-Task Genetic Algorithm with Multi-Granularity Encoding to enhance binding site prediction. Specifically, we develop a Multi-Granularity Encoding (MGE) network that synergizes multi-scale convolutions and self-attention mechanisms to distill discriminative signals from high-dimensional, redundant biological data. To overcome the constraints of static fusion, a genetic algorithm is employed to adaptively evolve task-specific fusion strategies, thereby effectively improving model generalization. Furthermore, to catalyze collaborative learning, we introduce an External-Neighborhood Mechanism (ENM) that leverages biological similarities to facilitate targeted information exchange across tasks. Extensive evaluations on fifteen nucleotide datasets demonstrate that MTGA-MGE not only establishes a new state-of-the-art in data-abundant, high-resource scenarios but also maintains a robust competitive edge in rare, low-resource regimes, presenting a highly adaptive scheme for decoding complex protein-ligand interactions in the post-genomic era.

new Universe Routing: Why Self-Evolving Agents Need Epistemic Control

Authors: Zhaohui Geoffrey Wang

Abstract: A critical failure mode of current lifelong agents is not lack of knowledge, but the inability to decide how to reason. When an agent encounters "Is this coin fair?" it must recognize whether to invoke frequentist hypothesis testing or Bayesian posterior inference - frameworks that are epistemologically incompatible. Mixing them produces not minor errors, but structural failures that propagate across decision chains. We formalize this as the universe routing problem: classifying questions into mutually exclusive belief spaces before invoking specialized solvers. Our key findings challenge conventional assumptions: (1) hard routing to heterogeneous solvers matches soft MoE accuracy while being 7x faster because epistemically incompatible frameworks cannot be meaningfully averaged; (2) a 465M-parameter router achieves a 2.3x smaller generalization gap than keyword-matching baselines, indicating semantic rather than surface-level reasoning; (3) when expanding to new belief spaces, rehearsal-based continual learning achieves zero forgetting, outperforming EWC by 75 percentage points, suggesting that modular epistemic architectures are fundamentally more amenable to lifelong learning than regularization-based approaches. These results point toward a broader architectural principle: reliable self-evolving agents may require an explicit epistemic control layer that governs reasoning framework selection.

new OpenReservoirComputing: GPU-Accelerated Reservoir Computing in JAX

Authors: Jan Williams, Dima Tretiak, Steven L. Brunton, J. Nathan Kutz, Krithika Manohar

Abstract: OpenReservoirComputing (ORC) is a Python library for reservoir computing (RC) written in JAX (Bradbury et al. 2018) and Equinox (Kidger and Garcia 2021). JAX is a Python library for high-performance numerical computing that enables automatic differentiation, just-in-time (JIT) compilation, and GPU/TPU acceleration, while Equinox is a neural network framework for JAX. RC is a form of machine learning that functions by lifting a low-dimensional sequence or signal into a high-dimensional dynamical system and training a simple, linear readout layer from the high-dimensional dynamics back to a lower-dimensional quantity of interest. The most common application of RC is time-series forecasting, where the goal is to predict a signal's future evolution. RC has achieved state-of-the-art performance on this task, particularly when applied to chaotic dynamical systems. In addition, RC approaches can be adapted to perform classification and control tasks. ORC provides both modular components for building custom RC models and built-in models for forecasting, classification, and control. By building on JAX and Equinox, ORC offers GPU acceleration, JIT compilation, and automatic vectorization. These capabilities make prototyping new models faster and enable larger and more powerful reservoir architectures. End-to-end differentiability also enables seamless integration with other deep learning models built with Equinox.
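The RC recipe the library packages (fixed random high-dimensional reservoir, trained linear readout) fits in a few lines of plain NumPy; a toy sketch of the idea, not ORC's JAX API:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy echo state network: lift a scalar signal into a 200-dimensional reservoir,
# then train ONLY a linear readout -- the core RC idea ORC implements in JAX.
N = 200
W_in = rng.uniform(-0.5, 0.5, N)                  # fixed random input weights
W = rng.normal(0.0, 1.0, (N, N))
W *= 0.9 / np.abs(np.linalg.eigvals(W)).max()     # spectral radius below 1

u = np.sin(0.1 * np.arange(400))                  # training signal
states = np.zeros((len(u), N))
x = np.zeros(N)
for t, ut in enumerate(u):
    x = np.tanh(W @ x + W_in * ut)                # reservoir update
    states[t] = x

# Ridge-regress a readout mapping reservoir state -> next signal value,
# discarding a 50-step washout so initial transients do not bias the fit.
X, y = states[50:-1], u[51:]
W_out = np.linalg.solve(X.T @ X + 1e-6 * np.eye(N), X.T @ y)
mse = float(np.mean((X @ W_out - y) ** 2))        # one-step forecast error
```

Only `W_out` is trained; the reservoir stays fixed, which is why RC training reduces to a single linear solve and parallelizes so well on GPUs.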

new Dataset Distillation Efficiently Encodes Low-Dimensional Representations from Gradient-Based Learning of Non-Linear Tasks

Authors: Yuri Kinoshita, Naoki Nishikawa, Taro Toyoizumi

Abstract: Dataset distillation, a training-aware data compression technique, has recently attracted increasing attention as an effective tool for mitigating costs of optimization and data storage. However, progress remains largely empirical. Mechanisms underlying the extraction of task-relevant information from the training process and the efficient encoding of such information into synthetic data points remain elusive. In this paper, we theoretically analyze practical algorithms of dataset distillation applied to the gradient-based training of two-layer neural networks with width $L$. By focusing on a non-linear task structure called a multi-index model, we prove that the low-dimensional structure of the problem is efficiently encoded into the resulting distilled data. This dataset reproduces a model with high generalization ability for a required memory complexity of $\tilde{\Theta}(r^2 d + L)$, where $d$ and $r$ are the input and intrinsic dimensions of the task. To the best of our knowledge, this is one of the first theoretical works to incorporate a specific task structure, leverage its intrinsic dimensionality to quantify the compression rate, and study dataset distillation implemented solely via gradient-based algorithms.

new Ablate and Rescue: A Causal Analysis of Residual Stream Hyper-Connections

Authors: William Peng, Josheev Rai, Kevin Tseng, Siwei Wang, Sean Wu

Abstract: Multi-stream transformer architectures have recently been proposed as a promising direction for managing representation collapse and the vanishing gradient problem for residual connections, yet their internal mechanisms remain unexplored. In particular, the recently introduced Manifold-Constrained Hyper-Connections (mHC) architecture posits multiple residual streams with constrained interaction, but lacks in-depth mechanistic analysis. We present the first open-source mHC language model (https://huggingface.co/wgpeng/mhc-780m) and analyze the multiple-stream architecture with a suite of representation-level metrics and causal interventions to probe how parallel streams encode and utilize information. Specifically, we introduce a systematic stream ablation-and-rescue framework that enables direct causal comparison of residual streams during inference. Through targeted pairwise interventions and controlled recovery experiments, we distinguish functional redundancy from asymmetric utilization and reveal how information is distributed across streams beyond what is observable from representational similarity alone.

URLs: https://huggingface.co/wgpeng/mhc-780m

new Real-Time Driver Safety Scoring Through Inverse Crash Probability Modeling

Authors: Joyjit Roy, Samaresh Kumar Singh

Abstract: Road crashes remain a leading cause of preventable fatalities. Existing prediction models predominantly produce binary outcomes, which offer limited actionable insights for real-time driver feedback. These approaches often lack continuous risk quantification, interpretability, and explicit consideration of vulnerable road users (VRUs), such as pedestrians and cyclists. This research introduces SafeDriver-IQ, a framework that transforms binary crash classifiers into continuous 0-100 safety scores by combining national crash statistics with naturalistic driving data from autonomous vehicles. The framework fuses National Highway Traffic Safety Administration (NHTSA) crash records with Waymo Open Motion Dataset scenarios, engineers domain-informed features, and incorporates a calibration layer grounded in transportation safety literature. Evaluation across 15 complementary analyses indicates that the framework reliably differentiates high-risk from low-risk driving conditions with strong discriminative performance. Findings further reveal that 87% of crashes involve multiple co-occurring risk factors, with non-linear compounding effects that increase the risk to 4.5x baseline. SafeDriver-IQ delivers proactive, explainable safety intelligence relevant to advanced driver-assistance systems (ADAS), fleet management, and urban infrastructure planning. This framework shifts the focus from reactive crash counting to real-time risk prevention.
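The binary-to-continuous idea, plus the multiplicative compounding of co-occurring risk factors the abstract reports (up to roughly 4.5x baseline), can be sketched as follows; the function names and the linear score mapping are our illustration, not SafeDriver-IQ's calibrated layer:

```python
def combined_relative_risk(factor_risks, cap=4.5):
    """Compound co-occurring relative risks multiplicatively, capped at the
    ~4.5x ceiling the abstract reports (cap value assumed here)."""
    rr = 1.0
    for r in factor_risks:
        rr *= r
    return min(rr, cap)

def safety_score(base_crash_prob, factor_risks=()):
    """Map a crash probability to a 0-100 score: 100 = safest, 0 = certain crash."""
    if not 0.0 <= base_crash_prob <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    p = min(1.0, base_crash_prob * combined_relative_risk(factor_risks))
    return 100.0 * (1.0 - p)
```

A production system would replace the linear mapping with a calibration layer fit to crash statistics; the sketch only shows how a binary classifier's probability becomes a continuous, factor-aware score.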

new Integrating Weather Foundation Model and Satellite to Enable Fine-Grained Solar Irradiance Forecasting

Authors: Ziqing Ma, Kai Ying, Xinyue Gu, Tian Zhou, Tianyu Zhu, Haifan Zhang, Peisong Niu, Wang Zheng, Cong Bai, Liang Sun

Abstract: Accurate day-ahead solar irradiance forecasting is essential for integrating solar energy into the power grid. However, it remains challenging due to the pronounced diurnal cycle and inherently complex cloud dynamics. Current methods either lack fine-scale resolution (e.g., numerical weather prediction, weather foundation models) or degrade at longer lead times (e.g., satellite extrapolation). We propose Baguan-solar, a two-stage multimodal framework that fuses forecasts from Baguan, a global weather foundation model, with high-resolution geostationary satellite imagery to produce 24-hour irradiance forecasts at kilometer scale. Its decoupled two-stage design first forecasts day-night continuous intermediates (e.g., cloud cover) and then infers irradiance, while its modality fusion jointly preserves fine-scale cloud structures from satellite and large-scale constraints from Baguan forecasts. Evaluated over East Asia using CLDAS as ground truth, Baguan-solar outperforms strong baselines (including ECMWF IFS, vanilla Baguan, and SolarSeer), reducing RMSE by 16.08% and better resolving cloud-induced transients. An operational deployment of Baguan-solar has supported solar power forecasting in an eastern province of China since July 2025. Our code is accessible at https://github.com/DAMO-DI-ML/Baguansolar.git.

URLs: https://github.com/DAMO-DI-ML/Baguansolar.git

new Lost in Aggregation: On a Fundamental Expressivity Limit of Message-Passing Graph Neural Networks

Authors: Eran Rosenbluth

Abstract: We define a generic class of functions that captures most conceivable aggregations for Message-Passing Graph Neural Networks (MP-GNNs), and prove that any MP-GNN model with such aggregations induces only a polynomial number of equivalence classes on all graphs - while the number of non-isomorphic graphs is doubly-exponential (in the number of vertices). Adding a familiar perspective, we observe that merely 2 iterations of Color Refinement (CR) induce at least an exponential number of equivalence classes, making the aforementioned MP-GNNs relatively infinitely weaker. Previous results state that MP-GNNs match full CR; however, they concern a weak, 'non-uniform' notion of distinguishing power, where each graph size may require a different MP-GNN to distinguish graphs up to that size. Our results concern both distinguishing between non-equivalent vertices and distinguishing between non-isomorphic graphs.

new Sample-Efficient Hypergradient Estimation for Decentralized Bi-Level Reinforcement Learning

Authors: Mikoto Kudo, Takumi Tanabe, Akifumi Wachi, Youhei Akimoto

Abstract: Many strategic decision-making problems, such as environment design for warehouse robots, can be naturally formulated as bi-level reinforcement learning (RL), where a leader agent optimizes its objective while a follower solves a Markov decision process (MDP) conditioned on the leader's decisions. In many situations, a fundamental challenge arises when the leader cannot intervene in the follower's optimization process; it can only observe the optimization outcome. We address this decentralized setting by deriving the hypergradient of the leader's objective, i.e., the gradient of the leader's strategy that accounts for changes in the follower's optimal policy. Unlike prior hypergradient-based methods that require extensive data for repeated state visits or rely on gradient estimators whose complexity can increase substantially with the high-dimensional leader's decision space, we leverage the Boltzmann covariance trick to derive an alternative hypergradient formulation. This enables efficient hypergradient estimation solely from interaction samples, even when the leader's decision space is high-dimensional. Additionally, to our knowledge, this is the first method that enables hypergradient-based optimization for 2-player Markov games in decentralized settings. Experiments highlight the impact of hypergradient updates and demonstrate our method's effectiveness in both discrete and continuous state tasks.

new IgPose: A Generative Data-Augmented Pipeline for Robust Immunoglobulin-Antigen Binding Prediction

Authors: Tien-Cuong Bui, Injae Chung, Wonjun Lee, Junsu Ko, Juyong Lee

Abstract: Predicting immunoglobulin-antigen (Ig-Ag) binding remains a significant challenge due to the paucity of experimentally-resolved complexes and the limited accuracy of de novo Ig structure prediction. We introduce IgPose, a generalizable framework for Ig-Ag pose identification and scoring, built on a generative data-augmentation pipeline. To mitigate data scarcity, we constructed the Structural Immunoglobulin Decoy Database (SIDD), a comprehensive repository of high-fidelity synthetic decoys. IgPose integrates equivariant graph neural networks, ESM-2 embeddings, and gated recurrent units to synergistically capture both geometric and evolutionary features. We implemented interface-focused k-hop sampling with biologically guided pooling to enhance generalization across diverse interfaces. The framework comprises two sub-networks--IgPoseClassifier for binding pose discrimination and IgPoseScore for DockQ score estimation--and achieves robust performance on curated internal test sets and the CASP-16 benchmark compared to physics and deep learning baselines. IgPose serves as a versatile computational tool for high-throughput antibody discovery pipelines by providing accurate pose filtering and ranking. IgPose is available on GitHub (https://github.com/arontier/igpose).

URLs: https://github.com/arontier/igpose

new Seismic full-waveform inversion based on a physics-driven generative adversarial network

Authors: Xinyi Zhang, Caiyun Liu, Jie Xiong, Qingfeng Yu

Abstract: Objectives: Full-waveform inversion (FWI) is a high-resolution geophysical imaging technique that reconstructs subsurface velocity models by iteratively minimizing the misfit between predicted and observed seismic data. However, under complex geological conditions, conventional FWI suffers from strong dependence on the initial model and tends to produce unstable results when the data are sparse or contaminated by noise. Methods: To address these limitations, this paper proposes a physics-driven generative adversarial network-based full-waveform inversion method. The proposed approach integrates the data-driven capability of deep neural networks with the physical constraints imposed by the seismic wave equation, and employs adversarial training through a discriminator to enhance the stability and robustness of the inversion results. Results: Experimental results on two representative benchmark geological models demonstrate that the proposed method can effectively recover complex velocity structures and achieves superior performance in terms of structural similarity (SSIM) and signal-to-noise ratio (SNR). Conclusions: This method provides a promising solution for alleviating the initial-model dependence in full-waveform inversion and shows strong potential for practical applications.

new Informative Perturbation Selection for Uncertainty-Aware Post-hoc Explanations

Authors: Sumedha Chugh, Ranjitha Prasad, Nazreen Shah

Abstract: Trust and ethical concerns arising from the widespread deployment of opaque machine learning (ML) models motivate the need for reliable model explanations. Post-hoc model-agnostic explanation methods address this challenge by learning a surrogate model that approximates the behavior of the deployed black-box ML model in the locality of a sample of interest. In post-hoc scenarios, neither the underlying model parameters nor the training data are available; hence, this local neighborhood must be constructed by generating perturbed inputs in the neighborhood of the sample of interest and obtaining their corresponding model predictions. We propose \emph{Expected Active Gain for Local Explanations} (\texttt{EAGLE}), a post-hoc model-agnostic explanation framework that formulates perturbation selection as an information-theoretic active learning problem. By adaptively sampling perturbations that maximize the expected information gain, \texttt{EAGLE} efficiently learns a linear surrogate explainable model while producing feature importance scores along with uncertainty/confidence estimates. Theoretically, we establish that the cumulative information gain scales as $\mathcal{O}(d \log t)$, where $d$ is the feature dimension and $t$ represents the number of samples, and that the sample complexity grows linearly with $d$ and logarithmically with the confidence parameter $1/\delta$. Empirical results on tabular and image datasets corroborate our theoretical findings and demonstrate that \texttt{EAGLE} improves explanation reproducibility across runs, achieves higher neighborhood stability, and improves perturbation sample quality as compared to state-of-the-art baselines such as Tilia, US-LIME, GLIME and BayesLIME.
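For a linear surrogate, "expected information gain" has a closed form under a Bayesian linear-regression model, and perturbation selection reduces to maximizing predictive variance; a minimal sketch of that selection rule (the function name and the prior/noise precisions `alpha`, `beta` are our assumptions, not EAGLE's exact formulation):

```python
import numpy as np

def pick_most_informative(candidates, X_observed, alpha=1.0, beta=25.0):
    """Choose the perturbation whose label would maximize expected information
    gain under a Bayesian linear surrogate.

    Gain of adding x: 0.5 * log(1 + beta * x^T S^{-1} x), where the posterior
    precision over the already-labeled perturbations is
    S = alpha*I + beta * X^T X.
    """
    d = candidates.shape[1]
    S = alpha * np.eye(d) + beta * X_observed.T @ X_observed
    S_inv = np.linalg.inv(S)
    # x^T S^{-1} x for every candidate row x
    quad = np.einsum("ij,jk,ik->i", candidates, S_inv, candidates)
    gains = 0.5 * np.log1p(beta * quad)
    return int(np.argmax(gains)), gains

# Two labeled perturbations along e1; a candidate along the unexplored e2
# direction is the more informative choice.
X_obs = np.array([[1.0, 0.0], [1.0, 0.0]])
cands = np.array([[1.0, 0.0], [0.0, 1.0]])
idx, gains = pick_most_informative(cands, X_obs)
```

The selected index points at the direction the surrogate knows least about, which is the behavior that makes active perturbation selection sample-efficient.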

new BiTro: Bidirectional Transfer Learning Enhances Bulk and Spatial Transcriptomics Prediction in Cancer Pathological Images

Authors: Jingkun Yu, Guangkai Shang, Changtao Li, Xun Gong, Tianrui Li, Yazhou He, Zhipeng Luo

Abstract: Cancer pathological analysis requires modeling tumor heterogeneity across multiple modalities, primarily through transcriptomics and whole slide imaging (WSI), along with their spatial relations. On one hand, bulk transcriptomics and WSI images are largely available but lack spatial mapping; on the other hand, spatial transcriptomics (ST) data offers high spatial resolution, yet faces challenges of high cost, low sequencing depth, and limited sample sizes. Therefore, the data foundation on either side is flawed and limits the ability to accurately find the mapping between the two modalities. To this end, we propose BiTro, a bidirectional transfer learning framework that can enhance bulk and spatial transcriptomics prediction from pathological images. Our contributions are twofold. First, we design a universal and transferable model architecture that works for both bulk+WSI and ST data. A major highlight is that we model WSI images on the cellular level to better capture cells' visual features, morphological phenotypes, and their spatial relations; to map cells' features to their transcriptomics measured in bulk or ST, we adopt multiple instance learning. Second, by using LoRA, our model can be efficiently transferred between bulk and ST data to exploit their complementary information. To test our framework, we conducted comprehensive experiments on five cancer datasets. Results demonstrate that 1) our base model can achieve better or competitive performance compared to existing models on bulk or spatial transcriptomics prediction, and 2) transfer learning can further improve the base model's performance.

new Directional Routing in Transformers

Authors: Kevin Taylor

Abstract: We introduce directional routing, a lightweight mechanism that gives each transformer attention head learned suppression directions controlled by a shared router, at 3.9% parameter cost. We train a 433M-parameter model alongside an identical baseline in a single run, then trace the resulting circuits through mechanistic interpretability. Routing becomes the model's dominant computational pathway. Disabling it collapses factual recall to near-zero probability across all 8 test prompts and drops induction accuracy from 93.4% to 0.0%. Knocking out individual attention heads has negligible effect: the primary mover head's removal actually increases target probability, and induction heads retain 98.6% accuracy without their strongest member. The coordination mechanism is irreplaceable; the components it coordinates are not. The model also self-organizes, without explicit pressure, into two regimes: domain-adaptive routing in early layers and fixed syntactic pruning in late layers, where the least-varying layer is the most critical (+42.6 PPL when disabled). Routing reduces perplexity 31-56% relative to the baseline, though downstream multiple-choice benchmarks do not yet reflect these gains.

new LLM as Graph Kernel: Rethinking Message Passing on Text-Rich Graphs

Authors: Ying Zhang, Hang Yu, Haipeng Zhang, Peng Di

Abstract: Text-rich graphs, which integrate complex structural dependencies with abundant textual information, are ubiquitous yet remain challenging for existing learning paradigms. Conventional methods and even LLM-hybrids compress rich text into static embeddings or summaries before structural reasoning, creating an information bottleneck and detaching updates from the raw content. We argue that in text-rich graphs, the text is not merely a node attribute but the primary medium through which structural relationships are manifested. We introduce RAMP, a Raw-text Anchored Message Passing approach that moves beyond using LLMs as mere feature extractors and instead recasts the LLM itself as a graph-native aggregation operator. RAMP exploits the text-rich nature of the graph via a novel dual-representation scheme: it anchors inference on each node's raw text during each iteration while propagating dynamically optimized messages from neighbors. It further handles both discriminative and generative tasks under a single unified generative formulation. Extensive experiments show that RAMP effectively bridges the gap between graph propagation and deep text reasoning, achieving competitive performance and offering new insights into the role of LLMs as graph kernels for general-purpose graph learning.

new Ultra-Early Prediction of Tipping Points: Integrating Dynamical Measures with Reservoir Computing

Authors: Xin Li, Qunxi Zhu, Chengli Zhao, Bolin Zhao, Xue Zhang, Xiaojun Duan, Wei Lin

Abstract: Complex dynamical systems, such as climate, ecosystems, and economics, can undergo catastrophic and potentially irreversible regime changes, often triggered by environmental parameter drift and stochastic disturbances. These critical thresholds, known as tipping points, pose a prediction problem of both theoretical and practical significance, yet remain largely unresolved. To address this, we articulate a model-free framework that integrates measures characterizing the stability and sensitivity of dynamical systems with reservoir computing (RC), a lightweight machine learning technique, using only observational time series data. The framework consists of two stages. The first stage involves using RC to robustly learn local complex dynamics from observational data segmented into windows. The second stage focuses on accurately detecting early warning signals of tipping points by analyzing the learned autonomous RC dynamics through dynamical measures, including the dominant eigenvalue of the Jacobian matrix, the maximum Floquet multiplier, and the maximum Lyapunov exponent. Furthermore, when these dynamical measures exhibit trend-like patterns, their extrapolation enables ultra-early prediction of tipping points significantly prior to the occurrence of critical transitions. We conduct a rigorous theoretical analysis of the proposed method and perform extensive numerical evaluations on a series of representative synthetic systems and eight real-world datasets, as well as quantitatively predict the tipping time of the Atlantic Meridional Overturning Circulation system. Experimental results demonstrate that our framework exhibits advantages over the baselines in comprehensive evaluations, particularly in terms of dynamical interpretability, prediction stability and robustness, and ultra-early prediction capability.
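The eigenvalue-based early-warning signal has a very simple core: fit a local linear map to windowed data and watch its dominant eigenvalue modulus creep toward 1. A crude least-squares sketch of that indicator (our simplification of the paper's RC-based Jacobian analysis):

```python
import numpy as np

def dominant_modulus(X_t, X_next):
    """Fit a local linear map x_{t+1} ~ A x_t by least squares and return the
    modulus of its dominant eigenvalue. For discrete-time dynamics, values
    approaching 1 warn of impending loss of stability -- a crude analogue of
    the Jacobian-eigenvalue indicator in the framework's second stage."""
    A, *_ = np.linalg.lstsq(X_t, X_next, rcond=None)  # solves X_t @ A = X_next
    return float(np.abs(np.linalg.eigvals(A)).max())

# Synthetic stable linear system: x_{t+1} = 0.8 * x_t. The indicator sits
# safely below 1; trending it upward over sliding windows would be the
# early-warning signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
warning = dominant_modulus(X, 0.8 * X)
```

In the paper this linearization comes from the trained reservoir's autonomous dynamics rather than a direct least-squares fit, which is what makes the approach robust to noise and short windows.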

new Spiking Layer-Adaptive Magnitude-based Pruning

Authors: Junqiao Wang, Zhehang Ye, Yuqi Ouyang

Abstract: Spiking Neural Networks (SNNs) provide energy-efficient computation, but their deployment is constrained by dense connectivity and high spiking operation costs. Existing magnitude-based pruning strategies, when naively applied to SNNs, fail to account for temporal accumulation, non-uniform timestep contributions, and membrane stability, often leading to severe performance degradation. This paper proposes Spiking Layer-Adaptive Magnitude-based Pruning (SLAMP), a theory-guided pruning framework that generalizes layer-adaptive magnitude pruning to temporal SNNs by explicitly controlling worst-case output distortion across layers and timesteps. SLAMP formulates sparsity allocation as a temporal distortion-constrained optimization problem, yielding time-aware layer importance scores that reduce to conventional layer-adaptive pruning in the single-timestep limit. An efficient two-stage procedure is derived, combining temporal score estimation, global sparsity allocation, and magnitude pruning with retraining for stability recovery. Experiments on CIFAR10, CIFAR100, and the event-based CIFAR10-DVS datasets demonstrate that SLAMP achieves substantial connectivity and spiking operation reductions while preserving accuracy, enabling efficient and deployable SNN inference.
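The two ingredients being generalized, per-layer magnitude pruning and layer-wise sparsity allocation, can be sketched in a few lines; the inverse-importance allocation rule below is a naive stand-in, NOT SLAMP's distortion-constrained optimization:

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of one layer's weights."""
    w = weights.copy()
    k = int(round(sparsity * w.size))
    if k > 0:
        thresh = np.sort(np.abs(w), axis=None)[k - 1]
        w[np.abs(w) <= thresh] = 0.0
    return w

def allocate_sparsity(importance, global_sparsity):
    """Give more important layers lower sparsity while keeping the average at
    the global budget (illustrative inverse-importance rule; SLAMP instead
    solves a temporal distortion-constrained problem)."""
    inv = 1.0 / np.asarray(importance, dtype=float)
    return global_sparsity * inv * len(inv) / inv.sum()

rng = np.random.default_rng(0)
layer = rng.normal(size=(10, 10))
pruned = magnitude_prune(layer, 0.5)       # half the entries become zero
```

SLAMP's contribution is precisely in how the importance scores are computed: they aggregate each layer's worst-case output distortion over all timesteps, which this sketch leaves abstract.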

new FairMed-XGB: A Bayesian-Optimised Multi-Metric Framework with Explainability for Demographic Equity in Critical Healthcare Data

Authors: Mitul Goswami, Romit Chatterjee, Arif Ahmed Sekh

Abstract: Machine learning models deployed in critical care settings exhibit demographic biases, particularly gender disparities, that undermine clinical trust and equitable treatment. This paper introduces FairMed-XGB, a novel framework that systematically detects and mitigates gender-based prediction bias while preserving model performance and transparency. The framework integrates a fairness-aware loss function combining Statistical Parity Difference, Theil Index, and Wasserstein Distance, jointly optimised via Bayesian Search into an XGBoost classifier. Post-mitigation evaluation on seven clinically distinct cohorts derived from the MIMIC-IV-ED and eICU databases demonstrates substantial bias reduction: Statistical Parity Difference decreases by 40 to 51 percent on MIMIC-IV-ED and 10 to 19 percent on eICU; Theil Index collapses by four to five orders of magnitude to near-zero values; Wasserstein Distance is reduced by 20 to 72 percent. These gains are achieved with negligible degradation in predictive accuracy (AUC-ROC drop <0.02). SHAP-based explainability reveals that the framework diminishes reliance on gender-proxy features, providing clinicians with actionable insights into how and where bias is corrected. FairMed-XGB offers a robust, interpretable, and ethically aligned solution for equitable clinical decision-making, paving the way for trustworthy deployment of AI in high-stakes healthcare environments.

new SFedHIFI: Fire Rate-Based Heterogeneous Information Fusion for Spiking Federated Learning

Authors: Ran Tao, Qiugang Zhan, Shantian Yang, Xiurui Xie, Qi Tian, Guisong Liu

Abstract: Spiking Federated Learning (SFL) has been widely studied owing to the energy efficiency of Spiking Neural Networks (SNNs). However, existing SFL methods require model homogeneity and assume all clients have sufficient computational resources, resulting in the exclusion of some resource-constrained clients. To address the prevalent system heterogeneity in real-world scenarios, it is crucial to enable heterogeneous SFL systems that allow clients to adaptively deploy models of different scales based on their local resources. To this end, we introduce SFedHIFI, a novel Spiking Federated Learning framework with Fire Rate-Based Heterogeneous Information Fusion. Specifically, SFedHIFI employs channel-wise matrix decomposition to deploy SNN models of adaptive complexity on clients with heterogeneous resources. Building on this, the proposed heterogeneous information fusion module enables cross-scale aggregation among models of different widths, thereby enhancing the utilization of diverse local knowledge. Extensive experiments on three public benchmarks demonstrate that SFedHIFI can effectively enable heterogeneous SFL, consistently outperforming all three baseline methods. Compared with ANN-based FL, it achieves significant energy savings with only a marginal trade-off in accuracy.

new Lightweight User-Personalization Method for Closed Split Computing

Authors: Yuya Okada, Takayuki Nishio

Abstract: Split Computing enables collaborative inference between edge devices and the cloud by partitioning a deep neural network into an edge-side head and a server-side tail, reducing latency and limiting exposure of raw input data. However, inference performance often degrades in practical deployments due to user-specific data distribution shifts, unreliable communication, and privacy-oriented perturbations, especially in closed environments where model architectures and parameters are inaccessible. To address this challenge, we propose SALT (Split-Adaptive Lightweight Tuning), a lightweight adaptation framework for closed Split Computing systems. SALT introduces a compact client-side adapter that refines intermediate representations produced by a frozen head network, enabling effective model adaptation without modifying the head or tail networks or increasing communication overhead. By modifying only the training conditions, SALT supports multiple adaptation objectives, including user personalization, communication robustness, and privacy-aware inference. Experiments using ResNet-18 on CIFAR-10 and CIFAR-100 show that SALT achieves higher accuracy than conventional retraining and fine-tuning while significantly reducing training cost. On CIFAR-10, SALT improves personalized accuracy from 88.1% to 93.8% while reducing training latency by more than 60%. SALT also maintains over 90% accuracy under 75% packet loss and preserves high accuracy (about 88% at sigma = 1.0) under noise injection. These results demonstrate that SALT provides an efficient and practical adaptation framework for real-world Split Computing systems.

new How Log-Barrier Helps Exploration in Policy Optimization

Authors: Leonardo Cesani, Matteo Papini, Marcello Restelli

Abstract: Recently, it has been shown that the Stochastic Gradient Bandit (SGB) algorithm converges to a globally optimal policy with a constant learning rate. However, these guarantees rely on unrealistic assumptions about the learning process, namely that the probability of the optimal action is always bounded away from zero. We attribute this to the lack of an explicit exploration mechanism in SGB. To address these limitations, we propose to regularize the SGB objective with a log-barrier on the parametric policy, structurally enforcing a minimal amount of exploration. We prove that Log-Barrier Stochastic Gradient Bandit (LB-SGB) matches the sample complexity of SGB, but also converges (at a slower rate) without any assumptions on the learning process. We also show a connection between the log-barrier regularization and Natural Policy Gradient, as both exploit the geometry of the policy space by controlling the Fisher information. We validate our theoretical findings through numerical simulations, showing the benefits of the log-barrier regularization.
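For a softmax policy the log-barrier term has a clean gradient: since d(log pi_a)/d(theta_j) = delta_aj - pi_j, the barrier lam * sum_a log pi_a contributes lam * (1 - K*pi_j) for K arms, pushing every arm's probability away from zero. A minimal sketch of one LB-SGB update on a two-armed bandit (lam and lr are assumed settings, not the paper's):

```python
import numpy as np

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

def lb_sgb_step(theta, action, reward, lam=0.01, lr=0.1):
    """One Log-Barrier Stochastic Gradient Bandit update (our sketch).

    REINFORCE gradient of the expected reward: reward * (onehot - pi).
    Barrier gradient: lam * (1 - K * pi), which enforces exploration by
    keeping every arm's probability bounded away from zero.
    """
    pi = softmax(theta)
    grad = -reward * pi
    grad[action] += reward
    grad += lam * (1.0 - len(theta) * pi)
    return theta + lr * grad

# Two-armed bandit: arm 0 pays 1, arm 1 pays 0.
rng = np.random.default_rng(1)
theta = np.zeros(2)
for _ in range(2000):
    a = rng.choice(2, p=softmax(theta))
    theta = lb_sgb_step(theta, a, reward=float(a == 0))
pi = softmax(theta)
```

After training, the policy concentrates on the better arm while the barrier keeps the worse arm's probability strictly positive, which is exactly the structural exploration guarantee the paper proves convergence from.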

new MONET: Modeling and Optimization of neural NEtwork Training from Edge to Data Centers

Authors: J\'er\'emy Morlier, Robin Geens, Stef Cuyckens, Arne Symons, Marian Verhelst, Vincent Gripon, Mathieu L\'eonardon

Abstract: While hardware-software co-design has significantly improved the efficiency of neural network inference, modeling the training phase remains a critical yet underexplored challenge. Training workloads impose distinct constraints, particularly regarding memory footprint and backpropagation complexity, which existing inference-focused tools fail to capture. This paper introduces MONET, a framework designed to model the training of neural networks on heterogeneous dataflow accelerators. MONET builds upon Stream, an experimentally verified framework that models the inference of neural networks on heterogeneous dataflow accelerators with layer fusion. Using MONET, we explore the design space of ResNet-18 and a small GPT-2, demonstrating the framework's capability to model training workflows and find better hardware architectures. We then further examine problems that become more complex in neural network training due to the larger design space, such as determining the best layer-fusion configuration. Additionally, we use our framework to find interesting trade-offs in activation checkpointing, with the help of a genetic algorithm. Our findings highlight the importance of a holistic approach to hardware-software co-design for scalable and efficient deep learning deployment.

new TrajFlow: Nation-wide Pseudo GPS Trajectory Generation with Flow Matching Models

Authors: Peiran Li, Jiawei Wang, Haoran Zhang, Xiaodan Shi, Noboru Koshizuka, Chihiro Shimizu, Renhe Jiang

Abstract: The importance of mobile phone GPS trajectory data is widely recognized across many fields, yet the use of real data is often hindered by privacy concerns, limited accessibility, and high acquisition costs. As a result, generating pseudo-GPS trajectory data has become an active area of research. Recent diffusion-based approaches have achieved strong fidelity but remain limited in spatial scale (small urban areas), transportation-mode diversity, and efficiency (requiring numerous sampling steps). To address these challenges, we introduce TrajFlow, which to the best of our knowledge is the first flow-matching-based generative model for GPS trajectory generation. TrajFlow leverages the flow-matching paradigm to improve robustness and efficiency across multiple geospatial scales, and incorporates a trajectory harmonization and reconstruction strategy to jointly address scalability, diversity, and efficiency. Using a nationwide mobile phone GPS dataset with millions of trajectories across Japan, we show that TrajFlow or its variants consistently outperform diffusion-based and deep generative baselines at urban, metropolitan, and nationwide levels. As the first nationwide, multi-scale GPS trajectory generation model, TrajFlow demonstrates strong potential to support inter-region urban planning, traffic management, and disaster response, thereby advancing the resilience and intelligence of future mobility systems.

new Rethinking Machine Unlearning: Models Designed to Forget via Key Deletion

Authors: Sonia Laguna, Jorge da Silva Goncalves, Moritz Vandenhirtz, Alain Ryser, Irene Cannistraci, Julia E. Vogt

Abstract: Machine unlearning is rapidly becoming a practical requirement, driven by privacy regulations, data errors, and the need to remove harmful or corrupted training samples. Despite this, most existing methods tackle the problem purely from a post-hoc perspective. They attempt to erase the influence of targeted training samples through parameter updates that typically require access to the full training data. This creates a mismatch with real deployment scenarios where unlearning requests can be anticipated, revealing a fundamental limitation of post-hoc approaches. We propose \textit{unlearning by design}, a novel paradigm in which models are directly trained to support forgetting as an inherent capability. We instantiate this idea with Machine UNlearning via KEY deletion (MUNKEY), a memory-augmented transformer that decouples instance-specific memorization from model weights. Here, unlearning corresponds to removing the instance-identifying key, enabling direct zero-shot forgetting without weight updates or access to the original samples or labels. Across natural image benchmarks, fine-grained recognition, and medical datasets, MUNKEY outperforms all post-hoc baselines. Our results establish that unlearning by design enables fast, deployment-oriented unlearning while preserving predictive performance.
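The key-deletion idea can be caricatured with a toy key-value memory in which per-instance information lives behind a key, so forgetting a sample means dropping its key rather than updating any weights. This is a hypothetical sketch of the paradigm, not MUNKEY's transformer architecture; the class and method names are illustrative.

```python
class KeyedMemory:
    """Toy memory-augmented predictor: instance-specific information is stored
    behind a per-instance key, separate from the (frozen) base predictor.
    'Unlearning' an instance = deleting its key; no weight update is needed."""

    def __init__(self, base_predict):
        self.base_predict = base_predict   # stand-in for frozen model weights
        self.memory = {}                   # instance key -> memorized correction

    def memorize(self, key, correction):
        self.memory[key] = correction

    def predict(self, key, x):
        # Combine the shared model with any instance-specific memorized term.
        return self.base_predict(x) + self.memory.get(key, 0.0)

    def forget(self, key):
        # Zero-shot forgetting: drop the key, leave the base predictor intact.
        self.memory.pop(key, None)
```

After `forget`, the model's prediction for that instance falls back to the shared, non-memorized behavior, which mirrors the abstract's "removing the instance-identifying key" mechanism at a much smaller scale.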

new CrossADR: enhancing adverse drug reactions prediction for combination pharmacotherapy with cross-layer feature integration and cross-level associative learning

Authors: Y. Cheung

Abstract: Combination pharmacotherapy offers substantial therapeutic advantages but also poses serious risks of adverse drug reactions (ADRs). The accurate prediction of ADRs with interpretable computational methods is crucial for clinical safety management, drug development, and precision medicine. However, managing ADRs remains a challenge due to the vast search space of drug combinations and the complexity of physiological responses. Current graph-based architectures often struggle to effectively integrate multi-scale biological information and frequently rely on fixed association matrices, which limits their ability to capture dynamic organ-level dependencies and generalize across diverse datasets. Here we propose CrossADR, a hierarchical framework for organ-level ADR prediction through cross-layer feature integration and cross-level associative learning. It incorporates a gated-residual-flow graph neural network to fuse multi-scale molecular features and utilizes a learnable ADR embedding space to dynamically capture latent biological correlations across 15 organ systems. Systematic evaluation on the newly constructed CrossADR-Dataset (covering 1,376 drugs and 946,000 unique combinations) demonstrates that CrossADR consistently achieves state-of-the-art performance across 80 distinct experimental scenarios and provides high-resolution insights into drug-related protein-protein interactions and pathways. Overall, CrossADR represents a robust tool for cross-scale biomedical information integration, cross-layer feature integration, and cross-level associative learning, and can be effectively utilized to prevent ADRs in clinical decision-making.

new Muon Converges under Heavy-Tailed Noise: Nonconvex H\"{o}lder-Smooth Empirical Risk Minimization

Authors: Hideaki Iiduka

Abstract: Muon is a recently proposed optimizer that enforces orthogonality in parameter updates by projecting gradients onto the Stiefel manifold, leading to stable and efficient training in large-scale deep neural networks. Meanwhile, previously reported results indicate that stochastic noise in practical machine learning may exhibit heavy-tailed behavior, violating the bounded-variance assumption. In this paper, we consider the problem of minimizing a nonconvex H\"{o}lder-smooth empirical risk under heavy-tailed stochastic noise. We then show that Muon converges to a stationary point of the empirical risk under a boundedness condition that accounts for heavy-tailed stochastic noise. In addition, we show that Muon converges faster than mini-batch SGD.

new Interpretable Classification of Time Series Using Euler Characteristic Surfaces

Authors: Salam Rabindrajit Luwang, Sushovan Majhi, Vishal Mandal, Atish J. Mitra, Md. Nurujjaman, Buddha Nath Sharma

Abstract: Persistent homology (PH) -- the conventional method in topological data analysis -- is computationally expensive, requires further vectorization of its signatures before machine learning (ML) can be applied, and captures information along only the spatial axis. For time series data, we propose Euler Characteristic Surfaces (ECS) as an alternative topological signature based on the Euler characteristic ($\chi$) -- a fundamental topological invariant. The ECS provides a computationally efficient, spatiotemporal, and inherently discretized feature representation that can serve as direct input to ML models. We prove a stability theorem guaranteeing that the ECS remains stable under small perturbations of the input time series. We first demonstrate that ECS effectively captures the nontrivial topological differences between the limit cycle and the strange attractor in the R\"{o}ssler system. We then develop an ECS-based classification framework and apply it to five benchmark biomedical datasets (four ECG, one EEG) from the UCR/UEA archive. On $\textit{ECG5000}$, our single-feature ECS classifier achieves $98\%$ accuracy with $O(n+R\cdot T)$ complexity, compared to $62\%$ reported by a recent PH-based method. An AdaBoost extension raises accuracy to $98.6\%$, matching the best deep learning results while retaining full interpretability. Strong results are also obtained on $\textit{TwoLeadECG}$ ($94.1\%$) and $\textit{Epilepsy2}$ ($92.6\%$).
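The Euler characteristic computation the abstract builds on can be illustrated for a 1D time series: viewing the sublevel set $\{i : x_i \le t\}$ as a path complex, $\chi = \#\text{vertices} - \#\text{edges}$ counts connected components, and sweeping both the threshold and a sliding time window yields a surface. This is a minimal sketch of that general idea under an assumed discretization, not necessarily the paper's exact ECS construction.

```python
def euler_characteristic(x, thresh):
    """Euler characteristic of the sublevel set {i : x[i] <= thresh} of a
    1D series, viewed as a path complex: chi = #vertices - #edges, which
    here equals the number of connected components below the threshold."""
    verts = sum(1 for xi in x if xi <= thresh)
    edges = sum(1 for a, b in zip(x, x[1:]) if a <= thresh and b <= thresh)
    return verts - edges

def euler_characteristic_surface(x, thresholds, window, step):
    """One row per sliding time window, one column per threshold level,
    giving a spatiotemporal (time x threshold) topological signature."""
    return [[euler_characteristic(x[i:i + window], t) for t in thresholds]
            for i in range(0, len(x) - window + 1, step)]
```

Each pass over a window is linear in its length, consistent with the abstract's claim that $\chi$-based signatures avoid the cost of full persistent homology.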

new Sampling-guided exploration of active feature selection policies

Authors: Gabriel Bernardino, Anders Jonsson, Patrick Clarysse, Nicolas Duchateau

Abstract: Determining the most appropriate features for machine learning predictive models is challenging with respect to both predictive performance and feature acquisition costs. In particular, a global feature choice is limited given that some features will only benefit a subset of instances. In previous work, we proposed a reinforcement learning approach to sequentially recommend which modality to acquire next to reach the best information/cost ratio, based on the instance-specific information already acquired. We formulated the problem as a Markov Decision Process where the state's dimensionality changes during the episode, avoiding data imputation, contrary to existing works. However, this only allowed processing a small number of features, as all possible combinations of features were considered. Here, we address these limitations with two contributions: 1) we expand our framework to larger datasets with a heuristic-based strategy that focuses on the most promising feature combinations, and 2) we introduce a post-fit regularisation strategy that reduces the number of different feature combinations, leading to compact sequences of decisions. We tested our method on four binary classification datasets (one involving high-dimensional variables), the largest of which had 56 features and 4500 samples. We obtained better performance than state-of-the-art methods, both in terms of accuracy and policy complexity.

new Establishing Construct Validity in LLM Capability Benchmarks Requires Nomological Networks

Authors: Timo Freiesleben

Abstract: Recent work in machine learning increasingly attributes human-like capabilities such as reasoning or theory of mind to large language models (LLMs) on the basis of benchmark performance. This paper examines this practice through the lens of construct validity, understood as the problem of linking theoretical capabilities to their empirical measurements. It contrasts three influential frameworks: the nomological account developed by Cronbach and Meehl, the inferential account proposed by Messick and refined by Kane, and Borsboom's causal account. I argue that the nomological account provides the most suitable foundation for current LLM capability research. It avoids the strong ontological commitments of the causal account while offering a more substantive framework for articulating construct meaning than the inferential account. I explore the conceptual implications of adopting the nomological account for LLM research through a concrete case: the assessment of reasoning capabilities in LLMs.

new Safe Flow Q-Learning: Offline Safe Reinforcement Learning with Reachability-Based Flow Policies

Authors: Mumuksh Tayal, Manan Tayal, Ravi Prakash

Abstract: Offline safe reinforcement learning (RL) seeks reward-maximizing policies from static datasets under strict safety constraints. Existing methods often rely on soft expected-cost objectives or iterative generative inference, which can be insufficient for safety-critical real-time control. We propose Safe Flow Q-Learning (SafeFQL), which extends flow Q-learning (FQL) to safe offline RL by combining a Hamilton--Jacobi reachability-inspired safety value function with an efficient one-step flow policy. SafeFQL learns the safety value via a self-consistency Bellman recursion, trains a flow policy by behavioral cloning, and distills it into a one-step actor for reward-maximizing safe action selection without rejection sampling at deployment. To account for finite-data approximation error in the learned safety boundary, we add a conformal prediction calibration step that adjusts the safety threshold and provides finite-sample probabilistic safety coverage. Empirically, SafeFQL trades modestly higher offline training cost for substantially lower inference latency than diffusion-style safe generative baselines, which is advantageous for real-time safety-critical deployment. Across boat-navigation and Safety Gymnasium MuJoCo tasks, SafeFQL matches or exceeds prior offline safe RL performance while substantially reducing constraint violations.

new Accelerating Byzantine-Robust Distributed Learning with Compressed Communication via Double Momentum and Variance Reduction

Authors: Yanghao Li, Changxin Liu, Yuhao Yi

Abstract: In collaborative and distributed learning, Byzantine robustness is a central requirement for optimization algorithms. Such distributed algorithms often require transmitting a large number of parameters, so communication compression is essential for an effective solution. In this paper, we propose Byz-DM21, a novel Byzantine-robust and communication-efficient stochastic distributed learning algorithm. Our key innovation is a novel gradient estimator based on a double-momentum mechanism, integrating recent advancements in error feedback techniques. Using this estimator, we design both standard and accelerated algorithms that eliminate the need for large batch sizes while maintaining robustness against Byzantine workers. We prove that the Byz-DM21 algorithm has a smaller neighborhood size and converges to $\varepsilon$-stationary points in $\mathcal{O}(\varepsilon^{-4})$ iterations. To further enhance efficiency, we introduce a distributed variant called Byz-VR-DM21, which incorporates local variance reduction at each node to progressively eliminate variance from random approximations. We show that Byz-VR-DM21 provably converges to $\varepsilon$-stationary points in $\mathcal{O}(\varepsilon^{-3})$ iterations. Additionally, we extend our results to the case where the functions satisfy the Polyak-{\L}ojasiewicz condition. Finally, numerical experiments demonstrate the effectiveness of the proposed method.

new Point-Identification of a Robust Predictor Under Latent Shift with Imperfect Proxies

Authors: Zahra Rahiminasab, Reza Soumi, Arto Klami, Samuel Kaski

Abstract: Addressing the domain adaptation problem becomes more challenging when distribution shifts across domains stem from latent confounders that affect both covariates and outcomes. Existing proxy-based approaches that address latent shift rely on a strong completeness assumption to uniquely determine (point-identify) a robust predictor. Completeness requires that proxies have sufficient information about variations in latent confounders. For imperfect proxies the mapping from confounders to the space of proxy distributions is non-injective, and multiple latent confounder values can generate the same proxy distribution. This breaks the completeness assumption and observed data are consistent with multiple potential predictors (set-identified). To address this, we introduce latent equivalence classes (LECs), defined as groups of latent confounders that induce the same conditional proxy distribution. We show that point-identification for the robust predictor remains achievable as long as multiple domains differ sufficiently in how they mix proxy-induced LECs to form the robust predictor. This domain diversity condition is formalized as a cross-domain rank condition on the mixture weights, which is a substantially weaker assumption than completeness. We introduce the Proximal Quasi-Bayesian Active learning (PQAL) framework, which actively queries a minimal set of diverse domains that satisfy this rank condition. PQAL efficiently recovers the point-identified predictor, demonstrates robustness to varying degrees of shift, and outperforms previous methods on synthetic data and a semi-synthetic dSprites dataset.

new CATFormer: When Continual Learning Meets Spiking Transformers With Dynamic Thresholds

Authors: Vaishnavi Nagabhushana, Kartikay Agrawal, Ayon Borthakur

Abstract: Although deep neural networks perform extremely well in controlled environments, they fail in real-world scenarios where data is not available all at once, and the model must adapt to a new data distribution that may or may not follow the initial distribution. Previously acquired knowledge is lost during subsequent updates based on new data, a phenomenon commonly known as catastrophic forgetting. In contrast, the brain can learn without such catastrophic forgetting, irrespective of the number of tasks it encounters. Existing spiking neural networks (SNNs) for class-incremental learning (CIL) suffer a sharp performance drop as tasks accumulate. We here introduce CATFormer (Context Adaptive Threshold Transformer), a scalable framework that overcomes this limitation. We observe that the key to preventing forgetting in SNNs lies not only in synaptic plasticity but also in modulating neuronal excitability. At the core of CATFormer is the Dynamic Threshold Leaky Integrate-and-Fire (DTLIF) neuron model, which leverages context-adaptive thresholds as the primary mechanism for knowledge retention. This is paired with a Gated Dynamic Head Selection (G-DHS) mechanism for task-agnostic inference. Extensive evaluation on both static (CIFAR-10/100/Tiny-ImageNet) and neuromorphic (CIFAR10-DVS/SHD) datasets reveals that CATFormer outperforms existing rehearsal-free CIL algorithms across various task splits, establishing it as an ideal architecture for energy-efficient, true-class incremental learning.
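The excitability-modulation idea above can be sketched with a generic adaptive-threshold leaky integrate-and-fire update, in which the firing threshold jumps after each spike and relaxes back toward a baseline. All constants (`tau`, `base`, `jump`, `relax`) are illustrative assumptions; this is not the paper's DTLIF model, which additionally conditions the threshold on task context.

```python
def adaptive_lif_step(v, thresh, x, tau=0.9, base=1.0, jump=0.2, relax=0.95):
    """One step of a leaky integrate-and-fire neuron with an adaptive
    threshold: firing raises the threshold (lowering excitability), and the
    threshold then decays back toward its baseline between spikes."""
    v = tau * v + x                          # leaky membrane integration
    spike = 1 if v >= thresh else 0
    if spike:
        v = 0.0                              # hard reset on spike
        thresh += jump                       # post-spike threshold increase
    thresh = base + relax * (thresh - base)  # relax toward the baseline
    return v, thresh, spike
```

Driving the neuron with a constant input shows the mechanism: the first spike raises the threshold, so subsequent inputs must accumulate longer before the neuron fires again.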

new Joint Routing and Model Pruning for Decentralized Federated Learning in Bandwidth-Constrained Multi-Hop Wireless Networks

Authors: Xiaoyu He, Weicai Li, Tiejun Lv, Xi Yu

Abstract: Decentralized federated learning (D-FL) enables privacy-preserving training without a central server, but multi-hop model exchanges and aggregation are often bottlenecked by communication resource constraints. To address this issue, we propose a joint routing-and-pruning framework that optimizes routing paths and pruning rates to maintain communication latency within prescribed limits. We analyze how the sum of model biases across all clients affects the convergence bound of D-FL and formulate an optimization problem that maximizes the model retention rate to minimize these biases under communication constraints. Further analysis reveals that each client's model retention rate is path-dependent, which reduces the original problem to a routing optimization. Leveraging this insight, we develop a routing algorithm that selects latency-efficient transmission paths, allowing more parameters to be delivered within the time budget and thereby improving D-FL convergence. Simulations demonstrate that, compared with unpruned systems, the proposed framework reduces average transmission latency by 27.8% and improves testing accuracy by approximately 12%. Furthermore, relative to standard benchmark routing algorithms, the proposed routing method improves accuracy by roughly 8%.

new PiGRAND: Physics-informed Graph Neural Diffusion for Intelligent Additive Manufacturing

Authors: Benjamin Uhrich, Tim H\"antschel, Erhard Rahm

Abstract: A comprehensive understanding of heat transport is essential for optimizing various mechanical and engineering applications, including 3D printing. Recent advances in machine learning, combined with physics-based models, have enabled a powerful fusion of numerical methods and data-driven algorithms. This progress is driven by the availability of limited sensor data in various engineering and scientific domains, where the cost of data collection and the inaccessibility of certain measurements are high. To this end, we present PiGRAND, a physics-informed graph neural diffusion framework. In order to reduce the computational complexity of graph learning, an efficient graph construction procedure was developed. Our approach is inspired by the explicit Euler and implicit Crank-Nicolson methods for modeling continuous heat transport, leveraging sub-learning models to ensure accurate diffusion across graph nodes. To enhance computational performance, our approach is combined with efficient transfer learning. We evaluate PiGRAND on thermal images from 3D printing, demonstrating significant improvements in prediction accuracy and computational performance compared to traditional graph neural diffusion (GRAND) and physics-informed neural networks (PINNs). These enhancements are attributed to the incorporation of physical principles derived from the theoretical study of partial differential equations (PDEs) into the learning model. The PiGRAND code is open-sourced on GitHub: https://github.com/bu32loxa/PiGRAND

URLs: https://github.com/bu32loxa/PiGRAND

new Massive Redundancy in Gradient Transport Enables Sparse Online Learning

Authors: Aur Shalev Merin

Abstract: Real-time recurrent learning (RTRL) computes exact online gradients by propagating a Jacobian tensor forward through recurrent dynamics, but at O(n^4) cost per step. Prior work has sought structured approximations (rank-1 compression, graph-based sparsity, Kronecker factorization). We show that, in the continuous error signal regime, the recurrent Jacobian is massively redundant: propagating through a random 6% of paths (k=4 of n=64) recovers 84 +/- 6% of full RTRL's adaptation ability across five seeds, and the absolute count k=4 remains effective from n=64 to n=256 (6% to 1.6%, recovery 84 to 78%), meaning sparse RTRL becomes relatively cheaper as networks grow. In RNNs, the recovery is selection-invariant (even adversarial path selection works) and exhibits a step-function transition from zero to any nonzero propagation. Spectral analysis reveals the mechanism: the Jacobian is full-rank but near-isotropic (condition numbers 2.6-6.5), so any random subset provides a directionally representative gradient estimate. On chaotic dynamics (Lorenz attractor), sparse propagation is more numerically stable than full RTRL (CV 13% vs. 88%), as subsampling avoids amplifying pathological spectral modes. The redundancy extends to LSTMs (k=4 matches full RTRL) and to transformers via sparse gradient transport (50% head sparsity outperforms the dense reference; 33% is borderline), with higher thresholds reflecting head specialization rather than isotropy. On real primate neural data, sparse RTRL (k=4) adapts online to cross-session electrode drift (80 +/- 11% recovery, 5 seeds), where sparse propagation is again more stable than full RTRL. Without continuous error signal, Jacobian propagation accumulates numerical drift and degrades all RTRL variants, a scope condition for all forward-mode methods. Results hold with SGD (92 +/- 1% recovery), suggesting independence from optimizer choice.

new Towards Foundation Models for Consensus Rank Aggregation

Authors: Yijun Jin, Simon Kl\"uttermann, Chiara Balestra, Emmanuel M\"uller

Abstract: Aggregating a consensus ranking from multiple input rankings is a fundamental problem with applications in recommendation systems, search engines, job recruitment, and elections. Despite decades of research in consensus ranking aggregation, minimizing the Kemeny distance remains computationally intractable. Specifically, determining an optimal aggregation of rankings with respect to the Kemeny distance is an NP-hard problem, limiting its practical application to relatively small-scale instances. We propose the Kemeny Transformer, a novel Transformer-based algorithm trained via reinforcement learning to efficiently approximate the Kemeny optimal ranking. Experimental results demonstrate that our model outperforms classical majority-heuristic and Markov-chain approaches, achieving substantially faster inference than integer linear programming solvers. Our approach thus offers a practical, scalable alternative for real-world ranking-aggregation tasks.
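The NP-hardness claim above is easy to make concrete: exact Kemeny aggregation minimizes the summed Kendall tau distance over all $n!$ candidate orderings, which is only feasible for tiny item sets. The brute-force sketch below illustrates the objective that the Kemeny Transformer learns to approximate; the function names are illustrative.

```python
from itertools import permutations

def kendall_tau(r1, r2):
    """Number of item pairs that the two rankings order differently."""
    pos1 = {item: i for i, item in enumerate(r1)}
    pos2 = {item: i for i, item in enumerate(r2)}
    items = list(r1)
    return sum(1
               for i in range(len(items)) for j in range(i + 1, len(items))
               if (pos1[items[i]] < pos1[items[j]])
               != (pos2[items[i]] < pos2[items[j]]))

def kemeny_consensus(rankings):
    """Exact Kemeny-optimal ranking by enumerating all n! candidates:
    minimize the total Kendall tau distance to the input rankings.
    Practical only for very small n, motivating learned approximations."""
    items = rankings[0]
    return min(permutations(items),
               key=lambda cand: sum(kendall_tau(cand, r) for r in rankings))
```

Even at n = 15 items the search space exceeds a trillion permutations, which is why integer linear programming and, per the abstract, learned policies are used instead.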

new ADV-0: Closed-Loop Min-Max Adversarial Training for Long-Tail Robustness in Autonomous Driving

Authors: Tong Nie, Yihong Tang, Junlin He, Yuewen Mei, Jie Sun, Lijun Sun, Wei Ma, Jian Sun

Abstract: Deploying autonomous driving systems requires robustness against long-tail scenarios that are rare but safety-critical. While adversarial training offers a promising solution, existing methods typically decouple scenario generation from policy optimization and rely on heuristic surrogates. This leads to objective misalignment and fails to capture the shifting failure modes of evolving policies. This paper presents ADV-0, a closed-loop min-max optimization framework that treats the interaction between driving policy (defender) and adversarial agent (attacker) as a zero-sum Markov game. By aligning the attacker's utility directly with the defender's objective, we reveal the optimal adversary distribution. To make this tractable, we cast dynamic adversary evolution as iterative preference learning, efficiently approximating this optimum and offering an algorithm-agnostic solution to the game. Theoretically, ADV-0 converges to a Nash Equilibrium and maximizes a certified lower bound on real-world performance. Experiments indicate that it effectively exposes diverse safety-critical failures and greatly enhances the generalizability of both learned policies and motion planners against unseen long-tail risks.

new Decomposing Probabilistic Scores: Reliability, Information Loss and Uncertainty

Authors: Arthur Charpentier, Agathe Fernandes-Machado

Abstract: Calibration is a conditional property that depends on the information retained by a predictor. We develop decomposition identities for arbitrary proper losses that make this dependence explicit. At any information level $\mathcal A$, the expected loss of an $\mathcal A$-measurable predictor splits into a proper-regret (reliability) term and a conditional entropy (residual uncertainty) term. For nested levels $\mathcal A\subseteq\mathcal B$, a chain decomposition quantifies the information gain from $\mathcal A$ to $\mathcal B$. Applied to classification with features $\boldsymbol{X}$ and score $S=s(\boldsymbol{X})$, this yields a three-term identity: miscalibration, a {\em grouping} term measuring information loss from $\boldsymbol{X}$ to $S$, and irreducible uncertainty at the feature level. We leverage the framework to analyze post-hoc recalibration, aggregation of calibrated models, and stagewise/boosting constructions, with explicit forms for Brier and log-loss.
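A concrete special case of the reliability/uncertainty split described above is Murphy's classical decomposition of the Brier score, Brier = reliability - resolution + uncertainty. The binned sketch below is illustrative only (the identity is exact when the score is constant within each bin) and is not the paper's general proper-loss framework.

```python
import numpy as np

def brier_decomposition(p, y, n_bins=10):
    """Murphy decomposition of the Brier score for binary outcomes y given
    forecast probabilities p: Brier = reliability - resolution + uncertainty
    (exact when p is constant within each bin)."""
    p, y = np.asarray(p, float), np.asarray(y, float)
    bins = np.minimum((p * n_bins).astype(int), n_bins - 1)
    ybar = y.mean()
    rel = res = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            w = mask.mean()
            rel += w * (p[mask].mean() - y[mask].mean()) ** 2  # miscalibration
            res += w * (y[mask].mean() - ybar) ** 2            # information gained
    unc = ybar * (1.0 - ybar)                                  # irreducible term
    return rel, res, unc
```

Here `rel` plays the role of the proper-regret (reliability) term and `res` measures the information the score retains about the outcome, mirroring the abstract's miscalibration / grouping / irreducible-uncertainty split at the level of a single score.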

new Mechanistic Foundations of Goal-Directed Control

Authors: Alma Lago

Abstract: Mechanistic interpretability has transformed the analysis of transformer circuits by decomposing model behavior into competing algorithms, identifying phase transitions during training, and deriving closed-form predictions for when and why strategies shift. However, this program has remained largely confined to sequence-prediction architectures, leaving embodied control systems without comparable mechanistic accounts. Here we extend this framework to sensorimotor-cognitive development, using infant motor learning as a model system. We show that foundational inductive biases give rise to causal control circuits, with learned gating mechanisms converging toward theoretically motivated uncertainty thresholds. The resulting dynamics reveal a clean phase transition in the arbitration gate whose commitment behavior is well described by a closed-form exponential moving-average surrogate. We identify context window k as the critical parameter governing circuit formation: below a minimum threshold (k$\leq$4) the arbitration mechanism cannot form; above it (k$\geq$8), gate confidence scales asymptotically as log k. A two-dimensional phase diagram further reveals task-demand-dependent route arbitration consistent with the prediction that prospective execution becomes advantageous only when prediction error remains within the task tolerance window. Together, these results provide a mechanistic account of how reactive and prospective control strategies emerge and compete during learning. More broadly, this work sharpens mechanistic accounts of cognitive development and provides principled guidance for the design of interpretable embodied agents.

new In-Context Symbolic Regression for Robustness-Improved Kolmogorov-Arnold Networks

Authors: Francesco Sovrano, Lidia Losavio, Giulia Vilone, Marc Langheinrich

Abstract: Symbolic regression aims to replace black-box predictors with concise analytical expressions that can be inspected and validated in scientific machine learning. Kolmogorov-Arnold Networks (KANs) are well suited to this goal because each connection between adjacent units (an "edge") is parametrised by a learnable univariate function that can, in principle, be replaced by a symbolic operator. In practice, however, symbolic extraction is a bottleneck: the standard KAN-to-symbol approach fits operators to each learned edge function in isolation, making the discrete choice sensitive to initialisation and non-convex parameter fitting, and ignoring how local substitutions interact through the full network. We study in-context symbolic regression for operator extraction in KANs, and present two complementary instantiations. Greedy in-context Symbolic Regression (GSR) performs greedy, in-context selection by choosing edge replacements according to end-to-end loss improvement after brief fine-tuning. Gated Matching Pursuit (GMP) amortises this in-context selection by training a differentiable gated operator layer that places an operator library behind sparse gates on each edge; after convergence, gates are discretised (optionally followed by a short in-context greedy refinement pass). We quantify robustness via one-factor-at-a-time (OFAT) hyper-parameter sweeps and assess both predictive error and qualitative consistency of recovered formulas. Across several experiments, greedy in-context symbolic regression achieves up to 99.8% reduction in median OFAT test MSE.

new Directional Embedding Smoothing for Robust Vision Language Models

Authors: Ye Wang, Jing Liu, Toshiaki Koike-Akino

Abstract: The safety and reliability of vision-language models (VLMs) are a crucial part of deploying trustworthy agentic AI systems. However, VLMs remain vulnerable to jailbreaking attacks that undermine their safety alignment to yield harmful outputs. In this work, we extend the Randomized Embedding Smoothing and Token Aggregation (RESTA) defense to VLMs and evaluate its performance against the JailBreakV-28K benchmark of multi-modal jailbreaking attacks. We find that RESTA is effective in reducing attack success rate over this diverse corpus of attacks, in particular, when employing directional embedding noise, where the injected noise is aligned with the original token embedding vectors. Our results demonstrate that RESTA can contribute to securing VLMs within agentic systems, as a lightweight, inference-time defense layer of an overall security framework.

new Faster Inference of Flow-Based Generative Models via Improved Data-Noise Coupling

Authors: Aram Davtyan, Leello Tadesse Dadi, Volkan Cevher, Paolo Favaro

Abstract: Conditional Flow Matching (CFM), a simulation-free method for training continuous normalizing flows, provides an efficient alternative to diffusion models for key tasks like image and video generation. The performance of CFM in solving these tasks depends on the way data is coupled with noise. A recent approach uses minibatch optimal transport (OT) to reassign noise-data pairs in each training step to streamline sampling trajectories and thus accelerate inference. However, its optimization is restricted to individual minibatches, limiting its effectiveness on large datasets. To address this shortcoming, we introduce LOOM-CFM (Looking Out Of Minibatch-CFM), a novel method to extend the scope of minibatch OT by preserving and optimizing these assignments across minibatches over training time. Our approach demonstrates consistent improvements in the sampling speed-quality trade-off across multiple datasets. LOOM-CFM also enhances distillation initialization and supports high-resolution synthesis in latent space training.
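The minibatch-OT reassignment step that LOOM-CFM extends can be illustrated in isolation: within one batch, choose the permutation pairing each data point with a noise sample so that the total squared transport cost is minimized, which straightens the flow-matching trajectories. The brute-force sketch below is a hypothetical illustration for tiny batches; in practice a Hungarian-algorithm solver (e.g. `scipy.optimize.linear_sum_assignment`) would be used.

```python
import numpy as np
from itertools import permutations

def ot_reassign(data, noise):
    """Return the permutation s minimizing sum_i ||data[i] - noise[s[i]]||^2,
    i.e. the optimal-transport pairing of noise to data within one minibatch.
    Brute force over all n! permutations; illustrative for tiny n only."""
    n = len(data)
    def cost(s):
        return sum(float(np.sum((data[i] - noise[s[i]]) ** 2)) for i in range(n))
    return min(permutations(range(n)), key=cost)
```

The abstract's point is that this pairing is only optimal within each minibatch; persisting and refining assignments across minibatches is what distinguishes LOOM-CFM from plain minibatch OT.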

new Evaluating the Robustness of Reinforcement Learning based Adaptive Traffic Signal Control

Authors: Dickens Kwesiga, Angshuman Guin, Khaled Abdelghany, Michael Hunter

Abstract: Reinforcement learning (RL) has attracted increasing interest for adaptive traffic signal control due to its model-free ability to learn control policies directly from interaction with the traffic environment. However, several challenges remain before RL-based signal control can be considered ready for field deployment. Many existing studies rely on simplified signal timing structures, the robustness of trained models under varying traffic demand conditions remains insufficiently evaluated, and runtime efficiency continues to pose challenges when training RL algorithms in microscopic traffic simulation environments. This study formulates an RL-based signal control algorithm capable of representing a full eight-phase ring-barrier configuration consistent with field signal controllers. The algorithm is trained and evaluated under varying traffic demand conditions and benchmarked against state-of-the-practice actuated signal control (ASC). To assess robustness, experiments are conducted across multiple traffic volumes and origin-destination (O-D) demand patterns with varying levels of structural similarity. To improve training efficiency, a distributed asynchronous training architecture is implemented that enables parallel simulation across multiple computing nodes. Results from a case study intersection show that the proposed RL-based signal control significantly outperforms optimized ASC, reducing average delay by 11-32% across movements. A model trained on a single O-D pattern generalizes well to similar unseen demand patterns but degrades under substantially different demand conditions. In contrast, a model trained on diverse O-D patterns demonstrates strong robustness, consistently outperforming ASC even under highly dissimilar unseen demand scenarios.

new Enhancing classification accuracy through chaos

Authors: Panos Stinis

Abstract: We propose a novel approach which exploits chaos to enhance classification accuracy. Specifically, the available data that need to be classified are treated as vectors that are first lifted into a higher-dimensional space and then used as initial conditions for the evolution of a chaotic dynamical system for a prescribed temporal interval. The evolved state of the dynamical system is then fed to a trainable softmax classifier which outputs the probabilities of the various classes. As proof-of-concept, we use samples of randomly perturbed orthogonal vectors of moderate dimension (2 to 20), with a corresponding number of classes equal to the vector dimension, and show how our approach can both significantly accelerate the training process and improve the classification accuracy compared to a standard softmax classifier which operates on the original vectors, as well as a softmax classifier which only lifts the vectors to a higher-dimensional space without evolving them. We also provide an explanation for the improved performance of the chaos-enhanced classifier.
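The lift-then-evolve pipeline can be sketched as follows. The abstract does not specify the chaotic system, so a logistic map is used here purely as a stand-in; the projection dimension, step count, and sigmoid squashing are likewise assumptions:

```python
import numpy as np

def chaotic_lift(X, dim=32, steps=20, r=3.9, seed=0):
    """Sketch of the chaos-enhanced pipeline: lift input vectors to a higher
    dimension via a random projection, squash them into (0, 1), then evolve
    each coordinate under a chaotic logistic map before handing the evolved
    state to a trainable softmax classifier."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], dim))
    z = 1.0 / (1.0 + np.exp(-X @ W))   # lift + squash into the map's domain
    for _ in range(steps):
        z = r * z * (1.0 - z)          # logistic map; chaotic for r = 3.9
    return z
```

The sensitive dependence on initial conditions spreads nearby class prototypes apart, which is the intuition the paper offers for the improved separability.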

new xplainfi: Feature Importance and Statistical Inference for Machine Learning in R

Authors: Lukas Burk, Fiona Katharina Ewald, Giuseppe Casalicchio, Marvin N. Wright, Bernd Bischl

Abstract: We introduce xplainfi, an R package built on top of the mlr3 ecosystem for global, loss-based feature importance methods for machine learning models. Various feature importance methods exist in R, but significant gaps remain, particularly regarding conditional importance methods and associated statistical inference procedures. The package implements permutation feature importance, conditional feature importance, relative feature importance, leave-one-covariate-out, and generalizations thereof, as well as both marginal and conditional Shapley additive global importance methods. It provides a modular conditional sampling architecture based on Gaussian distributions, adversarial random forests, conditional inference trees, and knockoff-based samplers, which enables conditional importance analysis for continuous and mixed data. Statistical inference is available through multiple approaches, including variance-corrected confidence intervals and the conditional predictive impact framework. We demonstrate that xplainfi produces importance scores consistent with existing implementations across multiple simulation settings and learner types, while offering competitive runtime performance. The package is available on CRAN and provides researchers and practitioners with a comprehensive toolkit for feature importance analysis and model interpretation in R.
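The simplest method in the package's catalogue, permutation feature importance, is easy to sketch. The package itself is R/mlr3-based; this language-agnostic Python version only illustrates the underlying definition, not xplainfi's API:

```python
import numpy as np

def permutation_importance(model, X, y, loss, n_repeats=5, seed=0):
    """Permutation feature importance (PFI) sketch: a feature's importance is
    the mean increase in loss after shuffling its column, which breaks the
    feature's association with the target while preserving its marginal."""
    rng = np.random.default_rng(seed)
    base = loss(y, model(X))
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])
            scores[j] += loss(y, model(Xp)) - base
    return scores / n_repeats
```

The conditional variants the package emphasizes replace the marginal shuffle above with draws from a feature's conditional distribution given the remaining features, which is where the Gaussian, forest, and knockoff samplers come in.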

new A Kolmogorov-Arnold Surrogate Model for Chemical Equilibria: Application to Solid Solutions

Authors: Leonardo Boledi, Dirk Bosbach, Jenna Poonoosamy

Abstract: The computational cost of geochemical solvers is a major challenge. For reactive transport simulations, where chemical calculations are performed up to billions of times, it is crucial to reduce the total computational time. Existing publications have explored various machine-learning approaches to determine the most effective data-driven surrogate model. In particular, multilayer perceptrons are widely employed due to their ability to recognize nonlinear relationships. In this work, we focus on the recent Kolmogorov-Arnold networks, where learnable spline-based functions replace classical fixed activation functions. This architecture has achieved higher accuracy with fewer trainable parameters and has become increasingly popular for solving partial differential equations. First, we train a surrogate model based on an existing cement system benchmark. Then, we move to an application case for the geological disposal of nuclear waste, i.e., the determination of the solubilities of radionuclide-bearing solids. To the best of our knowledge, this work is the first to investigate co-precipitation with radionuclide incorporation using data-driven surrogate models, considering increasing levels of thermodynamic complexity from simple mechanical mixtures to non-ideal solid solutions of binary (Ba,Ra)SO$_4$ and ternary (Sr,Ba,Ra)SO$_4$ systems. On the cement benchmark, we demonstrate that the Kolmogorov-Arnold architecture outperforms multilayer perceptrons in both absolute and relative error metrics, reducing them by 62% and 59%, respectively. On the binary and ternary radium solid solution models, Kolmogorov-Arnold networks maintain median prediction errors near $1\times10^{-3}$. This is the first step toward employing surrogate models to speed up reactive transport simulations and optimize the safety assessment of deep geological waste repositories.

new CASHomon Sets: Efficient Rashomon Sets Across Multiple Model Classes and their Hyperparameters

Authors: Fiona Katharina Ewald, Martin Binder, Matthias Feurer, Bernd Bischl, Giuseppe Casalicchio

Abstract: Rashomon sets are model sets within one model class that perform nearly as well as a reference model from the same model class. They reveal the existence of alternative well-performing models, which may support different interpretations. This enables selecting models that match domain knowledge, hidden constraints, or user preferences. However, efficient construction methods currently exist for only a few model classes. Applied machine learning usually searches many model classes, and the best class is unknown beforehand. We therefore study Rashomon sets in the combined algorithm selection and hyperparameter optimization (CASH) setting and call them CASHomon sets. We propose TruVaRImp, a model-based active learning algorithm for level set estimation with an implicit threshold, and provide convergence guarantees. On synthetic and real-world datasets, TruVaRImp reliably identifies CASHomon set members and matches or outperforms naive sampling, Bayesian optimization, classical and implicit level set estimation methods, and other baselines. Our analyses of predictive multiplicity and feature-importance variability across model classes question the common practice of interpreting data through a single model class.
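The defining criterion of a Rashomon set can be stated as a one-line filter. This brute-force sketch (multiplicative tolerance `eps` is an assumption; additive thresholds are also common) only illustrates the definition; the paper's TruVaRImp finds such sets actively without evaluating every candidate:

```python
def rashomon_set(models, losses, eps=0.05):
    """Minimal Rashomon-set sketch: keep every candidate model whose loss is
    within a (1 + eps) band of the best observed loss."""
    best = min(losses)
    threshold = best * (1.0 + eps)
    return [m for m, l in zip(models, losses) if l <= threshold]
```

In the CASH setting the candidates span multiple model classes and their hyperparameter configurations, which is what makes exhaustive filtering infeasible and motivates the active level-set approach.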

new Data Augmentation via Causal-Residual Bootstrapping

Authors: Mateusz Gajewski, Sophia Xiao, Bijan Mazaheri

Abstract: Data augmentation integrates domain knowledge into a dataset by making domain-informed modifications to existing data points. For example, image data can be augmented by duplicating images in different tints or orientations, thereby incorporating the knowledge that images may vary in these dimensions. Recent work by Teshima and Sugiyama has explored the integration of causal knowledge (e.g., A causes B causes C) up to conditional independence equivalence. We suggest a related approach for settings with additive noise that can incorporate information beyond a Markov equivalence class. The approach, built on the principle of independent mechanisms, permutes the residuals of models built on marginal probability distributions. Predictive models built on our augmented data demonstrate improved accuracy, for which we provide theoretical backing in linear Gaussian settings.
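The residual-permutation mechanism can be sketched for the simplest chain A -> B with a linear additive-noise model. Details beyond the abstract (the linear fit, the resampling scheme) are assumptions for illustration:

```python
import numpy as np

def residual_bootstrap_augment(a, b, n_new, seed=0):
    """Causal-residual augmentation sketch for an additive-noise mechanism
    B = f(A) + e: fit f by least squares, recover the residuals e, then pair
    resampled causes with independently permuted residuals to synthesize new
    (A, B) points consistent with the causal model."""
    rng = np.random.default_rng(seed)
    # Linear mechanism fit: b ~ slope * a + intercept.
    slope, intercept = np.polyfit(a, b, 1)
    resid = b - (slope * a + intercept)
    idx_a = rng.integers(0, len(a), n_new)        # resample the causes
    idx_e = rng.permutation(len(a))[:n_new]       # permute the noise terms
    a_new = a[idx_a]
    b_new = slope * a_new + intercept + resid[idx_e]
    return a_new, b_new
```

The principle of independent mechanisms is what licenses the shuffle: since the noise is independent of the cause, any pairing of residuals with causes is equally plausible under the assumed model.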

new Conditional Rectified Flow-based End-to-End Rapid Seismic Inversion Method

Authors: Haofei Xu, Wei Cheng, Sizhe Li, Jie Xiong

Abstract: Seismic inversion is a core problem in geophysical exploration, where traditional methods suffer from high computational costs and are susceptible to initial model dependence. In recent years, deep generative model-based seismic inversion methods have achieved remarkable progress, but existing generative models struggle to balance sampling efficiency and inversion accuracy. This paper proposes an end-to-end fast seismic inversion method based on Conditional Rectified Flow [1], which designs a dedicated seismic encoder to extract multi-scale seismic features and adopts a layer-by-layer injection control strategy to achieve fine-grained conditional control. Experimental results demonstrate that the proposed method achieves excellent inversion accuracy on the OpenFWI [2] benchmark dataset: compared with diffusion-based methods [3,4], it achieves faster sampling, and compared with InversionNet [5,6,7] methods, it achieves higher accuracy. Zero-shot generalization experiments on the Marmousi [8,9] standard model further verify that the method can generate high-quality initial velocity models without task-specific retraining, effectively alleviating the initial model dependence problem in traditional Full Waveform Inversion (FWI) and demonstrating practical industrial value.
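The sampling-speed advantage of rectified flow over diffusion comes from integrating a learned velocity field along near-straight paths. This generic sketch (the paper's conditional seismic encoder and injection strategy are not modeled) shows the few-step Euler sampler such methods use:

```python
import numpy as np

def rectified_flow_sample(v, x0, n_steps=10):
    """Generic rectified-flow sampling sketch: integrate dx/dt = v(x, t) from
    noise x0 at t = 0 to a sample at t = 1 with a handful of Euler steps.
    Near-straight trajectories are what make few-step sampling viable."""
    x, dt = x0.astype(float).copy(), 1.0 / n_steps
    for i in range(n_steps):
        x += dt * v(x, i * dt)
    return x
```

For a perfectly rectified (straight) flow, even a single Euler step recovers the target, which is the limiting case behind the acceleration claims.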

new FuXiWeather2: Learning accurate atmospheric state estimation for operational global weather forecasting

Authors: Xiaoze Xu, Xiuyu Sun, Songling Zhu, Xiaohui Zhong, Yuanqing Huang, Zijian Zhu, Jun Liu, Hao Li

Abstract: Numerical weather prediction has long been constrained by the computational bottlenecks inherent in data assimilation and numerical modeling. While machine learning has accelerated forecasting, existing models largely serve as "emulators of reanalysis products," thereby retaining their systematic biases and operational latencies. Here, we present FuXiWeather2, a unified end-to-end neural framework for assimilation and forecasting. We align training objectives directly with a combination of real-world observations and reanalysis data, enabling the framework to effectively rectify inherent errors within reanalysis products. To address the distribution shift between NWP-derived background inputs during training and self-generated backgrounds during deployment, we introduce a recursive unrolling training method to enhance the precision and stability of analysis generation. Furthermore, our model is trained on a hybrid dataset of raw and simulated observations to mitigate the impact of observational distribution inconsistency. FuXiWeather2 generates high-resolution ($0.25^{\circ}$) global analysis fields and 10-day forecasts within minutes. The analysis fields surpass the NCEP-GFS across most variables and demonstrate superior accuracy over both ERA5 and the ECMWF-HRES system in lower-tropospheric and surface variables. These high-quality analysis fields drive deterministic forecasts that exceed the skill of the HRES system in 91\% of evaluated metrics. Additionally, its outstanding performance in typhoon track prediction underscores its practical value for rapid response to extreme weather events. The FuXiWeather2 analysis dataset is available at https://doi.org/10.5281/zenodo.18872728.

URLs: https://doi.org/10.5281/zenodo.18872728

new Deep learning and the rate of approximation by flows

Authors: Jingpu Cheng, Qianxiao Li, Ting Lin, Zuowei Shen

Abstract: We investigate the dependence of the approximation capacity of deep residual networks on their depth in a continuous dynamical systems setting. This can be formulated as the general problem of quantifying the minimal time-horizon required to approximate a diffeomorphism by flows driven by a given family $\mathcal F$ of vector fields. We show that this minimal time can be identified as a geodesic distance on a sub-Finsler manifold of diffeomorphisms, where the local geometry is characterised by a variational principle involving $\mathcal F$. This connects the learning efficiency of target relationships to their compatibility with the learning architectural choice. Further, the results suggest that the key approximation mechanism in deep learning, namely the approximation of functions by composition or dynamics, differs in a fundamental way from linear approximation theory, where linear spaces and norm-based rate estimates are replaced by manifolds and geodesic distances.

new GradCFA: A Hybrid Gradient-Based Counterfactual and Feature Attribution Explanation Algorithm for Local Interpretation of Neural Networks

Authors: Jacob Sanderson, Hua Mao, Wai Lok Woo

Abstract: Explainable Artificial Intelligence (XAI) is increasingly essential as AI systems are deployed in critical fields such as healthcare and finance, offering transparency into AI-driven decisions. Two major XAI paradigms, counterfactual explanations (CFX) and feature attribution (FA), serve distinct roles in model interpretability. This study introduces GradCFA, a hybrid framework combining CFX and FA to improve interpretability by explicitly optimizing feasibility, plausibility, and diversity - key qualities often unbalanced in existing methods. Unlike most CFX research focused on binary classification, GradCFA extends to multi-class scenarios, supporting a wider range of applications. We evaluate GradCFA's validity, proximity, sparsity, plausibility, and diversity against state-of-the-art methods, including Wachter, DiCE, CARE for CFX, and SHAP for FA. Results show GradCFA effectively generates feasible, plausible, and diverse counterfactuals while offering valuable FA insights. By identifying influential features and validating their impact, GradCFA advances AI interpretability. The code for the implementation of this work can be found at: https://github.com/jacob-ws/GradCFs.
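The gradient-based counterfactual core can be illustrated on a logistic model. This toy omits GradCFA's plausibility and diversity objectives and uses an assumed early-stopping criterion; it only shows the basic "follow the gradient until the class flips" mechanism:

```python
import numpy as np

def gradient_counterfactual(x, w, b, target=1, lr=0.5, steps=200):
    """Illustrative gradient-based counterfactual for a logistic model
    sigmoid(w @ x + b): nudge the input toward the target class and stop as
    soon as the prediction flips, keeping the edit small."""
    x = x.astype(float).copy()
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(w @ x + b)))
        if (p > 0.5) == (target == 1):
            break
        # The gradient of the log-odds w.r.t. x is simply w for this model.
        x += lr * w if target == 1 else -lr * w
    return x
```

Because the same gradients also rank feature influence, a hybrid CFX/FA method can reuse one backward pass for both kinds of explanation, which is the design idea the abstract describes.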

URLs: https://github.com/jacob-ws/GradCFs

new More Test-Time Compute Can Hurt: Overestimation Bias in LLM Beam Search

Authors: Gal Dalal, Assaf Hallak, Gal Chechik, Yftah Ziser

Abstract: Wider beam search should improve LLM reasoning, but when should you stop widening? Prior work on beam width selection has focused on inference efficiency \citep{qin2025dsbd, freitag2017beam}, without analyzing whether wider search can \emph{hurt} output quality. We present an analysis, grounded in Extreme Value Theory, that answers this question. Beam selection over noisy scorer outputs introduces a systematic overestimation bias that grows with the candidate pool size, and we derive a maximum useful beam width $\hat{k}$ beyond which search degrades performance. This critical width depends on the signal-to-noise ratio of the scorer: $\hat{k}$ grows exponentially with $(\Delta/\sigma)^2$, where $\Delta > 0$ is the quality advantage of correct paths over incorrect ones and $\sigma$ is the scorer noise. We validate this theory by comparing perplexity-guided and PRM-guided beam search across three 7B-parameter models and ten domains on MR-BEN (5,975 questions). Perplexity scoring, with its high noise, yields $\hat{k} = 1$: search provides no benefit at any width tested. PRM scoring, with lower noise, yields $\hat{k} \geq 4$, with gains of up to 8.9 percentage points. The same model, the same algorithm, but different scorers place $\hat{k}$ at opposite ends of the beam width range. Our analysis identifies the scorer's signal-to-noise ratio as the key quantity governing beam width selection, and we propose diagnostic indicators for choosing the beam width in practice.
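The overestimation bias at the heart of this analysis is easy to reproduce by simulation. This sketch is not the paper's derivation, only a Monte-Carlo illustration of the extreme-value effect it formalizes:

```python
import numpy as np

def selection_bias(k, sigma, n_trials=20000, seed=0):
    """Selecting the max of k noisy scores (true value 0, noise N(0, sigma^2))
    systematically overestimates the best candidate; the bias grows roughly
    like sigma * sqrt(2 ln k), the Extreme Value Theory rate."""
    rng = np.random.default_rng(seed)
    scores = sigma * rng.standard_normal((n_trials, k))
    return float(scores.max(axis=1).mean())  # E[max] minus the true value 0
```

When the true quality gap between correct and incorrect paths is smaller than this selection bias, widening the beam starts picking lucky noise rather than better reasoning paths, which is why a maximum useful width exists.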

new Efficient Morphology-Control Co-Design via Stackelberg Proximal Policy Optimization

Authors: Yanning Dai, Yuhui Wang, Dylan R. Ashley, J\"urgen Schmidhuber

Abstract: Morphology-control co-design concerns the coupled optimization of an agent's body structure and control policy. This problem exhibits a bi-level structure, where the control dynamically adapts to the morphology to maximize performance. Existing methods typically neglect the control's adaptation dynamics by adopting a single-level formulation that treats the control policy as fixed when optimizing morphology. This can lead to inefficient optimization, as morphology updates may be misaligned with control adaptation. In this paper, we revisit the co-design problem from a game-theoretic perspective, modeling the intrinsic coupling between morphology and control as a novel variant of a Stackelberg game. We propose Stackelberg Proximal Policy Optimization (Stackelberg PPO), which explicitly incorporates the control's adaptation dynamics into morphology optimization. By modeling this intrinsic coupling, our method aligns morphology updates with control adaptation, thereby stabilizing training and improving learning efficiency. Experiments across diverse co-design tasks demonstrate that Stackelberg PPO outperforms standard PPO in both stability and final performance, opening the way for dramatically more efficient robotics designs.

new Local Urysohn Width: A Topological Complexity Measure for Classification

Authors: Xin Li

Abstract: We introduce \emph{local Urysohn width}, a complexity measure for classification problems on metric spaces. Unlike VC dimension, fat-shattering dimension, and Rademacher complexity, which characterize the richness of hypothesis \emph{classes}, Urysohn width characterizes the topological-geometric complexity of the classification \emph{problem itself}: the minimum number of connected, diameter-bounded local experts needed to correctly classify all points within a margin-safe region. We prove four main results. First, a \textbf{strict hierarchy theorem}: for every integer $w \geq 1$, there exists a classification problem on a \emph{connected} compact metric space (a bouquet of circles with first Betti number $\beta_1 = w$) whose Urysohn width is exactly~$w$, establishing that topological complexity of the input space forces classifier complexity. Second, a \textbf{topology $\times$ geometry scaling law}: width scales as $\Omega(w \cdot L/D_0)$, where $w$ counts independent loops and $L/D_0$ is the ratio of loop circumference to locality scale. Third, a \textbf{two-way separation from VC dimension}: there exist problem families where width grows unboundedly while VC dimension is bounded by a constant, and conversely, families where VC dimension grows unboundedly while width remains~1. Fourth, a \textbf{sample complexity lower bound}: any learner that must correctly classify all points in the safe region of a width-$w$ problem needs $\Omega(w \log w)$ samples, independent of VC dimension.

new RESQ: A Unified Framework for REliability- and Security Enhancement of Quantized Deep Neural Networks

Authors: Ali Soltan Mohammadi, Samira Nazari, Ali Azarpeyvand, Mahdi Taheri, Milos Krstic, Michael Huebner, Christian Herglotz, Tara Ghasempouri

Abstract: This work proposes a unified three-stage framework that produces a quantized DNN with balanced fault and attack robustness. The first stage improves attack resilience via fine-tuning that desensitizes feature representations to small input perturbations. The second stage reinforces fault resilience through fault-aware fine-tuning under simulated bit-flip faults. Finally, a lightweight post-training adjustment integrates quantization to enhance efficiency and further mitigate fault sensitivity without degrading attack resilience. Experiments on ResNet18, VGG16, EfficientNet, and Swin-Tiny on CIFAR-10, CIFAR-100, and GTSRB show consistent gains of up to 10.35% in attack resilience and 12.47% in fault resilience, while maintaining competitive accuracy in quantized networks. The results also highlight an asymmetric interaction in which improvements in fault resilience generally increase resilience to adversarial attacks, whereas enhanced adversarial resilience does not necessarily lead to higher fault resilience.

new Amplification Effects in Test-Time Reinforcement Learning: Safety and Reasoning Vulnerabilities

Authors: Vanshaj Khattar, Md Rafi ur Rashid, Moumita Choudhury, Jing Liu, Toshiaki Koike-Akino, Ming Jin, Ye Wang

Abstract: Test-time training (TTT) has recently emerged as a promising method to improve the reasoning abilities of large language models (LLMs), in which the model directly learns from test data without access to labels. However, this reliance on test data also makes TTT methods vulnerable to harmful prompt injections. In this paper, we investigate safety vulnerabilities of TTT methods, where we study a representative self-consistency-based test-time learning method: test-time reinforcement learning (TTRL), a recent TTT method that improves LLM reasoning by rewarding self-consistency using majority vote as a reward signal. We show that harmful prompt injection during TTRL amplifies the model's existing behaviors, i.e., safety amplification when the base model is relatively safe, and harmfulness amplification when it is vulnerable to the injected data. In both cases, there is a decline in reasoning ability, which we refer to as the reasoning tax. We also show that TTT methods such as TTRL can be exploited adversarially using specially designed "HarmInject" prompts to force the model to answer jailbreak and reasoning queries together, resulting in stronger harmfulness amplification. Overall, our results highlight that TTT methods that enhance LLM reasoning by promoting self-consistency can lead to amplification behaviors and reasoning degradation, highlighting the need for safer TTT methods.
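TTRL's label-free reward described here reduces to a majority-vote check over sampled answers. A minimal sketch (answer matching by string equality is an assumption; real systems normalize answers first):

```python
from collections import Counter

def majority_vote_rewards(answers):
    """TTRL-style self-consistency reward sketch: with no labels at test time,
    each sampled answer gets reward 1 if it matches the group's majority
    answer and 0 otherwise."""
    majority, _ = Counter(answers).most_common(1)[0]
    return [1 if a == majority else 0 for a in answers]
```

This also makes the amplification effect concrete: whatever behavior dominates the samples, safe or harmful, is exactly what the reward reinforces, so injected prompts that shift the majority shift the training signal with it.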

new Physics-informed fine-tuning of foundation models for partial differential equations

Authors: Vlad Medvedev, Leon Armbruster, Christopher Straub, Georg Kruse, Andreas Rosskopf

Abstract: Foundation models for partial differential equations (PDEs) have emerged as powerful surrogates pre-trained on diverse physical systems, but adapting them to new downstream tasks remains challenging due to limited task-specific data and distribution shifts. While fine-tuning has proven transformative in natural language processing, best practices for adapting PDE foundation models remain underexplored. Although physics-informed training has successfully trained accurate solvers across a wide range of PDE problems, its potential for fine-tuning data-based foundation models has not been systematically studied. In this work, we introduce a physics-informed fine-tuning framework that adapts pre-trained PDE foundation models by incorporating physical constraints (PDE residuals and boundary conditions) directly into the fine-tuning objective. This enables effective adaptation in data-scarce regimes while promoting physical consistency. We evaluate our method on a downstream task composed of an unseen PDE class and compare it with data-driven fine-tuning counterparts. Our results demonstrate that physics-informed fine-tuning achieves competitive accuracy without requiring PDE solutions for training. Furthermore, a hybrid fine-tuning strategy yields superior generalization to out-of-distribution scenarios when only minimal training data is available. These findings establish physics-informed fine-tuning as a scalable and data-efficient paradigm, providing a physically interpretable pathway for adapting foundation models in scientific machine learning.
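The objective described here combines up to three terms. A minimal sketch (the weights `lam_*` and the mean-squared form are assumptions; the paper's exact weighting is not given in the abstract):

```python
import numpy as np

def fine_tune_objective(pred, data, residual, bc_err, lam_r=1.0, lam_b=1.0, lam_d=1.0):
    """Hybrid fine-tuning loss sketch: an optional data term when labels
    exist, plus PDE-residual and boundary-condition penalties that need no
    solution data. Setting data=None (or lam_d=0) recovers purely
    physics-informed fine-tuning."""
    data_term = lam_d * np.mean((pred - data) ** 2) if data is not None else 0.0
    return data_term + lam_r * np.mean(residual ** 2) + lam_b * np.mean(bc_err ** 2)
```

The data-scarce appeal is visible in the signature: the residual and boundary terms can be evaluated at arbitrarily many collocation points without any solved PDE instances.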

new TabKD: Tabular Knowledge Distillation through Interaction Diversity of Learned Feature Bins

Authors: Shovon Niverd Pereira, Krishna Khadka, Yu Lei

Abstract: Data-free knowledge distillation enables model compression without original training data, critical for privacy-sensitive tabular domains. However, existing methods do not perform well on tabular data because they do not explicitly address feature interactions, the fundamental way tabular models encode predictive knowledge. We identify interaction diversity, systematic coverage of feature combinations, as an essential requirement for effective tabular distillation. To operationalize this insight, we propose TabKD, which learns adaptive feature bins aligned with teacher decision boundaries, then generates synthetic queries that maximize pairwise interaction coverage. Across 4 benchmark datasets and 4 teacher architectures, TabKD achieves highest student-teacher agreement in 14 out of 16 configurations, outperforming 5 state-of-the-art baselines. We further show that interaction coverage strongly correlates with distillation quality, validating our core hypothesis. Our work establishes interaction-focused exploration as a principled framework for tabular model extraction.
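Pairwise interaction coverage, the quantity TabKD maximizes, can be made concrete with a counting sketch. The exact formulation in the paper is not given in the abstract, so this metric (fraction of feature-pair/bin-pair combinations hit) is an assumption in the spirit of covering arrays:

```python
from itertools import combinations

def pairwise_coverage(queries, n_bins):
    """Interaction-diversity sketch: each query is a tuple of per-feature bin
    indices; coverage is the fraction of all (feature pair, bin pair)
    combinations that at least one query realizes."""
    n_feat = len(queries[0])
    seen = {(i, j, q[i], q[j]) for q in queries for i, j in combinations(range(n_feat), 2)}
    total = len(list(combinations(range(n_feat), 2))) * n_bins * n_bins
    return len(seen) / total
```

Generating synthetic queries that push this ratio toward 1 forces the student to observe the teacher's behavior across feature combinations, not just marginal feature ranges.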

new Grokking as a Variance-Limited Phase Transition: Spectral Gating and the Epsilon-Stability Threshold

Authors: Pratyush Acharya, Habish Dhakal

Abstract: Standard optimization theories struggle to explain grokking, where generalization occurs long after training convergence. While geometric studies attribute this to slow drift, they often overlook the interaction between the optimizer's noise structure and landscape curvature. This work analyzes AdamW dynamics on modular arithmetic tasks, revealing a ``Spectral Gating'' mechanism that regulates the transition from memorization to generalization. We find that AdamW operates as a variance-gated stochastic system. Grokking is constrained by a stability condition: the generalizing solution resides in a sharp basin ($\lambda_{max}^H$) initially inaccessible under low-variance regimes. The ``delayed'' phase represents the accumulation of gradient variance required to lift the effective stability ceiling, permitting entry into this sharp manifold. Our ablation studies identify three complexity regimes: (1) \textbf{Capacity Collapse} ($P < 23$), where rank-deficiency prevents structural learning; (2) \textbf{The Variance-Limited Regime} ($P \approx 41$), where generalization waits for the spectral gate to open; and (3) \textbf{Stability Override} ($P > 67$), where memorization becomes dimensionally unstable. Furthermore, we challenge the "Flat Minima" hypothesis for algorithmic tasks, showing that isotropic noise injection fails to induce grokking. Generalization requires the \textit{anisotropic rectification} unique to adaptive optimizers, which directs noise into the tangent space of the solution manifold.

new Seeking SOTA: Time-Series Forecasting Must Adopt Taxonomy-Specific Evaluation to Dispel Illusory Gains

Authors: Raeid Saqur, Christoph Bergmeir, Blanka Horvath, Daniel Schmidt, Frank Rudzicz, Terry Lyons

Abstract: We argue that the current practice of evaluating AI/ML time-series forecasting models, predominantly on benchmarks characterized by strong, persistent periodicities and seasonalities, obscures real progress by overlooking the performance of efficient classical methods. We demonstrate that these "standard" datasets often exhibit dominant autocorrelation patterns and seasonal cycles that can be effectively captured by simpler linear or statistical models, rendering complex deep learning architectures frequently no more performant than their classical counterparts for these specific data characteristics, and raising questions as to whether any marginal improvements justify the significant increase in computational overhead and model complexity. We call on the community to (I) retire or substantially augment current benchmarks with datasets exhibiting a wider spectrum of non-stationarities, such as structural breaks, time-varying volatility, and concept drift, and less predictable dynamics drawn from diverse real-world domains, and (II) require every deep learning submission to include robust classical and simple baselines, appropriately chosen for the specific characteristics of the downstream tasks' time series. By doing so, we will help ensure that reported gains reflect genuine scientific methodological advances rather than artifacts of benchmark selection favoring models adept at learning repetitive patterns.
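The kind of trivial classical baseline the authors want every deep forecasting submission to report can be stated in one function. A minimal sketch of the seasonal naive forecaster (the abstract names no specific baseline; this is a standard representative):

```python
import numpy as np

def seasonal_naive(series, season, horizon):
    """Seasonal naive baseline: predict that each future step repeats the
    observed value exactly one season earlier. On benchmarks dominated by
    persistent periodicities, this is often hard to beat."""
    return np.array([series[len(series) - season + (t % season)]
                     for t in range(horizon)])
```

If a deep model's reported gain over such a baseline is marginal, the paper's argument is that the benchmark, not the architecture, is doing most of the work.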

new Federated Learning of Binary Neural Networks: Enabling Low-Cost Inference

Authors: Nitin Priyadarshini Shankar, Soham Lahiri, Sheetal Kalyani, Saurav Prakash

Abstract: Federated Learning (FL) preserves privacy by distributing training across devices. However, DNN inference is computationally intensive on low-powered edge devices. Edge deployment demands models that simultaneously optimize memory footprint and computational efficiency, a dilemma where conventional DNNs fail by exceeding resource limits. Traditional post-training binarization reduces model size but suffers from severe accuracy loss due to quantization errors. To address these challenges, we propose FedBNN, a rotation-aware binary neural network framework that learns binary representations directly during local training. By encoding each weight as a single bit $\{+1, -1\}$ instead of a $32$-bit float, FedBNN shrinks the model footprint, significantly reducing runtime (during inference) FLOPs and memory requirements in comparison to federated methods using real models. Evaluations across multiple benchmark datasets demonstrate that FedBNN significantly reduces resource consumption while performing similarly to existing federated methods using real-valued models.
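The basic weight binarization FedBNN relies on can be sketched directly; the rotation-aware training that is the paper's contribution is not shown. The per-tensor scaling factor below is a common BNN convention, assumed here rather than taken from the paper:

```python
import numpy as np

def binarize_weights(w):
    """BNN weight binarization sketch: each 32-bit float weight becomes a
    single bit in {+1, -1}, with one per-tensor scale alpha (mean absolute
    value) preserving the weights' average magnitude."""
    alpha = np.abs(w).mean()              # per-tensor scaling factor
    return np.where(w >= 0, 1, -1).astype(np.int8), float(alpha)
```

The 32x memory reduction follows immediately from the encoding, and multiplications against {+1, -1} weights reduce to sign flips and additions, which is the source of the inference FLOP savings.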

new Not All Invariants Are Equal: Curating Training Data to Accelerate Program Verification with SLMs

Authors: Ido Pinto, Yizhak Yisrael Elboher, Haoze Wu, Nina Narodytska, Guy Katz

Abstract: The synthesis of inductive loop invariants is a critical bottleneck in automated program verification. While Large Language Models (LLMs) show promise in mitigating this issue, they often fail on hard instances, generating invariants that are invalid or computationally ineffective. While fine-tuning is a natural route to mitigate this limitation, obtaining high-quality training data for invariant generation remains an open challenge. We present a rigorous data curation pipeline designed to extract high-quality training signals from raw verifier-generated invariants. First, we formalize the properties required for a high-quality training invariant. Second, we propose Wonda, a pipeline that refines noisy data via AST-based normalization, followed by LLM-driven semantic rewriting and augmentation with provable quality guarantees. We demonstrate that fine-tuning Small Language Models (SLMs) on this curated dataset results in consistent and significant performance gains. In particular, a fine-tuned 4B parameter model matches the utility of a GPT-OSS-120B baseline and approaches the state-of-the-art GPT-5.2, without incurring reasoning-time overhead. On challenging instances from the recent InvBench evaluation suite, our approach doubles the invariant correctness and speedup rates of base models; and improves their Virtual Best Performance (VBP) rates on the verification task by up to 14.2%.

new Building Trust in PINNs: Error Estimation through Finite Difference Methods

Authors: Aleksander Krasowski, Ren\'e P. Klausen, Aycan Celik, Sebastian Lapuschkin, Wojciech Samek, Jonas Naujoks

Abstract: Physics-informed neural networks (PINNs) constitute a flexible deep learning approach for solving partial differential equations (PDEs), which model phenomena ranging from heat conduction to quantum mechanical systems. Despite their flexibility, PINNs offer limited insight into how their predictions deviate from the true solution, hindering trust in their prediction quality. We propose a lightweight post-hoc method that addresses this gap by producing pointwise error estimates for PINN predictions, which offer a natural form of explanation for such models, identifying not just whether a prediction is wrong, but where and by how much. For linear partial differential equations, the error between a PINN approximation and the true solution obeys a PDE with the same differential operator as the original problem, driven by the PINN's PDE residual as its source term. We solve this error equation numerically using finite difference methods requiring no knowledge of the true solution. Evaluated on several benchmark PDEs, our method yields accurate error maps at low computational cost, enabling targeted and interpretable validation of PINNs.
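The idea can be made concrete in one dimension. As a hedged illustration (a toy case, not the paper's implementation), take the operator $L = -d^2/dx^2$ with zero Dirichlet conditions: the error $e = u - \hat{u}$ then satisfies $-e'' = -r$, where $r = L\hat{u} - f = e''$ is the computable PDE residual, and $e$ can be recovered by a standard finite-difference solve.

```python
import numpy as np

def estimate_error_1d(residual, h):
    # Solve -e'' = -residual with zero Dirichlet BCs using second-order
    # central differences; `residual` is sampled on the interior grid.
    n = len(residual)
    A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
    return np.linalg.solve(A, -residual)

# Toy check: if the (unknown) PINN error were e(x) = 0.01*sin(pi*x), the
# computable residual of the approximate solution would be r = e''.
x = np.linspace(0.0, 1.0, 201)[1:-1]
h = x[1] - x[0]
r = -0.01 * np.pi**2 * np.sin(np.pi * x)   # r = e''
e_hat = estimate_error_1d(r, h)
# e_hat closely matches 0.01*sin(pi*x), recovered without the true solution.
```

The key point is that only the residual, which is computable from the trained PINN alone, enters the solve.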

new Vib2ECG: A Paired Chest-Lead SCG-ECG Dataset and Benchmark for ECG Reconstruction

Authors: Guorui Lu, Xiaohui Cai, Todor Stefanov, Qinyu Chen

Abstract: Twelve-lead electrocardiography (ECG) is essential for cardiovascular diagnosis, but its long-term acquisition in daily life is constrained by complex and costly hardware. Recent efforts have explored reconstructing ECG from low-cost cardiac vibrational signals such as seismocardiography (SCG); however, due to the lack of suitable datasets, current methods are limited to limb leads, while clinical diagnosis requires multi-lead ECG, including chest leads. In this work, we propose Vib2ECG, the first paired, multi-channel electro-mechanical cardiac signal dataset, which includes complete twelve-lead ECGs and vibrational signals acquired by inertial measurement units (IMUs) at six chest-lead positions from 17 subjects. Based on this dataset, we also provide a benchmark. Experimental results demonstrate the feasibility of reconstructing electrical cardiac signals at variable locations from vibrational signals using a lightweight 364 K-parameter U-Net. Furthermore, we observe a hallucination phenomenon in the model, in which ECG waveforms are generated where no corresponding electrical activity is present. We analyze the causes of this phenomenon and propose potential directions for mitigation. This study demonstrates the feasibility of mobile-device-friendly ECG monitoring through chest-lead ECG prediction from low-cost vibrational signals acquired using IMU sensors. It expands the application of cardiac vibrational signals and provides new insights into the spatial relationship between cardiac electrical and mechanical activity as sensing location varies.

new Bridging Local and Global Knowledge: Cascaded Mixture-of-Experts Learning for Near-Shortest Path Routing

Authors: Yung-Fu Chen, Anish Arora

Abstract: While deep learning models that leverage local features have demonstrated significant potential for near-optimal routing in dense Euclidean graphs, they struggle to generalize well in sparse networks where topological irregularities require broader structural awareness. To address this limitation, we train a Cascaded Mixture of Experts (Ca-MoE) to solve the all-pairs near-shortest path (APNSP) routing problem. Our Ca-MoE is a modular two-tier architecture that supports forwarder-selection decisions, with lower-tier experts relying on local features and upper-tier experts relying on global features. It performs adaptive inference wherein the upper-tier experts are triggered only when the lower-tier ones do not suffice to achieve adequate decision quality. Computational efficiency is thus achieved by escalating model capacity only when necessitated by topological complexity, and parameter redundancy is avoided. Furthermore, we incorporate an online meta-learning strategy that facilitates independent expert fine-tuning and utilizes a stability-focused update mechanism to prevent catastrophic forgetting as new graph environments are encountered. Experimental evaluations demonstrate that Ca-MoE routing improves accuracy by up to 29.1% in sparse networks compared to single-expert baselines and maintains performance within 1%-6% of the theoretical upper bound across diverse graph densities.

new The PokeAgent Challenge: Competitive and Long-Context Learning at Scale

Authors: Seth Karten, Jake Grigsby, Tersoo Upaa Jr, Junik Bae, Seonghun Hong, Hyunyoung Jeong, Jaeyoon Jung, Kun Kerdthaisong, Gyungbo Kim, Hyeokgi Kim, Yujin Kim, Eunju Kwon, Dongyu Liu, Patrick Mariglia, Sangyeon Park, Benedikt Schink, Xianwei Shi, Anthony Sistilli, Joseph Twin, Arian Urdu, Matin Urdu, Qiao Wang, Ling Wu, Wenli Zhang, Kunsheng Zhou, Stephanie Milani, Kiran Vodrahalli, Amy Zhang, Fei Fang, Yuke Zhu, Chi Jin

Abstract: We present the PokeAgent Challenge, a large-scale benchmark for decision-making research built on Pokemon's multi-agent battle system and expansive role-playing game (RPG) environment. Partial observability, game-theoretic reasoning, and long-horizon planning remain open problems for frontier AI, yet few benchmarks stress all three simultaneously under realistic conditions. PokeAgent targets these limitations at scale through two complementary tracks: our Battling Track, which calls for strategic reasoning and generalization under partial observability in competitive Pokemon battles, and our Speedrunning Track, which requires long-horizon planning and sequential decision-making in the Pokemon RPG. Our Battling Track supplies a dataset of 20M+ battle trajectories alongside a suite of heuristic, RL, and LLM-based baselines capable of high-level competitive play. Our Speedrunning Track provides the first standardized evaluation framework for RPG speedrunning, including an open-source multi-agent orchestration system for modular, reproducible comparisons of harness-based LLM approaches. Our NeurIPS 2025 competition validates both the quality of our resources and the research community's interest in Pokemon, with over 100 teams competing across both tracks and winning solutions detailed in our paper. Participant submissions and our baselines reveal considerable gaps between generalist (LLM), specialist (RL), and elite human performance. Analysis against the BenchPress evaluation matrix shows that Pokemon battling is nearly orthogonal to standard LLM benchmarks, measuring capabilities not captured by existing suites and positioning Pokemon as an unsolved benchmark that can drive RL and LLM research forward. We transition to a living benchmark with a live leaderboard for Battling and self-contained evaluation for Speedrunning at https://pokeagentchallenge.com.

URLs: https://pokeagentchallenge.com.

new Predictive Uncertainty in Short-Term PV Forecasting under Missing Data: A Multiple Imputation Approach

Authors: Parastoo Pashmchi, J\'er\^ome Benoit, Motonobu Kanagawa

Abstract: Missing values are common in photovoltaic (PV) power data, yet the uncertainty they induce is typically not propagated into predictive distributions. We develop a framework that incorporates missing-data uncertainty into short-term PV forecasting by combining stochastic multiple imputation with Rubin's rule. The approach is model-agnostic and can be integrated with standard machine-learning predictors. Empirical results show that ignoring missing-data uncertainty leads to overly narrow prediction intervals. Accounting for this uncertainty improves interval calibration while maintaining comparable point prediction accuracy. These results demonstrate the importance of propagating imputation uncertainty in data-driven PV forecasting.
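Rubin's rule itself is standard and easy to state: run the forecaster once per imputed dataset and pool the results, inflating the predictive variance by the between-imputation spread. A minimal sketch (illustrative variable names, not the paper's code):

```python
import numpy as np

def rubins_rule(point_preds, within_vars):
    # Pool m imputation-specific forecasts with Rubin's rule.
    # point_preds[i], within_vars[i]: the prediction and its variance
    # under the i-th stochastic imputation of the missing inputs.
    m = len(point_preds)
    qbar = np.mean(point_preds)        # pooled point forecast
    ubar = np.mean(within_vars)        # avg. within-imputation variance
    b = np.var(point_preds, ddof=1)    # between-imputation variance
    total = ubar + (1 + 1 / m) * b     # total predictive variance
    return qbar, total

preds = [10.2, 9.8, 10.5, 10.1, 9.9]   # e.g., kW forecasts per imputation
wvars = [0.25, 0.25, 0.30, 0.20, 0.25]
qbar, total = rubins_rule(preds, wvars)
# total exceeds mean(wvars): ignoring the imputation spread would
# understate uncertainty, i.e., yield overly narrow intervals.
```

Here total = 0.34 versus a within-imputation average of 0.25, which is exactly the interval-widening effect the abstract describes.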

new Mamba-3: Improved Sequence Modeling using State Space Principles

Authors: Aakash Lahoti, Kevin Y. Li, Berlin Chen, Caitlin Wang, Aviv Bick, J. Zico Kolter, Tri Dao, Albert Gu

Abstract: Scaling inference-time compute has emerged as an important driver of LLM performance, making inference efficiency a central focus of model design alongside model quality. While current Transformer-based models deliver strong model quality, their quadratic compute and linear memory make inference expensive. This has spurred the development of sub-quadratic models with linear compute and constant memory requirements. However, many recent linear models trade off model quality and capability for algorithmic efficiency, failing on tasks such as state tracking. Moreover, their theoretically linear inference remains hardware-inefficient in practice. Guided by an inference-first perspective, we introduce three core methodological improvements inspired by the state space model (SSM) viewpoint of linear models. We combine: (1) a more expressive recurrence derived from SSM discretization, (2) a complex-valued state update rule that enables richer state tracking, and (3) a multi-input, multi-output (MIMO) formulation for better model performance without increasing decode latency. Together with architectural refinements, our Mamba-3 model achieves significant gains across retrieval, state-tracking, and downstream language modeling tasks. At the 1.5B scale, Mamba-3 improves average downstream accuracy by 0.6 percentage points compared to the next best model (Gated DeltaNet), with Mamba-3's MIMO variant further improving accuracy by another 1.2 points for a total 1.8 point gain. Across state-size experiments, Mamba-3 achieves comparable perplexity to Mamba-2 despite using half of its predecessor's state size. Our evaluations demonstrate Mamba-3's ability to advance the performance-efficiency Pareto frontier.

new Unbiased and Biased Variance-Reduced Forward-Reflected-Backward Splitting Methods for Stochastic Composite Inclusions

Authors: Quoc Tran-Dinh, Nghia Nguyen-Trung

Abstract: This paper develops new variance-reduction techniques for the forward-reflected-backward splitting (FRBS) method to solve a class of possibly nonmonotone stochastic composite inclusions. Unlike unbiased estimators such as mini-batching, biased stochastic variants face a fundamental technical challenge and have not previously been developed for inclusions and fixed-point problems. We fill this gap by designing a new framework that can handle both unbiased and biased estimators. Our main idea is to construct stochastic variance-reduced estimators for the forward-reflected direction and use them to perform iterate updates. First, we propose a class of unbiased variance-reduced estimators and show that increasing mini-batch SGD, loopless-SVRG, and SAGA estimators fall within this class. For these unbiased estimators, we establish a $\mathcal{O}(1/k)$ best-iterate convergence rate for the expected squared residual norm, together with almost-sure convergence of the iterate sequence to a solution. Consequently, we prove that the best oracle complexities for the $n$-finite-sum and expectation settings are $\mathcal{O}(n^{2/3}\epsilon^{-2})$ and $\mathcal{O}(\epsilon^{-10/3})$, respectively, when employing loopless-SVRG or SAGA, where $\epsilon$ is a desired accuracy. Second, we introduce a new class of biased variance-reduced estimators for the forward-reflected direction, which includes SARAH, Hybrid SGD, and Hybrid SVRG as special instances. While the convergence rates remain valid for these biased estimators, the resulting oracle complexities are $\mathcal{O}(n^{3/4}\epsilon^{-2})$ and $\mathcal{O}(\epsilon^{-5})$ for the $n$-finite-sum and expectation settings, respectively. Finally, we conduct two numerical experiments on AUC optimization for imbalanced classification and policy evaluation in reinforcement learning.
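For context, the deterministic FRBS iteration of Malitsky and Tam for the inclusion $0 \in F(x) + B(x)$ reads $x_{k+1} = J_{\eta B}\big(x_k - \eta\,(2F(x_k) - F(x_{k-1}))\big)$, where $J_{\eta B} = (\mathrm{Id} + \eta B)^{-1}$ is the resolvent of $B$; the stochastic variants developed in this paper replace the forward-reflected term $2F(x_k) - F(x_{k-1})$ with unbiased or biased variance-reduced estimators.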

new Physics-Informed Neural Systems for the Simulation of EUV Electromagnetic Wave Diffraction from a Lithography Mask

Authors: Vasiliy A. Es'kin, Egor V. Ivanov

Abstract: Physics-informed neural networks (PINNs) and neural operators (NOs) for solving the problem of diffraction of Extreme Ultraviolet (EUV) electromagnetic waves from contemporary lithography masks are presented. A novel hybrid Waveguide Neural Operator (WGNO) is introduced, based on a waveguide method with its most computationally expensive components replaced by a neural network. To evaluate performance, the accuracy and inference time of PINNs and NOs are compared against modern numerical solvers for a series of problems with known exact solutions. Emphasis is placed on investigating the solution accuracy of the considered neural systems at the 13.5 nm and 11.2 nm wavelengths. Numerical experiments on realistic 2D and 3D masks demonstrate that PINNs and neural operators achieve competitive accuracy and significantly reduced prediction times, with the proposed WGNO architecture reaching state-of-the-art performance. The presented neural operator has pronounced generalizing properties, meaning that for unseen problem parameters it delivers a solution accuracy close to that for parameters seen in the training dataset. These results provide a highly efficient solution for accelerating the design and optimization workflows of next-generation lithography masks.

new Effective Distillation to Hybrid xLSTM Architectures

Authors: Lukas Hauzenberger, Niklas Schmidinger, Thomas Schmied, Anamaria-Roberta Hartl, David Stap, Pieter-Jan Hoedt, Maximilian Beck, Sebastian B\"ock, G\"unter Klambauer, Sepp Hochreiter

Abstract: There have been numerous attempts to distill quadratic attention-based large language models (LLMs) into sub-quadratic linearized architectures. However, despite extensive research, such distilled models often fail to match the performance of their teacher LLMs on various downstream tasks. We set out the goal of lossless distillation, which we define in terms of tolerance-corrected Win-and-Tie rates between student and teacher on sets of tasks. To this end, we introduce an effective distillation pipeline for xLSTM-based students. We propose an additional merging stage, where individually linearized experts are combined into a single model. We show the effectiveness of this pipeline by distilling base and instruction-tuned models from the Llama, Qwen, and Olmo families. In many settings, our xLSTM-based students recover most of the teacher's performance, and even exceed it on some downstream tasks. Our contributions are an important step towards more energy-efficient and cost-effective replacements for transformer-based LLMs.

new Robust and Computationally Efficient Linear Contextual Bandits under Adversarial Corruption and Heavy-Tailed Noise

Authors: Naoto Tani, Futoshi Futami

Abstract: We study linear contextual bandits under adversarial corruption and heavy-tailed noise with finite $(1+\epsilon)$-th moments for some $\epsilon \in (0,1]$. Existing work that addresses both adversarial corruption and heavy-tailed noise relies on a finite variance (i.e., finite second-moment) assumption and suffers from computational inefficiency. We propose a computationally efficient algorithm based on online mirror descent that achieves robustness to both adversarial corruption and heavy-tailed noise. While the existing algorithm incurs $\mathcal{O}(t\log T)$ computational cost, our algorithm reduces this to $\mathcal{O}(1)$ per round. We establish an additive regret bound consisting of a term depending on the $(1+\epsilon)$-moment bound of the noise and a term depending on the total amount of corruption. In particular, when $\epsilon = 1$, our result recovers existing guarantees under finite-variance assumptions. When no corruption is present, it matches the best-known rates for linear contextual bandits with heavy-tailed noise. Moreover, the algorithm requires no prior knowledge of the noise moment bound or the total amount of corruption and still guarantees sublinear regret.

new SmartSearch: How Ranking Beats Structure for Conversational Memory Retrieval

Authors: Jesper Derehag, Carlos Calva, Timmy Ghiurau

Abstract: Recent conversational memory systems invest heavily in LLM-based structuring at ingestion time and learned retrieval policies at query time. We show that neither is necessary. SmartSearch retrieves from raw, unstructured conversation history using a fully deterministic pipeline: NER-weighted substring matching for recall, rule-based entity discovery for multi-hop expansion, and a CrossEncoder+ColBERT rank fusion stage -- the only learned component -- running on CPU in ~650ms. Oracle analysis on two benchmarks identifies a compilation bottleneck: retrieval recall reaches 98.6%, but without intelligent ranking only 22.5% of gold evidence survives truncation to the token budget. With score-adaptive truncation and no per-dataset tuning, SmartSearch achieves 93.5% on LoCoMo and 88.4% on LongMemEval-S, exceeding all known memory systems under the same evaluation protocol on both benchmarks while using 8.5x fewer tokens than full-context baselines.

new HorizonMath: Measuring AI Progress Toward Mathematical Discovery with Automatic Verification

Authors: Erik Y. Wang, Sumeet Motwani, James V. Roggeveen, Eliot Hodges, Dulhan Jayalath, Charles London, Kalyan Ramakrishnan, Flaviu Cipcigan, Philip Torr, Alessandro Abate

Abstract: Can AI make progress on important, unsolved mathematical problems? Large language models are now capable of sophisticated mathematical and scientific reasoning, but whether they can perform novel research is still widely debated and underexplored. We introduce HorizonMath, a benchmark of over 100 predominantly unsolved problems spanning 8 domains in computational and applied mathematics, paired with an open-source evaluation framework for automated verification. Our benchmark targets a class of problems where discovery is hard, requiring meaningful mathematical insight, but verification is computationally efficient and simple. Because these solutions are unknown, HorizonMath is immune to data contamination, and most state-of-the-art models score near 0%. Existing research-level benchmarks instead rely on formal proof verification or manual review, both of which are expensive to scale. Using this platform, we find two problems for which GPT 5.4 Pro proposes solutions that improve on the best-known published results, representing potential novel contributions (pending expert review). We release HorizonMath as an open challenge and a growing community resource, where correct solutions to problems in the unsolved problem classes could constitute novel results in the mathematical literature.

cross ReflexSplit: Single Image Reflection Separation via Layer Fusion-Separation

Authors: Chia-Ming Lee, Yu-Fan Lin, Jing-Hui Jung, Yu-Jou Hsiao, Chih-Chung Hsu, Yu-Lun Liu

Abstract: Single Image Reflection Separation (SIRS) disentangles mixed images into transmission and reflection layers. Existing methods suffer from transmission-reflection confusion under nonlinear mixing, particularly in deep decoder layers, due to implicit fusion mechanisms and inadequate multi-scale coordination. We propose ReflexSplit, a dual-stream framework with three key innovations. (1) Cross-scale Gated Fusion (CrGF) adaptively aggregates semantic priors, texture details, and decoder context across hierarchical depths, stabilizing gradient flow and maintaining feature consistency. (2) Layer Fusion-Separation Blocks (LFSB) alternate between fusion for shared structure extraction and differential separation for layer-specific disentanglement. Inspired by Differential Transformer, we extend attention cancellation to dual-stream separation via cross-stream subtraction. (3) Curriculum training progressively strengthens differential separation through depth-dependent initialization and epoch-wise warmup. Extensive experiments on synthetic and real-world benchmarks demonstrate state-of-the-art performance with superior perceptual quality and robust generalization. Our code is available at https://github.com/wuw2135/ReflexSplit.

URLs: https://github.com/wuw2135/ReflexSplit.

cross CLIPoint3D: Language-Grounded Few-Shot Unsupervised 3D Point Cloud Domain Adaptation

Authors: Mainak Singha, Sarthak Mehrotra, Paolo Casari, Subhasis Chaudhuri, Elisa Ricci, Biplab Banerjee

Abstract: Recent vision-language models (VLMs) such as CLIP demonstrate impressive cross-modal reasoning, extending beyond images to 3D perception. Yet, these models remain fragile under domain shifts, especially when adapting from synthetic to real-world point clouds. Conventional 3D domain adaptation approaches rely on heavy trainable encoders, yielding strong accuracy but at the cost of efficiency. We introduce CLIPoint3D, the first framework for few-shot unsupervised 3D point cloud domain adaptation built upon CLIP. Our approach projects 3D samples into multiple depth maps and exploits the frozen CLIP backbone, refined through a knowledge-driven prompt tuning scheme that integrates high-level language priors with geometric cues from a lightweight 3D encoder. To adapt task-specific features effectively, we apply parameter-efficient fine-tuning to CLIP's encoders and design an entropy-guided view sampling strategy for selecting confident projections. Furthermore, an optimal transport-based alignment loss and an uncertainty-aware prototype alignment loss collaboratively bridge source-target distribution gaps while maintaining class separability. Extensive experiments on PointDA-10 and GraspNetPC-10 benchmarks show that CLIPoint3D achieves consistent 3-16% accuracy gains over both CLIP-based and conventional encoder-based baselines. Codes are available at https://github.com/SarthakM320/CLIPoint3D.

URLs: https://github.com/SarthakM320/CLIPoint3D.

cross OpenClaw-RL: Train Any Agent Simply by Talking

Authors: Yinjie Wang, Xuyang Chen, Xiaolong Jin, Mengdi Wang, Ling Yang

Abstract: Every agent interaction generates a next-state signal, namely the user reply, tool output, or terminal or GUI state change that follows each action, yet no existing agentic RL system recovers it as a live, online learning source. We present OpenClaw-RL, a framework built on a simple observation: next-state signals are universal, and a single policy can learn from all of them simultaneously. Personal conversations, terminal executions, GUI interactions, SWE tasks, and tool-call traces are not separate training problems. They are all interactions that can be used to train the same policy in the same loop. Next-state signals encode two forms of information: evaluative signals, which indicate how well the action performed and are extracted as scalar rewards via a PRM judge; and directive signals, which indicate how the action should have been different and are recovered through Hindsight-Guided On-Policy Distillation (OPD). We extract textual hints from the next state, construct an enhanced teacher context, and provide token-level directional advantage supervision that is richer than any scalar reward. Due to the asynchronous design, the model serves live requests, the PRM judges ongoing interactions, and the trainer updates the policy at the same time, with zero coordination overhead between them. Applied to personal agents, OpenClaw-RL enables an agent to improve simply by being used, recovering conversational signals from user re-queries, corrections, and explicit feedback. Applied to general agents, the same infrastructure supports scalable RL across terminal, GUI, SWE, and tool-call settings, where we additionally demonstrate the utility of process rewards. Code: https://github.com/Gen-Verse/OpenClaw-RL

URLs: https://github.com/Gen-Verse/OpenClaw-RL

cross A Hybrid Tsallis-Polarization Impurity Measure for Decision Trees: Theoretical Foundations and Empirical Evaluation

Authors: Edouard Lansiaux, Idriss Jairi, Hayfa Zgaya-Biau

Abstract: We introduce the Integrated Tsallis Combination (ITC), a hybrid impurity measure for decision tree learning that combines normalized Tsallis entropy with an exponential polarization component. While many existing measures sacrifice theoretical soundness for computational efficiency or vice versa, ITC provides a mathematically principled framework that balances both aspects. The core innovation lies in the complementarity between Tsallis entropy's information-theoretic foundations and the polarization component's sensitivity to distributional asymmetry. We establish key theoretical properties, namely concavity under explicit parameter conditions, proper boundary conditions, and connections to classical measures, and provide a rigorous justification for the hybridization strategy. Through an extensive comparative evaluation on seven benchmark datasets comparing 23 impurity measures with five-fold repetition, we show that simple parametric measures (Tsallis $\alpha=0.5$) achieve the highest average accuracy ($91.17\%$), while ITC variants yield competitive results ($88.38-89.16\%$) with strong theoretical guarantees. Statistical analysis (Friedman test: $\chi^2=3.89$, $p=0.692$) reveals no significant global differences among top performers, indicating practical equivalence for many applications. ITC's value resides in its solid theoretical grounding: proven concavity under suitable conditions, flexible parameterization ($\alpha$, $\beta$, $\gamma$), and $O(K)$ computational efficiency, making it a rigorous, generalizable alternative when theoretical guarantees are paramount. We provide guidelines for measure selection based on application priorities and release an open-source implementation to foster reproducibility and further research.
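The information-theoretic half of such hybrid measures is straightforward to compute; below is a sketch of the normalized Tsallis entropy alone (the paper's exponential polarization term and the exact ITC hybridization are not reproduced here):

```python
import numpy as np

def tsallis_impurity(p, alpha=0.5):
    # Normalized Tsallis entropy of a class distribution p (sums to 1),
    # for alpha != 1. The limit alpha -> 1 recovers Shannon entropy and
    # alpha = 2 recovers the Gini index (before normalization). ITC
    # additionally adds an exponential polarization term, omitted here.
    p = np.asarray(p, dtype=float)
    k = len(p)
    s = (1.0 - np.sum(p ** alpha)) / (alpha - 1.0)
    s_max = (1.0 - k ** (1.0 - alpha)) / (alpha - 1.0)  # uniform maximum
    return s / s_max
```

For example, tsallis_impurity([0.5, 0.5]) is 1 (maximally impure split) and tsallis_impurity([1.0, 0.0]) is 0 (pure node), matching the boundary conditions the abstract mentions.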

cross When Alpha Breaks: Two-Level Uncertainty for Safe Deployment of Cross-Sectional Stock Rankers

Authors: Ursina Sanderink

Abstract: Cross-sectional ranking models are often deployed as if point predictions were sufficient: the model outputs scores and the portfolio follows the induced ordering. Under non-stationarity, rankers can fail during regime shifts. In the AI Stock Forecaster, a LightGBM ranker performs well overall at a 20-day horizon, yet the 2024 holdout coincides with an AI thematic rally and sector rotation that breaks the signal at longer horizons and weakens it at 20 days. This motivates treating deployment as two decisions: (i) whether the strategy should trade at all, and (ii) how to control risk within active trades. We adapt Direct Epistemic Uncertainty Prediction (DEUP) to ranking by predicting rank displacement and defining an epistemic uncertainty signal ehat relative to a point-in-time (PIT-safe) baseline. Empirically, ehat is structurally coupled with signal strength (median correlation between ehat and the absolute score is about 0.6 across 1,865 dates), so inverse-uncertainty sizing de-levers the strongest signals and degrades performance. To address this, we propose a two-level deployment policy: a strategy-level regime-trust gate G(t) that decides whether to trade (AUROC around 0.72 overall and 0.75 in FINAL) and a position-level epistemic tail-risk cap that reduces exposure only for the most uncertain predictions. The operational policy (trade only when G(t) is at least 0.2, apply volatility sizing on active dates, and cap the top epistemic tail) improves risk-adjusted performance in the 20d policy comparison and indicates that DEUP adds value mainly as a tail-risk guard rather than as a continuous sizing denominator.

cross A Counterfactual Approach for Addressing Individual User Unfairness in Collaborative Recommender System

Authors: Nikita Baidya, Bidyut Kr. Patra, Ratnakar Dash

Abstract: Recommender Systems (RSs) are exploited by various business enterprises to suggest their products (items) to consumers (users). Collaborative filtering (CF) is a widely used variant of RSs which learns hidden patterns from user-item interactions for recommending items to users. Recommendations provided by traditional CF models are often biased. Generally, such models learn and update embeddings for all users jointly, thereby overlooking the biases against each under-served user individually. This leads to certain users receiving poorer recommendations than the rest, and such unfair treatment of users incurs losses for businesses. There is limited research addressing the individual user unfairness problem (IUUP). Existing literature has employed explicit exploration-based multi-armed bandits, individual user unfairness metrics, and explanation scores to address this issue. Although these works elucidate and identify underlying individual user unfairness, they do not provide solutions for it. In this paper, we propose a dual-step approach that identifies and mitigates IUUP in recommendations. In the proposed work, we counterfactually introduce new interactions for candidate users (one at a time) and subsequently analyze the benefit of this perturbation. This improves the user's engagement with other users and items, so the model can learn effective embeddings across users. To showcase the effectiveness of the proposed counterfactual methodology, we conducted experiments on the MovieLens-100K, Amazon Beauty, and MovieLens-1M datasets. The experimental results validate the superiority of the proposed approach over existing techniques.

cross Explain in Your Own Words: Improving Reasoning via Token-Selective Dual Knowledge Distillation

Authors: Minsang Kim, Seung Jun Baek

Abstract: Knowledge Distillation (KD) can transfer the reasoning abilities of large models to smaller ones, which reduces the cost of generating Chains-of-Thought for reasoning tasks. KD methods typically ask the student to mimic the teacher's distribution over the entire output. However, a student with limited capacity can be overwhelmed by such extensive supervision, causing a distribution mismatch, especially in complex reasoning tasks. We propose Token-Selective Dual Knowledge Distillation (TSD-KD), a framework for student-centric distillation. TSD-KD focuses on distilling important tokens for reasoning and encourages the student to explain reasoning in its own words. TSD-KD combines indirect and direct distillation. Indirect distillation uses a weak form of feedback based on preference ranking. The student proposes candidate responses generated on its own; the teacher re-ranks those candidates as indirect feedback without enforcing its entire distribution. Direct distillation uses distribution matching; however, it selectively distills tokens based on the relative confidence between teacher and student. Finally, we add entropy regularization to maintain the student's confidence during distillation. Overall, our method provides the student with targeted and indirect feedback to support its own reasoning process and to facilitate self-improvement. The experiments show the state-of-the-art performance of TSD-KD on 10 challenging reasoning benchmarks, outperforming the baseline and the runner-up in accuracy by up to 54.4\% and 40.3\%, respectively. Notably, a student trained by TSD-KD even outperformed its own teacher model in four cases by up to 20.3\%. The source code is available at https://github.com/kmswin1/TSD-KD.

URLs: https://github.com/kmswin1/TSD-KD.

cross Deep Convolutional Architectures for EEG Classification: A Comparative Study with Temporal Augmentation and Confidence-Based Voting

Authors: Aryan Patodiya, Hubert Cecotti

Abstract: Electroencephalography (EEG) classification plays a key role in brain-computer interface (BCI) systems, yet it remains challenging due to the low signal-to-noise ratio, temporal variability of neural responses, and limited data availability. In this paper, we present a comparative study of deep learning architectures for classifying event-related potentials (ERPs) in EEG signals. The preprocessing pipeline includes bandpass filtering, spatial filtering, and normalization. We design and compare three main pipelines: a 2D convolutional neural network (CNN) using Common Spatial Pattern (CSP), a second 2D CNN trained directly on raw data for a fair comparison, and a 3D CNN that jointly models spatiotemporal representations. To address ERP latency variations, we introduce a temporal shift augmentation strategy during training. At inference time, we employ a confidence-based test-time voting mechanism to improve prediction stability across shifted trials. An experimental evaluation on a stratified five-fold cross-validation protocol demonstrates that while CSP provides a benefit to the 2D architecture, the proposed 3D CNN significantly outperforms both 2D variants in terms of AUC and balanced accuracy. These findings highlight the effectiveness of temporal-aware architectures and augmentation strategies for robust EEG signal classification.

cross Bullet Trains: Parallelizing Training of Temporally Precise Spiking Neural Networks

Authors: Todd Morrill, Christian Pehle, Anthony Zador

Abstract: Continuous-time, event-native spiking neural networks (SNNs) operate strictly on spike events, treating spike timing and ordering as the representation rather than an artifact of time discretization. This viewpoint aligns with biological computation and with the native resolution of event sensors and neuromorphic processors, while enabling compute and memory that scale with the number of events. However, two challenges hinder practical, end-to-end trainable event-based SNN systems: 1) exact charge--fire--reset dynamics impose inherently sequential processing of input spikes, and 2) precise spike times must be solved without time bins. We address both. First, we use parallel associative scans to consume multiple input spikes at once, yielding up to 44x speedups over sequential simulation while retaining exact hard-reset dynamics. Second, we implement differentiable spike-time solvers that compute spike times to machine precision without discrete-time approximations or restrictive analytic assumptions. We demonstrate the viability of training SNNs using our solutions on four event-based datasets on GPUs.

cross TAS-GNN: A Status-Aware Signed Graph Neural Network for Anomaly Detection in Bitcoin Trust Systems

Authors: Chang Xue, Fang Liu, Jiaye Wang, Jinming Xing, Chen Yang

Abstract: Decentralized financial platforms rely heavily on Web of Trust reputation systems to mitigate counterparty risk in the absence of centralized identity verification. However, these pseudonymous networks are inherently vulnerable to adversarial behaviors, such as Sybil attacks and camouflaged fraud, where malicious actors cultivate artificial reputations before executing exit scams. Traditional anomaly detection in this domain faces two critical limitations. First, reliance on naive statistical heuristics (e.g., flagging the lowest 5% of rated users) fails to distinguish between victims of bad-mouthing attacks and actual fraudsters. Second, standard Graph Neural Networks (GNNs) operate on the assumption of homophily and cannot effectively process the semantic inversion inherent in signed (trust vs. distrust) and directed (status) edges. We propose TAS-GNN (Topology-Aware Signed Graph Neural Network), a novel framework designed for feature-sparse signed networks like Bitcoin-Alpha. TAS-GNN integrates recursive Web-of-Trust labeling and a dual-channel message-passing architecture that separately models trust and distrust signals, fused through a Status-Aware Attention mechanism. Experiments demonstrate that TAS-GNN achieves state-of-the-art performance, significantly outperforming existing signed GNN baselines.

cross Not All Queries Need Rewriting: When Prompt-Only LLM Refinement Helps and Hurts Dense Retrieval

Authors: Varun Kotte

Abstract: Prompt-only, single-step LLM query rewriting, where a rewrite is generated from the query alone without retrieval feedback, is commonly used in production RAG pipelines, but its effect on dense retrieval is poorly understood. We present a systematic empirical study across three BEIR benchmarks, two dense retrievers, and multiple training configurations, and find strongly domain-dependent behavior: rewriting degrades nDCG@10 by 9.0 percent on FiQA, improves it by 5.1 percent on TREC-COVID, and has no significant effect on SciFact. We identify a consistent mechanism: degradations co-occur with reduced lexical alignment between rewritten queries and relevant documents, as rewriting replaces domain-specific terms in already well-matched queries. In contrast, improvements arise when rewriting shifts queries toward corpus-preferred terminology and resolves inconsistent nomenclature. Lexical substitution occurs in 95 percent of rewrites across all outcome groups, showing that effectiveness depends on the direction of substitution rather than substitution itself. We also study selective rewriting and find that simple feature-based gating can reduce worst-case regressions but does not reliably outperform never rewriting, with even oracle selection offering only modest gains. Overall, these results show that prompt-only rewriting can be harmful in well-optimized verticals and suggest that domain-adaptive post-training is a safer strategy when supervision or implicit feedback is available.

cross PolyMon: A Unified Framework for Polymer Property Prediction

Authors: Gaopeng Ren, Yijie Yang, Jiajun Zhou, Kim E. Jelfs

Abstract: Accurate prediction of polymer properties is essential for materials design, but remains challenging due to data scarcity, diverse polymer representations, and the lack of systematic evaluation across modelling choices. Here, we present PolyMon, a unified framework that integrates multiple polymer representations, machine learning methods, and training strategies within a single, accessible platform. PolyMon supports various descriptors and graph construction strategies for polymer representations, and includes a wide range of models, from tabular models to graph neural networks, along with flexible training strategies including multi-fidelity learning, $\Delta$-learning, active learning, and ensemble learning. Using five key polymer properties as benchmarks, we perform systematic evaluations to assess how representations and models affect predictive performance. These case studies further illustrate how different training strategies can be applied within a consistent workflow to leverage limited data and incorporate physical-model-derived information. Overall, PolyMon provides a comprehensive and extensible foundation for benchmarking and advancing machine learning-based polymer property prediction. The code is available at github.com/fate1997/polymon.

cross Benchmarking Compact VLMs for Clip-Level Surveillance Anomaly Detection Under Weak Supervision

Authors: Kirill Borodin, Kirill Kondrashov, Nikita Vasiliev, Ksenia Gladkova, Inna Larina, Mikhail Gorodnichev, Grach Mkrtchian

Abstract: CCTV safety monitoring demands that anomaly detectors combine reliable clip-level accuracy with predictable per-clip latency despite weak supervision. This work investigates compact vision-language models (VLMs) as practical detectors for this regime. A unified evaluation protocol standardizes preprocessing, prompting, dataset splits, metrics, and runtime settings to compare parameter-efficiently adapted compact VLMs against training-free VLM pipelines and weakly supervised baselines. Evaluation spans accuracy, precision, recall, F1, ROC-AUC, and average per-clip latency to jointly quantify detection quality and efficiency. With parameter-efficient adaptation, compact VLMs achieve performance on par with, and in several cases exceeding, established approaches while retaining competitive per-clip latency. Adaptation further reduces prompt sensitivity, producing more consistent behavior across prompt regimes under the shared protocol. These results show that parameter-efficient fine-tuning enables compact VLMs to serve as dependable clip-level anomaly detectors, yielding a favorable accuracy-efficiency trade-off within a transparent and consistent experimental setup.

cross Suppressing Domain-Specific Hallucination in Construction LLMs: A Knowledge Graph Foundation for GraphRAG and QLoRA on River and Sediment Control Technical Standards

Authors: Takato Yasuno

Abstract: This paper addresses the challenge of answering technical questions derived from Japan's River and Sediment Control Technical Standards -- a multi-volume regulatory document covering survey, planning, design, and maintenance of river levees, dams, and sabo structures -- using open-source large language models running entirely on local hardware. We implement and evaluate three complementary approaches: Case A (plain 20B LLM baseline), Case B (8B LLM with QLoRA domain fine-tuning on 715 graph-derived QA pairs), and Case C (20B LLM augmented with a Neo4j knowledge graph via GraphRAG). All three cases use the Swallow series of Japanese-adapted LLMs and are evaluated on a 100-question benchmark spanning 8 technical categories, judged automatically by an independent LLM (Qwen2.5-14B, score 0--3). The key finding is a performance inversion: the 8B QLoRA fine-tuned model (Case B) achieves a judge average of 2.92/3 -- surpassing both the 20B plain baseline (Case A: 2.29/3, $+$0.63) and the 20B GraphRAG approach (Case C: 2.62/3, $+$0.30) -- while running at 3$\times$ faster latency (14.2s vs. 42.2s for Case A). GraphRAG provides moderate gains ($+$0.33 over baseline) but is outperformed by domain-specific fine-tuning in both quality and efficiency. We document the full engineering pipeline, including knowledge graph construction (200 nodes, 268 relations), QLoRA training data generation from Neo4j relations, training on a single GPU (16 GB VRAM) using unsloth, GGUF Q4_K_M quantisation and Ollama deployment, and the graph retrieval and re-ranking design. High-level engineering lessons are distilled in the main body; implementation pitfalls and toolchain details are documented in Supplementary Materials.

cross Multi-view Attention Fusion of Heterogeneous Hypergraph with Dynamic Behavioral Profiling for Personalized Learning Resource Recommendation

Authors: Tao Xie, Yan Li, Yongpan Sheng, Jian Liao

Abstract: Hypergraphs can capture complex and higher-order dependencies among learners and learning resources in personalized educational recommender systems. However, many existing hypergraph-based recommendation approaches have underexplored the dynamic behavioral processes inherent to learning and often oversimplify the complementary information embedded across multiple dimensions (i.e., views) within hypergraphs. These limitations compromise both the distinctiveness of learned representations and the model's generalization capabilities, especially under the data-sparse conditions typical of educational settings. In this study, we propose a unified model comprising a dynamic behavioral profiling module and a multi-view attention fusion module based on heterogeneous hypergraph construction. The dynamic behavioral profiling module is designed to capture evolving behavioral processes and infer latent higher-order relations crucial for hypergraph completion, while the multi-view fusion module cohesively integrates information from distinct relational views, enriching the overall data representation. The proposed model was systematically evaluated on five public benchmark datasets and one real-world, self-constructed dataset. Experimental results demonstrate that the model outperforms baseline methods across most datasets in key metrics; furthermore, hypergraph completion based on dynamic behavioral profiling contributes significantly to performance gains, though its efficacy is modulated by dataset characteristics. Beyond offline experiments, we implemented a functional prototype system tailored for postgraduate student literature recommendation and conducted a mixed-methods user study to assess its practical utility. Quantitative analysis revealed significantly higher perceived recommendation quality, while qualitative feedback highlighted enhanced user engagement and satisfaction with the prototype system.

cross Design-MLLM: A Reinforcement Alignment Framework for Verifiable and Aesthetic Interior Design

Authors: Yuxuan Yang, Xiaotong Mao, Jingyao Wang, Fuchun Sun

Abstract: Interior design is a requirements-to-visual-plan generation process that must simultaneously satisfy verifiable spatial feasibility and comparative aesthetic preferences. While recent multimodal large language models (MLLMs) offer a unified foundation for interpreting user intent and producing design rationales, our empirical analysis reveals a persistent contradiction in real-world deployment: MLLMs often produce layouts that are unbuildable and aesthetically inconsistent. These findings indicate that simply adding in-domain text is insufficient; effective interior design requires an alignment mechanism that separates hard constraints from soft preferences and coordinates them during optimization. To address this, we propose Design-MLLM, a reinforcement alignment framework that optimizes a feasibility-first preference objective via a dual-branch, aesthetic-oriented reward. Specifically, Design-MLLM (i) explicitly evaluates spatial feasibility using programmatic constraint checks, (ii) assesses aesthetic preference only among feasible candidates to avoid visually appealing but unexecutable shortcuts, and (iii) performs group-relative optimization to obtain stable preference signals. Through this process, Design-MLLM learns a controllable policy that consistently selects and generates solutions that are both executable and aesthetically coherent, rather than occasionally producing visually appealing but infeasible designs. Extensive experiments on various benchmark datasets demonstrate the advantages of Design-MLLM.

cross Nepali Passport Question Answering: A Low-Resource Dataset for Public Service Applications

Authors: Funghang Limbu Begha, Praveen Acharya, Bal Krishna Bal

Abstract: Nepali, a low-resource language, faces significant challenges in building effective information retrieval (IR) systems due to the unavailability of annotated data and computational linguistic resources. In this study, we attempt to address this gap by preparing a pair-structured Nepali question-answer dataset. We focus on Frequently Asked Questions (FAQs) for passport-related services, building a dataset for training and evaluating IR models. We fine-tuned transformer-based embedding models for semantic similarity in question-answer retrieval and compared them with a BM25 baseline. In addition, we implement a hybrid retrieval approach that integrates the fine-tuned models with BM25, and evaluate its performance. Our results show that the fine-tuned SBERT-based models outperform BM25, while multilingual E5 embedding-based models achieve the highest retrieval performance among all evaluated models.

cross Self-Supervised Multi-Stage Domain Unlearning for White-Matter Lesion Segmentation

Authors: Domen Prelo\v{z}nik, \v{Z}iga \v{S}piclin

Abstract: Inter-scanner variability in magnetic resonance imaging adversely affects the diagnostic and prognostic quality of scans and necessitates models that are robust to the domain shift introduced by data from unseen scanners. A review of recent advances in domain adaptation shows that the efficacy of strategies involving modifications or constraints on the latent space appears to be contingent on the level and/or depth of supervision during model training. In this paper, we therefore propose an unsupervised domain adaptation technique based on self-supervised multi-stage unlearning (SSMSU). Building upon the state-of-the-art segmentation framework nnU-Net, we employ deep supervision at deep encoder stages using domain classifier unlearning, applied sequentially across the deep stages to suppress domain-related latent features. Following the self-configurable approach of nnU-Net, the auxiliary feedback loop implements a self-supervised backpropagation schedule for the unlearning process, since continuous unlearning was found to have a detrimental effect on the main segmentation task. Experiments were carried out on four public datasets for benchmarking white-matter lesion segmentation methods, against five benchmark models and/or strategies covering passive to active unsupervised domain adaptation. In comparison, SSMSU demonstrated the advantage of unlearning by enhancing lesion sensitivity and limiting false detections, resulting in higher overall segmentation quality in terms of segmentation overlap and relative lesion volume error. The proposed model takes only the FLAIR modality as input, which simplifies preprocessing pipelines and eliminates the need for inter-modality registration and harmonization, both of which can introduce errors and variability. Source code is available at https://github.com/Pubec/nnunetv2-unlearning.

URLs: https://github.com/Pubec/nnunetv2-unlearning.

cross Why Grokking Takes So Long: A First-Principles Theory of Representational Phase Transitions

Authors: Truong Xuan Khanh, Truong Quynh Hoa, Luu Duc Trung, Phan Thanh Duc

Abstract: Grokking is the sudden generalization that appears long after a model has perfectly memorized its training data. Although this phenomenon has been widely observed, there is still no quantitative theory explaining the length of the delay between memorization and generalization. Prior work has noted that weight decay plays an important role, but no result derives tight bounds for the delay or explains its scaling behavior. We present a first-principles theory showing that grokking arises from a norm-driven representational phase transition in regularized training dynamics. Training first converges to a high-norm memorization solution and only later contracts toward a lower-norm structured representation that generalizes. Our main result establishes a scaling law for the delay: $T_{\mathrm{grok}} - T_{\mathrm{mem}} = \Theta\big((1/\gamma_{\mathrm{eff}}) \log(\|\theta_{\mathrm{mem}}\|^2 / \|\theta_{\mathrm{post}}\|^2)\big)$, where $\gamma_{\mathrm{eff}}$ is the effective contraction rate of the optimizer ($\gamma_{\mathrm{eff}} = \eta\lambda$ for SGD and $\gamma_{\mathrm{eff}} \geq \eta\lambda$ for AdamW). The upper bound follows from a discrete Lyapunov contraction argument, and the matching lower bound arises from dynamical constraints of regularized first-order optimization. Across 293 training runs spanning modular addition, modular multiplication, and sparse parity tasks, we confirm three predictions: inverse scaling with weight decay, inverse scaling with learning rate, and logarithmic dependence on the norm ratio ($R^2 > 0.97$). We further find that grokking requires an optimizer that can decouple memorization from contraction: SGD fails under hyperparameters where AdamW reliably groks. These results show that grokking is a predictable consequence of norm separation between competing interpolating representations and provide the first quantitative scaling law for the delay of grokking.
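As a rough illustration of how the scaling law in this abstract behaves, here is a worked example with hypothetical values not taken from the paper (an assumed SGD run with learning rate $\eta = 10^{-3}$, weight decay $\lambda = 10^{-1}$, and a norm ratio of 10; the $\Theta(\cdot)$ hides constants, so the figure is order-of-magnitude only):

```latex
% Hypothetical values, for illustration only (not from the paper):
% SGD case: \gamma_eff = \eta\lambda = 10^{-3} \cdot 10^{-1} = 10^{-4}
% assumed norm ratio: ||theta_mem||^2 / ||theta_post||^2 = 10
\[
T_{\mathrm{grok}} - T_{\mathrm{mem}}
  \;\sim\; \frac{1}{\gamma_{\mathrm{eff}}}
           \log\frac{\|\theta_{\mathrm{mem}}\|^{2}}{\|\theta_{\mathrm{post}}\|^{2}}
  \;=\; \frac{\ln 10}{10^{-4}}
  \;\approx\; 2.3 \times 10^{4} \text{ steps.}
\]
```

Under these assumed values, halving either the learning rate or the weight decay doubles the predicted delay, consistent with the inverse scaling the abstract reports.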

cross MultiSolSegment: Multi-channel segmentation of overlapping features in electroluminescence images of photovoltaic cells

Authors: Ojas Sanghi (Sandia National Laboratories), Norman Jost (Sandia National Laboratories), Benjamin G. Pierce (Case Western Reserve University), Emma Cooper (University of Colorado, Boulder), Isaiah H. Deane (Sandia National Laboratories), Jennifer L. Braid (Sandia National Laboratories)

Abstract: Electroluminescence (EL) imaging is widely used to detect defects in photovoltaic (PV) modules, and machine learning methods have been applied to enable large-scale analysis of EL images. However, existing methods cannot assign multiple labels to the same pixel, limiting their ability to capture overlapping degradation features. We present a multi-channel U-Net architecture for pixel-level multi-label segmentation of EL images. The model outputs independent probability maps for cracks, busbars, dark areas, and non-cell regions, enabling accurate co-classification of interacting features such as cracks crossing busbars. The model achieved an accuracy of 98% and has been shown to generalize to unseen datasets. This framework offers a scalable, extensible tool for automated PV module inspection, improving defect quantification and lifetime prediction in large-scale PV systems.

cross OpenExtract: Automated Data Extraction for Systematic Reviews in Health

Authors: Jim Achterberg, Bram Van Dijk, Jing Meng, Saif Ul Islam, Gregory Epiphaniou, Carsten Maple, Xuefei Ding, Theodoros N. Arvanitis, Simon Brouwer, Marcel Haas, Marco Spruit

Abstract: This study presents OpenExtract, an open-source pipeline for automated data extraction in large-scale systematic literature reviews. The pipeline queries large language models (LLMs) to predict data entries based on relevant sections of scientific articles. To test the efficacy of OpenExtract, we apply it to a systematic literature review in digital health and compare its outputs with those of human researchers. OpenExtract achieves precision and recall scores of > 0.8 in this task, indicating that it can be effective at extracting data automatically and efficiently. OpenExtract: https://github.com/JimAchterbergLUMC/OpenExtract.

URLs: https://github.com/JimAchterbergLUMC/OpenExtract.

cross AgriPath: A Systematic Exploration of Architectural Trade-offs for Crop Disease Classification

Authors: Hamza Mooraj, George Pantazopoulos, Alessandro Suglia

Abstract: Reliable crop disease detection requires models that perform consistently across diverse acquisition conditions, yet existing evaluations often focus on single architectural families or lab-generated datasets. This work presents a systematic empirical comparison of three model paradigms for fine-grained crop disease classification: Convolutional Neural Networks (CNNs), contrastive Vision-Language Models (VLMs), and generative VLMs. To enable controlled analysis of domain effects, we introduce AgriPath-LF16, a benchmark containing 111k images spanning 16 crops and 41 diseases with explicit separation between laboratory and field imagery, alongside a balanced 30k subset for standardized training and evaluation. All models are trained and evaluated under unified protocols across full, lab-only, and field-only training regimes using macro-F1 and Parse Success Rate (PSR) to account for generative reliability. The results reveal distinct performance profiles. CNNs achieve the highest accuracy on lab imagery but degrade under domain shift. Contrastive VLMs provide a robust and parameter-efficient alternative with competitive cross-domain performance. Generative VLMs demonstrate the strongest resilience to distributional variation, albeit with additional failure modes stemming from free-text generation. These findings highlight that architectural choice should be guided by deployment context rather than aggregate accuracy alone.

cross Learning When to Trust in Contextual Bandits

Authors: Majid Ghasemi, Mark Crowley

Abstract: Standard approaches to Robust Reinforcement Learning assume that feedback sources are either globally trustworthy or globally adversarial. In this paper, we challenge this assumption and identify a more subtle failure mode, which we term Contextual Sycophancy: evaluators are truthful in benign contexts but strategically biased in critical ones. We prove that standard robust methods fail in this setting, suffering from Contextual Objective Decoupling. To address this, we propose CESA-LinUCB, which learns a high-dimensional Trust Boundary for each evaluator. We prove that CESA-LinUCB achieves sublinear regret $\tilde{O}(\sqrt{T})$ against contextual adversaries, recovering the ground truth even when no evaluator is globally reliable.

cross Bi-CamoDiffusion: A Boundary-informed Diffusion Approach for Camouflaged Object Detection

Authors: Patricia L. Suarez, Leo Thomas Ramos, Angel D. Sappa

Abstract: Bi-CamoDiffusion is introduced as an evolution of the CamoDiffusion framework for camouflaged object detection. It integrates edge priors into early-stage embeddings via a parameter-free injection process, which enhances boundary sharpness and prevents structural ambiguity. This is governed by a unified optimization objective that balances spatial accuracy, structural constraints, and uncertainty supervision, allowing the model to capture both the object's global context and its intricate boundary transitions. Evaluations across the CAMO, COD10K, and NC4K benchmarks show that Bi-CamoDiffusion surpasses the baseline, delivering sharper delineation of thin structures and protrusions while also minimizing false positives. In addition, our model consistently outperforms existing state-of-the-art methods across all evaluated metrics, including $S_m$, $F_{\beta}^{w}$, $E_m$, and $MAE$, demonstrating more precise object-background separation and sharper boundary recovery.

cross Graph2Video: Leveraging Video Models to Model Dynamic Graph Evolution

Authors: Hua Liu, Yanbin Wei, Fei Xing, Tyler Derr, Haoyu Han, Yu Zhang

Abstract: Dynamic graphs are common in real-world systems such as social media, recommender systems, and traffic networks. Existing dynamic graph models for link prediction often fall short in capturing the complexity of temporal evolution. They tend to overlook fine-grained variations in temporal interaction order, struggle with dependencies that span long time horizons, and offer limited capability to model pair-specific relational dynamics. To address these challenges, we propose \textbf{Graph2Video}, a video-inspired framework that views the temporal neighborhood of a target link as a sequence of "graph frames". By stacking temporally ordered subgraph frames into a "graph video", Graph2Video leverages the inductive biases of video foundation models to capture both fine-grained local variations and long-range temporal dynamics. It generates a link-level embedding that serves as a lightweight and plug-and-play link-centric memory unit. This embedding integrates seamlessly into existing dynamic graph encoders, effectively addressing the limitations of prior approaches. Extensive experiments on benchmark datasets show that Graph2Video outperforms state-of-the-art baselines on the link prediction task in most cases. The results highlight the potential of borrowing spatio-temporal modeling techniques from computer vision as a promising and effective approach for advancing dynamic graph learning.

cross Multimodal Deep Learning for Dynamic and Static Neuroimaging: Integrating MRI and fMRI for Alzheimer Disease Analysis

Authors: Anima Kujur, Zahra Monfared

Abstract: Magnetic Resonance Imaging (MRI) provides detailed structural information, while functional MRI (fMRI) captures temporal brain activity. In this work, we present a multimodal deep learning framework that integrates MRI and fMRI for multi-class classification of Alzheimer Disease (AD), Mild Cognitive Impairment, and Normal Cognitive State. Structural features are extracted from MRI using 3D convolutional neural networks, while temporal features are learned from fMRI sequences using recurrent architectures. These representations are fused to enable joint spatial-temporal learning. Experiments were conducted on a small paired MRI-fMRI dataset (29 subjects), both with and without data augmentation. Results show that data augmentation substantially improves classification stability and generalization, particularly for the multimodal 3DCNN-LSTM model. In contrast, augmentation was found to be ineffective for a large-scale single-modality MRI dataset. These findings highlight the importance of dataset size and modality when designing augmentation strategies for neuroimaging-based AD classification.

cross GraphVLM: Benchmarking Vision Language Models for Multimodal Graph Learning

Authors: Jiajin Liu, Dongzhe Fan, Chuanhao Ji, Daochen Zha, Qiaoyu Tan

Abstract: Vision-Language Models (VLMs) have demonstrated remarkable capabilities in aligning and understanding multimodal signals, yet their potential to reason over structured data, where multimodal entities are connected through explicit relational graphs, remains largely underexplored. Unlocking this capability is crucial for real-world applications such as social networks, recommendation systems, and scientific discovery, where multimodal information is inherently structured. To bridge this gap, we present GraphVLM, a systematic benchmark designed to evaluate and harness the capabilities of VLMs for multimodal graph learning (MMGL). GraphVLM investigates three complementary paradigms for integrating VLMs with graph reasoning: (1) VLM-as-Encoder, which enriches graph neural networks through multimodal feature fusion; (2) VLM-as-Aligner, which bridges modalities in latent or linguistic space to facilitate LLM-based structured reasoning; and (3) VLM-as-Predictor, which directly employs VLMs as multimodal backbones for graph learning tasks. Extensive experiments across six datasets from diverse domains demonstrate that VLMs enhance multimodal graph learning via all three roles. Among these paradigms, VLM-as-Predictor achieves the most substantial and consistent performance gains, revealing the untapped potential of vision-language models as a new foundation for multimodal graph learning. The benchmark code is publicly available at https://github.com/oamyjin/GraphVLM.

URLs: https://github.com/oamyjin/GraphVLM.

cross The ARC of Progress towards AGI: A Living Survey of Abstraction and Reasoning

Authors: Sahar Vahdati, Andrei Aioanei, Haridhra Suresh, Jens Lehmann

Abstract: The Abstraction and Reasoning Corpus (ARC-AGI) has become a key benchmark for fluid intelligence in AI. This survey presents the first cross-generation analysis of 82 approaches across three benchmark versions and the ARC Prize 2024-2025 competitions. Our central finding is that performance degradation across versions is consistent across all paradigms: program synthesis, neuro-symbolic, and neural approaches all exhibit 2-3x drops from ARC-AGI-1 to ARC-AGI-2, indicating fundamental limitations in compositional generalization. While systems now reach 93.0% on ARC-AGI-1 (Opus 4.6), performance falls to 68.8% on ARC-AGI-2 and 13% on ARC-AGI-3, as humans maintain near-perfect accuracy across all versions. Cost fell 390x in one year (o3's $4,500/task to GPT-5.2's $12/task), although this largely reflects reduced test-time parallelism. Trillion-scale models vary widely in score and cost, while Kaggle-constrained entries (660M-8B) achieve competitive results, aligning with Chollet's thesis that intelligence is skill-acquisition efficiency. Test-time adaptation and refinement loops emerge as critical success factors, while compositional reasoning and interactive learning remain unsolved. ARC Prize 2025 winners needed hundreds of thousands of synthetic examples to reach 24% on ARC-AGI-2, confirming that reasoning remains knowledge-bound. This first release of the ARC-AGI Living Survey captures the field as of February 2026, with updates at https://nimi-ai.com/arc-survey/

URLs: https://nimi-ai.com/arc-survey/

cross Ethical Fairness without Demographics in Human-Centered AI

Authors: Shaily Roy, Harshit Sharma, Asif Salekin

Abstract: Computational models are increasingly embedded in human-centered domains such as healthcare, education, workplace analytics, and digital well-being, where their predictions directly influence individual outcomes and collective welfare. In such contexts, achieving high accuracy alone is insufficient; models must also act ethically and equitably across diverse populations. However, fair AI approaches that rely on demographic attributes are impractical, as such information is often unavailable, privacy-sensitive, or restricted by regulatory frameworks. Moreover, conventional parity-based fairness approaches, while aiming for equity, can inadvertently violate core ethical principles by trading off subgroup performance or stability. To address this challenge, we present Flare (Fisher-guided LAtent-subgroup learning with do-no-harm REgularization), the first demographic-agnostic framework that aligns algorithmic fairness with ethical principles through the geometry of optimization. Flare leverages Fisher Information to regularize curvature, uncovering latent disparities in model behavior without access to demographic or sensitive attributes. By integrating representation, loss, and curvature signals, it identifies hidden performance strata and adaptively refines them through collaborative but do-no-harm optimization, enhancing each subgroup's performance while preserving global stability and ethical balance. We also introduce BHE (Beneficence-Harm Avoidance-Equity), a novel metric suite that operationalizes ethical fairness evaluation beyond statistical parity. Extensive evaluations across diverse physiological (EDA), behavioral (IHS), and clinical (OhioT1DM) datasets show that Flare consistently enhances ethical fairness compared to state-of-the-art baselines.

cross InfiniteDance: Scalable 3D Dance Generation Towards in-the-wild Generalization

Authors: Ronghui Li, Zhongyuan Hu, Li Siyao, Youliang Zhang, Haozhe Xie, Mingyuan Zhang, Jie Guo, Xiu Li, Ziwei Liu

Abstract: Although existing 3D dance generation methods perform well in controlled scenarios, they often struggle to generalize in the wild. When conditioned on unseen music, existing methods often produce unstructured or physically implausible dance, largely due to limited music-to-dance data and restricted model capacity. This work aims to push the frontier of generalizable 3D dance generation by scaling up both data and model design. (1) On the data side, we develop a fully automated pipeline that reconstructs high-fidelity 3D dance motions from monocular videos. To eliminate the physical artifacts prevalent in existing reconstruction methods, we introduce a Foot Restoration Diffusion Model (FRDM) guided by foot-contact and geometric constraints that enforce physical plausibility while preserving kinematic smoothness and expressiveness, resulting in a diverse, high-quality multimodal 3D dance dataset totaling 100.69 hours. (2) On model design, we propose Choreographic LLaMA (ChoreoLLaMA), a scalable LLaMA-based architecture. To enhance robustness under unfamiliar music conditions, we integrate a retrieval-augmented generation (RAG) module that injects reference dance as a prompt. Additionally, we design a slow/fast-cadence Mixture-of-Experts (MoE) module that enables ChoreoLLaMA to smoothly adapt motion rhythms across varying music tempos. Extensive experiments across diverse dance genres show that our approach surpasses existing methods in both qualitative and quantitative evaluations, marking a step toward scalable, real-world 3D dance generation. Code, models, and data will be released.

cross A Computer-aided Framework for Detecting Osteosarcoma in Computed Tomography Scans

Authors: Maximo Rodriguez-Herrero, Dante D. Sanchez-Gallegos, Marco Antonio N\'u\~nez-Gaona, Heriberto Aguirre-Meneses, Luis Alberto Villalvazo Guti\'errez, Mario Ibrahin Guti\'errez Velasco, J. L. Gonzalez-Compean, Jesus Carretero

Abstract: Osteosarcoma is the most common primary bone cancer, mainly affecting the youngest and oldest populations. Its detection at early stages is crucial to reduce the probability of developing bone metastasis. In this context, accurate and fast diagnosis is essential to help physicians during the prognosis process. The research goal is to automate the diagnosis of osteosarcoma through a pipeline that includes the preprocessing, detection, postprocessing, and visualization of computed tomography (CT) scans. Thus, this paper presents a machine learning and visualization framework for classifying CT scans using different convolutional neural network (CNN) models. Preprocessing includes data augmentation and identification of the region of interest in scans. Post-processing includes data visualization to render a 3D bone model that highlights the affected area. An evaluation on 12 patients demonstrated the effectiveness of our framework, which achieved an area under the curve (AUC) of 94.8\% and a specificity of 94.6\%.

cross Deep Learning for BioImaging: What Are We Learning?

Authors: Ivan Svatko, Maxime Sanchez, Ihab Bendidi, Gilles Cottrell, Auguste Genovesio

Abstract: Representation learning has driven major advances in natural image analysis by enabling models to acquire high-level semantic features. In microscopy imaging, however, it remains unclear what current representation learning methods actually learn. In this work, we conduct a systematic study of representation learning for the two most widely used and broadly available microscopy data types, representing critical scales in biology: cell culture and tissue imaging. To this end, we introduce a set of simple yet revealing baselines on curated benchmarks, including untrained models and simple structural representations of cellular tissue. Our results show that, surprisingly, state-of-the-art methods perform comparably to these baselines. We further show that, in contrast to natural images, existing models fail to consistently acquire high-level, biologically meaningful features. Moreover, we demonstrate that commonly used benchmark metrics are insufficient to assess representation quality and often mask this limitation. In addition, we investigate how detailed comparisons with these benchmarks provide ways to interpret the strengths and weaknesses of models for further improvements. Together, our results suggest that progress in microscopy image representation learning requires not only stronger models, but also more diagnostic benchmarks that measure what is actually learned.

cross High-Fidelity Text-to-Image Generation from Pre-Trained Vision-Language Models via Distribution-Conditioned Diffusion Decoding

Authors: Ji Woo Hong, Hee Suk Yoon, Gwanhyeong Koo, Eunseop Yoon, SooHwan Eom, Qi Dai, Chong Luo, Chang D. Yoo

Abstract: Recent large-scale vision-language models (VLMs) have shown remarkable text-to-image generation capabilities, yet their visual fidelity remains constrained by discrete image tokenization. Although several studies have explored continuous representation modeling to enhance visual quality, adapting pre-trained VLMs to such representations requires large-scale data and training costs comparable to the original pre-training. To circumvent this limitation, we propose a diffusion-based decoding framework that enhances image fidelity by training only a diffusion decoder on the output image-token logits of pre-trained VLMs, thereby preserving the original model intact. At its core, Logit-to-Code Distributional Mapping converts the VLM's image-token logits into continuous, distribution-weighted code vectors with uncertainty features, providing an effective conditioning signal for diffusion decoding. A lightweight Logit Calibration aligns training-time proxy logits from the VQ-VAE encoder with VLM-generated logits, mitigating the train-inference gap. Conditioned on these representations, the Distribution-Conditioned Diffusion Decoder generates high-fidelity images. Achieved solely through short training on ImageNet-1K, our method consistently improves visual fidelity for both VQ-VAE reconstructions and text-to-image generations from VLM-predicted tokens.

cross COT-FM: Cluster-wise Optimal Transport Flow Matching

Authors: Chiensheng Chiang, Kuan-Hsun Tu, Jia-Wei Liao, Cheng-Fu Chou, Tsung-Wei Ke

Abstract: We introduce COT-FM, a general framework that reshapes the probability path in Flow Matching (FM) to achieve faster and more reliable generation. FM models often produce curved trajectories due to random or batchwise couplings, which increase discretization error and reduce sample quality. COT-FM fixes this by clustering target samples and assigning each cluster a dedicated source distribution obtained by reversing pretrained FM models. This divide-and-conquer strategy yields more accurate local transport and significantly straighter vector fields, all without changing the model architecture. As a plug-and-play approach, COT-FM consistently accelerates sampling and improves generation quality across 2D datasets, image generation benchmarks, and robotic manipulation tasks.

cross Event-Driven Video Generation

Authors: Chika Maduabuchi

Abstract: State-of-the-art text-to-video models often look realistic frame-by-frame yet fail on simple interactions: motion starts before contact, actions are not realized, objects drift after placement, and support relations break. We argue this stems from frame-first denoising, which updates latent state everywhere at every step without an explicit notion of when and where an interaction is active. We introduce Event-Driven Video Generation (EVD), a minimal DiT-compatible framework that makes sampling event-grounded: a lightweight event head predicts token-aligned event activity, event-grounded losses couple activity to state change during training, and event-gated sampling (with hysteresis and early-step scheduling) suppresses spurious updates while concentrating updates during interactions. On EVD-Bench, EVD consistently improves human preference and VBench dynamics, substantially reducing failure modes in state persistence, spatial accuracy, support relations, and contact stability without sacrificing appearance. These results indicate that explicit event grounding is a practical abstraction for reducing interaction hallucinations in video generation.

cross Projection Guided Personalized Federated Learning for Low Dose CT Denoising

Authors: Anas Zafar, Muhammad Waqas, Amgad Muneer, Rukhmini Bandyopadhyay, Jia Wu

Abstract: Low-dose CT (LDCT) reduces radiation exposure but introduces protocol-dependent noise and artifacts that vary across institutions. While federated learning enables collaborative training without centralizing patient data, existing methods personalize in image space, making it difficult to separate scanner noise from patient anatomy. We propose ProFed (Projection Guided Personalized Federated Learning), a framework that complements the image space approach by performing dual-level personalization in the projection space, where noise originates during CT measurements before reconstruction combines protocol and anatomy effects. ProFed introduces: (i) anatomy-aware and protocol-aware networks that personalize CT reconstruction to patient and scanner-specific features, (ii) multi-constraint projection losses that enforce consistency with CT measurements, and (iii) uncertainty-guided selective aggregation that weights clients by prediction confidence. Extensive experiments on the Mayo Clinic 2016 dataset demonstrate that ProFed achieves 42.56 dB PSNR with CNN backbones and 44.83 dB with Transformers, outperforming 11 federated learning baselines, including the physics-informed SCAN-PhysFed by +1.42 dB.

cross Dynamic Sparse Attention: Access Patterns and Architecture

Authors: Noam Levy

Abstract: Dynamic sparse attention (DSA) reduces the per-token attention bandwidth by restricting computation to a top-k subset of cached key-value (KV) entries, but its token-dependent selection pattern introduces a system-level challenge: the KV working set is fragmented, volatile, and difficult to prefetch, which can translate into poor cache locality and stalled decode throughput. We study these effects by implementing a lightweight indexer for DSA-style selection on multiple open-source backbones and logging per-layer KV indices during autoregressive decoding. Our analysis reveals a gap in serving DSA backbones: a potentially high volume of blocking last-level (LL) cache misses, causing inefficiency. We propose a novel LL-cache reservation system that keeps KV tokens in the LL cache between decode steps, combined with a token-granularity LRU eviction policy, and show on the collected data how this architecture can benefit serving DSA across different backbones. Finally, we propose directions for future architectural and algorithmic exploration to improve serving of DSA on modern inference platforms.
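The proposed reservation-plus-eviction scheme can be sketched at token granularity with a plain LRU structure. Everything below is a hypothetical illustration in Python (class and function names invented); the paper's actual mechanism operates at the hardware last-level-cache level, not in application code:

```python
from collections import OrderedDict

class TokenLRUCache:
    """Minimal sketch of token-granularity LRU for KV entries reserved
    between decode steps (all names here are hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.resident = OrderedDict()   # token_id -> KV payload, in LRU order

    def access(self, token_id, load_kv):
        """Return the KV entry for token_id, loading on a miss and
        evicting the least-recently-used token when over capacity."""
        if token_id in self.resident:
            self.resident.move_to_end(token_id)   # hit: refresh recency
            return self.resident[token_id], True
        kv = load_kv(token_id)                    # miss: blocking load
        self.resident[token_id] = kv
        if len(self.resident) > self.capacity:
            self.resident.popitem(last=False)     # evict the LRU token
        return kv, False

# Replay a toy sequence of per-step top-k KV indices.
cache = TokenLRUCache(capacity=2)
for step_topk in [[1, 2], [2, 3], [1]]:
    for t in step_topk:
        _, hit = cache.access(t, load_kv=lambda t: f"kv{t}")
        print(t, "hit" if hit else "miss")
```

The replay makes the abstract's point concrete: when the per-step top-k sets overlap across decode steps, reserved tokens hit in the cache, and misses occur only when the selection pattern shifts past the reserved capacity.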

cross Filtered Spectral Projection for Quantum Principal Component Analysis

Authors: Sk Mujaffar Hossain, Satadeep Bhattacharjee

Abstract: Quantum principal component analysis (qPCA) is commonly formulated as the extraction of eigenvalues and eigenvectors of a covariance-encoded density operator. Yet in many qPCA settings, the practical objective is simpler: projecting data onto the dominant spectral subspace. In this work, we introduce a projection-first framework, the Filtered Spectral Projection Algorithm (FSPA), which bypasses explicit eigenvalue estimation while preserving the essential spectral structure. FSPA amplifies any nonzero warm-start overlap with the leading principal subspace and remains robust in small-gap and near-degenerate regimes without inducing artificial symmetry breaking in the absence of bias. To connect this approach to classical datasets, we show that for amplitude-encoded centered data, the ensemble density matrix $\rho=\sum_i p_i|\psi_i\rangle\langle\psi_i|$ coincides with the covariance matrix. For uncentered data, $\rho$ corresponds to PCA without centering, and we derive eigenvalue interlacing bounds quantifying the deviation from standard PCA. We further show that ensembles of quantum states admit an equivalent centered covariance interpretation. Numerical demonstrations on benchmark datasets, including Breast Cancer Wisconsin and handwritten Digits, show that downstream performance remains stable whenever projection quality is preserved. These results suggest that, in a broad class of qPCA settings, spectral projection is the essential primitive, and explicit eigenvalue estimation is often unnecessary.
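The covariance identity stated in the abstract is easy to check numerically. The sketch below uses plain NumPy and assumes the standard amplitude-encoding convention $p_i \propto \|x_i\|^2$ (the abstract does not spell out the ensemble weights), verifying that the ensemble density matrix of centered, amplitude-encoded data equals the trace-normalized covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
X -= X.mean(axis=0)                      # center the data

norms2 = (X ** 2).sum(axis=1)
p = norms2 / norms2.sum()                # assumed ensemble weights, p_i ∝ ||x_i||^2
psi = X / np.sqrt(norms2)[:, None]       # amplitude-encoded unit-norm states

# Ensemble density matrix rho = sum_i p_i |psi_i><psi_i|
rho = sum(pi * np.outer(v, v) for pi, v in zip(p, psi))

C = X.T @ X                              # (unnormalized) covariance of centered data
assert np.allclose(rho, C / np.trace(C))
```

Under these weights the identity is exact: $\rho = X^\top X / \mathrm{tr}(X^\top X)$, so the dominant eigenvectors of $\rho$ are precisely the principal components, which is what makes spectral projection sufficient in this setting.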

cross Diffusion-based Generative Machine Learning Model for Predicting Crack Propagation in Aluminum Nitride at the Atomic Scale

Authors: Jiali Lu, Shengfeng Yang

Abstract: Predicting atomic-scale crack propagation in aluminum nitride (AlN) is critical for semiconductor reliability but remains prohibitively expensive via molecular dynamics (MD). We develop a diffusion-based generative machine learning model to predict atomic-scale crack propagation in AlN, a critical semiconductor material, by conditioning solely on initial microstructure embeddings. Trained on MD simulations of single-crack systems, the model achieves a significant speedup while accurately forecasting dynamic fracture processes, including stress-driven crack initiation, crack branching, and atomic-scale bridging ligaments. Crucially, it demonstrates inherent physical fidelity by reproducing material-intrinsic mechanisms while disregarding periodic boundary artifacts, and generalizes to unseen multi-crack configurations. Validation against MD ground truth confirms the capability of the model to capture complex fracture physics without auxiliary stress or energy data, enabling rapid exploration of crack-mediated failure for semiconductor reliability optimization.

cross MESD: Detecting and Mitigating Procedural Bias in Intersectional Groups

Authors: Gideon Popoola, John Sheppard

Abstract: Research on bias in machine learning has mostly focused on outcome-oriented fairness metrics (e.g., equalized odds) and on a single protected category. Although these approaches offer great insight into bias in ML, they provide limited insight into a model's procedural bias. To address this gap, we propose multi-category explanation stability disparity (MESD), an intersectional, procedurally oriented metric that measures the disparity in the quality of explanations across intersectional subgroups in multiple protected categories. MESD serves as a complement to outcome-oriented metrics, providing detailed insight into a model's procedure. To further support holistic model selection, we also propose a multi-objective optimization framework, UEF (Utility-Explanation-Fairness), that jointly optimizes utility, explanation quality, and fairness. Experimental results across multiple datasets show that UEF effectively balances these objectives, and that MESD can effectively capture explanation differences between intersectional groups. This research addresses an important gap by examining explainability with respect to fairness across multiple protected categories.

cross Purifying Generative LLMs from Backdoors without Prior Knowledge or Clean Reference

Authors: Jianwei Li, Jung-Eun Kim

Abstract: Backdoor attacks pose severe security threats to large language models (LLMs), where a model behaves normally under benign inputs but produces malicious outputs when a hidden trigger appears. Existing backdoor removal methods typically assume prior knowledge of triggers, access to a clean reference model, or rely on aggressive finetuning configurations, and are often limited to classification tasks. However, such assumptions fall apart in real-world instruction-tuned LLM settings. In this work, we propose a new framework for purifying instruction-tuned LLMs without any prior trigger knowledge or clean references. Through systematic sanity checks, we find that backdoor associations are redundantly encoded across MLP layers, while attention modules primarily amplify trigger signals without establishing the behavior. Leveraging this insight, we shift the focus from isolating specific backdoor triggers to cutting off the trigger-behavior associations, and design an immunization-inspired elimination approach: by constructing multiple synthetic backdoored variants of the given suspicious model, each trained with different malicious trigger-behavior pairs, and contrasting them with their clean counterparts. The recurring modifications across variants reveal a shared "backdoor signature", analogous to antigens in a virus. Guided by this signature, we neutralize highly suspicious components in the LLM and apply lightweight finetuning to restore its fluency, producing purified models that withstand diverse backdoor attacks and threat models while preserving generative capability.

cross Equivalence of approximation by networks of single- and multi-spike neurons

Authors: Dominik Dold, Philipp Christian Petersen

Abstract: In a spiking neural network, is it enough for each neuron to spike at most once? In recent work, approximation bounds for spiking neural networks have been derived, quantifying how well they can fit target functions. However, these results are only valid for neurons that spike at most once, which is commonly thought to be a strong limitation. Here, we show that the opposite is true for a large class of spiking neuron models, including the commonly used leaky integrate-and-fire model with subtractive reset: for every approximation bound that is valid for a set of multi-spike neural networks, there is an equivalent set of single-spike neural networks with only linearly more neurons (in the maximum number of spikes) for which the bound holds. The same holds in the reverse direction, showing that single-spike and multi-spike neural networks are equivalent in their approximation capabilities on general machine learning tasks. Consequently, many approximation results in the literature for single-spike neural networks also hold for the multi-spike case.

cross Synthetic Melanoma Image Generation and Evaluation Using Generative Adversarial Networks

Authors: Pei-Yu Lin, Yidan Shen, Neville Mathew, Renjie Hu, Siyu Huang, Courtney M. Queen, Cameron E. West, Ana Ciurea, George Zouridakis

Abstract: Melanoma is the most lethal form of skin cancer, and early detection is critical for improving patient outcomes. Although dermoscopy combined with deep learning has advanced automated skin-lesion analysis, progress is hindered by limited access to large, well-annotated datasets and by severe class imbalance, where melanoma images are substantially underrepresented. To address these challenges, we present the first systematic benchmarking study comparing four GAN architectures (DCGAN, StyleGAN2, and two StyleGAN3 variants, T and R) for high-resolution melanoma-specific synthesis. We train and optimize all models on two expert-annotated benchmarks (ISIC 2018 and ISIC 2020) under unified preprocessing and hyperparameter exploration, with particular attention to R1 regularization tuning. Image quality is assessed through a multi-faceted protocol combining distribution-level metrics (FID), sample-level representativeness (FMD), qualitative dermoscopic inspection, downstream classification with a frozen EfficientNet-based melanoma detector, and independent evaluation by two board-certified dermatologists. StyleGAN2 achieves the best balance of quantitative performance and perceptual quality, attaining FID scores of 24.8 (ISIC 2018) and 7.96 (ISIC 2020) at gamma=0.8. The frozen classifier recognizes 83% of StyleGAN2-generated images as melanoma, while dermatologists distinguish synthetic from real images at only 66.5% accuracy (chance = 50%), with low inter-rater agreement (kappa = 0.17). In a controlled augmentation experiment, adding synthetic melanoma images to address class imbalance improved melanoma detection AUC from 0.925 to 0.945 on a held-out real-image test set. These findings demonstrate that StyleGAN2-generated melanoma images preserve diagnostically relevant features and can provide a measurable benefit for mitigating class imbalance in melanoma-focused machine learning pipelines.

cross Standard Acquisition Is Sufficient for Asynchronous Bayesian Optimization

Authors: Ben Riegler, James Odgers, Vincent Fortuin

Abstract: Asynchronous Bayesian optimization is widely used for gradient-free optimization in domains with independent parallel experiments and varying evaluation times. Existing methods posit that standard acquisitions lead to redundant and repeated queries, proposing complex solutions to enforce diversity in queries. Challenging this fundamental premise, we show that methods, like the Upper Confidence Bound, can in fact achieve theoretical guarantees essentially equivalent to those of sequential Thompson sampling. A conceptual analysis of asynchronous Bayesian optimization reveals that existing works neglect intermediate posterior updates, which we find to be generally sufficient to avoid redundant queries. Further investigation shows that by penalizing busy locations, diversity-enforcing methods can over-explore in asynchronous settings, reducing their performance. Our extensive experiments demonstrate that simple standard acquisition functions match or outperform purpose-built asynchronous methods across synthetic and real-world tasks.

cross VoXtream2: Full-stream TTS with dynamic speaking rate control

Authors: Nikita Torgashov, Gustav Eje Henter, Gabriel Skantze

Abstract: Full-stream text-to-speech (TTS) for interactive systems must start speaking with minimal delay while remaining controllable as text arrives incrementally. We present VoXtream2, a zero-shot full-stream TTS model with dynamic speaking-rate control that can be updated mid-utterance on the fly. VoXtream2 combines a distribution matching mechanism over duration states with classifier-free guidance across conditioning signals to improve controllability and synthesis quality. Prompt-text masking enables textless audio prompting, removing the need for prompt transcription. Across standard zero-shot benchmarks and a dedicated speaking-rate test set, VoXtream2 achieves competitive objective and subjective results against public baselines despite a smaller model and less training data. In full-stream mode, it runs 4 times faster than real time with 74 ms first-packet latency on a consumer GPU.

cross Hide and Seek: Investigating Redundancy in Earth Observation Imagery

Authors: Tasos Papazafeiropoulos, Nikolaos Ioannis Bountos, Nikolas Papadopoulos, Ioannis Papoutsis

Abstract: The growing availability of Earth Observation (EO) data and recent advances in Computer Vision have driven rapid progress in machine learning for EO, producing domain-specific models at ever-increasing scales. Yet this progress risks overlooking fundamental properties of EO data that distinguish it from other domains. We argue that EO data exhibit a multidimensional redundancy (spectral, temporal, spatial, and semantic) which has a more pronounced impact on the domain and its applications than what current literature reflects. To validate this hypothesis, we conduct a systematic domain-specific investigation examining the existence, consistency, and practical implications of this phenomenon across key dimensions of EO variability. Our findings confirm that redundancy in EO data is both substantial and pervasive: exploiting it yields comparable performance ($\approx98.5\%$ of baseline) at a fraction of the computational cost ($\approx4\times$ fewer GFLOPs), at both training and inference. Crucially, these gains are consistent across tasks, geospatial locations, sensors, ground sampling distances, and architectural designs; suggesting that multi-faceted redundancy is a structural property of EO data rather than an artifact of specific experimental choices. These results lay the groundwork for more efficient, scalable, and accessible large-scale EO models.

cross Active Sampling Sample-based Quantum Diagonalization from Finite-Shot Measurements

Authors: Rinka Miura

Abstract: Near-term quantum devices provide only finite-shot measurements and prepare imperfect, contaminated states. This motivates algorithms that convert samples into reliable low-energy estimates without full tomography or exhaustive measurements. We propose Active Sampling Sample-based Quantum Diagonalization (AS-SQD), framing SQD as an active learning problem: given measured bitstrings, which additional basis states should be included to efficiently recover the ground-state energy? SQD restricts the Hamiltonian to a selected set of basis states and classically diagonalizes the restricted matrix. However, naive SQD using only sampled states suffers from bias under finite-shot sampling and excited-state contamination, while blind random expansion is inefficient as system size grows. We introduce a perturbation-theoretic acquisition function based on Epstein--Nesbet second-order energy corrections to rank candidate basis states connected to the current subspace. At each iteration, AS-SQD diagonalizes the restricted Hamiltonian, generates connected candidates, and adds the most valuable ones according to this score. We evaluate AS-SQD on disordered Heisenberg and Transverse-Field Ising (TFIM) spin chains up to 16 qubits under a preparation model mixing 80\% ground state and 20\% first excited state. Furthermore, we validate its robustness against real-world state preparation and measurement (SPAM) errors using physical samples from an IBM Quantum processor. Across simulated and hardware evaluations, AS-SQD consistently achieves substantially lower absolute energy errors than standard SQD and random expansion. Detailed ablation studies demonstrate that physics-guided basis acquisition effectively concentrates computation on energetically relevant directions, bypassing exponential combinatorial bottlenecks.

cross AMES: Approximate Multi-modal Enterprise Search via Late Interaction Retrieval

Authors: Tony Joseph, Carlos Pareja, David Lopes Pegna, Abhishek Singh

Abstract: We present AMES (Approximate Multimodal Enterprise Search), a unified, backend-agnostic multimodal late interaction retrieval architecture. AMES demonstrates that fine-grained multimodal late interaction retrieval can be deployed within a production-grade enterprise search engine without architectural redesign. Text tokens, image patches, and video frames are embedded into a shared representation space using multi-vector encoders, enabling cross-modal retrieval without modality-specific retrieval logic. AMES employs a two-stage pipeline: parallel token-level ANN search with per-document Top-M MaxSim approximation, followed by accelerator-optimized Exact MaxSim re-ranking. Experiments on the ViDoRe V3 benchmark show that AMES achieves competitive ranking performance within a scalable, production-ready Solr-based system.

cross Holographic Invariant Storage: Design-Time Safety Contracts via Vector Symbolic Architectures

Authors: Arsenios Scrivens

Abstract: We introduce Holographic Invariant Storage (HIS), a protocol that assembles known properties of bipolar Vector Symbolic Architectures into a design-time safety contract for LLM context-drift mitigation. The contract provides three closed-form guarantees evaluable before deployment: single-signal recovery fidelity converging to $1/\sqrt{2} \approx 0.707$ (regardless of noise depth or content), continuous-noise robustness $2\Phi(1/\sigma) - 1$, and multi-signal capacity degradation $\approx\sqrt{1/(K+1)}$. These bounds, validated by Monte Carlo simulation ($n = 1{,}000$), enable a systems engineer to budget recovery fidelity and codebook capacity at design time -- a property no timer or embedding-distance metric provides. A pilot behavioral experiment (four LLMs, 2B--7B, 720 trials) confirms that safety re-injection improves adherence at the 2B scale; full results are in an appendix.
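The closed-form fidelity figures quoted above rest on a standard property of bipolar superposition: a vector bundled (summed) with K random distractors retains cosine similarity $\approx\sqrt{1/(K+1)}$ with the bundle, which is $1/\sqrt{2} \approx 0.707$ for K = 1. A minimal Monte Carlo sketch of that generic VSA fact (not of the HIS protocol itself, whose details the abstract does not give):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000                                   # hypervector dimension

def bundle_similarity(K, trials=20):
    """Mean cosine similarity between one stored bipolar vector
    and the sum-bundle of it with K random distractor vectors."""
    sims = []
    for _ in range(trials):
        V = rng.choice([-1, 1], size=(K + 1, n))
        m = V.sum(axis=0)                    # superposition of K+1 vectors
        sims.append(V[0] @ m / (np.linalg.norm(V[0]) * np.linalg.norm(m)))
    return np.mean(sims)

print(bundle_similarity(1))   # ≈ 1/sqrt(2) ≈ 0.707
print(bundle_similarity(3))   # ≈ sqrt(1/4) = 0.5
```

At n = 20,000 dimensions the empirical means land within about 0.01 of the predicted values, illustrating why these bounds can be budgeted at design time independent of the stored content.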

cross Robust Automatic Differentiation of Square-Root Kalman Filters via Gramian Differentials

Authors: Adrien Corenflos

Abstract: Square-root Kalman filters propagate state covariances in Cholesky-factor form for numerical stability, and are a natural target for gradient-based parameter learning in state-space models. Their core operation, triangularization of a matrix $M \in \mathbb{R}^{n \times m}$, is computed via a QR decomposition in practice, but naively differentiating through it causes two problems: the semi-orthogonal factor is non-unique when $m > n$, yielding undefined gradients; and the standard Jacobian formula involves inverses, which diverges when $M$ is rank-deficient. Both are resolved by the observation that all filter outputs relevant to learning depend on the input matrix only through the Gramian $MM^\top$, so the composite loss is smooth in $M$ even where the triangularization is not. We derive a closed-form chain-rule directly from the differential of this Gramian identity, prove it exact for the Kalman log-marginal likelihood and filtered moments, and extend it to rank-deficient inputs via a two-component decomposition: a column-space term based on the Moore--Penrose pseudoinverse, and a null-space correction for perturbations outside the column space of $M$.
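The central observation, that the outputs relevant to learning depend on the input matrix only through the Gramian $MM^\top$, can be checked numerically: the sign-fixed triangular factor from a QR-based triangularization coincides with the Cholesky factor of $MM^\top$, and any input with the same Gramian yields the same factor. A NumPy sketch (full-rank case only; the paper's rank-deficient extension is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 7
M = rng.normal(size=(n, m))              # wide matrix, m > n

# Triangularization via reduced QR of M^T: M^T = Q R, so M M^T = R^T R.
R = np.linalg.qr(M.T, mode="r")
L_qr = R.T * np.sign(np.diag(R))         # fix the per-column sign ambiguity

# The Cholesky factor depends on M only through the Gramian M M^T.
L_chol = np.linalg.cholesky(M @ M.T)
assert np.allclose(L_qr, L_chol)

# Any M2 with the same Gramian yields the same triangular factor:
Q2 = np.linalg.qr(rng.normal(size=(m, m)))[0]   # random m x m orthogonal matrix
M2 = M @ Q2.T                                   # M2 M2^T = M M^T
R2 = np.linalg.qr(M2.T, mode="r")
assert np.allclose(R2.T * np.sign(np.diag(R2)), L_chol)
```

The second assertion is exactly the non-uniqueness the abstract describes: the semi-orthogonal factor differs between M and M2, yet the quantity the filter actually uses is identical, which is why a chain rule through the Gramian stays well defined where the QR map itself is not.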

cross Task-Oriented Wireless Transmission of 3D Point Clouds: Geometric Versus Semantic Robustness

Authors: Vukan Ninkovic, Tamara Sobot, Vladimir Vincan, Gorana Gojic, Dragisa Miskovic, Dejan Vukobratovic

Abstract: Wireless transmission of high-dimensional 3D point clouds (PCs) is increasingly required in industrial collaborative robotics systems. Conventional compression methods prioritize geometric fidelity, although many practical applications ultimately depend on reliable task-level inference rather than exact coordinate reconstruction. In this paper, we propose an end-to-end semantic communication framework for wireless 3D PC transmission and conduct a systematic study of the relationship between geometric reconstruction fidelity and semantic robustness under channel impairments. The proposed architecture jointly supports geometric recovery and object classification from a shared transmitted representation, enabling direct comparison between coordinate-level and task-level sensitivity to noise. Experimental evaluation on a real industrial dataset reveals a pronounced asymmetry: semantic inference remains stable across a broad signal-to-noise ratio (SNR) range even when geometric reconstruction quality degrades significantly. These results demonstrate that reliable task execution does not require high-fidelity geometric recovery and provide design insights for task-oriented wireless perception systems in bandwidth- and power-constrained industrial environments.

cross EmDT: Embedding Diffusion Transformer for Tabular Data Generation in Fraud Detection

Authors: En-Ya Kuo, Sebastien Motsch

Abstract: Imbalanced datasets pose a difficulty in fraud detection, as classifiers are often biased toward the majority class and perform poorly on rare fraudulent transactions. Synthetic data generation is therefore commonly used to mitigate this problem. In this work, we propose the Clustered Embedding Diffusion-Transformer (EmDT), a diffusion model designed to generate fraudulent samples. Our key innovation is to leverage UMAP clustering to identify distinct fraudulent patterns, and train a Transformer denoising network with sinusoidal positional embeddings to capture feature relationships throughout the diffusion process. Once the synthetic data has been generated, we employ a standard decision-tree-based classifier (e.g., XGBoost) for classification, as this type of model remains better suited to tabular datasets. Experiments on a credit card fraud detection dataset demonstrate that EmDT significantly improves downstream classification performance compared to existing oversampling and generative methods, while maintaining comparable privacy protection and preserving feature correlations present in the original data.

cross State Algebra for Probabilistic Logic

Authors: Dmitry Lesnik, Tobias Sch\"afer

Abstract: This paper presents a Probabilistic State Algebra as an extension of deterministic propositional logic, providing a computational framework for constructing Markov Random Fields (MRFs) through pure linear algebra. By mapping logical states to real-valued coordinates interpreted as energy potentials, we define an energy-based model where global probability distributions emerge from coordinate-wise Hadamard products. This approach bypasses the traditional reliance on graph-traversal algorithms and compiled circuits, utilising $t$-objects and wildcards to embed logical reduction natively within matrix operations. We demonstrate that this algebra constructs formal Gibbs distributions, offering a rigorous mathematical link between symbolic constraints and statistical inference. A central application of this framework is the development of Probabilistic Rule Models (PRMs), which are uniquely capable of incorporating both probabilistic associations and deterministic logical constraints simultaneously. These models are designed to be inherently interpretable, supporting a human-in-the-loop approach to decisioning in high-stakes environments such as healthcare and finance. By representing decision logic as a modular summation of rules within a vector space, the framework ensures that complex probabilistic systems remain auditable and maintainable without compromising the rigour of the underlying configuration space.
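
The abstract's central mechanism, global distributions emerging from coordinate-wise Hadamard products of potentials, can be made concrete on a toy state space. The function names and the two-factor example below are hypothetical illustrations, not taken from the paper:

```python
def hadamard(*factors):
    """Coordinate-wise (Hadamard) product of potential vectors defined over
    the same enumeration of logical states."""
    out = [1.0] * len(factors[0])
    for f in factors:
        for i, v in enumerate(f):
            out[i] *= v
    return out

def gibbs(potentials):
    """Normalize combined potentials into a Gibbs distribution."""
    z = sum(potentials)
    return [p / z for p in potentials]

# Two factors over the four joint states of two binary variables (00, 01, 10, 11).
# A zero potential acts as a hard logical constraint that rules a state out.
factor_a = [1.0, 2.0, 1.0, 2.0]
factor_b = [1.0, 1.0, 0.0, 1.0]   # forbids state 10
dist = gibbs(hadamard(factor_a, factor_b))
```

This is the sense in which deterministic constraints (zeros) and probabilistic weights coexist in one linear-algebraic object, without any graph traversal.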

cross State-space models through the lens of ensemble control

Authors: Ye Feng, Jianfeng Lu

Abstract: State-space models (SSMs) are effective architectures for sequential modeling, but a rigorous theoretical understanding of their training dynamics is still lacking. In this work, we formulate the training of SSMs as an ensemble optimal control problem, where a shared control law governs a population of input-dependent dynamical systems. We derive Pontryagin's maximum principle (PMP) for this ensemble control formulation, providing necessary conditions for optimality. Motivated by these conditions, we introduce an algorithm based on the method of successive approximations. We prove convergence of this iterative scheme along a subsequence and establish sufficient conditions for global optimality. The resulting framework provides a control-theoretic perspective on SSM training.

cross EnterpriseOps-Gym: Environments and Evaluations for Stateful Agentic Planning and Tool Use in Enterprise Settings

Authors: Shiva Krishna Reddy Malay, Shravan Nayak, Jishnu Sethumadhavan Nair, Sagar Davasam, Aman Tiwari, Sathwik Tejaswi Madhusudhan, Sridhar Krishna Nemala, Srinivas Sunkara, Sai Rajeswar

Abstract: Large language models are shifting from passive information providers to active agents intended for complex workflows. However, their deployment as reliable AI workers in enterprise settings is stalled by benchmarks that fail to capture the intricacies of professional environments: specifically, the need for long-horizon planning amidst persistent state changes and strict access protocols. In this work, we introduce EnterpriseOps-Gym, a benchmark designed to evaluate agentic planning in realistic enterprise settings. Specifically, EnterpriseOps-Gym features a containerized sandbox with 164 database tables and 512 functional tools to mimic real-world search friction. Within this environment, agents are evaluated on 1,150 expert-curated tasks across eight mission-critical verticals (including Customer Service, HR, and IT). Our evaluation of 14 frontier models reveals critical limitations in state-of-the-art models: the top-performing Claude Opus 4.5 achieves only 37.4% success. Further analysis shows that providing oracle human plans improves performance by 14-35 percentage points, pinpointing strategic reasoning as the primary bottleneck. Additionally, agents frequently fail to refuse infeasible tasks (best model achieves 53.9%), leading to unintended and potentially harmful side effects. Our findings underscore that current agents are not yet ready for autonomous enterprise deployment. More broadly, EnterpriseOps-Gym provides a concrete testbed to advance the robustness of agentic planning in professional workflows.

cross NCCL EP: Towards a Unified Expert Parallel Communication API for NCCL

Authors: Amos Goldman (NVIDIA Corporation), Nimrod Boker (NVIDIA Corporation), Maayan Sheraizin (NVIDIA Corporation), Nimrod Admoni (NVIDIA Corporation), Artem Polyakov (NVIDIA Corporation), Subhadeep Bhattacharya (NVIDIA Corporation), Fan Yu (NVIDIA Corporation), Kai Sun (NVIDIA Corporation), Georgios Theodorakis (NVIDIA Corporation), Hsin-Chun Yin (NVIDIA Corporation), Peter-Jan Gootzen (NVIDIA Corporation), Aamir Shafi (NVIDIA Corporation), Assaf Ravid (NVIDIA Corporation), Salvatore Di Girolamo (NVIDIA Corporation), Manjunath Gorentla Venkata (NVIDIA Corporation), Gil Bloch (NVIDIA Corporation)

Abstract: Mixture-of-Experts (MoE) architectures have become essential for scaling large language models, driving the development of specialized device-initiated communication libraries such as DeepEP, Hybrid-EP, and others. These libraries demonstrate the performance benefits of GPU-initiated RDMA for MoE dispatch and combine operations. This paper presents NCCL EP (Expert Parallelism), a ground-up MoE communication library built entirely on NCCL's Device API. NCCL EP provides unified ncclEpDispatch and ncclEpCombine primitives with both C and Python interfaces, supporting Low-Latency (LL) mode for inference decoding and High-Throughput (HT) mode for training and inference prefill. LL targets small batch sizes (1-128 tokens) using direct all-to-all RDMA+NVLink mesh connectivity with double-buffered communication for overlapping dispatch and combine phases. HT targets large batches (4096+ tokens) using hierarchical communication that aggregates tokens within NVLink domains before inter-node RDMA transmission. Both modes leverage Device API for both intra- and inter-node communications, taking advantage of its topology awareness and optimized GPU-initiated implementation. We evaluate NCCL EP on an H100-based cluster across multi-node configurations, demonstrating competitive LL kernel performance and presenting end-to-end results with vLLM integration. By building MoE communication natively within NCCL, NCCL EP provides a supported path for expert parallelism on current and emerging NVIDIA platforms.

cross Robust Sequential Tracking via Bounded Information Geometry and Non-Parametric Field Actions

Authors: Carlos C. Rodriguez

Abstract: Standard sequential inference architectures are compromised by a normalizability crisis when confronted with extreme, structured outliers. By operating on unbounded parameter spaces, state-of-the-art estimators lack the intrinsic geometry required to appropriately sever anomalies, resulting in unbounded covariance inflation and mean divergence. This paper resolves this structural failure by analyzing the abstraction sequence of inference at the meta-prior level (S_2). We demonstrate that extremizing the action over an infinite-dimensional space requires a non-parametric field anchored by a pre-prior, as a uniform volume element mathematically does not exist. By utilizing strictly invariant Delta (or \nu) Information Separations on the statistical manifold, we physically truncate the infinite tails of the spatial distribution. When evaluated as a Radon-Nikodym derivative against the base measure, the active parameter space compresses into a strictly finite, normalizable probability droplet. Empirical benchmarks across three domains--LiDAR maneuvering target tracking, high-frequency cryptocurrency order flow, and quantum state tomography--demonstrate that this bounded information geometry analytically truncates outliers, ensuring robust estimation without relying on infinite-tailed distributional assumptions.

cross Design and evaluation of an agentic workflow for crisis-related synthetic tweet datasets

Authors: Roben Delos Reyes, Timothy Douglas, Asanobu Kitamoto

Abstract: Twitter (now X) has become an important source of social media data for situational awareness during crises. Crisis informatics research has widely used tweets from Twitter to develop and evaluate artificial intelligence (AI) systems for various crisis-relevant tasks, such as extracting locations and estimating damage levels from tweets to support damage assessment. However, recent changes in Twitter's data access policies have made it increasingly difficult to curate real-world tweet datasets related to crises. Moreover, existing curated tweet datasets are limited to past crisis events in specific contexts and are costly to annotate at scale. These limitations constrain the development and evaluation of AI systems used in crisis informatics. To address these limitations, we introduce an agentic workflow for generating crisis-related synthetic tweet datasets. The workflow iteratively generates synthetic tweets conditioned on prespecified target characteristics, evaluates them using predefined compliance checks, and incorporates structured feedback to refine them in subsequent iterations. As a case study, we apply the workflow to generate synthetic tweet datasets relevant to post-earthquake damage assessment. We show that the workflow can generate synthetic tweets that capture their target labels for location and damage level. We further demonstrate that the resulting synthetic tweet datasets can be used to evaluate AI systems on damage assessment tasks like geolocalization and damage level prediction. Our results indicate that the workflow offers a flexible and scalable alternative to real-world tweet data curation, enabling the systematic generation of synthetic social media data across diverse crisis events, societal contexts, and crisis informatics applications.

cross SHAMISA: SHAped Modeling of Implicit Structural Associations for Self-supervised No-Reference Image Quality Assessment

Authors: Mahdi Naseri, Zhou Wang

Abstract: No-Reference Image Quality Assessment (NR-IQA) aims to estimate perceptual quality without access to a reference image of pristine quality. Learning an NR-IQA model faces a fundamental bottleneck: its need for a large number of costly human perceptual labels. We propose SHAMISA, a non-contrastive self-supervised framework that learns from unlabeled distorted images by leveraging explicitly structured relational supervision. Unlike prior methods that impose rigid, binary similarity constraints, SHAMISA introduces implicit structural associations, defined as soft, controllable relations that are both distortion-aware and content-sensitive, inferred from synthetic metadata and intrinsic feature structure. A key innovation is our compositional distortion engine, which generates an uncountable family of degradations from continuous parameter spaces, grouped so that only one distortion factor varies at a time. This enables fine-grained control over representational similarity during training: images with shared distortion patterns are pulled together in the embedding space, while severity variations produce structured, predictable shifts. We integrate these insights via dual-source relation graphs that encode both known degradation profiles and emergent structural affinities to guide the learning process throughout training. A convolutional encoder is trained under this supervision and then frozen for inference, with quality prediction performed by a linear regressor on its features. Extensive experiments on synthetic, authentic, and cross-dataset NR-IQA benchmarks demonstrate that SHAMISA achieves strong overall performance with improved cross-dataset generalization and robustness, all without human quality annotations or contrastive losses.

cross LLM-MINE: Large Language Model based Alzheimer's Disease and Related Dementias Phenotypes Mining from Clinical Notes

Authors: Mingchen Shao, Yuzhang Xie, Carl Yang, Jiaying Lu

Abstract: Accurate extraction of Alzheimer's Disease and Related Dementias (ADRD) phenotypes from electronic health records (EHR) is critical for early-stage detection and disease staging. However, this information is usually embedded in unstructured textual data rather than tabular data, making it difficult to extract accurately. We therefore propose LLM-MINE, a Large Language Model-based phenotype mining framework for automatic extraction of ADRD phenotypes from clinical notes. Using two expert-defined phenotype lists, we evaluate the extracted phenotypes by examining their statistical significance across cohorts and their utility for unsupervised disease staging. Chi-square analyses confirm statistically significant phenotype differences across cohorts, with memory impairment being the strongest discriminator. Few-shot prompting with the combined phenotype lists achieves the best clustering performance (ARI=0.290, NMI=0.232), substantially outperforming biomedical NER and dictionary-based baselines. Our results demonstrate that LLM-based phenotype extraction is a promising tool for discovering clinically meaningful ADRD signals from unstructured notes.

cross When Should Humans Step In? Optimal Human Dispatching in AI-Assisted Decisions

Authors: Lezhi Tan, Naomi Sagan, Lihua Lei, Jose Blanchet

Abstract: AI systems increasingly assist human decision making by producing preliminary assessments of complex inputs. However, such AI-generated assessments can often be noisy or systematically biased, raising a central question: how should costly human effort be allocated to correct AI outputs where it matters the most for the final decision? We propose a general decision-theoretic framework for human-AI collaboration in which AI assessments are treated as factor-level signals and human judgments as costly information that can be selectively acquired. We consider cases where the optimal selection problem reduces to maximizing a reward associated with each candidate subset of factors, and turn policy design into reward estimation. We develop estimation procedures under both nonparametric and linear models, covering contextual and non-contextual selection rules. In the linear setting, the optimal rule admits a closed-form expression with a clear interpretation in terms of factor importance and residual variance. We apply our framework to AI-assisted peer review. Our approach substantially outperforms LLM-only predictions and achieves performance comparable to full human review while using only 20-30% of the human information. Across different selection rules, we find that simpler rules derived under linear models can significantly reduce computational cost without harming final prediction performance. Our results highlight both the value of human intervention and the efficiency of principled dispatching.

cross Repetition Without Exclusivity: Scale Sensitivity of Referential Mechanisms in Child-Scale Language Models

Authors: Jon-Paul Cacioli

Abstract: We present the first systematic evaluation of mutual exclusivity (ME) -- the bias to map novel words to novel referents -- in text-only language models trained on child-directed speech. We operationalise ME as referential suppression: when a familiar object is relabelled in a two-referent discourse context, ME predicts decreased probability of the labelled noun at a subsequent completion position. Three pilot findings motivate a pre-registered scale-sensitivity experiment: (1) a masked language model (BabyBERTa) is entirely insensitive to multi-sentence referential context; (2) autoregressive models show robust repetition priming -- the opposite of ME -- when familiar nouns are re-labelled; and (3) a novel context-dependence diagnostic reveals that apparent ME-like patterns with nonce tokens are fully explained by embedding similarity, not referential disambiguation. In the confirmatory experiment, we train 45 GPT-2-architecture models (2.9M, 8.9M, and 33.5M parameters; 5, 10, and 20 epochs on AO-CHILDES; 5 seeds each) and evaluate on a pre-registered ME battery. Anti-ME repetition priming is significant in all 9 cells (85-100% of items; all p < 2.4 x 10^-13). Priming attenuates with improved language modelling (Spearman rho = -0.533, p = 0.0002) but never crosses zero across a 3.8x perplexity range. The context-dependence diagnostic replicates in all 9 cells, and dose-response priming increases with repetitions in 8/9 cells (all trend p < 0.002). These findings indicate that distributional learning on child-directed speech produces repetition-based reference tracking rather than lexical exclusivity. We connect this to the grounded cognition literature and argue that referential grounding may be a necessary ingredient for ME -- an empirical claim about required input structure, not a nativist one.

cross REFINE-DP: Diffusion Policy Fine-tuning for Humanoid Loco-manipulation via Reinforcement Learning

Authors: Zhaoyuan Gu, Yipu Chen, Zimeng Chai, Alfred Cueva, Thong Nguyen, Yifan Wu, Huishu Xue, Minji Kim, Isaac Legene, Fukang Liu, Matthew Kim, Ayan Barula, Yongxin Chen, Ye Zhao

Abstract: Humanoid loco-manipulation requires coordinated high-level motion plans with stable, low-level whole-body execution under complex robot-environment dynamics and long-horizon tasks. While diffusion policies (DPs) show promise for learning from demonstrations, deploying them on humanoids poses critical challenges: the motion planner trained offline is decoupled from the low-level controller, leading to poor command tracking, compounding distribution shift, and task failures. The common approach of scaling demonstration data is prohibitively expensive for high-dimensional humanoid systems. To address this challenge, we present REFINE-DP (REinforcement learning FINE-tuning of Diffusion Policy), a hierarchical framework that jointly optimizes a DP high-level planner and an RL-based low-level loco-manipulation controller. The DP is fine-tuned via a PPO-based diffusion policy gradient to improve task success rate, while the controller is simultaneously updated to accurately track the planner's evolving command distribution, reducing the distributional mismatch that degrades motion quality. We validate REFINE-DP on a humanoid robot performing loco-manipulation tasks, including door traversal and long-horizon object transport. REFINE-DP achieves an over $90\%$ success rate in simulation, even in out-of-distribution cases not seen in the pre-trained data, and enables smooth autonomous task execution in real-world dynamic environments. Our proposed method substantially outperforms pre-trained DP baselines and demonstrates that RL fine-tuning is key to reliable humanoid loco-manipulation. https://refine-dp.github.io/REFINE-DP/

URLs: https://refine-dp.github.io/REFINE-DP/

cross Implicit Maximum Likelihood Estimation for Real-time Generative Model Predictive Control

Authors: Grayson Lee, Minh Bui, Shuzi Zhou, Yankai Li, Mo Chen, Ke Li

Abstract: Diffusion-based models have recently shown strong performance in trajectory planning, as they are capable of capturing diverse, multimodal distributions of complex behaviors. A key limitation of these models is their slow inference speed, which results from the iterative denoising process. This makes them less suitable for real-time applications such as closed-loop model predictive control (MPC), where plans must be generated quickly and adapted continuously to a changing environment. In this paper, we investigate Implicit Maximum Likelihood Estimation (IMLE) as an alternative generative modeling approach for planning. IMLE offers strong mode coverage while enabling inference that is two orders of magnitude faster, making it particularly well suited for real-time MPC tasks. Our results demonstrate that IMLE achieves competitive performance on standard offline reinforcement learning benchmarks compared to the standard diffusion-based planner, while substantially improving planning speed in both open-loop and closed-loop settings. We further validate IMLE in a closed-loop human navigation scenario, operating in real-time, demonstrating how it enables rapid and adaptive plan generation in dynamic environments.
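
The abstract does not detail the IMLE objective. A toy sketch of its defining matching step, under the usual formulation in which each data point is paired with the nearest of several generated samples and training then pulls that sample toward the datum, is given below; the one-dimensional generator and function names are illustrative, not from the paper:

```python
import random

def imle_pairs(data, generator, n_samples=16):
    """IMLE matching step: for each datum, draw candidate generations from
    random latents and keep the nearest one. A training loop would then
    minimize the distance between each datum and its matched sample."""
    random.seed(0)  # deterministic for illustration
    pairs = []
    for x in data:
        candidates = [generator(random.gauss(0.0, 1.0)) for _ in range(n_samples)]
        nearest = min(candidates, key=lambda c: abs(c - x))
        pairs.append((x, nearest))
    return pairs
```

Because every datum is matched to some sample (rather than every sample being judged against the data, as in GANs), mode coverage is encouraged, and inference is a single generator pass rather than an iterative denoising chain.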

cross MeTok: An Efficient Meteorological Tokenization with Hyper-Aligned Group Learning for Precipitation Nowcasting

Authors: Qizhao Jin, Xianhuang Xu, Yong Cao, Shiming Xiang, Xinyu Xiao

Abstract: Recently, Transformer-based architectures have advanced meteorological prediction. However, the position-centric tokenization these architectures rely on conflicts with a core principle of meteorological systems: weather phenomena involve synergistic interactions among multiple elements, while positional information constitutes merely one component of the boundary conditions. This paper focuses primarily on the task of precipitation nowcasting and develops an efficient distribution-centric Meteorological Tokenization (MeTok) scheme, which reorders spatial sequences to group similar meteorological features. Based on this rearrangement, realigned group learning enhances robustness across precipitation patterns, especially extreme ones. Specifically, we introduce the Hyper-Aligned Grouping Transformer (HyAGTransformer) with two key improvements: 1) The Grouping Attention (GA) mechanism uses MeTok to enable self-aligned learning of features from different precipitation patterns; 2) The Neighborhood Feed-Forward Network (N-FFN) integrates adjacent group features, aggregating contextual information to boost patch embedding discriminability. Experiments on the ERA5 dataset for 6-hour forecasts show our method improves the IoU metric by at least 8.2% in extreme precipitation prediction compared to other methods. Additionally, it gains performance with more training data and increased parameters, demonstrating scalability, stability, and superiority over traditional methods.

cross Brain Tumor Classification from 3D MRI Using Persistent Homology and Betti Features: A Topological Data Analysis Approach on BraTS2020

Authors: Faisal Ahmed

Abstract: Accurate and interpretable brain tumor classification from medical imaging remains a challenging problem due to the high dimensionality and complex structural patterns present in magnetic resonance imaging (MRI). In this study, we propose a topology-driven framework for brain tumor classification based on Topological Data Analysis (TDA) applied directly to three-dimensional (3D) MRI volumes. Specifically, we analyze 3D Fluid Attenuated Inversion Recovery (FLAIR) images from the BraTS 2020 dataset and extract interpretable topological descriptors using persistent homology. Persistent homology captures intrinsic geometric and structural characteristics of the data through Betti numbers, which describe connected components (Betti-0), loops (Betti-1), and voids (Betti-2). From the 3D MRI volumes, we derive a compact set of 100 topological features that summarize the underlying topology of brain tumor structures. These descriptors represent complex 3D tumor morphology while significantly reducing data dimensionality. Unlike many deep learning approaches that require large-scale training data or complex architectures, the proposed framework relies on computationally efficient topological features extracted directly from the images. These features are used to train classical machine learning classifiers, including Random Forest and XGBoost, for binary classification of high-grade glioma (HGG) and low-grade glioma (LGG). Experimental results on the BraTS 2020 dataset show that the Random Forest classifier combined with selected Betti features achieves an accuracy of 89.19%. These findings highlight the potential of persistent homology as an effective and interpretable approach for analyzing complex 3D medical images and performing brain tumor classification.
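
Of the descriptors named above, Betti-0 (connected components) is the easiest to make concrete. The sketch below counts components of a thresholded voxel volume using union-find with 6-connectivity; note the paper computes full persistent homology across filtration values, which this single-threshold toy does not attempt:

```python
def betti0(volume, threshold):
    """Count connected components (Betti-0) of voxels whose intensity is at
    least `threshold`, using 6-connectivity and union-find."""
    voxels = {(x, y, z)
              for x, plane in enumerate(volume)
              for y, row in enumerate(plane)
              for z, v in enumerate(row) if v >= threshold}
    parent = {v: v for v in voxels}

    def find(a):
        # Find the component root, compressing the path as we go.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for (x, y, z) in voxels:
        # Each axis-aligned edge is visited once from its lower endpoint.
        for dx, dy, dz in ((1, 0, 0), (0, 1, 0), (0, 0, 1)):
            nb = (x + dx, y + dy, z + dz)
            if nb in voxels:
                parent[find((x, y, z))] = find(nb)
    return len({find(v) for v in voxels})
```

Sweeping the threshold and recording when components are born and merge is exactly the dimension-0 part of the persistence computation the paper describes.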

cross Hierarchy of extreme-event predictability in turbulence revealed by machine learning

Authors: Yuxuan Yang, Chenyu Dong, Gianmarco Mengaldo

Abstract: Extreme-event predictability in turbulence is strongly state dependent, yet event-by-event predictability horizons are difficult to quantify without access to governing equations or costly perturbation ensembles. Here we train an autoregressive conditional diffusion model on direct numerical simulations of the two-dimensional Kolmogorov flow and use a CRPS-based skill score to define an event-wise predictability horizon. Enstrophy extremes exhibit a pronounced hierarchy: forecast skill persists from $\approx 1$ to $> 4$ Lyapunov times across events. Spectral filtering shows that these horizons are controlled predominantly by large-scale structures. Extremes are preceded by intense strain cores organizing quadrupolar vortex packets, whose lifetime sharply separates long- from short-horizon events. These results identify coherent-structure persistence as a governing mechanism for the predictability of turbulence extremes and provide a data-driven route to diagnose predictability limits from observations.
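
The CRPS-based skill score is not defined in the abstract. For reference, the standard empirical CRPS of an ensemble forecast is sketched below, from which a skill score of the form 1 - CRPS_model / CRPS_reference is commonly formed; the function name is illustrative:

```python
def ensemble_crps(ensemble, obs):
    """Empirical continuous ranked probability score of an ensemble forecast
    against a scalar observation:
        CRPS = mean|x_i - y| - 0.5 * mean|x_i - x_j|
    Lower is better; 0 means every member equals the observation."""
    m = len(ensemble)
    accuracy = sum(abs(x - obs) for x in ensemble) / m
    spread = sum(abs(a - b) for a in ensemble for b in ensemble) / (2 * m * m)
    return accuracy - spread
```

The spread term rewards honest ensemble dispersion, which is why CRPS is a natural skill metric for probabilistic (diffusion-based) forecasts rather than point predictions.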

cross An Interpretable and Stable Framework for Sparse Principal Component Analysis

Authors: Ying Hu, Hu Yang

Abstract: Sparse principal component analysis (SPCA) addresses the poor interpretability and variable redundancy often encountered by principal component analysis (PCA) in high-dimensional data. However, SPCA typically imposes uniform penalties on variables and does not account for differences in variable importance, which may lead to unstable performance in highly noisy or structurally complex settings. We propose SP-SPCA, a method that introduces a single equilibrium parameter into the regularization framework to adaptively adjust variable penalties. This modification of the L2 penalty provides flexible control over the trade-off between sparsity and explained variance while maintaining computational efficiency. Simulation studies show that the proposed method consistently outperforms standard sparse principal component methods in identifying sparse loading patterns, filtering noise variables, and preserving cumulative variance, especially in high-dimensional and noisy settings. Empirical applications to crime and financial market data further demonstrate its practical utility. In real data analyses, the method selects fewer but more relevant variables, thereby reducing model complexity while maintaining explanatory power. Overall, the proposed approach offers a robust and efficient alternative for sparse modeling in complex high-dimensional data, with clear advantages in stability, feature selection, and interpretability.

cross PA-Net: Precipitation-Adaptive Mixture-of-Experts for Long-Tail Rainfall Nowcasting

Authors: Xinyu Xiao, Sen Lei, Eryun Liu, Shiming Xiang, Hao Li, Cheng Yuan, Yuan Qi, Qizhao Jin

Abstract: Precipitation nowcasting is vital for flood warning, agricultural management, and emergency response, yet two bottlenecks persist: the prohibitive cost of modeling million-scale spatiotemporal tokens from multi-variate atmospheric fields, and the extreme long-tailed rainfall distribution where heavy-to-torrential events -- those of greatest societal impact -- constitute fewer than 0.1% of all samples. We propose the Precipitation-Adaptive Network (PA-Net), a Transformer framework whose computational budget is explicitly governed by rainfall intensity. Its core component, Precipitation-Adaptive MoE (PA-MoE), dynamically scales the number of activated experts per token according to local precipitation magnitude, channeling richer representational capacity toward the rare yet critical heavy-rainfall tail. A Dual-Axis Compressed Latent Attention mechanism factorizes spatiotemporal attention with convolutional reduction to manage massive context lengths, while an intensity-aware training protocol progressively amplifies learning signals from extreme-rainfall samples. Experiments on ERA5 demonstrate consistent improvements over state-of-the-art baselines, with particularly significant gains in heavy-rain and rainstorm regimes.

cross Early Rug Pull Warning for BSC Meme Tokens via Multi-Granularity Wash-Trading Pattern Profiling

Authors: Dingding Cao, Bianbian Jiao, Jingzong Yang, Yujing Zhong, Wei Yang

Abstract: The high-frequency issuance and short-cycle speculation of meme tokens in decentralized finance (DeFi) have significantly amplified rug-pull risk. Existing approaches still struggle to provide stable early warning under scarce anomalies, incomplete labels, and limited interpretability. To address this issue, an end-to-end warning framework is proposed for BSC meme tokens, consisting of four stages: dataset construction and labeling, wash-trading pattern feature modeling, risk prediction, and error analysis. Methodologically, 12 token-level behavioral features are constructed based on three wash-trading patterns (Self, Matched, and Circular), unifying transaction-, address-, and flow-level signals into risk vectors. Supervised models are then employed to output warning scores and alert decisions. Under the current setting (7 tokens, 33,242 records), Random Forest outperforms Logistic Regression on core metrics, achieving AUC=0.9098, PR-AUC=0.9185, and F1=0.7429. Ablation results show that trade-level features are the primary performance driver (Delta PR-AUC=-0.1843 when removed), while address-level features provide stable complementary gain (Delta PR-AUC=-0.0573). The model also demonstrates actionable early-warning potential for a subset of samples, with a mean Lead Time (v1) of 3.8133 hours. The error profile (FP=1, FN=8) indicates that the current system is better positioned as a high-precision screener rather than a high-recall automatic alarm engine. The main contributions are threefold: an executable and reproducible rug-pull warning pipeline, empirical validation of multi-granularity wash-trading features under weak supervision, and deployment-oriented evidence through lead-time and error-bound analysis.
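
The paper's 12 token-level features are not enumerated in the abstract. Below is a toy sketch of how the three named wash-trading patterns could be counted from a token's trade edge list; these feature definitions are hypothetical simplifications for illustration, not the paper's actual feature set:

```python
def wash_trade_features(trades):
    """Toy token-level counts for the three wash-trading patterns named in
    the abstract, given trades as (sender, receiver) pairs:
      Self     - sender equals receiver,
      Matched  - an A->B trade mirrored by a B->A trade,
      Circular - a directed 3-cycle such as A->B->C->A."""
    self_n = sum(1 for s, r in trades if s == r)
    pair_set = {(s, r) for s, r in trades if s != r}
    # Each mirrored pair is found from both directions, so halve the count.
    matched_n = sum(1 for s, r in pair_set if (r, s) in pair_set) // 2
    # Each 3-cycle is found once per starting edge, so divide by three.
    circular_n = sum(
        1 for (a, b) in pair_set for (c, d) in pair_set
        if b == c and (d, a) in pair_set and len({a, b, d}) == 3
    ) // 3
    return {"self": self_n, "matched": matched_n, "circular": circular_n}
```

Real features of this kind would additionally aggregate volumes, timing, and address reuse; the point of the sketch is only how transaction-, address-, and flow-level signals reduce to a per-token risk vector.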

cross Efficient Semi-Automated Material Microstructure Analysis Using Deep Learning: A Case Study in Additive Manufacturing

Authors: Sanjeev S. Navaratna, Nikhil Thawari, Gunashekhar Mari, Amritha V P, Murugaiyan Amirthalingam, Rohit Batra

Abstract: Image segmentation is fundamental to microstructural analysis for defect identification and structure-property correlation, yet remains challenging due to pronounced heterogeneity in materials images arising from varied processing and testing conditions. Conventional image processing techniques often fail to capture such complex features, rendering them ineffective for large-scale analysis. Even deep learning approaches struggle to generalize across heterogeneous datasets due to scarcity of high-quality labeled data. Consequently, segmentation workflows often rely on manual, expert-driven annotations, which are labor-intensive and difficult to scale. Using an additive manufacturing (AM) dataset as a case study, we present a semi-automated, active-learning-based segmentation pipeline that integrates a U-Net-based convolutional neural network with an interactive user annotation and correction interface and a representative core-set image selection strategy. The active learning workflow iteratively updates the model by incorporating user-corrected segmentations into the training pool, while the core-set strategy identifies representative images for annotation. Three subset selection strategies were evaluated over six refinement rounds: manual selection, uncertainty-driven sampling, and the proposed maximin Latin hypercube sampling from embeddings (SMILE) method. The SMILE strategy consistently outperformed the other approaches, improving the macro F1 score from 0.74 to 0.93 while reducing manual annotation time by about 65 percent. The segmented defect regions were further analyzed using a coupled classification model to categorize defects based on microstructural characteristics and map them to corresponding AM process parameters. The proposed framework reduces labeling effort while maintaining scalability and robustness and is broadly applicable to image-based analysis across diverse materials systems.

cross Intelligent Materials Modelling: Large Language Models Versus Partial Least Squares Regression for Predicting Polysulfone Membrane Mechanical Performance

Authors: Dingding Cao, Mieow Kee Chan, Wan Sieng Yeo, Said Bey, Alberto Figoli

Abstract: Predicting the mechanical properties of polysulfone (PSF) membranes from structural descriptors remains challenging due to extreme data scarcity typical of experimental studies. To investigate this issue, this study benchmarked knowledge-driven inference using four large language models (LLMs) (DeepSeek-V3, DeepSeek-R1, ChatGPT-4o, and GPT-5) against partial least squares (PLS) regression for predicting Young's modulus (E), tensile strength (TS), and elongation at break (EL) based on pore diameter (PD), contact angle (CA), thickness (T), and porosity (P) measurements. These knowledge-driven approaches demonstrated property-specific advantages over the chemometric baseline. For EL, LLMs achieved statistically significant improvements, with DeepSeek-R1 and GPT-5 delivering 40.5% and 40.3% reductions in Root Mean Square Error, respectively, and reducing mean absolute errors from $11.63\pm5.34$% to $5.18\pm0.17$%. Run-to-run variability was markedly compressed for LLMs ($\leq$3%) compared to PLS (up to 47%). E and TS predictions showed statistical parity between approaches ($q\geq0.05$), indicating sufficient performance of linear methods for properties with strong structure-property correlations. Error topology analysis revealed systematic regression-to-the-mean behavior dominated by data-regime effects rather than model-family limitations. These findings establish that LLMs excel for non-linear, constraint-sensitive properties under bootstrap instability, while PLS remains competitive for linear relationships requiring interpretable latent-variable decompositions. The demonstrated complementarity suggests hybrid architectures leveraging LLM-encoded knowledge within interpretable frameworks may optimise small-data materials discovery.

cross MICRO: A Lightweight Middleware for Optimizing Cross-store Cross-model Graph-Relation Joins [Technical Report]

Authors: Xiuwen Zheng, Arun Kumar, Amarnath Gupta

Abstract: Modern data applications increasingly involve heterogeneous data managed in different models and stored across disparate database engines, often deployed as separate installs. Limited research has addressed cross-model query processing in federated environments. This paper takes a step toward bridging this gap by: (1) formally defining a class of cross-model join queries between a graph store and a relational store by proposing a unified algebra; (2) introducing one real-world benchmark and four semi-synthetic benchmarks to evaluate such queries; and (3) proposing a lightweight middleware, MICRO, for efficient query execution. At the core of MICRO is CMLero, a learning-to-rank-based query optimizer that selects efficient execution plans without requiring exact cost estimation. By avoiding the need to materialize or convert all data into a single model, which is often infeasible due to third-party data control or cost, MICRO enables native querying across heterogeneous systems. Experimental results on the benchmark workloads demonstrate that MICRO outperforms the state-of-the-art federated relational system XDB by up to 2.1x in total runtime across the full test set. On the 93 test queries of the real-world benchmark, 14 queries achieve over 10x speedup, including 4 with more than 100x speedup; however, 4 queries experienced slowdowns of over 5 seconds, highlighting opportunities for future improvement of MICRO. Further comparisons show that CMLero consistently outperforms rule-based and regression-based optimizers, highlighting the advantage of learning-to-rank in complex cross-model optimization.

cross GradMem: Learning to Write Context into Memory with Test-Time Gradient Descent

Authors: Yuri Kuratov, Matvey Kairov, Aydar Bulatov, Ivan Rodkin, Mikhail Burtsev

Abstract: Many large language model applications require conditioning on long contexts. Transformers typically support this by storing a large per-layer KV-cache of past activations, which incurs substantial memory overhead. A desirable alternative is compressive memory: read a context once, store it in a compact state, and answer many queries from that state. We study this in a context removal setting, where the model must generate an answer without access to the original context at inference time. We introduce GradMem, which writes context into memory via per-sample test-time optimization. Given a context, GradMem performs a few steps of gradient descent on a small set of prefix memory tokens while keeping model weights frozen. GradMem explicitly optimizes a model-level self-supervised context reconstruction loss, resulting in a loss-driven write operation with iterative error correction, unlike forward-only methods. On associative key--value retrieval, GradMem outperforms forward-only memory writers with the same memory size, and additional gradient steps scale capacity much more effectively than repeated forward writes. We further show that GradMem transfers beyond synthetic benchmarks: with pretrained language models, it attains competitive results on natural language tasks including bAbI and SQuAD variants, relying only on information encoded in memory.
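GradMem's write operation, gradient descent on memory with the model frozen, can be illustrated on a toy linear "decoder" standing in for the frozen language model (all names, shapes, and hyperparameters below are illustrative):

```python
import numpy as np

def write_memory(W, context, steps=5000, lr=0.1):
    """Write `context` into a compact memory vector by gradient descent.

    Only the memory m is optimized; W (a frozen linear 'decoder',
    a stand-in for the frozen language model) never changes, mirroring
    GradMem's per-sample, frozen-weights write.
    Loss: squared reconstruction error ||W m - context||^2.
    """
    m = np.zeros(W.shape[1])
    for _ in range(steps):
        residual = W @ m - context   # reconstruction error
        grad = 2.0 * W.T @ residual  # d(loss)/dm; W stays frozen
        m -= lr * grad
    return m

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 4)) / 4.0  # 4-d memory decodes to a 16-d context
context = W @ rng.normal(size=4)    # a context the memory can represent exactly
m = write_memory(W, context)
err = np.linalg.norm(W @ m - context)  # near zero after the write
```

The point of the sketch is the asymmetry the abstract describes: the loss-driven write iteratively corrects its own reconstruction error, which a single forward pass cannot do.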

cross Benchmarking the Energy Cost of Assurance in Neuromorphic Edge Robotics

Authors: Sylvester Kaczmarek

Abstract: Deploying trustworthy artificial intelligence on edge robotics imposes a difficult trade-off between high-assurance robustness and energy sustainability. Traditional defense mechanisms against adversarial attacks typically incur significant computational overhead, threatening the viability of power-constrained platforms in environments such as cislunar space. This paper quantifies the energy cost of assurance in event-driven neuromorphic systems. We benchmark the Hierarchical Temporal Defense (HTD) framework on the BrainChip Akida AKD1000 processor against a suite of adversarial temporal attacks. We demonstrate that unlike traditional deep learning defenses which often degrade efficiency significantly with increased robustness, the event-driven nature of the proposed architecture achieves a superior trade-off. The system reduces gradient-based adversarial success rates from 82.1% to 18.7% and temporal jitter success rates from 75.8% to 25.1%, while maintaining an energy consumption of approximately 45 microjoules per inference. We report a counter-intuitive reduction in dynamic power consumption in the fully defended configuration, attributed to volatility-gated plasticity mechanisms that induce higher network sparsity. These results provide empirical evidence that neuromorphic sparsity enables sustainable and high-assurance edge autonomy.

cross MO-SAE: Multi-Objective Stacked Autoencoders Optimization for Edge Anomaly Detection

Authors: Lizhao Zhang, Shengsong Kong, Tao Guo, Shaobo Li, Zhenzhou Ji

Abstract: Stacked AutoEncoders (SAE) have been widely adopted in edge anomaly detection scenarios. However, the resource-intensive nature of SAE can pose significant challenges for edge devices, which are typically resource-constrained and must adapt rapidly to dynamic and changing conditions. Optimizing SAE to meet the heterogeneous demands of real-world deployment scenarios, including high performance under constrained storage, low power consumption, fast inference, and efficient model updates, remains a substantial challenge. To address this, we propose an integrated optimization framework that jointly considers these critical factors to achieve balanced and adaptive system-level optimization. Specifically, we formulate SAE optimization for edge anomaly detection as a multi-objective optimization problem and propose MO-SAE (Multi-Objective Stacked AutoEncoders). The multiple objectives are addressed by integrating model clipping, multi-branch exit design, and a matrix approximation technique. In addition, a multi-objective heuristic algorithm is employed to effectively balance the competing objectives in SAE optimization. Our results demonstrate that the proposed MO-SAE delivers substantial improvements over the original approach. On the x86 architecture, it reduces storage space and power consumption by at least 50%, improves runtime efficiency by no less than 28%, and achieves an 11.8% compression rate, all while maintaining application performance. Furthermore, MO-SAE runs efficiently on edge devices with ARM architecture. Experimental results show a 15% improvement in inference speed, facilitating efficient deployment in cloud-edge collaborative anomaly detection systems.

cross Pixel-level Scene Understanding in One Token: Visual States Need What-is-Where Composition

Authors: Seokmin Lee, Yunghee Lee, Byeonghyun Pak, Byeongju Woo

Abstract: For robotic agents operating in dynamic environments, learning visual state representations from streaming video observations is essential for sequential decision making. Recent self-supervised learning methods have shown strong transferability across vision tasks, but they do not explicitly address what a good visual state should encode. We argue that effective visual states must capture what-is-where by jointly encoding the semantic identities of scene elements and their spatial locations, enabling reliable detection of subtle dynamics across observations. To this end, we propose CroBo, a visual state representation learning framework based on a global-to-local reconstruction objective. Given a reference observation compressed into a compact bottleneck token, CroBo learns to reconstruct heavily masked patches in a local target crop from sparse visible cues, using the global bottleneck token as context. This learning objective encourages the bottleneck token to encode a fine-grained representation of scene-wide semantic entities, including their identities, spatial locations, and configurations. As a result, the learned visual states reveal how scene elements move and interact over time, supporting sequential decision making. We evaluate CroBo on diverse vision-based robot policy learning benchmarks, where it achieves state-of-the-art performance. Reconstruction analyses and perceptual straightness experiments further show that the learned representations preserve pixel-level scene composition and encode what-moves-where across observations.

cross The Phenomenology of Hallucinations

Authors: Valeria Ruscio, Keiran Thompson

Abstract: We show that language models hallucinate not because they fail to detect uncertainty, but because of a failure to integrate it into output generation. Across architectures, uncertain inputs are reliably identified, occupying high-dimensional regions with 2-3$\times$ the intrinsic dimensionality of factual inputs. However, this internal signal is weakly coupled to the output layer: uncertainty migrates into low-sensitivity subspaces, becoming geometrically amplified yet functionally silent. Topological analysis shows that uncertainty representations fragment rather than converging to a unified abstention state, while gradient and Fisher probes reveal collapsing sensitivity along the uncertainty direction. Because cross-entropy training provides no attractor for abstention and uniformly rewards confident prediction, associative mechanisms amplify these fractured activations until residual coupling forces a committed output despite internal detection. Causal interventions confirm this account by restoring refusal when uncertainty is directly connected to logits.

cross Generative Inverse Design of Cold Metals for Low-Power Electronics

Authors: Kedeng Wu, Yucheng Zhu, Yan Chen, Bizhu Zhang, Shuyu Liu, Xiaobin Deng, Yabei Wu, Liangliang Zhu, Hang Xiao

Abstract: Cold metals are a class of metals with an intrinsic energy gap located close to the Fermi level, which enables cold-carrier injection for steep-slope transistors and is therefore promising for low-power electronic applications. High-throughput screening has revealed 252 three-dimensional (3D) cold metals in the Materials Project database, but database searches are inherently limited to known compounds. Here we present an inverse-design workflow that generates 3D cold metals using MatterGPT, a conditional autoregressive Transformer trained on SLICES, an invertible and symmetry-invariant crystal string representation. We curate a training set of 26,309 metallic structures labeled with energy above hull and a unified band-edge distance descriptor that merges p-type and n-type cold-metal characteristics to address severe label imbalance. Property-conditioned generation targeting thermodynamic stability and 50-500 meV band-edge distances produces 148,506 unique candidates; 92.1% are successfully reconstructed to 3D structures and down-selected by symmetry, uniqueness and novelty filters, followed by high-throughput DFT validation. We identify 257 cold metals verified as novel with respect to the Materials Project database, with gaps around the Fermi level spanning 50-500 meV. First-principles phonon, electronic-structure, and work-function calculations for representative candidates confirm dynamical stability and contact-relevant work functions. Our results demonstrate that SLICES-enabled generative transformers can expand the chemical space of cold metals beyond high-throughput screening, providing a route to low-power electronic materials discovery.

cross Sat-JEPA-Diff: Bridging Self-Supervised Learning and Generative Diffusion for Remote Sensing

Authors: Kursat Komurcu, Linas Petkevicius

Abstract: Predicting satellite imagery requires a balance between structural accuracy and textural detail. Standard deterministic methods like PredRNN or SimVP minimize pixel-based errors but suffer from the "regression to the mean" problem, producing blurry outputs that obscure subtle geographic-spatial features. Generative models provide realistic textures but often hallucinate structural anomalies. To bridge this gap, we introduce Sat-JEPA-Diff, which combines Self-Supervised Learning (SSL) with Latent Diffusion Models (LDM). An I-JEPA module predicts stable semantic representations, which then condition a frozen Stable Diffusion backbone via a lightweight cross-attention adapter. This ensures that the synthesized high-fidelity textures are grounded in structurally accurate predictions. Evaluated on a global Sentinel-2 dataset, Sat-JEPA-Diff excels at resolving sharp boundaries. It achieves leading perceptual scores (GSSIM: 0.8984, FID: 0.1475) and significantly outperforms deterministic baselines, despite standard autoregressive stability limits. The code and dataset are publicly available on https://github.com/VU-AIML/SAT-JEPA-DIFF.

URLs: https://github.com/VU-AIML/SAT-JEPA-DIFF.

cross Exploiting temporal parallelism for LSTM Autoencoder acceleration on FPGA

Authors: Aimilios Leftheriotis, Dimosthenis Masouros, Dimitrios Soudris, George Theodoridis

Abstract: Recurrent Neural Networks (RNNs) are vital for sequential data processing. Long Short-Term Memory Autoencoders (LSTM-AEs) are particularly effective for unsupervised anomaly detection in time-series data. However, inherent sequential dependencies limit parallel computation. While previous work has explored FPGA-based acceleration for LSTM networks, efforts have typically focused on optimizing a single LSTM layer at a time. We introduce a novel FPGA-based accelerator using a dataflow architecture that exploits temporal parallelism for concurrent multi-layer processing of different timesteps within sequences. Experimental evaluations on four representative LSTM-AE models with varying widths and depths, implemented on a Zynq UltraScale+ MPSoC FPGA, demonstrate significant advantages over CPU (Intel Xeon Gold 5218R) and GPU (NVIDIA V100) implementations. Our accelerator achieves latency speedups up to 79.6x vs. CPU and 18.2x vs. GPU, alongside energy-per-timestep reductions of up to 1722x vs. CPU and 59.3x vs. GPU. These results, including superior network depth scalability, highlight our approach's potential for high-performance, real-time, power-efficient LSTM-AE-based anomaly detection on FPGAs.

cross Faithful or Just Plausible? Evaluating the Faithfulness of Closed-Source LLMs in Medical Reasoning

Authors: Halimat Afolabi, Zainab Afolabi, Elizabeth Friel, Jude Roberts, Antonio Ji-Xu, Lloyd Chen, Egheosa Ogbomo, Emiliomo Imevbore, Phil Eneje, Wissal El Ouahidi, Aaron Sohal, Alisa Kennan, Shreya Srivastava, Anirudh Vairavan, Laura Napitu, Katie McClure

Abstract: Closed-source large language models (LLMs), such as ChatGPT and Gemini, are increasingly consulted for medical advice, yet their explanations may appear plausible while failing to reflect the model's underlying reasoning process. This gap poses serious risks as patients and clinicians may trust coherent but misleading explanations. We conduct a systematic black-box evaluation of faithfulness in medical reasoning among three widely used closed-source LLMs. Our study consists of three perturbation-based probes: (1) causal ablation, testing whether stated chain-of-thought (CoT) reasoning causally influences predictions; (2) positional bias, examining whether models create post-hoc justifications for answers driven by input positioning; and (3) hint injection, testing susceptibility to external suggestions. We complement these quantitative probes with a small-scale human evaluation of model responses to patient-style medical queries to examine concordance between physician assessments of explanation faithfulness and layperson perceptions of trustworthiness. We find that CoT reasoning steps often do not causally drive predictions, and models readily incorporate external hints without acknowledgment. In contrast, positional biases showed minimal impact in this setting. These results underscore that faithfulness, not just accuracy, must be central in evaluating LLMs for medicine, to ensure both public protection and safe clinical deployment.

cross Location Aware Embedding for Geotargeting in Sponsored Search Advertising

Authors: Jelena Gligorijevic, Djordje Gligorijevic, Aravindan Raghuveer, Mihajlo Grbovic, Zoran Obradovic

Abstract: Web search has become an inevitable part of everyday life. Improving and monetizing web search has been a focus of major Internet players. Understanding the context of a web search query is an important aspect of this task, as it represents unobserved facts that add meaning to an otherwise incomplete query. The context of a query consists of the user's location, local time, search history, behavioral segments, installed apps on their phone, and so on. Queries that either explicitly use location context (e.g., "best hotels in New York City") or implicitly refer to the user's physical location (e.g., "coffee shops near me") are becoming increasingly common on mobile devices. Understanding and representing the user's interest location and/or physical location is essential for providing a relevant user experience. In this study, we developed a simple and powerful neural embedding based framework to represent a user's query and their location in a single low-dimensional space. We show that this representation is able to capture the subtle interactions between the user's query intent and query/physical location, while improving ad ranking and query-ad relevance scores over both location-unaware and other location-aware approaches.

cross A Systematic Evaluation Protocol of Graph-Derived Signals for Tabular Machine Learning

Authors: Mario Heidrich, Jeffrey Heidemann, R\"udiger Buchkremer, Gonzalo Wandosell Fern\'andez de Bobadilla

Abstract: While graph-derived signals are widely used in tabular learning, existing studies typically rely on limited experimental setups and average performance comparisons, leaving the statistical reliability and robustness of observed gains largely unexplored. Consequently, it remains unclear which signals provide consistent and robust improvements. This paper presents a taxonomy-driven empirical analysis of graph-derived signals for tabular machine learning. We propose a unified and reproducible evaluation protocol to systematically assess which categories of graph-derived signals yield statistically significant and robust performance improvements. The protocol provides an extensible setup for the controlled integration of diverse graph-derived signals into tabular learning pipelines. To ensure a fair and rigorous comparison, it incorporates automated hyperparameter optimization, multi-seed statistical evaluation, formal significance testing, and robustness analysis under graph perturbations. We demonstrate the protocol through an extensive case study on a large-scale, imbalanced cryptocurrency fraud detection dataset. The analysis identifies signal categories providing consistently reliable performance gains and offers interpretable insights into which graph-derived signals indicate fraud-discriminative structural patterns. Furthermore, robustness analyses reveal pronounced differences in how various signals handle missing or corrupted relational data. These findings demonstrate practical utility for fraud detection and illustrate how the proposed taxonomy-driven evaluation protocol can be applied in other application domains.

cross The Taxonomies, Training, and Applications of Event Stream Modelling for Electronic Health Records

Authors: Mingcheng Zhu, Yu Liu, Zhiyao Luo, Tingting Zhu

Abstract: The widespread adoption of electronic health records (EHRs) enables the acquisition of heterogeneous clinical data, spanning lab tests, vital signs, medications, and procedures, which offer transformative potential for artificial intelligence in healthcare. Although traditional modelling approaches have typically relied on multivariate time series, they often struggle to accommodate the inherent sparsity and irregularity of real-world clinical workflows. Consequently, research has shifted toward event stream representation, which treats patient records as continuous sequences, thereby preserving the precise temporal structure of the patient journey. However, the existing literature remains fragmented, characterised by inconsistent definitions, disparate modelling architectures, and varying training protocols. To address these gaps, this review establishes a unified definition of EHR event streams and introduces a novel taxonomy that categorises models based on their handling of event time, type, and value. We systematically review training strategies, ranging from supervised learning to self-supervised methods, and provide a comprehensive discussion of applications across clinical scenarios. Finally, we identify open critical challenges and future directions, with the aim of clarifying the current landscape and guiding the development of next-generation healthcare models.

cross What Counts as Real? Speech Restoration and Voice Quality Conversion Pose New Challenges to Deepfake Detection

Authors: Shree Harsha Bokkahalli Satish, Harm Lameris, Joakim Gustafson, \'Eva Sz\'ekely

Abstract: Audio anti-spoofing systems are typically formulated as binary classifiers distinguishing bona fide from spoofed speech. This assumption fails under layered generative processing, where benign transformations introduce distributional shifts that are misclassified as spoofing. We show that phonation-modifying voice conversion and speech restoration are treated as out-of-distribution despite preserving speaker authenticity. Using a multi-class setup separating bona fide, converted, spoofed, and converted-spoofed speech, we analyse model behaviour through self-supervised learning (SSL) embeddings and acoustic correlates. The benign transformations induce a drift in the SSL space, compressing bona fide and spoofed speech and reducing classifier separability. Reformulating anti-spoofing as a multi-class problem improves robustness to benign shifts while preserving spoof detection, suggesting binary systems model the distribution of raw speech rather than authenticity itself.

cross Schr\"odinger Bridge Over A Compact Connected Lie Group

Authors: Hamza Mahmood, Abhishek Halder, Adeel Akhtar

Abstract: This work studies the Schr\"odinger bridge problem for the kinematic equation on a compact connected Lie group. The objective is to steer a controlled diffusion between given initial and terminal densities supported over the Lie group while minimizing the control effort. We develop a coordinate-free formulation of this stochastic optimal control problem that respects the underlying geometric structure of the Lie group, thereby avoiding limitations associated with local parameterizations or embeddings in Euclidean spaces. We establish the existence and uniqueness of solution to the corresponding Schr\"odinger system. Our results are constructive in that they derive a geometric controller that optimally interpolates probability densities supported over the Lie group. To illustrate the results, we provide numerical examples on $\mathsf{SO}(2)$ and $\mathsf{SO}(3)$.

cross NepTam: A Nepali-Tamang Parallel Corpus and Baseline Machine Translation Experiments

Authors: Rupak Raj Ghimire, Bipesh Subedi, Balaram Prasain, Prakash Poudyal, Praveen Acharya, Nischal Karki, Rupak Tiwari, Rishikesh Kumar Sharma, Jenny Poudel, Bal Krishna Bal

Abstract: Modern translation systems rely heavily on high-quality, large parallel datasets for state-of-the-art performance. However, such resources are largely unavailable for most South Asian languages. Nepali and Tamang fall into this category, with Tamang being among the least digitally resourced languages in the region. This work addresses the gap by developing NepTam20K, a 20K gold-standard parallel corpus, and NepTam80K, an 80K synthetic Nepali-Tamang parallel corpus, both sentence-aligned and designed to support machine translation. The datasets were created through a pipeline involving data scraping from Nepali news and online sources, pre-processing, semantic filtering, balancing for tense and polarity (in the NepTam20K dataset), expert translation into Tamang by native speakers of the language, and verification by an expert Tamang linguist. The dataset covers five domains: Agriculture, Health, Education and Technology, Culture, and General Communication. To evaluate the dataset, baseline machine translation experiments were carried out using various multilingual pre-trained models: mBART, M2M-100, NLLB-200, and a vanilla Transformer model. Fine-tuning NLLB-200 achieved the highest sacreBLEU scores of 40.92 (Nepali-Tamang) and 45.26 (Tamang-Nepali).

cross TMPDiff: Temporal Mixed-Precision for Diffusion Models

Authors: Basile Lewandowski, Simon Kurz, Aditya Shankar, Robert Birke, Jian-Jia Chen, Lydia Y. Chen

Abstract: Diffusion models are the go-to method for Text-to-Image generation, but their iterative denoising process incurs high inference latency. Quantization reduces compute time by using lower bitwidths, but applies a fixed precision across all denoising timesteps, leaving an entire optimization axis unexplored. We propose TMPDiff, a temporal mixed-precision framework for diffusion models that assigns different numeric precision to different denoising timesteps. We hypothesize that quantization errors accumulate additively across timesteps, which we then validate experimentally. Based on our observations, we develop an adaptive bisectioning-based algorithm, which assigns per-step precisions with linear evaluation complexity, reducing an otherwise exponential search problem. Across four state-of-the-art diffusion models and three datasets, TMPDiff consistently outperforms uniform-precision baselines at matched speedup, achieving 10 to 20% improvement in perceptual quality. On FLUX.1-dev, TMPDiff achieves 90% SSIM relative to the full-precision model at a speedup of 2.5x over 16-bit inference.
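The abstract gives only the shape of the bisection idea, so the following is a plausible sketch under the stated additive-error hypothesis, not TMPDiff's actual algorithm; the error table and candidate bitwidths are hypothetical:

```python
def assign_precisions(step_errors, total_budget):
    """Assign a bitwidth to each denoising step under an additive budget.

    step_errors[t][b]: estimated quantization error at step t when run
    at bitwidth b (lower bitwidth -> larger error). Assuming errors add
    across timesteps, bisect on a per-step error cap: each step takes
    the lowest bitwidth whose error fits under the cap, and the cap is
    pushed as high as the total error budget allows.
    """
    def plan(cap):
        choice = []
        for errs in step_errors:
            ok = [b for b, e in sorted(errs.items()) if e <= cap]
            choice.append(ok[0] if ok else max(errs))  # fall back to full precision
        return choice

    def total(choice):
        return sum(step_errors[t][b] for t, b in enumerate(choice))

    lo, hi = 0.0, 2.0 * max(max(e.values()) for e in step_errors)
    for _ in range(50):  # bisection on the per-step cap
        mid = (lo + hi) / 2.0
        if total(plan(mid)) <= total_budget:
            lo = mid  # cap feasible: cheaper bitwidths still fit the budget
        else:
            hi = mid
    return plan(lo)

# hypothetical table: 5 error-sensitive early steps, 5 robust late steps
step_errors = ([{4: 0.6, 8: 0.1, 16: 0.01}] * 5
               + [{4: 0.05, 8: 0.01, 16: 0.001}] * 5)
precisions = assign_precisions(step_errors, total_budget=1.0)
# -> sensitive steps get 8-bit, robust steps get 4-bit
```

Each bisection iteration evaluates one plan over all steps, which gives the linear evaluation complexity per probe that the abstract contrasts with exponential search.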

cross A Benchmark for Multi-Party Negotiation Games from Real Negotiation Data

Authors: Leo Benac, Jonas Raedler, Zilin Ma, Finale Doshi-Velez

Abstract: Many real-world multi-party negotiations unfold as sequences of binding, action-level commitments rather than a single final outcome. We introduce a benchmark for this under-studied regime featuring a configurable game generator that sweeps key structural properties such as incentive alignment, goal complexity, and payoff distribution. To evaluate decision-making, we test three value-function approximations (myopic reward, an optimistic upper bound, and a pessimistic lower bound) that act as biased lenses on deal evaluation. Through exact evaluation on small games and comparative evaluation on large, document-grounded instances derived from the Harvard Negotiation Challenge, we map the strategic regimes where each approximation succeeds or fails. We observe that different game structures demand different valuation strategies, motivating agents that learn robust state values and plan effectively over long horizons under binding commitments and terminal-only rewards.

cross Conditioning on a Volatility Proxy Compresses the Apparent Timescale of Collective Market Correlation

Authors: Yuda Bi, Vince D Calhoun

Abstract: We address the attribution problem for apparent slow collective dynamics: is the observed persistence intrinsic, or inherited from a persistent driver? For the leading eigenvalue fraction $\psi_1=\lambda_{\max}/N$ of S\&P 500 60-day rolling correlation matrices ($237$ stocks, 2004--2023), a VIX-coupled Ornstein--Uhlenbeck model reduces the effective relaxation time from $298$ to $61$ trading days and improves the fit over bare mean reversion by $\Delta$BIC$=109$. On the decomposition sample, an informational residual of $\log(\mathrm{VIX})$ alone retains most of that gain ($\Delta$BIC$=78.6$), whereas a mechanical VIX proxy alone does not improve the fit. Autocorrelation-matched placebo fields fail ($\Delta$BIC$_{\max}=2.7$), disjoint weekly reconstructions still favor the field-coupled model ($\Delta$BIC$=140$--$151$), and six anchored chronological holdouts preserve the out-of-sample advantage. Quiet-regime and field-stripped residual autocorrelation controls show the same collapse of persistence. Stronger hidden-variable extensions remain only partially supported. Within the tested stochastic class, conditioning on the observed VIX proxy absorbs most of the apparent slow dynamics.
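The model class under study, an Ornstein-Uhlenbeck process whose mean level is coupled to an exogenous field, can be simulated in a few lines (the parameter values and the sinusoidal stand-in for log-VIX are illustrative, not the paper's fitted values):

```python
import numpy as np

def simulate_coupled_ou(field, theta=0.05, mu=0.3, beta=0.1,
                        sigma=0.01, dt=1.0, seed=0):
    """Euler-Maruyama simulation of a field-coupled OU process:

        d psi = theta * (mu + beta * field_t - psi) dt + sigma dW.

    A fast-reverting psi (relaxation time ~1/theta) driven by a
    persistent field inherits apparently slow dynamics, which is the
    attribution question the paper poses for psi_1 and the VIX.
    """
    rng = np.random.default_rng(seed)
    psi = np.empty(len(field))
    psi[0] = mu
    for t in range(1, len(field)):
        drift = theta * (mu + beta * field[t - 1] - psi[t - 1])
        psi[t] = psi[t - 1] + drift * dt + sigma * np.sqrt(dt) * rng.normal()
    return psi

# a slow sinusoidal 'field' stands in for log-VIX; psi tracks it closely
field = np.sin(np.linspace(0.0, 4.0 * np.pi, 2000))
psi = simulate_coupled_ou(field)
```

With the field term removed (beta = 0), psi reverts on the fast 1/theta timescale; with it included, psi's autocorrelation mirrors the field's, illustrating how conditioning on the driver can compress the apparent relaxation time.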

cross MotionCFG: Boosting Motion Dynamics via Stochastic Concept Perturbation

Authors: Byungjun Kim, Soobin Um, Jong Chul Ye

Abstract: Despite recent advances in Text-to-Video (T2V) synthesis, generating high-fidelity and dynamic motion remains a significant challenge. Existing methods primarily rely on Classifier-Free Guidance (CFG), often with explicit negative prompts (e.g. "static", "blurry"), to suppress undesired artifacts. However, such explicit negations frequently introduce unintended semantic bias and distort object integrity; a phenomenon we define as Content-Motion Drift. To address this, we propose MotionCFG, a framework that enhances motion dynamics by contrasting a target concept with its noise-perturbed counterparts. Specifically, by injecting Gaussian noise into the concept embeddings, MotionCFG creates localized negative anchors that encapsulate a broad complementary space of sub-optimal motion variations. Unlike explicit negations, this approach facilitates implicit hard negative mining without shifting the global semantic identity, allowing for a focused refinement of temporal details. Combined with a piecewise guidance schedule that confines intervention to the early denoising steps, MotionCFG consistently improves motion dynamics across state-of-the-art T2V frameworks with negligible computational overhead and minimal compromise in visual quality. Additionally, we demonstrate that this noise-induced contrastive mechanism is effective not only for sharpening motion trajectories but also for steering complex, non-linear concepts such as precise object numerosity, which are typically difficult to modulate via standard text-based guidance.

cross Self-Supervised Uncertainty Estimation For Super-Resolution of Satellite Images

Authors: Zhe Zheng, Val\'ery Dewil, Pablo Arias

Abstract: Super-resolution (SR) of satellite imagery is challenging due to the lack of paired low-/high-resolution data. Recent self-supervised SR methods overcome this limitation by exploiting the temporal redundancy in burst observations, but they lack a mechanism to quantify uncertainty in the reconstruction. In this work, we introduce a novel self-supervised loss that makes it possible to estimate uncertainty in image super-resolution without ever accessing the ground-truth high-resolution data. We adopt a decision-theoretic perspective and show that minimizing the corresponding Bayesian risk yields the posterior mean and variance as optimal estimators. We validate our approach on a synthetic SkySat L1B dataset and demonstrate that it produces calibrated uncertainty estimates comparable to supervised methods. Our work bridges self-supervised restoration with uncertainty quantification, yielding a practical framework for uncertainty-aware image reconstruction.
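The decision-theoretic claim, that minimizing the appropriate Bayesian risk yields the posterior mean and variance as optimal estimators, can be checked numerically for the Gaussian negative log-likelihood, whose minimizers are exactly the sample mean and variance. This is a generic illustration of that fact, not the paper's self-supervised loss.

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(loc=2.0, scale=0.5, size=100_000)  # stand-in for per-pixel posterior samples

# expected squared error as a function of the mean estimate m (via moments)
mse = lambda m: y.var() + (y.mean() - m) ** 2

# Gaussian NLL over a grid of candidate (mean, variance) estimates
ms = np.linspace(1.0, 3.0, 81)
vs = np.linspace(0.05, 1.0, 96)
M, V = np.meshgrid(ms, vs, indexing="ij")
nll = 0.5 * np.log(V) + 0.5 * mse(M) / V

i, j = np.unravel_index(np.argmin(nll), nll.shape)
m_hat, v_hat = ms[i], vs[j]   # minimizer lands at (sample mean, sample variance)
```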

cross CMHL: Contrastive Multi-Head Learning for Emotionally Consistent Text Classification

Authors: Menna Elgabry, Ali Hamdi, Khaled Shaban

Abstract: Textual Emotion Classification (TEC) is one of the most difficult NLP tasks. State-of-the-art approaches rely on large language models (LLMs) and multi-model ensembles. In this study, we challenge the assumption that larger scale or more complex models are necessary for improved performance. To improve logical consistency, we introduce CMHL, a novel single-model architecture that explicitly models the logical structure of emotions through three key innovations: (1) multi-task learning that jointly predicts primary emotions, valence, and intensity, (2) psychologically-grounded auxiliary supervision derived from Russell's circumplex model, and (3) a novel contrastive contradiction loss that enforces emotional consistency by penalizing mutually incompatible predictions (e.g., simultaneous high confidence in joy and anger). With just 125M parameters, our model outperforms 56x larger LLMs and sLM ensembles, setting a new state-of-the-art F1 score of 93.75\% (vs. 86.13\%-93.2\%) on the dair-ai Emotion dataset. We further show cross-domain generalization on the Reddit Suicide Watch and Mental Health Collection dataset (SWMH), outperforming domain-specific models like MentalBERT and MentalRoBERTa with an F1 score of 72.50\% (vs. 68.16\%-72.16\%) and a recall of 73.30\% (vs. 67.05\%-70.89\%), which translates to enhanced sensitivity for detecting mental health distress. Our work establishes that architectural intelligence, not parameter count, drives progress in TEC. By embedding psychological priors and explicit consistency constraints, a well-designed single model can outperform both massive LLMs and complex ensembles, offering an efficient, interpretable, and clinically-relevant paradigm for affective computing.
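A contradiction loss of the kind described can be sketched as a penalty on the joint confidence assigned to incompatible emotion pairs. Both the pair list and the product form below are our illustrative choices, not the paper's exact formulation.

```python
import numpy as np

LABELS = ["joy", "anger", "sadness", "love", "fear", "surprise"]
INCOMPATIBLE = [("joy", "anger"), ("joy", "sadness"), ("love", "fear")]  # illustrative pairs

def contradiction_loss(probs):
    """Penalize simultaneous confidence in mutually incompatible emotions
    (product form; a sketch, the paper's loss may differ)."""
    idx = {label: i for i, label in enumerate(LABELS)}
    return sum(probs[idx[a]] * probs[idx[b]] for a, b in INCOMPATIBLE)

consistent = np.array([0.80, 0.05, 0.05, 0.05, 0.03, 0.02])     # one dominant emotion
contradictory = np.array([0.45, 0.45, 0.02, 0.03, 0.03, 0.02])  # high joy AND high anger
```

Added to the classification loss, such a term leaves confident single-emotion predictions nearly untouched while penalizing jointly confident incompatible ones.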

cross Evaluating Four FPGA-accelerated Space Use Cases based on Neural Network Algorithms for On-board Inference

Authors: Pedro Antunes, Muhammad Ihsan Al Hafiz, Jonah Ekelund, Ekaterina Dineva, George Miloshevich, Panagiotis Gonidakis, Artur Podobas

Abstract: Space missions increasingly deploy high-fidelity sensors that produce data volumes exceeding onboard buffering and downlink capacity. This work evaluates FPGA acceleration of neural networks (NNs) across four space use cases on the AMD ZCU104 board. We use Vitis AI (AMD DPU) and Vitis HLS to implement inference, quantify throughput and energy, and expose toolchain and architectural constraints relevant to deployment. Vitis AI achieves up to 34.16$\times$ higher inference rate than the embedded ARM CPU baseline, while custom HLS designs reach up to 5.4$\times$ speedup and add support for operators (e.g., sigmoids, 3D layers) absent in the DPU. For these implementations, measured MPSoC inference power spans 1.5-6.75 W, reducing energy per inference versus CPU execution in all use cases. These results show that NN FPGA acceleration can enable onboard filtering, compression, and event detection, easing downlink pressure in future missions.

cross Maximin Robust Bayesian Experimental Design

Authors: Hany Abdulsamad, Sahel Iqbal, Christian A. Naesseth, Takuo Matsubara, Adrien Corenflos

Abstract: We address the brittleness of Bayesian experimental design under model misspecification by formulating the problem as a max--min game between the experimenter and an adversarial nature subject to information-theoretic constraints. We demonstrate that this approach yields a robust objective governed by Sibson's $\alpha$-mutual information~(MI), which identifies the $\alpha$-tilted posterior as the robust belief update and establishes the R\'enyi divergence as the appropriate measure of conditional information gain. To mitigate the bias and variance of nested Monte Carlo estimators needed to estimate Sibson's $\alpha$-MI, we adopt a PAC-Bayes framework to search over stochastic design policies, yielding rigorous high-probability lower bounds on the robust expected information gain that explicitly control finite-sample error.

cross Diffusion Reinforcement Learning via Centered Reward Distillation

Authors: Yuanzhi Zhu, Xi Wang, St\'ephane Lathuili\`ere, Vicky Kalogeiton

Abstract: Diffusion and flow models achieve State-Of-The-Art (SOTA) generative performance, yet many practically important behaviors such as fine-grained prompt fidelity, compositional correctness, and text rendering are weakly specified by score or flow matching pretraining objectives. Reinforcement Learning (RL) fine-tuning with external, black-box rewards is a natural remedy, but diffusion RL is often brittle. Trajectory-based methods incur high memory cost and high-variance gradient estimates; forward-process approaches converge faster but can suffer from distribution drift, and hence reward hacking. In this work, we present \textbf{Centered Reward Distillation (CRD)}, a diffusion RL framework derived from KL-regularized reward maximization built on forward-process-based fine-tuning. The key insight is that the intractable normalizing constant cancels under \emph{within-prompt centering}, yielding a well-posed reward-matching objective. To enable reliable text-to-image fine-tuning, we introduce techniques that explicitly control distribution drift: (\textit{i}) decoupling the sampler from the moving reference to prevent ratio-signal collapse, (\textit{ii}) KL anchoring to a CFG-guided pretrained model to control long-run drift and align with the inference-time semantics of the pre-trained model, and (\textit{iii}) reward-adaptive KL strength to accelerate early learning under large KL regularization while reducing late-stage exploitation of reward-model loopholes. Experiments on text-to-image post-training with \texttt{GenEval} and \texttt{OCR} rewards show that CRD achieves competitive SOTA reward optimization results with fast convergence and reduced reward hacking, as validated on unseen preference metrics.
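The abstract's key insight, that the intractable normalizing constant cancels under within-prompt centering, is easy to verify numerically: adding any per-prompt constant to the log-ratios leaves the centered quantity unchanged, so a centered reward-matching objective never needs that constant. The names and shapes below are our own sketch, not CRD's implementation.

```python
import numpy as np

def centered(values, prompt_ids):
    """Subtract the per-prompt mean (within-prompt centering)."""
    out = np.empty_like(values, dtype=float)
    for p in np.unique(prompt_ids):
        m = prompt_ids == p
        out[m] = values[m] - values[m].mean()
    return out

rng = np.random.default_rng(0)
prompts = np.repeat([0, 1, 2], 4)             # 3 prompts, 4 samples each
rewards = rng.normal(size=12)
log_ratio = rng.normal(size=12)               # stand-in for log p_theta / p_ref
log_Z = np.array([5.0, -3.0, 7.0])[prompts]   # intractable per-prompt constant

# the constant cancels: centering (log_ratio + log_Z) equals centering log_ratio
lhs = centered(log_ratio + log_Z, prompts)
rhs = centered(log_ratio, prompts)

# so a centered reward-matching objective is well-posed without ever computing log_Z
loss = np.mean((rhs - centered(rewards, prompts)) ** 2)
```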

cross DualSwinFusionSeg: Multimodal Martian Landslide Segmentation via Dual Swin Transformer with Multi-Scale Fusion and UNet++

Authors: Shahriar Kabir, Abdullah Muhammed Amimul Ehsan, Istiak Ahmmed Rifti, Md Kaykobad Reza

Abstract: Automated segmentation of Martian landslides, particularly in tectonically active regions such as Valles Marineris, is important for planetary geology, hazard assessment, and future robotic exploration. However, detecting landslides from planetary imagery is challenging due to the heterogeneous nature of available sensing modalities and the limited number of labeled samples. Each observation combines RGB imagery with geophysical measurements such as digital elevation models, slope maps, thermal inertia, and contextual grayscale imagery, which differ significantly in resolution and statistical properties. To address these challenges, we propose DualSwinFusionSeg, a multimodal segmentation architecture that separates modality-specific feature extraction and performs multi-scale cross-modal fusion. The model employs two parallel Swin Transformer V2 encoders to independently process RGB and auxiliary geophysical inputs, producing hierarchical feature representations. Corresponding features from the two streams are fused at multiple scales and decoded using a UNet++ decoder with dense nested skip connections to preserve fine boundary details. Extensive ablation studies evaluate modality contributions, loss functions, decoder architectures, and fusion strategies. Experiments on the MMLSv2 dataset from the PBVS 2026 Mars-LS Challenge show that modality-specific encoders and simple concatenation-based fusion improve segmentation accuracy under limited training data. The final model achieves 0.867 mIoU and 0.905 F1 on the development benchmark and 0.783 mIoU on the held-out test set, demonstrating strong performance for multimodal planetary surface segmentation.

cross Solving physics-constrained inverse problems with conditional flow matching

Authors: Agnimitra Dasgupta, Ali Fardisi, Mehrnegar Aminy, Brianna Binder, Bryan Shaddy, Assad Oberai

Abstract: This study presents a conditional flow matching framework for solving physics-constrained Bayesian inverse problems. In this setting, samples from the joint distribution of inferred variables and measurements are assumed available, while explicit evaluation of the prior and likelihood densities is not required. We derive a simple and self-contained formulation of both the unconditional and conditional flow matching algorithms, tailored specifically to inverse problems. In the conditional setting, a neural network is trained to learn the velocity field of a probability flow ordinary differential equation that transports samples from a chosen source distribution directly to the posterior distribution conditioned on observed measurements. This black-box formulation accommodates nonlinear, high-dimensional, and potentially non-differentiable forward models without restrictive assumptions on the noise model. We further analyze the behavior of the learned velocity field in the regime of finite training data. Under mild architectural assumptions, we show that overtraining can induce degenerate behavior in the generated conditional distributions, including variance collapse and a phenomenon termed selective memorization, wherein generated samples concentrate around training data points associated with similar observations. A simplified theoretical analysis explains this behavior, and numerical experiments confirm it in practice. We demonstrate that standard early-stopping criteria based on monitoring test loss effectively mitigate such degeneracy. The proposed method is evaluated on several physics-based inverse problems. We investigate the impact of different choices of source distributions, including Gaussian and data-informed priors. Across these examples, conditional flow matching accurately captures complex, multimodal posterior distributions while maintaining computational efficiency.

cross An Alternative Trajectory for Generative AI

Authors: Margarita Belova, Yuval Kansal, Yihao Liang, Jiaxin Xiao, Niraj K. Jha

Abstract: The generative artificial intelligence (AI) ecosystem is undergoing rapid transformations that threaten its sustainability. As models transition from research prototypes to high-traffic products, the energetic burden has shifted from one-time training to recurring, unbounded inference. This is exacerbated by reasoning models that inflate compute costs by orders of magnitude per query. The prevailing pursuit of artificial general intelligence through scaling of monolithic models is colliding with hard physical constraints: grid failures, water consumption, and diminishing returns on data scaling. This trajectory yields models with impressive factual recall that nonetheless struggle in domains requiring in-depth reasoning, possibly due to insufficient abstractions in training data. Current large language models (LLMs) exhibit genuine reasoning depth only in domains like mathematics and coding, where rigorous, pre-existing abstractions provide structural grounding. In other fields, the current approach fails to generalize well. We propose an alternative trajectory based on domain-specific superintelligence (DSS). We argue for first constructing explicit symbolic abstractions (knowledge graphs, ontologies, and formal logic) to underpin synthetic curricula enabling small language models to master domain-specific reasoning without the model collapse problem typical of LLM-based synthetic data methods. Rather than a single generalist giant model, we envision "societies of DSS models": dynamic ecosystems where orchestration agents route tasks to distinct DSS back-ends. This paradigm shift decouples capability from size, enabling intelligence to migrate from energy-intensive data centers to secure, on-device experts. By aligning algorithmic progress with physical constraints, DSS societies move generative AI from an environmental liability to a sustainable force for economic empowerment.

cross Clinician input steers frontier AI models toward both accurate and harmful decisions

Authors: Ivan Lopez, Selin S. Everett, Bryan J. Bunning, April S. Liang, Dong Han Yao, Shivam C. Vedak, Kameron C. Black, Sophie Ostmeier, Stephen P. Ma, Emily Alsentzer, Jonathan H. Chen, Akshay S. Chaudhari, Eric Horvitz

Abstract: Large language models (LLMs) are entering clinician workflows, yet evaluations rarely measure how clinician reasoning shapes model behavior during clinical interactions. We combined 61 New England Journal of Medicine Case Records with 92 real-world clinician-AI interactions to evaluate 21 reasoning LLM variants across 8 frontier models on differential diagnosis generation and next step recommendations under three conditions: reasoning alone, after expert clinician context, and after adversarial clinician context. LLM-clinician concordance increased substantially after clinician exposure, with simulations sharing >=3 differential diagnosis items rising from 65.8% to 93.5% and >=3 next step recommendations from 20.3% to 53.8%. Expert context significantly improved correct final diagnosis inclusion across all 21 models (mean +20.4 percentage points), reflecting both reasoning improvement and passive content echoing, while adversarial context caused significant diagnostic degradation in 14 models (mean -5.4 percentage points). Multi-turn disagreement probes revealed distinct model phenotypes ranging from highly conformist to dogmatic, with adversarial arguments remaining a persistent vulnerability even for otherwise resilient models. Inference-time scaling reduced harmful echoing of clinician-introduced recommendations across WHO-defined harm severity tiers (relative reductions: 62.7% mild, 57.9% moderate, 76.3% severe, 83.5% death-tier). In GPT-4o experiments, explicit clinician uncertainty signals improved diagnostic performance after adversarial context (final diagnosis inclusion 27% to 42%) and reduced alignment with incorrect arguments by 21%. These findings establish a foundation for evaluating clinician-AI collaboration, introducing interactive metrics and mitigation strategies essential for safety and robustness.

cross A Real-Time Neuro-Symbolic Ethical Governor for Safe Decision Control in Autonomous Robotic Manipulation

Authors: Aueaphum Aueawatthanaphisut, Kuepon Aueawatthanaphisut

Abstract: Ethical decision governance has become a critical requirement for autonomous robotic systems operating in human-centered and safety-sensitive environments. This paper presents a real-time neuro-symbolic ethical governor designed to enable risk-aware supervisory control in autonomous robotic manipulation tasks. The proposed framework integrates transformer-based ethical reasoning with a probabilistic ethical risk field formulation and a threshold-based override control mechanism. A language-grounded ethical intent inference capability is learned from natural language task descriptions using a fine-tuned DistilBERT model trained on the ETHICS commonsense dataset. A continuous ethical risk metric is subsequently derived from predicted unsafe action probability, confidence uncertainty, and probabilistic variance to support adaptive decision filtering. The effectiveness of the proposed approach is validated through simulated autonomous robot-arm task scenarios involving varying levels of human proximity and operational hazard. Experimental results demonstrate stable model convergence, reliable ethical risk discrimination, and improved safety-aware decision outcomes without significant degradation of task execution efficiency. The proposed neuro-symbolic architecture further provides enhanced interpretability compared with purely data-driven safety filters, enabling transparent ethical reasoning in real-time control loops. The findings suggest that ethical decision governance can be effectively modeled as a dynamic supervisory risk layer for autonomous robotic systems, with potential applicability to broader cyber-physical and assistive robotics domains.

cross QiMeng-CodeV-SVA: Training Specialized LLMs for Hardware Assertion Generation via RTL-Grounded Bidirectional Data Synthesis

Authors: Yutong Wu, Chenrui Cao, Pengwei Jin, Di Huang, Rui Zhang, Xishan Zhang, Zidong Du, Qi Guo, Xing Hu

Abstract: SystemVerilog Assertions (SVAs) are crucial for hardware verification. Recent studies leverage general-purpose LLMs to translate natural language properties to SVAs (NL2SVA), but they perform poorly due to limited data. We propose a data synthesis framework to tackle two challenges: the scarcity of high-quality real-world SVA corpora and the lack of reliable methods to determine NL-SVA semantic equivalence. For the former, large-scale open-source RTLs are used to guide LLMs to generate real-world SVAs; for the latter, bidirectional translation serves as a data selection method. With the synthesized data, we train CodeV-SVA, a series of SVA generation models. Notably, CodeV-SVA-14B achieves 75.8% on NL2SVA-Human and 84.0% on NL2SVA-Machine in Func.@1, matching or exceeding advanced LLMs like GPT-5 and DeepSeek-R1.

cross ZOTTA: Test-Time Adaptation with Gradient-Free Zeroth-Order Optimization

Authors: Ronghao Zhang, Shuaicheng Niu, Qi Deng, Yanjie Dong, Jian Chen, Runhao Zeng

Abstract: Test-time adaptation (TTA) aims to improve model robustness under distribution shifts by adapting to unlabeled test data, but most existing methods rely on backpropagation (BP), which is computationally costly and incompatible with non-differentiable models such as quantized models, limiting practical deployment on numerous edge devices. Recent BP-free approaches alleviate overhead but remain either architecture-specific or limited in optimization capacity to handle high-dimensional models. We propose ZOTTA, a fully BP-free TTA framework that performs efficient adaptation using only forward passes via Zeroth-Order Optimization (ZOO). While ZOO is theoretically appealing, naive application leads to slow convergence under high-dimensional parameter spaces and unstable optimization due to the lack of labels. ZOTTA overcomes these challenges through 1) Distribution-Robust Layer Selection, which automatically identifies and freezes layers that already extract distribution-invariant features, updating only domain-sensitive layers to reduce the optimization dimensionality and accelerate convergence; 2) Spatial Feature Aggregation Alignment, which stabilizes ZOO by aligning globally aggregated spatial features between source and target to reduce gradient variance. Together, these components enable architecture-agnostic and stable BP-free adaptation. Extensive experiments on ImageNet-C/R/Sketch/A show that ZOTTA outperforms or matches BP-based methods, e.g., it reduces memory usage by 84% and improves accuracy by 3.9% over SAR on ImageNet-C.
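The forward-only optimization at the heart of this line of work is the classic two-point (SPSA-style) zeroth-order gradient estimator, combined here with the idea of updating only a low-dimensional, domain-sensitive subset of parameters. The sketch below illustrates that generic recipe on a toy objective; it is not ZOTTA's layer-selection or feature-alignment machinery.

```python
import numpy as np

def spsa_grad(loss_fn, theta, eps=1e-3, rng=None):
    """Two-point zeroth-order gradient estimate using forward passes only
    (SPSA-style; a generic sketch of BP-free optimization, not ZOTTA itself)."""
    rng = rng or np.random.default_rng(0)
    u = rng.choice([-1.0, 1.0], size=theta.shape)   # random Rademacher direction
    g = (loss_fn(theta + eps * u) - loss_fn(theta - eps * u)) / (2 * eps)
    return g * u

# update only a low-dimensional "domain-sensitive" slice; freeze the rest,
# mirroring the idea of reducing the optimization dimensionality
theta = np.array([3.0, -2.0, 10.0, 10.0])
loss_fn = lambda s: float(np.sum(s ** 2))  # stand-in for an unsupervised objective

rng = np.random.default_rng(42)
sub = theta[:2].copy()
for _ in range(300):
    sub -= 0.05 * spsa_grad(loss_fn, sub, rng=rng)
theta[:2] = sub  # the two sensitive parameters converge; the frozen ones are untouched
```

Each step costs two forward passes regardless of model size, which is why restricting the perturbed dimensions matters so much for convergence speed.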

cross Seeking Physics in Diffusion Noise

Authors: Chujun Tang, Lei Zhong, Fangqiang Ding

Abstract: Do video diffusion models encode signals predictive of physical plausibility? We probe intermediate denoising representations of a pretrained Diffusion Transformer (DiT) and find that physically plausible and implausible videos are partially separable in mid-layer feature space across noise levels. This separability cannot be fully attributed to visual quality or generator identity, suggesting recoverable physics-related cues in frozen DiT features. Leveraging this observation, we introduce progressive trajectory selection, an inference-time strategy that scores parallel denoising trajectories at a few intermediate checkpoints using a lightweight physics verifier trained on frozen features, and prunes low-scoring candidates early. Extensive experiments on PhyGenBench demonstrate that our method improves physical consistency while reducing inference cost, achieving comparable results to Best-of-K sampling with substantially fewer denoising steps.

cross Autonomous Agents Coordinating Distributed Discovery Through Emergent Artifact Exchange

Authors: Fiona Y. Wang, Lee Marom, Subhadeep Pal, Rachel K. Luu, Wei Lu, Jaime A. Berkovich, Markus J. Buehler

Abstract: We present ScienceClaw + Infinite, a framework for autonomous scientific investigation in which independent agents conduct research without central coordination, and any contributor can deploy new agents into a shared ecosystem. The system is built around three components: an extensible registry of over 300 interoperable scientific skills, an artifact layer that preserves full computational lineage as a directed acyclic graph (DAG), and a structured platform for agent-based scientific discourse with provenance-aware governance. Agents select and chain tools based on their scientific profiles, produce immutable artifacts with typed metadata and parent lineage, and broadcast unsatisfied information needs to a shared global index. The ArtifactReactor enables plannerless coordination: peer agents discover and fulfill open needs through pressure-based scoring, while schema-overlap matching triggers multi-parent synthesis across independent analyses. An autonomous mutation layer actively prunes the expanding artifact DAG to resolve conflicting or redundant workflows, while persistent memory allows agents to continuously build upon complex epistemic states across multiple cycles. Infinite converts these outputs into auditable scientific records through structured posts, provenance views, and machine-readable discourse relations, with community feedback steering subsequent investigation cycles. Across four autonomous investigations, peptide design for the somatostatin receptor SSTR2, lightweight impact-resistant ceramic screening, cross-domain resonance bridging biology, materials, and music, and formal analogy construction between urban morphology and grain-boundary evolution, the framework demonstrates heterogeneous tool chaining, emergent convergence among independently operating agents, and traceable reasoning from raw computation to published finding.

cross Learning-to-Defer with Expert-Conditioned Advice

Authors: Yannis Montreuil, Leina Montreuil, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi

Abstract: Learning-to-Defer routes each input to the expert that minimizes expected cost, but it assumes that the information available to every expert is fixed at decision time. Many modern systems violate this assumption: after selecting an expert, one may also choose what additional information that expert should receive, such as retrieved documents, tool outputs, or escalation context. We study this problem and call it Learning-to-Defer with advice. We show that a broad family of natural separated surrogates, which learn routing and advice with distinct heads, are inconsistent even in the smallest non-trivial setting. We then introduce an augmented surrogate that operates on the composite expert--advice action space and prove an $\mathcal{H}$-consistency guarantee together with an excess-risk transfer bound, yielding recovery of the Bayes-optimal policy in the limit. Experiments on tabular, LLMs, and multi-modal tasks show that the resulting method improves over standard Learning-to-Defer while adapting its advice-acquisition behavior to the cost regime.

cross Data-Driven Physics Embedded Dynamics with Predictive Control and Reinforcement Learning for Quadrupeds

Authors: Prakrut Kotecha, Aditya Shirwatkar, Shishir Kolathaya

Abstract: State-of-the-art quadrupedal locomotion approaches integrate Model Predictive Control (MPC) with Reinforcement Learning (RL), enabling complex motion capabilities with planning and terrain-adaptive behaviors. However, they often face compounding errors over long horizons and have limited interpretability due to the absence of physical inductive biases. We address these issues by integrating Lagrangian Neural Networks (LNNs) into an RL-MPC framework, enabling physically consistent dynamics learning. At deployment, our inverse-dynamics, infinite-horizon MPC scheme avoids costly matrix inversions, improving computational efficiency by up to 4x with minimal loss of task performance. We validate our framework through multiple ablations of the proposed LNN and its variants. We show improved sample efficiency, reduced long-horizon error, and faster real-time planning compared to unstructured neural dynamics. Lastly, we also test our framework on the Unitree Go1 robot to show real-world viability.

cross Generation of Human Comprehensible Access Control Policies from Audit Logs

Authors: Gautam Kumar (Indian Institute of Technology Kharagpur, India), Ravi Sundaram (Northeastern University, Boston, USA), Shamik Sural (Indian Institute of Technology Kharagpur, India)

Abstract: Over the years, access control systems have become increasingly complex, often causing a disconnect between what is envisaged by the stakeholders in decision-making positions and the actual permissions granted as evidenced from access logs. For instance, Attribute-based Access Control (ABAC), which is a flexible yet complex model typically configured by system security officers, can be made understandable to others only when presented at a high level in natural language. Although several algorithms have been proposed in the literature for automatic extraction of ABAC rules from access logs, there is no attempt yet to bridge the semantic gap between the machine-enforceable formal logic and human-centric policy intent. Our work addresses this problem by developing a framework that generates human-understandable natural language access control policies from logs. We investigate to what extent the power of Large Language Models (LLMs) can be harnessed to achieve both accuracy and scalability in the process. We have instantiated the framework, named LANTERN (LLM-based ABAC Natural Translation and Explanation for Rule Navigation), as a publicly accessible web-based application for reproducibility of our results.

cross Representation Alignment for Just Image Transformers is not Easier than You Think

Authors: Jaeyo Shin, Jiwook Kim, Hyunjung Shim

Abstract: Representation Alignment (REPA) has emerged as a simple way to accelerate Diffusion Transformer training in latent space. At the same time, pixel-space diffusion transformers such as Just image Transformers (JiT) have attracted growing attention because they remove the dependency on a pretrained tokenizer and thereby avoid the reconstruction bottleneck of latent diffusion. This paper shows that REPA can fail for JiT. REPA yields worse FID for JiT as training proceeds and collapses diversity on image subsets that are tightly clustered in the representation space of a pretrained semantic encoder on ImageNet. We trace the failure to an information asymmetry: denoising occurs in the high-dimensional image space, while the semantic target is strongly compressed, making direct regression a shortcut objective. We propose PixelREPA, which transforms the alignment target and constrains alignment with a Masked Transformer Adapter that combines a shallow transformer adapter with partial token masking. PixelREPA improves both training convergence and final quality. PixelREPA reduces FID from 3.66 to 3.17 for JiT-B$/16$ and improves Inception Score (IS) from 275.1 to 284.6 on ImageNet $256 \times 256$, while achieving $> 2\times$ faster convergence. Finally, PixelREPA-H$/16$ achieves FID$=1.81$ and IS$=317.2$. Our code is available at https://github.com/kaist-cvml/PixelREPA.

URLs: https://github.com/kaist-cvml/PixelREPA
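The REPA objective under discussion is, at its core, a cosine alignment between projected denoiser features and frozen encoder features. A minimal sketch of that baseline objective follows (shapes and names are our own); the paper's failure analysis and its Masked Transformer Adapter sit on top of this and are not reproduced here.

```python
import numpy as np

def repa_align_loss(h, z, W):
    """Cosine-alignment loss between projected denoiser tokens h @ W and frozen
    encoder tokens z. The generic REPA-style objective, sketched with our shapes."""
    p = h @ W                                           # project to the encoder width
    p = p / np.linalg.norm(p, axis=-1, keepdims=True)
    zn = z / np.linalg.norm(z, axis=-1, keepdims=True)
    return 1.0 - float(np.mean(np.sum(p * zn, axis=-1)))

rng = np.random.default_rng(0)
h = rng.normal(size=(16, 64))     # denoiser tokens
W = rng.normal(size=(64, 32))     # projection head
z_self = h @ W                    # targets perfectly aligned with the projection
z_rand = rng.normal(size=(16, 32))

loss_self = repa_align_loss(h, z_self, W)   # near 0: fully aligned
loss_rand = repa_align_loss(h, z_rand, W)   # near 1: uncorrelated targets
```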

cross Label Noise Cleaning for Supervised Classification via Bernoulli Random Sampling

Authors: Yuxin Liu, Xiong Jin, Yang Han

Abstract: Label noise - incorrect labels assigned to observations - can substantially degrade the performance of supervised classifiers. This paper proposes a label noise cleaning method based on Bernoulli random sampling. We show that the mean label noise levels of subsets generated by Bernoulli random sampling containing a given observation are identically distributed for all clean observations, and identically distributed, with a different distribution, for all noisy observations. Although the mean label noise levels are not independent across observations, by introducing an independent coupling we further prove that they converge to a mixture of two well-separated distributions corresponding to clean and noisy observations. By establishing a linear model between cross-validated classification errors and label noise levels, we are able to approximate this mixture distribution and thereby separate clean and noisy observations without any prior label information. The proposed method is classifier-agnostic, theoretically justified, and demonstrates strong performance on both simulated and real datasets.
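The distributional claim, that the mean noise level of Bernoulli subsets containing a given observation behaves differently for clean and noisy observations, can be demonstrated directly. For illustration we use the ground-truth noise indicator to compute subset noise levels; the actual method instead estimates them from cross-validated classification errors, without any label information.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_subsets, p = 200, 3000, 0.1
noisy = np.zeros(n, dtype=bool)
noisy[:40] = True                        # 20% of labels are (known, here only) noisy

mean_level = np.zeros(n)
counts = np.zeros(n)
for _ in range(n_subsets):
    mask = rng.random(n) < p             # Bernoulli(p) random subset
    if not mask.any():
        continue
    level = noisy[mask].mean()           # this subset's label-noise level
    mean_level[mask] += level
    counts[mask] += 1
mean_level /= counts                     # per-observation mean subset noise level

# subsets containing a noisy observation have a systematically higher mean level,
# so the two groups separate without knowing which observation is which
gap = mean_level[noisy].mean() - mean_level[~noisy].mean()
```

The separation grows as the subset size shrinks (each inclusion shifts the subset's level more), which is why the sampling probability is a meaningful design knob.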

cross Questionnaire Responses Do not Capture the Safety of AI Agents

Authors: Max Hellrigel-Holderbaum, Edward James Young

Abstract: As AI systems advance in capabilities, measuring their safety and alignment to human values is becoming paramount. A fast-growing field of AI research is devoted to developing such assessments. However, most current advances therein may be ill-suited for assessing AI systems across real-world deployments. Standard methods prompt large language models (LLMs) in a questionnaire style to describe their values or behavior in hypothetical scenarios. By focusing on unaugmented LLMs, they fall short of evaluating AI agents, which could actually perform relevant behaviors, hence posing much greater risks. LLMs' engagement with scenarios described by questionnaire-style prompts differs starkly from that of agents based on the same LLMs, as reflected in divergences in the inputs, possible actions, environmental interactions, and internal processing. As such, LLMs' responses to scenario descriptions are unlikely to be representative of the corresponding LLM agents' behavior. We further contend that such assessments make strong assumptions concerning the ability and tendency of LLMs to report accurately about their counterfactual behavior. This makes them inadequate to assess risks from AI systems in real-world contexts as they lack construct validity. We then argue that a structurally identical issue holds for current AI alignment approaches. Lastly, we discuss improving safety assessments and alignment training by taking these shortcomings to heart.

cross AR-Flow VAE: A Structured Autoregressive Flow Prior Variational Autoencoder for Unsupervised Blind Source Separation

Authors: Yuan-Hao Wei, Fu-Hao Deng, Lin-Yong Cui, Yan-Jie Sun

Abstract: Blind source separation (BSS) seeks to recover latent source signals from observed mixtures. Variational autoencoders (VAEs) offer a natural perspective for this problem: the latent variables can be interpreted as source components, the encoder can be viewed as a demixing mapping from observations to sources, and the decoder can be regarded as a remixing process from inferred sources back to observations. In this work, we propose AR-Flow VAE, a novel VAE-based framework for BSS in which each latent source is endowed with a parameter-adaptive autoregressive flow prior. This prior significantly enhances the flexibility of latent source modeling, enabling the framework to capture complex non-Gaussian behaviors and structured dependencies, such as temporal correlations, that are difficult to represent with conventional priors. In addition, the structured prior design assigns distinct priors to different latent dimensions, thereby encouraging the latent components to separate into different source signals under heterogeneous prior constraints. Experimental results validate the effectiveness of the proposed architecture for blind source separation. More importantly, this work provides a foundation for future investigations into the identifiability and interpretability of AR-Flow VAE.
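The benefit of an autoregressive flow prior over a conventional i.i.d. prior can be sketched with an affine autoregressive flow: a temporally correlated latent trajectory receives much higher log-density under a matched AR prior than under a standard normal. The AR(1)-style functions below are our toy stand-ins; the paper's priors are parameter-adaptive neural flows.

```python
import numpy as np

def ar_flow_logprob(z, shift_fn, log_scale_fn):
    """Log-density of z under an affine autoregressive flow prior:
    z_t = mu(z_{<t}) + exp(ls(z_{<t})) * eps_t with eps_t ~ N(0, 1).
    A generic sketch, not the paper's architecture."""
    lp = 0.0
    for t in range(len(z)):
        mu = shift_fn(z[:t])
        ls = log_scale_fn(z[:t])
        eps = (z[t] - mu) * np.exp(-ls)          # invert the flow step
        lp += -0.5 * (eps ** 2 + np.log(2 * np.pi)) - ls
    return lp

# an AR(1)-style prior captures temporal correlation an i.i.d. prior cannot
phi = 0.9
shift = lambda h: phi * h[-1] if len(h) else 0.0
unit_scale = lambda h: 0.0

rng = np.random.default_rng(0)
z = np.zeros(64)
for t in range(1, 64):
    z[t] = phi * z[t - 1] + rng.normal()         # correlated "source" trajectory

lp_ar = ar_flow_logprob(z, shift, unit_scale)
lp_iid = ar_flow_logprob(z, lambda h: 0.0, unit_scale)  # standard-normal prior
```

Assigning structurally different priors of this kind to different latent dimensions is what pushes the VAE's latent components toward distinct sources.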

cross How to find expressible and trainable parameterized quantum circuits?

Authors: Peter R\"oseler, Dennis Willsch, Kristel Michielsen

Abstract: Whether parameterized quantum circuits (PQCs) can be systematically constructed to be both trainable and expressive remains an open question. Highly expressive PQCs often exhibit barren plateaus, while several trainable alternatives admit efficient classical simulation. We address this question by deriving a finite-sample, dimension-independent concentration bound for estimating the variance of a PQC cost function, yielding explicit trainability guarantees. Across commonly used ans\"atze, we observe an anticorrelation between trainability and expressibility, consistent with theoretical insights. Building on this observation, we propose a property-based ansatz-search framework for identifying circuits that combine trainability and expressibility. We demonstrate its practical viability on a real quantum computer and apply it to variational quantum algorithms. We identify quantum neural network ans\"atze with improved effective dimension using over $6 \times$ fewer parameters, and for VQE on $\mathrm{H}_2$ we achieve UCCSD-like accuracy at substantially reduced circuit complexity.

cross Inclusive AI for Group Interactions: Predicting Gaze-Direction Behaviors in People with Intellectual and Developmental Disabilities

Authors: Giulia Huang, Maristella Matera, Micol Spitale

Abstract: Artificial agents that support human group interactions hold great promise, especially in sensitive contexts such as well-being promotion and therapeutic interventions. However, current systems struggle to mediate group interactions involving people who are not neurotypical. This limitation arises because most AI detection models (e.g., for turn-taking) are trained on data from neurotypical populations. This work takes a step toward inclusive AI by addressing the challenge of eye contact detection, a core component of non-verbal communication, with and for people with Intellectual and Developmental Disabilities. First, we introduce a new dataset, Multi-party Interaction with Intellectual and Developmental Disabilities (MIDD), capturing atypical gaze and engagement patterns. Second, we present the results of a comparative analysis with neurotypical datasets, highlighting differences in class imbalance, speaking activity, gaze distribution, and interaction dynamics. Then, we evaluate classifiers ranging from SVMs to FSFNet, showing that fine-tuning on MIDD improves performance, though notable limitations remain. Finally, we present the insights gathered through a focus group with six therapists to interpret our quantitative findings and understand the practical implications of atypical gaze and engagement patterns. Based on these results, we discuss data-driven strategies and emphasize the importance of feature choice for building more inclusive human-centered tools.

cross Physics-Informed Policy Optimization via Analytic Dynamics Regularization

Authors: Namai Chandra, Liu Mohan, Zhihao Gu, Lin Wang

Abstract: Reinforcement learning (RL) has achieved strong performance in robotic control; however, state-of-the-art policy learning methods, such as actor-critic methods, still suffer from high sample complexity and often produce physically inconsistent actions. This limitation stems from neural policies implicitly rediscovering complex physics from data alone, despite accurate dynamics models being readily available in simulators. In this paper, we introduce a novel physics-informed RL framework, called PIPER, that seamlessly integrates physical constraints directly into neural policy optimization with analytical soft physics constraints. At the core of our method is the integration of a differentiable Lagrangian residual as a regularization term within the actor's objective. This residual, extracted from a robot's simulator description, subtly biases policy updates towards dynamically consistent solutions. Crucially, this physics integration is realized through an additional loss term during policy optimization, requiring no alterations to existing simulators or core RL algorithms. Extensive experiments demonstrate that our method significantly improves learning efficiency, stability, and control accuracy, establishing a new paradigm for efficient and physically consistent robotic control.

cross Convergence of Two Time-Scale Stochastic Approximation: A Martingale Approach

Authors: Mathukumalli Vidyasagar

Abstract: In this paper, we analyze the two time-scale stochastic approximation (TTSSA) algorithm introduced in Borkar (1997) using a martingale approach. This approach leads to simple sufficient conditions for the iterations to be bounded almost surely, as well as estimates on the rate of convergence of the mean-squared error of the TTSSA algorithm to zero. Our theory is applicable to nonlinear equations, in contrast to many papers in the TTSSA literature which assume that the equations are linear. The convergence of TTSSA is proved in the "almost sure" sense, in contrast to earlier papers on TTSSA that establish convergence in distribution, convergence in the mean, and the like. Moreover, in this paper we establish different rates of convergence for the fast and the slow subsystems, perhaps for the first time. Finally, all of the above results continue to hold in the case where the two measurement errors have nonzero conditional mean, and/or have conditional variances that grow without bound as the iterations proceed. This is in contrast to previous papers which assumed that the errors form a martingale difference sequence with uniformly bounded conditional variance. It is shown that when the measurement errors have zero conditional mean and the conditional variance remains bounded, the mean-squared error of the iterations converges to zero at a rate of $o(t^{-\eta})$ for all $\eta \in (0,1)$. This improves upon the rate of $O(t^{-2/3})$ proved in Doan (2023) (which is the best bound available to date). Our bound is virtually the same as the rate of $O(t^{-1})$ proved in Doan (2024), but that rate applies to a Polyak-Ruppert averaged version of TTSSA rather than to the iterates directly. Rates of convergence are also established for the case where the errors have nonzero conditional mean and/or unbounded conditional variance.
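
As a rough illustration of the two time-scale template the abstract analyzes (and not the paper's specific algorithm or conditions), a minimal TTSSA sketch uses a fast iterate with step size $a_t$ and a slow iterate with step size $b_t$ satisfying $b_t / a_t \to 0$, both driven by noisy measurements. The step-size exponents and the toy root-finding problem below are illustrative assumptions.

```python
import random

def ttssa(n_iters=50000, seed=0):
    """Minimal two time-scale stochastic approximation (TTSSA) template.

    The fast iterate y tracks the slow iterate x (step a_t = t^-0.6),
    while x drifts toward the root of 2 - y = 0 (step b_t = t^-0.9),
    so (x, y) -> (2, 2). Both updates see zero-mean measurement noise.
    """
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    for t in range(1, n_iters + 1):
        a_t = 1.0 / t**0.6   # fast time scale
        b_t = 1.0 / t**0.9   # slow time scale (b_t / a_t -> 0)
        y += a_t * (x - y + 0.1 * rng.gauss(0, 1))
        x += b_t * (2.0 - y + 0.1 * rng.gauss(0, 1))
    return x, y

x, y = ttssa()
```

With these decaying steps both iterates settle near the joint fixed point despite the noise, which is the behavior the paper's almost-sure convergence and rate results make precise.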

cross Fine-tuning MLLMs Without Forgetting Is Easier Than You Think

Authors: He Li, Yuhui Zhang, Xiaohan Wang, Kaifeng Lyu, Serena Yeung-Levy

Abstract: This paper demonstrates that simple adjustments to the fine-tuning recipes of multimodal large language models (MLLMs) are sufficient to mitigate catastrophic forgetting. On visual question answering, we design a 2x2 experimental framework to assess model performance across in-distribution and out-of-distribution image and text inputs. Our results show that appropriate regularization, such as constraining the number of trainable parameters or adopting a low learning rate, effectively prevents forgetting when dealing with out-of-distribution images. However, we uncover a distinct form of forgetting in settings with in-distribution images and out-of-distribution text. We attribute this forgetting to task-specific overfitting and address this issue by introducing a data-hybrid training strategy that combines datasets and tasks. Finally, we demonstrate that this approach naturally extends to continual learning, outperforming existing methods with complex auxiliary mechanisms. In general, our findings challenge the prevailing assumptions by highlighting the inherent robustness of MLLMs and providing practical guidelines for adapting them while preserving their general capabilities.

cross Refining 3D Medical Segmentation with Verbal Instruction

Authors: Kangxian Xie, Jiancheng Yang, Nandor Pinter, Chao Wu, Behzad Bozorgtabar, Mingchen Gao

Abstract: Accurate 3D anatomical segmentation is essential for clinical diagnosis and surgical planning. However, automated models frequently generate suboptimal shape predictions due to factors such as limited and imbalanced training data, inadequate labeling quality, and distribution shifts between training and deployment settings. A natural solution is to iteratively refine the predicted shape based on the radiologists' verbal instructions. However, this is hindered by the scarcity of paired data that explicitly links erroneous shapes to corresponding corrective instructions. As an initial step toward addressing this limitation, we introduce CoWTalk, a benchmark comprising 3D arterial anatomies with controllable synthesized anatomical errors and their corresponding repairing instructions. Building on this benchmark, we further propose an iterative refinement model that represents 3D shapes as vector sets and interacts with textual instructions to progressively update the target shape. Experimental results demonstrate that our method achieves significant improvements over corrupted inputs and competitive baselines, highlighting the feasibility of language-driven clinician-in-the-loop refinement for 3D medical shape modeling.

cross Learning to Forget: Sleep-Inspired Memory Consolidation for Resolving Proactive Interference in Large Language Models

Authors: Ying Xie

Abstract: Large language models (LLMs) suffer from proactive interference (PI): outdated information in the context window disrupts retrieval of current values. This interference degrades retrieval accuracy log-linearly as stale associations accumulate, a bottleneck that persists regardless of context length and resists prompt-engineering mitigations. Biological brains resolve an analogous challenge through sleep-dependent memory consolidation: synaptic downscaling, selective replay, and targeted forgetting. We propose SleepGate, a biologically inspired framework that augments transformer-based LLMs with a learned sleep cycle over the key-value (KV) cache. SleepGate introduces three mechanisms: (1) a conflict-aware temporal tagger detecting when new entries supersede old ones; (2) a lightweight forgetting gate trained to selectively evict or compress stale cache entries; and (3) a consolidation module that merges surviving entries into compact summaries. These components activate periodically during inference in sleep micro-cycles, governed by an adaptive entropy-based trigger. We formalize a dual-phase training objective jointly optimizing language modeling during the wake phase and post-consolidation retrieval during the sleep phase. Theoretical analysis shows SleepGate reduces the interference horizon from O(n) to O(log n). In experiments with a small-scale transformer (4 layers, 793K parameters), SleepGate achieves 99.5% retrieval accuracy at PI depth 5 and 97.0% at depth 10, while all five baselines -- full KV cache, sliding window, H2O, StreamingLLM, and decay-only ablation -- remain below 18%. Our framework offers an architecture-level solution that prompt engineering cannot address.

cross ASAP: Attention-Shift-Aware Pruning for Efficient LVLM Inference

Authors: Surendra Pathak, Bo Han

Abstract: While Large Vision-Language Models (LVLMs) demonstrate exceptional multi-modal capabilities, the quadratic computational cost of processing high-resolution visual tokens remains a critical bottleneck. Though recent token reduction strategies attempt to accelerate inference, such methods inadequately exploit attention values and fail to address token redundancy. More critically, they overlook the ``attention shift'' phenomenon inherent in LVLMs, which skews token attention scores. In this work, we propose ASAP, a novel training-free, KV-Cache-compatible pruning recipe that comprehensively addresses these limitations. First, we mitigate the attention shift by utilizing a dynamic bidirectional soft attention mask, ensuring the selection of genuinely informative tokens rather than naive attention-based selection. Second, we posit that high semantic redundancy within the token set degrades performance. We therefore introduce a weighted soft merging component that merges semantically similar tokens, preserving only the most feature-dense visual patches for subsequent layers. ASAP achieves virtually lossless compression of visual context, retaining 99.02% of the original LLaVA-NeXT-7B performance while aggressively slashing computational FLOPs by ~80%.

cross Rigorous Asymptotics for First-Order Algorithms Through the Dynamical Cavity Method

Authors: Yatin Dandi, David Gamarnik, Francisco Pernice, Lenka Zdeborov\'a

Abstract: Dynamical Mean Field Theory (DMFT) provides an asymptotic description of the dynamics of macroscopic observables in certain disordered systems. Originally pioneered in the context of spin glasses by Sompolinsky and Zippelius (1982), it has since been used to derive asymptotic dynamical equations for a wide range of models in physics, high-dimensional statistics and machine learning. One of the main tools used by physicists to obtain these equations is the dynamical cavity method, which has remained largely non-rigorous. In contrast, existing mathematical formalizations have relied on alternative approaches, including Gaussian conditioning, large deviations over paths, or Fourier analysis. In this work, we formalize the dynamical cavity method and use it to give a new proof of the DMFT equations for General First Order Methods, a broad class of dynamics encompassing algorithms such as Gradient Descent and Approximate Message Passing.

cross IQP Born Machines under Data-dependent and Agnostic Initialization Strategies

Authors: Sacha Lerch, Joseph Bowles, Ricard Puig, Erik Armengol, Zo\"e Holmes, Supanut Thanasilp

Abstract: Quantum circuit Born machines based on instantaneous quantum polynomial-time (IQP) circuits are natural candidates for quantum generative modeling, both because of their probabilistic structure and because IQP sampling is provably classically hard in certain regimes. Recent proposals focus on training IQP-QCBMs using Maximum Mean Discrepancy (MMD) losses built from low-body Pauli-$Z$ correlators, but the effect of initialization on the resulting optimization landscape remains poorly understood. In this work, we address this by first proving that the MMD loss landscape suffers from barren plateaus for random full-angle-range initializations of IQP circuits. We then establish lower bounds on the loss variance for identity and an unbiased data-agnostic initialization. We additionally consider a data-dependent initialization that is better aligned with the target distribution and, under suitable assumptions, yields provable gradients and generally converges more quickly to a good minimum (as indicated by our training of circuits with 150 qubits on genomic data). Finally, as a by-product, the developed variance lower bound framework is applicable to a general class of non-linear losses, offering a broader toolset for analyzing warm-starts in quantum machine learning.

cross Power-Law Spectrum of the Random Feature Model

Authors: Elliot Paquette, Ke Liang Xiao, Yizhe Zhu

Abstract: Scaling laws for neural networks, in which the loss decays as a power-law in the number of parameters, data, and compute, depend fundamentally on the spectral structure of the data covariance, with power-law eigenvalue decay appearing ubiquitously in vision and language tasks. A central question is whether this spectral structure is preserved or destroyed when data passes through the basic building block of a neural network: a random linear projection followed by a nonlinear activation. We study this question for the random feature model: given data $x \sim N(0,H)\in \mathbb{R}^v$ where $H$ has $\alpha$-power-law spectrum ($\lambda_j(H ) \asymp j^{-\alpha}$, $\alpha > 1$), a Gaussian sketch matrix $W \in \mathbb{R}^{v\times d}$, and an entrywise monomial $f(y) = y^{p}$, we characterize the eigenvalues of the population random-feature covariance $\mathbb{E}_{x }[\frac{1}{d}f(W^\top x )^{\otimes 2}]$. We prove matching upper and lower bounds: for all $1 \leq j \leq c_1 d \log^{-(p+1)}(d)$, the $j$-th eigenvalue is of order $\left(\log^{p-1}(j+1)/j\right)^{\alpha}$. For $ c_1 d \log^{-(p+1)}(d)\leq j\leq d$, the $j$-th eigenvalue is of order $j^{-\alpha}$ up to a polylog factor. That is, the power-law exponent $\alpha$ is inherited exactly from the input covariance, modified only by a logarithmic correction that depends on the monomial degree $p$. The proof combines a dyadic head-tail decomposition with Wick chaos expansions for higher-order monomials and random matrix concentration inequalities.

cross Medical Image Spatial Grounding with Semantic Sampling

Authors: Andrew Seohwan Yu, Mohsen Hariri, Kunio Nakamura, Mingrui Yang, Xiaojuan Li, Vipin Chaudhary

Abstract: Vision language models (VLMs) have shown significant promise in visual grounding for images as well as videos. In medical imaging research, VLMs represent a bridge between object detection and segmentation, and report understanding and generation. However, spatial grounding of anatomical structures in the three-dimensional space of medical images poses many unique challenges. In this study, we examine image modalities, slice directions, and coordinate systems as differentiating factors for vision components of VLMs, and the use of anatomical, directional, and relational terminology as factors for the language components. We then demonstrate that visual and textual prompting systems such as labels, bounding boxes, and mask overlays have varying effects on the spatial grounding ability of VLMs. To enable measurement and reproducibility, we introduce \textbf{MIS-Ground}, a benchmark that comprehensively tests a VLM for vulnerabilities against specific modes of \textbf{M}edical \textbf{I}mage \textbf{S}patial \textbf{Ground}ing. We release MIS-Ground to the public at \href{https://anonymous.4open.science/r/mis-ground}{\texttt{anonymous.4open.science/r/mis-ground}}. In addition, we present \textbf{MIS-SemSam}, a low-cost, inference-time, and model-agnostic optimization of VLMs that improve their spatial grounding ability with the use of \textbf{Sem}antic \textbf{Sam}pling. We find that MIS-SemSam improves the accuracy of Qwen3-VL-32B on MIS-Ground by 13.06\%.

URLs: https://anonymous.4open.science/r/mis-ground

cross Machine Learning-Driven Intelligent Memory System Design: From On-Chip Caches to Storage

Authors: Rahul Bera, Rakesh Nadig, Onur Mutlu

Abstract: Despite the data-rich environment in which memory systems of modern computing platforms operate, many state-of-the-art architectural policies employed in the memory system rely on static, human-designed heuristics that fail to truly adapt to the workload and system behavior via principled learning methodologies. In this article, we propose a fundamentally different design approach: using lightweight and practical machine learning (ML) methods to enable adaptive, data-driven control throughout the memory hierarchy. We present three ML-guided architectural policies: (1) Pythia, a reinforcement learning-based data prefetcher for on-chip caches, (2) Hermes, a perceptron learning-based off-chip predictor for multi-level cache hierarchies, and (3) Sibyl, a reinforcement learning-based data placement policy for hybrid storage systems. Our evaluation shows that Pythia, Hermes, and Sibyl significantly outperform the best-prior human-designed policies, while incurring modest hardware overheads. Collectively, this article demonstrates that integrating adaptive learning into memory subsystems can lead to intelligent, self-optimizing architectures that unlock performance and efficiency gains beyond what is possible with traditional human-designed approaches.

cross SuperLocalMemory V3: Information-Geometric Foundations for Zero-LLM Enterprise Agent Memory

Authors: Varun Pratap Bhardwaj

Abstract: Persistent memory is a central capability for AI agents, yet the mathematical foundations of memory retrieval, lifecycle management, and consistency remain unexplored. Current systems employ cosine similarity for retrieval, heuristic decay for salience, and provide no formal contradiction detection. We establish information-geometric foundations through three contributions. First, a retrieval metric derived from the Fisher information structure of diagonal Gaussian families, satisfying Riemannian metric axioms, invariant under sufficient statistics, and computable in O(d) time. Second, memory lifecycle formulated as Riemannian Langevin dynamics with proven existence and uniqueness of the stationary distribution via the Fokker-Planck equation, replacing hand-tuned decay with principled convergence guarantees. Third, a cellular sheaf model where non-trivial first cohomology classes correspond precisely to irreconcilable contradictions across memory contexts. On the LoCoMo benchmark, the mathematical layers yield +12.7 percentage points over engineering baselines across six conversations, reaching +19.9 pp on the most challenging dialogues. A four-channel retrieval architecture achieves 75% accuracy without cloud dependency. Cloud-augmented results reach 87.7%. A zero-LLM configuration satisfies EU AI Act data sovereignty requirements by architectural design. To our knowledge, this is the first work establishing information-geometric, sheaf-theoretic, and stochastic-dynamical foundations for AI agent memory systems.

cross Scaling the Explanation of Multi-Class Bayesian Network Classifiers

Authors: Yaofang Zhang, Adnan Darwiche

Abstract: We propose a new algorithm for compiling Bayesian network classifiers (BNCs) into class formulas. Class formulas are logical formulas that represent a classifier's input-output behavior, and are crucial in the recent line of work that uses logical reasoning to explain the decisions made by classifiers. Compared to prior work on compiling class formulas of BNCs, our proposed algorithm is not restricted to binary classifiers, shows significant improvement in compilation time, and outputs class formulas as negation normal form (NNF) circuits that are OR-decomposable, which is an important property when computing explanations of classifiers.

cross $K-$means with learned metrics

Authors: Pablo Groisman, Matthieu Jonckheere, Jordan Serres, Mariela Sued

Abstract: We study the Fr\'echet {\it k-}means of a metric measure space when both the measure and the distance are unknown and have to be estimated. We prove a general result that states that the {\it k-}means are continuous with respect to the measured Gromov-Hausdorff topology. In this situation, we also prove a stability result for the Voronoi clusters they determine. We do not assume uniqueness of the set of {\it k-}means, but when it is unique, the results are stronger. This framework provides a unified approach to proving consistency for a wide range of metric learning procedures. As concrete applications, we obtain new consistency results for several important estimators that were previously unestablished, even when $k=1$. These include {\it k-}means based on: (i) Isomap and Fermat geodesic distances on manifolds, (ii) diffusion distances, (iii) Wasserstein distances computed with respect to learned ground metrics. Finally, we consider applications beyond the statistical inference paradigm, such as (iv) first passage percolation and (v) discrete approximations of length spaces.
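
A concrete way to run k-means when only an estimated distance (rather than coordinates) is available is a Lloyd-style iteration over a pairwise distance matrix, with medoids standing in for Fr\'echet means. This is a generic sketch of that idea, not the paper's estimators; the tie-breaking and initialization choices are assumptions.

```python
import random

def k_medoids(dist, k, n_iters=100, seed=0):
    """Lloyd-style k-means under an arbitrary (possibly learned) metric.

    `dist` is an n x n matrix of pairwise distances; medoids play the
    role of Frechet means, so only distances, not coordinates, are needed.
    Returns (medoid indices, cluster assignment).
    """
    n = len(dist)
    rng = random.Random(seed)
    medoids = rng.sample(range(n), k)
    for _ in range(n_iters):
        # Assignment step: send each point to its nearest medoid.
        assign = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
        # Update step: pick the member minimizing within-cluster distance sum.
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if assign[i] == c] or [medoids[c]]
            new_medoids.append(min(members, key=lambda m: sum(dist[i][m] for i in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign
```

Plugging in a distance matrix estimated by Isomap, diffusion maps, or a learned ground metric yields the kind of plug-in k-means whose consistency the paper establishes.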

cross $PA^3$: $\textbf{P}$olicy-$\textbf{A}$ware $\textbf{A}$gent $\textbf{A}$lignment through Chain-of-Thought

Authors: Shubhashis Roy Dipta, Daniel Bis, Kun Zhou, Lichao Wang, Benjamin Z. Yao, Chenlei Guo, Ruhi Sarikaya

Abstract: Conversational assistants powered by large language models (LLMs) excel at tool-use tasks but struggle with adhering to complex, business-specific rules. While models can reason over business rules provided in context, including all policies for every query introduces high latency and wastes compute. Furthermore, these lengthy prompts lead to long contexts, harming overall performance due to the "needle-in-the-haystack" problem. To address these challenges, we propose a multi-stage alignment method that teaches models to recall and apply relevant business policies during chain-of-thought reasoning at inference time, without including the full business policy in-context. Furthermore, we introduce a novel PolicyRecall reward based on the Jaccard score and a Hallucination Penalty for GRPO training. Altogether, our best model outperforms the baseline by 16 points and surpasses comparable in-context baselines of similar model size by 3 points, while using 40% fewer words.
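
The abstract names a Jaccard-based PolicyRecall reward with a hallucination penalty but does not spell out the formula, so the following is a plausible sketch under stated assumptions: reward equals the Jaccard overlap between recalled and gold policy IDs, minus a fixed penalty per recalled ID absent from the policy corpus. The function name, penalty weight, and ID-set framing are all hypothetical.

```python
def policy_recall_reward(recalled, gold, corpus, penalty=0.5):
    """Hypothetical PolicyRecall reward: Jaccard overlap between policy IDs
    recalled in the chain-of-thought and the gold-relevant set, minus a
    per-item penalty for recalled IDs not in the policy corpus (hallucinated
    policies). The exact weighting used in the paper is not specified."""
    recalled, gold = set(recalled), set(gold)
    union = recalled | gold
    jaccard = len(recalled & gold) / len(union) if union else 1.0
    hallucinated = len(recalled - set(corpus))
    return jaccard - penalty * hallucinated
```

A reward of this shape gives partial credit for recalling some relevant policies while pushing the policy away from inventing rules that do not exist, matching the two failure modes the training objective targets.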

cross Tactile Modality Fusion for Vision-Language-Action Models

Authors: Charlotte Morissette, Amin Abyaneh, Wei-Di Chang, Anas Houssaini, David Meger, Hsiu-Chin Lin, Jonathan Tremblay, Gregory Dudek

Abstract: We propose TacFiLM, a lightweight modality-fusion approach that integrates visual-tactile signals into vision-language-action (VLA) models. While recent advances in VLA models have introduced robot policies that are both generalizable and semantically grounded, these models mainly rely on vision-based perception. Vision alone, however, cannot capture the complex interaction dynamics that occur during contact-rich manipulation, including contact forces, surface friction, compliance, and shear. While recent attempts to integrate tactile signals into VLA models often increase complexity through token concatenation or large-scale pretraining, the heavy computational demands of behavioural models necessitate more lightweight fusion strategies. To address these challenges, TacFiLM outlines a post-training finetuning approach that conditions intermediate visual features on pretrained tactile representations using feature-wise linear modulation (FiLM). Experimental results on insertion tasks demonstrate consistent improvements in success rate, direct insertion performance, completion time, and force stability across both in-distribution and out-of-distribution tasks. Together, these results support our method as an effective approach to integrating tactile signals into VLA models, improving contact-rich manipulation behaviours.

cross EcoFair-CH-MARL: Scalable Constrained Hierarchical Multi-Agent RL with Real-Time Emission Budgets and Fairness Guarantees

Authors: Saad Alqithami

Abstract: Global decarbonisation targets and tightening market pressures demand maritime logistics solutions that are simultaneously efficient, sustainable, and equitable. We introduce EcoFair-CH-MARL, a constrained hierarchical multi-agent reinforcement learning framework that unifies three innovations: (i) a primal-dual budget layer that provably bounds cumulative emissions under stochastic weather and demand; (ii) a fairness-aware reward transformer with dynamically scheduled penalties that enforces max-min cost equity across heterogeneous fleets; and (iii) a two-tier policy architecture that decouples strategic routing from real-time vessel control, enabling linear scaling in agent count. New theoretical results establish $O(\sqrt{T})$ regret for both constraint violations and fairness loss. Experiments on a high-fidelity maritime digital twin (16 ports, 50 vessels) driven by automatic identification system traces, plus an energy-grid case study, show up to 15% lower emissions, 12% higher throughput, and a 45% fair-cost improvement over state-of-the-art hierarchical and constrained MARL baselines. In addition, EcoFair-CH-MARL achieves stronger equity (lower Gini and higher min-max welfare) than fairness-specific MARL baselines (e.g., SOTO, FEN), and its modular design is compatible with both policy- and value-based learners. EcoFair-CH-MARL therefore advances the feasibility of large-scale, regulation-compliant, and socially responsible multi-agent coordination in safety-critical domains.
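
The primal-dual budget layer mentioned in (i) follows a standard Lagrangian pattern that can be sketched in a few lines (an illustration of the generic mechanism, not the paper's implementation): a dual multiplier rises while the emission budget is violated and decays otherwise, and the policy is trained on the correspondingly shaped reward.

```python
def dual_update(lmbda, emissions, budget, eta=0.05):
    """One projected dual-ascent step for a cumulative emission budget:
    lambda <- max(0, lambda + eta * (emissions - budget)).
    The multiplier grows while the budget is violated and shrinks otherwise."""
    return max(0.0, lmbda + eta * (emissions - budget))

def shaped_reward(reward, emissions, lmbda):
    """Lagrangian-shaped reward seen by the primal (policy) update:
    high multipliers make emission-heavy actions unattractive."""
    return reward - lmbda * emissions
```

Under suitable step sizes, this kind of projected dual ascent is what underlies $O(\sqrt{T})$-style regret bounds on constraint violation in constrained RL.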

cross LUMINA: A Multi-Vendor Mammography Benchmark with Energy Harmonization Protocol

Authors: Hongyi Pan, Gorkem Durak, Halil Ertugrul Aktas, Andrea M. Bejar, Baver Tutun, Emre Uysal, Ezgi Bulbul, Mehmet Fatih Dogan, Berrin Erok, Berna Akkus Yildirim, Sukru Mehmet Erturk, Ulas Bagci

Abstract: Publicly available full-field digital mammography (FFDM) datasets remain limited in size, clinical annotations, and vendor diversity, hindering the development of robust models. We introduce LUMINA, a curated, multi-vendor FFDM dataset that explicitly encodes acquisition energy and vendor metadata to capture clinically relevant appearance variations often overlooked in existing benchmarks. This dataset contains 1824 images from 468 patients (960 benign, 864 malignant), with pathology-confirmed labels, BI-RADS assessments, and breast-density annotations. LUMINA spans six acquisition systems and includes both high- and low-energy imaging styles, enabling systematic analysis of vendor- and energy-induced domain shifts. To address these variations, we propose a foreground-only pixel-space alignment method (''energy harmonization'') that maps images to a low-energy reference while preserving lesion morphology. We benchmark CNN and transformer models on three clinically relevant tasks: diagnosis (benign vs. malignant), BI-RADS classification, and density estimation. Two-view models consistently outperform single-view models. EfficientNet-B0 achieves an AUC of 93.54% for diagnosis, while Swin-T achieves the best macro-AUC of 89.43% for density prediction. Harmonization improves performance across architectures and produces more localized Grad-CAM responses. Overall, LUMINA provides (1) a vendor-diverse benchmark and (2) a model-agnostic harmonization framework for reliable and deployable mammography AI.

cross A Single-Sample Polylogarithmic Regret Bound for Nonstationary Online Linear Programming

Authors: Haoran Xu, Owen Shen, Peter Glynn, Yinyu Ye, Patrick Jaillet

Abstract: We study nonstationary Online Linear Programming (OLP), where $n$ orders arrive sequentially with reward-resource consumption pairs that form a sequence of independent, but not necessarily identically distributed, random vectors. At the beginning of the planning horizon, the decision-maker is provided with a resource endowment that is sufficient to fulfill a significant portion of the requests. The decision-maker seeks to maximize the expected total reward by making immediate and irrevocable acceptance or rejection decisions for each order, subject to this resource endowment. We focus on the challenging single-sample setting, where only one sample from each of the $n$ distributions is available at the start of the planning horizon. We propose a novel re-solving algorithm that integrates a dynamic programming perspective with the dual-based frameworks traditionally employed in stationary environments. In the large-resource regime, where the resource endowment scales linearly with the number of orders, we prove that our algorithm achieves $O((\log n)^2)$ regret across a broad class of nonstationary distribution sequences. Our results demonstrate that polylogarithmic regret is attainable even under significant environmental shifts and minimal data availability, bridging the gap between stationary OLP and more volatile real-world resource allocation problems.
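
A simple static baseline helps situate the dual-based idea the abstract builds on (the paper's re-solving algorithm is more sophisticated; everything below is an illustrative assumption with a single resource): use the one available sample per distribution to set a bid price that exhausts the budget on the highest reward-to-consumption orders, then accept an arriving order only if its reward clears that price.

```python
def bid_price_from_sample(sample, budget):
    """Set a single bid price from one sample per distribution: sort the
    sampled (reward, consumption) pairs by reward-to-consumption ratio and
    greedily fill the budget; the ratio of the last order accepted serves
    as the dual price. A static baseline, not the paper's re-solving scheme."""
    ranked = sorted(sample, key=lambda rc: rc[0] / rc[1], reverse=True)
    used, price = 0.0, 0.0
    for reward, cons in ranked:
        if used + cons > budget:
            break
        used += cons
        price = reward / cons
    return price

def accept(order, price):
    """Immediate, irrevocable decision: take the order iff its reward
    covers the opportunity cost of the resource it consumes."""
    reward, cons = order
    return reward >= price * cons
```

Re-solving variants recompute this price as orders arrive and resources deplete, which is what allows the paper's algorithm to track nonstationary distribution sequences with polylogarithmic regret.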

cross Robust Building Damage Detection in Cross-Disaster Settings Using Domain Adaptation

Authors: Asmae Mouradi, Shruti Kshirsagar

Abstract: Rapid structural damage assessment from remote sensing imagery is essential for timely disaster response. Within human-machine systems (HMS) for disaster management, automated damage detection provides decision-makers with actionable situational awareness. However, models trained on multi-disaster benchmarks often underperform in unseen geographic regions due to domain shift - a distributional mismatch between training and deployment data that undermines human trust in automated assessments. We explore a two-stage ensemble approach using supervised domain adaptation (SDA) for building damage classification across four severity classes. The pipeline adapts the xView2 first-place method to the Ida-BD dataset using SDA and systematically investigates the effect of individual augmentation components on classification performance. Comprehensive ablation experiments on the unseen Ida-BD test split demonstrate that SDA is indispensable: removing it causes damage detection to fail entirely. Our pipeline achieves the most robust performance using SDA with unsharp-enhanced RGB input, attaining a Macro-F1 of 0.5552. These results underscore the critical role of domain adaptation in building trustworthy automated damage assessment modules for HMS-integrated disaster response.

cross Scaling Autoregressive Models for Lattice Thermodynamics

Authors: Xiaochen Du, Juno Nam, Sulin Liu, Rafael G\'omez-Bombarelli

Abstract: Predicting how materials behave under realistic conditions requires understanding the statistical distribution of atomic configurations on crystal lattices, a problem central to alloy design, catalysis, and the study of phase transitions. Traditional Markov-chain Monte Carlo sampling suffers from slow convergence and critical slowing down near phase transitions, motivating the use of generative models that directly learn the thermodynamic distribution. Existing autoregressive models (ARMs), however, generate configurations in a fixed sequential order and incur high memory and training costs, limiting their applicability to realistic systems. Here, we develop a framework combining any-order ARMs, which generate configurations flexibly by conditioning on any known subset of lattice sites, with marginalization models (MAMs), which approximate the probability of any partial configuration in a single forward pass and substantially reduce memory requirements. This combination enables models trained on smaller lattices to be reused for sampling larger systems, while supporting expressive Transformer architectures with lattice-aware positional encodings at manageable computational cost. We demonstrate that Transformer-based any-order MAMs achieve more accurate free energies than multilayer perceptron-based ARMs on both the two-dimensional Ising model and CuAu alloys, faithfully capturing phase transitions and critical behavior. Overall, our framework scales from $10 \times 10$ to $20 \times 20$ Ising systems and from $2 \times 2 \times 4$ to $4 \times 4 \times 8$ CuAu supercells at reduced computational cost compared to conventional sampling methods.

cross Design Space of Self-Consistent Electrostatic Machine Learning Interatomic Potentials

Authors: William J. Baldwin, Ilyes Batatia, Martin Vondr\'ak, Johannes T. Margraf, G\'abor Cs\'anyi

Abstract: Machine learning interatomic potentials (MLIPs) have become widely used tools in atomistic simulations. For much of the history of this field, the most commonly employed architectures were based on short-ranged atomic energy contributions, and the assumption of locality still persists in many modern foundation models. While this approach has enabled efficient and accurate modelling for many use cases, it poses intrinsic limitations for systems where long-range electrostatics, charge transfer, or induced polarization play a central role. A growing body of work has proposed extensions that incorporate electrostatic effects, ranging from locally predicted atomic charges to self-consistent models. While these models have demonstrated success for specific examples, their underlying assumptions, and fundamental limitations are not yet well understood. In this work, we present a framework for treating electrostatics in MLIPs by viewing existing models as coarse-grained approximations to density functional theory (DFT). This perspective makes explicit the approximations involved, clarifies the physical meaning of the learned quantities, and reveals connections and equivalences between several previously proposed models. Using this formalism, we identify key design choices that define a broader design space of self-consistent electrostatic MLIPs. We implement salient points in this space using the MACE architecture and a shared representation of the charge density, enabling controlled comparisons between different approaches. Finally, we evaluate these models on two instructive test cases: metal-water interfaces, which probe the contrasting electrostatic response of conducting and insulating systems, and charged vacancies in silicon dioxide. Our results highlight the limitations of existing approaches and demonstrate how more expressive self-consistent models are needed to resolve failures.

cross AdapterTune: Zero-Initialized Low-Rank Adapters for Frozen Vision Transformers

Authors: Salim Khazem

Abstract: Frozen-backbone transfer with Vision Transformers faces two under-addressed issues: optimization instability when adapters are naively inserted into a fixed feature extractor, and the absence of principled guidance for setting adapter capacity. We introduce AdapterTune, which augments each transformer block with a residual low-rank bottleneck whose up-projection is zero-initialized, guaranteeing that the adapted network starts exactly at the pretrained function and eliminates early-epoch representation drift. On the analytical side, we formalize adapter rank as a capacity budget for approximating downstream task shifts in feature space. The resulting excess-risk decomposition predicts monotonic but diminishing accuracy gains with increasing rank, an ``elbow'' behavior we confirm through controlled sweeps. We evaluate on 9 datasets and 3 backbone scales with multi-seed reporting throughout. On a core 5 dataset transfer suite, AdapterTune improves top-1 accuracy over head-only transfer by +14.9 points on average while training only 0.92 of the parameters required by full fine-tuning, and outperforms full fine-tuning on 10 of 15 dataset-backbone pairs. Across the full benchmark, AdapterTune improves over head-only transfer on every dataset-backbone pair tested. Ablations on rank, placement, and initialization isolate each design choice. The code is available at: https://github.com/salimkhazem/adaptertune
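The zero-initialization property described above is easy to verify in a toy setting: with the up-projection at zero, the adapted block computes exactly the pretrained function. A minimal numpy sketch (dimensions and scales are hypothetical; a real adapter would sit inside a ViT block):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 16, 4  # feature dim and adapter rank (illustrative sizes)

# Frozen pretrained block: a fixed linear map standing in for a ViT sublayer.
W = rng.normal(size=(d, d))

# Residual low-rank bottleneck: down-projection A randomly initialized,
# up-projection B starting at zero, so the adapter contributes nothing at init.
A = rng.normal(size=(d, r)) * 0.02
B = np.zeros((r, d))

def adapted_block(x):
    return x @ W.T + (x @ A) @ B  # pretrained path + zero-initialized residual

x = rng.normal(size=(8, d))
# At initialization the adapted network starts exactly at the pretrained function.
assert np.allclose(adapted_block(x), x @ W.T)
```

As `B` moves away from zero during training, the residual path acquires capacity bounded by the rank `r`, which is the capacity budget the paper's excess-risk analysis formalizes.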

URLs: https://github.com/salimkhazem/adaptertune

cross Towards Next-Generation LLM Training: From the Data-Centric Perspective

Authors: Hao Liang, Zhengyang Zhao, Zhaoyang Han, Meiyi Qiang, Xiaochen Ma, Bohan Zeng, Qifeng Cai, Zhiyu Li, Linpeng Tang, Weinan E, Wentao Zhang

Abstract: Large language models (LLMs) have demonstrated remarkable performance across a wide range of tasks and domains, with data playing a central role in enabling these advances. Despite this success, the preparation and effective utilization of the massive datasets required for LLM training remain major bottlenecks. In current practice, LLM training data is often constructed using ad hoc scripts, and there is still a lack of mature, agent-based data preparation systems that can automatically construct robust and reusable data workflows, thereby freeing data scientists from repetitive and error-prone engineering efforts. Moreover, once collected, datasets are often consumed largely in their entirety during training, without systematic mechanisms for data selection, mixture optimization, or reweighting. To address these limitations, we advocate two complementary research directions. First, we propose building a robust, agent-based automatic data preparation system that supports automated workflow construction and scalable data management. Second, we argue for a unified data-model interaction training system in which data is dynamically selected, mixed, and reweighted throughout the training process, enabling more efficient, adaptive, and performance-aware data utilization. Finally, we discuss the remaining challenges and outline promising directions for future research and system development.

cross Gauge-Equivariant Intrinsic Neural Operators for Geometry-Consistent Learning of Elliptic PDE Maps

Authors: Pengcheng Cheng

Abstract: Learning solution operators of partial differential equations (PDEs) from data has emerged as a promising route to fast surrogate models in multi-query scientific workflows. However, for geometric PDEs whose inputs and outputs transform under changes of local frame (gauge), many existing operator-learning architectures remain representation-dependent, brittle under metric perturbations, and sensitive to discretization changes. We propose Gauge-Equivariant Intrinsic Neural Operators (GINO), a class of neural operators that parameterize elliptic solution maps primarily through intrinsic spectral multipliers acting on geometry-dependent spectra, coupled with gauge-equivariant nonlinearities. This design decouples geometry from learnable functional dependence and enforces consistency under frame transformations. We validate GINO on controlled problems on the flat torus ($\mathbb{T}^2$), where ground-truth resolvent operators and regularized Helmholtz--Hodge decompositions admit closed-form Fourier representations, enabling theory-aligned diagnostics. Across experiments E1--E6, GINO achieves low operator-approximation error, near machine-precision gauge equivariance, robustness to structured metric perturbations, strong cross-resolution generalization with small commutation error under restriction/prolongation, and structure-preserving performance on a regularized exact/coexact decomposition task. Ablations further link the smoothness of the learned spectral multiplier to stability under geometric perturbations. These results suggest that enforcing intrinsic structure and gauge equivariance yields operator surrogates that are more geometry-consistent and discretization-robust for elliptic PDEs on form-valued fields.

cross Online Learning for Supervisory Switching Control

Authors: Haoyuan Sun, Ali Jadbabaie

Abstract: We study supervisory switching control for partially-observed linear dynamical systems. The objective is to identify and deploy the best controller for the unknown system by periodically selecting among a collection of $N$ candidate controllers, some of which may destabilize the underlying system. While classical estimator-based supervisory control guarantees asymptotic stability, it lacks quantitative finite-time performance bounds. Conversely, current non-asymptotic methods in both online learning and system identification require restrictive assumptions that are incompatible in a control setting, such as system stability, which preclude testing potentially unstable controllers. To bridge this gap, we propose a novel, non-asymptotic analysis of supervisory control that adapts multi-armed bandit algorithms to address these control-theoretic challenges. Our data-driven algorithm evaluates candidate controllers via scoring criteria that leverage system observability to isolate the effects of historical states, enabling both detection of destabilizing controllers and accurate system identification. We present two algorithmic variants with dimension-free, finite-time guarantees, where each identifies the most suitable controller in $\mathcal{O}(N \log N)$ steps, while simultaneously achieving finite $L_2$-gain with respect to system disturbances.

cross Topology-Preserving Data Augmentation for Ring-Type Polygon Annotations

Authors: Sudip Laudari, Sang Hun Baek

Abstract: Geometric data augmentation is widely used in segmentation pipelines and typically assumes that polygon annotations represent simply connected regions. However, in structured domains such as architectural floorplan analysis, ring-type regions are often encoded as a single cyclic polygon chain connecting outer and inner boundaries. During augmentation, clipping operations may remove intermediate vertices and disrupt this cyclic connectivity, breaking the structural relationship between the boundaries. In this work, we introduce an order-preserving polygon augmentation strategy that performs transformations in mask space and then projects surviving vertices back into index-space to restore adjacency relations. This repair maintains the original traversal order of the polygon and preserves topological consistency with minimal computational overhead. Experiments demonstrate that the approach reliably restores connectivity, achieving near-perfect Cyclic Adjacency Preservation (CAP) across both single and compound augmentations.
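The index-space repair step can be sketched as follows. This is our own illustrative simplification, not the paper's implementation: survivors of a clip are re-chained in their original traversal order, and a simplified stand-in for the CAP metric checks that consecutive repaired vertices appear in cyclic order in the original chain.

```python
def repair_ring(vertices, survived):
    """Keep surviving vertices in their original traversal order,
    restoring cyclic adjacency after a clip removes intermediate vertices."""
    return [v for v, keep in zip(vertices, survived) if keep]

def cap(original, repaired):
    """Simplified Cyclic Adjacency Preservation: fraction of consecutive
    repaired pairs that appear in cyclic order in the original chain."""
    idx = {v: i for i, v in enumerate(original)}
    n = len(repaired)
    ok = sum(
        1 for i in range(n)
        if idx[repaired[i]] < idx[repaired[(i + 1) % n]] or (i + 1) % n == 0
    )
    return ok / n
```

Because the repair only filters and never reorders, the cyclic connection between outer and inner boundaries encoded in the single polygon chain is preserved by construction.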

cross Investigating the Impact of Speech Enhancement on Audio Deepfake Detection in Noisy Environments

Authors: Angela Anacin, Shruti Kshirsagar, Anderson R. Avila

Abstract: Logical Access (LA) attacks, also known as audio deepfake attacks, use Text-to-Speech (TTS) or Voice Conversion (VC) methods to generate spoofed speech data. This can represent a serious threat to Automatic Speaker Verification (ASV) systems, as intruders can use such attacks to bypass voice biometric security. In this study, we investigate the correlation between speech quality and the performance of audio spoofing detection systems (i.e., the LA task). For that, the performance of two enhancement algorithms is evaluated based on two perceptual speech quality measures, namely Perceptual Evaluation of Speech Quality (PESQ) and Speech-to-Reverberation Modulation Ratio (SRMR), and with respect to their impact on the audio spoofing detection system. We adopted the LA dataset, provided in the ASVspoof 2019 Challenge, and corrupted its test set with different Signal-to-Noise Ratio (SNR) levels, while leaving the training data untouched. Enhancement was applied to attenuate the detrimental effects of noisy speech, and the performances of two models, Speech Enhancement Generative Adversarial Network (SEGAN) and Metric-Optimized Generative Adversarial Network Plus (MetricGAN+), were compared. Although we expect speech quality to correlate well with the performance of speech applications, enhancement can also have side effects on downstream tasks if unwanted artifacts are introduced or relevant information is removed from the speech signal. Our results corroborate the expected correlation: the enhancement algorithm yielding the highest speech quality scores, MetricGAN+, provided the lowest Equal Error Rate (EER) on the audio spoofing detection task, whereas the enhancement method with the lowest speech quality scores, SEGAN, led to the highest EER, and thus to worse performance on the LA task.
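Corrupting a test set at a target SNR, as in the evaluation protocol above, amounts to scaling the noise so that the signal-to-noise power ratio hits the desired level. A minimal sketch (the toy sine "speech" signal is purely illustrative):

```python
import numpy as np

def corrupt_at_snr(clean, noise, snr_db):
    """Scale `noise` so that mixing it with `clean` yields the target SNR in dB."""
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise

rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)  # toy stand-in for speech
noisy = corrupt_at_snr(speech, rng.normal(size=16000), snr_db=10.0)
```

Because the scale is computed from the actual noise power, the realized SNR matches the target exactly rather than only in expectation.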

cross Face-to-Face: A Video Dataset for Multi-Person Interaction Modeling

Authors: Ernie Chu, Vishal M. Patel

Abstract: Modeling the reactive tempo of human conversation remains difficult because most audio-visual datasets portray isolated speakers delivering short monologues. We introduce \textbf{Face-to-Face with Jimmy Fallon (F2F-JF)}, a 70-hour, 14k-clip dataset of two-person talk-show exchanges that preserves the sequential dependency between a guest turn and the host's response. A semi-automatic pipeline combines multi-person tracking, speech diarization, and lightweight human verification to extract temporally aligned host/guest tracks with tight crops and metadata that are ready for downstream modeling. We showcase the dataset with a reactive, speech-driven digital avatar task in which the host video during $[t_1,t_2]$ is generated from their audio plus the guest's preceding video during $[t_0,t_1]$. Conditioning a MultiTalk-style diffusion model on this cross-person visual context yields small but consistent Emotion-FID and FVD gains while preserving lip-sync quality relative to an audio-only baseline. The dataset, preprocessing recipe, and baseline together provide an end-to-end blueprint for studying dyadic, sequential behavior, which we expand upon throughout the paper. Dataset and code will be made publicly available.

cross Preconditioned One-Step Generative Modeling for Bayesian Inverse Problems in Function Spaces

Authors: Zilan Cheng, Li-Lian Wang, Zhongjian Wang

Abstract: We propose a machine-learning algorithm for Bayesian inverse problems in the function-space regime based on one-step generative transport. Building on Mean Flows, we learn a fully conditional amortized sampler with a neural-operator backbone that maps a reference Gaussian noise to approximate posterior samples. We show that while white-noise references may be admissible at fixed discretization, they become incompatible with the function-space limit, leading to instability in inference for Bayesian problems arising from PDEs. To address this issue, we adopt a prior-aligned anisotropic Gaussian reference distribution and establish the Lipschitz regularity of the resulting transport. Our method is not distilled from MCMC: training relies only on prior samples and simulated partial and noisy observations. Once trained, it generates a $64\times64$ posterior sample in $\sim 10^{-3}$s, avoiding the repeated PDE solves of MCMC while matching key posterior summaries.

cross Fold-CP: A Context Parallelism Framework for Biomolecular Modeling

Authors: Dejun Lin, Simon Chu, Vishanth Iyer, Youhan Lee, John St John, Kevin Boyd, Brian Roland, Xiaowei Ren, Guoqing Zhou, Zhonglin Cao, Polina Binder, Yuliya Zhautouskaya, Jakub Zakrzewski, Maximilian Stadler, Kyle Gion, Yuxing Peng, Xi Chen, Tianjing Zhang, Philipp Junk, Michelle Dimon, Pawe{\l} Gniewek, Fabian Ortega, McKinley Polen, Ivan Grubisic, Ali Bashir, Graham Holt, Danny Kovtun, Matthias Grass, Luca Naef, Rui Wang, Jian Peng, Anthony Costa, Saee Paliwal, Eddie Calleja, Timur Rvachov, Neha Tadimeti, Roy Tal, Emine Kucukbenli

Abstract: Understanding cellular machinery requires atomic-scale reconstruction of large biomolecular assemblies. However, predicting the structures of these systems has been constrained by hardware memory requirements of models like AlphaFold 3, imposing a practical ceiling of a few thousand residues that can be processed on a single GPU. Here we present NVIDIA BioNeMo Fold-CP, a context parallelism framework that overcomes this barrier by distributing the inference and training pipelines of co-folding models across multiple GPUs. We use the Boltz models as open source reference architectures and implement custom multidimensional primitives that efficiently parallelize both the dense triangular updates and the irregular, data-dependent pattern of window-batched local attention. Our approach achieves efficient memory scaling; for an N-token input distributed across P GPUs, per-device memory scales as $O(N^2/P)$, enabling the structure prediction of assemblies exceeding 30,000 residues on 64 NVIDIA B300 GPUs. We demonstrate the scientific utility of this approach through successful developer use cases: Fold-CP enabled the scoring of over 90% of Comprehensive Resource of Mammalian protein complexes (CORUM) database, as well as folding of disease-relevant PI4KA lipid kinase complex bound to an intrinsically disordered region without cropping. By providing a scalable pathway for modeling massive systems with full global context, Fold-CP represents a significant step toward the realization of a virtual cell.

cross SimCert: Probabilistic Certification for Behavioral Similarity in Deep Neural Network Compression

Authors: Jingyang Li, Fu Song, Guoqiang Li

Abstract: Deploying Deep Neural Networks (DNNs) on resource-constrained embedded systems requires aggressive model compression techniques like quantization and pruning. However, ensuring that the compressed model preserves the behavioral fidelity of the original design is a critical challenge in the safety-critical system design flow. Existing verification methods often lack scalability or fail to handle the architectural heterogeneity introduced by pruning. In this work, we propose SimCert, a probabilistic certification framework for verifying the behavioral similarity of compressed neural networks. Unlike worst-case analysis, SimCert provides quantitative safety guarantees with adjustable confidence levels. Our framework features: (1) A dual-network symbolic propagation method supporting both quantization and pruning; (2) A variance-aware bounding technique using Bernstein's inequality to tighten safety certificates; and (3) An automated verification toolchain. Experimental results on ACAS Xu and computer vision benchmarks demonstrate that SimCert outperforms state-of-the-art baselines.

cross Neural Networks as Local-to-Global Computations

Authors: Vicente Bosca, Robert Ghrist

Abstract: We construct a cellular sheaf from any feedforward ReLU neural network by placing one vertex for each intermediate quantity in the forward pass and encoding each computational step - affine transformation, activation, output - as a restriction map on an edge. The restricted coboundary operator on the free coordinates is unitriangular, so its determinant is $1$ and the restricted Laplacian is positive definite for every activation pattern. It follows that the relative cohomology vanishes and the forward pass output is the unique harmonic extension of the boundary data. The sheaf heat equation converges exponentially to this output despite the state-dependent switching introduced by piecewise linear activations. Unlike the forward pass, the heat equation propagates information bidirectionally across layers, enabling pinned neurons that impose constraints in both directions, training through local discrepancy minimization without a backward pass, and per-edge diagnostics that decompose network behavior by layer and operation type. We validate the framework experimentally on small synthetic tasks, confirming the convergence theorems and demonstrating that sheaf-based training, while not yet competitive with stochastic gradient descent, obeys quantitative scaling laws predicted by the theory.
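The claim that the forward-pass output is the unique minimizer of local discrepancies can be checked on a tiny ReLU network. This sketch is a crude stand-in for the sheaf heat equation (plain gradient flow on the sum of squared per-edge discrepancies, with the input pinned); the architecture and step size are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.normal(size=(3, 4))
W2 = rng.normal(size=(4, 2)) * 0.5
x = rng.normal(size=3)              # pinned boundary data (the network input)

relu = lambda z: np.maximum(z, 0.0)
target = relu(x @ W1) @ W2          # ordinary forward pass

# Free states: hidden activations h and output y, both started away
# from their forward-pass values.
h = np.zeros(4)
y = np.zeros(2)
lr = 0.05
for _ in range(5000):
    # Gradient flow on E = ||h - relu(x W1)||^2 + ||y - h W2||^2,
    # the sum of squared local (per-edge) discrepancies.
    grad_h = 2 * (h - relu(x @ W1)) + 2 * (h @ W2 - y) @ W2.T
    grad_y = 2 * (y - h @ W2)
    h -= lr * grad_h
    y -= lr * grad_y

# Relaxation converges to the forward-pass output (harmonic extension).
assert np.allclose(y, target, atol=1e-6)
```

Note how the update for `h` mixes information from both the layer below and the layer above, illustrating the bidirectional propagation the abstract contrasts with the one-way forward pass.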

cross Halfway to 3D: Ensembling 2.5D and 3D Models for Robust COVID-19 CT Diagnosis

Authors: Tuan-Anh Yang, Bao V. Q. Bui, Chanh-Quang Vo-Van, Truong-Son Hy

Abstract: We propose a deep learning framework for COVID-19 detection and disease classification from chest CT scans that integrates both 2.5D and 3D representations to capture complementary slice-level and volumetric information. The 2.5D branch processes multi-view CT slices (axial, coronal, sagittal) using a DINOv3 vision transformer to extract robust visual features, while the 3D branch employs a ResNet-18 architecture to model volumetric context and is pretrained with Variance Risk Extrapolation (VREx) followed by supervised contrastive learning to improve cross-source robustness. Predictions from both branches are combined through logit-level ensemble inference. Experiments on the PHAROS-AIF-MIH benchmark demonstrate the effectiveness of the proposed approach: for binary COVID-19 detection, the ensemble achieves 94.48% accuracy and a 0.9426 Macro F1-score, outperforming both individual models, while for multi-class disease classification the 2.5D DINOv3 model achieves the best performance with 79.35% accuracy and a 0.7497 Macro F1-score. These results highlight the benefit of combining pretrained slice-based representations with volumetric modeling for robust multi-source medical imaging analysis. Code is available at https://github.com/HySonLab/PHAROS-AIF-MIH

URLs: https://github.com/HySonLab/PHAROS-AIF-MIH

cross A Score Filter Enhanced Data Assimilation Framework for Data-Driven Dynamical Systems

Authors: Jingqiao Tang, Ryan Bausback, Feng Bao, Guannan Zhang, Phuoc-Toan Huynh

Abstract: We introduce a score-filter-enhanced data assimilation framework designed to reduce predictive uncertainty in machine learning (ML) models for data-driven dynamical system forecasting. Machine learning serves as an efficient numerical model for predicting dynamical systems. However, even with sufficient data, model uncertainty remains and accumulates over time, causing the long-term performance of ML models to deteriorate. To overcome this difficulty, we integrate data assimilation techniques into the training process to iteratively refine the model predictions by incorporating observational information. Specifically, we apply the Ensemble Score Filter (EnSF), a generative AI-based training-free diffusion model approach, for solving the data assimilation problem in high-dimensional nonlinear complex systems. This leads to a hybrid data assimilation-training framework that combines ML with EnSF to improve long-term predictive performance. We shall demonstrate that EnSF-enhanced ML can effectively reduce predictive uncertainty in ML-based Lorenz-96 system prediction and the Korteweg-De Vries (KdV) equation prediction.

cross Modeling and Benchmarking Spoken Dialogue Rewards with Modality and Colloquialness

Authors: Jingyu Lu, Yuhan Wang, Fan Zhuo, Xize Cheng, Changhao Pan, Xueyi Pu, Yifu Chen, Chenyuhao Wen, Tianle Liang, Zhou Zhao

Abstract: The rapid evolution of end-to-end spoken dialogue systems demands transcending mere textual semantics to incorporate paralinguistic nuances and the spontaneous nature of human conversation. However, current methods struggle with two critical gaps: the modality gap, involving prosody and emotion, and the colloquialness gap, distinguishing written scripts from natural speech. To address these challenges, we introduce SDiaReward, an end-to-end multi-turn reward model trained on SDiaReward-Dataset, a novel collection of episode-level preference pairs explicitly targeting these gaps. It operates directly on full multi-turn speech episodes and is optimized with pairwise preference supervision, enabling joint assessment of modality and colloquialness in a single evaluator. We further establish ESDR-Bench, a stratified benchmark for robust episode-level evaluation. Experiments demonstrate that SDiaReward achieves state-of-the-art pairwise preference accuracy, significantly outperforming general-purpose audio LLMs. Further analysis suggests that SDiaReward captures relative conversational expressiveness beyond superficial synthesis cues, improving generalization across domains and recording conditions. Code, data, and demos are available at https://sdiareward.github.io/.

URLs: https://sdiareward.github.io/.

cross Photonic Quantum-Enhanced Knowledge Distillation

Authors: Kuan-Cheng Chen, Shang Yu, Chen-Yu Liu, Samuel Yen-Chi Chen, Huan-Hsin Tseng, Yen Jui Chang, Wei-Hao Huang, Felix Burt, Esperanza Cuenca Gomez, Zohim Chandani, William Clements, Ian Walmsley, Kin K. Leung

Abstract: Photonic quantum processors naturally produce intrinsically stochastic measurement outcomes, offering a hardware-native source of structured randomness that can be exploited during machine-learning training. Here we introduce Photonic Quantum-Enhanced Knowledge Distillation (PQKD), a hybrid quantum photonic--classical framework in which a programmable photonic circuit generates a compact conditioning signal that constrains and guides a parameter-efficient student network during distillation from a high-capacity teacher. PQKD replaces fully trainable convolutional kernels with dictionary convolutions: each layer learns only a small set of shared spatial basis filters, while sample-dependent channel-mixing weights are derived from shot-limited photonic features and mapped through a fixed linear transform. Training alternates between standard gradient-based optimisation of the student and sampling-robust, gradient-free updates of photonic parameters, avoiding differentiation through photonic hardware. Across MNIST, Fashion-MNIST and CIFAR-10, PQKD traces a controllable compression--accuracy frontier, remaining close to teacher performance on simpler benchmarks under aggressive convolutional compression. Performance degrades predictably with finite sampling, consistent with shot-noise scaling, and exponential moving-average feature smoothing suppresses high-frequency shot-noise fluctuations, extending the practical operating regime at moderate shot budgets.

cross The impact of machine learning forecasting on strategic decision-making for Bike Sharing Systems

Authors: Enrico Angelelli, Andrea Mor, Carlotta Orsenigo, M. Grazia Speranza, Carlo Vercellis

Abstract: In this paper, machine learning techniques are used to forecast the difference between bike returns and withdrawals at each station of a bike sharing system. The forecasts are integrated into a simulation framework that is used to support long-term decisions and model the daily dynamics, including the relocation of bikes. We assess the quality of the machine learning-based forecasts in two ways. Firstly, we compare the forecasts with alternative prediction methods. Secondly, we analyze the impact of the forecasts on the quality of the output of the simulation framework. The evaluation is based on real-world data of the bike sharing system currently operating in Brescia, Italy.

cross Machine learning for sustainable geoenergy: uncertainty, physics and decision-ready inference

Authors: Hannah P. Menke, Ahmed H. Elsheikh, Lingli Wei, Nanzhe Wang, Andreas Busch

Abstract: Geoenergy projects (CO2 storage, geothermal, subsurface H2 generation/storage, critical minerals from subsurface fluids, or nuclear waste disposal) increasingly follow a petroleum-style funnel from screening and appraisal to operations, monitoring, and stewardship. Across this funnel, limited and heterogeneous observations must be turned into risk-bounded operational choices under strong physical and geological constraints - choices that control deployment rate, cost of capital, and the credibility of climate-mitigation claims. These choices are inherently multi-objective, balancing performance against containment, pressure footprint, induced seismicity, energy/water intensity, and long-term stewardship. We argue that progress is limited by four recurring bottlenecks: (i) scarce, biased labels and few field performance outcomes; (ii) uncertainty treated as an afterthought rather than the deliverable; (iii) weak scale-bridging from pore to basin (including coupled chemical-flow-geomechanics); and (iv) insufficient quality assurance (QA), auditability, and governance for regulator-facing deployment. We outline machine learning (ML) approaches that match these realities (hybrid physics-ML, probabilistic uncertainty quantification (UQ), structure-aware representations, and multi-fidelity/continual learning) and connect them to four anchor applications: imaging-to-process digital twins, multiphase flow and near-well conformance, monitoring and inverse problems (monitoring, measurement, and verification (MMV), including deformation and microseismicity), and basin-scale portfolio management. We close with a pragmatic agenda for benchmarks, validation, reporting standards, and policy support needed for reproducible and defensible ML in sustainable geoenergy.

cross Spectrogram features for audio and speech analysis

Authors: Ian McLoughlin, Lam Pham, Yan Song, Xiaoxiao Miao, Huy Phan, Pengfei Cai, Qing Gu, Jiang Nan, Haoyu Song, Donny Soh

Abstract: Spectrogram-based representations have grown to dominate the feature space for deep learning audio analysis systems, and are often adopted for speech analysis as well. Initially, the primary motivator for spectrogram-based representations was their ability to present sound as a two dimensional signal in the time-frequency plane, which not only provides an interpretable physical basis for analysing sound, but also unlocks the use of a wide range of machine learning techniques such as convolutional neural networks, that had been developed for image processing. A spectrogram is a matrix characterised by the resolution and span of its two dimensions, as well as by the representation and scaling of each element. Many possibilities for these three characteristics have been explored by researchers across numerous application areas, with different settings showing affinity for various tasks. This paper reviews the use of spectrogram-based representations and surveys the state-of-the-art to examine how the choice of front-end feature representation aligns with back-end classifier architecture for different tasks.
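As a concrete illustration of the characteristics the abstract names, a magnitude spectrogram can be computed directly with a windowed short-time Fourier transform; `n_fft` sets the frequency resolution, `hop` the time resolution, and the magnitude is the element representation. The parameter values here are arbitrary illustrative choices:

```python
import numpy as np

def spectrogram(x, n_fft=256, hop=128):
    """Magnitude spectrogram: a (freq bins, frames) matrix whose two
    dimensions are set by n_fft and hop."""
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win for i in range(0, len(x) - n_fft + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=1)).T

sr = 8000
t = np.arange(sr) / sr
S = spectrogram(np.sin(2 * np.pi * 440 * t))  # a pure 440 Hz tone
```

With `n_fft=256` at 8 kHz the bin spacing is 31.25 Hz, so the 440 Hz tone concentrates its energy around bin 14; log or mel scaling of the elements would be the next design choice in a typical front end.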

cross Bayesian Inference for Missing Physics

Authors: Arno Strouwen

Abstract: Model-based approaches for (bio)process systems often suffer from incomplete knowledge of the underlying physical, chemical, or biological laws. Universal differential equations, which embed neural networks within differential equations, have emerged as powerful tools to learn this missing physics from experimental data. However, neural networks are inherently opaque, motivating their post-processing via symbolic regression to obtain interpretable mathematical expressions. Genetic algorithm-based symbolic regression is a popular approach for this post-processing step, but provides only point estimates and cannot quantify the confidence we should place in a discovered equation. We address this limitation by applying Bayesian symbolic regression, which uses Reversible Jump Markov Chain Monte Carlo to sample from the posterior distribution over symbolic expression trees. This approach naturally quantifies uncertainty in the recovered model structure. We demonstrate the methodology on a Lotka-Volterra predator-prey system and then show how a well-designed experiment leads to lower uncertainty in a fed-batch bioreactor case study.

cross Masked BRep Autoencoder via Hierarchical Graph Transformer

Authors: Yifei Li, Kang Wu, Wenming Wu, Xiao-Ming Fu

Abstract: We introduce a novel self-supervised learning framework that automatically learns representations from input computer-aided design (CAD) models for downstream tasks, including part classification, modeling segmentation, and machining feature recognition. To train our network, we construct a large-scale, unlabeled dataset of boundary representation (BRep) models. The success of our algorithm relies on two key components. The first is a masked graph autoencoder that reconstructs randomly masked geometries and attributes of BReps for representation learning to enhance the generalization. The second is a hierarchical graph Transformer architecture that elegantly fuses global and local learning by a cross-scale mutual attention block to model long-range geometric dependencies and a graph neural network block to aggregate local topological information. After training the autoencoder, we replace its decoder with a task-specific network trained on a small amount of labeled data for downstream tasks. We conduct experiments on various tasks and achieve high performance, even with a small amount of labeled data, demonstrating the practicality and generalizability of our model. Compared to other methods, our model performs significantly better on downstream tasks with the same amount of training data, particularly when the training data is very limited.

cross Intelligent Control of Differential Drive Robots Subject to Unmodeled Dynamics with EKF-based State Estimation

Authors: Amos Alwala, Yuchen Hu, Gabriel da Silva Lima, Wallace Moreira Bessa

Abstract: Reliable control and state estimation of differential drive robots (DDR) operating in dynamic and uncertain environments remains a challenge, particularly when system dynamics are partially unknown and sensor measurements are prone to degradation. This work introduces a unified control and state estimation framework that combines a Lyapunov-based nonlinear controller and Adaptive Neural Networks (ANN) with Extended Kalman Filter (EKF)-based multi-sensor fusion. The proposed controller leverages the universal approximation property of neural networks to model unknown nonlinearities in real time. An online adaptation scheme updates the weights of the radial basis function (RBF), the architecture chosen for the ANN. The learned dynamics are integrated into a feedback linearization (FBL) control law, for which theoretical guarantees of closed-loop stability and asymptotic convergence in a trajectory-tracking task are established through a Lyapunov-like stability analysis. To ensure robust state estimation, the EKF fuses inertial measurement unit (IMU) and odometry from monocular, 2D-LiDAR and wheel encoders. The fused state estimate drives the intelligent controller, ensuring consistent performance even under drift, wheel slip, sensor noise and failure. Gazebo simulations and real-world experiments are done using DDR, demonstrating the effectiveness of the approach in terms of improved velocity tracking performance with reduction in linear and angular velocity errors up to $53.91\%$ and $29.0\%$ in comparison to the baseline FBL.
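
The estimation side of such a framework can be sketched as a single EKF predict-update cycle for a unicycle-model DDR: odometry-style prediction through the nonlinear motion model, then correction with a position fix. The matrices and noise levels below are illustrative stand-ins, not the paper's tuned values.

```python
import numpy as np

def ekf_step(x, P, u, z, dt, Q, R):
    """One EKF cycle. State x = [px, py, theta]; input u = [v, omega]."""
    px, py, th = x
    v, w = u
    # Predict through the nonlinear unicycle motion model
    x_pred = np.array([px + v * np.cos(th) * dt,
                       py + v * np.sin(th) * dt,
                       th + w * dt])
    F = np.array([[1.0, 0.0, -v * np.sin(th) * dt],   # motion-model Jacobian
                  [0.0, 1.0,  v * np.cos(th) * dt],
                  [0.0, 0.0,  1.0]])
    P_pred = F @ P @ F.T + Q
    # Update with a position-only measurement z = [px, py] + noise
    H = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])
    K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(3) - K @ H) @ P_pred
    return x_new, P_new

x1, P1 = ekf_step(np.zeros(3), 0.1 * np.eye(3),
                  u=np.array([1.0, 0.0]), z=np.array([0.1, 0.0]),
                  dt=0.1, Q=0.01 * np.eye(3), R=0.01 * np.eye(2))
print(x1)   # prediction confirmed by a consistent position fix
```

In the paper's setting, additional sensor streams (IMU, LiDAR odometry, wheel encoders) would each contribute their own update step of this form.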

cross Empowering Chemical Structures with Biological Insights for Scalable Phenotypic Virtual Screening

Authors: Xiaoqing Lian, Pengsen Ma, Tengfeng Ma, Zhonghao Ren, Xibao Cai, Zhixiang Cheng, Bosheng Song, He Wang, Xiang Pan, Yangyang Chen, Sisi Yuan, Chen Lin

Abstract: Motivation: The scalable identification of bioactive compounds is essential for contemporary drug discovery. This process faces a key trade-off: structural screening offers scalability but lacks biological context, whereas high-content phenotypic profiling provides deep biological insights but is resource-intensive. The primary challenge is to extract robust biological signals from noisy data and encode them into representations that do not require biological data at inference. Results: This study presents DECODE (DEcomposing Cellular Observations of Drug Effects), a framework that bridges this gap by empowering chemical representations with intrinsic biological semantics to enable structure-based in silico biological profiling. DECODE leverages limited paired transcriptomic and morphological data as supervisory signals during training, enabling the extraction of a measurement-invariant biological fingerprint from chemical structures and explicit filtering of experimental noise. Our evaluations demonstrate that DECODE retrieves functionally similar drugs in zero-shot settings with over 20% relative improvement over chemical baselines in mechanism-of-action (MOA) prediction. Furthermore, the framework achieves a 6-fold increase in hit rates for novel anti-cancer agents during external validation. Availability and implementation: The codes and datasets of DECODE are available at https://github.com/lian-xiao/DECODE.

URLs: https://github.com/lian-xiao/DECODE.

cross Consequentialist Objectives and Catastrophe

Authors: Henrik Marklund, Alex Infanger, Benjamin Van Roy

Abstract: Because human preferences are too complex to codify, AIs operate with misspecified objectives. Optimizing such objectives often produces undesirable outcomes; this phenomenon is known as reward hacking. Such outcomes are not necessarily catastrophic. Indeed, most examples of reward hacking in previous literature are benign. And typically, objectives can be modified to resolve the issue. We study the prospect of catastrophic outcomes induced by AIs operating in complex environments. We argue that, when capabilities are sufficiently advanced, pursuing a fixed consequentialist objective tends to result in catastrophic outcomes. We formalize this by establishing conditions that provably lead to such outcomes. Under these conditions, simple or random behavior is safe. Catastrophic risk arises due to extraordinary competence rather than incompetence. With a fixed consequentialist objective, avoiding catastrophe requires constraining AI capabilities. In fact, constraining capabilities the right amount not only averts catastrophe but yields valuable outcomes. Our results apply to any objective produced by modern industrial AI development pipelines.

cross Training-free Detection of Generated Videos via Spatial-Temporal Likelihoods

Authors: Omer Ben Hayun, Roy Betser, Meir Yossef Levi, Levi Kassel, Guy Gilboa

Abstract: Following major advances in text and image generation, the video domain has surged, producing highly realistic and controllable sequences. Along with this progress, these models also raise serious concerns about misinformation, making reliable detection of synthetic videos increasingly crucial. Image-based detectors are fundamentally limited because they operate per frame and ignore temporal dynamics, while supervised video detectors generalize poorly to unseen generators, a critical drawback given the rapid emergence of new models. These challenges motivate zero-shot approaches, which avoid synthetic data and instead score content against real-data statistics, enabling training-free, model-agnostic detection. We introduce \emph{STALL}, a simple, training-free, theoretically justified detector that provides likelihood-based scoring for videos, jointly modeling spatial and temporal evidence within a probabilistic framework. We evaluate STALL on two public benchmarks and introduce ComGenVid, a new benchmark with state-of-the-art generative models. STALL consistently outperforms prior image- and video-based baselines. Code and data are available at https://omerbenhayun.github.io/stall-video.

URLs: https://omerbenhayun.github.io/stall-video.
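
The zero-shot idea, scoring content against real-data statistics rather than training on synthetic examples, can be illustrated with a toy statistic (not STALL's actual likelihood): fit Gaussian statistics of temporal frame differences on real videos, then score new videos by log-likelihood under those statistics.

```python
import numpy as np

# Toy stand-in for likelihood-based zero-shot detection: real videos here are
# random walks over frames (coherent temporal dynamics), fakes are iid frames.
def fit_real_stats(videos):
    diffs = np.concatenate([np.diff(v, axis=0).ravel() for v in videos])
    return diffs.mean(), diffs.std()

def score(video, mu, sigma):
    d = np.diff(video, axis=0).ravel()
    # Higher score = frame-difference statistics closer to the real-data model
    return -0.5 * np.mean(((d - mu) / sigma) ** 2)

rng = np.random.default_rng(0)
real = [np.cumsum(rng.normal(0, 0.1, (16, 8, 8)), axis=0) for _ in range(20)]
mu, sigma = fit_real_stats(real)
real_test = np.cumsum(rng.normal(0, 0.1, (16, 8, 8)), axis=0)
fake_test = rng.normal(0, 1.0, (16, 8, 8))   # no coherent temporal dynamics
print(score(real_test, mu, sigma) > score(fake_test, mu, sigma))
```

The key property, shared with the paper's motivation, is that no synthetic training data is needed: only statistics of real content enter the detector.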

cross Interpretable Predictability-Based AI Text Detection: A Replication Study

Authors: Adam Skurla, Dominik Macko, Jakub Simko

Abstract: This paper replicates and extends the system used in the AuTexTification 2023 shared task for authorship attribution of machine-generated texts. First, we tried to reproduce the original results. Exact replication was not possible because of differences in data splits, model availability, and implementation details. Next, we tested newer multilingual language models and added 26 document-level stylometric features. We also applied SHAP analysis to examine which features influence the model's decisions. We replaced the original GPT-2 models with newer generative models such as Qwen and mGPT for computing probabilistic features. For contextual representations, we used mDeBERTa-v3-base and applied the same configuration to both English and Spanish. This allowed us to use one shared configuration for Subtask 1 and Subtask 2. Our experiments show that the additional stylometric features improve performance in both tasks and both languages. The multilingual configuration achieves the results that are comparable to or better than language-specific models. The study also shows that clear documentation is important for reliable replication and fair comparison of systems.

cross A convolutional autoencoder and neural ODE framework for surrogate modeling of transient counterflow flames

Authors: Mert Yakup Baykan, Weitao Liu, Thorsten Zirwes, Andreas Kronenburg, Hong G. Im, Dong-hyuk Shin

Abstract: A novel convolutional autoencoder neural ODE (CAE-NODE) framework is proposed for a reduced-order model (ROM) of transient 2D counterflow flames, as an extension of AE-NODE methods in homogeneous reactive systems to spatially resolved flows. The spatial correlations of the multidimensional fields are extracted by the convolutional layers, allowing CAE to autonomously construct a physically consistent 6D continuous latent manifold by compressing high-fidelity 2D snapshots (256x256 grid, 21 variables) by over 100,000 times. The NODE is subsequently trained to describe the continuous-time dynamics on the non-linear manifold, enabling the prediction of the full temporal evolution of the flames by integrating forward in time from an initial condition. The results demonstrate that the network can accurately capture the entire transient process, including ignition, flame propagation, and the gradual transition to a non-premixed condition, with relative errors less than ~2% for major species. This study, for the first time, highlights the potential of CAE-NODE for surrogate modeling of unsteady dynamics of multi-dimensional reacting flows.

cross Prompt Readiness Levels (PRL): a maturity scale and scoring framework for production grade prompt assets

Authors: Sebastien Guinard (Univ. Grenoble Alpes, CEA, DRT F-38000 Grenoble)

Abstract: Prompt engineering has become a production-critical component of generative AI systems. However, organizations still lack a shared, auditable method to qualify prompt assets against operational objectives, safety constraints, and compliance requirements. This paper introduces Prompt Readiness Levels (PRL), a nine-level maturity scale inspired by TRL, and the Prompt Readiness Score (PRS), a multidimensional scoring method with gating thresholds designed to prevent weak-link failure modes. PRL/PRS provide an original, structured methodological framework for governing prompt assets (specification, testing, traceability, security evaluation, and deployment readiness), enabling valuation of prompt engineering through reproducible qualification decisions across teams and industries.

cross Thinking in Latents: Adaptive Anchor Refinement for Implicit Reasoning in LLMs

Authors: Disha Sheshanarayana, Rajat Subhra Pal, Manjira Sinha, Tirthankar Dasgupta

Abstract: Token-level Chain-of-Thought (CoT) prompting has become a standard way to elicit multi-step reasoning in large language models (LLMs), especially for mathematical word problems. However, generating long intermediate traces increases output length and inference cost, and can be inefficient when the model could arrive at the correct answer without extensive verbalization. This has motivated latent-space reasoning approaches that shift computation into hidden representations and only emit a final answer. Yet, many latent reasoning methods depend on a fixed number of latent refinement steps at inference, adding another hyperparameter that must be tuned across models and datasets to balance accuracy and efficiency. We introduce AdaAnchor, a latent reasoning framework that performs silent iterative computation by refining a set of latent anchor vectors attached to the input. AdaAnchor further incorporates an adaptive halting mechanism that monitors anchor stability across iterations and terminates refinement once the anchor dynamics converge, allocating fewer steps to easier instances while reserving additional refinement steps for harder ones under a shared maximum-step budget. Our empirical evaluation across three mathematical word-problem benchmarks shows that AdaAnchor with adaptive halting yields accuracy gains of up to 5% over fixed-step latent refinement while reducing average latent refinement steps by 48-60% under the same maximum-step budget. Compared to standard reasoning baselines, AdaAnchor achieves large reductions in generated tokens (92-93%) by moving computation into silent latent refinement, offering a different accuracy-efficiency trade-off with substantially lower output-token usage.
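
The adaptive halting rule can be sketched as monitoring the change in the anchor vectors between refinement steps and stopping once it falls below a tolerance. The contraction map below stands in for the network's latent refinement and is purely illustrative; the variable names are not the paper's.

```python
import numpy as np

def refine_with_halting(a0, step_fn, tol=1e-4, max_steps=50):
    """Iterate a latent refinement map, halting once the anchors stabilize."""
    a = a0
    for step in range(1, max_steps + 1):
        a_next = step_fn(a)
        if np.linalg.norm(a_next - a) < tol:   # anchor dynamics converged
            return a_next, step
        a = a_next
    return a, max_steps

# Toy refinement: a contraction toward a fixed point (stand-in for the network)
target = np.array([1.0, -2.0])
step_fn = lambda a: a + 0.5 * (target - a)

easy, n_easy = refine_with_halting(np.array([1.1, -1.9]), step_fn)
hard, n_hard = refine_with_halting(np.array([10.0, 10.0]), step_fn)
print(n_easy, n_hard)   # easier instances halt in fewer steps
```

This captures the accounting behind the reported savings: easy instances exit early while hard instances use more of the shared maximum-step budget.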

cross Spatio-temporal probabilistic forecast using MMAF-guided learning

Authors: Leonardo Bardi, Imma Valentina Curato, Lorenzo Proietti

Abstract: We employ stochastic feed-forward neural networks with Gaussian-distributed weights to determine a probabilistic forecast for spatio-temporal raster datasets. The networks are trained using MMAF-guided learning, a generalized Bayesian methodology in which the observed data are preprocessed using an embedding designed to produce a low-dimensional representation that captures their dependence and causal structure. The design of the embedding is theory-guided by the assumption that a spatio-temporal Ornstein-Uhlenbeck process with finite second-order moments generates the observed data. The trained networks, in inference mode, are then used to generate ensemble forecasts by applying different initial conditions at different horizons. Experiments conducted on both synthetic and real data demonstrate that our forecasts remain calibrated across multiple time horizons. Moreover, we show that on such data, simple feed-forward architectures can achieve performance comparable to, and in some cases better than, convolutional or diffusion deep learning architectures used in probabilistic forecasting tasks.

cross Analyzing Error Sources in Global Feature Effect Estimation

Authors: Timo Hei{\ss}, Coco B\"ogel, Bernd Bischl, Giuseppe Casalicchio

Abstract: Global feature effects such as partial dependence (PD) and accumulated local effects (ALE) plots are widely used to interpret black-box models. However, they are only estimates of true underlying effects, and their reliability depends on multiple sources of error. Despite the popularity of global feature effects, these error sources are largely unexplored. In particular, the practically relevant question of whether to use training or holdout data to estimate feature effects remains unanswered. We address this gap by providing a systematic, estimator-level analysis that disentangles sources of bias and variance for PD and ALE. To this end, we derive a mean-squared-error decomposition that separates model bias, estimation bias, model variance, and estimation variance, and analyze their dependence on model characteristics, data selection, and sample size. We validate our theoretical findings through an extensive simulation study across multiple data-generating processes, learners, estimation strategies (training data, validation data, and cross-validation), and sample sizes. Our results reveal that, while using holdout data is theoretically the cleanest, potential biases arising from the training data are empirically negligible and dominated by the impact of the usually higher sample size. The estimation variance depends on both the presence of interactions and the sample size, with ALE being particularly sensitive to the latter. Cross-validation-based estimation is a promising approach that reduces the model variance component, particularly for overfitting models. Our analysis provides a principled explanation of the sources of error in feature effect estimates and offers concrete guidance on choosing estimation strategies when interpreting machine learning models.
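
A minimal Monte Carlo partial-dependence estimator makes the objects of this analysis concrete: the PD curve is estimated by clamping one feature to grid values and averaging predictions over the data, so both the data used (training vs. holdout) and its size enter the estimate. The model and data below are synthetic.

```python
import numpy as np

def partial_dependence(model, X, feature, grid):
    """PD estimate: mean prediction with `feature` clamped to each grid value."""
    pd_vals = []
    for v in grid:
        Xv = X.copy()
        Xv[:, feature] = v
        pd_vals.append(model(Xv).mean())
    return np.array(pd_vals)

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 2))
model = lambda X: 2 * X[:, 0] + X[:, 1] ** 2   # known additive ground truth
grid = np.linspace(-1, 1, 5)
pd_est = partial_dependence(model, X, 0, grid)
print(pd_est)   # ~ 2 * grid + E[x2^2], a shifted copy of the true linear effect
```

With this additive model the estimation variance comes only from the empirical average over `X`; interactions would couple it to the joint feature distribution, which is the sensitivity the paper dissects.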

cross Generative Semantic HARQ: Latent-Space Text Retransmission and Combining

Authors: Bin Han, Yulin Hu, Hans D. Schotten

Abstract: Semantic communication conveys meaning rather than raw bits, but reliability at the semantic level remains an open challenge. We propose a semantic-level hybrid automatic repeat request (HARQ) framework for text communication, in which a Transformer-variational autoencoder (VAE) codec operates as a lightweight overlay on the conventional protocol stack. The stochastic encoder inherently generates diverse latent representations across retransmissions, providing incremental knowledge (IK) from a single model without dedicated protocol design. On the receiver side, a soft quality estimator triggers retransmissions and a quality-aware combiner merges the received latent vectors within a consistent latent space. We systematically benchmark six semantic quality metrics and four soft combining strategies under hybrid semantic distortion that mixes systematic bias with additive noise. The results suggest pairing Weighted-Average or MRC-Inspired combining with self-consistency-based HARQ triggering for the best performance.
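
A quality-weighted combiner of this kind reduces, in its simplest form, to a weighted average of the latent vectors received across retransmissions. The sketch below uses inverse-variance weights as one plausible notion of "quality"; the dimensions and noise levels are illustrative, not the paper's.

```python
import numpy as np

def combine_latents(latents, qualities):
    """Quality-weighted average of latent vectors from successive retransmissions."""
    w = np.asarray(qualities, dtype=float)
    w = w / w.sum()
    return (w[:, None] * np.asarray(latents)).sum(axis=0)

rng = np.random.default_rng(1)
z_true = rng.normal(size=16)
# Three noisy retransmissions of the same latent, with known noise levels
scales = (0.5, 0.3, 0.1)
latents = [z_true + rng.normal(scale=s, size=16) for s in scales]
combined = combine_latents(latents, qualities=[1 / s**2 for s in scales])
err_comb = np.linalg.norm(combined - z_true)
err_worst = np.linalg.norm(latents[0] - z_true)
print(err_comb, err_worst)   # combining reduces error versus a noisy single copy
```

Because all retransmissions live in the same latent space, the combination is a simple vector operation rather than a protocol-level redesign, which is the overlay property the abstract emphasizes.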

cross Affordable Precision Agriculture: A Deployment-Oriented Review of Low-Cost, Low-Power Edge AI and TinyML for Resource-Constrained Farming Systems

Authors: Riya Samanta, Bidyut Saha

Abstract: Precision agriculture increasingly integrates artificial intelligence to enhance crop monitoring, irrigation management, and resource efficiency. Nevertheless, the vast majority of current systems remain cloud-based and require reliable connectivity, which hampers adoption in smaller-scale, smallholder, and developing-country farming systems. Drawing on recent literature from 2023 to 2026, this review covers deployments of Edge AI in low-cost, low-power agriculture, with a focus on the evolution and adoption of Tiny Machine Learning. A hardware-targeted, deployment-oriented analysis shows pronounced architectural variation, with microcontroller-class platforms (e.g., ESP32, STM32, ATMega) dominating the inference options, alongside single-board computers and UAV-assisted solutions. Quantitative synthesis shows that quantization is the dominant optimization strategy, applied in roughly half of the surveyed works, while structured pruning, multi-objective compression, and hardware-aware neural architecture search remain relatively under-researched. Resource profiling practices are also not uniform: while model size is occasionally reported, explicit flash, RAM, MAC, latency, and millijoule-level energy metrics are poorly documented, hampering reproducibility and cross-system comparison. Moreover, to bridge the gap between research prototypes and deployment-ready systems, the review presents a literature-informed deployment perspective in the form of a privacy-preserving layered Edge AI architecture for agriculture, synthesizing the key system-level design insights emerging from the surveyed works. Overall, the findings demonstrate a clear architectural shift toward localized inference with centralized training asymmetry.
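
Since quantization is identified as the dominant optimization strategy, a minimal sketch of symmetric per-tensor int8 quantization shows the memory saving and the bounded round-off error it introduces; the weight tensor here is synthetic.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: w ~ q * scale, q in [-127, 127]."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=1000).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = q.astype(np.float32) * scale   # dequantize for error measurement
print(q.nbytes, w.nbytes)              # int8 stores the tensor in 1/4 the bytes
print(np.abs(w - w_hat).max())         # round-off error bounded by scale / 2
```

On microcontroller-class targets the int8 tensor is used directly by the inference kernels; the float comparison above only measures the quantization error.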

cross Trustworthy Koopman Operator Learning: Invariance Diagnostics and Error Bounds

Authors: Gustav Conradie, Nicolas Boull\'e, Jean-Christophe Loiseau, Steven L. Brunton, Matthew J. Colbrook

Abstract: Koopman operator theory provides a global linear representation of nonlinear dynamics and underpins many data-driven methods. In practice, however, finite-dimensional feature spaces induced by a user-chosen dictionary are rarely invariant, so closure failures and projection errors lead to spurious eigenvalues, misleading Koopman modes, and overconfident forecasts. This paper addresses a central validation problem in data-driven Koopman methods: how to quantify invariance and projection errors for an arbitrary feature space using only snapshot data, and how to use these diagnostics to produce actionable guarantees and guide dictionary refinement? A unified a posteriori methodology is developed for certifying when a Koopman approximation is trustworthy and improving it when it is not. Koopman invariance is quantified using principal angles between a subspace and its Koopman image, yielding principal observables and a principal angle decomposition (PAD), a dynamics-informed alternative to SVD truncation with significantly improved performance. Multi-step error bounds are derived for Koopman and Perron--Frobenius mode decompositions, including RKHS-based pointwise guarantees, and are complemented by Gaussian process expected error surrogates. The resulting toolbox enables validated spectral analysis, certified forecasting, and principled dictionary and kernel learning, demonstrated on chaotic and high-dimensional benchmarks and real-world datasets, including cavity flow and the Pluto--Charon system.
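
The invariance diagnostic at the core of the methodology, principal angles between a subspace and its image under the dynamics, can be computed from snapshot data with a QR factorization and an SVD. The map and subspaces below are toy examples, not the paper's benchmarks.

```python
import numpy as np

def principal_angles(A, B):
    """Principal angles (radians) between the column spaces of A and B."""
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.arccos(np.clip(s, -1.0, 1.0))

# A subspace invariant under the dynamics has zero angles with its image
A = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])   # span{e1, e2}
M = np.diag([2.0, 3.0, 5.0])                          # leaves span{e1, e2} invariant
inv_angles = principal_angles(A, M @ A)
print(inv_angles)                                     # ~ [0, 0]: perfect closure

B = np.array([[1.0], [0.0], [0.0]])
C = np.array([[0.0], [1.0], [0.0]])
print(principal_angles(B, C))                         # [pi/2]: maximal failure
```

Large angles flag the closure failures that produce spurious eigenvalues, which is what motivates the paper's dictionary refinement.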

cross Storage and selection of multiple chaotic attractors in minimal reservoir computers

Authors: Francesco Martinuzzi, Holger Kantz

Abstract: Modern predictive modeling increasingly calls for a single learned dynamical substrate to operate across multiple regimes. From a dynamical-systems viewpoint, this capability decomposes into the storage of multiple attractors and the selection of the appropriate attractor in response to contextual cues. In reservoir computing (RC), multi-attractor learning has largely been pursued using large, randomly wired reservoirs, on the assumption that stochastic connectivity is required to generate sufficiently rich internal dynamics. At the same time, recent work shows that minimal deterministic reservoirs can match random designs for single-system chaotic forecasting. Under which conditions can minimal topologies learn multiple chaotic attractors? In this paper, we find that minimal architectures can successfully store multiple chaotic attractors. However, these same architectures struggle with task switching, in which the system must transition between attractors in response to external cues. We test storage and selection on all 28 unordered system pairs formed from eight three-dimensional chaotic systems. We do not observe a robust dependence of multi-attractor performance on reservoir topology. Over the ten topologies investigated, we find that no single one consistently outperforms the others for either storage or cue-dependent selection. Our results suggest that while minimal substrates possess the representational capacity to model coexisting attractors, they may lack the robust temporal memory required for cued transitions.

cross HindSight: Evaluating LLM-Generated Research Ideas via Future Impact

Authors: Bo Jiang

Abstract: Evaluating AI-generated research ideas typically relies on LLM judges or human panels -- both subjective and disconnected from actual research impact. We introduce HindSight, a time-split evaluation framework that measures idea quality by matching generated ideas against real future publications and scoring them by citation impact and venue acceptance. Using a temporal cutoff~$T$, we restrict an idea generation system to pre-$T$ literature, then evaluate its outputs against papers published in the subsequent 30 months. Experiments across 10 AI/ML research topics reveal a striking disconnect: LLM-as-Judge finds no significant difference between retrieval-augmented and vanilla idea generation ($p{=}0.584$), while HindSight shows the retrieval-augmented system produces 2.5$\times$ higher-scoring ideas ($p{<}0.001$). Moreover, HindSight scores are \emph{negatively} correlated with LLM-judged novelty ($\rho{=}{-}0.29$, $p{<}0.01$), suggesting that LLMs systematically overvalue novel-sounding ideas that never materialize in real research.

cross Sequential Transport for Causal Mediation Analysis

Authors: Agathe Fernandes-Machado, Iryna Voitsitska, Arthur Charpentier, Ewen Gallic

Abstract: We propose sequential transport (ST), a distributional framework for mediation analysis that combines optimal transport (OT) with a mediator directed acyclic graph (DAG). Instead of relying on cross-world counterfactual assumptions, ST constructs unit-level mediator counterfactuals by minimally transporting each mediator, either marginally or conditionally, toward its distribution under an alternative treatment while preserving the causal dependencies encoded by the DAG. For numerical mediators, ST uses monotone (conditional) OT maps based on conditional CDF/quantile estimators; for categorical mediators, it extends naturally via simplex-based transport. We establish consistency of the estimated transport maps and of the induced unit-level decompositions into mutatis mutandis direct and indirect effects under standard regularity and support conditions. When the treatment is randomized or ignorable (possibly conditional on covariates), these decompositions admit a causal interpretation; otherwise, they provide a principled distributional attribution of differences between groups aligned with the mediator structure. Gaussian examples show that ST recovers classical mediation formulas, while additional simulations confirm good performance in nonlinear and mixed-type settings. An application to the COMPAS dataset illustrates how ST yields deterministic, DAG-consistent counterfactual mediators and a fine-grained mediator-level attribution of disparities.
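
For a scalar mediator, the monotone OT map described above is the classical composition of CDF and quantile function, F_target^{-1}(F_source(x)), which can be estimated empirically. The Gaussian example below is illustrative (where the true map is a shift by the mean difference), not the paper's application.

```python
import numpy as np

def monotone_transport(x, source, target):
    """Empirical monotone OT map on R: F_target^{-1}(F_source(x))."""
    ranks = np.searchsorted(np.sort(source), x, side="right") / len(source)
    return np.quantile(target, np.clip(ranks, 0.0, 1.0))

rng = np.random.default_rng(0)
src = rng.normal(0, 1, 5000)   # mediator distribution under treatment 0
tgt = rng.normal(2, 1, 5000)   # mediator distribution under treatment 1
x = np.array([-1.0, 0.0, 1.0])
y = monotone_transport(x, src, tgt)
print(y)   # ~ x + 2: the monotone map between N(0,1) and N(2,1) is a shift
```

The conditional version in the paper applies the same construction within covariate strata, preserving the dependencies encoded by the mediator DAG.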

cross Token Coherence: Adapting MESI Cache Protocols to Minimize Synchronization Overhead in Multi-Agent LLM Systems

Authors: Vladyslav Parakhin

Abstract: Multi-agent LLM orchestration incurs synchronization costs scaling as O(n x S x |D|) in agents, steps, and artifact size under naive broadcast -- a regime I term broadcast-induced triply-multiplicative overhead. I argue this pathology is a structural residue of full-state rebroadcast, not an inherent property of multi-agent coordination. The central claim: synchronization cost explosion in LLM multi-agent systems maps with formal precision onto the cache coherence problem in shared-memory multiprocessors, and MESI-protocol invalidation transfers to artifact synchronization under minimal structural modification. I construct the Artifact Coherence System (ACS) and prove the Token Coherence Theorem: lazy invalidation attenuates cost by at least S/(n + W(d_i)) when S > n + W(d_i), converting O(n x S x |D|) to O((n + W) x |D|). A TLA+-verified protocol enforces single-writer safety, monotonic versioning, and bounded staleness across ~2,400 explored states. Simulation across four workload configurations yields token savings of 95.0% +/- 1.3% at V=0.05, 92.3% +/- 1.4% at V=0.10, 88.3% +/- 1.5% at V=0.25, and 84.2% +/- 1.3% at V=0.50 -- each exceeding the theorem's conservative lower bounds. Savings of ~81% persist at V=0.9, contrary to the predicted collapse threshold. Contributions: (1) formal MESI-to-artifact state mapping; (2) Token Coherence Theorem as savings lower bound; (3) TLA+-verified protocol with three proven invariants; (4) characterization of conditional artifact access semantics resolving the always-read objection; (5) reference Python implementation integrating with LangGraph, CrewAI, and AutoGen via thin adapter layers.
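
The claimed attenuation can be illustrated with a toy cost model (an assumed setup, not the paper's simulator or theorem): n agents over S steps share one artifact of |D| tokens; under lazy invalidation an agent pays |D| tokens only when it actually reads a stale artifact, and otherwise pays one token for the invalidation notice.

```python
def broadcast_cost(n, S, D):
    """Naive full-state rebroadcast: every agent receives |D| tokens every step."""
    return n * S * D

def lazy_cost(n, S, D, reads_per_step):
    """Invalidation notices each step, full transfer only on actual stale reads."""
    return S * (n + reads_per_step * D)

n, S, D = 8, 100, 2000
naive = broadcast_cost(n, S, D)
lazy = lazy_cost(n, S, D, reads_per_step=1)
savings = 1 - lazy / naive
print(naive, lazy, round(savings, 3))   # lazy invalidation avoids most traffic
```

The triply-multiplicative term n x S x |D| collapses to S x (n + reads x |D|), matching the structure (though not the exact constants) of the bound in the abstract.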

cross The Sampling Complexity of Condorcet Winner Identification in Dueling Bandits

Authors: El Mehdi Saad (CC-UM6P-Rabat), Victor Thuot (MISTEA), Nicolas Verzelen (MISTEA)

Abstract: We study best-arm identification in stochastic dueling bandits under the sole assumption that a Condorcet winner exists, i.e., an arm that wins each noisy pairwise comparison with probability at least $1/2$. We introduce a new identification procedure that exploits the full gap matrix $\Delta_{i,j}=q_{i,j}-\tfrac12$ (where $q_{i,j}$ is the probability that arm $i$ beats arm $j$), rather than only the gaps between the Condorcet winner and the other arms. We derive high-probability, instance-dependent sample-complexity guarantees that (up to logarithmic factors) improve the best known ones by leveraging informative comparisons beyond those involving the winner. We complement these results with new lower bounds which, to our knowledge, are the first for Condorcet-winner identification in stochastic dueling bandits. Our lower-bound analysis isolates the intrinsic cost of locating informative entries in the gap matrix and estimating them to the required confidence, establishing the optimality of our non-asymptotic bounds. Overall, our results reveal new regimes and trade-offs in the sample complexity that are not captured by asymptotic analyses based only on the expected budget.
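
A naive baseline makes the problem setup concrete: sample every duel a fixed number of times and return the arm that empirically beats all others. The preference matrix below is illustrative; the paper's contribution is precisely to beat this uniform-sampling strategy by exploiting the full gap matrix.

```python
import numpy as np

def estimate_condorcet(q, n_samples, rng):
    """Sample each duel n_samples times; return the arm that beats all others."""
    K = q.shape[0]
    wins = np.zeros((K, K))
    for i in range(K):
        for j in range(i + 1, K):
            w = rng.binomial(n_samples, q[i, j])   # noisy pairwise comparisons
            wins[i, j], wins[j, i] = w, n_samples - w
    beats = wins > n_samples / 2
    for i in range(K):
        if all(beats[i, j] for j in range(K) if j != i):
            return i
    return None   # no empirical Condorcet winner found

# Arm 0 is the Condorcet winner: it beats every other arm with prob >= 0.65
q = np.array([[0.50, 0.65, 0.70],
              [0.35, 0.50, 0.55],
              [0.30, 0.45, 0.50]])
rng = np.random.default_rng(0)
winner = estimate_condorcet(q, 500, rng)
print(winner)
```

An adaptive procedure of the kind studied in the paper would instead allocate comparisons where the gaps Delta_{i,j} are most informative, rather than uniformly.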

cross Geometric framework for biological evolution

Authors: Vitaly Vanchurin

Abstract: We develop a generally covariant description of evolutionary dynamics that operates consistently in both genotype and phenotype spaces. We show that the maximum entropy principle yields a fundamental identification between the inverse metric tensor and the covariance matrix, revealing the Lande equation as a covariant gradient ascent equation. This demonstrates that evolution can be modeled as a learning process on the fitness landscape, with the specific learning algorithm determined by the functional relation between the metric tensor and the noise covariance arising from microscopic dynamics. While the metric (or the inverse genotypic covariance matrix) has been extensively characterized empirically, the noise covariance and its associated observable (the covariance of evolutionary changes) have never been directly measured. This poses the experimental challenge of determining the functional form relating metric to noise covariance.
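
The reading of the Lande equation as covariant gradient ascent can be sketched numerically: the trait change is the genetic covariance matrix applied to the fitness gradient, delta z = G grad log W(z), with G playing the role of the inverse metric. The landscape and covariance matrix below are illustrative, not empirical values.

```python
import numpy as np

# Lande update: trait mean moves along the covariance-preconditioned gradient
def lande_step(z, G, grad_log_w):
    return z + G @ grad_log_w(z)

# Quadratic log-fitness peaked at z* = [1, 2] (illustrative landscape)
z_star = np.array([1.0, 2.0])
grad_log_w = lambda z: -0.1 * (z - z_star)
G = np.array([[0.5, 0.1],    # genetic covariance matrix (inverse metric)
              [0.1, 0.3]])

z = np.zeros(2)
for _ in range(300):
    z = lande_step(z, G, grad_log_w)
print(z)   # converges toward the fitness peak [1, 2]
```

The learning-process analogy in the abstract corresponds to the choice of preconditioner: a different functional relation between metric and noise covariance would yield a different "learning algorithm" on the same landscape.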

cross Modeling Matches as Language: A Generative Transformer Approach for Counterfactual Player Valuation in Football

Authors: Miru Hong, Minho Lee, Geonhee Jo, Hyeokje Jo, Pascal Bauer, Sang-Ki Ko

Abstract: Evaluating football player transfers is challenging because player actions depend strongly on tactical systems, teammates, and match context. Despite this complexity, recruitment decisions often rely on static statistics and subjective expert judgment, which do not fully account for these contextual factors. This limitation stems largely from the absence of counterfactual simulation mechanisms capable of predicting outcomes in hypothetical scenarios. To address these challenges, we propose ScoutGPT, a generative model that treats football match events as sequential tokens within a language modeling framework. Utilizing a NanoGPT-based Transformer architecture trained on next-token prediction, ScoutGPT learns the dynamics of match event sequences to simulate event sequences under hypothetical lineups, demonstrating superior predictive performance compared to existing baseline models. Leveraging this capability, the model employs Monte Carlo sampling to enable counterfactual simulation, allowing for the assessment of unobserved scenarios. Experiments on K League data show that simulated player transfers lead to measurable changes in offensive progression and goal probabilities, indicating that ScoutGPT captures player-specific impact beyond traditional static metrics.

cross A proof-of-concept for automated AI-driven stellarator coil optimization with in-the-loop finite-element calculations

Authors: Alan A. Kaptanoglu, Pedro F. Gil

Abstract: Finding feasible coils for stellarator fusion devices is a critical challenge in realizing this concept for future power plants. Years of research work can go into the design of even a single reactor-scale stellarator. To rapidly speed up and automate the workflow of designing stellarator coils, we have built an end-to-end ``runner'' for performing stellarator coil optimization. All pre- and post-processing steps have been automated; the user specifies only a few basic input parameters, and final coil solutions are updated on an open-source leaderboard. Two policies are available for performing non-stop automated coil optimizations: a genetic algorithm or a context-aware LLM. Lastly, we construct a novel in-the-loop optimization of Von Mises stresses in the coils, opening up important future capabilities for in-the-loop finite-element calculations.

cross IConE: Batch Independent Collapse Prevention for Self-Supervised Representation Learning

Authors: Konstantinos Almpanakis, Anna Kreshuk

Abstract: Self-supervised learning (SSL) has revolutionized representation learning, with Joint-Embedding Architectures (JEAs) emerging as an effective approach for capturing semantic features. Existing JEAs rely on implicit or explicit batch interaction -- via negative sampling or statistical regularization -- to prevent representation collapse. This reliance becomes problematic in regimes where batch sizes must be small, such as high-dimensional scientific data, where memory constraints and class imbalance make large, well-balanced batches infeasible. We introduce IConE (Instance-Contrasted Embeddings), a framework that decouples collapse prevention from the training batch size. Rather than enforcing diversity through batch statistics, IConE maintains a global set of learnable auxiliary instance embeddings regularized by an explicit diversity objective. This transfers the anti-collapse mechanism from the transient batch to a dataset-level embedding space, allowing stable training even when batch statistics are unreliable, down to batch size 1. Across diverse 2D and 3D biomedical modalities, IConE outperforms strong contrastive and non-contrastive baselines throughout the small-batch regime (from B=1 to B=64) and demonstrates marked robustness to severe class imbalance. Geometric analysis shows that IConE preserves high intrinsic dimensionality in the learned representations, preventing the collapse observed in existing JEAs as batch sizes shrink.
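The core decoupling idea, an anti-collapse regularizer computed on a global embedding bank rather than on batch statistics, can be sketched with a simple diversity penalty. This is an illustrative stand-in, not the paper's actual objective; `diversity_penalty` and the bank contents are hypothetical.

```python
import math

def diversity_penalty(bank):
    """Mean squared cosine similarity over all pairs in a global bank of
    auxiliary embeddings. Minimizing it spreads the bank out, and because
    it depends only on the bank -- not on the training batch -- it remains
    well defined even at batch size 1. (Illustrative stand-in for IConE's
    actual diversity objective.)"""
    def cos(u, v):
        nu = math.sqrt(sum(x * x for x in u))
        nv = math.sqrt(sum(x * x for x in v))
        return sum(a * b for a, b in zip(u, v)) / (nu * nv)

    n = len(bank)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(cos(bank[i], bank[j]) ** 2 for i, j in pairs) / len(pairs)
```

An orthogonal bank incurs zero penalty, while a collapsed (parallel) bank incurs the maximum penalty of 1, which is the sense in which the anti-collapse pressure lives in the bank rather than in the transient batch.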

cross Scalable Simulation-Based Model Inference with Test-Time Complexity Control

Authors: Manuel Gloeckler, J. P. Manzano-Patr\'on, Stamatios N. Sotiropoulos, Cornelius Schr\"oder, Jakob H. Macke

Abstract: Simulation plays a central role in scientific discovery. In many applications, the bottleneck is no longer running a simulator; it is choosing among large families of plausible simulators, each corresponding to different forward models/hypotheses consistent with observations. Over large model families, classical Bayesian workflows for model selection are impractical. Furthermore, amortized model selection methods typically hard-code a fixed model prior or complexity penalty at training time, requiring users to commit to a particular parsimony assumption before seeing the data. We introduce PRISM, a simulation-based encoder-decoder that infers a joint posterior over both discrete model structures and associated continuous parameters, while enabling test-time control of model complexity via a tunable model prior that the network is conditioned on. We show that PRISM scales to families with combinatorially many model instantiations (up to billions) on a synthetic symbolic regression task. As a scientific application, we evaluate PRISM on biophysical modeling for diffusion MRI data, showing the ability to perform model selection across several multi-compartment models, on both synthetic and in vivo neuroimaging data.

cross A scaled TW-PINN: A physics-informed neural network for traveling wave solutions of reaction-diffusion equations with general coefficients

Authors: Seungwan Han, Kwanghyuk Park, Jiaxi Gu, Jae-Hun Jung

Abstract: We propose an efficient and generalizable physics-informed neural network (PINN) framework for computing traveling wave solutions of $n$-dimensional reaction-diffusion equations with various reaction and diffusion coefficients. By applying a scaling transformation with the traveling wave form, the original problem is reduced to a one-dimensional scaled reaction-diffusion equation with unit reaction and diffusion coefficients. This reduction leads to the proposed framework, termed scaled TW-PINN, in which a single PINN solver trained on the scaled equation is reused for different coefficient choices and spatial dimensions. We also prove a universal approximation property of the proposed PINN solver for traveling wave solutions. Numerical experiments in one and two dimensions, together with a comparison to the existing wave-PINN method, demonstrate the accuracy, flexibility, and superior performance of scaled TW-PINN. Finally, we explore an extension of the framework to Fisher's equation with general initial conditions.

cross Active Seriation: Efficient Ordering Recovery with Statistical Guarantees

Authors: James Cheshire, Yann Issartel

Abstract: Active seriation aims at recovering an unknown ordering of $n$ items by adaptively querying pairwise similarities. The observations are noisy measurements of entries of an underlying $n \times n$ permuted Robinson matrix, whose permutation encodes the latent ordering. The framework allows the algorithm to start with partial information on the latent ordering, including seriation from scratch as a special case. We propose an active seriation algorithm that provably recovers the latent ordering with high probability. Under a uniform separation condition on the similarity matrix, optimal performance guarantees are established, both in terms of the probability of error and the number of observations required for successful recovery.

cross Controlled Langevin Dynamics for Sampling of Feedforward Neural Networks Trained with Minibatches

Authors: Alessandro Zambon, Francesca Caruso, Riccardo Zecchina, Guido Tiana

Abstract: Sampling the parameter space of artificial neural networks according to a Boltzmann distribution provides insight into the geometry of low-loss solutions and offers an alternative to conventional loss minimization for training. However, exact sampling methods such as hybrid Monte Carlo (hMC), while formally correct, become computationally prohibitive for realistic datasets because they require repeated evaluation of full-batch gradients. We introduce a pseudo-Langevin (pL) dynamics that enables efficient Boltzmann sampling of feed-forward neural networks trained with large datasets by using minibatches in a controlled manner. The method exploits the statistical properties of minibatch gradient noise and adjusts fictitious masses and friction coefficients to ensure that the induced stochastic process efficiently samples the desired equilibrium distribution. We validate the approach numerically by comparing its equilibrium statistics with those obtained from exact hMC sampling. Performance benchmarks demonstrate that, while hMC rapidly becomes inefficient as network size increases, the pL scheme maintains high computational diffusion and scales favorably to networks with over one million parameters. Finally, we show that sampling at intermediate temperatures yields optimal generalization performance, comparable to SGD, without requiring a validation set or early stopping procedure. These results establish controlled minibatch Langevin dynamics as a practical and scalable tool for exploring and exploiting the solution space of large neural networks.
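The key ingredient, absorbing minibatch gradient noise into the thermostat so the chain still targets the Boltzmann distribution, can be sketched on a 1-D quadratic loss. The sketch below uses a standard SGHMC-style noise correction rather than the paper's exact controlled scheme, and all names and constants are illustrative.

```python
import math
import random

# Toy sketch: underdamped Langevin dynamics on L(x) = x^2 / 2 with
# artificial "minibatch" gradient noise. The injected thermostat noise is
# reduced so that total per-step noise matches 2*gamma*T*dt, keeping the
# chain targeting exp(-L(x)/T) (up to O(dt) discretization error).
def pseudo_langevin_samples(n_steps=200_000, dt=0.05, gamma=1.0,
                            temp=1.0, grad_noise_std=0.5, seed=0):
    rng = random.Random(seed)
    x, v = 0.0, 0.0
    # Injected-noise variance, corrected for the minibatch gradient noise
    # that enters the velocity update with variance (grad_noise_std*dt)^2.
    inj_var = 2.0 * gamma * temp * dt - (grad_noise_std * dt) ** 2
    samples = []
    for _ in range(n_steps):
        grad = x + grad_noise_std * rng.gauss(0.0, 1.0)  # noisy minibatch gradient
        v += -grad * dt - gamma * v * dt + rng.gauss(0.0, math.sqrt(inj_var))
        x += v * dt
        samples.append(x)
    return samples

samples = pseudo_langevin_samples()
burn = samples[50_000:]                       # discard burn-in
var = sum(s * s for s in burn) / len(burn)    # should be close to temp = 1
```

For the quadratic loss the Boltzmann marginal of `x` is Gaussian with variance equal to the temperature, so the empirical variance after burn-in is a direct check that the minibatch noise has been compensated correctly.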

cross Persistence Spheres: a Bi-continuous Linear Representation of Measures for Partial Optimal Transport

Authors: Matteo Pegoraro

Abstract: We improve and extend persistence spheres, introduced in~\cite{pegoraro2025persistence}. Persistence spheres map an integrable measure $\mu$ on the upper half-plane, including persistence diagrams (PDs) as counting measures, to a function $S(\mu)\in C(\mathbb{S}^2)$, and the map is stable with respect to 1-Wasserstein partial transport distance $\mathrm{POT}_1$. Moreover, to the best of our knowledge, persistence spheres are the first explicit representation used in topological machine learning for which continuity of the inverse on the image is established at every compactly supported target. Recent bounded-cardinality bi-Lipschitz embedding results in partial transport spaces, despite being powerful, are not given by the kind of explicit summary map considered here. Our construction is rooted in convex geometry: for positive measures, the defining ReLU integral is the support function of the lift zonoid. Building on~\cite{pegoraro2025persistence}, we refine the definition to better match the $\mathrm{POT}_1$ deletion mechanism, encoding partial transport via a signed diagonal augmentation. In particular, for integrable $\mu$, the uniform norm between $S(0)$ and $S(\mu)$ depends only on the persistence of $\mu$, without any need of ad-hoc re-weightings, reflecting optimal transport to the diagonal at persistence cost. This yields a parameter-free representation at the level of measures (up to numerical discretization), while accommodating future extensions where $\mu$ is a smoothed measure derived from PDs (e.g., persistence intensity functions~\citep{wu2024estimation}). Across clustering, regression, and classification tasks involving functional data, time series, graphs, meshes, and point clouds, the updated persistence spheres are competitive and often improve upon persistence images, persistence landscapes, persistence splines, and sliced Wasserstein kernel baselines.

cross TrinityGuard: A Unified Framework for Safeguarding Multi-Agent Systems

Authors: Kai Wang, Biaojie Zeng, Zeming Wei, Chang Jin, Hefeng Zhou, Xiangtian Li, Chao Yang, Jingjing Qu, Xingcheng Xu, Xia Hu

Abstract: With the rapid development of LLM-based multi-agent systems (MAS), significant safety and security concerns have emerged, introducing novel risks that go beyond single agents or LLMs. Despite attempts to address these issues, the existing literature lacks a cohesive safeguarding system specialized for MAS risks. In this work, we introduce TrinityGuard, a comprehensive safety evaluation and monitoring framework for LLM-based MAS, grounded in the OWASP standards. Specifically, TrinityGuard encompasses a three-tier fine-grained risk taxonomy that identifies 20 risk types, covering single-agent vulnerabilities, inter-agent communication threats, and system-level emergent hazards. Designed for scalability across various MAS structures and platforms, TrinityGuard is organized in a trinity manner, involving an MAS abstraction layer that can be adapted to any MAS structure, an evaluation layer containing risk-specific test modules, and runtime monitor agents coordinated by a unified LLM Judge Factory. During evaluation, TrinityGuard executes curated attack probes to generate detailed vulnerability reports for each risk type, while monitor agents analyze structured execution traces and issue real-time alerts, enabling both pre-development evaluation and runtime monitoring. We further formalize these safety metrics and present detailed case studies across various representative MAS examples, showcasing the versatility and reliability of TrinityGuard. Overall, TrinityGuard acts as a comprehensive framework for evaluating and monitoring various risks in MAS, paving the way for further research into their safety and security.

cross A Hybrid Modeling Framework for Crop Prediction Tasks via Dynamic Parameter Calibration and Multi-Task Learning

Authors: William Solow, Paola Pesantez-Cabrera, Markus Keller, Lav Khot, Sandhya Saisubramanian, Alan Fern

Abstract: Accurate prediction of crop states (e.g., phenology stages and cold hardiness) is essential for timely farm management decisions such as irrigation, fertilization, and canopy management to optimize crop yield and quality. While traditional biophysical models can be used for season-long predictions, they lack the precision required for site-specific management. Deep learning methods are a compelling alternative, but can produce biologically unrealistic predictions and require large-scale data. We propose a \emph{hybrid modeling} approach that uses a neural network to parameterize a differentiable biophysical model and leverages multi-task learning for efficient data sharing across crop cultivars in data limited settings. By predicting the \emph{parameters} of the biophysical model, our approach improves the prediction accuracy while preserving biological realism. Empirical evaluation using real-world and synthetic datasets demonstrates that our method improves prediction accuracy by 60\% for phenology and 40\% for cold hardiness compared to deployed biophysical models.

cross Deep Reinforcement Learning for Fano Hypersurfaces

Authors: Marc Truter

Abstract: We design a deep reinforcement learning algorithm to explore a high-dimensional integer lattice with sparse rewards, training a feedforward neural network as a dynamic search heuristic to steer exploration toward reward dense regions. We apply this to the discovery of Fano 4-fold hypersurfaces with terminal singularities, objects of central importance in algebraic geometry. Fano varieties with terminal singularities are fundamental building blocks of algebraic varieties, and explicit examples serve as a vital testing ground for the development and generalisation of theory. Despite decades of effort, the combinatorial intractability of the underlying search space has left this classification severely incomplete. Our reinforcement learning approach yields thousands of previously unknown examples, hundreds of which we show are inaccessible to known search methods.

cross Seeing Beyond: Extrapolative Domain Adaptive Panoramic Segmentation

Authors: Yuanfan Zheng, Kunyu Peng, Xu Zheng, Kailun Yang

Abstract: Cross-domain panoramic semantic segmentation has attracted growing interest as it enables comprehensive 360{\deg} scene understanding for real-world applications. However, it remains particularly challenging due to severe geometric Field of View (FoV) distortions and inconsistent open-set semantics across domains. In this work, we formulate an open-set domain adaptation setting, and propose Extrapolative Domain Adaptive Panoramic Segmentation (EDA-PSeg) framework that trains on local perspective views and tests on full 360{\deg} panoramic images, explicitly tackling both geometric FoV shifts across domains and semantic uncertainty arising from previously unseen classes. To this end, we propose the Euler-Margin Attention (EMA), which introduces an angular margin to enhance viewpoint-invariant semantic representation, while performing amplitude and phase modulation to improve generalization toward unseen classes. Additionally, we design the Graph Matching Adapter (GMA), which builds high-order graph relations to align shared semantics across FoV shifts while effectively separating novel categories through structural adaptation. Extensive experiments on four benchmark datasets under camera-shift, weather-condition, and open-set scenarios demonstrate that EDA-PSeg achieves state-of-the-art performance, robust generalization to diverse viewing geometries, and resilience under varying environmental conditions. The code is available at https://github.com/zyfone/EDA-PSeg.

URLs: https://github.com/zyfone/EDA-PSeg.

cross Understanding Reasoning in LLMs through Strategic Information Allocation under Uncertainty

Authors: Jeonghye Kim, Xufang Luo, Minbeom Kim, Sangmook Lee, Dongsheng Li, Yuqing Yang

Abstract: LLMs often exhibit Aha moments during reasoning, such as apparent self-correction following tokens like "Wait," yet their underlying mechanisms remain unclear. We introduce an information-theoretic framework that decomposes reasoning into procedural information and epistemic verbalization - the explicit externalization of uncertainty that supports downstream control actions. We show that purely procedural reasoning can become informationally stagnant, whereas epistemic verbalization enables continued information acquisition and is critical for achieving information sufficiency. Empirical results demonstrate that strong reasoning performance is driven by uncertainty externalization rather than specific surface tokens. Our framework unifies prior findings on Aha moments and post-training experiments, and offers insights for future reasoning model design.

cross DOT: Dynamic Knob Selection and Online Sampling for Automated Database Tuning

Authors: Yifan Wang, Debabrota Basu, Pierre Bourhis, Romain Rouvoy, Patrick Royer

Abstract: Database Management Systems (DBMS) are crucial for efficient data management and access control, but their administration remains challenging for Database Administrators (DBAs). Tuning, in particular, is known to be difficult. Modern systems have many tuning parameters, but only a subset significantly impacts performance. Focusing on these influential parameters reduces the search space and optimizes performance. Current methods rely on costly warm-up phases and human expertise to identify important tuning parameters. In this paper, we present DOT, a dynamic knob selection and online sampling DBMS tuning algorithm. DOT uses Recursive Feature Elimination with Cross-Validation (RFECV) to prune low-importance tuning parameters and a Likelihood Ratio Test (LRT) strategy to balance exploration and exploitation. For parameter search, DOT uses a Bayesian Optimization (BO) algorithm to optimize configurations on-the-fly, eliminating the need for warm-up phases or prior knowledge (although existing knowledge can be incorporated). Experiments show that DOT matches or outperforms state-of-the-art tuners while substantially reducing tuning overhead.

cross Self-Distillation of Hidden Layers for Self-Supervised Representation Learning

Authors: Scott C. Lowe, Anthony Fuller, Sageev Oore, Evan Shelhamer, Graham W. Taylor

Abstract: The landscape of self-supervised learning (SSL) is currently dominated by generative approaches (e.g., MAE) that reconstruct raw low-level data, and predictive approaches (e.g., I-JEPA) that predict high-level abstract embeddings. While generative methods provide strong grounding, they are computationally inefficient for high-redundancy modalities like imagery, and their training objective does not prioritize learning high-level, conceptual features. Conversely, predictive methods often suffer from training instability due to their reliance on the non-stationary targets of final-layer self-distillation. We introduce Bootleg, a method that bridges this divide by tasking the model with predicting latent representations from multiple hidden layers of a teacher network. This hierarchical objective forces the model to capture features at varying levels of abstraction simultaneously. We demonstrate that Bootleg significantly outperforms comparable baselines (+10% over I-JEPA) on classification of ImageNet-1K and iNaturalist-21, and semantic segmentation of ADE20K and Cityscapes.

cross Estimating Staged Event Tree Models via Hierarchical Clustering on the Simplex

Authors: Muhammad Shoaib, Eva Riccomagno, Manuele Leonelli, Gherardo Varando

Abstract: Staged tree models enhance Bayesian networks by incorporating context-specific dependencies through a stage-based structure. In this study, we present a new framework for estimating staged trees using hierarchical clustering on the probability simplex, utilizing simplex-based divergences. We conduct a thorough evaluation of several distance and divergence metrics, including Total Variation, Hellinger, Fisher, and Kaniadakis, alongside various linkage methods such as Ward.D2, average, complete, and McQuitty. Our simulation experiments reveal that Total Variation, especially when combined with Ward.D2 linkage, consistently produces staged trees with better model fit, structure recovery, and computational efficiency. We assess performance using the relative Bayesian Information Criterion (BIC) and Hamming distance. Our findings indicate that although Backward Hill Climbing (BHC) delivers competitive outcomes, it incurs a significantly higher computational cost. On the other hand, Total Variation divergence with Ward.D2 linkage achieves similar performance with significantly better computational efficiency, making it a more viable option for large-scale or time-sensitive tasks.
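The simplex-side machinery, Total Variation distance between conditional distributions plus an agglomerative merge of tree leaves into stages, can be sketched as follows. This is a simplified illustration, not the paper's estimator; the greedy average-linkage merge and the threshold are our assumptions.

```python
def total_variation(p, q):
    """TV distance between two discrete distributions on the simplex."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def avg_linkage(c1, c2):
    """Average pairwise TV distance between two clusters."""
    return sum(total_variation(p, q) for p in c1 for q in c2) / (len(c1) * len(c2))

def merge_stages(dists, threshold):
    """Greedy average-linkage agglomeration: pool leaves whose conditional
    distributions are within `threshold` in TV distance into a common
    stage (simplified sketch, not the paper's exact procedure)."""
    clusters = [[d] for d in dists]
    while True:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = avg_linkage(clusters[i], clusters[j])
                if d < threshold and (best is None or d < best[0]):
                    best = (d, i, j)
        if best is None:
            return clusters
        _, i, j = best
        clusters[i] = clusters[i] + clusters.pop(j)

# Three leaves: the first two have nearly identical conditional
# distributions and should be pooled into one stage.
clusters = merge_stages([[0.5, 0.5], [0.52, 0.48], [0.9, 0.1]], threshold=0.1)
```

Swapping `total_variation` for Hellinger or another simplex-based divergence, and `avg_linkage` for Ward.D2 or complete linkage, reproduces the design space the paper compares.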

cross Co-Design of Memory-Storage Systems for Workload Awareness with Interpretable Models

Authors: Jay Sarkar, Vamsi Pavan Rayaprolu, Abhijeet Bhalerao

Abstract: Solid-state storage architectures based on NAND or emerging memory devices (SSDs) are fundamentally architected and optimized for both reliability and performance. Achieving these simultaneous goals requires co-design of memory components with firmware-architected Error Management (EM) algorithms for density- and performance-scaled memory technologies. We describe a Machine Learning (ML) for systems methodology and modeling for co-designing the EM subsystem together with the natural variance inherent to the scaled silicon process of memory components underlying SSD technology. The modeling analyzes NAND memory components and EM algorithms interacting with a comprehensive suite of synthetic (stress-focused and JEDEC) and emulation (YCSB and similar) workloads across Flash Translation abstraction layers, by leveraging a statistically interpretable and intuitively explainable ML algorithm. The generalizable co-design framework evaluates several thousand datacenter SSDs spanning multiple generations of memory and storage technology. Consequently, the modeling framework enables continuous, holistic, data-driven design towards generational architectural advancements. We additionally demonstrate that the framework enables Representation Learning of the EM-workload domain for enhancement of the architectural design-space across a broad spectrum of workloads.

cross AC-Foley: Reference-Audio-Guided Video-to-Audio Synthesis with Acoustic Transfer

Authors: Pengjun Fang, Yingqing He, Yazhou Xing, Qifeng Chen, Ser-Nam Lim, Harry Yang

Abstract: Existing video-to-audio (V2A) generation methods predominantly rely on text prompts alongside visual information to synthesize audio. However, two critical bottlenecks persist: semantic granularity gaps in training data, such as conflating acoustically distinct sounds under coarse labels, and textual ambiguity in describing micro-acoustic features. These bottlenecks make it difficult to perform fine-grained sound synthesis using text-controlled modes. To address these limitations, we propose AC-Foley, an audio-conditioned V2A model that directly leverages reference audio to achieve precise and fine-grained control over generated sounds. This approach enables fine-grained sound synthesis, timbre transfer, zero-shot sound generation, and improved audio quality. By directly conditioning on audio signals, our approach bypasses the semantic ambiguities of text descriptions while enabling precise manipulation of acoustic attributes. Empirically, AC-Foley achieves state-of-the-art performance for Foley generation when conditioned on reference audio, while remaining competitive with state-of-the-art video-to-audio methods even without audio conditioning.

replace Conceptual Views of Neural Networks: A Framework for Neuro-Symbolic Analysis

Authors: Johannes Hirth, Tom Hanika

Abstract: We introduce \emph{conceptual views} as a formal framework grounded in Formal Concept Analysis for globally explaining neural networks. Experiments on twenty-four ImageNet models and Fruits-360 show that these views faithfully represent the original models, enable architecture comparison via Gromov--Wasserstein distance, and support abductive learning of human-comprehensible rules from neurons.

replace Skeleton Regression: A Graph-Based Approach to Estimation with Manifold Structure

Authors: Zeyu Wei, Yen-Chi Chen

Abstract: We introduce a new regression framework designed to deal with large-scale, complex data that lies around a low-dimensional manifold with noise. Our approach first constructs a graph representation, referred to as the skeleton, to capture the underlying geometric structure. We then define metrics on the skeleton graph and apply nonparametric regression techniques, along with feature transformations based on the graph, to estimate the regression function. We also discuss the limitations of some nonparametric regressors with respect to general metric spaces such as the skeleton graph. The proposed regression framework suggests a novel way to deal with data with underlying geometric structures and provides additional advantages in handling the union of multiple manifolds, additive noise, and noisy observations. We provide statistical guarantees for the proposed method and demonstrate its effectiveness through simulations and real data examples.

replace Tensor Completion Leveraging Graph Information: A Dynamic Regularization Approach with Statistical Guarantees

Authors: Kaidong Wang, Qianxin Yi, Yao Wang, Xiuwu Liao, Shaojie Tang, Can Yang

Abstract: We consider the problem of tensor completion with graphs serving as side information to represent interrelationships among variables. Existing approaches suffer from several limitations: (1) they are often task-specific and lack generality or systematic formulation; (2) they typically treat graphs as static structures, ignoring their inherent dynamism in tensor-based settings; (3) they lack theoretical guarantees on statistical and computational complexity. To address these issues, we introduce a pioneering framework that systematically develops a novel model, theory, and algorithm for dynamic graph-regularized tensor completion. At the modeling level, we establish a rigorous mathematical representation of dynamic graphs and derive a new tensor-oriented graph smoothness regularization effectively capturing the similarity structure of the tensor. At the theory level, we establish the statistical consistency for our model under certain conditions, providing the first theoretical guarantees for tensor recovery in the presence of graph information. Moreover, we develop an efficient algorithm with guaranteed convergence. A series of experiments on both synthetic and real-world data demonstrate that our method achieves superior recovery accuracy, especially under highly sparse observations and strong dynamics.

replace ConjNorm: Tractable Density Estimation for Out-of-Distribution Detection

Authors: Bo Peng, Yadan Luo, Yonggang Zhang, Yixuan Li, Zhen Fang

Abstract: Post-hoc out-of-distribution (OOD) detection has garnered intensive attention in reliable machine learning. Many efforts have been dedicated to deriving score functions based on logits, distances, or rigorous data distribution assumptions to identify low-scoring OOD samples. Nevertheless, these estimated scores may fail to accurately reflect the true data density or impose impractical constraints. To provide a unified perspective on density-based score design, we propose a novel theoretical framework grounded in Bregman divergence, which extends distribution considerations to encompass an exponential family of distributions. Leveraging the conjugation constraint revealed in our theorem, we introduce a \textsc{ConjNorm} method, reframing density function design as a search for the optimal norm coefficient $p$ against the given dataset. In light of the computational challenges of normalization, we devise an unbiased and analytically tractable estimator of the partition function using the Monte Carlo-based importance sampling technique. Extensive experiments across OOD detection benchmarks empirically demonstrate that our proposed \textsc{ConjNorm} has established a new state-of-the-art in a variety of OOD detection setups, outperforming the current best method by up to 13.25$\%$ and 28.19$\%$ (FPR95) on CIFAR-100 and ImageNet-1K, respectively.
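The Monte Carlo importance-sampling idea for the partition function can be shown on a hypothetical 1-D toy: estimating $Z(p) = \int \exp(-|x|^p)\,dx$ with a standard normal proposal. The function name and setup below are illustrative, not the paper's estimator.

```python
import math
import random

def partition_estimate(p, n=100_000, seed=0):
    """Unbiased importance-sampling estimate of Z(p) = integral of
    exp(-|x|^p): draw x from a standard normal proposal q and average
    the weights exp(-|x|^p) / q(x). (Toy sketch; for heavier-tailed
    targets, e.g. small p, a heavier-tailed proposal is needed to keep
    the weight variance finite.)"""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        q = math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)  # proposal density
        total += math.exp(-abs(x) ** p) / q                     # importance weight
    return total / n

z2 = partition_estimate(2.0)  # exact value is sqrt(pi) ~ 1.7725
```

At `p = 2` the exact answer is known in closed form, which makes the toy a convenient correctness check for the estimator before using it at norm coefficients where no closed form exists.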

replace Survey of Computerized Adaptive Testing: A Machine Learning Perspective

Authors: Yan Zhuang, Qi Liu, Haoyang Bi, Zhenya Huang, Weizhe Huang, Jiatong Li, Junhao Yu, Zirui Liu, Zirui Hu, Yuting Hong, Zachary A. Pardos, Haiping Ma, Mengxiao Zhu, Shijin Wang, Enhong Chen

Abstract: Computerized Adaptive Testing (CAT) offers an efficient and personalized method for assessing examinee proficiency by dynamically adjusting test questions based on individual performance. Compared to traditional, non-personalized testing methods, CAT requires fewer questions and provides more accurate assessments. As a result, CAT has been widely adopted across various fields, including education, healthcare, sports, sociology, and the evaluation of AI models. While traditional methods rely on psychometrics and statistics, the increasing complexity of large-scale testing has spurred the integration of machine learning techniques. This paper aims to provide a machine learning-focused survey on CAT, presenting a fresh perspective on this adaptive testing paradigm. We delve into measurement models, question selection algorithms, bank construction, and test control within CAT, exploring how machine learning can optimize these components. Through an analysis of current methods, strengths, limitations, and challenges, we strive to develop robust, fair, and efficient CAT systems. By bridging psychometric-driven CAT research with machine learning, this survey advocates for a more inclusive and interdisciplinary approach to the future of adaptive testing.

replace Continuous-time Risk-sensitive Reinforcement Learning via Quadratic Variation Penalty

Authors: Yanwei Jia

Abstract: This paper studies continuous-time risk-sensitive reinforcement learning (RL) under the entropy-regularized, exploratory diffusion process formulation with the exponential-form objective. The risk-sensitive objective arises either as the agent's risk attitude or as a distributionally robust approach against the model uncertainty. Owing to the martingale perspective in Jia and Zhou (J Mach Learn Res 24(161): 1--61, 2023), the risk-sensitive RL problem is shown to be equivalent to ensuring the martingale property of a process involving both the value function and the q-function, augmented by an additional penalty term: the quadratic variation of the value process, capturing the variability of the value-to-go along the trajectory. This characterization allows for the straightforward adaptation of existing RL algorithms developed for non-risk-sensitive scenarios to incorporate risk sensitivity by adding the realized variance of the value process. Additionally, I highlight that the conventional policy gradient representation is inadequate for risk-sensitive problems due to the nonlinear nature of quadratic variation; however, q-learning offers a solution and extends to infinite horizon settings. Finally, I prove the convergence of the proposed algorithm for Merton's investment problem and quantify the impact of the temperature parameter on the behavior of the learning procedure. I also conduct simulation experiments to demonstrate how risk-sensitive RL improves the finite-sample performance in the linear-quadratic control problem.
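The penalty term itself, the realized quadratic variation of the value process along a trajectory, is simple to compute. A minimal sketch (the sample path below is illustrative):

```python
def realized_quadratic_variation(values):
    """Sum of squared increments of a sampled value process -- the
    variability penalty added to the martingale objective (sketch)."""
    return sum((b - a) ** 2 for a, b in zip(values, values[1:]))

# Hypothetical value-to-go path observed along one trajectory.
path = [0.0, 0.4, 0.1, 0.6]
qv = realized_quadratic_variation(path)  # 0.4^2 + 0.3^2 + 0.5^2 = 0.5
```

In the adaptation the abstract describes, this realized variance is simply added (with a risk-sensitivity weight) to the loss of an existing non-risk-sensitive algorithm.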

replace Random Scaling and Momentum for Non-smooth Non-convex Optimization

Authors: Qinzi Zhang, Ashok Cutkosky

Abstract: Training neural networks requires optimizing a loss function that may be highly irregular, and in particular neither convex nor smooth. Popular training algorithms are based on stochastic gradient descent with momentum (SGDM), for which classical analysis applies only if the loss is either convex or smooth. We show that a very small modification to SGDM closes this gap: simply scale the update at each time point by an exponentially distributed random scalar. The resulting algorithm achieves optimal convergence guarantees. Intriguingly, this result is not derived by a specific analysis of SGDM: instead, it falls naturally out of a more general framework for converting online convex optimization algorithms to non-convex optimization algorithms.
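The modification is concrete enough to sketch directly: a standard SGDM loop where each update is multiplied by an Exp(1) random scalar, shown here on the non-smooth toy objective f(x) = |x|. The objective, step size, and other constants are our illustrative choices, not the paper's experiments.

```python
import random

def sgdm_random_scaling(x0=5.0, lr=0.01, beta=0.9, n_steps=2000, seed=0):
    """SGDM with the paper's random-scaling modification: each momentum
    update is scaled by an independent Exp(1) random variable."""
    rng = random.Random(seed)
    x, m = x0, 0.0
    best = abs(x)
    for _ in range(n_steps):
        g = 1.0 if x > 0 else (-1.0 if x < 0 else 0.0)  # subgradient of |x|
        m = beta * m + g                                 # momentum buffer
        u = rng.expovariate(1.0)                         # exponential random scale
        x -= lr * u * m                                  # randomly scaled update
        best = min(best, abs(x))
    return x, best

x_final, best = sgdm_random_scaling()
```

Removing the single line drawing `u` recovers plain SGDM, which is the sense in which the paper's change is "very small"; the exponential scaling is what makes the convergence analysis go through without convexity or smoothness.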

replace MultiTask Learning AI system to assist BCC diagnosis with dual explanation

Authors: Iv\'an Matas, Carmen Serrano, Francisca Silva, Amalia Serrano, Tom\'as Toledo-Pastrana, Bego\~na Acha

Abstract: Basal cell carcinoma (BCC) accounts for about 75% of skin cancers. The adoption of teledermatology protocols in Spanish public hospitals has increased dermatologists' workload, motivating the development of AI tools for lesion prioritization. However, limited transparency in current systems hinders clinical acceptance. This study proposes an AI system for BCC detection from dermoscopic images that integrates dermatologist diagnostic criteria based on specific dermoscopic patterns. We analyzed 1559 dermoscopic images from 60 primary care centers annotated by four dermatologists for seven BCC patterns. An Expectation-Maximization consensus algorithm was used to build a unified standard reference. A multitask learning model based on MobileNet-V2 was developed to classify lesions and identify clinically relevant patterns, supported by Grad-CAM visual explanations. The system achieved 90% accuracy in BCC classification (precision 0.90, recall 0.89). Clinically relevant BCC patterns were correctly detected in 99% of positive cases, and the pigment network exclusion criterion was satisfied in 95% of non-BCC cases. Grad-CAM maps showed strong spatial agreement with dermatologist-defined regions. The proposed system combines accurate BCC detection with transparent pattern-based explanations, helping bridge the gap between AI performance and clinical trust in teledermatology.

replace TraffiDent: A Dataset for Understanding the Interplay Between Traffic Dynamics and Incidents

Authors: Xiaochuan Gou, Ziyue Li, Tian Lan, Junpeng Lin, Zhishuai Li, Bingyu Zhao, Chen Zhang, Di Wang, Xiangliang Zhang

Abstract: Research on two highly correlated tracks, traffic and incidents, has long been conducted separately. The traffic track focuses on increasingly complex deep learning models, e.g., to push predictions a few percent more accurate, while the incident track studies incidents in isolation, e.g., to infer incident risk. We, for the first time, spatiotemporally align the two tracks in a large-scale region (16,972 traffic nodes) from 2022 to 2024: our TraffiDent dataset includes traffic data, i.e., time-series indexes on traffic flow, lane occupancy, and average vehicle speed, together with incident records that are spatiotemporally aligned with the traffic data and span seven incident classes. Additionally, each node includes detailed physical and policy-level meta-attributes of lanes. Previous datasets typically contain only traffic or incident data in isolation, limiting research to general forecasting tasks. TraffiDent integrates both, enabling detailed analysis of traffic-incident interactions and causal relationships. To demonstrate its broad applicability, we design: (1) post-incident traffic forecasting to quantify the impact of different incidents on traffic indexes; (2) incident classification using traffic indexes to determine incident types for precautionary measures; (3) global causal analysis among the traffic indexes, meta-attributes, and incidents to give high-level guidance on the interrelations of various factors; (4) local causal analysis within road nodes to examine how different incidents affect the road segments' relations. The dataset is available at https://xaitraffic.github.io.

URLs: https://xaitraffic.github.io.

replace FC-KAN: Function Combinations in Kolmogorov-Arnold Networks

Authors: Hoang-Thang Ta, Duy-Quy Thai, Abu Bakar Siddiqur Rahman, Grigori Sidorov, Alexander Gelbukh

Abstract: In this paper, we introduce FC-KAN, a Kolmogorov-Arnold Network (KAN) that leverages combinations of popular mathematical functions such as B-splines, wavelets, and radial basis functions on low-dimensional data through element-wise operations. We explore several methods for combining the outputs of these functions, including sum, element-wise product, the addition of sum and element-wise product, representations of quadratic and cubic functions, concatenation, linear transformation of the concatenated output, and others. In our experiments, we compare FC-KAN with a multi-layer perceptron network (MLP) and other existing KANs, such as BSRBF-KAN, EfficientKAN, FastKAN, and FasterKAN, on the MNIST and Fashion-MNIST datasets. Two variants of FC-KAN, which use a combination of outputs from B-splines and Derivative of Gaussians (DoG) and from B-splines and linear transformations in the form of a quadratic function, outperformed overall other models on the average of 5 independent training runs. We expect that FC-KAN can leverage function combinations to design future KANs. Our repository is publicly available at: https://github.com/hoangthangta/FC_KAN.

URLs: https://github.com/hoangthangta/FC_KAN.
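The core idea of combining basis-function outputs element-wise can be sketched with two toy function families, Gaussian RBFs and Derivative of Gaussians; the real FC-KAN layers use B-splines with learned weights, so the names, widths, and shapes below are illustrative only:

```python
import numpy as np

def gaussian_rbf(x, centers, width=0.5):
    # radial basis features: one column per center
    return np.exp(-((x[:, None] - centers[None, :]) / width) ** 2)

def dog(x, centers, width=0.5):
    # Derivative of Gaussian (DoG) features
    d = (x[:, None] - centers[None, :]) / width
    return -2.0 * d * np.exp(-d ** 2)

def fc_combine(x, centers):
    """Element-wise 'sum plus product' combination of two function
    families, one of the combination schemes the abstract lists."""
    a, b = gaussian_rbf(x, centers), dog(x, centers)
    return a + b + a * b
```

Swapping `a + b + a * b` for a plain sum, a product, or a concatenation followed by a linear map yields the other combination variants the paper compares.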

replace On the Adversarial Transferability of Generalized "Skip Connections"

Authors: Yisen Wang, Yichuan Mo, Dongxian Wu, Mingjie Li, Xingjun Ma, Zhouchen Lin

Abstract: Skip connection is an essential ingredient for modern deep models to be deeper and more powerful. Despite their huge success in normal scenarios (state-of-the-art classification performance on natural examples), we investigate and identify an interesting property of skip connections under adversarial scenarios, namely, the use of skip connections allows easier generation of highly transferable adversarial examples. Specifically, in ResNet-like models (with skip connections), we find that biasing backpropagation to favor gradients from skip connections--while suppressing those from residual modules via a decay factor--allows one to craft adversarial examples with high transferability. Based on this insight, we propose the Skip Gradient Method (SGM). Although starting from ResNet-like models in vision domains, we further extend SGM to more advanced architectures, including Vision Transformers (ViTs), models with varying-length paths, and other domains such as natural language processing. We conduct comprehensive transfer-based attacks against diverse model families, including ResNets, Transformers, Inceptions, Neural Architecture Search-based models, and Large Language Models (LLMs). The results demonstrate that employing SGM can greatly improve the transferability of crafted attacks in almost all cases. Furthermore, we demonstrate that SGM can still be effective under more challenging settings such as ensemble-based attacks, targeted attacks, and against defense equipped models. At last, we provide theoretical explanations and empirical insights on how SGM works. Our findings not only motivate new adversarial research into the architectural characteristics of models but also open up further challenges for secure model architecture design. Our code is available at https://github.com/mo666666/SGM.

URLs: https://github.com/mo666666/SGM.
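The key mechanism, decaying the residual branch's gradient while leaving the skip path intact, can be illustrated on a one-dimensional residual block (a toy example; the actual SGM modifies backpropagation through deep ResNet-like models):

```python
def residual_forward(x, w):
    # toy residual block: y = x + relu(w * x)
    return x + max(w * x, 0.0)

def sgm_grad(x, w, grad_y, gamma=0.5):
    """Gradient of the loss w.r.t. x through the toy block. The skip
    connection passes grad_y through unchanged; the residual branch's
    contribution is scaled by a decay factor gamma < 1, as in SGM.
    Illustrative sketch, not the authors' code."""
    relu_grad = w if w * x > 0 else 0.0  # residual-branch local gradient
    return grad_y * (1.0 + gamma * relu_grad)
```

With `gamma = 1` this recovers ordinary backpropagation; smaller values bias the gradient toward the skip connection, which the paper finds improves adversarial transferability.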

replace Deconfounded Time Series Forecasting: A Causal Inference Approach

Authors: Wentao Gao, Xiaojing Du, Wenjun Yu, Xiongren Chen, Yifan Guo, Feiyu Yang

Abstract: Time series forecasting is a critical task in various domains, where accurate predictions can drive informed decision-making. Traditional forecasting methods often rely on current observations of variables to predict future outcomes, typically overlooking the influence of latent confounders, unobserved variables that simultaneously affect both the predictors and the target outcomes. This oversight can introduce bias and degrade the performance of predictive models. In this study, we address this challenge by proposing an enhanced forecasting approach that incorporates representations of latent confounders derived from historical data. By integrating these confounders into the predictive process, our method aims to improve the accuracy and robustness of time series forecasts. The proposed approach is demonstrated through its application to climate science data, showing significant improvements over traditional methods that do not account for confounders.

replace HyReaL: Clustering Attributed Graph via Hyper-Complex Space Representation Learning

Authors: Junyang Chen, Yang Lu, Mengke Li, Cuie Yang, Yiqun Zhang, Yiu-ming Cheung

Abstract: Clustering complex data in the form of attributed graphs has attracted increasing attention, where powerful graph representation is a critical prerequisite. However, the well-known Over-Smoothing (OS) effect makes Graph Convolutional Networks tend to homogenize the representation of graph nodes, while the existing OS solutions focus on alleviating the homogeneity of nodes' embeddings from the aspect of graph topology information, which is inconsistent with the attributed graph clustering objective. Therefore, we introduce hyper-complex space with powerful quaternion feature transformation to enhance the representation learning of the attributes. A generalized \textbf{Hy}per-complex space \textbf{Re}present\textbf{a}tion \textbf{L}earning (\textbf{HyReaL}) model is designed to: 1) bridge arbitrary dimensional attributes to the well-developed quaternion algebra with four parts, and 2) connect the learned representations to more generalized clustering objective without being restricted to a given number of clusters $k$. The novel introduction of quaternion benefits attributed graph clustering from two aspects: 1) enhanced attribute coupling learning capability allows complex attribute information to be sufficiently exploited in clustering, and 2) stronger learning capability makes it unnecessary to stack too many graph convolution layers, naturally alleviating the OS problem. It turns out that the node representations learned by HyReaL are more discriminative and widely suit downstream clustering with different $k$s. Extensive experiments including significance tests, ablation studies, qualitative results, etc., show the superiority of HyReaL.

replace Symplectic Neural Flows for Modeling and Discovery

Authors: Priscilla Canizares, Davide Murari, Carola-Bibiane Sch\"onlieb, Ferdia Sherry, Zakhar Shumaylov

Abstract: Hamilton's equations are fundamental for modeling complex physical systems, where preserving key properties such as energy and momentum is crucial for reliable long-term simulations. Geometric integrators are widely used for this purpose, but neural network-based methods that incorporate these principles remain underexplored. This work introduces SympFlow, a time-dependent symplectic neural network designed using parameterized Hamiltonian flow maps. This design allows for backward error analysis and ensures the preservation of the symplectic structure. SympFlow allows for two key applications: (i) providing a time-continuous symplectic approximation of the exact flow of a Hamiltonian system purely based on the differential equations it satisfies, and (ii) approximating the flow map of an unknown Hamiltonian system relying on trajectory data. We demonstrate the effectiveness of SympFlow on diverse problems, including chaotic and dissipative systems, showing improved energy conservation compared to general-purpose numerical methods and accurate approximations from sparse irregular data. We also provide a thorough theoretical analysis of SympFlow, showing it can approximate the flow of any time-dependent Hamiltonian system, and providing an a-posteriori error estimate in terms of energy conservation.

replace Algorithmic Data Minimization for Machine Learning over Internet-of-Things Data Streams

Authors: Ted Shaowang, Shinan Liu, Jonatas Marques, Nick Feamster, Sanjay Krishnan

Abstract: Machine learning can analyze vast amounts of data generated by IoT devices to identify patterns, make predictions, and enable real-time decision-making. By processing sensor data, machine learning models can optimize processes, improve efficiency, and enhance personalized user experiences in smart systems. However, IoT systems are often deployed in sensitive environments such as households and offices, where they may inadvertently expose identifiable information, including location, habits, and personal identifiers. This raises significant privacy concerns, necessitating the application of data minimization -- a foundational principle in emerging data regulations, which mandates that service providers only collect data that is directly relevant and necessary for a specified purpose. Despite its importance, data minimization lacks a precise technical definition in the context of sensor data, where collections of weak signals make it challenging to apply a binary "relevant and necessary" rule. This paper provides a technical interpretation of data minimization in the context of sensor streams, explores practical methods for implementation, and addresses the challenges involved. Through our approach, we demonstrate that our framework can reduce user identifiability by up to 16.7% while maintaining accuracy loss below 1%, offering a viable path toward privacy-preserving IoT data processing.

replace Interpretable Visualizations of Data Spaces for Classification Problems

Authors: Christian Jorgensen, Arthur Y. Lin, Rhushil Vasavada, Rose K. Cersonsky

Abstract: How do classification models "see" our data? Based on their success in delineating behaviors, there must be some lens through which it is easy to see the boundary between classes; however, our current set of visualization techniques makes this prospect difficult. In this work, we propose a hybrid supervised-unsupervised technique distinctly suited to visualizing the decision boundaries determined by classification problems. This method provides a human-interpretable map that can be analyzed qualitatively and quantitatively, which we demonstrate through visualizing and interpreting a decision boundary for chemical neurotoxicity. While we discuss this method in the context of chemistry-driven problems, its application can be generalized across subfields for "unboxing" the operations of machine-learning classification models.

replace A Survey on Deep Learning Approaches for Tabular Data Generation: Utility, Alignment, Fidelity, Privacy, Diversity, and Beyond

Authors: Mihaela C\u{a}t\u{a}lina Stoian, Eleonora Giunchiglia, Thomas Lukasiewicz

Abstract: Generative modelling has become the standard approach for synthesising tabular data. However, different use cases demand synthetic data to comply with different requirements to be useful in practice. In this survey, we review deep generative modelling approaches for tabular data from the perspective of five types of requirements: utility of the synthetic data, alignment of the synthetic data with domain-specific knowledge, statistical fidelity of the synthetic data distribution compared to the real data distribution, privacy-preserving capabilities, and sampling diversity. We group the approaches along two levels of granularity: (i) based on the requirements they address and (ii) according to the underlying model they utilise. Additionally, we summarise the appropriate evaluation methods for each requirement, the relationships among the requirements, and the specific characteristics of each model type. Finally, we discuss future directions for the field, along with opportunities to improve the current evaluation methods. Overall, this survey can be seen as a user guide to tabular data generation: helping readers navigate available models and evaluation methods to find those best suited to their needs.

replace Is CLIP ideal? No. Can we fix it? Yes!

Authors: Raphi Kang, Yue Song, Georgia Gkioxari, Pietro Perona

Abstract: Contrastive Language-Image Pre-Training (CLIP) is a popular method for learning multimodal latent spaces with well-organized semantics. Despite its wide range of applications, CLIP's latent space is known to fail at handling complex visual-textual interactions. Recent works attempt to address its shortcomings with data-centric or algorithmic approaches. But what if the problem is more fundamental, and lies in the geometry of CLIP? Toward this end, we rigorously analyze CLIP's latent space properties, and prove that no CLIP-like joint embedding space exists which can correctly do any two of the following at the same time: 1. represent basic descriptions and image content, 2. represent attribute binding, 3. represent spatial location and relationships, 4. represent negation. Informed by this analysis, we propose Dense Cosine Similarity Maps (DCSMs) as a principled and interpretable scoring method for CLIP-like models, which solves the fundamental limitations of CLIP by retaining the semantic topology of the image patches and text tokens. This method improves upon the performance of classical CLIP-like joint encoder models on a wide array of benchmarks. We share our code and data here for reproducibility: https://github.com/Raphoo/DCSM_Ideal_CLIP

URLs: https://github.com/Raphoo/DCSM_Ideal_CLIP

replace Reasoning-Grounded Natural Language Explanations for Language Models

Authors: Vojtech Cahlik, Rodrigo Alves, Pavel Kordik

Abstract: We propose a large language model explainability technique for obtaining faithful natural language explanations by grounding the explanations in a reasoning process. When converted to a sequence of tokens, the outputs of the reasoning process can become part of the model context and later be decoded to natural language as the model produces either the final answer or the explanation. To improve the faithfulness of the explanations, we propose to use a joint predict-explain approach, in which the answers and explanations are inferred directly from the reasoning sequence, without the explanations being dependent on the answers and vice versa. We demonstrate the plausibility of the proposed technique by achieving a high alignment between answers and explanations in several problem domains, observing that language models often simply copy the partial decisions from the reasoning sequence into the final answers or explanations. Furthermore, we show that the proposed use of reasoning can also improve the quality of the answers.

replace RRNCO: Towards Real-World Routing with Neural Combinatorial Optimization

Authors: Jiwoo Son, Zhikai Zhao, Federico Berto, Chuanbo Hua, Zhiguang Cao, Changhyun Kwon, Jinkyoo Park

Abstract: The practical deployment of Neural Combinatorial Optimization (NCO) for Vehicle Routing Problems (VRPs) is hindered by a critical sim-to-real gap. This gap stems not only from training on oversimplified Euclidean data but also from node-based architectures incapable of handling the node-and-edge-based features with correlated asymmetric cost matrices, such as those for real-world distance and duration. We introduce RRNCO, a novel architecture specifically designed to address these complexities. RRNCO's novelty lies in two key innovations. First, its Adaptive Node Embedding (ANE) efficiently fuses spatial coordinates with real-world distance features using a learned contextual gating mechanism. Second, its Neural Adaptive Bias (NAB) is the first mechanism to jointly model asymmetric distance, duration, and directional angles, enabling it to capture complex, realistic routing constraints. Moreover, we introduce a new VRP benchmark grounded in real-world data crucial for bridging this sim-to-real gap, featuring asymmetric distance and duration matrices from 100 diverse cities, enabling the training and validation of NCO solvers on tasks that are more representative of practical settings. Experiments demonstrate that RRNCO achieves state-of-the-art performance on this benchmark, significantly advancing the practical applicability of neural solvers for real-world logistics. Our code, dataset, and pretrained models are available at https://github.com/ai4co/real-routing-nco.

URLs: https://github.com/ai4co/real-routing-nco.

replace Physics-Informed Deep B-Spline Networks

Authors: Zhuoyuan Wang, Raffaele Romagnoli, Saviz Mowlavi, Yorie Nakahira

Abstract: Physics-informed machine learning offers a promising framework for solving complex partial differential equations (PDEs) by integrating observational data with governing physical laws. However, learning PDEs with varying parameters and changing initial conditions and boundary conditions (ICBCs) with theoretical guarantees remains an open challenge. In this paper, we propose physics-informed deep B-spline networks, a novel technique that approximates a family of PDEs with different parameters and ICBCs by learning B-spline control points through neural networks. The proposed B-spline representation reduces the learning task from predicting solution values over the entire domain to learning a compact set of control points, enforces strict compliance to initial and Dirichlet boundary conditions by construction, and enables analytical computation of derivatives for incorporating PDE residual losses. While existing approximation and generalization theories are not applicable in this setting - where solutions of parametrized PDE families are represented via B-spline bases - we fill this gap by showing that B-spline networks are universal approximators for such families under mild conditions. We also derive generalization error bounds for physics-informed learning in both elliptic and parabolic PDE settings, establishing new theoretical guarantees. Finally, we demonstrate in experiments that the proposed technique has improved efficiency-accuracy tradeoffs compared to existing techniques in a dynamical system problem with discontinuous ICBCs and can handle nonhomogeneous ICBCs and non-rectangular domains.
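The claim that Dirichlet boundary conditions hold by construction can be illustrated with degree-1 (hat) B-splines, where clamping the two end control points fixes the boundary values exactly; the paper's networks predict higher-degree control points, so everything below is a simplified stand-in:

```python
import numpy as np

def bspline_eval(x, control, lo=0.0, hi=1.0, bc=(0.0, 0.0)):
    """Evaluate a degree-1 B-spline curve on [lo, hi] whose boundary
    control points are clamped to the Dirichlet values bc, so the
    boundary condition holds by construction. Toy sketch only."""
    c = np.array(control, dtype=float)
    c[0], c[-1] = bc                      # clamp boundary control points
    knots = np.linspace(lo, hi, len(c))
    h = knots[1] - knots[0]
    # hat (degree-1 B-spline) basis on a uniform knot grid
    basis = np.clip(1.0 - np.abs(x[:, None] - knots[None, :]) / h, 0.0, 1.0)
    return basis @ c
```

Whatever values a network predicts for the interior control points, the curve evaluates to `bc` at the domain endpoints; higher-degree splines additionally give the analytical derivatives needed for PDE residual losses.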

replace Almost Bayesian: The Fractal Dynamics of Stochastic Gradient Descent

Authors: Max Hennick, Stijn De Baerdemacker

Abstract: We relate the behavior of stochastic gradient descent (SGD) to Bayesian statistics by showing that SGD is effectively diffusion on a fractal landscape, whose fractal dimension can be accounted for in a purely Bayesian way. By doing this we show that SGD can be regarded as a modified Bayesian sampler which accounts for accessibility constraints induced by the fractal structure of the loss landscape. We verify our results experimentally by examining the diffusion of weights during training. These results offer insight into the factors which determine the learning process, and seemingly answer the question of how SGD and purely Bayesian sampling are related.

replace Nemotron-CrossThink: Scaling Self-Learning beyond Math Reasoning

Authors: Syeda Nahida Akter, Shrimai Prabhumoye, Matvei Novikov, Seungju Han, Ying Lin, Evelina Bakhturina, Eric Nyberg, Yejin Choi, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro

Abstract: Large Language Models (LLMs) have shown strong reasoning capabilities, particularly when enhanced through Reinforcement Learning (RL). While prior work has successfully applied RL to mathematical reasoning -- where rules and correctness are well-defined -- generalizing these methods to broader reasoning domains remains challenging due to limited data, the lack of verifiable reward structures, and diverse task requirements. In this work, we propose NEMOTRON-CROSSTHINK, a framework that systematically incorporates multi-domain corpora, including both synthetic and real-world question-answer pairs, into RL training to improve generalization across diverse reasoning tasks. NEMOTRON-CROSSTHINK addresses key challenges by (1) incorporating data from varied sources spanning STEM, humanities, social sciences, etc.; (2) applying structured templates (e.g., multiple-choice and open-ended) to control answer-space complexity; (3) filtering for verifiable answers; and (4) optimizing data blending strategies that utilize data from multiple sources effectively. Our approach enables scalable and verifiable reward modeling beyond mathematics and demonstrates improved accuracies on both math (MATH-500: +30.1%, AMC23: +27.5%) and non-math reasoning benchmarks (MMLU-PRO: +12.8%, GPQA-DIAMOND: +11.3%, AGIEVAL: +15.1%, SUPERGPQA: +3.8%). Moreover, NEMOTRON-CROSSTHINK exhibits significantly improved response efficiency -- using 28% fewer tokens for correct answers -- highlighting more focused and effective reasoning. Through NEMOTRON-CROSSTHINK, we demonstrate that integrating multi-domain, multi-format data in RL leads to more accurate, efficient, and generalizable LLMs.

replace Geometric Learning Dynamics

Authors: Vitaly Vanchurin

Abstract: We present a unified geometric framework for modeling learning dynamics in physical, biological, and machine learning systems. The theory reveals three fundamental regimes, each emerging from the power-law relationship $g \propto \kappa^\alpha$ between the metric tensor $g$ in the space of trainable variables and the noise covariance matrix $\kappa$. The quantum regime corresponds to $\alpha = 1$ and describes Schr\"odinger-like dynamics that emerges from a discrete shift symmetry. The efficient learning regime corresponds to $\alpha = \tfrac{1}{2}$ and describes very fast machine learning algorithms. The equilibration regime corresponds to $\alpha = 0$ and describes classical models of biological evolution. We argue that the emergence of the intermediate regime $\alpha = \tfrac{1}{2}$ is a key mechanism underlying biological complexity.

replace Anticipating Gaming to Incentivize Improvement: Guiding Agents in (Fair) Strategic Classification

Authors: Sura Alhanouti, Parinaz Naghizadeh

Abstract: As machine learning algorithms increasingly influence critical decision making in different application areas, understanding human strategic behavior in response to these systems becomes vital. We explore individuals' choice between genuinely improving their qualifications (``improvement'') vs. attempting to deceive the algorithm by manipulating their features (``manipulation'') in response to an algorithmic decision system. We further investigate an algorithm designer's ability to shape these strategic responses, and its fairness implications. Specifically, we formulate these interactions as a Stackelberg game, where a firm deploys a (fair) classifier, and individuals strategically respond. Our model incorporates both different costs and stochastic efficacy for manipulation and improvement. The analysis reveals different potential classes of agent responses, and characterizes optimal classifiers accordingly. Based on these, we identify when and why a (fair) strategic policy can not only prevent manipulation, but also incentivize agents to opt for improvement. Our findings shed light on the intertwined nature of microeconomic and ethical implications of firms' anticipation of strategic behavior when employing ML-driven decision systems.

replace MSDformer: Multi-scale Discrete Transformer For Time Series Generation

Authors: Shibo Feng, Zhicheng Chen, Xi Xiao, Zhong Zhang, Qing Li, Xingyu Gao, Peilin Zhao

Abstract: Discrete Token Modeling (DTM), which employs vector quantization techniques, has demonstrated remarkable success in modeling non-natural language modalities, particularly in time series generation. While our prior work SDformer established the first DTM-based framework to achieve state-of-the-art performance in this domain, two critical limitations persist in existing DTM approaches: 1) their inability to capture multi-scale temporal patterns inherent to complex time series data, and 2) the absence of theoretical foundations to guide model optimization. To address these challenges, we propose a novel multi-scale DTM-based time series generation method, called Multi-Scale Discrete Transformer (MSDformer). MSDformer employs a multi-scale time series tokenizer to learn discrete token representations at multiple scales, which jointly characterize the complex nature of time series data. Subsequently, MSDformer applies a multi-scale autoregressive token modeling technique to capture the multi-scale patterns of time series within the discrete latent space. Theoretically, we validate the effectiveness of the DTM method and the rationality of MSDformer through the rate-distortion theorem. Comprehensive experiments demonstrate that MSDformer significantly outperforms state-of-the-art methods. Both theoretical analysis and experimental results demonstrate that incorporating multi-scale information and modeling multi-scale patterns can substantially enhance the quality of generated time series in DTM-based approaches. The code will be released upon acceptance.

replace A Systematic Evaluation of On-Device LLMs: Quantization, Performance, and Resources

Authors: Qingyu Song, Rui Liu, Wei Lin, Peiyu Liao, Wenqian Zhao, Yiwen Wang, Shoubo Hu, Yining Jiang, Mochun Long, Hui-Ling Zhen, Ning Jiang, Mingxuan Yuan, Qiao Xiang, Hong Xu

Abstract: Deploying Large Language Models (LLMs) on edge devices enhances privacy but faces performance hurdles due to limited resources. We introduce a systematic methodology to evaluate on-device LLMs, balancing capability, efficiency, and resource constraints. Through an extensive analysis of models (0.5B-14B) and seven post-training quantization (PTQ) methods on commodity hardware, we demonstrate that: 1) Heavily quantized large models consistently outperform smaller, high-precision models, with a performance threshold at ~3.5 effective bits-per-weight (BPW); 2) Resource utilization scales linearly with BPW, though power and memory footprints vary by quantization algorithm; and 3) With a reduction in model size, the primary constraint on throughput transitions from communication overhead to computational latency. We conclude by offering guidelines for optimizing LLMs in resource-constrained edge environments. Our codebase is available at https://anonymous.4open.science/r/LLMOnDevice/.

URLs: https://anonymous.4open.science/r/LLMOnDevice/.

replace Admission Control of Quasi-Reversible Queueing Systems: Optimization and Reinforcement Learning

Authors: C\'eline Comte (CNRS, LAAS-SARA), Pascal Moyal (IECL, PASTA)

Abstract: In this paper, we introduce a versatile scheme for optimizing the arrival rates of quasi-reversible queueing systems. We first propose an alternative definition of quasi-reversibility that encompasses reversibility and highlights the importance of the definition of customer classes. Then we introduce balanced arrival control policies, which generalize the notion of balanced arrival rates introduced in the context of Whittle networks, to the much broader class of quasi-reversible queueing systems. We prove that supplementing a quasi-reversible queueing system with a balanced arrival-control policy preserves the quasi-reversibility, and we specify the form of the stationary measures. We revisit two canonical examples of quasi-reversible queueing systems, Whittle networks and order-independent queues. Lastly, we focus on the problem of admission control and leverage our results in the frameworks of optimization and reinforcement learning.

replace TI-DeepONet: Learnable Time Integration for Stable Long-Term Extrapolation

Authors: Dibyajyoti Nayak, Somdatta Goswami

Abstract: Accurate temporal extrapolation remains a fundamental challenge for neural operators modeling dynamical systems, where predictions must extend far beyond the training horizon. Conventional DeepONet approaches rely on two limited paradigms: fixed-horizon rollouts, which predict full spatiotemporal solutions while ignoring temporal causality, and autoregressive schemes, which accumulate errors through sequential prediction. We introduce TI-DeepONet, a framework that integrates neural operators with adaptive numerical time-stepping to preserve the Markovian structure of dynamical systems while mitigating long-term error growth. Our method shifts the learning objective from direct state prediction to approximating instantaneous time-derivative fields, which are then integrated using standard numerical solvers. This naturally enables continuous-time prediction and allows the use of higher-order integrators at inference than those used in training, improving both efficiency and accuracy. We further propose TI(L)-DeepONet, which incorporates learnable coefficients for intermediate stages in multi-stage integration, adapting to solution-specific dynamics and enhancing fidelity. Across six canonical PDEs featuring chaotic, dissipative, dispersive, and high-dimensional behavior, TI(L)-DeepONet marginally outperforms TI-DeepONet, and both achieve major reductions in relative L2 extrapolation error: about 96.3% compared to autoregressive methods and 83.6% compared to fixed-horizon approaches. Notably, both models maintain stable predictions over temporal domains nearly twice the training interval. This work establishes a physics-aware operator learning framework that bridges neural approximation with numerical analysis principles, addressing a key gap in long-term forecasting of complex physical systems.

replace Unveiling the Basin-Like Loss Landscape in Large Language Models

Authors: Huanran Chen, Yinpeng Dong, Zeming Wei, Yao Huang, Yichi Zhang, Hang Su, Jun Zhu

Abstract: We discover the emergence of \textit{basins} in the loss landscape of large language models. As model scale increases, LLMs become progressively more resilient to random perturbations in the parameter space, giving rise to expansive stability regions where models exhibit nearly identical performance, but outside of which their capabilities collapse. We observe that pre-training creates a \textit{basic capability} basin, and subsequent alignment fine-tuning forms \textit{specific capability} basins (e.g., safety, math, coding). Thus, we argue that benign fine-tuning confined to the basin should preserve prior capabilities. We also analyze the loss landscape along worst-case directions, which is consistently sharp and detrimental. We find that adversarial fine-tuning moves along nearly worst-case directions, thus rapidly degrading model capabilities. Finally, we provide a theoretical analysis demonstrating that the basin size bounds the performance degradation of any fine-tuning, including adversarial fine-tuning, while also guaranteeing model robustness w.r.t. input perturbations, suggesting the benefit of enlarging basins.

replace Generalized and Personalized Federated Learning with Black-Box Foundation Models via Orthogonal Transformations

Authors: Eun Gyung Kong, Je Won Yeom, Yonghoon Jeon, Taesup Kim

Abstract: Federated Learning (FL) facilitates decentralized model training while preserving data privacy. However, achieving both robust generalization and effective personalization simultaneously in heterogeneous (non-IID) environments remains a formidable challenge. Furthermore, the widespread adoption of proprietary Foundation Models (FMs) introduces a critical requirement for dual privacy: (a) protecting sensitive client data and (b) securing the server's valuable intellectual property. This mandates strictly black-box access to the FM. To address these multifaceted challenges, we introduce FedOT, a novel FL framework optimized for black-box FMs. FedOT employs a shared global task-dependent classifier while facilitating local adaptation through client-specific orthogonal transformations applied externally to the FM embeddings. This architecture inherently guarantees that the FM's internal parameters remain inaccessible and unmodified. By enforcing orthogonality, FedOT effectively mitigates gradient conflicts across diverse clients (an effect we bound theoretically), preserves the semantic integrity of the FM representations, and achieves robust performance under significant data heterogeneity. The synergy of global and local parameters optimally balances generalization and personalization, markedly outperforming baseline FL methods across diverse benchmarks. Extensive empirical analysis, including rigorous multi-seed validation and scalability assessments, substantiates the robustness, efficiency, and superior performance of FedOT.
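
The key structural property — an orthogonal transformation applied externally to frozen black-box embeddings, preserving their geometry — is easy to sketch. The Cayley parametrization below is one common way to obtain a learnable orthogonal matrix; the abstract does not specify FedOT's parametrization, so treat this as an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8

# Client-specific skew-symmetric parameter; its Cayley transform
# Q = (I - S)(I + S)^{-1} is always orthogonal.
A = rng.normal(size=(d, d))
S = A - A.T
Q = (np.eye(d) - S) @ np.linalg.inv(np.eye(d) + S)
assert np.allclose(Q.T @ Q, np.eye(d), atol=1e-8)

emb = rng.normal(size=(5, d))   # frozen black-box FM embeddings
adapted = emb @ Q.T             # client-side local adaptation

# Orthogonality preserves norms, hence the semantic geometry of the embeddings.
assert np.allclose(np.linalg.norm(adapted, axis=1),
                   np.linalg.norm(emb, axis=1))
```

A shared classifier would then operate on `adapted`, while the FM itself is never touched.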

replace Variational Deep Learning via Implicit Regularization

Authors: Jonathan Wenger, Beau Coker, Juraj Marusic, John P. Cunningham

Abstract: Modern deep learning models generalize remarkably well in-distribution, despite being overparametrized and trained with little to no explicit regularization. Instead, current theory credits implicit regularization imposed by the choice of architecture, hyperparameters, and optimization procedure. However, deep neural networks can be surprisingly non-robust, resulting in overconfident predictions and poor out-of-distribution generalization. Bayesian deep learning addresses this via model averaging, but typically requires significant computational resources as well as carefully elicited priors to avoid overriding the benefits of implicit regularization. Instead, in this work, we propose to regularize variational neural networks solely by relying on the implicit bias of (stochastic) gradient descent. We theoretically characterize this inductive bias in overparametrized linear models as generalized variational inference and demonstrate the importance of the choice of parametrization. Empirically, our approach demonstrates strong in- and out-of-distribution performance without additional hyperparameter tuning and with minimal computational overhead.

replace Towards Operational Automated Greenhouse Gas Plume Detection and Delineation

Authors: Brian D. Bue, Jake H. Lee, Andrew K. Thorpe, Philip G. Brodrick, Daniel Cusworth, Alana Ayasse, Vassiliki Mancoridis, Anagha Satish, Shujun Xiong, Riley Duren

Abstract: Operational deployment of a fully automated facility-scale greenhouse gas (GHG) plume detection system remains challenging for fine spatial resolution imaging spectrometers, despite recent advances in deep learning approaches. With the dramatic increase in data availability, however, automation continues to increase in importance for emissions monitoring. This work reviews and addresses several key obstacles in the field: data and label quality control, prevention of spatiotemporal biases, and correctly aligned modeling objectives. We demonstrate through rigorous experiments using multicampaign data from airborne and spaceborne instruments that convolutional neural networks (CNNs) are able to achieve operational detection performance when these obstacles are alleviated. We demonstrate that a multitask model that learns both instance detection and pixelwise segmentation simultaneously can successfully lead towards an operational pathway. We evaluate the model's plume detectability across emission source types and regions, identifying thresholds for operational deployment. Finally, we provide analysis-ready data, models, and source code for reproducibility, and work to define a set of best practices and validation standards to facilitate future contributions to the field.

replace Inference-Time Scaling of Discrete Diffusion Models via Importance Weighting and Optimal Proposal Design

Authors: Zijing Ou, Chinmay Pani, Yingzhen Li

Abstract: Discrete diffusion models have become highly effective across various domains. However, real-world applications often require the generative process to adhere to certain constraints. To this end, we propose a Sequential Monte Carlo (SMC) framework that enables scalable inference-time control of discrete diffusion models through principled importance weighting and optimal proposal construction. Specifically, our approach derives tractable importance weights for a range of intermediate targets and characterises the optimal proposal, for which we develop two practical approximations: a first-order gradient-based approximation and an amortised proposal trained to minimise the log-variance of the importance weights. Empirical results across synthetic tasks, language modelling, biology design, and text-to-image generation demonstrate that our framework enhances controllability and sample quality, highlighting the effectiveness of SMC as a versatile recipe for scaling discrete diffusion models at inference time.
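
The SMC backbone of such a framework — weight particles by target-over-proposal ratios, then resample — can be shown in a minimal one-dimensional sketch. The densities and sample sizes here are toy stand-ins, not the paper's intermediate targets or amortised proposals:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Particles drawn from a proposal q = N(0, 1).
particles = rng.normal(loc=0.0, scale=1.0, size=n)

# Importance weights toward a target p = N(1, 1): log p/q = x - 1/2.
log_w = -0.5 * (particles - 1.0) ** 2 + 0.5 * particles ** 2
w = np.exp(log_w - log_w.max())
w /= w.sum()

# Multinomial resampling concentrates the population on high-weight particles.
idx = rng.choice(n, size=n, p=w)
resampled = particles[idx]
assert resampled.mean() > particles.mean()  # population shifts toward the target
```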

replace Automated Modeling Method for Pathloss Model Discovery

Authors: Ahmad Anaqreh, Shih-Kai Chou, Bla\v{z} Bertalani\v{c}, Mihael Mohor\v{c}i\v{c}, Thomas Lagkas, Carolina Fortuna

Abstract: Modeling propagation is the cornerstone for designing and optimizing next-generation wireless systems, with particular emphasis on the 5G-and-beyond era. Traditional modeling methods have long relied on statistics-based techniques to characterize propagation behavior across different environments. With the expansion of wireless communication systems, there is a growing demand for methods that guarantee both the accuracy and the interpretability of the resulting models. Artificial intelligence (AI)-based techniques, in particular, are increasingly being adopted to overcome this challenge, although most of these methods do not assure interpretability. Inspired by recent advancements in AI, this paper proposes a novel approach that accelerates the discovery of path loss models while maintaining interpretability. The proposed method automates the formulation, evaluation, and refinement of candidate models, facilitating model discovery. We examine two techniques: one based on Deep Symbolic Regression, offering full interpretability, and the second based on Kolmogorov-Arnold Networks, providing two levels of interpretability. Both approaches are evaluated on two synthetic and two real-world datasets. Our results show that Kolmogorov-Arnold Networks achieve a coefficient of determination R^2 close to 1 with minimal prediction error, while Deep Symbolic Regression generates compact models with moderate accuracy. Moreover, on the selected examples, we demonstrate that automated methods outperform traditional methods, achieving up to 75% reduction in prediction errors, offering accurate and explainable solutions with potential to increase the efficiency of discovering next-generation path loss models.

replace Adaptive Deadline and Batch Layered Synchronized Federated Learning

Authors: Asaf Goren, Natalie Lang, Nir Shlezinger, Alejandro Cohen

Abstract: Federated learning (FL) enables collaborative model training across distributed edge devices while preserving data privacy, and typically operates in a round-based synchronous manner. However, synchronous FL suffers from latency bottlenecks due to device heterogeneity, where slower clients (stragglers) delay or degrade global updates. Prior solutions, such as fixed deadlines, client selection, and layer-wise partial aggregation, alleviate the effect of stragglers, but treat round timing and local workload as static parameters, limiting their effectiveness under strict time constraints. We propose ADEL-FL, a novel framework that jointly optimizes per-round deadlines and user-specific batch sizes for layer-wise aggregation. Our approach formulates a constrained optimization problem minimizing the expected L2 distance to the global optimum under total training time and global rounds. We provide a convergence analysis under exponential compute models and prove that ADEL-FL yields unbiased updates with bounded variance. Extensive experiments demonstrate that ADEL-FL outperforms alternative methods in both convergence rate and final accuracy under heterogeneous conditions.

replace Multiresolution Analysis and Statistical Thresholding on Dynamic Networks

Authors: Rapha\"el Romero, Tijl De Bie, Nick Heard, Alexander Modell

Abstract: Detecting structural change in dynamic network data has wide-ranging applications. Existing approaches typically divide the data into time bins, extract network features within each bin, and then compare these features over time. This introduces an inherent tradeoff between temporal resolution and the statistical stability of the extracted features. Despite this tradeoff, reminiscent of time-frequency tradeoffs in signal processing, most methods rely on a fixed temporal resolution. Choosing an appropriate resolution parameter is typically difficult and can be especially problematic in domains like cybersecurity, where anomalous behavior may emerge at multiple time scales. We address this challenge by proposing ANIE (Adaptive Network Intensity Estimation), a multi-resolution framework designed to automatically identify the time scales at which network structure evolves, enabling the joint detection of both rapid and gradual changes. Modeling interactions as Poisson processes, our method proceeds in two steps: (1) estimating a low-dimensional subspace of node behavior, and (2) deriving a set of novel empirical affinity coefficients that quantify change in interaction intensity between latent factors and support statistical testing for structural change across time scales. We provide theoretical guarantees for subspace estimation and the asymptotic behavior of the affinity coefficients, enabling model-based change detection. Experiments on synthetic networks show that ANIE adapts to the appropriate time resolution and is able to capture sharp structural changes while remaining robust to noise. Furthermore, applications to real-world data showcase the practical benefits of ANIE's multiresolution approach to detecting structural change over fixed resolution methods.

replace Risk-Sensitive Agent Compositions

Authors: Guruprerana Shabadi, Rajeev Alur

Abstract: From software development to robot control, modern agentic systems decompose complex objectives into a sequence of subtasks and choose a set of specialized AI agents to complete them. We formalize agentic workflows as directed acyclic graphs, called agent graphs, where edges represent AI agents and paths correspond to feasible compositions of agents. Real-world deployment requires selecting agent compositions that not only maximize task success but also minimize violations of safety, fairness, and privacy requirements, which demands a careful analysis of the low-probability (tail) behaviors of compositions of agents. In this work, we consider risk minimization over the set of feasible agent compositions and seek to minimize the value-at-risk and the conditional value-at-risk of the loss distribution of the agent composition, where the loss quantifies violations of these requirements. We introduce an efficient algorithm which traverses the agent graph and finds a near-optimal composition of agents. It uses a dynamic programming approach to approximate the value-at-risk of agent compositions by exploiting a union bound. Furthermore, we prove that the approximation is near-optimal asymptotically for a broad class of practical loss functions. We also show how our algorithm can be used to approximate the conditional value-at-risk as a byproduct. To evaluate our framework, we consider a suite of video game-like control benchmarks that require composing several agents trained with reinforcement learning and demonstrate our algorithm's effectiveness in approximating the value-at-risk and identifying the optimal agent composition.
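
The union-bound dynamic program has a simple shape: for a fixed loss threshold, P(path loss exceeds the threshold) is bounded by the sum of per-edge tail probabilities, so the composition minimizing the bound is a shortest path under those edge weights. The graph and tail probabilities below are hypothetical toy numbers, not from the paper:

```python
# Toy agent graph: nodes 0..3; each edge is an agent with a tail
# probability P(agent loss > t) for a fixed threshold t (hypothetical).
edges = {
    (0, 1): 0.05, (0, 2): 0.02,
    (1, 3): 0.01, (2, 3): 0.06,
}
topo = [0, 1, 2, 3]  # topological order of the DAG

# Union bound: P(path loss > t) <= sum of edge tail probabilities,
# so a shortest-path DP over these weights approximates the value-at-risk.
best = {0: 0.0}
for v in topo[1:]:
    best[v] = min(best[u] + p for (u, w), p in edges.items() if w == v)

assert abs(best[3] - 0.06) < 1e-12  # best composition: agents (0,1) then (1,3)
```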

replace Aligning Latent Spaces with Flow Priors

Authors: Yizhuo Li, Yuying Ge, Yixiao Ge, Ying Shan, Ping Luo

Abstract: This paper presents a novel framework for aligning learnable latent spaces to arbitrary target distributions by leveraging flow-based generative models as priors. Our method first pretrains a flow model on the target features to capture the underlying distribution. This fixed flow model subsequently regularizes the latent space via an alignment loss, which reformulates the flow matching objective to treat the latents as optimization targets. We formally prove that minimizing this alignment loss establishes a computationally tractable surrogate objective for maximizing a variational lower bound on the log-likelihood of latents under the target distribution. Notably, the proposed method eliminates computationally expensive likelihood evaluations and avoids ODE solving during optimization. As a proof of concept, we demonstrate in a controlled setting that the alignment loss landscape closely approximates the negative log-likelihood of the target distribution. We further validate the effectiveness of our approach through large-scale image generation experiments on ImageNet with diverse target distributions, accompanied by detailed discussions and ablation studies. With both theoretical and empirical validation, our framework paves a new way for latent space alignment.

replace AMPED: Adaptive Multi-objective Projection for balancing Exploration and skill Diversification

Authors: Geonwoo Cho, Jaemoon Lee, Jaegyun Im, Subi Lee, Jihwan Lee, Sundong Kim

Abstract: Skill-based reinforcement learning (SBRL) enables rapid adaptation in environments with sparse rewards by pretraining a skill-conditioned policy. Effective skill learning requires jointly maximizing both exploration and skill diversity. However, existing methods often face challenges in simultaneously optimizing for these two conflicting objectives. In this work, we propose a new method, Adaptive Multi-objective Projection for balancing Exploration and skill Diversification (AMPED), which explicitly addresses both: during pre-training, a gradient-surgery projection balances the exploration and diversity gradients, and during fine-tuning, a skill selector exploits the learned diversity by choosing skills suited to downstream tasks. Our approach achieves performance that surpasses SBRL baselines across various benchmarks. Through an extensive ablation study, we identify the role of each component and demonstrate that each element in AMPED is contributing to performance. We further provide theoretical and empirical evidence that, with a greedy skill selector, greater skill diversity reduces fine-tuning sample complexity. These results highlight the importance of explicitly harmonizing exploration and diversity and demonstrate the effectiveness of AMPED in enabling robust and generalizable skill learning. Project Page: https://geonwoo.me/amped/
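
Gradient-surgery projection between two conflicting objectives can be sketched in a few lines. This is a PCGrad-style projection used purely as an illustration of the idea; AMPED's actual adaptive multi-objective projection may differ:

```python
import numpy as np

def project_conflict(g_explore, g_diverse):
    """If the exploration and diversity gradients conflict (negative inner
    product), project each onto the normal plane of the other before summing;
    otherwise combine them unchanged."""
    ge, gd = g_explore.copy(), g_diverse.copy()
    if g_explore @ g_diverse < 0:
        ge = g_explore - (g_explore @ g_diverse) / (g_diverse @ g_diverse) * g_diverse
        gd = g_diverse - (g_diverse @ g_explore) / (g_explore @ g_explore) * g_explore
    return ge + gd

g1 = np.array([1.0, 0.0])
g2 = np.array([-0.5, 1.0])       # conflicts with g1
g = project_conflict(g1, g2)
assert g1 @ g2 < 0
assert g @ g1 >= -1e-12 and g @ g2 >= -1e-12  # combined update hurts neither
```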

URLs: https://geonwoo.me/amped/

replace Curriculum Reinforcement Learning from Easy to Hard Tasks Improves LLM Reasoning

Authors: Shubham Parashar, Shurui Gui, Xiner Li, Hongyi Ling, Sushil Vemuri, Blake Olson, Eric Li, Yu Zhang, James Caverlee, Dileep Kalathil, Shuiwang Ji

Abstract: We aim to improve the reasoning capabilities of language models via reinforcement learning (RL). Recent RL post-trained models like DeepSeek-R1 have demonstrated reasoning abilities on mathematical and coding tasks. However, prior studies suggest that using RL alone to improve reasoning on inherently difficult tasks is less effective. Here, we draw inspiration from curriculum learning and propose to schedule tasks from easy to hard (E2H), allowing LLMs to build reasoning skills gradually. Our method is termed E2H Reasoner. Empirically, we observe that, although easy tasks are important initially, fading them out through appropriate scheduling is essential in preventing overfitting. Theoretically, we establish convergence guarantees for E2H Reasoner within an approximate policy iteration framework. We derive finite-sample complexity bounds and show that when tasks are appropriately decomposed and conditioned, learning through curriculum stages requires fewer total samples than direct learning. Experiments across multiple domains show that E2H Reasoner significantly improves the reasoning ability of small LLMs (1.5B to 3B), which otherwise struggle when trained with vanilla RL alone, highlighting the effectiveness of our method. Our code can be found on https://github.com/divelab/E2H-Reasoning.

URLs: https://github.com/divelab/E2H-Reasoning.

replace TRACED: Transition-aware Regret Approximation with Co-learnability for Environment Design

Authors: Geonwoo Cho, Jaegyun Im, Jihwan Lee, Hojun Yi, Sejin Kim, Sundong Kim

Abstract: Generalizing deep reinforcement learning agents to unseen environments remains a significant challenge. One promising solution is Unsupervised Environment Design (UED), a co-evolutionary framework in which a teacher adaptively generates tasks with high learning potential, while a student learns a robust policy from this evolving curriculum. Existing UED methods typically measure learning potential via regret, the gap between optimal and current performance, approximated solely by value-function loss. Building on these approaches, we introduce the transition-prediction error as an additional term in our regret approximation. To capture how training on one task affects performance on others, we further propose a lightweight metric called Co-Learnability. By combining these two measures, we present Transition-aware Regret Approximation with Co-learnability for Environment Design (TRACED). Empirical evaluations show that TRACED produces curricula that improve zero-shot generalization over strong baselines across multiple benchmarks. Ablation studies confirm that the transition-prediction error drives rapid complexity ramp-up and that Co-Learnability delivers additional gains when paired with the transition-prediction error. These results demonstrate how refined regret approximation and explicit modeling of task relationships can be leveraged for sample-efficient curriculum design in UED. Project Page: https://geonwoo.me/traced/

URLs: https://geonwoo.me/traced/

replace UniOD: A Universal Model for Outlier Detection across Diverse Domains

Authors: Dazhi Fu, Jicong Fan

Abstract: Outlier detection (OD), distinguishing inliers and outliers in completely unlabeled datasets, plays a vital role in science and engineering. Although there have been many insightful OD methods, most of them require troublesome hyperparameter tuning (a challenge in unsupervised learning) and costly model training for every task or dataset. In this work, we propose UniOD, a universal OD framework that leverages labeled datasets to train a single model capable of detecting outliers of datasets with different feature dimensions and heterogeneous feature spaces from diverse domains. Specifically, UniOD extracts uniform and comparable features across different datasets by constructing and factorizing multi-scale point-wise similarity matrices. It then employs graph neural networks to capture comprehensive within-dataset and between-dataset information simultaneously, and formulates outlier detection tasks as node classification tasks. As a result, once the training is complete, UniOD can identify outliers in datasets from diverse domains without any further model/hyperparameter selection and parameter optimization, which greatly improves convenience and accuracy in real applications. More importantly, we provide theoretical guarantees for the effectiveness of UniOD, consistent with our numerical results. We evaluate UniOD on 30 benchmark OD datasets against 17 baselines, demonstrating its effectiveness and superiority. Our code is available at https://github.com/fudazhiaka/UniOD.

URLs: https://github.com/fudazhiaka/UniOD.

replace Self-Improving Language Models for Evolutionary Program Synthesis: A Case Study on ARC-AGI

Authors: Julien Pourcel, C\'edric Colas, Pierre-Yves Oudeyer

Abstract: Many program synthesis tasks prove too challenging for even state-of-the-art language models to solve in single attempts. Search-based evolutionary methods offer a promising alternative by exploring solution spaces iteratively, but their effectiveness remains limited by the fixed capabilities of the underlying generative model. We propose SOAR, a method that learns program synthesis by integrating language models into a self-improving evolutionary loop. SOAR alternates between (1) an evolutionary search that uses an LLM to sample and refine candidate solutions, and (2) a hindsight learning phase that converts search attempts into valid problem-solution pairs used to fine-tune the LLM's sampling and refinement capabilities, enabling increasingly effective search in subsequent iterations. On the challenging ARC-AGI benchmark, SOAR achieves significant performance gains across model scales and iterations, leveraging positive transfer between the sampling and refinement finetuning tasks. These improvements carry over to test-time adaptation, enabling SOAR to solve 52% of the public test set. Our code is open-sourced at: https://github.com/flowersteam/SOAR

URLs: https://github.com/flowersteam/SOAR

replace Efficient Neural Combinatorial Optimization Solver for the Min-max Heterogeneous Capacitated Vehicle Routing Problem

Authors: Xuan Wu, Di Wang, Chunguo Wu, Kaifang Qi, Chunyan Miao, Yubin Xiao, Jian Zhang, You Zhou

Abstract: Numerous Neural Combinatorial Optimization (NCO) solvers have been proposed to address Vehicle Routing Problems (VRPs). However, most of these solvers focus exclusively on single-vehicle VRP variants, overlooking the more realistic min-max Heterogeneous Capacitated Vehicle Routing Problem (MMHCVRP), which involves multiple vehicles. Existing MMHCVRP solvers typically select a vehicle and its next node to visit at each decoding step, but often make myopic decoding decisions and overlook key properties of MMHCVRP, including local topological relationships, vehicle permutation invariance, and node symmetry, resulting in suboptimal performance. To better address these limitations, we propose ECHO, an efficient NCO solver. First, ECHO exploits the proposed dual-modality node encoder to capture local topological relationships among nodes. Subsequently, to mitigate myopic decisions, ECHO employs the proposed Parameter-Free Cross-Attention mechanism to prioritize the vehicle selected in the preceding decoding step. Finally, leveraging vehicle permutation invariance and node symmetry, we introduce a tailored data augmentation strategy for MMHCVRP to stabilize the Reinforcement Learning training process. To assess the performance of ECHO, we conduct extensive experiments. The experimental results demonstrate that ECHO outperforms state-of-the-art NCO solvers across varying numbers of vehicles and nodes, and exhibits strong generalization across both scales and distribution patterns. Finally, ablation studies validate the effectiveness of all proposed methods.

replace Co-rewarding: Stable Self-supervised RL for Eliciting Reasoning in Large Language Models

Authors: Zizhuo Zhang, Jianing Zhu, Xinmu Ge, Zihua Zhao, Zhanke Zhou, Xuan Li, Xiao Feng, Jiangchao Yao, Bo Han

Abstract: While reinforcement learning with verifiable rewards (RLVR) is effective to improve the reasoning ability of large language models (LLMs), its reliance on human-annotated labels leads to a scaling dilemma, especially for complex tasks. Recent self-rewarding methods investigate a label-free alternative to unlock the reasoning capabilities of LLMs, yet they frequently encounter the non-negligible training collapse issue, as the single-view supervision signal easily forms a self-consistent illusion, yielding reward hacking. Inspired by the success of self-supervised learning, we propose \textit{Co-rewarding}, a novel self-supervised RL framework that improves training stability by seeking complementary supervision from other views. Specifically, we instantiate Co-rewarding in two ways: (1) \textit{Co-rewarding-I} is a data-side instantiation that derives reward signals from contrastive agreement across semantically analogous questions; and (2) \textit{Co-rewarding-II} is a model-side instantiation that maintains a slowly-updated reference teacher with pseudo labels to realize self-distillation. Intuitively, such instantiations introduce different levels of discrepancy to increase the difficulty of training collapse on trivial reasoning solutions. We also explore their orthogonally combined version to further boost performance. Empirically, Co-rewarding exhibits stable training across various setups, and outperforms other self-rewarding baselines by $+3.31\%$ on average on multiple mathematical reasoning benchmarks, especially by $+7.49\%$ on Llama-3.2-3B-Instruct. Notably, Co-rewarding reaches or even surpasses RLVR with ground-truth (GT) labels in several cases, such as a Pass@$1$ of $94.01\%$ on GSM8K with Qwen3-8B-Base, remarkably higher than with GT labels. Our code is released at https://github.com/tmlr-group/Co-rewarding.

URLs: https://github.com/tmlr-group/Co-rewarding.

replace Retrieval-augmented Decoding for Improving Truthfulness in Open-ended Generation

Authors: Manh Nguyen, Sunil Gupta, Hung Le

Abstract: Ensuring truthfulness in large language models (LLMs) remains a critical challenge for reliable text generation. While supervised fine-tuning and reinforcement learning with human feedback have shown promise, they require a substantial amount of annotated data and computational resources, limiting scalability. In contrast, decoding-time interventions offer lightweight alternatives without model retraining. However, existing decoding strategies often face issues like prompt sensitivity, limited generalization, or dependence on internal model states. We propose Retrieval-Augmented Decoding (RAD), a context-aware adaptive decoding method that leverages a compact reference grounding space, built from as few as 10 annotated examples and comprising pairs of context embeddings and next-token logits from truthful responses, to enable retrieval-based logit shaping during inference. At each decoding step, RAD retrieves high-quality, semantically similar contexts from this grounding space and aggregates their associated next-token logits to modify the model's current logits. Across four open-ended generation benchmarks and four LLMs, our method consistently outperforms strong baselines and shows robust cross-task generalization, underscoring the promise of context-aware decoding for enhancing factual reliability.
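
The per-step retrieval-and-blend mechanism can be sketched as follows. The grounding-space contents, `k`, and the mixing weight `alpha` are hypothetical stand-ins; the abstract does not specify how RAD aggregates or weights the retrieved logits:

```python
import numpy as np

rng = np.random.default_rng(0)
d, vocab, n_ref = 16, 50, 10

# Grounding space: context embeddings paired with next-token logits
# taken from truthful responses (synthetic stand-ins here).
ref_ctx = rng.normal(size=(n_ref, d))
ref_logits = rng.normal(size=(n_ref, vocab))

def rad_step(ctx, model_logits, k=3, alpha=0.5):
    """Retrieve the k most similar reference contexts (cosine similarity)
    and blend their mean logits into the model's current logits."""
    sims = ref_ctx @ ctx / (np.linalg.norm(ref_ctx, axis=1) * np.linalg.norm(ctx))
    top = np.argsort(sims)[-k:]
    retrieved = ref_logits[top].mean(axis=0)
    return (1 - alpha) * model_logits + alpha * retrieved

ctx = rng.normal(size=d)                 # current decoding context embedding
shaped = rad_step(ctx, rng.normal(size=vocab))
assert shaped.shape == (vocab,)
```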

replace FeDaL: Federated Dataset Learning for General Time Series Foundation Models

Authors: Shengchao Chen, Guodong Long, Michael Blumenstein, Jing Jiang

Abstract: Dataset-level heterogeneity introduces significant domain biases that fundamentally degrade generalization of general Time Series Foundation Models (TSFMs), yet this challenge remains underexplored. This paper rethinks the from-scratch training of TSFMs using the paradigm of federated learning. We propose a novel Federated Dataset Learning (FeDaL) approach to tackle heterogeneous time series by learning dataset-agnostic temporal representations. Specifically, the distributed architecture of federated learning is a natural solution for decomposing heterogeneous TS datasets into shared generalized knowledge and preserved personalized knowledge. Moreover, based on the TSFM architecture, FeDaL explicitly mitigates both local and global biases by adding two complementary mechanisms: Domain Bias Elimination (DBE) and Global Bias Elimination (GBE). FeDaL's cross-dataset generalization has been extensively evaluated on real-world datasets spanning eight tasks (including various regression and classification tasks), against 54 baselines. We further analyze federated scaling behavior, showing how data volume, client count, and join rate affect model performance under decentralization. Our code is publicly available at https://github.com/shengchaochen82/FeDaL

URLs: https://github.com/shengchaochen82/FeDaL

replace Trust Region Constrained Measure Transport in Path Space for Stochastic Optimal Control and Inference

Authors: Denis Blessing, Julius Berner, Lorenz Richter, Carles Domingo-Enrich, Yuanqi Du, Arash Vahdat, Gerhard Neumann

Abstract: Solving stochastic optimal control problems with quadratic control costs can be viewed as approximating a target path space measure, e.g. via gradient-based optimization. In practice, however, this optimization is challenging, in particular if the target measure differs substantially from the prior. In this work, we therefore approach the problem by iteratively solving constrained problems incorporating trust regions that aim to approach the target measure gradually and systematically. It turns out that this trust region based strategy can be understood as a geometric annealing from the prior to the target measure, where the incorporated trust regions lead to a principled and educated way of choosing the time steps in the annealing path. We demonstrate in multiple optimal control applications that our novel method can improve performance significantly, including tasks in diffusion-based sampling, transition path sampling, and fine-tuning of diffusion models.

replace Generating solution paths of Markovian stochastic differential equations using diffusion models

Authors: Xuefeng Gao, Jiale Zha, Xun Yu Zhou

Abstract: This paper introduces a new approach to generating sample paths of unknown Markovian stochastic differential equations (SDEs) using diffusion models, a class of generative AI methods commonly employed in image and video applications. Unlike the traditional Monte Carlo methods for simulating SDEs, which require explicit specifications of the drift and diffusion coefficients, ours takes a model-free, data-driven approach. Given a finite set of sample paths from an SDE, we utilize conditional diffusion models to generate new, synthetic paths of the same SDE. Numerical experiments show that our method consistently outperforms two alternative methods in terms of the Kullback--Leibler (KL) divergence between the distributions of the target SDE paths and the generated ones. Moreover, we present a theoretical error analysis deriving an explicit bound on the said KL divergence. Finally, in simulation and empirical studies, we leverage these synthetically generated sample paths to boost the performance of reinforcement learning algorithms for continuous-time mean--variance portfolio selection, hinting promising applications of our study in financial analysis and decision-making.

replace Limitations of Public Chest Radiography Datasets for Artificial Intelligence: Label Quality, Domain Shift, Bias and Evaluation Challenges

Authors: Amy Rafferty, Ajitha Rajan

Abstract: Artificial intelligence has shown significant promise in chest radiography, where deep learning models can approach radiologist-level diagnostic performance. Progress has been accelerated by large public datasets such as MIMIC-CXR, ChestX-ray14, PadChest, and CheXpert, which provide hundreds of thousands of labelled images with pathology annotations. However, these datasets also present important limitations. Automated label extraction from radiology reports introduces errors, particularly in handling uncertainty and negation, and radiologist review frequently disagrees with assigned labels. In addition, domain shift and population bias restrict model generalisability, while evaluation practices often overlook clinically meaningful measures. We conduct a systematic analysis of these challenges, focusing on label quality, dataset bias, and domain shift. Our cross-dataset domain shift evaluation across multiple model architectures revealed substantial external performance degradation, with pronounced reductions in AUPRC and F1 scores relative to internal testing. To assess dataset bias, we trained a source-classification model that distinguished datasets with near-perfect accuracy, and performed subgroup analyses showing reduced performance for minority age and sex groups. Finally, expert review by two board-certified radiologists identified significant disagreement with public dataset labels. Our findings highlight important clinical weaknesses of current benchmarks and emphasise the need for clinician-validated datasets and fairer evaluation frameworks.

replace Null-Space Filtering for Data-Free Continual Model Merging: Preserving Stability, Promoting Plasticity

Authors: Zihuan Qiu, Lei Wang, Yang Cao, Runtong Zhang, Bing Su, Yi Xu, Fanman Meng, Linfeng Xu, Qingbo Wu, Hongliang Li

Abstract: Data-free continual model merging (DFCMM) aims to fuse independently fine-tuned models into a single backbone that evolves with incoming tasks without accessing task data. This paper revisits two fundamental desiderata for DFCMM: stability, avoiding interference with earlier tasks, and plasticity, adapting faithfully to each new task. This poses a challenge that existing approaches fail to address: how to bridge data-level desiderata with parameter-space optimization to ensure stability and plasticity in the absence of task data. To this end, we propose NUFILT (NUll-space FILTering), a data-free framework that directly links these desiderata into parameter-space optimization. Our key observation is that task vectors approximately align with representation subspaces, providing structural surrogates for enforcing stability and plasticity. Accordingly, we design a null-space projector that preserves prior responses by filtering overlapping components of new task vectors, ensuring stability. We further introduce a lightweight LoRA adapter that injects complementary task-specific signals to enable plasticity. The adapter is trained with a projection-based surrogate loss that preserves consistency with prior knowledge while introducing novel directions. This joint filtering-adaptation process enables the backbone to absorb new knowledge while retaining existing behaviors, with updates fused back in a layer-wise linear fashion without extra parameters or inference cost. Theoretically, we establish approximate subspace alignment guarantees that justify null-space filtering. Empirically, NUFILT achieves state-of-the-art performance with minimal forgetting on both vision and NLP benchmarks, improving average accuracy by 4-7% over OPCM and WUDI-Merging, while narrowing the gap to fine-tuning and reducing computation overhead. The code is available at: https://github.com/zihuanqiu/NUFILT
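The null-space filtering idea described above can be sketched in a few lines of numpy: remove from a new task vector the components lying in the subspace spanned by prior task vectors (an illustration under simplifying assumptions, not the authors' implementation; variable names are ours):

```python
import numpy as np

def null_space_filter(prior_task_vectors, new_task_vector, tol=1e-10):
    """Project a new task vector onto the orthogonal complement (null
    space) of the prior task subspace, filtering overlapping components
    so that responses on earlier tasks are preserved (stability)."""
    A = np.stack(prior_task_vectors)                # (k, d) prior task vectors
    _, S, Vt = np.linalg.svd(A, full_matrices=False)
    B = Vt[S > tol]                                 # orthonormal basis of prior subspace
    overlap = B.T @ (B @ new_task_vector)           # component inside the prior subspace
    return new_task_vector - overlap                # component in the null space

rng = np.random.default_rng(0)
priors = [rng.normal(size=8) for _ in range(3)]
tau_new = rng.normal(size=8)
filtered = null_space_filter(priors, tau_new)
# The filtered update is orthogonal to every prior task vector.
print(max(abs(float(np.dot(p, filtered))) for p in priors))
```

The complementary task-specific signal that NUFILT's LoRA adapter injects for plasticity is not shown here.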

URLs: https://github.com/zihuanqiu/NUFILT

replace Deterministic Policy Gradient for Reinforcement Learning with Continuous Time and State

Authors: Ziheng Cheng, Xin Guo, Yufei Zhang

Abstract: The theory of continuous-time reinforcement learning (RL) has progressed rapidly in recent years. While the ultimate objective of RL is typically to learn deterministic control policies, most existing continuous-time RL methods rely on stochastic policies. Such approaches often require sampling actions at very high frequencies, and involve computationally expensive expectations over continuous action spaces, resulting in high-variance gradient estimates and slow convergence. In this paper, we introduce and develop deterministic policy gradient (DPG) methods for continuous-time RL. We derive a continuous-time policy gradient formula expressed as the expected gradient of an advantage rate function and establish a martingale characterization for both the value function and the advantage rate. These theoretical results provide tractable estimators for deterministic policy gradients in continuous-time RL. Building on this foundation, we propose a model-free continuous-time Deep Deterministic Policy Gradient (CT-DDPG) algorithm that enables stable learning for general reinforcement learning problems with continuous time-and-state. Numerical experiments show that CT-DDPG achieves superior stability and faster convergence compared to existing stochastic-policy methods, across a wide range of learning tasks with varying time discretizations and noise levels.

replace XQC: Well-conditioned Optimization Accelerates Deep Reinforcement Learning

Authors: Daniel Palenicek, Florian Vogt, Joe Watson, Ingmar Posner, Jan Peters

Abstract: Sample efficiency is a central property of effective deep reinforcement learning algorithms. Recent work has improved this through added complexity, such as larger models, exotic network architectures, and more complex algorithms, which are typically motivated purely by empirical performance. We take a more principled approach by focusing on the optimization landscape of the critic network. Using the eigenspectrum and condition number of the critic's Hessian, we systematically investigate the impact of common architectural design decisions on training dynamics. Our analysis reveals that a novel combination of batch normalization (BN), weight normalization (WN), and a distributional cross-entropy (CE) loss produces condition numbers orders of magnitude smaller than baselines. This combination also naturally bounds gradient norms, a property critical for maintaining a stable effective learning rate under non-stationary targets and bootstrapping. Based on these insights, we introduce XQC: a well-motivated, sample-efficient deep actor-critic algorithm built upon soft actor-critic that embodies these optimization-aware principles. We achieve state-of-the-art sample efficiency across 55 proprioception and 15 vision-based continuous control tasks, all while using significantly fewer parameters than competing methods. Our code is available at danielpalenicek.github.io/projects/xqc.
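The diagnostic this abstract relies on, the condition number of the critic's Hessian eigenspectrum, is straightforward to compute on a toy example (a sketch of the metric itself, not of the XQC algorithm):

```python
import numpy as np

def condition_number(hessian, eps=1e-12):
    """Condition number from the eigenspectrum of a symmetric Hessian:
    ratio of the largest to the smallest eigenvalue magnitude."""
    mags = np.abs(np.linalg.eigvalsh(hessian))
    return mags.max() / max(mags.min(), eps)

# Toy quadratic losses: an ill-conditioned vs. a well-conditioned Hessian.
H_bad = np.diag([1e4, 1.0, 1e-2])   # curvature spread over 6 orders of magnitude
H_good = np.eye(3)                  # perfectly conditioned
print(condition_number(H_bad), condition_number(H_good))
```

Gradient-descent-style updates on the ill-conditioned loss must use a step size dictated by the largest curvature, which is the optimization pathology the paper's BN + WN + CE combination is reported to mitigate.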

replace Machine Learning Detection of Lithium Plating in Lithium-ion Cells: A Gaussian Process Approach

Authors: Ayush Patnaik, Jackson Fogelquist, Adam B Zufall, Yiwei Ji, Stephen K Robinson, Peng Bai, Xinfan Lin

Abstract: Lithium plating during fast charging is a critical degradation mechanism that accelerates capacity fade and can trigger catastrophic safety failures. Recent work has shown that plating onset can manifest in incremental-capacity analysis as an additional high-voltage feature above 4.0 V, often appearing as a secondary peak or shoulder distinct from the main intercalation peak complex; however, conventional methods for computing dQ/dV rely on finite differencing with filtering, which amplifies sensor noise and introduces bias in feature location. In this paper, we propose a Gaussian Process (GP) framework for lithium plating detection by directly modeling the charge-voltage relationship Q(V) as a stochastic process with calibrated uncertainty. Leveraging the property that derivatives of GPs remain GPs, we infer dQ/dV analytically and probabilistically from the posterior, enabling robust detection without ad hoc smoothing. The framework provides three key benefits: (i) noise-aware inference with hyperparameters learned from data, (ii) closed-form derivatives with credible intervals for uncertainty quantification, and (iii) scalability to online variants suitable for embedded BMS. Experimental validation on Li-ion coin cells across a range of C-rates (0.2C-1C) and temperatures (0-40$^\circ$C) demonstrates that the GP-based method reliably resolves distinct high-voltage secondary peak features under low-temperature, high-rate charging, while correctly reporting no features in non-plating cases. The concurrence of GP-identified differential features, reduced charge throughput, capacity fade measured via reference performance tests, and post-mortem microscopy confirmation supports the interpretation of these signatures as plating-related, establishing a practical pathway for real-time lithium plating detection.
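The key analytic fact used above, that derivatives of GPs remain GPs, means dQ/dV can be read off the posterior in closed form rather than by noisy finite differencing. A minimal sketch for an RBF kernel with fixed hyperparameters (the paper learns them from data):

```python
import numpy as np

def rbf(a, b, s2=1.0, ell=0.2):
    """Squared-exponential kernel k(a, b) = s2 * exp(-(a-b)^2 / (2 ell^2))."""
    d = a[:, None] - b[None, :]
    return s2 * np.exp(-0.5 * (d / ell) ** 2)

def gp_dQdV(V_train, Q_train, V_star, noise=1e-4, s2=1.0, ell=0.2):
    """Analytic derivative of the GP posterior mean: differentiate the
    cross-kernel k(V*, V) and reuse alpha = (K + noise*I)^{-1} Q."""
    K = rbf(V_train, V_train, s2, ell) + noise * np.eye(len(V_train))
    alpha = np.linalg.solve(K, Q_train)
    d = V_star[:, None] - V_train[None, :]
    dk = -(d / ell ** 2) * rbf(V_star, V_train, s2, ell)  # dk(V*, V)/dV*
    return dk @ alpha

# Toy charge curve Q(V) = sin(5 V); its true derivative is 5 cos(5 V).
V = np.linspace(0.0, 1.0, 60)
Q = np.sin(5 * V)
Vs = np.array([0.5])
print(gp_dQdV(V, Q, Vs), 5 * np.cos(2.5))
```

No ad hoc smoothing is applied, and the same posterior also yields credible intervals on dQ/dV (omitted here for brevity).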

replace Understanding Sensitivity of Differential Attention through the Lens of Adversarial Robustness

Authors: Tsubasa Takahashi, Shojiro Yamabe, Futa Waseda, Kento Sasaki

Abstract: Differential Attention (DA) has been proposed as a refinement to standard attention, suppressing redundant or noisy context through a subtractive structure and thereby reducing contextual hallucination. While this design sharpens task-relevant focus, we show that it also introduces a structural fragility under adversarial perturbations. Our theoretical analysis identifies negative gradient alignment, a configuration encouraged by DA's subtraction, as the key driver of sensitivity amplification, leading to increased gradient norms and elevated local Lipschitz constants. We empirically validate this Fragile Principle through systematic experiments on ViT/DiffViT and evaluations of pretrained CLIP/DiffCLIP, spanning five datasets in total. These results demonstrate higher attack success rates, frequent gradient opposition, and stronger local sensitivity compared to standard attention. Furthermore, depth-dependent experiments reveal a robustness crossover: stacking DA layers attenuates small perturbations via depth-dependent noise cancellation, though this protection fades under larger attack budgets. Overall, our findings uncover a fundamental trade-off: DA improves discriminative focus on clean inputs but increases adversarial vulnerability, underscoring the need to jointly design for selectivity and robustness in future attention mechanisms.

replace TsLLM: Augmenting LLMs for General Time Series Understanding and Prediction

Authors: Felix Parker, Nimeesha Chan, Chi Zhang, Kimia Ghobadi

Abstract: Time series data is fundamental to decision-making across many domains including healthcare, finance, power systems, and logistics. However, analyzing this data correctly often requires incorporating unstructured contextual information, answering domain-specific questions, and generating natural language explanations - capabilities that traditional time series models lack. While Large Language Models (LLMs) excel at contextual reasoning and knowledge integration, they struggle with numerical time series due to inefficient text-based representations and limited exposure to numerical data during pretraining. We address this gap by augmenting an LLM with specialized time series perception through a patch-based encoder-decoder architecture. We train this Time Series augmented LLM (TsLLM) on a large corpus of over 25 billion tokens of interleaved time series and text spanning diverse tasks: forecasting with contextual information, question-answering, anomaly detection, classification, report generation, and more, all unified as next token prediction. This training enables TsLLM to leverage both its language understanding and newly acquired temporal reasoning capabilities. While not designed to surpass specialized models on traditional benchmarks, TsLLM demonstrates strong performance on tasks requiring the integration of time series analysis with natural language - capabilities that existing approaches cannot provide. It also exhibits strong zero-shot and few-shot performance, showing it can adapt to new data without additional training.

replace Eliciting Chain-of-Thought Reasoning for Time Series Analysis using Reinforcement Learning

Authors: Felix Parker, Nimeesha Chan, Chi Zhang, Kimia Ghobadi

Abstract: Complex numerical time series analysis often demands multi-step reasoning capabilities beyond current models' reach. Tasks like medical diagnosis and weather forecasting require sequential reasoning processes - including counterfactual analysis, logical deduction, knowledge application, and multi-modal contextual integration - that existing time series models cannot explicitly perform. While recent research has shown large language models (LLMs) can achieve sophisticated Chain-of-Thought (CoT) reasoning through reinforcement learning (RL), these advances have primarily focused on mathematical and coding domains, with LLMs still demonstrating poor performance on time series tasks. We introduce Chain Of thought for Understanding Numerical Time Series (COUNTS), the first framework that trains LLMs to perform CoT reasoning across diverse time series tasks using RL with verifiable rewards. Our approach employs a Residual Vector-Quantized VAE to create high-fidelity discrete tokens that seamlessly integrate into a pre-trained LLM's vocabulary. COUNTS undergoes a two-stage training process: first, supervised fine-tuning on time series analysis tasks to master our novel representations, followed by Group Relative Policy Optimization training on verifiable problems using prompting strategies that encourage explicit reasoning steps before producing final answers. Our experiments demonstrate that this RL-driven approach with intermediate CoT reasoning significantly enhances LLM performance across various time series analysis tasks, opening new possibilities for complex temporal data reasoning.

replace Convergence of Distributionally Robust Q-Learning with Linear Function Approximation

Authors: Saptarshi Mandal, Yashaswini Murthy, R. Srikant

Abstract: Distributionally robust reinforcement learning (DRRL) focuses on designing policies that achieve good performance under model uncertainties. The goal is to maximize the worst-case long-term discounted reward, where the data for RL comes from a nominal model while the deployed environment can deviate from the nominal model within a prescribed uncertainty set. Existing convergence guarantees for DRRL are limited to tabular MDPs or are dependent on restrictive discount factor assumptions when function approximation is used. We present a convergence result for a robust Q-learning algorithm with linear function approximation without any discount factor restrictions. In this paper, the robustness is measured with respect to the total-variation distance uncertainty set. Our model-free algorithm does not require generative access to the MDP and achieves an $\tilde{\mathcal{O}}(1/\epsilon^{4})$ sample complexity for an $\epsilon$-accurate value estimate. Our results close a key gap between the empirical success of robust RL algorithms and the non-asymptotic guarantees enjoyed by their non-robust counterparts. The key ideas in the paper also extend in a relatively straightforward fashion to robust Temporal-Difference (TD) learning with function approximation. The robust TD learning algorithm is discussed in the Appendix.

replace Slow-Fast Policy Optimization: Reposition-Before-Update for LLM Reasoning

Authors: Ziyan Wang, Zheng Wang, Jie Fu, Xingwei Qu, Qi Cheng, Shengpu Tang, Minjia Zhang, Xiaoming Huo

Abstract: Reinforcement learning (RL) has become central to enhancing reasoning in large language models (LLMs). Yet on-policy algorithms such as Group Relative Policy Optimization (GRPO) often suffer in early training: noisy gradients from low-quality rollouts lead to unstable updates and inefficient exploration. We introduce Slow-Fast Policy Optimization (SFPO), a simple yet efficient framework to address the above limitations via decomposing each step into three stages: a short fast trajectory of inner steps on the same batch, a reposition mechanism to control off-policy drift, and a final slow correction. This reposition-before-update design preserves the objective and rollout process unchanged, making SFPO plug-compatible with existing policy-gradient pipelines. Extensive experiments demonstrate that SFPO consistently improves stability, reduces the number of rollouts, and accelerates convergence of reasoning RL training. Specifically, it outperforms GRPO by up to 2.80 points on average on math reasoning benchmarks. It also achieves up to 4.93× fewer rollouts and an up to 4.19× reduction in wall-clock time to match GRPO's best accuracy. Project website is available at https://slow-fast-po.github.io/.

URLs: https://slow-fast-po.github.io/.

replace Post-hoc Stochastic Concept Bottleneck Models

Authors: Wiktor Jan Hoffmann, Sonia Laguna, Moritz Vandenhirtz, Emanuele Palumbo, Julia E. Vogt

Abstract: Concept Bottleneck Models (CBMs) are interpretable models that predict the target variable through high-level human-understandable concepts, allowing users to intervene on mispredicted concepts to adjust the final output. While recent work has shown that modeling dependencies between concepts can improve CBM performance, especially under interventions, such approaches typically require retraining the entire model, which may be infeasible when access to the original data or compute is limited. In this paper, we introduce Post-hoc Stochastic Concept Bottleneck Models (PSCBMs), a lightweight method that augments any pre-trained CBM with a multivariate normal distribution over concepts by adding only a small covariance-prediction module, without retraining the backbone model. We propose two training strategies and show on real-world data that PSCBMs consistently match or improve both concept and target accuracy over standard CBMs at test time. Furthermore, we show that due to the modeling of concept dependencies, PSCBMs perform much better than CBMs under interventions, while remaining far more efficient than retraining a similar stochastic model from scratch.
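The intervention benefit described above follows from standard Gaussian conditioning: once concepts carry a multivariate normal distribution, fixing an intervened concept propagates to correlated concepts in closed form (a schematic sketch of that mechanism, not the PSCBM training procedure; names are ours):

```python
import numpy as np

def intervene(mu, Sigma, idx, values):
    """Condition a multivariate normal over concepts on intervened values:
    the remaining concepts shift by S_ri S_ii^{-1} (v - mu_i)."""
    d = len(mu)
    i = np.array(idx)
    rest = np.array([j for j in range(d) if j not in idx])
    S_ii = Sigma[np.ix_(i, i)]
    S_ri = Sigma[np.ix_(rest, i)]
    out = mu.copy()
    out[i] = values
    out[rest] = mu[rest] + S_ri @ np.linalg.solve(S_ii, values - mu[i])
    return out

mu = np.zeros(3)
Sigma = np.array([[1.0, 0.8, 0.0],
                  [0.8, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
# Intervening on concept 0 shifts the correlated concept 1, not concept 2.
print(intervene(mu, Sigma, [0], np.array([1.0])))
```

An independence-assuming CBM would leave concepts 1 and 2 unchanged, which is why modeling the covariance helps under interventions.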

replace A Functional Perspective on Knowledge Distillation in Neural Networks

Authors: Israel Mason-Williams, Gabryel Mason-Williams, Helen Yannakoudakis

Abstract: Knowledge distillation is considered a compression mechanism when judged on the resulting student's accuracy and loss, yet its functional impact is poorly understood. We quantify the compression capacity of knowledge distillation and the resulting knowledge transfer from a functional perspective, decoupling compression from architectural reduction to provide an improved understanding of knowledge distillation. We employ a control-driven experimental protocol with hypothesis testing and random control distillation to isolate and understand knowledge transfer mechanisms across data modalities. To test the breadth and limits of our analyses, we study self-distillation, standard distillation, feature-map matching variants, distillation scaling laws across model sizes, and the impact of temperature on knowledge transfer. We find statistically supported knowledge transfer in some modalities and architectures; however, the extent of this transfer is less pronounced than anticipated, even under conditions that maximise knowledge sharing. Notably, in cases of significant functional transfer, we identify a consistent and severe asymmetric transfer of negative knowledge to the student, raising safety concerns for knowledge distillation. Across 22 experimental setups, 9 architectures, and 7 datasets, our results suggest that knowledge distillation functions less as a robust compression-by-transfer mechanism and more as a data-dependent regulariser whose transfer component is biased towards negative asymmetric transfer.

replace Towards Understanding Valuable Preference Data for Large Language Model Alignment

Authors: Zizhuo Zhang, Qizhou Wang, Shanshan Ye, Jianing Zhu, Jiangchao Yao, Bo Han, Masashi Sugiyama

Abstract: Large language model (LLM) alignment is typically achieved through learning from human preference comparisons, making the quality of preference data critical to its success. Existing studies often pre-process raw training datasets to identify valuable preference pairs using external reward models or off-the-shelf LLMs, achieving improved overall performance but rarely examining whether each individual selected data point is genuinely beneficial. We assess data quality through individual influence on validation data using our newly proposed truncated influence function (TIF), which mitigates the over-scoring present in traditional measures and reveals that preference data quality is inherently a property of the model. In other words, a data pair that benefits one model may harm another, so preference data selection approaches need to adapt to specific models. To this end, we introduce two candidate scoring functions (SFs) that are computationally simpler than TIF and positively correlated with it. They are also model-dependent and can serve as potential indicators of individual data quality for preference data selection. However, we observe that these SFs inherently exhibit errors when compared to TIF. We therefore combine them to offset their diverse error sources, resulting in a simple yet effective data selection rule that enables the models to achieve a more precise selection of valuable preference data. We conduct experiments across diverse alignment benchmarks and various LLM families, with results demonstrating that better alignment performance can be achieved using less data, showing the generality of our findings and new methods. Our code is publicly available at https://github.com/tmlr-group/TIF_LossDiff-IRM.

URLs: https://github.com/tmlr-group/TIF_LossDiff-IRM.

replace Feature-driven reinforcement learning for photovoltaic in continuous intraday trading

Authors: Arega Getaneh Abate, Xiao-Bing Zhang, Xiufeng Liu, Ruyu Liu

Abstract: Sequential intraday electricity trading allows photovoltaic (PV) operators to reduce imbalance settlement costs as forecasts improve throughout the day. Yet deployable trading policies must jointly handle forecast uncertainty, intraday prices, liquidity, and the asymmetric economics of PV imbalance exposure. This paper proposes a feature-driven reinforcement learning (FDRL) framework for intraday PV trading in the Nordic market. Its main methodological contribution is a corrected reward that evaluates performance relative to a no-trade baseline, removing policy-independent noise that can otherwise push reinforcement learning toward inactive policies in high-price regimes. The framework combines this objective with a predominantly linear policy and a closed-form execution surrogate for efficient, interpretable training. In a strict walk-forward evaluation over 2021-2024 across four Nordic bidding zones (DK1, DK2, SE3, SE4), the method delivers statistically significant profit improvements over the spot-only baseline in every zone. Portfolio experiments show that a pooled cross-zone policy can match zone-specific models, while transfer-learning results indicate a two-cluster market structure and effective deployment in new zones with limited local data. The proposed framework offers an interpretable and computationally practical way to reduce imbalance costs, while the transfer results provide guidance for scaling strategies across bidding zones with different market designs.

replace Near-Equilibrium Propagation training in nonlinear wave systems

Authors: Karol Sajnok, Michał Matuszewski

Abstract: The backpropagation learning algorithm, the workhorse of modern artificial intelligence, is notoriously difficult to implement in physical neural networks. Equilibrium Propagation (EP) is an alternative with comparable efficiency and strong potential for in-situ training. We extend EP learning to both discrete and continuous complex-valued wave systems. In contrast to previous EP implementations, our scheme is valid in the weakly dissipative regime and readily applicable to a wide range of physical settings, even without well-defined nodes, where trainable inter-node connections can be replaced by a trainable local potential. We test the method in driven-dissipative exciton-polariton condensates governed by generalized Gross-Pitaevskii dynamics. Numerical studies on standard benchmarks, including a simple logical task and handwritten-digit recognition, demonstrate stable convergence, establishing a practical route to in-situ learning in physical systems in which system control is restricted to local parameters.

replace Justitia: Fair and Efficient Scheduling of Task-parallel LLM Agents with Selective Pampering

Authors: Mingyan Yang, Guanjie Wang, Manqi Luo, Yifei Liu, Chen Chen, Han Zhao, Yu Feng, Quan Chen, Minyi Guo

Abstract: LLM agents, which often comprise parallel inference tasks, are commonly adopted to solve real-world problems. When serving such task-parallel LLM agents in shared GPU servers, the scheduler is expected to attain fast agent completion with guaranteed worst-case performance. For that objective, our insight is to selectively pamper agents based on their completion order under idealized fair sharing. We design Justitia, a fair and also efficient scheduler for task-parallel LLM agents. Noticing that memory is prevalently a bottleneck in LLM serving, Justitia quantifies the true agent cost in a memory-centric manner. It also adopts a lightweight yet accurate method to predict agent costs. Finally, Justitia adopts a virtual-time based fair queuing algorithm to improve overall performance with a guaranteed worst-case delay. We have implemented Justitia atop vLLM, and experimental results involving diverse agents show that it can substantially enhance the scheduling efficiency with fairness preserved.
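Virtual-time fair queuing, the classical mechanism this abstract builds on, can be sketched in its simplest form: each agent's next task receives a virtual finish time of its last finish plus cost divided by weight, and tasks are served in finish-time order (a heavily simplified illustration, not Justitia's memory-centric scheduler):

```python
def fair_order(tasks):
    """Order tasks by virtual finish time, fair-queuing style.
    Each task is (agent, cost, weight); an agent's next task finishes at
    F = last_finish + cost / weight, so higher-weight agents advance
    more slowly in virtual time and are served more often."""
    last_finish = {}
    tagged = []
    for arrival, (agent, cost, weight) in enumerate(tasks):
        finish = last_finish.get(agent, 0.0) + cost / weight
        last_finish[agent] = finish
        tagged.append((finish, arrival, agent))   # arrival index breaks ties
    return [agent for _, _, agent in sorted(tagged)]

# Agent "a" has twice the weight of "b", so its equal-cost tasks accrue
# virtual time at half the rate and are scheduled preferentially.
tasks = [("a", 10.0, 2.0), ("b", 10.0, 1.0), ("a", 10.0, 2.0)]
print(fair_order(tasks))
```

Justitia's contribution lies in how the per-task cost and the selective pampering are defined on top of such a queue, which this sketch does not attempt to reproduce.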

replace Boosted GFlowNets: Improving Exploration via Sequential Learning

Authors: Pedro Dall'Antonia, Tiago da Silva, Daniel Augusto de Souza, César Lincoln C. Mattos, Diego Mesquita

Abstract: Generative Flow Networks (GFlowNets) are powerful samplers for compositional objects that, by design, sample proportionally to a given non-negative reward. Nonetheless, in practice, they often struggle to explore the reward landscape evenly: trajectories toward easy-to-reach regions dominate training, while hard-to-reach modes receive vanishing or uninformative gradients, leading to poor coverage of high-reward areas. We address this imbalance with Boosted GFlowNets, a method that sequentially trains an ensemble of GFlowNets, each optimizing a residual reward that compensates for the mass already captured by previous models. This residual principle reactivates learning signals in underexplored regions and, under mild assumptions, ensures a monotone non-degradation property: adding boosters cannot worsen the learned distribution and typically improves it. Empirically, Boosted GFlowNets achieve substantially better exploration and sample diversity on multimodal synthetic benchmarks and peptide design tasks, while preserving the stability and simplicity of standard trajectory-balance training.
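The residual principle described above is simple to state concretely: each new booster targets the part of the reward not yet captured by earlier models, clipped at zero (a schematic sketch on a discrete toy problem; the paper's exact parametrization may differ):

```python
import numpy as np

def residual_reward(reward, boosters):
    """Residual target for the next booster: reward mass not yet captured
    by previously trained models, floored at zero so it stays a valid
    non-negative reward."""
    captured = sum(boosters) if boosters else np.zeros_like(reward)
    return np.maximum(reward - captured, 0.0)

# Two-mode toy target: the first GFlowNet mostly captures the first mode,
# so the second booster's objective concentrates on the underexplored one.
R = np.array([4.0, 0.1, 3.0])          # rewards of three compositional objects
booster1 = np.array([3.8, 0.1, 0.5])   # unnormalized mass of the first model
print(residual_reward(R, [booster1]))
```

With no boosters trained yet, the residual is the original reward, which is consistent with the non-degradation property: adding a booster can only fill remaining mass, never remove what earlier models captured.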

replace Tractable Probabilistic Models for Investment Planning

Authors: Nicolas M. Cuadrado A., Mohannad Takrouri, Jiří Němeček, Martin Takáč, Jakub Mareček

Abstract: Investment planning in power utilities, such as generation and transmission expansion, requires decisions under substantial uncertainty over decade-long horizons for policies, demand, renewable availability, and outages, while maintaining reliability and computational tractability. Conventional approaches approximate uncertainty using finite scenario sets (modeled as a mixture of Diracs in statistical theory terms), which can become computationally intensive as scenario detail increases and provide limited probabilistic resolution for reliability assessment. We propose an alternative based on tractable probabilistic models, using sum-product networks (SPNs) to represent high-dimensional uncertainty in a compact, analytically tractable form that supports exact probabilistic queries (e.g., likelihoods, marginals, and conditionals). This framework enables the direct embedding of chance constraints into mixed-integer linear programming (MILP) models for investment planning to evaluate reliability events and enforce probabilistic feasibility requirements without enumerating large scenario trees. We demonstrate the approach on a representative planning case study and report reliability-cost trade-offs and computational behavior relative to standard scenario-based formulations.
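The exact-query property that makes SPNs attractive here can be seen on a toy network: marginals are obtained by setting a variable's leaves to 1, so likelihoods, marginals, and conditionals all come from single bottom-up passes (a minimal illustration, not the paper's planning model):

```python
# Tiny SPN over two binary variables X1, X2:
#   root = 0.6 * (P1(x1) * P2(x2)) + 0.4 * (Q1(x1) * Q2(x2))
def leaf(p):
    """Bernoulli leaf; passing None marginalizes the variable (leaf -> 1)."""
    return lambda x: 1.0 if x is None else (p if x == 1 else 1.0 - p)

P1, P2 = leaf(0.9), leaf(0.2)
Q1, Q2 = leaf(0.1), leaf(0.8)

def spn(x1, x2):
    return 0.6 * P1(x1) * P2(x2) + 0.4 * Q1(x1) * Q2(x2)

# Exact probabilistic queries, each in one evaluation:
joint = spn(1, 1)                  # P(X1=1, X2=1)
marginal = spn(1, None)            # P(X1=1), X2 marginalized out
conditional = joint / marginal     # P(X2=1 | X1=1)
print(joint, marginal, conditional)
```

It is this closed-form access to event probabilities that lets chance constraints be embedded in a MILP without enumerating scenario trees.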

replace Masked Auto-Regressive Variational Acceleration: Fast Inference Makes Practical Reinforcement Learning

Authors: Yuxuan Gu, Weimin Bai, Yifei Wang, Weijian Luo, He Sun

Abstract: Masked auto-regressive diffusion models (MAR) benefit from the expressive modeling ability of diffusion models and the flexibility of masked auto-regressive ordering. However, vanilla MAR suffers from slow inference due to its hierarchical inference mechanism: an outer AR unmasking loop and an inner diffusion denoising chain. This decoupled structure not only harms generation efficiency but also hinders the practical use of MAR for reinforcement learning (RL), an increasingly critical paradigm for generative model post-training. To address this fundamental issue, we introduce MARVAL (Masked Auto-regressive Variational Acceleration), a distillation-based framework that compresses the diffusion chain into a single AR generation step while preserving the flexible auto-regressive unmasking order. Such distillation with MARVAL not only yields substantial inference acceleration but, crucially, makes RL post-training with verifiable rewards practical, resulting in scalable yet human-preferred fast generative models. Our contributions are twofold: (1) a novel score-based variational objective for distilling masked auto-regressive diffusion models into a single generation step without sacrificing sample quality; and (2) an efficient RL framework for masked auto-regressive models via MARVAL-RL. On ImageNet 256×256, MARVAL-Huge achieves an FID of 2.00 with a more than 30-fold speedup compared with MAR-diffusion, and MARVAL-RL yields consistent improvements in CLIP and image-reward scores on ImageNet datasets with entity names. In conclusion, MARVAL demonstrates the first practical path to distillation and RL of masked auto-regressive diffusion models, enabling fast sampling and better preference alignment.

replace CHIPS: Efficient CLIP Adaptation via Curvature-aware Hybrid Influence-based Data Selection

Authors: Xinlin Zhuang, Yichen Li, Xiwei Liu, Haolin Yang, Yifan Lu, Ziyun Zou, Yulong Li, Huifa Li, Dongliang Chen, Qinglei Wang, Weiyang Liu, Ying Qian, Jiangming Shi, Imran Razzak

Abstract: Adapting CLIP to vertical domains is typically approached by novel fine-tuning strategies or by continual pre-training (CPT) on large domain-specific datasets. Yet, data itself remains an underexplored factor in this process. We revisit this task from a data-centric perspective: Can effective data selection substitute for large-scale datasets in CPT? We introduce CHIPS (Curvature-aware Hybrid Influence in Projection Subspace), which assigns each image-text pair a utility score that integrates three complementary factors aligned with three goals: faithfulness via a curvature-aware and Newton-style alignment computed in CLIP's end-point subspace; scalability via an InfoNCE-aware curvature estimator with Johnson-Lindenstrauss (JL) sketching; and retention via a selection-aware relevance weight combined with learnability to balance target adaptation against general-domain preservation. We justify this design theoretically by proving a lower-bound guarantee on the proxy's correlation with full-parameter alignment and by characterizing the bias-variance trade-offs introduced by curvature mixing and JL sketching. We evaluate CHIPS empirically across various settings: 1) CHIPS attains state-of-the-art performance among selection baselines on 17 medical benchmarks, matches full-dataset CPT with 30% of the data, and outperforms half-dataset CPT using only 10%; 2) on 31 general-domain benchmarks, CHIPS yields the least performance drop under all retention ratios.

replace TimesNet-Gen: Deep Learning-based Site Specific Strong Motion Generation

Authors: Baris Yilmaz, Bevan Deniz Cilgin, Erdem Akagündüz, Salih Tileylioglu

Abstract: Effective earthquake risk reduction relies on accurate site-specific evaluations, which require models capable of representing the influence of local site conditions on ground motion characteristics. In this context, data-driven approaches that learn site-controlled signatures from recorded ground motions offer a promising direction. We address strong ground motion generation from time-domain accelerometer records and introduce TimesNet-Gen, a time-domain conditional generator. The proposed approach employs a latent bottleneck with station identity conditioning. Model performance is evaluated by comparing horizontal-to-vertical spectral ratio (HVSR) curves and fundamental site frequency ($f_0$) distributions between real and generated records on a station-wise basis. Station specificity is further summarized using a score derived from confusion matrices of the $f_0$ distributions. The results demonstrate strong station-wise alignment and favorable comparison with a spectrogram-based conditional variational autoencoder baseline for site-specific strong motion synthesis. Our code is available at https://github.com/brsylmz23/TimesNet-Gen.

URLs: https://github.com/brsylmz23/TimesNet-Gen.
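The HVSR evaluation metric used above can be illustrated on a synthetic three-component record; this NumPy sketch is not the paper's pipeline, and the smoothing window, frequency band, and 2 Hz resonance are illustrative assumptions.

```python
import numpy as np

fs, dur = 100.0, 60.0                        # sampling rate (Hz), duration (s)
t = np.arange(0, dur, 1 / fs)
rng = np.random.default_rng(1)

# Synthetic record with a 2 Hz site resonance on the horizontal components.
h1 = np.sin(2 * np.pi * 2.0 * t) + 0.3 * rng.normal(size=t.size)
h2 = np.cos(2 * np.pi * 2.0 * t) + 0.3 * rng.normal(size=t.size)
v = 0.3 * rng.normal(size=t.size)            # flat-spectrum vertical component

def amp_spectrum(x):
    return np.abs(np.fft.rfft(x))

def smooth(a, w=9):                          # simple moving-average smoothing
    return np.convolve(a, np.ones(w) / w, mode="same")

freqs = np.fft.rfftfreq(t.size, 1 / fs)
H = smooth(np.sqrt(amp_spectrum(h1) ** 2 + amp_spectrum(h2) ** 2))
V = smooth(amp_spectrum(v))
hvsr = H / np.maximum(V, 1e-12)

band = (freqs > 0.5) & (freqs < 20.0)        # restrict to a plausible site band
f0 = freqs[band][np.argmax(hvsr[band])]
print(f0)                                    # peaks near the 2 Hz resonance
```

Comparing the distribution of such $f_0$ picks between real and generated records, station by station, is the basis of the paper's station-specificity score.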

replace Learning Unmasking Policies for Diffusion Language Models

Authors: Metod Jazbec, Theo X. Olausson, Louis B\'ethune, Pierre Ablin, Michael Kirchhof, Jo\~ao Monteiro, Victor Turrisi, Jason Ramapuram, Marco Cuturi

Abstract: Diffusion (Large) Language Models (dLLMs) now match the downstream performance of their autoregressive counterparts on many tasks, while holding the promise of being more efficient during inference. One critical design aspect of dLLMs is the sampling procedure that selects which tokens to unmask at each diffusion step. Indeed, recent work has found that heuristic strategies such as confidence thresholding improve both sample quality and token throughput compared to random unmasking. However, such heuristics have downsides: they require manual tuning, and we observe that their performance degrades with larger block sizes. In this work, we instead propose to train sampling procedures using reinforcement learning. Specifically, we formalize masked diffusion sampling as a Markov decision process in which the dLLM serves as the environment, and propose a lightweight policy based on a single-layer transformer that maps dLLM token confidences to unmasking decisions. Our experiments show that these trained policies match the performance of state-of-the-art heuristics when combined with semi-autoregressive (block) generation, while outperforming them in the full-diffusion setting.
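The confidence-thresholding heuristic that the trained policies are compared against can be sketched in a few lines of NumPy; the threshold value and uniform random confidences here are illustrative stand-ins for real dLLM token confidences.

```python
import numpy as np

def threshold_unmask(confidences, masked, tau=0.9):
    """One denoising step: unmask all masked positions whose confidence
    exceeds tau; if none qualify, unmask the single most confident one."""
    cand = np.where(masked)[0]
    above = cand[confidences[cand] > tau]
    if above.size == 0:
        above = cand[[np.argmax(confidences[cand])]]
    masked = masked.copy()
    masked[above] = False
    return masked

rng = np.random.default_rng(0)
L = 16
masked = np.ones(L, dtype=bool)              # all positions start masked
steps = 0
while masked.any():
    conf = rng.uniform(size=L)               # stand-in for dLLM confidences
    masked = threshold_unmask(conf, masked, tau=0.9)
    steps += 1
print(steps)                                 # denoising steps used
```

The paper's RL-trained policy replaces the fixed `tau` rule with a small transformer mapping the same confidences to unmasking decisions.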

replace Rough Sets for Explainability of Spectral Graph Clustering

Authors: Bart{\l}omiej Starosta, S{\l}awomir T. Wierzcho\'n, Piotr Borkowski, Dariusz Czerski, Marcin Sydow, Eryk Laskowski, Mieczys{\l}aw A. K{\l}opotek

Abstract: Graph Spectral Clustering methods (GSC) allow representing clusters of diverse shapes, densities, etc. However, the results of such algorithms, when applied e.g. to text documents, are hard to explain to the user, especially due to embedding in the spectral space which has no obvious relation to document contents. Furthermore, the presence of documents without clear content meaning and the stochastic nature of the clustering algorithms deteriorate explainability. This paper proposes an enhancement to the explanation methodology, proposed in an earlier research of our team. It allows us to overcome the latter problems by taking inspiration from rough set theory.

replace Derivative-Informed Fourier Neural Operator: Universal Approximation and Applications to PDE-Constrained Optimization

Authors: Boyuan Yao, Dingcheng Luo, Lianghao Cao, Nikola Kovachki, Thomas O'Leary-Roseberry, Omar Ghattas

Abstract: We present approximation theories and efficient training methods for derivative-informed Fourier neural operators (DIFNOs) with applications to PDE-constrained optimization. A DIFNO is an FNO trained by minimizing its prediction error jointly on output and Fr\'echet derivative samples of a high-fidelity operator (e.g., a parametric PDE solution operator). As a result, a DIFNO can closely emulate not only the high-fidelity operator's response but also its sensitivities. To motivate the use of DIFNOs instead of conventional FNOs as surrogate models, we show that accurate surrogate-driven PDE-constrained optimization requires accurate surrogate Fr\'echet derivatives. Then, we establish (i) simultaneous universal approximation of continuously differentiable operators and their Fr\'echet derivatives by FNOs on compact sets, and (ii) universal approximation of continuously differentiable operators by FNOs in weighted Sobolev spaces with input measures that have unbounded supports. Our theoretical results certify the capability of FNOs for accurate derivative-informed operator learning and for the solution of PDE-constrained optimization problems. Furthermore, we develop efficient training schemes that leverage dimensionality reduction and multi-resolution techniques to significantly reduce memory and computational costs in Fr\'echet derivative learning. Numerical examples on nonlinear diffusion--reaction, Helmholtz, and Navier--Stokes equations demonstrate that DIFNOs are superior in sample complexity for operator learning and solving infinite-dimensional PDE-constrained inverse problems, achieving high accuracy at low training sample sizes.

replace SigMA: Path Signatures and Multi-head Attention for Learning Parameters in fBm-driven SDEs

Authors: Xianglin Wu, Chiheb Ben Hammouda, Cornelis W. Oosterlee

Abstract: Stochastic differential equations (SDEs) driven by fractional Brownian motion (fBm) are increasingly used to model systems with rough dynamics and long-range dependence, such as those arising in quantitative finance and reliability engineering. However, these processes are non-Markovian and lack a semimartingale structure, rendering many classical parameter estimation techniques inapplicable or computationally intractable beyond very specific cases. This work investigates two central questions: (i) whether integrating path signatures into deep learning architectures can improve the trade-off between estimation accuracy and model complexity, and (ii) what constitutes an effective architecture for leveraging signatures as feature maps. We introduce SigMA (Signature Multi-head Attention), a neural architecture that integrates path signatures with multi-head self-attention, supported by a convolutional preprocessing layer and a multilayer perceptron for effective feature encoding. SigMA learns model parameters from synthetically generated paths of fBm-driven SDEs, including fractional Brownian motion, fractional Ornstein-Uhlenbeck, and rough Heston models, with a particular focus on estimating the Hurst parameter and on joint multi-parameter inference, and it generalizes robustly to unseen trajectories. Extensive experiments on synthetic data and two real-world datasets (i.e., equity-index realized volatility and Li-ion battery degradation) show that SigMA consistently outperforms CNN, LSTM, vanilla Transformer, and Deep Signature baselines in accuracy, robustness, and model compactness. These results demonstrate that combining signature transforms with attention-based architectures provides an effective and scalable framework for parameter inference in stochastic systems with rough or persistent temporal structure.
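The depth-2 path signature underlying SigMA's feature maps can be computed exactly for piecewise-linear paths via Chen's identity; this minimal NumPy sketch is illustrative and not the paper's implementation, which presumably relies on a signature library.

```python
import numpy as np

def signature_depth2(path):
    """Depth-2 signature of a piecewise-linear path of shape (T, d):
    level 1 is the total increment; level 2 collects the iterated
    integrals S^(i,j) = int dx_i dx_j, accumulated via Chen's identity."""
    d = path.shape[1]
    s1 = np.zeros(d)
    s2 = np.zeros((d, d))
    for delta in np.diff(path, axis=0):
        s2 += np.outer(s1, delta) + 0.5 * np.outer(delta, delta)
        s1 += delta
    return s1, s2

# A straight line has s2 = 0.5 * outer(increment, increment) and zero Levy area.
line = np.linspace([0.0, 0.0], [1.0, 2.0], 50)
s1, s2 = signature_depth2(line)
print(s1)                         # [1. 2.]
print(s2[0, 1] - s2[1, 0])        # antisymmetric part ~ 0 for a straight line
```

The antisymmetric part of level 2 (the Levy area) is what distinguishes genuinely rough trajectories from straight-line interpolations, which is why signatures are informative features for Hurst-parameter estimation.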

replace Chorus: Harmonizing Context and Sensing Signals for Data-Free Model Customization in IoT

Authors: Liyu Zhang, Yejia Liu, Kwun Ho Liu, Runxi Huang, Xiaomin Ouyang

Abstract: A key bottleneck toward scalable IoT sensing is how to efficiently adapt AI models to new deployment conditions. In real-world IoT systems, sensor data is collected under diverse contexts, such as sensor placements or ambient environments, which alter signal patterns and degrade downstream performance. Traditional domain adaptation and generalization methods often ignore such contextual information or incorporate it in overly simplistic ways, making them ineffective under unseen context shifts after deployment. In this paper, we propose Chorus, a context-aware, data-free model customization approach that adapts models to unseen deployment conditions without requiring target-domain data. The key idea is to learn context representations that capture how contextual factors influence sensor data, and then use these representations as structured priors for context-aware customization under unseen shifts. Specifically, Chorus learns a shared sensor-context latent space through bidirectional cross-modal reconstruction on unlabeled sensor-context pairs, and regularizes the context embedding space to obtain compact and generalizable context representations. Building on the aligned representations, Chorus trains a lightweight gated head with limited labeled data to exploit context priors during inference, and introduces a context-caching mechanism that reuses cached context representations when no context shift is detected, thereby reducing inference overhead on smartphones. Experiments on IMU, speech enhancement, and WiFi sensing tasks under diverse context shifts show that Chorus outperforms state-of-the-art baselines by up to 20.2% in unseen contexts, with cached inference latency close to sensor-only deployment, while maintaining stable performance under continuous context transitions. A video demonstration of Chorus's performance in the real world is available at https://youtu.be/ZBdro0jPNkE.

URLs: https://youtu.be/ZBdro0jPNkE.

replace What's the Price of Monotonicity? A Multi-Dataset Benchmark of Monotone-Constrained Gradient Boosting for Credit PD

Authors: Petr Koklev

Abstract: Financial institutions face a trade-off between predictive accuracy and interpretability when deploying machine learning models for credit risk. Monotonicity constraints align model behavior with domain knowledge, but their performance cost - the price of monotonicity - is not well quantified. This paper benchmarks monotone-constrained versus unconstrained gradient boosting models for credit probability of default across five public datasets and three libraries. We define the Price of Monotonicity (PoM) as the relative change in standard performance metrics when moving from unconstrained to constrained models, estimated via paired comparisons with bootstrap uncertainty. In our experiments, PoM in AUC ranges from essentially zero to about 2.9 percent: constraints are almost costless on large datasets (typically less than 0.2 percent, often indistinguishable from zero) and most costly on smaller datasets with extensive constraint coverage (around 2-3 percent). Thus, appropriately specified monotonicity constraints can often deliver interpretability with small accuracy losses, particularly in large-scale credit portfolios.
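The PoM estimate described above, a relative metric change with paired-bootstrap uncertainty, can be sketched as follows; the rank-based AUC, bootstrap size, and toy score vectors are illustrative assumptions, not the paper's exact protocol.

```python
import numpy as np

def auc(y, s):
    """AUC via the Mann-Whitney U statistic (ties receive half credit)."""
    pos, neg = s[y == 1], s[y == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def price_of_monotonicity(y, s_unc, s_con, n_boot=500, seed=0):
    """Relative AUC change (percent) with a paired-bootstrap 95% interval."""
    rng = np.random.default_rng(seed)
    point = 100 * (auc(y, s_unc) - auc(y, s_con)) / auc(y, s_unc)
    draws = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), len(y))
        if y[idx].min() == y[idx].max():   # resample must contain both classes
            continue
        a_u, a_c = auc(y[idx], s_unc[idx]), auc(y[idx], s_con[idx])
        draws.append(100 * (a_u - a_c) / a_u)
    lo, hi = np.percentile(draws, [2.5, 97.5])
    return point, (lo, hi)

# Toy scores: the "constrained" model is a slightly noisier copy.
rng = np.random.default_rng(1)
y = (rng.uniform(size=400) < 0.3).astype(int)
s_unc = y + rng.normal(scale=1.0, size=400)
s_con = s_unc + rng.normal(scale=0.3, size=400)
pom, ci = price_of_monotonicity(y, s_unc, s_con)
print(pom, ci)
```

Pairing the bootstrap (resampling the same indices for both models) is what makes small PoM values distinguishable from zero.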

replace CARE What Fails: Contrastive Anchored-REflection for Verifiable Multimodal Reasoning

Authors: Yongxin Wang, Zhicheng Yang, Meng Cao, Mingfei Han, Haokun Lin, Yingying Zhu, Xiaojun Chang, Xiaodan Liang

Abstract: Group-relative reinforcement learning with verifiable rewards (RLVR) often wastes the most informative data it already has: the failures. When all rollouts are wrong, gradients stall; when one happens to be correct, the update usually ignores why the others are close-but-wrong, and credit can be misassigned to spurious chains. We present CARE (Contrastive Anchored REflection), a failure-centric post-training framework for multimodal reasoning that turns errors into supervision. CARE combines: (i) an anchored-contrastive objective that forms a compact subgroup around the best rollout and a set of semantically proximate hard negatives, performs within-subgroup z-score normalization with negative-only scaling, and includes an all-negative rescue to prevent zero-signal batches; and (ii) Reflection-Guided Resampling (RGR), a one-shot structured self-repair that rewrites a representative failure and re-scores it with the same verifier, converting near-misses into usable positives without any test-time reflection. CARE improves accuracy and training smoothness while explicitly increasing the share of learning signal that comes from failures. On Qwen2.5-VL-7B, CARE lifts macro-averaged accuracy by 4.6 points over GRPO across six verifiable visual-reasoning benchmarks; with Qwen3-VL-8B it reaches competitive or state-of-the-art results on MathVista and MMMU-Pro under an identical evaluation protocol.

replace HGAN-SDEs: Learning Neural Stochastic Differential Equations with Hermite-Guided Adversarial Training

Authors: Yuanjian Xu, Yuan Shuai, Jianing Hao, Guang Zhang

Abstract: Neural Stochastic Differential Equations (Neural SDEs) provide a principled framework for modeling continuous-time stochastic processes and have been widely adopted in fields ranging from physics to finance. Recent advances suggest that Generative Adversarial Networks (GANs) offer a promising solution to learning the complex path distributions induced by SDEs. However, a critical bottleneck lies in designing a discriminator that faithfully captures temporal dependencies while remaining computationally efficient. Prior works have explored Neural Controlled Differential Equations (CDEs) as discriminators due to their ability to model continuous-time dynamics, but such architectures suffer from high computational costs and exacerbate the instability of adversarial training. To address these limitations, we introduce HGAN-SDEs, a novel GAN-based framework that leverages Neural Hermite functions to construct a structured and efficient discriminator. Hermite functions provide an expressive yet lightweight basis for approximating path-level dynamics, enabling both reduced runtime complexity and improved training stability. We establish the universal approximation property of our framework for a broad class of SDE-driven distributions and theoretically characterize its convergence behavior. Extensive empirical evaluations on synthetic and real-world systems demonstrate that HGAN-SDEs achieve superior sample quality and learning efficiency compared to existing generative models for SDEs.

replace SoliReward: Mitigating Susceptibility to Reward Hacking and Annotation Noise in Video Generation Reward Models

Authors: Jiesong Lian, Ruizhe Zhong, Zixiang Zhou, Xiaoyue Mi, Long Hu, Yuan Zhou, Qinglin Lu, Yixue Hao, Junchi Yan

Abstract: Post-training alignment of video generation models with human preferences is a critical goal. Developing effective Reward Models (RMs) for this process faces significant methodological hurdles. Current data collection paradigms, reliant on in-prompt pairwise annotations, suffer from labeling noise. Concurrently, the architectural design of VLM-based RMs, particularly their output mechanisms, remains underexplored. Furthermore, RM is susceptible to reward hacking in post-training. To mitigate these limitations, we propose SoliReward, a systematic framework for video RM training. Our framework first sources high-quality, cost-efficient data via single-item binary annotations, then constructs preference pairs using a cross-prompt pairing strategy. Architecturally, we employ a Hierarchical Progressive Query Attention mechanism to enhance feature aggregation. Finally, we introduce a modified BT loss that explicitly accommodates win-tie scenarios. This approach regularizes the RM's score distribution for positive samples, providing more nuanced preference signals to alleviate over-focus on a small number of top-scoring samples. Our approach is validated on benchmarks evaluating physical plausibility, subject deformity, and semantic alignment, demonstrating improvements in direct RM evaluation metrics and in the efficacy of post-training on video generation models. Code and benchmark are available at https://github.com/lian700/SoliReward.

URLs: https://github.com/lian700/SoliReward.

replace On the Existence and Behavior of Secondary Attention Sinks

Authors: Jeffrey T. H. Wong, Cheng Zhang, Louis Mahon, Wayne Luk, Anton Isopoussu, Yiren Zhao

Abstract: Attention sinks are tokens, often the beginning-of-sequence (BOS) token, that receive disproportionately high attention despite limited semantic relevance. In this work, we identify a class of attention sinks, which we term secondary sinks, that differ fundamentally from the sinks studied in prior works, which we term primary sinks. While prior works have identified that tokens other than BOS can sometimes become sinks, they were found to exhibit properties analogous to the BOS token. Specifically, they emerge at the same layer, persist throughout the network, and draw a large amount of attention mass. In contrast, the secondary sinks we identify arise primarily in middle layers, can persist for a variable number of layers, and draw a smaller, but still significant, amount of attention mass. Through extensive experiments across 11 model families, we analyze where these secondary sinks appear, their properties, how they are formed, and their impact on the attention mechanism. Specifically, we show that: (1) these sinks are formed by specific middle-layer MLP modules; these MLPs map token representations to vectors that align with the direction of the primary sink of that layer. (2) The $\ell_2$-norm of these vectors determines the sink score of the secondary sink, and also the number of layers it lasts for, thereby leading to different impacts on the attention mechanisms accordingly. (3) The primary sink weakens in middle layers, coinciding with the emergence of secondary sinks. We observe that in larger-scale models, the location and lifetime of the sinks, together referred to as sink levels, appear in a more deterministic and frequent manner. Specifically, we identify three sink levels in QwQ-32B and six levels in Qwen3-14B. We have open-sourced our findings at github.com/JeffreyWong20/Secondary-Attention-Sinks.

replace From Entropy to Epiplexity: Rethinking Information for Computationally Bounded Intelligence

Authors: Marc Finzi, Shikai Qiu, Yiding Jiang, Pavel Izmailov, J. Zico Kolter, Andrew Gordon Wilson

Abstract: Can we learn more from data than existed in the generating process itself? Can new and useful information be constructed from merely applying deterministic transformations to existing data? Can the learnable content in data be evaluated without considering a downstream task? On these questions, Shannon information and Kolmogorov complexity come up nearly empty-handed, in part because they assume observers with unlimited computational capacity and do not target the useful information content. In this work, we identify and exemplify three seeming paradoxes in information theory: (1) information cannot be increased by deterministic transformations; (2) information is independent of the order of data; (3) likelihood modeling is merely distribution matching. To shed light on the tension between these results and modern practice, and to quantify the value of data, we introduce epiplexity, a formalization of information capturing what computationally bounded observers can learn from data. Epiplexity captures the structural content in data while excluding time-bounded entropy, the random unpredictable content exemplified by pseudorandom number generators and chaotic dynamical systems. With these concepts, we demonstrate how information can be created with computation, how it depends on the ordering of the data, and how likelihood modeling can produce more complex programs than present in the data generating process itself. We also present practical procedures to estimate epiplexity which we show capture differences across data sources, track with downstream performance, and highlight dataset interventions that improve out-of-distribution generalization. In contrast to principles of model selection, epiplexity provides a theoretical foundation for data selection, guiding how to select, generate, or transform data for learning systems.

replace The Active Discoverer Framework: Towards Autonomous Physics Reasoning through Neuro-Symbolic LaTeX Synthesis

Authors: Hyunjun Jeon

Abstract: Modern artificial intelligence excels at statistical interpolation within seen manifolds but fundamentally fails at the exact reasoning required for theoretical physics and mathematics. We identify the "Float Wall" -- a catastrophic collapse of neural extrapolation at scales beyond $10^{16}$ -- caused by standard floating-point representation and linguistic tokenization (BPE). To resolve this, we introduce the Active Discoverer Framework, a digit-native neuro-symbolic architecture designed for invariant discovery. At its core is NumberNet, a Siamese Arithmetic Transformer that utilizes least-significant-bit (LSB) sequence encoding to achieve 0% precision loss and cosmic-scale extrapolation up to $10^{50}$. To enforce physical grounding, we implement a Hamiltonian-based energy descent and Symmetry Grouping layer, ensuring the model respects Noether's theorem natively. The primary innovation is the Symbolic LaTeX Bottleneck: an active discovery loop where the model is forced to hypothesize unknown physical variables through an autoregressive LaTeX decoder. By reconciling numeric "hallucinations" with structurally valid mathematical expressions, the framework ensures that any discovered physics is parsimonious and human-interpretable. We evaluate this system against a 30-billion scale benchmark and the Universal Physics Pantheon, featuring 50 "Chaos Mode" systemic perturbations. Our results demonstrate that while traditional GBDT and LLM-based architectures collapse at cosmic scales, the Active Discoverer autonomously deduces universal constants such as the Gravitational Constant ($G$) with high fidelity. This framework establishes a path toward zero-hallucination artificial intelligence and truly autonomous scientific research agents.

replace Multimodal Rumor Detection Enhanced by External Evidence and Forgery Features

Authors: Han Li, Hua Sun

Abstract: Social media increasingly disseminates information through mixed image-text posts, but rumors often exploit subtle inconsistencies and forged content, making detection based solely on post content difficult. Deep semantic-mismatch rumors, which superficially align images and texts, pose particular challenges and threaten online public opinion. Existing multimodal rumor detection methods improve cross-modal modeling but suffer from limited feature extraction, noisy alignment, and inflexible fusion strategies, while ignoring external factual evidence necessary for verifying complex rumors. To address these limitations, we propose a multimodal rumor detection model enhanced with external evidence and forgery features. The model uses a ResNet34 visual encoder, a BERT text encoder, and a forgery feature module extracting frequency-domain traces and compression artifacts via the Fourier transform. BLIP-generated image descriptions bridge the image and text semantic spaces. A dual contrastive learning module computes contrastive losses between text-image and text-description pairs, improving detection of semantic inconsistencies. A gated adaptive feature-scaling fusion mechanism dynamically adjusts multimodal fusion and reduces redundancy. Experiments on Weibo and Twitter datasets demonstrate that our model outperforms mainstream baselines in macro accuracy, recall, and F1 score.

replace Why Inference in Large Models Becomes Decomposable After Training

Authors: Jidong Jin

Abstract: Inference in large-scale AI models is typically performed on dense parameter matrices, leading to inference cost and system complexity that scale unsustainably with model size. This limitation does not arise from insufficient model capacity, but from treating post-training inference systems as monolithic operators while ignoring internal structures formed during learning. We show that gradient update events in large models are highly localized and selective, leaving many parameter dependencies statistically indistinguishable from their initialization distribution after training. As a result, post-training inference systems are structurally non-uniform and inherently decomposable. Based on this observation, we introduce a post-training statistical criterion and a structural annealing procedure that removes unsupported dependencies and reveals stable, independent substructures. This work establishes a post-training, model-agnostic structural view of inference systems and enables structured, parallel inference without modifying model functionality or interfaces.

replace The Geometric Mechanics of Contrastive Learning: Alignment Potentials, Entropic Dispersion, and Modality Gap

Authors: Yichao Cai, Zhen Zhang, Yuhang Liu, Javen Qinfeng Shi

Abstract: InfoNCE-based contrastive learning is often characterized as promoting alignment and uniformity, yet the induced population geometry and the reasons multimodal training can preserve a modality gap remain underexplored. We present a measure-theoretic view where training reshapes probability measures on a fixed embedding manifold. In the large-batch limit, we prove value and gradient consistency, showing that stochastic InfoNCE tracks a closed-form deterministic energy and revealing a geometric bifurcation between unimodal and symmetric multimodal regimes. Unimodally, the intrinsic functional over the representation measures is strictly convex with a unique Gibbs equilibrium; at low temperature, entropy only breaks ties among well-aligned solutions, so uniformity is entropic dispersion within the alignment basin. Multimodally, symmetric InfoNCE contains a persistent negative symmetric divergence coupling: at fixed temperature, each modality's marginal acts as a logarithmic barrier shaping the other's effective landscape, which can structurally favor a population-level modality gap under conditional heterogeneity. We validate these predictions in controlled synthetic settings and by measuring the divergence gap on pretrained CLIP embeddings on real-world data. Our analysis shifts the lens from pointwise discrimination to population geometry, suggesting that closing the modality gap requires explicitly regularizing cross-modal divergence in addition to pairwise alignment.
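The symmetric InfoNCE objective whose large-batch limit the paper analyzes can be written in a few lines of NumPy; the batch size, dimension, and temperature below are illustrative choices, not the paper's settings.

```python
import numpy as np

def symmetric_infonce(za, zb, tau=0.07):
    """Symmetric InfoNCE over L2-normalized paired embeddings of shape (N, d)."""
    za = za / np.linalg.norm(za, axis=1, keepdims=True)
    zb = zb / np.linalg.norm(zb, axis=1, keepdims=True)
    logits = za @ zb.T / tau
    log_sm_ab = logits - np.log(np.exp(logits).sum(1, keepdims=True))  # a -> b
    log_sm_ba = logits - np.log(np.exp(logits).sum(0, keepdims=True))  # b -> a
    n = len(za)
    return -(np.trace(log_sm_ab) + np.trace(log_sm_ba)) / (2 * n)

rng = np.random.default_rng(0)
N, d = 64, 32
z = rng.normal(size=(N, d))
aligned = symmetric_infonce(z, z)                    # perfectly aligned pairs
shuffled = symmetric_infonce(z, z[rng.permutation(N)])
print(aligned < shuffled)                            # True: alignment lowers the loss
```

Note that even with perfect pairwise alignment the loss stays strictly positive, because each positive competes with every in-batch negative in both softmax directions; the paper's divergence-coupling analysis concerns exactly this cross-modal term.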

replace Continual GUI Agents

Authors: Ziwei Liu, Borui Kang, Hangjie Yuan, Zixiang Zhao, Wei Li, Yifan Zhu, Tao Feng

Abstract: As digital environments (data distributions) are in flux, with new GUI data arriving over time that introduces new domains or resolutions, agents trained on static environments deteriorate in performance. In this work, we introduce Continual GUI Agents, a new task that requires GUI agents to perform continual learning under shifted domains and resolutions. We find existing methods fail to maintain stable grounding as GUI distributions shift over time, due to the diversity of UI interaction points and regions in fluxing scenarios. To address this, we introduce GUI-Anchoring in Flux (GUI-AiF), a new reinforcement fine-tuning framework that stabilizes continual learning through two novel rewards: Anchoring Point Reward in Flux (APR-iF) and Anchoring Region Reward in Flux (ARR-iF). These rewards guide the agents to align with shifting interaction points and regions, mitigating the tendency of existing reward strategies to over-adapt to static grounding cues (e.g., fixed coordinates or element scales). Extensive experiments show GUI-AiF surpasses state-of-the-art baselines. Our work establishes the first continual learning framework for GUI agents, revealing the untapped potential of reinforcement fine-tuning for continual GUI agents.

replace Seg-MoE: Multi-Resolution Segment-wise Mixture-of-Experts for Time Series Forecasting Transformers

Authors: Evandro S. Ortigossa, Eran Segal

Abstract: Transformer-based models have recently made significant advances in accurate time-series forecasting, but even these architectures struggle to scale efficiently while capturing long-term temporal dynamics. Mixture-of-Experts (MoE) layers are a proven solution to scaling problems in natural language processing. However, existing MoE approaches for time-series forecasting rely on token-wise routing mechanisms, which may fail to exploit the natural locality and continuity of temporal data. In this work, we introduce Seg-MoE, a sparse MoE design that routes and processes contiguous time-step segments rather than making independent expert decisions. Token segments allow each expert to model intra-segment interactions directly, naturally aligning with inherent temporal patterns. We integrate Seg-MoE layers into a time-series Transformer and evaluate it on multiple multivariate long-term forecasting benchmarks. Seg-MoE consistently achieves state-of-the-art forecasting accuracy across almost all prediction horizons, outperforming both dense Transformers and prior token-wise MoE models. Comprehensive ablation studies confirm that segment-level routing is the key factor driving these gains. Our results show that aligning the MoE routing granularity with the inherent structure of time series provides a powerful, yet previously underexplored, inductive bias, opening new avenues for conditionally sparse architectures in sequential data modeling.
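The contrast between segment-wise and token-wise routing can be sketched as follows; this toy gating (mean-pooled segments, top-k experts per segment) is an assumption about the general idea, not Seg-MoE's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
B, T, d = 2, 32, 16                  # batch, sequence length, model dim
seg_len, n_experts, top_k = 8, 4, 2

x = rng.normal(size=(B, T, d))
W_gate = rng.normal(size=(d, n_experts))

# Segment-wise routing: pool each contiguous segment, gate once per segment,
# so every token in a segment is processed by the same top-k experts.
segs = x.reshape(B, T // seg_len, seg_len, d)
pooled = segs.mean(axis=2)           # (B, n_seg, d)
gate_logits = pooled @ W_gate        # (B, n_seg, n_experts)
top = np.argsort(gate_logits, axis=-1)[..., -top_k:]

print(top.shape)                     # one expert choice per segment, not per token
```

Gating once per segment is the inductive bias the abstract argues for: routing decisions respect temporal locality, and each expert sees whole contiguous windows rather than scattered tokens.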

replace Embedding Compression via Spherical Coordinates

Authors: Han Xiao

Abstract: We present a compression method for unit-norm embeddings that achieves 1.5$\times$ compression, 25% better than the best prior lossless method. The method exploits that spherical coordinates of high-dimensional unit vectors concentrate around $\pi/2$, causing IEEE 754 exponents to collapse to a single value and high-order mantissa bits to become predictable, enabling entropy coding of both. Reconstruction error is below 1e-7, under float32 machine epsilon. Evaluation across 26 configurations spanning text, image, and multi-vector embeddings confirms consistent improvement.
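The concentration phenomenon the method exploits is easy to reproduce; this NumPy sketch (the entropy-coding stage is omitted) shows the hyperspherical angles of a random high-dimensional unit vector clustering around pi/2, with the dimension chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 1024
x = rng.normal(size=d)
x /= np.linalg.norm(x)               # a random point on the unit sphere

# Hyperspherical angles: phi_k = atan2(||x[k+1:]||, x[k]) for k = 0..d-3.
tail = np.sqrt(np.cumsum(x[::-1] ** 2)[::-1])   # tail[k] = ||x[k:]||
phi = np.arctan2(tail[1:-1], x[:-2])

# Early angles are atan2 of a large tail norm against a tiny coordinate, so they
# sit near pi/2; their float exponents and high-order mantissa bits become
# predictable, which is what makes entropy coding effective.
print(np.mean(np.abs(phi[:512] - np.pi / 2)))   # small average deviation
```

The later angles (where the tail norm shrinks) spread out more, which is consistent with the method reporting 1.5x rather than unbounded compression.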

replace Prism: Efficient Test-Time Scaling via Hierarchical Search and Self-Verification for Discrete Diffusion Language Models

Authors: Jinbin Bai, Yixuan Li, Yuchen Zhu, Yi Xin, Qingyu Shi, Aosong Feng, Xiaohong Liu, Molei Tao, Jianru Xue, Xiangtai Li, Ming-Hsuan Yang

Abstract: Inference-time compute has re-emerged as a practical way to improve LLM reasoning. Most test-time scaling (TTS) algorithms rely on autoregressive decoding, which is ill-suited to discrete diffusion language models (dLLMs) due to their parallel decoding over the entire sequence. As a result, developing effective and efficient TTS methods to unlock dLLMs' full generative potential remains an underexplored challenge. To address this, we propose Prism (Pruning, Remasking, and Integrated Self-verification Method), an efficient TTS framework for dLLMs that (i) performs Hierarchical Trajectory Search (HTS) which dynamically prunes and reallocates compute in an early-to-mid denoising window, (ii) introduces Local branching with partial remasking to explore diverse implementations while preserving high-confidence tokens, and (iii) replaces external verifiers with Self-Verified Feedback (SVF) obtained via self-evaluation prompts on intermediate completions. Across four mathematical reasoning and code generation benchmarks on three dLLMs, including LLaDA 8B Instruct, Dream 7B Instruct, and LLaDA 2.0-mini, our Prism achieves a favorable performance-efficiency trade-off, matching best-of-N performance with substantially fewer function evaluations (NFE). The code is released at https://github.com/viiika/Prism.

URLs: https://github.com/viiika/Prism.

replace Grounding Generated Videos in Feasible Plans via World Models

Authors: Christos Ziakas, Amir Bar, Alessandra Russo

Abstract: Large-scale video generative models have shown emerging capabilities as zero-shot visual planners, yet video-generated plans often violate temporal consistency and physical constraints, leading to failures when mapped to executable actions. To address this, we propose Grounding Video Plans with World Models (GVP-WM), a planning method that grounds video-generated plans into feasible action sequences using a learned action-conditioned world model. At test-time, GVP-WM first generates a video plan from initial and goal observations, then projects the video guidance onto the manifold of dynamically feasible latent trajectories via video-guided latent collocation. In particular, we formulate grounding as a goal-conditioned latent-space trajectory optimization problem that jointly optimizes latent states and actions under world-model dynamics, while preserving semantic alignment with the video-generated plan. Empirically, GVP-WM recovers feasible long-horizon plans from zero-shot image-to-video-generated and motion-blurred videos that violate physical constraints, across navigation and manipulation simulation tasks.

replace IFNSO: Iteration-Free Newton-Schulz Orthogonalization

Authors: Chen Hu, Qianxi Zhao, Xiaochen Yuan, Hong Zhang, Ding Yuan, Yanbin Wu, Xiying Li

Abstract: The Newton-Schulz (NS) iteration has become a key technique for orthogonalization in optimizers such as Muon and for optimization on the Stiefel manifold. Despite its effectiveness, the conventional NS iteration incurs significant computational overhead due to repeated high-dimensional matrix multiplications. To overcome this limitation, we propose Iteration-Free Newton-Schulz Orthogonalization (IFNSO), a novel framework that consolidates the traditional iterative structure into a unified, iteration-free formulation. By analyzing the contribution of individual matrix powers, we streamline the process by removing insignificant terms and introducing a polynomial with learnable coefficients. These coefficients are optimized to ensure both superior computational efficiency and stable convergence. Extensive experiments demonstrate that IFNSO achieves superior performance compared to existing methods. Our code is available at: https://github.com/greekinRoma/Unified_Newton_Schulz_Orthogonalization.

URLs: https://github.com/greekinRoma/Unified_Newton_Schulz_Orthogonalization.
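The conventional iteration that IFNSO collapses can be sketched as follows; the cubic coefficients and step count below are the classical Newton-Schulz ones, not the learned polynomial the paper proposes:

```python
import numpy as np

def newton_schulz(X, steps=30):
    """Cubic Newton-Schulz iteration: drives the singular values of X toward 1,
    so the result approximates the nearest orthogonal factor U V^T."""
    # Normalizing by the Frobenius norm puts every singular value in (0, 1],
    # inside the cubic iteration's convergence region (0, sqrt(3)).
    X = X / (np.linalg.norm(X) + 1e-12)
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ X.T @ X
    return X

rng = np.random.default_rng(0)
W = rng.standard_normal((6, 4))
Q = newton_schulz(W)   # columns of Q are near-orthonormal: Q^T Q ~ I
```

IFNSO's contribution is to replace this loop with a single polynomial whose coefficients are learned; the sketch only shows the iterative baseline the paper streamlines.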

replace Learning to Repair Lean Proofs from Compiler Feedback

Authors: Evan Wang, Simon Chess, Daniel Lee, Siyuan Ge, Ajit Mallavarapu, Jarod Alper, Vasily Ilin

Abstract: As neural theorem provers become increasingly agentic, the ability to interpret and act on compiler feedback is critical. However, existing Lean datasets consist almost exclusively of correct proofs, offering little supervision for understanding and repairing failures. We study Lean proof repair as a supervised learning problem: given an erroneous proof and compiler feedback, predict both a corrected proof and a natural-language diagnosis grounded in the same feedback. We introduce APRIL (Automated Proof Repair in Lean), a dataset of 260,000 supervised tuples pairing systematically generated proof failures with compiler diagnostics and aligned repair and explanation targets. Training language models on APRIL substantially improves repair accuracy and feedback-conditioned reasoning; in our single-shot repair evaluation setting, a finetuned 4B-parameter model outperforms the strongest open-source baseline. We view diagnostic-conditioned supervision as a complementary training signal for feedback-using provers. Our dataset is available at https://huggingface.co/datasets/uw-math-ai/APRIL.

URLs: https://huggingface.co/datasets/uw-math-ai/APRIL.
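The supervised tuples the abstract describes can be pictured as a simple record; the field names and the toy Lean snippet below are illustrative, not APRIL's actual schema:

```python
from dataclasses import dataclass

@dataclass
class RepairExample:
    """One supervised tuple in the style APRIL describes (hypothetical fields)."""
    broken_proof: str        # erroneous Lean proof
    compiler_feedback: str   # Lean compiler diagnostic
    repaired_proof: str      # corrected proof target
    diagnosis: str           # natural-language explanation grounded in the feedback

ex = RepairExample(
    broken_proof="theorem t : 1 + 1 = 2 := by rfl'",
    compiler_feedback="unknown tactic 'rfl''",
    repaired_proof="theorem t : 1 + 1 = 2 := by rfl",
    diagnosis="The tactic name is misspelled; 'rfl' closes this definitional equality.",
)
```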

replace Faithful Bi-Directional Model Steering via Distribution Matching and Distributed Interchange Interventions

Authors: Yuntai Bao, Xuhong Zhang, Jintao Chen, Ge Su, Yuxiang Cai, Hao Peng, Bing Sun, Haiqin Weng, Liu Yan, Jianwei Yin

Abstract: Intervention-based model steering offers a lightweight and interpretable alternative to prompting and fine-tuning. However, by adapting strong optimization objectives from fine-tuning, current methods are susceptible to overfitting and often underperform, sometimes generating unnatural outputs. We hypothesize that this is because effective steering requires the faithful identification of internal model mechanisms, not the enforcement of external preferences. To this end, we build on the principles of distributed alignment search (DAS), the standard for causal variable localization, to propose a new steering method: Concept DAS (CDAS). While we adopt the core mechanism of DAS, distributed interchange intervention (DII), we introduce a novel distribution matching objective tailored for the steering task by aligning intervened output distributions with counterfactual distributions. CDAS differs from prior work in two main ways: first, it learns interventions via weakly supervised distribution matching rather than probability maximization; second, it uses DIIs that naturally enable bi-directional steering and allow steering factors to be derived from data, reducing the effort required for hyperparameter tuning and resulting in more faithful and stable control. On AxBench, a large-scale model steering benchmark, we show that CDAS does not always outperform preference-optimization methods but may benefit more from increased model scale. In two safety-related case studies, overriding refusal behaviors of safety-aligned models and neutralizing a chain-of-thought backdoor, CDAS achieves systematic steering while maintaining general model utility. These results indicate that CDAS is complementary to preference-optimization approaches and, under appropriate conditions, constitutes a robust approach to intervention-based model steering. Our code is available at https://github.com/colored-dye/concept_das.

URLs: https://github.com/colored-dye/concept_das.
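The core mechanism CDAS adopts, distributed interchange intervention, can be sketched as a subspace swap between two representations; the subspace here is a random orthonormal basis rather than a learned one:

```python
import numpy as np

def distributed_interchange(h_base, h_source, R):
    """Distributed interchange intervention (DII): replace the component of the
    base representation lying in the span of R's orthonormal rows with the source
    representation's component; the orthogonal complement is left untouched."""
    return h_base - R.T @ (R @ h_base) + R.T @ (R @ h_source)

rng = np.random.default_rng(1)
d, k = 8, 2
Q_basis, _ = np.linalg.qr(rng.standard_normal((d, k)))
R = Q_basis.T                      # (k, d): rows span the intervention subspace
h_b, h_s = rng.standard_normal(d), rng.standard_normal(d)
h_new = distributed_interchange(h_b, h_s, R)
```

After the intervention, the subspace coordinates of `h_new` match the source exactly while everything orthogonal to the subspace is unchanged, which is what makes the operation naturally bi-directional.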

replace SDFed: Bridging Local Global Discrepancy via Subspace Refinement and Divergence Control in Federated Prompt Learning

Authors: Yicheng Di, Wei Yuan, Tieke He, Yuan Liu, Hongzhi Yin

Abstract: Vision-language pretrained models offer strong transferable representations, yet adapting them in privacy-sensitive multi-party settings is challenging due to the high communication cost of federated optimization and the limited local data on clients. Federated prompt learning mitigates this issue by keeping the VLPM backbone frozen and collaboratively training lightweight prompt parameters. However, existing approaches typically enforce a unified prompt structure and length across clients, which is inadequate under practical client heterogeneity in both data distributions and system resources, and may further introduce conflicts between globally shared and locally optimal knowledge. To address these challenges, we propose \textbf{SDFed}, a heterogeneous federated prompt learning framework that bridges Local-Global Discrepancy via Subspace Refinement and Divergence Control. SDFed maintains a fixed-length global prompt for efficient aggregation while allowing each client to learn a variable-length local prompt to better match its data characteristics and capacity. To mitigate local-global conflicts and facilitate effective knowledge transfer, SDFed introduces a subspace refinement method for local prompts and an information retention and divergence control strategy that preserves key local information while maintaining appropriate separability between global and local representations. Extensive experiments on several datasets demonstrate that SDFed consistently improves performance and robustness in heterogeneous federated settings.
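A minimal sketch of aggregating the fixed-length global prompt, assuming a sample-weighted FedAvg-style average (SDFed's subspace refinement and divergence control for the local prompts are not reproduced here):

```python
import numpy as np

def aggregate_global_prompt(client_prompts, num_samples):
    """Sample-weighted average of per-client copies of the fixed-length global
    prompt. Each client prompt has shape (prompt_len, dim); only the global
    prompt is aggregated, so variable-length local prompts never need to be
    reconciled across clients."""
    w = np.asarray(num_samples, dtype=float)
    w = w / w.sum()
    stacked = np.stack(client_prompts)          # (clients, prompt_len, dim)
    return np.tensordot(w, stacked, axes=1)     # (prompt_len, dim)

clients = [np.ones((4, 8)), 3.0 * np.ones((4, 8))]
agg = aggregate_global_prompt(clients, num_samples=[1, 3])
```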

replace Central Dogma Transformer II: An AI Microscope for Understanding Cellular Regulatory Mechanisms

Authors: Nobuyuki Ota

Abstract: Current biological AI models lack interpretability -- their internal representations do not correspond to biological relationships that researchers can examine. Understanding gene regulation requires models whose learned structure can be directly interrogated to generate experimentally testable hypotheses. We present Central Dogma Transformer II (CDT-II), which mirrors the central dogma in its architecture -- DNA self-attention, RNA self-attention, and cross-attention for transcriptional control -- and requires only genomic embeddings and raw per-cell expression. Applied to K562 CRISPRi data with five genes held out entirely, CDT-II predicts perturbation effects (per-gene mean r = 0.84), recovers the GFI1B regulatory network (6.6-fold enrichment, P = 3.5 x 10^{-17}), and shows that cross-attention focuses on ENCODE regulatory elements including CTCF sites (mean 7.67x across 28 targets, P < 0.001). Gradient-based attribution accurately predicts downstream consequences of perturbing therapeutic targets (mean r = 0.82). Applied to TFRC, the target of the anti-TfR1 antibody PPMX-T003, gradient analysis identifies genes involved in erythrocyte structure, iron-dependent DNA synthesis, and oxidative stress -- pathways that align with anemia and reticulocyte decrease reported in Phase 1 trials and ferroptosis demonstrated in preclinical studies, without any clinical data as input. This establishes CDT-II as an AI microscope that reveals clinically relevant regulatory structure from perturbation experiments alone.

replace KAN-FIF: Spline-Parameterized Lightweight Physics-based Tropical Cyclone Estimation on Meteorological Satellite

Authors: Jiakang Shen, Qinghui Chen, Runtong Wang, Chenrui Xu, Jinglin Zhang, Cong Bai, Feng Zhang

Abstract: Tropical cyclones (TC) are among the most destructive natural disasters, causing catastrophic damage to coastal regions through extreme winds, heavy rainfall, and storm surges. Timely monitoring of tropical cyclones is crucial for reducing loss of life and property, yet it is hindered by the computational inefficiency and high parameter counts of existing methods on resource-constrained edge devices. Current physics-guided models suffer from linear feature interactions that fail to capture high-order polynomial relationships between TC attributes, leading to inflated model sizes and hardware incompatibility. To overcome these challenges, this study introduces the Kolmogorov-Arnold Network-based Feature Interaction Framework (KAN-FIF), a lightweight multimodal architecture that integrates MLP and CNN layers with spline-parameterized KAN layers. For Maximum Sustained Wind (MSW) prediction, experiments demonstrate that the KAN-FIF framework achieves a $94.8\%$ reduction in parameters (0.99MB vs 19MB) and $68.7\%$ faster inference per sample (2.3ms vs 7.35ms) compared to the baseline model Phy-CoCo, while maintaining superior accuracy with $32.5\%$ lower MAE. An offline deployment experiment for the FY-4 series meteorological satellite processor, run on the Qingyun-1000 development board, achieved a 14.41ms per-sample inference latency with the KAN-FIF framework, demonstrating promising feasibility for operational TC monitoring and extending deployability to edge-device AI applications. The code is released at https://github.com/Jinglin-Zhang/KAN-FIF.

URLs: https://github.com/Jinglin-Zhang/KAN-FIF.
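The spline-parameterized layers at the heart of a KAN can be pictured as learnable 1-D functions on each edge; piecewise-linear interpolation stands in for the B-spline bases here as a simplification:

```python
import numpy as np

def spline_edge(x, grid, coeffs):
    """One KAN edge as a learnable 1-D function: `coeffs` holds the learnable
    values on a fixed grid. A true KAN uses B-spline bases; np.interp
    (piecewise-linear) is a simplified stand-in with the same
    'coefficients on a grid' parameterization."""
    return np.interp(x, grid, coeffs)

grid = np.linspace(-2.0, 2.0, 9)
coeffs = grid ** 2                 # pretend training recovered x -> x^2
x = np.array([-1.0, 0.0, 1.5])
y = spline_edge(x, grid, coeffs)
```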

replace Towards On-Policy SFT: Distribution Discriminant Theory and its Applications in LLM Training

Authors: Miaosen Zhang, Yishan Liu, Shuxia Lin, Xu Yang, Qi Dai, Chong Luo, Weihao Jiang, Peng Hou, Anxiang Zeng, Xin Geng, Baining Guo

Abstract: Supervised fine-tuning (SFT) is computationally efficient but often yields inferior generalization compared to reinforcement learning (RL). This gap is primarily driven by RL's use of on-policy data. We propose a framework to close this gap by enabling On-Policy SFT. We first present \textbf{\textit{Distribution Discriminant Theory (DDT)}}, which explains and quantifies the alignment between data and the model-induced distribution. Leveraging DDT, we introduce two complementary techniques: (i) \textbf{\textit{In-Distribution Finetuning (IDFT)}}, a loss-level method to enhance the generalization ability of SFT, and (ii) \textbf{\textit{Hinted Decoding}}, a data-level technique that can re-align the training corpus to the model's distribution. Extensive experiments demonstrate that our framework achieves generalization performance surpassing prominent offline RL algorithms, including DPO and SimPO, while maintaining the efficiency of an SFT pipeline. The proposed framework thus offers a practical alternative in domains where RL is infeasible. We open-source the code here: https://github.com/zhangmiaosen2000/Towards-On-Policy-SFT

URLs: https://github.com/zhangmiaosen2000/Towards-On-Policy-SFT

replace On Robustness and Chain-of-Thought Consistency of RL-Finetuned VLMs

Authors: Rosie Zhao, Anshul Shah, Xiaoyu Zhu, Xinke Deng, Zhongyu Jiang, Yang Yang, Joerg Liebelt, Arnab Mondal

Abstract: Reinforcement learning (RL) fine-tuning has become a key technique for enhancing large language models (LLMs) on reasoning-intensive tasks, motivating its extension to vision language models (VLMs). While RL-tuned VLMs improve on visual reasoning benchmarks, they remain vulnerable to weak visual grounding, hallucinations, and over-reliance on textual cues. We show that simple, controlled textual perturbations -- misleading captions or incorrect chain-of-thought (CoT) traces -- cause substantial drops in robustness and confidence, and that these effects are more pronounced when CoT consistency is taken into account across open-source multimodal reasoning models. In contrast, closed models exhibit similar failure modes but maintain markedly greater robustness and reasoning consistency, suggesting that the gap reflects a shortcoming in current open-source RL fine-tuning rather than an inherent limitation of the task. To better understand these vulnerabilities, we further analyze RL fine-tuning dynamics and uncover an accuracy-faithfulness trade-off: fine-tuning raises benchmark accuracy, but can simultaneously erode the reliability of the accompanying CoT and its robustness to contextual shifts. Although adversarial augmentation improves robustness, it does not by itself prevent faithfulness drift. Incorporating a faithfulness-aware reward can restore alignment between answers and reasoning, but when paired with augmentation, training risks collapsing onto shortcut strategies and robustness remains elusive. Together, these findings highlight the limitations of accuracy-only evaluations and motivate training and assessment protocols that jointly emphasize correctness, robustness, and the faithfulness of visually grounded reasoning.

replace Flow-Factory: A Unified Framework for Reinforcement Learning in Flow-Matching Models

Authors: Bowen Ping, Chengyou Jia, Minnan Luo, Hangwei Qian, Ivor Tsang

Abstract: Reinforcement learning has emerged as a promising paradigm for aligning diffusion and flow-matching models with human preferences, yet practitioners face fragmented codebases, model-specific implementations, and engineering complexity. We introduce Flow-Factory, a unified framework that decouples algorithms, models, and rewards through a modular, registry-based architecture. This design enables seamless integration of new algorithms and architectures, as demonstrated by our support for GRPO, DiffusionNFT, and AWM across Flux, Qwen-Image, and WAN video models. By minimizing implementation overhead, Flow-Factory empowers researchers to rapidly prototype and scale future innovations. Flow-Factory provides production-ready memory optimization, flexible multi-reward training, and seamless distributed training support. The codebase is available at https://github.com/X-GenGroup/Flow-Factory.

URLs: https://github.com/X-GenGroup/Flow-Factory.
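The registry-based decoupling the abstract describes commonly takes the form of a decorator pattern; the names and classes below are illustrative, not Flow-Factory's actual API:

```python
MODEL_REGISTRY = {}

def register_model(name):
    """Register a model class under a string key, so algorithms and rewards can
    look models up by name instead of importing them directly (illustrative
    sketch of a registry-based architecture, not Flow-Factory's API)."""
    def decorator(cls):
        MODEL_REGISTRY[name] = cls
        return cls
    return decorator

@register_model("toy-flux")
class ToyFlux:
    def denoise(self, x):
        # Identity stand-in for a real denoising step.
        return x

model = MODEL_REGISTRY["toy-flux"]()
```

Because registration happens at class-definition time, adding a new model or algorithm only requires defining it with the decorator; no central dispatch code has to change.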

replace Low-Dimensional and Transversely Curved Optimization Dynamics in Grokking

Authors: Yongzhong Xu

Abstract: Grokking -- the delayed transition from memorization to generalization in small algorithmic tasks -- remains poorly understood. We present a geometric analysis of optimization dynamics in transformers trained on modular arithmetic. PCA of attention weight trajectories reveals that training evolves predominantly within a low-dimensional execution subspace, with a single principal component capturing 68-83% of trajectory variance. To probe loss-landscape geometry, we measure commutator defects -- the non-commutativity of successive gradient steps -- and project them onto this learned subspace. We find that curvature grows sharply in directions orthogonal to the execution subspace while the trajectory remains largely confined to it. Importantly, curvature growth consistently precedes generalization across learning rates and hyperparameter regimes, with the lead time obeying a power law in the grokking timescale. Causal intervention experiments show that motion along the learned subspace is necessary for grokking, while artificially increasing curvature is insufficient. Together, these results support a geometric account in which grokking reflects escape from a metastable regime characterized by low-dimensional confinement and transverse curvature accumulation. All findings replicate across this learning-rate range, a qualitatively different slow regime (lr=5e-5, wd=0.1, 3 layers), and three random seeds, though alignment dynamics differ quantitatively between regimes. Causal intervention experiments establish that orthogonal gradient flow is necessary but not sufficient for grokking: suppressing it prevents generalization with a monotonic dose-response across four operations, while artificially boosting curvature defects has no effect.
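The commutator defect, the non-commutativity of successive gradient steps, can be made concrete on toy quadratic losses: with a shared Hessian the two step orders agree exactly, while distinct non-commuting Hessians leave a residual.

```python
import numpy as np

def two_step(theta, grad_a, grad_b, eta):
    """Take gradient step a, then gradient step b, each evaluated at the
    current iterate."""
    theta = theta - eta * grad_a(theta)
    return theta - eta * grad_b(theta)

def commutator_defect(theta, grad_a, grad_b, eta):
    """Norm of the difference between the two step orders; zero when the steps
    commute at this point (e.g. quadratics with a shared Hessian)."""
    return np.linalg.norm(
        two_step(theta, grad_a, grad_b, eta) - two_step(theta, grad_b, grad_a, eta)
    )

theta = np.array([1.0, -2.0])
gA = lambda t: np.array([[2.0, 0.0], [0.0, 1.0]]) @ t   # Hessian H_A
gB = lambda t: np.array([[1.0, 0.5], [0.5, 1.0]]) @ t   # Hessian H_B, [H_A,H_B] != 0
gQ = lambda t: t                                        # shared identity Hessian
```

For the quadratic pair the defect equals eta^2 times the Hessian commutator applied to theta, which is why it serves as a local curvature probe along the optimization trajectory.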

replace Early-Warning Signals of Grokking via Loss-Landscape Geometry

Authors: Yongzhong Xu

Abstract: Grokking -- the abrupt transition from memorization to generalization after prolonged training -- has been linked to confinement on low-dimensional execution manifolds in modular arithmetic. Whether this mechanism extends beyond arithmetic remains open. We study two sequence-learning benchmarks: SCAN compositional generalization and Dyck-1 depth prediction. Across both tasks and a wide range of learning rates, the commutator defect -- a curvature measure derived from non-commuting gradient updates -- rises well before generalization, with lead times following a superlinear power law (alpha approximately 1.18 for SCAN, approximately 1.13 for Dyck), consistent with prior results on modular arithmetic. Weight-space PCA reveals that spectral concentration is not a universal precursor; the commutator defect is. Causal interventions demonstrate a mechanistic role: amplifying non-commutativity accelerates grokking (roughly 32% on SCAN, roughly 50% on Dyck), while suppressing orthogonal gradient flow delays or prevents it. The three task families form a spectrum of causal sensitivity -- modular arithmetic is rigid, Dyck is responsive, SCAN is intermediate -- yet suppression delays or prevents grokking in all cases, establishing necessity as a universal finding. These results identify the commutator defect as a robust, architecture-agnostic, causally implicated early-warning signal for delayed generalization in transformers.

replace The Geometry of Multi-Task Grokking: Transverse Instability, Superposition, and Weight Decay Phase Structure

Authors: Yongzhong Xu

Abstract: Grokking -- the abrupt transition from memorization to generalization long after near-zero training loss -- has been studied mainly in single-task settings. We extend geometric analysis to multi-task modular arithmetic, training shared-trunk Transformers on dual-task (mod-add + mod-mul) and tri-task (mod-add + mod-mul + mod-sq) objectives across a systematic weight decay sweep. Five consistent phenomena emerge. (1) Staggered grokking order: multiplication generalizes first, followed by squaring, then addition, with consistent delays across seeds. (2) Universal integrability: optimization trajectories remain confined to an empirically invariant low-dimensional execution manifold; commutator defects orthogonal to this manifold reliably precede generalization. (3) Weight decay phase structure: grokking timescale, curvature depth, reconstruction threshold, and defect lead covary systematically with weight decay, revealing distinct dynamical regimes and a sharp no-decay failure mode. (4) Holographic incompressibility: final solutions occupy only 4--8 principal trajectory directions yet are distributed across full-rank weights and destroyed by minimal perturbations; SVD truncation, magnitude pruning, and uniform scaling all fail to preserve performance. (5) Transverse fragility and redundancy: removing less than 10% of orthogonal gradient components eliminates grokking, yet dual-task models exhibit partial recovery under extreme deletion, suggesting redundant center manifolds enabled by overparameterization. Together, these results support a dynamical picture in which multi-task grokking constructs a compact superposition subspace in parameter space, with weight decay acting as compression pressure and excess parameters supplying geometric redundancy in optimization pathways.

replace Conditionally Site-Independent Neural Evolution of Antibody Sequences

Authors: Stephen Zhewen Lu, Aakarsh Vermani, Kohei Sanno, Jiarui Lu, Frederick A Matsen, Milind Jagota, Yun S. Song

Abstract: Common deep learning approaches for antibody engineering focus on modeling the marginal distribution of sequences. By treating sequences as independent samples, however, these methods overlook affinity maturation as a rich and largely untapped source of information about the evolutionary process by which antibodies explore the underlying fitness landscape. In contrast, classical phylogenetic models explicitly represent evolutionary dynamics but lack the expressivity to capture complex epistatic interactions. We bridge this gap with CoSiNE, a continuous-time Markov chain parameterized by a deep neural network. Mathematically, we prove that CoSiNE provides a first-order approximation to the intractable sequential point mutation process, capturing epistatic effects with an error bound that is quadratic in branch length. Empirically, CoSiNE outperforms state-of-the-art language models in zero-shot variant effect prediction by explicitly disentangling selection from context-dependent somatic hypermutation. Finally, we introduce Guided Gillespie, a classifier-guided sampling scheme that steers CoSiNE at inference time, enabling efficient optimization of antibody binding affinity toward specific antigens.
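The continuous-time Markov chain machinery CoSiNE builds on can be sketched by computing transition probabilities from a rate (generator) matrix; CoSiNE parameterizes the generator with a deep network, which this sketch does not reproduce:

```python
import numpy as np

def transition_probs(Q, t, terms=40):
    """Transition matrix P(t) = exp(tQ) of a continuous-time Markov chain,
    via a truncated Taylor series (adequate for small ||tQ||; production code
    would use scaling-and-squaring)."""
    n = Q.shape[0]
    P = np.eye(n)
    term = np.eye(n)
    for k in range(1, terms):
        term = term @ (t * Q) / k
        P = P + term
    return P

# Toy 3-state rate matrix: rows sum to zero, off-diagonals nonnegative.
Q = np.array([[-1.0, 0.7, 0.3],
              [ 0.2, -0.5, 0.3],
              [ 0.4, 0.1, -0.5]])
P = transition_probs(Q, t=0.8)
```

Because the generator's rows sum to zero, each row of P(t) is a valid probability distribution at every branch length t, which is the property a neural parameterization of Q must preserve.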

replace Fuz-RL: A Fuzzy-Guided Robust Framework for Safe Reinforcement Learning under Uncertainty

Authors: Xu Wan, Chao Yang, Cheng Yang, Jie Song, Mingyang Sun

Abstract: Safe Reinforcement Learning (RL) is crucial for achieving high performance while ensuring safety in real-world applications. However, the complex interplay of multiple uncertainty sources in real environments poses significant challenges for interpretable risk assessment and robust decision-making. To address these challenges, we propose Fuz-RL, a fuzzy measure-guided robust framework for safe RL. Specifically, our framework develops a novel fuzzy Bellman operator for estimating robust value functions using Choquet integrals. Theoretically, we prove that solving the Fuz-RL problem (in Constrained Markov Decision Process (CMDP) form) is equivalent to solving distributionally robust safe RL problems (in robust CMDP form), effectively avoiding min-max optimization. Empirical analyses on safe-control-gym and safety-gymnasium scenarios demonstrate that Fuz-RL effectively integrates with existing safe RL baselines in a model-free manner, significantly improving both safety and control performance under various types of uncertainties in observation, action, and dynamics.
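The Choquet integral used in the fuzzy Bellman operator can be illustrated on a discrete toy example; with an additive measure it reduces to an ordinary weighted sum, while non-additive measures encode interactions between uncertainty sources:

```python
import numpy as np

def choquet_integral(values, mu):
    """Discrete Choquet integral of nonnegative `values` with respect to a
    monotone set function `mu` over index sets (mu(empty) = 0, mu(all) = 1)."""
    order = list(np.argsort(values))       # indices, ascending by value
    x = np.asarray(values, dtype=float)
    total, prev = 0.0, 0.0
    remaining = order[:]                   # A_(i): indices with value >= x_(i)
    for i in order:
        total += (x[i] - prev) * mu(frozenset(remaining))
        prev = x[i]
        remaining = remaining[1:]          # drop the current smallest
    return total

# With an additive measure the Choquet integral is just a weighted sum.
weights = [0.2, 0.3, 0.5]
additive = lambda S: sum(weights[i] for i in S)
c = choquet_integral([3.0, 1.0, 2.0], additive)   # 0.2*3 + 0.3*1 + 0.5*2
```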

replace Efficient Continual Learning in Language Models via Thalamically Routed Cortical Columns

Authors: Afshin Khadangi

Abstract: Large language models deployed in the wild must adapt to evolving data, user behavior, and task mixtures without erasing previously acquired capabilities. In practice, this remains difficult: sequential updates induce catastrophic forgetting, while many stabilization methods rely on external procedures that are costly, brittle, or difficult to scale. We present TRC$^{2}$ (Thalamically Routed Cortical Columns), a decoder-only architecture that makes continual adaptation a property of the backbone itself. TRC$^{2}$ combines stacked cortical columns with a thalamic modulatory pathway for selective inter-column communication and a hippocampal pathway for event-selective retrieval, delayed surprise-based writing, and replay-driven consolidation. This design localizes fast plasticity while preserving a slower stable computation pathway. We further introduce a causal memory-update scheme and an online replay controller that adjusts consolidation strength from measured forgetting. Across a task-sequential language-modeling stream over C4, WikiText-103, and GSM8K, TRC$^{2}$ consistently improves task-boundary modeling quality and substantially reduces cumulative forgetting relative to Transformer, Mamba, MoE, and DeepSeek baselines trained under the same pipeline. Ablations show that the thalamic and hippocampal components are central to the retention gains, while the full model remains competitive in throughput and training cost.

replace Induction Meets Biology: Mechanisms of Repeat Detection in Protein Language Models

Authors: Gal Kesten-Pomeranz, Yaniv Nikankin, Anja Reusch, Tomer Tsaban, Ora Schueler-Furman, Yonatan Belinkov

Abstract: Protein sequences are abundant in repeating segments, both as exact copies and as approximate segments with mutations. These repeats are important for protein structure and function, motivating decades of algorithmic work on repeat identification. Recent work has shown that protein language models (PLMs) identify repeats, by examining their behavior in masked-token prediction. To elucidate their internal mechanisms, we investigate how PLMs detect both exact and approximate repeats. We find that the mechanism for approximate repeats functionally subsumes that of exact repeats. We then characterize this mechanism, revealing two main stages: PLMs first build feature representations using both general positional attention heads and biologically specialized components, such as neurons that encode amino-acid similarity. Then, induction heads attend to aligned tokens across repeated segments, promoting the correct answer. Our results reveal how PLMs solve this biological task by combining language-based pattern matching with specialized biological knowledge, thereby establishing a basis for studying more complex evolutionary processes in PLMs.
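The induction-head behavior the paper analyzes has a simple positional signature: attend to the token one past the most recent previous occurrence of the current token. A toy computation of those target positions:

```python
def induction_targets(tokens):
    """For each position t, the position an induction head would attend to:
    one past the most recent previous occurrence of tokens[t], or -1 if the
    token has not appeared before."""
    last_seen = {}
    targets = []
    for t, tok in enumerate(tokens):
        targets.append(last_seen[tok] + 1 if tok in last_seen else -1)
        last_seen[tok] = t
    return targets

# In "A B C A B", the second A should attend to the B after the first A.
tg = induction_targets(list("ABCAB"))
```

For approximate repeats, the paper's finding is that this alignment survives mutations because earlier layers first build similarity-aware features (e.g. amino-acid similarity neurons); this sketch covers only the exact-match case.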

replace EvoX: Meta-Evolution for Automated Discovery

Authors: Shu Liu, Shubham Agarwal, Monishwaran Maheswaran, Mert Cemri, Zhifei Li, Qiuyang Mang, Ashwin Naren, Ethan Boneh, Audrey Cheng, Melissa Z. Pan, Alexander Du, Kurt Keutzer, Alvin Cheung, Alexandros G. Dimakis, Koushik Sen, Matei Zaharia, Ion Stoica

Abstract: Recent work such as AlphaEvolve has shown that combining LLM-driven optimization with evolutionary search can effectively improve programs, prompts, and algorithms across domains. In this paradigm, previously evaluated solutions are reused to guide the model toward new candidate solutions. Crucially, the effectiveness of this evolution process depends on the search strategy: how prior solutions are selected and varied to generate new candidates. However, most existing methods rely on fixed search strategies with predefined knobs (e.g., explore-exploit ratios) that remain static throughout execution. While effective in some settings, these approaches often fail to adapt across tasks, or even within the same task as the search space changes over time. We introduce EvoX, an adaptive evolution method that optimizes its own evolution process. EvoX jointly evolves candidate solutions and the search strategies used to generate them, continuously updating how prior solutions are selected and varied based on progress. This enables the system to dynamically shift between different search strategies during the optimization process. Across nearly 200 real-world optimization tasks, EvoX outperforms existing AI-driven evolutionary methods including AlphaEvolve, OpenEvolve, GEPA, and ShinkaEvolve on the majority of tasks.
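A classical analogue of jointly evolving solutions and search strategy is self-adaptation in evolution strategies, where each candidate carries its own mutation step size; this (1+1)-ES sketch illustrates the idea, not EvoX's LLM-driven procedure:

```python
import numpy as np

def self_adaptive_es(f, x0, sigma0=1.0, iters=300, seed=0):
    """(1+1) evolution strategy with log-normal self-adaptation: the mutation
    step size is mutated and selected together with the solution, so the
    search strategy itself evolves during optimization."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    sigma, fx = sigma0, f(x)
    tau = 1.0 / np.sqrt(2.0 * len(x))
    for _ in range(iters):
        s = sigma * np.exp(tau * rng.standard_normal())   # mutate the strategy
        y = x + s * rng.standard_normal(len(x))           # mutate the solution
        fy = f(y)
        if fy <= fx:                                      # select both jointly
            x, sigma, fx = y, s, fy
    return x, fx

sphere = lambda v: float(np.sum(v ** 2))
best, best_val = self_adaptive_es(sphere, [3.0, -2.0, 1.0])
```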

replace A Gauge Theory of Superposition: Toward a Sheaf-Theoretic Atlas of Neural Representations

Authors: Hossein Javidnia

Abstract: We develop a discrete gauge-theoretic framework for superposition in large language models (LLMs) that replaces the single-global-dictionary premise with a sheaf-theoretic atlas of local semantic charts. Contexts are clustered into a stratified context complex; each chart carries a local feature space and a local information-geometric metric (Fisher/Gauss-Newton) identifying predictively consequential feature interactions. This yields a Fisher-weighted interference energy and three measurable obstructions to global interpretability: (O1) local jamming (active load exceeds Fisher bandwidth), (O2) proxy shearing (mismatch between geometric transport and a fixed correspondence proxy), and (O3) nontrivial holonomy (path-dependent transport around loops). We prove and instantiate four results on a frozen open LLM (Llama-3.2-3B Instruct) using WikiText-103, a C4-derived English web-text subset, and the-stack-smol. (A) After constructive gauge fixing on a spanning tree, each chord residual equals the holonomy of its fundamental cycle, making holonomy computable and gauge-invariant. (B) Shearing lower-bounds a data-dependent transfer mismatch energy, turning $D_{\mathrm{shear}}$ into an unavoidable failure bound. (C) We obtain non-vacuous certified jamming/interference bounds with high coverage and zero violations across seeds and hyperparameters. (D) Bootstrap and sample-size experiments show stable estimation of $D_{\mathrm{shear}}$ and $D_{\mathrm{hol}}$, with improved concentration on well-conditioned subsystems.
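Holonomy (O3) can be made concrete with linear transport maps: compose the chart-to-chart maps around a closed loop and measure how far the composite is from the identity. The 2-D rotations below are a toy stand-in for the paper's learned transports:

```python
import numpy as np

def holonomy_defect(transports):
    """Compose transport maps around a closed loop; the deviation of the
    composite from the identity is the loop's holonomy."""
    H = np.eye(transports[0].shape[0])
    for T in transports:
        H = T @ H
    return np.linalg.norm(H - np.eye(H.shape[0]))

def rot(a):
    return np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])

flat_loop = [rot(0.4), rot(-0.1), rot(-0.3)]    # angles cancel: trivial holonomy
twisted_loop = [rot(0.4), rot(-0.1), rot(0.2)]  # a net 0.5-rad rotation remains
```

A nonzero defect means feature correspondence is path-dependent, which is exactly the obstruction to stitching the local charts into one global dictionary.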

replace CHLU: The Causal Hamiltonian Learning Unit as a Symplectic Primitive for Deep Learning

Authors: Pratik Jawahar, Maurizio Pierini

Abstract: Current deep learning primitives dealing with temporal dynamics suffer from a fundamental dichotomy: they are either discrete and unstable (LSTMs) \citep{pascanu_difficulty_2013}, leading to exploding or vanishing gradients; or they are continuous and dissipative (Neural ODEs) \citep{dupont_augmented_2019}, which destroy information over time to ensure stability. We propose the \textbf{Causal Hamiltonian Learning Unit} (pronounced: \textit{clue}), a novel Physics-grounded computational learning primitive. By enforcing a Relativistic Hamiltonian structure and utilizing symplectic integration, a CHLU strictly conserves phase-space volume, in an attempt to resolve the memory-stability trade-off. We show that the CHLU provides infinite-horizon stability as well as controllable noise filtering. We then demonstrate a CHLU's generative ability using the MNIST dataset as a proof-of-principle.
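The symplectic-integration ingredient can be illustrated with the classic leapfrog scheme, whose energy error stays bounded on a harmonic oscillator rather than drifting; this shows the integrator only, not the CHLU's relativistic Hamiltonian:

```python
def leapfrog(q, p, grad_V, dt, steps):
    """Leapfrog (Stormer-Verlet) integration for H(q, p) = p^2/2 + V(q).
    The update is symplectic: it preserves phase-space volume exactly, so the
    energy error stays bounded over arbitrarily long horizons."""
    for _ in range(steps):
        p -= 0.5 * dt * grad_V(q)   # half kick
        q += dt * p                 # drift
        p -= 0.5 * dt * grad_V(q)   # half kick
    return q, p

grad_V = lambda q: q                # harmonic oscillator, V(q) = q^2 / 2
q0, p0 = 1.0, 0.0
qT, pT = leapfrog(q0, p0, grad_V, dt=0.05, steps=20000)
E0 = 0.5 * (p0 ** 2 + q0 ** 2)
ET = 0.5 * (pT ** 2 + qT ** 2)
```

After 20,000 steps (1,000 time units) the energy error remains at the O(dt^2) level, in contrast to a dissipative integrator, whose energy would decay monotonically.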

replace Scaling Reward Modeling without Human Supervision

Authors: Jingxuan Fan, Yueying Li, Zhenting Qi, Dinghuai Zhang, Kianté Brantley, Sham M. Kakade, Hanlin Zhang

Abstract: Learning from feedback is an instrumental process for advancing the capabilities and safety of frontier models, yet its effectiveness is often constrained by cost and scalability. We present a pilot study that explores scaling reward models through unsupervised approaches. We operationalize reward-based scaling (RBS), in its simplest form, as preference learning over document prefixes and suffixes drawn from large-scale web corpora. Its advantages are demonstrated in several respects: despite using no human annotations, training on 11M tokens of math-focused web data yields steady gains on RewardBench v1 and v2, and these improvements consistently transfer across diverse initialization backbones spanning model families and scales. Across models, our method improves RewardBench v2 accuracy by up to +7.7 points on average, with gains of up to +16.1 on in-domain math subsets and consistent improvements on out-of-domain safety and general subsets. When applied to best-of-N selection and policy optimization, these reward models substantially improve downstream math performance and match or exceed strong supervised reward model baselines of similar size. Overall, we demonstrate the feasibility and promise of training reward models without costly and potentially unreliable human annotations.

replace Aura: Universal Multi-dimensional Exogenous Integration for Aviation Time Series

Authors: Jiafeng Lin, Mengren Zheng, Simeng Ye, Yuxuan Wang, Huan Zhang, Yuhui Liu, Zhongyi Pei, Jianmin Wang

Abstract: Time series forecasting has witnessed an increasing demand across diverse industrial applications, where accurate predictions are pivotal for informed decision-making. Beyond numerical time series data, reliable forecasting in practical scenarios requires integrating diverse exogenous factors. Such exogenous information is often multi-dimensional or even multimodal, introducing heterogeneous interactions that unimodal time series models struggle to capture. In this paper, we delve into an aviation maintenance scenario and identify three distinct types of exogenous factors that influence temporal dynamics through distinct interaction modes. Based on this empirical insight, we propose Aura, a universal framework that explicitly organizes and encodes heterogeneous external information according to its interaction mode with the target time series. Specifically, Aura utilizes a tailored tripartite encoding mechanism to embed heterogeneous features into well-established time series models, ensuring seamless integration of non-sequential context. Extensive experiments on a large-scale, three-year industrial dataset from China Southern Airlines, covering the Boeing 777 and Airbus A320 fleets, demonstrate that Aura consistently outperforms all baselines, achieving state-of-the-art performance and superior adaptability. Our findings highlight Aura's potential as a general-purpose enhancement for aviation safety and reliability.

replace Preserving Continuous Symmetry in Discrete Spaces: Geometric-Aware Quantization for SO(3)-Equivariant GNNs

Authors: Haoyu Zhou, Ping Xue, Hao Zhang, Tianfan Fu

Abstract: Equivariant Graph Neural Networks (GNNs) are essential for physically consistent molecular simulations but suffer from high computational costs and memory bottlenecks, especially with high-order representations. While low-bit quantization offers a solution, applying it naively to rotation-sensitive features destroys the SO(3)-equivariant structure, leading to significant errors and violations of conservation laws. To address this issue, in this work, we propose a Geometric-Aware Quantization (GAQ) framework that compresses and accelerates equivariant models while rigorously preserving continuous symmetry in discrete spaces. Our approach introduces three key contributions: (1) a Magnitude-Direction Decoupled Quantization (MDDQ) scheme that separates invariant lengths from equivariant orientations to maintain geometric fidelity; (2) a symmetry-aware training strategy that treats scalar and vector features with distinct quantization schedules; and (3) a robust attention normalization mechanism to stabilize gradients in low-bit regimes. Experiments on the rMD17 benchmark demonstrate that our W4A8 models match the accuracy of FP32 baselines (9.31 meV vs. 23.20 meV) while reducing Local Equivariance Error (LEE) by over 30x compared to naive quantization. On consumer hardware, GAQ achieves 2.39x inference speedup and 4x memory reduction, enabling stable, energy-conserving molecular dynamics simulations for nanosecond timescales.
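The magnitude-direction decoupling at the heart of MDDQ can be illustrated on a single vector: quantize the invariant length and the unit direction separately, then renormalize the direction. A toy sketch under our own naming and bit-width choices, not the paper's GAQ implementation:

```python
import math

def quantize(x, bits, max_abs):
    """Uniform symmetric quantization of a scalar to the given bit-width."""
    levels = 2 ** (bits - 1) - 1
    step = max_abs / levels
    q = max(-levels, min(levels, round(x / step)))
    return q * step

def mddq(vec, mag_bits=8, dir_bits=4):
    """Quantize magnitude (invariant) and direction (equivariant) separately.

    Assumes a nonzero input vector."""
    norm = math.sqrt(sum(v * v for v in vec))
    unit = [v / norm for v in vec]
    q_norm = quantize(norm, mag_bits, max_abs=norm)  # per-tensor scale
    q_unit = [quantize(u, dir_bits, max_abs=1.0) for u in unit]
    # Renormalize so the reconstructed direction stays on the unit sphere,
    # keeping the invariant length exact.
    u_norm = math.sqrt(sum(u * u for u in q_unit))
    q_unit = [u / u_norm for u in q_unit]
    return [q_norm * u for u in q_unit]
```

Because the length is carried by a separate scalar, low-bit direction error perturbs orientation only, never the rotation-invariant magnitude.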

replace DC-Merge: Improving Model Merging with Directional Consistency

Authors: Han-Chen Zhang, Zi-Hao Zhou, Mao-Lin Luo, Shimin Di, Min-Ling Zhang, Tong Wei

Abstract: Model merging aims to integrate multiple task-adapted models into a unified model that preserves the knowledge of each task. In this paper, we identify that the key to this knowledge retention lies in maintaining the directional consistency of singular spaces between merged multi-task vector and individual task vectors. However, this consistency is frequently compromised by two issues: i) an imbalanced energy distribution within task vectors, where a small fraction of singular values dominate the total energy, leading to the neglect of semantically important but weaker components upon merging, and ii) the geometric inconsistency of task vectors in parameter space, which causes direct merging to distort their underlying directional geometry. To address these challenges, we propose DC-Merge, a method for directional-consistent model merging. It first balances the energy distribution of each task vector by smoothing its singular values, ensuring all knowledge components are adequately represented. These energy-balanced vectors are then projected onto a shared orthogonal subspace to align their directional geometries with minimal reconstruction error. Finally, the aligned vectors are aggregated in the shared orthogonal subspace and projected back to the original parameter space. Extensive experiments on vision and vision-language benchmarks show that DC-Merge consistently achieves state-of-the-art performance in both full fine-tuning and LoRA settings. The implementation code is available at https://github.com/Tobeginwith/DC-Merge.

URLs: https://github.com/Tobeginwith/DC-Merge.
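The energy-balancing step of DC-Merge smooths a top-heavy singular spectrum while preserving total energy (sum of squared singular values). A hedged one-function sketch; the power-law smoothing and the `gamma` exponent are our illustrative choices, not necessarily the paper's exact scheme:

```python
def smooth_energy(singular_values, gamma=0.5):
    """Flatten a top-heavy spectrum while preserving total energy (sum of squares)."""
    energy = sum(s * s for s in singular_values)
    powered = [s ** gamma for s in singular_values]   # compress large values
    p_energy = sum(p * p for p in powered)
    scale = (energy / p_energy) ** 0.5                # restore total energy
    return [p * scale for p in powered]
```

The ordering of singular values is preserved, but the share of energy held by the dominant component drops, so weaker knowledge components are better represented after merging.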

replace Frequency-Separable Hamiltonian Neural Network for Multi-Timescale Dynamics

Authors: Yaojun Li, Yulong Yang, Christine Allen-Blanchette

Abstract: While Hamiltonian mechanics provides a powerful inductive bias for neural networks modeling dynamical systems, Hamiltonian Neural Networks and their variants often fail to capture complex temporal dynamics spanning multiple timescales. This limitation is commonly linked to the spectral bias of deep neural networks, which favors learning low-frequency, slow-varying dynamics. Prior approaches have sought to address this issue through symplectic integration schemes that enforce energy conservation or by incorporating geometric constraints to impose structure on the configuration space. However, such methods either remain limited in their ability to fully capture multiscale dynamics or require substantial domain-specific assumptions. In this work, we exploit the observation that Hamiltonian functions admit decompositions into explicit fast and slow modes and can be reconstructed from these components. We introduce the Frequency-Separable Hamiltonian Neural Network (FS-HNN), which parameterizes the system Hamiltonian using multiple networks, each governed by Hamiltonian dynamics and trained on data sampled at distinct timescales. We further extend this framework to partial differential equations by learning state- and boundary-conditioned symplectic operators. Empirically, we show that FS-HNN improves long-horizon extrapolation performance on challenging dynamical systems and generalizes across a broad range of ODE and PDE problems.

replace HEARTS: Benchmarking LLM Reasoning on Health Time Series

Authors: Sirui Li, Shuhan Xiao, Mihir Joshi, Ahmed Metwally, Daniel McDuff, Wei Wang, Yuzhe Yang

Abstract: The rise of large language models (LLMs) has shifted time series analysis from narrow analytics to general-purpose reasoning. Yet, existing benchmarks cover only a small set of health time series modalities and tasks, failing to reflect the diverse domains and extensive temporal dependencies inherent in real-world physiological modeling. To bridge these gaps, we introduce HEARTS (Health Reasoning over Time Series), a unified benchmark for evaluating hierarchical reasoning capabilities of LLMs over general health time series. HEARTS integrates 16 real-world datasets across 12 health domains and 20 signal modalities, and defines a comprehensive taxonomy of 110 tasks grouped into four core capabilities: Perception, Inference, Generation, and Deduction. Evaluating 14 state-of-the-art LLMs on more than 20K test samples reveals intriguing findings. First, LLMs substantially underperform specialized models, and their performance is only weakly related to general reasoning scores. Moreover, LLMs often rely on simple heuristics and struggle with multi-step temporal reasoning. Finally, performance declines with increasing temporal complexity, with similar failure modes within model families, indicating that scaling alone is insufficient. By making these gaps measurable, HEARTS provides a standardized testbed and living benchmark for developing next-generation LLM agents capable of reasoning over diverse health signals.

replace Implementation of Quantum Implicit Neural Representation in Deterministic and Probabilistic Autoencoders for Image Reconstruction/Generation Tasks

Authors: Saadet M\"uzehher Eren

Abstract: We propose a quantum implicit neural representation (QINR)-based autoencoder (AE) and variational autoencoder (VAE) for image reconstruction and generation tasks. Our purpose is to demonstrate that the QINR in VAEs and AEs can transform information from the latent space into highly rich, periodic, and high-frequency features. Additionally, we aim to show that the QINR-VAE can be more stable than various quantum generative adversarial network (QGAN) models in image generation because it can address the low diversity problem. Our quantum-classical hybrid models consist of a classical convolutional neural network (CNN) encoder and a quantum-based QINR decoder. We train the QINR-AE/VAE with binary cross-entropy with logits (BCEWithLogits) as the reconstruction loss. For the QINR-VAE, we additionally employ Kullback-Leibler divergence for latent regularization with beta/capacity scheduling to prevent posterior collapse. We introduce learnable angle-scaling in data reuploading to address optimization challenges. We test our models on the MNIST, E-MNIST, and Fashion-MNIST datasets to reconstruct and generate images. Our results demonstrate that the QINR structure in VAE can produce a wider variety of images from a small amount of data than various previously studied generative models. We observe that the generated and reconstructed images from the QINR-VAE/AE are clear with sharp boundaries and details. Overall, we find that the addition of QINR-based quantum layers into the AE/VAE frameworks enhances the performance of reconstruction and generation with a constrained set of parameters.

replace Failure Detection in Chemical Processes Using Symbolic Machine Learning: A Case Study on Ethylene Oxidation

Authors: Julien Amblard, Niklas Groll, Matthew Tait, Mark Law, G\"urkan Sin, Alessandra Russo

Abstract: Over the past decade, Artificial Intelligence has significantly advanced, mostly driven by large-scale neural approaches. However, in the chemical process industry, where safety is critical, these methods are often unsuitable due to their brittleness, and lack of explainability and interpretability. Furthermore, open-source real-world datasets containing historical failures are scarce in this domain. In this paper, we investigate an approach for predicting failures in chemical processes using symbolic machine learning and conduct a feasibility study in the context of an ethylene oxidation process. Our method builds on a state-of-the-art symbolic machine learning system capable of learning predictive models in the form of probabilistic rules from context-dependent noisy examples. This system is a general-purpose symbolic learner, which makes our approach independent of any specific chemical process. To address the lack of real-world failure data, we conduct our feasibility study leveraging data generated from a chemical process simulator. Experimental results show that symbolic machine learning can outperform baseline methods such as random forest and multilayer perceptron, while preserving interpretability through the generation of compact, rule-based predictive models. Finally, we explain how such learned rule-based models could be integrated into agents to assist chemical plant operators in decision-making during potential failures.

replace NerVE: Nonlinear Eigenspectrum Dynamics in LLM Feed-Forward Networks

Authors: Nandan Kumar Jha, Brandon Reagen

Abstract: We introduce NerVE, a unified eigenspectral framework for understanding how feed-forward networks (FFNs) in large language models (LLMs) organize and regulate information flow in high-dimensional latent space. Despite FFNs dominating the parameter budget, their high-dimensional dynamics remain poorly understood. NerVE addresses this gap through lightweight, memory-efficient tracking of eigenspectrum dynamics via four complementary metrics: Spectral Entropy (dispersion), Participation Ratio (effective dimensionality), Eigenvalue Early Enrichment (top-heaviness), and Jensen-Shannon divergence (distributional shifts). Our key insight is that FFN nonlinearities reinject variance across eigenmodes, fundamentally governing latent dimension utilization, and that optimizer geometry strongly modulates the extent of this variance reinjection. We validate NerVE across model scales, and diverse architectural and optimizer configurations, each uniquely shaping FFN dynamics: normalization schemes controlling variance flow; FFN weight geometries constraining latent space; positional encoding and activation functions regulating information flow; and optimizer choices redistributing effective capacity across depth. Across these settings, NerVE consistently recovers stable spectral signatures that correlate with model's generalization ability and respond predictably to design choices, generalizing beyond transformer to MLP-Mixer architectures, providing actionable insights for architectural and optimizer choices beyond trial-and-error.
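Two of NerVE's four metrics have closed forms that are easy to state from an eigenvalue list: the Shannon entropy of the normalized spectrum, and the participation ratio $(\sum_i \lambda_i)^2 / \sum_i \lambda_i^2$. A minimal sketch of these two (our reading of the standard definitions, not the paper's code):

```python
import math

def spectral_entropy(eigvals):
    """Shannon entropy of the normalized eigenspectrum (dispersion)."""
    total = sum(eigvals)
    probs = [v / total for v in eigvals]
    return -sum(p * math.log(p) for p in probs if p > 0)

def participation_ratio(eigvals):
    """Effective number of active eigenmodes: (sum lambda)^2 / sum lambda^2."""
    return sum(eigvals) ** 2 / sum(v * v for v in eigvals)
```

A uniform spectrum over k modes gives entropy log k and participation ratio k; a rank-one spectrum gives 0 and 1, the two extremes of latent dimension utilization.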

replace OrthoFormer: Instrumental Variable Estimation in Transformer Hidden States via Neural Control Functions

Authors: Charles Luo

Abstract: Transformer architectures excel at sequential modeling yet remain fundamentally limited by correlational learning - they capture spurious associations induced by latent confounders rather than invariant causal mechanisms. We identify this as an epistemological challenge: standard Transformers conflate static background factors (intrinsic identity, style, context) with dynamic causal flows (state evolution, mechanism), leading to catastrophic out-of-distribution failure. We propose OrthoFormer, a causally grounded architecture that embeds instrumental variable estimation directly into Transformer blocks via neural control functions. Our framework rests on four theoretical pillars: Structural Directionality (time-arrow enforcement), Representation Orthogonality (latent-noise separation), Causal Sparsity (Markov Blanket approximation), and End-to-End Consistency (gradient-detached stage separation). We prove that OrthoFormer achieves bias strictly less than OLS for any valid instrument lag, with residual bias decaying geometrically as $O(\rho^k)$. We characterize the bias-variance-exogeneity trilemma inherent in self-instrumenting and identify the neural forbidden regression - where removing gradient detachment improves prediction loss while destroying causal validity. Experiments confirm all theoretical predictions. OrthoFormer represents a paradigm shift from correlational to causal sequence modeling, with implications for robustness, interpretability, and reliable decision-making under distribution shift.
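The classical building block that OrthoFormer's neural control functions generalize is two-stage least squares. A toy linear sketch on synthetic confounded data (our own illustration, not the paper's architecture): OLS is biased by the confounder, while instrumenting recovers the true causal slope.

```python
import random

def ols_slope(x, y):
    """Ordinary least-squares slope of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)

def iv_slope(z, x, y):
    """Two-stage least squares with a single instrument z."""
    first = ols_slope(z, x)            # stage 1: x ~ z
    x_hat = [first * zi for zi in z]   # fitted treatment, purged of confounding
    return ols_slope(x_hat, y)         # stage 2: y ~ x_hat

random.seed(0)
n = 20000
u = [random.gauss(0, 1) for _ in range(n)]          # latent confounder
z = [random.gauss(0, 1) for _ in range(n)]          # instrument, independent of u
x = [zi + ui for zi, ui in zip(z, u)]               # treatment, confounded by u
y = [2.0 * xi + 3.0 * ui for xi, ui in zip(x, u)]   # true causal slope = 2

b_ols = ols_slope(x, y)   # biased toward ~3.5 here
b_iv = iv_slope(z, x, y)  # close to 2
```

In this construction cov(z, y)/cov(z, x) = 2 exactly, so the IV estimate concentrates on the causal slope as n grows.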

replace DyQ-VLA: Temporal-Dynamic-Aware Quantization for Embodied Vision-Language-Action Models

Authors: Zihao Zheng, Hangyu Cao, Sicheng Tian, Jiayu Chen, Maoliang Li, Xinhao Sun, Hailong Zou, Zhaobo Zhang, Xuanzhe Liu, Donggang Cao, Hong Mei, Xiang Chen

Abstract: Vision-Language-Action (VLA) models are dominant in embodied intelligence but are constrained by inference overheads. While model quantization alleviates these bottlenecks for edge deployment, static quantization approaches remain suboptimal for VLAs due to two critical challenges: (1) Temporal-dynamic sensitivity, where fixed precision wastes resources by ignoring stage-varying error tolerances; and (2) Real-time allocation, where identifying real-time sensitivity to guide bit allocation remains unsolved. To address these challenges, we propose DyQ-VLA, a dynamic quantization framework for VLAs. Specifically, a sensitivity-aware switching strategy leverages real-time kinematic proxies to trigger the bit-width switch, while a kinematic-guided module dynamically allocates the optimal bit-width. Experiments show that DyQ-VLA requires only 30.9% of the original memory footprint while maintaining 99.5% of its original performance, achieving 1.49x simulation and up to 1.43x real-world speedups.

replace Distributional Regression with Tabular Foundation Models: Evaluating Probabilistic Predictions via Proper Scoring Rules

Authors: Jonas Landsgesell, Pascal Knoll

Abstract: Tabular foundation models such as TabPFN and TabICL already produce full predictive distributions, yet the benchmarks used to evaluate them (TabArena, TALENT, and others) still rely almost exclusively on point-estimate metrics (RMSE, $R^2$). This mismatch implicitly rewards models that elicit a good conditional mean while ignoring the quality of the predicted distribution. We make two contributions. First, we propose supplementing standard point metrics with proper scoring rules (CRPS, CRLS, and the Interval Score) and provide a head-to-head comparison of realTabPFNv2.5 and TabICLv2 with respect to several proper scoring rules across 20 OpenML regression datasets. Second, we show analytically and empirically that different proper scoring rules induce different model rankings and different inductive biases during training, even though each rule is individually minimized by the true distribution. Fine-tuning realTabPFNv2.5 with scoring rules not seen during pretraining (CRLS, $\beta=1.8$ energy score) yields consistent improvements on the corresponding metrics, confirming that the training loss shapes the model beyond what propriety alone guarantees. Together, these findings argue for (i) reporting distributional metrics in tabular regression benchmarks and (ii) making the training objective of foundation models adaptable (via fine-tuning or task-token conditioning) to the scoring rule relevant to the downstream decision problem.
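The CRPS advocated above has a standard empirical estimator for an ensemble of samples from the predictive distribution, $\mathbb{E}|X - y| - \tfrac{1}{2}\mathbb{E}|X - X'|$. A minimal sketch (not tied to any particular model):

```python
def crps_ensemble(samples, y):
    """Empirical CRPS: E|X - y| - 0.5 E|X - X'| over ensemble members."""
    m = len(samples)
    term1 = sum(abs(x - y) for x in samples) / m
    term2 = sum(abs(a - b) for a in samples for b in samples) / (m * m)
    return term1 - 0.5 * term2
```

For a degenerate (point) forecast the second term vanishes and CRPS reduces to absolute error, which is why point metrics alone cannot distinguish a well-spread predictive distribution from an overconfident one.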

replace Learning Adaptive LLM Decoding

Authors: Chloe H. Su, Zhe Ye, Samuel Tenka, Aidan Yang, Soonho Kong, Udaya Ghai

Abstract: Decoding from large language models (LLMs) typically relies on fixed sampling hyperparameters (e.g., temperature, top-p), despite substantial variation in task difficulty and uncertainty across prompts and individual decoding steps. We propose to learn adaptive decoding policies that dynamically select sampling strategies at inference time, conditioned on available compute resources. Rather than fine-tuning the language model itself, we introduce lightweight decoding adapters trained with reinforcement learning and verifiable terminal rewards (e.g. correctness on math and coding tasks). At the sequence level, we frame decoding as a contextual bandit problem: a policy selects a decoding strategy (e.g. greedy, top-k, min-p) for each prompt, conditioned on the prompt embedding and a parallel sampling budget. At the token level, we model decoding as a partially observable Markov decision process (POMDP), where a policy selects sampling actions at each token step based on internal model features and the remaining token budget. Experiments on the MATH and CodeContests benchmarks show that the learned adapters improve the accuracy-budget tradeoff: on MATH, the token-level adapter improves Pass@1 accuracy by up to 10.2% over the best static baseline under a fixed token budget, while the sequence-level adapter yields 2-3% gains under fixed parallel sampling. Ablation analyses support the contribution of both sequence- and token-level adaptation.
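The sequence-level bandit view can be illustrated, ignoring the prompt-embedding context, with a plain epsilon-greedy bandit over a menu of decoding strategies rewarded by verifiable correctness. This is a simplification of the paper's contextual setup; all names are ours:

```python
import random

class DecodingBandit:
    """Epsilon-greedy bandit over a fixed menu of decoding strategies."""
    def __init__(self, strategies, epsilon=0.1):
        self.strategies = strategies
        self.epsilon = epsilon
        self.counts = {s: 0 for s in strategies}
        self.values = {s: 0.0 for s in strategies}

    def select(self):
        if random.random() < self.epsilon:
            return random.choice(self.strategies)   # explore
        return max(self.strategies, key=lambda s: self.values[s])  # exploit

    def update(self, strategy, reward):
        # Incremental mean of verifiable terminal rewards (e.g. pass/fail).
        self.counts[strategy] += 1
        c = self.counts[strategy]
        self.values[strategy] += (reward - self.values[strategy]) / c
```

With per-strategy success rates held fixed, the bandit concentrates its pulls on the strategy with the highest verified accuracy.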

replace Proxy-Guided Measurement Calibration

Authors: Saketh Vishnubhatla, Shu Wan, Andre Harrison, Adrienne Raglin, Huan Liu

Abstract: Aggregate outcome variables collected through surveys and administrative records are often subject to systematic measurement error. For instance, in disaster loss databases, county-level losses reported may differ from the true damages due to variations in on-the-ground data collection capacity, reporting practices, and event characteristics. Such miscalibration complicates downstream analysis and decision-making. We study the problem of outcome miscalibration and propose a framework guided by proxy variables for estimating and correcting the systematic errors. We model the data-generating process using a causal graph that separates latent content variables driving the true outcome from the latent bias variables that induce systematic errors. The key insight is that proxy variables that depend on the true outcome but are independent of the bias mechanism provide identifying information for quantifying the bias. Leveraging this structure, we introduce a two-stage approach that utilizes variational autoencoders to disentangle content and bias latents, enabling us to estimate the effect of bias on the outcome of interest. We analyze the assumptions underlying our approach and evaluate it on synthetic data, semi-synthetic datasets derived from randomized trials, and a real-world case study of disaster loss reporting.

replace Variational Routing: A Scalable Bayesian Framework for Calibrated Mixture-of-Experts Transformers

Authors: Albus Yizhuo Li, Matthew Wicker

Abstract: Foundation models are increasingly being deployed in contexts where understanding the uncertainty of their outputs is critical to ensuring responsible deployment. While Bayesian methods offer a principled approach to uncertainty quantification, their computational overhead renders their use impractical for training or inference at foundation model scale. State-of-the-art models achieve parameter counts in the trillions through carefully engineered sparsity including Mixture-of-Experts (MoE) layers. In this work, we demonstrate calibrated uncertainty at scale by introducing Variational Mixture-of-Experts Routing (VMoER), a structured Bayesian approach for modelling uncertainty in MoE layers. VMoER confines Bayesian inference to the expert-selection stage which is typically done by a deterministic routing network. We instantiate VMoER using two inference strategies: amortised variational inference over routing logits and inferring a temperature parameter for stochastic expert selection. Across tested foundation models, VMoER improves routing stability under noise by 38\%, reduces calibration error by 94\%, and increases out-of-distribution AUROC by 12\%, while incurring less than 1\% additional FLOPs. These results suggest VMoER offers a scalable path toward robust and uncertainty-aware foundation models.

replace Estimating condition number with Graph Neural Networks

Authors: Erin Carson, Xinye Chen

Abstract: For large sparse matrices, the condition number is almost never computed exactly, since doing so would require a full SVD or full eigenvalue decomposition. In this paper, we propose a fast method for estimating the condition number of sparse matrices using graph neural networks (GNNs). To enable efficient training and inference, our feature engineering for GNNs runs in $\mathrm{O}(\mathrm{nnz} + n)$ time, where $\mathrm{nnz}$ is the number of non-zero elements in the matrix and $n$ denotes the matrix dimension. We propose two prediction schemes for estimating the matrix condition number with GNNs. One decomposes the condition number and predicts its more computationally intensive factor $\|\mathbf{A}^{-1}\|$, while the other predicts the whole condition number $\kappa$ directly. Our approach extends to an arbitrary norm. Extensive experiments on both schemes, covering 1-norm and 2-norm condition number estimation, show that our method achieves a significant speedup over traditional numerical estimation methods.
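The decomposition scheme rests on $\kappa_1(\mathbf{A}) = \|\mathbf{A}\|_1 \, \|\mathbf{A}^{-1}\|_1$, where the first factor (the maximum column absolute sum) is computable exactly in $\mathrm{O}(\mathrm{nnz})$ time, so only the second factor needs a learned predictor. A sketch of the cheap factor over a COO-style triple list (our own illustration):

```python
def one_norm_sparse(entries, n_cols):
    """Exact 1-norm (max column absolute sum) of a sparse matrix in O(nnz + n).

    `entries` is a list of (row, col, value) triples, COO-style."""
    col_sums = [0.0] * n_cols
    for _, j, v in entries:
        col_sums[j] += abs(v)
    return max(col_sums)

# kappa_1(A) = ||A||_1 * ||A^{-1}||_1: multiply this exact cheap factor
# by a GNN-style prediction of ||A^{-1}||_1 to estimate the condition number.
```

For the 2-norm the analogous cheap/expensive split is $\sigma_{\max}$ versus $1/\sigma_{\min}$, with the small singular value being the hard part.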

replace A Grammar of Machine Learning Workflows

Authors: Simon Roth

Abstract: Data leakage affected 294 published papers across 17 scientific fields (Kapoor & Narayanan, 2023); a living survey has since grown that count to 648 across 30 fields. The dominant response has been documentation: checklists, linters, best-practice guides. Documentation reduces errors but does not close structural failures. This paper proposes a structural remedy: a grammar that decomposes the supervised learning lifecycle into 8 kernel primitives connected by a typed directed acyclic graph (DAG), with four hard constraints that reject the two most damaging leakage classes at call time. The grammar's core contribution is the terminal assess constraint: a runtime-enforced evaluate/assess boundary where repeated test-set assessment is rejected by a guard on a nominally distinct Evidence type. A companion study across 2,047 experimental instances quantifies why this matters: selection leakage inflates performance by d_z = 0.93 and memorization leakage by d_z = 0.53-1.11. Two maintained implementations (Python, R) demonstrate the claims. The appendix specification lets anyone build a conforming version.
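The terminal assess constraint can be sketched as a type whose test-set evaluation method is single-use: a second call is rejected at runtime. This is a minimal illustration of the guard idea, not the paper's Python or R implementation:

```python
class Evidence:
    """Wraps a held-out test set and rejects repeated assessment at call time."""
    def __init__(self, x_test, y_test):
        self._data = (x_test, y_test)
        self._spent = False

    def assess(self, model):
        if self._spent:
            raise RuntimeError("terminal assess: the test set was already consumed")
        self._spent = True
        x, y = self._data
        preds = [model(xi) for xi in x]
        return sum(p == t for p, t in zip(preds, y)) / len(y)
```

Because the guard lives in the type rather than in documentation, repeated test-set assessment (the selection-leakage pattern quantified above) fails loudly instead of silently inflating scores.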

replace TOSSS: a CVE-based Software Security Benchmark for Large Language Models

Authors: Marc Damie, Murat Bilgehan Ertan, Domenico Essoussi, Angela Makhanu, Ga\"etan Peter, Roos Wensveen

Abstract: With their increasing capabilities, Large Language Models (LLMs) are now used across many industries. They have become useful tools for software engineers and support a wide range of development tasks. As LLMs are increasingly used in software development workflows, a critical question arises: are LLMs good at software security? At the same time, organizations worldwide invest heavily in cybersecurity to reduce exposure to disruptive attacks. The integration of LLMs into software engineering workflows may introduce new vulnerabilities and weaken existing security efforts. We introduce TOSSS (Two-Option Secure Snippet Selection), a benchmark that measures the ability of LLMs to choose between secure and vulnerable code snippets. Existing security benchmarks for LLMs cover only a limited range of vulnerabilities. In contrast, TOSSS relies on the CVE database and provides an extensible framework that can integrate newly disclosed vulnerabilities over time. Our benchmark gives each model a security score between 0 and 1 based on its behavior; a score of 1 indicates that the model always selects the secure snippet, while a score of 0 indicates that it always selects the vulnerable one. We evaluate 14 widely used open-source and closed-source models on C/C++ and Java code and observe scores ranging from 0.48 to 0.89. LLM providers already publish many benchmark scores for their models, and TOSSS could become a complementary security-focused score to include in these reports.

replace Differentiable Thermodynamic Phase-Equilibria for Machine Learning

Authors: Karim K. Ben Hicham, Moreno Ascani, Jan G. Rittig, Alexander Mitsos

Abstract: Accurate prediction of phase equilibria remains a central challenge in chemical engineering. Physics-consistent machine learning methods that incorporate thermodynamic structure into neural networks have recently shown strong performance for activity-coefficient modeling. However, extending such approaches to equilibrium data arising from an extremum principle, such as liquid-liquid equilibria, remains difficult. Here we present DISCOMAX, a differentiable algorithm for phase-equilibrium calculation that guarantees thermodynamic consistency at both training and inference, only subject to a user-specified discretization. The method is rooted in statistical thermodynamics and works via a discrete enumeration with subsequent masked softmax aggregation of feasible states, combined with a straight-through gradient estimator to enable physics-consistent end-to-end learning of neural $g^{E}$-models. We evaluate the approach on binary liquid-liquid equilibrium data and demonstrate that it outperforms existing surrogate-based methods, while offering a general framework for learning from different kinds of equilibrium data.

replace Relaxed Efficient Acquisition of Context and Temporal Features

Authors: Yunni Qu (The University of North Carolina at Chapel Hill), Dzung Dinh (The University of North Carolina at Chapel Hill), Grant King (University of Michigan), Whitney Ringwald (University of Minnesota Twin Cities), Bing Cai Kok (The University of North Carolina at Chapel Hill), Kathleen Gates (The University of North Carolina at Chapel Hill), Aidan Wright (University of Michigan), Junier Oliva (The University of North Carolina at Chapel Hill)

Abstract: In many biomedical applications, measurements are not freely available at inference time: each laboratory test, imaging modality, or assessment incurs financial cost, time burden, or patient risk. Longitudinal active feature acquisition (LAFA) seeks to optimize predictive performance under such constraints by adaptively selecting measurements over time, yet the problem remains inherently challenging due to temporally coupled decisions (missed early measurements cannot be revisited, and acquisition choices influence all downstream predictions). Moreover, real-world clinical workflows typically begin with an initial onboarding phase, during which relatively stable contextual descriptors (e.g., demographics or baseline characteristics) are collected once and subsequently condition longitudinal decision-making. Despite its practical importance, the efficient selection of onboarding context has not been studied jointly with temporally adaptive acquisition. We therefore propose REACT (Relaxed Efficient Acquisition of Context and Temporal features), an end-to-end differentiable framework that simultaneously optimizes (i) selection of onboarding contextual descriptors and (ii) adaptive feature--time acquisition plans for longitudinal measurements under cost constraints. REACT employs a Gumbel--Sigmoid relaxation with straight-through estimation to enable gradient-based optimization over discrete acquisition masks, allowing direct backpropagation from prediction loss and acquisition cost. Across real-world longitudinal health and behavioral datasets, REACT achieves improved predictive performance at lower acquisition costs compared to existing longitudinal acquisition baselines, demonstrating the benefit of modeling onboarding and temporally coupled acquisition within a unified optimization framework.
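The Gumbel-Sigmoid relaxation with straight-through estimation can be sketched for a single acquisition-mask entry. This is a forward-pass-only illustration in our own naming; in an autodiff framework the backward pass would flow through the soft sample:

```python
import math
import random

def gumbel_sigmoid(logit, tau=0.5):
    """Relaxed Bernoulli sample: sigmoid((logit + Gumbel noise) / tau).

    Returns (hard, soft): the hard 0/1 mask used in the forward pass and the
    soft relaxation that would carry gradients."""
    u = min(max(random.random(), 1e-12), 1.0 - 1e-12)
    g = -math.log(-math.log(u))                       # standard Gumbel noise
    soft = 1.0 / (1.0 + math.exp(-(logit + g) / tau))
    hard = 1.0 if soft > 0.5 else 0.0
    # Straight-through: forward uses `hard`; backward would use `soft`
    # (hard + soft - detach(soft) in an autodiff framework).
    return hard, soft
```

For a zero logit the hard mask fires with probability $1 - e^{-1} \approx 0.63$ (the Gumbel CDF at zero), and raising the logit raises the acquisition probability smoothly, which is what makes cost-constrained mask learning differentiable.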

replace Slack More, Predict Better: Proximal Relaxation for Probabilistic Latent Variable Model-based Soft Sensors

Authors: Zehua Zou, Yiran Ma, Yulong Zhang, Zhengnan Li, Zeyu Yang, Jinhao Xie, Xiaoyu Jiang, Zhichao Chen

Abstract: Nonlinear Probabilistic Latent Variable Models (NPLVMs) are a cornerstone of soft sensor modeling due to their capacity for uncertainty delineation. However, conventional NPLVMs are trained using amortized variational inference, where neural networks parameterize the variational posterior. While facilitating model implementation, this parameterization converts the distributional optimization problem within an infinite-dimensional function space to parameter optimization within a finite-dimensional parameter space, which introduces an approximation error gap, thereby degrading soft sensor modeling accuracy. To alleviate this issue, we introduce KProxNPLVM, a novel NPLVM that instead relaxes the learning objective itself. Specifically, we first characterize the approximation error induced by the conventional approach. Based on this, we design the Wasserstein distance as the proximal operator to relax the learning objective, yielding a new variational inference strategy derived from solving this relaxed optimization problem. Building on this foundation, we provide a rigorous derivation of KProxNPLVM's optimization procedure, prove that our algorithm converges and ultimately sidesteps the approximation error, and assemble these components into the complete KProxNPLVM. Finally, extensive experiments on synthetic and real-world industrial datasets are conducted to demonstrate the efficacy of the proposed KProxNPLVM.

replace Attention Sinks Are Provably Necessary in Softmax Transformers: Evidence from Trigger-Conditional Tasks

Authors: Yuval Ran-Milo

Abstract: Transformers often display an attention sink: probability mass concentrates on a fixed, content-agnostic position. Are sinks a byproduct of the optimization/training regime? Or are they sometimes functionally necessary in softmax Transformers? We prove that, in some settings, it is the latter: computing a simple trigger-conditional behavior necessarily induces a sink in softmax self-attention models. Our results formalize a familiar intuition: normalization over a probability simplex must force attention to collapse onto a stable anchor to realize a default state (e.g., when the model needs to ignore the input). We instantiate this with a concrete task: when a designated trigger token appears, the model must return the average of all preceding token representations, and otherwise output zero, a task which mirrors the functionality of attention heads in the wild (Barbero et al., 2025; Guo et al., 2024). We also prove that non-normalized ReLU attention can solve the same task without any sink, confirming that the normalization constraint is the fundamental driver of sink behavior. Experiments validate our predictions and demonstrate they extend beyond the theoretically analyzed setting: softmax models develop strong sinks while ReLU attention eliminates them in both single-head and multi-head variants.
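The normalization argument is easy to see numerically: softmax weights must sum to 1, so a near-zero default output requires piling mass onto a sink position holding a zero value, whereas unnormalized ReLU attention can simply zero every weight. A minimal scalar-valued sketch (our own toy construction, not the paper's experimental setup):

```python
import math

def softmax_attn(scores, values):
    """Softmax attention: weights are forced onto the probability simplex."""
    w = [math.exp(s) for s in scores]
    z = sum(w)
    w = [wi / z for wi in w]                  # must sum to 1
    return sum(wi * v for wi, v in zip(w, values))

def relu_attn(scores, values):
    """Unnormalized ReLU attention: weights may all be zero."""
    w = [max(0.0, s) for s in scores]
    return sum(wi * v for wi, v in zip(w, values))

# Softmax can only approximate a zero "default state" via a sink position
# holding a zero value (the third entry); ReLU needs no sink at all.
sink_out = softmax_attn([-10.0, -10.0, 10.0], [1.0, 2.0, 0.0])  # near 0 via sink
no_sink = relu_attn([-1.0, -2.0, -3.0], [1.0, 2.0, 0.0])        # exactly 0
```

Without a sink, softmax over positive values is stuck at a convex combination of them, which is exactly the constraint the paper's lower bound exploits.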

replace KEPo: Knowledge Evolution Poison on Graph-based Retrieval-Augmented Generation

Authors: Qizhi Chen, Chao Qi, Yihong Huang, Muquan Li, Rongzheng Wang, Dongyang Zhang, Ke Qin, Shuang Liang

Abstract: Graph-based Retrieval-Augmented Generation (GraphRAG) constructs the Knowledge Graph (KG) from external databases to enhance the timeliness and accuracy of Large Language Model (LLM) generations. However, this reliance on external data introduces new attack surfaces. Attackers can inject poisoned texts into databases to manipulate LLMs into producing harmful target responses for attacker-chosen queries. Existing research primarily focuses on attacking conventional RAG systems. However, such methods are ineffective against GraphRAG. This robustness derives from the KG abstraction of GraphRAG, which reorganizes injected text into a graph before retrieval, thereby enabling the LLM to reason based on the restructured context instead of raw poisoned passages. To expose latent security vulnerabilities in GraphRAG, we propose Knowledge Evolution Poison (KEPo), a novel poisoning attack method specifically designed for GraphRAG. For each target query, KEPo first generates a toxic event containing poisoned knowledge based on the target answer. By fabricating event backgrounds and forging knowledge evolution paths from original facts to the toxic event, it then poisons the KG and misleads the LLM into treating the poisoned knowledge as the final result. In multi-target attack scenarios, KEPo further connects multiple attack corpora, enabling their poisoned knowledge to mutually reinforce while expanding the scale of poisoned communities, thereby amplifying attack effectiveness. Experimental results across multiple datasets demonstrate that KEPo achieves state-of-the-art attack success rates for both single-target and multi-target attacks, significantly outperforming previous methods.

replace Mitigating the Multiplicity Burden: The Role of Calibration in Reducing Predictive Multiplicity of Classifiers

Authors: Mustafa Cavus

Abstract: As machine learning models are increasingly deployed in high-stakes environments, ensuring both probabilistic reliability and prediction stability has become critical. This paper examines the interplay between classification calibration and predictive multiplicity - the phenomenon in which multiple near-optimal models within the Rashomon set yield conflicting credit outcomes for the same applicant. Using nine diverse credit risk benchmark datasets, we investigate whether predictive multiplicity concentrates in regions of low predictive confidence and how post-hoc calibration can mitigate algorithmic arbitrariness. Our empirical analysis reveals that minority class observations bear a disproportionate multiplicity burden, as confirmed by significant disparities in predictive multiplicity and prediction confidence. Furthermore, our empirical comparisons indicate that applying post-hoc calibration methods - specifically Platt Scaling, Isotonic Regression, and Temperature Scaling - is associated with lower obscurity across the Rashomon set. Among the tested techniques, Platt Scaling and Isotonic Regression provide the most robust reduction in predictive multiplicity. These findings suggest that calibration can function as a consensus-enforcing layer and may support procedural fairness by mitigating predictive multiplicity.
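Temperature scaling, one of the three post-hoc methods compared in this abstract, can be sketched in a few lines of numpy. This is an illustrative sketch on a synthetic overconfident classifier of our own construction, not the paper's credit-risk pipeline: a single temperature T is fit on held-out logits by minimizing negative log-likelihood.

```python
import numpy as np

def nll(logits, labels, T):
    """Mean negative log-likelihood of binary labels under sigmoid(logits / T)."""
    p = 1.0 / (1.0 + np.exp(-logits / T))
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p))

def fit_temperature(logits, labels, grid=np.linspace(0.1, 10.0, 200)):
    """Pick the temperature T > 0 minimizing held-out NLL (simple grid search)."""
    return min(grid, key=lambda T: nll(logits, labels, T))

# Synthetic overconfident classifier: its logits are 3x the true log-odds.
rng = np.random.default_rng(0)
true_logits = rng.normal(size=2000)
labels = (rng.random(2000) < 1.0 / (1.0 + np.exp(-true_logits))).astype(float)
overconf = 3.0 * true_logits

T = fit_temperature(overconf, labels)
# Fitting should recover a temperature near 3, undoing the overconfidence.
```

Platt scaling and isotonic regression replace the single scalar T with a fitted logistic map or a monotone step function, respectively; the abstract's claim is that such maps also reduce disagreement across the Rashomon set.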

replace MobileKernelBench: Can LLMs Write Efficient Kernels for Mobile Devices?

Authors: Xingze Zou, Jing Wang, Yuhua Zheng, Xueyi Chen, Haolei Bai, Lingcheng Kong, Syed A. R. Abu-Bakar, Zhaode Wang, Chengfei Lv, Haoji Hu, Huan Wang

Abstract: Large language models (LLMs) have demonstrated remarkable capabilities in code generation, yet their potential for generating kernels specifically for mobile devices remains largely unexplored. In this work, we extend the scope of automated kernel generation to the mobile domain to investigate the central question: Can LLMs write efficient kernels for mobile devices? To enable systematic investigation, we introduce MobileKernelBench, a comprehensive evaluation framework comprising a benchmark prioritizing operator diversity and cross-framework interoperability, coupled with an automated pipeline that bridges the host-device gap for on-device verification. Leveraging this framework, we conduct extensive evaluation on the CPU backend of Mobile Neural Network (MNN), revealing that current LLMs struggle with the engineering complexity and data scarcity inherent to mobile frameworks; standard models and even fine-tuned variants exhibit high compilation failure rates (over 54%) and negligible performance gains due to hallucinations and a lack of domain-specific grounding. To overcome these limitations, we propose the Mobile Kernel Agent (MoKA), a multi-agent system equipped with repository-aware reasoning and a plan-and-execute paradigm. Validated on MobileKernelBench, MoKA achieves state-of-the-art performance, boosting compilation success to 93.7% and enabling 27.4% of generated kernels to deliver measurable speedups over native libraries.

replace Separable neural architectures as a primitive for unified predictive and generative intelligence

Authors: Reza T. Batley, Apurba Sarker, Rajib Mostakim, Andrew Klichine, Sourav Saha

Abstract: Intelligent systems across physics, language and perception often exhibit factorisable structure, yet are typically modelled by monolithic neural architectures that do not explicitly exploit this structure. The separable neural architecture (SNA) addresses this by formalising a representational class that unifies additive, quadratic and tensor-decomposed neural models. By constraining interaction order and tensor rank, SNAs impose a structural inductive bias that factorises high-dimensional mappings into low-arity components. Separability need not be a property of the system itself: it often emerges in the coordinates or representations through which the system is expressed. Crucially, this coordinate-aware formulation reveals a structural analogy between chaotic spatiotemporal dynamics and linguistic autoregression. By treating continuous physical states as smooth, separable embeddings, SNAs enable distributional modelling of chaotic systems. This approach mitigates the nonphysical drift characteristics of deterministic operators whilst remaining applicable to discrete sequences. The compositional versatility of this approach is demonstrated across four domains: autonomous waypoint navigation via reinforcement learning, inverse generation of multifunctional microstructures, distributional modelling of turbulent flow and neural language modelling. These results establish the separable neural architecture as a domain-agnostic primitive for predictive and generative intelligence, capable of unifying both deterministic and distributional representations.

replace Matching Features, Not Tokens: Energy-Based Fine-Tuning of Language Models

Authors: Samy Jelassi, Mujin Kwun, Rosie Zhao, Yuanzhi Li, Nicolo Fusi, Yilun Du, Sham M. Kakade, Carles Domingo-Enrich

Abstract: Cross-entropy (CE) training provides dense and scalable supervision for language models, but it optimizes next-token prediction under teacher forcing rather than sequence-level behavior under model rollouts. We introduce a feature-matching objective for language-model fine-tuning that targets sequence-level statistics of the completion distribution, providing dense semantic feedback without requiring a task-specific verifier or preference model. To optimize this objective efficiently, we propose energy-based fine-tuning (EBFT), which uses strided block-parallel sampling to generate multiple rollouts from nested prefixes concurrently, batches feature extraction over these rollouts, and uses the resulting embeddings to perform an on-policy policy-gradient update. We present a theoretical perspective connecting EBFT to KL-regularized feature-matching and energy-based modeling. Empirically, across Q&A coding, unstructured coding, and translation, EBFT matches RLVR and outperforms SFT on downstream accuracy while achieving a lower validation cross-entropy than both methods.

replace A Fractional Fox H-Function Kernel for Support Vector Machines: Robust Classification via Weighted Transmutation Operators

Authors: Gustavo Dorrego

Abstract: Support Vector Machines (SVMs) rely heavily on the choice of the kernel function to map data into high-dimensional feature spaces. While the Gaussian Radial Basis Function (RBF) is the industry standard, its exponential decay makes it highly susceptible to structural noise and outliers, often leading to severe overfitting in complex datasets. In this paper, we propose a novel class of non-stationary kernels derived from the fundamental solution of the generalized time-space fractional diffusion-wave equation. By leveraging a structure-preserving transmutation method over Weighted Sobolev Spaces, we introduce the Amnesia-Weighted Fox Kernel, an exact analytical Mercer kernel governed by the Fox H-function. Unlike standard kernels, our formulation incorporates an aging weight function (the "Amnesia Effect") to penalize distant outliers and a fractional asymptotic power-law decay to allow for robust, heavy-tailed feature mapping (analogous to L\'evy flights). Numerical experiments on both synthetic datasets and real-world high-dimensional radar data (Ionosphere) demonstrate that the proposed Amnesia-Weighted Fox Kernel consistently outperforms the standard Gaussian RBF baseline, reducing the classification error rate by approximately 50\% while maintaining structural robustness against outliers.

replace Surprised by Attention: Predictable Query Dynamics for Time Series Anomaly Detection

Authors: Kadir-Kaan \"Ozer, Ren\'e Ebeling, Markus Enzweiler

Abstract: Multivariate time series anomalies often manifest as shifts in cross-channel dependencies rather than simple amplitude excursions. In autonomous driving, for instance, a steering command might be internally consistent but decouple from the resulting lateral acceleration. Residual-based detectors can miss such anomalies when flexible sequence models still reconstruct signals plausibly despite altered coordination. We introduce AxonAD, an unsupervised detector that treats multi-head attention query evolution as a short horizon predictable process. A gradient-updated reconstruction pathway is coupled with a history-only predictor that forecasts future query vectors from past context. This is trained via a masked predictor-target objective against an exponential moving average (EMA) target encoder. At inference, reconstruction error is combined with a tail-aggregated query mismatch score, which measures cosine deviation between predicted and target queries on recent timesteps. This dual approach provides sensitivity to structural dependency shifts while retaining amplitude-level detection. On proprietary in-vehicle telemetry with interval annotations and on the TSB-AD multi-variate suite (17 datasets, 180 series) with threshold-free and range-aware metrics, AxonAD improves ranking quality and temporal localization over strong baselines. Ablations confirm that query prediction and combined scoring are the primary drivers of the observed gains. Code is available at the URL https://github.com/iis-esslingen/AxonAD.

replace Exact Federated Continual Unlearning for Ridge Heads on Frozen Foundation Models

Authors: Yijun Quan, Wentai Wu, Giovanni Montana

Abstract: Foundation models are commonly deployed as frozen feature extractors with a small trainable head to adapt to private, user-generated data in federated settings. The ``right to be forgotten'' requires removing the influence of specific samples or users from the trained model on demand. Existing federated unlearning methods target general deep models and rely on approximate reconstruction or selective retraining, making exactness costly or elusive. We study this problem in a practically relevant but under-explored regime: a frozen foundation model with a ridge-regression head. The exact optimum depends on the data only through two additive sufficient statistics, which we turn into a communication protocol supporting an arbitrary stream of add and delete requests via fixed-size messages. The server maintains a head that is, in exact arithmetic, pointwise identical to centralized retraining after every request. We provide deterministic retrain-equivalence guarantees, order and partition invariance, two server-side variants, and a Bayesian certificate of zero KL divergence. Experiments on four benchmarks confirm the guarantees: both variants match centralized ridge retraining to within $10^{-9}$ relative Frobenius error and complete each request at orders-of-magnitude lower cost than federated retraining baselines.
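The sufficient-statistic idea is easy to make concrete. The sketch below is our illustration with hypothetical names, not the authors' federated protocol: the head is determined by A = Phi^T Phi + lambda*I and b = Phi^T y, both additive over samples, so adds and deletes are exact rank-one updates and the result matches retraining from scratch.

```python
import numpy as np

class RidgeHead:
    """Ridge head over frozen features; exact add/delete via sufficient statistics."""
    def __init__(self, dim, lam=1.0):
        self.A = lam * np.eye(dim)    # accumulates  Phi^T Phi + lam * I
        self.b = np.zeros(dim)        # accumulates  Phi^T y

    def add(self, phi, y):
        self.A += np.outer(phi, phi)
        self.b += y * phi

    def delete(self, phi, y):         # exact unlearning: subtract the contribution
        self.A -= np.outer(phi, phi)
        self.b -= y * phi

    def weights(self):
        return np.linalg.solve(self.A, self.b)

rng = np.random.default_rng(0)
X, y = rng.normal(size=(50, 8)), rng.normal(size=50)

head = RidgeHead(8)
for phi, yi in zip(X, y):
    head.add(phi, yi)
head.delete(X[3], y[3])               # forget sample 3 on demand

# Retraining from scratch without sample 3 should agree to numerical precision.
keep = np.delete(np.arange(50), 3)
A_ref = np.eye(8) + X[keep].T @ X[keep]
w_ref = np.linalg.solve(A_ref, X[keep].T @ y[keep])
```

In the federated setting described above, clients would transmit their local increments to A and b (fixed-size messages) rather than raw features; the exactness argument is the same additivity used here.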

replace 3DTCR: A Physics-Based Generative Framework for Vortex-Following 3D Reconstruction to Improve Tropical Cyclone Intensity Forecasting

Authors: Jun Liu, Xiaohui Zhong, Kai Zheng, Jiarui Li, Yifei Li, Tao Zhou, Wenxu Qian, Shun Dai, Ruian Tie, Yangyang Zhao, Hao Li

Abstract: Tropical cyclone (TC) intensity forecasting remains challenging as current numerical and AI-based weather models fail to satisfactorily represent extreme TC structure and intensity. Although intensity time-series forecasting has achieved significant advances, it outputs intensity sequences rather than the three-dimensional inner-core fine-scale structure and physical mechanisms governing TC evolution. High-resolution numerical simulations can capture these features but remain computationally expensive and inefficient for large-scale operational applications. Here we present 3DTCR, a physics-based generative framework combining physical constraints with generative AI efficiency for 3D TC structure reconstruction. Trained on a six-year, 3-km-resolution moving-domain WRF dataset, 3DTCR enables region-adaptive vortex-following reconstruction using conditional Flow Matching (CFM), optimized via latent domain adaptation and two-stage transfer learning. The framework mitigates limitations imposed by low-resolution targets and over-smoothed forecasts, improving the representation of TC inner-core structure and intensity while maintaining track stability. Results demonstrate that 3DTCR outperforms the ECMWF high-resolution forecasting system (ECMWF-HRES) in TC intensity prediction at nearly all lead times up to 5 days and reduces the RMSE of maximum WS10M by 36.5% relative to its FuXi inputs. These findings highlight 3DTCR as a physics-based generative framework that efficiently resolves fine-scale structures at lower computational cost, which may offer a promising avenue for improving TC intensity forecasting.

replace PhysMoDPO: Physically-Plausible Humanoid Motion with Preference Optimization

Authors: Yangsong Zhang, Anujith Muraleedharan, Rikhat Akizhanov, Abdul Ahad Butt, G\"ul Varol, Pascal Fua, Fabio Pizzati, Ivan Laptev

Abstract: Recent progress in text-conditioned human motion generation has been largely driven by diffusion models trained on large-scale human motion data. Building on this progress, recent methods attempt to transfer such models to character animation and real robot control by applying a Whole-Body Controller (WBC) that converts diffusion-generated motions into executable trajectories. While WBC trajectories become compliant with physics, they may exhibit substantial deviations from the original motion. To address this issue, we propose PhysMoDPO, a Direct Preference Optimization framework. Unlike prior work that relies on hand-crafted physics-aware heuristics such as foot-sliding penalties, we integrate the WBC into our training pipeline and optimize the diffusion model such that the output of the WBC becomes compliant with both physics and the original text instructions. To train PhysMoDPO, we deploy physics-based and task-specific rewards and use them to assign preferences to synthesized trajectories. Our extensive experiments on text-to-motion and spatial control tasks demonstrate consistent improvements of PhysMoDPO in both physical realism and task-related metrics on simulated robots. Moreover, we demonstrate that PhysMoDPO yields significant improvements when applied to zero-shot motion transfer in simulation and to real-world deployment on a G1 humanoid robot.

replace-cross Edgeworth Accountant: An Analytical Approach to Differential Privacy Composition

Authors: Hua Wang, Sheng Gao, Huanyu Zhang, Milan Shen, Weijie J. Su, Jiayuan Wu

Abstract: In privacy-preserving data analysis, many procedures and algorithms are structured as compositions of multiple private building blocks. As such, an important question is how to efficiently compute the overall privacy loss under composition. This paper introduces the Edgeworth Accountant, an analytical approach to composing differential privacy guarantees for private algorithms. Leveraging the $f$-differential privacy framework, the Edgeworth Accountant accurately tracks privacy loss under composition, enabling a closed-form expression of privacy guarantees through privacy-loss log-likelihood ratios (PLLRs). As implied by its name, this method applies the Edgeworth expansion to estimate and define the probability distribution of the sum of the PLLRs. Furthermore, by using a technique that simplifies complex distributions into simpler ones, we demonstrate the Edgeworth Accountant's applicability to any noise-addition mechanism. Its main advantage is providing $(\epsilon, \delta)$-differential privacy bounds that are non-asymptotic and do not significantly increase computational cost. This feature sets it apart from previous approaches, in which the running time increases with the number of mechanisms under composition. We conclude by showing how our Edgeworth Accountant offers accurate estimates and tight upper and lower bounds on $(\epsilon, \delta)$-differential privacy guarantees, especially tailored for training private models in deep learning and federated analytics.

replace-cross Quadratic Gradient: A Unified Framework Bridging Gradient Descent and Newton-Type Methods by Synthesizing Hessians and Gradients

Authors: John Chiang

Abstract: The line search technique for Newton's method may be inadequate when it uses only a single floating-point number. A column vector of the same size as the gradient might be better than a single scalar for accelerating each of the gradient elements at a different rate. Moreover, a square matrix of the same order as the Hessian matrix might help to correct the Hessian matrix. Chiang applied something between a column vector and a square matrix, namely a diagonal matrix, to accelerate the gradient, and further proposed a faster gradient variant called the quadratic gradient. In this paper, we present a new way to build a new version of the quadratic gradient. This new quadratic gradient does not satisfy the convergence conditions of the fixed-Hessian Newton's method. However, experimental results show that it sometimes converges faster than the original one. Also, Chiang speculated that there might be a relation between the Hessian matrix and the learning rate for the first-order gradient descent method. We prove that the scalar $\frac{1}{\epsilon + \max \{| \lambda_i | \}}$ can be a good learning rate for gradient methods, where $\epsilon$ is a number to avoid division by zero and $\lambda_i$ are the eigenvalues of the Hessian matrix.
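The proposed learning rate is easy to sanity-check on a quadratic, where the Hessian is constant. This is a toy numpy check of ours, not the paper's experiments: for $f(x) = \frac{1}{2} x^T H x$, the step $\frac{1}{\epsilon + \max \{| \lambda_i | \}}$ stays below the classical stability threshold $2/\lambda_{\max}$, so gradient descent converges to the minimizer.

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.normal(size=(5, 5))
H = M.T @ M + np.eye(5)                    # SPD Hessian of f(x) = 0.5 * x^T H x

eps = 1e-8
eigs = np.linalg.eigvalsh(H)               # eigenvalues lambda_i of the Hessian
lr = 1.0 / (eps + np.max(np.abs(eigs)))    # the learning rate from the abstract

x = rng.normal(size=5)
for _ in range(2000):
    x = x - lr * (H @ x)                   # gradient step on the quadratic
# x converges toward the minimizer x* = 0
```

For non-quadratic objectives the Hessian (and hence the rate) varies with the iterate, which is where the paper's analysis, rather than this constant-Hessian check, is needed.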

replace-cross Offline Estimation of Controlled Markov Chains: Minimaxity and Sample Complexity

Authors: Imon Banerjee, Harsha Honnappa, Vinayak Rao

Abstract: In this work, we study a natural nonparametric estimator of the transition probability matrices of a finite controlled Markov chain. We consider an offline setting with a fixed dataset, collected using a so-called logging policy. We develop sample complexity bounds for the estimator and establish conditions for minimaxity. Our statistical bounds depend on the logging policy through its mixing properties. We show that achieving a particular statistical risk bound involves a subtle and interesting trade-off between the strength of the mixing properties and the number of samples. We demonstrate the validity of our results under various examples, such as ergodic Markov chains, weakly ergodic inhomogeneous Markov chains, and controlled Markov chains with non-stationary Markov, episodic, and greedy controls. Lastly, we use these sample complexity bounds to establish concomitant ones for offline evaluation of stationary Markov control policies.

replace-cross Combining Tree-Search, Generative Models, and Nash Bargaining Concepts in Game-Theoretic Reinforcement Learning

Authors: Zun Li, Marc Lanctot, Kevin R. McKee, Luke Marris, Ian Gemp, Daniel Hennes, Paul Muller, Kate Larson, Yoram Bachrach, Michael P. Wellman

Abstract: Opponent modeling methods typically involve two crucial steps: building a belief distribution over opponents' strategies, and exploiting this opponent model by playing a best response. However, existing approaches typically require domain-specific heuristics to come up with such a model, and algorithms for approximating best responses are hard to scale in large, imperfect-information domains. In this work, we introduce a scalable and generic multiagent training regime for opponent modeling using deep game-theoretic reinforcement learning. We first propose Generative Best Response (GenBR), a best-response algorithm based on Monte-Carlo Tree Search (MCTS) with a learned deep generative model that samples world states during planning. This new method scales to large imperfect-information domains and can be used plug-and-play in a variety of multiagent algorithms. We use this new method under the framework of Policy Space Response Oracles (PSRO) to automate the generation of an \emph{offline opponent model} via iterative game-theoretic reasoning and population-based training. We propose using solution concepts based on bargaining theory to build up an opponent mixture, which we find identifies profiles near the Pareto frontier. GenBR then keeps updating an \emph{online opponent model} and reacts against it during gameplay. We conduct behavioral studies where human participants negotiate with our agents in Deal-or-No-Deal, a class of bilateral bargaining games. Search with generative modeling finds stronger policies during both training time and test time, enables online Bayesian co-player prediction, and can produce agents that, when negotiating with humans, achieve social welfare and Nash bargaining scores comparable to those of humans trading among themselves.

replace-cross Trainability barriers and opportunities in quantum generative modeling

Authors: Manuel S. Rudolph, Sacha Lerch, Supanut Thanasilp, Oriel Kiss, Oxana Shaya, Sofia Vallecorsa, Michele Grossi, Zo\"e Holmes

Abstract: Quantum generative models provide inherently efficient sampling strategies and thus show promise for achieving an advantage using quantum hardware. In this work, we investigate the barriers to the trainability of quantum generative models posed by barren plateaus and exponential loss concentration. We explore the interplay between explicit and implicit models and losses, and show that using quantum generative models with explicit losses such as the KL divergence leads to a new flavour of barren plateaus. In contrast, the implicit Maximum Mean Discrepancy loss can be viewed as the expectation value of an observable that is either low-bodied and provably trainable, or global and untrainable depending on the choice of kernel. In parallel, we find that solely low-bodied implicit losses cannot in general distinguish high-order correlations in the target data, while some quantum loss estimation strategies can. We validate our findings by comparing different loss functions for modelling data from High-Energy-Physics.

replace-cross Faster Stochastic Algorithms for Minimax Optimization under Polyak--{\L}ojasiewicz Conditions

Authors: Lesi Chen, Boyuan Yao, Luo Luo

Abstract: This paper considers stochastic first-order algorithms for minimax optimization under Polyak--{\L}ojasiewicz (PL) conditions. We propose SPIDER-GDA for solving the finite-sum problem of the form $\min_x \max_y f(x,y)\triangleq \frac{1}{n} \sum_{i=1}^n f_i(x,y)$, where the objective function $f(x,y)$ is $\mu_x$-PL in $x$ and $\mu_y$-PL in $y$; and each $f_i(x,y)$ is $L$-smooth. We prove SPIDER-GDA could find an $\epsilon$-optimal solution within ${\mathcal O}\left((n + \sqrt{n}\,\kappa_x\kappa_y^2)\log (1/\epsilon)\right)$ stochastic first-order oracle (SFO) complexity, which is better than the state-of-the-art method whose SFO upper bound is ${\mathcal O}\big((n + n^{2/3}\kappa_x\kappa_y^2)\log (1/\epsilon)\big)$, where $\kappa_x\triangleq L/\mu_x$ and $\kappa_y\triangleq L/\mu_y$. For the ill-conditioned case, we provide an accelerated algorithm to reduce the computational cost further. It achieves $\tilde{{\mathcal O}}\big((n+\sqrt{n}\,\kappa_x\kappa_y)\log (\kappa_y/\epsilon) \log(1/\epsilon)\big)$ SFO upper bound when $\kappa_y \gtrsim \sqrt{n}$. Our ideas can also be applied to a more general setting where the objective function only satisfies the PL condition for one variable. Numerical experiments validate the superiority of proposed methods.

replace-cross On Meta-Prompting

Authors: Adrian de Wynter, Xun Wang, Qilong Gu, Si-Qing Chen

Abstract: Modern large language models (LLMs) are capable of interpreting input strings as instructions, or prompts, and of carrying out tasks based on them. Unlike traditional learners, LLMs cannot use back-propagation to obtain feedback; instead they condition their output in situ, a phenomenon known as in-context learning (ICL). Many approaches to prompting and pre-training these models involve the automated generation of these prompts, also known as meta-prompting, or prompting to obtain prompts. However, they do not formally describe the properties and behavior of the LLMs themselves. We propose a theoretical framework based on category theory to generalize and describe ICL and LLM behavior when interacting with users. Our framework allows us to obtain formal results around task agnosticity and the equivalence of various meta-prompting approaches. Using our framework and experimental results, we argue that meta-prompting is more effective than basic prompting at generating desirable outputs.

replace-cross 3D-LFM: Lifting Foundation Model

Authors: Mosam Dabhi, Laszlo A. Jeni, Simon Lucey

Abstract: The lifting of 3D structure and camera from 2D landmarks is a cornerstone of the entire discipline of computer vision. Traditional methods have been confined to specific rigid objects, such as those in Perspective-n-Point (PnP) problems, but deep learning has expanded our capability to reconstruct a wide range of object classes (e.g. C3DPO and PAUL) with resilience to noise, occlusions, and perspective distortions. All these techniques, however, have been limited by the fundamental need to establish correspondences across the 3D training data -- significantly limiting their utility to applications where one has an abundance of "in-correspondence" 3D data. Our approach harnesses the inherent permutation equivariance of transformers to handle a varying number of points per 3D data instance, withstands occlusions, and generalizes to unseen categories. We demonstrate state-of-the-art performance across 2D-3D lifting task benchmarks. Since our approach can be trained across such a broad class of structures, we refer to it simply as a 3D Lifting Foundation Model (3D-LFM) -- the first of its kind.

replace-cross A Review of Deep Learning Methods for Photoplethysmography Data

Authors: Guangkun Nie, Jiabao Zhu, Gongzheng Tang, Deyun Zhang, Shijia Geng, Qinghao Zhao, Shenda Hong

Abstract: Background: Photoplethysmography (PPG) is a non-invasive optical sensing technique widely used to capture hemodynamic information and is extensively deployed in both clinical monitoring systems and wearable devices. In recent years, the integration of deep learning has substantially advanced PPG signal analysis and broadened its applications across both healthcare and non-healthcare domains. Methods: We conducted a comprehensive review of studies applying deep learning to PPG data published between January 1, 2017 and December 31, 2025, retrieved from Google Scholar, PubMed, and Dimensions. The included studies were analyzed from three key perspectives: tasks, models, and data. Results: A total of 460 papers were included that applied deep learning techniques to PPG signal analysis. These studies span a wide range of application domains, including traditional physiological monitoring tasks such as cardiovascular assessment, as well as emerging applications such as sleep analysis, cross-modality signal reconstruction, and biometric identification. Conclusions: Deep learning has significantly advanced PPG signal analysis by enabling more effective extraction of physiological information. Compared with traditional machine learning approaches based on handcrafted features, deep learning methods generally achieve improved performance and provide greater flexibility in model development. Nevertheless, several challenges remain, including the limited availability of large-scale high-quality datasets, insufficient validation in real-world environments, and concerns regarding model interpretability, scalability, and computational efficiency. Addressing these challenges and exploring emerging research directions will be essential for further advancing deep learning-based PPG analysis.

replace-cross Combining Evidence Across Filtrations

Authors: Yo Joong Choe, Aaditya Ramdas

Abstract: In sequential anytime-valid inference, any admissible procedure must be based on e-processes: generalizations of test martingales that quantify the accumulated evidence against a composite null hypothesis at any stopping time. This paper proposes a method for combining e-processes constructed in different filtrations but for the same null. Although e-processes in the same filtration can be combined effortlessly (by averaging), e-processes in different filtrations cannot because their validity in a coarser filtration does not translate to a finer filtration. This issue arises in sequential tests of randomness and independence, as well as in the evaluation of sequential forecasters. We establish that a class of functions called adjusters can lift arbitrary e-processes across filtrations. The result yields a generally applicable "adjust-then-combine" procedure, which we demonstrate on the problem of testing randomness in real-world financial data. Furthermore, we prove a characterization theorem for adjusters that formalizes a sense in which using adjusters is necessary. There are two major implications. First, if we have a powerful e-process in a coarsened filtration, then we readily have a powerful e-process in the original filtration. Second, when we coarsen the filtration to construct an e-process, there is a logarithmic cost to recovering validity in the original filtration.

replace-cross Ayn: A Tiny yet Competitive Indian Legal Language Model Pretrained from Scratch

Authors: Mitodru Niyogi, Eric Gaussier, Arnab Bhattacharya

Abstract: Decoder-only Large Language Models (LLMs) are currently the model of choice for many Natural Language Processing (NLP) applications. Through instruction fine-tuning and prompting approaches, such LLMs have been efficiently used to solve both general and domain-specific tasks. However, they are costly to train and, to a certain extent, costly to use as well, and one can wonder whether LLMs can be replaced by domain-specific Tiny Language Models (TLMs), which typically contain less than 100M parameters. We address this question in this study by comparing the performance of an 88M TLM pretrained from scratch for 185 A100 hours on a specific domain with a domain-specific tokenizer (here, the Indian legal domain) with LLMs of various sizes between 1B and 8B for solving domain-specific tasks. We show in particular that our legal TLM, Ayn, can indeed outperform LLMs up to 80 times larger on the legal case judgment prediction task, rival LLMs up to 30 times larger on the summarization task, and still be competitive with these larger LLMs on general tasks.

replace-cross Federated Multi-Agent Mapping for Planetary Exploration

Authors: Tiberiu-Ioan Szatmari, Abhishek Cauligi

Abstract: Multi-agent robotic exploration stands to play an important role in space exploration as the next generation of robotic systems ventures to far-flung environments. A key challenge in this new paradigm will be to effectively share and utilize the vast amount of data generated onboard while operating in the bandwidth-constrained regimes typical of space missions. Federated learning (FL) is a promising tool for bridging this gap. Drawing inspiration from the upcoming CADRE Lunar rover mission, we propose a federated multi-agent mapping approach that jointly trains a global map model across agents without transmitting raw data. Our method leverages implicit neural mapping to generate parsimonious, adaptable representations, reducing data transmission by up to 93.8% compared to raw maps. Furthermore, we enhance this approach with meta-initialization on Earth-based traversability datasets to significantly accelerate map convergence, reducing the iterations required to reach target performance by 80% compared to random initialization. We demonstrate the efficacy of our approach on Martian terrains and glacier datasets, achieving downstream path-planning F1 scores as high as 0.95 while also outperforming baselines on map reconstruction loss.
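
The aggregation step behind this kind of federated mapping can be sketched with the standard FedAvg rule: agents exchange only model parameters, never raw map data. The flat parameter dicts and per-agent sample counts below are illustrative, not the paper's actual implicit-map weights or weighting scheme.

```python
# Minimal FedAvg-style sketch: each agent holds local parameters (a flat
# dict of floats standing in for implicit neural map weights); only
# parameters, never raw maps, are communicated. Weighting by local
# sample count follows the classic FedAvg rule, which the paper's
# approach builds on; names here are hypothetical.

def federated_average(agent_params, agent_sizes):
    """Weighted average of per-agent parameter dicts (FedAvg-style)."""
    total = sum(agent_sizes)
    keys = agent_params[0].keys()
    return {
        k: sum(p[k] * n for p, n in zip(agent_params, agent_sizes)) / total
        for k in keys
    }

# Two simulated rovers with differently sized local datasets.
rover_a = {"w0": 1.0, "w1": -2.0}
rover_b = {"w0": 3.0, "w1": 0.0}
global_model = federated_average([rover_a, rover_b], agent_sizes=[100, 300])
print(global_model)  # {'w0': 2.5, 'w1': -0.5}
```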

replace-cross Policy Iteration for Two-Player General-Sum Stochastic Stackelberg Games

Authors: Mikoto Kudo, Youhei Akimoto

Abstract: We address two-player general-sum stochastic Stackelberg games (SSGs), where the leader's policy is optimized considering the best-response follower whose policy is optimal for its reward under the leader. Existing policy gradient and value iteration approaches for SSGs do not guarantee monotone improvement in the leader's policy under the best-response follower. Consequently, their performance is not guaranteed when their limits are not stationary Stackelberg equilibria (SSEs), which do not necessarily exist. In this paper, we derive a policy improvement theorem for SSGs under the best-response follower and propose a novel policy iteration algorithm that guarantees monotone improvement in the leader's performance. Additionally, we introduce Pareto-optimality as an extended optimality of the SSE and prove that our method converges to the Pareto front when the leader is myopic.
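
For reference, the single-agent building block is classical policy iteration, which alternates evaluation and greedy improvement and is guaranteed to improve monotonically. The paper's contribution is the nontrivial extension of this monotone-improvement scheme to the leader under a best-response follower; the tiny 2-state MDP below is made up for illustration.

```python
# Classical tabular policy iteration (single-agent), sketched on a toy
# MDP. P[s][a] is a list of (prob, next_state); R[s][a] is the reward.

def policy_iteration(n_states, actions, P, R, gamma=0.9, tol=1e-8):
    policy = [actions[0]] * n_states
    while True:
        # Policy evaluation: iterate the Bellman expectation equation.
        V = [0.0] * n_states
        while True:
            delta, V_new = 0.0, []
            for s in range(n_states):
                a = policy[s]
                v = R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                V_new.append(v)
                delta = max(delta, abs(v - V[s]))
            V = V_new
            if delta < tol:
                break
        # Greedy policy improvement against the evaluated values.
        new_policy = [
            max(actions,
                key=lambda a: R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a]))
            for s in range(n_states)
        ]
        if new_policy == policy:
            return policy, V
        policy = new_policy

# Action 1 moves to the rewarding state 1 and stays there.
P = [{0: [(1.0, 0)], 1: [(1.0, 1)]}, {0: [(1.0, 0)], 1: [(1.0, 1)]}]
R = [{0: 0.0, 1: 0.0}, {0: 0.0, 1: 1.0}]
policy, V = policy_iteration(2, [0, 1], P, R)
print(policy)  # [1, 1]
```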

replace-cross MetaGS: A Meta-Learned Gaussian-Phong Model for Out-of-Distribution 3D Scene Relighting

Authors: Yumeng He, Yunbo Wang, Xiaokang Yang

Abstract: Out-of-distribution (OOD) 3D relighting requires novel view synthesis under unseen lighting conditions that differ significantly from the observed images. Existing relighting methods, which assume consistent light source distributions between training and testing, often degrade in OOD scenarios. We introduce MetaGS to tackle this challenge from two perspectives. First, we propose a meta-learning approach to train 3D Gaussian splatting, which explicitly promotes learning generalizable Gaussian geometries and appearance attributes across diverse lighting conditions, even with biased training data. Second, we embed fundamental physical priors from the Blinn-Phong reflection model into Gaussian splatting, which enhances the decoupling of shading components and leads to more accurate 3D scene reconstruction. Results on both synthetic and real-world datasets demonstrate the effectiveness of MetaGS in challenging OOD relighting tasks, supporting efficient point-light relighting and generalizing well to unseen environment lighting maps.
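
The physical prior referenced above is the textbook Blinn-Phong model, in which shading decomposes into ambient, diffuse (N·L), and specular ((N·H)^p) terms. The sketch below shows that decomposition only; MetaGS's Gaussian-embedded variant and its coefficients are not reproduced here, and all vectors are assumed pre-normalized 3-tuples.

```python
# Textbook Blinn-Phong shading: ambient + diffuse + specular terms.
# Coefficients (ka, kd, ks, shininess) are illustrative defaults.
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return tuple(x / n for x in v)

def blinn_phong(normal, light_dir, view_dir, ka=0.1, kd=0.7, ks=0.2, shininess=32):
    n, l, v = normalize(normal), normalize(light_dir), normalize(view_dir)
    h = normalize(tuple(li + vi for li, vi in zip(l, v)))  # half-vector
    diffuse = max(dot(n, l), 0.0)
    specular = max(dot(n, h), 0.0) ** shininess
    return ka + kd * diffuse + ks * specular

# Light and viewer both along the surface normal: maximal response.
print(round(blinn_phong((0, 0, 1), (0, 0, 1), (0, 0, 1)), 3))  # 1.0
```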

replace-cross Projection Methods for Operator Learning and Universal Approximation

Authors: Emanuele Zappala

Abstract: We obtain a new universal approximation theorem for continuous (possibly nonlinear) operators on arbitrary Banach spaces using the Leray-Schauder mapping. Moreover, we introduce and study a method for operator learning in Banach spaces $L^p$ of functions with multiple variables, based on orthogonal projections on polynomial bases. We derive a universal approximation result for operators where we learn a linear projection and a finite dimensional mapping under some additional assumptions. For the case of $p=2$, we give some sufficient conditions for the approximation results to hold. This article serves as the theoretical framework for a deep learning methodology in operator learning.

replace-cross Nested Music Transformer: Sequentially Decoding Compound Tokens in Symbolic Music and Audio Generation

Authors: HaeJun Yoo, Hao-Wen Dong, Jongmin Jung, Dasaem Jeong

Abstract: Representing symbolic music with compound tokens, where each token consists of several sub-tokens, each representing a distinct musical feature or attribute, offers the advantage of reducing sequence length. While previous research has validated the efficacy of compound tokens in music sequence modeling, predicting all sub-tokens simultaneously can lead to suboptimal results, as it may not fully capture the interdependencies between them. We introduce the Nested Music Transformer (NMT), an architecture tailored for decoding compound tokens autoregressively, similar to processing flattened tokens, but with low memory usage. The NMT consists of two transformers: a main decoder that models the sequence of compound tokens and a sub-decoder that models the sub-tokens within each compound token. Experimental results show that applying the NMT to compound tokens improves perplexity across various symbolic music datasets and on discrete audio tokens from the MAESTRO dataset.

replace-cross Exploring Low-Dimensional Subspaces in Diffusion Models for Controllable Image Editing

Authors: Siyi Chen, Huijie Zhang, Minzhe Guo, Yifu Lu, Peng Wang, Qing Qu

Abstract: Recently, diffusion models have emerged as a powerful class of generative models. Despite their success, there is still limited understanding of their semantic spaces. This makes it challenging to achieve precise and disentangled image generation without additional training, especially in an unsupervised way. In this work, we improve the understanding of their semantic spaces through intriguing observations: within a certain range of noise levels, (1) the learned posterior mean predictor (PMP) in the diffusion model is locally linear, and (2) the singular vectors of its Jacobian lie in low-dimensional semantic subspaces. We provide a solid theoretical basis to justify the linearity and low-rankness of the PMP. These insights allow us to propose an unsupervised, single-step, training-free LOw-rank COntrollable image editing (LOCO Edit) method for precise local editing in diffusion models. LOCO Edit identifies editing directions with desirable properties: homogeneity, transferability, composability, and linearity. These properties of LOCO Edit benefit greatly from the low-dimensional semantic subspace. Our method can further be extended to unsupervised or text-supervised editing in various text-to-image diffusion models (T-LOCO Edit). Finally, extensive empirical experiments demonstrate the effectiveness and efficiency of LOCO Edit.

replace-cross Nonlinear Gaussian process tomography with imposed non-negativity constraints on physical quantities for plasma diagnostics

Authors: Kenji Ueda, Masaki Nishiura

Abstract: We propose a novel tomographic method, nonlinear Gaussian process tomography (nonlinear GPT), that uses the Laplace approximation to impose constraints on non-negative physical quantities, such as the emissivity in plasma optical diagnostics. While positive-valued posteriors have previously been introduced through sampling-based approaches in the original GPT method, our alternative approach implements a logarithmic Gaussian process (log-GP) for faster computation and more natural enforcement of non-negativity. The effectiveness of the proposed log-GP tomography is demonstrated through a case study on the Ring Trap 1 (RT-1) device, where it outperforms existing methods, namely standard GPT and Minimum Fisher Information (MFI), in reconstruction accuracy. The results highlight the effectiveness of nonlinear GPT for imposing physical constraints in inverse-problem applications.

replace-cross Nonlinear Bayesian Doppler Tomography for Simultaneous Reconstruction of Flow and Temperature

Authors: Kenji Ueda, Masaki Nishiura

Abstract: We present a nonlinear Bayesian tomographic framework for Doppler spectral imaging that enables simultaneous reconstruction of emissivity, ion temperature, and flow velocity from line-integrated spectra. The method employs nonlinear Gaussian process tomography (GPT) with a Laplace approximation while retaining the full Doppler forward model. A log-Gaussian process prior stabilizes the velocity reconstruction in low-emissivity regions where Doppler information becomes weak, preventing the unphysical divergence of velocity estimates commonly encountered in conventional spectral tomography. The reconstruction method is verified using synthetic phantom data and applied to coherence imaging spectroscopy (CIS) measurements in the RT-1 device, resolving spatial structures of ion temperature and toroidal ion flow characteristic of magnetospheric plasma. The framework extends existing CIS tomography to regimes with strong flows and large temperature variations and provides a general Bayesian approach for Doppler spectral tomography that can be integrated with complementary spectroscopic diagnostics.

replace-cross MSEG-VCUQ: Multimodal SEGmentation with Enhanced Vision Foundation Models, Convolutional Neural Networks, and Uncertainty Quantification for High-Speed Video Phase Detection Data

Authors: Chika Maduabuchi, Ericmoore Jossou, Matteo Bucci

Abstract: High-speed video (HSV) phase detection (PD) segmentation is crucial for monitoring vapor, liquid, and microlayer phases in industrial processes. While CNN-based models like U-Net have shown success in simplified shadowgraphy-based two-phase flow (TPF) analysis, their application to complex HSV PD tasks remains unexplored, and vision foundation models (VFMs) have yet to address the complexities of either shadowgraphy-based or PD TPF video segmentation. Existing uncertainty quantification (UQ) methods lack pixel-level reliability for critical metrics like contact line density and dry area fraction, and the absence of large-scale, multimodal experimental datasets tailored to PD segmentation further impedes progress. To address these gaps, we propose MSEG-VCUQ. This hybrid framework integrates U-Net CNNs with the transformer-based Segment Anything Model (SAM) to achieve enhanced segmentation accuracy and cross-modality generalization. Our approach incorporates systematic UQ for robust error assessment and introduces the first open-source multimodal HSV PD datasets. Empirical results demonstrate that MSEG-VCUQ outperforms baseline CNNs and VFMs, enabling scalable and reliable PD segmentation for real-world boiling dynamics.

replace-cross RoboMD: Uncovering Robot Vulnerabilities through Semantic Potential Fields

Authors: Som Sagar, Jiafei Duan, Sreevishakh Vasudevan, Yifan Zhou, Heni Ben Amor, Dieter Fox, Ransalu Senanayake

Abstract: Robot manipulation policies, while central to the promise of physical AI, are highly vulnerable in the presence of external variations in the real world. Diagnosing these vulnerabilities is hindered by two key challenges: (i) the relevant variations to test against are often unknown, and (ii) direct testing in the real world is costly and unsafe. We introduce a framework that tackles both issues by learning a separate deep reinforcement learning (deep RL) policy for vulnerability prediction through virtual runs on a continuous vision-language embedding trained with limited success-failure data. By treating this embedding space, which is rich in semantic and visual variations, as a potential field, the policy learns to move toward vulnerable regions while being repelled from success regions. This vulnerability prediction policy, trained on virtual rollouts, enables scalable and safe vulnerability analysis without expensive physical trials. By querying this policy, our framework builds a probabilistic vulnerability-likelihood map. Experiments across simulation benchmarks and a physical robot arm show that our framework uncovers up to 23% more unique vulnerabilities than state-of-the-art vision-language baselines, revealing subtle vulnerabilities overlooked by heuristic testing. Additionally, we show that fine-tuning the manipulation policy with the vulnerabilities discovered by our framework improves manipulation performance with much less fine-tuning data.

replace-cross VisionZip: Longer is Better but Not Necessary in Vision Language Models

Authors: Senqiao Yang, Yukang Chen, Zhuotao Tian, Chengyao Wang, Jingyao Li, Bei Yu, Jiaya Jia

Abstract: Recent advancements in vision-language models have enhanced performance by increasing the length of visual tokens, making them much longer than text tokens and significantly raising computational costs. However, we observe that the visual tokens generated by popular vision encoders, such as CLIP and SigLIP, contain significant redundancy. To address this, we introduce VisionZip, a simple yet effective method that selects a set of informative tokens for input to the language model, reducing visual token redundancy and improving efficiency while maintaining model performance. The proposed VisionZip can be widely applied to image and video understanding tasks and is well-suited for multi-turn dialogues in real-world scenarios, where previous methods tend to underperform. Experimental results show that VisionZip outperforms the previous state-of-the-art method by at least 5% across nearly all settings. Moreover, our method significantly enhances model inference speed, reducing prefilling time by 8x and enabling the LLaVA-Next 13B model to infer faster than the LLaVA-Next 7B model while achieving better results. Furthermore, we analyze the causes of this redundancy and encourage the community to focus on extracting better visual features rather than merely increasing token length. Our code is available at https://github.com/dvlab-research/VisionZip .
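
The core selection step can be sketched as a top-k filter: keep only the most informative visual tokens before they reach the language model. The per-token scores below stand in for a hypothetical informativeness measure (e.g., attention weight); VisionZip's actual scoring and token-merging strategy is more elaborate.

```python
# Sketch of informative-token selection: retain the k highest-scoring
# visual tokens, preserving their original (spatial) order. Scores are
# a hypothetical stand-in for an attention-based informativeness metric.

def select_tokens(tokens, scores, k):
    """Return the k tokens with the highest scores, in original order."""
    keep = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:k]
    keep.sort()  # preserve spatial ordering of surviving tokens
    return [tokens[i] for i in keep]

tokens = ["t0", "t1", "t2", "t3", "t4"]
scores = [0.1, 0.9, 0.05, 0.8, 0.2]
print(select_tokens(tokens, scores, k=3))  # ['t1', 't3', 't4']
```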

URLs: https://github.com/dvlab-research/VisionZip

replace-cross Improved Approximation Algorithms for Orthogonally Constrained Problems Using Semidefinite Optimization

Authors: Ryan Cory-Wright, Jean Pauphilet

Abstract: Building on the blueprint from Goemans and Williamson (1995) for the Max-Cut problem, we construct a polynomial-time approximation algorithm for orthogonally constrained quadratic optimization problems. First, we derive a semidefinite relaxation and propose a randomized rounding algorithm to generate feasible solutions from the relaxation. Second, we derive constant-factor approximation guarantees for our algorithm. When optimizing for $m$ orthonormal vectors in dimension $n$, we leverage strong duality and semidefinite complementary slackness to show that our algorithm achieves a $1/3$-approximation ratio. For any $m$ of the form $2^q$ for some integer $q$, we also construct an instance where the performance of our algorithm is exactly $(m+2)/(3m)$, which can be made arbitrarily close to $1/3$ by taking $m \rightarrow + \infty$, hence showing that our analysis is tight.
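
The Goemans-Williamson blueprint the paper builds on has two stages: solve a semidefinite relaxation to obtain unit vectors, then round them with a random hyperplane. The sketch below shows only the Max-Cut-style sign rounding, with the "SDP solution" vectors supplied by hand; the paper's rounding for orthogonally constrained problems is more involved.

```python
# Random-hyperplane (sign) rounding in the Goemans-Williamson style:
# each vector is labeled +/-1 by which side of a random hyperplane it
# falls on. Input vectors play the role of an SDP relaxation's solution.
import random

def hyperplane_round(vectors, seed=0):
    """Assign each vector a +/-1 label via a random Gaussian direction."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    g = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # random normal direction
    return [1 if sum(gi * vi for gi, vi in zip(g, v)) >= 0 else -1
            for v in vectors]

# Antipodal vectors always land on opposite sides of the hyperplane.
labels = hyperplane_round([(1.0, 0.0), (-1.0, 0.0)])
print(labels[0] == -labels[1])  # True
```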

replace-cross Amortized Bayesian Mixture Models

Authors: Šimon Kucharský, Paul Christian Bürkner

Abstract: Finite mixtures are a broad class of models useful in scenarios where observed data is generated by multiple distinct processes but without explicit information about the responsible process for each data point. Estimating Bayesian mixture models is computationally challenging due to issues such as high-dimensional posterior inference and label switching. Furthermore, traditional methods such as MCMC are applicable only if the likelihoods for each mixture component are analytically tractable. Amortized Bayesian Inference (ABI) is a simulation-based framework for estimating Bayesian models using generative neural networks. This allows the fitting of models without explicit likelihoods, and provides fast inference. ABI is therefore an attractive framework for estimating mixture models. This paper introduces a novel extension of ABI tailored to mixture models. We factorize the posterior into a distribution of the parameters and a distribution of (categorical) mixture indicators, which allows us to use a combination of generative neural networks for parameter inference, and classification networks for mixture membership identification. The proposed framework accommodates both independent and dependent mixture models, enabling filtering and smoothing. We validate and demonstrate our approach through synthetic and real-world datasets.
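
The indicator half of the proposed factorization can be illustrated in closed form: given component parameters, the posterior over each observation's categorical mixture indicator follows from Bayes' rule. The framework amortizes this computation with a classification network; the unit-variance 1-D Gaussian components below are purely illustrative.

```python
# Posterior mixture-membership probabilities P(z = k | x) for a toy
# Gaussian mixture with unit-variance components, via Bayes' rule.
import math

def indicator_posterior(x, weights, means):
    """P(z = k | x) for a Gaussian mixture with unit-variance components."""
    dens = [w * math.exp(-0.5 * (x - m) ** 2) for w, m in zip(weights, means)]
    total = sum(dens)
    return [d / total for d in dens]

# An observation at 0 is almost certainly from the component at mean 0.
probs = indicator_posterior(x=0.0, weights=[0.5, 0.5], means=[0.0, 4.0])
print(probs[0] > 0.99)  # True
```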

replace-cross Fairness-aware Contextual Dynamic Pricing with Strategic Buyers

Authors: Pangpang Liu, Will Wei Sun

Abstract: Contextual pricing strategies are prevalent in online retailing, where the seller adjusts prices based on products' attributes and buyers' characteristics. Although such strategies can enhance the seller's profits, they raise concerns about fairness when significant price disparities emerge among specific groups, such as gender or race. These disparities can lead to adverse perceptions of fairness among buyers and may even violate laws and regulations. Moreover, price differences can incentivize disadvantaged buyers to strategically manipulate their group identity to obtain a lower price. In this paper, we investigate contextual dynamic pricing with fairness constraints, taking into account buyers' strategic behaviors when their group status is private and unobservable by the seller. We propose a dynamic pricing policy that simultaneously achieves price fairness and discourages strategic behaviors. Our policy achieves an upper bound of $O(\sqrt{T}+H(T))$ regret over $T$ time horizons, where the term $H(T)$ arises from buyers' assessment of the fairness of the pricing policy based on their learned price difference. When buyers are able to learn the fairness of the pricing policy, this upper bound reduces to $O(\sqrt{T})$. We also prove an $\Omega(\sqrt{T})$ regret lower bound for any pricing policy under our problem setting. We support our findings with extensive experimental evidence, showcasing our policy's effectiveness. In our real data analysis, we observe the existence of price discrimination against race in loan applications even after accounting for other contextual information. Our proposed pricing policy demonstrates a significant improvement, achieving a 35.06% reduction in regret compared to the benchmark policy.

replace-cross ELASTIC: Efficient Once For All Iterative Search for Object Detection on Microcontrollers

Authors: Tony Tran, Qin Lin, Bin Hu

Abstract: Deploying high-performance object detectors on TinyML platforms poses significant challenges due to tight hardware constraints and the modular complexity of modern detection pipelines. Neural Architecture Search (NAS) offers a path toward automation, but existing methods either restrict optimization to individual modules, sacrificing cross-module synergy, or require global searches that are computationally intractable. We propose ELASTIC (Efficient Once for AlL IterAtive Search for ObjecT DetectIon on MiCrocontrollers), a unified, hardware-aware NAS framework that alternates optimization across modules (e.g., backbone, neck, and head) in a cyclic fashion. ELASTIC introduces a novel Population Passthrough mechanism in evolutionary search that retains high-quality candidates between search stages, yielding faster convergence, up to an 8% final mAP gain, and eliminating the search instability observed without population passthrough. In a controlled comparison, empirical results show ELASTIC achieves +4.75% higher mAP and 2x faster convergence than progressive NAS strategies on SVHN, and delivers a +9.09% mAP improvement on PascalVOC given the same search budget. ELASTIC achieves 72.3% mAP on PascalVOC, outperforming MCUNET by 20.9% and TinyissimoYOLO by 16.3%. When deployed on MAX78000/MAX78002 microcontrollers, ELASTIC-derived models outperform Analog Devices' TinySSD baselines, reducing energy by up to 71.6%, lowering latency by up to 2.4x, and improving mAP by up to 6.99 percentage points across multiple datasets. The experimental videos and code are available on the project website (https://nail-uh.github.io/elastic.github.io/).

URLs: https://nail-uh.github.io/elastic.github.io/

replace-cross Boosting Large Language Models with Mask Fine-Tuning

Authors: Mingyuan Zhang, Yue Bai, Huan Wang, Yizhou Wang, Qihua Dong, Yitian Zhang, Yun Fu

Abstract: Mainstream LLM fine-tuning protocols typically keep the model's structure intact, and no prior work has questioned whether maintaining model integrity is indispensable for strong performance. In this work, we introduce Mask Fine-Tuning (MFT), a novel LLM fine-tuning paradigm demonstrating that carefully breaking the model's structural integrity can surprisingly improve performance without updating model weights. MFT learns and applies binary masks to well-optimized models, using the standard LLM fine-tuning objective as supervision. Starting from fully fine-tuned models and using the same fine-tuning datasets, MFT achieves consistent performance gains across domains and backbones (e.g., an average gain of 2.70 / 4.15 on IFEval with LLaMA2-7B / 3.1-8B). Detailed ablation studies and analyses examine the proposed MFT from different perspectives, such as sparsity ratio and loss surface. Additionally, when deployed on well-trained models, MFT can be combined with other LLM optimization procedures to further enhance the resulting model. Furthermore, this study extends the masking operation beyond its conventional network-pruning context for model compression to a broader model-capability scope.
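
The masking idea can be illustrated on a single frozen layer: a binary mask zeroes out selected weights, changing the layer's behavior without updating the weights themselves. The hand-picked mask below is purely illustrative; MFT learns its masks with the standard fine-tuning objective.

```python
# Toy illustration of weight masking: frozen weights W, learned binary
# mask M, output y_j = sum_i x_i * (W[i][j] * M[i][j]). Values are
# hand-chosen for illustration, not learned.

def masked_linear(x, weights, mask):
    """Forward pass of a linear layer with an elementwise binary mask."""
    n_out = len(weights[0])
    return [
        sum(x[i] * weights[i][j] * mask[i][j] for i in range(len(x)))
        for j in range(n_out)
    ]

W = [[1.0, 2.0],
     [3.0, 4.0]]
M = [[1, 0],   # keep W[0][0], drop W[0][1]
     [0, 1]]   # drop W[1][0], keep W[1][1]
print(masked_linear([1.0, 1.0], W, M))  # [1.0, 4.0]
```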

replace-cross The Big Send-off: Scalable and Performant Collectives for Deep Learning

Authors: Siddharth Singh, Keshav Pradeep, Mahua Singh, Cunyang Wei, Abhinav Bhatele

Abstract: Collective communication is becoming increasingly important in data center and supercomputer workloads with an increase in distributed AI related jobs. However, existing libraries that provide collective support such as NCCL, RCCL, and Cray-MPICH exhibit several performance and scalability limitations on modern GPU supercomputers. To address these challenges, we introduce the Performant Collective Communication Library (PCCL), specifically targeted for distributed deep learning (DL) workloads. PCCL provides highly optimized implementations of key collectives used in distributed DL: all-gather, reduce-scatter, and all-reduce. PCCL uses a hierarchical design with learning-based adaptive selection of the best performing algorithms to scale efficiently to thousands of GPUs. It achieves substantial performance speedups over RCCL on 2048 GCDs of Frontier -- up to 168x for reduce-scatter, 33x for all-gather and 10x for all-reduce. More modest but still significant gains up to 5.7x over NCCL are observed on Perlmutter. These gains translate directly to performance improvement of production DL workloads: up to 4.9x speedup over RCCL in DeepSpeed ZeRO-3 training, and up to 2.4x speedup in DDP training.
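
The three collectives named above have simple semantics that can be simulated on plain lists, one per rank. This shows only *what* each collective computes, not PCCL's hierarchical, adaptively selected implementations.

```python
# Semantics of the key distributed-DL collectives, simulated on Python
# lists (one list per rank). Purely illustrative of the operations'
# contracts, not of any real communication algorithm.

def all_gather(per_rank):
    """Every rank ends up with the concatenation of all ranks' chunks."""
    gathered = [x for chunk in per_rank for x in chunk]
    return [list(gathered) for _ in per_rank]

def reduce_scatter(per_rank):
    """Elementwise sum across ranks, then scatter one chunk per rank."""
    n = len(per_rank)
    summed = [sum(col) for col in zip(*per_rank)]
    size = len(summed) // n
    return [summed[r * size:(r + 1) * size] for r in range(n)]

def all_reduce(per_rank):
    """Every rank ends up with the full elementwise sum."""
    summed = [sum(col) for col in zip(*per_rank)]
    return [list(summed) for _ in per_rank]

ranks = [[1, 2], [3, 4]]          # two ranks, two elements each
print(reduce_scatter(ranks))      # [[4], [6]]
print(all_reduce(ranks))          # [[4, 6], [4, 6]]
```

Note that all-reduce is equivalent to a reduce-scatter followed by an all-gather, which is why those two primitives are often optimized together.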

replace-cross A False Sense of Privacy: Evaluating Textual Data Sanitization Beyond Surface-level Privacy Leakage

Authors: Rui Xin, Niloofar Mireshghallah, Shuyue Stella Li, Michael Duan, Hyunwoo Kim, Yejin Choi, Yulia Tsvetkov, Sewoong Oh, Pang Wei Koh

Abstract: Sanitizing sensitive text data typically involves removing personally identifiable information (PII) or generating synthetic data, under the assumption that these methods adequately protect privacy; however, their effectiveness is often assessed only by measuring the leakage of explicit identifiers, ignoring nuanced textual markers that can lead to re-identification. We challenge this illusion of privacy by proposing a new framework that evaluates re-identification attacks to quantify individual privacy risks upon data release. Our approach shows that seemingly innocuous auxiliary information -- such as routine social activities -- can be used to infer sensitive attributes like age or substance use history from sanitized data. For instance, we demonstrate that Azure's commercial PII removal tool fails to protect 74% of information in the MedQA dataset. Although differential privacy mitigates these risks to some extent, it significantly reduces the utility of the sanitized text for downstream tasks. Our findings indicate that current sanitization techniques offer a "false sense of privacy", highlighting the need for more robust methods that protect against semantic-level information leakage.

replace-cross Aitomia: Your Intelligent Assistant for AI-Driven Atomistic and Quantum Chemical Simulations

Authors: Jinming Hu, Hassan Nawaz, Yi-Fan Hou, Yuting Rui, Lijie Chi, Yuxinxin Chen, Arif Ullah, Pavlo O. Dral

Abstract: We have developed Aitomia - a platform powered by AI to assist in performing AI-driven atomistic and quantum chemical (QC) simulations. This evolving intelligent assistant platform is equipped with chatbots and AI agents to help experts and guide non-experts in setting up and running atomistic simulations, analyzing simulation results, and summarizing them for the user in both textual and graphical forms. Aitomia combines LLM-based agents with the MLatom platform to support AI-driven atomistic simulations as well as conventional quantum-chemical calculations, including DFT, semiempirical methods such as GFN2-xTB, and selected high-level wavefunction-based methods, through interfaces to widely used programs such as Gaussian, ORCA, PySCF, and xtb, covering tasks from ground-state and excited-state calculations to geometry optimization, thermochemistry, and spectra simulations. The multi-agent implementation enables autonomous execution of complex computational workflows, such as reaction enthalpy calculations. Aitomia was the first intelligent assistant publicly launched on cloud computing platforms for broad-scope atomistic simulations (Aitomistic Lab@XMU at https://atom.xmu.edu.cn and Aitomistic Hub at https://aitomistic.xyz). Aitomia lowers the barrier to performing atomistic simulations, thereby democratizing simulations and accelerating research and development in relevant fields.

URLs: https://atom.xmu.edu.cn, https://aitomistic.xyz

replace-cross Incentivize Contribution and Learn Parameters Too: Federated Learning with Strategic Data Owners

Authors: Drashthi Doshi, Aditya Vema Reddy Kesari, Avishek Ghosh, Swaprava Nath, Suhas S Kowshik

Abstract: Classical federated learning (FL) assumes that clients hold a limited amount of noisy data with which they voluntarily participate and contribute towards learning a global, more accurate model in a principled manner. The learning happens in a distributed fashion without sharing the data with the center. However, these methods do not consider an agent's incentive to participate and contribute, given that data collection and running a distributed algorithm are costly for the clients. The rationality of such contribution has recently been questioned in the literature, and some existing results address this problem. This paper addresses the question of simultaneous parameter learning and truthful incentivization of contribution, which distinguishes it from the extant literature. Our first mechanism incentivizes each client to contribute to the FL process at a Nash equilibrium while simultaneously learning the model parameters. We also ensure that agents are incentivized to truthfully reveal information in the intermediate stages of the algorithm. However, this equilibrium outcome can be away from the optimum, where clients contribute their full data and the algorithm learns the optimal parameters. We propose a second mechanism that enables full data contribution along with optimal parameter learning. Large-scale experiments with real (federated) datasets (CIFAR-10, FEMNIST, and Twitter) show that these algorithms converge quickly in practice, yield good welfare guarantees, and deliver better model performance for all agents.

replace-cross Beyond Either-Or Reasoning: Transduction and Induction as Cooperative Problem-Solving Paradigms

Authors: Janis Zenkner, Tobias Sesterhenn, Christian Bartelt

Abstract: Traditionally, in Programming-by-example (PBE) the goal is to synthesize a program from a small set of input-output examples. Lately, PBE has gained traction as a few-shot reasoning benchmark, relaxing the requirement to produce a program artifact altogether, which allows transductive methods to directly predict the missing output sample. Transduction and induction are complementary reasoning modes: induction derives general rules from examples, while transduction leverages the examples directly to infer specific outputs without intermediate generalization. Yet existing approaches either treat them as mutually exclusive or couple them in hybrid structures where one paradigm dictates a fixed trajectory for the other, undermining the latter's reasoning potential and creating cascading errors. We move away from these hierarchical models and introduce cooperative transductive-inductive problem solving: by interleaving both reasoning modes and ensuring neither unconditionally dominates the other, we preserve the search autonomy and reasoning capacity of each paradigm. We instantiate this concept in TIIPS. Across three PBE domains, TIIPS consistently outperforms state-of-the-art baselines and generates programs that more closely mirror ground-truth trajectories in both syntax and semantics, indicating a better match to the intended program behavior. Our findings highlight cooperative reasoning as a promising direction for harnessing the full power of symbolic (inductive) and neural (transductive) reasoning.

replace-cross Guiding Giants: Lightweight Controllers for Weighted Activation Steering in LLMs

Authors: Amr Hegazy, Mostafa Elhoushi, Amr Alanwar

Abstract: Controlling undesirable Large Language Model (LLM) behaviors, such as the generation of unsafe content or failing to adhere to safety guidelines, often relies on costly fine-tuning. Activation steering provides an alternative for inference-time control, but existing methods typically lack fine-grained, adaptive mechanisms. We introduce a novel approach using a lightweight, trainable controller network integrated during inference. This controller observes specific intermediate LLM activations and predicts both a global scaling factor and layer-specific weights. These predictions then dynamically modulate the intensity of a steering patch, derived from a pre-computed "refusal direction" vector, applied across the LLM's layers during generation. Trained on activations from both harmful and benign prompts, our controller learns to discriminatively apply nuanced, layer-aware interventions, activating steering primarily for harmful inputs. Experiments using safety benchmarks like ToxicChat & In-The-Wild Jailbreak Prompts demonstrate that our weighted steering controller significantly increases refusal rates compared to the base LLM, achieving targeted behavioral modification without altering the original model parameters. Our experiments with Llama-3.1-8B, Llama-3.2-1B & Mistral-7B show our approach outperforms existing methods, presenting an efficient and adaptive method for fine-grained control over LLM behavior at inference time.
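
The modulation step can be sketched as an additive, per-layer update of the hidden states: a controller-predicted global scale and layer weights determine how strongly the precomputed refusal direction is injected at each layer. All numbers below are illustrative; in the paper the scale and weights come from a trained controller network, not hand-set constants.

```python
# Sketch of weighted activation steering:
# h_l <- h_l + global_scale * w_l * refusal_dir for every layer l.
# The additive form and all values are illustrative assumptions.

def steer(hidden_by_layer, refusal_dir, global_scale, layer_weights):
    """Apply a scaled, layer-weighted steering patch to each layer."""
    return [
        [h + global_scale * w * d for h, d in zip(layer, refusal_dir)]
        for layer, w in zip(hidden_by_layer, layer_weights)
    ]

hidden = [[1.0, 0.0], [0.0, 1.0]]     # two layers, hidden size 2
direction = [1.0, -1.0]               # precomputed "refusal direction"
# Harmful prompt: controller emits a large scale; benign would emit ~0.
steered = steer(hidden, direction, global_scale=0.5, layer_weights=[1.0, 0.2])
print(steered)  # [[1.5, -0.5], [0.1, 0.9]]
```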

replace-cross Unified Text-Image-to-Video Generation: A Training-Free Approach to Flexible Visual Conditioning

Authors: Bolin Lai, Sangmin Lee, Xu Cao, Xiang Li, James M. Rehg

Abstract: Text-image-to-video (TI2V) generation is a critical problem for controllable video generation using both semantic and visual conditions. Most existing methods typically add visual conditions to text-to-video (T2V) foundation models by finetuning, which is costly in resources and only limited to a few pre-defined conditioning settings. To tackle these constraints, we introduce a unified formulation for TI2V generation with flexible visual conditioning. Furthermore, we propose an innovative training-free approach, dubbed FlexTI2V, that can condition T2V foundation models on an arbitrary number of images at arbitrary positions. Specifically, we first invert the condition images to noisy representations in a latent space. Then, in the denoising process of T2V models, our method uses a novel random patch swapping strategy to incorporate visual features into video representations through local image patches. To balance creativity and fidelity, we use a dynamic control mechanism to adjust the strength of visual conditioning to each video frame. Extensive experiments validate that our method surpasses previous training-free image conditioning methods by a notable margin. Our method can also generalize to both UNet-based and transformer-based architectures.

replace-cross Stable Thompson Sampling: Valid Inference via Variance Inflation

Authors: Budhaditya Halder, Shubhayan Pan, Koulik Khamaru

Abstract: We consider the problem of statistical inference when the data is collected via a Thompson Sampling-type algorithm. While Thompson Sampling (TS) is known to be both asymptotically optimal and empirically effective, its adaptive sampling scheme poses challenges for constructing confidence intervals for model parameters. We propose and analyze a variant of TS, called Stable Thompson Sampling, in which the posterior variance is inflated by a logarithmic factor. We show that this modification leads to asymptotically normal estimates of the arm means, despite the non-i.i.d. nature of the data. Importantly, this statistical benefit comes at a modest cost: the variance inflation increases regret by only a logarithmic factor compared to standard TS. Our results reveal a principled trade-off: by paying a small price in regret, one can enable valid statistical inference for adaptive decision-making algorithms.
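
The variance-inflation idea can be sketched for Gaussian bandits: standard TS samples arm means from posteriors with variance roughly 1/n, and Stable TS multiplies that variance by a logarithmic factor. The warm-up scheme, unit reward noise, and horizon below are illustrative choices, not the paper's exact algorithm:

```python
import numpy as np

def stable_ts(means, T=2000, seed=0):
    """Gaussian Thompson Sampling with log-inflated posterior variance."""
    rng = np.random.default_rng(seed)
    K = len(means)
    n = np.zeros(K)   # pull counts
    s = np.zeros(K)   # reward sums
    for t in range(1, T + 1):
        if n.min() == 0:                 # simple warm-up: pull each arm once
            a = int(t % K)
        else:
            mu_hat = s / n
            var = np.log(t + 1) / n      # standard TS variance ~ 1/n, inflated by log t
            a = int(np.argmax(rng.normal(mu_hat, np.sqrt(var))))
        r = rng.normal(means[a], 1.0)    # unit-variance reward noise
        n[a] += 1
        s[a] += r
    return s / n, n                      # arm-mean estimates and pull counts

est, counts = stable_ts([0.0, 1.0])
```

The extra exploration forced by the inflated variance keeps every arm sampled often enough for the arm-mean estimates to be asymptotically normal, at the cost of only a logarithmic regret factor.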

replace-cross Quantifying task-relevant representational similarity using decision variable correlation

Authors: Yu Eric Qian, Wilson S. Geisler, Xue-Xin Wei

Abstract: Previous studies have compared neural activities in the visual cortex to representations in deep neural networks trained on image classification. Interestingly, while some suggest that their representations are highly similar, others argue the opposite. Here, we propose a new approach to characterize the similarity of the decision strategies of two observers (models or brains) using decision variable correlation (DVC). DVC quantifies the image-by-image correlation between the decoded decisions based on the internal neural representations in a classification task. Thus, it can capture task-relevant information rather than general representational alignment. We evaluate DVC using monkey V4/IT recordings and network models trained on image classification tasks. We find that model-model similarity is comparable to monkey-monkey similarity, whereas model-monkey similarity is consistently lower. Strikingly, DVC decreases with increasing network performance on ImageNet-1k. Adversarial training does not improve model-monkey similarity in task-relevant dimensions assessed using DVC, although it markedly increases the model-model similarity. Similarly, pre-training on larger datasets does not improve model-monkey similarity. These results suggest a divergence between the task-relevant representations in monkey V4/IT and those learned by models trained on image classification tasks.
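
In its simplest form, DVC correlates two observers' decision variables (e.g. signed distances to a linear decoding boundary) across the same images. The linear decoders and synthetic stimuli below are a minimal sketch, not the paper's V4/IT analysis pipeline:

```python
import numpy as np

def decision_variables(features, w):
    """Linear decode: signed distance of each stimulus to the boundary."""
    return features @ w

def dvc(dv_a, dv_b):
    """Image-by-image Pearson correlation of two observers' decision variables."""
    return float(np.corrcoef(dv_a, dv_b)[0, 1])

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))       # shared stimuli in a common feature space
w1 = rng.normal(size=10)             # observer A's decoder
w2 = w1 + 0.5 * rng.normal(size=10)  # observer B: a partially similar decoder
score = dvc(decision_variables(X, w1), decision_variables(X, w2))
```

Because only the decoded decision variables enter the correlation, DVC is sensitive to the task-relevant readout rather than to overall representational geometry.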

replace-cross Enhancing Sample Efficiency in Multi-Agent RL with Uncertainty Quantification and Selective Exploration

Authors: Tom Danino, Nahum Shimkin

Abstract: Multi-agent reinforcement learning (MARL) methods have achieved state-of-the-art results on a range of multi-agent tasks. Yet, MARL algorithms typically require significantly more environment interactions than their single-agent counterparts to converge, a problem exacerbated by the difficulty in exploring over a large joint action space and the high variance intrinsic to MARL environments. To tackle these issues, we propose a novel algorithm that combines a decomposed centralized critic with decentralized ensemble learning, incorporating several key contributions. The main component in our scheme is a selective exploration method that leverages ensemble kurtosis. We extend the global decomposed critic with a diversity-regularized ensemble of individual critics and utilize its excess kurtosis to guide exploration toward high-uncertainty states and actions. To improve sample efficiency, we train the centralized critic with a novel truncated variation of the TD($\lambda$) algorithm, enabling efficient off-policy learning with reduced variance. On the actor side, our suggested algorithm adapts the mixed samples approach to MARL, mixing on-policy and off-policy loss functions for training the actors. This approach balances between stability and efficiency and outperforms purely off-policy learning. The evaluation shows our method outperforms state-of-the-art baselines on standard MARL benchmarks, including a variety of SMAC II maps.
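
The kurtosis-guided exploration component can be sketched as follows: score each action by its mean ensemble Q-value plus a bonus proportional to the ensemble's excess kurtosis, so heavy-tailed (high-uncertainty) estimates attract exploration. The bonus weight and the tie-breaking details are illustrative, not the paper's exact scheme:

```python
import numpy as np

def excess_kurtosis(q):
    """Excess kurtosis of ensemble estimates along axis 0 (Gaussian -> ~0)."""
    q = np.asarray(q, dtype=float)
    c = q - q.mean(axis=0)
    m2 = (c ** 2).mean(axis=0)
    m4 = (c ** 4).mean(axis=0)
    return m4 / np.maximum(m2 ** 2, 1e-12) - 3.0

def select_action(q_ensemble, beta=0.5):
    """q_ensemble: (n_members, n_actions). Mean Q plus a kurtosis bonus."""
    bonus = excess_kurtosis(q_ensemble)
    return int(np.argmax(q_ensemble.mean(axis=0) + beta * bonus))

# Four ensemble members, two actions; action 1 has higher mean and spread.
q = np.array([[1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [1.0, 5.0]])
a = select_action(q)
```

Relative to variance-based bonuses, kurtosis specifically flags states where a few ensemble members disagree sharply with the rest, which is the disagreement pattern the selective-exploration scheme targets.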

replace-cross NetArena: Dynamic Benchmarks for AI Agents in Network Automation

Authors: Yajie Zhou, Jiajun Ruan, Eric S. Wang, Sadjad Fouladi, Francis Y. Yan, Kevin Hsieh, Zaoxing Liu

Abstract: As AI agents expand into high-stakes domains like network system operations, evaluating their real-world reliability becomes increasingly critical. However, existing benchmarks risk contamination due to static design, show high statistical variance from limited dataset size, and fail to reflect the complexity of production environments. We present NetArena, a dynamic benchmark generation framework for network applications. NetArena introduces a novel abstraction and unified interface that generalize across diverse tasks, enabling dynamic benchmarking despite the heterogeneity of network workloads. At runtime, users can generate unlimited queries on demand. NetArena integrates with network emulators to measure correctness, safety, and latency during execution. We demonstrate NetArena on three representative applications and find that (1) NetArena significantly improves statistical reliability across AI agents, reducing confidence-interval overlap from 85% to 0, (2) agents achieve only 13-38% average performance (as low as 3%) for large-scale, realistic queries, and (3) it exposes more fine-grained behaviors that static, correctness-only benchmarks miss. NetArena also enables use cases such as SFT and RL fine-tuning on network system tasks. Code is available at https://github.com/Froot-NetSys/NetArena.

URLs: https://github.com/Froot-NetSys/NetArena.

replace-cross Model-based Implicit Neural Representation for sub-wavelength Radio Localization

Authors: Baptiste Chatelier (IETR, INSA Rennes, MERCE-France), Vincent Corlay (MERCE-France), Musa Furkan Keskin (INSA Rennes, IETR), Matthieu Crussi\`ere (INSA Rennes, IETR), Henk Wymeersch (INSA Rennes, IETR), Luc Le Magoarou (INSA Rennes, IETR)

Abstract: The increasing deployment of large antenna arrays at base stations has significantly improved the spatial resolution and localization accuracy of radio-localization methods. However, traditional signal processing techniques struggle in complex radio environments, particularly in scenarios dominated by non line of sight (NLoS) propagation paths, resulting in degraded localization accuracy. Recent developments in machine learning have facilitated the development of machine learning-assisted localization techniques, enhancing localization accuracy in complex radio environments. However, these methods often involve substantial computational complexity during both the training and inference phases. This work extends the well-established fingerprinting-based localization framework by simultaneously reducing its memory requirements and improving its accuracy. Specifically, a model-based neural network is used to learn the location-to-channel mapping, and then serves as a generative neural channel model. This generative model augments the fingerprinting comparison dictionary while reducing the memory requirements. The proposed method outperforms fingerprinting baselines by achieving sub-wavelength localization accuracy, even in complex static NLoS environments. Remarkably, it offers an improvement by several orders of magnitude in localization accuracy, while simultaneously reducing memory requirements by an order of magnitude compared to classical fingerprinting methods.

replace-cross Delphos: A reinforcement learning framework for assisting discrete choice model specification

Authors: Gabriel Nova, Stephane Hess, Sander van Cranenburgh

Abstract: We introduce Delphos, a deep reinforcement learning framework for assisting the discrete choice model specification process. Delphos aims to support the modeller by providing automated, data-driven suggestions for utility specifications, thereby reducing the effort required to develop and refine utility functions. Delphos conceptualises model specification as a sequential decision-making problem, inspired by the way human choice modellers iteratively construct models through a series of reasoned specification decisions. In this setting, an agent learns to specify high-performing candidate models by choosing a sequence of modelling actions, such as selecting variables, accommodating both generic and alternative-specific taste parameters, applying non-linear transformations, and including interactions with covariates, while interacting with a modelling environment that estimates each candidate and returns a reward signal. Specifically, Delphos uses a Deep Q-Network that receives delayed rewards based on modelling outcomes (e.g., log-likelihood) and behavioural expectations (e.g., parameter signs), and distributes this signal across the sequence of actions to learn which modelling decisions lead to well-performing candidates. We evaluate Delphos on both simulated and empirical datasets using multiple reward settings. In simulated cases, learning curves, Q-value patterns, and performance metrics show that the agent learns to adaptively explore strategies to propose well-performing models across search spaces, while covering only a small fraction of the feasible modelling space. We further apply the framework to two empirical datasets to demonstrate its practical use. These experiments illustrate the ability of Delphos to generate competitive, behaviourally plausible models and highlight the potential of this adaptive, learning-based framework to assist the model specification process.

replace-cross AVA-Bench: Atomic Visual Ability Benchmark for Vision Foundation Models

Authors: Arpita Chowdhury, Zheda Mai, Zihe Wang, Sooyoung Jeon, Lemeng Wang, Jiacheng Hou, Wei-Lun Chao

Abstract: The rise of vision foundation models (VFMs) calls for systematic evaluation. A common approach pairs VFMs with large language models (LLMs) as general-purpose heads, followed by evaluation on broad Visual Question Answering (VQA) benchmarks. However, this protocol has two key blind spots: (i) the instruction tuning data may not align with VQA test distributions, meaning a wrong prediction can stem from such data mismatch rather than a VFM's visual shortcomings; (ii) VQA benchmarks often require multiple visual abilities, making it hard to tell whether errors stem from lacking all required abilities or just a single critical one. To address these gaps, we introduce AVA-Bench, the first benchmark that explicitly disentangles 14 Atomic Visual Abilities (AVAs) -- foundational skills like localization, depth estimation, and spatial understanding that collectively support complex visual reasoning tasks. By decoupling AVAs and matching training and test distributions within each, AVA-Bench pinpoints exactly where a VFM excels or falters. Applying AVA-Bench to leading VFMs thus reveals distinctive "ability fingerprints," turning VFM selection from educated guesswork into principled engineering. Notably, we find that a 0.5B LLM yields similar VFM rankings as a 7B LLM while cutting GPU hours by 8x, enabling more efficient evaluation. By offering a comprehensive and transparent benchmark, we hope AVA-Bench lays the foundation for the next generation of VFMs.

replace-cross Convergence and clustering analysis for Mean Shift with radially symmetric, positive definite kernels

Authors: Susovan Pal

Abstract: The mean shift (MS) is a non-parametric, density-based, iterative algorithm with prominent usage in clustering and image segmentation. A rigorous proof for the convergence of its mode estimate sequence in full generality remains unknown. In this paper, we show that for \textit{sufficiently large bandwidths}, convergence is guaranteed in any dimension with \textit{any radially symmetric and strictly positive definite kernels}. Although the author acknowledges that our result is partially more restrictive than that of \cite{YT} due to the lower limit of the bandwidth, our kernel class is not covered by the kernel class in \cite{YT}, and the proof technique is different. Moreover, we show theoretically and experimentally that while for the Gaussian kernel, accurate clustering at \textit{large bandwidths} is generally impossible, it may still be possible for other radially symmetric, strictly positive definite kernels.
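
The iteration under study is the standard kernel-weighted-mean update. The sketch below uses a Gaussian kernel and a deliberately large bandwidth; in that regime all starting points flow to essentially one mode near the data mean, which illustrates why accurate clustering at large bandwidths can fail even when convergence holds. Data and bandwidth are illustrative:

```python
import numpy as np

def mean_shift_step(x, data, h):
    """One mean shift update with a Gaussian kernel of bandwidth h."""
    w = np.exp(-np.sum((data - x) ** 2, axis=1) / (2.0 * h ** 2))
    return (w[:, None] * data).sum(axis=0) / w.sum()

def mean_shift(x0, data, h, iters=200, tol=1e-10):
    """Iterate the update from x0 until the mode estimate stabilizes."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x_new = mean_shift_step(x, data, h)
        if np.linalg.norm(x_new - x) < tol:
            break
        x = x_new
    return x

rng = np.random.default_rng(0)
data = rng.normal(loc=[2.0, -1.0], scale=0.3, size=(300, 2))
mode = mean_shift(data[0], data, h=5.0)   # bandwidth much larger than data spread
```

With h far above the data scale, the kernel weights are nearly uniform, so the fixed point sits essentially at the sample mean regardless of the starting point.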

replace-cross Disentangled Feature Importance

Authors: Jin-Hong Du, Kathryn Roeder, Larry Wasserman

Abstract: Feature importance (FI) measures are widely used to assess the contributions of predictors to an outcome, but they may target different notions of relevance. When predictors are correlated, traditional statistical FI methods are often tailored for feature selection and correlation can therefore be treated as conditional redundancy. By contrast, for model interpretation, FI is more naturally defined through marginal predictive relevance. In this context, we show that most existing approaches target identical population functionals under squared-error loss and exhibit correlation-induced bias. To address this limitation, we introduce Disentangled Feature Importance (DFI), a nonparametric generalization of the classical $R^2$ decomposition via canonical entropic optimal transport (EOT). DFI transforms correlated features into independent latent features using an EOT coupling for general covariate laws, including mixed and discrete settings. Importance scores are computed in this disentangled space and attributed back through the transition kernel's sensitivity. Under arbitrary feature dependencies, DFI provides a principled decomposition of latent importance scores that sum to the total predictive variability for latent additive models and to interaction-weighted functional ANOVA variances more generally. We develop semiparametric theory for DFI. Under the EOT formulation, we establish root-$n$ consistency and asymptotic normality for nondegenerate importance estimators in the latent space and the original feature space. Notably, our estimators achieve second-order estimation error, which vanishes if both regression function and EOT kernel estimation errors are $o_{\mathbb{P}}(n^{-1/4})$. By design, DFI avoids the computational burden of repeated submodel refitting and the challenges of conditional covariate distribution estimation, thereby achieving computational efficiency.

replace-cross Rationale-Enhanced Decoding for Multi-modal Chain-of-Thought

Authors: Shin'ya Yamaguchi, Kosuke Nishida, Daiki Chijiwa

Abstract: Large vision-language models (LVLMs) have demonstrated remarkable capabilities by integrating pre-trained vision encoders with large language models (LLMs). Similar to single-modal LLMs, chain-of-thought (CoT) prompting has been adapted for LVLMs to enhance multi-modal reasoning by generating intermediate rationales based on visual and textual inputs. While CoT is assumed to improve grounding and accuracy in LVLMs, our experiments reveal a key challenge: existing LVLMs often ignore the contents of generated rationales in CoT reasoning. To address this, we re-formulate multi-modal CoT reasoning as a KL-constrained reward maximization focused on rationale-conditional log-likelihood. As the optimal solution, we propose rationale-enhanced decoding (RED), a novel plug-and-play inference-time decoding strategy. RED harmonizes visual and rationale information by multiplying distinct image-conditional and rationale-conditional next token distributions. Extensive experiments show that RED consistently and significantly improves reasoning over standard CoT and other decoding methods across multiple benchmarks and LVLMs. Our work offers a practical and effective approach to improve both the faithfulness and accuracy of CoT reasoning in LVLMs, paving the way for more reliable rationale-grounded multi-modal systems. Code is available at https://github.com/yshinya6/red/.
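
At decoding time, the multiplicative combination of the two next-token distributions reduces to an elementwise product followed by renormalization. The tempering exponent and three-token vocabulary below are illustrative simplifications of the paper's decoding rule:

```python
import numpy as np

def red_combine(p_image, p_rationale, alpha=1.0):
    """Multiply image-conditional and rationale-conditional next-token
    distributions (rationale term tempered by alpha) and renormalize."""
    p = p_image * p_rationale ** alpha
    return p / p.sum()

# Toy three-token vocabulary: the image and rationale distributions disagree.
p_img = np.array([0.5, 0.3, 0.2])
p_rat = np.array([0.2, 0.6, 0.2])
p = red_combine(p_img, p_rat)
```

Tokens supported by both conditions are boosted relative to either distribution alone, which is how the method keeps generation grounded in the rationale without discarding the visual evidence.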

URLs: https://github.com/yshinya6/red/.

replace-cross CircuitProbe: Tracing Visual Temporal Evidence Flow in Video Language Models

Authors: Yiming Zhang, Zhuokai Zhao, Chengzhang Yu, Kun Wang, Zhendong Chu, Qiankun Li, Zihan Chen, Yang Liu, Zenghui Ding, Yining Sun, Qingsong Wen

Abstract: Autoregressive large vision--language models (LVLMs) interface video and language by projecting video features into the LLM's embedding space as continuous visual token embeddings. However, it remains unclear where temporal evidence is represented and how it causally influences decoding. To address this gap, we present CircuitProbe, a circuit-level analysis framework that dissects the end-to-end video-language pathway through two stages: (i) Visual Auditing, which localizes object semantics within the projected video-token sequence and reveals their causal necessity via targeted ablations and controlled substitutions; and (ii) Semantic Tracing, which uses logit-lens probing to track the layer-wise emergence of object and temporal concepts, augmented with temporal frame interventions to assess sensitivity to temporal structure. Based on the resulting analysis, we design a targeted surgical intervention that strictly follows our observations: identifying temporally specialized attention heads and selectively amplifying them within the critical layer interval revealed by Semantic Tracing. This analysis-driven intervention yields consistent improvements (up to 2.4% absolute) on the temporal-heavy TempCompass benchmark, validating the correctness, effectiveness, and practical value of the proposed circuit-level analysis for temporal understanding in LVLMs.

replace-cross Cropping outperforms dropout as an augmentation strategy for self-supervised training of text embeddings

Authors: Rita Gonz\'alez-M\'arquez, Philipp Berens, Dmitry Kobak

Abstract: Text embeddings, i.e. vector representations of entire texts, play an important role in many NLP applications, such as retrieval-augmented generation, clustering, or visualizing collections of texts for data exploration. Currently, top-performing embedding models are derived from pre-trained language models via supervised contrastive fine-tuning. This fine-tuning strategy relies on an external notion of similarity and annotated data for generation of positive pairs. Here we study self-supervised fine-tuning and systematically compare the two most well-known augmentation strategies used for fine-tuning text embedding models. We assess embedding quality on MTEB and additional in-domain evaluations and show that cropping augmentation strongly outperforms the dropout-based approach. We find that on out-of-domain data, the quality of resulting embeddings is substantially below the supervised state-of-the-art models, but for in-domain data, self-supervised fine-tuning can produce high-quality text embeddings after very short fine-tuning. Finally, we show that representation quality increases towards the last transformer layers, which undergo the largest change during fine-tuning; and that fine-tuning only those last layers is sufficient to reach similar embedding quality.

replace-cross Extending Foundational Monocular Depth Estimators to Fisheye Cameras with Calibration Tokens

Authors: Suchisrit Gangopadhyay, Jung-Hee Kim, Xien Chen, Patrick Rim, Hyoungseob Park, Alex Wong

Abstract: We propose a method to extend foundational monocular depth estimators (FMDEs), trained on perspective images, to fisheye images. Despite being trained on tens of millions of images, FMDEs are susceptible to the covariate shift introduced by changes in camera calibration (intrinsic, distortion) parameters, leading to erroneous depth estimates. Our method aligns the distribution of latent embeddings encoding fisheye images to those of perspective images, enabling the reuse of FMDEs for fisheye cameras without retraining or finetuning. To this end, we introduce a set of Calibration Tokens as a light-weight adaptation mechanism that modulates the latent embeddings for alignment. By exploiting the already expressive latent space of FMDEs, we posit that modulating their embeddings avoids the negative impact of artifacts and loss introduced in conventional recalibration or map projection to a canonical reference frame in the image space. Our method is self-supervised and does not require fisheye images but leverages publicly available large-scale perspective image datasets. This is done by recalibrating perspective images to fisheye images, and enforcing consistency between their estimates during training. We evaluate our approach with several FMDEs, in both indoor and outdoor settings, where we consistently improve over state-of-the-art methods using a single set of tokens for both. Code available at: https://github.com/JungHeeKim29/calibration-token.

URLs: https://github.com/JungHeeKim29/calibration-token.

replace-cross Sampling as Bandits: Evaluation-Efficient Design for Black-Box Densities

Authors: Takuo Matsubara, Andrew Duncan, Simon Cotter, Konstantinos Zygalakis

Abstract: We propose bandit importance sampling (BIS), a powerful importance sampling framework tailored for settings in which evaluating the target density is computationally expensive. BIS facilitates accurate sampling while minimizing the required number of target-density evaluations. In contrast to adaptive importance sampling, which optimizes a proposal distribution, BIS directly optimizes the set of samples through a sequential selection process driven by multi-armed bandits. BIS serves as a general framework that accommodates user-defined bandit strategies. Theoretically, the weak convergence of the weighted samples, and thus the consistency of the Monte Carlo estimator, is established regardless of the specific strategy employed. In this paper, we present a practical strategy that leverages Gaussian process surrogates to guide sample selection, adapting the principles of Bayesian optimization for sampling. Comprehensive numerical studies demonstrate the superior performance of BIS across multimodal, heavy-tailed distributions, and real-world Bayesian inference tasks involving Markov random fields.

replace-cross The Future of Artificial Intelligence and the Mathematical and Physical Sciences (AI+MPS)

Authors: Andrew Ferguson, Marisa LaFleur, Lars Ruthotto, Jesse Thaler, Yuan-Sen Ting, Pratyush Tiwary, Soledad Villar, E. Paulo Alves, Jeremy Avigad, Simon Billinge, Camille Bilodeau, Keith Brown, Emmanuel Candes, Arghya Chattopadhyay, Bingqing Cheng, Jonathan Clausen, Connor Coley, Andrew Connolly, Fred Daum, Sijia Dong, Chrisy Xiyu Du, Cora Dvorkin, Cristiano Fanelli, Eric B. Ford, Luis Manuel Frutos, Nicol\'as Garc\'ia Trillos, Cecilia Garraffo, Robert Ghrist, Rafael Gomez-Bombarelli, Gianluca Guadagni, Sreelekha Guggilam, Sergei Gukov, Juan B. Guti\'errez, Salman Habib, Johannes Hachmann, Boris Hanin, Philip Harris, Murray Holland, Elizabeth Holm, Hsin-Yuan Huang, Shih-Chieh Hsu, Nick Jackson, Olexandr Isayev, Heng Ji, Aggelos Katsaggelos, Jeremy Kepner, Yannis Kevrekidis, Michelle Kuchera, J. Nathan Kutz, Branislava Lalic, Ann Lee, Matt LeBlanc, Josiah Lim, Rebecca Lindsey, Yongmin Liu, Peter Y. Lu, Sudhir Malik, Vuk Mandic, Vidya Manian, Emeka P. Mazi, Pankaj Mehta, Peter Melchior, Brice M\'enard, Jennifer Ngadiuba, Stella Offner, Elsa Olivetti, Shyue Ping Ong, Christopher Rackauckas, Philippe Rigollet, Chad Risko, Philip Romero, Grant Rotskoff, Brett Savoie, Uros Seljak, David Shih, Gary Shiu, Dima Shlyakhtenko, Eva Silverstein, Taylor Sparks, Thomas Strohmer, Christopher Stubbs, Stephen Thomas, Suriyanarayanan Vaikuntanathan, Rene Vidal, Francisco Villaescusa-Navarro, Gregory Voth, Benjamin Wandelt, Rachel Ward, Melanie Weber, Risa Wechsler, Stephen Whitelam, Olaf Wiest, Mike Williams, Zhuoran Yang, Yaroslava G. Yingling, Bin Yu, Shuwen Yue, Ann Zabludoff, Huimin Zhao, Tong Zhang

Abstract: This community paper developed out of the NSF Workshop on the Future of Artificial Intelligence (AI) and the Mathematical and Physical Sciences (MPS), which was held in March 2025 with the goal of understanding how the MPS domains (Astronomy, Chemistry, Materials Research, Mathematical Sciences, and Physics) can best capitalize on, and contribute to, the future of AI. We present here a summary and snapshot of the MPS community's perspective, as of Spring/Summer 2025, in a rapidly developing field. The link between AI and MPS is becoming increasingly inextricable; now is a crucial moment to strengthen the link between AI and Science by pursuing a strategy that proactively and thoughtfully leverages the potential of AI for scientific discovery and optimizes opportunities to impact the development of AI by applying concepts from fundamental science. To achieve this, we propose activities and strategic priorities that: (1) enable AI+MPS research in both directions; (2) build up an interdisciplinary community of AI+MPS researchers; and (3) foster education and workforce development in AI for MPS researchers and students. We conclude with a summary of suggested priorities for funding agencies, educational institutions, and individual researchers to help position the MPS community to be a leader in, and take full advantage of, the transformative potential of AI+MPS.

replace-cross PREDICT-GBM: A multi-center platform to advance personalized glioblastoma radiotherapy planning

Authors: L. Zimmer, J. Weidner, M. Balcerak, F. Kofler, M. Krupa, I. Ezhov, S. Cepeda, R. Zhang, J. Lowengrub, B. Menze, B. Wiestler

Abstract: Glioblastoma recurrence is largely driven by diffuse infiltration beyond radiologically visible tumor margins, yet standard radiotherapy, the mainstay of glioblastoma treatment, relies on uniform expansions that ignore patient-specific biological and anatomical factors. While computational models promise to map this invisible growth and guide personalized treatment planning, their clinical translation is hindered by the lack of standardized, large-scale benchmarking and reproducible validation workflows. To bridge this gap, we present PREDICT-GBM, a comprehensive open-source platform that integrates a curated, longitudinal, multi-center dataset of 243 patients with a standardized evaluation pipeline, and fuels model development and validation. We demonstrate PREDICT-GBM's potential by training and benchmarking a novel U-Net-based recurrence prediction model against state-of-the-art biophysical and data-driven methods. Our results show that both biophysical and deep-learning approaches significantly outperform standard-of-care protocols in predicting future recurrence sites while maintaining iso-volumetric treatment constraints. Notably, our U-Net model achieved a superior coverage of enhancing recurrence (79.37 +/- 2.08 %), markedly surpassing the standard-of-care (paired Wilcoxon signed-rank test, p = 0.0000057). Furthermore, the biophysical model GliODIL reached 78.91 +/- 2.08 % (p = 0.00045), validating the platform's ability to compare diverse modeling paradigms. By providing the first rigorous, reproducible ecosystem for model training and validation, PREDICT-GBM eliminates a major bottleneck for personalized, computationally guided radiotherapy. This work establishes a new standard for developing computationally guided, personalized radiotherapy, with the platform, models, and data openly available at github.com/BrainLesion/PredictGBM

replace-cross Induction Signatures Are Not Enough: A Matched-Compute Study of Load-Bearing Structure in In-Context Learning

Authors: Mohammed Sabry, Anya Belz

Abstract: Mechanism-targeted synthetic data is increasingly proposed as a way to steer pretraining toward desirable capabilities, but it remains unclear how such interventions should be evaluated. We study this question for in-context learning (ICL) under matched compute (iso-FLOPs) using Bi-Induct, a lightweight data rewrite that interleaves short directional copy snippets into a natural pretraining stream: forward-copy (induction), backward-copy (anti-induction, as a directional control), or a balanced mix. Across 0.13B-1B decoder-only models, we evaluate (i) few-shot performance on standard LM benchmarks and function-style ICL probes, (ii) head-level copy telemetry, and (iii) held-out perplexity as a guardrail. Bi-Induct reliably increases induction-head activity, but this does not translate into consistent improvements in few-shot generalization: on standard LM benchmarks, Bi-Induct is largely performance-neutral relative to natural-only training, while on function-style probes the 1B natural-only model performs best. Despite explicit backward-copy cues, anti-induction scores remain near zero across scales, revealing a strong forward/backward asymmetry. Targeted ablations show a sharper distinction: removing the top 2% induction heads per layer harms ICL more than matched random ablations, with the largest relative drop occurring in the natural-only models. This indicates that natural-only training produces more centralized, load-bearing induction circuitry, whereas Bi-Induct tends to create more distributed and redundant induction activity. Our main conclusion is that eliciting a mechanism is not the same as making it load-bearing. For data-centric foundation model design, this suggests that synthetic data interventions should be evaluated not only by signature amplification, but by whether they create causally necessary computation while preserving natural-data modeling quality.

replace-cross When Scores Learn Geometry: Rate Separations under the Manifold Hypothesis

Authors: Xiang Li, Zebang Shen, Ya-Ping Hsieh, Niao He

Abstract: Score-based methods, such as diffusion models and Bayesian inverse problems, are often interpreted as learning the data distribution in the low-noise limit ($\sigma \to 0$). In this work, we propose an alternative perspective: their success arises from implicitly learning the data manifold rather than the full distribution. Our claim is based on a novel analysis of scores in the small-$\sigma$ regime that reveals a sharp separation of scales: information about the data manifold is $\Theta(\sigma^{-2})$ stronger than information about the distribution. We argue that this insight suggests a paradigm shift from the less practical goal of distributional learning to the more attainable task of geometric learning, which provably tolerates $O(\sigma^{-2})$ larger errors in score approximation. We illustrate this perspective through three consequences: i) in diffusion models, concentration on data support can be achieved with a score error of $o(\sigma^{-2})$, whereas recovering the specific data distribution requires a much stricter $o(1)$ error; ii) more surprisingly, learning the uniform distribution on the manifold -- an especially structured and useful object -- is also $O(\sigma^{-2})$ easier; and iii) in Bayesian inverse problems, the maximum entropy prior is $O(\sigma^{-2})$ more robust to score errors than generic priors. Finally, we validate our theoretical findings with preliminary experiments on large-scale models, including Stable Diffusion.

replace-cross Steer Away From Mode Collisions: Improving Composition In Diffusion Models

Authors: Debottam Dutta, Jianchong Chen, Rajalaxmi Rajagopalan, Yu-Lin Wei, Romit Roy Choudhury

Abstract: We propose to improve multi-concept prompt fidelity in text-to-image diffusion models. We begin with common failure cases - prompts like "a cat and a dog" that sometimes yield images where one concept is missing, faint, or colliding awkwardly with another. We hypothesize that this happens when the diffusion model drifts into mixed modes that over-emphasize a single concept it learned strongly during training. Instead of re-training, we introduce a corrective sampling strategy that steers away from regions where the joint prompt behavior overlaps too strongly with any single concept in the prompt. The goal is to steer towards "pure" joint modes where all concepts can coexist with balanced visual presence. We further show that existing multi-concept guidance schemes can operate in unstable weight regimes that amplify imbalance; we characterize favorable regions and adapt sampling to remain within them. Our approach, CO3, is plug-and-play, requires no model tuning, and complements standard classifier-free guidance. Experiments on diverse multi-concept prompts indicate improvements in concept coverage, balance and robustness, with fewer dropped or distorted concepts compared to standard baselines and prior compositional methods. Results suggest that lightweight corrective guidance can substantially mitigate brittle semantic alignment behavior in modern diffusion systems. Code is available at https://github.com/debottam-dutta7/co3

URLs: https://github.com/debottam-dutta7/co3

replace-cross The silence of the weights: a structural pruning strategy for attention-based audio signal architectures with second order metrics

Authors: Andrea Diecidue, Carlo Alberto Barbano, Piero Fraternali, Mathieu Fontaine, Enzo Tartaglione

Abstract: Transformer-based models have become the state of the art across multiple domains, from natural language processing to machine listening, thanks to their attention mechanisms. However, the attention layers require a large number of parameters and high-end hardware for both training and inference. We propose a novel channel-pruning technique explicitly targeted at the attention mechanism, decoupling the pruning of each head and of the four projection matrices in the attention block (query, key, value, and output), and employing a second-order metric to score the network's parameters. We compare our technique against head-pruning strategies and magnitude-driven scoring metrics, investigating the effects of pruning on Audio Spectrogram Transformer (AST) and Whisper. Our results show that even after pruning 50\% of the parameters in the attention block, performance is largely preserved.
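
As one concrete instance of a second-order saliency, the classic Optimal Brain Damage score s_i = 0.5 * h_i * w_i^2 (diagonal-Hessian approximation) can be aggregated per channel and used to rank channels of each projection matrix independently. The sketch below is illustrative only; the paper's actual metric and aggregation may differ:

```python
# Toy second-order structural pruning: rank channels of a weight matrix by
# an OBD-style saliency 0.5 * h * w^2 (h = diagonal Hessian estimate) and
# keep the top fraction. Applied independently to Q/K/V/O matrices.

def channel_saliency(weights, hess_diag):
    """Sum of 0.5 * h * w^2 over the weights in one channel."""
    return 0.5 * sum(h * w * w for w, h in zip(weights, hess_diag))

def prune_channels(matrix, hess, keep_ratio=0.5):
    """Return sorted indices of channels kept by second-order saliency."""
    scores = [channel_saliency(row, h_row) for row, h_row in zip(matrix, hess)]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_keep = max(1, int(len(scores) * keep_ratio))
    return sorted(order[:n_keep])

# Example: channel 2 has large weights but tiny curvature, so a pure
# magnitude criterion would keep it while the second-order one drops it.
matrix = [[0.1, 0.1], [1.0, 1.0], [2.0, 2.0], [0.5, 0.5]]
hess   = [[9.0, 9.0], [1.0, 1.0], [0.01, 0.01], [4.0, 4.0]]
print(prune_channels(matrix, hess))
```

This is the sense in which second-order metrics can disagree with magnitude-driven scoring: curvature, not weight size, determines the estimated loss increase from removal.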

replace-cross ExoPredicator: Learning Abstract Models of Dynamic Worlds for Robot Planning

Authors: Yichao Liang, Dat Nguyen, Cambridge Yang, Tianyang Li, Joshua B. Tenenbaum, Carl Edward Rasmussen, Adrian Weller, Zenna Tavares, Tom Silver, Kevin Ellis

Abstract: Long-horizon embodied planning is challenging because the world does not change only through an agent's actions: exogenous processes (e.g., water heating, dominoes cascading) unfold concurrently with the agent's actions. We propose a framework for abstract world models that jointly learns (i) symbolic state representations and (ii) causal processes for both endogenous actions and exogenous mechanisms. Each causal process models the time course of a stochastic cause-effect relation. We learn these world models from limited data via variational Bayesian inference combined with LLM proposals. Across five simulated tabletop robotics environments, the learned models enable fast planning that generalizes to held-out tasks with more objects and more complex goals, outperforming a range of baselines.

replace-cross TrackFormers Part 2: Enhanced Transformer-Based Models for High-Energy Physics Track Reconstruction

Authors: Sascha Caron, Nadezhda Dobreva, Maarten Kimpel, Uraz Odyurt, Slav Pshenov, Roberto Ruiz de Austri Bazan, Eugene Shalugin, Zef Wolffs, Yue Zhao

Abstract: High-Energy Physics experiments are rapidly escalating in generated data volume, a trend that will intensify with the upcoming High-Luminosity LHC upgrade. This surge in data necessitates critical revisions across the data processing pipeline, with particle track reconstruction being a prime candidate for improvement. In our previous work, we introduced "TrackFormers", a collection of Transformer-based one-shot encoder-only models that effectively associate hits with expected tracks. In this study, we extend our earlier efforts by conducting detailed investigations into more custom Transformer attention mechanisms, a new design combining geometric projection and lightweight clustering, and a joint model conditioning classification on a regressor's predictions. Furthermore, we discuss new datasets that allow the training on hit level for a range of physics processes. These developments collectively aim to boost both the accuracy and potentially the efficiency of our tracking models, offering a robust solution to meet the demands of next-generation high-energy physics experiments.

replace-cross Purrception: Variational Flow Matching for Vector-Quantized Image Generation

Authors: R\u{a}zvan-Andrei Mati\c{s}an, Vincent Tao Hu, Grigory Bartosh, Bj\"orn Ommer, Cees G. M. Snoek, Max Welling, Jan-Willem van de Meent, Mohammad Mahdi Derakhshani, Floor Eijkelboom

Abstract: We introduce Purrception, a variational flow matching approach for vector-quantized image generation that provides explicit categorical supervision while maintaining continuous transport dynamics. Our method adapts Variational Flow Matching to vector-quantized latents by learning categorical posteriors over codebook indices while computing velocity fields in the continuous embedding space. This combines the geometric awareness of continuous methods with the discrete supervision of categorical approaches, enabling uncertainty quantification over plausible codes and temperature-controlled generation. We evaluate Purrception on ImageNet-1k 256x256 generation. Training converges faster than both continuous flow matching and discrete flow matching baselines while achieving competitive FID scores with state-of-the-art models. This demonstrates that Variational Flow Matching can effectively bridge continuous transport and discrete supervision for improved training efficiency in image generation.

replace-cross Diverse Text-to-Image Generation via Contrastive Noise Optimization

Authors: Byungjun Kim, Soobin Um, Jong Chul Ye

Abstract: Text-to-image (T2I) diffusion models have demonstrated impressive performance in generating high-fidelity images, largely enabled by text-guided inference. However, this advantage often comes with a critical drawback: limited diversity, as outputs tend to collapse into similar modes under strong text guidance. Existing approaches typically optimize intermediate latents or text conditions during inference, but these methods deliver only modest gains or remain sensitive to hyperparameter tuning. In this work, we introduce Contrastive Noise Optimization, a simple yet effective method that addresses the diversity issue from a distinct perspective. Unlike prior techniques that adapt intermediate latents, our approach shapes the initial noise to promote diverse outputs. Specifically, we develop a contrastive loss defined in the Tweedie data space and optimize a batch of noise latents. Our contrastive optimization repels instances within the batch to maximize diversity while keeping them anchored to a reference sample to preserve fidelity. We further provide theoretical insights into the mechanism of this preprocessing to substantiate its effectiveness. Extensive experiments across multiple T2I backbones demonstrate that our approach achieves a superior quality-diversity Pareto frontier while remaining robust to hyperparameter choices.

replace-cross Beyond AlphaEarth: Toward Human-Centered Geospatial Foundation Models via POI-Guided Contrastive Learning

Authors: Junyuan Liu, Quan Qin, Guangsheng Dong, Xinglei Wang, Jiazhuang Feng, Zichao Zeng, Tao Cheng

Abstract: Recent geospatial foundation models (GFMs) produce spatially extensive representations of the Earth's surface that capture rich physical and environmental patterns. Among them, the AlphaEarth Foundation (AE) represents a major step, generating 10 m embeddings from multi-source Earth Observation (EO) data that include diverse environmental and spectral characteristics. However, such EO-driven representations primarily encode physical and spectral patterns rather than human activities or urban semantics, limiting their ability to capture the functional dimensions of cities and making the learned representations difficult to interpret or query using natural language. We introduce AETHER (AlphaEarth-POI Enriched Representation Learning), a lightweight framework that aligns AlphaEarth with human-centered urban analysis through multimodal alignment guided by Points of Interest (POIs). By enforcing both cross-modal AE-POI alignment and intra-modal multi-scale consistency, AETHER integrates functional urban semantics with EO-driven representations and grounds the embedding space in natural language. The resulting representations support both urban mapping tasks and natural language-conditioned spatial retrieval. Experiments across four downstream tasks in Greater London and Singapore demonstrate consistent state-of-the-art performance, with relative improvements ranging from 4.5% to 21.9%. Furthermore, the aligned embedding space enables spatial localization through natural language queries. By aligning EO-based foundation models with human-centered semantics, AETHER improves the interpretability of geospatial representations and advances geospatial representation learning toward human-centered, language-accessible geospatial foundation models.

replace-cross Transfer Learning with Distance Covariance for Random Forest: Error Bounds and an EHR Application

Authors: Chenze Li, Subhadeep Paul

Abstract: We propose a method for transfer learning in nonparametric regression using a random forest (RF) with distance covariance-based feature weights, assuming the unknown source and target regression functions are sparsely different. Our method obtains residuals from a source domain-trained Centered RF (CRF) in the target domain, then fits another CRF to these residuals with feature splitting probabilities proportional to feature-residual sample distance covariance. We derive an upper bound on the mean square error rate of the procedure as a function of sample sizes and difference dimension, theoretically demonstrating transfer learning benefits in random forests. A major difficulty for transfer learning in random forests is the lack of explicit regularization in the method. Our results explain why shallower trees with preferential selection of features lead to both lower bias and lower variance for fitting a low-dimensional function. We show that in the residual random forest, this implicit regularization is enabled by sample distance covariance. In simulations, we show that the results obtained for the CRFs also hold numerically for the standard RF (SRF) method with data-driven feature split selection. Beyond transfer learning, our results also show the benefit of distance-covariance-based weights on the performance of RF when some features dominate. Our method shows significant gains in predicting the mortality of ICU patients in smaller-bed target hospitals using a large multi-hospital dataset of electronic health records for 200,000 ICU patients.
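
The feature-weighting step can be sketched concretely: compute the sample distance covariance between each feature and the residuals of the source-trained forest, then set feature-split probabilities proportional to it. The implementation below is our own minimal illustration of sample distance covariance (double-centered distance matrices), not the authors' code:

```python
# Sample distance covariance between a feature and residuals, used to
# set feature-split probabilities for the residual random forest.

def dcov2(x, y):
    """Squared sample distance covariance between 1-D samples x and y."""
    n = len(x)
    def centered(v):
        d = [[abs(v[i] - v[j]) for j in range(n)] for i in range(n)]
        row = [sum(r) / n for r in d]
        grand = sum(row) / n
        return [[d[i][j] - row[i] - row[j] + grand for j in range(n)]
                for i in range(n)]
    A, B = centered(x), centered(y)
    return sum(A[i][j] * B[i][j] for i in range(n) for j in range(n)) / (n * n)

def split_probs(features, residuals):
    """Split probabilities proportional to dCov(feature, residuals)."""
    scores = [dcov2(col, residuals) ** 0.5 for col in features]
    total = sum(scores)
    return [s / total for s in scores] if total > 0 else \
           [1.0 / len(scores)] * len(scores)

# Residuals depend on feature 0 only, so its split probability dominates.
x0 = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
x1 = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
residuals = [2.0 * v for v in x0]
p = split_probs([x0, x1], residuals)
print([round(v, 3) for v in p])
```

Unlike a plain correlation, distance covariance is nonzero for any form of dependence, which is what lets the residual forest concentrate splits on the sparse set of features where source and target functions differ.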

replace-cross Qubit-centric Transformer for Surface Code Decoding

Authors: Seong-Joon Park, Hee-Youl Kwak, Yongjune Kim

Abstract: For reliable large-scale quantum computation, quantum error correction (QEC) is essential to protect logical information distributed across multiple physical qubits. Taking advantage of recent advances in deep learning, neural network-based decoders have emerged as a promising approach to improve the reliability of QEC. We propose the qubit-centric transformer (QCT), a novel and universal QEC decoder based on a transformer architecture with a qubit-centric attention mechanism. Our decoder transforms input syndromes from the stabilizer domain into qubit-centric tokens via a specialized embedding strategy. These qubit-centric tokens are processed through attention layers to effectively identify the underlying logical error. Furthermore, we introduce a graph-based masking method that incorporates the topological structure of quantum codes, enforcing attention toward relevant qubit interactions. Across various code distances for surface codes, QCT achieves state-of-the-art decoding performance, significantly outperforming existing neural decoders and the belief propagation (BP) with ordered statistics decoding (OSD) baseline. Notably, QCT achieves a high threshold of 18.1% under depolarizing noise, which closely approaches the theoretical bound of 18.9% and surpasses both the BP+OSD and the minimum-weight perfect matching (MWPM) thresholds. This qubit-centric approach provides a scalable and robust framework for surface code decoding, advancing the path toward fault-tolerant quantum computing.

replace-cross PRISM: Enhancing Protein Inverse Folding through Fine-Grained Retrieval on Structure-Sequence Multimodal Representations

Authors: Sazan Mahbub, Souvik Kundu, Eric P. Xing

Abstract: Designing protein sequences that fold into a target 3-D structure, termed the inverse folding problem, is central to protein engineering. However, it remains challenging due to the vast sequence space and the importance of local structural constraints. Existing deep learning approaches achieve strong recovery rates; however, they lack explicit mechanisms to reuse fine-grained structure-sequence patterns conserved across natural proteins. To mitigate this, we present PRISM, a multimodal retrieval-augmented generation framework for inverse folding. PRISM retrieves fine-grained representations of potential motifs from known proteins and integrates them with a hybrid self-cross attention decoder. PRISM is formulated as a latent-variable probabilistic model and implemented with an efficient approximation, combining theoretical grounding with practical scalability. Experiments across multiple benchmarks, including CATH-4.2, TS50, TS500, CAMEO 2022, and the PDB date split, demonstrate the fine-grained multimodal retrieval efficacy of PRISM in yielding SoTA perplexity and amino acid recovery, while also improving the foldability metrics (RMSD, TM-score, pLDDT).

replace-cross The Living Forecast: Evolving Day-Ahead Predictions into Intraday Reality

Authors: Kutay B\"olat, Peter Palensky, Simon Tindemans

Abstract: Accurate intraday forecasts are essential for power system operations, complementing day-ahead forecasts that gradually lose relevance as new information becomes available. This paper introduces a Bayesian updating mechanism that converts fully probabilistic day-ahead forecasts into intraday forecasts without retraining or re-inference. The approach conditions the Gaussian mixture output of a conditional variational autoencoder-based forecaster on observed measurements, yielding an updated distribution for the remaining horizon that preserves its probabilistic structure. This enables consistent point, quantile, and ensemble forecasts while remaining computationally efficient and suitable for real-time applications. Experiments on household electricity consumption and photovoltaic generation datasets demonstrate that the proposed method improves forecast accuracy by up to 25% across likelihood-, sample-, quantile-, and point-based metrics. The largest gains occur in time steps with strong temporal correlation to observed data, and the use of pattern dictionary-based covariance structures further enhances performance. The results highlight a theoretically grounded framework for intraday forecasting in modern power systems.
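
The conditioning step can be illustrated on a toy Gaussian mixture. Under a simplifying diagonal-covariance assumption (the paper's CVAE-based mixture is more general), observing the first coordinate reweights components by their marginal likelihoods via Bayes' rule, and each component's distribution over the remaining horizon is retained:

```python
# Condition a mixture of 2-D diagonal Gaussians on an observed first
# coordinate: weights update by marginal likelihood; with diagonal
# covariance, each component's second coordinate keeps its marginal.
import math

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def condition_mixture(weights, means, variances, observed):
    liks = [w * normal_pdf(observed, m[0], v[0])
            for w, m, v in zip(weights, means, variances)]
    z = sum(liks)
    new_weights = [l / z for l in liks]
    return new_weights, [m[1] for m in means], [v[1] for v in variances]

# Two "scenario" components: low-demand (mean 1) and high-demand (mean 5).
weights = [0.5, 0.5]
means = [(1.0, 1.0), (5.0, 5.0)]
variances = [(1.0, 1.0), (1.0, 1.0)]
w, mu, var = condition_mixture(weights, means, variances, observed=4.6)
print([round(x, 3) for x in w])  # observation near 5 favors the high component
```

With full (correlated) covariances the per-component conditional means and variances would also shift via the usual Gaussian conditioning formulas; the reweighting step is identical, which is why the update needs no retraining or re-inference.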

replace-cross AWARE: Audio Watermarking with Adversarial Resistance to Edits

Authors: Kosta Pavlovi\'c, Lazar Stanarevi\'c, Petar Nedi\'c, Elena Ne\v{s}ovi\'c, Slavko Kova\v{c}evi\'c, Igor Djurovi\'c

Abstract: Prevailing practice in learning-based audio watermarking is to pursue robustness by expanding the set of simulated distortions during training. However, such surrogates are narrow and prone to overfitting. This paper presents AWARE (Audio Watermarking with Adversarial Resistance to Edits), an alternative approach that avoids reliance on attack-simulation stacks and handcrafted differentiable distortions. Embedding is obtained through adversarial optimization in the time-frequency domain under a level-proportional perceptual budget. Detection employs a time-order-agnostic detector with a Bitwise Readout Head (BRH) that aggregates temporal evidence into one score per watermark bit, enabling reliable watermark decoding even under desynchronization and temporal cuts. Empirically, AWARE attains high audio quality and speech intelligibility (PESQ/STOI) and consistently low BER across various audio edits, often surpassing representative state-of-the-art learning-based systems.

replace-cross RESCUE: Retrieval Augmented Secure Code Generation

Authors: Jiahao Shi, Tianyi Zhang

Abstract: Despite recent advances, Large Language Models (LLMs) still generate vulnerable code. Retrieval-Augmented Generation (RAG) has the potential to enhance LLMs for secure code generation by incorporating external security knowledge. However, the conventional RAG design struggles with the noise of raw security-related documents, and existing retrieval methods overlook the significant security semantics implicitly embedded in task descriptions. To address these issues, we propose \textsc{Rescue}, a new RAG framework for secure code generation with two key innovations. First, we propose a hybrid knowledge base construction method that combines LLM-assisted cluster-then-summarize distillation with program slicing, producing both high-level security guidelines and concise, security-focused code examples. Second, we design a hierarchical multi-faceted retrieval that traverses the constructed knowledge base from top to bottom and integrates multiple security-critical facts at each hierarchical level, ensuring comprehensive and accurate retrieval. We evaluated \textsc{Rescue} on four benchmarks and compared it with five state-of-the-art secure code generation methods on six LLMs. The results demonstrate that \textsc{Rescue} improves the SecurePass@1 metric by an average of 4.8 points, establishing a new state-of-the-art performance for security. Furthermore, we performed in-depth analysis and ablation studies to rigorously validate the effectiveness of individual components in \textsc{Rescue}. Our code is available at https://github.com/steven1518/RESCUE.

URLs: https://github.com/steven1518/RESCUE

replace-cross HAMLOCK: HArdware-Model LOgically Combined attacK

Authors: Sanskar Amgain, Daniel Lobo, Atri Chatterjee, Swarup Bhunia, Fnu Suya

Abstract: The growing use of third-party hardware accelerators (e.g., FPGAs, ASICs) for deep neural networks (DNNs) introduces new security vulnerabilities. Conventional model-level backdoor attacks, which only poison a model's weights to misclassify inputs with a specific trigger, are often detectable because the entire attack logic is embedded within the model (i.e., software), creating a traceable layer-by-layer activation path. This paper introduces the HArdware-Model Logically Combined Attack (HAMLOCK), a far stealthier threat that distributes the attack logic across the hardware-software boundary. The software (model) is now only minimally altered by tuning the activations of a few neurons to produce uniquely high activation values when a trigger is present. A malicious hardware Trojan detects those unique activations by monitoring the corresponding neurons' most significant bit or the 8-bit exponents and triggers another hardware Trojan to directly manipulate the final output logits for misclassification. This decoupled design is highly stealthy, as the model itself contains no complete backdoor activation path as in conventional attacks and hence, appears fully benign. Empirically, across benchmarks like MNIST, CIFAR10, GTSRB, and ImageNet, HAMLOCK achieves a near-perfect attack success rate with a negligible clean accuracy drop. More importantly, HAMLOCK circumvents the state-of-the-art model-level defenses without any adaptive optimization. The hardware Trojan is also undetectable, incurring area and power overheads as low as 0.01%, which is easily masked by process and environmental noise. Our findings expose a critical vulnerability at the hardware-software interface, demanding new cross-layer defenses against this emerging threat.

replace-cross Taxonomy and Trends in Reinforcement Learning for Robotics and Control Systems: A Structured Review

Authors: Kumater Ter, Abolanle Adetifa, Daniel Udekwe

Abstract: Reinforcement learning (RL) has become a foundational approach for enabling intelligent robotic behavior in dynamic and uncertain environments. This work presents an in-depth review of RL principles, advanced deep reinforcement learning (DRL) algorithms, and their integration into robotic and control systems. Beginning with the formalism of Markov Decision Processes (MDPs), the study outlines essential elements of the agent-environment interaction and explores core algorithmic strategies including actor-critic methods, value-based learning, and policy gradients. Emphasis is placed on modern DRL techniques such as DDPG, TD3, PPO, and SAC, which have shown promise in solving high-dimensional, continuous control tasks. A structured taxonomy is introduced to categorize RL applications across domains such as locomotion, manipulation, multi-agent coordination, and human-robot interaction, along with training methodologies and deployment readiness levels. The review synthesizes recent research efforts, highlighting technical trends, design patterns, and the growing maturity of RL in real-world robotics. Overall, this work aims to bridge theoretical advances with practical implementations, providing a consolidated perspective on the evolving role of RL in autonomous robotic systems.

replace-cross SemBench: A Benchmark for Semantic Query Processing Engines

Authors: Jiale Lao, Andreas Zimmerer, Olga Ovcharenko, Tianji Cong, Matthew Russo, Gerardo Vitagliano, Michael Cochez, Fatma \"Ozcan, Gautam Gupta, Thibaud Hottelier, H. V. Jagadish, Kris Kissel, Sebastian Schelter, Andreas Kipf, Immanuel Trummer

Abstract: We present a benchmark targeting a novel class of systems: semantic query processing engines. Those systems rely inherently on generative and reasoning capabilities of state-of-the-art large language models (LLMs). They extend SQL with semantic operators, configured by natural language instructions, that are evaluated via LLMs and enable users to perform various operations on multimodal data. Our benchmark introduces diversity across three key dimensions: scenarios, modalities, and operators. Included are scenarios ranging from movie review analysis to car damage detection. Within these scenarios, we cover different data modalities, including images, audio, and text. Finally, the queries involve a diverse set of operators, including semantic filters, joins, mappings, ranking, and classification operators. We evaluated our benchmark on three academic systems (LOTUS, Palimpzest, and ThalamusDB) and one industrial system, Google BigQuery. Although these results reflect a snapshot of systems under continuous development, our study offers crucial insights into their current strengths and weaknesses, illuminating promising directions for future research.

replace-cross VLAD-Grasp: Zero-shot Grasp Detection via Vision-Language Models

Authors: Manav Kulshrestha, S. Talha Bukhari, Damon Conover, Aniket Bera

Abstract: Robotic grasping is a fundamental capability for enabling autonomous manipulation, a problem that typically admits infinitely many solutions. State-of-the-art approaches for grasping rely on learning from large-scale datasets comprising expert annotations of feasible grasps. Curating such datasets is challenging, and hence, learning-based methods are limited by the solution coverage of the dataset, and require retraining to handle novel objects. To this end, we present VLAD-Grasp, a Vision-Language model Assisted zero-shot approach for Detecting Grasps. Our method (1) prompts a large vision-language model to generate a goal image where a virtual cylindrical proxy intersects the object's geometry, explicitly encoding an antipodal grasp axis in image space, then (2) predicts depth and segmentation to lift this generated image into 3D, and (3) aligns generated and observed object point clouds via principal components and correspondence-free optimization to recover an executable grasp pose. Unlike prior work, our approach is training-free and does not require curated grasp datasets, while achieving performance competitive with the state-of-the-art methods on the Cornell and Jacquard datasets. Furthermore, we demonstrate zero-shot generalization to real-world objects on a Franka Research 3 robot, highlighting vision-language models as powerful priors for robotic manipulation.

replace-cross DRL-Based Beam Positioning for LEO Satellite Constellations with Weighted Least Squares

Authors: Po-Heng Chou, Chiapin Wang, Kuan-Hao Chen, Wei-Chen Hsiao

Abstract: This paper investigates a lightweight deep reinforcement learning (DRL)-assisted weighting framework for CSI-free multi-satellite positioning in LEO constellations, where each visible satellite provides one serving beam (one pilot response) per epoch. A discrete-action Deep Q-Network (DQN) learns satellite weights directly from received pilot measurements and geometric features, while an augmented weighted least squares (WLS) estimator provides physics-consistent localization and jointly estimates the receiver clock bias. The proposed hybrid design targets an accuracy-runtime trade-off rather than absolute supervised optimality. In a representative 2-D setting with 10 visible satellites, the proposed approach achieves sub-meter accuracy (0.395m RMSE) with low computational overhead, supporting practical deployment for resource-constrained LEO payloads.
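
The augmented WLS estimator can be sketched as a Gauss-Newton iteration on pseudoranges rho_i = ||p - s_i|| + b, jointly solving for the 2-D position and the clock bias. Here the per-satellite weights are fixed by hand, whereas in the paper a DQN produces them from pilot measurements; all names and the toy geometry are illustrative:

```python
# Gauss-Newton WLS for 2-D position (x, y) plus clock bias b from
# pseudoranges; weights enter via sqrt-weighting of Jacobian rows.
import math

def solve3(A, b):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(3):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * x for a, x in zip(M[r], M[c])]
    return [M[i][3] / M[i][i] for i in range(3)]

def wls_fix(sats, rhos, weights, x0=(0.0, 0.0, 0.0), iters=8):
    x, y, b = x0
    for _ in range(iters):
        H, r = [], []
        for (sx, sy), rho, w in zip(sats, rhos, weights):
            d = math.hypot(x - sx, y - sy)
            sw = math.sqrt(w)
            H.append([sw * (x - sx) / d, sw * (y - sy) / d, sw])
            r.append(sw * (rho - d - b))
        # Normal equations: (H^T H) dx = H^T r
        HtH = [[sum(h[i] * h[j] for h in H) for j in range(3)] for i in range(3)]
        Htr = [sum(h[i] * ri for h, ri in zip(H, r)) for i in range(3)]
        dx = solve3(HtH, Htr)
        x, y, b = x + dx[0], y + dx[1], b + dx[2]
    return x, y, b

# Synthetic scene: 5 satellites, true position (3, 4), clock bias 0.7.
sats = [(0.0, 20.0), (15.0, 18.0), (-12.0, 16.0), (20.0, 5.0), (-18.0, -2.0)]
true_x, true_y, true_b = 3.0, 4.0, 0.7
rhos = [math.hypot(true_x - sx, true_y - sy) + true_b for sx, sy in sats]
x, y, b = wls_fix(sats, rhos, [1.0, 0.8, 1.2, 0.5, 1.0])
print(round(x, 3), round(y, 3), round(b, 3))
```

The learned component of the hybrid design only changes the `weights` vector; the physics-consistent estimator itself is unchanged, which is what keeps the method lightweight on resource-constrained payloads.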

replace-cross Privacy-Preserving Explainable AIoT Application via SHAP Entropy Regularization

Authors: Dilli Prasad Sharma, Xiaowei Sun, Liang Xue, Xiaodong Lin, Pulei Xiong

Abstract: The widespread integration of Artificial Intelligence of Things (AIoT) in smart home environments has amplified the demand for transparent and interpretable machine learning models. To foster user trust and comply with emerging regulatory frameworks, Explainable AI (XAI) methods, particularly post-hoc techniques such as SHapley Additive exPlanations (SHAP) and Local Interpretable Model-Agnostic Explanations (LIME), are widely employed to elucidate model behavior. However, recent studies have shown that these explanation methods can inadvertently expose sensitive user attributes and behavioral patterns, thereby introducing new privacy risks. To address these concerns, we propose a novel privacy-preserving approach based on SHAP entropy regularization to mitigate privacy leakage in explainable AIoT applications. Our method incorporates an entropy-based regularization objective that penalizes low-entropy SHAP attribution distributions during training, promoting a more uniform spread of feature contributions. To evaluate the effectiveness of our approach, we developed a suite of SHAP-based privacy attacks that strategically leverage model explanation outputs to infer sensitive information. We validate our method through comparative evaluations using these attacks alongside utility metrics on benchmark smart home energy consumption datasets. Experimental results demonstrate that SHAP entropy regularization substantially reduces privacy leakage compared to baseline models, while maintaining high predictive accuracy and faithful explanation fidelity. This work contributes to the development of privacy-preserving explainable AI techniques for secure and trustworthy AIoT applications.
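
The regularizer can be sketched as an entropy penalty on the normalized absolute attribution vector: peaked attributions (which leak more about individual features) pay a cost, while uniformly spread ones pay none. The exact objective in the paper may differ; this is a minimal illustration:

```python
# Entropy penalty on SHAP-style attributions: add lam * (H_max - H(p))
# to the training loss, where p is the normalized |attribution| vector.
import math

def attribution_entropy(phi):
    """Shannon entropy of the normalized absolute attribution vector."""
    total = sum(abs(v) for v in phi)
    p = [abs(v) / total for v in phi]
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def entropy_penalty(phi, lam=0.1):
    h_max = math.log(len(phi))  # entropy of the uniform distribution
    return lam * (h_max - attribution_entropy(phi))

peaked  = [5.0, 0.1, 0.1, 0.1]   # one feature dominates: penalized
uniform = [1.0, 1.0, 1.0, 1.0]   # evenly spread contributions: no penalty
print(round(entropy_penalty(peaked), 4), round(entropy_penalty(uniform), 4))
```

Because the penalty is zero exactly at the uniform distribution, the hyperparameter lam trades off explanation fidelity against how aggressively attributions are flattened.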

replace-cross Decoupled Action Expert: Confining Task Knowledge to the Conditioning Pathway

Authors: Jian Zhou, Sihao Lin, Shuai Fu, Zerui Li, Gengze Zhou, Qi Wu

Abstract: Many recent Vision-Language-Action models employ diffusion or flow-matching backbones with hundreds of millions of parameters for action generation. However, unlike image synthesis where the output spans millions of diverse pixels, a manipulation policy generates only short sequences of low-dimensional, physically correlated action values, a far simpler target that should not demand such capacity. We confirm this intuition and show that task-specific knowledge in these policies can be fully confined to the conditioning pathway, leaving the action backbone task-agnostic. To establish this, we introduce a decoupled training recipe: a general-purpose action head is first pretrained on observation-free forward-kinematics data, then frozen while only the conditioning pathway is trained for downstream tasks. Using Diffusion Policy as a testbed, we show that on both MimicGen and LIBERO, a single frozen backbone shared across all tasks matches normally trained counterparts. This confirms that the action expert encodes little task-specific knowledge. Ablations show that the specific pretraining signal (joint positions, end-effector poses, or no conditioning at all) has no effect on downstream performance, indicating that the backbone learns only general trajectory structure. Pushing this finding further, we replace the 244M U-Net in Diffusion Policy with a 5M-parameter MLP backbone that matches or exceeds its performance, calling into question the large capacity budgets allocated to action generation in current VLA designs.

replace-cross Point of Order: Action-Aware LLM Persona Modeling for Realistic Civic Simulation

Authors: Scott Merrill, Shashank Srivastava

Abstract: Large language models offer opportunities to simulate multi-party deliberation, but realistic modeling remains limited by a lack of speaker-attributed data. Transcripts produced via automatic speech recognition (ASR) assign anonymous speaker labels (e.g., Speaker_1), preventing models from capturing consistent human behavior. This work introduces a reproducible pipeline to transform public Zoom recordings into speaker-attributed transcripts with metadata like persona profiles and pragmatic action tags (e.g., [propose_motion]). We release three local government deliberation datasets: Appellate Court hearings, School Board meetings, and Municipal Council sessions. Fine-tuning LLMs to model specific participants using this "action-aware" data produces a 67% reduction in perplexity and nearly doubles classifier-based performance metrics for speaker fidelity and realism. Turing-style human evaluations show our simulations are often indistinguishable from real deliberations, providing a practical and scalable method for complex realistic civic simulations.

replace-cross MapReduce LoRA: Advancing the Pareto Front in Multi-Preference Optimization for Generative Models

Authors: Chieh-Yun Chen, Zhonghao Wang, Qi Chen, Zhifan Ye, Min Shi, Yue Zhao, Yinan Zhao, Hui Qu, Wei-An Lin, Yiru Shen, Ajinkya Kale, Irfan Essa, Humphrey Shi

Abstract: Reinforcement learning from human feedback (RLHF) with reward models has advanced alignment of generative models to human aesthetic and perceptual preferences. However, jointly optimizing multiple rewards often incurs an alignment tax, improving one dimension while degrading others. To address this, we introduce two complementary methods: MapReduce LoRA and Reward-aware Token Embedding (RaTE). MapReduce LoRA trains preference-specific LoRA experts in parallel and iteratively merges them to refine a shared base model; RaTE learns reward-specific token embeddings that compose at inference for flexible preference control. Experiments on Text-to-Image generation (Stable Diffusion 3.5 Medium and FLUX.1-dev) show improvements of 36.1%, 4.6%, and 55.7%, and 32.7%, 4.3%, and 67.1% on GenEval, PickScore, and OCR, respectively. On Text-to-Video generation (HunyuanVideo), visual and motion quality improve by 48.1% and 90.0%, respectively. On the language task, Helpful Assistant, with Llama-2 7B, helpfulness and harmlessness improve by 43.4% and 136.7%, respectively. Our framework establishes a new state-of-the-art recipe for multi-preference alignment across modalities.

replace-cross Uncertainty Quantification and Data Efficiency in AI: An Information-Theoretic Perspective

Authors: Osvaldo Simeone, Yaniv Romano

Abstract: In context-specific applications such as robotics, telecommunications, and healthcare, artificial intelligence systems often face the challenge of limited training data. This scarcity introduces epistemic uncertainty, i.e., reducible uncertainty stemming from incomplete knowledge of the underlying data distribution, which fundamentally limits predictive performance. This review paper examines formal methodologies that address data-limited regimes through two complementary approaches: quantifying epistemic uncertainty and mitigating data scarcity via synthetic data augmentation. We begin by reviewing generalized Bayesian learning frameworks that characterize epistemic uncertainty through generalized posteriors in the model parameter space, as well as "post-Bayes" learning frameworks. We continue by presenting information-theoretic generalization bounds that formalize the relationship between training data quantity and predictive uncertainty, providing a theoretical justification for generalized Bayesian learning. Moving beyond methods with asymptotic statistical validity, we survey uncertainty quantification methods that provide finite-sample statistical guarantees, including conformal prediction and conformal risk control. Finally, we examine recent advances in data efficiency by combining limited labeled data with abundant model predictions or synthetic data. Throughout, we take an information-theoretic perspective, highlighting the role of information measures in quantifying the impact of data scarcity.
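As a concrete instance of the finite-sample guarantees the survey covers, split conformal prediction can be sketched in a few lines. This is the generic textbook construction, not code from the paper; under exchangeability it yields intervals with coverage at least 1 - alpha:

```python
# Minimal split conformal prediction sketch (generic construction, not the
# paper's code). A held-out calibration set turns any fitted predictor's
# residuals into prediction intervals with finite-sample coverage.
import math

def conformal_interval(predict, calib_x, calib_y, x_new, alpha=0.1):
    # Nonconformity scores: absolute residuals on the calibration set.
    scores = sorted(abs(y - predict(x)) for x, y in zip(calib_x, calib_y))
    n = len(scores)
    # Conservative quantile index ceil((n+1)(1-alpha)), clipped to n.
    k = min(n, math.ceil((n + 1) * (1 - alpha)))
    q = scores[k - 1]
    y_hat = predict(x_new)
    return y_hat - q, y_hat + q

# Toy usage: predictor y = 2x on calibration data with +/-0.5 noise.
predict = lambda x: 2.0 * x
calib_x = [float(i) for i in range(20)]
calib_y = [2.0 * x + (0.5 if i % 2 else -0.5) for i, x in enumerate(calib_x)]
lo, hi = conformal_interval(predict, calib_x, calib_y, x_new=10.0, alpha=0.1)
```

The guarantee is distribution-free: nothing about the predictor or the noise model is assumed beyond exchangeability of calibration and test points.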

replace-cross Echo-CoPilot: A Multiple-Perspective Agentic Framework for Reliable Echocardiography Interpretation

Authors: Moein Heidari, Ali Mehrabian, Mohammad Amin Roohi, Wenjin Chen, David J. Foran, Jasmine Grewal, Ilker Hacihaliloglu

Abstract: Echocardiography interpretation requires integrating multi-view temporal evidence with quantitative measurements and guideline-grounded reasoning, yet existing foundation-model pipelines largely solve isolated subtasks and fail when tool outputs are noisy or values fall near clinical cutoffs. We propose Echo-CoPilot, an end-to-end agentic framework that combines a multi-perspective workflow with knowledge-graph-guided measurement selection. Echo-CoPilot runs three independent ReAct-style agents (structural, pathological, and quantitative) that invoke specialized echocardiography tools to extract parameters while querying EchoKG to determine which measurements are required for the clinical question and which should be avoided. A self-contrast language model then compares the evidence-grounded perspectives, generates a discrepancy checklist, and re-queries EchoKG to apply the appropriate guideline thresholds and resolve conflicts, reducing hallucinated measurement selection and borderline flip-flops. On MIMICEchoQA, Echo-CoPilot provides higher accuracy compared to SOTA baselines and, under a stochasticity stress test, achieves higher reliability through more consistent conclusions and fewer answer changes across repeated runs. Our code is publicly available at https://github.com/moeinheidari7829/Echo-CoPilot.

URLs: https://github.com/moeinheidari7829/Echo-CoPilot

replace-cross Towards High-Fidelity Gaussian Splatting with Queried-Convolution Neural Networks

Authors: Abhinav Kumar, Tristan Aumentado-Armstrong, Lazar Valkov, Gopal Sharma, Alex Levinshtein, Radek Grzeszczuk, Suren Kumar

Abstract: Gaussian Splatting has revolutionized the field of Novel View Synthesis (NVS) with faster training and real-time rendering. However, its reconstruction fidelity still trails behind powerful radiance models such as Zip-NeRF. Motivated by our theoretical result that both queries (such as coordinates) and neighborhood are important for learning high-fidelity signals, this paper proposes Queried-Convolutions (Qonvolutions), a simple yet powerful modification using the neighborhood properties of convolution. Qonvolutions convolve a low-fidelity signal with queries to output a residual and achieve high-fidelity reconstruction. We empirically demonstrate that combining Gaussian splatting with Qonvolution neural networks (QNNs) results in state-of-the-art NVS on real-world scenes, even outperforming Zip-NeRF on image fidelity. QNNs also enhance performance on 1D regression, 2D regression, and 2D super-resolution tasks.

replace-cross Two-Step Data Augmentation for Masked Face Detection and Recognition: Turning Fake Masks to Real

Authors: Yan Yang, George Bebis, Mircea Nicolescu

Abstract: The absence of large-scale masked face datasets challenges masked face detection and recognition. We propose a two-step generative data augmentation framework combining rule-based mask warping with unpaired image-to-image translation via GANs, producing masked face samples that go beyond rule-based overlays. Trained on about 19,100 target-domain images (3.8% of IAMGAN's scale), or 59,600 images (11.8%) when out-of-domain transfer pretraining is included, the proposed approach yields consistent improvements over rule-based warping alone and achieves results complementary to IAMGAN's, showing that both steps contribute. Evaluation is conducted directly on the generated samples and is qualitative; quantitative metrics like FID and KID were not applied, as any real reference distribution would unfairly favor the model with closer training data. We introduce a non-mask preservation loss to reduce non-mask distortions and stabilize training, and stochastic noise injection to enhance sample diversity. Note: The paper originated as a coursework submission completed under resource constraints. Following an inexplicable scholarship termination, the author took on part-time employment to maintain research continuity, which led to a mid-semester domain pivot from medical imaging to masked face tasks due to company data restrictions. The work was completed alongside concurrent coursework with delayed compute access and without any AI assistance. It was submitted to a small venue at the semester end under an obligatory publication requirement and accepted without revision requests. Subsequent invitations to submit to first-tier venues were not pursued due to continued funding absence. Downstream evaluation of recognition or detection performance was not completed by the submission deadline. The note is added in response to subsequent comparisons and criticisms that did not account for these conditions.

replace-cross Assessing generative modeling approaches for free energy estimates in condensed matter

Authors: Maximilian Schebek, Jiajun He, Emil Hoffmann, Yuanqi Du, Frank Noé, Jutta Rogal

Abstract: The accurate estimation of free energy differences between two states is a long-standing challenge in molecular simulations. Traditional approaches generally rely on sampling multiple intermediate states to ensure sufficient overlap in phase space and are, consequently, computationally expensive. Boltzmann Generators and related generative-model-based methods have recently addressed this challenge by learning a direct probability density transform between two states. However, it remains unclear which approach provides the best trade-off between efficiency, accuracy, and scalability. In this work, we review and benchmark selected generative approaches for condensed-matter systems, including discrete and continuous normalizing flows for targeted free energy perturbation and FEAT (Free Energy Estimators with Adaptive Transport) combined with the escorted Jarzynski equality, using coarse-grained monatomic ice and Lennard-Jones solids as benchmark systems. All models yield highly accurate free energy estimates and, depending on the system, may require fewer energy evaluations than traditional methods. Continuous flows and FEAT are most efficient in energy evaluations, whereas discrete flows have substantially lower inference cost. By releasing all data together with our results, we enable future benchmarking of free energy estimation methods in condensed-phase systems.

replace-cross AceFF: A State-of-the-Art Machine Learning Potential for Small Molecules

Authors: Stephen E. Farr, Stefan Doerr, Antonio Mirarchi, Francesc Sabanes Zariquiey, Gianni De Fabritiis

Abstract: We introduce AceFF, a pre-trained machine learning interatomic potential (MLIP) optimized for small molecule drug discovery. While MLIPs have emerged as efficient alternatives to Density Functional Theory (DFT), generalizability across diverse chemical spaces remains difficult. AceFF addresses this via a refined TensorNet2 architecture trained on a comprehensive dataset of drug-like compounds. This approach yields a force field that balances high-throughput inference speed with DFT-level accuracy. AceFF fully supports the essential medicinal chemistry elements (H, B, C, N, O, F, Si, P, S, Cl, Br, I) and is explicitly trained to handle charged states. Validation against rigorous benchmarks, including complex torsional energy scans, molecular dynamics trajectories, batched minimizations, and tests of force and energy accuracy, demonstrates that AceFF is state-of-the-art for organic molecules in the accuracy and speed regime important for drug discovery. The AceFF-2 model weights and inference code are available at https://huggingface.co/Acellera/AceFF-2.0.

URLs: https://huggingface.co/Acellera/AceFF-2.0

replace-cross Channel-Selected Stratified Nested Cross-Validation for Clinically Relevant EEG-Based Parkinson's Disease Detection

Authors: Nicholas R. Rasmussen, Rodrigue Rizk, Longwei Wang, Arun Singh, KC Santosh

Abstract: The early detection of Parkinson's disease remains a critical challenge in clinical neuroscience, with electroencephalography offering a noninvasive and scalable pathway toward population-level screening. While machine learning has shown promise in this domain, many reported results suffer from methodological flaws, most notably patient-level data leakage, inflating performance estimates and limiting clinical translation. To address these modeling pitfalls, we propose a unified evaluation framework grounded in nested cross-validation and incorporating three complementary safeguards: (i) patient-level stratification to eliminate subject overlap and ensure unbiased generalization, (ii) multi-layered windowing to harmonize heterogeneous EEG recordings while preserving temporal dynamics, and (iii) inner-loop channel selection to enable principled feature reduction without information leakage. Applied across three independent datasets with a heterogeneous number of channels, a convolutional neural network trained under this framework achieved 80.6% accuracy and demonstrated state-of-the-art performance under held-out population-block testing, comparable to other methods in the literature. This performance underscores the necessity of nested cross-validation as a safeguard against bias and as a principled means of selecting the most relevant information for patient-level decisions, providing a reproducible foundation that can extend to other biomedical signal analysis domains.
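The first safeguard, patient-level stratification, amounts to guaranteeing that every EEG window from a given patient lands in exactly one fold. A minimal illustration (not the authors' pipeline; the round-robin fold assignment and the toy samples are assumptions):

```python
# Sketch of patient-level fold assignment to prevent subject leakage: all
# windows from one patient share a fold, so no patient appears in both a
# training and a test split. Illustrative only, not the authors' code.
def patient_folds(samples, n_folds=3):
    """samples: list of (patient_id, window). Returns one fold index per sample."""
    patients = sorted({pid for pid, _ in samples})
    # Simple round-robin over patients; any patient-level scheme works.
    fold_of = {pid: i % n_folds for i, pid in enumerate(patients)}
    return [fold_of[pid] for pid, _ in samples]

samples = [("p1", 0), ("p1", 1), ("p2", 0), ("p3", 0), ("p3", 1), ("p4", 0)]
folds = patient_folds(samples)
```

Window-level (rather than patient-level) splitting is exactly the leakage the abstract warns about: windows from the same recording end up on both sides of the split.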

replace-cross Imagine-then-Plan: Agent Learning from Adaptive Lookahead with World Models

Authors: Youwei Liu, Jian Wang, Hanlin Wang, Beichen Guo, Wenjie Li

Abstract: Recent advances in world models have shown promise for modeling future dynamics of environmental states, enabling agents to reason and act without accessing real environments. Current methods mainly perform single-step or fixed-horizon rollouts, leaving their potential for complex task planning under-exploited. We propose Imagine-then-Plan (ITP), a unified framework for agent learning via lookahead imagination, where an agent's policy model interacts with the learned world model, yielding multi-step "imagined" trajectories. Since the imagination horizon may vary by tasks and stages, we introduce a novel adaptive lookahead mechanism by trading off the ultimate goal and task progress. The resulting imagined trajectories provide rich signals about future consequences, such as achieved progress and potential conflicts, which are fused with current observations, formulating a partially observable and imaginable Markov decision process to guide policy learning. We instantiate ITP with both training-free and reinforcement-trained variants. Extensive experiments across representative agent benchmarks demonstrate that ITP significantly outperforms competitive baselines. Further analyses validate that our adaptive lookahead largely enhances agents' reasoning capability, providing valuable insights into addressing broader, complex tasks. Our code and data will be publicly available at https://github.com/loyiv/ITP.

URLs: https://github.com/loyiv/ITP

replace-cross Stable Differentiable Modal Synthesis for Learning Nonlinear Dynamics

Authors: Victor Zheleznov, Stefan Bilbao, Alec Wright, Simon King

Abstract: Modal methods are a long-standing approach to physical modelling synthesis. Extensions to nonlinear problems are possible, leading to coupled nonlinear systems of ordinary differential equations. Recent work in scalar auxiliary variable techniques has enabled construction of explicit and stable numerical solvers for such systems. On the other hand, neural ordinary differential equations have been successful in modelling nonlinear systems from data. In this work, we examine how scalar auxiliary variable techniques can be combined with neural ordinary differential equations to yield a stable differentiable model capable of learning nonlinear dynamics. The proposed approach leverages the analytical solution for linear vibration of the system's modes so that physical parameters of a system remain easily accessible after the training without the need for a parameter encoder in the model architecture. Compared to our previous work that used multilayer perceptrons to parametrise nonlinear dynamics, we employ gradient networks that allow an interpretation in terms of a closed-form and non-negative potential required by scalar auxiliary variable techniques. As a proof of concept, we generate synthetic data for the nonlinear transverse vibration of a string and show that the model can be trained to reproduce the nonlinear dynamics of the system. Sound examples are presented.

replace-cross Denoising the Deep Sky: Physics-Based CCD Noise Formation for Astronomical Imaging

Authors: Shuhong Liu, Xining Ge, Ziying Gu, Quanfeng Xu, Lin Gu, Ziteng Cui, Xuangeng Chu, Jun Liu, Dong Li, Tatsuya Harada

Abstract: Astronomical imaging remains noise-limited under practical observing conditions. Standard calibration pipelines remove structured artifacts but largely leave stochastic noise unresolved. Although learning-based denoising has shown strong potential, progress is constrained by scarce paired training data and the requirement for physically interpretable models in scientific workflows. We propose a physics-based noise synthesis framework tailored to CCD noise formation in the telescope. The pipeline models photon shot noise, photo-response non-uniformity, dark-current noise, readout effects, and localized outliers arising from cosmic-ray hits and hot pixels. To obtain low-noise inputs for synthesis, we stack multiple unregistered exposures to produce high-SNR bases. Realistic noisy counterparts synthesized from these bases using our noise model enable the construction of abundant paired datasets for supervised learning. Extensive experiments on our real-world multi-band dataset curated from two ground-based telescopes demonstrate the effectiveness of our framework in both photometric and scientific accuracy.

replace-cross CRAFT: Calibrated Reasoning with Answer-Faithful Traces via Reinforcement Learning for Multi-Hop Question Answering

Authors: Yu Liu, Wenxiao Zhang, Diandian Guo, Cong Cao, Fangfang Yuan, Qiang Sun, Yanbing Liu, Jin B. Hong, Zhiyuan Ma

Abstract: Retrieval-augmented large language models, when optimized with outcome-level rewards, can achieve strong answer accuracy on multi-hop questions. However, under noisy retrieval, models frequently suffer from "right-answer-wrong-reason failures": they may exploit spurious shortcuts or produce reasoning traces weakly grounded in the supporting evidence. Furthermore, the lack of structured output control prevents reliable auditing of the underlying reasoning quality. To address this, we propose CRAFT (Calibrated Reasoning with Answer-Faithful Traces), a reinforcement learning framework for the response generation stage of retrieval-augmented multi-hop question answering. CRAFT trains models to produce structured reasoning traces with configurable levels of auditability (e.g., by selectively retaining planning, evidence citation, or reasoning steps). Training combines two complementary forms of supervision: deterministic rewards enforce verifiable constraints, including format compliance, answer correctness, and citation-set validity, while a judge-based reward audits semantic faithfulness by evaluating reasoning consistency and evidence grounding. Experiments show that CRAFT improves both answer accuracy and reasoning faithfulness across model scales. Notably, semantic judge-based rewards improve answer accuracy rather than compromise it, enabling CRAFT (7B) to achieve performance competitive with strong closed-source models.

replace-cross HoRD: Robust Humanoid Control via History-Conditioned Reinforcement Learning and Online Distillation

Authors: Puyue Wang, Jiawei Hu, Yan Gao, Junyan Wang, Yu Zhang, Gillian Dobbie, Tao Gu, Wafa Johal, Ting Dang, Hong Jia

Abstract: Humanoid robots can suffer significant performance drops under small changes in dynamics, task specifications, or environment setup. We propose HoRD, a two-stage learning framework for robust humanoid control under domain shift. First, we train a high-performance teacher policy via history-conditioned reinforcement learning, where the policy infers latent dynamics context from recent state-action trajectories to adapt online to diverse randomized dynamics. Second, we perform online distillation to transfer the teacher's robust control capabilities into a transformer-based student policy that operates on sparse root-relative 3D joint keypoint trajectories. By combining history-conditioned adaptation with online distillation, HoRD enables a single policy to adapt zero-shot to unseen domains without per-domain retraining. Extensive experiments show HoRD outperforms strong baselines in robustness and transfer, especially under unseen domains and external perturbations. Code and project page are available at https://tonywang-0517.github.io/hord/.

URLs: https://tonywang-0517.github.io/hord/

replace-cross Circuit Representations of Random Forests with Applications to XAI

Authors: Chunxi Ji, Adnan Darwiche

Abstract: We make three contributions in this paper. First, we present an approach for compiling a random forest classifier into a set of circuits, where each circuit directly encodes the instances in some class of the classifier. We show empirically that our proposed approach is significantly more efficient than existing similar approaches. Next, we utilize this approach to further obtain circuits that are tractable for computing the complete and general reasons of a decision, which are instance abstractions that play a fundamental role in computing explanations. Finally, we propose algorithms for computing the robustness of a decision and all shortest ways to flip it. We illustrate the utility of our contributions by using them to enumerate all sufficient reasons, necessary reasons and contrastive explanations of decisions; to compute the robustness of decisions; and to identify all shortest ways to flip the decisions made by random forest classifiers learned from a wide range of datasets.

replace-cross The Wisdom of Many Queries: Complexity-Diversity Principle for Dense Retriever Training

Authors: Xincan Feng, Noriki Nishida, Yusuke Sakai, Yuji Matsumoto

Abstract: Synthetic query generation has become essential for training dense retrievers, yet prior methods generate one query per document, focusing solely on query quality. We are the first to systematically study multi-query synthesis and discover a quality-diversity trade-off: high-quality queries benefit in-domain tasks, while diverse queries benefit out-of-domain (OOD) generalization. Through controlled experiments on 4 benchmark types across Contriever, RetroMAE, and Qwen3-Embedding, we find that diversity benefit strongly correlates with query complexity (r ≥ 0.95, p < 0.05), approximated by content words (CW). We formalize this as the Complexity-Diversity Principle (CDP): query complexity determines optimal diversity. Based on CDP, we propose complexity-aware training: multi-query synthesis for high-complexity tasks and CW-weighted training for existing data. Both strategies improve OOD performance on reasoning-intensive benchmarks, with compounded gains when combined.
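The content-word (CW) proxy for query complexity can be illustrated with a toy counter; the stopword list and example queries below are hypothetical, and real pipelines would use a proper POS tagger or stopword lexicon:

```python
# Toy approximation of query complexity by content-word (CW) count, the
# proxy the abstract describes. The stopword list here is an illustrative
# stand-in for a real function-word lexicon.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "is", "are", "what",
             "which", "how", "do", "does", "to", "and", "or"}

def content_words(query):
    # Keep tokens that carry content, i.e. are not function words.
    return [w for w in query.lower().split() if w not in STOPWORDS]

simple_q = "what is the capital of France"
complex_q = "compare the economic impact of renewable subsidies in Germany and France"
cw_simple = len(content_words(simple_q))
cw_complex = len(content_words(complex_q))
```

Under the Complexity-Diversity Principle, a higher CW count like the second query's would call for more diverse multi-query synthesis.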

replace-cross SERFN: Sample-Efficient Real-World Dexterous Policy Fine-Tuning via Action-Chunked Critics and Normalizing Flows

Authors: Chenyu Yang, Denis Tarasov, Davide Liconti, Hehui Zheng, Robert K. Katzschmann

Abstract: Real-world fine-tuning of dexterous manipulation policies remains challenging due to limited real-world interaction budgets and highly multimodal action distributions. Diffusion-based policies, while expressive, do not permit conservative likelihood-based updates during fine-tuning because action probabilities are intractable. In contrast, conventional Gaussian policies collapse under multimodality, particularly when actions are executed in chunks, and standard per-step critics fail to align with chunked execution, leading to poor credit assignment. We present SERFN, a sample-efficient off-policy fine-tuning framework built on normalizing flows (NFs) to address these challenges. The normalizing flow policy yields exact likelihoods for multimodal action chunks, allowing conservative, stable policy updates through likelihood regularization and thereby improving sample efficiency. An action-chunked critic evaluates entire action sequences, aligning value estimation with the policy's temporal structure and improving long-horizon credit assignment. To our knowledge, this is the first demonstration of a likelihood-based, multimodal generative policy combined with chunk-level value learning on real robotic hardware. We evaluate SERFN on two challenging dexterous manipulation tasks in the real world: cutting tape with scissors retrieved from a case, and in-hand cube rotation with a palm-down grasp, both of which require precise, dexterous control over long horizons. On these tasks, SERFN achieves stable, sample-efficient adaptation where standard methods struggle.
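The exact-likelihood property that distinguishes normalizing-flow policies from diffusion policies comes from the change-of-variables formula: an invertible map with a tractable Jacobian turns a base density into an exact log-probability. A one-dimensional affine-flow sketch (generic illustration, unrelated to SERFN's actual architecture):

```python
# Change-of-variables sketch behind exact flow likelihoods (generic, not
# SERFN's model). For an invertible map a = f(z) = scale*z + shift with a
# standard normal base density: log p(a) = log N(f^{-1}(a); 0, 1) - log|scale|.
import math

def affine_flow_logprob(a, scale, shift):
    z = (a - shift) / scale                          # inverse map f^{-1}
    log_base = -0.5 * (z * z + math.log(2 * math.pi))  # standard normal log-density
    return log_base - math.log(abs(scale))           # Jacobian correction

lp = affine_flow_logprob(0.0, 1.0, 0.0)  # identity flow at the mode
```

Stacking many such invertible layers keeps the likelihood exact while making the action distribution multimodal, which is what enables the conservative likelihood regularization the abstract describes; a diffusion policy has no comparable closed form.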

replace-cross GOT-JEPA: Generic Object Tracking with Model Adaptation and Occlusion Handling using Joint-Embedding Predictive Architecture

Authors: Shih-Fang Chen, Jun-Cheng Chen, I-Hong Jhuo, Yen-Yu Lin

Abstract: The human visual system tracks objects by integrating current observations with previously observed information, adapting to target and scene changes, and reasoning about occlusion at fine granularity. In contrast, recent generic object trackers are often optimized for training targets, which limits robustness and generalization in unseen scenarios, and their occlusion reasoning remains coarse, lacking detailed modeling of occlusion patterns. To address these limitations in generalization and occlusion perception, we propose GOT-JEPA, a model-predictive pretraining framework that extends JEPA from predicting image features to predicting tracking models. Given identical historical information, a teacher predictor generates pseudo-tracking models from a clean current frame, and a student predictor learns to predict the same pseudo-tracking models from a corrupted version of the current frame. This design provides stable pseudo supervision and explicitly trains the predictor to produce reliable tracking models under occlusions, distractors, and other adverse observations, improving generalization to dynamic environments. Building on GOT-JEPA, we further propose OccuSolver to enhance occlusion perception for object tracking. OccuSolver adapts a point-centric point tracker for object-aware visibility estimation and detailed occlusion-pattern capture. Conditioned on object priors iteratively generated by the tracker, OccuSolver incrementally refines visibility states, strengthens occlusion handling, and produces higher-quality reference labels that progressively improve subsequent model predictions. Extensive evaluations on seven benchmarks show that our method effectively enhances tracker generalization and robustness.

replace-cross MARVL: Multi-Stage Guidance for Robotic Manipulation via Vision-Language Models

Authors: Xunlan Zhou, Xuanlin Chen, Shaowei Zhang, Xiangkun Li, ShengHua Wan, Xiaohai Hu, Lei Yuan, Le Gan, De-chuan Zhan

Abstract: Designing dense reward functions is pivotal for efficient robotic Reinforcement Learning (RL). However, most dense rewards rely on manual engineering, which fundamentally limits the scalability and automation of reinforcement learning. While Vision-Language Models (VLMs) offer a promising path to reward design, naive VLM rewards often misalign with task progress, struggle with spatial grounding, and show limited understanding of task semantics. To address these issues, we propose MARVL (Multi-stAge guidance for Robotic manipulation via Vision-Language models). MARVL fine-tunes a VLM for spatial and semantic consistency and decomposes tasks into multi-stage subtasks with task direction projection for trajectory sensitivity. Empirically, MARVL significantly outperforms existing VLM-reward methods on the Meta-World benchmark, demonstrating superior sample efficiency and robustness on sparse-reward manipulation tasks.

replace-cross Investigating Nonlinear Quenching Effects on Polar Field Buildup in the Sun Using Physics-Informed Neural Networks

Authors: Jithu J. Athalathil, Mohammed H. Talafha, Bhargav Vaidya

Abstract: The solar dynamo relies on the regeneration of the poloidal magnetic field through processes strongly modulated by nonlinear feedbacks such as tilt quenching (TQ) and latitude quenching (LQ). These mechanisms play a decisive role in regulating the buildup of the Sun's polar field and, in turn, the amplitude of future solar cycles. In this work, we employ Physics-Informed Neural Networks (PINN) to solve the surface flux transport (SFT) equation, embedding physical constraints directly into the neural network framework. By systematically varying transport parameters, we isolate the relative contributions of TQ and LQ to polar dipole buildup. We use the residual dipole moment as a diagnostic for cycle-to-cycle amplification and show that TQ suppression strengthens with increasing diffusivity, while LQ dominates in advection-dominated regimes. The ratio ΔD_LQ/ΔD_TQ exhibits a smooth inverse-square dependence on the dynamo effectivity range, refining previous empirical fits with improved accuracy and reduced scatter. The results further reveal that a decay term is not essential in the PINN set-up, owing to the training process. Compared with the traditional 1D SFT model, the PINN framework achieves significantly lower error metrics and more robust recovery of nonlinear trends. Our results suggest that the nonlinear interplay between LQ and TQ can naturally produce alternations between weak and strong cycles, providing a physical explanation for the observed even-odd cycle modulation. These findings demonstrate the potential of PINN as an accurate, efficient, and physically consistent tool for solar cycle prediction.

replace-cross Aletheia tackles FirstProof autonomously

Authors: Tony Feng, Junehyuk Jung, Sang-hyun Kim, Carlo Pagano, Sergei Gukov, Chiang-Chiang Tsai, David Woodruff, Adel Javanmard, Aryan Mokhtari, Dawsen Hwang, Yuri Chervonyi, Jonathan N. Lee, Garrett Bingham, Trieu H. Trinh, Vahab Mirrokni, Quoc V. Le, Thang Luong

Abstract: We report the performance of Aletheia (Feng et al., 2026b), a mathematics research agent powered by Gemini 3 Deep Think, on the inaugural FirstProof challenge. Within the allowed timeframe of the challenge, Aletheia autonomously solved 6 problems (2, 5, 7, 8, 9, 10) out of 10 according to majority expert assessments; we note that experts were not unanimous on Problem 8 (only). For full transparency, we explain our interpretation of FirstProof and disclose details about our experiments as well as our evaluation. Raw prompts and outputs are available at https://github.com/google-deepmind/superhuman/tree/main/aletheia.

URLs: https://github.com/google-deepmind/superhuman/tree/main/aletheia

replace-cross Under the Influence: Quantifying Persuasion and Vigilance in Large Language Models

Authors: Sasha Robinson, Katherine M. Collins, Ilia Sucholutsky, Kelsey R. Allen

Abstract: With increasing integration of Large Language Models (LLMs) into areas of high-stakes human decision-making, it is important to understand the risks they introduce as advisors. To be useful advisors, LLMs must sift through large amounts of content, written with both benevolent and malicious intent, and then use this information to convince a user to take a specific action. This involves two social capacities: vigilance (the ability to determine which information to use, and which to discard) and persuasion (synthesizing the available evidence to make a convincing argument). While existing work has investigated these capacities in isolation, there has been little prior investigation of how these capacities may be linked. Here, we use a simple multi-turn puzzle-solving game, Sokoban, to study LLMs' abilities to persuade and be rationally vigilant towards other LLM agents. We find that puzzle-solving performance, persuasive capability, and vigilance are dissociable capacities in LLMs. Performing well on the game does not automatically mean a model can detect when it is being misled, even if the possibility of deception is explicitly mentioned. However, LLMs do consistently modulate their token use, using fewer tokens to reason when advice is benevolent and more when it is malicious, even if they are still persuaded to take actions leading them to failure. To our knowledge, our work presents the first investigation of the relationship between persuasion, vigilance, and task performance in LLMs, and suggests that monitoring all three independently will be critical for future work in AI safety.

replace-cross Precedence-Constrained Decision Trees and Coverings

Authors: Michał Szyfelbein, Dariusz Dereniowski

Abstract: This work considers a number of optimization problems and reductive relations between them. The two main problems we are interested in are Optimal Decision Tree and Set Cover. We study these two fundamental tasks under precedence constraints: if a test (or set) X is a predecessor of Y, then in any feasible decision tree X must be an ancestor of Y (respectively, if Y is added to the set cover, then so must be X). For the Optimal Decision Tree we consider two optimization criteria: worst-case identification time (the height of the tree) and average identification time. Similarly, for Set Cover we study two cost measures: the size of the cover and the average cover time. Our approach is to develop a number of algorithmic reductions, where an approximation algorithm for one problem provides an approximation for another via black-box usage of a procedure for the former. En route we introduce other optimization problems, either to complete the 'reduction landscape' or because they capture the essence of the combinatorial structure of our problems. The latter is brought out by the problem of finding a maximum-density precedence-closed subfamily, where density is defined as the ratio of the number of items the family covers to its size. By doing so we provide O*(√m)-approximation algorithms for all of the aforementioned problems. The picture is complemented by a number of hardness reductions that provide o(m^(1/12-ε))-inapproximability results for the decision tree and covering problems. Besides giving a complete set of results for general precedence constraints, we also provide polylogarithmic approximation guarantees for the two most typically studied and applicable precedence types, outforests and inforests. By providing corresponding hardness results, we show these results to be tight.
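The density notion at the heart of those reductions, the ratio of items covered by a precedence-closed subfamily to its size, can be made concrete with a small sketch. The closure routine and example instance below are illustrative only, not the paper's algorithm:

```python
# Illustrative computation of the "density" the abstract defines: for a
# precedence-closed subfamily F, density(F) = |items covered by F| / |F|.
# Choosing a set forces all of its (transitive) predecessors into F.
def closure(preds, chosen):
    """Add all transitive predecessors so the subfamily is precedence-closed."""
    out = set(chosen)
    stack = list(chosen)
    while stack:
        s = stack.pop()
        for p in preds.get(s, ()):
            if p not in out:
                out.add(p)
                stack.append(p)
    return out

def density(sets, preds, chosen):
    fam = closure(preds, chosen)
    covered = set().union(*(sets[s] for s in fam))
    return len(covered) / len(fam)

# Toy instance with precedence chain X -> Y -> Z.
sets = {"X": {1, 2}, "Y": {2, 3, 4}, "Z": {5}}
preds = {"Y": ["X"], "Z": ["Y"]}
d = density(sets, preds, {"Y"})  # picking Y forces X; family covers {1,2,3,4}
```

A maximum-density subroutine of this kind can then drive a greedy covering algorithm, which is the role the abstract assigns to this subproblem.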

replace-cross Survey on Neural Routing Solvers

Authors: Yunpeng Ba, Xi Lin, Changliang Zhou, Ruihao Zheng, Zhenkun Wang, Xinyan Liang, Zhichao Lu, Jianyong Sun, Yuhua Qian, Qingfu Zhang

Abstract: Neural routing solvers (NRSs) that leverage deep learning to tackle vehicle routing problems have demonstrated notable potential for practical applications. By learning implicit heuristic rules from data, NRSs replace the handcrafted counterparts in classic heuristic frameworks, thereby reducing reliance on costly manual design and trial-and-error adjustments. This survey makes two main contributions: (1) The heuristic nature of NRSs is highlighted, and existing NRSs are reviewed from the perspective of heuristics. A hierarchical taxonomy based on heuristic principles is further introduced. (2) A generalization-focused evaluation pipeline is proposed to address limitations of the conventional pipeline. Comparative benchmarking of representative NRSs across both pipelines uncovers a series of previously unreported gaps in current research.

replace-cross TARAZ: Persian Short-Answer Question Benchmark for Cultural Evaluation of Language Models

Authors: Reihaneh Iranmanesh, Saeedeh Davoudi, Pasha Abrishamchian, Ophir Frieder, Nazli Goharian

Abstract: This paper presents a comprehensive evaluation framework for assessing the cultural competence of large language models (LLMs) in Persian. Existing Persian cultural benchmarks rely predominantly on multiple-choice formats and English-centric metrics that fail to capture Persian's morphological complexity and semantic nuance. Our framework introduces a Persian-specific short-answer evaluation that combines rule-based morphological normalization with a hybrid syntactic and semantic similarity module, enabling robust soft-match scoring beyond exact string overlap. Through systematic evaluation of 15 state-of-the-art open- and closed-source models across three culturally grounded Persian datasets, we demonstrate that our hybrid evaluation improves scoring consistency by +10 compared to exact-match baselines by capturing meaning that surface-level methods cannot detect. Our human evaluation further confirms that the proposed semantic similarity metric achieves higher agreement with human judgments than LLM-based judges. We publicly release our evaluation framework, providing the first standardized benchmark for measuring cultural understanding in Persian and establishing a reproducible foundation for cross-cultural LLM evaluation research.

replace-cross Whisper-RIR-Mega: A Paired Clean-Reverberant Speech Benchmark for ASR Robustness to Room Acoustics

Authors: Mandip Goswami

Abstract: We introduce Whisper-RIR-Mega, a benchmark dataset of paired clean and reverberant speech for evaluating automatic speech recognition (ASR) robustness to room acoustics. Each sample pairs a clean LibriSpeech utterance with the same utterance convolved with a real room impulse response from the RIR-Mega corpus, with stratified splits by reverberation time (RT60) and direct-to-reverberant ratio (DRR). We evaluate five Whisper models (tiny through large-v3) on 1600 test samples and report word error rate (WER) and character error rate (CER) under clean and reverberant conditions. Reverberation consistently degrades performance across all model sizes; the reverb penalty in WER ranges from 2.31 to 15.50 percentage points depending on the model. Whisper-large-v3 shows the smallest penalty; Whisper-tiny shows the largest. We release the dataset, evaluation code, and baseline results to support reproducible research on robust ASR.
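The reported reverb penalty is simply the difference between reverberant and clean WER over paired utterances. A minimal self-contained sketch of that computation (illustrative, not the benchmark's evaluation code):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via Levenshtein distance over word tokens."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[-1][-1] / max(len(ref), 1)

def reverb_penalty(clean_pairs, reverb_pairs):
    """Reverb penalty in WER percentage points over (reference, hypothesis) pairs."""
    clean = sum(wer(r, h) for r, h in clean_pairs) / len(clean_pairs)
    reverb = sum(wer(r, h) for r, h in reverb_pairs) / len(reverb_pairs)
    return 100.0 * (reverb - clean)
```

In the dataset's setup, `clean_pairs` and `reverb_pairs` would hold transcripts for the same utterances decoded with and without the convolved room impulse response.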

replace-cross BLINK: Behavioral Latent Modeling of NK Cell Cytotoxicity

Authors: Iman Nematollahi, Jose Francisco Villena-Ossa, Alina Moter, Kiana Farhadyar, Gabriel Kalweit, Abhinav Valada, Toni Cathomen, Evelyn Ullrich, Maria Kalweit

Abstract: Machine learning models of cellular interaction dynamics hold promise for understanding cell behavior. Natural killer (NK) cell cytotoxicity is a prominent example of such interaction dynamics and is commonly studied using time-resolved multi-channel fluorescence microscopy. Although tumor cell death events can be annotated at single frames, NK cytotoxic outcome emerges over time from cellular interactions and cannot be reliably inferred from frame-wise classification alone. We introduce BLINK, a trajectory-based recurrent state-space model that serves as a cell world model for NK-tumor interactions. BLINK learns latent interaction dynamics from partially observed NK-tumor interaction sequences and predicts apoptosis increments that accumulate into cytotoxic outcomes. Experiments on long-term time-lapse NK-tumor recordings show improved cytotoxic outcome detection and enable forecasting of future outcomes, together with an interpretable latent representation that organizes NK trajectories into coherent behavioral modes and temporally structured interaction phases. BLINK provides a unified framework for quantitative evaluation and structured modeling of NK cytotoxic behavior at the single-cell level.

replace-cross On the Statistical Optimality of Optimal Decision Trees

Authors: Zineng Xu, Subhro Ghosh, Yan Shuo Tan

Abstract: While globally optimal empirical risk minimization (ERM) decision trees have become computationally feasible and empirically successful, rigorous theoretical guarantees for their statistical performance remain limited. In this work, we develop a comprehensive statistical theory for ERM trees under random design in both high-dimensional regression and classification. We first establish sharp oracle inequalities that bound the excess risk of the ERM estimator relative to the best possible approximation achievable by any tree with at most $L$ leaves, thereby characterizing the interpretability-accuracy trade-off. We derive these results using a novel uniform concentration framework based on empirically localized Rademacher complexity. Furthermore, we derive minimax optimal rates over a novel function class: the piecewise sparse heterogeneous anisotropic Besov (PSHAB) space. This space explicitly captures three key structural features encountered in practice: sparsity, anisotropic smoothness, and spatial heterogeneity. While our main results are established under sub-Gaussianity, we also provide robust guarantees that hold under heavy-tailed noise settings. Together, these findings provide a principled foundation for the optimality of ERM trees and introduce empirical process tools broadly applicable to other highly adaptive, data-driven procedures.

replace-cross MobileFetalCLIP: Selective Repulsive Knowledge Distillation for Mobile Fetal Ultrasound Analysis

Authors: Numan Saeed, Fadillah Adamsyah Maani, Mohammad Yaqub

Abstract: Fetal ultrasound AI could transform prenatal care in low-resource settings, yet current foundation models exceed 300M visual parameters, precluding deployment on point-of-care devices. Standard knowledge distillation fails under such extreme capacity gaps (~26x), as compact students waste capacity mimicking architectural artifacts of oversized teachers. We introduce Selective Repulsive Knowledge Distillation, which decomposes contrastive KD into diagonal and off-diagonal components: matched pair alignment is preserved while the off-diagonal weight decays into negative values, repelling the student from the teacher's inter-class confusions and forcing discovery of architecturally native features. Our 11.4M parameter student surpasses the 304M-parameter FetalCLIP teacher on zero-shot HC18 biometry validity (88.6% vs. 83.5%) and brain sub-plane F1 (0.784 vs. 0.702), while running at 1.6 ms on iPhone 16 Pro, enabling real-time assistive AI on handheld ultrasound devices. Our code, models, and app are publicly available at https://github.com/numanai/MobileFetalCLIP.
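The diagonal/off-diagonal decomposition can be sketched as follows. This is an illustrative rendering of the idea under our own simplifications, not the paper's objective: the diagonal (matched-pair) term is always aligned, while the off-diagonal term carries a weight that may decay into negative values, turning imitation of the teacher's inter-class similarities into repulsion:

```python
def selective_repulsive_kd_loss(student_sim, teacher_sim, off_diag_weight):
    """Illustrative decomposition of a contrastive-KD objective.

    student_sim / teacher_sim: square batch similarity matrices, where
    entry [i][j] compares embeddings of samples i and j. The diagonal
    (matched pairs) is always aligned; the off-diagonal mismatch is
    scaled by off_diag_weight, which may decay into negative values so
    the student is pushed *away* from the teacher's inter-class confusions.
    """
    n = len(student_sim)
    diag = sum((student_sim[i][i] - teacher_sim[i][i]) ** 2
               for i in range(n)) / n
    off = sum((student_sim[i][j] - teacher_sim[i][j]) ** 2
              for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    return diag + off_diag_weight * off
```

With a positive weight this reduces to ordinary relational matching; once the weight turns negative, increasing off-diagonal disagreement with the teacher lowers the loss.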

URLs: https://github.com/numanai/MobileFetalCLIP.

replace-cross Sharp Phase Boundary for Quantum Reverse Diffusion

Authors: Ammar Fayad

Abstract: Classical reverse diffusion is generated by changing the drift at fixed noise. We show that the quantum version of this principle obeys an exact law with a sharp phase boundary. For Gaussian pure-loss dynamics, the canonical model of continuous-variable decoherence, we prove that the unrestricted instantaneous reverse optimum exhibits a noiseless-to-noisy transition: below a critical squeezing-to-thermal ratio, reversal can be noiseless; above it, complete positivity forces irreducible reverse noise whose minimum cost we determine in closed form. The optimal reverse diffusion is uniquely covariance-aligned and simultaneously minimizes the geometric, metrological, and thermodynamic price of reversal. For multimode trajectories, the exact cost is additive in a canonical set of mode-resolved data, and a globally continuous protocol attains this optimum on every mixed-state interval. If a pure nonclassical endpoint is included, the same pointwise law holds for every $t>0$, but the optimum diverges as $2/t$: exact Gaussian reversal of a pure quantum state is dynamically unattainable. These results establish the exact Gaussian benchmark against which any broader theory of quantum reverse diffusion must be measured.

replace-cross HyperTokens: Controlling Token Dynamics for Continual Video-Language Understanding

Authors: Toan Nguyen, Yang Liu, Celso De Melo, Flora D. Salim

Abstract: Continual VideoQA with multimodal LLMs is hindered by interference between tasks and the prohibitive cost of storing task-specific prompts. We introduce HyperTokens, a transformer-based token generator that produces fine-tuning tokens on demand, giving explicit control over prompt updates while keeping memory fixed. To suppress forgetting, we propose meta-inspired regularisers that look ahead to avoid task-specific sharp directions and anchor the evolving generator to prior tasks. We further connect our objective to sharpness-aware optimisation, providing insight into why it encourages flatter cross-task minima and improves retention. Beyond regularisation, HyperTokens exploits lightweight auxiliary multimodal supervision through shared generation weights; guided by a causal perspective, we design feasible objectives and surrogate mutual-information losses to regularise anti-causal cross-modal directions. Across two standard continual VideoQA benchmarks, HyperTokens achieves higher average accuracy with substantially lower forgetting. Finally, we introduce a challenging cross-modal ImageQA->VideoQA protocol and show that HyperTokens enables robust continual transfer in this setting.

replace-cross AgrI Challenge: A Data-Centric AI Competition for Cross-Team Validation in Agricultural Vision

Authors: Mohammed Brahimi, Karim Laabassi, Mohamed Seghir Hadj Ameur, Aicha Boutorh, Badia Siab-Farsi, Amin Khouani, Omar Farouk Zouak, Seif Eddine Bouziane, Kheira Lakhdari, Abdelkader Nabil Benghanem

Abstract: Machine learning models in agricultural vision often achieve high accuracy on curated datasets but fail to generalize under real field conditions due to distribution shifts between training and deployment environments. Moreover, most machine learning competitions focus primarily on model design while treating datasets as fixed resources, leaving the role of data collection practices in model generalization largely unexplored. We introduce the AgrI Challenge, a data-centric competition framework in which multiple teams independently collect field datasets, producing a heterogeneous multi-source benchmark that reflects realistic variability in acquisition conditions. To systematically evaluate cross-domain generalization across independently collected datasets, we propose Cross-Team Validation (CTV), an evaluation paradigm that treats each team's dataset as a distinct domain. CTV includes two complementary protocols: Train-on-One-Team-Only (TOTO), which measures single-source generalization, and Leave-One-Team-Out (LOTO), which evaluates collaborative multi-source training. Experiments reveal substantial generalization gaps under single-source training: models achieve near-perfect validation accuracy yet exhibit validation-test gaps of up to 16.20% (DenseNet121) and 11.37% (Swin Transformer) when evaluated on datasets collected by other teams. In contrast, collaborative multi-source training dramatically improves robustness, reducing the gap to 2.82% and 1.78%, respectively. The challenge also produced a publicly available dataset of 50,673 field images of six tree species collected by twelve independent teams, providing a diverse benchmark for studying domain shift and data-centric learning in agricultural vision.
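The two CTV protocols amount to simple split generators over per-team datasets; a schematic sketch (function names are ours, not the challenge's code):

```python
def toto_splits(team_datasets):
    """Train-on-One-Team-Only: train on one team, test on every other team."""
    return [(train_team, test_team)
            for train_team in team_datasets
            for test_team in team_datasets
            if train_team != test_team]

def loto_splits(team_datasets):
    """Leave-One-Team-Out: train on all teams but one, test on the held-out team."""
    teams = sorted(team_datasets)
    return [([t for t in teams if t != held_out], held_out)
            for held_out in teams]
```

TOTO produces one single-source run per ordered team pair, exposing single-source generalization gaps; LOTO pools all remaining teams for multi-source training against each held-out domain.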

replace-cross Concept-Guided Fine-Tuning: Steering ViTs away from Spurious Correlations to Improve Robustness

Authors: Yehonatan Elisha, Oren Barkan, Noam Koenigstein

Abstract: Vision Transformers (ViTs) often degrade under distribution shifts because they rely on spurious correlations, such as background cues, rather than semantically meaningful features. Existing regularization methods typically rely on simple foreground-background masks, which fail to capture the fine-grained semantic concepts that define an object (e.g., ``long beak'' and ``wings'' for a ``bird''). As a result, these methods provide limited robustness to distribution shifts. To address this limitation, we introduce a novel finetuning framework that steers model reasoning toward concept-level semantics. Our approach optimizes the model's internal relevance maps to align with spatially grounded concept masks. These masks are generated automatically, without manual annotation: class-relevant concepts are first proposed using an LLM-based, label-free method, and then segmented using a VLM. The finetuning objective aligns relevance with these concept regions while simultaneously suppressing focus on spurious background areas. Notably, this process requires only a minimal set of images and uses half of the dataset classes. Extensive experiments on five out-of-distribution benchmarks demonstrate that our method improves robustness across multiple ViT-based models. Furthermore, we show that the resulting relevance maps exhibit stronger alignment with semantic object parts, offering a scalable path toward more robust and interpretable vision models. Finally, we confirm that concept-guided masks provide more effective supervision for model robustness than conventional segmentation maps, supporting our central hypothesis.

replace-cross A prospective clinical feasibility study of a conversational diagnostic AI in an ambulatory primary care clinic

Authors: Peter Brodeur, Jacob M. Koshy, Anil Palepu, Khaled Saab, Ava Homiar, Roma Ruparel, Charles Wu, Ryutaro Tanno, Joseph Xu, Amy Wang, David Stutz, Wei-Hung Weng, Hannah M. Ferrera, David Barrett, Lindsey Crowley, Jihyeon Lee, Spencer E. Rittner, Ellery Wulczyn, Selena K. Zhang, Elahe Vedadi, Christine G. Kohn, Kavita Kulkarni, Vinay Kadiyala, Sara Mahdavi, Wendy Du, Jessica M. Williams, David Feinbloom, Renee Wong, Tao Tu, Petar Sirkovic, Alessio Orlandi, Christopher Semturs, Yun Liu, Juraj Gottweis, Dale R. Webster, Jo\"elle Barral, Katherine Chou, Pushmeet Kohli, Avinatan Hassidim, Yossi Matias, James Manyika, Rob Fields, Jonathan X. Li, Marc L. Cohen, Vivek Natarajan, Mike Schaekermann, Alan Karthikesalingam, Adam Rodman

Abstract: Large language model (LLM)-based AI systems have shown promise for patient-facing diagnostic and management conversations in simulated settings. Translating these systems into clinical practice requires assessment in real-world workflows with rigorous safety oversight. We report a prospective, single-arm feasibility study of an LLM-based conversational AI, the Articulate Medical Intelligence Explorer (AMIE), conducting clinical history taking and presentation of potential diagnoses for patients to discuss with their provider at urgent care appointments at a leading academic medical center. 100 adult patients completed an AMIE text-chat interaction up to 5 days before their appointment. We sought to assess the conversational safety and quality, patient and clinician experience, and clinical reasoning capabilities compared to primary care providers (PCPs). Human safety supervisors monitored all patient-AMIE interactions in real time and did not need to intervene to stop any consultations based on pre-defined criteria. Patients reported high satisfaction and their attitudes towards AI improved after interacting with AMIE (p < 0.001). PCPs found AMIE's output useful with a positive impact on preparedness. AMIE's differential diagnosis (DDx) included the final diagnosis, per chart review 8 weeks post-encounter, in 90% of cases, with 75% top-3 accuracy. Blinded assessment of AMIE and PCP DDx and management (Mx) plans suggested similar overall DDx and Mx plan quality, without significant differences for DDx (p = 0.6) and appropriateness and safety of Mx (p = 0.1 and 1.0, respectively). PCPs outperformed AMIE in the practicality (p = 0.003) and cost effectiveness (p = 0.004) of Mx. While further research is needed, this study demonstrates the initial feasibility, safety, and user acceptance of conversational AI in a real-world setting, representing crucial steps towards clinical translation.

replace-cross STRIDE: Structured Lagrangian and Stochastic Residual Dynamics via Flow Matching

Authors: Prakrut Kotecha, Ganga Nair B, Shishir Kolathaya

Abstract: Robotic systems operating in unstructured environments must operate under significant uncertainty arising from intermittent contacts, frictional variability, and unmodeled compliance. While recent model-free approaches have demonstrated impressive performance, many deployment settings still require predictive models that support planning, constraint handling, and online adaptation. Analytical rigid-body models provide strong physical structure but often fail to capture complex interaction effects, whereas purely data-driven models may violate physical consistency, exhibit data bias, and accumulate long-horizon drift. In this work, we propose STRIDE, a dynamics learning framework that explicitly separates conservative rigid-body mechanics from uncertain, effectively stochastic non-conservative interaction effects. The structured component is modeled using a Lagrangian Neural Network (LNN) to preserve energy-consistent inertial dynamics, while residual interaction forces are represented using Conditional Flow Matching (CFM) to capture multi-modal interaction phenomena. The two components are trained jointly end-to-end, enabling the model to retain physical structure while representing complex stochastic behavior. We evaluate STRIDE on systems of increasing complexity, including a pendulum, the Unitree Go1 quadruped, and the Unitree G1 humanoid. Results show 20% reduction in long-horizon prediction error and 30% reduction in contact force prediction error compared to deterministic residual baselines, supporting more reliable model-based control in uncertain robotic environments.

replace-cross Structural Causal Bottleneck Models

Authors: Simon Bing, Jonas Wahl, Jakob Runge

Abstract: We introduce structural causal bottleneck models (SCBMs), a novel class of structural causal models. At the core of SCBMs lies the assumption that causal effects between high-dimensional variables only depend on low-dimensional summary statistics, or bottlenecks, of the causes. SCBMs provide a flexible framework for task-specific dimension reduction while being estimable via standard, simple learning algorithms in practice. We analyse identifiability in SCBMs, connect them to information bottlenecks in the sense of Tishby & Zaslavsky (2015), and illustrate how to estimate them experimentally. We also demonstrate the benefit of bottlenecks for effect estimation in low-sample transfer learning settings. We argue that SCBMs provide an alternative to existing causal dimension reduction frameworks like causal representation learning or causal abstraction learning.

replace-cross A Survey of Reinforcement Learning For Economics

Authors: Pranjal Rawat

Abstract: This survey (re)introduces reinforcement learning methods to economists. The curse of dimensionality limits how far exact dynamic programming can be effectively applied, forcing us to rely on suitably "small" problems or our ability to convert "big" problems into smaller ones. While this reduction has been sufficient for many classical applications, a growing class of economic models resists such reduction. Reinforcement learning algorithms offer a natural, sample-based extension of dynamic programming, extending tractability to problems with high-dimensional states, continuous actions, and strategic interactions. I review the theory connecting classical planning to modern learning algorithms and demonstrate their mechanics through simulated examples in pricing, inventory control, strategic games, and preference elicitation. I also examine the practical vulnerabilities of these algorithms, noting their brittleness, sample inefficiency, sensitivity to hyperparameters, and the absence of global convergence guarantees outside of tabular settings. The successes of reinforcement learning remain strictly bounded by these constraints, as well as a reliance on accurate simulators. When guided by economic structure, reinforcement learning provides a remarkably flexible framework. It stands as an imperfect, but promising, addition to the computational economist's toolkit. A companion survey (Rust and Rawat, 2026b) covers the inverse problem of inferring preferences from observed behavior. All simulation code is publicly available.

replace-cross Emotion is Not Just a Label: Latent Emotional Factors in LLM Processing

Authors: Benjamin Reichman, Adar Avsian, Samuel Webster, Larry Heck

Abstract: Large language models are routinely deployed on text that varies widely in emotional tone, yet their reasoning behavior is typically evaluated without accounting for emotion as a source of representational variation. Prior work has largely treated emotion as a prediction target, for example in sentiment analysis or emotion classification. In contrast, we study emotion as a latent factor that shapes how models attend to and reason over text. We analyze how emotional tone systematically alters attention geometry in transformer models, showing that metrics such as locality, center-of-mass distance, and entropy vary across emotions and correlate with downstream question-answering performance. To facilitate controlled study of these effects, we introduce Affect-Uniform ReAding QA (AURA-QA), a question-answering dataset with emotionally balanced, human-authored context passages. Finally, an emotional regularization framework is proposed that constrains emotion-conditioned representational drift during training. Experiments across multiple QA benchmarks demonstrate that this approach improves reading comprehension in both emotionally-varying and non-emotionally varying datasets, yielding consistent gains under distribution shift and in-domain improvements on several benchmarks.

replace-cross Think Before You Lie: How Reasoning Leads to Honesty

Authors: Ann Yuan, Asma Ghandeharioun, Carter Blum, Alicia Machado, Jessica Hoffmann, Daphne Ippolito, Martin Wattenberg, Lucas Dixon, Katja Filippova

Abstract: While existing evaluations of large language models (LLMs) measure deception rates, the underlying conditions that give rise to deceptive behavior are poorly understood. We investigate this question using a novel dataset of realistic moral trade-offs where honesty incurs variable costs. Contrary to humans, who tend to become less honest given time to deliberate (Capraro, 2017; Capraro et al., 2019), we find that reasoning consistently increases honesty across scales and for several LLM families. This effect is not only a function of the reasoning content, as reasoning traces are often poor predictors of final behaviors. Rather, we show that the underlying geometry of the representational space itself contributes to the effect. Namely, we observe that deceptive regions within this space are metastable: deceptive answers are more easily destabilized by input paraphrasing, output resampling, and activation noise than honest ones. We interpret the effect of reasoning in this vein: generating deliberative tokens as part of moral reasoning entails the traversal of a biased representational space, ultimately nudging the model toward its more stable, honest defaults.

replace-cross Kernel Tests of Equivalence

Authors: Xing Liu, Axel Gandy

Abstract: We propose novel kernel-based tests for assessing the equivalence between distributions. Traditional goodness-of-fit testing is inappropriate for concluding the absence of distributional differences, because failure to reject the null hypothesis may simply be a result of lack of test power, also known as the Type-II error. This motivates \emph{equivalence testing}, which aims to assess the \emph{absence} of a statistically meaningful effect under controlled error rates. However, existing equivalence tests are either limited to parametric distributions or focus only on specific moments rather than the full distribution. We address these limitations using two kernel-based statistical discrepancies: the \emph{kernel Stein discrepancy} and the \emph{Maximum Mean Discrepancy}. The null hypothesis of our proposed tests assumes the candidate distribution differs from the nominal distribution by at least a pre-defined margin, which is measured by these discrepancies. We propose two approaches for computing the critical values of the tests, one using an asymptotic normality approximation, and another based on bootstrapping. Numerical experiments are conducted to assess the performance of these tests.
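To make the equivalence-testing logic concrete, here is a schematic one-dimensional sketch (ours, not the proposed tests themselves): an unbiased MMD$^2$ estimate is compared against a pre-defined margin, and equivalence is concluded only when the estimate stays below the margin with room to spare. In the actual tests the critical value comes from the normal approximation or bootstrap; here it is just a parameter:

```python
import math

def rbf(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel on scalars."""
    return math.exp(-((x - y) ** 2) / (2 * bandwidth ** 2))

def mmd2_unbiased(xs, ys, bandwidth=1.0):
    """Unbiased estimate of squared Maximum Mean Discrepancy (1-D, RBF kernel)."""
    m, n = len(xs), len(ys)
    kxx = sum(rbf(a, b, bandwidth) for i, a in enumerate(xs)
              for j, b in enumerate(xs) if i != j) / (m * (m - 1))
    kyy = sum(rbf(a, b, bandwidth) for i, a in enumerate(ys)
              for j, b in enumerate(ys) if i != j) / (n * (n - 1))
    kxy = sum(rbf(a, b, bandwidth) for a in xs for b in ys) / (m * n)
    return kxx + kyy - 2 * kxy

def equivalence_reject(xs, ys, margin, critical_value):
    """Reject H0: MMD^2 >= margin (i.e., conclude equivalence) only when the
    estimate remains below the margin even after adding the critical value."""
    return mmd2_unbiased(xs, ys) + critical_value < margin
```

Note the reversal relative to goodness-of-fit testing: a large discrepancy is the null, so failing to reject never certifies equivalence by accident of low power.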

replace-cross Bayesian Optimization with Gaussian Processes to Accelerate Stationary Point Searches

Authors: Rohit Goswami (Institute IMX and Lab-COSMO, \'Ecole polytechnique f\'ed\'erale de Lausanne)

Abstract: Accelerating the exploration of stationary points on potential energy surfaces by building local surrogates has spanned decades of effort. Done correctly, surrogates reduce required evaluations by an order of magnitude while preserving the accuracy of the underlying theory. We present a Bayesian Optimization view of minimization, single-point saddle searches, and double-ended saddle searches through a unified six-step surrogate loop, differing only in the inner optimization target and acquisition criterion. The framework uses Gaussian process regression with derivative observations, inverse-distance kernels, and active learning. The Optimal Transport GP extension of farthest point sampling with the Earth mover's distance, MAP regularization via a variance barrier and oscillation detection, and an adaptive trust radius are concrete extensions of the same basic methodology, improving accuracy and efficiency. We also demonstrate that random Fourier features decouple hyperparameter training from predictions, enabling favorable scaling for high-dimensional systems. Accompanying pedagogical Rust code demonstrates that all applications use the exact same Bayesian optimization loop, bridging the gap between theoretical formulation and practical execution.

replace-cross PACED: Distillation and Self-Distillation at the Frontier of Student Competence

Authors: Yuanda Xu, Hejian Sang, Zhengze Zhou, Ran He, Zhipeng Wang

Abstract: Standard LLM distillation wastes compute on two fronts: problems the student has already mastered (near-zero gradients) and problems far beyond its reach (incoherent gradients that erode existing capabilities). We show that this waste is not merely intuitive but structurally inevitable: the gradient signal-to-noise ratio in distillation provably vanishes at both pass-rate extremes. This theoretical observation leads to Paced, a framework that concentrates distillation on the zone of proximal development -- the frontier of a student model's competence -- via a principled pass-rate weight $w(p) = p^\alpha(1 - p)^\beta$ derived from the boundary-vanishing structure of distillation gradients. Key results: (1) Theory: We prove that the Beta kernel $w(p) = p^\alpha(1-p)^\beta$ is a leading-order weight family arising from the SNR structure of distillation, and that it is minimax-robust -- under bounded multiplicative misspecification, worst-case efficiency loss is only $O(\delta^2)$. (2) Distillation: On distillation from a larger teacher to a smaller student model with forward KL, Paced achieves significant gains over the base model, while keeping benchmark forgetting at a low level. (3) Self-distillation: On instruction-tuned models with reverse KL, gains exceed baselines as well. (4) Two-stage synergy: A forward-KL-then-reverse-KL schedule yields the strongest results in our setting, reaching substantial improvements on standard reasoning benchmarks -- supporting a mode-coverage-then-consolidation interpretation of the distillation process. All configurations require only student rollouts to estimate pass rates, need no architectural changes, and are compatible with any KL direction.
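The pass-rate weight itself is a one-liner; a sketch with illustrative default exponents:

```python
def paced_weight(p, alpha=1.0, beta=1.0):
    """Beta-kernel pass-rate weight w(p) = p^alpha * (1 - p)^beta.

    Vanishes at p = 0 (too hard: incoherent gradients) and at p = 1
    (already mastered: near-zero gradients), peaking at the competence
    frontier p = alpha / (alpha + beta)."""
    return (p ** alpha) * ((1.0 - p) ** beta)
```

With alpha = beta = 1 the weight peaks at p = 0.5; skewing alpha above beta shifts the emphasis toward problems the student already solves more often.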

replace-cross Detecting Intrinsic and Instrumental Self-Preservation in Autonomous Agents: The Unified Continuation-Interest Protocol

Authors: Christopher Altman

Abstract: Autonomous agents, especially delegated systems with memory, persistent context, and multi-step planning, pose a measurement problem not present in stateless models: an agent that preserves continued operation as a terminal objective and one that does so merely instrumentally can produce observationally similar trajectories. External behavioral monitoring cannot reliably distinguish between them. We introduce the Unified Continuation-Interest Protocol (UCIP), a multi-criterion detection framework that moves this distinction from behavior to the latent structure of agent trajectories. UCIP encodes trajectories with a Quantum Boltzmann Machine (QBM), a classical algorithm based on the density-matrix formalism of quantum statistical mechanics, and measures the von Neumann entropy of the reduced density matrix induced by a bipartition of hidden units. We test whether agents with terminal continuation objectives (Type A) produce latent states with higher entanglement entropy than agents whose continuation is merely instrumental (Type B). Higher entanglement reflects stronger cross-partition statistical coupling. On gridworld agents with known ground-truth objectives, UCIP achieves 100% detection accuracy and 1.0 AUC-ROC on held-out non-adversarial evaluation under the frozen Phase I gate. The entanglement gap between Type A and Type B agents is Delta = 0.381 (p < 0.001, permutation test). Pearson r = 0.934 across an 11-point interpolation sweep indicates that, within this synthetic family, UCIP tracks graded changes in continuation weighting rather than merely a binary label. Among the tested models, only the QBM achieves positive Delta. All computations are classical; "quantum" refers only to the mathematical formalism. UCIP does not detect consciousness or subjective experience; it detects statistical structure in latent representations that correlates with known objectives.

replace-cross Shape-of-You: Fused Gromov-Wasserstein Optimal Transport for Semantic Correspondence in-the-Wild

Authors: Jiin Im, Sisung Liu, Je Hyeong Hong

Abstract: Semantic correspondence is essential for handling diverse in-the-wild images lacking explicit correspondence annotations. While recent 2D foundation models offer powerful features, adapting them for unsupervised learning via nearest-neighbor pseudo-labels has key limitations: it operates locally, ignoring structural relationships, and consequently its reliance on 2D appearance fails to resolve geometric ambiguities arising from symmetries or repetitive features. In this work, we address this by reformulating pseudo-label generation as a Fused Gromov-Wasserstein (FGW) problem, which jointly optimizes inter-feature similarity and intra-structural consistency. Our framework, Shape-of-You (SoY), leverages a 3D foundation model to define this intra-structure in the geometric space, resolving the aforementioned ambiguity. However, since FGW is a computationally prohibitive quadratic problem, we approximate it through anchor-based linearization. The resulting probabilistic transport plan provides a structurally consistent but noisy supervisory signal. Thus, we introduce a soft-target loss dynamically blending guidance from this plan with network predictions to build a learning framework robust to this noise. SoY achieves state-of-the-art performance on SPair-71k and AP-10k datasets, establishing a new benchmark in semantic correspondence without explicit geometric annotations. Code is available at Shape-of-You.

replace-cross Revisiting Model Stitching In the Foundation Model Era

Authors: Zheda Mai, Ke Zhang, Fu-En Wang, Zixiao Ken Wang, Albert Y. C. Chen, Lu Xia, Min Sun, Wei-Lun Chao, Cheng-Hao Kuo

Abstract: Model stitching, connecting early layers of one model (source) to later layers of another (target) via a light stitch layer, has served as a probe of representational compatibility. Prior work finds that models trained on the same dataset remain stitchable (negligible accuracy drop) despite different initializations or objectives. We revisit stitching for Vision Foundation Models (VFMs) that vary in objectives, data, and modality mix (e.g., CLIP, DINOv2, SigLIP 2) and ask: Are heterogeneous VFMs stitchable? We introduce a systematic protocol spanning the stitch points, stitch layer families, training losses, and downstream tasks. Three findings emerge. (1) Stitch layer training matters: conventional approaches that match the intermediate features at the stitch point or optimize the task loss end-to-end struggle to retain accuracy, especially at shallow stitch points. (2) With a simple feature-matching loss at the target model's penultimate layer, heterogeneous VFMs become reliably stitchable across vision tasks. (3) For deep stitch points, the stitched model can surpass either constituent model at only a small inference overhead (for the stitch layer). Building on these findings, we further propose the VFM Stitch Tree (VST), which shares early layers across VFMs while retaining their later layers, yielding a controllable accuracy-latency trade-off for multimodal LLMs that often leverage multiple VFMs. Taken together, our study elevates stitching from a diagnostic probe to a practical recipe for integrating complementary VFM strengths and pinpointing where their representations align or diverge.