new Lightweight LSTM Model for Energy Theft Detection via Input Data Reduction

Authors: Caylum Collier, Krishnendu Guha

Abstract: With the increasing integration of smart meters in electrical grids worldwide, detecting energy theft has become a critical and ongoing challenge. Artificial intelligence (AI)-based models have demonstrated strong performance in identifying fraudulent consumption patterns; however, previous works exploring the use of machine learning solutions for this problem demand high computational and energy costs, limiting their practicality -- particularly in low-theft scenarios where continuous inference can result in unnecessary energy usage. This paper proposes a lightweight detection unit, or watchdog mechanism, designed to act as a pre-filter that determines when to activate a long short-term memory (LSTM) model. This mechanism reduces the volume of input fed to the LSTM model, limiting it to instances that are more likely to involve energy theft thereby preserving detection accuracy while substantially reducing energy consumption associated with continuous model execution. The proposed system was evaluated through simulations across six scenarios with varying theft severity and number of active thieves. Results indicate a power consumption reduction exceeding 64\%, with minimal loss in detection accuracy and consistently high recall. These findings support the feasibility of a more energy-efficient and scalable approach to energy theft detection in smart grids. In contrast to prior work that increases model complexity to achieve marginal accuracy gains, this study emphasizes practical deployment considerations such as inference efficiency and system scalability. The results highlight the potential for deploying sustainable, AI-assisted monitoring systems within modern smart grid infrastructures.

new Regulation Compliant AI for Fusion: Real-Time Image Analysis-Based Control of Divertor Detachment in Tokamaks

Authors: Nathaniel Chen, Cheolsik Byun, Azarakash Jalalvand, Sangkyeun Kim, Andrew Rothstein, Filippo Scotti, Steve Allen, David Eldon, Keith Erickson, Egemen Kolemen

Abstract: While artificial intelligence (AI) has been promising for fusion control, its inherent black-box nature will make compliant implementation in regulatory environments a challenge. This study implements and validates a real-time AI enabled linear and interpretable control system for successful divertor detachment control with the DIII-D lower divertor camera. Using D2 gas, we demonstrate feedback divertor detachment control with a mean absolute difference of 2% from the target for both detachment and reattachment. This automatic training and linear processing framework can be extended to any image based diagnostic for regulatory compliant controller necessary for future fusion reactors.

new Controllable diffusion-based generation for multi-channel biological data

Authors: Haoran Zhang, Mingyuan Zhou, Wesley Tansey

Abstract: Spatial profiling technologies in biology, such as imaging mass cytometry (IMC) and spatial transcriptomics (ST), generate high-dimensional, multi-channel data with strong spatial alignment and complex inter-channel relationships. Generative modeling of such data requires jointly capturing intra- and inter-channel structure, while also generalizing across arbitrary combinations of observed and missing channels for practical application. Existing diffusion-based models generally assume low-dimensional inputs (e.g., RGB images) and rely on simple conditioning mechanisms that break spatial correspondence and ignore inter-channel dependencies. This work proposes a unified diffusion framework for controllable generation over structured and spatial biological data. Our model contains two key innovations: (1) a hierarchical feature injection mechanism that enables multi-resolution conditioning on spatially aligned channels, and (2) a combination of latent-space and output-space channel-wise attention to capture inter-channel relationships. To support flexible conditioning and generalization to arbitrary subsets of observed channels, we train the model using a random masking strategy, enabling it to reconstruct missing channels from any combination of inputs. We demonstrate state-of-the-art performance across both spatial and non-spatial prediction tasks, including protein imputation in IMC and gene-to-protein prediction in single-cell datasets, and show strong generalization to unseen conditional configurations.

new Harnessing Near-Infrared Spectroscopy and Machine Learning for Traceable Classification of Hanwoo and Holstein Beef

Authors: AMM Nurul Alam, Abdul Samad, AMM Shamsul Alam, Jahan Ara Monti, Ayesha Muazzam

Abstract: This study evaluates the use of Near-Infrared spectroscopy (NIRS) combined with advanced machine learning (ML) techniques to differentiate Hanwoo beef (HNB) and Holstein beef (HLB) to address food authenticity, mislabeling, and adulteration. Rapid and non-invasive spectral data were attained by a portable NIRS, recording absorbance data within the wavelength range of 700 to 1100 nm. A total of 40 Longissimus lumborum samples, evenly split between HNB and HLB, were obtained from a local hypermarket. Data analysis using Principal Component Analysis (PCA) demonstrated distinct spectral patterns associated with chemical changes, clearly separating the two beef varieties and accounting for 93.72% of the total variance. ML models, including Linear Discriminant Analysis (LDA), Support Vector Machine (SVM), Logistic Regression (LR), Random Forest, Gradient Boosting (GB), K-Nearest Neighbors, Decision Tree (DT), Naive Bayes (NB), and Neural Networks (NN), were implemented, optimized through hyperparameter tuning, and validated by 5-fold cross-validation techniques to enhance model robustness and prevent overfitting. Random Forest provided the highest predictive accuracy with a Receiver Operating Characteristic (ROC) Area Under the Curve (AUC) of 0.8826, closely followed by the SVM model at 0.8747. Furthermore, GB and NN algorithms exhibited satisfactory performances, with cross-validation scores of 0.752. Notably, the NN model achieved the highest recall rate of 0.7804, highlighting its suitability in scenarios requiring heightened sensitivity. DT and NB exhibited comparatively lower predictive performance. The LR and SVM models emerged as optimal choices by effectively balancing high accuracy, precision, and recall. This study confirms that integrating NIRS with ML techniques offers a powerful and reliable method for meat authenticity, significantly contributing to detecting food fraud.

new Scaling Transformers for Time Series Forecasting: Do Pretrained Large Models Outperform Small-Scale Alternatives?

Authors: Sanjay Chakraborty, Ibrahim Delibasoglu, Fredrik Heintz

Abstract: Large pre-trained models have demonstrated remarkable capabilities across domains, but their effectiveness in time series forecasting remains understudied. This work empirically examines whether pre-trained large-scale time series models (LSTSMs) trained on diverse datasets can outperform traditional non-pretrained small-scale transformers in forecasting tasks. We analyze state-of-the-art (SOTA) pre-trained universal time series models (e.g., Moirai, TimeGPT) alongside conventional transformers, evaluating accuracy, computational efficiency, and interpretability across multiple benchmarks. Our findings reveal the strengths and limitations of pre-trained LSTSMs, providing insights into their suitability for time series tasks compared to task-specific small-scale architectures. The results highlight scenarios where pretraining offers advantages and where simpler models remain competitive.

new Hyperbolic Kernel Graph Neural Networks for Neurocognitive Decline Analysis from Multimodal Brain Imaging

Authors: Meimei Yang, Yongheng Sun, Qianqian Wang, Andrea Bozoki, Maureen Kohi, Mingxia Liu

Abstract: Multimodal neuroimages, such as diffusion tensor imaging (DTI) and resting-state functional MRI (fMRI), offer complementary perspectives on brain activities by capturing structural or functional interactions among brain regions. While existing studies suggest that fusing these multimodal data helps detect abnormal brain activity caused by neurocognitive decline, they are generally implemented in Euclidean space and can't effectively capture intrinsic hierarchical organization of structural/functional brain networks. This paper presents a hyperbolic kernel graph fusion (HKGF) framework for neurocognitive decline analysis with multimodal neuroimages. It consists of a multimodal graph construction module, a graph representation learning module that encodes brain graphs in hyperbolic space through a family of hyperbolic kernel graph neural networks (HKGNNs), a cross-modality coupling module that enables effective multimodal data fusion, and a hyperbolic neural network for downstream predictions. Notably, HKGNNs represent graphs in hyperbolic space to capture both local and global dependencies among brain regions while preserving the hierarchical structure of brain networks. Extensive experiments involving over 4,000 subjects with DTI and/or fMRI data suggest the superiority of HKGF over state-of-the-art methods in two neurocognitive decline prediction tasks. HKGF is a general framework for multimodal data analysis, facilitating objective quantification of structural/functional brain connectivity changes associated with neurocognitive decline.

new Beyond Token Pruning: Operation Pruning in Vision-Language Models

Authors: Aoming Liu, Reuben Tan, Boqing Gong, Bryan A. Plummer

Abstract: Prior Vision Language Model (VLM) token pruning reduces computation by eliminating attention and feed-forward operations for pruned tokens while maintaining all operations for critical tokens. However, this binary approach conflates token/operation redundancy - critical operations may be removed along with discarded tokens, while preserved tokens retain all potentially redundant operations. To surgically eliminate redundant operations while preserving critical ones, we propose Greedily Sorted Operation Pruning (GSOP), a data-driven method that directly prunes operations rather than tokens. GSOP first decomposes a VLM decoder's computations into atomic operations along three dimensions: token groups, layer positions, and computation modules. GSOP determines the pruning order of operations through greedy sorting: GSOP iteratively selects the redundant operation that incurs minimal performance drop considering previously pruned operations. Different computational budgets can be accommodated without re-searching by simply pruning operations according to this order until the desired budget is met. GSOP enhances sorting efficiency through: a) leveraging historical operation rankings to avoid redundant evaluations; b) excluding the ``free-to-prune" and ``danger-to-prune" operations from sorting. GSOP achieves compelling efficiency-performance tradeoffs, reducing computation by 70% with only 4% performance loss while maintaining up to 18% higher performance than state-of-the-art methods when transferred across diverse VLMs and tasks. Real GPU efficiency evaluations confirm its practical value. The code is in https://github.com/zxcvfd13502/GSOP.

URLs: https://github.com/zxcvfd13502/GSOP.

new Causal-Paced Deep Reinforcement Learning

Authors: Geonwoo Cho, Jaegyun Im, Doyoon Kim, Sundong Kim

Abstract: Designing effective task sequences is crucial for curriculum reinforcement learning (CRL), where agents must gradually acquire skills by training on intermediate tasks. A key challenge in CRL is to identify tasks that promote exploration, yet are similar enough to support effective transfer. While recent approach suggests comparing tasks via their Structural Causal Models (SCMs), the method requires access to ground-truth causal structures, an unrealistic assumption in most RL settings. In this work, we propose Causal-Paced Deep Reinforcement Learning (CP-DRL), a curriculum learning framework aware of SCM differences between tasks based on interaction data approximation. This signal captures task novelty, which we combine with the agent's learnability, measured by reward gain, to form a unified objective. Empirically, CP-DRL outperforms existing curriculum methods on the Point Mass benchmark, achieving faster convergence and higher returns. CP-DRL demonstrates reduced variance with comparable final returns in the Bipedal Walker-Trivial setting, and achieves the highest average performance in the Infeasible variant. These results indicate that leveraging causal relationships between tasks can improve the structure-awareness and sample efficiency of curriculum reinforcement learning. We provide the full implementation of CP-DRL to facilitate the reproduction of our main results at https://github.com/Cho-Geonwoo/CP-DRL.

URLs: https://github.com/Cho-Geonwoo/CP-DRL.

new DiceHuBERT: Distilling HuBERT with a Self-Supervised Learning Objective

Authors: Hyung Gun Chi, Zakaria Aldeneh, Tatiana Likhomanenko, Oggi Rudovic, Takuya Higuchi, Li-Wei Chen, Shinji Watanabe, Ahmed Hussen Abdelaziz

Abstract: We introduce DiceHuBERT, a knowledge distillation framework for compressing HuBERT, a widely used self-supervised learning (SSL)-based speech foundation model. Unlike existing distillation methods that rely on layer-wise and feature-wise mapping between teacher and student models, DiceHuBERT leverages HuBERT's iterative self-distillation mechanism by directly replacing the original model with a student model. This replacement allows the student to be trained using the same SSL objective used when pre-training HuBERT, eliminating the need for additional modules or architectural constraints. Experimental results on SUPERB show that DiceHuBERT consistently outperforms existing distillation methods, improving phoneme recognition performance by over 21% and ASR performance by more than 14%. Furthermore, DiceHuBERT demonstrates competitive performance across multiple tasks, highlighting its clear advantage.

new Multicollinearity Resolution Based on Machine Learning: A Case Study of Carbon Emissions

Authors: Xuanming Zhang

Abstract: This study proposes an analytical framework that integrates DBSCAN clustering with the Elastic Net regression model to address multifactorial problems characterized by structural complexity and multicollinearity, exemplified by carbon emissions analysis. DBSCAN is employed for unsupervised learning to objectively cluster features, while the Elastic Net is utilized for high-dimensional feature selection and complexity control. The Elastic Net is specifically chosen for its ability to balance feature selection and regularization by combining L1 (lasso) and L2 (ridge) penalties, making it particularly suited for datasets with correlated predictors. Applying this framework to energy consumption data from 46 industries in China (2000-2019) resulted in the identification of 16 categories. Emission characteristics and drivers were quantitatively assessed for each category, demonstrating the framework's capacity to identify primary emission sources and provide actionable insights. This research underscores the global applicability of the framework for analyzing complex regional challenges, such as carbon emissions, and highlights qualitative features that humans find meaningful may not be accurate for the model.

new Efficient Certified Reasoning for Binarized Neural Networks

Authors: Jiong Yang, Yong Kiam Tan, Mate Soos, Magnus O. Myreen, Kuldeep S. Meel

Abstract: Neural networks have emerged as essential components in safety-critical applications -- these use cases demand complex, yet trustworthy computations. Binarized Neural Networks (BNNs) are a type of neural network where each neuron is constrained to a Boolean value; they are particularly well-suited for safety-critical tasks because they retain much of the computational capacities of full-scale (floating-point or quantized) deep neural networks, but remain compatible with satisfiability solvers for qualitative verification and with model counters for quantitative reasoning. However, existing methods for BNN analysis suffer from either limited scalability or susceptibility to soundness errors, which hinders their applicability in real-world scenarios. In this work, we present a scalable and trustworthy approach for both qualitative and quantitative verification of BNNs. Our approach introduces a native representation of BNN constraints in a custom-designed solver for qualitative reasoning, and in an approximate model counter for quantitative reasoning. We further develop specialized proof generation and checking pipelines with native support for BNN constraint reasoning, ensuring trustworthiness for all of our verification results. Empirical evaluations on a BNN robustness verification benchmark suite demonstrate that our certified solving approach achieves a $9\times$ speedup over prior certified CNF and PB-based approaches, and our certified counting approach achieves a $218\times$ speedup over the existing CNF-based baseline. In terms of coverage, our pipeline produces fully certified results for $99\%$ and $86\%$ of the qualitative and quantitative reasoning queries on BNNs, respectively. This is in sharp contrast to the best existing baselines which can fully certify only $62\%$ and $4\%$ of the queries, respectively.

new Echo State Transformer: When chaos brings memory

Authors: Yannis Bendi-Ouis (Mnemosyne), Xavier Hinaut (Mnemosyne)

Abstract: While Large Language Models and their underlying Transformer architecture are remarkably efficient, they do not reflect how our brain processes and learns a diversity of cognitive tasks such as language and working memory. Furthermore, sequential data processing with Transformers encounters a fundamental barrier: quadratic complexity growth with sequence length. Motivated by these limitations, our ambition is to create more efficient models that are less reliant on intensive computations and massive volumes of data. We introduce Echo State Transformers (EST), a hybrid architecture that elegantly resolves this challenge while demonstrating exceptional performance in low-data regimes. EST integrates the Transformer attention mechanisms with principles from Reservoir Computing to create a fixedsize window distributed memory system. Drawing inspiration from Echo State Networks, the most prominent instance of the Reservoir Computing paradigm, our architecture integrates a new module called ''Working Memory'' based on several reservoirs (i.e. random recurrent networks) working in parallel. These reservoirs work as independent memory units with distinct internal dynamics. A novelty here is that the classical reservoir hyperparameters controlling the dynamics are now trained. Thus, the EST dynamically adapts the memory/non-linearity trade-off in reservoirs. By maintaining a fixed number of memory units regardless of sequence length, EST achieves constant computational complexity at each processing step, effectively breaking the quadratic scaling problem of standard Transformers. Evaluations on the STREAM benchmark, which comprises 12 diverse sequential processing tasks, demonstrate that EST outperforms GRUs, LSTMs, and even Transformers on 8 of these tasks. These findings highlight that Echo State Transformers can be an effective replacement to GRUs and LSTMs while complementing standard Transformers at least on resource-constrained environments and low-data scenarios across diverse sequential processing tasks.

new PlaceFM: A Training-free Geospatial Foundation Model of Places

Authors: Mohammad Hashemi, Hossein Amiri, Andreas Zufle

Abstract: Spatial structure is central to effective geospatial intelligence systems. While foundation models have shown promise, they often lack the flexibility to reason about places, which are context-rich regions spanning different spatial granularities. We propose PlaceFM, a spatial foundation model that captures place representations using a training-free graph condensation method. PlaceFM condenses a nationwide POI graph built from integrated Foursquare and OpenStreetMap data in the U.S., generating general-purpose embeddings of places. These embeddings can be seamlessly integrated into geolocation data pipelines to support a wide range of downstream tasks. Without requiring pretraining, PlaceFM offers a scalable and adaptable solution for multi-scale geospatial analysis.

new Domain Knowledge in Artificial Intelligence: Using Conceptual Modeling to Increase Machine Learning Accuracy and Explainability

Authors: V. C. Storey, J. Parsons, A. Castellanos, M. Tremblay, R. Lukyanenko, W. Maass, A. Castillo

Abstract: Machine learning enables the extraction of useful information from large, diverse datasets. However, despite many successful applications, machine learning continues to suffer from performance and transparency issues. These challenges can be partially attributed to the limited use of domain knowledge by machine learning models. This research proposes using the domain knowledge represented in conceptual models to improve the preparation of the data used to train machine learning models. We develop and demonstrate a method, called the Conceptual Modeling for Machine Learning (CMML), which is comprised of guidelines for data preparation in machine learning and based on conceptual modeling constructs and principles. To assess the impact of CMML on machine learning outcomes, we first applied it to two real-world problems to evaluate its impact on model performance. We then solicited an assessment by data scientists on the applicability of the method. These results demonstrate the value of CMML for improving machine learning outcomes.

new Large Language Model Agent for Modular Task Execution in Drug Discovery

Authors: Janghoon Ock, Radheesh Sharma Meda, Srivathsan Badrinarayanan, Neha S. Aluru, Achuth Chandrasekhar, Amir Barati Farimani

Abstract: We present a modular framework powered by large language models (LLMs) that automates and streamlines key tasks across the early-stage computational drug discovery pipeline. By combining LLM reasoning with domain-specific tools, the framework performs biomedical data retrieval, domain-specific question answering, molecular generation, property prediction, property-aware molecular refinement, and 3D protein-ligand structure generation. In a case study targeting BCL-2 in lymphocytic leukemia, the agent autonomously retrieved relevant biomolecular information-including FASTA sequences, SMILES representations, and literature-and answered mechanistic questions with improved contextual accuracy over standard LLMs. It then generated chemically diverse seed molecules and predicted 67 ADMET-related properties, which guided iterative molecular refinement. Across two refinement rounds, the number of molecules with QED > 0.6 increased from 34 to 55, and those passing at least four out of five empirical drug-likeness rules rose from 29 to 52, within a pool of 194 molecules. The framework also employed Boltz-2 to generate 3D protein-ligand complexes and provide rapid binding affinity estimates for candidate compounds. These results demonstrate that the approach effectively supports molecular screening, prioritization, and structure evaluation. Its modular design enables flexible integration of evolving tools and models, providing a scalable foundation for AI-assisted therapeutic discovery.

new MolProphecy: Bridging Medicinal Chemists' Knowledge and Molecular Pre-Trained Models via a Multi-Modal Framework

Authors: Jianping Zhao, Qiong Zhou, Tian Wang, Yusi Fan, Qian Yang, Li Jiao, Chang Liu, Zhehao Guo, Qi Lu, Fengfeng Zhou, Ruochi Zhang

Abstract: MolProphecy is a human-in-the-loop (HITL) multi-modal framework designed to integrate chemists' domain knowledge into molecular property prediction models. While molecular pre-trained models have enabled significant gains in predictive accuracy, they often fail to capture the tacit, interpretive reasoning central to expert-driven molecular design. To address this, MolProphecy employs ChatGPT as a virtual chemist to simulate expert-level reasoning and decision-making. The generated chemist knowledge is embedded by the large language model (LLM) as a dedicated knowledge representation and then fused with graph-based molecular features through a gated cross-attention mechanism, enabling joint reasoning over human-derived and structural features. Evaluated on four benchmark datasets (FreeSolv, BACE, SIDER, and ClinTox), MolProphecy outperforms state-of-the-art (SOTA) models, achieving a 15.0 percent reduction in RMSE on FreeSolv and a 5.39 percent improvement in AUROC on BACE. Analysis reveals that chemist knowledge and structural features provide complementary contributions, improving both accuracy and interpretability. MolProphecy offers a practical and generalizable approach for collaborative drug discovery, with the flexibility to incorporate real chemist input in place of the current simulated proxy--without the need for model retraining. The implementation is publicly available at https://github.com/zhangruochi/MolProphecy.

URLs: https://github.com/zhangruochi/MolProphecy.

new Predictive Maintenance Optimization for Smart Vending Machines Using IoT and Machine Learning

Authors: Md. Nisharul Hasan (Department of Industrial Engineering, Lamar University, Beaumont, Texas, USA)

Abstract: The increasing proliferation of vending machines in public and commercial environments has placed a growing emphasis on operational efficiency and customer satisfaction. Traditional maintenance approaches either reactive or time-based preventive are limited in their ability to preempt machine failures, leading to unplanned downtimes and elevated service costs. This research presents a novel predictive maintenance framework tailored for vending machines by leveraging Internet of Things (IoT) sensors and machine learning (ML) algorithms. The proposed system continuously monitors machine components and operating conditions in real time and applies predictive models to forecast failures before they occur. This enables timely maintenance scheduling, minimizing downtime and extending machine lifespan. The framework was validated through simulated fault data and performance evaluation using classification algorithms. Results show a significant improvement in early fault detection and a reduction in redundant service interventions. The findings indicate that predictive maintenance systems, when integrated into vending infrastructure, can transform operational efficiency and service reliability.

new FoGE: Fock Space inspired encoding for graph prompting

Authors: Sotirios Panagiotis Chytas, Rudrasis Chakraborty, Vikas Singh

Abstract: Recent results show that modern Large Language Models (LLM) are indeed capable of understanding and answering questions about structured data such as graphs. This new paradigm can lead to solutions that require less supervision while, at the same time, providing a model that can generalize and answer questions beyond the training labels. Existing proposals often use some description of the graph to create an ``augmented'' prompt fed to the LLM. For a chosen class of graphs, if a well-tailored graph encoder is deployed to play together with a pre-trained LLM, the model can answer graph-related questions well. Existing solutions to graph-based prompts range from graph serialization to graph transformers. In this work, we show that the use of a parameter-free graph encoder based on Fock space representations, a concept borrowed from mathematical physics, is remarkably versatile in this problem setting. The simple construction, inherited directly from the theory with a few small adjustments, can provide rich and informative graph encodings, for a wide range of different graphs. We investigate the use of this idea for prefix-tuned prompts leveraging the capabilities of a pre-trained, frozen LLM. The modifications lead to a model that can answer graph-related questions -- from simple graphs to proteins to hypergraphs -- effectively and with minimal, if any, adjustments to the architecture. Our work significantly simplifies existing solutions and generalizes well to multiple different graph-based structures effortlessly.

new Frequency-Aligned Knowledge Distillation for Lightweight Spatiotemporal Forecasting

Authors: Yuqi Li, Chuanguang Yang, Hansheng Zeng, Zeyu Dong, Zhulin An, Yongjun Xu, Yingli Tian, Hao Wu

Abstract: Spatiotemporal forecasting tasks, such as traffic flow, combustion dynamics, and weather forecasting, often require complex models that suffer from low training efficiency and high memory consumption. This paper proposes a lightweight framework, Spectral Decoupled Knowledge Distillation (termed SDKD), which transfers the multi-scale spatiotemporal representations from a complex teacher model to a more efficient lightweight student network. The teacher model follows an encoder-latent evolution-decoder architecture, where its latent evolution module decouples high-frequency details and low-frequency trends using convolution and Transformer (global low-frequency modeler). However, the multi-layer convolution and deconvolution structures result in slow training and high memory usage. To address these issues, we propose a frequency-aligned knowledge distillation strategy, which extracts multi-scale spectral features from the teacher's latent space, including both high and low frequency components, to guide the lightweight student model in capturing both local fine-grained variations and global evolution patterns. Experimental results show that SDKD significantly improves performance, achieving reductions of up to 81.3% in MSE and in MAE 52.3% on the Navier-Stokes equation dataset. The framework effectively captures both high-frequency variations and long-term trends while reducing computational complexity. Our codes are available at https://github.com/itsnotacie/SDKD

URLs: https://github.com/itsnotacie/SDKD

new Beyond Parallelism: Synergistic Computational Graph Effects in Multi-Head Attention

Authors: Haitz S\'aez de Oc\'ariz Borde

Abstract: Multi-head attention powers Transformer networks, the primary deep learning architecture behind the success of large language models (LLMs). Yet, the theoretical advantages of multi-head versus single-head attention, beyond mere parallel processing, remain underexplored. In this paper, we reframe multi-head attention as a system of potentially synergistic computational graphs, where each head functions as a feedforward directed acyclic graph (DAG) with a common sink state. We provide intuition and preliminary theoretical analysis of mixing time and minimax fidelity in this framework. Our results show that multi-head attention can synergistically enhance information propagation, yielding faster mixing times and minimax fidelity amplification under specific head-diversity conditions. Finally, we train single-head and multi-head Transformers, each with the same total number of parameters, on sequence manipulation tasks and empirically verify the predicted effects.

new InvisibleInk: High-Utility and Low-Cost Text Generation with Differential Privacy

Authors: Vishnu Vinod, Krishna Pillutla, Abhradeep Guha Thakurta

Abstract: As major progress in LLM-based long-form text generation enables paradigms such as retrieval-augmented generation (RAG) and inference-time scaling, safely incorporating private information into the generation remains a critical open question. We present InvisibleInk, a highly scalable long-form text generation framework satisfying rigorous differential privacy guarantees with respect to the sensitive references. It interprets sampling from the LLM's next-token-distribution as the exponential mechanism over the LLM logits with two innovations. First, we reduce the privacy cost by isolating and clipping only the sensitive information in the model logits (relative to the public logits). Second, we improve text quality by sampling from a small superset of the top-$k$ private tokens. Empirical evaluations demonstrate a consistent $8\times$ reduction in computation cost over state-of-the-art baselines to generate long-form private text of the same utility across privacy levels. In summary, InvisibleInk is able to generate private long-form text at less than $10\times$ the computation cost of non-private generation.

new Introducing Answered with Evidence -- a framework for evaluating whether LLM responses to biomedical questions are founded in evidence

Authors: Julian D Baldwin, Christina Dinh, Arjun Mukerji, Neil Sanghavi, Saurabh Gombar

Abstract: The growing use of large language models (LLMs) for biomedical question answering raises concerns about the accuracy and evidentiary support of their responses. To address this, we present Answered with Evidence, a framework for evaluating whether LLM-generated answers are grounded in scientific literature. We analyzed thousands of physician-submitted questions using a comparative pipeline that included: (1) Alexandria, fka the Atropos Evidence Library, a retrieval-augmented generation (RAG) system based on novel observational studies, and (2) two PubMed-based retrieval-augmented systems (System and Perplexity). We found that PubMed-based systems provided evidence-supported answers for approximately 44% of questions, while the novel evidence source did so for about 50%. Combined, these sources enabled reliable answers to over 70% of biomedical queries. As LLMs become increasingly capable of summarizing scientific content, maximizing their value will require systems that can accurately retrieve both published and custom-generated evidence or generate such evidence in real time.

new Physics Augmented Machine Learning Discovery of Composition-Dependent Constitutive Laws for 3D Printed Digital Materials

Authors: Steven Yang, Michal Levin, Govinda Anantha Padmanabha, Miriam Borshevsky, Ohad Cohen, D. Thomas Seidl, Reese E. Jones, Nikolaos Bouklas, Noy Cohen

Abstract: Multi-material 3D printing, particularly through polymer jetting, enables the fabrication of digital materials by mixing distinct photopolymers at the micron scale within a single build to create a composite with tunable mechanical properties. This work presents an integrated experimental and computational investigation into the composition-dependent mechanical behavior of 3D printed digital materials. We experimentally characterize five formulations, combining soft and rigid UV-cured polymers under uniaxial tension and torsion across three strain and twist rates. The results reveal nonlinear and rate-dependent responses that strongly depend on composition. To model this behavior, we develop a physics-augmented neural network (PANN) that combines a partially input convex neural network (pICNN) for learning the composition-dependent hyperelastic strain energy function with a quasi-linear viscoelastic (QLV) formulation for time-dependent response. The pICNN ensures convexity with respect to strain invariants while allowing non-convex dependence on composition. To enhance interpretability, we apply $L_0$ sparsification. For the time-dependent response, we introduce a multilayer perceptron (MLP) to predict viscoelastic relaxation parameters from composition. The proposed model accurately captures the nonlinear, rate-dependent behavior of 3D printed digital materials in both uniaxial tension and torsion, achieving high predictive accuracy for interpolated material compositions. This approach provides a scalable framework for automated, composition-aware constitutive model discovery for multi-material 3D printing.

new MedGround-R1: Advancing Medical Image Grounding via Spatial-Semantic Rewarded Group Relative Policy Optimization

Authors: Huihui Xu, Yuanpeng Nie, Hualiang Wang, Ying Chen, Wei Li, Junzhi Ning, Lihao Liu, Hongqiu Wang, Lei Zhu, Jiyao Liu, Xiaomeng Li, Junjun He

Abstract: Medical Image Grounding (MIG), which involves localizing specific regions in medical images based on textual descriptions, requires models to not only perceive regions but also deduce spatial relationships of these regions. Existing Vision-Language Models (VLMs) for MIG often rely on Supervised Fine-Tuning (SFT) with large amounts of Chain-of-Thought (CoT) reasoning annotations, which are expensive and time-consuming to acquire. Recently, DeepSeek-R1 demonstrated that Large Language Models (LLMs) can acquire reasoning abilities through Group Relative Policy Optimization (GRPO) without requiring CoT annotations. In this paper, we adapt the GRPO reinforcement learning framework to VLMs for Medical Image Grounding. We propose the Spatial-Semantic Rewarded Group Relative Policy Optimization to train the model without CoT reasoning annotations. Specifically, we introduce Spatial-Semantic Rewards, which combine spatial accuracy reward and semantic consistency reward to provide nuanced feedback for both spatially positive and negative completions. Additionally, we propose to use the Chain-of-Box template, which integrates visual information of referring bounding boxes into the reasoning process, enabling the model to explicitly reason about spatial regions during intermediate steps. Experiments on three datasets MS-CXR, ChestX-ray8, and M3D-RefSeg demonstrate that our method achieves state-of-the-art performance in Medical Image Grounding. Ablation studies further validate the effectiveness of each component in our approach. Code, checkpoints, and datasets are available at https://github.com/bio-mlhui/MedGround-R1

URLs: https://github.com/bio-mlhui/MedGround-R1

new What to Do Next? Memorizing skills from Egocentric Instructional Video

Authors: Jing Bi, Chenliang Xu

Abstract: Learning to perform activities through demonstration requires extracting meaningful information about the environment from observations. In this research, we investigate the challenge of planning high-level goal-oriented actions in a simulation setting from an egocentric perspective. We present a novel task, interactive action planning, and propose an approach that combines topological affordance memory with transformer architecture. The process of memorizing the environment's structure through extracting affordances facilitates selecting appropriate actions based on the context. Moreover, the memory model allows us to detect action deviations while accomplishing specific objectives. To assess the method's versatility, we evaluate it in a realistic interactive simulation environment. Our experimental results demonstrate that the proposed approach learns meaningful representations, resulting in improved performance and robust when action deviations occur.

new A Weakly Supervised Transformer to Support Rare Disease Diagnosis from Electronic Health Records: Methods and Applications in Rare Pulmonary Disease

Authors: Kimberly F. Greco, Zongxin Yang, Mengyan Li, Han Tong, Sara Morini Sweet, Alon Geva, Kenneth D. Mandl, Benjamin A. Raby, Tianxi Cai

Abstract: Rare diseases affect an estimated 300-400 million people worldwide, yet individual conditions often remain poorly characterized and difficult to diagnose due to their low prevalence and limited clinician familiarity. While computational phenotyping algorithms show promise for automating rare disease detection, their development is hindered by the scarcity of labeled data and biases in existing label sources. Gold-standard labels from registries and expert chart reviews are highly accurate but constrained by selection bias and the cost of manual review. In contrast, labels derived from electronic health records (EHRs) cover a broader range of patients but can introduce substantial noise. To address these challenges, we propose a weakly supervised, transformer-based framework that combines a small set of gold-standard labels with a large volume of iteratively updated silver-standard labels derived from EHR data. This hybrid approach enables the training of a highly accurate and generalizable phenotyping model that scales rare disease detection beyond the scope of individual clinical expertise. Our method is initialized by learning embeddings of medical concepts based on their semantic meaning or co-occurrence patterns in EHRs, which are then refined and aggregated into patient-level representations via a multi-layer transformer architecture. Using two rare pulmonary diseases as a case study, we validate our model on EHR data from Boston Children's Hospital. Our framework demonstrates notable improvements in phenotype classification, identification of clinically meaningful subphenotypes through patient clustering, and prediction of disease progression compared to baseline methods. These results highlight the potential of our approach to enable scalable identification and stratification of rare disease patients for clinical care and research applications.

new Learning Beyond Euclid: Curvature-Adaptive Generalization for Neural Networks on Manifolds

Authors: Krisanu Sarkar

Abstract: In this work, we develop new generalization bounds for neural networks trained on data supported on Riemannian manifolds. Existing generalization theories often rely on complexity measures derived from Euclidean geometry, which fail to account for the intrinsic structure of non-Euclidean spaces. Our analysis introduces a geometric refinement: we derive covering number bounds that explicitly incorporate manifold-specific properties such as sectional curvature, volume growth, and injectivity radius. These geometric corrections lead to sharper Rademacher complexity bounds for classes of Lipschitz neural networks defined on compact manifolds. The resulting generalization guarantees recover standard Euclidean results when curvature is zero but improve substantially in settings where the data lies on curved, low-dimensional manifolds embedded in high-dimensional ambient spaces. We illustrate the tightness of our bounds in negatively curved spaces, where the exponential volume growth leads to provably higher complexity, and in positively curved spaces, where the curvature acts as a regularizing factor. This framework provides a principled understanding of how intrinsic geometry affects learning capacity, offering both theoretical insight and practical implications for deep learning on structured data domains.

new Completion of the DrugMatrix Toxicogenomics Database using 3-Dimensional Tensors

Authors: Tan Nguyen, Guojing Cong

Abstract: We explore applying a tensor completion approach to complete the DrugMatrix toxicogenomics dataset. Our hypothesis is that by preserving the 3-dimensional structure of the data, which comprises tissue, treatment, and transcriptomic measurements, and by leveraging a machine learning formulation, our approach will improve upon prior state-of-the-art results. Our results demonstrate that the new tensor-based method more accurately reflects the original data distribution and effectively captures organ-specific variability. The proposed tensor-based methodology achieved lower mean squared errors and mean absolute errors compared to both conventional Canonical Polyadic decomposition and 2-dimensional matrix factorization methods. In addition, our non-negative tensor completion implementation reveals relationships among tissues. Our findings not only complete the world's largest in-vivo toxicogenomics database with improved accuracy but also offer a promising methodology for future studies of drugs that may cross species barriers, for example, from rats to humans.

new Generalized Adaptive Transfer Network: Enhancing Transfer Learning in Reinforcement Learning Across Domains

Authors: Abhishek Verma, Nallarasan V, Balaraman Ravindran

Abstract: Transfer learning in Reinforcement Learning (RL) enables agents to leverage knowledge from source tasks to accelerate learning in target tasks. While prior work, such as the Attend, Adapt, and Transfer (A2T) framework, addresses negative transfer and selective transfer, other critical challenges remain underexplored. This paper introduces the Generalized Adaptive Transfer Network (GATN), a deep RL architecture designed to tackle task generalization across domains, robustness to environmental changes, and computational efficiency in transfer. GATN employs a domain-agnostic representation module, a robustness-aware policy adapter, and an efficient transfer scheduler to achieve these goals. We evaluate GATN on diverse benchmarks, including Atari 2600, MuJoCo, and a custom chatbot dialogue environment, demonstrating superior performance in cross-domain generalization, resilience to dynamic environments, and reduced computational overhead compared to baselines. Our findings suggest GATN is a versatile framework for real-world RL applications, such as adaptive chatbots and robotic control.

new Deep Learning-Based Forecasting of Hotel KPIs: A Cross-City Analysis of Global Urban Markets

Authors: C. J. Atapattu, Xia Cui, N. R Abeynayake

Abstract: This study employs Long Short-Term Memory (LSTM) networks to forecast key performance indicators (KPIs), Occupancy (OCC), Average Daily Rate (ADR), and Revenue per Available Room (RevPAR), across five major cities: Manchester, Amsterdam, Dubai, Bangkok, and Mumbai. The cities were selected for their diverse economic profiles and hospitality dynamics. Monthly data from 2018 to 2025 were used, with 80% for training and 20% for testing. Advanced time series decomposition and machine learning techniques enabled accurate forecasting and trend identification. Results show that Manchester and Mumbai exhibited the highest predictive accuracy, reflecting stable demand patterns, while Dubai and Bangkok demonstrated higher variability due to seasonal and event-driven influences. The findings validate the effectiveness of LSTM models for urban hospitality forecasting and provide a comparative framework for data-driven decision-making. The models generalisability across global cities highlights its potential utility for tourism stakeholders and urban planners.

new On the Mathematical Impossibility of Safe Universal Approximators

Authors: Jasper Yao

Abstract: We establish fundamental mathematical limits on universal approximation theorem (UAT) system alignment by proving that catastrophic failures are an inescapable feature of any useful computational system. Our central thesis is that for any universal approximator, the expressive power required for useful computation is inextricably linked to a dense set of instabilities that make perfect, reliable control a mathematical impossibility. We prove this through a three-level argument that leaves no escape routes for any class of universal approximator architecture. i) Combinatorial Necessity: For the vast majority of practical universal approximators (e.g., those using ReLU activations), we prove that the density of catastrophic failure points is directly proportional to the network's expressive power. ii) Topological Necessity: For any theoretical universal approximator, we use singularity theory to prove that the ability to approximate generic functions requires the ability to implement the dense, catastrophic singularities that characterize them. iii) Empirical Necessity: We prove that the universal existence of adversarial examples is empirical evidence that real-world tasks are themselves catastrophic, forcing any successful model to learn and replicate these instabilities. These results, combined with a quantitative "Impossibility Sandwich" showing that the minimum complexity for usefulness exceeds the maximum complexity for safety, demonstrate that perfect alignment is not an engineering challenge but a mathematical impossibility. This foundational result reframes UAT safety from a problem of "how to achieve perfect control" to one of "how to operate safely in the presence of irreducible uncontrollability," with profound implications for the future of UAT development and governance.

new Rethinking Data Protection in the (Generative) Artificial Intelligence Era

Authors: Yiming Li, Shuo Shao, Yu He, Junfeng Guo, Tianwei Zhang, Zhan Qin, Pin-Yu Chen, Michael Backes, Philip Torr, Dacheng Tao, Kui Ren

Abstract: The (generative) artificial intelligence (AI) era has profoundly reshaped the meaning and value of data. No longer confined to static content, data now permeates every stage of the AI lifecycle from the training samples that shape model parameters to the prompts and outputs that drive real-world model deployment. This shift renders traditional notions of data protection insufficient, while the boundaries of what needs safeguarding remain poorly defined. Failing to safeguard data in AI systems can inflict societal and individual, underscoring the urgent need to clearly delineate the scope of and rigorously enforce data protection. In this perspective, we propose a four-level taxonomy, including non-usability, privacy preservation, traceability, and deletability, that captures the diverse protection needs arising in modern (generative) AI models and systems. Our framework offers a structured understanding of the trade-offs between data utility and control, spanning the entire AI pipeline, including training datasets, model weights, system prompts, and AI-generated content. We analyze representative technical approaches at each level and reveal regulatory blind spots that leave critical assets exposed. By offering a structured lens to align future AI technologies and governance with trustworthy data practices, we underscore the urgency of rethinking data protection for modern AI techniques and provide timely guidance for developers, researchers, and regulators alike.

new Adaptive Cubic Regularized Second-Order Latent Factor Analysis Model

Authors: Jialiang Wang, Junzhou Wang, Xin Liao

Abstract: High-dimensional and incomplete (HDI) data, characterized by massive node interactions, have become ubiquitous across various real-world applications. Second-order latent factor models have shown promising performance in modeling this type of data. Nevertheless, due to the bilinear and non-convex nature of the SLF model's objective function, incorporating a damping term into the Hessian approximation and carefully tuning associated parameters become essential. To overcome these challenges, we propose a new approach in this study, named the adaptive cubic regularized second-order latent factor analysis (ACRSLF) model. The proposed ACRSLF adopts the two-fold ideas: 1) self-tuning cubic regularization that dynamically mitigates non-convex optimization instabilities; 2) multi-Hessian-vector product evaluation during conjugate gradient iterations for precise second-order information assimilation. Comprehensive experiments on two industrial HDI datasets demonstrate that the ACRSLF converges faster and achieves higher representation accuracy than the advancing optimizer-based LFA models.

new Optimas: Optimizing Compound AI Systems with Globally Aligned Local Rewards

Authors: Shirley Wu, Parth Sarthi, Shiyu Zhao, Aaron Lee, Herumb Shandilya, Adrian Mladenic Grobelnik, Nurendra Choudhary, Eddie Huang, Karthik Subbian, Linjun Zhang, Diyi Yang, James Zou, Jure Leskovec

Abstract: Compound AI systems integrating multiple components, such as Large Language Models, specialized tools, and traditional machine learning models, are increasingly deployed to solve complex real-world tasks. However, optimizing compound systems remains challenging due to their non-differentiable structures and diverse configuration types across components, including prompts, hyperparameters, and model parameters. To address this challenge, we propose Optimas, a unified framework for effective optimization of compound systems. The core idea of Optimas is to maintain one Local Reward Function (LRF) per component, each satisfying a local-global alignment property, i.e., each component's local reward correlates with the global system performance. In each iteration, Optimas efficiently adapts the LRFs to maintain this property while simultaneously maximizing each component's local reward. This approach enables independent updates of heterogeneous configurations using the designated optimization method, while ensuring that local improvements consistently lead to performance gains. We present extensive evaluations across five real-world compound systems to demonstrate that Optimas outperforms strong baselines by an average improvement of 11.92%, offering a general and effective approach for improving compound systems. Our website is at https://optimas.stanford.edu.

URLs: https://optimas.stanford.edu.

new Optimisation Is Not What You Need

Authors: Alfredo Ibias

Abstract: The Artificial Intelligence field has focused on developing optimisation methods to solve multiple problems, specifically problems that we thought to be only solvable through cognition. The obtained results have been outstanding, being able to even surpass the Turing Test. However, we have found that these optimisation methods share some fundamental flaws that impede them to become a true artificial cognition. Specifically, the field have identified catastrophic forgetting as a fundamental problem to develop such cognition. This paper formally proves that this problem is inherent to optimisation methods, and as such it will always limit approaches that try to solve the Artificial General Intelligence problem as an optimisation problem. Additionally, it addresses the problem of overfitting and discuss about other smaller problems that optimisation methods pose. Finally, it empirically shows how world-modelling methods avoid suffering from either problem. As a conclusion, the field of Artificial Intelligence needs to look outside the machine learning field to find methods capable of developing an artificial cognition.

new Monitoring of Static Fairness

Authors: Thomas A. Henzinger, Mahyar Karimi, Konstantin Kueffner, Kaushik Mallik

Abstract: Machine-learned systems are in widespread use for making decisions about humans, and it is important that they are fair, i.e., not biased against individuals based on sensitive attributes. We present a general framework of runtime verification of algorithmic fairness for systems whose models are unknown, but are assumed to have a Markov chain structure, with or without full observation of the state space. We introduce a specification language that can model many common algorithmic fairness properties, such as demographic parity, equal opportunity, and social burden. We build monitors that observe a long sequence of events as generated by a given system, and output, after each observation, a quantitative estimate of how fair or biased the system was on that run until that point in time. The estimate is proven to be correct modulo a variable error bound and a given confidence level, where the error bound gets tighter as the observed sequence gets longer. We present two categories of monitoring algorithms, namely ones with a uniform error bound across all time points, and ones with weaker non-uniform, pointwise error bounds at different time points. Our monitoring algorithms use statistical tools that are adapted to suit the dynamic requirements of monitoring and the special needs of the fairness specifications. Using a prototype implementation, we show how we can monitor if a bank is fair in giving loans to applicants from different social backgrounds, and if a college is fair in admitting students while maintaining a reasonable financial burden on the society. In these experiments, our monitors took less than a millisecond to update their verdicts after each observation.

new From 2:4 to 8:16 sparsity patterns in LLMs for Outliers and Weights with Variance Correction

Authors: Egor Maximov, Yulia Kuzkina, Azamat Kanametov, Alexander Prutko, Aleksei Goncharov, Maxim Zhelnin, Egor Shvetsov

Abstract: As large language models (LLMs) grow in size, efficient compression techniques like quantization and sparsification are critical. While quantization maintains performance with reduced precision, structured sparsity methods, such as N:M sparsification, often fall short due to limited flexibility, and sensitivity to outlier weights. We explore 8:16 semi-structured sparsity, demonstrating its ability to surpass the Performance Threshold-where a compressed model matches the accuracy of its uncompressed or smaller counterpart under equivalent memory constraints. Compared to 2:4 sparsity, 8:16 offers greater flexibility with minimal storage overhead (0.875 vs. 0.75 bits/element). We also apply sparse structured patterns for salient weights, showing that structured sparsity for outliers is competitive with unstructured approaches leading to equivalent or better results. Finally, we demonstrate that simple techniques such as variance correction and SmoothQuant like weight equalization improve sparse models performance.

new Automated Grading of Students' Handwritten Graphs: A Comparison of Meta-Learning and Vision-Large Language Models

Authors: Behnam Parsaeifard, Martin Hlosta, Per Bergamin

Abstract: With the rise of online learning, the demand for efficient and consistent assessment in mathematics has significantly increased over the past decade. Machine Learning (ML), particularly Natural Language Processing (NLP), has been widely used for autograding student responses, particularly those involving text and/or mathematical expressions. However, there has been limited research on autograding responses involving students' handwritten graphs, despite their prevalence in Science, Technology, Engineering, and Mathematics (STEM) curricula. In this study, we implement multimodal meta-learning models for autograding images containing students' handwritten graphs and text. We further compare the performance of Vision Large Language Models (VLLMs) with these specially trained metalearning models. Our results, evaluated on a real-world dataset collected from our institution, show that the best-performing meta-learning models outperform VLLMs in 2-way classification tasks. In contrast, in more complex 3-way classification tasks, the best-performing VLLMs slightly outperform the meta-learning models. While VLLMs show promising results, their reliability and practical applicability remain uncertain and require further investigation.

new BERT4Traj: Transformer Based Trajectory Reconstruction for Sparse Mobility Data

Authors: Hao Yang, Angela Yao, Christopher Whalen, Gengchen Mai

Abstract: Understanding human mobility is essential for applications in public health, transportation, and urban planning. However, mobility data often suffers from sparsity due to limitations in data collection methods, such as infrequent GPS sampling or call detail record (CDR) data that only capture locations during communication events. To address this challenge, we propose BERT4Traj, a transformer based model that reconstructs complete mobility trajectories by predicting hidden visits in sparse movement sequences. Inspired by BERT's masked language modeling objective and self_attention mechanisms, BERT4Traj leverages spatial embeddings, temporal embeddings, and contextual background features such as demographics and anchor points. We evaluate BERT4Traj on real world CDR and GPS datasets collected in Kampala, Uganda, demonstrating that our approach significantly outperforms traditional models such as Markov Chains, KNN, RNNs, and LSTMs. Our results show that BERT4Traj effectively reconstructs detailed and continuous mobility trajectories, enhancing insights into human movement patterns.

new Cycle-Consistent Helmholtz Machine: Goal-Seeded Simulation via Inverted Inference

Authors: Xin Li

Abstract: The Helmholtz Machine (HM) is a foundational architecture for unsupervised learning, coupling a bottom-up recognition model with a top-down generative model through alternating inference. However, its reliance on symmetric, data-driven updates constrains its ability to perform goal-directed reasoning or simulate temporally extended processes. In this work, we introduce the \emph{Cycle-Consistent Helmholtz Machine} (C$^2$HM), a novel extension that reframes inference as a \emph{goal-seeded}, \emph{asymmetric} process grounded in structured internal priors. Rather than inferring latent causes solely from sensory data, C$^2$HM simulates plausible latent trajectories conditioned on abstract goals, aligning them with observed outcomes through a recursive cycle of forward generation and inverse refinement. This cycle-consistent formulation integrates top-down structure with bottom-up evidence via a variational loop, enforcing mutual alignment between goal-conditioned latent predictions and recognition-based reconstructions. We formalize this mechanism within the framework of the \emph{Context-Content Uncertainty Principle} (CCUP), which posits that inference proceeds by aligning structured, low-entropy content with high-entropy, ambiguous context. C$^2$HM improves representational efficiency, supports memory chaining via path-dependent inference, and enables spatial compositional imagination. By offering a biologically inspired alternative to classical amortized inference, $C^2$HM reconceives generative modeling as intentional simulation, bridging memory-based planning and unsupervised learning in a unified probabilistic framework.

new Mitigating Goal Misgeneralization with Minimax Regret

Authors: Karim Abdel Sadek, Matthew Farrugia-Roberts, Usman Anwar, Hannah Erlebach, Christian Schroeder de Witt, David Krueger, Michael Dennis

Abstract: Safe generalization in reinforcement learning requires not only that a learned policy acts capably in new situations, but also that it uses its capabilities towards the pursuit of the designer's intended goal. The latter requirement may fail when a proxy goal incentivizes similar behavior to the intended goal within the training environment, but not in novel deployment environments. This creates the risk that policies will behave as if in pursuit of the proxy goal, rather than the intended goal, in deployment -- a phenomenon known as goal misgeneralization. In this paper, we formalize this problem setting in order to theoretically study the possibility of goal misgeneralization under different training objectives. We show that goal misgeneralization is possible under approximate optimization of the maximum expected value (MEV) objective, but not the minimax expected regret (MMER) objective. We then empirically show that the standard MEV-based training method of domain randomization exhibits goal misgeneralization in procedurally-generated grid-world environments, whereas current regret-based unsupervised environment design (UED) methods are more robust to goal misgeneralization (though they don't find MMER policies in all cases). Our findings suggest that minimax expected regret is a promising approach to mitigating goal misgeneralization.

new Neural Dynamic Modes: Computational Imaging of Dynamical Systems from Sparse Observations

Authors: Ali SaraerToosi, Renbo Tu, Kamyar Azizzadenesheli, Aviad Levis

Abstract: Dynamical systems are ubiquitous within science and engineering, from turbulent flow across aircraft wings to structural variability of proteins. Although some systems are well understood and simulated, scientific imaging often confronts never-before-seen dynamics observed through indirect, noisy, and highly sparse measurements. We present NeuralDMD, a model-free framework that combines neural implicit representations with Dynamic Mode Decomposition (DMD) to reconstruct continuous spatio-temporal dynamics from such measurements. The expressiveness of neural representations enables capturing complex spatial structures, while the linear dynamical modes of DMD introduce an inductive bias that guides training and supports stable, low-dimensional representations and forecasting. We validate NeuralDMD on two real-world problems: reconstructing near-surface wind-speed fields over North America from sparse station observations, and recovering the evolution of plasma near the Galactic-center black hole, Sgr A*. In both cases, NeuralDMD outperforms established baselines, demonstrating its potential as a general tool for imaging dynamical systems across geoscience, astronomy, and beyond.

new SymMatika: Structure-Aware Symbolic Discovery

Authors: Michael Scherk, Boyuan Chen

Abstract: Symbolic regression (SR) seeks to recover closed-form mathematical expressions that describe observed data. While existing methods have advanced the discovery of either explicit mappings (i.e., $y = f(\mathbf{x})$) or discovering implicit relations (i.e., $F(\mathbf{x}, y)=0$), few modern and accessible frameworks support both. Moreover, most approaches treat each expression candidate in isolation, without reusing recurring structural patterns that could accelerate search. We introduce SymMatika, a hybrid SR algorithm that combines multi-island genetic programming (GP) with a reusable motif library inspired by biological sequence analysis. SymMatika identifies high-impact substructures in top-performing candidates and reintroduces them to guide future generations. Additionally, it incorporates a feedback-driven evolutionary engine and supports both explicit and implicit relation discovery using implicit-derivative metrics. Across benchmarks, SymMatika achieves state-of-the-art recovery rates, achieving 5.1% higher performance than the previous best results on Nguyen, the first recovery of Nguyen-12, and competitive performance on the Feynman equations. It also recovers implicit physical laws from Eureqa datasets up to $100\times$ faster. Our results demonstrate the power of structure-aware evolutionary search for scientific discovery. To support broader research in interpretable modeling and symbolic discovery, we have open-sourced the full SymMatika framework.

new BLaST: High Performance Inference and Pretraining using BLock Sparse Transformers

Authors: Patrik Okanovic, Sameer Deshmukh, Grzegorz Kwasniewski, Kentaro Katayama, Takumi Honda, Maciej Besta, Torsten Hoefler

Abstract: The energy consumption of large-scale ML models is dominated by data movement - shuffling billions of parameters across memory hierarchies and data centers. Effective sparsification to prune redundant parameters is still challenging: existing methods incur significant accuracy degradation, performance overhead, or both. We introduce (Bl)ock (a)nd (S)parse (T)ransformers (BLaST), a general, robust, and reliable sparsification method applicable to linear layers in all settings. Our method iteratively sparsifies weight matrices into a block sparsity pattern suitable for efficient sparse matrix-matrix (SpMM) multiplication. BLaST achieves up to 95% sparsity in MLP weights with negligible accuracy loss. Our fused, highly optimized Sparse MLP kernel delivers up to 16.7x speedup over dense MLPs across 9 architectures and 8 datasets, resulting in up to 1.6x inference speedup, 1.11x pretraining speedup and up to 3.12x inference memory usage reduction. BLaST enables the next generation of large-scale AI systems by reducing energy use, memory footprint, and latency.

new Neural-Network solver of ideal MHD equilibria

Authors: Timo Thun, Andrea Merlo, Rory Conlin, Dario Panici, Daniel B\"ockenhoff

Abstract: We present a novel approach to compute three-dimensional Magnetohydrodynamic equilibria by parametrizing Fourier modes with artificial neural networks and compare it to equilibria computed by conventional solvers. The full nonlinear global force residual across the volume in real space is then minimized with first order optimizers. Already,we observe competitive computational cost to arrive at the same minimum residuals computed by existing codes. With increased computational cost,lower minima of the residual are achieved by the neural networks,establishing a new lower bound for the force residual. We use minimally complex neural networks,and we expect significant improvements for solving not only single equilibria with neural networks,but also for computing neural network models valid over continuous distributions of equilibria.

new How Overconfidence in Initial Choices and Underconfidence Under Criticism Modulate Change of Mind in Large Language Models

Authors: Dharshan Kumaran, Stephen M Fleming, Larisa Markeeva, Joe Heyward, Andrea Banino, Mrinal Mathur, Razvan Pascanu, Simon Osindero, Benedetto de Martino, Petar Velickovic, Viorica Patraucean

Abstract: Large language models (LLMs) exhibit strikingly conflicting behaviors: they can appear steadfastly overconfident in their initial answers whilst at the same time being prone to excessive doubt when challenged. To investigate this apparent paradox, we developed a novel experimental paradigm, exploiting the unique ability to obtain confidence estimates from LLMs without creating memory of their initial judgments -- something impossible in human participants. We show that LLMs -- Gemma 3, GPT4o and o1-preview -- exhibit a pronounced choice-supportive bias that reinforces and boosts their estimate of confidence in their answer, resulting in a marked resistance to change their mind. We further demonstrate that LLMs markedly overweight inconsistent compared to consistent advice, in a fashion that deviates qualitatively from normative Bayesian updating. Finally, we demonstrate that these two mechanisms -- a drive to maintain consistency with prior commitments and hypersensitivity to contradictory feedback -- parsimoniously capture LLM behavior in a different domain. Together, these findings furnish a mechanistic account of LLM confidence that explains both their stubbornness and excessive sensitivity to criticism.

new Set Valued Predictions For Robust Domain Generalization

Authors: Ron Tsibulsky, Daniel Nevo, Uri Shalit

Abstract: Despite the impressive advancements in modern machine learning, achieving robustness in Domain Generalization (DG) tasks remains a significant challenge. In DG, models are expected to perform well on samples from unseen test distributions (also called domains), by learning from multiple related training distributions. Most existing approaches to this problem rely on single-valued predictions, which inherently limit their robustness. We argue that set-valued predictors could be leveraged to enhance robustness across unseen domains, while also taking into account that these sets should be as small as possible. We introduce a theoretical framework defining successful set prediction in the DG setting, focusing on meeting a predefined performance criterion across as many domains as possible, and provide theoretical insights into the conditions under which such domain generalization is achievable. We further propose a practical optimization method compatible with modern learning architectures, that balances robust performance on unseen domains with small prediction set sizes. We evaluate our approach on several real-world datasets from the WILDS benchmark, demonstrating its potential as a promising direction for robust domain generalization.

new HGCA: Hybrid GPU-CPU Attention for Long Context LLM Inference

Authors: Weishu Deng, Yujie Yang, Peiran Du, Lingfeng Xiang, Zhen Lin, Chen Zhong, Song Jiang, Hui Lu, Jia Rao

Abstract: Scaling inference for large language models (LLMs) is increasingly constrained by limited GPU memory, especially due to growing key-value (KV) caches required for long-context generation. While existing approaches offload KV caches to CPU memory or apply sparse attention to reduce GPU load, they often underutilize CPU compute resources and compromise accuracy. We present HGCA, a hybrid CPU-GPU attention mechanism that enables scalable, high-throughput LLM inference with near-full attention quality. HGCA performs dense attention on recently generated KV entries retained in GPU memory and parallel sparse attention on selected, salient KV entries in CPU memory. The attention outputs are efficiently merged using log-sum-exp fusion, minimizing PCIe transfer overhead. HGCA also introduces a finegrained, per-head sparsification strategy optimized for CPU execution, preserving contextual relevance while reducing computation. Our implementation seamlessly integrates into existing LLM frameworks without requiring model retraining. Experiments across diverse models and workloads show that HGCA achieves superior scalability, supports longer sequences and larger batch sizes, and outperforms existing sparse attention baselines in both performance and accuracy -- all on commodity GPU hardware.

new MathOptAI.jl: Embed trained machine learning predictors into JuMP models

Authors: Oscar Dowson, Robert B Parker, Russel Bent

Abstract: We present \texttt{MathOptAI.jl}, an open-source Julia library for embedding trained machine learning predictors into a JuMP model. \texttt{MathOptAI.jl} can embed a wide variety of neural networks, decision trees, and Gaussian Processes into a larger mathematical optimization model. In addition to interfacing a range of Julia-based machine learning libraries such as \texttt{Lux.jl} and \texttt{Flux.jl}, \texttt{MathOptAI.jl} uses Julia's Python interface to provide support for PyTorch models. When the PyTorch support is combined with \texttt{MathOptAI.jl}'s gray-box formulation, the function, Jacobian, and Hessian evaluations associated with the PyTorch model are offloaded to the GPU in Python, while the rest of the nonlinear oracles are evaluated on the CPU in Julia. \MathOptAI is available at https://github.com/lanl-ansi/MathOptAI.jl under a BSD-3 license.

URLs: https://github.com/lanl-ansi/MathOptAI.jl

new PiCME: Pipeline for Contrastive Modality Evaluation and Encoding in the MIMIC Dataset

Authors: Michal Golovanevsky, Pranav Mahableshwarkar, Carsten Eickhoff, Ritambhara Singh

Abstract: Multimodal deep learning holds promise for improving clinical prediction by integrating diverse patient data, including text, imaging, time-series, and structured demographics. Contrastive learning facilitates this integration by producing a unified representation that can be reused across tasks, reducing the need for separate models or encoders. Although contrastive learning has seen success in vision-language domains, its use in clinical settings remains largely limited to image and text pairs. We propose the Pipeline for Contrastive Modality Evaluation and Encoding (PiCME), which systematically assesses five clinical data types from MIMIC: discharge summaries, radiology reports, chest X-rays, demographics, and time-series. We pre-train contrastive models on all 26 combinations of two to five modalities and evaluate their utility on in-hospital mortality and phenotype prediction. To address performance plateaus with more modalities, we introduce a Modality-Gated LSTM that weights each modality according to its contrastively learned importance. Our results show that contrastive models remain competitive with supervised baselines, particularly in three-modality settings. Performance declines beyond three modalities, which supervised models fail to recover. The Modality-Gated LSTM mitigates this drop, improving AUROC from 73.19% to 76.93% and AUPRC from 51.27% to 62.26% in the five-modality setting. We also compare contrastively learned modality importance scores with attribution scores and evaluate generalization across demographic subgroups, highlighting strengths in interpretability and fairness. PiCME is the first to scale contrastive learning across all modality combinations in MIMIC, offering guidance for modality selection, training strategies, and equitable clinical prediction.

new Adopting a human developmental visual diet yields robust, shape-based AI vision

Authors: Zejin Lu, Sushrut Thorat, Radoslaw M Cichy, Tim C Kietzmann

Abstract: Despite years of research and the dramatic scaling of artificial intelligence (AI) systems, a striking misalignment between artificial and human vision persists. Contrary to humans, AI heavily relies on texture-features rather than shape information, lacks robustness to image distortions, remains highly vulnerable to adversarial attacks, and struggles to recognise simple abstract shapes within complex backgrounds. To close this gap, we here introduce a solution that arises from a previously underexplored direction: rather than scaling up, we take inspiration from how human vision develops from early infancy into adulthood. We quantified the visual maturation by synthesising decades of psychophysical and neurophysiological research into a novel developmental visual diet (DVD) for AI vision. We show that guiding AI systems through this human-inspired curriculum produces models that closely align with human behaviour on every hallmark of robust vision tested yielding the strongest reported reliance on shape information to date, abstract shape recognition beyond the state of the art, higher robustness to image corruptions, and stronger resilience to adversarial attacks. By outperforming high parameter AI foundation models trained on orders of magnitude more data, we provide evidence that robust AI vision can be achieved by guiding the way how a model learns, not merely how much it learns, offering a resource-efficient route toward safer and more human-like artificial visual systems.

new Latent Thermodynamic Flows: Unified Representation Learning and Generative Modeling of Temperature-Dependent Behaviors from Limited Data

Authors: Yunrui Qiu, Richard John, Lukas Herron, Pratyush Tiwary

Abstract: Accurate characterization of the equilibrium distributions of complex molecular systems and their dependence on environmental factors such as temperature is essential for understanding thermodynamic properties and transition mechanisms. Projecting these distributions onto meaningful low-dimensional representations enables interpretability and downstream analysis. Recent advances in generative AI, particularly flow models such as Normalizing Flows (NFs), have shown promise in modeling such distributions, but their scope is limited without tailored representation learning. In this work, we introduce Latent Thermodynamic Flows (LaTF), an end-to-end framework that tightly integrates representation learning and generative modeling. LaTF unifies the State Predictive Information Bottleneck (SPIB) with NFs to simultaneously learn low-dimensional latent representations, referred to as Collective Variables (CVs), classify metastable states, and generate equilibrium distributions across temperatures beyond the training data. The two components of representation learning and generative modeling are optimized jointly, ensuring that the learned latent features capture the system's slow, important degrees of freedom while the generative model accurately reproduces the system's equilibrium behavior. We demonstrate LaTF's effectiveness across diverse systems, including a model potential, the Chignolin protein, and cluster of Lennard Jones particles, with thorough evaluations and benchmarking using multiple metrics and extensive simulations. Finally, we apply LaTF to a RNA tetraloop system, where despite using simulation data from only two temperatures, LaTF reconstructs the temperature-dependent structural ensemble and melting behavior, consistent with experimental and prior extensive computational results.

new Understanding Knowledge Transferability for Transfer Learning: A Survey

Authors: Haohua Wang, Jingge Wang, Zijie Zhao, Yang Tan, Yanru Wu, Hanbing Liu, Jingyun Yang, Enming Zhang, Xiangyu Chen, Zhengze Rong, Shanxin Guo, Yang Li

Abstract: Transfer learning has become an essential paradigm in artificial intelligence, enabling the transfer of knowledge from a source task to improve performance on a target task. This approach, particularly through techniques such as pretraining and fine-tuning, has seen significant success in fields like computer vision and natural language processing. However, despite its widespread use, how to reliably assess the transferability of knowledge remains a challenge. Understanding the theoretical underpinnings of each transferability metric is critical for ensuring the success of transfer learning. In this survey, we provide a unified taxonomy of transferability metrics, categorizing them based on transferable knowledge types and measurement granularity. This work examines the various metrics developed to evaluate the potential of source knowledge for transfer learning and their applicability across different learning paradigms emphasizing the need for careful selection of these metrics. By offering insights into how different metrics work under varying conditions, this survey aims to guide researchers and practitioners in selecting the most appropriate metric for specific applications, contributing to more efficient, reliable, and trustworthy AI systems. Finally, we discuss some open challenges in this field and propose future research directions to further advance the application of transferability metrics in trustworthy transfer learning.

new DistZO2: High-Throughput and Memory-Efficient Zeroth-Order Fine-tuning LLMs with Distributed Parallel Computing

Authors: Liangyu Wang, Huanyi Xie, Di Wang

Abstract: Fine-tuning large language models (LLMs) remains resource-intensive due to their sheer scale. While zeroth-order (ZO) optimization provides a memory-efficient alternative by eliminating backward passes, its application to multi-hundred-billion-parameter models is constrained by GPU memory and compute throughput. The ZO2 framework addresses the memory bottleneck by offloading model parameters to CPU memory and overlapping transformer block transfer with dual forward computation on a single GPU. However, ZO2 remains limited by its single-device execution and achieves modest throughput. In this work, we present DistZO2, a high-throughput, memory-efficient framework for distributed zeroth-order fine-tuning of LLMs. DistZO2 introduces three parallel strategies: (1) Perturbation Parallelism (PertP), which parallelizes the two perturbed forward passes across devices; (2) Distributed Data Parallelism (DDP), adapted to the scalar-gradient nature of ZO training; and (3) a unified 2D Parallelism design that combines PertP and DDP. To further mitigate communication bottlenecks introduced by parameter offloading, we propose a hardware-aware communication strategy that slices parameter blocks and redistributes them across GPUs via high-speed interconnects such as NVLink. DistZO2 scales zeroth-order fine-tuning to modern multi-GPU systems, preserving ZO2's memory efficiency while substantially improving training throughput. In our experiments on OPT-175B, DistZO2 achieves a 3x speedup over ZO2 with distributed computing. DistZO2's code has been open-sourced in https://github.com/liangyuwang/zo2.

URLs: https://github.com/liangyuwang/zo2.

new Neural Inhibition Improves Dynamic Routing and Mixture of Experts

Authors: Will Y. Zou, Jennifer Y. Zhang

Abstract: To be effective, efficient, and diverse, deep learning models need to dynamically choose its architecture based on signals from a population of neurons. We hypothesize dynamic routing models can be improved with neural inhibition in those neural populations. This means signals commonly shared among the various modes of data statistics can be inhibited so that the routing model can choose a specialized expert path for each data sample. Only through inhibition is the routing mechanism able to effectively select neural pathways. We believe this is an under-studied and under-verified implementation methodology for Mixture-of-Experts, dynamic routing, and transformer language models. We provide experimental evidence that the neural inhibition algorithm significantly boosts the performance of general tasks and motivates more effort to be invested in this research direction.

new Conformal Information Pursuit for Interactively Guiding Large Language Models

Authors: Kwan Ho Ryan Chan, Yuyan Ge, Edgar Dobriban, Hamed Hassani, Ren\'e Vidal

Abstract: A significant use case of instruction-finetuned Large Language Models (LLMs) is to solve question-answering tasks interactively. In this setting, an LLM agent is tasked with making a prediction by sequentially querying relevant information from the user, as opposed to a single-turn conversation. This paper explores sequential querying strategies that aim to minimize the expected number of queries. One such strategy is Information Pursuit (IP), a greedy algorithm that at each iteration selects the query that maximizes information gain or equivalently minimizes uncertainty. However, obtaining accurate estimates of mutual information or conditional entropy for LLMs is very difficult in practice due to over- or under-confident LLM probabilities, which leads to suboptimal query selection and predictive performance. To better estimate the uncertainty at each iteration, we propose Conformal Information Pursuit (C-IP), an alternative approach to sequential information gain based on conformal prediction sets. More specifically, C-IP leverages a relationship between prediction sets and conditional entropy at each iteration to estimate uncertainty based on the average size of conformal prediction sets. In contrast to conditional entropy, we find that conformal prediction sets are a distribution-free and robust method of measuring uncertainty. Experiments with 20 Questions show that C-IP obtains better predictive performance and shorter query-answer chains compared to previous approaches to IP and uncertainty-based chain-of-thought methods. Furthermore, extending to an interactive medical setting between a doctor and a patient on the MediQ dataset, C-IP achieves competitive performance with direct single-turn prediction while offering greater interpretability.

new Do Tensorized Large-Scale Spatiotemporal Dynamic Atmospheric Data Exhibit Low-Rank Properties?

Authors: Ryan Solgi, Seyedali Mousavinezhad, Hugo A. Loaiciga

Abstract: In this study, we investigate for the first time the low-rank properties of a tensorized large-scale spatio-temporal dynamic atmospheric variable. We focus on the Sentinel-5P tropospheric NO2 product (S5P-TN) over a four-year period in an area that encompasses the contiguous United States (CONUS). Here, it is demonstrated that a low-rank approximation of such a dynamic variable is feasible. We apply the low-rank properties of the S5P-TN data to inpaint gaps in the Sentinel-5P product by adopting a low-rank tensor model (LRTM) based on the CANDECOMP / PARAFAC (CP) decomposition and alternating least squares (ALS). Furthermore, we evaluate the LRTM's results by comparing them with spatial interpolation using geostatistics, and conduct a comprehensive spatial statistical and temporal analysis of the S5P-TN product. The results of this study demonstrated that the tensor completion successfully reconstructs the missing values in the S5P-TN product, particularly in the presence of extended cloud obscuration, predicting outliers and identifying hotspots, when the data is tensorized over extended spatial and temporal scales.

new Global Variational Inference Enhanced Robust Domain Adaptation

Authors: Lingkun Luo, Shiqiang Hu, Liming Chen

Abstract: Deep learning-based domain adaptation (DA) methods have shown strong performance by learning transferable representations. However, their reliance on mini-batch training limits global distribution modeling, leading to unstable alignment and suboptimal generalization. We propose Global Variational Inference Enhanced Domain Adaptation (GVI-DA), a framework that learns continuous, class-conditional global priors via variational inference to enable structure-aware cross-domain alignment. GVI-DA minimizes domain gaps through latent feature reconstruction, and mitigates posterior collapse using global codebook learning with randomized sampling. It further improves robustness by discarding low-confidence pseudo-labels and generating reliable target-domain samples. Extensive experiments on four benchmarks and thirty-eight DA tasks demonstrate consistent state-of-the-art performance. We also derive the model's evidence lower bound (ELBO) and analyze the effects of prior continuity, codebook size, and pseudo-label noise tolerance. In addition, we compare GVI-DA with diffusion-based generative frameworks in terms of optimization principles and efficiency, highlighting both its theoretical soundness and practical advantages.

new MGAA: Multi-Granular Adaptive Allocation fof Low-Rank Compression of LLMs

Authors: Guangyan Li, Yongqiang Tang, Wensheng Zhang

Abstract: The enormous parameter scale of large language models (LLMs) has made model compression a research hotspot, which aims to alleviate computational resource demands during deployment and inference. As a promising direction, low-rank approximation technique has made remarkable achievements. Nevertheless, unfortunately, the vast majority of studies to low-rank approximation compression generally apply uniform compression ratios across all weight matrices, while disregarding their inherently differentiated impacts on the model's performance. Although a few recent work attempts to employ heuristic search strategies to achieve the optimal parameter allocation, such strategies are computationally inefficient and lose the generalization ability in the era of LLMs. In this study, we propose a novel parameter Multi-Granular Adaptive Allocation (MGAA) method, which can adaptively allocate parameters between and within sublayers without task-specific evaluations in the compression process. MGAA consists of two components: 1) Among different sublayers, it assigns compression ratios based on their cosine similarity between inputs and outputs, allowing for a more tailored compression in sublayers with varying degrees of importance, and 2) Within each sublayer, it allocates different compression ratios to weight matrices based on their energy distribution characteristics, ensuring a consistent energy retention ratio while optimizing compression efficiency. Comprehensive evaluations of MGAA across multiple LLMs backbone models and benchmark datasets demonstrate its superior performance. Additionally, we apply our MGAA to multimodal model LLaVA, exhibiting remarkable performance improvements.

new Dyn-O: Building Structured World Models with Object-Centric Representations

Authors: Zizhao Wang, Kaixin Wang, Li Zhao, Peter Stone, Jiang Bian

Abstract: World models aim to capture the dynamics of the environment, enabling agents to predict and plan for future states. In most scenarios of interest, the dynamics are highly centered on interactions among objects within the environment. This motivates the development of world models that operate on object-centric rather than monolithic representations, with the goal of more effectively capturing environment dynamics and enhancing compositional generalization. However, the development of object-centric world models has largely been explored in environments with limited visual complexity (such as basic geometries). It remains underexplored whether such models can generalize to more complex settings with diverse textures and cluttered scenes. In this paper, we fill this gap by introducing Dyn-O, an enhanced structured world model built upon object-centric representations. Compared to prior work in object-centric representations, Dyn-O improves in both learning representations and modeling dynamics. On the challenging Procgen games, we find that our method can learn object-centric world models directly from pixel observations, outperforming DreamerV3 in rollout prediction accuracy. Furthermore, by decoupling object-centric features into dynamics-agnostic and dynamics-aware components, we enable finer-grained manipulation of these features and generate more diverse imagined trajectories.

new LRM-1B: Towards Large Routing Model

Authors: Han Li, Fei Liu, Zhenkun Wang, Qingfu Zhang

Abstract: Vehicle routing problems (VRPs) are central to combinatorial optimization with significant practical implications. Recent advancements in neural combinatorial optimization (NCO) have demonstrated promising results by leveraging neural networks to solve VRPs, yet the exploration of model scaling within this domain remains underexplored. Inspired by the success of model scaling in large language models (LLMs), this study introduces a Large Routing Model with 1 billion parameters (LRM-1B), designed to address diverse VRP scenarios. We present a comprehensive evaluation of LRM-1B across multiple problem variants, distributions, and sizes, establishing state-of-the-art results. Our findings reveal that LRM-1B not only adapts to different VRP challenges but also showcases superior performance, outperforming existing models. Additionally, we explore the scaling behavior of neural routing models from 1M to 1B parameters. Our analysis confirms power-law between multiple model factors and performance, offering critical insights into the optimal configurations for foundation neural routing solvers.

new ReTimeCausal: EM-Augmented Additive Noise Models for Interpretable Causal Discovery in Irregular Time Series

Authors: Weihong Li, Anpeng Wu, Kun Kuang, Keting Yin

Abstract: This paper studies causal discovery in irregularly sampled time series-a pivotal challenge in high-stakes domains like finance, healthcare, and climate science, where missing data and inconsistent sampling frequencies distort causal mechanisms. Traditional methods (e.g., Granger causality, PCMCI) fail to reconcile multi-scale interactions (e.g., hourly storms vs. decadal climate shifts), while neural approaches (e.g., CUTS+) lack interpretability, stemming from a critical gap: existing frameworks either rigidly assume temporal regularity or aggregate dynamics into opaque representations, neglecting real-world granularity and auditable logic. To bridge this gap, we propose ReTimeCausal, a novel integration of Additive Noise Models (ANM) and Expectation-Maximization (EM) that unifies physics-guided data imputation with sparse causal inference. Through kernelized sparse regression and structural constraints, ReTimeCausal iteratively refines missing values (E-step) and causal graphs (M-step), resolving cross-frequency dependencies and missing data issues. Extensive experiments on synthetic and real-world datasets demonstrate that ReTimeCausal outperforms existing state-of-the-art methods under challenging irregular sampling and missing data conditions.

new MPX: Mixed Precision Training for JAX

Authors: Alexander Gr\"afe, Sebastian Trimpe

Abstract: Mixed-precision training has emerged as an indispensable tool for enhancing the efficiency of neural network training in recent years. Concurrently, JAX has grown in popularity as a versatile machine learning toolbox. However, it currently lacks robust support for mixed-precision training. We propose MPX, a mixed-precision training toolbox for JAX that simplifies and accelerates the training of large-scale neural networks while preserving model accuracy. MPX seamlessly integrates with popular toolboxes such as Equinox and Flax, allowing users to convert full-precision pipelines to mixed-precision versions with minimal modifications. By casting both inputs and outputs to half precision, and introducing a dynamic loss-scaling mechanism, MPX alleviates issues like gradient underflow and overflow that commonly arise in half precision computations. Its design inherits critical features from JAX's type-promotion behavior, ensuring that operations take place in the correct precision and allowing for selective enforcement of full precision where needed (e.g., sums, means, or softmax). MPX further provides wrappers for automatic creation and management of mixed-precision gradients and optimizers, enabling straightforward integration into existing JAX training pipelines. MPX's source code, documentation, and usage examples are available at github.com/Data-Science-in-Mechanical-Engineering/mixed_precision_for_JAX .

new Structure-Aware Compound-Protein Affinity Prediction via Graph Neural Network with Group Lasso Regularization

Authors: Zanyu Shi, Yang Wang, Pathum Weerawarna, Jie Zhang, Timothy Richardson, Yijie Wang, Kun Huang

Abstract: Explainable artificial intelligence (XAI) approaches have been increasingly applied in drug discovery to learn molecular representations and identify substructures driving property predictions. However, building end-to-end explainable machine learning models for structure-activity relationship (SAR) modeling for compound property prediction faces many challenges, such as limited activity data per target and the sensitivity of properties to subtle molecular changes. To address this, we leveraged activity-cliff molecule pairs, i.e., compounds sharing a common scaffold but differing sharply in potency, targeting three proto-oncogene tyrosine-protein kinase Src proteins (i.e., PDB IDs 1O42, 2H8H, and 4MXO). We implemented graph neural network (GNN) methods to obtain atom-level feature information and predict compound-protein affinity (i.e., half maximal inhibitory concentration, IC50). In addition, we trained GNN models with different structure-aware loss functions to adequately leverage molecular property and structure information. We also utilized group lasso and sparse group lasso to prune and highlight molecular subgraphs and enhance the structure-specific model explainability for the predicted property difference in molecular activity-cliff pairs. We improved drug property prediction by integrating common and uncommon node information and using sparse group lasso, reducing the average root mean squared error (RMSE) by 12.70%, and achieving the lowest averaged RMSE=0.2551 and the highest PCC=0.9572. Furthermore, applying regularization enhances feature attribution methods that estimate the contribution of each atom in the molecular graphs by boosting global direction scores and atom-level accuracy in atom coloring accuracy, which improves model interpretability in drug discovery pipelines, particularly in investigating important molecular substructures in lead optimization.

new Degrees of Freedom for Linear Attention: Distilling Softmax Attention with Optimal Feature Efficiency

Authors: Naoki Nishikawa, Rei Higuchi, Taiji Suzuki

Abstract: Linear attention has attracted interest as a computationally efficient approximation to softmax attention, especially for long sequences. Recent studies have explored distilling softmax attention in pre-trained Transformers into linear attention. However, a critical challenge remains: how to choose the feature dimension that governs the approximation quality. Existing methods fix this dimension uniformly across all attention layers, overlooking the diverse roles and complexities of them. In this paper, we propose a principled method to automatically determine the feature dimension in linear attention using the concept of statistical degrees of freedom, which represent the effective dimensionality of the inputs. We provide a theoretical bound on the approximation error and show that the dimension chosen by our method achieves smaller error under a fixed computational budget. Furthermore, we introduce an efficient layerwise training strategy to learn nonlinear features tailored to each layer. Experiments on multiple pre-trained transformers demonstrate that our method improves the performance of distilled models compared to baselines without increasing the inference cost. Our findings also provide insight into how the complexity of the attention mechanism evolves across layers.

new Action Robust Reinforcement Learning via Optimal Adversary Aware Policy Optimization

Authors: Buqing Nie, Yangqing Fu, Jingtian Ji, Yue Gao

Abstract: Reinforcement Learning (RL) has achieved remarkable success in sequential decision tasks. However, recent studies have revealed the vulnerability of RL policies to different perturbations, raising concerns about their effectiveness and safety in real-world applications. In this work, we focus on the robustness of RL policies against action perturbations and introduce a novel framework called Optimal Adversary-aware Policy Iteration (OA-PI). Our framework enhances action robustness under various perturbations by evaluating and improving policy performance against the corresponding optimal adversaries. Besides, our approach can be integrated into mainstream DRL algorithms such as Twin Delayed DDPG (TD3) and Proximal Policy Optimization (PPO), improving action robustness effectively while maintaining nominal performance and sample efficiency. Experimental results across various environments demonstrate that our method enhances robustness of DRL policies against different action adversaries effectively.

new Absolute Evaluation Measures for Machine Learning: A Survey

Authors: Silvia Beddar-Wiesing, Alice Moallemy-Oureh, Marie Kempkes, Josephine M. Thomas

Abstract: Machine Learning is a diverse field applied across various domains such as computer science, social sciences, medicine, chemistry, and finance. This diversity results in varied evaluation approaches, making it difficult to compare models effectively. Absolute evaluation measures offer a practical solution by assessing a model's performance on a fixed scale, independent of reference models and data ranges, enabling explicit comparisons. However, many commonly used measures are not universally applicable, leading to a lack of comprehensive guidance on their appropriate use. This survey addresses this gap by providing an overview of absolute evaluation metrics in ML, organized by the type of learning problem. While classification metrics have been extensively studied, this work also covers clustering, regression, and ranking metrics. By grouping these measures according to the specific ML challenges they address, this survey aims to equip practitioners with the tools necessary to select appropriate metrics for their models. The provided overview thus improves individual model evaluation and facilitates meaningful comparisons across different models and applications.

new On the Effectiveness of the $z$-Transform Method in Quadratic Optimization

Authors: Francis Bach (SIERRA)

Abstract: The $z$-transform of a sequence is a classical tool used within signal processing, control theory, computer science, and electrical engineering. It allows for studying sequences from their generating functions, with many operations that can be equivalently defined on the original sequence and its $z$-transform. In particular, the $z$-transform method focuses on asymptotic behaviors and allows the use of Taylor expansions. We present a sequence of results of increasing significance and difficulty for linear models and optimization algorithms, demonstrating the effectiveness and versatility of the $z$-transform method in deriving new asymptotic results. Starting from the simplest gradient descent iterations in an infinite-dimensional Hilbert space, we show how the spectral dimension characterizes the convergence behavior. We then extend the analysis to Nesterov acceleration, averaging techniques, and stochastic gradient descent.

new A Hybrid Game-Theory and Deep Learning Framework for Predicting Tourist Arrivals via Big Data Analytics and Opinion Leader Detection

Authors: Ali Nikseresht

Abstract: In the era of Industry 5.0, data-driven decision-making has become indispensable for optimizing systems across Industrial Engineering. This paper addresses the value of big data analytics by proposing a novel non-linear hybrid approach for forecasting international tourist arrivals in two different contexts: (i) arrivals to Hong Kong from five major source nations (pre-COVID-19), and (ii) arrivals to Sanya in Hainan province, China (post-COVID-19). The method integrates multiple sources of Internet big data and employs an innovative game theory-based algorithm to identify opinion leaders on social media platforms. Subsequently, nonstationary attributes in tourism demand data are managed through Empirical Wavelet Transform (EWT), ensuring refined time-frequency analysis. Finally, a memory-aware Stacked Bi-directional Long Short-Term Memory (Stacked BiLSTM) network is used to generate accurate demand forecasts. Experimental results demonstrate that this approach outperforms existing state-of-the-art techniques and remains robust under dynamic and volatile conditions, highlighting its applicability to broader Industrial Engineering domains, such as logistics, supply chain management, and production planning, where forecasting and resource allocation are key challenges. By merging advanced Deep Learning (DL), time-frequency analysis, and social media insights, the proposed framework showcases how large-scale data can elevate the quality and efficiency of decision-making processes.

new Multi-Level Fusion Graph Neural Network for Molecule Property Prediction

Authors: XiaYu Liu, Hou-biao Li, Yang Liu, Chao Fan

Abstract: Accurate molecular property prediction is essential in drug discovery and related fields. However, existing graph neural networks (GNNs) often struggle to simultaneously capture both local and global molecular structures. In this work, we propose a Multi-Level Fusion Graph Neural Network (MLFGNN) that integrates Graph Attention Networks and a novel Graph Transformer to jointly model local and global dependencies. In addition, we incorporate molecular fingerprints as a complementary modality and introduce a mechanism of interaction between attention to adaptively fuse information across representations. Extensive experiments on multiple benchmark datasets demonstrate that MLFGNN consistently outperforms state-of-the-art methods in both classification and regression tasks. Interpretability analysis further reveals that the model effectively captures task-relevant chemical patterns, supporting the usefulness of multi-level and multi-modal fusion in molecular representation learning.

new Molecular Machine Learning Using Euler Characteristic Transforms

Authors: Victor Toscano-Duran, Florian Rottach, Bastian Rieck

Abstract: The shape of a molecule determines its physicochemical and biological properties. However, it is often underrepresented in standard molecular representation learning approaches. Here, we propose using the Euler Characteristic Transform (ECT) as a geometrical-topological descriptor. Computed directly on a molecular graph derived from handcrafted atomic features, the ECT enables the extraction of multiscale structural features, offering a novel way to represent and encode molecular shape in the feature space. We assess the predictive performance of this representation across nine benchmark regression datasets, all centered around predicting the inhibition constant $K_i$. In addition, we compare our proposed ECT-based representation against traditional molecular representations and methods, such as molecular fingerprints/descriptors and graph neural networks (GNNs). Our results show that our ECT-based representation achieves competitive performance, ranking among the best-performing methods on several datasets. More importantly, its combination with traditional representations, particularly with the AVALON fingerprint, significantly \emph{enhances predictive performance}, outperforming other methods on most datasets. These findings highlight the complementary value of multiscale topological information and its potential for being combined with established techniques. Our study suggests that hybrid approaches incorporating explicit shape information can lead to more informative and robust molecular representations, enhancing and opening new avenues in molecular machine learning tasks. To support reproducibility and foster open biomedical research, we provide open access to all experiments and code used in this work.

new ObjectRL: An Object-Oriented Reinforcement Learning Codebase

Authors: Gulcin Baykal, Abdullah Akg\"ul, Manuel Haussmann, Bahareh Tasdighi, Nicklas Werge, Yi-Shan Wu, Melih Kandemir

Abstract: ObjectRL is an open-source Python codebase for deep reinforcement learning (RL), designed for research-oriented prototyping with minimal programming effort. Unlike existing codebases, ObjectRL is built on Object-Oriented Programming (OOP) principles, providing a clear structure that simplifies the implementation, modification, and evaluation of new algorithms. ObjectRL lowers the entry barrier for deep RL research by organizing best practices into explicit, clearly separated components, making them easier to understand and adapt. Each algorithmic component is a class with attributes that describe key RL concepts and methods that intuitively reflect their interactions. The class hierarchy closely follows common ontological relationships, enabling data encapsulation, inheritance, and polymorphism, which are core features of OOP. We demonstrate the efficiency of ObjectRL's design through representative use cases that highlight its flexibility and suitability for rapid prototyping. The documentation and source code are available at https://objectrl.readthedocs.io and https://github.com/adinlab/objectrl .

URLs: https://objectrl.readthedocs.io, https://github.com/adinlab/objectrl

new Reinforcement Learning-based Feature Generation Algorithm for Scientific Data

Authors: Meng Xiao, Junfeng Zhou, Yuanchun Zhou

Abstract: Feature generation (FG) aims to enhance the prediction potential of original data by constructing high-order feature combinations and removing redundant features. It is a key preprocessing step for tabular scientific data to improve downstream machine-learning model performance. Traditional methods face the following two challenges when dealing with the feature generation of scientific data: First, the effective construction of high-order feature combinations in scientific data necessitates profound and extensive domain-specific expertise. Secondly, as the order of feature combinations increases, the search space expands exponentially, imposing prohibitive human labor consumption. Advancements in the Data-Centric Artificial Intelligence (DCAI) paradigm have opened novel avenues for automating feature generation processes. Inspired by that, this paper revisits the conventional feature generation workflow and proposes the Multi-agent Feature Generation (MAFG) framework. Specifically, in the iterative exploration stage, multi-agents will construct mathematical transformation equations collaboratively, synthesize and identify feature combinations ex-hibiting high information content, and leverage a reinforcement learning mechanism to evolve their strategies. Upon completing the exploration phase, MAFG integrates the large language models (LLMs) to interpreta-tively evaluate the generated features of each significant model performance breakthrough. Experimental results and case studies consistently demonstrate that the MAFG framework effectively automates the feature generation process and significantly enhances various downstream scientific data mining tasks.

new Decoupled Relative Learning Rate Schedules

Authors: Jan Ludziejewski, Jan Ma{\l}a\'snicki, Maciej Pi\'oro, Micha{\l} Krutul, Kamil Ciebiera, Maciej Stefaniak, Jakub Krajewski, Piotr Sankowski, Marek Cygan, Kamil Adamczewski, Sebastian Jaszczur

Abstract: In this work, we introduce a novel approach for optimizing LLM training by adjusting learning rates across weights of different components in Transformer models. Traditional methods often apply a uniform learning rate across all network layers, potentially overlooking the unique dynamics of each part. Remarkably, our introduced relative learning rates, RLRS, method accelerates the training process by up to $23\%$, particularly in complex models such as Mixture of Experts (MoE). Hyperparameters of RLRS can be efficiently tuned on smaller models and then effectively reused on models up to $27\times$ larger. This simple and effective method results in a substantial reduction in training time and computational resources, offering a practical and scalable solution for optimizing large-scale neural networks.

new Generating Synthetic Relational Tabular Data via Structural Causal Models

Authors: Frederik Hoppe, Astrid Franz, Lars Kleinemeier, Udo G\"obel

Abstract: Synthetic tabular data generation has received increasing attention in recent years, particularly with the emergence of foundation models for tabular data. The breakthrough success of TabPFN (Hollmann et al.,2025), which leverages vast quantities of synthetic tabular datasets derived from structural causal models (SCMs), demonstrates the critical role synthetic data plays in developing powerful tabular foundation models. However, most real-world tabular data exists in relational formats spanning multiple interconnected tables - a structure not adequately addressed by current generation methods. In this work, we extend the SCM-based approach by developing a novel framework that generates realistic synthetic relational tabular data including causal relationships across tables. Our experiments confirm that this framework is able to construct relational datasets with complex inter-table dependencies mimicking real-world scenarios.

new Communication Efficient, Differentially Private Distributed Optimization using Correlation-Aware Sketching

Authors: Julien Nicolas, Mohamed Maouche, Sonia Ben Mokhtar, Mark Coates

Abstract: Federated learning with differential privacy suffers from two major costs: each client must transmit $d$-dimensional gradients every round, and the magnitude of DP noise grows with $d$. Yet empirical studies show that gradient updates exhibit strong temporal correlations and lie in a $k$-dimensional subspace with $k \ll d$. Motivated by this, we introduce DOME, a decentralized DP optimization framework in which each client maintains a compact sketch to project gradients into $\mathbb{R}^k$ before privatization and Secure Aggregation. This reduces per-round communication from order $d$ to order $k$ and moves towards a gradient approximation mean-squared error of $\sigma^2 k$. To allow the sketch to span new directions and prevent it from collapsing onto historical gradients, we augment it with random probes orthogonal to historical directions. We prove that our overall protocol satisfies $(\epsilon,\delta)$-Differential Privacy.

new Simplifying Graph Neural Kernels: from Stacking Layers to Collapsed Structure

Authors: Lin Wang, Shijie Wang, Sirui Huang, Qing Li

Abstract: The Graph Neural Tangent Kernel (GNTK) successfully bridges the gap between kernel methods and Graph Neural Networks (GNNs), addressing key challenges such as the difficulty of training deep networks and the limitations of traditional kernel methods. However, the existing layer-stacking strategy in GNTK introduces redundant computations, significantly increasing computational complexity and limiting scalability for practical applications. To address these issues, this paper proposes the Simplified Graph Neural Tangent Kernel (SGTK), which replaces the traditional multi-layer stacking mechanism with a continuous $K$-step aggregation operation. This novel approach streamlines the iterative kernel computation process, effectively eliminating redundant calculations while preserving the kernel's expressiveness. By reducing the dependency on layer stacking, SGTK achieves both computational simplicity and efficiency. Furthermore, we introduce the Simplified Graph Neural Kernel (SGNK), which models infinitely wide Graph Neural Networks as Gaussian Processes. This allows kernel values to be directly determined from the expected outputs of activation functions in the infinite-width regime, bypassing the need for explicit layer-by-layer computation. SGNK further reduces computational complexity while maintaining the capacity to capture intricate structural patterns in graphs. Extensive experiments on node and graph classification tasks demonstrate that the proposed SGTK and SGNK achieve performance comparable to existing approaches while improving computational efficiency. Implementation details are available at https://anonymous.4open.science/r/SGNK-1CE4/.

URLs: https://anonymous.4open.science/r/SGNK-1CE4/.

new Kinetic Langevin Diffusion for Crystalline Materials Generation

Authors: Fran\c{c}ois Cornet, Federico Bergamin, Arghya Bhowmik, Juan Maria Garcia Lastra, Jes Frellsen, Mikkel N. Schmidt

Abstract: Generative modeling of crystalline materials using diffusion models presents a series of challenges: the data distribution is characterized by inherent symmetries and involves multiple modalities, with some defined on specific manifolds. Notably, the treatment of fractional coordinates representing atomic positions in the unit cell requires careful consideration, as they lie on a hypertorus. In this work, we introduce Kinetic Langevin Diffusion for Materials (KLDM), a novel diffusion model for crystalline materials generation, where the key innovation resides in the modeling of the coordinates. Instead of resorting to Riemannian diffusion on the hypertorus directly, we generalize Trivialized Diffusion Model (TDM) to account for the symmetries inherent to crystals. By coupling coordinates with auxiliary Euclidean variables representing velocities, the diffusion process is now offset to a flat space. This allows us to effectively perform diffusion on the hypertorus while providing a training objective that accounts for the periodic translation symmetry of the true data distribution. We evaluate KLDM on both Crystal Structure Prediction (CSP) and De-novo Generation (DNG) tasks, demonstrating its competitive performance with current state-of-the-art models.

new Disentangling Doubt in Deep Causal AI

Authors: Cooper Doyle

Abstract: Accurate individual treatment-effect estimation in high-stakes applications demands both reliable point predictions and interpretable uncertainty quantification. We propose a factorized Monte Carlo Dropout framework for deep twin-network models that splits total predictive variance into representation uncertainty (sigma_rep) in the shared encoder and prediction uncertainty (sigma_pred) in the outcome heads. Across three synthetic covariate-shift regimes, our intervals are well-calibrated (ECE < 0.03) and satisfy sigma_rep^2 + sigma_pred^2 ~ sigma_tot^2. Additionally, we observe a crossover: head uncertainty leads on in-distribution data, but representation uncertainty dominates under shift. Finally, on a real-world twins cohort with induced multivariate shifts, only sigma_rep spikes on out-of-distribution samples (delta sigma ~ 0.0002) and becomes the primary error predictor (rho_rep <= 0.89), while sigma_pred remains flat. This module-level decomposition offers a practical diagnostic for detecting and interpreting uncertainty sources in deep causal-effect models.

new Scientific Machine Learning of Chaotic Systems Discovers Governing Equations for Neural Populations

Authors: Anthony G. Chesebro, David Hofmann, Vaibhav Dixit, Earl K. Miller, Richard H. Granger, Alan Edelman, Christopher V. Rackauckas, Lilianne R. Mujica-Parodi, Helmut H. Strey

Abstract: Discovering governing equations that describe complex chaotic systems remains a fundamental challenge in physics and neuroscience. Here, we introduce the PEM-UDE method, which combines the prediction-error method with universal differential equations to extract interpretable mathematical expressions from chaotic dynamical systems, even with limited or noisy observations. This approach succeeds where traditional techniques fail by smoothing optimization landscapes and removing the chaotic properties during the fitting process without distorting optimal parameters. We demonstrate its efficacy by recovering hidden states in the Rossler system and reconstructing dynamics from noise-corrupted electrical circuit data, where the correct functional form of the dynamics is recovered even when one of the observed time series is corrupted by noise 5x the magnitude of the true signal. We demonstrate that this method is capable of recovering the correct dynamics, whereas direct symbolic regression methods, such as SINDy, fail to do so with the given amount of data and noise. Importantly, when applied to neural populations, our method derives novel governing equations that respect biological constraints such as network sparsity - a constraint necessary for cortical information processing yet not captured in next-generation neural mass models - while preserving microscale neuronal parameters. These equations predict an emergent relationship between connection density and both oscillation frequency and synchrony in neural circuits. We validate these predictions using three intracranial electrode recording datasets from the medial entorhinal cortex, prefrontal cortex, and orbitofrontal cortex. Our work provides a pathway to develop mechanistic, multi-scale brain models that generalize across diverse neural architectures, bridging the gap between single-neuron dynamics and macroscale brain activity.

new MLASDO: a software tool to detect and explain clinical and omics inconsistencies applied to the Parkinson's Progression Markers Initiative cohort

Authors: Jos\'e A. Pardo, Tom\'as Bernal, Jaime \~Niguez, Ana Luisa Gil-Mart\'inez, Laura Iba\~nez, Jos\'e T. Palma, Juan A. Bot\'ia, Alicia G\'omez-Pascual

Abstract: Inconsistencies between clinical and omics data may arise within medical cohorts. The identification, annotation and explanation of anomalous omics-based patients or individuals may become crucial to better reshape the disease, e.g., by detecting early onsets signaled by the omics and undetectable from observable symptoms. Here, we developed MLASDO (Machine Learning based Anomalous Sample Detection on Omics), a new method and software tool to identify, characterize and automatically describe anomalous samples based on omics data. Its workflow is based on three steps: (1) classification of healthy and cases individuals using a support vector machine algorithm; (2) detection of anomalous samples within groups; (3) explanation of anomalous individuals based on clinical data and expert knowledge. We showcase MLASDO using transcriptomics data of 317 healthy controls (HC) and 465 Parkinson's disease (PD) cases from the Parkinson's Progression Markers Initiative. In this cohort, MLASDO detected 15 anomalous HC with a PD-like transcriptomic signature and PD-like clinical features, including a lower proportion of CD4/CD8 naive T-cells and CD4 memory T-cells compared to HC (P<3.5*10^-3). MLASDO also identified 22 anomalous PD cases with a transcriptomic signature more similar to that of HC and some clinical features more similar to HC, including a lower proportion of mature neutrophils compared to PD cases (P<6*10^-3). In summary, MLASDO is a powerful tool that can help the clinician to detect and explain anomalous HC and cases of interest to be followed up. MLASDO is an open-source R package available at: https://github.com/JoseAdrian3/MLASDO.

URLs: https://github.com/JoseAdrian3/MLASDO.

new When Network Architecture Meets Physics: Deep Operator Learning for Coupled Multiphysics

Authors: Kazuma Kobayashi, Jaewan Park, Qibang Liu, Seid Koric, Diab Abueidda, Syed Bahauddin Alam

Abstract: Scientific applications increasingly demand real-time surrogate models that can capture the behavior of strongly coupled multiphysics systems driven by multiple input functions, such as in thermo-mechanical and electro-thermal processes. While neural operator frameworks, such as Deep Operator Networks (DeepONets), have shown considerable success in single-physics settings, their extension to multiphysics problems remains poorly understood. In particular, the challenge of learning nonlinear interactions between tightly coupled physical fields has received little systematic attention. This study addresses a foundational question: should the architectural design of a neural operator reflect the strength of physical coupling it aims to model? To answer this, we present the first comprehensive, architecture-aware evaluation of DeepONet variants across three regimes: single-physics, weakly coupled, and strongly coupled multiphysics systems. We consider a reaction-diffusion equation with dual spatial inputs, a nonlinear thermo-electrical problem with bidirectional coupling through temperature-dependent conductivity, and a viscoplastic thermo-mechanical model of steel solidification governed by transient phase-driven interactions. Two operator-learning frameworks, the classical DeepONet and its sequential GRU-based extension, S-DeepONet, are benchmarked using both single-branch and multi-branch (MIONet-style) architectures. Our results demonstrate that architectural alignment with physical coupling is crucial: single-branch networks significantly outperform multi-branch counterparts in strongly coupled settings, whereas multi-branch encodings offer advantages for decoupled or single-physics problems. Once trained, these surrogates achieve full-field predictions up to 1.8e4 times faster than high-fidelity finite-element solvers, without compromising solution accuracy.

new Re-Emergent Misalignment: How Narrow Fine-Tuning Erodes Safety Alignment in LLMs

Authors: Jeremiah Giordani

Abstract: Recent work has shown that fine-tuning large language models (LLMs) on code with security vulnerabilities can result in misaligned and unsafe behaviors across broad domains. These results prompted concerns about the emergence of harmful behaviors from narrow domain fine-tuning. In this paper, we contextualize these findings by analyzing how such narrow adaptation impacts the internal mechanisms and behavioral manifestations of LLMs. Through a series of experiments covering output probability distributions, loss and gradient vector geometry, layer-wise activation dynamics, and activation space dimensions, we find that behaviors attributed to "emergent misalignment" may be better interpreted as an erosion of prior alignment. We show that fine tuning on insecure code induces internal changes that oppose alignment. Further, we identify a shared latent dimension in the model's activation space that governs alignment behavior. We show that this space is activated by insecure code and by misaligned responses more generally, revealing how narrow fine-tuning can degrade general safety behavior by interfering with shared internal mechanisms. Our findings offer a mechanistic interpretation for previously observed misalignment phenomena, and highlights the fragility of alignment in LLMs. The results underscore the need for more robust fine-tuning strategies that preserve intended behavior across domains.

new Plugging Attention into Power Grids: Towards Transparent Forecasting

Authors: Eloi Campagne, Itai Zehavi, Yvenn Amara-Ouali, Yannig Goude, Argyris Kalogeratos

Abstract: Accurate electricity consumption forecasting is crucial for ensuring grid stability and optimizing power generation, particularly in increasingly decentralized and complex systems. While classical approaches such as Generalized Additive Models (GAMs) remain widely used, they often fail to capture the spatial dependencies inherent in energy networks. Graph Neural Networks (GNNs) offer a principled framework to incorporate this structure by directly leveraging graph topologies. In this work, we evaluate a broad set of GNN architectures -- including GCN, GraphSAGE, ChebConv, TAG, APPNP, TransformerConv, and Graph Attention Networks (GAT and GATv2) -- on two real-world electricity consumption datasets from France and the UK. Our experiments show that while complex architectures like GATv2 and TransformerConv do not consistently outperform their simpler counterparts, models such as GCN and APPNP achieve strong results in low-data or highly disaggregated settings. Nonetheless, the vanilla GAT remains highly competitive across both datasets and offers an additional interpretability layer via attention mechanisms. We perform a temporal analysis of attention weights, revealing evolving patterns of regional interaction linked to seasonal and meteorological variability. These results highlight that, although attention is not universally superior, it provides valuable explanatory power when spatial dependencies are prominent. Finally, we benchmark ensemble-based expert aggregation strategies, showing that uniform or learned combinations can enhance robustness and outperform individual models under data heterogeneity.

new CosmoBench: A Multiscale, Multiview, Multitask Cosmology Benchmark for Geometric Deep Learning

Authors: Ningyuan Huang, Richard Stiskalek, Jun-Young Lee, Adrian E. Bayer, Charles C. Margossian, Christian Kragh Jespersen, Lucia A. Perez, Lawrence K. Saul, Francisco Villaescusa-Navarro

Abstract: Cosmological simulations provide a wealth of data in the form of point clouds and directed trees. A crucial goal is to extract insights from this data that shed light on the nature and composition of the Universe. In this paper we introduce CosmoBench, a benchmark dataset curated from state-of-the-art cosmological simulations whose runs required more than 41 million core-hours and generated over two petabytes of data. CosmoBench is the largest dataset of its kind: it contains 34 thousand point clouds from simulations of dark matter halos and galaxies at three different length scales, as well as 25 thousand directed trees that record the formation history of halos on two different time scales. The data in CosmoBench can be used for multiple tasks -- to predict cosmological parameters from point clouds and merger trees, to predict the velocities of individual halos and galaxies from their collective positions, and to reconstruct merger trees on finer time scales from those on coarser time scales. We provide several baselines on these tasks, some based on established approaches from cosmological modeling and others rooted in machine learning. For the latter, we study different approaches -- from simple linear models that are minimally constrained by symmetries to much larger and more computationally-demanding models in deep learning, such as graph neural networks. We find that least-squares fits with a handful of invariant features sometimes outperform deep architectures with many more parameters and far longer training times. Still there remains tremendous potential to improve these baselines by combining machine learning and cosmology to fully exploit the data. CosmoBench sets the stage for bridging cosmology and geometric deep learning at scale. We invite the community to push the frontier of scientific discovery by engaging with this dataset, available at https://cosmobench.streamlit.app

URLs: https://cosmobench.streamlit.app

new Predicting Business Angel Early-Stage Decision Making Using AI

Authors: Yan Katcharovski, Andrew L. Maxwell

Abstract: External funding is crucial for early-stage ventures, particularly technology startups that require significant R&D investment. Business angels offer a critical source of funding, but their decision-making is often subjective and resource-intensive for both investor and entrepreneur. Much research has investigated this investment process to find the critical factors angels consider. One such tool, the Critical Factor Assessment (CFA), deployed more than 20,000 times by the Canadian Innovation Centre, has been evaluated post-decision and found to be significantly more accurate than investors' own decisions. However, a single CFA analysis requires three trained individuals and several days, limiting its adoption. This study builds on previous work validating the CFA to investigate whether the constraints inhibiting its adoption can be overcome using a trained AI model. In this research, we prompted multiple large language models (LLMs) to assign the eight CFA factors to a dataset of 600 transcribed, unstructured startup pitches seeking business angel funding with known investment outcomes. We then trained and evaluated machine learning classification models using the LLM-generated CFA scores as input features. Our best-performing model demonstrated high predictive accuracy (85.0% for predicting BA deal/no-deal outcomes) and exhibited significant correlation (Spearman's r = 0.896, p-value < 0.001) with conventional human-graded evaluations. The integration of AI-based feature extraction with a structured and validated decision-making framework yielded a scalable, reliable, and less-biased model for evaluating startup pitches, removing the constraints that previously limited adoption.

new FAROS: Fair Graph Generation via Attribute Switching Mechanisms

Authors: Abdennacer Badaoui, Oussama Kharouiche, Hatim Mrabet, Daniele Malitesta, Fragkiskos D. Malliaros

Abstract: Recent advancements in graph diffusion models (GDMs) have enabled the synthesis of realistic network structures, yet ensuring fairness in the generated data remains a critical challenge. Existing solutions attempt to mitigate bias by re-training the GDMs with ad-hoc fairness constraints. Conversely, with this work, we propose FAROS, a novel FAir graph geneRatiOn framework leveraging attribute Switching mechanisms and directly running in the generation process of the pre-trained GDM. Technically, our approach works by altering nodes' sensitive attributes during the generation. To this end, FAROS calculates the optimal fraction of switching nodes, and selects the diffusion step to perform the switch by setting tailored multi-criteria constraints to preserve the node-topology profile from the original distribution (a proxy for accuracy) while ensuring the edge independence on the sensitive attributes for the generated graph (a proxy for fairness). Our experiments on benchmark datasets for link prediction demonstrate that the proposed approach effectively reduces fairness discrepancies while maintaining comparable (or even higher) accuracy performance to other similar baselines. Noteworthy, FAROS is also able to strike a better accuracy-fairness trade-off than other competitors in some of the tested settings under the Pareto optimality concept, demonstrating the effectiveness of the imposed multi-criteria constraints.

new Skewed Score: A statistical framework to assess autograders

Authors: Magda Dubois, Harry Coppock, Mario Giulianelli, Lennart Luettgau, Cozmin Ududec

Abstract: The evaluation of large language model (LLM) outputs is increasingly performed by other LLMs, a setup commonly known as "LLM-as-a-judge", or autograders. While autograders offer a scalable alternative to human evaluation, they have shown mixed reliability and may exhibit systematic biases, depending on response type, scoring methodology, domain specificity, and other factors. In this paper we propose a statistical framework based on Bayesian generalised linear models (GLMs) that enables researchers to simultaneously assess their autograders while also addressing their primary research questions (e.g., LLM evaluation). Our approach models evaluation outcomes (e.g., scores or pairwise preferences) as a function of properties of the grader (e.g., human vs. autograder) and the evaluated item (e.g., response length or the LLM that generated it), allowing for explicit quantification of scoring differences and potential biases within a unified framework. In addition, our method can be used to augment traditional reliability metrics such as inter-rater agreement, by providing uncertainty estimates and clarifying the source of disagreement. Overall, this approach contributes to more robust and interpretable use of autograders in LLM evaluation, enabling both performance analysis and bias detection.

new Effective Capacitance Modeling Using Graph Neural Networks

Authors: Eren Dogan, Matthew R. Guthaus

Abstract: Static timing analysis is a crucial stage in the VLSI design flow that verifies the timing correctness of circuits. Timing analysis depends on the placement and routing of the design, but at the same time, placement and routing efficiency depend on the final timing performance. VLSI design flows can benefit from timing-related prediction to better perform the earlier stages of the design flow. Effective capacitance is an essential input for gate delay calculation, and finding exact values requires routing or routing estimates. In this work, we propose the first GNN-based post-layout effective capacitance modeling method, GNN-Ceff, that achieves significant speed gains due to GPU parallelization while also providing better accuracy than current heuristics. GNN-Ceff parallelization achieves 929x speedup on real-life benchmarks over the state-of-the-art method run serially.

new IMPACT: Importance-Aware Activation Space Reconstruction

Authors: Md Mokarram Chowdhury, Daniel Agyei Asante, Ernie Chang, Yang Li

Abstract: Large language models (LLMs) achieve strong performance across many domains but are difficult to deploy in resource-constrained settings due to their size. Low-rank weight matrix compression is a popular strategy for reducing model size, typically by minimizing weight reconstruction error under the assumption that weights are low-rank. However, this assumption often does not hold in LLMs. Instead, LLM activations exhibit stronger low-rank structure-prompting a shift toward minimizing activation reconstruction error. We show that this shift alone is insufficient: activation dimensions contribute unequally to model performance, and uniform reconstruction can harm performance. We propose IMPACT, a principled framework for importance-aware activation reconstruction that links model compression decisions to their impact on model behavior. IMPACT formulates an optimization problem that considers both activation structure and gradient sensitivity, and derives a closed-form solution where the optimal reconstruction bases are the eigenvectors of an importance-weighted activation covariance matrix. This enables low-rank approximations explicitly optimized to preserve accuracy. Experiments across diverse models and tasks show that IMPACT achieves up to 48.6% greater model size reduction with accuracy comparable to state-of-the-art baselines.

new MatRL: Provably Generalizable Iterative Algorithm Discovery via Monte-Carlo Tree Search

Authors: Sungyoon Kim, Rajat Vadiraj Dwaraknath, Longling geng, Mert Pilanci

Abstract: Iterative methods for computing matrix functions have been extensively studied and their convergence speed can be significantly improved with the right tuning of parameters and by mixing different iteration types. Handtuning the design options for optimal performance can be cumbersome, especially in modern computing environments: numerous different classical iterations and their variants exist, each with non-trivial per-step cost and tuning parameters. To this end, we propose MatRL -- a reinforcement learning based framework that automatically discovers iterative algorithms for computing matrix functions. The key idea is to treat algorithm design as a sequential decision-making process. Monte-Carlo tree search is then used to plan a hybrid sequence of matrix iterations and step sizes, tailored to a specific input matrix distribution and computing environment. Moreover, we also show that the learned algorithms provably generalize to sufficiently large matrices drawn from the same distribution. Finally, we corroborate our theoretical results with numerical experiments demonstrating that MatRL produces algorithms that outperform various baselines in the literature.

new Distributed Equivariant Graph Neural Networks for Large-Scale Electronic Structure Prediction

Authors: Manasa Kaniselvan, Alexander Maeder, Chen Hao Xia, Alexandros Nikolaos Ziogas, Mathieu Luisier

Abstract: Equivariant Graph Neural Networks (eGNNs) trained on density-functional theory (DFT) data can potentially perform electronic structure prediction at unprecedented scales, enabling investigation of the electronic properties of materials with extended defects, interfaces, or exhibiting disordered phases. However, as interactions between atomic orbitals typically extend over 10+ angstroms, the graph representations required for this task tend to be densely connected, and the memory requirements to perform training and inference on these large structures can exceed the limits of modern GPUs. Here we present a distributed eGNN implementation which leverages direct GPU communication and introduce a partitioning strategy of the input graph to reduce the number of embedding exchanges between GPUs. Our implementation shows strong scaling up to 128 GPUs, and weak scaling up to 512 GPUs with 87% parallel efficiency for structures with 3,000 to 190,000 atoms on the Alps supercomputer.

new Regularizing Log-Linear Cost Models for Inpatient Stays by Merging ICD-10 Codes

Authors: Chi-Ken Lu, David Alonge, Nicole Richardson, Bruno Richard

Abstract: Cost models in healthcare research must balance interpretability, accuracy, and parameter consistency. However, interpretable models often struggle to achieve both accuracy and consistency. Ordinary least squares (OLS) models for high-dimensional regression can be accurate but fail to produce stable regression coefficients over time when using highly granular ICD-10 diagnostic codes as predictors. This instability arises because many ICD-10 codes are infrequent in healthcare datasets. While regularization methods such as Ridge can address this issue, they risk discarding important predictors. Here, we demonstrate that reducing the granularity of ICD-10 codes is an effective regularization strategy within OLS while preserving the representation of all diagnostic code categories. By truncating ICD-10 codes from seven characters (e.g., T67.0XXA, T67.0XXD) to six (e.g., T67.0XX) or fewer, we reduce the dimensionality of the regression problem while maintaining model interpretability and consistency. Mathematically, the merging of predictors in OLS leads to increased trace of the Hessian matrix, which reduces the variance of coefficient estimation. Our findings explain why broader diagnostic groupings like DRGs and HCC codes are favored over highly granular ICD-10 codes in real-world risk adjustment and cost models.

new KEA Explain: Explanations of Hallucinations using Graph Kernel Analysis

Authors: Reilly Haskins, Ben Adams

Abstract: Large Language Models (LLMs) frequently generate hallucinations: statements that are syntactically plausible but lack factual grounding. This research presents KEA (Kernel-Enriched AI) Explain: a neurosymbolic framework that detects and explains such hallucinations by comparing knowledge graphs constructed from LLM outputs with ground truth data from Wikidata or contextual documents. Using graph kernels and semantic clustering, the method provides explanations for detected hallucinations, ensuring both robustness and interpretability. Our framework achieves competitive accuracy in detecting hallucinations across both open- and closed-domain tasks, and is able to generate contrastive explanations, enhancing transparency. This research advances the reliability of LLMs in high-stakes domains and provides a foundation for future work on precision improvements and multi-source knowledge integration.

new OrbitAll: A Unified Quantum Mechanical Representation Deep Learning Framework for All Molecular Systems

Authors: Beom Seok Kang, Vignesh C. Bhethanabotla, Amin Tavakoli, Maurice D. Hanisch, William A. Goddard III, Anima Anandkumar

Abstract: Despite the success of deep learning methods in quantum chemistry, their representational capacity is most often confined to neutral, closed-shell molecules. However, real-world chemical systems often exhibit complex characteristics, including varying charges, spins, and environments. We introduce OrbitAll, a geometry- and physics-informed deep learning framework that can represent all molecular systems with electronic structure information. OrbitAll utilizes spin-polarized orbital features from the underlying quantum mechanical method, and combines it with graph neural networks satisfying SE(3)-equivariance. The resulting framework can represent and process any molecular system with arbitrary charges, spins, and environmental effects. OrbitAll demonstrates superior performance and generalization on predicting charged, open-shell, and solvated molecules, while also robustly extrapolating to molecules significantly larger than the training data by leveraging a physics-informed architecture. OrbitAll achieves chemical accuracy using 10 times fewer training data than competing AI models, with a speedup of approximately $10^3$ - $10^4$ compared to density functional theory.

new Latent FxLMS: Accelerating Active Noise Control with Neural Adaptive Filters

Authors: Kanad Sarkar, Austin Lu, Manan Mittal, Yongjie Zhuang, Ryan Corey, Andrew Singer

Abstract: Filtered-X LMS (FxLMS) is commonly used for active noise control (ANC), wherein the soundfield is minimized at a desired location. Given prior knowledge of the spatial region of the noise or control sources, we could improve FxLMS by adapting along the low-dimensional manifold of possible adaptive filter weights. We train an auto-encoder on the filter coefficients of the steady-state adaptive filter for each primary source location sampled from a given spatial region and constrain the weights of the adaptive filter to be the output of the decoder for a given state of latent variables. Then, we perform updates in the latent space and use the decoder to generate the cancellation filter. We evaluate how various neural network constraints and normalization techniques impact the convergence speed and steady-state mean squared error. Under certain conditions, our Latent FxLMS model converges in fewer steps with comparable steady-state error to the standard FxLMS.

new Transformer with Koopman-Enhanced Graph Convolutional Network for Spatiotemporal Dynamics Forecasting

Authors: Zekai Wang, Bing Yao

Abstract: Spatiotemporal dynamics forecasting is inherently challenging, particularly in systems defined over irregular geometric domains, due to the need to jointly capture complex spatial correlations and nonlinear temporal dynamics. To tackle these challenges, we propose TK-GCN, a two-stage framework that integrates geometry-aware spatial encoding with long-range temporal modeling. In the first stage, a Koopman-enhanced Graph Convolutional Network (K-GCN) is developed to embed the high-dimensional dynamics distributed on spatially irregular domains into a latent space where the evolution of system states is approximately linear. By leveraging Koopman operator theory, this stage enhances the temporal consistency during the latent learning. In the second stage, a Transformer module is employed to model the temporal progression within the Koopman-encoded latent space. Through the self-attention mechanism, the Transformer captures long-range temporal dependencies, enabling accurate forecasting over extended horizons. We evaluate TK-GCN in spatiotemporal cardiac dynamics forecasting and benchmark its performance against several state-of-the-art baselines. Experimental results and ablation studies show that TK-GCN consistently delivers superior predictive accuracy across a range of forecast horizons, demonstrating its capability to effectively model complex spatial structures and nonlinear temporal dynamics.

new Taylor-Model Physics-Informed Neural Networks (PINNs) for Ordinary Differential Equations

Authors: Chandra Kanth Nagesh, Sriram Sankaranarayanan, Ramneet Kaur, Tuhin Sahai, Susmit Jha

Abstract: We study the problem of learning neural network models for Ordinary Differential Equations (ODEs) with parametric uncertainties. Such neural network models capture the solution to the ODE over a given set of parameters, initial conditions, and range of times. Physics-Informed Neural Networks (PINNs) have emerged as a promising approach for learning such models that combine data-driven deep learning with symbolic physics models in a principled manner. However, the accuracy of PINNs degrade when they are used to solve an entire family of initial value problems characterized by varying parameters and initial conditions. In this paper, we combine symbolic differentiation and Taylor series methods to propose a class of higher-order models for capturing the solutions to ODEs. These models combine neural networks and symbolic terms: they use higher order Lie derivatives and a Taylor series expansion obtained symbolically, with the remainder term modeled as a neural network. The key insight is that the remainder term can itself be modeled as a solution to a first-order ODE. We show how the use of these higher order PINNs can improve accuracy using interesting, but challenging ODE benchmarks. We also show that the resulting model can be quite useful for situations such as controlling uncertain physical systems modeled as ODEs.

new Enhanced accuracy through ensembling of randomly initialized auto-regressive models for time-dependent PDEs

Authors: Ishan Khurjekar, Indrashish Saha, Lori Graham-Brady, Somdatta Goswami

Abstract: Systems governed by partial differential equations (PDEs) require computationally intensive numerical solvers to predict spatiotemporal field evolution. While machine learning (ML) surrogates offer faster solutions, autoregressive inference with ML models suffer from error accumulation over successive predictions, limiting their long-term accuracy. We propose a deep ensemble framework to address this challenge, where multiple ML surrogate models with random weight initializations are trained in parallel and aggregated during inference. This approach leverages the diversity of model predictions to mitigate error propagation while retaining the autoregressive strategies ability to capture the system's time dependent relations. We validate the framework on three PDE-driven dynamical systems - stress evolution in heterogeneous microstructures, Gray-Scott reaction-diffusion, and planetary-scale shallow water system - demonstrating consistent reduction in error accumulation over time compared to individual models. Critically, the method requires only a few time steps as input, enabling full trajectory predictions with inference times significantly faster than numerical solvers. Our results highlight the robustness of ensemble methods in diverse physical systems and their potential as efficient and accurate alternatives to traditional solvers. The codes for this work are available on GitHub (https://github.com/Graham-Brady-Research-Group/AutoregressiveEnsemble_SpatioTemporal_Evolution).

URLs: https://github.com/Graham-Brady-Research-Group/AutoregressiveEnsemble_SpatioTemporal_Evolution).

new A Rigorous Behavior Assessment of CNNs Using a Data-Domain Sampling Regime

Authors: Shuning Jiang, Wei-Lun Chao, Daniel Haehn, Hanspeter Pfister, Jian Chen

Abstract: We present a data-domain sampling regime for quantifying CNNs' graphic perception behaviors. This regime lets us evaluate CNNs' ratio estimation ability in bar charts from three perspectives: sensitivity to training-test distribution discrepancies, stability to limited samples, and relative expertise to human observers. After analyzing 16 million trials from 800 CNNs models and 6,825 trials from 113 human participants, we arrived at a simple and actionable conclusion: CNNs can outperform humans and their biases simply depend on the training-test distance. We show evidence of this simple, elegant behavior of the machines when they interpret visualization images. osf.io/gfqc3 provides registration, the code for our sampling regime, and experimental results.

new Enhancing Adaptive Behavioral Interventions with LLM Inference from Participant-Described States

Authors: Karine Karine, Benjamin M. Marlin

Abstract: The use of reinforcement learning (RL) methods to support health behavior change via personalized and just-in-time adaptive interventions is of significant interest to health and behavioral science researchers focused on problems such as smoking cessation support and physical activity promotion. However, RL methods are often applied to these domains using a small collection of context variables to mitigate the significant data scarcity issues that arise from practical limitations on the design of adaptive intervention trials. In this paper, we explore an approach to significantly expanding the state space of an adaptive intervention without impacting data efficiency. The proposed approach enables intervention participants to provide natural language descriptions of aspects of their current state. It then leverages inference with pre-trained large language models (LLMs) to better align the policy of a base RL method with these state descriptions. To evaluate our method, we develop a novel physical activity intervention simulation environment that generates text-based state descriptions conditioned on latent state variables using an auxiliary LLM. We show that this approach has the potential to significantly improve the performance of online policy learning methods.

new Unraveling the Black-box Magic: An Analysis of Neural Networks' Dynamic Local Extrema

Authors: Shengjian Chen

Abstract: We point out that neural networks are not black boxes, and their generalization stems from the ability to dynamically map a dataset to the local extrema of the model function. We further prove that the number of local extrema in a neural network is positively correlated with the number of its parameters, and on this basis, we give a new algorithm that is different from the back-propagation algorithm, which we call the extremum-increment algorithm. Some difficult situations, such as gradient vanishing and overfitting, can be reasonably explained and dealt with in this framework.

new GenAI-Powered Inference

Authors: Kosuke Imai, Kentaro Nakamura

Abstract: We introduce GenAI-Powered Inference (GPI), a statistical framework for both causal and predictive inference using unstructured data, including text and images. GPI leverages open-source Generative Artificial Intelligence (GenAI) models - such as large language models and diffusion models - not only to generate unstructured data at scale but also to extract low-dimensional representations that capture their underlying structure. Applying machine learning to these representations, GPI enables estimation of causal and predictive effects while quantifying associated estimation uncertainty. Unlike existing approaches to representation learning, GPI does not require fine-tuning of generative models, making it computationally efficient and broadly accessible. We illustrate the versatility of the GPI framework through three applications: (1) analyzing Chinese social media censorship, (2) estimating predictive effects of candidates' facial appearance on electoral outcomes, and (3) assessing the persuasiveness of political rhetoric. An open-source software package is available for implementing GPI.

new Transformer Model for Alzheimer's Disease Progression Prediction Using Longitudinal Visit Sequences

Authors: Mahdi Moghaddami, Clayton Schubring, Mohammad-Reza Siadat

Abstract: Alzheimer's disease (AD) is a neurodegenerative disorder with no known cure that affects tens of millions of people worldwide. Early detection of AD is critical for timely intervention to halt or slow the progression of the disease. In this study, we propose a Transformer model for predicting the stage of AD progression at a subject's next clinical visit using features from a sequence of visits extracted from the subject's visit history. We also rigorously compare our model to recurrent neural networks (RNNs) such as long short-term memory (LSTM), gated recurrent unit (GRU), and minimalRNN and assess their performances based on factors such as the length of prior visits and data imbalance. We test the importance of different feature categories and visit history, as well as compare the model to a newer Transformer-based model optimized for time series. Our model demonstrates strong predictive performance despite missing visits and missing features in available visits, particularly in identifying converter subjects -- individuals transitioning to more severe disease stages -- an area that has posed significant challenges in longitudinal prediction. The results highlight the model's potential in enhancing early diagnosis and patient outcomes.

new Risk-sensitive Actor-Critic with Static Spectral Risk Measures for Online and Offline Reinforcement Learning

Authors: Mehrdad Moghimi, Hyejin Ku

Abstract: The development of Distributional Reinforcement Learning (DRL) has introduced a natural way to incorporate risk sensitivity into value-based and actor-critic methods by employing risk measures other than expectation in the value function. While this approach is widely adopted in many online and offline RL algorithms due to its simplicity, the naive integration of risk measures often results in suboptimal policies. This limitation can be particularly harmful in scenarios where the need for effective risk-sensitive policies is critical and worst-case outcomes carry severe consequences. To address this challenge, we propose a novel framework for optimizing static Spectral Risk Measures (SRM), a flexible family of risk measures that generalizes objectives such as CVaR and Mean-CVaR, and enables the tailoring of risk preferences. Our method is applicable to both online and offline RL algorithms. We establish theoretical guarantees by proving convergence in the finite state-action setting. Moreover, through extensive empirical evaluations, we demonstrate that our algorithms consistently outperform existing risk-sensitive methods in both online and offline environments across diverse domains.

new Return of the Latent Space COWBOYS: Re-thinking the use of VAEs for Bayesian Optimisation of Structured Spaces

Authors: Henry B. Moss, Sebastian W. Ober, Tom Diethe

Abstract: Bayesian optimisation in the latent space of a Variational AutoEncoder (VAE) is a powerful framework for optimisation tasks over complex structured domains, such as the space of scientifically interesting molecules. However, existing approaches tightly couple the surrogate and generative models, which can lead to suboptimal performance when the latent space is not tailored to specific tasks, which in turn has led to the proposal of increasingly sophisticated algorithms. In this work, we explore a new direction, instead proposing a decoupled approach that trains a generative model and a Gaussian Process (GP) surrogate separately, then combines them via a simple yet principled Bayesian update rule. This separation allows each component to focus on its strengths -- structure generation from the VAE and predictive modelling by the GP. We show that our decoupled approach improves our ability to identify high-potential candidates in molecular optimisation problems under constrained evaluation budgets.

new Consistency-Aware Padding for Incomplete Multi-Modal Alignment Clustering Based on Self-Repellent Greedy Anchor Search

Authors: Shubin Ma, Liang Zhao, Mingdong Lu, Yifan Guo, Bo Xu

Abstract: Multimodal representation is faithful and highly effective in describing real-world data samples' characteristics by describing their complementary information. However, the collected data often exhibits incomplete and misaligned characteristics due to factors such as inconsistent sensor frequencies and device malfunctions. Existing research has not effectively addressed the issue of filling missing data in scenarios where multiview data are both imbalanced and misaligned. Instead, it relies on class-level alignment of the available data. Thus, it results in some data samples not being well-matched, thereby affecting the quality of data fusion. In this paper, we propose the Consistency-Aware Padding for Incomplete Multimodal Alignment Clustering Based on Self-Repellent Greedy Anchor Search(CAPIMAC) to tackle the problem of filling imbalanced and misaligned data in multimodal datasets. Specifically, we propose a self-repellent greedy anchor search module(SRGASM), which employs a self-repellent random walk combined with a greedy algorithm to identify anchor points for re-representing incomplete and misaligned multimodal data. Subsequently, based on noise-contrastive learning, we design a consistency-aware padding module (CAPM) to effectively interpolate and align imbalanced and misaligned data, thereby improving the quality of multimodal data fusion. Experimental results demonstrate the superiority of our method over benchmark datasets. The code will be publicly released at https://github.com/Autism-mm/CAPIMAC.git.

URLs: https://github.com/Autism-mm/CAPIMAC.git.

new Combining Graph Neural Networks and Mixed Integer Linear Programming for Molecular Inference under the Two-Layered Model

Authors: Jianshen Zhu, Naveed Ahmed Azam, Kazuya Haraguchi, Liang Zhao, Tatsuya Akutsu

Abstract: Recently, a novel two-phase framework named mol-infer for inference of chemical compounds with prescribed abstract structures and desired property values has been proposed. The framework mol-infer is primarily based on using mixed integer linear programming (MILP) to simulate the computational process of machine learning methods and describe the necessary and sufficient conditions to ensure such a chemical graph exists. The existing approaches usually first convert the chemical compounds into handcrafted feature vectors to construct prediction functions, but because of the limit on the kinds of descriptors originated from the need for tractability in the MILP formulation, the learning performances on datasets of some properties are not good enough. A lack of good learning performance can greatly lower the quality of the inferred chemical graphs, and thus improving learning performance is of great importance. On the other hand, graph neural networks (GNN) offer a promising machine learning method to directly utilize the chemical graphs as the input, and many existing GNN-based approaches to the molecular property prediction problem have shown that they can enjoy better learning performances compared to the traditional approaches that are based on feature vectors. In this study, we develop a molecular inference framework based on mol-infer, namely mol-infer-GNN, that utilizes GNN as the learning method while keeping the great flexibility originated from the two-layered model on the abstract structure of the chemical graph to be inferred. We conducted computational experiments on the QM9 dataset to show that our proposed GNN model can obtain satisfying learning performances for some properties despite its simple structure, and can infer small chemical graphs comprising up to 20 non-hydrogen atoms within reasonable computational time.

new MCST-Mamba: Multivariate Mamba-Based Model for Traffic Prediction

Authors: Mohamed Hamad, Mohamed Mabrok, Nizar Zorba

Abstract: Accurate traffic prediction plays a vital role in intelligent transportation systems by enabling efficient routing, congestion mitigation, and proactive traffic control. However, forecasting is challenging due to the combined effects of dynamic road conditions, varying traffic patterns across different locations, and external influences such as weather and accidents. Traffic data often consists of several interrelated measurements - such as speed, flow and occupancy - yet many deep-learning approaches either predict only one of these variables or require a separate model for each. This limits their ability to capture joint patterns across channels. To address this, we introduce the Multi-Channel Spatio-Temporal (MCST) Mamba model, a forecasting framework built on the Mamba selective state-space architecture that natively handles multivariate inputs and simultaneously models all traffic features. The proposed MCST-Mamba model integrates adaptive spatio-temporal embeddings and separates the modeling of temporal sequences and spatial sensor interactions into two dedicated Mamba blocks, improving representation learning. Unlike prior methods that evaluate on a single channel, we assess MCST-Mamba across all traffic features at once, aligning more closely with how congestion arises in practice. Our results show that MCST-Mamba achieves strong predictive performance with a lower parameter count compared to baseline models.

new Graph Collaborative Attention Network for Link Prediction in Knowledge Graphs

Authors: Thanh Hoang-Minh (Department of Information Technology, VNUHCM - University of Science)

Abstract: Knowledge graphs offer a structured representation of real-world entities and their relationships, enabling a wide range of applications from information retrieval to automated reasoning. In this paper, we conduct a systematic comparison between traditional rule-based approaches and modern deep learning methods for link prediction. We focus on KBGAT, a graph neural network model that leverages multi-head attention to jointly encode both entity and relation features within local neighborhood structures. To advance this line of research, we introduce \textbf{GCAT} (Graph Collaborative Attention Network), a refined model that enhances context aggregation and interaction between heterogeneous nodes. Experimental results on four widely-used benchmark datasets demonstrate that GCAT not only consistently outperforms rule-based methods but also achieves competitive or superior performance compared to existing neural embedding models. Our findings highlight the advantages of attention-based architectures in capturing complex relational patterns for knowledge graph completion tasks.

new Real-TabPFN: Improving Tabular Foundation Models via Continued Pre-training With Real-World Data

Authors: Anurag Garg, Muhammad Ali, Noah Hollmann, Lennart Purucker, Samuel M\"uller, Frank Hutter

Abstract: Foundation models for tabular data, like TabPFN, achieve strong performance on small datasets when pre-trained solely on synthetic data. We show that this performance can be significantly boosted by a targeted continued pre-training phase. Specifically, we demonstrate that leveraging a small, curated collection of large, real-world datasets for continued pre-training yields superior downstream predictive accuracy compared to using broader, potentially noisier corpora like CommonCrawl or GitTables. Our resulting model, Real-TabPFN, achieves substantial performance gains on 29 datasets from the OpenML AutoML Benchmark.

new Fast Re-Trainable Attention Autoencoder for Liquid Sensor Anomaly Detection at the Edge

Authors: Seongyun Choi

Abstract: A lightweight, edge-deployable pipeline is proposed for detecting sensor anomalies in chemistry and biology laboratories. A custom PCB captures seven sensor channels and streams them over the local network. An Attention-based One-Class Autoencoder reaches a usable state after training on only thirty minutes of normal data. Despite the small data set, the model already attains an F1 score of 0.72, a precision of 0.89, and a recall of 0.61 when tested on synthetic micro-anomalies. The trained network is converted into a TensorFlow-Lite binary of about 31 kB and runs on an Advantech ARK-1221L, a fan-less x86 edge device without AVX instructions; end-to-end inference latency stays below two seconds. The entire collect-train-deploy workflow finishes within one hour, which demonstrates that the pipeline adapts quickly whenever a new liquid or sensor is introduced.

new Seamlessly Integrating Tree-Based Positional Embeddings into Transformer Models for Source Code Representation

Authors: Patryk Bartkowiak, Filip Grali\'nski

Abstract: Transformer-based models have demonstrated significant success in various source code representation tasks. Nonetheless, traditional positional embeddings employed by these models inadequately capture the hierarchical structure intrinsic to source code, typically represented as Abstract Syntax Trees (ASTs). To address this, we propose a novel tree-based positional embedding approach that explicitly encodes hierarchical relationships derived from ASTs, including node depth and sibling indices. These hierarchical embeddings are integrated into the transformer architecture, specifically enhancing the CodeBERTa model. We thoroughly evaluate our proposed model through masked language modeling (MLM) pretraining and clone detection fine-tuning tasks. Experimental results indicate that our Tree-Enhanced CodeBERTa consistently surpasses the baseline model in terms of loss, accuracy, F1 score, precision, and recall, emphasizing the importance of incorporating explicit structural information into transformer-based representations of source code.

new Commute Networks as a Signature of Urban Socioeconomic Performance: Evaluating Mobility Structures with Deep Learning Models

Authors: Devashish Khulbe, Alexander Belyi, Stanislav Sobolevsky

Abstract: Urban socioeconomic modeling has predominantly concentrated on extensive location and neighborhood-based features, relying on the localized population footprint. However, networks in urban systems are common, and many urban modeling methods don't account for network-based effects. In this study, we propose using commute information records from the census as a reliable and comprehensive source to construct mobility networks across cities. Leveraging deep learning architectures, we employ these commute networks across U.S. metro areas for socioeconomic modeling. We show that mobility network structures provide significant predictive performance without considering any node features. Consequently, we use mobility networks to present a supervised learning framework to model a city's socioeconomic indicator directly, combining Graph Neural Network and Vanilla Neural Network models to learn all parameters in a single learning pipeline. Our experiments in 12 major U.S. cities show the proposed model outperforms previous conventional machine learning models. This work provides urban researchers methods to incorporate network effects in urban modeling and informs stakeholders of wider network-based effects in urban policymaking and planning.

new Benchmarking Stochastic Approximation Algorithms for Fairness-Constrained Training of Deep Neural Networks

Authors: Andrii Kliachkin, Jana Lep\v{s}ov\'a, Gilles Bareilles, Jakub Mare\v{c}ek

Abstract: The ability to train Deep Neural Networks (DNNs) with constraints is instrumental in improving the fairness of modern machine-learning models. Many algorithms have been analysed in recent years, and yet there is no standard, widely accepted method for the constrained training of DNNs. In this paper, we provide a challenging benchmark of real-world large-scale fairness-constrained learning tasks, built on top of the US Census (Folktables). We point out the theoretical challenges of such tasks and review the main approaches in stochastic approximation algorithms. Finally, we demonstrate the use of the benchmark by implementing and comparing three recently proposed, but as-of-yet unimplemented, algorithms both in terms of optimization performance, and fairness improvement. We release the code of the benchmark as a Python package at https://github.com/humancompatible/train.

URLs: https://github.com/humancompatible/train.

new Predictive Modeling of Effluent Temperature in SAT Systems Using Ambient Meteorological Data: Implications for Infiltration Management

Authors: Roy Elkayam

Abstract: Accurate prediction of effluent temperature in recharge basins is essential for optimizing the Soil Aquifer Treatment (SAT) process, as temperature directly influences water viscosity and infiltration rates. This study develops and evaluates predictive models for effluent temperature in the upper recharge layer of a Shafdan SAT system recharge basin using ambient meteorological data. Multiple linear regression (MLR), neural networks (NN), and random forests (RF) were tested for their predictive accuracy and interpretability. The MLR model, preferred for its operational simplicity and robust performance, achieved high predictive accuracy (R2 = 0.86-0.87) and was used to estimate effluent temperatures over a 10-year period. Results highlight pronounced seasonal temperature cycles and the importance of topsoil temperature in governing the thermal profile of the infiltrating effluent. The study provides practical equations for real-time monitoring and long-term planning of SAT operations.

new Attributing Data for Sharpness-Aware Minimization

Authors: Chenyang Ren, Yifan Jia, Huanyi Xie, Zhaobin Xu, Tianxing Wei, Liangyu Wang, Lijie Hu, Di Wang

Abstract: Sharpness-aware Minimization (SAM) improves generalization in large-scale model training by linking loss landscape geometry to generalization. However, challenges such as mislabeled noisy data and privacy concerns have emerged as significant issues. Data attribution, which identifies the contributions of specific training samples, offers a promising solution. However, directly rendering existing data influence evaluation tools such as influence functions (IF) to SAM will be inapplicable or inaccurate as SAM utilizes an inner loop to find model perturbations that maximize loss, which the outer loop then minimizes, resulting in a doubled computational structure. Additionally, this bilevel structure complicates the modeling of data influence on the parameters. In this paper, based on the IF, we develop two innovative data valuation methods for SAM, each offering unique benefits in different scenarios: the Hessian-based IF and the Gradient Trajectory-based IF. The first one provides a comprehensive estimation of data influence using a closed-form measure that relies only on the trained model weights. In contrast, the other IF for SAM utilizes gradient trajectory information during training for more accurate and efficient data assessment. Extensive experiments demonstrate their effectiveness in data evaluation and parameter tuning, with applications in identifying mislabeled data, model editing, and enhancing interpretability.

new Accurate and Efficient World Modeling with Masked Latent Transformers

Authors: Maxime Burchi, Radu Timofte

Abstract: The Dreamer algorithm has recently obtained remarkable performance across diverse environment domains by training powerful agents with simulated trajectories. However, the compressed nature of its world model's latent space can result in the loss of crucial information, negatively affecting the agent's performance. Recent approaches, such as $\Delta$-IRIS and DIAMOND, address this limitation by training more accurate world models. However, these methods require training agents directly from pixels, which reduces training efficiency and prevents the agent from benefiting from the inner representations learned by the world model. In this work, we propose an alternative approach to world modeling that is both accurate and efficient. We introduce EMERALD (Efficient MaskEd latent tRAnsformer worLD model), a world model using a spatial latent state with MaskGIT predictions to generate accurate trajectories in latent space and improve the agent performance. On the Crafter benchmark, EMERALD achieves new state-of-the-art performance, becoming the first method to surpass human experts performance within 10M environment steps. Our method also succeeds to unlock all 22 Crafter achievements at least once during evaluation.

new Hierarchical Testing with Rabbit Optimization for Industrial Cyber-Physical Systems

Authors: Jinwei Hu, Zezhi Tang, Xin Jin, Benyuan Zhang, Yi Dong, Xiaowei Huang

Abstract: This paper presents HERO (Hierarchical Testing with Rabbit Optimization), a novel black-box adversarial testing framework for evaluating the robustness of deep learning-based Prognostics and Health Management systems in Industrial Cyber-Physical Systems. Leveraging Artificial Rabbit Optimization, HERO generates physically constrained adversarial examples that align with real-world data distributions via global and local perspective. Its generalizability ensures applicability across diverse ICPS scenarios. This study specifically focuses on the Proton Exchange Membrane Fuel Cell system, chosen for its highly dynamic operational conditions, complex degradation mechanisms, and increasing integration into ICPS as a sustainable and efficient energy solution. Experimental results highlight HERO's ability to uncover vulnerabilities in even state-of-the-art PHM models, underscoring the critical need for enhanced robustness in real-world applications. By addressing these challenges, HERO demonstrates its potential to advance more resilient PHM systems across a wide range of ICPS domains.

new Relational inductive biases on attention mechanisms

Authors: V\'ictor Mijangos, Ximena Gutierrez-Vasques, Ver\'onica E. Arriola, Ulises Rodr\'iguez-Dom\'inguez, Alexis Cervantes, Jos\'e Luis Almanzara

Abstract: Inductive learning aims to construct general models from specific examples, guided by biases that influence hypothesis selection and determine generalization capacity. In this work, we focus on characterizing the relational inductive biases present in attention mechanisms, understood as assumptions about the underlying relationships between data elements. From the perspective of geometric deep learning, we analyze the most common attention mechanisms in terms of their equivariance properties with respect to permutation subgroups, which allows us to propose a classification based on their relational biases. Under this perspective, we show that different attention layers are characterized by the underlying relationships they assume on the input data.

new When Data-Free Knowledge Distillation Meets Non-Transferable Teacher: Escaping Out-of-Distribution Trap is All You Need

Authors: Ziming Hong, Runnan Chen, Zengmao Wang, Bo Han, Bo Du, Tongliang Liu

Abstract: Data-free knowledge distillation (DFKD) transfers knowledge from a teacher to a student without access the real in-distribution (ID) data. Its common solution is to use a generator to synthesize fake data and use them as a substitute for real ID data. However, existing works typically assume teachers are trustworthy, leaving the robustness and security of DFKD from untrusted teachers largely unexplored. In this work, we conduct the first investigation into distilling non-transferable learning (NTL) teachers using DFKD, where the transferability from an ID domain to an out-of-distribution (OOD) domain is prohibited. We find that NTL teachers fool DFKD through divert the generator's attention from the useful ID knowledge to the misleading OOD knowledge. This hinders ID knowledge transfer but prioritizes OOD knowledge transfer. To mitigate this issue, we propose Adversarial Trap Escaping (ATEsc) to benefit DFKD by identifying and filtering out OOD-like synthetic samples. Specifically, inspired by the evidence that NTL teachers show stronger adversarial robustness on OOD samples than ID samples, we split synthetic samples into two groups according to their robustness. The fragile group is treated as ID-like data and used for normal knowledge distillation, while the robust group is seen as OOD-like data and utilized for forgetting OOD knowledge. Extensive experiments demonstrate the effectiveness of ATEsc for improving DFKD against NTL teachers. Code is released at https://github.com/tmllab/2025_ICML_ATEsc.

URLs: https://github.com/tmllab/2025_ICML_ATEsc.

new Graph Neural Networks as a Substitute for Transformers in Single-Cell Transcriptomics

Authors: Jiaxin Qi, Yan Cui, Jinli Ou, Jianqiang Huang, Gaogang Xie

Abstract: Graph Neural Networks (GNNs) and Transformers share significant similarities in their encoding strategies for interacting with features from nodes of interest, where Transformers use query-key scores and GNNs use edges. Compared to GNNs, which are unable to encode relative positions, Transformers leverage dynamic attention capabilities to better represent relative relationships, thereby becoming the standard backbones in large-scale sequential pre-training. However, the subtle difference prompts us to consider: if positions are no longer crucial, could we substitute Transformers with Graph Neural Networks in some fields such as Single-Cell Transcriptomics? In this paper, we first explore the similarities and differences between GNNs and Transformers, specifically in terms of relative positions. Additionally, we design a synthetic example to illustrate their equivalence where there are no relative positions between tokens in the sample. Finally, we conduct extensive experiments on a large-scale position-agnostic dataset-single-cell transcriptomics-finding that GNNs achieve competitive performance compared to Transformers while consuming fewer computation resources. These findings provide novel insights for researchers in the field of single-cell transcriptomics, challenging the prevailing notion that the Transformer is always the optimum choice.

new Structure As Search: Unsupervised Permutation Learning for Combinatorial Optimization

Authors: Yimeng Min, Carla P. Gomes

Abstract: We propose a non-autoregressive framework for the Travelling Salesman Problem where solutions emerge directly from learned permutations without explicit search. By applying a similarity transformation to Hamiltonian cycles, the model learns to approximate permutation matrices via continuous relaxations. Our unsupervised approach achieves competitive performance against classical heuristics, demonstrating that the inherent structure of the problem can effectively guide combinatorial optimization without sequential decision-making.

new Uncertainty Quantification in the Tsetlin Machine

Authors: Runar Helin, Ole-Christoffer Granmo, Mayur Kishor Shende, Lei Jiao, Vladimir I. Zadorozhny, Kunal Ganesh Dumbre, Rishad Shafik, Alex Yakovlev

Abstract: Data modeling using Tsetlin machines (TMs) is all about building logical rules from the data features. The decisions of the model are based on a combination of these logical rules. Hence, the model is fully transparent and it is possible to get explanations of its predictions. In this paper, we present a probability score for TM predictions and develop new techniques for uncertainty quantification to increase the explainability further. The probability score is an inherent property of any TM variant and is derived through an analysis of the TM learning dynamics. Simulated data is used to show a clear connection between the learned TM probability scores and the underlying probabilities of the data. A visualization of the probability scores also reveals that the TM is less confident in its predictions outside the training data domain, which contrasts the typical extrapolation phenomenon found in Artificial Neural Networks. The paper concludes with an application of the uncertainty quantification techniques on an image classification task using the CIFAR-10 dataset, where they provide new insights and suggest possible improvements to current TM image classification models.

new skfolio: Portfolio Optimization in Python

Authors: Carlo Nicolini, Matteo Manzi, Hugo Delatte

Abstract: Portfolio optimization is a fundamental challenge in quantitative finance, requiring robust computational tools that integrate statistical rigor with practical implementation. We present skfolio, an open-source Python library for portfolio construction and risk management that seamlessly integrates with the scikit-learn ecosystem. skfolio provides a unified framework for diverse allocation strategies, from classical mean-variance optimization to modern clustering-based methods, state-of-the-art financial estimators with native interfaces, and advanced cross-validation techniques tailored for financial time series. By adhering to scikit-learn's fit-predict-transform paradigm, the library enables researchers and practitioners to leverage machine learning workflows for portfolio optimization, promoting reproducibility and transparency in quantitative finance.

new JAX-MPM: A Learning-Augmented Differentiable Meshfree Framework for GPU-Accelerated Lagrangian Simulation and Geophysical Inverse Modeling

Authors: Honghui Du, QiZhi He

Abstract: Differentiable programming that enables automatic differentiation through simulation pipelines has emerged as a powerful paradigm in scientific computing, supporting both forward and inverse modeling and facilitating integration with deep learning frameworks. We present JAX-MPM, a general-purpose differentiable meshfree solver within a hybrid Lagrangian-Eulerian framework, tailored for simulating complex continuum mechanics involving large deformations, frictional contact, and inelastic material behavior, with emphasis on geomechanics and geophysical hazard applications. Built on the material point method (MPM) and implemented using the JAX computing framework, JAX-MPM is fully differentiable and GPU-accelerated, enabling efficient gradient-based optimization directly through time-stepping solvers. It supports joint training of physical models and neural networks, allowing the learning of embedded closures and neural constitutive models. We validate JAX-MPM on several 2D and 3D benchmarks, including dam-breaks and granular collapses, demonstrating its accuracy and performance. A high-resolution 3D granular cylinder collapse with 2.7 million particles completes 1000 steps in ~22 seconds (single precision) and ~98 seconds (double precision) on a single GPU. Beyond forward modeling, we demonstrate inverse modeling capabilities such as velocity field reconstruction and spatially varying friction estimation. These results establish JAX-MPM as a unified, scalable platform for differentiable meshfree simulation and data-driven geomechanical inference.

new Predicting Air Pollution in Cork, Ireland Using Machine Learning

Authors: Md Rashidunnabi, Fahmida Faiza Ananna, Kailash Hambarde, Bruno Gabriel Nascimento Andrade, Dean Venables, Hugo Proenca

Abstract: Air pollution poses a critical health threat in cities worldwide, with nitrogen dioxide levels in Cork, Ireland exceeding World Health Organization safety standards by up to $278\%$. This study leverages artificial intelligence to predict air pollution with unprecedented accuracy, analyzing nearly ten years of data from five monitoring stations combined with 30 years of weather records. We evaluated 17 machine learning algorithms, with Extra Trees emerging as the optimal solution, achieving $77\%$ prediction accuracy and significantly outperforming traditional forecasting methods. Our analysis reveals that meteorological conditions particularly temperature, wind speed, and humidity are the primary drivers of pollution levels, while traffic patterns and seasonal changes create predictable pollution cycles. Pollution exhibits dramatic seasonal variations, with winter levels nearly double those of summer, and daily rush-hour peaks reaching $120\%$ above normal levels. While Cork's air quality shows concerning violations of global health standards, our models detected an encouraging $31\%$ improvement from 2014 to 2022. This research demonstrates that intelligent forecasting systems can provide city planners and environmental officials with powerful prediction tools, enabling life-saving early warning systems and informed urban planning decisions. The technology exists today to transform urban air quality management. All research materials and code are freely available at: https://github.com/MdRashidunnabi/Air-Pollution-Analysis.git

URLs: https://github.com/MdRashidunnabi/Air-Pollution-Analysis.git

new An explicit formulation of the learned noise predictor $\epsilon_{\theta}({\bf x}_t, t)$ via the forward-process noise $\epsilon_{t}$ in denoising diffusion probabilistic models (DDPMs)

Authors: KiHyun Yun

Abstract: In denoising diffusion probabilistic models (DDPMs), the learned noise predictor $ \epsilon_{\theta} ( {\bf x}_t , t)$ is trained to approximate the forward-process noise $\epsilon_t$. The equality $\nabla_{{\bf x}_t} \log q({\bf x}_t) = -\frac 1 {\sqrt {1- {\bar \alpha}_t} } \epsilon_{\theta} ( {\bf x}_t , t)$ plays a fundamental role in both theoretical analyses and algorithmic design, and thus is frequently employed across diffusion-based generative models. In this paper, an explicit formulation of $ \epsilon_{\theta} ( {\bf x}_t , t)$ in terms of the forward-process noise $\epsilon_t$ is derived. This result show how the forward-process noise $\epsilon_t$ contributes to the learned predictor $ \epsilon_{\theta} ( {\bf x}_t , t)$. Furthermore, based on this formulation, we present a novel and mathematically rigorous proof of the fundamental equality above, clarifying its origin and providing new theoretical insight into the structure of diffusion models.

new Model Collapse Is Not a Bug but a Feature in Machine Unlearning for LLMs

Authors: Yan Scholten, Sophie Xhonneux, Stephan G\"unnemann, Leo Schwinn

Abstract: Current unlearning methods for LLMs optimize on the private information they seek to remove by incorporating it into their training objectives. We argue this not only risks reinforcing exposure to sensitive data, it also fundamentally contradicts the principle of minimizing its use. As a remedy, we propose a novel unlearning method - Partial Model Collapse (PMC), which does not require unlearning targets in the unlearning objective. Our approach is inspired by recent observations that training generative models on their own generations leads to distribution collapse, effectively removing information from the model. Our core idea is to leverage this collapse for unlearning by triggering collapse partially on the sensitive data. We theoretically analyze that our approach converges to the desired outcome, i.e. the LLM unlearns the information in the forget set. We empirically demonstrate that PMC overcomes two key limitations of existing unlearning approaches that explicitly optimize on unlearning targets, and more effectively removes private information from model outputs. Overall, our contributions represent an important step toward more comprehensive unlearning that aligns with real-world privacy constraints. Code available at https://www.cs.cit.tum.de/daml/partial-model-collapse/.

URLs: https://www.cs.cit.tum.de/daml/partial-model-collapse/.

new Zero-Shot Cyclic Peptide Design with Composable Geometric Conditions

Authors: Dapeng Jiang, Xiangzhe Kong, Jiaqi Han, Mingyu Li, Rui Jiao, Wenbing Huang, Stefano Ermon, Jianzhu Ma, Yang Liu

Abstract: Cyclic peptides, characterized by geometric constraints absent in linear peptides, offer enhanced biochemical properties, presenting new opportunities to address unmet medical needs. However, designing target-specific cyclic peptides remains underexplored due to limited training data. To bridge the gap, we propose CP-Composer, a novel generative framework that enables zero-shot cyclic peptide generation via composable geometric constraints. Our approach decomposes complex cyclization patterns into unit constraints, which are incorporated into a diffusion model through geometric conditioning on nodes and edges. During training, the model learns from unit constraints and their random combinations in linear peptides, while at inference, novel constraint combinations required for cyclization are imposed as input. Experiments show that our model, despite trained with linear peptides, is capable of generating diverse target-binding cyclic peptides, reaching success rates from 38% to 84% on different cyclization strategies.

new Normalized Iterative Hard Thresholding for Tensor Recovery

Authors: Li Li, Yuneng Liang, Kaijie Zheng, Jian Lu

Abstract: Low-rank recovery builds upon ideas from the theory of compressive sensing, which predicts that sparse signals can be accurately reconstructed from incomplete measurements. Iterative thresholding-type algorithms-particularly the normalized iterative hard thresholding (NIHT) method-have been widely used in compressed sensing (CS) and applied to matrix recovery tasks. In this paper, we propose a tensor extension of NIHT, referred to as TNIHT, for the recovery of low-rank tensors under two widely used tensor decomposition models. This extension enables the effective reconstruction of high-order low-rank tensors from a limited number of linear measurements by leveraging the inherent low-dimensional structure of multi-way data. Specifically, we consider both the CANDECOMP/PARAFAC (CP) rank and the Tucker rank to characterize tensor low-rankness within the TNIHT framework. At the same time, we establish a convergence theorem for the proposed TNIHT method under the tensor restricted isometry property (TRIP), providing theoretical support for its recovery guarantees. Finally, we evaluate the performance of TNIHT through numerical experiments on synthetic, image, and video data, and compare it with several state-of-the-art algorithms.

new Scaling Context Requires Rethinking Attention

Authors: Carles Gelada, Jacob Buckman, Sean Zhang, Txus Bach

Abstract: We argue that neither transformers nor sub-quadratic architectures are well suited to training at long sequence lengths: the cost of processing the context is too expensive in the former, too inexpensive in the latter. Approaches such as sliding window attention which reduce the cost-per-token of a transformer impair in-context learning, and so are also unsuitable. To address these limitations, we introduce power attention, an architectural layer for linear-cost sequence modeling whose state size can be adjusted independently of parameters, unlocking the advantages of linear attention on practical domains. We develop and open-source a set of GPU kernels for efficient power attention, identifying a novel pattern of operation fusion to avoid memory and bandwidth bottlenecks. Our experiments on the in-context learning of power attention shows that these models dominate both exponential attention and linear attention at long-context training.

new Just Enough Shifts: Mitigating Over-Refusal in Aligned Language Models with Targeted Representation Fine-Tuning

Authors: Mahavir Dabas, Si Chen, Charles Fleming, Ming Jin, Ruoxi Jia

Abstract: Safety alignment is crucial for large language models (LLMs) to resist malicious instructions but often results in over-refusals, where benign prompts are unnecessarily rejected, impairing user experience and model utility. We introduce ACTOR (Activation-Based Training for Over-Refusal Reduction), a robust and compute- and data-efficient training framework that minimizes over-refusals by leveraging internal activation patterns from diverse queries. ACTOR precisely identifies and adjusts the activation components that trigger refusals, providing stronger control over the refusal mechanism. By fine-tuning only a single model layer, ACTOR effectively reduces over-refusals across multiple benchmarks while maintaining the model's ability to handle harmful queries and preserve overall utility.

new ATwo-Stage Ensemble Feature Selection and Particle Swarm Optimization Approach for Micro-Array Data Classification in Distributed Computing Environments

Authors: Aayush Adhikari, Sandesh Bhatta, Harendra S. Jangwan, Amit Mishra, Khair Ul Nisa, Abu Taha Zamani, Aaron Sapkota, Debendra Muduli, Nikhat Parveen

Abstract: High dimensionality in datasets produced by microarray technology presents a challenge for Machine Learning (ML) algorithms, particularly in terms of dimensionality reduction and handling imbalanced sample sizes. To mitigate the explained problems, we have proposedhybrid ensemble feature selection techniques with majority voting classifier for micro array classi f ication. Here we have considered both filter and wrapper-based feature selection techniques including Mutual Information (MI), Chi-Square, Variance Threshold (VT), Least Absolute Shrinkage and Selection Operator (LASSO), Analysis of Variance (ANOVA), and Recursive Feature Elimination (RFE), followed by Particle Swarm Optimization (PSO) for selecting the optimal features. This Artificial Intelligence (AI) approach leverages a Majority Voting Classifier that combines multiple machine learning models, such as Logistic Regression (LR), Random Forest (RF), and Extreme Gradient Boosting (XGBoost), to enhance overall performance and accuracy. By leveraging the strengths of each model, the ensemble approach aims to provide more reliable and effective diagnostic predictions. The efficacy of the proposed model has been tested in both local and cloud environments. In the cloud environment, three virtual machines virtual Central Processing Unit (vCPU) with size 8,16 and 64 bits, have been used to demonstrate the model performance. From the experiment it has been observed that, virtual Central Processing Unit (vCPU)-64 bits provides better classification accuracies of 95.89%, 97.50%, 99.13%, 99.58%, 99.11%, and 94.60% with six microarray datasets, Mixed Lineage Leukemia (MLL), Leukemia, Small Round Blue Cell Tumors (SRBCT), Lymphoma, Ovarian, andLung,respectively, validating the effectiveness of the proposed modelin bothlocalandcloud environments.

new An Explainable Transformer Model for Alzheimer's Disease Detection Using Retinal Imaging

Authors: Saeed Jamshidiha, Alireza Rezaee, Farshid Hajati, Mojtaba Golzan, Raymond Chiong

Abstract: Alzheimer's disease (AD) is a neurodegenerative disorder that affects millions worldwide. In the absence of effective treatment options, early diagnosis is crucial for initiating management strategies to delay disease onset and slow down its progression. In this study, we propose Retformer, a novel transformer-based architecture for detecting AD using retinal imaging modalities, leveraging the power of transformers and explainable artificial intelligence. The Retformer model is trained on datasets of different modalities of retinal images from patients with AD and age-matched healthy controls, enabling it to learn complex patterns and relationships between image features and disease diagnosis. To provide insights into the decision-making process of our model, we employ the Gradient-weighted Class Activation Mapping algorithm to visualize the feature importance maps, highlighting the regions of the retinal images that contribute most significantly to the classification outcome. These findings are compared to existing clinical studies on detecting AD using retinal biomarkers, allowing us to identify the most important features for AD detection in each imaging modality. The Retformer model outperforms a variety of benchmark algorithms across different performance metrics by margins of up to 11\.

new QF: Quick Feedforward AI Model Training without Gradient Back Propagation

Authors: Feng Qi

Abstract: We propose Quick Feedforward (QF) Learning, a novel knowledge consolidation framework for transformer-based models that enables efficient transfer of instruction derived knowledge into model weights through feedforward activations without any gradient back propagation. Unlike traditional finetuning, QF updates are computed in closed form, require minimal parameter modification, and preserve prior knowledge. Importantly, QF allows models to train and infer within the same runtime environment, making the process more resource efficient and closely aligned with how the human brain operates. Code and models are open sourced on GitHub. I hope QF Learning inspires a more efficient and brain-like paradigm for AI systems.

new Domain Adaptation of Drag Reduction Policy to Partial Measurements

Authors: Anton Plaksin, Georgios Rigas

Abstract: Feedback control of fluid-based systems poses significant challenges due to their high-dimensional, nonlinear, and multiscale dynamics, which demand real-time, three-dimensional, multi-component measurements for sensing. While such measurements are feasible in digital simulations, they are often only partially accessible in the real world. In this paper, we propose a method to adapt feedback control policies obtained from full-state measurements to setups with only partial measurements. Our approach is demonstrated in a simulated environment by minimising the aerodynamic drag of a simplified road vehicle. Reinforcement learning algorithms can optimally solve this control task when trained on full-state measurements by placing sensors in the wake. However, in real-world applications, sensors are limited and typically only on the vehicle, providing only partial measurements. To address this, we propose to train a Domain Specific Feature Transfer (DSFT) map reconstructing the full measurements from the history of the partial measurements. By applying this map, we derive optimal policies based solely on partial data. Additionally, our method enables determination of the optimal history length and offers insights into the architecture of optimal control policies, facilitating their implementation in real-world environments with limited sensor information.

new Heterogeneous Federated Learning with Prototype Alignment and Upscaling

Authors: Gyuejeong Lee, Jihwan Shin, Daeyoung Choi

Abstract: Heterogeneity in data distributions and model architectures remains a significant challenge in federated learning (FL). Various heterogeneous FL (HtFL) approaches have recently been proposed to address this challenge. Among them, prototype-based FL (PBFL) has emerged as a practical framework that only shares per-class mean activations from the penultimate layer. However, PBFL approaches often suffer from suboptimal prototype separation, limiting their discriminative power. We propose Prototype Normalization (ProtoNorm), a novel PBFL framework that addresses this limitation through two key components: Prototype Alignment (PA) and Prototype Upscaling (PU). The PA method draws inspiration from the Thomson problem in classical physics, optimizing global prototype configurations on a unit sphere to maximize angular separation; subsequently, the PU method increases prototype magnitudes to enhance separation in Euclidean space. Extensive evaluations on benchmark datasets show that our approach better separates prototypes and thus consistently outperforms existing HtFL approaches. Notably, since ProtoNorm inherits the communication efficiency of PBFL and the PA is performed server-side, it is particularly suitable for resource-constrained environments.

new TinyProto: Communication-Efficient Federated Learning with Sparse Prototypes in Resource-Constrained Environments

Authors: Gyuejeong Lee, Daeyoung Choi

Abstract: Communication efficiency in federated learning (FL) remains a critical challenge for resource-constrained environments. While prototype-based FL reduces communication overhead by sharing class prototypes-mean activations in the penultimate layer-instead of model parameters, its efficiency decreases with larger feature dimensions and class counts. We propose TinyProto, which addresses these limitations through Class-wise Prototype Sparsification (CPS) and adaptive prototype scaling. CPS enables structured sparsity by allocating specific dimensions to class prototypes and transmitting only non-zero elements, while adaptive scaling adjusts prototypes based on class distributions. Our experiments show TinyProto reduces communication costs by up to 4x compared to existing methods while maintaining performance. Beyond its communication efficiency, TinyProto offers crucial advantages: achieving compression without client-side computational overhead and supporting heterogeneous architectures, making it ideal for resource-constrained heterogeneous FL.

new Consistent Labeling Across Group Assignments: Variance Reduction in Conditional Average Treatment Effect Estimation

Authors: Yi-Fu Fu, Keng-Te Liao, Shou-De Lin

Abstract: Numerous algorithms have been developed for Conditional Average Treatment Effect (CATE) estimation. In this paper, we first highlight a common issue where many algorithms exhibit inconsistent learning behavior for the same instance across different group assignments. We introduce a metric to quantify and visualize this inconsistency. Next, we present a theoretical analysis showing that this inconsistency indeed contributes to higher test errors and cannot be resolved through conventional machine learning techniques. To address this problem, we propose a general method called \textbf{Consistent Labeling Across Group Assignments} (CLAGA), which eliminates the inconsistency and is applicable to any existing CATE estimation algorithm. Experiments on both synthetic and real-world datasets demonstrate significant performance improvements with CLAGA.

new Interactive Groupwise Comparison for Reinforcement Learning from Human Feedback

Authors: Jan Kompatscher, Danqing Shi, Giovanna Varni, Tino Weinkauf, Antti Oulasvirta

Abstract: Reinforcement learning from human feedback (RLHF) has emerged as a key enabling technology for aligning AI behavior with human preferences. The traditional way to collect data in RLHF is via pairwise comparisons: human raters are asked to indicate which one of two samples they prefer. We present an interactive visualization that better exploits the human visual ability to compare and explore whole groups of samples. The interface is comprised of two linked views: 1) an exploration view showing a contextual overview of all sampled behaviors organized in a hierarchical clustering structure; and 2) a comparison view displaying two selected groups of behaviors for user queries. Users can efficiently explore large sets of behaviors by iterating between these two views. Additionally, we devised an active learning approach suggesting groups for comparison. As shown by our evaluation in six simulated robotics tasks, our approach increases the final policy returns by 69.34%. It leads to lower error rates and better policies. We open-source the code that can be easily integrated into the RLHF training loop, supporting research on human-AI alignment.

new Information-theoretic Quantification of High-order Feature Effects in Classification Problems

Authors: Ivan Lazic, Chiara Bar\`a, Marta Iovino, Sebastiano Stramaglia, Niksa Jakovljevic, Luca Faes

Abstract: Understanding the contribution of individual features in predictive models remains a central goal in interpretable machine learning, and while many model-agnostic methods exist to estimate feature importance, they often fall short in capturing high-order interactions and disentangling overlapping contributions. In this work, we present an information-theoretic extension of the High-order interactions for Feature importance (Hi-Fi) method, leveraging Conditional Mutual Information (CMI) estimated via a k-Nearest Neighbor (kNN) approach working on mixed discrete and continuous random variables. Our framework decomposes feature contributions into unique, synergistic, and redundant components, offering a richer, model-independent understanding of their predictive roles. We validate the method using synthetic datasets with known Gaussian structures, where ground truth interaction patterns are analytically derived, and further test it on non-Gaussian and real-world gene expression data from TCGA-BRCA. Results indicate that the proposed estimator accurately recovers theoretical and expected findings, providing a potential use case for developing feature selection algorithms or model development based on interaction analysis.

new Time2Agri: Temporal Pretext Tasks for Agricultural Monitoring

Authors: Moti Rattan Gupta, Anupam Sobti

Abstract: Self Supervised Learning(SSL) has emerged as a prominent paradigm for label-efficient learning, and has been widely utilized by remote sensing foundation models(RSFMs). Recent RSFMs including SatMAE, DoFA, primarily rely on masked autoencoding(MAE), contrastive learning or some combination of them. However, these pretext tasks often overlook the unique temporal characteristics of agricultural landscape, namely nature's cycle. Motivated by this gap, we propose three novel agriculture-specific pretext tasks, namely Time-Difference Prediction(TD), Temporal Frequency Prediction(FP), and Future-Frame Prediction(FF). Comprehensive evaluation on SICKLE dataset shows FF achieves 69.6% IoU on crop mapping and FP reduces yield prediction error to 30.7% MAPE, outperforming all baselines, and TD remains competitive on most tasks. Further, we also scale FF to the national scale of India, achieving 54.2% IoU outperforming all baselines on field boundary delineation on FTW India dataset.

new Adaptive Malware Detection using Sequential Feature Selection: A Dueling Double Deep Q-Network (D3QN) Framework for Intelligent Classification

Authors: Naseem Khan, Aref Y. Al-Tamimi, Amine Bermak, Issa M. Khalil

Abstract: Traditional malware detection methods exhibit computational inefficiency due to exhaustive feature extraction requirements, creating accuracy-efficiency trade-offs that limit real-time deployment. We formulate malware classification as a Markov Decision Process with episodic feature acquisition and propose a Dueling Double Deep Q-Network (D3QN) framework for adaptive sequential feature selection. The agent learns to dynamically select informative features per sample before terminating with classification decisions, optimizing both detection accuracy and computational cost through reinforcement learning. We evaluate our approach on Microsoft Big2015 (9-class, 1,795 features) and BODMAS (binary, 2,381 features) datasets. D3QN achieves 99.22% and 98.83% accuracy while utilizing only 61 and 56 features on average, representing 96.6% and 97.6% dimensionality reduction. This yields computational efficiency improvements of 30.1x and 42.5x over traditional ensemble methods. Comprehensive ablation studies demonstrate consistent superiority over Random Forest, XGBoost, and static feature selection approaches. Quantitative analysis demonstrates that D3QN learns non-random feature selection policies with 62.5% deviation from uniform baseline distributions. The learned policies exhibit structured hierarchical preferences, utilizing high-level metadata features for initial assessment while selectively incorporating detailed behavioral features based on classification uncertainty. Feature specialization analysis reveals 57.7% of examined features demonstrate significant class-specific discrimination patterns. Our results validate reinforcement learning-based sequential feature selection for malware classification, achieving superior accuracy with substantial computational reduction through learned adaptive policies.

new Hierarchical Reinforcement Learning with Targeted Causal Interventions

Authors: Sadegh Khorasani, Saber Salehkaleybar, Negar Kiyavash, Matthias Grossglauser

Abstract: Hierarchical reinforcement learning (HRL) improves the efficiency of long-horizon reinforcement-learning tasks with sparse rewards by decomposing the task into a hierarchy of subgoals. The main challenge of HRL is efficient discovery of the hierarchical structure among subgoals and utilizing this structure to achieve the final goal. We address this challenge by modeling the subgoal structure as a causal graph and propose a causal discovery algorithm to learn it. Additionally, rather than intervening on the subgoals at random during exploration, we harness the discovered causal model to prioritize subgoal interventions based on their importance in attaining the final goal. These targeted interventions result in a significantly more efficient policy in terms of the training cost. Unlike previous work on causal HRL, which lacked theoretical analysis, we provide a formal analysis of the problem. Specifically, for tree structures and, for a variant of Erd\H{o}s-R\'enyi random graphs, our approach results in remarkable improvements. Our experimental results on HRL tasks also illustrate that our proposed framework outperforms existing work in terms of training cost.

new Tractable Representation Learning with Probabilistic Circuits

Authors: Steven Braun, Sahil Sidheekh, Antonio Vergari, Martin Mundt, Sriraam Natarajan, Kristian Kersting

Abstract: Probabilistic circuits (PCs) are powerful probabilistic models that enable exact and tractable inference, making them highly suitable for probabilistic reasoning and inference tasks. While dominant in neural networks, representation learning with PCs remains underexplored, with prior approaches relying on external neural embeddings or activation-based encodings. To address this gap, we introduce autoencoding probabilistic circuits (APCs), a novel framework leveraging the tractability of PCs to model probabilistic embeddings explicitly. APCs extend PCs by jointly modeling data and embeddings, obtaining embedding representations through tractable probabilistic inference. The PC encoder allows the framework to natively handle arbitrary missing data and is seamlessly integrated with a neural decoder in a hybrid, end-to-end trainable architecture enabled by differentiable sampling. Our empirical evaluation demonstrates that APCs outperform existing PC-based autoencoding methods in reconstruction quality, generate embeddings competitive with, and exhibit superior robustness in handling missing data compared to neural autoencoders. These results highlight APCs as a powerful and flexible representation learning method that exploits the probabilistic inference capabilities of PCs, showing promising directions for robust inference, out-of-distribution detection, and knowledge distillation.

new Inverse Reinforcement Learning using Revealed Preferences and Passive Stochastic Optimization

Authors: Vikram Krishnamurthy

Abstract: This monograph, spanning three chapters, explores Inverse Reinforcement Learning (IRL). The first two chapters view inverse reinforcement learning (IRL) through the lens of revealed preferences from microeconomics while the third chapter studies adaptive IRL via Langevin dynamics stochastic gradient algorithms. Chapter uses classical revealed preference theory (Afriat's theorem and extensions) to identify constrained utility maximizers based on observed agent actions. This allows for the reconstruction of set-valued estimates of an agent's utility. We illustrate this procedure by identifying the presence of a cognitive radar and reconstructing its utility function. The chapter also addresses the construction of a statistical detector for utility maximization behavior when agent actions are corrupted by noise. Chapter 2 studies Bayesian IRL. It investigates how an analyst can determine if an observed agent is a rationally inattentive Bayesian utility maximizer (i.e., simultaneously optimizing its utility and observation likelihood). The chapter discusses inverse stopping-time problems, focusing on reconstructing the continuation and stopping costs of a Bayesian agent operating over a random horizon. We then apply this IRL methodology to identify the presence of a Bayes-optimal sequential detector. Additionally, Chapter 2 provides a concise overview of discrete choice models, inverse Bayesian filtering, and inverse stochastic gradient algorithms for adaptive IRL. Finally, Chapter 3 introduces an adaptive IRL approach utilizing passive Langevin dynamics. This method aims to track time-varying utility functions given noisy and misspecified gradients. In essence, the adaptive IRL algorithms presented in Chapter 3 can be conceptualized as inverse stochastic gradient algorithms, as they learn the utility function in real-time while a stochastic gradient algorithm is in operation.

new Convergence and Sample Complexity of First-Order Methods for Agnostic Reinforcement Learning

Authors: Uri Sherman, Tomer Koren, Yishay Mansour

Abstract: We study reinforcement learning (RL) in the agnostic policy learning setting, where the goal is to find a policy whose performance is competitive with the best policy in a given class of interest $\Pi$ -- crucially, without assuming that $\Pi$ contains the optimal policy. We propose a general policy learning framework that reduces this problem to first-order optimization in a non-Euclidean space, leading to new algorithms as well as shedding light on the convergence properties of existing ones. Specifically, under the assumption that $\Pi$ is convex and satisfies a variational gradient dominance (VGD) condition -- an assumption known to be strictly weaker than more standard completeness and coverability conditions -- we obtain sample complexity upper bounds for three policy learning algorithms: \emph{(i)} Steepest Descent Policy Optimization, derived from a constrained steepest descent method for non-convex optimization; \emph{(ii)} the classical Conservative Policy Iteration algorithm \citep{kakade2002approximately} reinterpreted through the lens of the Frank-Wolfe method, which leads to improved convergence results; and \emph{(iii)} an on-policy instantiation of the well-studied Policy Mirror Descent algorithm. Finally, we empirically evaluate the VGD condition across several standard environments, demonstrating the practical relevance of our key assumption.

new Enhancing Text-Based Hierarchical Multilabel Classification for Mobile Applications via Contrastive Learning

Authors: Jiawei Guo, Yang Xiao, Weipeng Huang, Guangyuan Piao

Abstract: A hierarchical labeling system for mobile applications (apps) benefits a wide range of downstream businesses that integrate the labeling with their proprietary user data, to improve user modeling. Such a label hierarchy can define more granular labels that capture detailed app features beyond the limitations of traditional broad app categories. In this paper, we address the problem of hierarchical multilabel classification for apps by using their textual information such as names and descriptions. We present: 1) HMCN (Hierarchical Multilabel Classification Network) for handling the classification from two perspectives: the first focuses on a multilabel classification without hierarchical constraints, while the second predicts labels sequentially at each hierarchical level considering such constraints; 2) HMCL (Hierarchical Multilabel Contrastive Learning), a scheme that is capable of learning more distinguishable app representations to enhance the performance of HMCN. Empirical results on our Tencent App Store dataset and two public datasets demonstrate that our approach performs well compared with state-of-the-art methods. The approach has been deployed at Tencent and the multilabel classification outputs for apps have helped a downstream task--credit risk management of user--improve its performance by 10.70% with regard to the Kolmogorov-Smirnov metric, for over one year.

new Tail-aware Adversarial Attacks: A Distributional Approach to Efficient LLM Jailbreaking

Authors: Tim Beyer, Yan Scholten, Stephan G\"unnemann, Leo Schwinn

Abstract: To guarantee safe and robust deployment of large language models (LLMs) at scale, it is critical to accurately assess their adversarial robustness. Existing adversarial attacks typically target harmful responses in single-point, greedy generations, overlooking the inherently stochastic nature of LLMs. In this paper, we propose a novel framework for adversarial robustness evaluation that explicitly models the entire output distribution, including tail-risks, providing better estimates for model robustness at scale. By casting the attack process as a resource allocation problem between optimization and sampling, we determine compute-optimal tradeoffs and show that integrating sampling into existing attacks boosts ASR by up to 48% and improves efficiency by up to two orders of magnitude. Our framework also enables us to analyze how different attack algorithms affect output harm distributions. Surprisingly, we find that most optimization strategies have little effect on output harmfulness. Finally, we introduce a data-free proof-of-concept objective based on entropy-maximization to demonstrate how our tail-aware perspective enables new optimization targets. Overall, our findings highlight the importance of tail-aware attacks and evaluation protocols to accurately assess and strengthen LLM safety.

new Transfer Learning in Infinite Width Feature Learning Networks

Authors: Clarissa Lauditi, Blake Bordelon, Cengiz Pehlevan

Abstract: We develop a theory of transfer learning in infinitely wide neural networks where both the pretraining (source) and downstream (target) task can operate in a feature learning regime. We analyze both the Bayesian framework, where learning is described by a posterior distribution over the weights, and gradient flow training of randomly initialized networks trained with weight decay. Both settings track how representations evolve in both source and target tasks. The summary statistics of these theories are adapted feature kernels which, after transfer learning, depend on data and labels from both source and target tasks. Reuse of features during transfer learning is controlled by an elastic weight coupling which controls the reliance of the network on features learned during training on the source task. We apply our theory to linear and polynomial regression tasks as well as real datasets. Our theory and experiments reveal interesting interplays between elastic weight coupling, feature learning strength, dataset size, and source and target task alignment on the utility of transfer learning.

new ESSA: Evolutionary Strategies for Scalable Alignment

Authors: Daria Korotyshova, Boris Shaposhnikov, Alexey Malakhov, Nikita Surnachev, George Bredis, Alexey Gorbatovski, Viacheslav Sinii, Daniil Gavrilov

Abstract: Large Language Models (LLMs) are increasingly relying on alignment techniques to ensure that their outputs match human preferences. Although reinforcement learning from human feedback (RLHF) is the dominant approach, it has high computational costs, memory requirements, and training instability, particularly when scaling to larger models. This paper introduces ESSA (Evolutionary Strategies for Scalable Alignment), a new framework that uses Evolutionary Strategies (ES) to efficiently align LLMs without the need for gradient computation. ES is well-suited for LLM alignment due to its favorable properties, such as high parallelizability, memory efficiency, robustness to sparse rewards, and fewer data samples required for convergence, especially when starting from a strong pre-trained policy. Moreover, ES eliminates the need for extensive hyperparameter tuning, making the alignment process simpler and more stable. Although ES excels in low-dimensional optimization, it poses a challenge when applied to high-dimensional LLMs. To address this challenge, we propose a parameter-efficient architectural modification that reduces the dimensionality of optimization through low-rank adaptation. We evaluated our approach on mathematical reasoning tasks with verifiable accuracy-based metrics, demonstrating that ESSA converges faster and is more data efficient than gradient-based methods like Group Relative Policy Optimization (GRPO). Our findings establish ES as a promising and scalable alternative to gradient-based alignment, paving the way for efficient post-training of large language models.

new Model Inversion Attacks on Llama 3: Extracting PII from Large Language Models

Authors: Sathesh P. Sivashanmugam

Abstract: Large language models (LLMs) have transformed natural language processing, but their ability to memorize training data poses significant privacy risks. This paper investigates model inversion attacks on the Llama 3.2 model, a multilingual LLM developed by Meta. By querying the model with carefully crafted prompts, we demonstrate the extraction of personally identifiable information (PII) such as passwords, email addresses, and account numbers. Our findings highlight the vulnerability of even smaller LLMs to privacy attacks and underscore the need for robust defenses. We discuss potential mitigation strategies, including differential privacy and data sanitization, and call for further research into privacy-preserving machine learning techniques.

new Source Attribution in Retrieval-Augmented Generation

Authors: Ikhtiyor Nematov, Tarik Kalai, Elizaveta Kuzmenko, Gabriele Fugagnoli, Dimitris Sacharidis, Katja Hose, Tomer Sagi

Abstract: While attribution methods, such as Shapley values, are widely used to explain the importance of features or training data in traditional machine learning, their application to Large Language Models (LLMs), particularly within Retrieval-Augmented Generation (RAG) systems, is nascent and challenging. The primary obstacle is the substantial computational cost, where each utility function evaluation involves an expensive LLM call, resulting in direct monetary and time expenses. This paper investigates the feasibility and effectiveness of adapting Shapley-based attribution to identify influential retrieved documents in RAG. We compare Shapley with more computationally tractable approximations and some existing attribution methods for LLM. Our work aims to: (1) systematically apply established attribution principles to the RAG document-level setting; (2) quantify how well SHAP approximations can mirror exact attributions while minimizing costly LLM interactions; and (3) evaluate their practical explainability in identifying critical documents, especially under complex inter-document relationships such as redundancy, complementarity, and synergy. This study seeks to bridge the gap between powerful attribution techniques and the practical constraints of LLM-based RAG systems, offering insights into achieving reliable and affordable RAG explainability.

new LoSiA: Efficient High-Rank Fine-Tuning via Subnet Localization and Optimization

Authors: Xujia Wang, Yunjia Qi, Bin Xu

Abstract: Parameter-Efficient Fine-Tuning (PEFT) methods, such as LoRA, significantly reduce the number of trainable parameters by introducing low-rank decomposition matrices. However, existing methods perform extensive matrix multiplications in domain specialization tasks, resulting in computational inefficiency and sub-optimal fine-tuning performance. Hence, we propose LoSiA(Low-Resources Subnet Integration Adaptation), an innovative method that dynamically localizes and optimizes critical parameters during the training process. Specifically, it identifies a sub-network using gradient sparsity analysis and optimizes it as the trainable target. This design enables effective high-rank adaptation by updating only the sub-network parameters, reducing the additional matrix multiplication. We also present LoSiA-Pro, a faster implementation of LoSiA, which reduces the training latency by about $27\%$ compared to LoRA. Extensive evaluations show that our method achieves minimal performance drop compared to full fine-tuning, while requiring the least training time across domain specialization and common-sense reasoning tasks. Further analysis shows that LoSiA also reduces forgetting during continued training.

new Dealing with Uncertainty in Contextual Anomaly Detection

Authors: Luca Bindini, Lorenzo Perini, Stefano Nistri, Jesse Davis, Paolo Frasconi

Abstract: Contextual anomaly detection (CAD) aims to identify anomalies in a target (behavioral) variable conditioned on a set of contextual variables that influence the normalcy of the target variable but are not themselves indicators of anomaly. In many anomaly detection tasks, there exist contextual variables that influence the normalcy of the target variable but are not themselves indicators of anomaly. In this work, we propose a novel framework for CAD, normalcy score (NS), that explicitly models both the aleatoric and epistemic uncertainties. Built on heteroscedastic Gaussian process regression, our method regards the Z-score as a random variable, providing confidence intervals that reflect the reliability of the anomaly assessment. Through experiments on benchmark datasets and a real-world application in cardiology, we demonstrate that NS outperforms state-of-the-art CAD methods in both detection accuracy and interpretability. Moreover, confidence intervals enable an adaptive, uncertainty-driven decision-making process, which may be very important in domains such as healthcare.

new Machine Learning-Based Prediction of Metal-Organic Framework Materials: A Comparative Analysis of Multiple Models

Authors: Zhuo Zheng, Keyan Liu, Xiyuan Zhu

Abstract: Metal-organic frameworks (MOFs) have emerged as promising materials for various applications due to their unique structural properties and versatile functionalities. This study presents a comprehensive investigation of machine learning approaches for predicting MOF material properties. We employed five different machine learning models: Random Forest, XGBoost, LightGBM, Support Vector Machine, and Neural Network, to analyze and predict MOF characteristics using a dataset from the Kaggle platform. The models were evaluated using multiple performance metrics, including RMSE, R^2, MAE, and cross-validation scores. Results demonstrated that the Random Forest model achieved superior performance with an R^2 value of 0.891 and RMSE of 0.152, significantly outperforming other models. LightGBM showed remarkable computational efficiency, completing training in 25.7 seconds while maintaining high accuracy. Our comparative analysis revealed that ensemble learning methods generally exhibited better performance than traditional single models in MOF property prediction. This research provides valuable insights into the application of machine learning in materials science and establishes a robust framework for future MOF material design and property prediction.

new DOTResize: Reducing LLM Width via Discrete Optimal Transport-based Neuron Merging

Authors: Neha Verma, Kenton Murray, Kevin Duh

Abstract: Model compression offers a promising path to reducing the cost and inaccessibility of large pre-trained models, without significantly compromising their impressive performance. Large Transformer models, including large language models (LLMs), often contain computational redundancy, which can serve as a target for new model compression methods. In this work, we specifically target neuron-level redundancies in model layers by combining groups of similar neurons into fewer neurons. We frame this width reduction as a Discrete Optimal Transport problem, and propose DOTResize, a novel Transformer compression method that uses optimal transport theory to transform and compress model weights. To ensure applicability within the Transformer architecture, we motivate and incorporate entropic regularization and matrix factorization into the transportation maps produced by our method. Unlike pruning-based approaches which discard neurons based on importance measures, DOTResize re-projects the entire neuron width, allowing the retention and redistribution of useful signal across the reduced layer. Empirical results show that compared to simple or state-of-the-art neuron width-pruning techniques, DOTResize can outperform these methods across multiple LLM families and sizes, while achieving measurable reductions in real-world computational cost.

new Evaluating LLMs on Real-World Forecasting Against Human Superforecasters

Authors: Janna Lu

Abstract: Large language models (LLMs) have demonstrated remarkable capabilities across diverse tasks, but their ability to forecast future events remains understudied. A year ago, large language models struggle to come close to the accuracy of a human crowd. I evaluate state-of-the-art LLMs on 464 forecasting questions from Metaculus, comparing their performance against human superforecasters. Frontier models achieve Brier scores that ostensibly surpass the human crowd but still significantly underperform a group of superforecasters.

new A Lightweight Deep Learning Model for Automatic Modulation Classification using Dual Path Deep Residual Shrinkage Network

Authors: Prakash Suman, Yanzhen Qu

Abstract: Efficient spectrum utilization is critical to meeting the growing data demands of modern wireless communication networks. Automatic Modulation Classification (AMC) plays a key role in enhancing spectrum efficiency by accurately identifying modulation schemes in received signals-an essential capability for dynamic spectrum allocation and interference mitigation, particularly in cognitive radio (CR) systems. With the increasing deployment of smart edge devices, such as IoT nodes with limited computational and memory resources, there is a pressing need for lightweight AMC models that balance low complexity with high classification accuracy. This paper proposes a low-complexity, lightweight deep learning (DL) AMC model optimized for resource-constrained edge devices. We introduce a dual-path deep residual shrinkage network (DP-DRSN) with Garrote thresholding for effective signal denoising and design a compact hybrid CNN-LSTM architecture comprising only 27,000 training parameters. The proposed model achieves average classification accuracies of 61.20%, 63.78%, and 62.13% on the RML2016.10a, RML2016.10b, and RML2018.01a datasets, respectively demonstrating a strong balance between model efficiency and classification performance. These results underscore the model's potential for enabling accurate and efficient AMC on-edge devices with limited resources.

new Photon Splatting: A Physics-Guided Neural Surrogate for Real-Time Wireless Channel Prediction

Authors: Ge Cao, Gabriele Gradoni, Zhen Peng

Abstract: We present Photon Splatting, a physics-guided neural surrogate model for real-time wireless channel prediction in complex environments. The proposed framework introduces surface-attached virtual sources, referred to as photons, which carry directional wave signatures informed by the scene geometry and transmitter configuration. At runtime, channel impulse responses (CIRs) are predicted by splatting these photons onto the angular domain of the receiver using a geodesic rasterizer. The model is trained to learn a physically grounded representation that maps transmitter-receiver configurations to full channel responses. Once trained, it generalizes to new transmitter positions, antenna beam patterns, and mobile receivers without requiring model retraining. We demonstrate the effectiveness of the framework through a series of experiments, from canonical 3D scenes to a complex indoor cafe with 1,000 receivers. Results show 30 millisecond-level inference latency and accurate CIR predictions across a wide range of configurations. The approach supports real-time adaptability and interpretability, making it a promising candidate for wireless digital twin platforms and future 6G network planning.

new Accelerated Online Reinforcement Learning using Auxiliary Start State Distributions

Authors: Aman Mehra, Alexandre Capone, Jeff Schneider

Abstract: A long-standing problem in online reinforcement learning (RL) is of ensuring sample efficiency, which stems from an inability to explore environments efficiently. Most attempts at efficient exploration tackle this problem in a setting where learning begins from scratch, without prior information available to bootstrap learning. However, such approaches fail to leverage expert demonstrations and simulators that can reset to arbitrary states. These affordances are valuable resources that offer enormous potential to guide exploration and speed up learning. In this paper, we explore how a small number of expert demonstrations and a simulator allowing arbitrary resets can accelerate learning during online RL. We find that training with a suitable choice of an auxiliary start state distribution that may differ from the true start state distribution of the underlying Markov Decision Process can significantly improve sample efficiency. We find that using a notion of safety to inform the choice of this auxiliary distribution significantly accelerates learning. By using episode length information as a way to operationalize this notion, we demonstrate state-of-the-art sample efficiency on a sparse-reward hard-exploration environment.

new any4: Learned 4-bit Numeric Representation for LLMs

Authors: Mostafa Elhoushi, Jeff Johnson

Abstract: We present any4, a learned 4-bit weight quantization solution for large language models (LLMs) providing arbitrary numeric representations without requiring pre-processing of weights or activations. any4 yields higher accuracy compared to other related 4-bit numeric representation types: int4, fp4 and nf4, as evaluated on a range of model sizes, generations and families (Llama 2, Llama 3, Mistral and Mixtral). While any4 does not require preprocessing of weights or activations, it is also competitive with orthogonal techniques that require such preprocessing (e.g., AWQ and GPTQ). We also experiment with any3 and any2 and show competitiveness at lower bits. Additionally, we show that we can calibrate using a single curated diverse sample rather than hundreds of samples from a dataset as done in most quantization approaches. We also open source tinygemm, a latency optimized GPU matrix multiplication library for LLMs, that implements any4 using a GPU-efficient lookup table strategy along with other common quantization methods. We open source our code at https://github.com/facebookresearch/any4 .

URLs: https://github.com/facebookresearch/any4

new Information-Guided Diffusion Sampling for Dataset Distillation

Authors: Linfeng Ye, Shayan Mohajer Hamidi, Guang Li, Takahiro Ogawa, Miki Haseyama, Konstantinos N. Plataniotis

Abstract: Dataset distillation aims to create a compact dataset that retains essential information while maintaining model performance. Diffusion models (DMs) have shown promise for this task but struggle in low images-per-class (IPC) settings, where generated samples lack diversity. In this paper, we address this issue from an information-theoretic perspective by identifying two key types of information that a distilled dataset must preserve: ($i$) prototype information $\mathrm{I}(X;Y)$, which captures label-relevant features; and ($ii$) contextual information $\mathrm{H}(X | Y)$, which preserves intra-class variability. Here, $(X,Y)$ represents the pair of random variables corresponding to the input data and its ground truth label, respectively. Observing that the required contextual information scales with IPC, we propose maximizing $\mathrm{I}(X;Y) + \beta \mathrm{H}(X | Y)$ during the DM sampling process, where $\beta$ is IPC-dependent. Since directly computing $\mathrm{I}(X;Y)$ and $\mathrm{H}(X | Y)$ is intractable, we develop variational estimations to tightly lower-bound these quantities via a data-driven approach. Our approach, information-guided diffusion sampling (IGDS), seamlessly integrates with diffusion models and improves dataset distillation across all IPC settings. Experiments on Tiny ImageNet and ImageNet subsets show that IGDS significantly outperforms existing methods, particularly in low-IPC regimes. The code will be released upon acceptance.

new Multimodal LLM Integrated Semantic Communications for 6G Immersive Experiences

Authors: Yusong Zhang, Yuxuan Sun, Lei Guo, Wei Chen, Bo Ai, Deniz Gunduz

Abstract: 6G networks promise revolutionary immersive communication experiences including augmented reality (AR), virtual reality (VR), and holographic communications. These applications demand high-dimensional multimodal data transmission and intelligent data processing in real-time, which is extremely challenging over resource-limited wireless communication systems. Moreover, a joint understanding of the environment, context, and user intent is essential to deliver task-relevant content effectively. This article presents a novel multimodal large language model (MLLM) integrated semantic communications framework, termed MLLM-SC, which fully leverages reasoning and generative capabilities of pre-trained foundation models for context-aware and task-oriented wireless communication. The MLLM-SC framework adopts a device-edge collaborative architecture. At the edge, MLLM-empowered semantic guidance module analyzes multimodal inputs, user intents, and channel conditions to generate importance-aware attention maps prioritizing semantically critical information. An importance-aware semantic encoder and a resource-adaptive semantic decoder are jointly designed and optimized, which can utilize the semantic guidance for adaptive bandwidth allocation and high-quality content reconstruction or generation. Extensive case studies on visual question answering for AR/VR applications and diffusion-driven image generation validate the effectiveness of MLLM-SC.

new SOSAE: Self-Organizing Sparse AutoEncoder

Authors: Sarthak Ketanbhai Modi, Zi Pong Lim, Yushi Cao, Yupeng Cheng, Yon Shin Teo, Shang-Wei Lin

Abstract: The process of tuning the size of the hidden layers for autoencoders has the benefit of providing optimally compressed representations for the input data. However, such hyper-parameter tuning process would take a lot of computation and time effort with grid search as the default option. In this paper, we introduce the Self-Organization Regularization for Autoencoders that dynamically adapts the dimensionality of the feature space to the optimal size. Inspired by physics concepts, Self-Organizing Sparse AutoEncoder (SOSAE) induces sparsity in feature space in a structured way that permits the truncation of the non-active part of the feature vector without any loss of information. This is done by penalizing the autoencoder based on the magnitude and the positional index of the feature vector dimensions, which during training constricts the feature space in both terms. Extensive experiments on various datasets show that our SOSAE can tune the feature space dimensionality up to 130 times lesser Floating-point Operations (FLOPs) than other baselines while maintaining the same quality of tuning and performance.

new A Cycle-Consistency Constrained Framework for Dynamic Solution Space Reduction in Noninjective Regression

Authors: Hanzhang Jia, Yi Gao

Abstract: To address the challenges posed by the heavy reliance of multi-output models on preset probability distributions and embedded prior knowledge in non-injective regression tasks, this paper proposes a cycle consistency-based data-driven training framework. The method jointly optimizes a forward model {\Phi}: X to Y and a backward model {\Psi}: Y to X, where the cycle consistency loss is defined as L _cycleb equal L(Y reduce {\Phi}({\Psi}(Y))) (and vice versa). By minimizing this loss, the framework establishes a closed-loop mechanism integrating generation and validation phases, eliminating the need for manual rule design or prior distribution assumptions. Experiments on normalized synthetic and simulated datasets demonstrate that the proposed method achieves a cycle reconstruction error below 0.003, achieving an improvement of approximately 30% in evaluation metrics compared to baseline models without cycle consistency. Furthermore, the framework supports unsupervised learning and significantly reduces reliance on manual intervention, demonstrating potential advantages in non-injective regression tasks.

new Hybrid Adversarial Spectral Loss Conditional Generative Adversarial Networks for Signal Data Augmentation in Ultra-precision Machining Surface Roughness Prediction

Authors: Suiyan Shang, Chi Fai Cheung, Pai Zheng

Abstract: Accurate surface roughness prediction in ultra-precision machining (UPM) is critical for real-time quality control, but small datasets hinder model performance. We propose HAS-CGAN, a Hybrid Adversarial Spectral Loss CGAN, for effective UPM data augmentation. Among five CGAN variants tested, HAS-CGAN excels in 1D force signal generation, particularly for high-frequency signals, achieving >0.85 wavelet coherence through Fourier-domain optimization. By combining generated signals with machining parameters, prediction accuracy significantly improves. Experiments with traditional ML (SVR, RF, LSTM) and deep learning models (BPNN, 1DCNN, CNN-Transformer) demonstrate that augmenting training data with 520+ synthetic samples reduces prediction error from 31.4% (original 52 samples) to ~9%, effectively addressing data scarcity in UPM roughness prediction."

new DANCE: Resource-Efficient Neural Architecture Search with Data-Aware and Continuous Adaptation

Authors: Maolin Wang, Tianshuo Wei, Sheng Zhang, Ruocheng Guo, Wanyu Wang, Shanshan Ye, Lixin Zou, Xuetao Wei, Xiangyu Zhao

Abstract: Neural Architecture Search (NAS) has emerged as a powerful approach for automating neural network design. However, existing NAS methods face critical limitations in real-world deployments: architectures lack adaptability across scenarios, each deployment context requires costly separate searches, and performance consistency across diverse platforms remains challenging. We propose DANCE (Dynamic Architectures with Neural Continuous Evolution), which reformulates architecture search as a continuous evolution problem through learning distributions over architectural components. DANCE introduces three key innovations: a continuous architecture distribution enabling smooth adaptation, a unified architecture space with learned selection gates for efficient sampling, and a multi-stage training strategy for effective deployment optimization. Extensive experiments across five datasets demonstrate DANCE's effectiveness. Our method consistently outperforms state-of-the-art NAS approaches in terms of accuracy while significantly reducing search costs. Under varying computational constraints, DANCE maintains robust performance while smoothly adapting architectures to different hardware requirements. The code and appendix can be found at https://github.com/Applied-Machine-Learning-Lab/DANCE.

URLs: https://github.com/Applied-Machine-Learning-Lab/DANCE.

new Identify, Isolate, and Purge: Mitigating Hallucinations in LVLMs via Self-Evolving Distillation

Authors: Wenhao Li, Xiu Su, Jingyi Wu, Feng Yang, Yang Liu, Yi Chen, Shan You, Chang Xu

Abstract: Large Vision-Language Models (LVLMs) have demonstrated remarkable advancements in numerous areas such as multimedia. However, hallucination issues significantly limit their credibility and application potential. Existing mitigation methods typically rely on external tools or the comparison of multi-round inference, which significantly increase inference time. In this paper, we propose \textbf{SE}lf-\textbf{E}volving \textbf{D}istillation (\textbf{SEED}), which identifies hallucinations within the inner knowledge of LVLMs, isolates and purges them, and then distills the purified knowledge back into the model, enabling self-evolution. Furthermore, we identified that traditional distillation methods are prone to inducing void spaces in the output space of LVLMs. To address this issue, we propose a Mode-Seeking Evolving approach, which performs distillation to capture the dominant modes of the purified knowledge distribution, thereby avoiding the chaotic results that could emerge from void spaces. Moreover, we introduce a Hallucination Elimination Adapter, which corrects the dark knowledge of the original model by learning purified knowledge. Extensive experiments on multiple benchmarks validate the superiority of our SEED, demonstrating substantial improvements in mitigating hallucinations for representative LVLM models such as LLaVA-1.5 and InternVL2. Remarkably, the F1 score of LLaVA-1.5 on the hallucination evaluation metric POPE-Random improved from 81.3 to 88.3.

new Recovering Plasticity of Neural Networks via Soft Weight Rescaling

Authors: Seungwon Oh, Sangyeon Park, Isaac Han, Kyung-Joong Kim

Abstract: Recent studies have shown that as training progresses, neural networks gradually lose their capacity to learn new information, a phenomenon known as plasticity loss. An unbounded weight growth is one of the main causes of plasticity loss. Furthermore, it harms generalization capability and disrupts optimization dynamics. Re-initializing the network can be a solution, but it results in the loss of learned information, leading to performance drops. In this paper, we propose Soft Weight Rescaling (SWR), a novel approach that prevents unbounded weight growth without losing information. SWR recovers the plasticity of the network by simply scaling down the weight at each step of the learning process. We theoretically prove that SWR bounds weight magnitude and balances weight magnitude between layers. Our experiment shows that SWR improves performance on warm-start learning, continual learning, and single-task learning setups on standard image classification benchmarks.

new Bridging KAN and MLP: MJKAN, a Hybrid Architecture with Both Efficiency and Expressiveness

Authors: Hanseon Joo, Hayoung Choi, Ook Lee, Minjong Cheon

Abstract: Kolmogorov-Arnold Networks (KANs) have garnered attention for replacing fixed activation functions with learnable univariate functions, but they exhibit practical limitations, including high computational costs and performance deficits in general classification tasks. In this paper, we propose the Modulation Joint KAN (MJKAN), a novel neural network layer designed to overcome these challenges. MJKAN integrates a FiLM (Feature-wise Linear Modulation)-like mechanism with Radial Basis Function (RBF) activations, creating a hybrid architecture that combines the non-linear expressive power of KANs with the efficiency of Multilayer Perceptrons (MLPs). We empirically validated MJKAN's performance across a diverse set of benchmarks, including function regression, image classification (MNIST, CIFAR-10/100), and natural language processing (AG News, SMS Spam). The results demonstrate that MJKAN achieves superior approximation capabilities in function regression tasks, significantly outperforming MLPs, with performance improving as the number of basis functions increases. Conversely, in image and text classification, its performance was competitive with MLPs but revealed a critical dependency on the number of basis functions. We found that a smaller basis size was crucial for better generalization, highlighting that the model's capacity must be carefully tuned to the complexity of the data to prevent overfitting. In conclusion, MJKAN offers a flexible architecture that inherits the theoretical advantages of KANs while improving computational efficiency and practical viability.

new Interpretable Reward Modeling with Active Concept Bottlenecks

Authors: Sonia Laguna, Katarzyna Kobalczyk, Julia E. Vogt, Mihaela Van der Schaar

Abstract: We introduce Concept Bottleneck Reward Models (CB-RM), a reward modeling framework that enables interpretable preference learning through selective concept annotation. Unlike standard RLHF methods that rely on opaque reward functions, CB-RM decomposes reward prediction into human-interpretable concepts. To make this framework efficient in low-supervision settings, we formalize an active learning strategy that dynamically acquires the most informative concept labels. We propose an acquisition function based on Expected Information Gain and show that it significantly accelerates concept learning without compromising preference accuracy. Evaluated on the UltraFeedback dataset, our method outperforms baselines in interpretability and sample efficiency, marking a step towards more transparent, auditable, and human-aligned reward models.

new Performance Evaluation of General Purpose Large Language Models for Basic Linear Algebra Subprograms Code Generation

Authors: Daichi Mukunoki, Shun-ichiro Hayashi, Tetsuya Hoshino, Takahiro Katagiri

Abstract: Generative AI technology based on Large Language Models (LLM) has been developed and applied to assist or automatically generate program codes. In this paper, we evaluate the capability of existing general LLMs for Basic Linear Algebra Subprograms (BLAS) code generation for CPUs. We use two LLMs provided by OpenAI: GPT-4.1, a Generative Pre-trained Transformer (GPT) model, and o4-mini, one of the o-series of Reasoning models. Both have been released in April 2025. For the routines from level-1 to 3 BLAS, we tried to generate (1) C code without optimization from routine name only, (2) C code with basic performance optimizations (thread parallelization, SIMD vectorization, and cache blocking) from routine name only, and (3) C code with basic performance optimizations based on Fortran reference code. As a result, we found that correct code can be generated in many cases even when only routine name are given. We also confirmed that thread parallelization with OpenMP, SIMD vectorization, and cache blocking can be implemented to some extent, and that the code is faster than the reference code.

new UrbanMind: Towards Urban General Intelligence via Tool-Enhanced Retrieval-Augmented Generation and Multilevel Optimization

Authors: Kai Yang, Zelin Zhu, Chengtao Jian, Hui Ma, Shengjie Zhao, Xiaozhou Ye, Ye Ouyang

Abstract: Urban general intelligence (UGI) refers to the capacity of AI systems to autonomously perceive, reason, and act within dynamic and complex urban environments. In this paper, we introduce UrbanMind, a tool-enhanced retrieval-augmented generation (RAG) framework designed to facilitate UGI. Central to UrbanMind is a novel architecture based on Continual Retrieval-Augmented MoE-based LLM (C-RAG-LLM), which dynamically incorporates domain-specific knowledge and evolving urban data to support long-term adaptability. The architecture of C-RAG-LLM aligns naturally with a multilevel optimization framework, where different layers are treated as interdependent sub-problems. Each layer has distinct objectives and can be optimized either independently or jointly through a hierarchical learning process. The framework is highly flexible, supporting both end-to-end training and partial layer-wise optimization based on resource or deployment constraints. To remain adaptive under data drift, it is further integrated with an incremental corpus updating mechanism. Evaluations on real-world urban tasks of a variety of complexity verify the effectiveness of the proposed framework. This work presents a promising step toward the realization of general-purpose LLM agents in future urban environments.

new Spooky Action at a Distance: Normalization Layers Enable Side-Channel Spatial Communication

Authors: Samuel Pfrommer, George Ma, Yixiao Huang, Somayeh Sojoudi

Abstract: This work shows that normalization layers can facilitate a surprising degree of communication across the spatial dimensions of an input tensor. We study a toy localization task with a convolutional architecture and show that normalization layers enable an iterative message passing procedure, allowing information aggregation from well outside the local receptive field. Our results suggest that normalization layers should be employed with caution in applications such as diffusion-based trajectory generation, where maintaining a spatially limited receptive field is crucial.

new ABench-Physics: Benchmarking Physical Reasoning in LLMs via High-Difficulty and Dynamic Physics Problems

Authors: Yiming Zhang, Yingfan Ma, Yanmei Gu, Zhengkai Yang, Yihong Zhuang, Feng Wang, Zenan Huang, Yuanyuan Wang, Chao Huang, Bowen Song, Cheng Lin, Junbo Zhao

Abstract: Large Language Models (LLMs) have shown impressive performance in domains such as mathematics and programming, yet their capabilities in physics remain underexplored and poorly understood. Physics poses unique challenges that demand not only precise computation but also deep conceptual understanding and physical modeling skills. Existing benchmarks often fall short due to limited difficulty, multiple-choice formats, and static evaluation settings that fail to capture physical modeling ability. In this paper, we introduce ABench-Physics, a novel benchmark designed to rigorously evaluate LLMs' physical reasoning and generalization capabilities. ABench-Physics consists of two components: Phy_A, a static set of 400 graduate- or Olympiad-level problems; and Phy_B, a dynamic subset of 100 problems equipped with an automatic variation engine to test model robustness across changing conditions. All questions require precise numerical answers, with strict formatting and tolerance constraints. Our evaluation of several state-of-the-art LLMs reveals substantial performance gaps, highlighting persistent limitations in physical reasoning, especially in generalization to dynamic variants. ABench-Physics provides a challenging and diagnostic framework for advancing scientific reasoning in LLMs.

new FedPall: Prototype-based Adversarial and Collaborative Learning for Federated Learning with Feature Drift

Authors: Yong Zhang, Feng Liang, Guanghu Yuan, Min Yang, Chengming Li, Xiping Hu

Abstract: Federated learning (FL) enables collaborative training of a global model in the centralized server with data from multiple parties while preserving privacy. However, data heterogeneity can significantly degrade the performance of the global model when each party uses datasets from different sources to train a local model, thereby affecting personalized local models. Among various cases of data heterogeneity, feature drift, feature space difference among parties, is prevalent in real-life data but remains largely unexplored. Feature drift can distract feature extraction learning in clients and thus lead to poor feature extraction and classification performance. To tackle the problem of feature drift in FL, we propose FedPall, an FL framework that utilizes prototype-based adversarial learning to unify feature spaces and collaborative learning to reinforce class information within the features. Moreover, FedPall leverages mixed features generated from global prototypes and local features to enhance the global classifier with classification-relevant information from a global perspective. Evaluation results on three representative feature-drifted datasets demonstrate FedPall's consistently superior performance in classification with feature-drifted data in the FL scenario.

new Machine Learning from Explanations

Authors: Jiashu Tao, Reza Shokri

Abstract: Acquiring and training on large-scale labeled data can be impractical due to cost constraints. Additionally, the use of small training datasets can result in considerable variability in model outcomes, overfitting, and learning of spurious correlations. A crucial shortcoming of data labels is their lack of any reasoning behind a specific label assignment, causing models to learn any arbitrary classification rule as long as it aligns data with labels. To overcome these issues, we introduce an innovative approach for training reliable classification models on smaller datasets, by using simple explanation signals such as important input features from labeled data. Our method centers around a two-stage training cycle that alternates between enhancing model prediction accuracy and refining its attention to match the explanations. This instructs models to grasp the rationale behind label assignments during their learning phase. We demonstrate that our training cycle expedites the convergence towards more accurate and reliable models, particularly for small, class-imbalanced training data, or data with spurious features.

new A High-Level Compiler Integration Approach for Deep Learning Accelerators Supporting Abstraction and Optimization

Authors: Samira Ahmadifarsani, Daniel Mueller-Gritschneder, Ulf Schlichtmann

Abstract: The growing adoption of domain-specific architectures in edge computing platforms for deep learning has highlighted the efficiency of hardware accelerators. However, integrating custom accelerators into modern machine learning (ML) compilers remains a complex challenge due to the need for significant modifications in compilation layers and specialized scheduling techniques. Existing frameworks offer partial solutions and require users to navigate intricate compiler internals. In this paper, we introduce a TVM-based compilation integration approach that targets GEMM-based deep learning accelerators. Our approach abstracts the complexities of compiler integration, enabling seamless integration of accelerators without requiring in-depth knowledge of the underlying compiler. Furthermore, we extend and incorporate design space exploration tools, specifically CoSA, to automate efficient tensor scheduling, accounting for factors such as uneven mapping and double buffering. Our framework is benchmarked on the Gemmini accelerator, demonstrating performance comparable to its specialized manually implemented toolchain.

new Discrete Diffusion Trajectory Alignment via Stepwise Decomposition

Authors: Jiaqi Han, Austin Wang, Minkai Xu, Wenda Chu, Meihua Dang, Yisong Yue, Stefano Ermon

Abstract: Discrete diffusion models have demonstrated great promise in modeling various sequence data, ranging from human language to biological sequences. Inspired by the success of RL in language models, there is growing interest in further improving the models by alignment with a certain reward. In this work, we propose a novel preference optimization method for masked discrete diffusion models through a principled diffusion trajectory alignment. Instead of applying the reward on the final output and backpropagating the gradient to the entire discrete denoising process, we decompose the problem into a set of stepwise alignment objectives. This framework enables efficient diffusion optimization, is compatible with arbitrary reward functions, and importantly, guarantees an equivalent optimal solution under additive factorization of the trajectory reward. Experiments across multiple domains including DNA sequence design, protein inverse folding, and language modeling consistently demonstrate the superiority of our approach. Notably, it achieves an up to 12\% improvement over the most competitive RL-based baseline in terms of predicted activity on DNA sequence design, and further improves the GSM8K score from 78.6 to 80.7 on LLaDA-8B-Instruct for language modeling.

new NTSFormer: A Self-Teaching Graph Transformer for Multimodal Cold-Start Node Classification

Authors: Jun Hu, Yufei He, Yuan Li, Bryan Hooi, Bingsheng He

Abstract: Cold-start node classification on multimodal graphs is challenging because cold-start nodes are isolated (i.e., no edges) and often have missing modalities (e.g., absent text or image features). Existing methods address structural isolation by degrading graph learning models to MLPs for cold-start inference, using a teacher model (with graph access) to guide the MLP. However, this results in limited model capacity in the student, which is further challenged when modalities are missing. In this paper, we propose Neighbor-to-Self Graph Transformer (NTSFormer), a unified Graph Transformer framework that jointly tackles the isolation and missing-modality issues via a self-teaching paradigm. Specifically, NTSFormer uses a cold-start attention mask to simultaneously make two predictions for each node: a "student" prediction based only on self-information (i.e., the node's own features), and a "teacher" prediction incorporating both self and neighbor information. This enables the model to supervise itself without degrading to an MLP, thereby fully leveraging the Transformer's capacity to handle missing modalities. To handle diverse graph information and missing modalities, NTSFormer performs a one-time multimodal graph pre-computation that converts structural and feature data into token sequences, which are then processed by a Mixture-of-Experts (MoE) Input Projection and Transformer layers for effective fusion. Experimental results on public datasets show that NTSFormer achieves superior performance on multimodal cold-start node classification tasks.

new Beyond Training-time Poisoning: Component-level and Post-training Backdoors in Deep Reinforcement Learning

Authors: Sanyam Vyas, Alberto Caron, Chris Hicks, Pete Burnap, Vasilios Mavroudis

Abstract: Deep Reinforcement Learning (DRL) systems are increasingly used in safety-critical applications, yet their security remains severely underexplored. This work investigates backdoor attacks, which implant hidden triggers that cause malicious actions only when specific inputs appear in the observation space. Existing DRL backdoor research focuses solely on training-time attacks requiring unrealistic access to the training pipeline. In contrast, we reveal critical vulnerabilities across the DRL supply chain where backdoors can be embedded with significantly reduced adversarial privileges. We introduce two novel attacks: (1) TrojanentRL, which exploits component-level flaws to implant a persistent backdoor that survives full model retraining; and (2) InfrectroRL, a post-training backdoor attack which requires no access to training, validation, nor test data. Empirical and analytical evaluations across six Atari environments show our attacks rival state-of-the-art training-time backdoor attacks while operating under much stricter adversarial constraints. We also demonstrate that InfrectroRL further evades two leading DRL backdoor defenses. These findings challenge the current research focus and highlight the urgent need for robust defenses.

new Fine-tuning on simulated data outperforms prompting for agent tone of voice

Authors: Ingo Marquardt, Philippe Brule

Abstract: Deploying language models (LMs) in customer-facing speech applications requires conversational fluency and adherence to specific stylistic guidelines. This can be challenging to achieve reliably using complex system prompts due to issues like instruction following limitations and in-context bias. This study investigates the effectiveness of fine-tuning versus system prompting for aligning LMs with a specific behavioral target: responding in a natural, conversational tone suitable for voice interactions. We fine-tuned a small, open-weights model (`Llama3.2-1B-Instruct`) using Low-Rank Adaptation (LoRA) on a synthetically generated dataset derived from Wikipedia. Additionally, we fine-tuned two closed-source models (`gpt-4o-mini`, `gpt-4.1-mini`). Our results demonstrate that fine-tuning outperformed system prompting, achieving a high percentage of conversational responses, even when trained on only 100 data samples. Semantic similarity analysis confirmed that fine-tuning did not degrade content quality. Interestingly, fine-tuning with 8-bit integer quantization converged faster towards the target style than using bfloat16 precision, potentially due to implicit regularization effects. We conclude that fine-tuning small, open-weights LMs on simulated data is a highly effective and data-efficient method for instilling specific stylistic behaviors, offering a preferable alternative to complex system prompting for practical applications requiring nuanced response styles.

new Object-centric Denoising Diffusion Models for Physical Reasoning

Authors: Moritz Lange, Raphael C. Engelhardt, Wolfgang Konen, Andrew Melnik, Laurenz Wiskott

Abstract: Reasoning about the trajectories of multiple, interacting objects is integral to physical reasoning tasks in machine learning. This involves conditions imposed on the objects at different time steps, for instance initial states or desired goal states. Existing approaches in physical reasoning generally rely on autoregressive modeling, which can only be conditioned on initial states, but not on later states. In fields such as planning for reinforcement learning, similar challenges are being addressed with denoising diffusion models. In this work, we propose an object-centric denoising diffusion model architecture for physical reasoning that is translation equivariant over time, permutation equivariant over objects, and can be conditioned on arbitrary time steps for arbitrary objects. We demonstrate how this model can solve tasks with multiple conditions and examine its performance when changing object numbers and trajectory lengths during inference.

new ConBatch-BAL: Batch Bayesian Active Learning under Budget Constraints

Authors: Pablo G. Morato, Charalampos P. Andriotis, Seyran Khademi

Abstract: Varying annotation costs among data points and budget constraints can hinder the adoption of active learning strategies in real-world applications. This work introduces two Bayesian active learning strategies for batch acquisition under constraints (ConBatch-BAL), one based on dynamic thresholding and one following greedy acquisition. Both select samples using uncertainty metrics computed via Bayesian neural networks. The dynamic thresholding strategy redistributes the budget across the batch, while the greedy one selects the top-ranked sample at each step, limited by the remaining budget. Focusing on scenarios with costly data annotation and geospatial constraints, we also release two new real-world datasets containing geolocated aerial images of buildings, annotated with energy efficiency or typology classes. The ConBatch-BAL strategies are benchmarked against a random acquisition baseline on these datasets under various budget and cost scenarios. The results show that the developed ConBatch-BAL strategies can reduce active learning iterations and data acquisition costs in real-world settings, and even outperform the unconstrained baseline solutions.

new Classification of autoimmune diseases from Peripheral blood TCR repertoires by multimodal multi-instance learning

Authors: Ruihao Zhang, Fei Ye, Dandan Meng, Yixuan Huang, Maochen, Xiao Liu

Abstract: T cell receptor (TCR) repertoires encode critical immunological signatures for autoimmune diseases, yet their clinical application remains limited by sequence sparsity and low witness rates. We developed EAMil, a multi-instance deep learning framework that leverages TCR sequencing data to diagnose systemic lupus erythematosus (SLE) and rheumatoid arthritis (RA) with exceptional accuracy. By integrating PrimeSeq feature extraction with ESMonehot encoding and enhanced gate attention mechanisms, our model achieved state-of-the-art performance with AUCs of 98.95% for SLE and 97.76% for RA. EAMil successfully identified disease-associated genes with over 90% concordance with established differential analyses and effectively distinguished disease-specific TCR genes. The model demonstrated robustness in classifying multiple disease categories, utilizing the SLEDAI score to stratify SLE patients by disease severity as well as to diagnose the site of damage in SLE patients, and effectively controlling for confounding factors such as age and gender. This interpretable framework for immune receptor analysis provides new insights for autoimmune disease detection and classification with broad potential clinical applications across immune-mediated conditions.

new Meta-Learning Transformers to Improve In-Context Generalization

Authors: Lorenzo Braccaioli, Anna Vettoruzzo, Prabhant Singh, Joaquin Vanschoren, Mohamed-Rafik Bouguelia, Nicola Conci

Abstract: In-context learning enables transformer models to generalize to new tasks based solely on input prompts, without any need for weight updates. However, existing training paradigms typically rely on large, unstructured datasets that are costly to store, difficult to evaluate for quality and balance, and pose privacy and ethical concerns due to the inclusion of sensitive information. Motivated by these limitations and risks, we propose an alternative training strategy where we leverage a collection of multiple, small-scale, and domain-specific datasets. We empirically demonstrate that the increased quality and diversity of such data improve the generalization abilities of in-context learners beyond their training domain, while achieving comparable performance with models trained on a single large-scale dataset. We investigate this paradigm by leveraging meta-learning to train an in-context learner on the Meta-Album collection under several settings. Firstly, we show the performance in a controlled environment, where the test domain is completely excluded from the training knowledge. Secondly, we explore the robustness of these models to forgetting in a continual scenario where the information is accessible for a limited time. Finally, we explore the more challenging unsupervised scenario. Our findings demonstrate that transformers still generalize for in-context prediction when trained on a curated dataset collection while offering advantages in modularity and replaceability.

new Beyond Scaling Curves: Internal Dynamics of Neural Networks Through the NTK Lens

Authors: Konstantin Nikolaou, Sven Krippendorf, Samuel Tovey, Christian Holm

Abstract: Scaling laws offer valuable insights into the relationship between neural network performance and computational cost, yet their underlying mechanisms remain poorly understood. In this work, we empirically analyze how neural networks behave under data and model scaling through the lens of the neural tangent kernel (NTK). This analysis establishes a link between performance scaling and the internal dynamics of neural networks. Our findings of standard vision tasks show that similar performance scaling exponents can occur even though the internal model dynamics show opposite behavior. This demonstrates that performance scaling alone is insufficient for understanding the underlying mechanisms of neural networks. We also address a previously unresolved issue in neural scaling: how convergence to the infinite-width limit affects scaling behavior in finite-width models. To this end, we investigate how feature learning is lost as the model width increases and quantify the transition between kernel-driven and feature-driven scaling regimes. We identify the maximum model width that supports feature learning, which, in our setups, we find to be more than ten times smaller than typical large language model widths.

new Replacing thinking with tool usage enables reasoning in small language models

Authors: Corrado Rainone, Tim Bakker, Roland Memisevic

Abstract: Recent advances have established a new machine learning paradigm based on scaling up compute at inference time as well as at training time. In that line of work, a combination of Supervised Fine-Tuning (SFT) on synthetic demonstrations and Reinforcement Learning with Verifiable Rewards (RLVR) is used for training Large Language Models to expend extra compute during inference in the form of "thoughts" expressed in natural language. In this paper, we propose to instead format these tokens as a multi-turn interaction trace with a stateful tool. At each turn, the new state of the tool is appended to the context of the model, whose job is to generate the tokens necessary to control the tool via a custom DSL. We benchmark this approach on the problem of repairing malfunctioning Python code, and show that this constrained setup allows for faster sampling of experience and a denser reward signal, allowing even models of size up to 3B parameters to learn how to proficiently expend additional compute on the task.

new Distribution-dependent Generalization Bounds for Tuning Linear Regression Across Tasks

Authors: Maria-Florina Balcan, Saumya Goyal, Dravyansh Sharma

Abstract: Modern regression problems often involve high-dimensional data and a careful tuning of the regularization hyperparameters is crucial to avoid overly complex models that may overfit the training data while guaranteeing desirable properties like effective variable selection. We study the recently introduced direction of tuning regularization hyperparameters in linear regression across multiple related tasks. We obtain distribution-dependent bounds on the generalization error for the validation loss when tuning the L1 and L2 coefficients, including ridge, lasso and the elastic net. In contrast, prior work develops bounds that apply uniformly to all distributions, but such bounds necessarily degrade with feature dimension, d. While these bounds are shown to be tight for worst-case distributions, our bounds improve with the "niceness" of the data distribution. Concretely, we show that under additional assumptions that instances within each task are i.i.d. draws from broad well-studied classes of distributions including sub-Gaussians, our generalization bounds do not get worse with increasing d, and are much sharper than prior work for very large d. We also extend our results to a generalization of ridge regression, where we achieve tighter bounds that take into account an estimate of the mean of the ground truth distribution.

new Exploring Semantic Clustering and Similarity Search for Heterogeneous Traffic Scenario Graph

Authors: Ferdinand M\"utsch, Maximilian Zipfl, Nikolai Polley, J. Marius Z\"ollner

Abstract: Scenario-based testing is an indispensable instrument for the comprehensive validation and verification of automated vehicles (AVs). However, finding a manageable and finite, yet representative subset of scenarios in a scalable, possibly unsupervised manner is notoriously challenging. Our work is meant to constitute a cornerstone to facilitate sample-efficient testing, while still capturing the diversity of relevant operational design domains (ODDs) and accounting for the "long tail" phenomenon in particular. To this end, we first propose an expressive and flexible heterogeneous, spatio-temporal graph model for representing traffic scenarios. Leveraging recent advances of graph neural networks (GNNs), we then propose a self-supervised method to learn a universal embedding space for scenario graphs that enables clustering and similarity search. In particular, we implement contrastive learning alongside a bootstrapping-based approach and evaluate their suitability for partitioning the scenario space. Experiments on the nuPlan dataset confirm the model's ability to capture semantics and thus group related scenarios in a meaningful way despite the absence of discrete class labels. Different scenario types materialize as distinct clusters. Our results demonstrate how variable-length traffic scenarios can be condensed into single vector representations that enable nearest-neighbor retrieval of representative candidates for distinct scenario categories. Notably, this is achieved without manual labeling or bias towards an explicit objective such as criticality. Ultimately, our approach can serve as a basis for scalable selection of scenarios to further enhance the efficiency and robustness of testing AVs in simulation.

new PRING: Rethinking Protein-Protein Interaction Prediction from Pairs to Graphs

Authors: Xinzhe Zheng, Hao Du, Fanding Xu, Jinzhe Li, Zhiyuan Liu, Wenkang Wang, Tao Chen, Wanli Ouyang, Stan Z. Li, Yan Lu, Nanqing Dong, Yang Zhang

Abstract: Deep learning-based computational methods have achieved promising results in predicting protein-protein interactions (PPIs). However, existing benchmarks predominantly focus on isolated pairwise evaluations, overlooking a model's capability to reconstruct biologically meaningful PPI networks, which is crucial for biology research. To address this gap, we introduce PRING, the first comprehensive benchmark that evaluates protein-protein interaction prediction from a graph-level perspective. PRING curates a high-quality, multi-species PPI network dataset comprising 21,484 proteins and 186,818 interactions, with well-designed strategies to address both data redundancy and leakage. Building on this golden-standard dataset, we establish two complementary evaluation paradigms: (1) topology-oriented tasks, which assess intra and cross-species PPI network construction, and (2) function-oriented tasks, including protein complex pathway prediction, GO module analysis, and essential protein justification. These evaluations not only reflect the model's capability to understand the network topology but also facilitate protein function annotation, biological module detection, and even disease mechanism analysis. Extensive experiments on four representative model categories, consisting of sequence similarity-based, naive sequence-based, protein language model-based, and structure-based approaches, demonstrate that current PPI models have potential limitations in recovering both structural and functional properties of PPI networks, highlighting the gap in supporting real-world biological applications. We believe PRING provides a reliable platform to guide the development of more effective PPI prediction models for the community. The dataset and source code of PRING are available at https://github.com/SophieSarceau/PRING.

URLs: https://github.com/SophieSarceau/PRING.

new DICE: Discrete inverse continuity equation for learning population dynamics

Authors: Tobias Blickhan, Jules Berman, Andrew Stuart, Benjamin Peherstorfer

Abstract: We introduce the Discrete Inverse Continuity Equation (DICE) method, a generative modeling approach that learns the evolution of a stochastic process from given sample populations at a finite number of time points. Models learned with DICE capture the typically smooth and well-behaved population dynamics, rather than the dynamics of individual sample trajectories that can exhibit complex or even chaotic behavior. The DICE loss function is developed specifically to be invariant, even in discrete time, to spatially constant but time-varying spurious constants that can emerge during training; this invariance increases training stability and robustness. Generating a trajectory of sample populations with DICE is fast because samples evolve directly in the time interval over which the stochastic process is formulated, in contrast to approaches that condition on time and then require multiple sampling steps per time step. DICE is stable to train, in situations where other methods for learning population dynamics fail, and DICE generates representative samples with orders of magnitude lower costs than methods that have to condition on time. Numerical experiments on a wide range of problems from random waves, Vlasov-Poisson instabilities and high-dimensional chaos are included to justify these assertions.

new Kronecker-factored Approximate Curvature (KFAC) From Scratch

Authors: Felix Dangel, B\'alint Mucs\'anyi, Tobias Weber, Runa Eschenhagen

Abstract: Kronecker-factored approximate curvature (KFAC) is arguably one of the most prominent curvature approximations in deep learning. Its applications range from optimization to Bayesian deep learning, training data attribution with influence functions, and model compression or merging. While the intuition behind KFAC is easy to understand, its implementation is tedious: It comes in many flavours, has common pitfalls when translating the math to code, and is challenging to test, which complicates ensuring a properly functioning implementation. Some of the authors themselves have dealt with these challenges and experienced the discomfort of not being able to fully test their code. Thanks to recent advances in understanding KFAC, we are now able to provide test cases and a recipe for a reliable KFAC implementation. This tutorial is meant as a ground-up introduction to KFAC. In contrast to the existing work, our focus lies on providing both math and code side-by-side and providing test cases based on the latest insights into KFAC that are scattered throughout the literature. We hope this tutorial provides a contemporary view of KFAC that allows beginners to gain a deeper understanding of this curvature approximation while lowering the barrier to its implementation, extension, and usage in practice.

new Deep Learning to Automate Parameter Extraction and Model Fitting of Two-Dimensional Transistors

Authors: Robert K. A. Bennett, Jan-Lucas Uslu, Harmon F. Gault, Asir Intisar Khan, Lauren Hoang, Tara Pe\~na, Kathryn Neilson, Young Suh Song, Zhepeng Zhang, Andrew J. Mannix, Eric Pop

Abstract: We present a deep learning approach to extract physical parameters (e.g., mobility, Schottky contact barrier height, defect profiles) of two-dimensional (2D) transistors from electrical measurements, enabling automated parameter extraction and technology computer-aided design (TCAD) fitting. To facilitate this task, we implement a simple data augmentation and pre-training approach by training a secondary neural network to approximate a physics-based device simulator. This method enables high-quality fits after training the neural network on electrical data generated from physics-based simulations of ~500 devices, a factor >40$\times$ fewer than other recent efforts. Consequently, fitting can be achieved by training on physically rigorous TCAD models, including complex geometry, self-consistent transport, and electrostatic effects, and is not limited to computationally inexpensive compact models. We apply our approach to reverse-engineer key parameters from experimental monolayer WS$_2$ transistors, achieving a median coefficient of determination ($R^2$) = 0.99 when fitting measured electrical data. We also demonstrate that this approach generalizes and scales well by reverse-engineering electrical data on high-electron-mobility transistors while fitting 35 parameters simultaneously. To facilitate future research on deep learning approaches for inverse transistor design, we have published our code and sample data sets online.

new Hardware-efficient tractable probabilistic inference for TinyML Neurosymbolic AI applications

Authors: Jelin Leslin, Martin Trapp, Martin Andraud

Abstract: Neurosymbolic AI (NSAI) has recently emerged to mitigate limitations associated with deep learning (DL) models, e.g. quantifying their uncertainty or reason with explicit rules. Hence, TinyML hardware will need to support these symbolic models to bring NSAI to embedded scenarios. Yet, although symbolic models are typically compact, their sparsity and computation resolution contrasts with low-resolution and dense neuro models, which is a challenge on resource-constrained TinyML hardware severely limiting the size of symbolic models that can be computed. In this work, we remove this bottleneck leveraging a tight hardware/software integration to present a complete framework to compute NSAI with TinyML hardware. We focus on symbolic models realized with tractable probabilistic circuits (PCs), a popular subclass of probabilistic models for hardware integration. This framework: (1) trains a specific class of hardware-efficient \emph{deterministic} PCs, chosen for the symbolic task; (2) \emph{compresses} this PC until it can be computed on TinyML hardware with minimal accuracy degradation, using our $n^{th}$-root compression technique, and (3) \emph{deploys} the complete NSAI model on TinyML hardware. Compared to a 64b precision baseline necessary for the PC without compression, our workflow leads to significant hardware reduction on FPGA (up to 82.3\% in FF, 52.6\% in LUTs, and 18.0\% in Flash usage) and an average inference speedup of 4.67x on ESP32 microcontroller.

new A generalized Wasserstein-2 distance approach for efficient reconstruction of random field models using stochastic neural networks

Authors: Mingtao Xia, Qijing Shen

Abstract: In this work, we propose a novel generalized Wasserstein-2 distance approach for efficiently training stochastic neural networks to reconstruct random field models, where the target random variable comprises both continuous and categorical components. We prove that a stochastic neural network can approximate random field models under a Wasserstein-2 distance metric under nonrestrictive conditions. Furthermore, this stochastic neural network can be efficiently trained by minimizing our proposed generalized local squared Wasserstein-2 loss function. We showcase the effectiveness of our proposed approach in various uncertainty quantification tasks, including classification, reconstructing the distribution of mixed random variables, and learning complex noisy dynamical systems from spatiotemporal data.

new Critiques of World Models

Authors: Eric Xing, Mingkai Deng, Jinyu Hou, Zhiting Hu

Abstract: World Model, the supposed algorithmic surrogate of the real-world environment which biological agents experience with and act upon, has been an emerging topic in recent years because of the rising needs to develop virtual agents with artificial (general) intelligence. There has been much debate on what a world model really is, how to build it, how to use it, and how to evaluate it. In this essay, starting from the imagination in the famed Sci-Fi classic Dune, and drawing inspiration from the concept of "hypothetical thinking" in psychology literature, we offer critiques of several schools of thoughts on world modeling, and argue the primary goal of a world model to be simulating all actionable possibilities of the real world for purposeful reasoning and acting. Building on the critiques, we propose a new architecture for a general-purpose world model, based on hierarchical, multi-level, and mixed continuous/discrete representations, and a generative and self-supervision learning framework, with an outlook of a Physical, Agentic, and Nested (PAN) AGI system enabled by such a model.

new Train-before-Test Harmonizes Language Model Rankings

Authors: Guanhua Zhang, Ricardo Dominguez-Olmedo, Moritz Hardt

Abstract: Existing language model benchmarks provide contradictory model rankings, even for benchmarks that aim to capture similar skills. This dilemma of conflicting rankings hampers model selection, clouds model comparisons, and adds confusion to a growing ecosystem of competing models. Recent work attributed ranking disagreement to the phenomenon of training on the test task: As released, different models exhibit a different level of preparation for any given test task. A candidate solution to the problem is train-before-test: Give each model the same benchmark-specific finetuning before evaluation. Our primary contribution is a broad empirical evaluation of train-before-test across 24 benchmarks and 61 models. We show that train-before-test significantly improves ranking agreement consistently across all benchmarks. Whereas rankings have little external validity to start with, they enjoy a significant degree of external validity when applying train-before-test: Model rankings transfer gracefully from one benchmark to the other. Even within the same model family, train-before-test reduces strong ranking disagreement to near-perfect agreement. In addition, train-before-test reduces the model-score matrix to essentially rank one, revealing new insights into the latent factors of benchmark performance. Our work supports the recommendation to make train-before-test a default component of LLM benchmarking.

new Bridging Prediction and Intervention Problems in Social Systems

Authors: Lydia T. Liu, Inioluwa Deborah Raji, Angela Zhou, Luke Guerdan, Jessica Hullman, Daniel Malinsky, Bryan Wilder, Simone Zhang, Hammaad Adam, Amanda Coston, Ben Laufer, Ezinne Nwankwo, Michael Zanger-Tishler, Eli Ben-Michael, Solon Barocas, Avi Feller, Marissa Gerchick, Talia Gillis, Shion Guha, Daniel Ho, Lily Hu, Kosuke Imai, Sayash Kapoor, Joshua Loftus, Razieh Nabi, Arvind Narayanan, Ben Recht, Juan Carlos Perdomo, Matthew Salganik, Mark Sendak, Alexander Tolbert, Berk Ustun, Suresh Venkatasubramanian, Angelina Wang, Ashia Wilson

Abstract: Many automated decision systems (ADS) are designed to solve prediction problems -- where the goal is to learn patterns from a sample of the population and apply them to individuals from the same population. In reality, these prediction systems operationalize holistic policy interventions in deployment. Once deployed, ADS can shape impacted population outcomes through an effective policy change in how decision-makers operate, while also being defined by past and present interactions between stakeholders and the limitations of existing organizational, as well as societal, infrastructure and context. In this work, we consider the ways in which we must shift from a prediction-focused paradigm to an interventionist paradigm when considering the impact of ADS within social systems. We argue this requires a new default problem setup for ADS beyond prediction, to instead consider predictions as decision support, final decisions, and outcomes. We highlight how this perspective unifies modern statistical frameworks and other tools to study the design, implementation, and evaluation of ADS systems, and point to the research directions necessary to operationalize this paradigm shift. Using these tools, we characterize the limitations of focusing on isolated prediction tasks, and lay the foundation for a more intervention-oriented approach to developing and deploying ADS.

new QuEst: Enhancing Estimates of Quantile-Based Distributional Measures Using Model Predictions

Authors: Zhun Deng, Thomas P Zollo, Benjamin Eyre, Amogh Inamdar, David Madras, Richard Zemel

Abstract: As machine learning models grow increasingly competent, their predictions can supplement scarce or expensive data in various important domains. In support of this paradigm, algorithms have emerged to combine a small amount of high-fidelity observed data with a much larger set of imputed model outputs to estimate some quantity of interest. Yet current hybrid-inference tools target only means or single quantiles, limiting their applicability for many critical domains and use cases. We present QuEst, a principled framework to merge observed and imputed data to deliver point estimates and rigorous confidence intervals for a wide family of quantile-based distributional measures. QuEst covers a range of measures, from tail risk (CVaR) to population segments such as quartiles, that are central to fields such as economics, sociology, education, medicine, and more. We extend QuEst to multidimensional metrics, and introduce an additional optimization technique to further reduce variance in this and other hybrid estimators. We demonstrate the utility of our framework through experiments in economic modeling, opinion polling, and language model auto-evaluation.

new Cascade: Token-Sharded Private LLM Inference

Authors: Rahul Thomas, Louai Zahran, Erica Choi, Akilesh Potti, Micah Goldblum, Arka Pal

Abstract: As LLMs continue to increase in parameter size, the computational resources required to run them are available to fewer parties. Therefore, third-party inference services -- where LLMs are hosted by third parties with significant computational resources -- are becoming increasingly popular. However, third party inference raises critical concerns about user data privacy. To mitigate these risks, privacy researchers have developed provably secure schemes for third-party inference, such as Secure Multi-Party Computation (SMPC). However, SMPC protocols have significant computational and communication overhead, and do not scale to large models. In this work, we propose a new multi-party inference protocol, Cascade, that avoids these punitive costs by leveraging sharding in the sequence dimension to maintain privacy, trading off cryptographic privacy guarantees for increased performance and scalability. We demonstrate that Cascade is resistant to a generalization of a recent attack that is highly effective against other statistical privacy schemes, and that it is further resistant to learning-based attacks. As Cascade is orders of magnitude faster than existing schemes, our findings offer practical solutions for secure deployment of modern state-of-the-art LLMs.

new Logit Reweighting for Topic-Focused Summarization

Authors: Joschka Braun, B\'alint Mucs\'anyi, Seyed Ali Bahrainian

Abstract: Generating abstractive summaries that adhere to a specific topic remains a significant challenge for language models. While standard approaches, such as fine-tuning, are resource-intensive, simpler methods like prompt engineering often struggle to maintain topical focus, particularly with smaller models. To address this, we propose a lightweight method that enhances topical relevance by directly reweighting the logits of topic-relevant tokens during generation. We evaluate three such reweighting techniques: Constant Shift, which adds a constant value to logits; Factor Scaling, which multiplies them by a factor; and Threshold Selection, which selectively boosts logits that exceed a probability threshold. Experiments on the NEWTS topical summarization dataset, using both Gemma-2B and Llama-3-8B models, show that these techniques effectively increase the use of topic-relevant vocabulary. Notably, the Threshold Selection method successfully improves topical focus without compromising summary quality-a trade-off often seen in other approaches. Our findings demonstrate that directly reweighting logits is a practical and resource-efficient alternative to fine-tuning, offering a promising pathway for precisely controlling the thematic content of generated text.

new Multi-Disease Deep Learning Framework for GWAS: Beyond Feature Selection Constraints

Authors: Iqra Farooq, Sara Atito, Ayse Demirkan, Inga Prokopenko, Muhammad Rana

Abstract: Traditional GWAS has advanced our understanding of complex diseases but often misses nonlinear genetic interactions. Deep learning offers new opportunities to capture complex genomic patterns, yet existing methods mostly depend on feature selection strategies that either constrain analysis to known pathways or risk data leakage when applied across the full dataset. Further, covariates can inflate predictive performance without reflecting true genetic signals. We explore different deep learning architecture choices for GWAS and demonstrate that careful architectural choices can outperform existing methods under strict no-leakage conditions. Building on this, we extend our approach to a multi-label framework that jointly models five diseases, leveraging shared genetic architecture for improved efficiency and discovery. Applied to five million SNPs across 37,000 samples, our method achieves competitive predictive performance (AUC 0.68-0.96), offering a scalable, leakage-free, and biologically meaningful approach for multi-disease GWAS analysis.

cross An Adaptive Simulated Annealing-Based Machine Learning Approach for Developing an E-Triage Tool for Hospital Emergency Operations

Authors: Abdulaziz Ahmed, Mohammed Al-Maamari, Mohammad Firouz, Dursun Delen

Abstract: Patient triage at emergency departments (EDs) is necessary to prioritize care for patients with critical and time-sensitive conditions. Different tools are used for patient triage and one of the most common ones is the emergency severity index (ESI), which has a scale of five levels, where level 1 is the most urgent and level 5 is the least urgent. This paper proposes a framework for utilizing machine learning to develop an e-triage tool that can be used at EDs. A large retrospective dataset of ED patient visits is obtained from the electronic health record of a healthcare provider in the Midwest of the US for three years. However, the main challenge of using machine learning algorithms is that most of them have many parameters and without optimizing these parameters, developing a high-performance model is not possible. This paper proposes an approach to optimize the hyperparameters of machine learning. The metaheuristic optimization algorithms simulated annealing (SA) and adaptive simulated annealing (ASA) are proposed to optimize the parameters of extreme gradient boosting (XGB) and categorical boosting (CaB). The newly proposed algorithms are SA-XGB, ASA-XGB, SA-CaB, ASA-CaB. Grid search (GS), which is a traditional approach used for machine learning fine-tunning is also used to fine-tune the parameters of XGB and CaB, which are named GS-XGB and GS-CaB. The six algorithms are trained and tested using eight data groups obtained from the feature selection phase. The results show ASA-CaB outperformed all the proposed algorithms with accuracy, precision, recall, and f1 of 83.3%, 83.2%, 83.3%, 83.2%, respectively.

cross Automated Energy Billing with Blockchain and the Prophet Forecasting Model: A Holistic Approach

Authors: Ajesh Thangaraj Nadar, Soham Chandane, Gabriel Nixon Raj, Nihar Mahesh Pasi, Yash Arvind Patil

Abstract: This paper presents a comprehensive approach to automated energy billing that leverages IoT-based smart meters, blockchain technology, and the Prophet time series forecasting model. The proposed system facilitates real-time power consumption monitoring via Wi-Fi-enabled ESP32 modules and a mobile application interface. It integrates Firebase and blockchain for secure, transparent billing processes and employs smart contracts for automated payments. The Prophet model is used for energy demand forecasting, with careful data preprocessing, transformation, and parameter tuning to improve prediction accuracy. This holistic solution aims to reduce manual errors, enhance user awareness, and promote sustainable energy use.

cross On the Structure of Replicable Hypothesis Testers

Authors: Anders Aamand, Maryam Aliakbarpour, Justin Y. Chen, Shyam Narayanan, Sandeep Silwal

Abstract: A hypothesis testing algorithm is replicable if, when run on two different samples from the same distribution, it produces the same output with high probability. This notion, defined by by Impagliazzo, Lei, Pitassi, and Sorell [STOC'22], can increase trust in testing procedures and is deeply related to algorithmic stability, generalization, and privacy. We build general tools to prove lower and upper bounds on the sample complexity of replicable testers, unifying and quantitatively improving upon existing results. We identify a set of canonical properties, and prove that any replicable testing algorithm can be modified to satisfy these properties without worsening accuracy or sample complexity. A canonical replicable algorithm computes a deterministic function of its input (i.e., a test statistic) and thresholds against a uniformly random value in $[0,1]$. It is invariant to the order in which the samples are received, and, if the testing problem is ``symmetric,'' then the algorithm is also invariant to the labeling of the domain elements, resolving an open question by Liu and Ye [NeurIPS'24]. We prove new lower bounds for uniformity, identity, and closeness testing by reducing to the case where the replicable algorithm satisfies these canonical properties. We systematize and improve upon a common strategy for replicable algorithm design based on test statistics with known expectation and bounded variance. Our framework allow testers which have been extensively analyzed in the non-replicable setting to be made replicable with minimal overhead. As direct applications of our framework, we obtain constant-factor optimal bounds for coin testing and closeness testing and get replicability for free in a large parameter regime for uniformity testing. We also give state-of-the-art bounds for replicable Gaussian mean testing, and, unlike prior work, our algorithm runs in polynomial time.

cross A Simulator Dataset to Support the Study of Impaired Driving

Authors: John Gideon, Kimimasa Tamura, Emily Sumner, Laporsha Dees, Patricio Reyes Gomez, Bassamul Haq, Todd Rowell, Avinash Balachandran, Simon Stent, Guy Rosman

Abstract: Despite recent advances in automated driving technology, impaired driving continues to incur a high cost to society. In this paper, we present a driving dataset designed to support the study of two common forms of driver impairment: alcohol intoxication and cognitive distraction. Our dataset spans 23.7 hours of simulated urban driving, with 52 human subjects under normal and impaired conditions, and includes both vehicle data (ground truth perception, vehicle pose, controls) and driver-facing data (gaze, audio, surveys). It supports analysis of changes in driver behavior due to alcohol intoxication (0.10\% blood alcohol content), two forms of cognitive distraction (audio n-back and sentence parsing tasks), and combinations thereof, as well as responses to a set of eight controlled road hazards, such as vehicle cut-ins. The dataset will be made available at https://toyotaresearchinstitute.github.io/IDD/.

URLs: https://toyotaresearchinstitute.github.io/IDD/.

cross ZettaLith: An Architectural Exploration of Extreme-Scale AI Inference Acceleration

Authors: Kia Silverbrook

Abstract: The high computational cost and power consumption of current and anticipated AI systems present a major challenge for widespread deployment and further scaling. Current hardware approaches face fundamental efficiency limits. This paper introduces ZettaLith, a scalable computing architecture designed to reduce the cost and power of AI inference by over 1,000x compared to current GPU-based systems. Based on architectural analysis and technology projections, a single ZettaLith rack could potentially achieve 1.507 zettaFLOPS in 2027 - representing a theoretical 1,047x improvement in inference performance, 1,490x better power efficiency, and could be 2,325x more cost-effective than current leading GPU racks for FP4 transformer inference. The ZettaLith architecture achieves these gains by abandoning general purpose GPU applications, and via the multiplicative effect of numerous co-designed architectural innovations using established digital electronic technologies, as detailed in this paper. ZettaLith's core architectural principles scale down efficiently to exaFLOPS desktop systems and petaFLOPS mobile chips, maintaining their roughly 1,000x advantage. ZettaLith presents a simpler system architecture compared to the complex hierarchy of current GPU clusters. ZettaLith is optimized exclusively for AI inference and is not applicable for AI training.

cross Biaxialformer: Leveraging Channel Independence and Inter-Channel Correlations in EEG Signal Decoding for Predicting Neurological Outcomes

Authors: Naimahmed Nesaragi, Hemin Ali Qadir, Per Steiner Halvorsen, Ilangko Balasingham

Abstract: Accurate decoding of EEG signals requires comprehensive modeling of both temporal dynamics within individual channels and spatial dependencies across channels. While Transformer-based models utilizing channel-independence (CI) strategies have demonstrated strong performance in various time series tasks, they often overlook the inter-channel correlations that are critical in multivariate EEG signals. This omission can lead to information degradation and reduced prediction accuracy, particularly in complex tasks such as neurological outcome prediction. To address these challenges, we propose Biaxialformer, characterized by a meticulously engineered two-stage attention-based framework. This model independently captures both sequence-specific (temporal) and channel-specific (spatial) EEG information, promoting synergy and mutual reinforcement across channels without sacrificing CI. By employing joint learning of positional encodings, Biaxialformer preserves both temporal and spatial relationships in EEG data, mitigating the interchannel correlation forgetting problem common in traditional CI models. Additionally, a tokenization module with variable receptive fields balance the extraction of fine-grained, localized features and broader temporal dependencies. To enhance spatial feature extraction, we leverage bipolar EEG signals, which capture inter-hemispheric brain interactions, a critical but often overlooked aspect in EEG analysis. Our study broadens the use of Transformer-based models by addressing the challenge of predicting neurological outcomes in comatose patients. Using the multicenter I-CARE data from five hospitals, we validate the robustness and generalizability of Biaxialformer with an average AUC 0.7688, AUPRC 0.8643, and F1 0.6518 in a cross-hospital scenario.

cross DISPROTBENCH: A Disorder-Aware, Task-Rich Benchmark for Evaluating Protein Structure Prediction in Realistic Biological Contexts

Authors: Xinyue Zeng, Tuo Wang, Adithya Kulkarni, Alexander Lu, Alexandra Ni, Phoebe Xing, Junhan Zhao, Siwei Chen, Dawei Zhou

Abstract: Recent advances in protein structure prediction have achieved near-atomic accuracy for well-folded proteins. However, current benchmarks inadequately assess model performance in biologically challenging contexts, especially those involving intrinsically disordered regions (IDRs), limiting their utility in applications such as drug discovery, disease variant interpretation, and protein interface design. We introduce DisProtBench, a comprehensive benchmark for evaluating protein structure prediction models (PSPMs) under structural disorder and complex biological conditions. DisProtBench spans three key axes: (1) Data complexity, covering disordered regions, G protein-coupled receptor (GPCR) ligand pairs, and multimeric complexes; (2) Task diversity, benchmarking twelve leading PSPMs across structure-based tasks with unified classification, regression, and interface metrics; and (3) Interpretability, via the DisProtBench Portal, which provides precomputed 3D structures and visual error analyses. Our results reveal significant variability in model robustness under disorder, with low-confidence regions linked to functional prediction failures. Notably, global accuracy metrics often fail to predict task performance in disordered settings, emphasizing the need for function-aware evaluation. DisProtBench establishes a reproducible, extensible, and biologically grounded framework for assessing next-generation PSPMs in realistic biomedical scenarios.

cross Modeling Membrane Degradation in PEM Electrolyzers with Physics-Informed Neural Networks

Authors: Alejandro Polo-Molina, Jose Portela, Luis Alberto Herrero Rozas, Rom\'an Cicero Gonz\'alez

Abstract: Proton exchange membrane (PEM) electrolyzers are pivotal for sustainable hydrogen production, yet their long-term performance is hindered by membrane degradation, which poses reliability and safety challenges. Therefore, accurate modeling of this degradation is essential for optimizing durability and performance. To address these concerns, traditional physics-based models have been developed, offering interpretability but requiring numerous parameters that are often difficult to measure and calibrate. Conversely, data-driven approaches, such as machine learning, offer flexibility but may lack physical consistency and generalizability. To address these limitations, this study presents the first application of Physics-Informed Neural Networks (PINNs) to model membrane degradation in PEM electrolyzers. The proposed PINN framework couples two ordinary differential equations, one modeling membrane thinning via a first-order degradation law and another governing the time evolution of the cell voltage under membrane degradation. Results demonstrate that the PINN accurately captures the long-term system's degradation dynamics while preserving physical interpretability with limited noisy data. Consequently, this work introduces a novel hybrid modeling approach for estimating and understanding membrane degradation mechanisms in PEM electrolyzers, offering a foundation for more robust predictive tools in electrochemical system diagnostics.

cross A Novel Method to Manage Production on Industry 4.0: Forecasting Overall Equipment Efficiency by Time Series with Topological Features

Authors: Korkut Anapa, \.Ismail G\"uzel, Ceylan Yozgatl{\i}gil

Abstract: Purpose: Overall equipment efficiency (OEE) is a key manufacturing KPI, but its volatile nature complicates short-term forecasting. This study presents a novel framework combining time series decomposition and topological data analysis to improve OEE prediction across various equipment, such as hydraulic press systems. Methods: The approach begins by decomposing hourly OEE data into trend, seasonal, and residual components. The residual, capturing short-term variability, is modeled using a seasonal ARIMA with exogenous variables (SARIMAX). These exogenous features include statistical descriptors and topological summaries from related time series. To manage the high-dimensional input space, we propose a hybrid feature selection strategy using recursive feature elimination based on statistically significant SARIMAX predictors, coupled with BIC-guided particle swarm optimization. The framework is evaluated on real-world datasets from multiple production systems. Results: The proposed model consistently outperforms conventional time series models and advanced transformer-based approaches, achieving significantly lower mean absolute error and mean absolute percentage error. Conclusion: Integrating classical forecasting with topological data analysis enhances OEE prediction accuracy, enabling proactive maintenance and informed production decisions in complex manufacturing environments.

cross Online Continual Learning via Spiking Neural Networks with Sleep Enhanced Latent Replay

Authors: Erliang Lin, Wenbin Luo, Wei Jia, Yu Chen, Shaofu Yang

Abstract: Edge computing scenarios necessitate the development of hardware-efficient online continual learning algorithms to be adaptive to dynamic environment. However, existing algorithms always suffer from high memory overhead and bias towards recently trained tasks. To tackle these issues, this paper proposes a novel online continual learning approach termed as SESLR, which incorporates a sleep enhanced latent replay scheme with spiking neural networks (SNNs). SESLR leverages SNNs' binary spike characteristics to store replay features in single bits, significantly reducing memory overhead. Furthermore, inspired by biological sleep-wake cycles, SESLR introduces a noise-enhanced sleep phase where the model exclusively trains on replay samples with controlled noise injection, effectively mitigating classification bias towards new classes. Extensive experiments on both conventional (MNIST, CIFAR10) and neuromorphic (NMNIST, CIFAR10-DVS) datasets demonstrate SESLR's effectiveness. On Split CIFAR10, SESLR achieves nearly 30% improvement in average accuracy with only one-third of the memory consumption compared to baseline methods. On Split CIFAR10-DVS, it improves accuracy by approximately 10% while reducing memory overhead by a factor of 32. These results validate SESLR as a promising solution for online continual learning in resource-constrained edge computing scenarios.

cross Preference-Optimal Multi-Metric Weighting for Parallel Coordinate Plots

Authors: Chisa Mori, Shuhei Watanabe, Masaki Onishi, Takayuki Itoh

Abstract: Parallel coordinate plots (PCPs) are a prevalent method to interpret the relationship between the control parameters and metrics. PCPs deliver such an interpretation by color gradation based on a single metric. However, it is challenging to provide such a gradation when multiple metrics are present. Although a naive approach involves calculating a single metric by linearly weighting each metric, such weighting is unclear for users. To address this problem, we first propose a principled formulation for calculating the optimal weight based on a specific preferred metric combination. Although users can simply select their preference from a two-dimensional (2D) plane for bi-metric problems, multi-metric problems require intuitive visualization to allow them to select their preference. We achieved this using various radar charts to visualize the metric trade-offs on the 2D plane reduced by UMAP. In the analysis using pedestrian flow guidance planning, our method identified unique patterns of control parameter importance for each user preference, highlighting the effectiveness of our method.

cross Enhancing Sports Strategy with Video Analytics and Data Mining: Automated Video-Based Analytics Framework for Tennis Doubles

Authors: Jia Wei Chen

Abstract: We present a comprehensive video-based analytics framework for tennis doubles that addresses the lack of automated analysis tools for this strategically complex sport. Our approach introduces a standardised annotation methodology encompassing player positioning, shot types, court formations, and match outcomes, coupled with a specialised annotation tool designed to meet the unique requirements of tennis video labelling. The framework integrates advanced machine learning techniques including GroundingDINO for precise player localisation through natural language grounding and YOLO-Pose for robust pose estimation. This combination significantly reduces manual annotation effort whilst improving data consistency and quality. We evaluate our approach on doubles tennis match data and demonstrate that CNN-based models with transfer learning substantially outperform pose-based methods for predicting shot types, player positioning, and formations. The CNN models effectively capture complex visual and contextual features essential for doubles tennis analysis. Our integrated system bridges advanced analytical capabilities with the strategic complexities of tennis doubles, providing a foundation for automated tactical analysis, performance evaluation, and strategic modelling in professional tennis.

cross Toward Cyclic A.I. Modelling of Self-Regulated Learning: A Case Study with E-Learning Trace Data

Authors: Andrew Schwabe, \"Ozg\"ur Akg\"un, Ella Haig

Abstract: Many e-learning platforms assert their ability or potential to improve students' self-regulated learning (SRL), however the cyclical and undirected nature of SRL theoretical models represent significant challenges for representation within contemporary machine learning frameworks. We apply SRL-informed features to trace data in order to advance modelling of students' SRL activities, to improve predictability and explainability regarding the causal effects of learning in an eLearning environment. We demonstrate that these features improve predictive accuracy and validate the value of further research into cyclic modelling techniques for SRL.

cross OAK -- Onboarding with Actionable Knowledge

Authors: Steve Dev\`enes (Institute of Systems Engineering, HEI-VS, HES-SO University of Applied Sciences and Arts Western Switzerland), Marine Capallera (HumanTech Institute, HEIA HES-SO University of Applied Sciences and Arts Western Switzerland), Robin Cherix (HumanTech Institute, HEIA HES-SO University of Applied Sciences and Arts Western Switzerland), Elena Mugellini (HumanTech Institute, HEIA HES-SO University of Applied Sciences and Arts Western Switzerland), Omar Abou Khaled (HumanTech Institute, HEIA HES-SO University of Applied Sciences and Arts Western Switzerland), Francesco Carrino (Institute of Systems Engineering, HEI-VS, HES-SO University of Applied Sciences and Arts Western Switzerland)

Abstract: The loss of knowledge when skilled operators leave poses a critical issue for companies. This know-how is diverse and unstructured. We propose a novel method that combines knowledge graph embeddings and multi-modal interfaces to collect and retrieve expertise, making it actionable. Our approach supports decision-making on the shop floor. Additionally, we leverage LLMs to improve query understanding and provide adapted answers. As application case studies, we developed a proof-of-concept for quality control in high precision manufacturing.

cross Audio-JEPA: Joint-Embedding Predictive Architecture for Audio Representation Learning

Authors: Ludovic Tuncay (IRIT-SAMoVA), Etienne Labb\'e (IRIT-SAMoVA), Emmanouil Benetos (QMUL), Thomas Pellegrini (IRIT-SAMoVA)

Abstract: Building on the Joint-Embedding Predictive Architecture (JEPA) paradigm, a recent self-supervised learning framework that predicts latent representations of masked regions in high-level feature spaces, we propose Audio-JEPA (Audio Joint-Embedding Predictive Architecture), tailored specifically for audio data. Audio-JEPA uses a simple Vision Transformer backbone to predict latent representations of masked spectrogram patches rather than reconstructing raw audio. We pre-train on unlabeled AudioSet clips (10s, 32kHz) with random patch masking on mel-spectrograms. We evaluate on the X-ARES suite covering speech, music, and environmental sound tasks. Although our implementation is a straightforward translation of the original model to audio, the results still show comparable performance to wav2vec 2.0 and data2vec while using less than one-fifth of their training data and with no hyper-parameter tuning. All code and pretrained checkpoints will be released on GitHub.

cross Visual-Conversational Interface for Evidence-Based Explanation of Diabetes Risk Prediction

Authors: Reza Samimi, Aditya Bhattacharya, Lucija Gosak, Gregor Stiglic, Katrien Verbert

Abstract: Healthcare professionals need effective ways to use, understand, and validate AI-driven clinical decision support systems. Existing systems face two key limitations: complex visualizations and a lack of grounding in scientific evidence. We present an integrated decision support system that combines interactive visualizations with a conversational agent to explain diabetes risk assessments. We propose a hybrid prompt handling approach combining fine-tuned language models for analytical queries with general Large Language Models (LLMs) for broader medical questions, a methodology for grounding AI explanations in scientific evidence, and a feature range analysis technique to support deeper understanding of feature contributions. We conducted a mixed-methods study with 30 healthcare professionals and found that the conversational interactions helped healthcare professionals build a clear understanding of model assessments, while the integration of scientific evidence calibrated trust in the system's decisions. Most participants reported that the system supported both patient risk evaluation and recommendation.

cross Modeling Urban Food Insecurity with Google Street View Images

Authors: David Li

Abstract: Food insecurity is a significant social and public health issue that plagues many urban metropolitan areas around the world. Existing approaches to identifying food insecurity rely primarily on qualitative and quantitative survey data, which is difficult to scale. This project seeks to explore the effectiveness of using street-level images in modeling food insecurity at the census tract level. To do so, we propose a two-step process of feature extraction and gated attention for image aggregation. We evaluate the effectiveness of our model by comparing against other model architectures, interpreting our learned weights, and performing a case study. While our model falls slightly short in terms of its predictive power, we believe our approach still has the potential to supplement existing methods of identifying food insecurity for urban planners and policymakers.

cross OBSER: Object-Based Sub-Environment Recognition for Zero-Shot Environmental Inference

Authors: Won-Seok Choi, Dong-Sig Han, Suhyung Choi, Hyeonseo Yang, Byoung-Tak Zhang

Abstract: We present the Object-Based Sub-Environment Recognition (OBSER) framework, a novel Bayesian framework that infers three fundamental relationships between sub-environments and their constituent objects. In the OBSER framework, metric and self-supervised learning models estimate the object distributions of sub-environments on the latent space to compute these measures. Both theoretically and empirically, we validate the proposed framework by introducing the ($\epsilon,\delta$) statistically separable (EDS) function which indicates the alignment of the representation. Our framework reliably performs inference in open-world and photorealistic environments and outperforms scene-based methods in chained retrieval tasks. The OBSER framework enables zero-shot recognition of environments to achieve autonomous environment understanding.

cross SPEAR: Structured Pruning for Spiking Neural Networks via Synaptic Operation Estimation and Reinforcement Learning

Authors: Hui Xie, Yuhe Liu, Shaoqi Yang, Jinyang Guo, Yufei Guo, Yuqing Ma, Jiaxin Chen, Jiaheng Liu, Xianglong Liu

Abstract: While deep spiking neural networks (SNNs) demonstrate superior performance, their deployment on resource-constrained neuromorphic hardware still remains challenging. Network pruning offers a viable solution by reducing both parameters and synaptic operations (SynOps) to facilitate the edge deployment of SNNs, among which search-based pruning methods search for the SNNs structure after pruning. However, existing search-based methods fail to directly use SynOps as the constraint because it will dynamically change in the searching process, resulting in the final searched network violating the expected SynOps target. In this paper, we introduce a novel SNN pruning framework called SPEAR, which leverages reinforcement learning (RL) technique to directly use SynOps as the searching constraint. To avoid the violation of SynOps requirements, we first propose a SynOps prediction mechanism called LRE to accurately predict the final SynOps after search. Observing SynOps cannot be explicitly calculated and added to constrain the action in RL, we propose a novel reward called TAR to stabilize the searching. Extensive experiments show that our SPEAR framework can effectively compress SNN under specific SynOps constraint.

cross A Novel Active Learning Approach to Label One Million Unknown Malware Variants

Authors: Ahmed Bensaoud, Jugal Kalita

Abstract: Active learning for classification seeks to reduce the cost of labeling samples by finding unlabeled examples about which the current model is least certain and sending them to an annotator/expert to label. Bayesian theory can provide a probabilistic view of deep neural network models by asserting a prior distribution over model parameters and estimating the uncertainties by posterior distribution over these parameters. This paper proposes two novel active learning approaches to label one million malware examples belonging to different unknown modern malware families. The first model is Inception-V4+PCA combined with several support vector machine (SVM) algorithms (UTSVM, PSVM, SVM-GSU, TBSVM). The second model is Vision Transformer based Bayesian Neural Networks ViT-BNN. Our proposed ViT-BNN is a state-of-the-art active learning approach that differs from current methods and can apply to any particular task. The experiments demonstrate that the ViT-BNN is more stable and robust in handling uncertainty.

cross Less Data, More Security: Advancing Cybersecurity LLMs Specialization via Resource-Efficient Domain-Adaptive Continuous Pre-training with Minimal Tokens

Authors: Salahuddin Salahuddin, Ahmed Hussain, Jussi L\"opp\"onen, Toni Jutila, Panos Papadimitratos

Abstract: While Large Language Models (LLMs) demonstrate exceptional natural language capabilities, general-purpose models lack specialized domain knowledge for effective cybersecurity analysis. In this work, we investigate Domain-Adaptive Continuous Pretraining (DAP) as a methodology for enhancing cybersecurity understanding in pretrained LLMs while preserving general language capabilities. We systematically adapted three decoder-based architectures -- Llama-3.1-8B, DeepSeek-R1-Distill-Qwen-14B, and Llama-3.3-70B-Instruct -- using a curated 126-million-word cybersecurity corpus from standards, academic literature, and various other sources. Our approach employed constrained training parameters and distributed FSDP training to balance domain specialization with knowledge preservation. Evaluation across three cybersecurity benchmarks, namely, CTI-MCQ, CyberMetric, and SecEval, demonstrates consistent improvements post-adaptation. The Llama-3.3-70B-Ins-DAP model achieved state-of-the-art accuracies of 0.718, 0.933, and 0.864, respectively, outperforming specialized models, including Llama-Primus-Base. Notably, competitive performance was achieved using substantially smaller datasets (118.8 million versus 2.77 billion tokens), demonstrating efficient domain specialization viability. We establish that targeted continuous pretraining enables effective cybersecurity domain adaptation with computational feasibility, providing foundations for specialized AI assistants in threat analysis, vulnerability assessment, and security documentation while challenging prevailing assumptions about data requirements for LLM specialization.

cross PB-LLMs: Privacy- and Bias-aware NLP Models using Named-Entity Recognition

Authors: Gonzalo Mancera, Aythami Morales, Julian Fierrez, Ruben Tolosana, Alejandro Penna, Miguel Lopez-Duran, Francisco Jurado, Alvaro Ortigosa

Abstract: The use of Natural Language Processing (NLP) in high-stakes AI-based applications has increased significantly in recent years, especially since the emergence of Large Language Models (LLMs). However, despite their strong performance, LLMs introduce important legal/ethical concerns, particularly regarding privacy, data protection, and transparency. Due to these concerns, this work explores the use of Named-Entity Recognition (NER) to facilitate the privacy-preserving training (or adaptation) of LLMs. We propose a framework that uses NER technologies to anonymize sensitive information in text data, such as personal identities or geographic locations. An evaluation of the proposed privacy-preserving learning framework was conducted to measure its impact on user privacy and system performance in a particular high-stakes and sensitive setup: AI-based resume scoring for recruitment processes. The study involved two language models (BERT and RoBERTa) and six anonymization algorithms (based on Presidio, FLAIR, BERT, and different versions of GPT) applied to a database of 24,000 candidate profiles. The findings indicate that the proposed privacy preservation techniques effectively maintain system performance while playing a critical role in safeguarding candidate confidentiality, thus promoting trust in the experimented scenario. On top of the proposed privacy-preserving approach, we also experiment applying an existing approach that reduces the gender bias in LLMs, thus finally obtaining our proposed Privacy- and Bias-aware LLMs (PB-LLMs). Note that the proposed PB-LLMs have been evaluated in a particular setup (resume scoring), but are generally applicable to any other LLM-based AI application.

cross Farm-Level, In-Season Crop Identification for India

Authors: Ishan Deshpande, Amandeep Kaur Reehal, Chandan Nath, Renu Singh, Aayush Patel, Aishwarya Jayagopal, Gaurav Singh, Gaurav Aggarwal, Amit Agarwal, Prathmesh Bele, Sridhar Reddy, Tanya Warrier, Kinjal Singh, Ashish Tendulkar, Luis Pazos Outon, Nikita Saxena, Agata Dondzik, Dinesh Tewari, Shruti Garg, Avneet Singh, Harsh Dhand, Vaibhav Rajan, Alok Talekar

Abstract: Accurate, timely, and farm-level crop type information is paramount for national food security, agricultural policy formulation, and economic planning, particularly in agriculturally significant nations like India. While remote sensing and machine learning have become vital tools for crop monitoring, existing approaches often grapple with challenges such as limited geographical scalability, restricted crop type coverage, the complexities of mixed-pixel and heterogeneous landscapes, and crucially, the robust in-season identification essential for proactive decision-making. We present a framework designed to address the critical data gaps for targeted data driven decision making which generates farm-level, in-season, multi-crop identification at national scale (India) using deep learning. Our methodology leverages the strengths of Sentinel-1 and Sentinel-2 satellite imagery, integrated with national-scale farm boundary data. The model successfully identifies 12 major crops (which collectively account for nearly 90% of India's total cultivated area showing an agreement with national crop census 2023-24 of 94% in winter, and 75% in monsoon season). Our approach incorporates an automated season detection algorithm, which estimates crop sowing and harvest periods. This allows for reliable crop identification as early as two months into the growing season and facilitates rigorous in-season performance evaluation. Furthermore, we have engineered a highly scalable inference pipeline, culminating in what is, to our knowledge, the first pan-India, in-season, farm-level crop type data product. The system's effectiveness and scalability are demonstrated through robust validation against national agricultural statistics, showcasing its potential to deliver actionable, data-driven insights for transformative agricultural monitoring and management across India.

cross Are AI-Generated Fixes Secure? Analyzing LLM and Agent Patches on SWE-bench

Authors: Amirali Sajadi, Kostadin Damevski, Preetha Chatterjee

Abstract: Large Language Models (LLMs) and their agentic frameworks are increasingly adopted to automate software development tasks such as issue resolution and program repair. While prior work has identified security risks in LLM-generated code, most evaluations have focused on synthetic or isolated settings, leaving open questions about the security of these systems in real-world development contexts. In this study, we present the first large-scale security analysis of LLM-generated patches using 20,000+ issues from the SWE-bench dataset. We evaluate patches produced by a standalone LLM (Llama 3.3) and compare them to developer-written patches. We also assess the security of patches generated by three top-performing agentic frameworks (OpenHands, AutoCodeRover, HoneyComb) on a subset of our data. Finally, we analyze a wide range of code, issue, and project-level factors to understand the conditions under which LLMs and agents are most likely to generate insecure code. Our findings reveal that the standalone LLM introduces nearly 9x more new vulnerabilities than developers, with many of these exhibiting unique patterns not found in developers' code. Agentic workflows also generate a significant number of vulnerabilities, particularly when granting LLMs more autonomy, potentially increasing the likelihood of misinterpreting project context or task requirements. We find that vulnerabilities are more likely to occur in LLM patches associated with a higher number of files, more lines of generated code, and GitHub issues that lack specific code snippets or information about the expected code behavior and steps to reproduce. These results suggest that contextual factors play a critical role in the security of the generated code and point toward the need for proactive risk assessment methods that account for both code and issue-level information to complement existing vulnerability detection tools.

cross Iterative Misclassification Error Training (IMET): An Optimized Neural Network Training Technique for Image Classification

Authors: Ruhaan Singh, Sreelekha Guggilam

Abstract: Deep learning models have proven to be effective on medical datasets for accurate diagnostic predictions from images. However, medical datasets often contain noisy, mislabeled, or poorly generalizable images, particularly for edge cases and anomalous outcomes. Additionally, high quality datasets are often small in sample size that can result in overfitting, where models memorize noise rather than learn generalizable patterns. This in particular, could pose serious risks in medical diagnostics where the risk associated with mis-classification can impact human life. Several data-efficient training strategies have emerged to address these constraints. In particular, coreset selection identifies compact subsets of the most representative samples, enabling training that approximates full-dataset performance while reducing computational overhead. On the other hand, curriculum learning relies on gradually increasing training difficulty and accelerating convergence. However, developing a generalizable difficulty ranking mechanism that works across diverse domains, datasets, and models while reducing the computational tasks and remains challenging. In this paper, we introduce Iterative Misclassification Error Training (IMET), a novel framework inspired by curriculum learning and coreset selection. The IMET approach is aimed to identify misclassified samples in order to streamline the training process, while prioritizing the model's attention to edge case senarious and rare outcomes. The paper evaluates IMET's performance on benchmark medical image classification datasets against state-of-the-art ResNet architectures. The results demonstrating IMET's potential for enhancing model robustness and accuracy in medical image analysis are also presented in the paper.

cross Modeling Gene Expression Distributional Shifts for Unseen Genetic Perturbations

Authors: Kalyan Ramakrishnan, Jonathan G. Hedley, Sisi Qu, Puneet K. Dokania, Philip H. S. Torr, Cesar A. Prada-Medina, Julien Fauqueur, Kaspar Martens

Abstract: We train a neural network to predict distributional responses in gene expression following genetic perturbations. This is an essential task in early-stage drug discovery, where such responses can offer insights into gene function and inform target identification. Existing methods only predict changes in the mean expression, overlooking stochasticity inherent in single-cell data. In contrast, we offer a more realistic view of cellular responses by modeling expression distributions. Our model predicts gene-level histograms conditioned on perturbations and outperforms baselines in capturing higher-order statistics, such as variance, skewness, and kurtosis, at a fraction of the training cost. To generalize to unseen perturbations, we incorporate prior knowledge via gene embeddings from large language models (LLMs). While modeling a richer output space, the method remains competitive in predicting mean expression changes. This work offers a practical step towards more expressive and biologically informative models of perturbation effects.

cross Leveraging the Structure of Medical Data for Improved Representation Learning

Authors: Andrea Agostini, Sonia Laguna, Alain Ryser, Samuel Ruiperez-Campillo, Moritz Vandenhirtz, Nicolas Deperrois, Farhad Nooralahzadeh, Michael Krauthammer, Thomas M. Sutter, Julia E. Vogt

Abstract: Building generalizable medical AI systems requires pretraining strategies that are data-efficient and domain-aware. Unlike internet-scale corpora, clinical datasets such as MIMIC-CXR offer limited image counts and scarce annotations, but exhibit rich internal structure through multi-view imaging. We propose a self-supervised framework that leverages the inherent structure of medical datasets. Specifically, we treat paired chest X-rays (i.e., frontal and lateral views) as natural positive pairs, learning to reconstruct each view from sparse patches while aligning their latent embeddings. Our method requires no textual supervision and produces informative representations. Evaluated on MIMIC-CXR, we show strong performance compared to supervised objectives and baselines being trained without leveraging structure. This work provides a lightweight, modality-agnostic blueprint for domain-specific pretraining where data is structured but scarce

cross Topological Signatures vs. Gradient Histograms: A Comparative Study for Medical Image Classification

Authors: Faisal Ahmed, Mohammad Alfrad Nobel Bhuiyan

Abstract: We present the first comparative study of two fundamentally distinct feature extraction techniques: Histogram of Oriented Gradients (HOG) and Topological Data Analysis (TDA), for medical image classification using retinal fundus images. HOG captures local texture and edge patterns through gradient orientation histograms, while TDA, using cubical persistent homology, extracts high-level topological signatures that reflect the global structure of pixel intensities. We evaluate both methods on the large APTOS dataset for two classification tasks: binary detection (normal versus diabetic retinopathy) and five-class diabetic retinopathy severity grading. From each image, we extract 26244 HOG features and 800 TDA features, using them independently to train seven classical machine learning models with 10-fold cross-validation. XGBoost achieved the best performance in both cases: 94.29 percent accuracy (HOG) and 94.18 percent (TDA) on the binary task; 74.41 percent (HOG) and 74.69 percent (TDA) on the multi-class task. Our results show that both methods offer competitive performance but encode different structural aspects of the images. This is the first work to benchmark gradient-based and topological features on retinal imagery. The techniques are interpretable, applicable to other medical imaging domains, and suitable for integration into deep learning pipelines.

cross Statistical Quality and Reproducibility of Pseudorandom Number Generators in Machine Learning technologies

Authors: Benjamin A. Antunes (LIRMM | DALI)

Abstract: Machine learning (ML) frameworks rely heavily on pseudorandom number generators (PRNGs) for tasks such as data shuffling, weight initialization, dropout, and optimization. Yet, the statistical quality and reproducibility of these generators-particularly when integrated into frameworks like PyTorch, TensorFlow, and NumPy-are underexplored. In this paper, we compare the statistical quality of PRNGs used in ML frameworks (Mersenne Twister, PCG, and Philox) against their original C implementations. Using the rigorous TestU01 BigCrush test suite, we evaluate 896 independent random streams for each generator. Our findings challenge claims of statistical robustness, revealing that even generators labeled ''crush-resistant'' (e.g., PCG, Philox) may fail certain statistical tests. Surprisingly, we can observe some differences in failure profiles between the native and framework-integrated versions of the same algorithm, highlighting some implementation differences that may exist.

cross PDFMathTranslate: Scientific Document Translation Preserving Layouts

Authors: Rongxin Ouyang, Chang Chu, Zhikuang Xin, Xiangyao Ma

Abstract: Language barriers in scientific documents hinder the diffusion and development of science and technologies. However, prior efforts in translating such documents largely overlooked the information in layouts. To bridge the gap, we introduce PDFMathTranslate, the world's first open-source software for translating scientific documents while preserving layouts. Leveraging the most recent advances in large language models and precise layout detection, we contribute to the community with key improvements in precision, flexibility, and efficiency. The work has been open-sourced at https://github.com/byaidu/pdfmathtranslate with more than 222k downloads.

URLs: https://github.com/byaidu/pdfmathtranslate

cross Intrinsic Fingerprint of LLMs: Continue Training is NOT All You Need to Steal A Model!

Authors: Do-hyeon Yoon, Minsoo Chun, Thomas Allen, Hans M\"uller, Min Wang, Rajesh Sharma

Abstract: Large language models (LLMs) face significant copyright and intellectual property challenges as the cost of training increases and model reuse becomes prevalent. While watermarking techniques have been proposed to protect model ownership, they may not be robust to continue training and development, posing serious threats to model attribution and copyright protection. This work introduces a simple yet effective approach for robust LLM fingerprinting based on intrinsic model characteristics. We discover that the standard deviation distributions of attention parameter matrices across different layers exhibit distinctive patterns that remain stable even after extensive continued training. These parameter distribution signatures serve as robust fingerprints that can reliably identify model lineage and detect potential copyright infringement. Our experimental validation across multiple model families demonstrates the effectiveness of our method for model authentication. Notably, our investigation uncovers evidence that a recently Pangu Pro MoE model released by Huawei is derived from Qwen-2.5 14B model through upcycling techniques rather than training from scratch, highlighting potential cases of model plagiarism, copyright violation, and information fabrication. These findings underscore the critical importance of developing robust fingerprinting methods for protecting intellectual property in large-scale model development and emphasize that deliberate continued training alone is insufficient to completely obscure model origins.

cross Beyond Overcorrection: Evaluating Diversity in T2I Models with DIVBENCH

Authors: Felix Friedrich, Thiemo Ganesha Welsch, Patrick Schramowski, Kristian Kersting

Abstract: Current diversification strategies for text-to-image (T2I) models often ignore contextual appropriateness, leading to over-diversification where demographic attributes are modified even when explicitly specified in prompts. This paper introduces DIVBENCH, a benchmark and evaluation framework for measuring both under- and over-diversification in T2I generation. Through systematic evaluation of state-of-the-art T2I models, we find that while most models exhibit limited diversity, many diversification approaches overcorrect by inappropriately altering contextually-specified attributes. We demonstrate that context-aware methods, particularly LLM-guided FairDiffusion and prompt rewriting, can already effectively address under-diversity while avoiding over-diversification, achieving a better balance between representation and semantic fidelity.

cross Look-Back: Implicit Visual Re-focusing in MLLM Reasoning

Authors: Shuo Yang, Yuwei Niu, Yuyang Liu, Yang Ye, Bin Lin, Li Yuan

Abstract: Multimodal Large Language Models (MLLMs) have achieved remarkable progress in multimodal reasoning. However, they often excessively rely on textual information during the later stages of inference, neglecting the crucial integration of visual input. Current methods typically address this by explicitly injecting visual information to guide the reasoning process. In this work, through an analysis of MLLM attention patterns, we made an intriguing observation: with appropriate guidance, MLLMs can spontaneously re-focus their attention on visual inputs during the later stages of reasoning, even without explicit visual information injection. This spontaneous shift in focus suggests that MLLMs are intrinsically capable of performing visual fusion reasoning. Building on this insight, we introduce Look-Back, an implicit approach designed to guide MLLMs to ``look back" at visual information in a self-directed manner during reasoning. Look-Back empowers the model to autonomously determine when, where, and how to re-focus on visual inputs, eliminating the need for explicit model-structure constraints or additional input. We demonstrate that Look-Back significantly enhances the model's reasoning and perception capabilities, as evidenced by extensive empirical evaluations on multiple multimodal benchmarks.

cross Cautious Next Token Prediction

Authors: Yizhou Wang, Lingzhi Zhang, Yue Bai, Mang Tik Chiu, Zhengmian Hu, Mingyuan Zhang, Qihua Dong, Yu Yin, Sohrab Amirghodsi, Yun Fu

Abstract: Next token prediction paradigm has been prevailing for autoregressive models in the era of LLMs. The current default sampling choice for popular LLMs is temperature scaling together with nucleus sampling to balance diversity and coherence. Nevertheless, such approach leads to inferior performance in various NLP tasks when the model is not certain about testing questions. To this end, we propose a brand new training-free decoding strategy, dubbed as Cautious Next Token Prediction (CNTP). In the decoding process, if the model has comparatively high prediction entropy at a certain step, we sample multiple trials starting from the step independently and stop when encountering any punctuation. Then we select the trial with the lowest perplexity score viewed as the most probable and reliable trial path given the model's capacity. The trial number is negatively correlated with the prediction confidence, i.e., the less confident the model is, the more trials it should sample. This is consistent with human beings' behaviour: when feeling uncertain or unconfident, one tends to think more creatively, exploring multiple thinking paths, to cautiously select the path one feels most confident about. Extensive experiments on both LLMs and MLLMs show that our proposed CNTP approach outperforms existing standard decoding strategies consistently by a clear margin. Moreover, the integration of CNTP with self consistency can further improve over vanilla self consistency. We believe our proposed CNTP has the potential to become one of the default choices for LLM decoding. Code is available at https://github.com/wyzjack/CNTP.

URLs: https://github.com/wyzjack/CNTP.

cross Enhancing Swarms Durability to Threats via Graph Signal Processing and GNN-based Generative Modeling

Authors: Jonathan Karin, Zoe Piran, Mor Nitzan

Abstract: Swarms, such as schools of fish or drone formations, are prevalent in both natural and engineered systems. While previous works have focused on the social interactions within swarms, the role of external perturbations--such as environmental changes, predators, or communication breakdowns--in affecting swarm stability is not fully understood. Our study addresses this gap by modeling swarms as graphs and applying graph signal processing techniques to analyze perturbations as signals on these graphs. By examining predation, we uncover a "detectability-durability trade-off", demonstrating a tension between a swarm's ability to evade detection and its resilience to predation, once detected. We provide theoretical and empirical evidence for this trade-off, explicitly tying it to properties of the swarm's spatial configuration. Toward task-specific optimized swarms, we introduce SwaGen, a graph neural network-based generative model. We apply SwaGen to resilient swarm generation by defining a task-specific loss function, optimizing the contradicting trade-off terms simultaneously.With this, SwaGen reveals novel spatial configurations, optimizing the trade-off at both ends. Applying the model can guide the design of robust artificial swarms and deepen our understanding of natural swarm dynamics.

cross Detection of Rail Line Track and Human Beings Near the Track to Avoid Accidents

Authors: Mehrab Hosain, Rajiv Kapoor

Abstract: This paper presents an approach for rail line detection and the identification of human beings in proximity to the track, utilizing the YOLOv5 deep learning model to mitigate potential accidents. The technique incorporates real-time video data to identify railway tracks with impressive accuracy and recognizes nearby moving objects within a one-meter range, specifically targeting the identification of humans. This system aims to enhance safety measures in railway environments by providing real-time alerts for any detected human presence close to the track. The integration of a functionality to identify objects at a longer distance further fortifies the preventative capabilities of the system. With a precise focus on real-time object detection, this method is poised to deliver significant contributions to the existing technologies in railway safety. The effectiveness of the proposed method is demonstrated through a comprehensive evaluation, yielding a remarkable improvement in accuracy over existing methods. These results underscore the potential of this approach to revolutionize safety measures in railway environments, providing a substantial contribution to accident prevention strategies.

cross Outcome prediction and individualized treatment effect estimation in patients with large vessel occlusion stroke

Authors: Lisa Herzog, Pascal B\"uhler, Ezequiel de la Rosa, Beate Sick, Susanne Wegener

Abstract: Mechanical thrombectomy has become the standard of care in patients with stroke due to large vessel occlusion (LVO). However, only 50% of successfully treated patients show a favorable outcome. We developed and evaluated interpretable deep learning models to predict functional outcomes in terms of the modified Rankin Scale score alongside individualized treatment effects (ITEs) using data of 449 LVO stroke patients from a randomized clinical trial. Besides clinical variables, we considered non-contrast CT (NCCT) and angiography (CTA) scans which were integrated using novel foundation models to make use of advanced imaging information. Clinical variables had a good predictive power for binary functional outcome prediction (AUC of 0.719 [0.666, 0.774]) which could slightly be improved when adding CTA imaging (AUC of 0.737 [0.687, 0.795]). Adding NCCT scans or a combination of NCCT and CTA scans to clinical features yielded no improvement. The most important clinical predictor for functional outcome was pre-stroke disability. While estimated ITEs were well calibrated to the average treatment effect, discriminatory ability was limited indicated by a C-for-Benefit statistic of around 0.55 in all models. In summary, the models allowed us to jointly integrate CT imaging and clinical features while achieving state-of-the-art prediction performance and ITE estimates. Yet, further research is needed to particularly improve ITE estimation.

cross Multiple data-driven missing imputation

Authors: Sergii Kavun

Abstract: This paper introduces KZImputer, a novel adaptive imputation method for univariate time series designed for short to medium-sized missed points (gaps) (1-5 points and beyond) with tailored strategies for segments at the start, middle, or end of the series. KZImputer employs a hybrid strategy to handle various missing data scenarios. Its core mechanism differentiates between gaps at the beginning, middle, or end of the series, applying tailored techniques at each position to optimize imputation accuracy. The method leverages linear interpolation and localized statistical measures, adapting to the characteristics of the surrounding data and the gap size. The performance of KZImputer has been systematically evaluated against established imputation techniques, demonstrating its potential to enhance data quality for subsequent time series analysis. This paper describes the KZImputer methodology in detail and discusses its effectiveness in improving the integrity of time series data. Empirical analysis demonstrates that KZImputer achieves particularly strong performance for datasets with high missingness rates (around 50% or more), maintaining stable and competitive results across statistical and signal-reconstruction metrics. The method proves especially effective in high-sparsity regimes, where traditional approaches typically experience accuracy degradation.

cross Large Language Models for Automating Clinical Data Standardization: HL7 FHIR Use Case

Authors: Alvaro Riquelme, Pedro Costa, Catalina Martinez

Abstract: For years, semantic interoperability standards have sought to streamline the exchange of clinical data, yet their deployment remains time-consuming, resource-intensive, and technically challenging. To address this, we introduce a semi-automated approach that leverages large language models specifically GPT-4o and Llama 3.2 405b to convert structured clinical datasets into HL7 FHIR format while assessing accuracy, reliability, and security. Applying our method to the MIMIC-IV database, we combined embedding techniques, clustering algorithms, and semantic retrieval to craft prompts that guide the models in mapping each tabular field to its corresponding FHIR resource. In an initial benchmark, resource identification achieved a perfect F1-score, with GPT-4o outperforming Llama 3.2 thanks to the inclusion of FHIR resource schemas within the prompt. Under real-world conditions, accuracy dipped slightly to 94 %, but refinements to the prompting strategy restored robust mappings. Error analysis revealed occasional hallucinations of non-existent attributes and mismatches in granularity, which more detailed prompts can mitigate. Overall, our study demonstrates the feasibility of context-aware, LLM-driven transformation of clinical data into HL7 FHIR, laying the groundwork for semi-automated interoperability workflows. Future work will focus on fine-tuning models with specialized medical corpora, extending support to additional standards such as HL7 CDA and OMOP, and developing an interactive interface to enable expert validation and iterative refinement.

cross Federated Learning for ICD Classification with Lightweight Models and Pretrained Embeddings

Authors: Binbin Xu, G\'erard Dray

Abstract: This study investigates the feasibility and performance of federated learning (FL) for multi-label ICD code classification using clinical notes from the MIMIC-IV dataset. Unlike previous approaches that rely on centralized training or fine-tuned large language models, we propose a lightweight and scalable pipeline combining frozen text embeddings with simple multilayer perceptron (MLP) classifiers. This design offers a privacy-preserving and deployment-efficient alternative for clinical NLP applications, particularly suited to distributed healthcare settings. Extensive experiments across both centralized and federated configurations were conducted, testing six publicly available embedding models from Massive Text Embedding Benchmark leaderboard and three MLP classifier architectures under two medical coding (ICD-9 and ICD-10). Additionally, ablation studies over ten random stratified splits assess performance stability. Results show that embedding quality substantially outweighs classifier complexity in determining predictive performance, and that federated learning can closely match centralized results in idealized conditions. While the models are orders of magnitude smaller than state-of-the-art architectures and achieved competitive micro and macro F1 scores, limitations remain including the lack of end-to-end training and the simplified FL assumptions. Nevertheless, this work demonstrates a viable way toward scalable, privacy-conscious medical coding systems and offers a step toward for future research into federated, domain-adaptive clinical AI.

cross Towards a Psychoanalytic Perspective on VLM Behaviour: A First-step Interpretation with Intriguing Observations

Authors: Xiangrui Liu, Man Luo, Agneet Chatterjee, Hua Wei, Yezhou Yang

Abstract: Hallucination is a long-standing problem that has been actively investigated in Vision-Language Models (VLMs). Existing research commonly attributes hallucinations to technical limitations or sycophancy bias, where the latter means the models tend to generate incorrect answers to align with user expectations. However, these explanations primarily focus on technical or externally driven factors, may have neglected the possibility that hallucination behaviours might mirror cognitive biases observed in human psychology. In this work, we introduce a psychological taxonomy, categorizing VLMs' hallucination behaviours, including sycophancy, logical inconsistency, and a newly identified VLMs behaviour: authority bias. To systematically analyze these behaviours, we design AIpsych, a scalable benchmark that reveals psychological tendencies in model response patterns. Leveraging this benchmark, we investigate how variations in model architecture and parameter size influence model behaviour when responding to strategically manipulated questions. Our experiments reveal that as model size increases, VLMs exhibit stronger sycophantic tendencies but reduced authority bias, suggesting increasing competence but a potential erosion of response integrity. A human subject study further validates our hypotheses and highlights key behavioural differences between VLMs and human respondents. This work suggests a new perspective for understanding hallucination in VLMs and highlights the importance of integrating psychological principles into model evaluation.The benchmark is available at https://github.com/lxrswdd/AIpsych.

URLs: https://github.com/lxrswdd/AIpsych.

cross Neural Substitute Solver for Efficient Edge Inference of Power Electronic Hybrid Dynamics

Authors: Jialin Zheng, Haoyu Wang, Yangbin Zeng, Han Xu, Di Mou, Hong Li, Sergio Vazquez, Leopoldo G. Franquelo

Abstract: Advancing the dynamics inference of power electronic systems (PES) to the real-time edge-side holds transform-ative potential for testing, control, and monitoring. How-ever, efficiently inferring the inherent hybrid continu-ous-discrete dynamics on resource-constrained edge hardware remains a significant challenge. This letter pro-poses a neural substitute solver (NSS) approach, which is a neural-network-based framework aimed at rapid accurate inference with significantly reduced computational costs. Specifically, NSS leverages lightweight neural networks to substitute time-consuming matrix operation and high-order numerical integration steps in traditional solvers, which transforms sequential bottlenecks into highly parallel operation suitable for edge hardware. Experimental vali-dation on a multi-stage DC-DC converter demonstrates that NSS achieves 23x speedup and 60% hardware resource reduction compared to traditional solvers, paving the way for deploying edge inference of high-fidelity PES dynamics.

cross DeepGesture: A conversational gesture synthesis system based on emotions and semantics

Authors: Thanh Hoang-Minh

Abstract: Along with the explosion of large language models, improvements in speech synthesis, advancements in hardware, and the evolution of computer graphics, the current bottleneck in creating digital humans lies in generating character movements that correspond naturally to text or speech inputs. In this work, we present DeepGesture, a diffusion-based gesture synthesis framework for generating expressive co-speech gestures conditioned on multimodal signals-text, speech, emotion, and seed motion. Built upon the DiffuseStyleGesture model, DeepGesture introduces novel architectural enhancements that improve semantic alignment and emotional expressiveness in generated gestures. Specifically, we integrate fast text transcriptions as semantic conditioning and implement emotion-guided classifier-free diffusion to support controllable gesture generation across affective states. A lightweight Transformer backbone combines full self-attention and cross-local attention for effective feature fusion of heterogeneous modalities. To visualize results, we implement a full rendering pipeline in Unity based on BVH output from the model. Evaluation on the ZeroEGGS dataset shows that DeepGesture produces gestures with improved human-likeness and contextual appropriateness, outperforming baselines on Mean Opinion Score and Frechet Gesture Distance metrics. Our system supports interpolation between emotional states and demonstrates generalization to out-of-distribution speech, including synthetic voices-marking a step forward toward fully multimodal, emotionally aware digital humans.

cross Last-Iterate Convergence of No-Regret Learning for Equilibria in Bargaining Games

Authors: Serafina Kamp, Reese Liebman, Benjamin Fish

Abstract: Bargaining games, where agents attempt to agree on how to split utility, are an important class of games used to study economic behavior, which motivates a study of online learning algorithms in these games. In this work, we tackle when no-regret learning algorithms converge to Nash equilibria in bargaining games. Recent results have shown that online algorithms related to Follow the Regularized Leader (FTRL) converge to Nash equilibria (NE) in the last iterate in a wide variety of games, including zero-sum games. However, bargaining games do not have the properties used previously to established convergence guarantees, even in the simplest case of the ultimatum game, which features a single take-it-or-leave-it offer. Nonetheless, we establish that FTRL (without the modifications necessary for zero-sum games) achieves last-iterate convergence to an approximate NE in the ultimatum game along with a bound on convergence time under mild assumptions. Further, we provide experimental results to demonstrate that convergence to NE, including NE with asymmetric payoffs, occurs under a broad range of initial conditions, both in the ultimatum game and in bargaining games with multiple rounds. This work demonstrates how complex economic behavior (e.g. learning to use threats and the existence of many possible equilibrium outcomes) can result from using a simple learning algorithm, and that FTRL can converge to equilibria in a more diverse set of games than previously known.

cross Expert-level validation of AI-generated medical text with scalable language models

Authors: Asad Aali, Vasiliki Bikia, Maya Varma, Nicole Chiou, Sophie Ostmeier, Arnav Singhvi, Magdalini Paschali, Ashwin Kumar, Andrew Johnston, Karimar Amador-Martinez, Eduardo Juan Perez Guerrero, Paola Naovi Cruz Rivera, Sergios Gatidis, Christian Bluethgen, Eduardo Pontes Reis, Eddy D. Zandee van Rilland, Poonam Laxmappa Hosamani, Kevin R Keet, Minjoung Go, Evelyn Ling, David B. Larson, Curtis Langlotz, Roxana Daneshjou, Jason Hom, Sanmi Koyejo, Emily Alsentzer, Akshay S. Chaudhari

Abstract: With the growing use of language models (LMs) in clinical environments, there is an immediate need to evaluate the accuracy and safety of LM-generated medical text. Currently, such evaluation relies solely on manual physician review. However, detecting errors in LM-generated text is challenging because 1) manual review is costly and 2) expert-composed reference outputs are often unavailable in real-world settings. While the "LM-as-judge" paradigm (a LM evaluating another LM) offers scalable evaluation, even frontier LMs can miss subtle but clinically significant errors. To address these challenges, we propose MedVAL, a self-supervised framework that leverages synthetic data to train evaluator LMs to assess whether LM-generated medical outputs are factually consistent with inputs, without requiring physician labels or reference outputs. To evaluate LM performance, we introduce MedVAL-Bench, a dataset containing 840 outputs annotated by physicians, following a physician-defined taxonomy of risk levels and error categories. Across 6 diverse medical tasks and 10 state-of-the-art LMs spanning open-source, proprietary, and medically adapted models, MedVAL fine-tuning significantly improves (p < 0.001) alignment with physicians on both seen and unseen tasks, increasing average F1 scores from 66% to 83%, with per-sample safety classification scores up to 86%. MedVAL improves the performance of even the best-performing proprietary LM (GPT-4o) by 8%. To support a scalable, risk-aware pathway towards clinical integration, we open-source the 1) codebase ( https://github.com/StanfordMIMI/MedVAL ), 2) MedVAL-Bench ( https://huggingface.co/datasets/stanfordmimi/MedVAL-Bench ), and 3) MedVAL-4B ( https://huggingface.co/stanfordmimi/MedVAL-4B ), the best-performing open-source LM. Our research provides the first evidence of LMs approaching expert-level validation ability for medical text.

URLs: https://github.com/StanfordMIMI/MedVAL, https://huggingface.co/datasets/stanfordmimi/MedVAL-Bench, https://huggingface.co/stanfordmimi/MedVAL-4B

cross Adversarial Manipulation of Reasoning Models using Internal Representations

Authors: Kureha Yamaguchi, Benjamin Etheridge, Andy Arditi

Abstract: Reasoning models generate chain-of-thought (CoT) tokens before their final output, but how this affects their vulnerability to jailbreak attacks remains unclear. While traditional language models make refusal decisions at the prompt-response boundary, we find evidence that DeepSeek-R1-Distill-Llama-8B makes these decisions within its CoT generation. We identify a linear direction in activation space during CoT token generation that predicts whether the model will refuse or comply -- termed the "caution" direction because it corresponds to cautious reasoning patterns in the generated text. Ablating this direction from model activations increases harmful compliance, effectively jailbreaking the model. We additionally show that intervening only on CoT token activations suffices to control final outputs, and that incorporating this direction into prompt-based attacks improves success rates. Our findings suggest that the chain-of-thought itself is a promising new target for adversarial manipulation in reasoning models. Code available at https://github.com/ky295/reasoning-manipulation

URLs: https://github.com/ky295/reasoning-manipulation

cross Beyond SEO: A Transformer-Based Approach for Reinventing Web Content Optimisation

Authors: Florian L\"uttgenau, Imar Colic, Gervasio Ramirez

Abstract: The rise of generative AI search engines is disrupting traditional SEO, with Gartner predicting 25% reduction in conventional search usage by 2026. This necessitates new approaches for web content visibility in AI-driven search environments. We present a domain-specific fine-tuning approach for Generative Engine Optimization (GEO) that transforms web content to improve discoverability in large language model outputs. Our method fine-tunes a BART-base transformer on synthetically generated training data comprising 1,905 cleaned travel website content pairs. Each pair consists of raw website text and its GEO-optimized counterpart incorporating credible citations, statistical evidence, and improved linguistic fluency. We evaluate using intrinsic metrics (ROUGE-L, BLEU) and extrinsic visibility assessments through controlled experiments with Llama-3.3-70B. The fine-tuned model achieves significant improvements over baseline BART: ROUGE-L scores of 0.249 (vs. 0.226) and BLEU scores of 0.200 (vs. 0.173). Most importantly, optimized content demonstrates substantial visibility gains in generative search responses with 15.63% improvement in absolute word count and 30.96% improvement in position-adjusted word count metrics. This work provides the first empirical demonstration that targeted transformer fine-tuning can effectively enhance web content visibility in generative search engines with modest computational resources. Our results suggest GEO represents a tractable approach for content optimization in the AI-driven search landscape, offering concrete evidence that small-scale, domain-focused fine-tuning yields meaningful improvements in content discoverability.

cross Deep Learning Atmospheric Models Reliably Simulate Out-of-Sample Land Heat and Cold Wave Frequencies

Authors: Zilu Meng, Gregory J. Hakim, Wenchang Yang, Gabriel A. Vecchi

Abstract: Deep learning (DL)-based general circulation models (GCMs) are emerging as fast simulators, yet their ability to replicate extreme events outside their training range remains unknown. Here, we evaluate two such models -- the hybrid Neural General Circulation Model (NGCM) and purely data-driven Deep Learning Earth System Model (DL\textit{ESy}M) -- against a conventional high-resolution land-atmosphere model (HiRAM) in simulating land heatwaves and coldwaves. All models are forced with observed sea surface temperatures and sea ice over 1900-2020, focusing on the out-of-sample early-20th-century period (1900-1960). Both DL models generalize successfully to unseen climate conditions, broadly reproducing the frequency and spatial patterns of heatwave and cold wave events during 1900-1960 with skill comparable to HiRAM. An exception is over portions of North Asia and North America, where all models perform poorly during 1940-1960. Due to excessive temperature autocorrelation, DL\textit{ESy}M tends to overestimate heatwave and cold wave frequencies, whereas the physics-DL hybrid NGCM exhibits persistence more similar to HiRAM.

cross First Contact: Data-driven Friction-Stir Process Control

Authors: James Koch, Ethan King, WoongJo Choi, Megan Ebers, David Garcia, Ken Ross, Keerti Kappagantula

Abstract: This study validates the use of Neural Lumped Parameter Differential Equations for open-loop setpoint control of the plunge sequence in Friction Stir Processing (FSP). The approach integrates a data-driven framework with classical heat transfer techniques to predict tool temperatures, informing control strategies. By utilizing a trained Neural Lumped Parameter Differential Equation model, we translate theoretical predictions into practical set-point control, facilitating rapid attainment of desired tool temperatures and ensuring consistent thermomechanical states during FSP. This study covers the design, implementation, and experimental validation of our control approach, establishing a foundation for efficient, adaptive FSP operations.

cross Transparent Machine Learning: Training and Refining an Explainable Boosting Machine to Identify Overshooting Tops in Satellite Imagery

Authors: Nathan Mitchell, Lander Ver Hoef, Imme Ebert-Uphoff, Kristina Moen, Kyle Hilburn, Yoonjin Lee, Emily J. King

Abstract: An Explainable Boosting Machine (EBM) is an interpretable machine learning (ML) algorithm that has benefits in high risk applications but has not yet found much use in atmospheric science. The overall goal of this work is twofold: (1) explore the use of EBMs, in combination with feature engineering, to obtain interpretable, physics-based machine learning algorithms for meteorological applications; (2) illustrate these methods for the detection of overshooting top (OTs) in satellite imagery. Specifically, we seek to simplify the process of OT detection by first using mathematical methods to extract key features, such as cloud texture using Gray-Level Co-occurrence Matrices, followed by applying an EBM. Our EBM focuses on the classification task of predicting OT regions, utilizing Channel 2 (visible imagery) and Channel 13 (infrared imagery) of the Advanced Baseline Imager sensor of the Geostationary Operational Environmental Satellite 16. Multi-Radar/Multi-Sensor system convection flags are used as labels to train the EBM model. Note, however, that detecting convection, while related, is different from detecting OTs. Once trained, the EBM was examined and minimally altered to more closely match strategies used by domain scientists to identify OTs. The result of our efforts is a fully interpretable ML algorithm that was developed in a human-machine collaboration. While the final model does not reach the accuracy of more complex approaches, it performs well and represents a significant step toward building fully interpretable ML algorithms for this and other meteorological applications.

cross Discovering Algorithms with Computational Language Processing

Authors: Theo Bourdais, Abeynaya Gnanasekaran, Houman Owhadi, Tuhin Sahai

Abstract: Algorithms are the engine for reproducible problem-solving. We present a framework automating algorithm discovery by conceptualizing them as sequences of operations, represented as tokens. These computational tokens are chained using a grammar, enabling the formation of increasingly sophisticated procedures. Our ensemble Monte Carlo tree search (MCTS) guided by reinforcement learning (RL) explores token chaining and drives the creation of new tokens. This methodology rediscovers, improves, and generates new algorithms that substantially outperform existing methods for strongly NP-hard combinatorial optimization problems and foundational quantum computing approaches such as Grover's and Quantum Approximate Optimization Algorithm. Operating at the computational rather than code-generation level, our framework produces algorithms that can be tailored specifically to problem instances, not merely classes.

cross Quantifying Cross-Attention Interaction in Transformers for Interpreting TCR-pMHC Binding

Authors: Jiarui Li, Zixiang Yin, Haley Smith, Zhengming Ding, Samuel J. Landry, Ramgopal R. Mettu

Abstract: CD8+ "killer" T cells and CD4+ "helper" T cells play a central role in the adaptive immune system by recognizing antigens presented by Major Histocompatibility Complex (pMHC) molecules via T Cell Receptors (TCRs). Modeling binding between T cells and the pMHC complex is fundamental to understanding basic mechanisms of human immune response as well as in developing therapies. While transformer-based models such as TULIP have achieved impressive performance in this domain, their black-box nature precludes interpretability and thus limits a deeper mechanistic understanding of T cell response. Most existing post-hoc explainable AI (XAI) methods are confined to encoder-only, co-attention, or model-specific architectures and cannot handle encoder-decoder transformers used in TCR-pMHC modeling. To address this gap, we propose Quantifying Cross-Attention Interaction (QCAI), a new post-hoc method designed to interpret the cross-attention mechanisms in transformer decoders. Quantitative evaluation is a challenge for XAI methods; we have compiled TCR-XAI, a benchmark consisting of 274 experimentally determined TCR-pMHC structures to serve as ground truth for binding. Using these structures we compute physical distances between relevant amino acid residues in the TCR-pMHC interaction region and evaluate how well our method and others estimate the importance of residues in this region across the dataset. We show that QCAI achieves state-of-the-art performance on both interpretability and prediction accuracy under the TCR-XAI benchmark.

cross Weak Form Scientific Machine Learning: Test Function Construction for System Identification

Authors: April Tran, David Bortz

Abstract: Weak form Scientific Machine Learning (WSciML) is a recently developed framework for data-driven modeling and scientific discovery. It leverages the weak form of equation error residuals to provide enhanced noise robustness in system identification via convolving model equations with test functions, reformulating the problem to avoid direct differentiation of data. The performance, however, relies on wisely choosing a set of compactly supported test functions. In this work, we mathematically motivate a novel data-driven method for constructing Single-scale-Local reference functions for creating the set of test functions. Our approach numerically approximates the integration error introduced by the quadrature and identifies the support size for which the error is minimal, without requiring access to the model parameter values. Through numerical experiments across various models, noise levels, and temporal resolutions, we demonstrate that the selected supports consistently align with regions of minimal parameter estimation error. We also compare the proposed method against the strategy for constructing Multi-scale-Global (and orthogonal) test functions introduced in our prior work, demonstrating the improved computational efficiency.

cross LANTERN: A Machine Learning Framework for Lipid Nanoparticle Transfection Efficiency Prediction

Authors: Asal Mehradfar, Mohammad Shahab Sepehri, Jose Miguel Hernandez-Lobato, Glen S. Kwon, Mahdi Soltanolkotabi, Salman Avestimehr, Morteza Rasoulianboroujeni

Abstract: The discovery of new ionizable lipids for efficient lipid nanoparticle (LNP)-mediated RNA delivery remains a critical bottleneck for RNA-based therapeutics development. Recent advances have highlighted the potential of machine learning (ML) to predict transfection efficiency from molecular structure, enabling high-throughput virtual screening and accelerating lead identification. However, existing approaches are hindered by inadequate data quality, ineffective feature representations, low predictive accuracy, and poor generalizability. Here, we present LANTERN (Lipid nANoparticle Transfection Efficiency pRedictioN), a robust ML framework for predicting transfection efficiency based on ionizable lipid representation. We benchmarked a diverse set of ML models against AGILE, a previously published model developed for transfection prediction. Our results show that combining simpler models with chemically informative features, particularly count-based Morgan fingerprints, outperforms more complex models that rely on internally learned embeddings, such as AGILE. We also show that a multi-layer perceptron trained on a combination of Morgan fingerprints and Expert descriptors achieved the highest performance ($\text{R}^2$ = 0.8161, r = 0.9053), significantly exceeding AGILE ($\text{R}^2$ = 0.2655, r = 0.5488). We show that the models in LANTERN consistently have strong performance across multiple evaluation metrics. Thus, LANTERN offers a robust benchmarking framework for LNP transfection prediction and serves as a valuable tool for accelerating lipid-based RNA delivery systems design.

cross Symbiosis: Multi-Adapter Inference and Fine-Tuning

Authors: Saransh Gupta, Umesh Deshpande, Travis Janssen, Swami Sundararaman

Abstract: Parameter-efficient fine-tuning (PEFT) allows model builders to capture the task specific parameters into adapters, which are a fraction of the size of the original base model. Popularity of PEFT technique for fine-tuning has led to creation of a large number of adapters for popular Large Language Models (LLMs). However, existing frameworks fall short in supporting inference or fine-tuning with multiple adapters in the following ways. 1) For fine-tuning, each job needs to deploy its dedicated base model instance, which results in excessive GPU memory consumption and poor GPU utilization. 2) While popular inference platforms can serve multiple PEFT adapters, they do not allow independent resource management or mixing of different PEFT methods. 3) They cannot share resources (such as base model instance) between inference and fine-tuning jobs. 4) They do not provide privacy to users who may not wish to expose their fine-tuned parameters to service providers. In Symbiosis, we address the above problems by enabling as-a-service deployment of base model. The base model layers can be shared across multiple inference or fine-tuning processes. Our split-execution technique decouples the execution of client-specific adapters and layers from the frozen base model layers offering them flexibility to manage their resources, to select their fine-tuning method, to achieve their performance goals. Our approach is transparent to models and works out-of-the-box for most models in the transformers library. Our evaluation on Llama2-13B shows the compared to baseline, Symbiosis can fine-tune 4X more adapters on the same set of GPUs in the same amount of time.

cross SI-Agent: An Agentic Framework for Feedback-Driven Generation and Tuning of Human-Readable System Instructions for Large Language Models

Authors: Jeshwanth Challagundla

Abstract: System Instructions (SIs), or system prompts, are pivotal for guiding Large Language Models (LLMs) but manual crafting is resource-intensive and often suboptimal. Existing automated methods frequently generate non-human-readable "soft prompts," sacrificing interpretability. This paper introduces SI-Agent, a novel agentic framework designed to automatically generate and iteratively refine human-readable SIs through a feedback-driven loop. SI-Agent employs three collaborating agents: an Instructor Agent, an Instruction Follower Agent (target LLM), and a Feedback/Reward Agent evaluating task performance and optionally SI readability. The framework utilizes iterative cycles where feedback guides the Instructor's refinement strategy (e.g., LLM-based editing, evolutionary algorithms). We detail the framework's architecture, agent roles, the iterative refinement process, and contrast it with existing methods. We present experimental results validating SI-Agent's effectiveness, focusing on metrics for task performance, SI readability, and efficiency. Our findings indicate that SI-Agent generates effective, readable SIs, offering a favorable trade-off between performance and interpretability compared to baselines. Potential implications include democratizing LLM customization and enhancing model transparency. Challenges related to computational cost and feedback reliability are acknowledged.

cross Subject Invariant Contrastive Learning for Human Activity Recognition

Authors: Yavuz Yarici, Kiran Kokilepersaud, Mohit Prabhushankar, Ghassan AlRegib

Abstract: The high cost of annotating data makes self-supervised approaches, such as contrastive learning methods, appealing for Human Activity Recognition (HAR). Effective contrastive learning relies on selecting informative positive and negative samples. However, HAR sensor signals are subject to significant domain shifts caused by subject variability. These domain shifts hinder model generalization to unseen subjects by embedding subject-specific variations rather than activity-specific features. As a result, human activity recognition models trained with contrastive learning often struggle to generalize to new subjects. We introduce Subject-Invariant Contrastive Learning (SICL), a simple yet effective loss function to improve generalization in human activity recognition. SICL re-weights negative pairs drawn from the same subject to suppress subject-specific cues and emphasize activity-specific information. We evaluate our loss function on three public benchmarks: UTD-MHAD, MMAct, and DARai. We show that SICL improves performance by up to 11% over traditional contrastive learning methods. Additionally, we demonstrate the adaptability of our loss function across various settings, including multiple self-supervised methods, multimodal scenarios, and supervised learning frameworks.

cross LILI clustering algorithm: Limit Inferior Leaf Interval Integrated into Causal Forest for Causal Interference

Authors: Yiran Dong, Di Fan, Chuanhou Gao

Abstract: Causal forest methods are powerful tools in causal inference. Similar to traditional random forest in machine learning, causal forest independently considers each causal tree. However, this independence consideration increases the likelihood that classification errors in one tree are repeated in others, potentially leading to significant bias in causal e ect estimation. In this paper, we propose a novel approach that establishes connections between causal trees through the Limit Inferior Leaf Interval (LILI) clustering algorithm. LILIs are constructed based on the leaves of all causal trees, emphasizing the similarity of dataset confounders. When two instances with di erent treatments are grouped into the same leaf across a su cient number of causal trees, they are treated as counterfactual outcomes of each other. Through this clustering mechanism, LILI clustering reduces bias present in traditional causal tree methods and enhances the prediction accuracy for the average treatment e ect (ATE). By integrating LILIs into a causal forest, we develop an e cient causal inference method. Moreover, we explore several key properties of LILI by relating it to the concepts of limit inferior and limit superior in the set theory. Theoretical analysis rigorously proves the convergence of the estimated ATE using LILI clustering. Empirically, extensive comparative experiments demonstrate the superior performance of LILI clustering.

cross ConceptMix++: Leveling the Playing Field in Text-to-Image Benchmarking via Iterative Prompt Optimization

Authors: Haosheng Gan, Berk Tinaz, Mohammad Shahab Sepehri, Zalan Fabian, Mahdi Soltanolkotabi

Abstract: Current text-to-image (T2I) benchmarks evaluate models on rigid prompts, potentially underestimating true generative capabilities due to prompt sensitivity and creating biases that favor certain models while disadvantaging others. We introduce ConceptMix++, a framework that disentangles prompt phrasing from visual generation capabilities by applying iterative prompt optimization. Building on ConceptMix, our approach incorporates a multimodal optimization pipeline that leverages vision-language model feedback to refine prompts systematically. Through extensive experiments across multiple diffusion models, we show that optimized prompts significantly improve compositional generation performance, revealing previously hidden model capabilities and enabling fairer comparisons across T2I models. Our analysis reveals that certain visual concepts -- such as spatial relationships and shapes -- benefit more from optimization than others, suggesting that existing benchmarks systematically underestimate model performance in these categories. Additionally, we find strong cross-model transferability of optimized prompts, indicating shared preferences for effective prompt phrasing across models. These findings demonstrate that rigid benchmarking approaches may significantly underrepresent true model capabilities, while our framework provides more accurate assessment and insights for future development.

cross Securing Transformer-based AI Execution via Unified TEE and Crypto-protected Accelerators

Authors: Jiaqi Xue, Yifei Zhao, Mengxin Zheng, Xun Chen, Fan Yao, Yan Solihin, Qian Lou

Abstract: Recent advances in Transformer models, e.g., large language models (LLMs), have brought tremendous breakthroughs in various artificial intelligence (AI) tasks, leading to their wide applications in many security-critical domains. Due to their unprecedented scale and prohibitively high development cost, these models have become highly valuable intellectual property for AI stakeholders and are increasingly deployed via machine learning as a service (MLaaS). However, MLaaS often runs on untrusted cloud infrastructure, exposing data and models to potential breaches. Mainstream protection mechanisms leverage trusted execution environments (TEEs) where confidentiality and integrity for secretive data are shielded using hardware-based encryption and integrity checking. Unfortunately, running model inference entirely within TEEs is subject to non-trivial slowdown, which is further exacerbated in LLMs due to the substantial computation and memory footprint involved. Recent studies reveal that the hybrid TEE-based scheme offloading partial model inference operations to the untrusted accelerators (e.g., GPU) is a promising solution. However, prior offloading schemes fail to ensure dual protection of data and model in Transformer inference, as they cannot securely offload critical operations, i.e., Attention and SoftMax, forcing these computations to remain confined within TEEs. To address these challenges, we propose TwinShield, a framework enabling secure Transformer inference in heterogeneous TEE and accelerator systems with dual protection for both model and data. TwinShield offloads ~87% of computation to GPUs and delivers 4.0x - 6.1x speedups over previous approaches across various Transformer models.

cross LTLCrit: A Temporal Logic-based LLM Critic for Safe and Efficient Embodied Agents

Authors: Anand Gokhale, Vaibhav Srivastava, Francesco Bullo

Abstract: Large language models (LLMs) have demonstrated promise in reasoning tasks and general decision-making in static environments. In long-term planning tasks, however, errors tend to accumulate, often leading to unsafe or inefficient behavior, limiting their use in general-purpose settings. We propose a modular actor-critic architecture in which an LLM actor is guided by LTLCrit, a trajectory-level LLM critic that communicates via linear temporal logic (LTL). Our setup combines the reasoning strengths of language models with the guarantees of formal logic. The actor selects high-level actions from natural language observations, while the critic analyzes full trajectories and proposes new LTL constraints that shield the actor from future unsafe or inefficient behavior. The architecture supports both fixed, hand-specified safety constraints and adaptive, learned soft constraints that promote long-term efficiency. Our architecture is model-agnostic: any LLM-based planner can serve as the actor, and LTLCrit serves as a logic-generating wrapper. We formalize planning as graph traversal under symbolic constraints, allowing LTLCrit to analyze failed or suboptimal trajectories and generate new temporal logic rules that improve future behavior. We evaluate our system on the Minecraft diamond-mining benchmark, achieving 100% completion rates and improving efficiency compared to baseline LLM planners. Our results suggest that enabling LLMs to supervise each other through logic is a powerful and flexible paradigm for safe, generalizable decision making.

cross Task-Specific Generative Dataset Distillation with Difficulty-Guided Sampling

Authors: Mingzhuo Li, Guang Li, Jiafeng Mao, Linfeng Ye, Takahiro Ogawa, Miki Haseyama

Abstract: To alleviate the reliance of deep neural networks on large-scale datasets, dataset distillation aims to generate compact, high-quality synthetic datasets that can achieve comparable performance to the original dataset. The integration of generative models has significantly advanced this field. However, existing approaches primarily focus on aligning the distilled dataset with the original one, often overlooking task-specific information that can be critical for optimal downstream performance. In this paper, focusing on the downstream task of classification, we propose a task-specific sampling strategy for generative dataset distillation that incorporates the concept of difficulty to consider the requirements of the target task better. The final dataset is sampled from a larger image pool with a sampling distribution obtained by matching the difficulty distribution of the original dataset. A logarithmic transformation is applied as a pre-processing step to correct for distributional bias. The results of extensive experiments demonstrate the effectiveness of our method and suggest its potential for enhancing performance on other downstream tasks.

cross Disambiguation-Centric Finetuning Makes Enterprise Tool-Calling LLMs More Realistic and Less Risky

Authors: Ashutosh Hathidara, Julien Yu, Sebastian Schreiber

Abstract: Large language models (LLMs) are increasingly tasked with invoking enterprise APIs, yet they routinely falter when near-duplicate tools vie for the same user intent or when required arguments are left underspecified. We introduce DiaFORGE (Dialogue Framework for Organic Response Generation & Evaluation), a disambiguation-centric, three-stage pipeline that (i) synthesizes persona-driven, multi-turn dialogues in which the assistant must distinguish among highly similar tools, (ii) performs supervised fine-tuning of open-source models with reasoning traces across 3B - 70B parameters, and (iii) evaluates real-world readiness via a dynamic suite that redeploys each model in a live agentic loop and reports end-to-end goal completion alongside conventional static metrics. On our dynamic benchmark DiaBENCH, models trained with DiaFORGE raise tool-invocation success by 27 pp over GPT-4o and by 49 pp over Claude-3.5-Sonnet, both under optimized prompting. To spur further research, we release an open corpus of 5000 production-grade enterprise API specifications paired with rigorously validated, disambiguation-focused dialogues, offering a practical blueprint for building reliable, enterprise-ready tool-calling agents.

cross Adaptive Gate-Aware Mamba Networks for Magnetic Resonance Fingerprinting

Authors: Tianyi Ding, Hongli Chen, Yang Gao, Zhuang Xiong, Feng Liu, Martijn A. Cloos, Hongfu Sun

Abstract: Magnetic Resonance Fingerprinting (MRF) enables fast quantitative imaging by matching signal evolutions to a predefined dictionary. However, conventional dictionary matching suffers from exponential growth in computational cost and memory usage as the number of parameters increases, limiting its scalability to multi-parametric mapping. To address this, recent work has explored deep learning-based approaches as alternatives to DM. We propose GAST-Mamba, an end-to-end framework that combines a dual Mamba-based encoder with a Gate-Aware Spatial-Temporal (GAST) processor. Built on structured state-space models, our architecture efficiently captures long-range spatial dependencies with linear complexity. On 5 times accelerated simulated MRF data (200 frames), GAST-Mamba achieved a T1 PSNR of 33.12~dB, outperforming SCQ (31.69~dB). For T2 mapping, it reached a PSNR of 30.62~dB and SSIM of 0.9124. In vivo experiments further demonstrated improved anatomical detail and reduced artifacts. Ablation studies confirmed that each component contributes to performance, with the GAST module being particularly important under strong undersampling. These results demonstrate the effectiveness of GAST-Mamba for accurate and robust reconstruction from highly undersampled MRF acquisitions, offering a scalable alternative to traditional DM-based methods.

cross Generating realistic patient data

Authors: Tabea Brandt, Christina B\"using, Johanna Leweke, Finn Seesemann, Sina Weber

Abstract: Developing algorithms for real-life problems that perform well in practice highly depends on the availability of realistic data for testing. Obtaining real-life data for optimization problems in health care, however, is often difficult. This is especially true for any patient related optimization problems, e.g., for patient-to-room assignment, due to data privacy policies. Furthermore, obtained real-life data usually cannot be published which prohibits reproducibility of results by other researchers. Therefore, often artificially generated instances are used. In this paper, we present combinatorial insights about the feasibility of instances for the patient-to-room assignment problem (PRA). We use these insights to develop a configurable instance generator for PRA with an easy-to-use graphical user interface. Configurability is in this case especially important as we observed in an extensive analysis of real-life data that, e.g., the probability distribution for patients' age and length of stay depends on the respective ward.

cross Evaluating the Evaluators: Trust in Adversarial Robustness Tests

Authors: Antonio Emanuele Cin\`a, Maura Pintor, Luca Demetrio, Ambra Demontis, Battista Biggio, Fabio Roli

Abstract: Despite significant progress in designing powerful adversarial evasion attacks for robustness verification, the evaluation of these methods often remains inconsistent and unreliable. Many assessments rely on mismatched models, unverified implementations, and uneven computational budgets, which can lead to biased results and a false sense of security. Consequently, robustness claims built on such flawed testing protocols may be misleading and give a false sense of security. As a concrete step toward improving evaluation reliability, we present AttackBench, a benchmark framework developed to assess the effectiveness of gradient-based attacks under standardized and reproducible conditions. AttackBench serves as an evaluation tool that ranks existing attack implementations based on a novel optimality metric, which enables researchers and practitioners to identify the most reliable and effective attack for use in subsequent robustness evaluations. The framework enforces consistent testing conditions and enables continuous updates, making it a reliable foundation for robustness verification.

cross A Flexible Instruction Set Architecture for Efficient GEMMs

Authors: Alexandre de Limas Santana, Adri\`a Armejach, Francesc Martinez, Erich Focht, Marc Casas

Abstract: GEneral Matrix Multiplications (GEMMs) are recurrent in high-performance computing and deep learning workloads. Typically, high-end CPUs accelerate GEMM workloads with Single-Instruction Multiple Data (SIMD) or vector Instruction Set Architectures (ISAs). Since these ISAs face significant issues when running GEMM workloads, particularly when dealing with small, tall, or skinny matrices, matrix ISAs have been proposed and implemented by major hardware vendors in the last years. Although these matrix ISAs deliver larger throughput when running GEMMs than their SIMD/vector counterparts, they are rigid solutions unable to dynamically adapt themselves to application-specific aspects like the data format. This paper demonstrates that the state-of-the-art matrix ISAs deliver suboptimal performance when running the most commonly used convolution and transformer models. This paper proposes the Matrix Tile Extension (MTE), the first matrix ISA that completely decouples the instruction set architecture from the microarchitecture and seamlessly interacts with existing vector ISAs. MTE incurs minimal implementation overhead since it only requires a few additional instructions and a 64-bit Control Status Register (CSR) to keep its state. Specifically, MTE can i) vectorize GEMMs across the three dimensions M, N, and K; ii) leverage the capacity of the existing vector register file; and iii) decouple the tile shape from the underlying microarchitecture. MTE achieves speed-ups of 1.35x over the best state-of-the-art matrix ISA.

cross UWB TDoA Error Correction using Transformers: Patching and Positional Encoding Strategies

Authors: Dieter Coppens, Adnan Shahid, Eli De Poorter

Abstract: Despite their high accuracy, UWB-based localization systems suffer inaccuracies when deployed in industrial locations with many obstacles due to multipath effects and non-line-of-sight (NLOS) conditions. In such environments, current error mitigation approaches for time difference of arrival (TDoA) localization typically exclude NLOS links. However, this exclusion approach leads to geometric dilution of precision problems and this approach is infeasible when the majority of links are NLOS. To address these limitations, we propose a transformer-based TDoA position correction method that uses raw channel impulse responses (CIRs) from all available anchor nodes to compute position corrections. We introduce different CIR ordering, patching and positional encoding strategies for the transformer, and analyze each proposed technique's scalability and performance gains. Based on experiments on real-world UWB measurements, our approach can provide accuracies of up to 0.39 m in a complex environment consisting of (almost) only NLOS signals, which is an improvement of 73.6 % compared to the TDoA baseline.

cross 2.5D Object Detection for Intelligent Roadside Infrastructure

Authors: Nikolai Polley, Yacin Boualili, Ferdinand M\"utsch, Maximilian Zipfl, Tobias Fleck, J. Marius Z\"ollner

Abstract: On-board sensors of autonomous vehicles can be obstructed, occluded, or limited by restricted fields of view, complicating downstream driving decisions. Intelligent roadside infrastructure perception systems, installed at elevated vantage points, can provide wide, unobstructed intersection coverage, supplying a complementary information stream to autonomous vehicles via vehicle-to-everything (V2X) communication. However, conventional 3D object-detection algorithms struggle to generalize under the domain shift introduced by top-down perspectives and steep camera angles. We introduce a 2.5D object detection framework, tailored specifically for infrastructure roadside-mounted cameras. Unlike conventional 2D or 3D object detection, we employ a prediction approach to detect ground planes of vehicles as parallelograms in the image frame. The parallelogram preserves the planar position, size, and orientation of objects while omitting their height, which is unnecessary for most downstream applications. For training, a mix of real-world and synthetically generated scenes is leveraged. We evaluate generalizability on a held-out camera viewpoint and in adverse-weather scenarios absent from the training set. Our results show high detection accuracy, strong cross-viewpoint generalization, and robustness to diverse lighting and weather conditions. Model weights and inference code are provided at: https://gitlab.kit.edu/kit/aifb/ATKS/public/digit4taf/2.5d-object-detection

URLs: https://gitlab.kit.edu/kit/aifb/ATKS/public/digit4taf/2.5d-object-detection

cross From Street Form to Spatial Justice: Explaining Urban Exercise Inequality via a Triadic SHAP-Informed Framework

Authors: Minwei Zhao, Guosheng Yang, Zhuoni Zhang, Cai Wu

Abstract: Urban streets are essential public spaces that facilitate everyday physical activity and promote health equity. Drawing on Henri Lefebvre's spatial triad, this study proposes a conceptual and methodological framework to quantify street-level exercise deprivation through the dimensions of conceived (planning and structure), perceived (visual and sensory), and lived (practice and experiential) urban spaces. We integrate multi-source spatial data-including street networks, street-view imagery, and social media-using explainable machine learning (SHAP analysis) to classify streets by their dominant deprivation modes, forming a novel typology of spatial inequity. Results highlight significant differences across urban contexts: older city cores predominantly experience infrastructural constraints (conceived space), whereas new development areas suffer from experiential disengagement (lived space). Furthermore, by identifying spatial mismatches between population distribution and exercise intensity, our study reveals localized clusters of latent deprivation. Simulation experiments demonstrate that targeted improvements across spatial dimensions can yield up to 14% increases in exercise supportiveness. This research not only operationalizes Lefebvre's spatial theory at the street scale but also provides actionable insights and intervention guidelines, contributing to the broader goals of spatial justice and urban health equity.

cross SciVid: Cross-Domain Evaluation of Video Models in Scientific Applications

Authors: Yana Hasson, Pauline Luc, Liliane Momeni, Maks Ovsjanikov, Guillaume Le Moing, Alina Kuznetsova, Ira Ktena, Jennifer J. Sun, Skanda Koppula, Dilara Gokay, Joseph Heyward, Etienne Pot, Andrew Zisserman

Abstract: In recent years, there has been a proliferation of spatiotemporal foundation models in different scientific disciplines. While promising, these models are often domain-specific and are only assessed within the particular applications for which they are designed. Given that many tasks can be represented as video modeling problems, video foundation models (ViFMs) hold considerable promise as general-purpose domain-agnostic approaches. However, it is not known whether the knowledge acquired on large-scale but potentially out-of-domain data can be effectively transferred across diverse scientific disciplines, and if a single, pretrained ViFM can be competitive with domain-specific baselines. To address this, we introduce SciVid, a comprehensive benchmark comprising five *Sci*entific *Vid*eo tasks, across medical computer vision, animal behavior, and weather forecasting. We adapt six leading ViFMs to SciVid using simple trainable readout modules, establishing strong baselines and demonstrating the potential for effective transfer learning. Specifically, we show that state-of-the-art results can be obtained in several applications by leveraging the general-purpose representations from ViFM backbones. Furthermore, our results reveal the limitations of existing ViFMs, and highlight opportunities for the development of generalizable models for high-impact scientific applications. We release our code at https://github.com/google-deepmind/scivid to facilitate further research in the development of ViFMs.

URLs: https://github.com/google-deepmind/scivid

cross Is It Time To Treat Prompts As Code? A Multi-Use Case Study For Prompt Optimization Using DSPy

Authors: Francisca Lemos (ALGORITMI Research Centre/LASI, University of Minho), Victor Alves (ALGORITMI Research Centre/LASI, University of Minho), Filipa Ferraz (ALGORITMI Research Centre/LASI, University of Minho)

Abstract: Although prompt engineering is central to unlocking the full potential of Large Language Models (LLMs), crafting effective prompts remains a time-consuming trial-and-error process that relies on human intuition. This study investigates Declarative Self-improving Python (DSPy), an optimization framework that programmatically creates and refines prompts, applied to five use cases: guardrail enforcement, hallucination detection in code, code generation, routing agents, and prompt evaluation. Each use case explores how prompt optimization via DSPy influences performance. While some cases demonstrated modest improvements - such as minor gains in the guardrails use case and selective enhancements in hallucination detection - others showed notable benefits. The prompt evaluation criterion task demonstrated a substantial performance increase, rising accuracy from 46.2% to 64.0%. In the router agent case, the possibility of improving a poorly performing prompt and of a smaller model matching a stronger one through optimized prompting was explored. Although prompt refinement increased accuracy from 85.0% to 90.0%, using the optimized prompt with a cheaper model did not improve performance. Overall, this study's findings suggest that DSPy's systematic prompt optimization can enhance LLM performance, particularly when instruction tuning and example selection are optimized together. However, the impact varies by task, highlighting the importance of evaluating specific use cases in prompt optimization research.

cross From Video to EEG: Adapting Joint Embedding Predictive Architecture to Uncover Visual Concepts in Brain Signal Analysis

Authors: Amirabbas Hojjati, Lu Li, Ibrahim Hameed, Anis Yazidi, Pedro G. Lind, Rabindra Khadka

Abstract: EEG signals capture brain activity with high temporal and low spatial resolution, supporting applications such as neurological diagnosis, cognitive monitoring, and brain-computer interfaces. However, effective analysis is hindered by limited labeled data, high dimensionality, and the absence of scalable models that fully capture spatiotemporal dependencies. Existing self-supervised learning (SSL) methods often focus on either spatial or temporal features, leading to suboptimal representations. To this end, we propose EEG-VJEPA, a novel adaptation of the Video Joint Embedding Predictive Architecture (V-JEPA) for EEG classification. By treating EEG as video-like sequences, EEG-VJEPA learns semantically meaningful spatiotemporal representations using joint embeddings and adaptive masking. To our knowledge, this is the first work that exploits V-JEPA for EEG classification and explores the visual concepts learned by the model. Evaluations on the publicly available Temple University Hospital (TUH) Abnormal EEG dataset show that EEG-VJEPA outperforms existing state-of-the-art models in classification accuracy.Beyond classification accuracy, EEG-VJEPA captures physiologically relevant spatial and temporal signal patterns, offering interpretable embeddings that may support human-AI collaboration in diagnostic workflows. These findings position EEG-VJEPA as a promising framework for scalable, trustworthy EEG analysis in real-world clinical settings.

cross Disentangling the Roles of Representation and Selection in Data Pruning

Authors: Yupei Du, Yingjin Song, Hugh Mee Wong, Daniil Ignatev, Albert Gatt, Dong Nguyen

Abstract: Data pruning, selecting small but impactful subsets, offers a promising way to efficiently scale NLP model training. However, existing methods often involve many different design choices, which have not been systematically studied. This limits future developments. In this work, we decompose data pruning into two key components: the data representation and the selection algorithm, and we systematically analyze their influence on the selection of instances. Our theoretical and empirical results highlight the crucial role of representations: better representations, e.g., training gradients, generally lead to a better selection of instances, regardless of the chosen selection algorithm. Furthermore, different selection algorithms excel in different settings, and none consistently outperforms the others. Moreover, the selection algorithms do not always align with their intended objectives: for example, algorithms designed for the same objective can select drastically different instances, highlighting the need for careful evaluation.

cross Robust estimation of heterogeneous treatment effects in randomized trials leveraging external data

Authors: Rickard Karlsson, Piersilvio De Bartolomeis, Issa J. Dahabreh, Jesse H. Krijthe

Abstract: Randomized trials are typically designed to detect average treatment effects but often lack the statistical power to uncover effect heterogeneity over patient characteristics, limiting their value for personalized decision-making. To address this, we propose the QR-learner, a model-agnostic learner that estimates conditional average treatment effects (CATE) within the trial population by leveraging external data from other trials or observational studies. The proposed method is robust: it has the potential to reduce the CATE prediction mean squared error while maintaining consistency, even when the external data is not aligned with the trial. Moreover, we introduce a procedure that combines the QR-learner with a trial-only CATE learner and show that it asymptotically matches or exceeds the trial-only learner in terms of mean squared error. We examine the performance of our approach in simulation studies and apply the methods to a real-world dataset, demonstrating improvements in both CATE estimation and statistical power for detecting heterogeneous effects.

cross Towards Machine Theory of Mind with Large Language Model-Augmented Inverse Planning

Authors: Rebekah A. Gelp\'i, Eric Xue, William A. Cunningham

Abstract: We propose a hybrid approach to machine Theory of Mind (ToM) that uses large language models (LLMs) as a mechanism for generating hypotheses and likelihood functions with a Bayesian inverse planning model that computes posterior probabilities for an agent's likely mental states given its actions. Bayesian inverse planning models can accurately predict human reasoning on a variety of ToM tasks, but these models are constrained in their ability to scale these predictions to scenarios with a large number of possible hypotheses and actions. Conversely, LLM-based approaches have recently demonstrated promise in solving ToM benchmarks, but can exhibit brittleness and failures on reasoning tasks even when they pass otherwise structurally identical versions. By combining these two methods, this approach leverages the strengths of each component, closely matching optimal results on a task inspired by prior inverse planning models and improving performance relative to models that utilize LLMs alone or with chain-of-thought prompting, even with smaller LLMs that typically perform poorly on ToM tasks. We also exhibit the model's potential to predict mental states on open-ended tasks, offering a promising direction for future development of ToM models and the creation of socially intelligent generative agents.

cross A Resource Efficient Quantum Kernel

Authors: Utkarsh Singh, Marco Armenta, Jean-Fr\'ed\'eric Laprade, Aaron Z. Goldberg, Khabat Heshami

Abstract: Quantum processors may enhance machine learning by mapping high-dimensional data onto quantum systems for processing. Conventional quantum kernels, or feature maps, for encoding data features onto a quantum circuit are currently impractical, as the number of entangling gates scales quadratically with the dimension of the dataset and the number of qubits. In this work, we introduce a quantum kernel designed to handle high-dimensional data with a significantly reduced number of qubits and entangling operations. Our approach preserves essential data characteristics while promoting computational efficiency, as evidenced by extensive experiments on benchmark datasets that demonstrate a marked improvement in both accuracy and resource utilization, as compared to state-of-the-art quantum feature maps. Our noisy simulations results combined with lower resource requirements highlight our kernel's ability to function within the constraints of noisy intermediate-scale quantum devices. Through numerical simulations and small-scale implementation on a superconducting circuit quantum computing platform, we demonstrate that our scheme performs on par or better than a set of classical algorithms for classification. Our findings herald a promising avenue for the practical implementation of quantum machine learning algorithms on near future quantum computing platforms.

cross Differentially private scale testing via rank transformations and percentile modifications

Authors: Joshua Levine, Kelly Ramsay

Abstract: We develop a class of differentially private two-sample scale tests, called the rank-transformed percentile-modified Siegel--Tukey tests, or RPST tests. These RPST tests are inspired both by recent differentially private extensions of some common rank tests and some older modifications to non-private rank tests. We present the asymptotic distribution of the RPST test statistic under the null hypothesis, under a very general condition on the rank transformation. We also prove RPST tests are differentially private, and that their type I error does not exceed the given level. We uncover that the growth rate of the rank transformation presents a tradeoff between power and sensitivity. We do extensive simulations to investigate the effects of the tuning parameters and compare to a general private testing framework. Lastly, we show that our techniques can also be used to improve the differentially private signed-rank test.

cross Less is More: Empowering GUI Agent with Context-Aware Simplification

Authors: Gongwei Chen, Xurui Zhou, Rui Shao, Yibo Lyu, Kaiwen Zhou, Shuai Wang, Wentao Li, Yinchuan Li, Zhongang Qi, Liqiang Nie

Abstract: The research focus of GUI agents is shifting from text-dependent to pure-vision-based approaches, which, though promising, prioritize comprehensive pre-training data collection while neglecting contextual modeling challenges. We probe the characteristics of element and history contextual modeling in GUI agent and summarize: 1) the high-density and loose-relation of element context highlight the existence of many unrelated elements and their negative influence; 2) the high redundancy of history context reveals the inefficient history modeling in current GUI agents. In this work, we propose a context-aware simplification framework for building an efficient and effective GUI Agent, termed SimpAgent. To mitigate potential interference from numerous unrelated elements, we introduce a masking-based element pruning method that circumvents the intractable relation modeling through an efficient masking mechanism. To reduce the redundancy in historical information, we devise a consistency-guided history compression module, which enhances implicit LLM-based compression through innovative explicit guidance, achieving an optimal balance between performance and efficiency. With the above components, SimpAgent reduces 27% FLOPs and achieves superior GUI navigation performances. Comprehensive navigation experiments across diverse web and mobile environments demonstrate the effectiveness and potential of our agent.

cross Transforming Calabi-Yau Constructions: Generating New Calabi-Yau Manifolds with Transformers

Authors: Jacky H. T. Yip, Charles Arnal, Francois Charton, Gary Shiu

Abstract: Fine, regular, and star triangulations (FRSTs) of four-dimensional reflexive polytopes give rise to toric varieties, within which generic anticanonical hypersurfaces yield smooth Calabi-Yau threefolds. We employ transformers -- deep learning models originally developed for language modeling -- to generate FRSTs across a range of polytope sizes. Our models exhibit efficient and unbiased sampling, and can self-improve through retraining on their own output. These results lay the foundation for AICY: a community-driven platform that combines self-improving machine learning models with a continuously expanding FRST database to explore and catalog the Calabi-Yau landscape.

cross Inverse Synthetic Aperture Fourier Ptychography

Authors: Matthew A. Chan, Casey J. Pellizzari, Christopher A. Metzler

Abstract: Fourier ptychography (FP) is a powerful light-based synthetic aperture imaging technique that allows one to reconstruct a high-resolution, wide field-of-view image by computationally integrating a diverse collection of low-resolution, far-field measurements. Typically, FP measurement diversity is introduced by changing the angle of the illumination or the position of the camera; either approach results in sampling different portions of the target's spatial frequency content, but both approaches introduce substantial costs and complexity to the acquisition process. In this work, we introduce Inverse Synthetic Aperture Fourier Ptychography, a novel approach to FP that foregoes changing the illumination angle or camera position and instead generates measurement diversity through target motion. Critically, we also introduce a novel learning-based method for estimating k-space coordinates from dual plane intensity measurements, thereby enabling synthetic aperture imaging without knowing the rotation of the target. We experimentally validate our method in simulation and on a tabletop optical system.

cross Determination of Particle-Size Distributions from Light-Scattering Measurement Using Constrained Gaussian Process Regression

Authors: Fahime Seyedheydari, Mahdi Nasiri, Marcin Mi\'nkowski, Simo S\"arkk\"a

Abstract: In this work, we propose a novel methodology for robustly estimating particle size distributions from optical scattering measurements using constrained Gaussian process regression. The estimation of particle size distributions is commonly formulated as a Fredholm integral equation of the first kind, an ill-posed inverse problem characterized by instability due to measurement noise and limited data. To address this, we use a Gaussian process prior to regularize the solution and integrate a normalization constraint into the Gaussian process via two approaches: by constraining the Gaussian process using a pseudo-measurement and by using Lagrange multipliers in the equivalent optimization problem. To improve computational efficiency, we employ a spectral expansion of the covariance kernel using eigenfunctions of the Laplace operator, resulting in a computationally tractable low-rank representation without sacrificing accuracy. Additionally, we investigate two complementary strategies for hyperparameter estimation: a data-driven approach based on maximizing the unconstrained log marginal likelihood, and an alternative approach where the physical constraints are taken into account. Numerical experiments demonstrate that the proposed constrained Gaussian process regression framework accurately reconstructs particle size distributions, producing numerically stable, smooth, and physically interpretable results. This methodology provides a principled and efficient solution for addressing inverse scattering problems and related ill-posed integral equations.

cross StreamDiT: Real-Time Streaming Text-to-Video Generation

Authors: Akio Kodaira, Tingbo Hou, Ji Hou, Masayoshi Tomizuka, Yue Zhao

Abstract: Recently, great progress has been achieved in text-to-video (T2V) generation by scaling transformer-based diffusion models to billions of parameters, which can generate high-quality videos. However, existing models typically produce only short clips offline, restricting their use cases in interactive and real-time applications. This paper addresses these challenges by proposing StreamDiT, a streaming video generation model. StreamDiT training is based on flow matching by adding a moving buffer. We design mixed training with different partitioning schemes of buffered frames to boost both content consistency and visual quality. StreamDiT modeling is based on adaLN DiT with varying time embedding and window attention. To practice the proposed method, we train a StreamDiT model with 4B parameters. In addition, we propose a multistep distillation method tailored for StreamDiT. Sampling distillation is performed in each segment of a chosen partitioning scheme. After distillation, the total number of function evaluations (NFEs) is reduced to the number of chunks in a buffer. Finally, our distilled model reaches real-time performance at 16 FPS on one GPU, which can generate video streams at 512p resolution. We evaluate our method through both quantitative metrics and human evaluation. Our model enables real-time applications, e.g. streaming generation, interactive generation, and video-to-video. We provide video results and more examples in our project website: https://cumulo-autumn.github.io/StreamDiT/

URLs: https://cumulo-autumn.github.io/StreamDiT/

cross Implicit Regularisation in Diffusion Models: An Algorithm-Dependent Generalisation Analysis

Authors: Tyler Farghly, Patrick Rebeschini, George Deligiannidis, Arnaud Doucet

Abstract: The success of denoising diffusion models raises important questions regarding their generalisation behaviour, particularly in high-dimensional settings. Notably, it has been shown that when training and sampling are performed perfectly, these models memorise training data -- implying that some form of regularisation is essential for generalisation. Existing theoretical analyses primarily rely on algorithm-independent techniques such as uniform convergence, heavily utilising model structure to obtain generalisation bounds. In this work, we instead leverage the algorithmic aspects that promote generalisation in diffusion models, developing a general theory of algorithm-dependent generalisation for this setting. Borrowing from the framework of algorithmic stability, we introduce the notion of score stability, which quantifies the sensitivity of score-matching algorithms to dataset perturbations. We derive generalisation bounds in terms of score stability, and apply our framework to several fundamental learning settings, identifying sources of regularisation. In particular, we consider denoising score matching with early stopping (denoising regularisation), sampler-wide coarse discretisation (sampler regularisation) and optimising with SGD (optimisation regularisation). By grounding our analysis in algorithmic properties rather than model structure, we identify multiple sources of implicit regularisation unique to diffusion models that have so far been overlooked in the literature.

cross Sequential Regression Learning with Randomized Algorithms

Authors: Dorival Le\~ao, Reiko Aoki, Teh Led Red

Abstract: This paper presents ``randomized SINDy", a sequential machine learning algorithm designed for dynamic data that has a time-dependent structure. It employs a probabilistic approach, with its PAC learning property rigorously proven through the mathematical theory of functional analysis. The algorithm dynamically predicts using a learned probability distribution of predictors, updating weights via gradient descent and a proximal algorithm to maintain a valid probability density. Inspired by SINDy (Brunton et al. 2016), it incorporates feature augmentation and Tikhonov regularization. For multivariate normal weights, the proximal step is omitted to focus on parameter estimation. The algorithm's effectiveness is demonstrated through experimental results in regression and binary classification using real-world data.

cross Causal Evidence for the Primordiality of Colors in Trans-Neptunian Objects

Authors: Benjamin L. Davis, Mohamad Ali-Dib, Yujia Zheng, Zehao Jin, Kun Zhang, Andrea Valerio Macci\`o

Abstract: The origins of the colors of Trans-Neptunian Objects (TNOs) represent a crucial unresolved question, central to understanding the history of our Solar System. Recent observational surveys have revealed correlations between the eccentricity and inclination of TNOs and their colors. This has rekindled the long-standing debate on whether these colors reflect the conditions of TNO formation or their subsequent collisional evolution. In this study, we address this question with 98.7% certainty, using a model-agnostic, data-driven approach based on causal graphs. First, as a sanity check, we demonstrate how our model can replicate the currently accepted paradigms of TNOs' dynamical history, blindly and without any orbital modeling or physics-based assumptions. In fact, our causal model (with no knowledge of the existence of Neptune) predicts the existence of an unknown perturbing body, i.e., Neptune. We then show how this model predicts, with high certainty, that the color of TNOs is the root cause of their inclination distribution, rather than the other way around. This strongly suggests that the colors of TNOs reflect an underlying dynamical property, most likely their formation location. Moreover, our causal model excludes formation scenarios that invoke substantial color modification by subsequent irradiation. We therefore conclude that the colors of TNOs are predominantly primordial.

cross FastDINOv2: Frequency Based Curriculum Learning Improves Robustness and Training Speed

Authors: Jiaqi Zhang, Juntuo Wang, Zhixin Sun, John Zou, Randall Balestriero

Abstract: Large-scale vision foundation models such as DINOv2 boast impressive performances by leveraging massive architectures and training datasets. But numerous scenarios require practitioners to reproduce those pre-training solutions, such as on private data, new modalities, or simply for scientific questioning--which is currently extremely demanding computation-wise. We thus propose a novel pre-training strategy for DINOv2 that simultaneously accelerates convergence--and strengthens robustness to common corruptions as a by-product. Our approach involves a frequency filtering curriculum--low-frequency being seen first--and the Gaussian noise patching augmentation. Applied to a ViT-B/16 backbone trained on ImageNet-1K, while pre-training time and FLOPs are reduced by 1.6x and 2.25x, our method still achieves matching robustness in corruption benchmarks (ImageNet-C) and maintains competitive linear probing performance compared with baseline. This dual benefit of efficiency and robustness makes large-scale self-supervised foundation modeling more attainable, while opening the door to novel exploration around data curriculum and augmentation as means to improve self-supervised learning models robustness. The code is available at https://github.com/KevinZ0217/fast_dinov2

URLs: https://github.com/KevinZ0217/fast_dinov2

cross Efficient and Effective Query Context-Aware Learning-to-Rank Model for Sequential Recommendation

Authors: Andrii Dzhoha, Alisa Mironenko, Vladimir Vlasov, Maarten Versteegh, Marjan Celikik

Abstract: Modern sequential recommender systems commonly use transformer-based models for next-item prediction. While these models demonstrate a strong balance between efficiency and quality, integrating interleaving features - such as the query context (e.g., browse category) under which next-item interactions occur - poses challenges. Effectively capturing query context is crucial for refining ranking relevance and enhancing user engagement, as it provides valuable signals about user intent within a session. Unlike an item's features, query context is not temporally aligned with the item sequence, making its incorporation into transformers challenging and error-prone. This paper analyzes different strategies for incorporating query context into transformers trained with a causal language modeling procedure as a case study. We propose a new method that effectively fuses the item sequence with query context within the attention mechanism. Through extensive offline and online experiments on a large-scale online platform and open datasets, we present evidence that our proposed method is an effective approach for integrating query context to improve model ranking quality in terms of relevance and diversity.

cross Coil Geometry Learning for Short-Range Magnetic Actuation

Authors: Yuta Takahashi, Hayate Tajima, Shin-ichiro Sakai

Abstract: Fuel-free docking is a key operational technology for in-space assembly, resupplying space stations, sample return missions, and formation keeping of large-scale satellite swarms. The use of conventional propulsion systems, including thrusters, can cause adverse effects at short distances, such as sensor contamination, which may lead to the failure of the satellite or onboard equipment. The magnetic field interaction control generated by magnetorquers can overcome these weaknesses of propulsion. This actuation enables simultaneous control of attitude and formation control among desired satellite groups. The previous study typically uses the traditional dipole approximation model of the exact magnetic field to reduce computation cost. However, proximity operations often involve relatively short distances between satellites, which can easily compromise the effectiveness of this approximation. To avoid model errors that could result in satellite collisions, we utilize a magnetic field model described by Biot-Savart's law, without distance approximations (Near-field model), in consideration of short-distance operations. To overcome the high computational cost associated with the coil geometry and relative states information, a learning-based magnetic field approximation is derived, and its effectiveness is shown in the docking simulation of target and chaser satellites equipped with electromagnetic coils on three axes. Our method significantly reduces the computational cost of the exact magnetic model and possesses scalability that can accommodate an increasing number of target satellites through parallel processing.

cross Leveraging Large Language Models for Tacit Knowledge Discovery in Organizational Contexts

Authors: Gianlucca Zuin, Saulo Mastelini, T\'ulio Loures, Adriano Veloso

Abstract: Documenting tacit knowledge in organizations can be a challenging task due to incomplete initial information, difficulty in identifying knowledgeable individuals, the interplay of formal hierarchies and informal networks, and the need to ask the right questions. To address this, we propose an agent-based framework leveraging large language models (LLMs) to iteratively reconstruct dataset descriptions through interactions with employees. Modeling knowledge dissemination as a Susceptible-Infectious (SI) process with waning infectivity, we conduct 864 simulations across various synthetic company structures and different dissemination parameters. Our results show that the agent achieves 94.9% full-knowledge recall, with self-critical feedback scores strongly correlating with external literature critic scores. We analyze how each simulation parameter affects the knowledge retrieval process for the agent. In particular, we find that our approach is able to recover information without needing to access directly the only domain specialist. These findings highlight the agent's ability to navigate organizational complexity and capture fragmented knowledge that would otherwise remain inaccessible.

cross Interpretable Diffusion Models with B-cos Networks

Authors: Nicola Bernold, Moritz Vandenhirtz, Alice Bizeul, Julia E. Vogt

Abstract: Text-to-image diffusion models generate images by iteratively denoising random noise, conditioned on a prompt. While these models have enabled impressive progress in image generation, they often fail to accurately reflect all semantic information described in the prompt -- failures that are difficult to detect automatically. In this work, we introduce a diffusion model architecture built with B-cos modules that offers inherent interpretability. Our approach provides insight into how individual prompt tokens affect the generated image by producing explanations that highlight the pixel regions influenced by each token. We demonstrate that B-cos diffusion models can produce high-quality images while providing meaningful insights into prompt-image alignment.

cross Continual Recommender Systems

Authors: Hyunsik Yoo, SeongKu Kang, Hanghang Tong

Abstract: Modern recommender systems operate in uniquely dynamic settings: user interests, item pools, and popularity trends shift continuously, and models must adapt in real time without forgetting past preferences. While existing tutorials on continual or lifelong learning cover broad machine learning domains (e.g., vision and graphs), they do not address recommendation-specific demands-such as balancing stability and plasticity per user, handling cold-start items, and optimizing recommendation metrics under streaming feedback. This tutorial aims to make a timely contribution by filling that gap. We begin by reviewing the background and problem settings, followed by a comprehensive overview of existing approaches. We then highlight recent efforts to apply continual learning to practical deployment environments, such as resource-constrained systems and sequential interaction settings. Finally, we discuss open challenges and future research directions. We expect this tutorial to benefit researchers and practitioners in recommender systems, data mining, AI, and information retrieval across academia and industry.

cross OrthoRank: Token Selection via Sink Token Orthogonality for Efficient LLM inference

Authors: Seungjun Shin, Jaehoon Oh, Dokwan Oh

Abstract: Attention mechanisms are central to the success of large language models (LLMs), enabling them to capture intricate token dependencies and implicitly assign importance to each token. Recent studies have revealed the sink token, which receives disproportionately high attention despite their limited semantic role. In this paper, we first expand the relationship between the sink token and other tokens, moving beyond attention to explore their similarity in hidden states, considering the layer depth. We observe that as the layers get deeper, the cosine similarity between the normalized hidden states of the sink token and those of other tokens increases, and that the normalized hidden states of the sink token exhibit negligible changes. These imply that other tokens consistently are directed toward the sink token throughout the layers. Next, we propose a dynamic token selection method, called OrthoRank, using these findings to select important tokens. Specifically, in a certain layer, we define token importance by the speed at which the token moves toward the sink token. This is converted into orthogonality with the sink token, meaning that tokens that are more orthogonal to the sink token are assigned greater importance. Finally, through extensive experiments, we demonstrated that our method results in lower perplexity and higher zero-shot accuracy compared to layer pruning methods at the same sparsity ratio with comparable throughput, while also achieving superior performance on LongBench.

cross Leo Breiman, the Rashomon Effect, and the Occam Dilemma

Authors: Cynthia Rudin

Abstract: In the famous Two Cultures paper, Leo Breiman provided a visionary perspective on the cultures of ''data models'' (modeling with consideration of data generation) versus ''algorithmic models'' (vanilla machine learning models). I provide a modern perspective on these approaches. One of Breiman's key arguments against data models is the ''Rashomon Effect,'' which is the existence of many different-but-equally-good models. The Rashomon Effect implies that data modelers would not be able to determine which model generated the data. Conversely, one of his core advantages in favor of data models is simplicity, as he claimed there exists an ''Occam Dilemma,'' i.e., an accuracy-simplicity tradeoff. After 25 years of powerful computers, it has become clear that this claim is not generally true, in that algorithmic models do not need to be complex to be accurate; however, there are nuances that help explain Breiman's logic, specifically, that by ''simple,'' he appears to consider only linear models or unoptimized decision trees. Interestingly, the Rashomon Effect is a key tool in proving the nullification of the Occam Dilemma. To his credit though, Breiman did not have the benefit of modern computers, with which my observations are much easier to make. Breiman's goal for interpretability was somewhat intertwined with causality: simpler models can help reveal which variables have a causal relationship with the outcome. However, I argue that causality can be investigated without the use of single models, whether or not they are simple. Interpretability is useful in its own right, and I think Breiman knew that too. Technically, my modern perspective does not belong to either of Breiman's Two Cultures, but shares the goals of both of them - causality, simplicity, accuracy - and shows that these goals can be accomplished in other ways, without the limitations Breiman was concerned about.

cross Optimizing Age of Trust and Throughput in Multi-Hop UAV-Aided IoT Networks

Authors: Yizhou Luo, Kwan-Wu Chin, Ruyi Guan, Xi Xiao, Caimeng Wang, Jingyin Feng, Tengjiao He

Abstract: Devices operating in Internet of Things (IoT) networks may be deployed across vast geographical areas and interconnected via multi-hop communications. Further, they may be unguarded. This makes them vulnerable to attacks and motivates operators to check on devices frequently. To this end, we propose and study an Unmanned Aerial Vehicle (UAV)-aided attestation framework for use in IoT networks with a charging station powered by solar. A key challenge is optimizing the trajectory of the UAV to ensure it attests as many devices as possible. A trade-off here is that devices being checked by the UAV are offline, which affects the amount of data delivered to a gateway. Another challenge is that the charging station experiences time-varying energy arrivals, which in turn affect the flight duration and charging schedule of the UAV. To address these challenges, we employ a Deep Reinforcement Learning (DRL) solution to optimize the UAV's charging schedule and the selection of devices to be attested during each flight. The simulation results show that our solution reduces the average age of trust by 88% and throughput loss due to attestation by 30%.

cross A Comparative Study of Specialized LLMs as Dense Retrievers

Authors: Hengran Zhang, Keping Bi, Jiafeng Guo

Abstract: While large language models (LLMs) are increasingly deployed as dense retrievers, the impact of their domain-specific specialization on retrieval effectiveness remains underexplored. This investigation systematically examines how task-specific adaptations in LLMs influence their retrieval capabilities, an essential step toward developing unified retrievers capable of handling text, code, images, and multimodal content. We conduct extensive experiments with eight Qwen2.5 7B LLMs, including base, instruction-tuned, code/math-specialized, long reasoning, and vision-language models across zero-shot retrieval settings and the supervised setting. For the zero-shot retrieval settings, we consider text retrieval from the BEIR benchmark and code retrieval from the CoIR benchmark. Further, to evaluate supervised performance, all LLMs are fine-tuned on the MS MARCO dataset. We find that mathematical specialization and the long reasoning capability cause consistent degradation in three settings, indicating conflicts between mathematical reasoning and semantic matching. The vision-language model and code-specialized LLMs demonstrate superior zero-shot performance compared to other LLMs, even surpassing BM25 on the code retrieval task, and maintain comparable performance to base LLMs in supervised settings. These findings suggest promising directions for the unified retrieval task leveraging cross-domain and cross-modal fusion.

cross MalVol-25: A Diverse, Labelled and Detailed Volatile Memory Dataset for Malware Detection and Response Testing and Validation

Authors: Dipo Dunsin, Mohamed Chahine Ghanem, Eduardo Almeida Palmieri

Abstract: This paper addresses the critical need for high-quality malware datasets that support advanced analysis techniques, particularly machine learning and agentic AI frameworks. Existing datasets often lack diversity, comprehensive labelling, and the complexity necessary for effective machine learning and agent-based AI training. To fill this gap, we developed a systematic approach for generating a dataset that combines automated malware execution in controlled virtual environments with dynamic monitoring tools. The resulting dataset comprises clean and infected memory snapshots across multiple malware families and operating systems, capturing detailed behavioural and environmental features. Key design decisions include applying ethical and legal compliance, thorough validation using both automated and manual methods, and comprehensive documentation to ensure replicability and integrity. The dataset's distinctive features enable modelling system states and transitions, facilitating RL-based malware detection and response strategies. This resource is significant for advancing adaptive cybersecurity defences and digital forensic research. Its scope supports diverse malware scenarios and offers potential for broader applications in incident response and automated threat mitigation.

cross Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents

Authors: Ziyang Miao, Qiyu Sun, Jingyuan Wang, Yuchen Gong, Yaowei Zheng, Shiqi Li, Richong Zhang

Abstract: Large language models (LLMs) have shown impressive performance on general-purpose tasks, yet adapting them to specific domains remains challenging due to the scarcity of high-quality domain data. Existing data synthesis tools often struggle to extract reliable fine-tuning data from heterogeneous documents effectively. To address this limitation, we propose Easy Dataset, a unified framework for synthesizing fine-tuning data from unstructured documents via an intuitive graphical user interface (GUI). Specifically, Easy Dataset allows users to easily configure text extraction models and chunking strategies to transform raw documents into coherent text chunks. It then leverages a persona-driven prompting approach to generate diverse question-answer pairs using public-available LLMs. Throughout the pipeline, a human-in-the-loop visual interface facilitates the review and refinement of intermediate outputs to ensure data quality. Experiments on a financial question-answering task show that fine-tuning LLMs on the synthesized dataset significantly improves domain-specific performance while preserving general knowledge. The source code and installable package are available at https://github.com/ConardLi/easy-dataset and have garnered over 9,000 GitHub stars.

URLs: https://github.com/ConardLi/easy-dataset

cross Beyond Independent Passages: Adaptive Passage Combination Retrieval for Retrieval Augmented Open-Domain Question Answering

Authors: Ting-Wen Ko, Jyun-Yu Jiang, Pu-Jen Cheng

Abstract: Retrieval-augmented generation (RAG) enhances large language models (LLMs) by incorporating external documents at inference time, enabling up-to-date knowledge access without costly retraining. However, conventional RAG methods retrieve passages independently, often leading to redundant, noisy, or insufficiently diverse context-particularly problematic - particularly problematic in noisy corpora and for multi-hop questions. To address this, we propose Adaptive Passage Combination Retrieval (AdaPCR), a novel framework for open-domain question answering with black-box LMs. AdaPCR explicitly models dependencies between passages by considering passage combinations as units for retrieval and reranking. It consists of a context-aware query reformulation using concatenated passages, and a reranking step trained with a predictive objective aligned with downstream answer likelihood. Crucially, AdaPCR adaptively selects the number of retrieved passages without additional stopping modules. Experiments across several QA benchmarks show that AdaPCR outperforms baselines, particularly in multi-hop reasoning, demonstrating the effectiveness of modeling inter-passage dependencies for improved retrieval.

cross How to Train Your LLM Web Agent: A Statistical Diagnosis

Authors: Dheeraj Vattikonda, Santhoshi Ravichandran, Emiliano Penaloza, Hadi Nekoei, Megh Thakkar, Thibault Le Sellier de Chezelles, Nicolas Gontier, Miguel Mu\~noz-M\'armol, Sahar Omidi Shayegan, Stefania Raimondo, Xue Liu, Alexandre Drouin, Laurent Charlin, Alexandre Pich\'e, Alexandre Lacoste, Massimo Caccia

Abstract: LLM-based web agents have recently made significant progress, but much of it has occurred in closed-source systems, widening the gap with open-source alternatives. Progress has been held back by two key challenges: first, a narrow focus on single-step tasks that overlooks the complexity of multi-step web interactions; and second, the high compute costs required to post-train LLM-based web agents. To address this, we present the first statistically grounded study on compute allocation for LLM web-agent post-training. Our approach uses a two-stage pipeline, training a Llama 3.1 8B student to imitate a Llama 3.3 70B teacher via supervised fine-tuning (SFT), followed by on-policy reinforcement learning. We find this process highly sensitive to hyperparameter choices, making exhaustive sweeps impractical. To spare others from expensive trial-and-error, we sample 1,370 configurations and use bootstrapping to estimate effective hyperparameters. Our results show that combining SFT with on-policy RL consistently outperforms either approach alone on both WorkArena and MiniWob++. Further, this strategy requires only 55% of the compute to match the peak performance of pure SFT on MiniWob++, effectively pushing the compute-performance Pareto frontier, and is the only strategy that can close the gap with closed-source models.

cross Model selection for stochastic dynamics: a parsimonious and principled approach

Authors: Andonis Gerardos

Abstract: This thesis focuses on the discovery of stochastic differential equations (SDEs) and stochastic partial differential equations (SPDEs) from noisy and discrete time series. A major challenge is selecting the simplest possible correct model from vast libraries of candidate models, where standard information criteria (AIC, BIC) are often limited. We introduce PASTIS (Parsimonious Stochastic Inference), a new information criterion derived from extreme value theory. Its penalty term, $n_\mathcal{B} \ln(n_0/p)$, explicitly incorporates the size of the initial library of candidate parameters ($n_0$), the number of parameters in the considered model ($n_\mathcal{B}$), and a significance threshold ($p$). This significance threshold represents the probability of selecting a model containing more parameters than necessary when comparing many models. Benchmarks on various systems (Lorenz, Ornstein-Uhlenbeck, Lotka-Volterra for SDEs; Gray-Scott for SPDEs) demonstrate that PASTIS outperforms AIC, BIC, cross-validation (CV), and SINDy (a competing method) in terms of exact model identification and predictive capability. Furthermore, real-world data can be subject to large sampling intervals ($\Delta t$) or measurement noise ($\sigma$), which can impair model learning and selection capabilities. To address this, we have developed robust variants of PASTIS, PASTIS-$\Delta t$ and PASTIS-$\sigma$, thus extending the applicability of the approach to imperfect experimental data. PASTIS thus provides a statistically grounded, validated, and practical methodological framework for discovering simple models for processes with stochastic dynamics.

cross Online Convex Optimization with Switching Cost with Only One Single Gradient Evaluation

Authors: Harsh Shah, Purna Chandrasekhar, Rahul Vaze

Abstract: Online convex optimization with switching cost is considered under the frugal information setting where at time $t$, before action $x_t$ is taken, only a single function evaluation and a single gradient is available at the previously chosen action $x_{t-1}$ for either the current cost function $f_t$ or the most recent cost function $f_{t-1}$. When the switching cost is linear, online algorithms with optimal order-wise competitive ratios are derived for the frugal setting. When the gradient information is noisy, an online algorithm whose competitive ratio grows quadratically with the noise magnitude is derived.

cross Token Level Hallucination Detection via Variance in Language Models

Authors: Keshav Kumar

Abstract: Large Language Models (LLMs) have demonstrated impressive generative capabilities across diverse tasks but remain susceptible to hallucinations, confidently generated yet factually incorrect outputs. We introduce a reference-free, token-level hallucination detection framework that leverages the variance in token log-probabilities across multiple stochastic generations. Unlike prior methods that require ground-truth references or sentence-level verification, our approach is model-agnostic, interpretable, and suited for real-time or post-hoc analysis. We evaluate our method on unanswerable question prompts from the SQuAD v2 dataset and benchmark across three autoregressive models of varying scales: GPT-Neo 125M, Falcon 1B, and Mistral 7B. Through both quantitative metrics and visual diagnostics, we show that token-level variance reliably highlights instability in model outputs and correlates with hallucination patterns. Our framework is lightweight, reproducible, and adaptable to multiple domains, offering a valuable diagnostic tool for analyzing generative reliability in LLMs.

cross Driver-Net: Multi-Camera Fusion for Assessing Driver Take-Over Readiness in Automated Vehicles

Authors: Mahdi Rezaei, Mohsen Azarmi

Abstract: Ensuring safe transition of control in automated vehicles requires an accurate and timely assessment of driver readiness. This paper introduces Driver-Net, a novel deep learning framework that fuses multi-camera inputs to estimate driver take-over readiness. Unlike conventional vision-based driver monitoring systems that focus on head pose or eye gaze, Driver-Net captures synchronised visual cues from the driver's head, hands, and body posture through a triple-camera setup. The model integrates spatio-temporal data using a dual-path architecture, comprising a Context Block and a Feature Block, followed by a cross-modal fusion strategy to enhance prediction accuracy. Evaluated on a diverse dataset collected from the University of Leeds Driving Simulator, the proposed method achieves an accuracy of up to 95.8% in driver readiness classification. This performance significantly enhances existing approaches and highlights the importance of multimodal and multi-view fusion. As a real-time, non-intrusive solution, Driver-Net contributes meaningfully to the development of safer and more reliable automated vehicles and aligns with new regulatory mandates and upcoming safety standards.

cross Pedestrian Intention Prediction via Vision-Language Foundation Models

Authors: Mohsen Azarmi, Mahdi Rezaei, He Wang

Abstract: Prediction of pedestrian crossing intention is a critical function in autonomous vehicles. Conventional vision-based methods of crossing intention prediction often struggle with generalizability, context understanding, and causal reasoning. This study explores the potential of vision-language foundation models (VLFMs) for predicting pedestrian crossing intentions by integrating multimodal data through hierarchical prompt templates. The methodology incorporates contextual information, including visual frames, physical cues observations, and ego-vehicle dynamics, into systematically refined prompts to guide VLFMs effectively in intention prediction. Experiments were conducted on three common datasets-JAAD, PIE, and FU-PIP. Results demonstrate that incorporating vehicle speed, its variations over time, and time-conscious prompts significantly enhances the prediction accuracy up to 19.8%. Additionally, optimised prompts generated via an automatic prompt engineering framework yielded 12.5% further accuracy gains. These findings highlight the superior performance of VLFMs compared to conventional vision-based models, offering enhanced generalisation and contextual understanding for autonomous driving applications.

cross Physics-informed neural networks and neural operators for a study of EUV electromagnetic wave diffraction from a lithography mask

Authors: Vasiliy A. Es'kin, Egor V. Ivanov

Abstract: Physics-informed neural networks (PINNs) and neural operators (NOs) for solving the problem of diffraction of Extreme Ultraviolet (EUV) electromagnetic waves from a mask are presented. A novel hybrid Waveguide Neural Operator (WGNO) is introduced, which is based on a waveguide method with its most computationally expensive part replaced by a neural network. Numerical experiments on realistic 2D and 3D masks show that the WGNO achieves state-of-the-art accuracy and inference time, providing a highly efficient solution for accelerating the design workflows of lithography masks.

cross Efficient Detection of Intermittent Job Failures Using Few-Shot Learning

Authors: Henri A\"idasso, Francis Bordeleau, Ali Tizghadam

Abstract: One of the main challenges developers face in the use of continuous integration (CI) and deployment pipelines is the occurrence of intermittent job failures, which result from unexpected non-deterministic issues (e.g., flaky tests or infrastructure problems) rather than regular code-related errors such as bugs. Prior studies developed machine learning (ML) models trained on large datasets of job logs to classify job failures as either intermittent or regular. As an alternative to costly manual labeling of large datasets, the state-of-the-art (SOTA) approach leveraged a heuristic based on non-deterministic job reruns. However, this method mislabels intermittent job failures as regular in contexts where rerunning suspicious job failures is not an explicit policy, and therefore limits the SOTA's performance in practice. In fact, our manual analysis of 2,125 job failures from 5 industrial and 1 open-source projects reveals that, on average, 32% of intermittent job failures are mislabeled as regular. To address these limitations, this paper introduces a novel approach to intermittent job failure detection using few-shot learning (FSL). Specifically, we fine-tune a small language model using a few number of manually labeled log examples to generate rich embeddings, which are then used to train an ML classifier. Our FSL-based approach achieves 70-88% F1-score with only 12 shots in all projects, outperforming the SOTA, which proved ineffective (34-52% F1-score) in 4 projects. Overall, this study underlines the importance of data quality over quantity and provides a more efficient and practical framework for the detection of intermittent job failures in organizations.

cross Where to Intervene: Action Selection in Deep Reinforcement Learning

Authors: Wenbo Zhang, Hengrui Cai

Abstract: Deep reinforcement learning (RL) has gained widespread adoption in recent years but faces significant challenges, particularly in unknown and complex environments. Among these, high-dimensional action selection stands out as a critical problem. Existing works often require a sophisticated prior design to eliminate redundancy in the action space, relying heavily on domain expert experience or involving high computational complexity, which limits their generalizability across different RL tasks. In this paper, we address these challenges by proposing a general data-driven action selection approach with model-free and computationally friendly properties. Our method not only selects minimal sufficient actions but also controls the false discovery rate via knockoff sampling. More importantly, we seamlessly integrate the action selection into deep RL methods during online training. Empirical experiments validate the established theoretical guarantees, demonstrating that our method surpasses various alternative techniques in terms of both performance in variable selection and overall achieved rewards.

cross Mixed-Sample SGD: an End-to-end Analysis of Supervised Transfer Learning

Authors: Yuyang Deng, Samory Kpotufe

Abstract: Theoretical works on supervised transfer learning (STL) -- where the learner has access to labeled samples from both source and target distributions -- have for the most part focused on statistical aspects of the problem, while efficient optimization has received less attention. We consider the problem of designing an SGD procedure for STL that alternates sampling between source and target data, while maintaining statistical transfer guarantees without prior knowledge of the quality of the source data. A main algorithmic difficulty is in understanding how to design such an adaptive sub-sampling mechanism at each SGD step, to automatically gain from the source when it is informative, or bias towards the target and avoid negative transfer when the source is less informative. We show that, such a mixed-sample SGD procedure is feasible for general prediction tasks with convex losses, rooted in tracking an abstract sequence of constrained convex programs that serve to maintain the desired transfer guarantees. We instantiate these results in the concrete setting of linear regression with square loss, and show that the procedure converges, with $1/\sqrt{T}$ rate, to a solution whose statistical performance on the target is adaptive to the a priori unknown quality of the source. Experiments with synthetic and real datasets support the theory.

cross Context Tuning for In-Context Optimization

Authors: Jack Lu, Ryan Teehan, Zhenbang Yang, Mengye Ren

Abstract: We introduce Context Tuning, a simple and effective method to significantly enhance few-shot adaptation of language models (LLMs) without fine-tuning model parameters. While prompt-based adaptation techniques have demonstrated the effectiveness of lightweight adaptation methods for large language models (LLMs), they typically initialize a trainable prompt or prefix with irrelevant tokens for the task at hand. In contrast, Context Tuning initializes the trainable prompt or prefix with task-specific demonstration examples, leveraging the model's inherent In-Context Learning (ICL) ability to extract relevant information for improved few-shot learning performance. Extensive evaluations on benchmarks such as CrossFit, UnifiedQA, MMLU, BIG-Bench Hard, and ARC demonstrate that Context Tuning outperforms traditional prompt-based adaptation methods and achieves competitive accuracy to Test-Time Training with significantly higher training efficiency.

cross Inertial Quadratic Majorization Minimization with Application to Kernel Regularized Learning

Authors: Qiang Heng, Caixing Wang

Abstract: First-order methods in convex optimization offer low per-iteration cost but often suffer from slow convergence, while second-order methods achieve fast local convergence at the expense of costly Hessian inversions. In this paper, we highlight a middle ground: minimizing a quadratic majorant with fixed curvature at each iteration. This strategy strikes a balance between per-iteration cost and convergence speed, and crucially allows the reuse of matrix decompositions, such as Cholesky or spectral decompositions, across iterations and varying regularization parameters. We introduce the Quadratic Majorization Minimization with Extrapolation (QMME) framework and establish its sequential convergence properties under standard assumptions. The new perspective of our analysis is to center the arguments around the induced norm of the curvature matrix $H$. To demonstrate practical advantages, we apply QMME to large-scale kernel regularized learning problems. In particular, we propose a novel Sylvester equation modelling technique for kernel multinomial regression. In Julia-based experiments, QMME compares favorably against various established first- and second-order methods. Furthermore, we demonstrate that our algorithms complement existing kernel approximation techniques through more efficiently handling sketching matrices with large projection dimensions. Our numerical experiments and real data analysis are available and fully reproducible at https://github.com/qhengncsu/QMME.jl.

URLs: https://github.com/qhengncsu/QMME.jl.

cross Bandit Pareto Set Identification in a Multi-Output Linear Model

Authors: Cyrille Kone, Emilie Kaufmann, Laura Richert

Abstract: We study the Pareto Set Identification (PSI) problem in a structured multi-output linear bandit model. In this setting, each arm is associated a feature vector belonging to $\mathbb{R}^h$, and its mean vector in $\mathbb{R}^d$ linearly depends on this feature vector through a common unknown matrix $\Theta \in \mathbb{R}^{h \times d}$. The goal is to identify the set of non-dominated arms by adaptively collecting samples from the arms. We introduce and analyze the first optimal design-based algorithms for PSI, providing nearly optimal guarantees in both the fixed-budget and the fixed-confidence settings. Notably, we show that the difficulty of these tasks mainly depends on the sub-optimality gaps of $h$ arms only. Our theoretical results are supported by an extensive benchmark on synthetic and real-world datasets.

cross Efficient Training of Deep Networks using Guided Spectral Data Selection: A Step Toward Learning What You Need

Authors: Mohammadreza Sharifi, Ahad Harati

Abstract: Effective data curation is essential for optimizing neural network training. In this paper, we present the Guided Spectrally Tuned Data Selection (GSTDS) algorithm, which dynamically adjusts the subset of data points used for training using an off-the-shelf pre-trained reference model. Based on a pre-scheduled filtering ratio, GSTDS effectively reduces the number of data points processed per batch. The proposed method ensures an efficient selection of the most informative data points for training while avoiding redundant or less beneficial computations. Preserving data points in each batch is performed based on spectral analysis. A Fiedler vector-based scoring mechanism removes the filtered portion of the batch, lightening the resource requirements of the learning. The proposed data selection approach not only streamlines the training process but also promotes improved generalization and accuracy. Extensive experiments on standard image classification benchmarks, including CIFAR-10, Oxford-IIIT Pet, and Oxford-Flowers, demonstrate that GSTDS outperforms standard training scenarios and JEST, a recent state-of-the-art data curation method, on several key factors. It is shown that GSTDS achieves notable reductions in computational requirements, up to four times, without compromising performance. GSTDS exhibits a considerable growth in terms of accuracy under the limited computational resource usage, in contrast to other methodologies. These promising results underscore the potential of spectral-based data selection as a scalable solution for resource-efficient deep learning and motivate further exploration into adaptive data curation strategies. You can find the code at https://github.com/rezasharifi82/GSTDS.

URLs: https://github.com/rezasharifi82/GSTDS.

cross VOLTRON: Detecting Unknown Malware Using Graph-Based Zero-Shot Learning

Authors: M. Tahir Akdeniz, Zeynep Ye\c{s}ilkaya, \.I. Enes K\"ose, \.I. Ula\c{s} \"Unal, Sevil \c{S}en

Abstract: The persistent threat of Android malware presents a serious challenge to the security of millions of users globally. While many machine learning-based methods have been developed to detect these threats, their reliance on large labeled datasets limits their effectiveness against emerging, previously unseen malware families, for which labeled data is scarce or nonexistent. To address this challenge, we introduce a novel zero-shot learning framework that combines Variational Graph Auto-Encoders (VGAE) with Siamese Neural Networks (SNN) to identify malware without needing prior examples of specific malware families. Our approach leverages graph-based representations of Android applications, enabling the model to detect subtle structural differences between benign and malicious software, even in the absence of labeled data for new threats. Experimental results show that our method outperforms the state-of-the-art MaMaDroid, especially in zero-day malware detection. Our model achieves 96.24% accuracy and 95.20% recall for unknown malware families, highlighting its robustness against evolving Android threats.

cross Adversarial Data Augmentation for Single Domain Generalization via Lyapunov Exponent-Guided Optimization

Authors: Zuyu Zhang, Ning Chen, Yongshan Liu, Qinghua Zhang, Xu Zhang

Abstract: Single Domain Generalization (SDG) aims to develop models capable of generalizing to unseen target domains using only one source domain, a task complicated by substantial domain shifts and limited data diversity. Existing SDG approaches primarily rely on data augmentation techniques, which struggle to effectively adapt training dynamics to accommodate large domain shifts. To address this, we propose LEAwareSGD, a novel Lyapunov Exponent (LE)-guided optimization approach inspired by dynamical systems theory. By leveraging LE measurements to modulate the learning rate, LEAwareSGD encourages model training near the edge of chaos, a critical state that optimally balances stability and adaptability. This dynamic adjustment allows the model to explore a wider parameter space and capture more generalizable features, ultimately enhancing the model's generalization capability. Extensive experiments on PACS, OfficeHome, and DomainNet demonstrate that LEAwareSGD yields substantial generalization gains, achieving up to 9.47\% improvement on PACS in low-data regimes. These results underscore the effectiveness of training near the edge of chaos for enhancing model generalization capability in SDG tasks.

cross CLIP-RL: Surgical Scene Segmentation Using Contrastive Language-Vision Pretraining & Reinforcement Learning

Authors: Fatmaelzahraa Ali Ahmed, Muhammad Arsalan, Abdulaziz Al-Ali, Khalid Al-Jalham, Shidin Balakrishnan

Abstract: Understanding surgical scenes can provide better healthcare quality for patients, especially with the vast amount of video data that is generated during MIS. Processing these videos generates valuable assets for training sophisticated models. In this paper, we introduce CLIP-RL, a novel contrastive language-image pre-training model tailored for semantic segmentation for surgical scenes. CLIP-RL presents a new segmentation approach which involves reinforcement learning and curriculum learning, enabling continuous refinement of the segmentation masks during the full training pipeline. Our model has shown robust performance in different optical settings, such as occlusions, texture variations, and dynamic lighting, presenting significant challenges. CLIP model serves as a powerful feature extractor, capturing rich semantic context that enhances the distinction between instruments and tissues. The RL module plays a pivotal role in dynamically refining predictions through iterative action-space adjustments. We evaluated CLIP-RL on the EndoVis 2018 and EndoVis 2017 datasets. CLIP-RL achieved a mean IoU of 81%, outperforming state-of-the-art models, and a mean IoU of 74.12% on EndoVis 2017. This superior performance was achieved due to the combination of contrastive learning with reinforcement learning and curriculum learning.

cross A note on the unique properties of the Kullback--Leibler divergence for sampling via gradient flows

Authors: Francesca Romana Crucinio

Abstract: We consider the problem of sampling from a probability distribution $\pi$. It is well known that this can be written as an optimisation problem over the space of probability distribution in which we aim to minimise a divergence from $\pi$. and The optimisation problem is normally solved through gradient flows in the space of probability distribution with an appropriate metric. We show that the Kullback--Leibler divergence is the only divergence in the family of Bregman divergences whose gradient flow w.r.t. many popular metrics does not require knowledge of the normalising constant of $\pi$.

cross Efficient Perplexity Bound and Ratio Matching in Discrete Diffusion Language Models

Authors: Etrit Haxholli, Yeti Z. G\"urb\"uz, O\u{g}ul Can, Eli Waxman

Abstract: While continuous diffusion models excel in modeling continuous distributions, their application to categorical data has been less effective. Recent work has shown that ratio-matching through score-entropy within a continuous-time discrete Markov chain (CTMC) framework serves as a competitive alternative to autoregressive models in language modeling. To enhance this framework, we first introduce three new theorems concerning the KL divergence between the data and learned distribution. Our results serve as the discrete counterpart to those established for continuous diffusion models and allow us to derive an improved upper bound of the perplexity. Second, we empirically show that ratio-matching performed by minimizing the denoising cross-entropy between the clean and corrupted data enables models to outperform those utilizing score-entropy with up to 10% lower perplexity/generative-perplexity, and 15% faster training steps. To further support our findings, we introduce and evaluate a novel CTMC transition-rate matrix that allows prediction refinement, and derive the analytic expression for its matrix exponential which facilitates the computation of conditional ratios thus enabling efficient training and generation.

cross Transferring Visual Explainability of Self-Explaining Models through Task Arithmetic

Authors: Yuya Yoshikawa, Ryotaro Shimizu, Takahiro Kawashima, Yuki Saito

Abstract: In scenarios requiring both prediction and explanation efficiency for image classification, self-explaining models that perform both tasks in a single inference are effective. However, their training incurs substantial labeling and computational costs. This study aims to tackle the issue by proposing a method to transfer the visual explainability of self-explaining models, learned in a source domain, to a target domain based on a task arithmetic framework. Specifically, we construct a self-explaining model by extending image classifiers based on a vision-language pretrained model. We then define an \emph{explainability vector} as the difference between model parameters trained on the source domain with and without explanation supervision. Based on the task arithmetic framework, we impart explainability to a model trained only on the prediction task in the target domain by applying the explainability vector. Experimental results on various image classification datasets demonstrate that, except for transfers between some less-related domains, visual explainability can be successfully transferred from source to target domains, improving explanation quality in the target domain without sacrificing classification accuracy. Furthermore, we show that the explainability vector learned on a large and diverse dataset like ImageNet, extended with explanation supervision, exhibits universality and robustness, improving explanation quality on nine out of ten different target datasets. We also find that the explanation quality achieved with a single model inference is comparable to that of Kernel SHAP, which requires 150 model inferences.

cross Neural Networks for Tamed Milstein Approximation of SDEs with Additive Symmetric Jump Noise Driven by a Poisson Random Measure

Authors: Ramirez-Gonzalez Jose-Hermenegildo, Sun Ying

Abstract: This work aims to estimate the drift and diffusion functions in stochastic differential equations (SDEs) driven by a particular class of L\'evy processes with finite jump intensity, using neural networks. We propose a framework that integrates the Tamed-Milstein scheme with neural networks employed as non-parametric function approximators. Estimation is carried out in a non-parametric fashion for the drift function \( f: \mathbb{Z} \to \mathbb{R} \), the diffusion coefficient \( g: \mathbb{Z} \to \mathbb{R} \). The model of interest is given by \[ dX(t) = \xi + f(X(t))\, dt + g(X(t))\, dW_t + \gamma \int_{\mathbb{Z}} z\, N(dt,dz), \] where \( W_t \) is a standard Brownian motion, and \( N(dt,dz) \) is a Poisson random measure on \( (~\mathbb{R}_{+} ~\times ~\mathbb{Z}~, ~\mathcal{B}~(~\mathbb{R}_{+}~)~\otimes~\mathcal{Z}~,~ \lambda( \Lambda~\otimes~v~)~) \), with \( \lambda, \gamma > 0 \), \( \Lambda \) being the Lebesgue measure on \( \mathbb{R}_{+} \), and \( v \) a finite measure on the measurable space \( (\mathbb{Z}, \mathcal{Z}) \). Neural networks are used as non-parametric function approximators, enabling the modeling of complex nonlinear dynamics without assuming restrictive functional forms. The proposed methodology constitutes a flexible alternative for inference in systems with state-dependent noise and discontinuities driven by L\'evy processes.

cross ARMR: Adaptively Responsive Network for Medication Recommendation

Authors: Feiyue Wu, Tianxing Wu, Shenqi Jing

Abstract: Medication recommendation is a crucial task in healthcare, especially for patients with complex medical conditions. However, existing methods often struggle to effectively balance the reuse of historical medications with the introduction of new drugs in response to the changing patient conditions. In order to address this challenge, we propose an Adaptively Responsive network for Medication Recommendation (ARMR), a new method which incorporates 1) a piecewise temporal learning component that distinguishes between recent and distant patient history, enabling more nuanced temporal understanding, and 2) an adaptively responsive mechanism that dynamically adjusts attention to new and existing drugs based on the patient's current health state and medication history. Experiments on the MIMIC-III and MIMIC-IV datasets indicate that ARMR has better performance compared with the state-of-the-art baselines in different evaluation metrics, which contributes to more personalized and accurate medication recommendations. The source code is publicly avaiable at: https://github.com/seucoin/armr2.

URLs: https://github.com/seucoin/armr2.

cross Reconstructing Biological Pathways by Applying Selective Incremental Learning to (Very) Small Language Models

Authors: Pranta Saha, Joyce Reimer, Brook Byrns, Connor Burbridge, Neeraj Dhar, Jeffrey Chen, Steven Rayan, Gordon Broderick

Abstract: The use of generative artificial intelligence (AI) models is becoming ubiquitous in many fields. Though progress continues to be made, general purpose large language AI models (LLM) show a tendency to deliver creative answers, often called "hallucinations", which have slowed their application in the medical and biomedical fields where accuracy is paramount. We propose that the design and use of much smaller, domain and even task-specific LM may be a more rational and appropriate use of this technology in biomedical research. In this work we apply a very small LM by today's standards to the specialized task of predicting regulatory interactions between molecular components to fill gaps in our current understanding of intracellular pathways. Toward this we attempt to correctly posit known pathway-informed interactions recovered from manually curated pathway databases by selecting and using only the most informative examples as part of an active learning scheme. With this example we show that a small (~110 million parameters) LM based on a Bidirectional Encoder Representations from Transformers (BERT) architecture can propose molecular interactions relevant to tuberculosis persistence and transmission with over 80% accuracy using less than 25% of the ~520 regulatory relationships in question. Using information entropy as a metric for the iterative selection of new tuning examples, we also find that increased accuracy is driven by favoring the use of the incorrectly assigned statements with the highest certainty (lowest entropy). In contrast, the concurrent use of correct but least certain examples contributed little and may have even been detrimental to the learning rate.

cross Quantum Algorithms for Bandits with Knapsacks with Improved Regret and Time Complexities

Authors: Yuexin Su, Ziyi Yang, Peiyuan Huang, Tongyang Li, Yinyu Ye

Abstract: Bandits with knapsacks (BwK) constitute a fundamental model that combines aspects of stochastic integer programming with online learning. Classical algorithms for BwK with a time horizon $T$ achieve a problem-independent regret bound of ${O}(\sqrt{T})$ and a problem-dependent bound of ${O}(\log T)$. In this paper, we initiate the study of the BwK model in the setting of quantum computing, where both reward and resource consumption can be accessed via quantum oracles. We establish both problem-independent and problem-dependent regret bounds for quantum BwK algorithms. For the problem-independent case, we demonstrate that a quantum approach can improve the classical regret bound by a factor of $(1+\sqrt{B/\mathrm{OPT}_\mathrm{LP}})$, where $B$ is budget constraint in BwK and $\mathrm{OPT}_{\mathrm{LP}}$ denotes the optimal value of a linear programming relaxation of the BwK problem. For the problem-dependent setting, we develop a quantum algorithm using an inexact quantum linear programming solver. This algorithm achieves a quadratic improvement in terms of the problem-dependent parameters, as well as a polynomial speedup of time complexity on problem's dimensions compared to classical counterparts. Compared to previous works on quantum algorithms for multi-armed bandits, our study is the first to consider bandit models with resource constraints and hence shed light on operations research.

cross The Joys of Categorical Conformal Prediction

Authors: Michele Caprio

Abstract: Conformal prediction (CP) is an Uncertainty Representation technique that delivers finite-sample calibrated prediction regions for any underlying Machine Learning model, yet its status as an Uncertainty Quantification (UQ) tool has remained conceptually opaque. We adopt a category-theoretic approach to CP -- framing it as a morphism, embedded in a commuting diagram, of two newly-defined categories -- that brings us three joys. First, we show that -- under minimal assumptions -- CP is intrinsically a UQ mechanism, that is, its UQ capabilities are a structural feature of the method. Second, we demonstrate that CP bridges (and perhaps subsumes) the Bayesian, frequentist, and imprecise probabilistic approaches to predictive statistical reasoning. Finally, we show that a conformal prediction region (CPR) is the image of a covariant functor. This observation is relevant to AI privacy: It implies that privacy noise added locally does not break coverage.

cross Thousand-Brains Systems: Sensorimotor Intelligence for Rapid, Robust Learning and Inference

Authors: Niels Leadholm (Thousand Brains Project), Viviane Clay (Thousand Brains Project), Scott Knudstrup (Thousand Brains Project), Hojae Lee (Thousand Brains Project), Jeff Hawkins (Thousand Brains Project)

Abstract: Current AI systems achieve impressive performance on many tasks, yet they lack core attributes of biological intelligence, including rapid, continual learning, representations grounded in sensorimotor interactions, and structured knowledge that enables efficient generalization. Neuroscience theory suggests that mammals evolved flexible intelligence through the replication of a semi-independent, sensorimotor module, a functional unit known as a cortical column. To address the disparity between biological and artificial intelligence, thousand-brains systems were proposed as a means of mirroring the architecture of cortical columns and their interactions. In the current work, we evaluate the unique properties of Monty, the first implementation of a thousand-brains system. We focus on 3D object perception, and in particular, the combined task of object recognition and pose estimation. Utilizing the YCB dataset of household objects, we first assess Monty's use of sensorimotor learning to build structured representations, finding that these enable robust generalization. These representations include an emphasis on classifying objects by their global shape, as well as a natural ability to detect object symmetries. We then explore Monty's use of model-free and model-based policies to enable rapid inference by supporting principled movements. We find that such policies complement Monty's modular architecture, a design that can accommodate communication between modules to further accelerate inference speed via a novel `voting' algorithm. Finally, we examine Monty's use of associative, Hebbian-like binding to enable rapid, continual, and computationally efficient learning, properties that compare favorably to current deep learning architectures. While Monty is still in a nascent stage of development, these findings support thousand-brains systems as a powerful and promising new approach to AI.

cross Verification of Visual Controllers via Compositional Geometric Transformations

Authors: Alexander Estornell, Leonard Jung, Michael Everett

Abstract: Perception-based neural network controllers are increasingly used in autonomous systems that rely on visual inputs to operate in the real world. Ensuring the safety of such systems under uncertainty is challenging. Existing verification techniques typically focus on Lp-bounded perturbations in the pixel space, which fails to capture the low-dimensional structure of many real-world effects. In this work, we introduce a novel verification framework for perception-based controllers that can generate outer-approximations of reachable sets through explicitly modeling uncertain observations with geometric perturbations. Our approach constructs a boundable mapping from states to images, enabling the use of state-based verification tools while accounting for uncertainty in perception. We provide theoretical guarantees on the soundness of our method and demonstrate its effectiveness across benchmark control environments. This work provides a principled framework for certifying the safety of perception-driven control systems under realistic visual perturbations.

cross DP-Fusion: Token-Level Differentially Private Inference for Large Language Models

Authors: Rushil Thareja, Preslav Nakov, Praneeth Vepakomma, Nils Lukas

Abstract: Large language models (LLMs) can leak sensitive information from their context through generated outputs, either accidentally or when prompted adversarially. Existing defenses that aim to preserve context privacy during inference either lack formal guarantees or suffer from a poor utility/privacy trade-off. We propose DP-Fusion, a token-level Differentially Private Inference (DPI) mechanism that provably bounds how much an LLM's outputs reveal about sensitive tokens in its context. We demonstrate DPI through the task of document privatization, where the goal is to paraphrase documents so that sensitive content (e.g., Personally Identifiable Information, PII) cannot be reliably inferred, while still preserving the overall utility of the text. This is controlled by a parameter $\epsilon$: $\epsilon=0$ hides PII entirely, while higher values trade off privacy for improved paraphrase quality. DP-Fusion works as follows: (i) partition sensitive tokens into disjoint privacy groups, (ii) run the LLM once per group, and (iii) blend the output distributions so that the final output remains within a fixed statistical distance of the baseline distribution produced when no privacy group is revealed. This approach allows fine-grained control over the privacy/utility trade-off but requires multiple LLM forward passes.

cross da4ml: Distributed Arithmetic for Real-time Neural Networks on FPGAs

Authors: Chang Sun, Zhiqiang Que, Vladimir Loncar, Wayne Luk, Maria Spiropulu

Abstract: Neural networks with a latency requirement on the order of microseconds, like the ones used at the CERN Large Hadron Collider, are typically deployed on FPGAs fully unrolled and pipelined. A bottleneck for the deployment of such neural networks is area utilization, which is directly related to the required constant matrix-vector multiplication (CMVM) operations. In this work, we propose an efficient algorithm for implementing CMVM operations with distributed arithmetic (DA) on FPGAs that simultaneously optimizes for area consumption and latency. The algorithm achieves resource reduction similar to state-of-the-art algorithms while being significantly faster to compute. The proposed algorithm is open-sourced and integrated into the \texttt{hls4ml} library, a free and open-source library for running real-time neural network inference on FPGAs. We show that the proposed algorithm can reduce on-chip resources by up to a third for realistic, highly quantized neural networks while simultaneously reducing latency, enabling the implementation of previously infeasible networks.

cross SPIRA: Building an Intelligent System for Respiratory Insufficiency Detection

Authors: Renato Cordeiro Ferreira (University of S\~ao Paulo), Dayanne Gomes (University of S\~ao Paulo), Vitor Tamae (University of S\~ao Paulo), Francisco Wernke (University of S\~ao Paulo), Alfredo Goldman (University of S\~ao Paulo)

Abstract: Respiratory insufficiency is a medic symptom in which a person gets a reduced amount of oxygen in the blood. This paper reports the experience of building SPIRA: an intelligent system for detecting respiratory insufficiency from voice. It compiles challenges faced in two succeeding implementations of the same architecture, summarizing lessons learned on data collection, training, and inference for future projects in similar systems.

cross Self-supervised learning of speech representations with Dutch archival data

Authors: Nik Vaessen, Roeland Ordelman, David A. van Leeuwen

Abstract: This paper explores the use of Dutch archival television broadcast data for self-supervised learning of speech foundation models, specifically wav2vec 2.0. We first study data quality assumptions for pre-training, and show how music, noise and speaker overlap affect SSL convergence and downstream fine-tuning performance. Secondly, we explore effectively pre-processing strategies to convert the noisy broadcast dataset into a qualitative dataset for pre-training, by using Whisper and WhisperX. Thirdly, we compare mono-lingual and multi-lingual pre-training with equivalent amounts of data, and show that mono-lingual pre-training is more robust to out-of-domain data. Lastly, we achieve a state-of-the-art LARGE wav2vec 2.0 model for the Dutch language, by a continuation of pre-training a wav2vec 2.0 XLS-R model checkpoint with our 55k hour archival dataset.

cross Nile-Chat: Egyptian Language Models for Arabic and Latin Scripts

Authors: Guokan Shang, Hadi Abdine, Ahmad Chamma, Amr Mohamed, Mohamed Anwar, Abdelaziz Bounhar, Omar El Herraoui, Preslav Nakov, Michalis Vazirgiannis, Eric Xing

Abstract: We introduce Nile-Chat-4B, 3x4B-A6B, and 12B, a collection of LLMs for Egyptian dialect, uniquely designed to understand and generate texts written in both Arabic and Latin scripts. Specifically, with Nile-Chat-3x4B-A6B, we introduce a novel language adaptation approach by leveraging the Branch-Train-MiX strategy to merge script-specialized experts, into a single MoE model. Our Nile-Chat models significantly outperform leading multilingual and Arabic LLMs, such as LLaMa, Jais, and ALLaM, on our newly introduced Egyptian evaluation benchmarks, which span both understanding and generative tasks. Notably, our 12B model yields a 14.4% performance gain over Qwen2.5-14B-Instruct on Latin-script benchmarks. All our resources are publicly available. We believe this work presents a comprehensive methodology for adapting LLMs to dual-script languages, addressing an often overlooked aspect in modern LLM development.

cross Can Prompt Difficulty be Online Predicted for Accelerating RL Finetuning of Reasoning Models?

Authors: Yun Qu, Qi Cheems Wang, Yixiu Mao, Vincent Tao Hu, Xiangyang Ji

Abstract: Recent advances have witnessed the effectiveness of reinforcement learning (RL) finetuning in enhancing the reasoning capabilities of large language models (LLMs). The optimization process often requires numerous iterations to achieve satisfactory performance, resulting in high computational costs due to the need for frequent prompt evaluations under intensive LLM interactions and repeated policy updates. Appropriate online prompt selection methods reduce iteration steps by prioritizing informative prompts during training, while the pipeline's reliance on exhaustive prompt evaluation and subset selection for optimization still incurs substantial computational overhead due to frequent LLM inference calls. Distinguished from these direct evaluate-then-select schemes, this work investigates iterative approximate evaluation for arbitrary prompts and introduces Model Predictive Prompt Selection (MoPPS), a Bayesian risk-predictive framework that online estimates prompt difficulty without requiring costly LLM interactions. Technically, MoPPS models each prompt's success rate as a latent variable, performs streaming Bayesian inference, and employs posterior sampling in a constructed multi-armed bandit machine, enabling sample efficient and adaptive prompt selection. Extensive experiments across mathematics, planning, and vision-based geometry tasks show that MoPPS reliably predicts prompt difficulty and accelerates training with significantly reduced LLM rollouts.

cross VaxPulse: Monitoring of Online Public Concerns to Enhance Post-licensure Vaccine Surveillance

Authors: Muhammad Javed, Sedigh Khademi, Joanne Hickman, Jim Buttery, Hazel Clothier, Gerardo Luis Dimaguila

Abstract: The recent vaccine-related infodemic has amplified public concerns, highlighting the need for proactive misinformation management. We describe how we enhanced the reporting surveillance system of Victoria's vaccine safety service, SAEFVIC, through the incorporation of new information sources for public sentiment analysis, topics of discussion, and hesitancies about vaccinations online. Using VaxPulse, a multi-step framework, we integrate adverse events following immunisation (AEFI) with sentiment analysis, demonstrating the importance of contextualising public concerns. Additionally, we emphasise the need to address non-English languages to stratify concerns across ethno-lingual communities, providing valuable insights for vaccine uptake strategies and combating mis/disinformation. The framework is applied to real-world examples and a case study on women's vaccine hesitancy, showcasing its benefits and adaptability by identifying public opinion from online media.

cross Operator-based machine learning framework for generalizable prediction of unsteady treatment dynamics in stormwater infrastructure

Authors: Mohamed Shatarah, Kai Liu, Haochen Li

Abstract: Stormwater infrastructures are decentralized urban water-management systems that face highly unsteady hydraulic and pollutant loadings from episodic rainfall-runoff events. Accurately evaluating their in-situ treatment performance is essential for cost-effective design and planning. Traditional lumped dynamic models (e.g., continuously stirred tank reactor, CSTR) are computationally efficient but oversimplify transport and reaction processes, limiting predictive accuracy and insight. Computational fluid dynamics (CFD) resolves detailed turbulent transport and pollutant fate physics but incurs prohibitive computational cost for unsteady and long-term simulations. To address these limitations, this study develops a composite operator-based neural network (CPNN) framework that leverages state-of-the-art operator learning to predict the spatial and temporal dynamics of hydraulics and particulate matter (PM) in stormwater treatment. The framework is demonstrated on a hydrodynamic separator (HS), a common urban treatment device. Results indicate that the CPNN achieves R2 > 0.8 for hydraulic predictions in 95.2% of test cases; for PM concentration predictions, R2 > 0.8 in 72.6% of cases and 0.4 < R2 < 0.8 in 22.6%. The analysis identifies challenges in capturing dynamics under extreme low-flow conditions, owing to their lower contribution to the training loss. Exploiting the automatic-differentiation capability of the CPNN, sensitivity analyses quantify the influence of storm event loading on PM transport. Finally, the potential of the CPNN framework for continuous, long-term evaluation of stormwater infrastructure performance is discussed, marking a step toward robust, climate-aware planning and implementation.

cross Geometric-Guided Few-Shot Dental Landmark Detection with Human-Centric Foundation Model

Authors: Anbang Wang, Marawan Elbatel, Keyuan Liu, Lizhuo Lin, Meng Lan, Yanqi Yang, Xiaomeng Li

Abstract: Accurate detection of anatomic landmarks is essential for assessing alveolar bone and root conditions, thereby optimizing clinical outcomes in orthodontics, periodontics, and implant dentistry. Manual annotation of landmarks on cone-beam computed tomography (CBCT) by dentists is time-consuming, labor-intensive, and subject to inter-observer variability. Deep learning-based automated methods present a promising approach to streamline this process efficiently. However, the scarcity of training data and the high cost of expert annotations hinder the adoption of conventional deep learning techniques. To overcome these challenges, we introduce GeoSapiens, a novel few-shot learning framework designed for robust dental landmark detection using limited annotated CBCT of anterior teeth. Our GeoSapiens framework comprises two key components: (1) a robust baseline adapted from Sapiens, a foundational model that has achieved state-of-the-art performance in human-centric vision tasks, and (2) a novel geometric loss function that improves the model's capacity to capture critical geometric relationships among anatomical structures. Experiments conducted on our collected dataset of anterior teeth landmarks revealed that GeoSapiens surpassed existing landmark detection methods, outperforming the leading approach by an 8.18% higher success detection rate at a strict 0.5 mm threshold-a standard widely recognized in dental diagnostics. Code is available at: https://github.com/xmed-lab/GeoSapiens.

URLs: https://github.com/xmed-lab/GeoSapiens.

cross Mutual Information Optimal Control of Discrete-Time Linear Systems

Authors: Shoju Enami, Kenji Kashima

Abstract: In this paper, we formulate a mutual information optimal control problem (MIOCP) for discrete-time linear systems. This problem can be regarded as an extension of a maximum entropy optimal control problem (MEOCP). Differently from the MEOCP where the prior is fixed to the uniform distribution, the MIOCP optimizes the policy and prior simultaneously. As analytical results, under the policy and prior classes consisting of Gaussian distributions, we derive the optimal policy and prior of the MIOCP with the prior and policy fixed, respectively. Using the results, we propose an alternating minimization algorithm for the MIOCP. Through numerical experiments, we discuss how our proposed algorithm works.

cross Optimal Model Selection for Conformalized Robust Optimization

Authors: Yajie Bao, Yang Hu, Haojie Ren, Peng Zhao, Changliang Zou

Abstract: In decision-making under uncertainty, Contextual Robust Optimization (CRO) provides reliability by minimizing the worst-case decision loss over a prediction set, hedging against label variability. While recent advances use conformal prediction to construct prediction sets for machine learning models, the downstream decisions critically depend on model selection. This paper introduces novel model selection frameworks for CRO that unify robustness control with decision risk minimization. We first propose Conformalized Robust Optimization with Model Selection (CROMS), which automatically selects models to approximately minimize the average decision risk in CRO solutions. We develop two algorithms: E-CROMS, which is computationally efficient, and F-CROMS, which enjoys a marginal robustness guarantee in finite samples. Further, we introduce Conformalized Robust Optimization with Individualized Model Selection (CROiMS), which performs individualized model selection by minimizing the conditional decision risk given the covariate of test data. This framework advances conformal prediction methodology by enabling covariate-aware model selection. Theoretically, CROiMS achieves asymptotic conditional robustness and decision efficiency under mild assumptions. Numerical results demonstrate significant improvements in decision efficiency and robustness across diverse synthetic and real-world applications, outperforming baseline approaches.

cross Advocate for Complete Benchmarks for Formal Reasoning with Formal/Informal Statements and Formal/Informal Proofs

Authors: Roozbeh Yousefzadeh, Xuenan Cao

Abstract: This position paper provides a critical but constructive discussion of current practices in benchmarking and evaluative practices in the field of formal reasoning and automated theorem proving. We take the position that open code, open data, and benchmarks that are complete and error-free will accelerate progress in this field. We identify practices that create barriers to contributing to this field and suggest ways to remove them. We also discuss some of the practices that might produce misleading evaluative information. We aim to create discussions that bring together people from various groups contributing to automated theorem proving, autoformalization, and informal reasoning.

cross CueLearner: Bootstrapping and local policy adaptation from relative feedback

Authors: Giulio Schiavi, Andrei Cramariuc, Lionel Ott, Roland Siegwart

Abstract: Human guidance has emerged as a powerful tool for enhancing reinforcement learning (RL). However, conventional forms of guidance such as demonstrations or binary scalar feedback can be challenging to collect or have low information content, motivating the exploration of other forms of human input. Among these, relative feedback (i.e., feedback on how to improve an action, such as "more to the left") offers a good balance between usability and information richness. Previous research has shown that relative feedback can be used to enhance policy search methods. However, these efforts have been limited to specific policy classes and use feedback inefficiently. In this work, we introduce a novel method to learn from relative feedback and combine it with off-policy reinforcement learning. Through evaluations on two sparse-reward tasks, we demonstrate our method can be used to improve the sample efficiency of reinforcement learning by guiding its exploration process. Additionally, we show it can adapt a policy to changes in the environment or the user's preferences. Finally, we demonstrate real-world applicability by employing our approach to learn a navigation policy in a sparse reward setting.

cross Activation Steering for Chain-of-Thought Compression

Authors: Seyedarmin Azizi, Erfan Baghaei Potraghloo, Massoud Pedram

Abstract: Large language models (LLMs) excel at complex reasoning when they include intermediate steps, known as "chains of thought" (CoTs). However, these rationales are often overly verbose, even for simple problems, leading to wasted context, increased latency, and higher energy consumption. We observe that verbose, English-heavy CoTs and concise, math-centric CoTs occupy distinct regions in the model's residual-stream activation space. By extracting and injecting a "steering vector" to transition between these modes, we can reliably shift generation toward more concise reasoning, effectively compressing CoTs without retraining. We formalize this approach as Activation-Steered Compression (ASC), an inference-time technique that shortens reasoning traces by directly modifying hidden representations. In addition, we provide a theoretical analysis of the impact of ASC on the output distribution, derived from a closed-form KL-divergence-bounded constraint to regulate steering strength. Using only 100 paired verbose and concise examples, ASC achieves up to 67.43% reduction in CoT length on MATH500 and GSM8K datasets, while maintaining accuracy across 7B, 8B, and 32B parameter models. As a training-free method, ASC introduces negligible runtime overhead and, on MATH500, delivers an average 2.73x speedup in end-to-end reasoning wall-clock time on an 8B model. This makes ASC a practical and efficient tool for streamlining the deployment of reasoning-capable LLMs in latency- or cost-sensitive settings. The code is available at: https://github.com/ArminAzizi98/ASC

URLs: https://github.com/ArminAzizi98/ASC

cross Intervening to learn and compose disentangled representations

Authors: Alex Markham, Jeri A. Chang, Isaac Hirsch, Liam Solus, Bryon Aragam

Abstract: In designing generative models, it is commonly believed that in order to learn useful latent structure, we face a fundamental tension between expressivity and structure. In this paper we challenge this view by proposing a new approach to training arbitrarily expressive generative models that simultaneously learn disentangled latent structure. This is accomplished by adding a simple decoder-only module to the head of an existing decoder block that can be arbitrarily complex. The module learns to process concept information by implicitly inverting linear representations from an encoder. Inspired by the notion of intervention in causal graphical models, our module selectively modifies its architecture during training, allowing it to learn a compact joint model over different contexts. We show how adding this module leads to disentangled representations that can be composed for out-of-distribution generation. To further validate our proposed approach, we prove a new identifiability result that extends existing work on identifying structured representations in nonlinear models.

cross Efficient Unlearning with Privacy Guarantees

Authors: Josep Domingo-Ferrer, Najeeb Jebreel, David S\'anchez

Abstract: Privacy protection laws, such as the GDPR, grant individuals the right to request the forgetting of their personal data not only from databases but also from machine learning (ML) models trained on them. Machine unlearning has emerged as a practical means to facilitate model forgetting of data instances seen during training. Although some existing machine unlearning methods guarantee exact forgetting, they are typically costly in computational terms. On the other hand, more affordable methods do not offer forgetting guarantees and are applicable only to specific ML models. In this paper, we present \emph{efficient unlearning with privacy guarantees} (EUPG), a novel machine unlearning framework that offers formal privacy guarantees to individuals whose data are being unlearned. EUPG involves pre-training ML models on data protected using privacy models, and it enables {\em efficient unlearning with the privacy guarantees offered by the privacy models in use}. Through empirical evaluation on four heterogeneous data sets protected with $k$-anonymity and $\epsilon$-differential privacy as privacy models, our approach demonstrates utility and forgetting effectiveness comparable to those of exact unlearning methods, while significantly reducing computational and storage costs. Our code is available at https://github.com/najeebjebreel/EUPG.

URLs: https://github.com/najeebjebreel/EUPG.

cross Improving BERT for Symbolic Music Understanding Using Token Denoising and Pianoroll Prediction

Authors: Jun-You Wang, Li Su

Abstract: We propose a pre-trained BERT-like model for symbolic music understanding that achieves competitive performance across a wide range of downstream tasks. To achieve this target, we design two novel pre-training objectives, namely token correction and pianoroll prediction. First, we sample a portion of note tokens and corrupt them with a limited amount of noise, and then train the model to denoise the corrupted tokens; second, we also train the model to predict bar-level and local pianoroll-derived representations from the corrupted note tokens. We argue that these objectives guide the model to better learn specific musical knowledge such as pitch intervals. For evaluation, we propose a benchmark that incorporates 12 downstream tasks ranging from chord estimation to symbolic genre classification. Results confirm the effectiveness of the proposed pre-training objectives on downstream tasks.

cross Sure Convergence and Constructive Universal Approximation for Multi-Layer Neural Networks

Authors: Chien-Ming Chi

Abstract: We propose a new neural network model, 01Neuro, built on indicator activation neurons. Its boosted variant possesses two key statistical properties: (1) Sure Convergence, where model optimization can be achieved with high probability given sufficient computational resources; and (2) Constructive Universal Approximation: In the infinite sample setting, the model can approximate any finite sum of measurable functions, each depending on only k out of p input features, provided the architecture is properly tuned. Unlike most universal approximation results that are agnostic to training procedures, our guarantees are directly tied to the model's explicit construction and optimization algorithm. To improve prediction stability, we integrate stochastic training and bagging into the boosted 01Neuro framework. Empirical evaluations on simulated and real-world tabular datasets with small to medium sample sizes highlight its strengths: effective approximation of interaction components (multiplicative terms), stable prediction performance (comparable to Random Forests), robustness to many noisy features, and insensitivity to feature scaling. A major limitation of the current implementation of boosted 01Neuro is its higher computational cost, which is approximately 5 to 30 times that of Random Forests and XGBoost.

cross Reason to Rote: Rethinking Memorization in Reasoning

Authors: Yupei Du, Philipp Mondorf, Silvia Casola, Yuekun Yao, Robert Litschko, Barbara Plank

Abstract: Large language models readily memorize arbitrary training instances, such as label noise, yet they perform strikingly well on reasoning tasks. In this work, we investigate how language models memorize label noise, and why such memorization in many cases does not heavily affect generalizable reasoning capabilities. Using two controllable synthetic reasoning datasets with noisy labels, four-digit addition (FDA) and two-hop relational reasoning (THR), we discover a reliance of memorization on generalizable reasoning mechanisms: models continue to compute intermediate reasoning outputs even when retrieving memorized noisy labels, and intervening reasoning adversely affects memorization. We further show that memorization operates through distributed encoding, i.e., aggregating various inputs and intermediate results, rather than building a look-up mechanism from inputs to noisy labels. Moreover, our FDA case study reveals memorization occurs via outlier heuristics, where existing neuron activation patterns are slightly shifted to fit noisy labels. Together, our findings suggest that memorization of label noise in language models builds on, rather than overrides, the underlying reasoning mechanisms, shedding lights on the intriguing phenomenon of benign memorization.

cross Interaction-Merged Motion Planning: Effectively Leveraging Diverse Motion Datasets for Robust Planning

Authors: Giwon Lee, Wooseong Jeong, Daehee Park, Jaewoo Jeong, Kuk-Jin Yoon

Abstract: Motion planning is a crucial component of autonomous robot driving. While various trajectory datasets exist, effectively utilizing them for a target domain remains challenging due to differences in agent interactions and environmental characteristics. Conventional approaches, such as domain adaptation or ensemble learning, leverage multiple source datasets but suffer from domain imbalance, catastrophic forgetting, and high computational costs. To address these challenges, we propose Interaction-Merged Motion Planning (IMMP), a novel approach that leverages parameter checkpoints trained on different domains during adaptation to the target domain. IMMP follows a two-step process: pre-merging to capture agent behaviors and interactions, sufficiently extracting diverse information from the source domain, followed by merging to construct an adaptable model that efficiently transfers diverse interactions to the target domain. Our method is evaluated on various planning benchmarks and models, demonstrating superior performance compared to conventional approaches.

cross Interpretable Machine Learning for Urban Heat Mitigation: Attribution and Weighting of Multi-Scale Drivers

Authors: David Tschan, Zhi Wang, Jan Carmeliet, Yongling Zhao

Abstract: Urban heat islands (UHIs) are often accentuated during heat waves (HWs) and pose a public health risk. Mitigating UHIs requires urban planners to first estimate how urban heat is influenced by different land use types (LUTs) and drivers across scales - from synoptic-scale climatic background processes to small-scale urban- and scale-bridging features. This study proposes to classify these drivers into driving (D), urban (U), and local (L) features, respectively. To increase interpretability and enhance computation efficiency, a LUT-distinguishing machine learning approach is proposed as a fast emulator for Weather Research and Forecasting model coupled to a Single-Layer Urban Canopy Model (WRF-SLUCM) to predict ground- (TSK) and 2-meter air temperature (T2). Using random forests (RFs) with extreme gradient boosting (XGB) trained on WRF-SLUCM output over Zurich, Switzerland, during heatwave (HW) periods in 2017 and 2019, this study proposes LUT-based (LB) models that categorize features by scales and practical controllability, allowing optional categorical weighting. This approach enables category-specific feature ranking and sensitivity estimation of T2 and TSK to most important small-scale drivers - most notably surface emissivity, albedo, and leaf area index (LAI). Models employing the LB framework are statistically significantly more accurate than models that do not, with higher performance when more HW data is included in training. With RF-XGB robustly performing optimal with unit weights, the method substantially increase interpretability. Despite the needs to reduce statistical uncertainties and testing the method on other cities, the proposed approach offers urban planners a direct framework for feasibility-centered UHI mitigation assessment.

cross Kalman Filter Aided Federated Koopman Learning

Authors: Yutao Chen, Wei Chen

Abstract: Real-time control and estimation are pivotal for applications such as industrial automation and future healthcare. The realization of this vision relies heavily on efficient interactions with nonlinear systems. Therefore, Koopman learning, which leverages the power of deep learning to linearize nonlinear systems, has been one of the most successful examples of mitigating the complexity inherent in nonlinearity. However, the existing literature assumes access to accurate system states and abundant high-quality data for Koopman analysis, which is usually impractical in real-world scenarios. To fill this void, this paper considers the case where only observations of the system are available and where the observation data is insufficient to accomplish an independent Koopman analysis. To this end, we propose Kalman Filter aided Federated Koopman Learning (KF-FedKL), which pioneers the combination of Kalman filtering and federated learning with Koopman analysis. By doing so, we can achieve collaborative linearization with privacy guarantees. Specifically, we employ a straightforward yet efficient loss function to drive the training of a deep Koopman network for linearization. To obtain system information devoid of individual information from observation data, we leverage the unscented Kalman filter and the unscented Rauch-Tung-Striebel smoother. To achieve collaboration between clients, we adopt the federated learning framework and develop a modified FedAvg algorithm to orchestrate the collaboration. A convergence analysis of the proposed framework is also presented. Finally, through extensive numerical simulations, we showcase the performance of KF-FedKL under various situations.

cross UDF-GMA: Uncertainty Disentanglement and Fusion for General Movement Assessment

Authors: Zeqi Luo, Ali Gooya, Edmond S. L. Ho

Abstract: General movement assessment (GMA) is a non-invasive tool for the early detection of brain dysfunction through the qualitative assessment of general movements, and the development of automated methods can broaden its application. However, mainstream pose-based automated GMA methods are prone to uncertainty due to limited high-quality data and noisy pose estimation, hindering clinical reliability without reliable uncertainty measures. In this work, we introduce UDF-GMA which explicitly models epistemic uncertainty in model parameters and aleatoric uncertainty from data noise for pose-based automated GMA. UDF-GMA effectively disentangles uncertainties by directly modelling aleatoric uncertainty and estimating epistemic uncertainty through Bayesian approximation. We further propose fusing these uncertainties with the embedded motion representation to enhance class separation. Extensive experiments on the Pmi-GMA benchmark dataset demonstrate the effectiveness and generalisability of the proposed approach in predicting poor repertoire.

cross Spatial and Semantic Embedding Integration for Stereo Sound Event Localization and Detection in Regular Videos

Authors: Davide Berghi, Philip J. B. Jackson

Abstract: This report presents our systems submitted to the audio-only and audio-visual tracks of the DCASE2025 Task 3 Challenge: Stereo Sound Event Localization and Detection (SELD) in Regular Video Content. SELD is a complex task that combines temporal event classification with spatial localization, requiring reasoning across spatial, temporal, and semantic dimensions. The last is arguably the most challenging to model. Traditional SELD architectures rely on multichannel input, which limits their ability to leverage large-scale pre-training due to data constraints. To address this, we enhance standard SELD architectures with semantic information by integrating pre-trained, contrastive language-aligned models: CLAP for audio and OWL-ViT for visual inputs. These embeddings are incorporated into a modified Conformer module tailored for multimodal fusion, which we refer to as the Cross-Modal Conformer. Additionally, we incorporate autocorrelation-based acoustic features to improve distance estimation. We pre-train our models on curated synthetic audio and audio-visual datasets and apply a left-right channel swapping augmentation to further increase the training data. Both our audio-only and audio-visual systems substantially outperform the challenge baselines on the development set, demonstrating the effectiveness of our strategy. Performance is further improved through model ensembling and a visual post-processing step based on human keypoints. Future work will investigate the contribution of each modality and explore architectural variants to further enhance results.

cross Towards Human-in-the-Loop Onset Detection: A Transfer Learning Approach for Maracatu

Authors: Ant\'onio S\'a Pinto

Abstract: We explore transfer learning strategies for musical onset detection in the Afro-Brazilian Maracatu tradition, which features complex rhythmic patterns that challenge conventional models. We adapt two Temporal Convolutional Network architectures: one pre-trained for onset detection (intra-task) and another for beat tracking (inter-task). Using only 5-second annotated snippets per instrument, we fine-tune these models through layer-wise retraining strategies for five traditional percussion instruments. Our results demonstrate significant improvements over baseline performance, with F1 scores reaching up to 0.998 in the intra-task setting and improvements of over 50 percentage points in best-case scenarios. The cross-task adaptation proves particularly effective for time-keeping instruments, where onsets naturally align with beat positions. The optimal fine-tuning configuration varies by instrument, highlighting the importance of instrument-specific adaptation strategies. This approach addresses the challenges of underrepresented musical traditions, offering an efficient human-in-the-loop methodology that minimizes annotation effort while maximizing performance. Our findings contribute to more inclusive music information retrieval tools applicable beyond Western musical contexts.

cross Music Boomerang: Reusing Diffusion Models for Data Augmentation and Audio Manipulation

Authors: Alexander Fichtinger, Jan Schl\"uter, Gerhard Widmer

Abstract: Generative models of music audio are typically used to generate output based solely on a text prompt or melody. Boomerang sampling, recently proposed for the image domain, allows generating output close to an existing example, using any pretrained diffusion model. In this work, we explore its application in the audio domain as a tool for data augmentation or content manipulation. Specifically, implementing Boomerang sampling for Stable Audio Open, we augment training data for a state-of-the-art beat tracker, and attempt to replace musical instruments in recordings. Our results show that the rhythmic structure of existing examples is mostly preserved, that it improves performance of the beat tracker, but only in scenarios of limited training data, and that it can accomplish text-based instrument replacement on monophonic inputs. We publish our implementation to invite experiments on data augmentation in other tasks and explore further applications.

cross Adaptive Slimming for Scalable and Efficient Speech Enhancement

Authors: Riccardo Miccini, Minje Kim, Cl\'ement Laroche, Luca Pezzarossa, Paris Smaragdis

Abstract: Speech enhancement (SE) enables robust speech recognition, real-time communication, hearing aids, and other applications where speech quality is crucial. However, deploying such systems on resource-constrained devices involves choosing a static trade-off between performance and computational efficiency. In this paper, we introduce dynamic slimming to DEMUCS, a popular SE architecture, making it scalable and input-adaptive. Slimming lets the model operate at different utilization factors (UF), each corresponding to a different performance/efficiency trade-off, effectively mimicking multiple model sizes without the extra storage costs. In addition, a router subnet, trained end-to-end with the backbone, determines the optimal UF for the current input. Thus, the system saves resources by adaptively selecting smaller UFs when additional complexity is unnecessary. We show that our solution is Pareto-optimal against individual UFs, confirming the benefits of dynamic routing. When training the proposed dynamically-slimmable model to use 10% of its capacity on average, we obtain the same or better speech quality as the equivalent static 25% utilization while reducing MACs by 29%.

cross When do World Models Successfully Learn Dynamical Systems?

Authors: Edmund Ross, Claudia Drygala, Leonhard Schwarz, Samir Kaiser, Francesca di Mare, Tobias Breiten, Hanno Gottschalk

Abstract: In this work, we explore the use of compact latent representations with learned time dynamics ('World Models') to simulate physical systems. Drawing on concepts from control theory, we propose a theoretical framework that explains why projecting time slices into a low-dimensional space and then concatenating to form a history ('Tokenization') is so effective at learning physics datasets, and characterise when exactly the underlying dynamics admit a reconstruction mapping from the history of previous tokenized frames to the next. To validate these claims, we develop a sequence of models with increasing complexity, starting with least-squares regression and progressing through simple linear layers, shallow adversarial learners, and ultimately full-scale generative adversarial networks (GANs). We evaluate these models on a variety of datasets, including modified forms of the heat and wave equations, the chaotic regime 2D Kuramoto-Sivashinsky equation, and a challenging computational fluid dynamics (CFD) dataset of a 2D K\'arm\'an vortex street around a fixed cylinder, where our model is successfully able to recreate the flow.

cross The Case for Instance-Optimized LLMs in OLAP Databases

Authors: Bardia Mohammadi, Laurent Bindschaedler

Abstract: Large Language Models (LLMs) can enhance analytics systems with powerful data summarization, cleaning, and semantic transformation capabilities. However, deploying LLMs at scale -- processing millions to billions of rows -- remains prohibitively expensive in computation and memory. We present IOLM-DB, a novel system that makes LLM-enhanced database queries practical through query-specific model optimization. Instead of using general-purpose LLMs, IOLM-DB generates lightweight, specialized models tailored to each query's specific needs using representative data samples. IOLM-DB reduces model footprints by up to 76% and increases throughput by up to 3.31$\times$ while maintaining accuracy through aggressive compression techniques, including quantization, sparsification, and structural pruning. We further show how our approach enables higher parallelism on existing hardware and seamlessly supports caching and batching strategies to reduce overheads. Our prototype demonstrates that leveraging LLM queries inside analytics systems is feasible at scale, opening new possibilities for future OLAP applications.

cross A Generative Diffusion Model for Amorphous Materials

Authors: Kai Yang, Daniel Schwalbe-Koda

Abstract: Generative models show great promise for the inverse design of molecules and inorganic crystals, but remain largely ineffective within more complex structures such as amorphous materials. Here, we present a diffusion model that reliably generates amorphous structures up to 1000 times faster than conventional simulations across processing conditions, compositions, and data sources. Generated structures recovered the short- and medium-range order, sampling diversity, and macroscopic properties of silica glass, as validated by simulations and an information-theoretical strategy. Conditional generation allowed sampling large structures at low cooling rates of 10$^{-2}$ K/ps to uncover a ductile-to-brittle transition and mesoporous silica structures. Extension to metallic glassy systems accurately reproduced local structures and properties from both computational and experimental datasets, demonstrating how synthetic data can be generated from characterization results. Our methods provide a roadmap for the design and simulation of amorphous materials previously inaccessible to computational methods.

cross A COMPASS to Model Comparison and Simulation-Based Inference in Galactic Chemical Evolution

Authors: Berkay Gunes, Sven Buder, Tobias Buck

Abstract: We present COMPASS, a novel simulation-based inference framework that combines score-based diffusion models with transformer architectures to jointly perform parameter estimation and Bayesian model comparison across competing Galactic Chemical Evolution (GCE) models. COMPASS handles high-dimensional, incomplete, and variable-size stellar abundance datasets. Applied to high-precision elemental abundance measurements, COMPASS evaluates 40 combinations of nucleosynthetic yield tables. The model strongly favours Asymptotic Giant Branch yields from NuGrid and core-collapse SN yields used in the IllustrisTNG simulation, achieving near-unity cumulative posterior probability. Using the preferred model, we infer a steep high-mass IMF slope and an elevated Supernova Ia normalization, consistent with prior solar neighbourhood studies but now derived from fully amortized Bayesian inference. Our results demonstrate that modern SBI methods can robustly constrain uncertain physics in astrophysical simulators and enable principled model selection when analysing complex, simulation-based data.

cross AI-Driven Cytomorphology Image Synthesis for Medical Diagnostics

Authors: Jan Carreras Boada, Rao Muhammad Umer, Carsten Marr

Abstract: Biomedical datasets often contain a large sample imbalance and are subject to strict privacy constraints, which together hinder the development of accurate machine learning models. One potential solution is to generate synthetic images, as this can improve data availability while preserving patient privacy. However, it remains difficult to generate synthetic images of sufficient quality for training robust classifiers. In this work, we focus on the classification of single white blood cells, a key component in the diagnosis of hematological diseases such as acute myeloid leukemia (AML), a severe blood cancer. We demonstrate how synthetic images generated with a fine-tuned stable diffusion model using LoRA weights when guided by real few-shot samples of the target white blood cell classes, can enhance classifier performance for limited data. When training a ResNet classifier, accuracy increased from 27.3\% to 78.4\% (+51.1\%) by adding 5000 synthetic images per class to a small and highly imbalanced real dataset. For a CLIP-based classifier, the accuracy improved from 61.8\% to 76.8\% (+15.0\%). The synthetic images are highly similar to real images, and they can help overcome dataset limitations, enhancing model generalization. Our results establish synthetic images as a tool in biomedical research, improving machine learning models, and facilitating medical diagnosis and research.

cross Vecchia-Inducing-Points Full-Scale Approximations for Gaussian Processes

Authors: Tim Gyger, Reinhard Furrer, Fabio Sigrist

Abstract: Gaussian processes are flexible, probabilistic, non-parametric models widely used in machine learning and statistics. However, their scalability to large data sets is limited by computational constraints. To overcome these challenges, we propose Vecchia-inducing-points full-scale (VIF) approximations combining the strengths of global inducing points and local Vecchia approximations. Vecchia approximations excel in settings with low-dimensional inputs and moderately smooth covariance functions, while inducing point methods are better suited to high-dimensional inputs and smoother covariance functions. Our VIF approach bridges these two regimes by using an efficient correlation-based neighbor-finding strategy for the Vecchia approximation of the residual process, implemented via a modified cover tree algorithm. We further extend our framework to non-Gaussian likelihoods by introducing iterative methods that substantially reduce computational costs for training and prediction by several orders of magnitudes compared to Cholesky-based computations when using a Laplace approximation. In particular, we propose and compare novel preconditioners and provide theoretical convergence results. Extensive numerical experiments on simulated and real-world data sets show that VIF approximations are both computationally efficient as well as more accurate and numerically stable than state-of-the-art alternatives. All methods are implemented in the open source C++ library GPBoost with high-level Python and R interfaces.

cross Beyond Features: How Dataset Design Influences Multi-Agent Trajectory Prediction Performance

Authors: Tobias Demmler, Jakob H\"aringer, Andreas Tamke, Thao Dang, Alexander Hegai, Lars Mikelsons

Abstract: Accurate trajectory prediction is critical for safe autonomous navigation, yet the impact of dataset design on model performance remains understudied. This work systematically examines how feature selection, cross-dataset transfer, and geographic diversity influence trajectory prediction accuracy in multi-agent settings. We evaluate a state-of-the-art model using our novel L4 Motion Forecasting dataset based on our own data recordings in Germany and the US. This includes enhanced map and agent features. We compare our dataset to the US-centric Argoverse 2 benchmark. First, we find that incorporating supplementary map and agent features unique to our dataset, yields no measurable improvement over baseline features, demonstrating that modern architectures do not need extensive feature sets for optimal performance. The limited features of public datasets are sufficient to capture convoluted interactions without added complexity. Second, we perform cross-dataset experiments to evaluate how effective domain knowledge can be transferred between datasets. Third, we group our dataset by country and check the knowledge transfer between different driving cultures.

cross CLIP-Guided Backdoor Defense through Entropy-Based Poisoned Dataset Separation

Authors: Binyan Xu, Fan Yang, Xilin Dai, Di Tang, Kehuan Zhang

Abstract: Deep Neural Networks (DNNs) are susceptible to backdoor attacks, where adversaries poison training data to implant backdoor into the victim model. Current backdoor defenses on poisoned data often suffer from high computational costs or low effectiveness against advanced attacks like clean-label and clean-image backdoors. To address them, we introduce CLIP-Guided backdoor Defense (CGD), an efficient and effective method that mitigates various backdoor attacks. CGD utilizes a publicly accessible CLIP model to identify inputs that are likely to be clean or poisoned. It then retrains the model with these inputs, using CLIP's logits as a guidance to effectively neutralize the backdoor. Experiments on 4 datasets and 11 attack types demonstrate that CGD reduces attack success rates (ASRs) to below 1% while maintaining clean accuracy (CA) with a maximum drop of only 0.3%, outperforming existing defenses. Additionally, we show that clean-data-based defenses can be adapted to poisoned data using CGD. Also, CGD exhibits strong robustness, maintaining low ASRs even when employing a weaker CLIP model or when CLIP itself is compromised by a backdoor. These findings underscore CGD's exceptional efficiency, effectiveness, and applicability for real-world backdoor defense scenarios. Code: https://github.com/binyxu/CGD.

URLs: https://github.com/binyxu/CGD.

cross LVM4CSI: Enabling Direct Application of Pre-Trained Large Vision Models for Wireless Channel Tasks

Authors: Jiajia Guo, Peiwen Jiang, Chao-Kai Wen, Shi Jin, Jun Zhang

Abstract: Accurate channel state information (CSI) is critical to the performance of wireless communication systems, especially with the increasing scale and complexity introduced by 5G and future 6G technologies. While artificial intelligence (AI) offers a promising approach to CSI acquisition and utilization, existing methods largely depend on task-specific neural networks (NNs) that require expert-driven design and large training datasets, limiting their generalizability and practicality. To address these challenges, we propose LVM4CSI, a general and efficient framework that leverages the structural similarity between CSI and computer vision (CV) data to directly apply large vision models (LVMs) pre-trained on extensive CV datasets to wireless tasks without any fine-tuning, in contrast to large language model-based methods that generally necessitate fine-tuning. LVM4CSI maps CSI tasks to analogous CV tasks, transforms complex-valued CSI into visual formats compatible with LVMs, and integrates lightweight trainable layers to adapt extracted features to specific communication objectives. We validate LVM4CSI through three representative case studies, including channel estimation, human activity recognition, and user localization. Results demonstrate that LVM4CSI achieves comparable or superior performance to task-specific NNs, including an improvement exceeding 9.61 dB in channel estimation and approximately 40% reduction in localization error. Furthermore, it significantly reduces the number of trainable parameters and eliminates the need for task-specific NN design.

cross SMART: Simulated Students Aligned with Item Response Theory for Question Difficulty Prediction

Authors: Alexander Scarlatos, Nigel Fernandez, Christopher Ormerod, Susan Lottridge, Andrew Lan

Abstract: Item (question) difficulties play a crucial role in educational assessments, enabling accurate and efficient assessment of student abilities and personalization to maximize learning outcomes. Traditionally, estimating item difficulties can be costly, requiring real students to respond to items, followed by fitting an item response theory (IRT) model to get item difficulty estimates. This approach cannot be applied to the cold-start setting for previously unseen items either. In this work, we present SMART (Simulated Students Aligned with IRT), a novel method for aligning simulated students with instructed ability, which can then be used in simulations to predict the difficulty of open-ended items. We achieve this alignment using direct preference optimization (DPO), where we form preference pairs based on how likely responses are under a ground-truth IRT model. We perform a simulation by generating thousands of responses, evaluating them with an LLM-based scoring model, and fit the resulting data to an IRT model to obtain item difficulty estimates. Through extensive experiments on a real-world student response dataset, we show that SMART outperforms other item difficulty prediction methods by leveraging its improved ability alignment.

cross VERITAS: Verification and Explanation of Realness in Images for Transparency in AI Systems

Authors: Aadi Srivastava, Vignesh Natarajkumar, Utkarsh Bheemanaboyna, Devisree Akashapu, Nagraj Gaonkar, Archit Joshi

Abstract: The widespread and rapid adoption of AI-generated content, created by models such as Generative Adversarial Networks (GANs) and Diffusion Models, has revolutionized the digital media landscape by allowing efficient and creative content generation. However, these models also blur the difference between real images and AI-generated synthetic images, raising concerns regarding content authenticity and integrity. While many existing solutions to detect fake images focus solely on classification and higher-resolution images, they often lack transparency in their decision-making, making it difficult for users to understand why an image is classified as fake. In this paper, we present VERITAS, a comprehensive framework that not only accurately detects whether a small (32x32) image is AI-generated but also explains why it was classified that way through artifact localization and semantic reasoning. VERITAS produces human-readable explanations that describe key artifacts in synthetic images. We show that this architecture offers clear explanations of the basis of zero-shot synthetic image detection tasks. Code and relevant prompts can be found at https://github.com/V-i-g-n-e-s-h-N/VERITAS .

URLs: https://github.com/V-i-g-n-e-s-h-N/VERITAS

cross Pseudo-likelihood produces associative memories able to generalize, even for asymmetric couplings

Authors: Francesco D'Amico, Dario Bocchi, Luca Maria Del Bono, Saverio Rossi, Matteo Negri

Abstract: Energy-based probabilistic models learned by maximizing the likelihood of the data are limited by the intractability of the partition function. A widely used workaround is to maximize the pseudo-likelihood, which replaces the global normalization with tractable local normalizations. Here we show that, in the zero-temperature limit, a network trained to maximize pseudo-likelihood naturally implements an associative memory: if the training set is small, patterns become fixed-point attractors whose basins of attraction exceed those of any classical Hopfield rule. We explain quantitatively this effect on uncorrelated random patterns. Moreover, we show that, for different structured datasets coming from computer science (random feature model, MNIST), physics (spin glasses) and biology (proteins), as the number of training examples increases the learned network goes beyond memorization, developing meaningful attractors with non-trivial correlations with test examples, thus showing the ability to generalize. Our results therefore reveal pseudo-likelihood works both as an efficient inference tool and as a principled mechanism for memory and generalization.

cross OGF: An Online Gradient Flow Method for Optimizing the Statistical Steady-State Time Averages of Unsteady Turbulent Flows

Authors: Tom Hickling, Jonathan F. MacArt, Justin Sirignano, Den Waidmann

Abstract: Turbulent flows are chaotic and unsteady, but their statistical distribution converges to a statistical steady state. Engineering quantities of interest typically take the form of time-average statistics such as $ \frac{1}{t} \int_0^t f ( u(x,\tau; \theta) ) d\tau \overset{t \rightarrow \infty}{\rightarrow} F(x; \theta)$, where $u(x,t; \theta)$ are solutions of the Navier--Stokes equations with parameters $\theta$. Optimizing over $F(x; \theta)$ has many engineering applications including geometric optimization, flow control, and closure modeling. However, this remains an open challenge, as existing computational approaches are incapable of scaling to physically representative numbers of grid points. The fundamental obstacle is the chaoticity of turbulent flows: gradients calculated with the adjoint method diverge exponentially as $t \rightarrow \infty$. We develop a new online gradient-flow (OGF) method that is scalable to large degree-of-freedom systems and enables optimizing for the steady-state statistics of chaotic, unsteady, turbulence-resolving simulations. The method forward-propagates an online estimate for the gradient of $F(x; \theta)$ while simultaneously performing online updates of the parameters $\theta$. A key feature is the fully online nature of the algorithm to facilitate faster optimization progress and its combination with a finite-difference estimator to avoid the divergence of gradients due to chaoticity. The proposed OGF method is demonstrated for optimizations over three chaotic ordinary and partial differential equations: the Lorenz-63 equation, the Kuramoto--Sivashinsky equation, and Navier--Stokes solutions of compressible, forced, homogeneous isotropic turbulence. In each case, the OGF method successfully reduces the loss based on $F(x; \theta)$ by several orders of magnitude and accurately recovers the optimal parameters.

cross A Dynamical Systems Perspective on the Analysis of Neural Networks

Authors: Dennis Chemnitz, Maximilian Engel, Christian Kuehn, Sara-Viola Kuntz

Abstract: In this chapter, we utilize dynamical systems to analyze several aspects of machine learning algorithms. As an expository contribution we demonstrate how to re-formulate a wide variety of challenges from deep neural networks, (stochastic) gradient descent, and related topics into dynamical statements. We also tackle three concrete challenges. First, we consider the process of information propagation through a neural network, i.e., we study the input-output map for different architectures. We explain the universal embedding property for augmented neural ODEs representing arbitrary functions of given regularity, the classification of multilayer perceptrons and neural ODEs in terms of suitable function classes, and the memory-dependence in neural delay equations. Second, we consider the training aspect of neural networks dynamically. We describe a dynamical systems perspective on gradient descent and study stability for overdetermined problems. We then extend this analysis to the overparameterized setting and describe the edge of stability phenomenon, also in the context of possible explanations for implicit bias. For stochastic gradient descent, we present stability results for the overparameterized setting via Lyapunov exponents of interpolation solutions. Third, we explain several results regarding mean-field limits of neural networks. We describe a result that extends existing techniques to heterogeneous neural networks involving graph limits via digraph measures. This shows how large classes of neural networks naturally fall within the framework of Kuramoto-type models on graphs and their large-graph limits. Finally, we point out that similar strategies to use dynamics to study explainable and reliable AI can also be applied to settings such as generative models or fundamental issues in gradient training methods, such as backpropagation or vanishing/exploding gradients.

cross Blind Targeting: Personalization under Third-Party Privacy Constraints

Authors: Anya Shchetkina

Abstract: Major advertising platforms recently increased privacy protections by limiting advertisers' access to individual-level data. Instead of providing access to granular raw data, the platforms only allow a limited number of aggregate queries to a dataset, which is further protected by adding differentially private noise. This paper studies whether and how advertisers can design effective targeting policies within these restrictive privacy preserving data environments. To achieve this, I develop a probabilistic machine learning method based on Bayesian optimization, which facilitates dynamic data exploration. Since Bayesian optimization was designed to sample points from a function to find its maximum, it is not applicable to aggregate queries and to targeting. Therefore, I introduce two innovations: (i) integral updating of posteriors which allows to select the best regions of the data to query rather than individual points and (ii) a targeting-aware acquisition function that dynamically selects the most informative regions for the targeting task. I identify the conditions of the dataset and privacy environment that necessitate the use of such a "smart" querying strategy. I apply the strategic querying method to the Criteo AI Labs dataset for uplift modeling (Diemert et al., 2018) that contains visit and conversion data from 14M users. I show that an intuitive benchmark strategy only achieves 33% of the non-privacy-preserving targeting potential in some cases, while my strategic querying method achieves 97-101% of that potential, and is statistically indistinguishable from Causal Forest (Athey et al., 2019): a state-of-the-art non-privacy-preserving machine learning targeting method.

cross $\varphi$-Adapt: A Physics-Informed Adaptation Learning Approach to 2D Quantum Material Discovery

Authors: Hoang-Quan Nguyen, Xuan Bac Nguyen, Sankalp Pandey, Tim Faltermeier, Nicholas Borys, Hugh Churchill, Khoa Luu

Abstract: Characterizing quantum flakes is a critical step in quantum hardware engineering because the quality of these flakes directly influences qubit performance. Although computer vision methods for identifying two-dimensional quantum flakes have emerged, they still face significant challenges in estimating flake thickness. These challenges include limited data, poor generalization, sensitivity to domain shifts, and a lack of physical interpretability. In this paper, we introduce one of the first Physics-informed Adaptation Learning approaches to overcome these obstacles. We focus on two main issues, i.e., data scarcity and generalization. First, we propose a new synthetic data generation framework that produces diverse quantum flake samples across various materials and configurations, reducing the need for time-consuming manual collection. Second, we present $\varphi$-Adapt, a physics-informed adaptation method that bridges the performance gap between models trained on synthetic data and those deployed in real-world settings. Experimental results show that our approach achieves state-of-the-art performance on multiple benchmarks, outperforming existing methods. Our proposed approach advances the integration of physics-based modeling and domain adaptation. It also addresses a critical gap in leveraging synthesized data for real-world 2D material analysis, offering impactful tools for deep learning and materials science communities.

cross Pre-Trained Policy Discriminators are General Reward Models

Authors: Shihan Dou, Shichun Liu, Yuming Yang, Yicheng Zou, Yunhua Zhou, Shuhao Xing, Chenhao Huang, Qiming Ge, Demin Song, Haijun Lv, Songyang Gao, Chengqi Lv, Enyu Zhou, Honglin Guo, Zhiheng Xi, Wenwei Zhang, Qipeng Guo, Qi Zhang, Xipeng Qiu, Xuanjing Huang, Tao Gui, Kai Chen

Abstract: We offer a novel perspective on reward modeling by formulating it as a policy discriminator, which quantifies the difference between two policies to generate a reward signal, guiding the training policy towards a target policy with desired behaviors. Based on this conceptual insight, we propose a scalable pre-training method named Policy Discriminative Learning (POLAR), which trains a reward model (RM) to discern identical policies and discriminate different ones. Unlike traditional reward modeling methods relying on absolute preferences, POLAR captures the relative difference between one policy and an arbitrary target policy, which is a scalable, high-level optimization objective suitable for modeling generic ranking relationships. Leveraging the POLAR pre-training paradigm, we present a series of RMs with parameter scales from 1.8B to 7B. Empirical results show that POLAR substantially outperforms traditional non-pre-trained methods, significantly enhancing RM performance. For instance, POLAR-7B could improve preference accuracy from 54.8% to 81.0% on STEM tasks and from 57.9% to 85.5% on creative writing tasks compared to SOTA baselines. POLAR also shows robust generalization capabilities in RLHF using Reinforcement Fine-tuning (RFT), providing reliable reward signals and markedly enhancing policy performance--improving LLaMa3.1-8B from an average of 47.36% to 56.33% and Qwen2.5-32B from 64.49% to 70.47% on 20 benchmarks. Moreover, scaling experiments reveal a clear power-law relationship between computation and performance, supported by linear correlation coefficients approaching 0.99. The impressive performance, strong generalization, and scaling properties suggest that POLAR is a promising direction for developing general and strong reward models.

cross A 3D Machine Learning based Volume Of Fluid scheme without explicit interface reconstruction

Authors: Moreno Pintore, Bruno Despr\'es

Abstract: We present a machine-learning based Volume Of Fluid method to simulate multi-material flows on three-dimensional domains. One of the novelties of the method is that the flux fraction is computed by evaluating a previously trained neural network and without explicitly reconstructing any local interface approximating the exact one. The network is trained on a purely synthetic dataset generated by randomly sampling numerous local interfaces and which can be adapted to improve the scheme on less regular interfaces when needed. Several strategies to ensure the efficiency of the method and the satisfaction of physical constraints and properties are suggested and formalized. Numerical results on the advection equation are provided to show the performance of the method. We observe numerical convergence as the size of the mesh tends to zero $h=1/N_h\searrow 0$, with a better rate than two reference schemes.

cross NavigScene: Bridging Local Perception and Global Navigation for Beyond-Visual-Range Autonomous Driving

Authors: Qucheng Peng, Chen Bai, Guoxiang Zhang, Bo Xu, Xiaotong Liu, Xiaoyin Zheng, Chen Chen, Cheng Lu

Abstract: Autonomous driving systems have made significant advances in Q&A, perception, prediction, and planning based on local visual information, yet they struggle to incorporate broader navigational context that human drivers routinely utilize. We address this critical gap between local sensor data and global navigation information by proposing NavigScene, an auxiliary navigation-guided natural language dataset that simulates a human-like driving environment within autonomous driving systems. Moreover, we develop three complementary paradigms to leverage NavigScene: (1) Navigation-guided Reasoning, which enhances vision-language models by incorporating navigation context into the prompting approach; (2) Navigation-guided Preference Optimization, a reinforcement learning method that extends Direct Preference Optimization to improve vision-language model responses by establishing preferences for navigation-relevant summarized information; and (3) Navigation-guided Vision-Language-Action model, which integrates navigation guidance and vision-language models with conventional driving models through feature fusion. Extensive experiments demonstrate that our approaches significantly improve performance across perception, prediction, planning, and question-answering tasks by enabling reasoning capabilities beyond visual range and improving generalization to diverse driving scenarios. This work represents a significant step toward more comprehensive autonomous driving systems capable of navigating complex, unfamiliar environments with greater reliability and safety.

cross Physics-Guided Dual Implicit Neural Representations for Source Separation

Authors: Yuan Ni, Zhantao Chen, Alexander N. Petsch, Edmund Xu, Cheng Peng, Alexander I. Kolesnikov, Sugata Chowdhury, Arun Bansil, Jana B. Thayer, Joshua J. Turner

Abstract: Significant challenges exist in efficient data analysis of most advanced experimental and observational techniques because the collected signals often include unwanted contributions--such as background and signal distortions--that can obscure the physically relevant information of interest. To address this, we have developed a self-supervised machine-learning approach for source separation using a dual implicit neural representation framework that jointly trains two neural networks: one for approximating distortions of the physical signal of interest and the other for learning the effective background contribution. Our method learns directly from the raw data by minimizing a reconstruction-based loss function without requiring labeled data or pre-defined dictionaries. We demonstrate the effectiveness of our framework by considering a challenging case study involving large-scale simulated as well as experimental momentum-energy-dependent inelastic neutron scattering data in a four-dimensional parameter space, characterized by heterogeneous background contributions and unknown distortions to the target signal. The method is found to successfully separate physically meaningful signals from a complex or structured background even when the signal characteristics vary across all four dimensions of the parameter space. An analytical approach that informs the choice of the regularization parameter is presented. Our method offers a versatile framework for addressing source separation problems across diverse domains, ranging from superimposed signals in astronomical measurements to structural features in biomedical image reconstructions.

cross From Marginal to Joint Predictions: Evaluating Scene-Consistent Trajectory Prediction Approaches for Automated Driving

Authors: Fabian Konstantinidis, Ariel Dallari Guerreiro, Raphael Trumpp, Moritz Sackmann, Ulrich Hofmann, Marco Caccamo, Christoph Stiller

Abstract: Accurate motion prediction of surrounding traffic participants is crucial for the safe and efficient operation of automated vehicles in dynamic environments. Marginal prediction models commonly forecast each agent's future trajectories independently, often leading to sub-optimal planning decisions for an automated vehicle. In contrast, joint prediction models explicitly account for the interactions between agents, yielding socially and physically consistent predictions on a scene level. However, existing approaches differ not only in their problem formulation but also in the model architectures and implementation details used, making it difficult to compare them. In this work, we systematically investigate different approaches to joint motion prediction, including post-processing of the marginal predictions, explicitly training the model for joint predictions, and framing the problem as a generative task. We evaluate each approach in terms of prediction accuracy, multi-modality, and inference efficiency, offering a comprehensive analysis of the strengths and limitations of each approach. Several prediction examples are available at https://frommarginaltojointpred.github.io/.

URLs: https://frommarginaltojointpred.github.io/.

cross Spatio-Temporal LLM: Reasoning about Environments and Actions

Authors: Haozhen Zheng, Beitong Tian, Mingyuan Wu, Zhenggang Tang, Klara Nahrstedt, Alex Schwing

Abstract: Despite the significant recent progress of Multimodal Large Language Models (MLLMs), MLLMs still struggle to correctly answer prompts that require a holistic spatio-temporal understanding. Specifically, it is challenging to address prompts that refer to 1) the entirety of an environment that an agent equipped with an MLLM can operate in; and simultaneously also refer to 2) recent actions that just happened and are encoded in a video clip. However, such a holistic spatio-temporal understanding is important for agents operating in the real world. To address this issue, we first develop a framework to collect a large-scale dataset. Using the collected "Reasoning about Environments and Actions" (REA) dataset, we show that recent methods indeed struggle to correctly answer the prompts. To improve, we develop a "spatio-temporal LLM" (ST-LLM), a model equipped with projectors to improve both spatial understanding of an environment and temporal understanding of recent observations. On the collected REA data, we show that the proposed method significantly improves results compared to prior work. Code and data are available at https://zoezheng126.github.io/STLLM-website/.

URLs: https://zoezheng126.github.io/STLLM-website/.

cross Beyond One Shot, Beyond One Perspective: Cross-View and Long-Horizon Distillation for Better LiDAR Representations

Authors: Xiang Xu, Lingdong Kong, Song Wang, Chuanwei Zhou, Qingshan Liu

Abstract: LiDAR representation learning aims to extract rich structural and semantic information from large-scale, readily available datasets, reducing reliance on costly human annotations. However, existing LiDAR representation strategies often overlook the inherent spatiotemporal cues in LiDAR sequences, limiting their effectiveness. In this work, we propose LiMA, a novel long-term image-to-LiDAR Memory Aggregation framework that explicitly captures longer range temporal correlations to enhance LiDAR representation learning. LiMA comprises three key components: 1) a Cross-View Aggregation module that aligns and fuses overlapping regions across neighboring camera views, constructing a more unified and redundancy-free memory bank; 2) a Long-Term Feature Propagation mechanism that efficiently aligns and integrates multi-frame image features, reinforcing temporal coherence during LiDAR representation learning; and 3) a Cross-Sequence Memory Alignment strategy that enforces consistency across driving sequences, improving generalization to unseen environments. LiMA maintains high pretraining efficiency and incurs no additional computational overhead during downstream tasks. Extensive experiments on mainstream LiDAR-based perception benchmarks demonstrate that LiMA significantly improves both LiDAR semantic segmentation and 3D object detection. We hope this work inspires more effective pretraining paradigms for autonomous driving. The code has be made publicly accessible for future research.

replace Fault Sneaking Attack: a Stealthy Framework for Misleading Deep Neural Networks

Authors: Pu Zhao, Siyue Wang, Cheng Gongye, Yanzhi Wang, Yunsi Fei, Xue Lin

Abstract: Despite the great achievements of deep neural networks (DNNs), the vulnerability of state-of-the-art DNNs raises security concerns of DNNs in many application domains requiring high reliability.We propose the fault sneaking attack on DNNs, where the adversary aims to misclassify certain input images into any target labels by modifying the DNN parameters. We apply ADMM (alternating direction method of multipliers) for solving the optimization problem of the fault sneaking attack with two constraints: 1) the classification of the other images should be unchanged and 2) the parameter modifications should be minimized. Specifically, the first constraint requires us not only to inject designated faults (misclassifications), but also to hide the faults for stealthy or sneaking considerations by maintaining model accuracy. The second constraint requires us to minimize the parameter modifications (using L0 norm to measure the number of modifications and L2 norm to measure the magnitude of modifications). Comprehensive experimental evaluation demonstrates that the proposed framework can inject multiple sneaking faults without losing the overall test accuracy performance.

replace Federated Learning for Big Data: A Survey on Opportunities, Applications, and Future Directions

Authors: Thippa Reddy Gadekallu, Quoc-Viet Pham, Thien Huynh-The, Hailin Feng, Kai Fang, Sharnil Pandya, Madhusanka Liyanage, Wei Wang, Thanh Thi Nguyen

Abstract: In the recent years, generation of data have escalated to extensive dimensions and big data has emerged as a propelling force in the development of various machine learning advances and internet-of-things (IoT) devices. In this regard, the analytical and learning tools that transport data from several sources to a central cloud for its processing, training, and storage enable realization of the potential of big data. Nevertheless, since the data may contain sensitive information like banking account information, government information, and personal information, these traditional techniques often raise serious privacy concerns. To overcome such challenges, Federated Learning (FL) emerges as a sub-field of machine learning that focuses on scenarios where several entities (commonly termed as clients) work together to train a model while maintaining the decentralisation of their data. Although enormous efforts have been channelized for such studies, there still exists a gap in the literature wherein an extensive review of FL in the realm of big data services remains unexplored. The present paper thus emphasizes on the use of FL in handling big data and related services which encompasses comprehensive review of the potential of FL in big data acquisition, storage, big data analytics and further privacy preservation. Subsequently, the potential of FL in big data applications, such as smart city, smart healthcare, smart transportation, smart grid, and social media are also explored. The paper also highlights various projects pertaining to FL-big data and discusses the associated challenges related to such implementations. This acts as a direction of further research encouraging the development of plausible solutions.

replace Selective Prediction via Training Dynamics

Authors: Stephan Rabanser, Anvith Thudi, Kimia Hamidieh, Adam Dziedzic, Israfil Bahceci, Akram Bin Sediq, Hamza Sokun, Nicolas Papernot

Abstract: Selective Prediction is the task of rejecting inputs a model would predict incorrectly on. This involves a trade-off between input space coverage (how many data points are accepted) and model utility (how good is the performance on accepted data points). Current methods for selective prediction typically impose constraints on either the model architecture or the optimization objective; this inhibits their usage in practice and introduces unknown interactions with pre-existing loss functions. In contrast to prior work, we show that state-of-the-art selective prediction performance can be attained solely from studying the (discretized) training dynamics of a model. We propose a general framework that, given a test input, monitors metrics capturing the instability of predictions from intermediate models (i.e., checkpoints) obtained during training w.r.t. the final model's prediction. In particular, we reject data points exhibiting too much disagreement with the final prediction at late stages in training. The proposed rejection mechanism is domain-agnostic (i.e., it works for both discrete and real-valued prediction) and can be flexibly combined with existing selective prediction approaches as it does not require any train-time modifications. Our experimental evaluation on image classification, regression, and time series problems shows that our method beats past state-of-the-art accuracy/utility trade-offs on typical selective prediction benchmarks.

replace Normality-Guided Distributional Reinforcement Learning for Continuous Control

Authors: Ju-Seung Byun, Andrew Perrault

Abstract: Learning a predictive model of the mean return, or value function, plays a critical role in many reinforcement learning algorithms. Distributional reinforcement learning (DRL) has been shown to improve performance by modeling the value distribution, not just the mean. We study the value distribution in several continuous control tasks and find that the learned value distribution is empirically quite close to normal. We design a method that exploits this property, employing variances predicted from a variance network, along with returns, to analytically compute target quantile bars representing a normal for our distributional value function. In addition, we propose a policy update strategy based on the correctness as measured by structural characteristics of the value distribution not present in the standard value function. The approach we outline is compatible with many DRL structures. We use two representative on-policy algorithms, PPO and TRPO, as testbeds. Our method yields statistically significant improvements in 10 out of 16 continuous task settings, while utilizing a reduced number of weights and achieving faster training time compared to an ensemble-based method for quantifying value distribution uncertainty.

replace Predicting Drivers' Route Trajectories in Last-Mile Delivery Using A Pair-wise Attention-based Pointer Neural Network

Authors: Baichuan Mo, Qing Yi Wang, Xiaotong Guo, Matthias Winkenbach, Jinhua Zhao

Abstract: In last-mile delivery, drivers frequently deviate from planned delivery routes because of their tacit knowledge of the road and curbside infrastructure, customer availability, and other characteristics of the respective service areas. Hence, the actual stop sequences chosen by an experienced human driver may be potentially preferable to the theoretical shortest-distance routing under real-life operational conditions. Thus, being able to predict the actual stop sequence that a human driver would follow can help to improve route planning in last-mile delivery. This paper proposes a pair-wise attention-based pointer neural network for this prediction task using drivers' historical delivery trajectory data. In addition to the commonly used encoder-decoder architecture for sequence-to-sequence prediction, we propose a new attention mechanism based on an alternative specific neural network to capture the local pair-wise information for each pair of stops. To further capture the global efficiency of the route, we propose a new iterative sequence generation algorithm that is used after model training to identify the first stop of a route that yields the lowest operational cost. Results from an extensive case study on real operational data from Amazon's last-mile delivery operations in the US show that our proposed method can significantly outperform traditional optimization-based approaches and other machine learning methods (such as the Long Short-Term Memory encoder-decoder and the original pointer network) in finding stop sequences that are closer to high-quality routes executed by experienced drivers in the field. Compared to benchmark models, the proposed model can increase the average prediction accuracy of the first four stops from around 0.229 to 0.312, and reduce the disparity between the predicted route and the actual route by around 15%.

replace Continual Visual Reinforcement Learning with A Life-Long World Model

Authors: Minting Pan, Wendong Zhang, Geng Chen, Xiangming Zhu, Siyu Gao, Yunbo Wang, Xiaokang Yang

Abstract: Learning physical dynamics in a series of non-stationary environments is a challenging but essential task for model-based reinforcement learning (MBRL) with visual inputs. It requires the agent to consistently adapt to novel tasks without forgetting previous knowledge. In this paper, we present a new continual learning approach for visual dynamics modeling and explore its efficacy in visual control. The key assumption is that an ideal world model can provide a non-forgetting environment simulator, which enables the agent to optimize the policy in a multi-task learning manner based on the imagined trajectories from the world model. To this end, we first introduce the life-long world model, which learns task-specific latent dynamics using a mixture of Gaussians and incorporates generative experience replay to mitigate catastrophic forgetting. Then, we further address the value estimation challenge for previous tasks with the exploratory-conservative behavior learning approach. Our model remarkably outperforms the straightforward combinations of existing continual learning and visual RL algorithms on DeepMind Control Suite and Meta-World benchmarks with continual visual control tasks.

replace Fully Automatic Neural Network Reduction for Formal Verification

Authors: Tobias Ladner, Matthias Althoff

Abstract: Formal verification of neural networks is essential before their deployment in safety-critical applications. However, existing methods for formally verifying neural networks are not yet scalable enough to handle practical problems under strict time constraints. We address this challenge by introducing a fully automatic and sound reduction of neural networks using reachability analysis. The soundness ensures that the verification of the reduced network entails the verification of the original network. Our sound reduction approach is applicable to neural networks with any type of element-wise activation function, such as ReLU, sigmoid, and tanh. The network reduction is computed on the fly while simultaneously verifying the original network and its specification. All parameters are automatically tuned to minimize the network size without compromising verifiability. We further show the applicability of our approach to convolutional neural networks by explicitly exploiting similar neighboring pixels. Our evaluation shows that our approach reduces large neural networks to a fraction of the original number of neurons and thus shortens the verification time to a similar degree.

replace Learning Differentiable Logic Programs for Abstract Visual Reasoning

Authors: Hikaru Shindo, Viktor Pfanschilling, Devendra Singh Dhami, Kristian Kersting

Abstract: Visual reasoning is essential for building intelligent agents that understand the world and perform problem-solving beyond perception. Differentiable forward reasoning has been developed to integrate reasoning with gradient-based machine learning paradigms. However, due to the memory intensity, most existing approaches do not bring the best of the expressivity of first-order logic, excluding a crucial ability to solve abstract visual reasoning, where agents need to perform reasoning by using analogies on abstract concepts in different scenarios. To overcome this problem, we propose NEUro-symbolic Message-pAssiNg reasoNer (NEUMANN), which is a graph-based differentiable forward reasoner, passing messages in a memory-efficient manner and handling structured programs with functors. Moreover, we propose a computationally-efficient structure learning algorithm to perform explanatory program induction on complex visual scenes. To evaluate, in addition to conventional visual reasoning tasks, we propose a new task, visual reasoning behind-the-scenes, where agents need to learn abstract programs and then answer queries by imagining scenes that are not observed. We empirically demonstrate that NEUMANN solves visual reasoning tasks efficiently, outperforming neural, symbolic, and neuro-symbolic baselines.

replace Efficient Finite Initialization with Partial Norms for Tensorized Neural Networks and Tensor Networks Algorithms

Authors: Alejandro Mata Ali, I\~nigo Perez Delgado, Marina Ristol Roura, Aitor Moreno Fdez. de Leceta

Abstract: We present two algorithms to initialize layers of tensorized neural networks and general tensor network algorithms using partial computations of their Frobenius norms and lineal entrywise norms, depending on the type of tensor network involved. The core of this method is the use of the norm of subnetworks of the tensor network in an iterative way, so that we normalize by the finite values of the norms that led to the divergence or zero norm. In addition, the method benefits from the reuse of intermediate calculations. We have also applied it to the Matrix Product State/Tensor Train (MPS/TT) and Matrix Product Operator/Tensor Train Matrix (MPO/TT-M) layers and have seen its scaling versus the number of nodes, bond dimension, and physical dimension. All code is publicly available.

replace Transformers Can Solve Non-Linear and Non-Markovian Filtering Problems in Continuous Time For Conditionally Gaussian Signals

Authors: Blanka Horvath, Anastasis Kratsios, Yannick Limmer, Xuwei Yang

Abstract: The use of attention-based deep learning models in stochastic filtering, e.g.\ transformers and deep Kalman filters, has recently come into focus; however, the potential for these models to solve stochastic filtering problems remains largely unknown. The paper provides an affirmative answer to this open problem in the theoretical foundations of machine learning by showing that a class of continuous-time transformer models, called \textit{filterformers}, can approximately implement the conditional law of a broad class of non-Markovian and conditionally Gaussian signal processes given noisy continuous-time (possibly non-Gaussian) measurements. Our approximation guarantees hold uniformly over sufficiently regular compact subsets of continuous-time paths, where the worst-case 2-Wasserstein distance between the true optimal filter and our deep learning model quantifies the approximation error. Our construction relies on two new customizations of the standard attention mechanism: The first can losslessly adapt to the characteristics of a broad range of paths since we show that the attention mechanism implements bi-Lipschitz embeddings of sufficiently regular sets of paths into low-dimensional Euclidean spaces; thus, it incurs no ``dimension reduction error''. The latter attention mechanism is tailored to the geometry of Gaussian measures in the $2$-Wasserstein space. Our analysis relies on new stability estimates of robust optimal filters in the conditionally Gaussian setting.

replace EDALearn: A Comprehensive RTL-to-Signoff EDA Benchmark for Democratized and Reproducible ML for EDA Research

Authors: Jingyu Pan, Chen-Chia Chang, Zhiyao Xie, Yiran Chen, Hai Li

Abstract: The application of Machine Learning (ML) in Electronic Design Automation (EDA) for Very Large-Scale Integration (VLSI) design has garnered significant research attention. Despite the requirement for extensive datasets to build effective ML models, most studies are limited to smaller, internally generated datasets due to the lack of comprehensive public resources. In response, we introduce EDALearn, the first holistic, open-source benchmark suite specifically for ML tasks in EDA. This benchmark suite presents an end-to-end flow from synthesis to physical implementation, enriching data collection across various stages. It fosters reproducibility and promotes research into ML transferability across different technology nodes. Accommodating a wide range of VLSI design instances and sizes, our benchmark aptly represents the complexity of contemporary VLSI designs. Additionally, we provide an in-depth data analysis, enabling users to fully comprehend the attributes and distribution of our data, which is essential for creating efficient ML models. Our contributions aim to encourage further advances in the ML-EDA domain.

replace Shapley-Based Data Valuation with Mutual Information: A Key to Modified K-Nearest Neighbors

Authors: Mohammad Ali Vahedifar, Azim Akhtarshenas, Mohammad Mohammadi Rafatpanah, Maryam Sabbaghian

Abstract: The K-Nearest Neighbors (KNN) algorithm is widely used for classification and regression; however, it suffers from limitations, including the equal treatment of all samples. We propose Information-Modified KNN (IM-KNN), a novel approach that leverages Mutual Information ($\mathcal{I}$) and Shapley values to assign weighted values to neighbors, thereby bridging the gap in treating all samples with the same value and weight. On average, IM-KNN improves the accuracy, precision, and recall of traditional KNN by 16.80\%, 17.08\%, and 16.98\%, respectively, across 12 benchmark datasets. Experiments on four large-scale datasets further highlight IM-KNN's robustness to noise, imbalanced data, and skewed distributions.

replace Forecast Evaluation and the Relationship of Regret and Calibration

Authors: Rabanus Derr, Robert C. Williamson

Abstract: Machine learning is about forecasting. When the forecasts come with an evaluation metric the forecasts become useful. What are reasonable evaluation metrics? How do existing evaluation metrics relate? In this work, we provide a general structure which subsumes many currently used evaluation metrics in a two-dimensional hierarchy, e.g., external and swap regret, loss scores, and calibration scores. The framework embeds those evaluation metrics in a large set of single-instance-based comparisons of forecasts and observations which respect a meta-criterion for reasonable forecast evaluations which we term ``fairness''. In particular, this framework sheds light on the relationship on regret-type and calibration-type evaluation metrics showing a theoretical equivalence in their ability to evaluate, but practical incomparability of the obtained scores.

replace Score-based Diffusion Models via Stochastic Differential Equations -- a Technical Tutorial

Authors: Wenpin Tang, Hanyang Zhao

Abstract: This is an expository article on the score-based diffusion models, with a particular focus on the formulation via stochastic differential equations (SDE). After a gentle introduction, we discuss the two pillars in the diffusion modeling -- sampling and score matching, which encompass the SDE/ODE sampling, score matching efficiency, the consistency models, and reinforcement learning. Short proofs are given to illustrate the main idea of the stated results. The article is primarily a technical introduction to the field, and practitioners may also find some analysis useful in designing new models or algorithms.

replace Privacy-Preserving Collaborative Split Learning Framework for Smart Grid Load Forecasting

Authors: Asif Iqbal, Prosanta Gope, Biplab Sikdar

Abstract: Accurate load forecasting is crucial for energy management, infrastructure planning, and demand-supply balancing. Smart meter data availability has led to the demand for sensor-based load forecasting. Conventional ML allows training a single global model using data from multiple smart meters requiring data transfer to a central server, raising concerns for network requirements, privacy, and security. We propose a split learning-based framework for load forecasting to alleviate this issue. We split a deep neural network model into two parts, one for each Grid Station (GS) responsible for an entire neighbourhood's smart meters and the other for the Service Provider (SP). Instead of sharing their data, client smart meters use their respective GSs' model split for forward pass and only share their activations with the GS. Under this framework, each GS is responsible for training a personalized model split for their respective neighbourhoods, whereas the SP can train a single global or personalized model for each GS. Experiments show that the proposed models match or exceed a centrally trained model's performance and generalize well. Privacy is analyzed by assessing information leakage between data and shared activations of the GS model split. Additionally, differential privacy enhances local data privacy while examining its impact on performance. A transformer model is used as our base learner.

replace Tensor-based Graph Learning with Consistency and Specificity for Multi-view Clustering

Authors: Long Shi, Lei Cao, Yunshan Ye, Yu Zhao, Badong Chen

Abstract: In the context of multi-view clustering, graph learning is recognized as a crucial technique, which generally involves constructing an adaptive neighbor graph based on probabilistic neighbors, and then learning a consensus graph for clustering. However, it is worth noting that these graph learning methods encounter two significant limitations. Firstly, they often rely on Euclidean distance to measure similarity when constructing the adaptive neighbor graph, which proves inadequate in capturing the intrinsic structure among data points in practice, particularly for high-dimensional data. Secondly, most of these methods focus solely on consensus graph, ignoring unique information from each view. Although a few graph-based studies have considered using specific information as well, the modelling approach employed does not exclude the noise impact from the common or specific components. To this end, we propose a novel tensor-based multi-view graph learning framework that simultaneously considers consistency and specificity, while effectively eliminating the influence of noise. Specifically, we calculate similarity using pseudo-Stiefel manifold distance to preserve the intrinsic properties of data. By making an assumption that the learned neighbor graph of each view comprises a consistent part, a specific part, and a noise part, we formulate a new tensor-based target graph learning paradigm for noise-free graph fusion. Owing to the benefits of tensor singular value decomposition (t-SVD) in uncovering high-order correlations, this model is capable of achieving a comprehensive understanding of the target graph. Furthermore, we derive an algorithm to address the optimization problem. Experiments on six datasets have demonstrated the superiority of our method. We have released the source code on https://github.com/lshi91/CSTGL-Code.

URLs: https://github.com/lshi91/CSTGL-Code.

replace ST-LoRA: Low-rank Adaptation for Spatio-Temporal Forecasting

Authors: Weilin Ruan, Wei Chen, Xilin Dang, Jianxiang Zhou, Weichuang Li, Xu Liu, Yuxuan Liang

Abstract: Spatio-temporal forecasting is essential for understanding future dynamics within real-world systems by leveraging historical data from multiple locations. Existing methods often prioritize the development of intricate neural networks to capture the complex dependencies of the data. These methods neglect node-level heterogeneity and face over-parameterization when attempting to model node-specific characteristics. In this paper, we present a novel low-rank adaptation framework for existing spatio-temporal prediction models, termed \model, which alleviates the aforementioned problems through node-level adjustments. Specifically, we introduce the node-adaptive low-rank layer and node-specific predictor, capturing the complex functional characteristics of nodes while maintaining computational efficiency. Extensive experiments on multiple real-world datasets demonstrate that our method consistently achieves superior performance across various forecasting models with minimal computational overhead, improving performance by 7% with only 1% additional parameter cost. The source code is available at https://github.com/RWLinno/ST-LoRA.

URLs: https://github.com/RWLinno/ST-LoRA.

replace Curvature-Aligned Federated Learning (CAFe): Harmonizing Loss Landscapes for Fairness Without Demographics

Authors: Shaily Roy, Harshit Sharma, Asif Salekin

Abstract: Federated Learning (FL) enables privacy-preserving collaborative training, making it well-suited for decentralized human-sensing applications. Ensuring fairness in FL is challenging, as current methods rely on sensitive attribute knowledge, which conflicts with FL's privacy principles. Additionally, sensitive attributes in human-sensing data may be unknown or latent. To address this, we introduce Curvature-Aligned Federated Learning (CAFe), a theoretically grounded approach that achieves fairness in FL without requiring sensitive attribute knowledge, a concept termed "Fairness without Demographics" (FWD). CAFe introduces loss-landscape curvature regularization during local training and clients' loss-landscape sharpness-aware aggregation to align curvature both within and across clients, enabling a strong balance between higher fairness and performance. CAFe is especially suitable for real-world human-sensing FL scenarios involving single or multi-user edge devices with unknown or multiple bias factors. We validated CAFe through theoretical and empirical justifications, and comprehensive evaluations using three real-world datasets and a live real-world FL deployment with a heterogeneous testbed of resource-constrained devices. Additionally, we conduct sensitivity analyses on local training data volume, client sampling, communication overhead, resource costs, and runtime performance to demonstrate its feasibility for practical FL edge device deployment.

replace Boosting Robustness in Preference-Based Reinforcement Learning with Dynamic Sparsity

Authors: Calarina Muslimani, Bram Grooten, Deepak Ranganatha Sastry Mamillapalli, Mykola Pechenizkiy, Decebal Constantin Mocanu, Matthew E. Taylor

Abstract: To integrate into human-centered environments, autonomous agents must learn from and adapt to humans in their native settings. Preference-based reinforcement learning (PbRL) can enable this by learning reward functions from human preferences. However, humans live in a world full of diverse information, most of which is irrelevant to completing any particular task. It then becomes essential that agents learn to focus on the subset of task-relevant state features. To that end, this work proposes R2N (Robust-to-Noise), the first PbRL algorithm that leverages principles of dynamic sparse training to learn robust reward models that can focus on task-relevant features. In experiments with a simulated teacher, we demonstrate that R2N can adapt the sparse connectivity of its neural networks to focus on task-relevant features, enabling R2N to significantly outperform several sparse training and PbRL algorithms across simulated robotic environments.

replace Scaling Value Iteration Networks to 5000 Layers for Extreme Long-Term Planning

Authors: Yuhui Wang, Qingyuan Wu, Dylan R. Ashley, Francesco Faccio, Weida Li, Chao Huang, J\"urgen Schmidhuber

Abstract: The Value Iteration Network (VIN) is an end-to-end differentiable neural network architecture for planning. It exhibits strong generalization to unseen domains by incorporating a differentiable planning module that operates on a latent Markov Decision Process (MDP). However, VINs struggle to scale to long-term and large-scale planning tasks, such as navigating a 100x100 maze -- a task that typically requires thousands of planning steps to solve. We observe that this deficiency is due to two issues: the representation capacity of the latent MDP and the planning module's depth. We address these by augmenting the latent MDP with a dynamic transition kernel, dramatically improving its representational capacity, and, to mitigate the vanishing gradient problem, introduce an "adaptive highway loss" that constructs skip connections to improve gradient flow. We evaluate our method on 2D/3D maze navigation environments, continuous control, and the real-world Lunar rover navigation task. We find that our new method, named Dynamic Transition VIN (DT-VIN), scales to 5000 layers and solves challenging versions of the above tasks. Altogether, we believe that DT-VIN represents a concrete step forward in performing long-term large-scale planning in complex environments.

replace LaCoOT: Layer Collapse through Optimal Transport

Authors: Victor Qu\'etu, Nour Hezbri, Enzo Tartaglione

Abstract: Although deep neural networks are well-known for their outstanding performance in tackling complex tasks, their hunger for computational resources remains a significant hurdle, posing energy-consumption issues and restricting their deployment on resource-constrained devices, preventing their widespread adoption. In this paper, we present an optimal transport-based method to reduce the depth of over-parametrized deep neural networks, alleviating their computational burden. More specifically, we propose a new regularization strategy based on the Max-Sliced Wasserstein distance to minimize the distance between the intermediate feature distributions in the neural network. We show that minimizing this distance enables the complete removal of intermediate layers in the network, achieving better performance/depth trade-off compared to existing techniques. We assess the effectiveness of our method on traditional image classification setups and extend it to generative image models. Our code is available at https://github.com/VGCQ/LaCoOT.

URLs: https://github.com/VGCQ/LaCoOT.

replace Predicting the Susceptibility of Examples to Catastrophic Forgetting

Authors: Guy Hacohen, Tinne Tuytelaars

Abstract: Catastrophic forgetting - the tendency of neural networks to forget previously learned data when learning new information - remains a central challenge in continual learning. In this work, we adopt a behavioral approach, observing a connection between learning speed and forgetting: examples learned more quickly are less prone to forgetting. Focusing on replay-based continual learning, we show that the composition of the replay buffer - specifically, whether it contains quickly or slowly learned examples - has a significant effect on forgetting. Motivated by this insight, we introduce Speed-Based Sampling (SBS), a simple yet general strategy that selects replay examples based on their learning speed. SBS integrates easily into existing buffer-based methods and improves performance across a wide range of competitive continual learning benchmarks, advancing state-of-the-art results. Our findings underscore the value of accounting for the forgetting dynamics when designing continual learning algorithms.

replace Backdooring Bias ($B^2$) into Stable Diffusion Models

Authors: Ali Naseh, Jaechul Roh, Eugene Bagdasarian, Amir Houmansadr

Abstract: Recent advances in large text-conditional diffusion models have revolutionized image generation by enabling users to create realistic, high-quality images from textual prompts, significantly enhancing artistic creation and visual communication. However, these advancements also introduce an underexplored attack opportunity: the possibility of inducing biases by an adversary into the generated images for malicious intentions, e.g., to influence public opinion and spread propaganda. In this paper, we study an attack vector that allows an adversary to inject arbitrary bias into a target model. The attack leverages low-cost backdooring techniques using a targeted set of natural textual triggers embedded within a small number of malicious data samples produced with public generative models. An adversary could pick common sequences of words that can then be inadvertently activated by benign users during inference. We investigate the feasibility and challenges of such attacks, demonstrating how modern generative models have made this adversarial process both easier and more adaptable. On the other hand, we explore various aspects of the detectability of such attacks and demonstrate that the model's utility remains intact in the absence of the triggers. Our extensive experiments using over 200,000 generated images and against hundreds of fine-tuned models demonstrate the feasibility of the presented backdoor attack. We illustrate how these biases maintain strong text-image alignment, highlighting the challenges in detecting biased images without knowing that bias in advance. Our cost analysis confirms the low financial barrier (\$10-\$15) to executing such attacks, underscoring the need for robust defensive strategies against such vulnerabilities in diffusion models.

replace How more data can hurt: Instability and regularization in next-generation reservoir computing

Authors: Yuanzhao Zhang, Edmilson Roque dos Santos, Huixin Zhang, Sean P. Cornelius

Abstract: It has been found recently that more data can, counter-intuitively, hurt the performance of deep neural networks. Here, we show that a more extreme version of the phenomenon occurs in data-driven models of dynamical systems. To elucidate the underlying mechanism, we focus on next-generation reservoir computing (NGRC) -- a popular framework for learning dynamics from data. We find that, despite learning a better representation of the flow map with more training data, NGRC can adopt an ill-conditioned ``integrator'' and lose stability. We link this data-induced instability to the auxiliary dimensions created by the delayed states in NGRC. Based on these findings, we propose simple strategies to mitigate the instability, either by increasing regularization strength in tandem with data size, or by carefully introducing noise during training. Our results highlight the importance of proper regularization in data-driven modeling of dynamical systems.

replace Improving Graph Out-of-distribution Generalization Beyond Causality

Authors: Can Xu, Yao Cheng, Jianxiang Yu, Haosen Wang, Jingsong Lv, Yao Liu, Xiang Li

Abstract: Existing methods for graph out-of-distribution (OOD) generalization primarily rely on empirical studies on synthetic datasets. Such approaches tend to overemphasize the causal relationships between invariant sub-graphs and labels, thereby neglecting the non-negligible role of environment in real-world scenarios. In contrast to previous studies that impose rigid independence assumptions on environments and invariant sub-graphs, this paper presents the theorems of environment-label dependency and mutable rationale invariance, where the former characterizes the usefulness of environments in determining graph labels while the latter refers to the mutable importance of graph rationales. Based on analytic investigations, a novel variational inference based method named ``Probability Dependency on Environments and Rationales for OOD Graphs on Real-world Data'' (DEROG) is introduced. To alleviate the adverse effect of unknown prior knowledge on environments and rationales, DEROG utilizes generalized Bayesian inference. Further, DEROG employs an EM-based algorithm for optimization. Finally, extensive experiments on real-world datasets under different distribution shifts are conducted to show the superiority of DEROG. Our code is publicly available at https://github.com/LEOXC1571/DEROG.

URLs: https://github.com/LEOXC1571/DEROG.

replace Interface Laplace Learning: Learnable Interface Term Helps Semi-Supervised Learning

Authors: Tangjun Wang, Chenglong Bao, Zuoqiang Shi

Abstract: We introduce a novel framework, called Interface Laplace learning, for graph-based semi-supervised learning. Motivated by the observation that an interface should exist between different classes where the function value is non-smooth, we introduce a Laplace learning model that incorporates an interface term. This model challenges the long-standing assumption that functions are smooth at all unlabeled points. In the proposed approach, we add an interface term to the Laplace learning model at the interface positions. We provide a practical algorithm to approximate the interface positions using k-hop neighborhood indices, and to learn the interface term from labeled data without artificial design. Our method is efficient and effective, and we present extensive experiments demonstrating that Interface Laplace learning achieves better performance than other recent semi-supervised learning approaches at extremely low label rates on the MNIST, FashionMNIST, and CIFAR-10 datasets.

replace Balancing Act: Prioritization Strategies for LLM-Designed Restless Bandit Rewards

Authors: Shresth Verma, Niclas Boehmer, Lingkai Kong, Milind Tambe

Abstract: LLMs are increasingly used to design reward functions based on human preferences in Reinforcement Learning (RL). We focus on LLM-designed rewards for Restless Multi-Armed Bandits, a framework for allocating limited resources among agents. In applications such as public health, this approach empowers grassroots health workers to tailor automated allocation decisions to community needs. In the presence of multiple agents, altering the reward function based on human preferences can impact subpopulations very differently, leading to complex tradeoffs and a multi-objective resource allocation problem. We are the first to present a principled method termed Social Choice Language Model for dealing with these tradeoffs for LLM-designed rewards for multiagent planners in general and restless bandits in particular. The novel part of our model is a transparent and configurable selection component, called an adjudicator, external to the LLM that controls complex tradeoffs via a user-selected social welfare function. Our experiments demonstrate that our model reliably selects more effective, aligned, and balanced reward functions compared to purely LLM-based approaches.

replace Diffusion Models Learn Low-Dimensional Distributions via Subspace Clustering

Authors: Peng Wang, Huijie Zhang, Zekai Zhang, Siyi Chen, Yi Ma, Qing Qu

Abstract: Recent empirical studies have demonstrated that diffusion models can effectively learn the image distribution and generate new samples. Remarkably, these models can achieve this even with a small number of training samples despite a large image dimension, circumventing the curse of dimensionality. In this work, we provide theoretical insights into this phenomenon by leveraging key empirical observations: (i) the low intrinsic dimensionality of image data, (ii) a union of manifold structure of image data, and (iii) the low-rank property of the denoising autoencoder in trained diffusion models. These observations motivate us to assume the underlying data distribution of image data as a mixture of low-rank Gaussians and to parameterize the denoising autoencoder as a low-rank model according to the score function of the assumed distribution. With these setups, we rigorously show that optimizing the training loss of diffusion models is equivalent to solving the canonical subspace clustering problem over the training samples. Based on this equivalence, we further show that the minimal number of samples required to learn the underlying distribution scales linearly with the intrinsic dimensions under the above data and model assumptions. This insight sheds light on why diffusion models can break the curse of dimensionality and exhibit the phase transition in learning distributions. Moreover, we empirically establish a correspondence between the subspaces and the semantic representations of image data, facilitating image editing. We validate these results with corroborated experimental results on both simulated distributions and image datasets.

replace Finetuning CLIP to Reason about Pairwise Differences

Authors: Dylan Sam, Devin Willmott, Joao D. Semedo, J. Zico Kolter

Abstract: Vision-language models (VLMs) such as CLIP are trained via contrastive learning between text and image pairs, resulting in aligned image and text embeddings that are useful for many downstream tasks. A notable drawback of CLIP, however, is that the resulting embedding space seems to lack some of the structure of its purely text-based alternatives. For instance, while text embeddings have long been noted to satisfy analogies in embedding space using vector arithmetic, CLIP has no such property. In this paper, we propose an approach to natively train CLIP in a contrastive manner to reason about differences in embedding space. We finetune CLIP so that text descriptions of differences between images correspond to their difference in image embedding space, using synthetically generated data with large language models on image-caption paired datasets. We first demonstrate that our approach yields significantly improved capabilities in ranking images by a certain attribute (e.g., elephants are larger than cats), which is useful in retrieval or constructing attribute-based classifiers, and improved zeroshot classification performance on many downstream image classification tasks. In addition, our approach enables a new mechanism for inference that we refer to as comparative prompting, where we leverage prior knowledge of text descriptions of differences between classes of interest, achieving even larger performance gains in classification. Finally, we illustrate that the resulting embeddings obey a larger degree of geometric properties in embedding space, such as in text-to-image generation.

replace Time Distributed Deep Learning Models for Purely Exogenous Forecasting: Application to Water Table Depth Prediction using Weather Image Time Series

Authors: Matteo Salis, Abdourrahmane M. Atto, Stefano Ferraris, Rosa Meo

Abstract: Groundwater resources are one of the most relevant elements in the water cycle, therefore developing models to accurately predict them is a pivotal task in the sustainable resource management framework. Deep Learning (DL) models have been revealed to be very effective in hydrology, especially by feeding spatially distributed data (e.g. raster data). In many regions, hydrological measurements are difficult to obtain regularly or periodically in time, and in some cases, the last available data are not up to date. Reversely, weather data, which significantly impacts water resources, are usually more available and with higher quality. More specifically, we have proposed two different DL models to predict the water table depth in the Grana-Maira catchment (Piemonte, IT) using only exogenous weather image time series. To deal with the image time series, both models are made of a first Time Distributed Convolutional Neural Network (TDC) which encodes the image available at each time step into a vectorial representation. The first model, TDC-LSTM uses then a Sequential Module based on an LSTM layer to learn temporal relations and output the predictions. The second model, TDC-UnPWaveNet uses instead a new version of the WaveNet architecture, adapted here to output a sequence shorter and completely shifted in the future with respect to the input one. To this aim, and to deal with the different sequence lengths in the UnPWaveNet, we have designed a new Channel Distributed layer, that acts like a Time Distributed one but on the channel dimension, i.e. applying the same set of operations to each channel of the input. TDC-LSTM and TDC-UnPWaveNet have shown both remarkable results. However, the two models have focused on different learnable information: TDC-LSTM has focused more on lowering the bias, while TDC-UnPWaveNet has focused more on the temporal dynamics, maximizing correlation, and KGE.

replace Deep Autoregressive Models as Causal Inference Engines

Authors: Daniel Jiwoong Im, Kevin Zhang, Nakul Verma, Kyunghyun Cho

Abstract: Existing causal inference (CI) models are often restricted to data with low-dimensional confounders and singleton actions. We propose an autoregressive (AR) CI framework capable of handling complex confounders and sequential actions commonly found in modern applications. Our approach accomplishes this using {\em sequencification}, which transforms data from an underlying causal diagram into a sequence of tokens. Sequencification not only accommodates training with data generated from a large class of DAGs, but also extends existing CI capabilities to estimate multiple causal quantities using a {\em single} model. We can directly compute probabilities from interventional distributions, simplifying inference and improving outcome prediction accuracy. We demonstrate that an AR model adapted for CI is efficient and effective in various complex applications such as navigating mazes, playing chess endgames, and evaluating the impact of certain keywords on paper acceptance rates, where we consider causal queries beyond standard reinforcement learning-type questions.

replace Gradient Boosting Decision Trees on Medical Diagnosis over Tabular Data

Authors: A. Yark{\i}n Y{\i}ld{\i}z, Asli Kalayci

Abstract: Medical diagnosis is a crucial task in the medical field, in terms of providing accurate classification and respective treatments. Having near-precise decisions based on correct diagnosis can affect a patient's life itself, and may extremely result in a catastrophe if not classified correctly. Several traditional machine learning (ML), such as support vector machines (SVMs) and logistic regression, and state-of-the-art tabular deep learning (DL) methods, including TabNet and TabTransformer, have been proposed and used over tabular medical datasets. Additionally, due to the superior performances, lower computational costs, and easier optimization over different tasks, ensemble methods have been used in the field more recently. They offer a powerful alternative in terms of providing successful medical decision-making processes in several diagnosis tasks. In this study, we investigated the benefits of ensemble methods, especially the Gradient Boosting Decision Tree (GBDT) algorithms in medical classification tasks over tabular data, focusing on XGBoost, CatBoost, and LightGBM. The experiments demonstrate that GBDT methods outperform traditional ML and deep neural network architectures and have the highest average rank over several benchmark tabular medical diagnosis datasets. Furthermore, they require much less computational power compared to DL models, creating the optimal methodology in terms of high performance and lower complexity.

replace Collapse-Proof Non-Contrastive Self-Supervised Learning

Authors: Emanuele Sansone, Tim Lebailly, Tinne Tuytelaars

Abstract: We present a principled and simplified design of the projector and loss function for non-contrastive self-supervised learning based on hyperdimensional computing. We theoretically demonstrate that this design introduces an inductive bias that encourages representations to be simultaneously decorrelated and clustered, without explicitly enforcing these properties. This bias provably enhances generalization and suffices to avoid known training failure modes, such as representation, dimensional, cluster, and intracluster collapses. We validate our theoretical findings on image datasets, including SVHN, CIFAR-10, CIFAR-100, and ImageNet-100. Our approach effectively combines the strengths of feature decorrelation and cluster-based self-supervised learning methods, overcoming training failure modes while achieving strong generalization in clustering and linear classification tasks.

replace SymmetryLens: Unsupervised Symmetry Learning via Locality and Density Preservation

Authors: Onur Efe, Arkadas Ozakin

Abstract: We develop a new unsupervised symmetry learning method that starts with raw data and provides the minimal generator of an underlying Lie group of symmetries, together with a symmetry-equivariant representation of the data, which turns the hidden symmetry into an explicit one. The method is able to learn the pixel translation operator from a dataset with only an approximate translation symmetry and can learn quite different types of symmetries that are not apparent to the naked eye. The method is based on the formulation of an information-theoretic loss function that measures both the degree of symmetry of a dataset under a candidate symmetry generator and a proposed notion of locality of the samples, which is coupled to symmetry. We demonstrate that this coupling between symmetry and locality, together with an optimization technique developed for entropy estimation, results in a stable system that provides reproducible results.

replace Network Topology Inference from Smooth Signals Under Partial Observability

Authors: Chuansen Peng, Hanning Tang, Zhiguo Wang, Xiaojing Shen

Abstract: Inferring network topology from smooth signals is a significant problem in data science and engineering. A common challenge in real-world scenarios is the availability of only partially observed nodes. While some studies have considered hidden nodes and proposed various optimization frameworks, existing methods often lack the practical efficiency needed for large-scale networks or fail to provide theoretical convergence guarantees. In this paper, we address the problem of inferring network topologies from smooth signals with partially observed nodes. We propose a first-order algorithmic framework that includes two variants: one based on column sparsity regularization and the other on a low-rank constraint. We establish theoretical convergence guarantees and demonstrate the linear convergence rate of our algorithms. Extensive experiments on both synthetic and real-world data show that our results align with theoretical predictions, exhibiting not only linear convergence but also superior speed compared to existing methods. To the best of our knowledge, this is the first work to propose a first-order algorithmic framework for inferring network structures from smooth signals under partial observability, offering both guaranteed linear convergence and practical effectiveness for large-scale networks.

replace Diversifying Robot Locomotion Behaviors with Extrinsic Behavioral Curiosity

Authors: Zhenglin Wan, Xingrui Yu, David Mark Bossens, Yueming Lyu, Qing Guo, Flint Xiaofeng Fan, Ivor Tsang

Abstract: Imitation learning (IL) has shown promise in robot locomotion but is often limited to learning a single expert policy, constraining behavior diversity and robustness in unpredictable real-world scenarios. To address this, we introduce Quality Diversity Inverse Reinforcement Learning (QD-IRL), a novel framework that integrates quality-diversity optimization with IRL methods, enabling agents to learn diverse behaviors from limited demonstrations. This work introduces Extrinsic Behavioral Curiosity (EBC), which allows agents to receive additional curiosity rewards from an external critic based on how novel the behaviors are with respect to a large behavioral archive. To validate the effectiveness of EBC in exploring diverse locomotion behaviors, we evaluate our method on multiple robot locomotion tasks. EBC improves the performance of QD-IRL instances with GAIL, VAIL, and DiffAIL across all included environments by up to 185%, 42%, and 150%, even surpassing expert performance by 20% in Humanoid. Furthermore, we demonstrate that EBC is applicable to Gradient-Arborescence-based Quality Diversity Reinforcement Learning (QD-RL) algorithms, where it substantially improves performance and provides a generic technique for diverse robot locomotion. The source code of this work is provided at https://github.com/vanzll/EBC.

URLs: https://github.com/vanzll/EBC.

replace Tackling Dimensional Collapse toward Comprehensive Universal Domain Adaptation

Authors: Hung-Chieh Fang, Po-Yi Lu, Hsuan-Tien Lin

Abstract: Universal Domain Adaptation (UniDA) addresses unsupervised domain adaptation where target classes may differ arbitrarily from source ones, except for a shared subset. A widely used approach, partial domain matching (PDM), aligns only shared classes but struggles in extreme cases where many source classes are absent in the target domain, underperforming the most naive baseline that trains on only source data. In this work, we identify that the failure of PDM for extreme UniDA stems from dimensional collapse (DC) in target representations. To address target DC, we propose to use the de-collapse techniques in self-supervised learning on the unlabeled target data to preserve the intrinsic structure of the learned representations. Our experimental results confirm that SSL consistently advances PDM and delivers new state-of-the-art results across a broader benchmark of UniDA scenarios with different portions of shared classes, representing a crucial step toward truly comprehensive UniDA. Project page: https://dc-unida.github.io/

URLs: https://dc-unida.github.io/

replace Computation-Aware Gaussian Processes: Model Selection And Linear-Time Inference

Authors: Jonathan Wenger, Kaiwen Wu, Philipp Hennig, Jacob R. Gardner, Geoff Pleiss, John P. Cunningham

Abstract: Model selection in Gaussian processes scales prohibitively with the size of the training dataset, both in time and memory. While many approximations exist, all incur inevitable approximation error. Recent work accounts for this error in the form of computational uncertainty, which enables -- at the cost of quadratic complexity -- an explicit tradeoff between computation and precision. Here we extend this development to model selection, which requires significant enhancements to the existing approach, including linear-time scaling in the size of the dataset. We propose a novel training loss for hyperparameter optimization and demonstrate empirically that the resulting method can outperform SGPR, CGGP and SVGP, state-of-the-art methods for GP model selection, on medium to large-scale datasets. Our experiments show that model selection for computation-aware GPs trained on 1.8 million data points can be done within a few hours on a single GPU. As a result of this work, Gaussian processes can be trained on large-scale datasets without significantly compromising their ability to quantify uncertainty -- a fundamental prerequisite for optimal decision-making.

replace FedAli: Personalized Federated Learning Alignment with Prototype Layers for Generalized Mobile Services

Authors: Sannara Ek, Kaile Wang, Fran\c{c}ois Portet, Philippe Lalanda, Jiannong Cao

Abstract: Personalized Federated Learning (PFL) enables distributed training on edge devices, allowing models to collaboratively learn global patterns while tailoring their parameters to better fit each client's local data, all while preserving data privacy. However, PFL faces two key challenges in mobile systems: client drift, where heterogeneous data cause model divergence, and the overlooked need for client generalization, as the dynamic of mobile sensing demands adaptation beyond local environments. To overcome these limitations, we introduce Federated Alignment (FedAli), a prototype-based regularization technique that enhances inter-client alignment while strengthening the robustness of personalized adaptations. At its core, FedAli introduces the ALignment with Prototypes (ALP) layer, inspired by human memory, to enhance generalization by guiding inference embeddings toward personalized prototypes while reducing client drift through alignment with shared prototypes during training. By leveraging an optimal transport plan to compute prototype-embedding assignments, our approach allows pre-training the prototypes without any class labels to further accelerate convergence and improve performance. Our extensive experiments show that FedAli significantly enhances client generalization while preserving strong personalization in heterogeneous settings.

replace Physics Encoded Blocks in Residual Neural Network Architectures for Digital Twin Models

Authors: Muhammad Saad Zia, Ashiq Anjum, Lu Liu, Anthony Conway, Anasol Pena Rios

Abstract: Physics Informed Machine Learning has emerged as a popular approach for modeling and simulation in digital twins, enabling the generation of accurate models of processes and behaviors in real-world systems. However, existing methods either rely on simple loss regularizations that offer limited physics integration or employ highly specialized architectures that are difficult to generalize across diverse physical systems. This paper presents a generic approach based on a novel physics-encoded residual neural network (PERNN) architecture that seamlessly combines data-driven and physics-based analytical models to overcome these limitations. Our method integrates differentiable physics blocks-implementing mathematical operators from physics-based models with feed-forward learning blocks, while intermediate residual blocks ensure stable gradient flow during training. Consequently, the model naturally adheres to the underlying physical principles even when prior physics knowledge is incomplete, thereby improving generalizability with low data requirements and reduced model complexity. We investigate our approach in two application domains. The first is a steering model for autonomous vehicles in a simulation environment, and the second is a digital twin for climate modeling using an ordinary differential equation (ODE)-based model of Net Ecosystem Exchange (NEE) to enable gap-filling in flux tower data. In both cases, our method outperforms conventional neural network approaches as well as state-of-the-art Physics Informed Machine Learning methods.

replace CODE-CL: Conceptor-Based Gradient Projection for Deep Continual Learning

Authors: Marco Paul E. Apolinario, Sakshi Choudhary, Kaushik Roy

Abstract: Continual learning (CL) - the ability to progressively acquire and integrate new concepts - is essential to intelligent systems to adapt to dynamic environments. However, deep neural networks struggle with catastrophic forgetting (CF) when learning tasks sequentially, as training for new tasks often overwrites previously learned knowledge. To address this, recent approaches constrain updates to orthogonal subspaces using gradient projection, effectively preserving important gradient directions for previous tasks. While effective in reducing forgetting, these approaches inadvertently hinder forward knowledge transfer (FWT), particularly when tasks are highly correlated. In this work, we propose Conceptor-based gradient projection for Deep Continual Learning (CODE-CL), a novel method that leverages conceptor matrix representations, a form of regularized reconstruction, to adaptively handle highly correlated tasks. CODE-CL mitigates CF by projecting gradients onto pseudo-orthogonal subspaces of previous task feature spaces while simultaneously promoting FWT. It achieves this by learning a linear combination of shared basis directions, allowing efficient balance between stability and plasticity and transfer of knowledge between overlapping input feature representations. Extensive experiments on continual learning benchmarks validate CODE-CL's efficacy, demonstrating superior performance, reduced forgetting, and improved FWT as compared to state-of-the-art methods.

replace Lion Cub: Minimizing Communication Overhead in Distributed Lion

Authors: Satoki Ishikawa, Tal Ben-Nun, Brian Van Essen, Rio Yokota, Nikoli Dryden

Abstract: Communication overhead is a key challenge in distributed deep learning, especially on slower Ethernet interconnects, and given current hardware trends, communication is likely to become a major bottleneck. While gradient compression techniques have been explored for SGD and Adam, the Lion optimizer has the distinct advantage that its update vectors are the output of a sign operation, enabling straightforward quantization. However, simply compressing updates for communication and using techniques like majority voting fails to lead to end-to-end speedups due to inefficient communication algorithms and reduced convergence. We analyze three factors critical to distributed learning with Lion: optimizing communication methods, identifying effective quantization methods, and assessing the necessity of momentum synchronization. Our findings show that quantization techniques adapted to Lion and selective momentum synchronization can significantly reduce communication costs while maintaining convergence. We combine these into Lion Cub, which enables up to 5x speedups in end-to-end training compared to Lion. This highlights Lion's potential as a communication-efficient solution for distributed training.

replace Using Machine Learning to Discover Parsimonious and Physically-Interpretable Representations of Catchment-Scale Rainfall-Runoff Dynamics

Authors: Yuan-Heng Wang, Hoshin V. Gupta

Abstract: Despite excellent real-world predictive performance of modern machine learning (ML) methods, many scientists hesitate to discard traditional physical-conceptual (PC) approaches due to their relative interpretability, which contributes to credibility during decision-making. In this context, a currently underexplored aspect of ML is how to develop minimally-optimal representations that can facilitate better insight regarding system functioning. Regardless of how this is achieved, parsimonious representations seem to better support the advancement of scientific understanding. Our own view is that ML-based modeling should be based in use of computational units that are fundamentally easy to interpret in a physical-conceptual sense. This paper continues our exploration of how ML can be exploited in the service of scientific investigation. We use the Mass-Conserving-Perceptron (MCP) as the fundamental computational unit in a generic network architecture to explore important issues related to the use of observational data for constructing models of dynamical systems. We show, in the context of lumped catchment modeling, that physical interpretability and predictive performance can both be achieved using a relatively parsimonious distributed-state multiple-flow-path network with context-dependent gating and information sharing across the nodes, suggesting that MCP-based modeling can play a significant role in application of ML to geoscientific investigation.

replace On the Adversarial Robustness of Graph Neural Networks with Graph Reduction

Authors: Kerui Wu, Ka-Ho Chow, Wenqi Wei, Lei Yu

Abstract: As Graph Neural Networks (GNNs) become increasingly popular for learning from large-scale graph data across various domains, their susceptibility to adversarial attacks when using graph reduction techniques for scalability remains underexplored. In this paper, we present an extensive empirical study to investigate the impact of graph reduction techniques, specifically graph coarsening and sparsification, on the robustness of GNNs against adversarial attacks. Through extensive experiments involving multiple datasets and GNN architectures, we examine the effects of four sparsification and six coarsening methods on the poisoning attacks. Our results indicate that, while graph sparsification can mitigate the effectiveness of certain poisoning attacks, such as Mettack, it has limited impact on others, like PGD. Conversely, graph coarsening tends to amplify the adversarial impact, significantly reducing classification accuracy as the reduction ratio decreases. Additionally, we provide a novel analysis of the causes driving these effects and examine how defensive GNN models perform under graph reduction, offering practical insights for designing robust GNNs within graph acceleration systems.

replace MMD-OPT : Maximum Mean Discrepancy Based Sample Efficient Collision Risk Minimization for Autonomous Driving

Authors: Basant Sharma, Arun Kumar Singh

Abstract: We propose MMD-OPT: a sample-efficient approach for minimizing the risk of collision under arbitrary prediction distribution of the dynamic obstacles. MMD-OPT is based on embedding distribution in Reproducing Kernel Hilbert Space (RKHS) and the associated Maximum Mean Discrepancy (MMD). We show how these two concepts can be used to define a sample efficient surrogate for collision risk estimate. We perform extensive simulations to validate the effectiveness of MMD-OPT on both synthetic and real-world datasets. Importantly, we show that trajectory optimization with our MMD-based collision risk surrogate leads to safer trajectories at low sample regimes than popular alternatives based on Conditional Value at Risk (CVaR).

replace On the Expressiveness and Length Generalization of Selective State-Space Models on Regular Languages

Authors: Aleksandar Terzi\'c, Michael Hersche, Giacomo Camposampiero, Thomas Hofmann, Abu Sebastian, Abbas Rahimi

Abstract: Selective state-space models (SSMs) are an emerging alternative to the Transformer, offering the unique advantage of parallel training and sequential inference. Although these models have shown promising performance on a variety of tasks, their formal expressiveness and length generalization properties remain underexplored. In this work, we provide insight into the workings of selective SSMs by analyzing their expressiveness and length generalization performance on regular language tasks, i.e., finite-state automaton (FSA) emulation. We address certain limitations of modern SSM-based architectures by introducing the Selective Dense State-Space Model (SD-SSM), the first selective SSM that exhibits perfect length generalization on a set of various regular language tasks using a single layer. It utilizes a dictionary of dense transition matrices, a softmax selection mechanism that creates a convex combination of dictionary matrices at each time step, and a readout consisting of layer normalization followed by a linear map. We then proceed to evaluate variants of diagonal selective SSMs by considering their empirical performance on commutative and non-commutative automata. We explain the experimental results with theoretical considerations. Our code is available at https://github.com/IBM/selective-dense-state-space-model.

URLs: https://github.com/IBM/selective-dense-state-space-model.

replace Dementia Detection using Multi-modal Methods on Audio Data

Authors: Saugat Kannojia, Anirudh Praveen, Danish Vasdev, Saket Nandedkar, Divyansh Mittal, Sarthak Kalankar, Shaurya Johari, Vipul Arora

Abstract: Dementia is a neurodegenerative disease that causes gradual cognitive impairment, which is very common in the world and undergoes a lot of research every year to prevent and cure it. It severely impacts the patient's ability to remember events and communicate clearly, where most variations of it have no known cure, but early detection can help alleviate symptoms before they become worse. One of the main symptoms of dementia is difficulty in expressing ideas through speech. This paper attempts to talk about a model developed to predict the onset of the disease using audio recordings from patients. An ASR-based model was developed that generates transcripts from the audio files using Whisper model and then applies RoBERTa regression model to generate an MMSE score for the patient. This score can be used to predict the extent to which the cognitive ability of a patient has been affected. We use the PROCESS_V1 dataset for this task, which is introduced through the PROCESS Grand Challenge 2025. The model achieved an RMSE score of 2.6911 which is around 10 percent lower than the described baseline.

replace Random weights of DNNs and emergence of fixed points

Authors: L. Berlyand, O. Krupchytskyi, V. Slavin

Abstract: This paper is concerned with a special class of deep neural networks (DNNs) where the input and the output vectors have the same dimension. Such DNNs are widely used in applications, e.g., autoencoders. The training of such networks can be characterized by their fixed points (FPs). We are concerned with the dependence of the FPs number and their stability on the distribution of randomly initialized DNNs' weight matrices. Specifically, we consider the i.i.d. random weights with heavy and light-tail distributions. Our objectives are twofold. First, the dependence of FPs number and stability of FPs on the type of the distribution tail. Second, the dependence of the number of FPs on the DNNs' architecture. We perform extensive simulations and show that for light tails (e.g., Gaussian), which are typically used for initialization, a single stable FP exists for broad types of architectures. In contrast, for heavy tail distributions (e.g., Cauchy), which typically appear in trained DNNs, a number of FPs emerge. We further observe that these FPs are stable attractors and their basins of attraction partition the domain of input vectors. Finally, we observe an intriguing non-monotone dependence of the number of fixed points $Q(L)$ on the DNNs' depth $L$. The above results were first obtained for untrained DNNs with two types of distributions at initialization and then verified by considering DNNs in which the heavy tail distributions arise in training.

replace Gradient Purification: Defense Against Poisoning Attack in Decentralized Federated Learning

Authors: Bin Li, Xiaoye Miao, Yan Zhang, Jianwei Yin

Abstract: Decentralized federated learning (DFL) is inherently vulnerable to data poisoning attacks, as malicious clients can transmit manipulated gradients to neighboring clients. Existing defense methods either reject suspicious gradients per iteration or restart DFL aggregation after excluding all malicious clients. They all neglect the potential benefits that may exist within contributions from malicious clients. In this paper, we propose a novel gradient purification defense, termed GPD, to defend against data poisoning attacks in DFL. It aims to separately mitigate the harm in gradients and retain benefits embedded in model weights, thereby enhancing overall model accuracy. For each benign client in GPD, a recording variable is designed to track historically aggregated gradients from one of its neighbors. It allows benign clients to precisely detect malicious neighbors and mitigate all aggregated malicious gradients at once. Upon mitigation, benign clients optimize model weights using purified gradients. This optimization not only retains previously beneficial components from malicious clients but also exploits canonical contributions from benign clients. We analyze the convergence of GPD, as well as its ability to harvest high accuracy. Extensive experiments demonstrate that, GPD is capable of mitigating data poisoning attacks under both iid and non-iid data distributions. It also significantly outperforms state-of-the-art defense methods in terms of model accuracy.

replace Accelerated Diffusion Models via Speculative Sampling

Authors: Valentin De Bortoli, Alexandre Galashov, Arthur Gretton, Arnaud Doucet

Abstract: Speculative sampling is a popular technique for accelerating inference in Large Language Models by generating candidate tokens using a fast draft model and accepting or rejecting them based on the target model's distribution. While speculative sampling was previously limited to discrete sequences, we extend it to diffusion models, which generate samples via continuous, vector-valued Markov chains. In this context, the target model is a high-quality but computationally expensive diffusion model. We propose various drafting strategies, including a simple and effective approach that does not require training a draft model and is applicable out of the box to any diffusion model. Our experiments demonstrate significant generation speedup on various diffusion models, halving the number of function evaluations, while generating exact samples from the target model.

replace Synthesising Activity Participations and Scheduling with Deep Generative Machine Learning

Authors: Fred Shone, Tim Hillel

Abstract: Using a deep generative machine learning approach, we synthesise human activity participations and scheduling; i.e. the choices of what activities to participate in and when. Activity schedules are a core component of many applied transport, energy, and epidemiology models. Our data-driven approach directly learns the distributions resulting from human preferences and scheduling logic without the need for complex interacting combinations of sub-models and custom rules. This makes our approach significantly faster and simpler to operate than existing approaches to synthesise or anonymise schedule data. We additionally contribute a novel schedule representation and a comprehensive evaluation framework. We evaluate a range of schedule encoding and deep model architecture combinations. The evaluation shows our approach can rapidly generate large, diverse, novel, and realistic synthetic samples of activity schedules.

replace A Hybrid Supervised and Self-Supervised Graph Neural Network for Edge-Centric Applications

Authors: Eugenio Borzone, Leandro Di Persia, Matias Gerard

Abstract: This paper presents a novel graph-based deep learning model for tasks involving relations between two nodes (edge-centric tasks), where the focus lies on predicting relationships and interactions between pairs of nodes rather than node properties themselves. This model combines supervised and self-supervised learning, taking into account for the loss function the embeddings learned and patterns with and without ground truth. Additionally it incorporates an attention mechanism that leverages both node and edge features. The architecture, trained end-to-end, comprises two primary components: embedding generation and prediction. First, a graph neural network (GNN) transform raw node features into dense, low-dimensional embeddings, incorporating edge attributes. Then, a feedforward neural model processes the node embeddings to produce the final output. Experiments demonstrate that our model matches or exceeds existing methods for protein-protein interactions prediction and Gene Ontology (GO) terms prediction. The model also performs effectively with one-hot encoding for node features, providing a solution for the previously unsolved problem of predicting similarity between compounds with unknown structures.

replace DBA-DFL: Towards Distributed Backdoor Attacks with Network Detection in Decentralized Federated Learning

Authors: Bohan Liu, Yang Xiao, Ruimeng Ye, Zinan Ling, Xiaolong Ma, Bo Hui

Abstract: Distributed backdoor attacks (DBA) have shown a higher attack success rate than centralized attacks in centralized federated learning (FL). However, it has not been investigated in the decentralized FL. In this paper, we experimentally demonstrate that, while directly applying DBA to decentralized FL, the attack success rate depends on the distribution of attackers in the network architecture. Considering that the attackers can not decide their location, this paper aims to achieve a high attack success rate regardless of the attackers' location distribution. Specifically, we first design a method to detect the network by predicting the distance between any two attackers on the network. Then, based on the distance, we organize the attackers in different clusters. Lastly, we propose an algorithm to \textit{dynamically} embed local patterns decomposed from a global pattern into the different attackers in each cluster. We conduct a thorough empirical investigation and find that our method can, in benchmark datasets, outperform both centralized attacks and naive DBA in different decentralized frameworks.

replace Explainable AI for Mental Health Emergency Returns: Integrating LLMs with Predictive Modeling

Authors: Abdulaziz Ahmed, Mohammad Saleem, Mohammed Alzeen, Badari Birur, Rachel E Fargason, Bradley G Burk, Ahmed Alhassan, Mohammed Ali Al-Garadi

Abstract: Importance: Emergency department (ED) returns for mental health conditions pose a major healthcare burden, with 24-27% of patients returning within 30 days. Traditional machine learning models for predicting these returns often lack interpretability for clinical use. Objective: To assess whether integrating large language models (LLMs) with machine learning improves predictive accuracy and clinical interpretability of ED mental health return risk models. Methods: This retrospective cohort study analyzed 42,464 ED visits for 27,904 unique mental health patients at an academic medical center in the Deep South from January 2018 to December 2022. Main Outcomes and Measures: Two primary outcomes were evaluated: (1) 30-day ED return prediction accuracy and (2) model interpretability using a novel LLM-enhanced framework integrating SHAP (SHapley Additive exPlanations) values with clinical knowledge. Results: For chief complaint classification, LLaMA 3 (8B) with 10-shot learning outperformed traditional models (accuracy: 0.882, F1-score: 0.86). In SDoH classification, LLM-based models achieved 0.95 accuracy and 0.96 F1-score, with Alcohol, Tobacco, and Substance Abuse performing best (F1: 0.96-0.89), while Exercise and Home Environment showed lower performance (F1: 0.70-0.67). The LLM-based interpretability framework achieved 99% accuracy in translating model predictions into clinically relevant explanations. LLM-extracted features improved XGBoost AUC from 0.74 to 0.76 and AUC-PR from 0.58 to 0.61. Conclusions and Relevance: Integrating LLMs with machine learning models yielded modest but consistent accuracy gains while significantly enhancing interpretability through automated, clinically relevant explanations. This approach provides a framework for translating predictive analytics into actionable clinical insights.

replace K Nearest Neighbor-Guided Trajectory Similarity Learning

Authors: Yanchuan Chang, Xu Cai, Christian S. Jensen, Jianzhong Qi

Abstract: Trajectory similarity is fundamental to many spatio-temporal data mining applications. Recent studies propose deep learning models to approximate conventional trajectory similarity measures, exploiting their fast inference time once trained. Although efficient inference has been reported, challenges remain in similarity approximation accuracy due to difficulties in trajectory granularity modeling and in exploiting similarity signals in the training data. To fill this gap, we propose TSMini, a highly effective trajectory similarity model with a sub-view modeling mechanism capable of learning multi-granularity trajectory patterns and a k nearest neighbor-based loss that guides TSMini to learn not only absolute similarity values between trajectories but also their relative similarity ranks. Together, these two innovations enable highly accurate trajectory similarity approximation. Experiments show that TSMini can outperform the state-of-the-art models by 22% in accuracy on average when learning trajectory similarity measures.

replace Learning Traffic Anomalies from Generative Models on Real-Time Observations

Authors: Fotis I. Giasemis, Alexandros Sopasakis

Abstract: Accurate detection of traffic anomalies is crucial for effective urban traffic management and congestion mitigation. We use the Spatiotemporal Generative Adversarial Network (STGAN) framework combining Graph Neural Networks and Long Short-Term Memory networks to capture complex spatial and temporal dependencies in traffic data. We apply STGAN to real-time, minute-by-minute observations from 42 traffic cameras across Gothenburg, Sweden, collected over several months in 2020. The images are processed to compute a flow metric representing vehicle density, which serves as input for the model. Training is conducted on data from April to November 2020, and validation is performed on a separate dataset from November 14 to 23, 2020. Our results demonstrate that the model effectively detects traffic anomalies with high precision and low false positive rates. The detected anomalies include camera signal interruptions, visual artifacts, and extreme weather conditions affecting traffic flow.

replace An Efficient Local Search Approach for Polarized Community Discovery in Signed Networks

Authors: Linus Aronsson, Morteza Haghir Chehreghani

Abstract: Signed networks, where edges are labeled as positive or negative to represent friendly or antagonistic interactions, offer a natural framework for analyzing polarization, trust, and conflict in social systems. Detecting meaningful group structures in such networks is crucial for understanding online discourse, political divisions, and trust dynamics. A key challenge is to identify communities that are internally cohesive and externally antagonistic, while allowing for neutral or unaligned vertices. In this paper, we propose a method for identifying $k$ polarized communities that addresses a major limitation of prior methods: their tendency to produce highly size-imbalanced solutions. We introduce a novel optimization objective that avoids such imbalance. In addition, it is well known that approximation algorithms based on local search are highly effective for clustering signed networks when neutral vertices are not allowed. We build on this idea and design the first local search algorithm that extends to the setting with neutral vertices while scaling to large networks. By connecting our approach to block-coordinate Frank-Wolfe optimization, we prove a linear convergence rate, enabled by the structure of our objective. Experiments on real-world and synthetic datasets demonstrate that our method consistently outperforms state-of-the-art baselines in solution quality, while remaining competitive in computational efficiency.

replace Distributional Diffusion Models with Scoring Rules

Authors: Valentin De Bortoli, Alexandre Galashov, J. Swaroop Guntupalli, Guangyao Zhou, Kevin Murphy, Arthur Gretton, Arnaud Doucet

Abstract: Diffusion models generate high-quality synthetic data. They operate by defining a continuous-time forward process which gradually adds Gaussian noise to data until fully corrupted. The corresponding reverse process progressively "denoises" a Gaussian sample into a sample from the data distribution. However, generating high-quality outputs requires many discretization steps to obtain a faithful approximation of the reverse process. This is expensive and has motivated the development of many acceleration methods. We propose to accomplish sample generation by learning the posterior {\em distribution} of clean data samples given their noisy versions, instead of only the mean of this distribution. This allows us to sample from the probability transitions of the reverse process on a coarse time scale, significantly accelerating inference with minimal degradation of the quality of the output. This is accomplished by replacing the standard regression loss used to estimate conditional means with a scoring rule. We validate our method on image and robot trajectory generation, where we consistently outperform standard diffusion models at few discretization steps.

replace Code Simulation as a Proxy for High-order Tasks in Large Language Models

Authors: Emanuele La Malfa, Christoph Weinhuber, Orazio Torre, Fangru Lin, X. Angelo Huang, Samuele Marro, Anthony Cohn, Nigel Shadbolt, Michael Wooldridge

Abstract: Many reasoning, planning, and problem-solving tasks share an intrinsic algorithmic nature: correctly simulating each step is a sufficient condition to solve them correctly. We collect pairs of naturalistic and synthetic reasoning tasks to assess the capabilities of Large Language Models (LLM). While naturalistic tasks often require careful human handcrafting, we show that synthetic data is, in many cases, a good proxy that is much easier to collect at scale. We leverage common constructs in programming as the counterpart of the building blocks of naturalistic reasoning tasks, such as straight-line programs, code that contains critical paths, and approximate and redundant instructions. We further assess the capabilities of LLMs on sorting problems and repeated operations via sorting algorithms and nested loops. Our synthetic datasets further reveal that while the most powerful LLMs exhibit relatively strong execution capabilities, the process is fragile: it is negatively affected by memorisation and seems to rely heavily on pattern recognition. Our contribution builds upon synthetically testing the reasoning capabilities of LLMs as a scalable complement to handcrafted human-annotated problems.

replace Consistency of augmentation graph and network approximability in contrastive learning

Authors: Chenghui Li, A. Martina Neuman

Abstract: Contrastive learning leverages data augmentation to develop feature representation without relying on large labeled datasets. However, despite its empirical success, the theoretical foundations of contrastive learning remain incomplete, with many essential guarantees left unaddressed, particularly the realizability assumption concerning neural approximability of an optimal spectral contrastive loss solution. In this work, we overcome these limitations by analyzing pointwise and spectral consistency of the augmentation graph Laplacian. We establish that, under specific conditions for data generation and graph connectivity, as the augmented dataset size increases, the augmentation graph Laplacian converges to a weighted Laplace-Beltrami operator on the natural data manifold. These consistency results ensure that the graph Laplacian spectrum effectively captures the manifold geometry. Consequently, they give way to a robust framework for establishing neural approximability, directly resolving the realizability assumption in a current paradigm.

replace Towards Cost-Effective Reward Guided Text Generation

Authors: Ahmad Rashid, Ruotian Wu, Rongqi Fan, Hongliang Li, Agustinus Kristiadi, Pascal Poupart

Abstract: Reward-guided text generation (RGTG) has emerged as a viable alternative to offline reinforcement learning from human feedback (RLHF). RGTG methods can align baseline language models to human preferences without further training like in standard RLHF methods. However, they rely on a reward model to score each candidate token generated by the language model at inference, incurring significant test-time overhead. Additionally, the reward model is usually only trained to score full sequences, which can lead to sub-optimal choices for partial sequences. In this work, we present a novel reward model architecture that is trained, using a Bradley-Terry loss, to prefer the optimal expansion of a sequence with just a \emph{single call} to the reward model at each step of the generation process. That is, a score for all possible candidate tokens is generated simultaneously, leading to efficient inference. We theoretically analyze various RGTG reward models and demonstrate that prior techniques prefer sub-optimal sequences compared to our method during inference. Empirically, our reward model leads to significantly faster inference than other RGTG methods. It requires fewer calls to the reward model and performs competitively compared to previous RGTG and offline RLHF methods.

replace An Extended Benchmarking of Multi-Agent Reinforcement Learning Algorithms in Complex Fully Cooperative Tasks

Authors: George Papadopoulos, Andreas Kontogiannis, Foteini Papadopoulou, Chaido Poulianou, Ioannis Koumentis, George Vouros

Abstract: Multi-Agent Reinforcement Learning (MARL) has recently emerged as a significant area of research. However, MARL evaluation often lacks systematic diversity, hindering a comprehensive understanding of algorithms' capabilities. In particular, cooperative MARL algorithms are predominantly evaluated on benchmarks such as SMAC and GRF, which primarily feature team game scenarios without assessing adequately various aspects of agents' capabilities required in fully cooperative real-world tasks such as multi-robot cooperation and warehouse, resource management, search and rescue, and human-AI cooperation. Moreover, MARL algorithms are mainly evaluated on low dimensional state spaces, and thus their performance on high-dimensional (e.g., image) observations is not well-studied. To fill this gap, this paper highlights the crucial need for expanding systematic evaluation across a wider array of existing benchmarks. To this end, we conduct extensive evaluation and comparisons of well-known MARL algorithms on complex fully cooperative benchmarks, including tasks with images as agents' observations. Interestingly, our analysis shows that many algorithms, hailed as state-of-the-art on SMAC and GRF, may underperform standard MARL baselines on fully cooperative benchmarks. Finally, towards more systematic and better evaluation of cooperative MARL algorithms, we have open-sourced PyMARLzoo+, an extension of the widely used (E)PyMARL libraries, which addresses an open challenge from [TBG++21], facilitating seamless integration and support with all benchmarks of PettingZoo, as well as Overcooked, PressurePlate, Capture Target and Box Pushing.

replace Fairness and Sparsity within Rashomon sets: Enumeration-Free Exploration and Characterization

Authors: Lucas Langlade, Julien Ferry, Gabriel Laberge, Thibaut Vidal

Abstract: We introduce an enumeration-free method based on mathematical programming to precisely characterize various properties such as fairness or sparsity within the set of "good models", known as Rashomon set. This approach is generically applicable to any hypothesis class, provided that a mathematical formulation of the model learning task exists. It offers a structured framework to define the notion of business necessity and evaluate how fairness can be improved or degraded towards a specific protected group, while remaining within the Rashomon set and maintaining any desired sparsity level. We apply our approach to two hypothesis classes: scoring systems and decision diagrams, leveraging recent mathematical programming formulations for training such models. As seen in our experiments, the method comprehensively and certifiably quantifies trade-offs between predictive performance, sparsity, and fairness. We observe that a wide range of fairness values are attainable, ranging from highly favorable to significantly unfavorable for a protected group, while staying within less than 1% of the best possible training accuracy for the hypothesis class. Additionally, we observe that sparsity constraints limit these trade-offs and may disproportionately harm specific subgroups. As we evidenced, thoroughly characterizing the tensions between these key aspects is critical for an informed and accountable selection of models.

replace Many-Task Federated Fine-Tuning via Unified Task Vectors

Authors: Vasileios Tsouvalas, Tanir Ozcelebi, Nirvana Meratnia

Abstract: Federated Learning (FL) traditionally assumes homogeneous client tasks; however, in real-world scenarios, clients often specialize in diverse tasks, introducing task heterogeneity. To address this challenge, Many-Task FL (MaT-FL) has emerged, enabling clients to collaborate effectively despite task diversity. Existing MaT-FL approaches rely on client grouping or personalized layers, requiring the server to manage individual models and failing to account for clients handling multiple tasks. We propose MaTU, a MaT-FL approach that enables joint learning of task vectors across clients, eliminating the need for clustering or client-specific weight storage at the server. Our method introduces a novel aggregation mechanism that determines task similarity based on the direction of clients task vectors and constructs a unified task vector encapsulating all tasks. To address task-specific requirements, we augment the unified task vector with lightweight modulators that facilitate knowledge transfer among related tasks while disentangling dissimilar ones. Evaluated across 30 datasets, MaTU achieves superior performance over state-of-the-art MaT-FL approaches, with results comparable to per-task fine-tuning, while delivering significant communication savings.

replace Federated Continual Learning: Concepts, Challenges, and Solutions

Authors: Parisa Hamedi, Roozbeh Razavi-Far, Ehsan Hallaji

Abstract: Federated Continual Learning (FCL) has emerged as a robust solution for collaborative model training in dynamic environments, where data samples are continuously generated and distributed across multiple devices. This survey provides a comprehensive review of FCL, focusing on key challenges such as heterogeneity, model stability, communication overhead, and privacy preservation. We explore various forms of heterogeneity and their impact on model performance. Solutions to non-IID data, resource-constrained platforms, and personalized learning are reviewed in an effort to show the complexities of handling heterogeneous data distributions. Next, we review techniques for ensuring model stability and avoiding catastrophic forgetting, which are critical in non-stationary environments. Privacy-preserving techniques are another aspect of FCL that have been reviewed in this work. This survey has integrated insights from federated learning and continual learning to present strategies for improving the efficacy and scalability of FCL systems, making it applicable to a wide range of real-world scenarios.

replace Toward Universal Laws of Outlier Propagation

Authors: Aram Ebtekar, Yuhao Wang, Dominik Janzing

Abstract: When a variety of anomalous features motivate flagging different samples as outliers, Algorithmic Information Theory (AIT) offers a principled way to unify them in terms of a sample's randomness deficiency. Subject to the algorithmic Markov condition on a causal Bayesian network, we show that the randomness deficiency of a joint sample decomposes into a sum of randomness deficiencies at each causal mechanism. Consequently, anomalous observations can be attributed to their root causes, i.e., the mechanisms that behaved anomalously. As an extension of Levin's law of randomness conservation, we show that weak outliers cannot cause strong ones. We show how these information theoretic laws clarify our understanding of outlier detection and attribution, in the context of more specialized outlier scores from prior literature.

replace A Comparative Study of Machine Learning Algorithms for Stock Price Prediction Using Insider Trading Data

Authors: Amitabh Chakravorty, Nelly Elsayed

Abstract: The research paper empirically investigates several machine learning algorithms to forecast stock prices depending on insider trading information. Insider trading offers special insights into market sentiment, pointing to upcoming changes in stock prices. This study examines the effectiveness of algorithms like decision trees, random forests, support vector machines (SVM) with different kernels, and K-Means Clustering using a dataset of Tesla stock transactions. Examining past data from April 2020 to March 2023, this study focuses on how well these algorithms identify trends and forecast stock price fluctuations. The paper uses Recursive Feature Elimination (RFE) and feature importance analysis to optimize the feature set and, hence, increase prediction accuracy. While it requires substantially greater processing time than other models, SVM with the Radial Basis Function (RBF) kernel displays the best accuracy. This paper highlights the trade-offs between accuracy and efficiency in machine learning models and proposes the possibility of pooling multiple data sources to raise prediction performance. The results of this paper aim to help financial analysts and investors in choosing strong algorithms to optimize investment strategies.

replace Convergence of Policy Mirror Descent Beyond Compatible Function Approximation

Authors: Uri Sherman, Tomer Koren, Yishay Mansour

Abstract: Modern policy optimization methods roughly follow the policy mirror descent (PMD) algorithmic template, for which there are by now numerous theoretical convergence results. However, most of these either target tabular environments, or can be applied effectively only when the class of policies being optimized over satisfies strong closure conditions, which is typically not the case when working with parametric policy classes in large-scale environments. In this work, we develop a theoretical framework for PMD for general policy classes where we replace the closure conditions with a strictly weaker variational gradient dominance assumption, and obtain upper bounds on the rate of convergence to the best-in-class policy. Our main result leverages a novel notion of smoothness with respect to a local norm induced by the occupancy measure of the current policy, and casts PMD as a particular instance of smooth non-convex optimization in non-Euclidean space.

replace LabTOP: A Unified Model for Lab Test Outcome Prediction on Electronic Health Records

Authors: Sujeong Im, Jungwoo Oh, Edward Choi

Abstract: Lab tests are fundamental for diagnosing diseases and monitoring patient conditions. However, frequent testing can be burdensome for patients, and test results may not always be immediately available. To address these challenges, we propose LabTOP, a unified model that predicts lab test outcomes by leveraging a language modeling approach on EHR data. Unlike conventional methods that estimate only a subset of lab tests or classify discrete value ranges, LabTOP performs continuous numerical predictions for a diverse range of lab items. We evaluate LabTOP on three publicly available EHR datasets and demonstrate that it outperforms existing methods, including traditional machine learning models and state-of-the-art large language models. We also conduct extensive ablation studies to confirm the effectiveness of our design choices. We believe that LabTOP will serve as an accurate and generalizable framework for lab test outcome prediction, with potential applications in clinical decision support and early detection of critical conditions.

replace Towards Efficient Contrastive PAC Learning

Authors: Jie Shen

Abstract: We study contrastive learning under the PAC learning framework. While a series of recent works have shown statistical results for learning under contrastive loss, based either on the VC-dimension or Rademacher complexity, their algorithms are inherently inefficient or not implying PAC guarantees. In this paper, we consider contrastive learning of the fundamental concept of linear representations. Surprisingly, even under such basic setting, the existence of efficient PAC learners is largely open. We first show that the problem of contrastive PAC learning of linear representations is intractable to solve in general. We then show that it can be relaxed to a semi-definite program when the distance between contrastive samples is measured by the $\ell_2$-norm. We then establish generalization guarantees based on Rademacher complexity, and connect it to PAC guarantees under certain contrastive large-margin conditions. To the best of our knowledge, this is the first efficient PAC learning algorithm for contrastive learning.

replace Do LLMs Understand the Safety of Their Inputs? Training-Free Moderation via Latent Prototypes

Authors: Maciej Chrab\k{a}szcz, Filip Szatkowski, Bartosz W\'ojcik, Jan Dubi\'nski, Tomasz Trzci\'nski, Sebastian Cygert

Abstract: With the rise of LLMs, ensuring model safety and alignment has become a critical concern. While modern instruction-finetuned LLMs incorporate alignment during training, they still frequently require moderation tools to prevent unsafe behavior. The most common approach to moderation are guard models that flag unsafe inputs. However, guards require costly training and are typically limited to fixed-size, pre-trained options, making them difficult to adapt to evolving risks and resource constraints. We hypothesize that instruction-finetuned LLMs already encode safety-relevant information internally and explore training-free safety assessment methods that work with off-the-shelf models. We show that simple prompting allows models to recognize harmful inputs they would otherwise mishandle. We also demonstrate that safe and unsafe prompts are distinctly separable in the models' latent space. Building on this, we introduce the Latent Prototype Moderator (LPM), a training-free moderation method that uses Mahalanobis distance in latent space to assess input safety. LPM is a lightweight, customizable add-on that generalizes across model families and sizes. Our method matches or exceeds state-of-the-art guard models across multiple safety benchmarks, offering a practical and flexible solution for scalable LLM moderation.

replace Posterior Inference with Diffusion Models for High-dimensional Black-box Optimization

Authors: Taeyoung Yun, Kiyoung Om, Jaewoo Lee, Sujin Yun, Jinkyoo Park

Abstract: Optimizing high-dimensional and complex black-box functions is crucial in numerous scientific applications. While Bayesian optimization (BO) is a powerful method for sample-efficient optimization, it struggles with the curse of dimensionality and scaling to thousands of evaluations. Recently, leveraging generative models to solve black-box optimization problems has emerged as a promising framework. However, those methods often underperform compared to BO methods due to limited expressivity and difficulty of uncertainty estimation in high-dimensional spaces. To overcome these issues, we introduce \textbf{DiBO}, a novel framework for solving high-dimensional black-box optimization problems. Our method iterates two stages. First, we train a diffusion model to capture the data distribution and deep ensembles to predict function values with uncertainty quantification. Second, we cast the candidate selection as a posterior inference problem to balance exploration and exploitation in high-dimensional spaces. Concretely, we fine-tune diffusion models to amortize posterior inference. Extensive experiments demonstrate that our method outperforms state-of-the-art baselines across synthetic and real-world tasks. Our code is publicly available \href{https://github.com/umkiyoung/DiBO}{here}.

URLs: https://github.com/umkiyoung/DiBO

replace Distributionally Robust Active Learning for Gaussian Process Regression

Authors: Shion Takeno, Yoshito Okura, Yu Inatsu, Tatsuya Aoyama, Tomonari Tanaka, Satoshi Akahane, Hiroyuki Hanada, Noriaki Hashimoto, Taro Murayama, Hanju Lee, Shinya Kojima, Ichiro Takeuchi

Abstract: Gaussian process regression (GPR) or kernel ridge regression is a widely used and powerful tool for nonlinear prediction. Therefore, active learning (AL) for GPR, which actively collects data labels to achieve an accurate prediction with fewer data labels, is an important problem. However, existing AL methods do not theoretically guarantee prediction accuracy for target distribution. Furthermore, as discussed in the distributionally robust learning literature, specifying the target distribution is often difficult. Thus, this paper proposes two AL methods that effectively reduce the worst-case expected error for GPR, which is the worst-case expectation in target distribution candidates. We show an upper bound of the worst-case expected squared error, which suggests that the error will be arbitrarily small by a finite number of data labels under mild conditions. Finally, we demonstrate the effectiveness of the proposed methods through synthetic and real-world datasets.

replace Graded Neural Networks

Authors: Tony Shaska

Abstract: This paper presents a novel framework for graded neural networks (GNNs) built over graded vector spaces $\V_\w^n$, extending classical neural architectures by incorporating algebraic grading. Leveraging a coordinate-wise grading structure with scalar action $\lambda \star \x = (\lambda^{q_i} x_i)$, defined by a tuple $\w = (q_0, \ldots, q_{n-1})$, we introduce graded neurons, layers, activation functions, and loss functions that adapt to feature significance. Theoretical properties of graded spaces are established, followed by a comprehensive GNN design, addressing computational challenges like numerical stability and gradient scaling. Potential applications span machine learning and photonic systems, exemplified by high-speed laser-based implementations. This work offers a foundational step toward graded computation, unifying mathematical rigor with practical potential, with avenues for future empirical and hardware exploration.

replace Learning Classifiers That Induce Markets

Authors: Yonatan Sommer, Ivri Hikri, Lotan Amit, Nir Rosenfeld

Abstract: When learning is used to inform decisions about humans, such as for loans, hiring, or admissions, this can incentivize users to strategically modify their features, at a cost, to obtain positive predictions. The common assumption is that the function governing costs is exogenous, fixed, and predetermined. We challenge this assumption, and assert that costs can emerge as a result of deploying a classifier. Our idea is simple: when users seek positive predictions, this creates demand for important features; and if features are available for purchase, then a market will form, and competition will give rise to prices. We extend the strategic classification framework to support this notion, and study learning in a setting where a classifier can induce a market for features. We present an analysis of the learning task, devise an algorithm for computing market prices, propose a differentiable learning framework, and conduct experiments to explore our novel setting and approach.

replace Offline RLAIF: Piloting VLM Feedback for RL via SFO

Authors: Jacob Beck

Abstract: While internet-scale image and textual data have enabled strong generalization in Vision-Language Models (VLMs), the absence of internet-scale control data has impeded the development of similar generalization in standard reinforcement learning (RL) agents. Although VLMs are fundamentally limited in their ability to solve control tasks due to their lack of action-conditioned training data, their capacity for image understanding allows them to provide valuable feedback in RL tasks by recognizing successful outcomes. A key challenge in Reinforcement Learning from AI Feedback (RLAIF) is determining how best to integrate VLM-derived signals into the learning process. We explore this question in the context of offline RL and introduce a class of methods called Sub-Trajectory Filtered Optimization (SFO). We identify three key insights. First, trajectory length plays a crucial role in offline RL, as full-trajectory preference learning exacerbates the stitching problem, necessitating the use of sub-trajectories. Second, even in Markovian environments, a non-Markovian reward signal from a sequence of images is required to assess trajectory improvement, as VLMs do not interpret control actions and must rely on visual cues over time. Third, a simple yet effective approach--filtered and weighted behavior cloning--consistently outperforms more complex RLHF-based methods. We propose Sub-Trajectory Filtered Behavior Cloning (SFBC), a method that leverages VLM feedback on sub-trajectories while incorporating a retrospective filtering mechanism that removes sub-trajectories preceding failures to improve robustness and prevent turbulence. Please enjoy our airport puns.

replace Inductive randomness predictors: beyond conformal

Authors: Vladimir Vovk

Abstract: This paper introduces inductive randomness predictors, which form a proper superset of inductive conformal predictors but have the same principal property of validity under the assumption of randomness (i.e., of IID data). It turns out that every non-trivial inductive conformal predictor is strictly dominated by an inductive randomness predictor, although the improvement is not great, at most a factor of $\mathrm{e}\approx2.72$ in the case of e-prediction. The dominating inductive randomness predictors are more complicated and more difficult to compute; besides, an improvement by a factor of $\mathrm{e}$ is rare. Therefore, this paper does not suggest replacing inductive conformal predictors by inductive randomness predictors and only calls for a more detailed study of the latter.

replace Language Models can Self-Improve at State-Value Estimation for Better Search

Authors: Ethan Mendes, Alan Ritter

Abstract: Collecting ground-truth rewards or human demonstrations for multi-step reasoning tasks is often prohibitively expensive and time consuming, especially in interactive domains like web tasks. To address this bottleneck, we present self-taught lookahead (STL), a self-supervised method that leverages state-transition dynamics to improve a value model capable of effectively guiding language model-controlled search without any labeled data. We find that moderately sized (8 billion parameters) open-weight value models improved with STL can match the performance of using a gpt-4o value model. Furthermore, we find that specialized value models learned with STL can be deployed with computationally lightweight search algorithms, achieving performance that matches that of more expensive tree search methods, while reducing costs by an order of magnitude.

replace Probing Latent Subspaces in LLM for AI Security: Identifying and Manipulating Adversarial States

Authors: Xin Wei Chia, Swee Liang Wong, Jonathan Pan

Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities across various tasks, yet they remain vulnerable to adversarial manipulations such as jailbreaking via prompt injection attacks. These attacks bypass safety mechanisms to generate restricted or harmful content. In this study, we investigated the underlying latent subspaces of safe and jailbroken states by extracting hidden activations from a LLM. Inspired by attractor dynamics in neuroscience, we hypothesized that LLM activations settle into semi stable states that can be identified and perturbed to induce state transitions. Using dimensionality reduction techniques, we projected activations from safe and jailbroken responses to reveal latent subspaces in lower dimensional spaces. We then derived a perturbation vector that when applied to safe representations, shifted the model towards a jailbreak state. Our results demonstrate that this causal intervention results in statistically significant jailbreak responses in a subset of prompts. Next, we probed how these perturbations propagate through the model's layers, testing whether the induced state change remains localized or cascades throughout the network. Our findings indicate that targeted perturbations induced distinct shifts in activations and model responses. Our approach paves the way for potential proactive defenses, shifting from traditional guardrail based methods to preemptive, model agnostic techniques that neutralize adversarial states at the representation level.

replace Mirror Online Conformal Prediction with Intermittent Feedback

Authors: Bowen Wang, Matteo Zecchin, Osvaldo Simeone

Abstract: Online conformal prediction enables the runtime calibration of a pre-trained artificial intelligence model using feedback on its performance. Calibration is achieved through set predictions that are updated via online rules so as to ensure long-term coverage guarantees. While recent research has demonstrated the benefits of incorporating prior knowledge into the calibration process, this has come at the cost of replacing coverage guarantees with less tangible regret guarantees based on the quantile loss. This work introduces intermittent mirror online conformal prediction (IM-OCP), a novel runtime calibration framework that integrates prior knowledge, operates under potentially intermittent feedback, and features minimal memory complexity. IM-OCP guarantees long-term coverage and sub-linear regret, both of which hold deterministically for any given data sequence and in expectation with respect to the intermittent feedback.

replace PENCIL: Long Thoughts with Short Memory

Authors: Chenxiao Yang, Nathan Srebro, David McAllester, Zhiyuan Li

Abstract: While state-of-the-art LLMs have demonstrated great promise of using long Chains-of-Thought (CoT) to boost reasoning, scaling it up to more challenging problems at test-time is fundamentally limited by suboptimal memory usage -- intermediate computations accumulate indefinitely in context even when no longer needed for future thoughts. We introduce PENCIL, which incorporates a novel reduction mechanism into the autoregressive generation process that recursively cleans up intermediate thoughts based on patterns learned from training. By iteratively generating and erasing thoughts, PENCIL can think deeper to solve harder problems using shorter context and less compute. Empirically, we observe PENCIL is significantly more effective and efficient than CoT. For example, we demonstrate PENCIL with a small 25M-parameter transformer and 2048 context length solves Einstein's puzzle -- a task that challenges much larger models like GPT-4. Theoretically, we prove PENCIL can perform universal efficient computation by simulating any Turing machines with optimal time and space complexity, and thus can solve arbitrary computable tasks that are otherwise intractable for vanilla CoT.

replace Membership Inference Attacks on Large-Scale Models: A Survey

Authors: Hengyu Wu, Yang Cao

Abstract: The adoption of the Large Language Model (LLM) has accelerated dramatically since ChatGPT from OpenAI went online in November 2022. Recent advances in Large Multimodal Models (LMMs), which process diverse data types and enable interaction through various channels, have expanded beyond the text-to-text limitations of early LLMs, attracting significant and concurrent attention from both researchers and industry. While LLMs and LMMs are starting to spread widely, concerns about their privacy risks are increasing as well. Membership Inference Attacks (MIAs) are techniques used to determine whether a particular data point was part of a model's training set, which is a key metric for assessing the privacy vulnerabilities of machine learning models. Hu et al. show that various machine learning algorithms are vulnerable to MIA. Despite extensive studies on MIAs in classic models, there remains a lack of systematic surveys addressing their effectiveness and limitations in advanced large-scale models like LLMs and LMMs. In this paper, we systematically reviewed recent studies of MIA against LLMs and LMMs. We analyzed and categorized each attack based on its methodology, scenario, and targeted model, and we discussed the limitations of existing research. In addition to examining attacks on pre-training and fine-tuning stages, we also explore MIAs that target other development pipelines, including Retrieval-Augmented Generation (RAG) and the model alignment process. Based on the survey, we provide suggestions for future studies to improve the robustness of MIA in large-scale AI models.

replace Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model

Authors: Jingcheng Hu, Yinmin Zhang, Qi Han, Daxin Jiang, Xiangyu Zhang, Heung-Yeung Shum

Abstract: We introduce Open-Reasoner-Zero, the first open source implementation of large-scale reasoning-oriented RL training on the base model focusing on scalability, simplicity and accessibility. Through extensive experiments, we demonstrate that a minimalist approach, vanilla PPO with GAE ($\lambda=1$, $\gamma=1$) and straightforward rule-based rewards, without any KL regularization, is sufficient to scale up both benchmark performance and response length, replicating the scaling phenomenon observed in DeepSeek-R1-Zero. Using the same base model, Qwen2.5-32B base, as DeepSeek-R1-Zero-Qwen-32B, our implementation achieves superior performance across AIME2024, MATH500, and GPQA Diamond, while demonstrating remarkable efficiency, requiring only 1/10 of the training steps compared to the DeepSeek-R1-Zero pipeline. Moreover, our analysis not only covers training dynamics and ablation for critical design choices, but also quantitatively shows how the learned critic in Reasoner-Zero training effectively identifies and devalues repetitive response patterns, yielding more robust advantage estimations and enhancing training stability. Embracing the principles of open-source, we release our source code, training data, and various model weights, fostering reproducibility and encouraging further exploration of the properties of related models.

replace Quantifying Robustness: A Benchmarking Framework for Deep Learning Forecasting in Cyber-Physical Systems

Authors: Alexander Windmann, Henrik Steude, Daniel Boschmann, Oliver Niggemann

Abstract: Cyber-Physical Systems (CPS) in domains such as manufacturing and energy distribution generate complex time series data crucial for Prognostics and Health Management (PHM). While Deep Learning (DL) methods have demonstrated strong forecasting capabilities, their adoption in industrial CPS remains limited due insufficient robustness. Existing robustness evaluations primarily focus on formal verification or adversarial perturbations, inadequately representing the complexities encountered in real-world CPS scenarios. To address this, we introduce a practical robustness definition grounded in distributional robustness, explicitly tailored to industrial CPS, and propose a systematic framework for robustness evaluation. Our framework simulates realistic disturbances, such as sensor drift, noise and irregular sampling, enabling thorough robustness analyses of forecasting models on real-world CPS datasets. The robustness definition provides a standardized score to quantify and compare model performance across diverse datasets, assisting in informed model selection and architecture design. Through extensive empirical studies evaluating prominent DL architectures (including recurrent, convolutional, attention-based, modular, and structured state-space models) we demonstrate the applicability and effectiveness of our approach. We publicly release our robustness benchmark to encourage further research and reproducibility.

replace Breach in the Shield: Unveiling the Vulnerabilities of Large Language Models

Authors: Runpeng Dai, Run Yang, Fan Zhou, Hongtu Zhu

Abstract: Large Language Models (LLMs) and Vision-Language Models (VLMs) have achieved impressive performance across a wide range of tasks, yet they remain vulnerable to carefully crafted perturbations. In this study, we seek to pinpoint the sources of this fragility by identifying parameters and input dimensions (pixels or token embeddings) that are susceptible to such perturbations. To this end, we propose a stability measure called \textbf{FI}, \textbf{F}irst order local \textbf{I}nfluence, which is rooted in information geometry and quantifies the sensitivity of individual parameter and input dimensions. Our extensive analysis across LLMs and VLMs (from 1.5B to 13B parameters) reveals that: (I) A small subset of parameters or input dimensions with high FI values disproportionately contribute to model brittleness. (II) Mitigating the influence of these vulnerable parameters during model merging leads to improved performance.

replace Follow-the-Perturbed-Leader Approaches Best-of-Both-Worlds for the m-Set Semi-Bandit Problems

Authors: Jingxin Zhan, Yuchen Xin, Chenjie Sun, Zhihua Zhang

Abstract: We consider a common case of the combinatorial semi-bandit problem, the $m$-set semi-bandit, where the learner exactly selects $m$ arms from the total $d$ arms. In the adversarial setting, the best regret bound, known to be $\mathcal{O}(\sqrt{nmd})$ for time horizon $n$, is achieved by the well-known Follow-the-Regularized-Leader (FTRL) policy. However, this requires to explicitly compute the arm-selection probabilities via optimizing problems at each time step and sample according to them. This problem can be avoided by the Follow-the-Perturbed-Leader (FTPL) policy, which simply pulls the $m$ arms that rank among the $m$ smallest (estimated) loss with random perturbation. In this paper, we show that FTPL with a Fr\'echet perturbation also enjoys the near optimal regret bound $\mathcal{O}(\sqrt{nm}(\sqrt{d\log(d)}+m^{5/6}))$ in the adversarial setting and approaches best-of-both-world regret bounds, i.e., achieves a logarithmic regret for the stochastic setting. Moreover, our lower bounds show that the extra factors are unavoidable with our approach; any improvement would require a fundamentally different and more challenging method.

replace Towards Explainable Fusion and Balanced Learning in Multimodal Sentiment Analysis

Authors: Miaosen Luo, Yuncheng Jiang, Sijie Mai

Abstract: Multimodal Sentiment Analysis (MSA) faces two critical challenges: the lack of interpretability in the decision logic of multimodal fusion and modality imbalance caused by disparities in inter-modal information density. To address these issues, we propose KAN-MCP, a novel framework that integrates the interpretability of Kolmogorov-Arnold Networks (KAN) with the robustness of the Multimodal Clean Pareto (MCPareto) framework. First, KAN leverages its univariate function decomposition to achieve transparent analysis of cross-modal interactions. This structural design allows direct inspection of feature transformations without relying on external interpretation tools, thereby ensuring both high expressiveness and interpretability. Second, the proposed MCPareto enhances robustness by addressing modality imbalance and noise interference. Specifically, we introduce the Dimensionality Reduction and Denoising Modal Information Bottleneck (DRD-MIB) method, which jointly denoises and reduces feature dimensionality. This approach provides KAN with discriminative low-dimensional inputs to reduce the modeling complexity of KAN while preserving critical sentiment-related information. Furthermore, MCPareto dynamically balances gradient contributions across modalities using the purified features output by DRD-MIB, ensuring lossless transmission of auxiliary signals and effectively alleviating modality imbalance. This synergy of interpretability and robustness not only achieves superior performance on benchmark datasets such as CMU-MOSI, CMU-MOSEI, and CH-SIMS v2 but also offers an intuitive visualization interface through KAN's interpretable architecture. Our code is released on https://github.com/LuoMSen/KAN-MCP.

URLs: https://github.com/LuoMSen/KAN-MCP.

replace Can Local Representation Alignment RNNs Solve Temporal Tasks?

Authors: Nikolay Manchev, Luis C. Garcia-Peraza-Herrera

Abstract: Recurrent Neural Networks (RNNs) are commonly used for real-time processing, streaming data, and cases where the amount of training samples is limited. Backpropagation Through Time (BPTT) is the predominant algorithm for training RNNs; however, it is frequently criticized for being prone to exploding and vanishing gradients and being biologically implausible. In this paper, we present and evaluate a target propagation-based method for RNNs, which uses local updates and seeks to reduce the said instabilities. Having stable RNN models increases their practical use in a wide range of fields such as natural language processing, time-series forecasting, anomaly detection, control systems, and robotics. The proposed solution uses local representation alignment (LRA). We thoroughly analyze the performance of this method, experiment with normalization and different local error functions, and invalidate certain assumptions about the behavior of this type of learning. Namely, we demonstrate that despite the decomposition of the network into sub-graphs, the model still suffers from vanishing gradients. We also show that gradient clipping as proposed in LRA has little to no effect on network performance. This results in an LRA RNN model that is very difficult to train due to vanishing gradients. We address this by introducing gradient regularization in the direction of the update and demonstrate that this modification promotes gradient flow and meaningfully impacts convergence. We compare and discuss the performance of the algorithm, and we show that the regularized LRA RNN considerably outperforms the unregularized version on three landmark tasks: temporal order, 3-bit temporal order, and random permutation.

replace CAOTE: KV Cache Eviction for LLMs via Attention Output Error-Based Token Selection

Authors: Raghavv Goel, Junyoung Park, Mukul Gagrani, Dalton Jones, Matthew Morse, Harper Langston, Mingu Lee, Chris Lott

Abstract: While long context support of large language models has extended their abilities, it also incurs challenges in memory and compute which becomes crucial bottlenecks in resource-restricted devices. Token eviction, a widely adopted post-training methodology designed to alleviate the bottlenecks by evicting less important tokens from the cache, typically uses attention scores as proxy metrics for token importance. However, one major limitation of attention score as a token-wise importance metrics is that it lacks the information about contribution of tokens to the attention output. In this paper, we propose a simple eviction criterion based on the contribution of cached tokens to attention outputs. Our method, CAOTE, optimizes for eviction error due to token eviction, by seamlessly integrating attention scores and value vectors. This is the first method which uses value vector information on top of attention-based eviction scores. Additionally, CAOTE can act as a meta-heuristic method with flexible usage with any token eviction method. We show that CAOTE, when combined with the state-of-the-art attention score-based methods, always improves accuracies on the downstream task, indicating the importance of leveraging information from values during token eviction process.

replace Improving RL Exploration for LLM Reasoning through Retrospective Replay

Authors: Shihan Dou, Muling Wu, Jingwen Xu, Rui Zheng, Tao Gui, Qi Zhang, Xuanjing Huang

Abstract: Reinforcement learning (RL) has increasingly become a pivotal technique in the post-training of large language models (LLMs). The effective exploration of the output space is essential for the success of RL. We observe that for complex problems, during the early stages of training, the model exhibits strong exploratory capabilities and can identify promising solution ideas. However, its limited capability at this stage prevents it from successfully solving these problems. The early suppression of these potentially valuable solution ideas by the policy gradient hinders the model's ability to revisit and re-explore these ideas later. Consequently, although the LLM's capabilities improve in the later stages of training, it still struggles to effectively address these complex problems. To address this exploration issue, we propose a novel algorithm named Retrospective Replay-based Reinforcement Learning (RRL), which introduces a dynamic replay mechanism throughout the training process. RRL enables the model to revisit promising states identified in the early stages, thereby improving its efficiency and effectiveness in exploration. To evaluate the effectiveness of RRL, we conduct extensive experiments on complex reasoning tasks, including mathematical reasoning and code generation, and general dialogue tasks. The results indicate that RRL maintains high exploration efficiency throughout the training period, significantly enhancing the effectiveness of RL in optimizing LLMs for complicated reasoning tasks. Moreover, it also improves the performance of RLHF, making the model both safer and more helpful.

replace AI for the Open-World: the Learning Principles

Authors: Jianyu Zhang

Abstract: During the past decades, numerous successes of AI has been made on "specific capabilities", named closed-world, such as artificial environments or specific real-world tasks. This well-defined narrow capability brings two nice benefits, a clear criterion of success and the opportunity to collect a lot of examples. The criteria not only reveal whether a machine has achieved a goal, but reveal how the machine falls short of the goal. As a result, human designers can fix the problems one after the other until the machine is deemed good enough for the task. Furthermore, the large set of collected examples reduces the difficulty of this problem-fixing process (by the central limit theorem). Do the success in closed-world translate into broad open-world, where a machine is required to perform any task that a human could possibly undertake with fewer examples and less priori knowledge from human designers? No. Because competence in a specific task provides little insight in handling other tasks, the valuable criteria for specific tasks become helpless when handling broader unseen tasks. Furthermore, due to the shortage of examples in unseen tasks, central limit theorem does not stand on our side. At the end, human designers lose the oscilloscope to "hack" an AI system for the open-world. Achieving AI for the open-world requires unique learning principles and innovated techniques, which are different from the ones in building AI for the closed-world. This thesis explores necessary learning principles required to construct AI for the open-world, including rich features (analogy a large tool box), disentangled representation (an organized tool box), and inference-time learning (a tool-savvy hand). Driven by the learning principles, this thesis further proposes techniques to use the learning principles, conducts enormous large-scale experiments to verify the learning principles.

replace Rewriting Pre-Training Data Boosts LLM Performance in Math and Code

Authors: Kazuki Fujii, Yukito Tajima, Sakae Mizuki, Hinari Shimada, Taihei Shiotani, Koshiro Saito, Masanari Ohi, Masaki Kawamura, Taishi Nakamura, Takumi Okamoto, Shigeki Ishida, Kakeru Hattori, Youmi Ma, Hiroya Takamura, Rio Yokota, Naoaki Okazaki

Abstract: The performance of large language models (LLMs) in program synthesis and mathematical reasoning is fundamentally limited by the quality of their pre-training corpora. We introduce two openly licensed datasets, released under the Llama 3.3 Community License, that significantly enhance LLM performance by systematically rewriting public data. SwallowCode (approximately 16.1 billion tokens) refines Python snippets from The-Stack-v2 through a novel four-stage pipeline: syntax validation, pylint-based style filtering, and a two-stage LLM rewriting process that enforces style conformity and transforms snippets into self-contained, algorithmically efficient examples. Unlike prior methods that rely on exclusionary filtering or limited transformations, our transform-and-retain approach upgrades low-quality code, maximizing data utility. SwallowMath (approximately 2.3 billion tokens) enhances Finemath-4+ by removing boilerplate, restoring context, and reformatting solutions into concise, step-by-step explanations. Within a fixed 50 billion token training budget, continual pre-training of Llama-3.1-8B with SwallowCode boosts pass@1 by +17.0 on HumanEval and +17.7 on HumanEval+ compared to Stack-Edu, surpassing the baseline model's code generation capabilities. Similarly, substituting SwallowMath yields +12.4 accuracy on GSM8K and +7.6 on MATH. Ablation studies confirm that each pipeline stage contributes incrementally, with rewriting delivering the largest gains. All datasets, prompts, and checkpoints are publicly available, enabling reproducible research and advancing LLM pre-training for specialized domains.

replace PRUNE: A Patching Based Repair Framework for Certifiable Unlearning of Neural Networks

Authors: Xuran Li, Jingyi Wang, Xiaohan Yuan, Peixin Zhang

Abstract: It is often desirable to remove (a.k.a. unlearn) a specific part of the training data from a trained neural network model. A typical application scenario is to protect the data holder's right to be forgotten, which has been promoted by many recent regulation rules. Existing unlearning methods involve training alternative models with remaining data, which may be costly and challenging to verify from the data holder or a thirdparty auditor's perspective. In this work, we provide a new angle and propose a novel unlearning approach by imposing carefully crafted "patch" on the original neural network to achieve targeted "forgetting" of the requested data to delete. Specifically, inspired by the research line of neural network repair, we propose to strategically seek a lightweight minimum "patch" for unlearning a given data point with certifiable guarantee. Furthermore, to unlearn a considerable amount of data points (or an entire class), we propose to iteratively select a small subset of representative data points to unlearn, which achieves the effect of unlearning the whole set. Extensive experiments on multiple categorical datasets demonstrates our approach's effectiveness, achieving measurable unlearning while preserving the model's performance and being competitive in efficiency and memory consumption compared to various baseline methods.

replace Relative Overfitting and Accept-Reject Framework

Authors: Yanxin Liu, Yunqi Zhang

Abstract: The scaling of Large Language Models (LLMs) currently faces significant challenges. Model assembly is widely considered a promising solution to break through these performance bottlenecks. However, current ensembling methods are primarily guided by the statistical expectation that combining multiple models over large samples will lead to performance gains. We propose an ensemble framework that transitions from such stochastic, sample-dependent methods to a regular, controllable approach based on fine-grained model segmentation. This regularity governs how models are segmented to ensure performance improvement, how the magnitude of this improvement varies with model selection, and what factors determine its theoretical maximum. To formalize this pattern, we introduce the concept of'relative overfitting,' which is derived from the performance discrepancies between constituent models and builds a bridge between ensemble outcomes and the inherent attributes of these models. We detail the patterns of this framework within the domain of NLP and briefly describe its extensibility to other fields, such as computer vision (CV) and AI for science. Our approach was validated using both custom-built and pre-trained mainstream models across diverse benchmarks, including language modeling, long-context tasks, and question-answering (QA). The results indicate that the ensemble rules we proposed are generally effective and that we provide a rigorous proof of these rules in certain experimental scenarios. The proposed framework offers a new perspective for understanding ensemble theory and provides a systematic approach to addressing the performance bottlenecks of LLMs.

replace Aggregating Concepts of Fairness and Accuracy in Prediction Algorithms

Authors: David Kinney

Abstract: An algorithm that outputs predictions about the state of the world will almost always be designed with the implicit or explicit goal of outputting accurate predictions (i.e., predictions that are likely to be true). In addition, the rise of increasingly powerful predictive algorithms brought about by the recent revolution in artificial intelligence has led to an emphasis on building predictive algorithms that are fair, in the sense that their predictions do not systematically evince bias or bring about harm to certain individuals or groups. This state of affairs presents two conceptual challenges. First, the goals of accuracy and fairness can sometimes be in tension, and there are no obvious normative guidelines for managing the trade-offs between these two desiderata when they arise. Second, there are many distinct ways of measuring both the accuracy and fairness of a predictive algorithm; here too, there are no obvious guidelines on how to aggregate our preferences for predictive algorithms that satisfy disparate measures of fairness and accuracy to various extents. The goal of this paper is to address these challenges by arguing that there are good reasons for using a linear combination of accuracy and fairness metrics to measure the all-things-considered value of a predictive algorithm for agents who care about both accuracy and fairness. My argument depends crucially on a classic result in the preference aggregation literature due to Harsanyi. After making this formal argument, I apply my result to an analysis of accuracy-fairness trade-offs using the COMPAS dataset compiled by Angwin et al.

replace Adaptive Inference-Time Scaling via Cyclic Diffusion Search

Authors: Gyubin Lee, Truong Nhat Nguyen Bao, Jaesik Yoon, Dongwoo Lee, Minsu Kim, Yoshua Bengio, Sungjin Ahn

Abstract: Diffusion models have demonstrated strong generative capabilities across domains ranging from image synthesis to complex reasoning tasks. However, most inference-time scaling methods rely on fixed denoising schedules, limiting their ability to allocate computation based on instance difficulty or task-specific demands adaptively. We introduce the challenge of adaptive inference-time scaling-dynamically adjusting computational effort during inference-and propose Adaptive Bi-directional Cyclic Diffusion (ABCD), a flexible, search-based inference framework. ABCD refines outputs through bi-directional diffusion cycles while adaptively controlling exploration depth and termination. It comprises three components: Cyclic Diffusion Search, Automatic Exploration-Exploitation Balancing, and Adaptive Thinking Time. Experiments show that ABCD improves performance across diverse tasks while maintaining computational efficiency.

replace Understanding Differential Transformer Unchains Pretrained Self-Attentions

Authors: Chaerin Kong, Jiho Jang, Nojun Kwak

Abstract: Differential Transformer has recently gained significant attention for its impressive empirical performance, often attributed to its ability to perform noise canceled attention. However, precisely how differential attention achieves its empirical benefits remains poorly understood. Moreover, Differential Transformer architecture demands large-scale training from scratch, hindering utilization of open pretrained weights. In this work, we conduct an in-depth investigation of Differential Transformer, uncovering three key factors behind its success: (1) enhanced expressivity via negative attention, (2) reduced redundancy among attention heads, and (3) improved learning dynamics. Based on these findings, we propose DEX, a novel method to efficiently integrate the advantages of differential attention into pretrained language models. By reusing the softmax attention scores and adding a lightweight differential operation on the output value matrix, DEX effectively incorporates the key advantages of differential attention while remaining lightweight in both training and inference. Evaluations confirm that DEX substantially improves the pretrained LLMs across diverse benchmarks, achieving significant performance gains with minimal adaptation data (< 0.01%).

replace JanusDNA: A Powerful Bi-directional Hybrid DNA Foundation Model

Authors: Qihao Duan, Bingding Huang, Zhenqiao Song, Irina Lehmann, Lei Gu, Roland Eils, Benjamin Wild

Abstract: Large language models (LLMs) have revolutionized natural language processing and are increasingly applied to other sequential data types, including genetic sequences. However, adapting LLMs to genomics presents significant challenges. Capturing complex genomic interactions requires modeling long-range dependencies within DNA sequences, where interactions often span over 10,000 base pairs, even within a single gene, posing substantial computational burdens under conventional model architectures and training paradigms. Moreover, standard LLM training approaches are suboptimal for DNA: autoregressive training, while efficient, supports only unidirectional understanding. However, DNA is inherently bidirectional, e.g., bidirectional promoters regulate transcription in both directions and account for nearly 11% of human gene expression. Masked language models (MLMs) allow bidirectional understanding but are inefficient, as only masked tokens contribute to the loss per step. To address these limitations, we introduce JanusDNA, the first bidirectional DNA foundation model built upon a novel pretraining paradigm that combines the optimization efficiency of autoregressive modeling with the bidirectional comprehension of masked modeling. JanusDNA adopts a hybrid Mamba, Attention and Mixture of Experts (MoE) architecture, combining long-range modeling of Attention with efficient sequential learning of Mamba. MoE layers further scale model capacity via sparse activation while keeping computational cost low. Notably, JanusDNA processes up to 1 million base pairs at single nucleotide resolution on a single 80GB GPU. Extensive experiments and ablations show JanusDNA achieves new SOTA results on three genomic representation benchmarks, outperforming models with 250x more activated parameters. Code: https://github.com/Qihao-Duan/JanusDNA

URLs: https://github.com/Qihao-Duan/JanusDNA

replace Graph Inverse Style Transfer for Counterfactual Explainability

Authors: Bardh Prenkaj, Efstratios Zaradoukas, Gjergji Kasneci

Abstract: Counterfactual explainability seeks to uncover model decisions by identifying minimal changes to the input that alter the predicted outcome. This task becomes particularly challenging for graph data due to preserving structural integrity and semantic meaning. Unlike prior approaches that rely on forward perturbation mechanisms, we introduce Graph Inverse Style Transfer (GIST), the first framework to re-imagine graph counterfactual generation as a backtracking process, leveraging spectral style transfer. By aligning the global structure with the original input spectrum and preserving local content faithfulness, GIST produces valid counterfactuals as interpolations between the input style and counterfactual content. Tested on 8 binary and multi-class graph classification benchmarks, GIST achieves a remarkable +7.6% improvement in the validity of produced counterfactuals and significant gains (+45.5%) in faithfully explaining the true class distribution. Additionally, GIST's backtracking mechanism effectively mitigates overshooting the underlying predictor's decision boundary, minimizing the spectral differences between the input and the counterfactuals. These results challenge traditional forward perturbation methods, offering a novel perspective that advances graph explainability.

replace Test-Time Alignment of Discrete Diffusion Models with Sequential Monte Carlo

Authors: Chinmay Pani, Zijing Ou, Yingzhen Li

Abstract: Discrete diffusion models have become highly effective across various domains. However, real-world applications often require the generative process to adhere to certain constraints but without task-specific fine-tuning. To this end, we propose a training-free method based on Sequential Monte Carlo (SMC) to sample from the reward-aligned target distribution at the test time. Our approach leverages twisted SMC with an approximate locally optimal proposal, obtained via a first-order Taylor expansion of the reward function. To address the challenge of ill-defined gradients in discrete spaces, we incorporate a Gumbel-Softmax relaxation, enabling efficient gradient-based approximation within the discrete generative framework. Empirical results on both synthetic datasets and image modelling validate the effectiveness of our approach.

replace SC-LoRA: Balancing Efficient Fine-tuning and Knowledge Preservation via Subspace-Constrained LoRA

Authors: Minrui Luo, Fuhang Kuang, Yu Wang, Zirui Liu, Tianxing He

Abstract: Parameter-Efficient Fine-Tuning (PEFT) methods, particularly Low-Rank Adaptation (LoRA), are indispensable for efficiently customizing Large Language Models (LLMs). However, vanilla LoRA suffers from slow convergence speed and knowledge forgetting problems. Recent studies have leveraged the power of designed LoRA initialization, to enhance the fine-tuning efficiency, or to preserve knowledge in the pre-trained LLM. However, none of these works can address the two cases at the same time. To this end, we introduce Subspace-Constrained LoRA (SC-LoRA), a novel LoRA initialization framework engineered to navigate the trade-off between efficient fine-tuning and knowledge preservation. We achieve this by constraining the output of trainable LoRA adapters in a low-rank subspace, where the context information of fine-tuning data is most preserved while the context information of preserved knowledge is least retained, in a balanced way. Such constraint enables the trainable weights to primarily focus on the main features of fine-tuning data while avoiding damaging the preserved knowledge features. We provide theoretical analysis on our method, and conduct extensive experiments including safety preservation and world knowledge preservation, on various downstream tasks. In our experiments, SC-LoRA succeeds in delivering superior fine-tuning performance while markedly diminishing knowledge forgetting, surpassing contemporary LoRA initialization methods.

replace VCDiag: Classifying Erroneous Waveforms for Failure Triage Acceleration

Authors: Minh Luu, Surya Jasper, Khoi Le, Evan Pan, Michael Quinn, Aakash Tyagi, Jiang Hu

Abstract: Failure triage in design functional verification is critical but time-intensive, relying on manual specification reviews, log inspections, and waveform analyses. While machine learning (ML) has improved areas like stimulus generation and coverage closure, its application to RTL-level simulation failure triage, particularly for large designs, remains limited. VCDiag offers an efficient, adaptable approach using VCD data to classify failing waveforms and pinpoint likely failure locations. In the largest experiment, VCDiag achieves over 94% accuracy in identifying the top three most likely modules. The framework introduces a novel signal selection and statistical compression approach, achieving over 120x reduction in raw data size while preserving features essential for classification. It can also be integrated into diverse Verilog/SystemVerilog designs and testbenches.

replace On the Convergence of Gradient Descent on Learning Transformers with Residual Connections

Authors: Zhen Qin, Jinxin Zhou, Zhihui Zhu

Abstract: Transformer models have emerged as fundamental tools across various scientific and engineering disciplines, owing to their outstanding performance in diverse applications. Despite this empirical success, the theoretical foundations of Transformers remain relatively underdeveloped, particularly in understanding their training dynamics. Existing research predominantly examines isolated components--such as self-attention mechanisms and feedforward networks--without thoroughly investigating the interdependencies between these components, especially when residual connections are present. In this paper, we aim to bridge this gap by analyzing the convergence behavior of a structurally complete yet single-layer Transformer, comprising self-attention, a feedforward network, and residual connections. We demonstrate that, under appropriate initialization, gradient descent exhibits a linear convergence rate, where the convergence speed is determined by the minimum and maximum singular values of the output matrix from the attention layer. Moreover, our analysis reveals that residual connections serve to ameliorate the ill-conditioning of this output matrix, an issue stemming from the low-rank structure imposed by the softmax operation, thereby promoting enhanced optimization stability. We also extend our theoretical findings to a multi-layer Transformer architecture, confirming the linear convergence rate of gradient descent under suitable initialization. Empirical results corroborate our theoretical insights, illustrating the beneficial role of residual connections in promoting convergence stability.

replace Graph-Assisted Stitching for Offline Hierarchical Reinforcement Learning

Authors: Seungho Baek, Taegeon Park, Jongchan Park, Seungjun Oh, Yusung Kim

Abstract: Existing offline hierarchical reinforcement learning methods rely on high-level policy learning to generate subgoal sequences. However, their efficiency degrades as task horizons increase, and they lack effective strategies for stitching useful state transitions across different trajectories. We propose Graph-Assisted Stitching (GAS), a novel framework that formulates subgoal selection as a graph search problem rather than learning an explicit high-level policy. By embedding states into a Temporal Distance Representation (TDR) space, GAS clusters semantically similar states from different trajectories into unified graph nodes, enabling efficient transition stitching. A shortest-path algorithm is then applied to select subgoal sequences within the graph, while a low-level policy learns to reach the subgoals. To improve graph quality, we introduce the Temporal Efficiency (TE) metric, which filters out noisy or inefficient transition states, significantly enhancing task performance. GAS outperforms prior offline HRL methods across locomotion, navigation, and manipulation tasks. Notably, in the most stitching-critical task, it achieves a score of 88.3, dramatically surpassing the previous state-of-the-art score of 1.0. Our source code is available at: https://github.com/qortmdgh4141/GAS.

URLs: https://github.com/qortmdgh4141/GAS.

replace The Geometries of Truth Are Orthogonal Across Tasks

Authors: Waiss Azizian, Michael Kirchhof, Eugene Ndiaye, Louis Bethune, Michal Klein, Pierre Ablin, Marco Cuturi

Abstract: Large Language Models (LLMs) have demonstrated impressive generalization capabilities across various tasks, but their claim to practical relevance is still mired by concerns on their reliability. Recent works have proposed examining the activations produced by an LLM at inference time to assess whether its answer to a question is correct. Some works claim that a "geometry of truth" can be learned from examples, in the sense that the activations that generate correct answers can be distinguished from those leading to mistakes with a linear classifier. In this work, we underline a limitation of these approaches: we observe that these "geometries of truth" are intrinsically task-dependent and fail to transfer across tasks. More precisely, we show that linear classifiers trained across distinct tasks share little similarity and, when trained with sparsity-enforcing regularizers, have almost disjoint supports. We show that more sophisticated approaches (e.g., using mixtures of probes and tasks) fail to overcome this limitation, likely because activation vectors commonly used to classify answers form clearly separated clusters when examined across tasks.

replace LearnAlign: Reasoning Data Selection for Reinforcement Learning in Large Language Models Based on Improved Gradient Alignment

Authors: Shipeng Li, Shikun Li, Zhiqin Yang, Xinghua Zhang, Gaode Chen, Xiaobo Xia, Hengyu Liu, Zhe Peng

Abstract: Reinforcement learning (RL) has become a key technique for enhancing LLMs' reasoning abilities, yet its data inefficiency remains a major bottleneck. To address this critical yet challenging issue, we present a novel gradient-alignment-based method, named LearnAlign, which intelligently selects the learnable and representative training reasoning data for RL post-training. To overcome the issue of response-length bias in gradient norms, we introduce the data learnability based on the success rate, which can indicate the learning potential of each data point. Experiments across three mathematical reasoning benchmarks demonstrate that our method significantly reduces training data requirements while achieving minor performance degradation or even improving performance compared to full-data training. For example, it reduces data requirements by up to 1,000 data points with better performance (77.53%) than that on the full dataset on GSM8K benchmark (77.04%). Furthermore, we show its effectiveness in the staged RL setting. This work provides valuable insights into data-efficient RL post-training and establishes a foundation for future research in optimizing reasoning data selection. To facilitate future work, we will release code.

replace Robust Molecular Property Prediction via Densifying Scarce Labeled Data

Authors: Jina Kim, Jeffrey Willette, Bruno Andreis, Sung Ju Hwang

Abstract: A widely recognized limitation of molecular prediction models is their reliance on structures observed in the training data, resulting in poor generalization to out-of-distribution compounds. Yet in drug discovery, the compounds most critical for advancing research often lie beyond the training set, making the bias toward the training data particularly problematic. This mismatch introduces substantial covariate shift, under which standard deep learning models produce unstable and inaccurate predictions. Furthermore, the scarcity of labeled data, stemming from the onerous and costly nature of experimental validation, further exacerbates the difficulty of achieving reliable generalization. To address these limitations, we propose a novel meta-learning-based approach that leverages unlabeled data to interpolate between in-distribution (ID) and out-of-distribution (OOD) data, enabling the model to meta-learn how to generalize beyond the training distribution. We demonstrate significant performance gains on challenging real-world datasets with substantial covariate shift, supported by t-SNE visualizations highlighting our interpolation method.

replace KEPLA: A Knowledge-Enhanced Deep Learning Framework for Accurate Protein-Ligand Binding Affinity Prediction

Authors: Han Liu, Keyan Ding, Peilin Chen, Yinwei Wei, Liqiang Nie, Dapeng Wu, Shiqi Wang

Abstract: Accurate prediction of protein-ligand binding affinity is critical for drug discovery. While recent deep learning approaches have demonstrated promising results, they often rely solely on structural features of proteins and ligands, overlooking their valuable biochemical knowledge associated with binding affinity. To address this limitation, we propose KEPLA, a novel deep learning framework that explicitly integrates prior knowledge from Gene Ontology and ligand properties to enhance prediction performance. KEPLA takes protein sequences and ligand molecular graphs as input and optimizes two complementary objectives: (1) aligning global representations with knowledge graph relations to capture domain-specific biochemical insights, and (2) leveraging cross attention between local representations to construct fine-grained joint embeddings for prediction. Experiments on two benchmark datasets across both in-domain and cross-domain scenarios demonstrate that KEPLA consistently outperforms state-of-the-art baselines. Furthermore, interpretability analyses based on knowledge graph relations and cross attention maps provide valuable insights into the underlying predictive mechanisms.

replace Discrete Diffusion in Large Language and Multimodal Models: A Survey

Authors: Runpeng Yu, Qi Li, Xinchao Wang

Abstract: In this work, we provide a systematic survey of Discrete Diffusion Language Models (dLLMs) and Discrete Diffusion Multimodal Language Models (dMLLMs). Unlike autoregressive (AR) models, dLLMs and dMLLMs adopt a multi-token, parallel decoding paradigm using full attention and a denoising-based generation strategy. This paradigm naturally enables parallel generation, fine-grained output controllability, and dynamic, response-aware perception. These capabilities are previously difficult to achieve with AR models. Recently, a growing number of industrial-scale proprietary d(M)LLMs, as well as a large number of open-source academic d(M)LLMs, have demonstrated performance comparable to their autoregressive counterparts, while achieving up to 10x acceleration in inference speed. The advancement of discrete diffusion LLMs and MLLMs has been largely driven by progress in two domains. The first is the development of autoregressive LLMs and MLLMs, which has accumulated vast amounts of data, benchmarks, and foundational infrastructure for training and inference. The second contributing domain is the evolution of the mathematical models underlying discrete diffusion. Together, these advancements have catalyzed a surge in dLLMs and dMLLMs research in early 2025. In this work, we present a comprehensive overview of the research in the dLLM and dMLLM domains. We trace the historical development of dLLMs and dMLLMs, formalize the underlying mathematical frameworks, and categorize representative models. We further analyze key techniques for training and inference, and summarize emerging applications across language, vision-language, and biological domains. We conclude by discussing future directions for research and deployment. Paper collection: https://github.com/LiQiiiii/DLLM-Survey

URLs: https://github.com/LiQiiiii/DLLM-Survey

replace Structured Generative Modeling with the Thermodynamic Kolmogorov-Arnold Model

Authors: Prithvi Raj

Abstract: Learning an energy-based model (EBM) in the latent space of a top-down generative model offers an expressive and interpretable framework for text and image generation. However, it remains unclear how this interpretability can be systematically leveraged to guide model design, improve generative quality, and reduce training time. Moreover, the reliance on Langevin Monte Carlo (LMC) sampling presents challenges in efficiency and exploring multimodal latent distributions. In this work, we propose a novel adaptation of the Kolmogorov-Arnold representation theorem for generative modeling and introduce the Thermodynamic Kolmogorov-Arnold Model (T-KAM) as a new framework for incorporating structural and inductive biases. By constraining the prior to univariate relationships, T-KAM enables fast and exact inference via the inverse transform method. We also introduce a training strategy based on population-based LMC, which decomposes posterior sampling into a sequence of annealed distributions to improve multimodal exploration. We empirically demonstrate how inductive biases enable more efficient training strategies and compare our novel approaches to scaling and prior sampling. T-KAM provides an elegant balance among common trade-offs in generative modeling, offering fast inference, high sample quality, and stable training, while being naturally suited to upcoming Zettascale Computing Co. hardware and extendable to other high-impact research directions in generative intelligence.

replace Bayesian Multiobject Tracking With Neural-Enhanced Motion and Measurement Models

Authors: Shaoxiu Wei, Mingchao Liang, Florian Meyer

Abstract: Multiobject tracking (MOT) is an important task in applications including autonomous driving, ocean sciences, and aerospace surveillance. Traditional MOT methods are model-based and combine sequential Bayesian estimation with data association and an object birth model. More recent methods are fully data-driven and rely on the training of neural networks. Both approaches offer distinct advantages in specific settings. In particular, model-based methods are generally applicable across a wide range of scenarios, whereas data-driven MOT achieves superior performance in scenarios where abundant labeled data for training is available. A natural thought is whether a general framework can integrate the two approaches. This paper introduces a hybrid method that utilizes neural networks to enhance specific aspects of the statistical model in Bayesian MOT that have been identified as overly simplistic. By doing so, the performance of the prediction and update steps of Bayesian MOT is improved. To ensure tractable computation, our framework uses belief propagation to avoid high-dimensional operations combined with sequential Monte Carlo methods to perform low-dimensional operations efficiently. The resulting method combines the flexibility and robustness of model-based approaches with the capability to learn complex information from data of neural networks. We evaluate the performance of the proposed method based on the nuScenes autonomous driving dataset and demonstrate that it has state-of-the-art performance

replace Multi-Preference Lambda-weighted Listwise DPO for Dynamic Preference Alignment

Authors: Yuhui Sun (University of Alberta), Xiyao Wang (University of Toronto), Zixi Li (Zhejiang University), Jinman Zhao (University of Toronto)

Abstract: While large-scale unsupervised language models (LMs) capture broad world knowledge and reasoning capabilities, steering their behavior toward desired objectives remains challenging due to the lack of explicit supervision. Existing alignment techniques, such as reinforcement learning from human feedback (RLHF), rely on training a reward model and performing reinforcement learning to align with human preferences. However, RLHF is often computationally intensive, unstable, and sensitive to hyperparameters. To address these limitations, Direct Preference Optimization (DPO) was introduced as a lightweight and stable alternative, enabling direct alignment of language models with pairwise preference data via classification loss. However, DPO and its extensions generally assume a single static preference distribution, limiting flexibility in multi-objective or dynamic alignment settings. In this paper, we propose a novel framework: Multi-Preference Lambda-weighted Listwise DPO, which extends DPO to incorporate multiple human preference dimensions (e.g., helpfulness, harmlessness, informativeness) and enables dynamic interpolation through a controllable simplex-weighted formulation. Our method supports both listwise preference feedback and flexible alignment across varying user intents without re-training. Empirical and theoretical analysis demonstrates that our method is as effective as traditional DPO on static objectives while offering greater generality and adaptability for real-world deployment.

replace Position: Machine Learning Conferences Should Establish a "Refutations and Critiques" Track

Authors: Rylan Schaeffer, Joshua Kazdan, Yegor Denisov-Blanch, Brando Miranda, Matthias Gerstgrasser, Susan Zhang, Andreas Haupt, Isha Gupta, Elyas Obbad, Jesse Dodge, Jessica Zosa Forde, Francesco Orabona, Sanmi Koyejo, David Donoho

Abstract: Science progresses by iteratively advancing and correcting humanity's understanding of the world. In machine learning (ML) research, rapid advancements have led to an explosion of publications, but have also led to misleading, incorrect, flawed or perhaps even fraudulent studies being accepted and sometimes highlighted at ML conferences due to the fallibility of peer review. While such mistakes are understandable, ML conferences do not offer robust processes to help the field systematically correct when such errors are made. This position paper argues that ML conferences should establish a dedicated "Refutations and Critiques" (R&C) Track. This R&C Track would provide a high-profile, reputable platform to support vital research that critically challenges prior research, thereby fostering a dynamic self-correcting research ecosystem. We discuss key considerations including track design, review principles, potential pitfalls, and provide an illustrative example submission concerning a recent ICLR 2025 Oral. We conclude that ML conferences should create official, reputable mechanisms to help ML research self-correct.

replace EFRame: Deeper Reasoning via Exploration-Filter-Replay Reinforcement Learning Framework

Authors: Chen Wang, Lai Wei, Yanzhi Zhang, Chenyang Shao, Zedong Dan, Weiran Huang, Yue Wang, Yuzhi Zhang

Abstract: Recent advances in reinforcement learning (RL) have significantly enhanced the reasoning capabilities of large language models (LLMs). Group Relative Policy Optimization (GRPO), an efficient variant of PPO that lowers RL's computational cost, still faces limited exploration, low sample efficiency and instability, constraining its performance on complex reasoning tasks. To address these limitations, we introduce EFRame, an Exploration-Filter-Replay framework that systematically augments GRPO along three critical dimensions. EFRame performs additional rollouts to explore high-quality trajectories, applies online filtering to eliminate low-quality samples that introduce noise and variance, and leverages experience replay to repeatedly exploit rare but informative samples. EFRame establishes a complete and stable learning cycle, guiding the model through a structured transition from exploration to convergence. Our experiments across a variety of reasoning benchmarks demonstrate that EFRame not only improves the robustness and efficiency of training, but also enables access to deeper reasoning capabilities that remain unattainable under vanilla GRPO. Furthermore, EFRame not only enables fine-grained categorization of training samples for deeper insight into their contributions, but also introduces an efficient and precise mechanism for entropy control, which is critical for balancing exploration and convergence in RL training. Our code is available at https://github.com/597358816/EFRame.

URLs: https://github.com/597358816/EFRame.

replace Implicit Reward as the Bridge: A Unified View of SFT and DPO Connections

Authors: Bo Wang, Qinyuan Cheng, Runyu Peng, Rong Bao, Peiji Li, Qipeng Guo, Linyang Li, Zhiyuan Zeng, Yunhua Zhou, Xipeng Qiu

Abstract: Post-training processes are essential phases in grounding pre-trained language models to real-world tasks, with learning from demonstrations or preference signals playing a crucial role in this adaptation. We present a unified theoretical framework bridging Supervised Fine-Tuning (SFT) and preference learning in Large Language Model (LLM) post-training. Through rigorous mathematical derivation, we demonstrate that both SFT and preference learning methods like Direct Preference Optimization (DPO) operate within the same optimal policy-reward subspace, with SFT representing a special case of implicit reward learning. Our analysis reveals a critical limitation in conventional SFT: the KL divergence term in distribution matching becomes constant with respect to the policy during optimization, failing to constrain model updates. To address this, we propose a simple yet effective learning rate reduction approach that yields significant performance improvements (up to \textbf{25\%} relative gain and \textbf{6\%} absolute win rate increase in instruction following tasks. Additionally, we derive alternative SFT objectives from various f-divergence functions that preserve the KL term during optimization, further enhancing post-DPO model performance. Finally, we extend the theoretical relationship between LLM logits and Q-functions from preference learning to the SFT context, providing mathematical derivations and experimental validation.

replace GLU Attention Improve Transformer

Authors: Zehao Wang

Abstract: Gated Linear Units (GLU) have shown great potential in enhancing neural network performance. In this paper, I introduce a novel attention mechanism called GLU Attention, which introduces nonlinearity into the values of Attention. My experiments demonstrate that GLU Attention improves both model performance and convergence speed across text and vision modalities with zero additional parameters and negligible computational costs. GLU Attention is lightweight and can seamlessly integrate with other technologies, such as Flash Attention, Rotary Position Embedding (RoPE), and various Multi-Head Attention (MHA) variants such as Grouped-Query Attention (GQA). This project is open-sourced at github.

replace PPFL-RDSN: Privacy-Preserving Federated Learning-based Residual Dense Spatial Networks for Encrypted Lossy Image Reconstruction

Authors: Peilin He, James Joshi

Abstract: Reconstructing high-quality images from low-resolution inputs using Residual Dense Spatial Networks (RDSNs) is crucial yet challenging, particularly in collaborative scenarios where centralized training poses significant privacy risks, including data leakage and inference attacks, as well as high computational costs. We propose a novel Privacy-Preserving Federated Learning-based RDSN (PPFL-RDSN) framework specifically tailored for lossy image reconstruction. PPFL-RDSN integrates Federated Learning (FL), local differential privacy, and robust model watermarking techniques, ensuring data remains secure on local devices, safeguarding sensitive information, and maintaining model authenticity without revealing underlying data. Empirical evaluations show that PPFL-RDSN achieves comparable performance to the state-of-the-art centralized methods while reducing computational burdens, and effectively mitigates security and privacy vulnerabilities, making it a practical solution for secure and privacy-preserving collaborative computer vision applications.

replace Diversity Conscious Refined Random Forest

Authors: Sijan Bhattarai, Saurav Bhandari, Girija Bhusal, Saroj Shakya, Tapendra Pandey

Abstract: Random Forest (RF) is a widely used ensemble learning technique known for its robust classification performance across diverse domains. However, it often relies on hundreds of trees and all input features, leading to high inference cost and model redundancy. In this work, our goal is to grow trees dynamically only on informative features and then enforce maximal diversity by clustering and retaining uncorrelated trees. Therefore, we propose a Refined Random Forest Classifier that iteratively refines itself by first removing the least informative features and then analytically determines how many new trees should be grown, followed by correlation-based clustering to remove redundant trees. The classification accuracy of our model was compared against the standard RF on the same number of trees. Experiments on 8 multiple benchmark datasets, including binary and multiclass datasets, demonstrate that the proposed model achieves improved accuracy compared to standard RF.

replace SCAWaveNet: A Spatial-Channel Attention-Based Network for Global Significant Wave Height Retrieval

Authors: Chong Zhang, Xichao Liu, Yibing Zhan, Dapeng Tao, Jun Ni, Jinwei Bu

Abstract: Recent advancements in spaceborne GNSS missions have produced extensive global datasets, providing a robust basis for deep learning-based significant wave height (SWH) retrieval. While existing deep learning models predominantly utilize CYGNSS data with four-channel information, they often adopt single-channel inputs or simple channel concatenation without leveraging the benefits of cross-channel information interaction during training. To address this limitation, a novel spatial-channel attention-based network, namely SCAWaveNet, is proposed for SWH retrieval. Specifically, features from each channel of the DDMs are modeled as independent attention heads, enabling the fusion of spatial and channel-wise information. For auxiliary parameters, a lightweight attention mechanism is designed to assign weights along the spatial and channel dimensions. The final feature integrates both spatial and channel-level characteristics. Model performance is evaluated using four-channel CYGNSS data. When ERA5 is used as a reference, SCAWaveNet achieves an average RMSE of 0.438 m. When using buoy data from NDBC, the average RMSE reaches 0.432 m. Compared to state-of-the-art models, SCAWaveNet reduces the average RMSE by at least 3.52% on the ERA5 dataset and by 5.68% on the NDBC buoy observations. The code is available at https://github.com/Clifx9908/SCAWaveNet.

URLs: https://github.com/Clifx9908/SCAWaveNet.

replace On-Policy Optimization of ANFIS Policies Using Proximal Policy Optimization

Authors: Kaaustaaub Shankar, Wilhelm Louw, Kelly Cohen

Abstract: We present a reinforcement learning method for training neuro-fuzzy controllers using Proximal Policy Optimization (PPO). Unlike prior approaches that used Deep Q-Networks (DQN) with Adaptive Neuro-Fuzzy Inference Systems (ANFIS), our PPO-based framework leverages a stable on-policy actor-critic setup. Evaluated on the CartPole-v1 environment across multiple seeds, PPO-trained fuzzy agents consistently achieved the maximum return of 500 with zero variance after 20000 updates, outperforming ANFIS-DQN baselines in both stability and convergence speed. This highlights PPO's potential for training explainable neuro-fuzzy agents in reinforcement learning tasks.

replace Text Detoxification: Data Efficiency, Semantic Preservation and Model Generalization

Authors: Jing Yu, Yibo Zhao, Jiapeng Zhu, Wenming Shao, Bo Pang, Zhao Zhang, Xiang Li

Abstract: The widespread dissemination of toxic content on social media poses a serious threat to both online environments and public discourse, highlighting the urgent need for detoxification methods that effectively remove toxicity while preserving the original semantics. However, existing approaches often struggle to simultaneously achieve strong detoxification performance, semantic preservation, and robustness to out-of-distribution data. Moreover, they typically rely on costly, manually annotated parallel corpora while showing poor data efficiency. To address these challenges, we propose a two-stage training framework that jointly optimizes for data efficiency, semantic preservation, and model generalization. We first perform supervised fine-tuning on a small set of high-quality, filtered parallel data to establish a strong initialization. Then, we leverage unlabeled toxic inputs and a custom-designed reward model to train the LLM using Group Relative Policy Optimization. Experimental results demonstrate that our method effectively mitigates the trade-offs faced by previous work, achieving state-of-the-art performance with improved generalization and significantly reduced dependence on annotated data. Our code is available at: https://github.com/allacnobug/Detoxification-of-Text.

URLs: https://github.com/allacnobug/Detoxification-of-Text.

replace Diffusion Explorer: Interactive Exploration of Diffusion Models

Authors: Alec Helbling, Duen Horng Chau

Abstract: Diffusion models have been central to the development of recent image, video, and even text generation systems. They posses striking geometric properties that can be faithfully portrayed in low-dimensional settings. However, existing resources for explaining diffusion either require an advanced theoretical foundation or focus on their neural network architectures rather than their rich geometric properties. We introduce Diffusion Explorer, an interactive tool to explain the geometric properties of diffusion models. Users can train 2D diffusion models in the browser and observe the temporal dynamics of their sampling process. Diffusion Explorer leverages interactive animation, which has been shown to be a powerful tool for making engaging visualizations of dynamic systems, making it well suited to explaining diffusion models which represent stochastic processes that evolve over time. Diffusion Explorer is open source and a live demo is available at alechelbling.com/Diffusion-Explorer.

replace A Real-Time Digital Twin for Type 1 Diabetes using Simulation-Based Inference

Authors: Trung-Dung Hoang, Alceu Bissoto, Vihangkumar V. Naik, Tim Fl\"uhmann, Artemii Shlychkov, Jose Garcia-Tirado, Lisa M. Koch

Abstract: Accurately estimating parameters of physiological models is essential to achieving reliable digital twins. For Type 1 Diabetes, this is particularly challenging due to the complexity of glucose-insulin interactions. Traditional methods based on Markov Chain Monte Carlo struggle with high-dimensional parameter spaces and fit parameters from scratch at inference time, making them slow and computationally expensive. In this study, we propose a Simulation-Based Inference approach based on Neural Posterior Estimation to efficiently capture the complex relationships between meal intake, insulin, and glucose level, providing faster, amortized inference. Our experiments demonstrate that SBI not only outperforms traditional methods in parameter estimation but also generalizes better to unseen conditions, offering real-time posterior inference with reliable uncertainty quantification.

replace Scaling Collapse Reveals Universal Dynamics in Compute-Optimally Trained Neural Networks

Authors: Shikai Qiu, Lechao Xiao, Andrew Gordon Wilson, Jeffrey Pennington, Atish Agarwala

Abstract: What scaling limits govern neural network training dynamics when model size and training time grow in tandem? We show that despite the complex interactions between architecture, training algorithms, and data, compute-optimally trained models exhibit a remarkably precise universality. Specifically, loss curves from models of varying sizes collapse onto a single universal curve when training compute and loss are normalized to unity at the end of training. With learning rate decay, the collapse becomes so tight that differences in the normalized curves across models fall below the noise floor of individual loss curves across random seeds, a phenomenon we term supercollapse. We observe supercollapse across learning rate schedules, datasets, and architectures, including transformers trained on next-token prediction, and find it breaks down when hyperparameters are scaled suboptimally, providing a precise and practical indicator of good scaling. We explain these phenomena by connecting collapse to the power-law structure in typical neural scaling laws, and analyzing a simple yet surprisingly effective model of SGD noise dynamics that accurately predicts loss curves across various learning rate schedules and quantitatively explains the origin of supercollapse.

replace Order Acquisition Under Competitive Pressure: A Rapidly Adaptive Reinforcement Learning Approach for Ride-Hailing Subsidy Strategies

Authors: Fangzhou Shi, Xiaopeng Ke, Xinye Xiong, Kexin Meng, Chang Men, Zhengdan Zhu

Abstract: The proliferation of ride-hailing aggregator platforms presents significant growth opportunities for ride-service providers by increasing order volume and gross merchandise value (GMV). On most ride-hailing aggregator platforms, service providers that offer lower fares are ranked higher in listings and, consequently, are more likely to be selected by passengers. This competitive ranking mechanism creates a strong incentive for service providers to adopt coupon strategies that lower prices to secure a greater number of orders, as order volume directly influences their long-term viability and sustainability. Thus, designing an effective coupon strategy that can dynamically adapt to market fluctuations while optimizing order acquisition under budget constraints is a critical research challenge. However, existing studies in this area remain scarce. To bridge this gap, we propose FCA-RL, a novel reinforcement learning-based subsidy strategy framework designed to rapidly adapt to competitors' pricing adjustments. Our approach integrates two key techniques: Fast Competition Adaptation (FCA), which enables swift responses to dynamic price changes, and Reinforced Lagrangian Adjustment (RLA), which ensures adherence to budget constraints while optimizing coupon decisions on new price landscape. Furthermore, we introduce RideGym, the first dedicated simulation environment tailored for ride-hailing aggregators, facilitating comprehensive evaluation and benchmarking of different pricing strategies without compromising real-world operational efficiency. Experimental results demonstrate that our proposed method consistently outperforms baseline approaches across diverse market conditions, highlighting its effectiveness in subsidy optimization for ride-hailing service providers.

replace-cross HRVGAN: High Resolution Video Generation using Spatio-Temporal GAN

Authors: Abhinav Sagar

Abstract: High-resolution video generation has emerged as a crucial task in computer vision, with wide-ranging applications in entertainment, simulation, and data augmentation. However, generating temporally coherent and visually realistic videos remains a significant challenge due to the high dimensionality and complex dynamics of video data. In this paper, we propose a novel deep generative network architecture designed specifically for high-resolution video synthesis. Our approach integrates key concepts from Wasserstein Generative Adversarial Networks (WGANs), enforcing a k-Lipschitz continuity constraint on the discriminator to stabilize training and enhance convergence. We further leverage Conditional GAN (cGAN) techniques by incorporating class labels during both training and inference, enabling class-specific video generation with improved semantic consistency. We provide a detailed layer-wise description of the Generator and Discriminator networks, highlighting architectural design choices promoting temporal coherence and spatial detail. The overall combined architecture, training algorithm, and optimization strategy are thoroughly presented. Our training objective combines a pixel-wise mean squared error loss with an adversarial loss to balance frame-level accuracy and video realism. We validate our approach on benchmark datasets including UCF101, Golf, and Aeroplane, encompassing diverse motion patterns and scene contexts. Quantitative evaluations using Inception Score (IS) and Fr\'echet Inception Distance (FID) demonstrate that our model significantly outperforms previous state-of-the-art unsupervised video generation methods in terms of both quality and diversity.

replace-cross Non-negative matrix factorization algorithms generally improve topic model fits

Authors: Peter Carbonetto, Abhishek Sarkar, Zihao Wang, Matthew Stephens

Abstract: In an effort to develop topic modeling methods that can be quickly applied to large data sets, we revisit the problem of maximum-likelihood estimation in topic models. It is known, at least informally, that maximum-likelihood estimation in topic models is closely related to non-negative matrix factorization (NMF). Yet, to our knowledge, this relationship has not been exploited previously to fit topic models. We show that recent advances in NMF optimization methods can be leveraged to fit topic models very efficiently, often resulting in much better fits and in less time than existing algorithms for topic models. We also formally make the connection between the NMF optimization problem and maximum-likelihood estimation for the topic model, and using this result we show that the expectation maximization (EM) algorithm for the topic model is essentially the same as the classic multiplicative updates for NMF (the only difference being that the operations are performed in a different order). Our methods are implemented in the R package fastTopics.

replace-cross AASeg: Attention Aware Network for Real Time Semantic Segmentation

Authors: Abhinav Sagar

Abstract: Semantic segmentation is a fundamental task in computer vision that involves dense pixel-wise classification for scene understanding. Despite significant progress, achieving high accuracy while maintaining real-time performance remains a challenging trade-off, particularly for deployment in resource-constrained or latency-sensitive applications. In this paper, we propose AASeg, a novel Attention-Aware Network for real-time semantic segmentation. AASeg effectively captures both spatial and channel-wise dependencies through lightweight Spatial Attention (SA) and Channel Attention (CA) modules, enabling enhanced feature discrimination without incurring significant computational overhead. To enrich contextual representation, we introduce a Multi-Scale Context (MSC) module that aggregates dense local features across multiple receptive fields. The outputs from attention and context modules are adaptively fused to produce high-resolution segmentation maps. Extensive experiments on Cityscapes, ADE20K, and CamVid demonstrate that AASeg achieves a compelling trade-off between accuracy and efficiency, outperforming prior real-time methods.

replace-cross Agnostic Online Learning and Excellent Sets

Authors: Maryanthe Malliaris, Shay Moran

Abstract: We use algorithmic methods from online learning to explore some important objects at the intersection of model theory and combinatorics, and find natural ways that algorithmic methods can detect and explain (and improve our understanding of) stable structure in the sense of model theory. The main theorem deals with existence of $\epsilon$-excellent sets (which are key to the Stable Regularity Lemma, a theorem characterizing the appearance of irregular pairs in Szemer\'edi's celebrated Regularity Lemma). We prove that $\epsilon$-excellent sets exist for any $\epsilon < \frac{1}{2}$ in $k$-edge stable graphs in the sense of model theory (equivalently, Littlestone classes); earlier proofs had given this only for $\epsilon < 1/{2^{2^k}}$ or so. We give two proofs: the first uses regret bounds from online learning, the second uses Boolean closure properties of Littlestone classes and sampling. We also give a version of the dynamic Sauer-Shelah-Perles lemma appropriate to this setting, related to definability of types. We conclude by characterizing stable/Littlestone classes as those supporting a certain abstract notion of majority: the proof shows that the two distinct, natural notions of majority, arising from measure and from dimension, densely often coincide.

replace-cross Iterative Linear Quadratic Optimization for Nonlinear Control: Differentiable Programming Algorithmic Templates

Authors: Vincent Roulet, Siddhartha Srinivasa, Maryam Fazel, Zaid Harchaoui

Abstract: Iterative optimization algorithms depend on access to information about the objective function. In a differentiable programming framework, this information, such as gradients, can be automatically derived from the computational graph. We explore how nonlinear control algorithms, often employing linear and/or quadratic approximations, can be effectively cast within this framework. Our approach illuminates shared components and differences between gradient descent, Gauss-Newton, Newton, and differential dynamic programming methods in the context of discrete time nonlinear control. Furthermore, we present line-search strategies and regularized variants of these algorithms, along with a comprehensive analysis of their computational complexities. We study the performance of the aforementioned algorithms on various nonlinear control benchmarks, including autonomous car racing simulations using a simplified car model. All implementations are publicly available in a package coded in a differentiable programming language.

replace-cross Mixtures of Gaussian Process Experts with SMC$^2$

Authors: Teemu H\"ark\"onen, Sara Wade, Kody Law, Lassi Roininen

Abstract: Gaussian processes are a key component of many flexible statistical and machine learning models. However, they exhibit cubic computational complexity and high memory constraints due to the need of inverting and storing a full covariance matrix. To circumvent this, mixtures of Gaussian process experts have been considered where data points are assigned to independent experts, reducing the complexity by allowing inference based on smaller, local covariance matrices. Moreover, mixtures of Gaussian process experts substantially enrich the model's flexibility, allowing for behaviors such as non-stationarity, heteroscedasticity, and discontinuities. In this work, we construct a novel inference approach based on nested sequential Monte Carlo samplers to simultaneously infer both the gating network and Gaussian process expert parameters. This greatly improves inference compared to importance sampling, particularly in settings when a stationary Gaussian process is inappropriate, while still being thoroughly parallelizable.

replace-cross QuTE: decentralized multiple testing on sensor networks with false discovery rate control

Authors: Aaditya Ramdas, Jianbo Chen, Martin J. Wainwright, Michael I. Jordan

Abstract: This paper designs methods for decentralized multiple hypothesis testing on graphs that are equipped with provable guarantees on the false discovery rate (FDR). We consider the setting where distinct agents reside on the nodes of an undirected graph, and each agent possesses p-values corresponding to one or more hypotheses local to its node. Each agent must individually decide whether to reject one or more of its local hypotheses by only communicating with its neighbors, with the joint aim that the global FDR over the entire graph must be controlled at a predefined level. We propose a simple decentralized family of Query-Test-Exchange (QuTE) algorithms and prove that they can control FDR under independence or positive dependence of the p-values. Our algorithm reduces to the Benjamini-Hochberg (BH) algorithm when after graph-diameter rounds of communication, and to the Bonferroni procedure when no communication has occurred or the graph is empty. To avoid communicating real-valued p-values, we develop a quantized BH procedure, and extend it to a quantized QuTE procedure. QuTE works seamlessly in streaming data settings, where anytime-valid p-values may be continually updated at each node. Last, QuTE is robust to arbitrary dropping of packets, or a graph that changes at every step, making it particularly suitable to mobile sensor networks involving drones or other multi-agent systems. We study the power of our procedure using a simulation suite of different levels of connectivity and communication on a variety of graph structures, and also provide an illustrative real-world example.

replace-cross Uncertainty in Real-Time Semantic Segmentation on Embedded Systems

Authors: Ethan Goan, Clinton Fookes

Abstract: Application for semantic segmentation models in areas such as autonomous vehicles and human computer interaction require real-time predictive capabilities. The challenges of addressing real-time application is amplified by the need to operate on resource constrained hardware. Whilst development of real-time methods for these platforms has increased, these models are unable to sufficiently reason about uncertainty present when applied on embedded real-time systems. This paper addresses this by combining deep feature extraction from pre-trained models with Bayesian regression and moment propagation for uncertainty aware predictions. We demonstrate how the proposed method can yield meaningful epistemic uncertainty on embedded hardware in real-time whilst maintaining predictive performance.

replace-cross Learning from Similar Linear Representations: Adaptivity, Minimaxity, and Robustness

Authors: Ye Tian, Yuqi Gu, Yang Feng

Abstract: Representation multi-task learning (MTL) has achieved tremendous success in practice. However, the theoretical understanding of these methods is still lacking. Most existing theoretical works focus on cases where all tasks share the same representation, and claim that MTL almost always improves performance. Nevertheless, as the number of tasks grows, assuming all tasks share the same representation is unrealistic. Furthermore, empirical findings often indicate that a shared representation does not necessarily improve single-task learning performance. In this paper, we aim to understand how to learn from tasks with \textit{similar but not exactly the same} linear representations, while dealing with outlier tasks. Assuming a known intrinsic dimension, we propose a penalized empirical risk minimization method and a spectral method that are \textit{adaptive} to the similarity structure and \textit{robust} to outlier tasks. Both algorithms outperform single-task learning when representations across tasks are sufficiently similar and the proportion of outlier tasks is small. Moreover, they always perform at least as well as single-task learning, even when the representations are dissimilar. We provide information-theoretic lower bounds to demonstrate that both methods are nearly \textit{minimax} optimal in a large regime, with the spectral method being optimal in the absence of outlier tasks. Additionally, we introduce a thresholding algorithm to adapt to an unknown intrinsic dimension. We conduct extensive numerical experiments to validate our theoretical findings.

replace-cross The geometry of financial institutions -- Wasserstein clustering of financial data

Authors: Lorenz Riess, Mathias Beiglb\"ock, Johannes Temme, Andreas Wolf, Julio Backhoff

Abstract: The increasing availability of granular and big data on various objects of interest has made it necessary to develop methods for condensing this information into a representative and intelligible map. Financial regulation is a field that exemplifies this need, as regulators require diverse and often highly granular data from financial institutions to monitor and assess their activities. However, processing and analyzing such data can be a daunting task, especially given the challenges of dealing with missing values and identifying clusters based on specific features. To address these challenges, we propose a variant of Lloyd's algorithm that applies to probability distributions and uses generalized Wasserstein barycenters to construct a metric space which represents given data on various objects in condensed form. By applying our method to the financial regulation context, we demonstrate its usefulness in dealing with the specific challenges faced by regulators in this domain. We believe that our approach can also be applied more generally to other fields where large and complex data sets need to be represented in concise form.

replace-cross Relation-Aware Network with Attention-Based Loss for Few-Shot Knowledge Graph Completion

Authors: Qiao Qiao, Yuepei Li, Kang Zhou, Qi Li

Abstract: Few-shot knowledge graph completion (FKGC) task aims to predict unseen facts of a relation with few-shot reference entity pairs. Current approaches randomly select one negative sample for each reference entity pair to minimize a margin-based ranking loss, which easily leads to a zero-loss problem if the negative sample is far away from the positive sample and then out of the margin. Moreover, the entity should have a different representation under a different context. To tackle these issues, we propose a novel Relation-Aware Network with Attention-Based Loss (RANA) framework. Specifically, to better utilize the plentiful negative samples and alleviate the zero-loss issue, we strategically select relevant negative samples and design an attention-based loss function to further differentiate the importance of each negative sample. The intuition is that negative samples more similar to positive samples will contribute more to the model. Further, we design a dynamic relation-aware entity encoder for learning a context-dependent entity representation. Experiments demonstrate that RANA outperforms the state-of-the-art models on two benchmark datasets.

replace-cross On the quality of randomized approximations of Tukey's depth

Authors: Simon Briend, G\'abor Lugosi, Roberto Imbuzeiro Oliveira

Abstract: Tukey's depth (or halfspace depth) is a widely used measure of centrality for multivariate data. However, exact computation of Tukey's depth is known to be a hard problem in high dimensions. As a remedy, randomized approximations of Tukey's depth have been proposed. In this paper we explore when such randomized algorithms return a good approximation of Tukey's depth. We study the case when the data are sampled from a log-concave isotropic distribution. We prove that, if one requires that the algorithm runs in polynomial time in the dimension, the randomized algorithm correctly approximates the maximal depth $1/2$ and depths close to zero. On the other hand, for any point of intermediate depth, any good approximation requires exponential complexity.

replace-cross Learning unitaries with quantum statistical queries

Authors: Armando Angrisani

Abstract: We propose several algorithms for learning unitary operators from quantum statistical queries with respect to their Choi-Jamiolkowski state. Quantum statistical queries capture the capabilities of a learner with limited quantum resources, which receives as input only noisy estimates of expected values of measurements. Our approach leverages quantum statistical queries to estimate the Fourier mass of a unitary on a subset of Pauli strings, generalizing previous techniques developed for uniform quantum examples. Specifically, we show that the celebrated quantum Goldreich-Levin algorithm can be implemented with quantum statistical queries, whereas the prior version of the algorithm involves oracle access to the unitary and its inverse. As an application, we prove that quantum Boolean functions with constant total influence or with constant degree are efficiently learnable in our model. Moreover, we prove that $\mathcal{O}(\log n)$-juntas are efficiently learnable and constant-depth circuits are learnable query-efficiently with quantum statistical queries. On the other hand, all previous algorithms for these tasks demand significantly greater resources, such as oracle access to the unitary or direct access to the Choi-Jamiolkowski state. We also demonstrate that, despite these positive results, quantum statistical queries lead to an exponentially larger query complexity for certain tasks, compared to separable measurements to the Choi-Jamiolkowski state. In particular, we show an exponential lower bound for learning a class of phase-oracle unitaries and a double exponential lower bound for testing the unitarity of channels. Taken together, our results indicate that quantum statistical queries offer a unified framework for various unitary learning tasks, with potential applications in quantum machine learning, many-body physics and benchmarking of near-term devices.

replace-cross The surrogate Gibbs-posterior of a corrected stochastic MALA: Towards uncertainty quantification for neural networks

Authors: Sebastian Bieringer, Gregor Kasieczka, Maximilian F. Steffen, Mathias Trabs

Abstract: MALA is a popular gradient-based Markov chain Monte Carlo method to access the Gibbs-posterior distribution. Stochastic MALA (sMALA) scales to large data sets, but changes the target distribution from the Gibbs-posterior to a surrogate posterior which only exploits a reduced sample size. We introduce a corrected stochastic MALA (csMALA) with a simple correction term for which distance between the resulting surrogate posterior and the original Gibbs-posterior decreases in the full sample size while retaining scalability. In a nonparametric regression model, we prove a PAC-Bayes oracle inequality for the surrogate posterior. Uncertainties can be quantified by sampling from the surrogate posterior. Focusing on Bayesian neural networks, we analyze the diameter and coverage of credible balls for shallow neural networks and we show optimal contraction rates for deep neural networks. Our credibility result is independent of the correction and can also be applied to the standard Gibbs-posterior. A simulation study in a high-dimensional parameter space demonstrates that an estimator drawn from csMALA based on its surrogate Gibbs-posterior indeed exhibits these advantages in practice.

replace-cross AVTENet: A Human-Cognition-Inspired Audio-Visual Transformer-Based Ensemble Network for Video Deepfake Detection

Authors: Ammarah Hashmi, Sahibzada Adil Shahzad, Chia-Wen Lin, Yu Tsao, Hsin-Min Wang

Abstract: The recent proliferation of hyper-realistic deepfake videos has drawn attention to the threat of audio and visual forgeries. Most previous studies on detecting artificial intelligence-generated fake videos only utilize visual modality or audio modality. While some methods exploit audio and visual modalities to detect forged videos, they have not been comprehensively evaluated on multimodal datasets of deepfake videos involving acoustic and visual manipulations, and are mostly based on convolutional neural networks with low detection accuracy. Considering that human cognition instinctively integrates multisensory information including audio and visual cues to perceive and interpret content and the success of transformer in various fields, this study introduces the audio-visual transformer-based ensemble network (AVTENet). This innovative framework tackles the complexities of deepfake technology by integrating both acoustic and visual manipulations to enhance the accuracy of video forgery detection. Specifically, the proposed model integrates several purely transformer-based variants that capture video, audio, and audio-visual salient cues to reach a consensus in prediction. For evaluation, we use the recently released benchmark multimodal audio-video FakeAVCeleb dataset. For a detailed analysis, we evaluate AVTENet, its variants, and several existing methods on multiple test sets of the FakeAVCeleb dataset. Experimental results show that the proposed model outperforms all existing methods and achieves state-of-the-art performance on Testset-I and Testset-II of the FakeAVCeleb dataset. We also compare AVTENet against humans in detecting video forgery. The results show that AVTENet significantly outperforms humans.

replace-cross Model-free Posterior Sampling via Learning Rate Randomization

Authors: Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Remi Munos, Alexey Naumov, Pierre Perrault, Michal Valko, Pierre Menard

Abstract: In this paper, we introduce Randomized Q-learning (RandQL), a novel randomized model-free algorithm for regret minimization in episodic Markov Decision Processes (MDPs). To the best of our knowledge, RandQL is the first tractable model-free posterior sampling-based algorithm. We analyze the performance of RandQL in both tabular and non-tabular metric space settings. In tabular MDPs, RandQL achieves a regret bound of order $\widetilde{O}(\sqrt{H^{5}SAT})$, where $H$ is the planning horizon, $S$ is the number of states, $A$ is the number of actions, and $T$ is the number of episodes. For a metric state-action space, RandQL enjoys a regret bound of order $\widetilde{O}(H^{5/2} T^{(d_z+1)/(d_z+2)})$, where $d_z$ denotes the zooming dimension. Notably, RandQL achieves optimistic exploration without using bonuses, relying instead on a novel idea of learning rate randomization. Our empirical study shows that RandQL outperforms existing approaches on baseline exploration environments.

replace-cross Efficient generative adversarial networks using linear additive-attention Transformers

Authors: Emilio Morales-Juarez, Gibran Fuentes-Pineda

Abstract: Although the capacity of deep generative models for image generation, such as Diffusion Models (DMs) and Generative Adversarial Networks (GANs), has dramatically improved in recent years, much of their success can be attributed to computationally expensive architectures. This has limited their adoption and use to research laboratories and companies with large resources, while significantly raising the carbon footprint for training, fine-tuning, and inference. In this work, we present a novel GAN architecture which we call LadaGAN. This architecture is based on a linear attention Transformer block named Ladaformer. The main component of this block is a linear additive-attention mechanism that computes a single attention vector per head instead of the quadratic dot-product attention. We employ Ladaformer in both the generator and discriminator, which reduces the computational complexity and overcomes the training instabilities often associated with Transformer GANs. LadaGAN consistently outperforms existing convolutional and Transformer GANs on benchmark datasets at different resolutions while being significantly more efficient. Moreover, LadaGAN shows competitive performance compared to state-of-the-art multi-step generative models (e.g. DMs) using orders of magnitude less computational resources.

replace-cross Statistical Significance of Feature Importance Rankings

Authors: Jeremy Goldwasser, Giles Hooker

Abstract: Feature importance scores are ubiquitous tools for understanding the predictions of machine learning models. However, many popular attribution methods suffer from high instability due to random sampling. Leveraging novel ideas from hypothesis testing, we devise techniques that ensure the most important features are correct with high-probability guarantees. These assess the set of $K$ top-ranked features, as well as the order of its elements. Given a set of local or global importance scores, we demonstrate how to retrospectively verify the stability of the highest ranks. We then introduce two efficient sampling algorithms that identify the $K$ most important features, perhaps in order, with probability exceeding $1-\alpha$. The theoretical justification for these procedures is validated empirically on SHAP and LIME.

replace-cross VGMShield: Mitigating Misuse of Video Generative Models

Authors: Yan Pang, Baicheng Chen, Yang Zhang, Tianhao Wang

Abstract: With the rapid advancement in video generation, people can conveniently use video generation models to create videos tailored to their specific desires. As a result, there are also growing concerns about the potential misuse of video generation for spreading illegal content and misinformation. In this work, we introduce VGMShield: a set of straightforward but effective mitigations through the lifecycle of fake video generation. We start from fake video detection, trying to understand whether there is uniqueness in generated videos and whether we can differentiate them from real videos; then, we investigate the fake video source tracing problem, which maps a fake video back to the model that generated it. Towards these, we propose to leverage pre-trained models that focus on spatial-temporal dynamics as the backbone to identify inconsistencies in videos. In detail, we analyze fake videos from the perspective of the generation process. Based on the observation of attention shifts, motion variations, and frequency fluctuations, we identify common patterns in the generated video. These patterns serve as the foundation for our experiments on fake video detection and source tracing. Through experiments on seven state-of-the-art open-source models, we demonstrate that current models still cannot reliably reproduce spatial-temporal relationships, and thus, we can accomplish detection and source tracing with over 90% accuracy. Furthermore, anticipating future generative model improvements, we propose a prevention method that adds invisible perturbations to the query images to make the generated videos look unreal. Together with detection and tracing, our multi-faceted set of solutions can effectively mitigate misuse of video generative models.

replace-cross Geometric Deep Learning for Computer-Aided Design: A Survey

Authors: Negar Heidari, Alexandros Iosifidis

Abstract: Geometric Deep Learning techniques have become a transformative force in the field of Computer-Aided Design (CAD), and have the potential to revolutionize how designers and engineers approach and enhance the design process. By harnessing the power of machine learning-based methods, CAD designers can optimize their workflows, save time and effort while making better informed decisions, and create designs that are both innovative and practical. The ability to process the CAD designs represented by geometric data and to analyze their encoded features enables the identification of similarities among diverse CAD models, the proposition of alternative designs and enhancements, and even the generation of novel design alternatives. This survey offers a comprehensive overview of learning-based methods in computer-aided design across various categories, including similarity analysis and retrieval, 2D and 3D CAD model synthesis, and CAD generation from point clouds, and single/multi-view images. Additionally, it provides a complete list of benchmark datasets and their characteristics, along with open-source codes that have propelled research in this domain. The final discussion delves into the challenges prevalent in this field, followed by potential future research directions in this rapidly evolving field.

replace-cross Self-Rectifying Diffusion Sampling with Perturbed-Attention Guidance

Authors: Donghoon Ahn, Hyoungwon Cho, Jaewon Min, Wooseok Jang, Jungwoo Kim, SeonHwa Kim, Hyun Hee Park, Kyong Hwan Jin, Seungryong Kim

Abstract: Recent studies have demonstrated that diffusion models are capable of generating high-quality samples, but their quality heavily depends on sampling guidance techniques, such as classifier guidance (CG) and classifier-free guidance (CFG). These techniques are often not applicable in unconditional generation or in various downstream tasks such as image restoration. In this paper, we propose a novel sampling guidance, called Perturbed-Attention Guidance (PAG), which improves diffusion sample quality across both unconditional and conditional settings, achieving this without requiring additional training or the integration of external modules. PAG is designed to progressively enhance the structure of samples throughout the denoising process. It involves generating intermediate samples with degraded structure by substituting selected self-attention maps in diffusion U-Net with an identity matrix, by considering the self-attention mechanisms' ability to capture structural information, and guiding the denoising process away from these degraded samples. In both ADM and Stable Diffusion, PAG surprisingly improves sample quality in conditional and even unconditional scenarios. Moreover, PAG significantly improves the baseline performance in various downstream tasks where existing guidances such as CG or CFG cannot be fully utilized, including ControlNet with empty prompts and image restoration such as inpainting and deblurring.

replace-cross Elements of World Knowledge (EWoK): A Cognition-Inspired Framework for Evaluating Basic World Knowledge in Language Models

Authors: Anna A. Ivanova, Aalok Sathe, Benjamin Lipkin, Unnathi Kumar, Setayesh Radkani, Thomas H. Clark, Carina Kauf, Jennifer Hu, R. T. Pramod, Gabriel Grand, Vivian Paulun, Maria Ryskina, Ekin Aky\"urek, Ethan Wilcox, Nafisa Rashid, Leshem Choshen, Roger Levy, Evelina Fedorenko, Joshua Tenenbaum, Jacob Andreas

Abstract: The ability to build and reason about models of the world is essential for situated language understanding. But evaluating world modeling capabilities in modern AI systems -- especially those based on language models -- has proven challenging, in large part because of the difficulty of disentangling conceptual knowledge about the world from knowledge of surface co-occurrence statistics. This paper presents Elements of World Knowledge (EWoK), a framework for evaluating language models' understanding of the conceptual knowledge underlying world modeling. EWoK targets specific concepts from multiple knowledge domains known to be important for world modeling in humans, from social interactions (help, deceive) to spatial relations (left, right). Objects, agents, and locations in the items can be flexibly filled in, enabling easy generation of multiple controlled datasets. We then introduce EWoK-core-1.0, a dataset of 4,374 items covering 11 world knowledge domains. We evaluate 20 open-weights large language models (1.3B--70B parameters) and compare them with human performance. All tested models perform worse than humans, with results varying drastically across domains. Performance on social interactions and social properties was highest and performance on physical relations and spatial relations was lowest. Overall, this dataset highlights simple cases where even large models struggle and presents rich avenues for targeted research on LLM world modeling capabilities.

replace-cross Training-Conditional Coverage Bounds under Covariate Shift

Authors: Mehrdad Pournaderi, Yu Xiang

Abstract: Conformal prediction methodology has recently been extended to the covariate shift setting, where the distribution of covariates differs between training and test data. While existing results ensure that the prediction sets from these methods achieve marginal coverage above a nominal level, their coverage rate conditional on the training dataset (referred to as training-conditional coverage) remains unexplored. In this paper, we address this gap by deriving upper bounds on the tail of the training-conditional coverage distribution, offering probably approximately correct (PAC) guarantees for these methods. Our results quantify the relationship between the quality of the prediction sets and the severity of distributional changes, and can potentially be used to compute more efficient prediction sets.

replace-cross SEE-2-SOUND: Zero-Shot Spatial Environment-to-Spatial Sound

Authors: Rishit Dagli, Shivesh Prakash, Robert Wu, Houman Khosravani

Abstract: Generating combined visual and auditory sensory experiences is critical for the consumption of immersive content. Recent advances in neural generative models have enabled the creation of high-resolution content across multiple modalities such as images, text, speech, and videos. Despite these successes, there remains a significant gap in the generation of high-quality spatial audio that complements generated visual content. Furthermore, current audio generation models excel in either generating natural audio or speech or music but fall short in integrating spatial audio cues necessary for immersive experiences. In this work, we introduce SEE-2-SOUND, a zero-shot approach that decomposes the task into (1) identifying visual regions of interest; (2) locating these elements in 3D space; (3) generating mono-audio for each; and (4) integrating them into spatial audio. Using our framework, we demonstrate compelling results for generating spatial audio for high-quality videos, images, and dynamic images from the internet, as well as media generated by learned approaches.

replace-cross Structured Difference-of-Q via Orthogonal Learning

Authors: Defu Cao, Angela Zhou

Abstract: Offline reinforcement learning is important in many settings with available observational data but the inability to deploy new policies online due to safety, cost, and other concerns. Many recent advances in causal inference and machine learning target estimation of causal contrast functions such as CATE, which is sufficient for optimizing decisions and can adapt to potentially smoother structure. We develop a dynamic generalization of the R-learner (Nie and Wager 2021, Lewis and Syrgkanis 2021) for estimating and optimizing the difference of $Q^\pi$-functions, $Q^\pi(s,1)-Q^\pi(s,0)$ (which can be used to optimize multiple-valued actions). We leverage orthogonal estimation to improve convergence rates in the presence of slower nuisance estimation rates and prove consistency of policy optimization under a margin condition. The method can leverage black-box nuisance estimators of the $Q$-function and behavior policy to target estimation of a more structured $Q$-function contrast.

replace-cross On the Utility of Domain-Adjacent Fine-Tuned Model Ensembles for Few-shot Problems

Authors: Md Ibrahim Ibne Alam, Parikshit Ram, Soham Dan, Horst Samulowitz, Koushik Kar

Abstract: Large Language Models (LLMs) have been observed to perform well on a wide range of downstream tasks when fine-tuned on domain-specific data. However, such data may not be readily available in many applications, motivating zero-shot or few-shot approaches using domain-adjacent models. While several fine-tuned models for various tasks are available, finding an appropriate domain-adjacent model for a given task is often not straight forward. In this paper, we study DAFT-E, a framework that utilizes an Ensemble of Domain-Adjacent Fine-Tuned Foundation Models for few-shot problems. We show that for zero-shot problems, this ensembling method provides an accuracy performance close to that of the single best model. With few-shot problems, this performance improves further, at which point DEFT-E can outperform any single domain-adjacent model while requiring much less data for domain-specific fine-tuning.

replace-cross Universal Checkpointing: A Flexible and Efficient Distributed Checkpointing System for Large-Scale DNN Training with Reconfigurable Parallelis

Authors: Xinyu Lian, Sam Ade Jacobs, Lev Kurilenko, Masahiro Tanaka, Stas Bekman, Olatunji Ruwase, Minjia Zhang

Abstract: Deep neural network (DNN) training continues to scale rapidly in terms of model size, data volume, and sequence length, to the point where multiple machines are required to fit large models for training. Different distributed and parallel training strategies have been developed to support large-scale DNN training by partitioning the training state across GPUs. However, existing DNN training systems provide very limited support for reconfiguring parallelism strategies in the middle of the training via checkpointing. This limitation arises because distributed checkpoints are tightly coupled to specific model parallelism and hardware configurations, preventing large-scale training jobs from efficiently adapting to hardware failures or resource elasticity. This paper presents Universal Checkpointing (UCP), a novel checkpointing system that enables flexible and efficient DNN training with reconfigurable parallelism. UCP overcomes challenges in existing systems by decoupling checkpoint structure from parallel training strategies and hardware configurations. In addition, we present a pattern-based reconfiguration pipeline that enables automatic, flexible, and efficient mapping of checkpoint state to various parallelism strategies. Evaluation on a range of DNN models, including state-of-the-art dense and sparse LLMs, shows that UCP enables reconfiguration for a broader set of widely used parallelism strategies than existing solutions while adding negligible reconfiguration cost. UCP has been successfully employed in real LLM training workloads, greatly enhancing their flexibility and resilience to dynamic hardware environments.

replace-cross ReCAP: Recursive Cross Attention Network for Pseudo-Label Generation in Robotic Surgical Skill Assessment

Authors: Julien Quarez, Marc Modat, Sebastien Ourselin, Jonathan Shapey, Alejandro Granados

Abstract: In surgical skill assessment, the Objective Structured Assessments of Technical Skills (OSATS) and Global Rating Scale (GRS) are well-established tools for evaluating surgeons during training. These metrics, along with performance feedback, help surgeons improve and reach practice standards. Recent research on the open-source JIGSAWS dataset, which includes both GRS and OSATS labels, has focused on regressing GRS scores from kinematic data, video, or their combination. However, we argue that regressing GRS alone is limiting, as it aggregates OSATS scores and overlooks clinically meaningful variations during a surgical trial. To address this, we developed a weakly-supervised recurrent transformer model that tracks a surgeon's performance throughout a session by mapping hidden states to six OSATS, derived from kinematic data. These OSATS scores are averaged to predict GRS, allowing us to compare our model's performance against state-of-the-art (SOTA) methods. We report Spearman's Correlation Coefficients (SCC) demonstrating that our model outperforms SOTA using kinematic data (SCC 0.83-0.88), and matches performance with video-based models. Our model also surpasses SOTA in most tasks for average OSATS predictions (SCC 0.46-0.70) and specific OSATS (SCC 0.56-0.95). The generation of pseudo-labels at the segment level translates quantitative predictions into qualitative feedback, vital for automated surgical skill assessment pipelines. A senior surgeon validated our model's outputs, agreeing with 77\% of the weakly-supervised predictions \(p=0.006\).

replace-cross Quantum Computing and Neuromorphic Computing for Safe, Reliable, and explainable Multi-Agent Reinforcement Learning: Optimal Control in Autonomous Robotics

Authors: Mazyar Taghavi, Rahman Farnoosh

Abstract: This paper investigates the utilization of Quantum Computing and Neuromorphic Computing for Safe, Reliable, and Explainable Multi_Agent Reinforcement Learning (MARL) in the context of optimal control in autonomous robotics. The objective was to address the challenges of optimizing the behavior of autonomous agents while ensuring safety, reliability, and explainability. Quantum Computing techniques, including Quantum Approximate Optimization Algorithm (QAOA), were employed to efficiently explore large solution spaces and find approximate solutions to complex MARL problems. Neuromorphic Computing, inspired by the architecture of the human brain, provided parallel and distributed processing capabilities, which were leveraged to develop intelligent and adaptive systems. The combination of these technologies held the potential to enhance the safety, reliability, and explainability of MARL in autonomous robotics. This research contributed to the advancement of autonomous robotics by exploring cutting-edge technologies and their applications in multi-agent systems. Codes and data are available.

replace-cross DTN: Deep Multiple Task-specific Feature Interactions Network for Multi-Task Recommendation

Authors: Yaowen Bi, Yuteng Lian, Jie Cui, Jun Liu, Peijian Wang, Guanghui Li, Xuejun Chen, Jinglin Zhao, Hao Wen, Jing Zhang, Zhaoqi Zhang, Wenzhuo Song, Yang Sun, Weiwei Zhang, Mingchen Cai, Jian Dong, Guanxing Zhang

Abstract: Neural-based multi-task learning (MTL) has been successfully applied to many recommendation applications. However, these MTL models (e.g., MMoE, PLE) did not consider feature interaction during the optimization, which is crucial for capturing complex high-order features and has been widely used in ranking models for real-world recommender systems. Moreover, through feature importance analysis across various tasks in MTL, we have observed an interesting divergence phenomenon that the same feature can have significantly different importance across different tasks in MTL. To address these issues, we propose Deep Multiple Task-specific Feature Interactions Network (DTN) with a novel model structure design. DTN introduces multiple diversified task-specific feature interaction methods and task-sensitive network in MTL networks, enabling the model to learn task-specific diversified feature interaction representations, which improves the efficiency of joint representation learning in a general setup. We applied DTN to our company's real-world E-commerce recommendation dataset, which consisted of over 6.3 billion samples, the results demonstrated that DTN significantly outperformed state-of-the-art MTL models. Moreover, during online evaluation of DTN in a large-scale E-commerce recommender system, we observed a 3.28% in clicks, a 3.10% increase in orders and a 2.70% increase in GMV (Gross Merchandise Value) compared to the state-of-the-art MTL models. Finally, extensive offline experiments conducted on public benchmark datasets demonstrate that DTN can be applied to various scenarios beyond recommendations, enhancing the performance of ranking models.

replace-cross Evaluating the Impact of Multiple DER Aggregators on Wholesale Energy Markets: A Hybrid Mean Field Approach

Authors: Jun He, Andrew L. Liu

Abstract: The integration of distributed energy resources (DERs) into wholesale energy markets can greatly enhance grid flexibility, improve market efficiency, and contribute to a more sustainable energy future. As DERs -- such as solar PV panels and energy storage -- proliferate, effective mechanisms are needed to ensure that small prosumers can participate meaningfully in these markets. We study a wholesale market model featuring multiple DER aggregators, each controlling a portfolio of DER resources and bidding into the market on behalf of the DER asset owners. The key of our approach lies in recognizing the repeated nature of market interactions the ability of participants to learn and adapt over time. Specifically, Aggregators repeatedly interact with each other and with other suppliers in the wholesale market, collectively shaping wholesale electricity prices (aka the locational marginal prices (LMPs)). We model this multi-agent interaction using a mean-field game (MFG), which uses market information -- reflecting the average behavior of market participants -- to enable each aggregator to predict long-term LMP trends and make informed decisions. For each aggregator, because they control the DERs within their portfolio under certain contract structures, we employ a mean-field control (MFC) approach (as opposed to a MFG) to learn an optimal policy that maximizes the total rewards of the DERs under their management. We also propose a reinforcement learning (RL)-based method to help each agent learn optimal strategies within the MFG framework, enhancing their ability to adapt to market conditions and uncertainties. Numerical simulations show that LMPs quickly reach a steady state in the hybrid mean-field approach. Furthermore, our results demonstrate that the combination of energy storage and mean-field learning significantly reduces price volatility compared to scenarios without storage.

replace-cross Graphons of Line Graphs

Authors: Sevvandi Kandanaarachchi, Cheng Soon Ong

Abstract: We consider the problem of estimating graph limits, known as graphons, from observations of sequences of sparse finite graphs. In this paper we show a simple method that can shed light on a subset of sparse graphs. The method involves mapping the original graphs to their line graphs. We show that graphs satisfying a particular property, which we call the square-degree property are sparse, but give rise to dense line graphs. This enables the use of results on graph limits of dense graphs to derive convergence. In particular, star graphs satisfy the square-degree property resulting in dense line graphs and non-zero graphons of line graphs. We demonstrate empirically that we can distinguish different numbers of stars (which are sparse) by the graphons of their corresponding line graphs. Whereas in the original graphs, the different number of stars all converge to the zero graphon due to sparsity. Similarly, superlinear preferential attachment graphs give rise to dense line graphs almost surely. In contrast, dense graphs, including Erdos-Renyi graphs make the line graphs sparse, resulting in the zero graphon.

replace-cross Label-free evaluation of lung and heart transplant biopsies using tissue autofluorescence-based virtual staining

Authors: Yuzhu Li, Nir Pillar, Tairan Liu, Guangdong Ma, Yuxuan Qi, Kevin de Haan, Yijie Zhang, Xilin Yang, Adrian J. Correa, Guangqian Xiao, Kuang-Yu Jen, Kenneth A. Iczkowski, Yulun Wu, William Dean Wallace, Aydogan Ozcan

Abstract: Organ transplantation serves as the primary therapeutic strategy for end-stage organ failures. However, allograft rejection is a common complication of organ transplantation. Histological assessment is essential for the timely detection and diagnosis of transplant rejection and remains the gold standard. Nevertheless, the traditional histochemical staining process is time-consuming, costly, and labor-intensive. Here, we present a panel of virtual staining neural networks for lung and heart transplant biopsies, which digitally convert autofluorescence microscopic images of label-free tissue sections into their brightfield histologically stained counterparts, bypassing the traditional histochemical staining process. Specifically, we virtually generated Hematoxylin and Eosin (H&E), Masson's Trichrome (MT), and Elastic Verhoeff-Van Gieson (EVG) stains for label-free transplant lung tissue, along with H&E and MT stains for label-free transplant heart tissue. Subsequent blind evaluations conducted by three board-certified pathologists have confirmed that the virtual staining networks consistently produce high-quality histology images with high color uniformity, closely resembling their well-stained histochemical counterparts across various tissue features. The use of virtually stained images for the evaluation of transplant biopsies achieved comparable diagnostic outcomes to those obtained via traditional histochemical staining, with a concordance rate of 82.4% for lung samples and 91.7% for heart samples. Moreover, virtual staining models create multiple stains from the same autofluorescence input, eliminating structural mismatches observed between adjacent sections stained in the traditional workflow, while also saving tissue, expert time, and staining costs.

replace-cross Learning Representations for Independence Testing

Authors: Nathaniel Xu, Feng Liu, Danica J. Sutherland

Abstract: Many tools exist to detect dependence between random variables, a core question across a wide range of machine learning, statistical, and scientific endeavors. Although several statistical tests guarantee eventual detection of any dependence with enough samples, standard tests may require an exorbitant amount of samples for detecting subtle dependencies between high-dimensional random variables with complex distributions. In this work, we study two related ways to learn powerful independence tests. First, we show how to construct powerful statistical tests with finite-sample validity by using variational estimators of mutual information, such as the InfoNCE or NWJ estimators. Second, we establish a close connection between these variational mutual information-based tests and tests based on the Hilbert-Schmidt Independence Criterion (HSIC); in particular, learning a variational bound (typically parameterized by a deep network) for mutual information is closely related to learning a kernel for HSIC. Finally, we show how to, rather than selecting a representation to maximize the statistic itself, select a representation which can maximize the power of a test, in either setting; we term the former case a Neural Dependency Statistic (NDS). While HSIC power optimization has been recently considered in the literature, we correct some important misconceptions and expand to considering deep kernels. In our experiments, while all approaches can yield powerful tests with exact level control, optimized HSIC tests generally outperform the other approaches on difficult problems of detecting structured dependence.

replace-cross FAMOUS: Flexible Accelerator for the Attention Mechanism of Transformer on UltraScale+ FPGAs

Authors: Ehsan Kabir, Md. Arafat Kabir, Austin R. J. Downey, Jason D. Bakos, David Andrews, Miaoqing Huang

Abstract: Transformer neural networks (TNNs) are being applied across a widening range of application domains, including natural language processing (NLP), machine translation, and computer vision (CV). Their popularity is largely attributed to the exceptional performance of their multi-head self-attention blocks when analyzing sequential data and extracting features. To date, there are limited hardware accelerators tailored for this mechanism, which is the first step before designing an accelerator for a complete model. This paper proposes \textit{FAMOUS}, a flexible hardware accelerator for dense multi-head attention (MHA) computation of TNNs on field-programmable gate arrays (FPGAs). It is optimized for high utilization of processing elements and on-chip memories to improve parallelism and reduce latency. An efficient tiling of large matrices has been employed to distribute memory and computing resources across different modules on various FPGA platforms. The design is evaluated on Xilinx Alveo U55C and U200 data center cards containing Ultrascale+ FPGAs. Experimental results are presented that show that it can attain a maximum throughput, number of parallel attention heads, embedding dimension and tile size of 328 (giga operations/second (GOPS)), 8, 768 and 64 respectively on the U55C. Furthermore, it is 3.28$\times$ and 2.6$\times$ faster than the Intel Xeon Gold 5220R CPU and NVIDIA V100 GPU respectively. It is also 1.3$\times$ faster than the fastest state-of-the-art FPGA-based accelerator.

replace-cross Multilevel Picard approximations and deep neural networks with ReLU, leaky ReLU, and softplus activation overcome the curse of dimensionality when approximating semilinear parabolic partial differential equations in $L^p$-sense

Authors: Ariel Neufeld, Tuan Anh Nguyen

Abstract: We prove that multilevel Picard approximations and deep neural networks with ReLU, leaky ReLU, and softplus activation are capable of approximating solutions of semilinear Kolmogorov PDEs in $L^\mathfrak{p}$-sense, $\mathfrak{p}\in [2,\infty)$, in the case of gradient-independent, Lipschitz-continuous nonlinearities, while the computational effort of the multilevel Picard approximations and the required number of parameters in the neural networks grow at most polynomially in both dimension $d\in \mathbb{N}$ and reciprocal of the prescribed accuracy $\epsilon$.

replace-cross Symmetry-Robust 3D Orientation Estimation

Authors: Christopher Scarvelis, David Benhaim, Paul Zhang

Abstract: Orientation estimation is a fundamental task in 3D shape analysis which consists of estimating a shape's orientation axes: its side-, up-, and front-axes. Using this data, one can rotate a shape into canonical orientation, where its orientation axes are aligned with the coordinate axes. Developing an orientation algorithm that reliably estimates complete orientations of general shapes remains an open problem. We introduce a two-stage orientation pipeline that achieves state of the art performance on up-axis estimation and further demonstrate its efficacy on full-orientation estimation, where one seeks all three orientation axes. Unlike previous work, we train and evaluate our method on all of Shapenet rather than a subset of classes. We motivate our engineering contributions by theory describing fundamental obstacles to orientation estimation for rotationally-symmetric shapes, and show how our method avoids these obstacles.

replace-cross Improving Numerical Stability of Normalized Mutual Information Estimator on High Dimensions

Authors: Marko Tuononen, Ville Hautam\"aki

Abstract: Mutual information provides a powerful, general-purpose metric for quantifying the amount of shared information between variables. Estimating normalized mutual information using a k-Nearest Neighbor (k-NN) based approach involves the calculation of the scaling-invariant k-NN radius. Calculation of the radius suffers from numerical overflow when the joint dimensionality of the data becomes high, typically in the range of several hundred dimensions. To address this issue, we propose a logarithmic transformation technique that improves the numerical stability of the radius calculation in high-dimensional spaces. By applying the proposed transformation during the calculation of the radius, numerical overflow is avoided, and precision is maintained. Proposed transformation is validated through both theoretical analysis and empirical evaluation, demonstrating its ability to stabilize the calculation without compromising precision, increasing bias, or adding significant computational overhead, while also helping to maintain estimator variance.

replace-cross SHyPar: A Spectral Coarsening Approach to Hypergraph Partitioning

Authors: Hamed Sajadinia, Ali Aghdaei, Zhuo Feng

Abstract: State-of-the-art hypergraph partitioners utilize a multilevel paradigm to construct progressively coarser hypergraphs across multiple layers, guiding cut refinements at each level of the hierarchy. Traditionally, these partitioners employ heuristic methods for coarsening and do not consider the structural features of hypergraphs. In this work, we introduce a multilevel spectral framework, SHyPar, for partitioning large-scale hypergraphs by leveraging hyperedge effective resistances and flow-based community detection techniques. Inspired by the latest theoretical spectral clustering frameworks, such as HyperEF and HyperSF, SHyPar aims to decompose large hypergraphs into multiple subgraphs with few inter-partition hyperedges (cut size). A key component of SHyPar is a flow-based local clustering scheme for hypergraph coarsening, which incorporates a max-flow-based algorithm to produce clusters with substantially improved conductance. Additionally, SHyPar utilizes an effective resistance-based rating function for merging nodes that are strongly connected (coupled). Compared with existing state-of-the-art hypergraph partitioning methods, our extensive experimental results on real-world VLSI designs demonstrate that SHyPar can more effectively partition hypergraphs, achieving state-of-the-art solution quality.

replace-cross Universal approximation results for neural networks with non-polynomial activation function over non-compact domains

Authors: Ariel Neufeld, Philipp Schmocker

Abstract: This paper extends the universal approximation property of single-hidden-layer feedforward neural networks beyond compact domains, which is of particular interest for the approximation within weighted $C^k$-spaces and weighted Sobolev spaces over unbounded domains. More precisely, by assuming that the activation function is non-polynomial, we establish universal approximation results within function spaces defined over non-compact subsets of a Euclidean space, including $L^p$-spaces, weighted $C^k$-spaces, and weighted Sobolev spaces, where the latter two include the approximation of the (weak) derivatives. Moreover, we provide some dimension-independent rates for approximating a function with sufficiently regular and integrable Fourier transform by neural networks with non-polynomial activation function.

replace-cross MENTOR: Mixture-of-Experts Network with Task-Oriented Perturbation for Visual Reinforcement Learning

Authors: Suning Huang, Zheyu Zhang, Tianhai Liang, Yihan Xu, Zhehao Kou, Chenhao Lu, Guowei Xu, Zhengrong Xue, Huazhe Xu

Abstract: Visual deep reinforcement learning (RL) enables robots to acquire skills from visual input for unstructured tasks. However, current algorithms suffer from low sample efficiency, limiting their practical applicability. In this work, we present MENTOR, a method that improves both the architecture and optimization of RL agents. Specifically, MENTOR replaces the standard multi-layer perceptron (MLP) with a mixture-of-experts (MoE) backbone and introduces a task-oriented perturbation mechanism. MENTOR outperforms state-of-the-art methods across three simulation benchmarks and achieves an average of 83% success rate on three challenging real-world robotic manipulation tasks, significantly surpassing the 32% success rate of the strongest existing model-free visual RL algorithm. These results underscore the importance of sample efficiency in advancing visual RL for real-world robotics. Experimental videos are available at https://suninghuang19.github.io/mentor_page/.

URLs: https://suninghuang19.github.io/mentor_page/.

replace-cross Music102: An $D_{12}$-equivariant transformer for chord progression accompaniment

Authors: Weiliang Luo

Abstract: We present Music102, an advanced model aimed at enhancing chord progression accompaniment through a $D_{12}$-equivariant transformer. Inspired by group theory and symbolic music structures, Music102 leverages musical symmetry--such as transposition and reflection operations--integrating these properties into the transformer architecture. By encoding prior music knowledge, the model maintains equivariance across both melody and chord sequences. The POP909 dataset was employed to train and evaluate Music102, revealing significant improvements over the non-equivariant Music101 prototype Music101 in both weighted loss and exact accuracy metrics, despite using fewer parameters. This work showcases the adaptability of self-attention mechanisms and layer normalization to the discrete musical domain, addressing challenges in computational music analysis. With its stable and flexible neural framework, Music102 sets the stage for further exploration in equivariant music generation and computational composition tools, bridging mathematical theory with practical music performance.

replace-cross Learning Maximal Safe Sets Using Hypernetworks for MPC-based Local Trajectory Planning in Unknown Environments

Authors: Bojan Deraji\'c, Mohamed-Khalil Bouzidi, Sebastian Bernhard, Wolfgang H\"onig

Abstract: This paper presents a novel learning-based approach for online estimation of maximal safe sets for local trajectory planning in unknown static environments. The neural representation of a set is used as the terminal set constraint for a model predictive control (MPC) local planner, resulting in improved recursive feasibility and safety. To achieve real-time performance and desired generalization properties, we employ the idea of hypernetworks. We use the Hamilton-Jacobi (HJ) reachability analysis as the source of supervision during the training process, allowing us to consider general nonlinear dynamics and arbitrary constraints. The proposed method is extensively evaluated against relevant baselines in simulations for different environments and robot dynamics. The results show an increase in success rate of up to 52% compared to the best baseline while maintaining comparable execution speed. Additionally, we deploy our proposed method, NTC-MPC, on a physical robot and demonstrate its ability to safely avoid obstacles in scenarios where the baselines fail.

replace-cross Data Matters: The Case of Predicting Mobile Cellular Traffic

Authors: Natalia Vesselinova, Matti Harjula, Pauliina Ilmonen

Abstract: Accurate predictions of base stations' traffic load are essential to mobile cellular operators and their users as they support the efficient use of network resources and allow delivery of services that sustain smart cities and roads. Traditionally, cellular network time-series have been considered for this prediction task. More recently, exogenous factors such as points of interest and other environmental knowledge have been explored too. In contrast to incorporating external factors, we propose to learn the processes underlying cellular load generation by employing population dynamics data. In this study, we focus on smart roads and use road traffic measures to improve prediction accuracy. Comprehensive experiments demonstrate that by employing road flow and speed, in addition to cellular network metrics, base station load prediction errors can be substantially reduced, by as much as $56.5\%.$ The code, visualizations and extensive results are available on https://github.com/nvassileva/DataMatters.

URLs: https://github.com/nvassileva/DataMatters.

replace-cross Self-Consistency Preference Optimization

Authors: Archiki Prasad, Weizhe Yuan, Richard Yuanzhe Pang, Jing Xu, Maryam Fazel-Zarandi, Mohit Bansal, Sainbayar Sukhbaatar, Jason Weston, Jane Yu

Abstract: Self-alignment, whereby models learn to improve themselves without human annotation, is a rapidly growing research area. However, existing techniques often fail to improve complex reasoning tasks due to the difficulty of assigning correct rewards. An orthogonal approach that is known to improve correctness is self-consistency, a method applied at inference time based on multiple sampling in order to find the most consistent answer. In this work, we extend the self-consistency concept to help train models. We thus introduce self-consistency preference optimization (ScPO), which iteratively trains consistent answers to be preferred over inconsistent ones on unsupervised new problems. We show ScPO leads to large improvements over conventional reward model training on reasoning tasks such as GSM8K and MATH, closing the gap with supervised training with gold answers or preferences, and that combining ScPO with standard supervised learning improves results even further. On ZebraLogic, ScPO finetunes Llama-3 8B to be superior to Llama-3 70B, Gemma-2 27B, and Claude-3 Haiku.

replace-cross A Novel Automatic Real-time Motion Tracking Method in MRI-guided Radiotherapy Using Enhanced Tracking-Learning-Detection Framework with Automatic Segmentation

Authors: Shengqi Chen, Zilin Wang, Jianrong Dai, Shirui Qin, Ying Cao, Ruiao Zhao, Jiayun Chen, Guohua Wu, Yuan Tang

Abstract: Background and Purpose: Accurate motion tracking in MRI-guided Radiotherapy (MRIgRT) is essential for effective treatment delivery. This study aimed to enhance motion tracking precision in MRIgRT through an automatic real-time markerless tracking method using an enhanced Tracking-Learning-Detection (ETLD) framework with automatic segmentation. Materials and Methods: We developed a novel MRIgRT motion tracking and segmentation method by integrating the ETLD framework with an improved Chan-Vese model (ICV), named ETLD+ICV. The ETLD framework was upgraded for real-time cine MRI, including advanced image preprocessing, no-reference image quality assessment, an enhanced median-flow tracker, and a refined detector with dynamic search region adjustments. ICV was used for precise target volume coverage, refining the segmented region frame by frame using tracking results, with key parameters optimized. The method was tested on 3.5D MRI scans from 10 patients with liver metastases. Results: Evaluation of 106,000 frames across 77 treatment fractions showed sub-millimeter tracking errors of less than 0.8mm, with over 99% precision and 98% recall for all subjects in the Beam Eye View(BEV)/Beam Path View(BPV) orientation. The ETLD+ICV method achieved a dice global score of more than 82% for all subjects, demonstrating the method's extensibility and precise target volume coverage. Conclusion: This study successfully developed an automatic real-time markerless motion tracking method for MRIgRT that significantly outperforms current methods. The novel method not only delivers exceptional precision in tracking and segmentation but also shows enhanced adaptability to clinical demands, making it an indispensable asset in improving the efficacy of radiotherapy treatments.

replace-cross Sensor-fusion based Prognostics for Deep-space Habitats Exhibiting Multiple Unlabeled Failure Modes

Authors: Benjamin Peters, Ayush Mohanty, Xiaolei Fang, Stephen K. Robinson, Nagi Gebraeel

Abstract: Deep-space habitats are complex systems that must operate autonomously over extended durations without ground-based maintenance. These systems are vulnerable to multiple, often unknown, failure modes that affect different subsystems and sensors in mode-specific ways. Developing accurate remaining useful life (RUL) prognostics is challenging, especially when failure labels are unavailable and sensor relevance varies by failure mode. In this paper, we propose an unsupervised prognostics framework that jointly identifies latent failure modes and selects informative sensors using only unlabeled training data. The methodology consists of two phases. In the offline phase, we model system failure times using a mixture of Gaussian regressions and apply a novel Expectation-Maximization algorithm to cluster degradation trajectories and select mode-specific sensors. In the online phase, we extract low-dimensional features from the selected sensors to diagnose the active failure mode and predict RUL using a weighted regression model. We demonstrate the effectiveness of our approach on a simulated dataset that reflects deep-space telemetry characteristics and on a real-world engine degradation dataset, showing improved accuracy and interpretability over existing methods.

replace-cross A Runtime-Adaptive Transformer Neural Network Accelerator on FPGAs

Authors: Ehsan Kabir, Austin R. J. Downey, Jason D. Bakos, David Andrews, Miaoqing Huang

Abstract: Transformer neural networks (TNN) excel in natural language processing (NLP), machine translation, and computer vision (CV) without relying on recurrent or convolutional layers. However, they have high computational and memory demands, particularly on resource-constrained devices like FPGAs. Moreover, transformer models vary in processing time across applications, requiring custom models with specific parameters. Designing custom accelerators for each model is complex and time-intensive. Some custom accelerators exist with no runtime adaptability, and they often rely on sparse matrices to reduce latency. However, hardware designs become more challenging due to the need for application-specific sparsity patterns. This paper introduces ADAPTOR, a runtime-adaptive accelerator for dense matrix computations in transformer encoders and decoders on FPGAs. ADAPTOR enhances the utilization of processing elements and on-chip memory, enhancing parallelism and reducing latency. It incorporates efficient matrix tiling to distribute resources across FPGA platforms and is fully quantized for computational efficiency and portability. Evaluations on Xilinx Alveo U55C data center cards and embedded platforms like VC707 and ZCU102 show that our design is 1.2$\times$ and 2.87$\times$ more power efficient than the NVIDIA K80 GPU and the i7-8700K CPU respectively. Additionally, it achieves a speedup of 1.7 to 2.25$\times$ compared to some state-of-the-art FPGA-based accelerators.

replace-cross QCResUNet: Joint Subject-level and Voxel-level Segmentation Quality Prediction

Authors: Peijie Qiu, Satrajit Chakrabarty, Phuc Nguyen, Soumyendu Sekhar Ghosh, Aristeidis Sotiras

Abstract: Deep learning has made significant strides in automated brain tumor segmentation from magnetic resonance imaging (MRI) scans in recent years. However, the reliability of these tools is hampered by the presence of poor-quality segmentation outliers, particularly in out-of-distribution samples, making their implementation in clinical practice difficult. Therefore, there is a need for quality control (QC) to screen the quality of the segmentation results. Although numerous automatic QC methods have been developed for segmentation quality screening, most were designed for cardiac MRI segmentation, which involves a single modality and a single tissue type. Furthermore, most prior works only provided subject-level predictions of segmentation quality and did not identify erroneous parts segmentation that may require refinement. To address these limitations, we proposed a novel multi-task deep learning architecture, termed QCResUNet, which produces subject-level segmentation-quality measures as well as voxel-level segmentation error maps for each available tissue class. To validate the effectiveness of the proposed method, we conducted experiments on assessing its performance on evaluating the quality of two distinct segmentation tasks. First, we aimed to assess the quality of brain tumor segmentation results. For this task, we performed experiments on one internal and two external datasets. Second, we aimed to evaluate the segmentation quality of cardiac Magnetic Resonance Imaging (MRI) data from the Automated Cardiac Diagnosis Challenge. The proposed method achieved high performance in predicting subject-level segmentation-quality metrics and accurately identifying segmentation errors on a voxel basis. This has the potential to be used to guide human-in-the-loop feedback to improve segmentations in clinical settings.

replace-cross A Causal World Model Underlying Next Token Prediction: Exploring GPT in a Controlled Environment

Authors: Raanan Y. Rohekar, Yaniv Gurwicz, Sungduk Yu, Estelle Aflalo, Vasudev Lal

Abstract: Are generative pre-trained transformer (GPT) models, trained only to predict the next token, implicitly learning a world model from which sequences are generated one token at a time? We address this question by deriving a causal interpretation of the attention mechanism in GPT and presenting a causal world model that arises from this interpretation. Furthermore, we propose that GPT models, at inference time, can be utilized for zero-shot causal structure learning for input sequences, and introduce a corresponding confidence score. Empirical tests were conducted in controlled environments using the setups of the Othello and Chess strategy games. A GPT, pre-trained on real-world games played with the intention of winning, was tested on out-of-distribution synthetic data consisting of sequences of random legal moves. We find that the GPT model is likely to generate legal next moves for out-of-distribution sequences for which a causal structure is encoded in the attention mechanism with high confidence. In cases where it generates illegal moves, it also fails to capture a causal structure.

replace-cross Multimodal Latent Diffusion Model for Complex Sewing Pattern Generation

Authors: Shengqi Liu, Yuhao Cheng, Zhuo Chen, Xingyu Ren, Wenhan Zhu, Lincheng Li, Mengxiao Bi, Xiaokang Yang, Yichao Yan

Abstract: Generating sewing patterns in garment design is receiving increasing attention due to its CG-friendly and flexible-editing nature. Previous sewing pattern generation methods have been able to produce exquisite clothing, but struggle to design complex garments with detailed control. To address these issues, we propose SewingLDM, a multi-modal generative model that generates sewing patterns controlled by text prompts, body shapes, and garment sketches. Initially, we extend the original vector of sewing patterns into a more comprehensive representation to cover more intricate details and then compress them into a compact latent space. To learn the sewing pattern distribution in the latent space, we design a two-step training strategy to inject the multi-modal conditions, \ie, body shapes, text prompts, and garment sketches, into a diffusion model, ensuring the generated garments are body-suited and detail-controlled. Comprehensive qualitative and quantitative experiments show the effectiveness of our proposed method, significantly surpassing previous approaches in terms of complex garment design and various body adaptability. Our project page: https://shengqiliu1.github.io/SewingLDM.

URLs: https://shengqiliu1.github.io/SewingLDM.

replace-cross LEDA: Log-Euclidean Diffeomorphism Autoencoder for Efficient Statistical Analysis of Diffeomorphisms

Authors: Krithika Iyer, Shireen Elhabian, Sarang Joshi

Abstract: Image registration is a core task in computational anatomy that establishes correspondences between images. Invertible deformable registration, which computes a deformation field and handles complex, non-linear transformations, is essential for tracking anatomical variations, especially in neuroimaging applications where inter-subject differences and longitudinal changes are key. Analyzing the deformation fields is challenging due to their non-linearity, which limits statistical analysis. However, traditional approaches for analyzing deformation fields are computationally expensive, sensitive to initialization, and prone to numerical errors, especially when the deformation is far from the identity. To address these limitations, we propose the Log-Euclidean Diffeomorphism Autoencoder (LEDA), an innovative framework designed to compute the principal logarithm of deformation fields by efficiently predicting consecutive square roots. LEDA operates within a linearized latent space that adheres to the diffeomorphisms group action laws, enhancing our model's robustness and applicability. We also introduce a loss function to enforce inverse consistency, ensuring accurate latent representations of deformation fields. Extensive experiments with the OASIS-1 dataset demonstrate the effectiveness of LEDA in accurately modeling and analyzing complex non-linear deformations while maintaining inverse consistency. Additionally, we evaluate its ability to capture and incorporate clinical variables, enhancing its relevance for clinical applications.

replace-cross DMesh++: An Efficient Differentiable Mesh for Complex Shapes

Authors: Sanghyun Son, Matheus Gadelha, Yang Zhou, Matthew Fisher, Zexiang Xu, Yi-Ling Qiao, Ming C. Lin, Yi Zhou

Abstract: Recent probabilistic methods for 3D triangular meshes capture diverse shapes by differentiable mesh connectivity, but face high computational costs with increased shape details. We introduce a new differentiable mesh processing method that addresses this challenge and efficiently handles meshes with intricate structures. Our method reduces time complexity from O(N) to O(log N) and requires significantly less memory than previous approaches. Building on this innovation, we present a reconstruction algorithm capable of generating complex 2D and 3D shapes from point clouds or multi-view images. Visit our project page (https://sonsang.github.io/dmesh2-project) for source code and supplementary material.

URLs: https://sonsang.github.io/dmesh2-project)

replace-cross Enhancing Long Video Generation Consistency without Tuning

Authors: Xingyao Li, Fengzhuo Zhang, Jiachun Pan, Yunlong Hou, Vincent Y. F. Tan, Zhuoran Yang

Abstract: Despite the considerable progress achieved in the long video generation problem, there is still significant room to improve the consistency of the generated videos, particularly in terms of their smoothness and transitions between scenes. We address these issues to enhance the consistency and coherence of videos generated with either single or multiple prompts. We propose the Time-frequency based temporal Attention Reweighting Algorithm (TiARA), which judiciously edits the attention score matrix based on the Discrete Short-Time Fourier Transform. This method is supported by a frequency-based analysis, ensuring that the edited attention score matrix achieves improved consistency across frames. It represents the first-of-its-kind for frequency-based methods in video diffusion models. For videos generated by multiple prompts, we further uncover key factors such as the alignment of the prompts affecting prompt interpolation quality. Inspired by our analyses, we propose PromptBlend, an advanced prompt interpolation pipeline that systematically aligns the prompts. Extensive experimental results validate the efficacy of our proposed method, demonstrating consistent and substantial improvements over multiple baselines.

replace-cross Mask Approximation Net: A Novel Diffusion Model Approach for Remote Sensing Change Captioning

Authors: Dongwei Sun, Jing Yao, Wu Xue, Changsheng Zhou, Pedram Ghamisi, Xiangyong Cao

Abstract: Remote sensing image change description represents an innovative multimodal task within the realm of remote sensing processing.This task not only facilitates the detection of alterations in surface conditions, but also provides comprehensive descriptions of these changes, thereby improving human interpretability and interactivity.Current deep learning methods typically adopt a three stage framework consisting of feature extraction, feature fusion, and change localization, followed by text generation. Most approaches focus heavily on designing complex network modules but lack solid theoretical guidance, relying instead on extensive empirical experimentation and iterative tuning of network components. This experience-driven design paradigm may lead to overfitting and design bottlenecks, thereby limiting the model's generalizability and adaptability.To address these limitations, this paper proposes a paradigm that shift towards data distribution learning using diffusion models, reinforced by frequency-domain noise filtering, to provide a theoretically motivated and practically effective solution to multimodal remote sensing change description.The proposed method primarily includes a simple multi-scale change detection module, whose output features are subsequently refined by a well-designed diffusion model.Furthermore, we introduce a frequency-guided complex filter module to boost the model performance by managing high-frequency noise throughout the diffusion process. We validate the effectiveness of our proposed method across several datasets for remote sensing change detection and description, showcasing its superior performance compared to existing techniques. The code will be available at \href{https://github.com/sundongwei}{MaskApproxNet}.

URLs: https://github.com/sundongwei

replace-cross Learning Physically Interpretable Atmospheric Models from Data with WSINDy

Authors: Seth Minor, Daniel A. Messenger, Vanja Dukic, David M. Bortz

Abstract: The multiscale and turbulent nature of Earth's atmosphere has historically rendered accurate weather modeling a hard problem. Recently, there has been an explosion of interest surrounding data-driven approaches to weather modeling, which in many cases show improved forecasting accuracy and computational efficiency when compared to traditional methods. However, many of the current data-driven approaches employ highly parameterized neural networks, often resulting in uninterpretable models and limited gains in scientific understanding. In this work, we address the interpretability problem by explicitly discovering partial differential equations governing atmospheric phenomena, identifying symbolic mathematical models with direct physical interpretations. The purpose of this paper is to demonstrate that, in particular, the Weak form Sparse Identification of Nonlinear Dynamics (WSINDy) algorithm can learn effective atmospheric models from both simulated and assimilated data. Our approach adapts the standard WSINDy algorithm to work with high-dimensional fluid data of arbitrary spatial dimension.

replace-cross REINFORCE++: An Efficient RLHF Algorithm with Robustness to Both Prompt and Reward Models

Authors: Jian Hu, Jason Klein Liu, Wei Shen

Abstract: Reinforcement Learning from Human Feedback (RLHF) is crucial in aligning large language models (LLMs) with human values and preferences. While state-of-the-art applications like ChatGPT/GPT-4 commonly employ Proximal Policy Optimization (PPO), including a critic network introduces significant computational overhead. REINFORCE-based methods, such as REINFORCE Leave One-Out (RLOO), ReMax, and Group Relative Policy Optimization (GRPO), address this limitation by eliminating the critic network. However, these approaches face challenges in accurate advantage estimation. Specifically, they estimate advantages independently for responses to each prompt, which can lead to overfitting on more straightforward prompts and vulnerability to reward hacking. To address these challenges, we introduce REINFORCE++, a novel approach that removes the critic model while using the normalized reward of a batch as the baseline. Our empirical evaluation demonstrates that REINFORCE++ exhibits robust performance across various reward models without requiring prompt set truncation. Furthermore, it achieves superior generalization in RLHF and long chain-of-thought (CoT) settings compared to REINFORCE-based methods. The implementation is available at https://github.com/OpenRLHF/OpenRLHF.

URLs: https://github.com/OpenRLHF/OpenRLHF.

replace-cross Memory Storyboard: Leveraging Temporal Segmentation for Streaming Self-Supervised Learning from Egocentric Videos

Authors: Yanlai Yang, Mengye Ren

Abstract: Self-supervised learning holds the promise of learning good representations from real-world continuous uncurated data streams. However, most existing works in visual self-supervised learning focus on static images or artificial data streams. Towards exploring a more realistic learning substrate, we investigate streaming self-supervised learning from long-form real-world egocentric video streams. Inspired by the event segmentation mechanism in human perception and memory, we propose "Memory Storyboard" that groups recent past frames into temporal segments for more effective summarization of the past visual streams for memory replay. To accommodate efficient temporal segmentation, we propose a two-tier memory hierarchy: the recent past is stored in a short-term memory, and the storyboard temporal segments are then transferred to a long-term memory. Experiments on real-world egocentric video datasets including SAYCam and KrishnaCam show that contrastive learning objectives on top of storyboard frames result in semantically meaningful representations that outperform those produced by state-of-the-art unsupervised continual learning methods.

replace-cross A dimensionality reduction technique based on the Gromov-Wasserstein distance

Authors: Rafael P. Eufrazio, Eduardo Fernandes Montesuma, Charles C. Cavalcante

Abstract: Analyzing relationships between objects is a pivotal problem within data science. In this context, Dimensionality reduction (DR) techniques are employed to generate smaller and more manageable data representations. This paper proposes a new method for dimensionality reduction, based on optimal transportation theory and the Gromov-Wasserstein distance. We offer a new probabilistic view of the classical Multidimensional Scaling (MDS) algorithm and the nonlinear dimensionality reduction algorithm, Isomap (Isometric Mapping or Isometric Feature Mapping) that extends the classical MDS, in which we use the Gromov-Wasserstein distance between the probability measure of high-dimensional data, and its low-dimensional representation. Through gradient descent, our method embeds high-dimensional data into a lower-dimensional space, providing a robust and efficient solution for analyzing complex high-dimensional datasets.

replace-cross A Neurosymbolic Framework for Geometric Reduction of Binary Forms

Authors: Ilias Kotsireas, Tony Shaska

Abstract: This paper compares Julia reduction and hyperbolic reduction with the aim of finding equivalent binary forms with minimal coefficients. We demonstrate that hyperbolic reduction generally outperforms Julia reduction, particularly in the cases of sextics and decimics, though neither method guarantees achieving the minimal form. We further propose an additional shift and scaling to approximate the minimal form more closely. Finally, we introduce a machine learning framework to identify optimal transformations that minimize the heights of binary forms. This study provides new insights into the geometry and algebra of binary forms and highlights the potential of AI in advancing symbolic computation and reduction techniques. The findings, supported by extensive computational experiments, lay the groundwork for hybrid approaches that integrate traditional reduction methods with data-driven techniques.

replace-cross MORPH-LER: Log-Euclidean Regularization for Population-Aware Image Registration

Authors: Mokshagna Sai Teja Karanam, Krithika Iyer, Sarang Joshi, Shireen Elhabian

Abstract: Spatial transformations that capture population-level morphological statistics are critical for medical image analysis. Commonly used smoothness regularizers for image registration fail to integrate population statistics, leading to anatomically inconsistent transformations. Inverse consistency regularizers promote geometric consistency but lack population morphometrics integration. Regularizers that constrain deformation to low-dimensional manifold methods address this. However, they prioritize reconstruction over interpretability and neglect diffeomorphic properties, such as group composition and inverse consistency. We introduce MORPH-LER, a Log-Euclidean regularization framework for population-aware unsupervised image registration. MORPH-LER learns population morphometrics from spatial transformations to guide and regularize registration networks, ensuring anatomically plausible deformations. It features a bottleneck autoencoder that computes the principal logarithm of deformation fields via iterative square-root predictions. It creates a linearized latent space that respects diffeomorphic properties and enforces inverse consistency. By integrating a registration network with a diffeomorphic autoencoder, MORPH-LER produces smooth, meaningful deformation fields. The framework offers two main contributions: (1) a data-driven regularization strategy that incorporates population-level anatomical statistics to enhance transformation validity and (2) a linearized latent space that enables compact and interpretable deformation fields for efficient population morphometrics analysis. We validate MORPH-LER across two families of deep learning-based registration networks, demonstrating its ability to produce anatomically accurate, computationally efficient, and statistically meaningful transformations on the OASIS-1 brain imaging dataset. https://github.com/iyerkrithika21/MORPH_LER

URLs: https://github.com/iyerkrithika21/MORPH_LER

replace-cross Identifying Large-Scale Linear Parameter Varying Systems with Dynamic Mode Decomposition Methods

Authors: Jean Panaioti Jordanou, Eduardo Camponogara, Eduardo Gildin

Abstract: Linear Parameter Varying (LPV) Systems are a well-established class of nonlinear systems with a rich theory for stability analysis, control, and analytical response finding, among other aspects. Although there are works on data-driven identification of such systems, the literature is quite scarce in terms of works that tackle the identification of LPV models for large-scale systems. Since large-scale systems are ubiquitous in practice, this work develops a methodology for the local and global identification of large-scale LPV systems based on nonintrusive reduced-order modeling. The developed method is coined as DMD-LPV for being inspired in the Dynamic Mode Decomposition (DMD). To validate the proposed identification method, we identify a system described by a discretized linear diffusion equation, with the diffusion gain defined by a polynomial over a parameter. The experiments show that the proposed method can easily identify a reduced-order LPV model of a given large-scale system without the need to perform identification in the full-order dimension, and with almost no performance decay over performing a reduction, given that the model structure is well-established.

replace-cross Particle Trajectory Representation Learning with Masked Point Modeling

Authors: Sam Young, Yeon-jae Jwa, Kazuhiro Terao

Abstract: Effective self-supervised learning (SSL) techniques have been key to unlocking large datasets for representation learning. While many promising methods have been developed using online corpora and captioned photographs, their application to scientific domains, where data encodes highly specialized knowledge, remains a challenge. Liquid Argon Time Projection Chambers (LArTPCs) provide high-resolution 3D imaging for fundamental physics, but analysis of their sparse, complex point cloud data often relies on supervised methods trained on large simulations, introducing potential biases. We introduce the Point-based Liquid Argon Masked Autoencoder (PoLAr-MAE), applying masked point modeling to unlabeled LArTPC images using domain-specific volumetric tokenization and energy prediction. We show this SSL approach learns physically meaningful trajectory representations directly from data. This yields remarkable data efficiency: fine-tuning on just 100 labeled events achieves track/shower semantic segmentation performance comparable to the state-of-the-art supervised baseline trained on $>$100,000 events. Furthermore, internal attention maps exhibit emergent instance segmentation of particle trajectories. While challenges remain, particularly for fine-grained features, we make concrete SSL's potential for building a foundation model for LArTPC image analysis capable of serving as a common base for all data reconstruction tasks. To facilitate further progress, we release PILArNet-M, a large dataset of 1M LArTPC events. Project site: https://youngsm.com/polarmae.

URLs: https://youngsm.com/polarmae.

replace-cross Linear Bandits with Partially Observable Features

Authors: Wonyoung Kim, Sungwoo Park, Garud Iyengar, Assaf Zeevi, Min-hwan Oh

Abstract: We study the linear bandit problem that accounts for partially observable features. Without proper handling, unobserved features can lead to linear regret in the decision horizon $T$, as their influence on rewards is unknown. To tackle this challenge, we propose a novel theoretical framework and an algorithm with sublinear regret guarantees. The core of our algorithm consists of: (i) feature augmentation, by appending basis vectors that are orthogonal to the row space of the observed features; and (ii) the introduction of a doubly robust estimator. Our approach achieves a regret bound of $\tilde{O}(\sqrt{(d + d_h)T})$, where $d$ denotes the dimension of the observed features, and $d_h$ represents the number of nonzero coefficients in the parameter associated with the reward component projected onto the subspace orthogonal to the row space spanned by the observed features. Notably, our algorithm requires no prior knowledge of the unobserved feature space, which may expand as more features become hidden. Numerical experiments confirm that our algorithm outperforms both non-contextual multi-armed bandits and linear bandit algorithms depending solely on observed features.

replace-cross Fine-tuning Multimodal Transformers on Edge: A Parallel Split Learning Approach

Authors: Timo Fudala, Vasileios Tsouvalas, Nirvana Meratnia

Abstract: Multimodal transformers integrate diverse data types like images, audio, and text, advancing tasks such as audio-visual understanding and image-text retrieval; yet their high parameterization limits deployment on resource-constrained edge devices. Split Learning (SL), which partitions models at a designated cut-layer to offload compute-intensive operations to the server, offers a promising approach for distributed training of multimodal transformers, though its application remains underexplored. We present MPSL, a parallel SL approach for computational efficient fine-tuning of multimodal transformers in a distributed manner, while eliminating label sharing, client synchronization, and per-client sub-model management. MPSL employs lightweight client-side tokenizers and a unified modality-agnostic encoder, allowing flexible adaptation to task-specific needs. Our evaluation across 7 multimodal datasets demonstrates that MPSL matches or outperforms Federated Learning, reduces client-side computations by 250x, and achieves superior scalability in communication cost with model growth. Through extensive analysis, we highlight task suitability, trade-offs, and scenarios where MPSL excels, inspiring further exploration.

replace-cross Confidence Intervals for Evaluation of Data Mining

Authors: Zheng Yuan, Wenxin Jiang

Abstract: In data mining, when binary prediction rules are used to predict a binary outcome, many performance measures are used in a vast array of literature for the purposes of evaluation and comparison. Some examples include classification accuracy, precision, recall, F measures, and Jaccard index. Typically, these performance measures are only approximately estimated from a finite dataset, which may lead to findings that are not statistically significant. In order to properly quantify such statistical uncertainty, it is important to provide confidence intervals associated with these estimated performance measures. We consider statistical inference about general performance measures used in data mining, with both individual and joint confidence intervals. These confidence intervals are based on asymptotic normal approximations and can be computed fast, without needs to do bootstrap resampling. We study the finite sample coverage probabilities for these confidence intervals and also propose a `blurring correction' on the variance to improve the finite sample performance. This 'blurring correction' generalizes the plus-four method from binomial proportion to general performance measures used in data mining. Our framework allows multiple performance measures of multiple classification rules to be inferred simultaneously for comparisons.

replace-cross Monte Carlo Tree Diffusion for System 2 Planning

Authors: Jaesik Yoon, Hyeonseo Cho, Doojin Baek, Yoshua Bengio, Sungjin Ahn

Abstract: Diffusion models have recently emerged as a powerful tool for planning. However, unlike Monte Carlo Tree Search (MCTS)-whose performance naturally improves with inference-time computation scaling-standard diffusion-based planners offer only limited avenues for the scalability. In this paper, we introduce Monte Carlo Tree Diffusion (MCTD), a novel framework that integrates the generative strength of diffusion models with the adaptive search capabilities of MCTS. Our method reconceptualizes denoising as a tree-structured process, allowing partially denoised plans to be iteratively evaluated, pruned, and refined. By selectively expanding promising trajectories while retaining the flexibility to revisit and improve suboptimal branches, MCTD achieves the benefits of MCTS such as controlling exploration-exploitation trade-offs within the diffusion framework. Empirical results on challenging long-horizon tasks show that MCTD outperforms diffusion baselines, yielding higher-quality solutions as inference-time computation increases.

replace-cross Advancing Precision Oncology Through Modeling of Longitudinal and Multimodal Data

Authors: Luoting Zhuang, Stephen H. Park, Steven J. Skates, Ashley E. Prosper, Denise R. Aberle, William Hsu

Abstract: Cancer evolves continuously over time through a complex interplay of genetic, epigenetic, microenvironmental, and phenotypic changes. This dynamic behavior drives uncontrolled cell growth, metastasis, immune evasion, and therapy resistance, posing challenges for effective monitoring and treatment. However, today's data-driven research in oncology has primarily focused on cross-sectional analysis using data from a single modality, limiting the ability to fully characterize and interpret the disease's dynamic heterogeneity. Advances in multiscale data collection and computational methods now enable the discovery of longitudinal multimodal biomarkers for precision oncology. Longitudinal data reveal patterns of disease progression and treatment response that are not evident from single-timepoint data, enabling timely abnormality detection and dynamic treatment adaptation. Multimodal data integration offers complementary information from diverse sources for more precise risk assessment and targeting of cancer therapy. In this review, we survey methods of longitudinal and multimodal modeling, highlighting their synergy in providing multifaceted insights for personalized care tailored to the unique characteristics of a patient's cancer. We summarize the current challenges and future directions of longitudinal multimodal analysis in advancing precision oncology.

replace-cross MALT Diffusion: Memory-Augmented Latent Transformers for Any-Length Video Generation

Authors: Sihyun Yu, Meera Hahn, Dan Kondratyuk, Jinwoo Shin, Agrim Gupta, Jos\'e Lezama, Irfan Essa, David Ross, Jonathan Huang

Abstract: Diffusion models are successful for synthesizing high-quality videos but are limited to generating short clips (e.g., 2-10 seconds). Synthesizing sustained footage (e.g. over minutes) still remains an open research question. In this paper, we propose MALT Diffusion (using Memory-Augmented Latent Transformers), a new diffusion model specialized for long video generation. MALT Diffusion (or just MALT) handles long videos by subdividing them into short segments and doing segment-level autoregressive generation. To achieve this, we first propose recurrent attention layers that encode multiple segments into a compact memory latent vector; by maintaining this memory vector over time, MALT is able to condition on it and continuously generate new footage based on a long temporal context. We also present several training techniques that enable the model to generate frames over a long horizon with consistent quality and minimal degradation. We validate the effectiveness of MALT through experiments on long video benchmarks. We first perform extensive analysis of MALT in long-contextual understanding capability and stability using popular long video benchmarks. For example, MALT achieves an FVD score of 220.4 on 128-frame video generation on UCF-101, outperforming the previous state-of-the-art of 648.4. Finally, we explore MALT's capabilities in a text-to-video generation setting and show that it can produce long videos compared with recent techniques for long text-to-video generation.

replace-cross A Concise Lyapunov Analysis of Nesterov's Accelerated Gradient Method

Authors: Jun Liu

Abstract: Convergence analysis of Nesterov's accelerated gradient method has attracted significant attention over the past decades. While extensive work has explored its theoretical properties and elucidated the intuition behind its acceleration, a simple and direct proof of its convergence rates is still lacking. We provide a concise Lyapunov analysis of the convergence rates of Nesterov's accelerated gradient method for both general convex and strongly convex functions.

replace-cross Integrating Biological and Machine Intelligence: Attention Mechanisms in Brain-Computer Interfaces

Authors: Jiyuan Wang, Weishan Ye, Jialin He, Li Zhang, Gan Huang, Zhuliang Yu, Zhen Liang

Abstract: With the rapid advancement of deep learning, attention mechanisms have become indispensable in electroencephalography (EEG) signal analysis, significantly enhancing Brain-Computer Interface (BCI) applications. This paper presents a comprehensive review of traditional and Transformer-based attention mechanisms, their embedding strategies, and their applications in EEG-based BCI, with a particular emphasis on multimodal data fusion. By capturing EEG variations across time, frequency, and spatial channels, attention mechanisms improve feature extraction, representation learning, and model robustness. These methods can be broadly categorized into traditional attention mechanisms, which typically integrate with convolutional and recurrent networks, and Transformer-based multi-head self-attention, which excels in capturing long-range dependencies. Beyond single-modality analysis, attention mechanisms also enhance multimodal EEG applications, facilitating effective fusion between EEG and other physiological or sensory data. Finally, we discuss existing challenges and emerging trends in attention-based EEG modeling, highlighting future directions for advancing BCI technology. This review aims to provide valuable insights for researchers seeking to leverage attention mechanisms for improved EEG interpretation and application.

replace-cross Hallucinatory Image Tokens: A Training-free EAZY Approach on Detecting and Mitigating Object Hallucinations in LVLMs

Authors: Liwei Che, Tony Qingze Liu, Jing Jia, Weiyi Qin, Ruixiang Tang, Vladimir Pavlovic

Abstract: Despite their remarkable potential, Large Vision-Language Models (LVLMs) still face challenges with object hallucination, a problem where their generated outputs mistakenly incorporate objects that do not actually exist. Although most works focus on addressing this issue within the language-model backbone, our work shifts the focus to the image input source, investigating how specific image tokens contribute to hallucinations. Our analysis reveals a striking finding: a small subset of image tokens with high attention scores are the primary drivers of object hallucination. By removing these hallucinatory image tokens (only 1.5% of all image tokens), the issue can be effectively mitigated. This finding holds consistently across different models and datasets. Building on this insight, we introduce EAZY, a novel, training-free method that automatically identifies and Eliminates hAllucinations by Zeroing out hallucinatorY image tokens. We utilize EAZY for unsupervised object hallucination detection, achieving 15% improvement compared to previous methods. Additionally, EAZY demonstrates remarkable effectiveness in mitigating hallucinations while preserving model utility and seamlessly adapting to various LVLM architectures.

replace-cross Gradient-guided Attention Map Editing: Towards Efficient Contextual Hallucination Mitigation

Authors: Yu Wang, Kamalika Das, Xiang Gao, Wendi Cui, Peng Li, Jiaxin Zhang

Abstract: In tasks like summarization and open-book question answering (QA), Large Language Models (LLMs) often encounter "contextual hallucination", where they produce irrelevant or incorrect responses despite having access to accurate source information. This typically occurs because these models tend to prioritize self-generated content over the input context, causing them to disregard pertinent details. To address this challenge, we introduce a novel method called "Guided Attention Map Editing" (GAME), which dynamically adjusts attention maps to improve contextual relevance. During inference, GAME employs a trained classifier to identify attention maps prone to inducing hallucinations and executes targeted interventions. These interventions, guided by gradient-informed "edit directions'', strategically redistribute attention weights across various heads to effectively reduce hallucination. Comprehensive evaluations on challenging summarization and open-book QA tasks show that GAME consistently reduces hallucinations across a variety of open-source models. Specifically, GAME reduces hallucinations by 10% in the XSum summarization task while achieving a 7X speed-up in computational efficiency compared to the state-of-the-art baselines.

replace-cross Have LLMs Made Active Learning Obsolete? Surveying the NLP Community

Authors: Julia Romberg, Christopher Schr\"oder, Julius Gonsior, Katrin Tomanek, Fredrik Olsson

Abstract: Supervised learning relies on annotated data, which is expensive to obtain. A longstanding strategy to reduce annotation costs is active learning, an iterative process, in which a human annotates only data instances deemed informative by a model. Large language models (LLMs) have pushed the effectiveness of active learning, while also advancing methods such as few- or zero-shot learning, and text synthesis -- all of which can reduce the need for active learning. This naturally raises the question: has active learning become obsolete? To answer this fully, we must look beyond literature to practical experiences. We conduct an online survey in the NLP community to collect previously intangible insights on the perceived relevance of data annotation, particularly focusing on active learning, including best practices, obstacles, and future prospects. Our findings show that annotated data is expected to remain a key factor and active learning to stay highly relevant while benefiting from LLMs. Consistent with a community survey from over a decade ago, however, we find that three key challenges persist -- setup complexity, risks in the cost reduction, and tooling -- for which we propose alleviation strategies. We publish an anonymized version of the collected dataset.

replace-cross Long Context Modeling with Ranked Memory-Augmented Retrieval

Authors: Ghadir Alselwi, Hao Xue, Shoaib Jameel, Basem Suleiman, Hakim Hacid, Flora D. Salim, Imran Razzak

Abstract: Effective long-term memory management is crucial for language models handling extended contexts. We introduce a novel framework that dynamically ranks memory entries based on relevance. Unlike previous works, our model introduces a novel relevance scoring and a pointwise re-ranking model for key-value embeddings, inspired by learning-to-rank techniques in information retrieval. Enhanced Ranked Memory Augmented Retrieval ERMAR achieves state-of-the-art results on standard benchmarks.

replace-cross Enhancing variational quantum algorithms by balancing training on classical and quantum hardware

Authors: Rahul Bhowmick, Harsh Wadhwa, Avinash Singh, Tania Sidana, Quoc Hoan Tran, Krishna Kumar Sabapathy

Abstract: Quantum computers offer a promising route to tackling problems that are classically intractable such as in prime-factorization, solving large-scale linear algebra and simulating complex quantum systems, but potentially require fault-tolerant quantum hardware. On the other hand, variational quantum algorithms (VQAs) are a promising approach for leveraging near-term quantum computers to solve complex problems. However, there remain major challenges in their trainability and resource costs on quantum hardware. Here we address these challenges by adopting Hardware Efficient and dynamical LIe algebra supported Ansatz (HELIA), and propose two training methods that combine an existing classical-enhanced g-sim method and the quantum-based Parameter-Shift Rule (PSR). Our improvement comes from distributing the resources required for gradient estimation and training to both classical and quantum hardware. We numerically evaluate our approach for ground-state estimation of 6 to 18-qubit Hamiltonians using the Variational Quantum Eigensolver (VQE) and quantum phase classification for up to 12-qubit Hamiltonians using quantum neural networks. For VQE, our method achieves higher accuracy and success rates, with an average reduction in quantum hardware calls of up to 60% compared to purely quantum-based PSR. For classification, we observe test accuracy improvements of up to 2.8%. We also numerically demonstrate the capability of HELIA in mitigating barren plateaus, paving the way for training large-scale quantum models.

replace-cross Decision Feedback In-Context Learning for Wireless Symbol Detection

Authors: Li Fan, Wei Shen, Jing Yang, Cong Shen

Abstract: Pre-trained Transformers, through in-context learning (ICL), have demonstrated exceptional capabilities to adapt to new tasks using example prompts without model update. Transformer-based wireless receivers, where prompts consist of the pilot data in the form of transmitted and received signal pairs, have shown high detection accuracy when pilot data are abundant. However, pilot information is often costly and limited in practice. In this work, we propose DEcision Feedback IN-ContExt Detection (DEFINED) as a new wireless receiver design, which bypasses channel estimation and directly performs symbol detection using the (sometimes extremely) limited pilot data. The key innovation in DEFINED is the proposed decision feedback mechanism in ICL, where we sequentially incorporate the detected symbols into the prompts as pseudo-labels to improve the detection for subsequent symbols. We further establish an error lower bound and provide theoretical insights into the model's generalization under channel distribution mismatch. Extensive experiments across a broad range of wireless settings demonstrate that a small Transformer trained with DEFINED achieves significant performance improvements over conventional methods, in some cases only needing a single pilot pair to achieve similar performance to the latter with more than 4 pilot pairs.

replace-cross Fast online node labeling with graph subsampling

Authors: Yushen Huang, Ertai Luo, Reza Babenezhad, Yifan Sun

Abstract: Large data applications rely on storing data in massive, sparse graphs with millions to trillions of nodes. Graph-based methods, such as node prediction, aim for computational efficiency regardless of graph size. Techniques like localized approximate personalized page rank (APPR) solve sparse linear systems with complexity independent of graph size, but is in terms of the maximum node degree, which can be much larger in practice than the average node degree for real-world large graphs. In this paper, we consider an \emph{online subsampled APPR method}, where messages are intentionally dropped at random. We use tools from graph sparsifiers and matrix linear algebra to give approximation bounds on the graph's spectral properties ($O(1/\epsilon^2)$ edges), and node classification performance (added $O(n\epsilon)$ overhead).

replace-cross Neural Discrete Token Representation Learning for Extreme Token Reduction in Video Large Language Models

Authors: Haichao Zhang, Yun Fu

Abstract: Token-based video representation has emerged as a promising approach for enabling large language models (LLMs) to interpret video content. However, existing token reduction techniques, such as pruning and merging, often disrupt essential positional embeddings and rely on continuous visual tokens sampled from nearby pixels with similar spatial-temporal locations. By removing only a small fraction of tokens, these methods still produce relatively lengthy continuous sequences, which falls short of the extreme compression required to balance computational efficiency and token count in video LLMs. In this paper, we introduce the novel task of Extreme Short Token Reduction, which aims to represent entire videos using a minimal set of discrete tokens. We propose VQToken, a neural discrete token representation framework that (i) applies adaptive vector quantization to continuous ViT embeddings to learn a compact codebook and (ii) preserves spatial-temporal positions via a token hash function by assigning each grid-level token to its nearest codebook entry. On the Extreme Short Token Reduction task, our VQToken compresses sequences to just 0.07 percent of their original length while incurring only a 0.66 percent drop in accuracy on the NextQA-MC benchmark. It also achieves comparable performance on ActNet-QA, Long Video Bench, and VideoMME. We further introduce the Token Information Density (TokDense) metric and formalize fixed-length and adaptive-length subtasks, achieving state-of-the-art results in both settings. Our approach dramatically lowers theoretical complexity, increases information density, drastically reduces token counts, and enables efficient video LLMs in resource-constrained environments.

replace-cross Quantum Doeblin Coefficients: Interpretations and Applications

Authors: Ian George, Christoph Hirche, Theshani Nuradha, Mark M. Wilde

Abstract: In classical information theory, the Doeblin coefficient of a classical channel provides an efficiently computable upper bound on the total-variation contraction coefficient of the channel, leading to what is known as a strong data-processing inequality. Here, we investigate quantum Doeblin coefficients as a generalization of the classical concept. In particular, we define various new quantum Doeblin coefficients, one of which has several desirable properties, including concatenation and multiplicativity, in addition to being efficiently computable. We also develop various interpretations of two of the quantum Doeblin coefficients, including representations as minimal singlet fractions, exclusion values, reverse max-mutual and oveloH informations, reverse robustnesses, and hypothesis testing reverse mutual and oveloH informations. Our interpretations of quantum Doeblin coefficients as either entanglement-assisted or unassisted exclusion values are particularly appealing, indicating that they are proportional to the best possible error probabilities one could achieve in state-exclusion tasks by making use of the channel. We also outline various applications of quantum Doeblin coefficients, ranging from limitations on quantum machine learning algorithms that use parameterized quantum circuits (noise-induced barren plateaus), on error mitigation protocols, on the sample complexity of noisy quantum hypothesis testing, and on mixing, distinguishability, and decoupling times of time-varying channels. All of these applications make use of the fact that quantum Doeblin coefficients appear in upper bounds on various trace-distance contraction coefficients of a channel. Furthermore, in all of these applications, our analysis using Doeblin coefficients provides improvements of various kinds over contributions from prior literature, both in terms of generality and being efficiently computable.

replace-cross Improving Predictions of Convective Storm Wind Gusts through Statistical Post-Processing of Neural Weather Models

Authors: Antoine Leclerc, Erwan Koch, Monika Feldmann, Daniele Nerini, Tom Beucler

Abstract: Issuing timely severe weather warnings helps mitigate potentially disastrous consequences. Recent advancements in Neural Weather Models (NWMs) offer a computationally inexpensive and fast approach for forecasting atmospheric environments on a 0.25{\deg} global grid. For thunderstorms, these environments can be empirically post-processed to predict wind gust distributions at specific locations. With the Pangu-Weather NWM, we apply a hierarchy of statistical and deep learning post-processing methods to forecast hourly wind gusts up to three days ahead. To ensure statistical robustness, we constrain our probabilistic forecasts using generalised extreme-value distributions across five regions in Switzerland. Using a convolutional neural network to post-process the predicted atmospheric environment's spatial patterns yields the best results, outperforming direct forecasting approaches across lead times and wind gust speeds. Our results confirm the added value of NWMs for extreme wind forecasting, especially for designing more responsive early-warning systems.

replace-cross Diffusion Factor Models: Generating High-Dimensional Returns with Factor Structure

Authors: Minshuo Chen, Renyuan Xu, Yumin Xu, Ruixun Zhang

Abstract: Financial scenario simulation is essential for risk management and portfolio optimization, yet it remains challenging especially in high-dimensional and small data settings common in finance. We propose a diffusion factor model that integrates latent factor structure into generative diffusion processes, bridging econometrics with modern generative AI to address the challenges of the curse of dimensionality and data scarcity in financial simulation. By exploiting the low-dimensional factor structure inherent in asset returns, we decompose the score function--a key component in diffusion models--using time-varying orthogonal projections, and this decomposition is incorporated into the design of neural network architectures. We derive rigorous statistical guarantees, establishing nonasymptotic error bounds for both score estimation at O(d^{5/2} n^{-2/(k+5)}) and generated distribution at O(d^{5/4} n^{-1/2(k+5)}), primarily driven by the intrinsic factor dimension k rather than the number of assets d, surpassing the dimension-dependent limits in the classical nonparametric statistics literature and making the framework viable for markets with thousands of assets. Numerical studies confirm superior performance in latent subspace recovery under small data regimes. Empirical analysis demonstrates the economic significance of our framework in constructing mean-variance optimal portfolios and factor portfolios. This work presents the first theoretical integration of factor structure with diffusion models, offering a principled approach for high-dimensional financial simulation with limited data. Our code is available at https://github.com/xymmmm00/diffusion_factor_model.

URLs: https://github.com/xymmmm00/diffusion_factor_model.

replace-cross From Prompting to Alignment: A Generative Framework for Query Recommendation

Authors: Erxue Min, Hsiu-Yuan Huang, Xihong Yang, Min Yang, Xin Jia, Yunfang Wu, Hengyi Cai, Junfeng Wang, Shuaiqiang Wang, Dawei Yin

Abstract: In modern search systems, search engines often suggest relevant queries to users through various panels or components, helping refine their information needs. Traditionally, these recommendations heavily rely on historical search logs to build models, which suffer from cold-start or long-tail issues. Furthermore, tasks such as query suggestion, completion or clarification are studied separately by specific design, which lacks generalizability and hinders adaptation to novel applications. Despite recent attempts to explore the use of LLMs for query recommendation, these methods mainly rely on the inherent knowledge of LLMs or external sources like few-shot examples, retrieved documents, or knowledge bases, neglecting the importance of the calibration and alignment with user feedback, thus limiting their practical utility. To address these challenges, we first propose a general Generative Query Recommendation (GQR) framework that aligns LLM-based query generation with user preference. Specifically, we unify diverse query recommendation tasks by a universal prompt framework, leveraging the instruct-following capability of LLMs for effective generation. Secondly, we align LLMs with user feedback via presenting a CTR-alignment framework, which involves training a query-wise CTR predictor as a process reward model and employing list-wise preference alignment to maximize the click probability of the generated query list. Furthermore, recognizing the inconsistency between LLM knowledge and proactive search intents arising from the separation of user-initiated queries from models, we align LLMs with user initiative via retrieving co-occurrence queries as side information when historical logs are available.

replace-cross Learned enclosure method for experimental EIT data

Authors: Sara Sippola, Siiri Rautio, Andreas Hauptmann, Takanori Ide, Samuli Siltanen

Abstract: Electrical impedance tomography (EIT) is a non-invasive imaging method with diverse applications, including medical imaging and non-destructive testing. The inverse problem of reconstructing internal electrical conductivity from boundary measurements is nonlinear and highly ill-posed, making it difficult to solve accurately. In recent years, there has been growing interest in combining analytical methods with machine learning to solve inverse problems. In this paper, we propose a method for estimating the convex hull of inclusions from boundary measurements by combining the enclosure method proposed by Ikehata with neural networks. We demonstrate its performance using experimental data. Compared to the classical enclosure method with least squares fitting, the learned convex hull achieves superior performance on both simulated and experimental data.

replace-cross EasyEdit2: An Easy-to-use Steering Framework for Editing Large Language Models

Authors: Ziwen Xu, Shuxun Wang, Kewei Xu, Haoming Xu, Mengru Wang, Xinle Deng, Yunzhi Yao, Guozhou Zheng, Huajun Chen, Ningyu Zhang

Abstract: In this paper, we introduce EasyEdit2, a framework designed to enable plug-and-play adjustability for controlling Large Language Model (LLM) behaviors. EasyEdit2 supports a wide range of test-time interventions, including safety, sentiment, personality, reasoning patterns, factuality, and language features. Unlike its predecessor, EasyEdit2 features a new architecture specifically designed for seamless model steering. It comprises key modules such as the steering vector generator and the steering vector applier, which enable automatic generation and application of steering vectors to influence the model's behavior without modifying its parameters. One of the main advantages of EasyEdit2 is its ease of use-users do not need extensive technical knowledge. With just a single example, they can effectively guide and adjust the model's responses, making precise control both accessible and efficient. Empirically, we report model steering performance across different LLMs, demonstrating the effectiveness of these techniques. We have released the source code on GitHub at https://github.com/zjunlp/EasyEdit along with a demonstration notebook. In addition, we provide a demo video at https://www.youtube.com/watch?v=AkfoiPfp5rQ for a quick introduction.

URLs: https://github.com/zjunlp/EasyEdit, https://www.youtube.com/watch?v=AkfoiPfp5rQ

replace-cross A Statistical Approach for Synthetic EEG Data Generation

Authors: Gideon Vos, Maryam Ebrahimpour, Liza van Eijk, Zoltan Sarnyai, Mostafa Rahimi Azghadi

Abstract: Electroencephalogram (EEG) data is crucial for diagnosing mental health conditions but is costly and time-consuming to collect at scale. Synthetic data generation offers a promising solution to augment datasets for machine learning applications. However, generating high-quality synthetic EEG that preserves emotional and mental health signals remains challenging. This study proposes a method combining correlation analysis and random sampling to generate realistic synthetic EEG data. We first analyze interdependencies between EEG frequency bands using correlation analysis. Guided by this structure, we generate synthetic samples via random sampling. Samples with high correlation to real data are retained and evaluated through distribution analysis and classification tasks. A Random Forest model trained to distinguish synthetic from real EEG performs at chance level, indicating high fidelity. The generated synthetic data closely match the statistical and structural properties of the original EEG, with similar correlation coefficients and no significant differences in PERMANOVA tests. This method provides a scalable, privacy-preserving approach for augmenting EEG datasets, enabling more efficient model training in mental health research.

replace-cross Scalable Multi-Task Learning for Particle Collision Event Reconstruction with Heterogeneous Graph Neural Networks

Authors: William Sutcliffe, Marta Calvi, Simone Capelli, Jonas Eschle, Juli\'an Garc\'ia Pardi\~nas, Abhijit Mathad, Azusa Uzuki, Nicola Serra

Abstract: The growing luminosity frontier at the Large Hadron Collider is challenging the reconstruction and analysis of particle collision events. Increased particle multiplicities are straining latency and storage requirements at the data acquisition stage, while new complications are emerging, including higher background levels and more frequent particle vertex misassociations. This in turn necessitates the development of more holistic and scalable reconstruction methods that take advantage of recent advances in machine learning. We propose a novel Heterogeneous Graph Neural Network (HGNN) architecture featuring unique representations for diverse particle collision relationships and integrated graph pruning layers for scalability. Trained with a multi-task paradigm in an environment mimicking the LHCb experiment, this HGNN significantly improves beauty hadron reconstruction performance. Notably, it concurrently performs particle vertex association and graph pruning within a single framework. We quantify reconstruction and pruning performance, demonstrate enhanced inference time scaling with event complexity, and mitigate potential performance loss using a weighted message passing scheme.

replace-cross Probabilistic Embeddings for Frozen Vision-Language Models: Uncertainty Quantification with Gaussian Process Latent Variable Models

Authors: Aishwarya Venkataramanan, Paul Bodesheim, Joachim Denzler

Abstract: Vision-Language Models (VLMs) learn joint representations by mapping images and text into a shared latent space. However, recent research highlights that deterministic embeddings from standard VLMs often struggle to capture the uncertainties arising from the ambiguities in visual and textual descriptions and the multiple possible correspondences between images and texts. Existing approaches tackle this by learning probabilistic embeddings during VLM training, which demands large datasets and does not leverage the powerful representations already learned by large-scale VLMs like CLIP. In this paper, we propose GroVE, a post-hoc approach to obtaining probabilistic embeddings from frozen VLMs. GroVE builds on Gaussian Process Latent Variable Model (GPLVM) to learn a shared low-dimensional latent space where image and text inputs are mapped to a unified representation, optimized through single-modal embedding reconstruction and cross-modal alignment objectives. Once trained, the Gaussian Process model generates uncertainty-aware probabilistic embeddings. Evaluation shows that GroVE achieves state-of-the-art uncertainty calibration across multiple downstream tasks, including cross-modal retrieval, visual question answering, and active learning.

replace-cross Role of scrambling and noise in temporal information processing with quantum systems

Authors: Weijie Xiong, Zo\"e Holmes, Armando Angrisani, Yudai Suzuki, Thiparat Chotibut, Supanut Thanasilp

Abstract: Scrambling quantum systems have attracted attention as effective substrates for temporal information processing. Here we consider a quantum reservoir processing framework that captures a broad range of physical computing models with quantum systems. We examine the scalability and memory retention of the model with scrambling reservoirs modelled by high-order unitary designs in both noiseless and noisy settings. In the former regime, we show that measurement readouts become exponentially concentrated with increasing reservoir size, yet strikingly do not worsen with the reservoir iterations. Thus, while repeatedly reusing a small scrambling reservoir with quantum data might be viable, scaling up the problem size deteriorates generalization unless one can afford an exponential shot overhead. In contrast, the memory of early inputs and initial states decays exponentially in both reservoir size and reservoir iterations. In the noisy regime, we also prove that memory decays exponentially in time for local noisy channels. These results required us to introduce new proof techniques for bounding concentration in temporal quantum models.

replace-cross Learning collision risk proactively from naturalistic driving data at scale

Authors: Yiru Jiao, Simeon C. Calvert, Sander van Cranenburgh, Hans van Lint

Abstract: Accurately and proactively alerting drivers or automated systems to emerging collisions is crucial for road safety, particularly in highly interactive and complex urban environments. However, existing approaches to identifying potential collisions either require labour-intensive annotation of sparse risk, struggle to consider varying contextual factors, or are only useful in specific scenarios. To address these limits, this study introduces the Generalised Surrogate Safety Measure (GSSM), a new data-driven approach that learns collision risk exclusively from naturalistic driving without the need for crash or risk labels. GSSM captures the patterns of normal driving and estimates the extent to which a traffic interaction deviates from the norm towards an unsafe state. Diverse data from naturalistic driving, including motion kinematics, weather, lighting, etc., are used to train multiple GSSMs, which are tested with 2,591 reconstructed real-world crashes and near-crashes. These test events are also released here as the largest dataset of its kind to date. A basic GSSM using only instantaneous motion kinematics achieves an area under the precision-recall curve of 0.9 and secures a median time advance of 2.6 seconds to prevent potential collisions. Additional interaction patterns and contextual factors provide further performance gains. Across various types of collision risk scenarios (such as rear-end, merging, and turning interactions), the accuracy and timeliness of GSSM consistently outperforms existing baselines. GSSM therefore establishes a scalable, context-aware, and generalisable foundation for proactively quantifying collision risk in traffic interactions. This can support and facilitate autonomous driving systems, traffic safety assessment, and road emergency management. Code and experiment data are openly accessible at https://github.com/Yiru-Jiao/GSSM.

URLs: https://github.com/Yiru-Jiao/GSSM.

replace-cross RewardAnything: Generalizable Principle-Following Reward Models

Authors: Zhuohao Yu, Jiali Zeng, Weizheng Gu, Yidong Wang, Jindong Wang, Fandong Meng, Jie Zhou, Yue Zhang, Shikun Zhang, Wei Ye

Abstract: Reward Models, essential for guiding Large Language Model optimization, are typically trained on fixed preference datasets, resulting in rigid alignment to single, implicit preference distributions. This prevents adaptation to diverse real-world needs-from conciseness in one task to detailed explanations in another. The standard practice of collecting task-specific preference data and retraining reward models is resource-intensive, often producing biased rewards, and limits practical application. We introduce generalizable, principle-following reward models. We propose that RMs should understand and adhere to dynamically provided natural language specifications of reward principles, similar to instruction-following in LLMs. To measure this capability, we develop RABench, a comprehensive benchmark for RMs focusing on generalization across diverse principles. Evaluations on RABench reveal poor generalization of current RMs. As a solution, we present RewardAnything, a novel RM designed and trained to explicitly follow natural language principles. We achieve SotA performance with RewardAnything in traditional RM benchmark simply by specifying a well-defined principle, and results on RABench show we excel in adapting to novel principles without retraining. Furthermore, RewardAnything integrates seamlessly with existing RLHF methods and we show by a case study on how to automatically and efficiently align LLMs with only natural language principles.

replace-cross High-Dimensional Learning in Finance

Authors: Hasan Fallahgoul

Abstract: Recent advances in machine learning have shown promising results for financial prediction using large, over-parameterized models. This paper provides theoretical foundations and empirical validation for understanding when and how these methods achieve predictive success. I examine two key aspects of high-dimensional learning in finance. First, I prove that within-sample standardization in Random Fourier Features implementations fundamentally alters the underlying Gaussian kernel approximation, replacing shift-invariant kernels with training-set dependent alternatives. Second, I establish information-theoretic lower bounds that identify when reliable learning is impossible no matter how sophisticated the estimator. A detailed quantitative calibration of the polynomial lower bound shows that with typical parameter choices, e.g., 12,000 features, 12 monthly observations, and R-square 2-3%, the required sample size to escape the bound exceeds 25-30 years of data--well beyond any rolling-window actually used. Thus, observed out-of-sample success must originate from lower-complexity artefacts rather than from the intended high-dimensional mechanism.

replace-cross SLAC: Simulation-Pretrained Latent Action Space for Whole-Body Real-World RL

Authors: Jiaheng Hu, Peter Stone, Roberto Mart\'in-Mart\'in

Abstract: Building capable household and industrial robots requires mastering the control of versatile, high-degree-of-freedom (DoF) systems such as mobile manipulators. While reinforcement learning (RL) holds promise for autonomously acquiring robot control policies, scaling it to high-DoF embodiments remains challenging. Direct RL in the real world demands both safe exploration and high sample efficiency, which are difficult to achieve in practice. Sim-to-real RL, on the other hand, is often brittle due to the reality gap. This paper introduces SLAC, a method that renders real-world RL feasible for complex embodiments by leveraging a low-fidelity simulator to pretrain a task-agnostic latent action space. SLAC trains this latent action space via a customized unsupervised skill discovery method designed to promote temporal abstraction, disentanglement, and safety, thereby facilitating efficient downstream learning. Once a latent action space is learned, SLAC uses it as the action interface for a novel off-policy RL algorithm to autonomously learn downstream tasks through real-world interactions. We evaluate SLAC against existing methods on a suite of bimanual mobile manipulation tasks, where it achieves state-of-the-art performance. Notably, SLAC learns contact-rich whole-body tasks in under an hour of real-world interactions, without relying on any demonstrations or hand-crafted behavior priors. More information and robot videos at robo-rl.github.io

replace-cross Enhancing Neural Autoregressive Distribution Estimators for Image Reconstruction

Authors: Ambrose Emmett-Iwaniw, Nathan Kirk

Abstract: Autoregressive models are often employed to learn distributions of image data by decomposing the $D$-dimensional density function into a product of one-dimensional conditional distributions. Each conditional depends on preceding variables (pixels, in the case of image data), making the order in which variables are processed fundamental to the model performance. In this paper, we study the problem of observing a small subset of image pixels (referred to as a pixel patch) to predict the unobserved parts of the image. As our prediction mechanism, we propose a generalized version of the convolutional neural autoregressive distribution estimation (ConvNADE) model adapted for real-valued and color images. Moreover, we investigate the quality of image reconstruction when observing both random pixel patches and low-discrepancy pixel patches inspired by quasi-Monte Carlo theory. Experiments on benchmark datasets demonstrate that, where design permits, pixels sampled or stored to preserve uniform coverage improves reconstruction fidelity and test performance.

replace-cross Training-Free Query Optimization via LLM-Based Plan Similarity

Authors: Nikita Vasilenko, Alexander Demin, Vladimir Boorlakov

Abstract: Large language model (LLM) embeddings offer a promising new avenue for database query optimization. In this paper, we explore how pre-trained execution plan embeddings can guide SQL query execution without the need for additional model training. We introduce LLM-PM (LLM-based Plan Mapping), a framework that embeds the default execution plan of a query, finds its k nearest neighbors among previously executed plans, and recommends database hintsets based on neighborhood voting. A lightweight consistency check validates the selected hint, while a fallback mechanism searches the full hint space when needed. Evaluated on the JOB-CEB benchmark using OpenGauss, LLM-PM achieves an average speed-up of 21% query latency reduction. This work highlights the potential of LLM-powered embeddings to deliver practical improvements in query performance and opens new directions for training-free, embedding-based optimizer guidance systems.

replace-cross Making a Pipeline Production-Ready: Challenges and Lessons Learned in the Healthcare Domain

Authors: Daniel Angelo Esteves Lawand (University of S\~ao Paulo), Lucas Quaresma Medina Lam (University of S\~ao Paulo), Roberto Oliveira Bolgheroni (University of S\~ao Paulo), Renato Cordeiro Ferreira (University of S\~ao Paulo, Jheronimus Academy of Data Science, Technical University of Eindhoven, Tilburg University), Alfredo Goldman (University of S\~ao Paulo), Marcelo Finger (University of S\~ao Paulo)

Abstract: Deploying a Machine Learning (ML) training pipeline into production requires good software engineering practices. Unfortunately, the typical data science workflow often leads to code that lacks critical software quality attributes. This experience report investigates this problem in SPIRA, a project whose goal is to create an ML-Enabled System (MLES) to pre-diagnose insufficiency respiratory via speech analysis. This paper presents an overview of the architecture of the MLES, then compares three versions of its Continuous Training subsystem: from a proof of concept Big Ball of Mud (v1), to a design pattern-based Modular Monolith (v2), to a test-driven set of Microservices (v3) Each version improved its overall extensibility, maintainability, robustness, and resiliency. The paper shares challenges and lessons learned in this process, offering insights for researchers and practitioners seeking to productionize their pipelines.

replace-cross Predicting function of evolutionarily implausible DNA sequences

Authors: Shiyu Jiang, Xuyin Liu, Zitong Jerry Wang

Abstract: Genomic language models (gLMs) show potential for generating novel, functional DNA sequences for synthetic biology, but doing so requires them to learn not just evolutionary plausibility, but also sequence-to-function relationships. We introduce a set of prediction tasks called Nullsettes, which assesses a model's ability to predict loss-of-function mutations created by translocating key control elements in synthetic expression cassettes. Across 12 state-of-the-art models, we find that mutation effect prediction performance strongly correlates with the predicted likelihood of the nonmutant. Furthermore, the range of likelihood values predictive of strong model performance is highly dependent on sequence length. Our work highlights the importance of considering both sequence likelihood and sequence length when using gLMs for mutation effect prediction.

replace-cross Generalization or Hallucination? Understanding Out-of-Context Reasoning in Transformers

Authors: Yixiao Huang, Hanlin Zhu, Tianyu Guo, Jiantao Jiao, Somayeh Sojoudi, Michael I. Jordan, Stuart Russell, Song Mei

Abstract: Large language models (LLMs) can acquire new knowledge through fine-tuning, but this process exhibits a puzzling duality: models can generalize remarkably from new facts, yet are also prone to hallucinating incorrect information. However, the reasons for this phenomenon remain poorly understood. In this work, we argue that both behaviors stem from a single mechanism known as out-of-context reasoning (OCR): the ability to deduce implications by associating concepts, even those without a causal link. Our experiments across five prominent LLMs confirm that OCR indeed drives both generalization and hallucination, depending on whether the associated concepts are causally related. To build a rigorous theoretical understanding of this phenomenon, we then formalize OCR as a synthetic factual recall task. We empirically show that a one-layer single-head attention-only transformer with factorized output and value matrices can learn to solve this task, while a model with combined weights cannot, highlighting the crucial role of matrix factorization. Our theoretical analysis shows that the OCR capability can be attributed to the implicit bias of gradient descent, which favors solutions that minimize the nuclear norm of the combined output-value matrix. This mathematical structure explains why the model learns to associate facts and implications with high sample efficiency, regardless of whether the correlation is causal or merely spurious. Ultimately, our work provides a theoretical foundation for understanding the OCR phenomenon, offering a new lens for analyzing and mitigating undesirable behaviors from knowledge injection.

replace-cross AutoMind: Adaptive Knowledgeable Agent for Automated Data Science

Authors: Yixin Ou, Yujie Luo, Jingsheng Zheng, Lanning Wei, Shuofei Qiao, Jintian Zhang, Da Zheng, Huajun Chen, Ningyu Zhang

Abstract: Large Language Model (LLM) agents have shown great potential in addressing real-world data science problems. LLM-driven data science agents promise to automate the entire machine learning pipeline, yet their real-world effectiveness remains limited. Existing frameworks depend on rigid, pre-defined workflows and inflexible coding strategies; consequently, they excel only on relatively simple, classical problems and fail to capture the empirical expertise that human practitioners bring to complex, innovative tasks. In this work, we introduce AutoMind, an adaptive, knowledgeable LLM-agent framework that overcomes these deficiencies through three key advances: (1) a curated expert knowledge base that grounds the agent in domain expert knowledge, (2) an agentic knowledgeable tree search algorithm that strategically explores possible solutions, and (3) a self-adaptive coding strategy that dynamically tailors code generation to task complexity. Evaluations on two automated data science benchmarks demonstrate that AutoMind delivers superior performance versus state-of-the-art baselines. Additional analyses confirm favorable effectiveness, efficiency, and qualitative solution quality, highlighting AutoMind as an efficient and robust step toward fully automated data science.

replace-cross Adversarial Disentanglement by Backpropagation with Physics-Informed Variational Autoencoder

Authors: Ioannis Christoforos Koune, Alice Cicirello

Abstract: Inference and prediction under partial knowledge of a physical system is challenging, particularly when multiple confounding sources influence the measured response. Explicitly accounting for these influences in physics-based models is often infeasible due to epistemic uncertainty, cost, or time constraints, resulting in models that fail to accurately describe the behavior of the system. On the other hand, data-driven machine learning models such as variational autoencoders are not guaranteed to identify a parsimonious representation. As a result, they can suffer from poor generalization performance and reconstruction accuracy in the regime of limited and noisy data. We propose a physics-informed variational autoencoder architecture that combines the interpretability of physics-based models with the flexibility of data-driven models. To promote disentanglement of the known physics and confounding influences, the latent space is partitioned into physically meaningful variables that parametrize a physics-based model, and data-driven variables that capture variability in the domain and class of the physical system. The encoder is coupled with a decoder that integrates physics-based and data-driven components, and constrained by an adversarial training objective that prevents the data-driven components from overriding the known physics, ensuring that the physics-grounded latent variables remain interpretable. We demonstrate that the model is able to disentangle features of the input signal and separate the known physics from confounding influences using supervision in the form of class and domain observables. The model is evaluated on a series of synthetic case studies relevant to engineering structures, demonstrating the feasibility of the proposed approach.

replace-cross Intelligent Assistants for the Semiconductor Failure Analysis with LLM-Based Planning Agents

Authors: Aline Dobrovsky, Konstantin Schekotihin, Christian Burmer

Abstract: Failure Analysis (FA) is a highly intricate and knowledge-intensive process. The integration of AI components within the computational infrastructure of FA labs has the potential to automate a variety of tasks, including the detection of non-conformities in images, the retrieval of analogous cases from diverse data sources, and the generation of reports from annotated images. However, as the number of deployed AI models increases, the challenge lies in orchestrating these components into cohesive and efficient workflows that seamlessly integrate with the FA process. This paper investigates the design and implementation of a Large Language Model (LLM)-based Planning Agent (LPA) to assist FA engineers in solving their analysis cases. The LPA integrates LLMs with advanced planning capabilities and external tool utilization, enabling autonomous processing of complex queries, retrieval of relevant data from external systems, and generation of human-readable responses. Evaluation results demonstrate the agent's operational effectiveness and reliability in supporting FA tasks.

replace-cross Human2LocoMan: Learning Versatile Quadrupedal Manipulation with Human Pretraining

Authors: Yaru Niu, Yunzhe Zhang, Mingyang Yu, Changyi Lin, Chenhao Li, Yikai Wang, Yuxiang Yang, Wenhao Yu, Tingnan Zhang, Zhenzhen Li, Jonathan Francis, Bingqing Chen, Jie Tan, Ding Zhao

Abstract: Quadrupedal robots have demonstrated impressive locomotion capabilities in complex environments, but equipping them with autonomous versatile manipulation skills in a scalable way remains a significant challenge. In this work, we introduce a cross-embodiment imitation learning system for quadrupedal manipulation, leveraging data collected from both humans and LocoMan, a quadruped equipped with multiple manipulation modes. Specifically, we develop a teleoperation and data collection pipeline, which unifies and modularizes the observation and action spaces of the human and the robot. To effectively leverage the collected data, we propose an efficient modularized architecture that supports co-training and pretraining on structured modality-aligned data across different embodiments. Additionally, we construct the first manipulation dataset for the LocoMan robot, covering various household tasks in both unimanual and bimanual modes, supplemented by a corresponding human dataset. We validate our system on six real-world manipulation tasks, where it achieves an average success rate improvement of 41.9% overall and 79.7% under out-of-distribution (OOD) settings compared to the baseline. Pretraining with human data contributes a 38.6% success rate improvement overall and 82.7% under OOD settings, enabling consistently better performance with only half the amount of robot data. Our code, hardware, and data are open-sourced at: https://human2bots.github.io.

URLs: https://human2bots.github.io.

replace-cross PEVLM: Parallel Encoding for Vision-Language Models

Authors: Letian Kang, Shixian Luo, Yiqiang Li, Xiaoyang Yu, Shenxuan Zhou, Yong Wu

Abstract: Vision-Language Models (VLMs) have demonstrated strong capabilities in multimodal understanding and generation tasks. However, their application to long video understanding remains hindered by the quadratic complexity of standard attention mechanisms. In this work, we introduce \textbf{PEVLM}, a fine-tuning-free parallel encoding method designed to enhance the prefilling efficiency of VLMs in long video scenarios. PEVLM partitions the input video into context blocks with a shared sink block, while preserving sequential position embeddings to align the attention weight distribution with that of Full-Attention. This design reduces attention complexity from $O((T \times N)^2)$ to $O(T \times N)$ where $T$ is the number of frames and $N$ the number of tokens per frame, without sacrificing accuracy. Extensive experiments across multiple state-of-the-art models and benchmarks demonstrate that PEVLM consistently outperforms existing parallel encoding approaches, achieving up to \textbf{7.47x} speedup in attention computation and reducing end-to-end latency by \textbf{40\%}. Remarkably, PEVLM not only maintains high accuracy, but in some settings even surpasses Full-Attention performance. Under strict latency constraints, it achieves substantial gains, improving accuracy from \textbf{23.26\%} to \textbf{61.03\%}. These results underscore the effectiveness of PEVLM for low-latency, long-context video understanding, making it a promising solution for real-world applications.

replace-cross Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study

Authors: Yuqi Zhu, Yi Zhong, Jintian Zhang, Ziheng Zhang, Shuofei Qiao, Yujie Luo, Lun Du, Da Zheng, Huajun Chen, Ningyu Zhang

Abstract: Large Language Models (LLMs) hold promise in automating data analysis tasks, yet open-source models face significant limitations in these kinds of reasoning-intensive scenarios. In this work, we investigate strategies to enhance the data analysis capabilities of open-source LLMs. By curating a seed dataset of diverse, realistic scenarios, we evaluate models across three dimensions: data understanding, code generation, and strategic planning. Our analysis reveals three key findings: (1) Strategic planning quality serves as the primary determinant of model performance; (2) Interaction design and task complexity significantly influence reasoning capabilities; (3) Data quality demonstrates a greater impact than diversity in achieving optimal performance. We leverage these insights to develop a data synthesis methodology, demonstrating significant improvements in open-source LLMs' analytical reasoning capabilities.

replace-cross KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality

Authors: Baochang Ren, Shuofei Qiao, Wenhao Yu, Huajun Chen, Ningyu Zhang

Abstract: Large Language Models (LLMs), particularly slow-thinking models, often exhibit severe hallucination, outputting incorrect content due to an inability to accurately recognize knowledge boundaries during reasoning. While Reinforcement Learning (RL) can enhance complex reasoning abilities, its outcome-oriented reward mechanism often lacks factual supervision over the thinking process, further exacerbating the hallucination problem. To address the high hallucination in slow-thinking models, we propose Knowledge-enhanced RL, KnowRL. KnowRL guides models to perform fact-based slow thinking by integrating a factuality reward, based on knowledge verification, into the RL training process, helping them recognize their knowledge boundaries. KnowRL guides models to perform fact-based slow thinking by integrating a factuality reward, based on knowledge verification, into the RL training process, helping them recognize their knowledge boundaries. This targeted factual input during RL training enables the model to learn and internalize fact-based reasoning strategies. By directly rewarding adherence to facts within the reasoning steps, KnowRL fosters a more reliable thinking process. Experimental results on three hallucination evaluation datasets and two reasoning evaluation datasets demonstrate that KnowRL effectively mitigates hallucinations in slow-thinking models while maintaining their original strong reasoning capabilities. Our code is available at https://github.com/zjunlp/KnowRL.

URLs: https://github.com/zjunlp/KnowRL.

replace-cross EraRAG: Efficient and Incremental Retrieval Augmented Generation for Growing Corpora

Authors: Fangyuan Zhang, Zhengjun Huang, Yingli Zhou, Qintian Guo, Zhixun Li, Wensheng Luo, Di Jiang, Yixiang Fang, Xiaofang Zhou

Abstract: Graph-based Retrieval-Augmented Generation (Graph-RAG) enhances large language models (LLMs) by structuring retrieval over an external corpus. However, existing approaches typically assume a static corpus, requiring expensive full-graph reconstruction whenever new documents arrive, limiting their scalability in dynamic, evolving environments. To address these limitations, we introduce EraRAG, a novel multi-layered Graph-RAG framework that supports efficient and scalable dynamic updates. Our method leverages hyperplane-based Locality-Sensitive Hashing (LSH) to partition and organize the original corpus into hierarchical graph structures, enabling efficient and localized insertions of new data without disrupting the existing topology. The design eliminates the need for retraining or costly recomputation while preserving high retrieval accuracy and low latency. Experiments on large-scale benchmarks demonstrate that EraRag achieves up to an order of magnitude reduction in update time and token consumption compared to existing Graph-RAG systems, while providing superior accuracy performance. This work offers a practical path forward for RAG systems that must operate over continually growing corpora, bridging the gap between retrieval efficiency and adaptability. Our code and data are available at https://github.com/EverM0re/EraRAG-Official.

URLs: https://github.com/EverM0re/EraRAG-Official.

replace-cross Neural models of multiscale systems: conceptual limitations, stochastic parametrizations, and a climate application

Authors: Fabrizio Falasca

Abstract: This work explores key conceptual limitations in data-driven modeling of multiscale dynamical systems, focusing on neural emulators and stochastic climate modeling. A skillful climate model should capture both stationary statistics and responses to external perturbations. While current autoregressive neural models often reproduce the former, they typically struggle with the latter. We begin by analyzing a low-dimensional dynamical system to expose, by analogy, fundamental limitations that persist in high-dimensional settings. Specifically, we construct neural stochastic models under two scenarios: one where the full state vector is observed, and another with only partial observations (i.e. a subset of variables). In the first case, the models accurately capture both equilibrium statistics and forced responses in ensemble mean and variance. In the more realistic case of partial observations, two key challenges emerge: (i) identifying the \textit{proper} variables to model, and (ii) parameterizing the influence of unobserved degrees of freedom. These issues are not specific to neural networks but reflect fundamental limitations of data-driven modeling and the need to target the slow dynamics of the system. We argue that physically grounded strategies -- such as coarse-graining and stochastic parameterizations -- are critical, both conceptually and practically, for the skillful emulation of complex systems like the coupled climate system. Building on these insights, we turn to a more realistic application: a stochastic reduced neural model of the sea surface temperature field and the net radiative flux at the top of the atmosphere, assessing its stationary statistics, response to temperature forcing, and interpretability.

replace-cross CRISP-SAM2: SAM2 with Cross-Modal Interaction and Semantic Prompting for Multi-Organ Segmentation

Authors: Xinlei Yu, Changmiao Wang, Hui Jin, Ahmed Elazab, Gangyong Jia, Xiang Wan, Changqing Zou, Ruiquan Ge

Abstract: Multi-organ medical segmentation is a crucial component of medical image processing, essential for doctors to make accurate diagnoses and develop effective treatment plans. Despite significant progress in this field, current multi-organ segmentation models often suffer from inaccurate details, dependence on geometric prompts and loss of spatial information. Addressing these challenges, we introduce a novel model named CRISP-SAM2 with CRoss-modal Interaction and Semantic Prompting based on SAM2. This model represents a promising approach to multi-organ medical segmentation guided by textual descriptions of organs. Our method begins by converting visual and textual inputs into cross-modal contextualized semantics using a progressive cross-attention interaction mechanism. These semantics are then injected into the image encoder to enhance the detailed understanding of visual information. To eliminate reliance on geometric prompts, we use a semantic prompting strategy, replacing the original prompt encoder to sharpen the perception of challenging targets. In addition, a similarity-sorting self-updating strategy for memory and a mask-refining process is applied to further adapt to medical imaging and enhance localized details. Comparative experiments conducted on seven public datasets indicate that CRISP-SAM2 outperforms existing models. Extensive analysis also demonstrates the effectiveness of our method, thereby confirming its superior performance, especially in addressing the limitations mentioned earlier. Our code is available at: https://github.com/YU-deep/CRISP_SAM2.git.

URLs: https://github.com/YU-deep/CRISP_SAM2.git.

replace-cross Minimax and Bayes Optimal Best-arm Identification: Adaptive Experimental Design for Treatment Choice

Authors: Masahiro Kato

Abstract: This study investigates adaptive experimental design for treatment choice, also known as fixed-budget best-arm identification. We consider an adaptive procedure consisting of a treatment-allocation phase followed by a treatment-choice phase, and we design an adaptive experiment for this setup to efficiently identify the best treatment arm, defined as the one with the highest expected outcome. In our designed experiment, the treatment-allocation phase consists of two stages. The first stage is a pilot phase, where we allocate each treatment arm uniformly with equal proportions to eliminate clearly suboptimal arms and estimate outcome variances. In the second stage, we allocate treatment arms in proportion to the variances estimated in the first stage. After the treatment-allocation phase, the procedure enters the treatment-choice phase, where we choose the treatment arm with the highest sample mean as our estimate of the best treatment arm. We prove that this single design is simultaneously asymptotically minimax and Bayes optimal for the simple regret, with upper bounds that match our lower bounds up to exact constants. Therefore, our designed experiment achieves the sharp efficiency limits without requiring separate tuning for minimax and Bayesian objectives.

replace-cross SEZ-HARN: Self-Explainable Zero-shot Human Activity Recognition Network

Authors: Devin Y. De Silva, Sandareka Wickramanayake, Dulani Meedeniya, Sanka Rasnayaka

Abstract: Human Activity Recognition (HAR), which uses data from Inertial Measurement Unit (IMU) sensors, has many practical applications in healthcare and assisted living environments. However, its use in real-world scenarios has been limited by the lack of comprehensive IMU-based HAR datasets that cover a wide range of activities and the lack of transparency in existing HAR models. Zero-shot HAR (ZS-HAR) overcomes the data limitations, but current models struggle to explain their decisions, making them less transparent. This paper introduces a novel IMU-based ZS-HAR model called the Self-Explainable Zero-shot Human Activity Recognition Network (SEZ-HARN). It can recognize activities not encountered during training and provide skeleton videos to explain its decision-making process. We evaluate the effectiveness of the proposed SEZ-HARN on four benchmark datasets PAMAP2, DaLiAc, HTD-MHAD and MHealth and compare its performance against three state-of-the-art black-box ZS-HAR models. The experiment results demonstrate that SEZ-HARN produces realistic and understandable explanations while achieving competitive Zero-shot recognition accuracy. SEZ-HARN achieves a Zero-shot prediction accuracy within 3\% of the best-performing black-box model on PAMAP2 while maintaining comparable performance on the other three datasets.

replace-cross Testing the spin-bath view of self-attention: A Hamiltonian analysis of GPT-2 Transformer

Authors: Satadeep Bhattacharjee, Seung-Cheol Lee

Abstract: The recently proposed physics-based framework by Huo and Johnson~\cite{huo2024capturing} models the attention mechanism of Large Language Models (LLMs) as an interacting two-body spin system, offering a first-principles explanation for phenomena like repetition and bias. Building on this hypothesis, we extract the complete Query-Key weight matrices from a production-grade GPT-2 model and derive the corresponding effective Hamiltonian for every attention head. From these Hamiltonians we obtain analytic \textit{phase boundaries} logit gap criteria that predict which token should dominate the next-token distribution for a given context. A systematic evaluation on 144 heads across 20 factual-recall prompts reveals a strong negative correlation between the theoretical logit gaps and the model's empirical token rankings ($r\approx-0.70$, $p<10^{-3}$).Targeted ablations further show that suppressing the heads most aligned with the spin-bath predictions induces the anticipated shifts in output probabilities, confirming a causal link rather than a coincidental association. Taken together, our findings provide the first strong empirical evidence for the spin-bath analogy in a production-grade model. This validation not only furnishes a tractable, physics-inspired lens for interpretability but also provides the groundwork for novel generative models, bridging the gap between theoretical condensed matter physics and AI.

replace-cross A LoD of Gaussians: Unified Training and Rendering for Ultra-Large Scale Reconstruction with External Memory

Authors: Felix Windisch, Lukas Radl, Thomas K\"ohler, Michael Steiner, Dieter Schmalstieg, Markus Steinberger

Abstract: Gaussian Splatting has emerged as a high-performance technique for novel view synthesis, enabling real-time rendering and high-quality reconstruction of small scenes. However, scaling to larger environments has so far relied on partitioning the scene into chunks -- a strategy that introduces artifacts at chunk boundaries, complicates training across varying scales, and is poorly suited to unstructured scenarios such as city-scale flyovers combined with street-level views. Moreover, rendering remains fundamentally limited by GPU memory, as all visible chunks must reside in VRAM simultaneously. We introduce A LoD of Gaussians, a framework for training and rendering ultra-large-scale Gaussian scenes on a single consumer-grade GPU -- without partitioning. Our method stores the full scene out-of-core (e.g., in CPU memory) and trains a Level-of-Detail (LoD) representation directly, dynamically streaming only the relevant Gaussians. A hybrid data structure combining Gaussian hierarchies with Sequential Point Trees enables efficient, view-dependent LoD selection, while a lightweight caching and view scheduling system exploits temporal coherence to support real-time streaming and rendering. Together, these innovations enable seamless multi-scale reconstruction and interactive visualization of complex scenes -- from broad aerial views to fine-grained ground-level details.

replace-cross Gradient Short-Circuit: Efficient Out-of-Distribution Detection via Feature Intervention

Authors: Jiawei Gu, Ziyue Qiao, Zechao Li

Abstract: Out-of-Distribution (OOD) detection is critical for safely deploying deep models in open-world environments, where inputs may lie outside the training distribution. During inference on a model trained exclusively with In-Distribution (ID) data, we observe a salient gradient phenomenon: around an ID sample, the local gradient directions for "enhancing" that sample's predicted class remain relatively consistent, whereas OOD samples--unseen in training--exhibit disorganized or conflicting gradient directions in the same neighborhood. Motivated by this observation, we propose an inference-stage technique to short-circuit those feature coordinates that spurious gradients exploit to inflate OOD confidence, while leaving ID classification largely intact. To circumvent the expense of recomputing the logits after this gradient short-circuit, we further introduce a local first-order approximation that accurately captures the post-modification outputs without a second forward pass. Experiments on standard OOD benchmarks show our approach yields substantial improvements. Moreover, the method is lightweight and requires minimal changes to the standard inference pipeline, offering a practical path toward robust OOD detection in real-world applications.

replace-cross Pensieve Grader: An AI-Powered, Ready-to-Use Platform for Effortless Handwritten STEM Grading

Authors: Yoonseok Yang, Minjune Kim, Marlon Rondinelli, Keren Shao

Abstract: Grading handwritten, open-ended responses remains a major bottleneck in large university STEM courses. We introduce Pensieve (https://www.pensieve.co), an AI-assisted grading platform that leverages large language models (LLMs) to transcribe and evaluate student work, providing instructors with rubric-aligned scores, transcriptions, and confidence ratings. Unlike prior tools that focus narrowly on specific tasks like transcription or rubric generation, Pensieve supports the entire grading pipeline-from scanned student submissions to final feedback-within a human-in-the-loop interface. Pensieve has been deployed in real-world courses at over 20 institutions and has graded more than 300,000 student responses. We present system details and empirical results across four core STEM disciplines: Computer Science, Mathematics, Physics, and Chemistry. Our findings show that Pensieve reduces grading time by an average of 65%, while maintaining a 95.4% agreement rate with instructor-assigned grades for high-confidence predictions.

URLs: https://www.pensieve.co),

replace-cross Alleviating Attack Data Scarcity: SCANIA's Experience Towards Enhancing In-Vehicle Cyber Security Measures

Authors: Frida Sundfeldt, Bianca Widstam, Mahshid Helali Moghadam, Kuo-Yun Liang, Anders Vesterberg

Abstract: The digital evolution of connected vehicles and the subsequent security risks emphasize the critical need for implementing in-vehicle cyber security measures such as intrusion detection and response systems. The continuous advancement of attack scenarios further highlights the need for adaptive detection mechanisms that can detect evolving, unknown, and complex threats. The effective use of ML-driven techniques can help address this challenge. However, constraints on implementing diverse attack scenarios on test vehicles due to safety, cost, and ethical considerations result in a scarcity of data representing attack scenarios. This limitation necessitates alternative efficient and effective methods for generating high-quality attack-representing data. This paper presents a context-aware attack data generator that generates attack inputs and corresponding in-vehicle network log, i.e., controller area network (CAN) log, representing various types of attack including denial of service (DoS), fuzzy, spoofing, suspension, and replay attacks. It utilizes parameterized attack models augmented with CAN message decoding and attack intensity adjustments to configure the attack scenarios with high similarity to real-world scenarios and promote variability. We evaluate the practicality of the generated attack-representing data within an intrusion detection system (IDS) case study, in which we develop and perform an empirical evaluation of two deep neural network IDS models using the generated data. In addition to the efficiency and scalability of the approach, the performance results of IDS models, high detection and classification capabilities, validate the consistency and effectiveness of the generated data as well. In this experience study, we also elaborate on the aspects influencing the fidelity of the data to real-world scenarios and provide insights into its application.

replace-cross Detection of Disengagement from Voluntary Quizzes: An Explainable Machine Learning Approach in Higher Distance Education

Authors: Behnam Parsaeifard, Christof Imhof, Tansu Pancar, Ioan-Sorin Comsa, Martin Hlosta, Nicole Bergamin, Per Bergamin

Abstract: Students disengaging from their tasks can have serious long-term consequences, including academic drop-out. This is particularly relevant for students in distance education. One way to measure the level of disengagement in distance education is to observe participation in non-mandatory exercises in different online courses. In this paper, we detect student disengagement in the non-mandatory quizzes of 42 courses in four semesters from a distance-based university. We carefully identified the most informative student log data that could be extracted and processed from Moodle. Then, eight machine learning algorithms were trained and compared to obtain the highest possible prediction accuracy. Using the SHAP method, we developed an explainable machine learning framework that allows practitioners to better understand the decisions of the trained algorithm. The experimental results show a balanced accuracy of 91\%, where about 85\% of disengaged students were correctly detected. On top of the highly predictive performance and explainable framework, we provide a discussion on how to design a timely intervention to minimise disengagement from voluntary tasks in online learning.

replace-cross KERAP: A Knowledge-Enhanced Reasoning Approach for Accurate Zero-shot Diagnosis Prediction Using Multi-agent LLMs

Authors: Yuzhang Xie, Hejie Cui, Ziyang Zhang, Jiaying Lu, Kai Shu, Fadi Nahab, Xiao Hu, Carl Yang

Abstract: Medical diagnosis prediction plays a critical role in disease detection and personalized healthcare. While machine learning (ML) models have been widely adopted for this task, their reliance on supervised training limits their ability to generalize to unseen cases, particularly given the high cost of acquiring large, labeled datasets. Large language models (LLMs) have shown promise in leveraging language abilities and biomedical knowledge for diagnosis prediction. However, they often suffer from hallucinations, lack structured medical reasoning, and produce useless outputs. To address these challenges, we propose KERAP, a knowledge graph (KG)-enhanced reasoning approach that improves LLM-based diagnosis prediction through a multi-agent architecture. Our framework consists of a linkage agent for attribute mapping, a retrieval agent for structured knowledge extraction, and a prediction agent that iteratively refines diagnosis predictions. Experimental results demonstrate that KERAP enhances diagnostic reliability efficiently, offering a scalable and interpretable solution for zero-shot medical diagnosis prediction.

replace-cross Self-Steering Deep Non-Linear Spatially Selective Filters for Efficient Extraction of Moving Speakers under Weak Guidance

Authors: Jakob Kienegger, Alina Mannanova, Huajian Fang, Timo Gerkmann

Abstract: Recent works on deep non-linear spatially selective filters demonstrate exceptional enhancement performance with computationally lightweight architectures for stationary speakers of known directions. However, to maintain this performance in dynamic scenarios, resource-intensive data-driven tracking algorithms become necessary to provide precise spatial guidance conditioned on the initial direction of a target speaker. As this additional computational overhead hinders application in resource-constrained scenarios such as real-time speech enhancement, we present a novel strategy utilizing a low-complexity tracking algorithm in the form of a particle filter instead. Assuming a causal, sequential processing style, we introduce temporal feedback to leverage the enhanced speech signal of the spatially selective filter to compensate for the limited modeling capabilities of the particle filter. Evaluation on a synthetic dataset illustrates how the autoregressive interplay between both algorithms drastically improves tracking accuracy and leads to strong enhancement performance. A listening test with real-world recordings complements these findings by indicating a clear trend towards our proposed self-steering pipeline as preferred choice over comparable methods.

replace-cross DNN-Based Precoding in RIS-Aided mmWave MIMO Systems With Practical Phase Shift

Authors: Po-Heng Chou, Ching-Wen Chen, Wan-Jen Huang, Walid Saad, Yu Tsao, Ronald Y. Chang

Abstract: In this paper, the precoding design is investigated for maximizing the throughput of millimeter wave (mmWave) multiple-input multiple-output (MIMO) systems with obstructed direct communication paths. In particular, a reconfigurable intelligent surface (RIS) is employed to enhance MIMO transmissions, considering mmWave characteristics related to line-of-sight (LoS) and multipath effects. The traditional exhaustive search (ES) for optimal codewords in the continuous phase shift is computationally intensive and time-consuming. To reduce computational complexity, permuted discrete Fourier transform (DFT) vectors are used for finding codebook design, incorporating amplitude responses for practical or ideal RIS systems. However, even if the discrete phase shift is adopted in the ES, it results in significant computation and is time-consuming. Instead, the trained deep neural network (DNN) is developed to facilitate faster codeword selection. Simulation results show that the DNN maintains sub-optimal spectral efficiency even as the distance between the end-user and the RIS has variations in the testing phase. These results highlight the potential of DNN in advancing RIS-aided systems.